Stop supervising AI in real time. The pattern that surfaced today: load up 50 tasks, go home, run the work overnight, and read a one-page brief in the morning. Five decisions, send it back out. The shift is from AI-as-tool to AI-as-staff, and the operators on the call are running the playbook six months ahead of where the platforms will be.
What follows are notes from this week's Executive AI Roundtable discussion, shared under the Chatham House Rule.
The Graveyard Shift
One CEO has stopped working with AI in real time. He loads GitHub Issues with everything the system needs to know, 50 to 100 tasks at a time, then drives home. He adopted this setup to escape a specific failure mode: keeping task state within the model's context costs tokens, breaks down on long horizons, and makes it hard to hand off work between agents. Moving the state outside, into the issue tracker, lets agents read on the way in and write on the way out, without carrying the management overhead in their own memory.
His verbatim framing of the shift:
I call it a graveyard shift because it just goes and works for a while. Not on one thing, but on 50 things.
The role reversal landed harder when he named what kind of work relationship this is:
It's much more like working with a developer, a developer who's now working exceedingly fast, right? It's less like working with a coding tool where I'm the primary coder and more like working with a staff.
Because AI works so quickly, instead of waiting for human input when a choice arises, the graveyard shift generates three possible solutions to every decision. In the morning review, the CEO doesn't have to pick A, B, or C — they can pull the elements they like from each variant and ask for three more rounds based on the new constraints. The token spend is a rounding error. The decision quality is the point.
Long-running prompts can also perform intensive code review work. Point an agent at a clean repo with an open-ended directive — look for bugs, undocumented code, unused code, missing tests; when you find a bug, write a test that proves it, fix the bug, and make sure the test passes. Come back to a documented, tested, debugged change set. You don't supervise; you decide.
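The external-state idea from earlier in this section, where agents read on the way in and write on the way out, can be sketched in a few lines. This is a minimal illustration, not the CEO's actual setup: `TaskStore` is a hypothetical in-memory stand-in for an issue tracker, `demo_worker` stands in for an agent call, and the task titles are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One unit of work, standing in for a single tracker issue."""
    id: int
    title: str
    status: str = "open"  # open -> in_progress -> done or blocked
    notes: list[str] = field(default_factory=list)


class TaskStore:
    """Hypothetical in-memory stand-in for an issue tracker."""

    def __init__(self, tasks):
        self._tasks = {t.id: t for t in tasks}

    def claim_next(self):
        """Read on the way in: hand out the lowest-numbered open task."""
        for task in sorted(self._tasks.values(), key=lambda t: t.id):
            if task.status == "open":
                task.status = "in_progress"
                return task
        return None

    def report(self, task_id, status, note):
        """Write on the way out: record the outcome on the task itself."""
        task = self._tasks[task_id]
        task.status = status
        task.notes.append(note)

    def by_status(self, status):
        return [t for t in self._tasks.values() if t.status == status]


def run_shift(store, worker):
    """Drain the queue; each worker call sees one task, not the whole history."""
    while (task := store.claim_next()) is not None:
        status, note = worker(task)
        store.report(task.id, status, note)


def demo_worker(task):
    # Placeholder for an agent call; blocks anything that needs credentials.
    if "credentials" in task.title:
        return "blocked", "needs a human to supply credentials"
    return "done", f"completed: {task.title}"


store = TaskStore([
    Task(1, "add retry to uploader"),
    Task(2, "rotate credentials for staging"),
    Task(3, "document the export endpoint"),
])
run_shift(store, demo_worker)
print("done:", [t.id for t in store.by_status("done")])
print("blocked:", [t.id for t in store.by_status("blocked")])
```

The design point is that the worker never holds the queue. Any agent, or a different model, could pick up the next task because everything it needs lives on the task record.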
Brief In, Decisions Out
The graveyard shift reports its results to kick off the next day's work. The output at the end of the shift is a single document: a brief. Three parts. Here's what I did. Here's what's blocked. Here are the five decisions I need from you.
The reason the brief matters more than the work itself: without it, the CEO would have to read through hours of chronological LLM output. With it, the meeting between human and AI is calibrated to where human judgment is scarce — five high-leverage calls per session. Files for new ideas get opened automatically. Issues for blockers get updated. The next shift launches on top of the answers.
Tuning the brief is itself a piece of work. The prompt that generates it gets iterated on every shift — what would you have asked me first? — until the five questions it asks are ones only the CEO can answer. Until the brief consistently makes the operator more decisive instead of more informed, the system hasn't earned its overnight run.
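One way to picture the three-part brief is as a small data structure with a hard cap on decisions. The sketch below is an assumption about shape only: the `Brief` class, the `render_brief` function, and the sample entries are hypothetical, and it assumes the decisions arrive already ranked by importance.

```python
from dataclasses import dataclass

MAX_DECISIONS = 5  # the cap described in the text


@dataclass
class Brief:
    done: list[str]
    blocked: list[str]
    decisions: list[str]  # assumed to arrive sorted, most important first


def render_brief(brief):
    """Render the three-part brief as plain text, capping decisions at five."""
    shown = brief.decisions[:MAX_DECISIONS]
    deferred = len(brief.decisions) - len(shown)
    lines = ["Here's what I did:"]
    lines += [f"- {item}" for item in brief.done]
    lines.append("Here's what's blocked:")
    lines += [f"- {item}" for item in brief.blocked]
    lines.append("Decisions I need from you:")
    lines += [f"{n}. {q}" for n, q in enumerate(shown, start=1)]
    if deferred:
        lines.append(f"({deferred} lower-priority questions held for the next shift)")
    return "\n".join(lines)


# Made-up sample data: seven candidate questions, only five should surface.
brief = Brief(
    done=["closed 41 of 50 issues"],
    blocked=["issue 17 needs staging credentials"],
    decisions=[f"question {n}" for n in range(1, 8)],
)
text = render_brief(brief)
print(text)
```

Holding back the overflow rather than printing it keeps the morning read to one page, which is the property the brief is being tuned for.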
The Author Is Not the Reviewer
A different sub-pattern surfaced: when the graveyard shift produces code, reviewing it yourself defeats the leverage. So the output flows through a second AI before any human looks at it. Claude Code writes the code. Codex reviews it. They iterate five to twelve rounds until the code is clean, documented, and tested. Only then does the human see it.
The principle is the one every editor knows: a writer is bad at editing their own draft. Too close to the thinking. Crossing model boundaries — from Claude to GPT, or vice versa — offers a clean context and different training data. The first quality passes are already complete by the time the human opens the PR.
For workflows where the stakes are higher than code quality — marketing copy that touches brand voice, customer-facing decisions, anything where a model's pattern-completing tendencies cost you — the same author/reviewer split pays out. Different models. Different prompts. Distance.
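The author/reviewer split reduces to a loop with a round limit. The sketch below assumes nothing about how either model is called: `author` and `reviewer` are plain callables you would supply, and the toy versions here are stand-ins so the loop can run on its own. The upper bound of twelve rounds comes from the range mentioned above.

```python
def review_loop(author, reviewer, task, max_rounds=12):
    """Alternate author and reviewer until the reviewer has no comments.

    author(task, draft, comments) returns a new draft.
    reviewer(draft) returns a list of comments; an empty list means approved.
    Returns (draft, rounds_used, approved).
    """
    draft, comments = None, []
    for round_number in range(1, max_rounds + 1):
        draft = author(task, draft, comments)
        comments = reviewer(draft)
        if not comments:
            return draft, round_number, True
    return draft, max_rounds, False


def toy_author(task, draft, comments):
    # Stand-in for the authoring model: addresses each comment by appending it.
    if draft is None:
        return f"def {task}(): pass"
    return draft + "".join(f"  # {c}" for c in comments)


def toy_reviewer(draft):
    # Stand-in for the reviewing model: raises one missing item per round.
    missing = [w for w in ("docstring", "test") if w not in draft]
    return [f"add {w}" for w in missing[:1]]


draft, rounds, approved = review_loop(toy_author, toy_reviewer, "slugify")
print(rounds, approved)
```

The reviewer only ever sees the draft, never the author's reasoning. That is the distance the section argues for, expressed as a function signature.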
Voice Is the Cognitive Interface
Two operators have moved the bulk of their thinking off the keyboard and onto dictation. One prepaid Wispr Flow for a year so he'd stop shopping around. The other set up a $60 Elgato Stream Deck Mini, disabled every button except one, and labeled it dictate. The friction of remembering a keyboard shortcut was enough to keep him typing. A nice glowing button on his desk helped to make it a habit.
The argument behind the move:
AI wants more input. There are so many examples where the prompt is like 12 words. But where AI shines is when it gets 400 words.
The mechanism is dictation, not transcription quality. Think out loud at speaking speed, closer to 150 words per minute than the 90 you can type, and ship the raw stream into Claude without editing it. The model handles the noise. The repetition exposes the thinking. The yammering is a feature.
A second operator put the same point in a personal frame:
I'm just trying to be a lot more verbal, because I can talk close to my speed of thinking, closer than typing.
A participant suggested a target: 10,000 dictated words per day. That implies a different shape of working day — fewer tabs, more pacing while you talk, and a slow shift away from screen-tethered work. Think of this as an experiment, an attempt to delegate more work to your AI tools through monologue.
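To get a feel for what 10,000 words a day means in time, here is a back-of-the-envelope calculation. The speaking rate of 150 words per minute is an assumed conversational figure, not a number from the roundtable; the typing rate is the 90 words per minute quoted earlier in this section.

```python
def minutes_for(words, words_per_minute):
    """Minutes needed to produce `words` at a steady rate."""
    return words / words_per_minute


TARGET_WORDS = 10_000  # daily target suggested at the roundtable
SPEAKING_WPM = 150     # assumption: typical conversational speaking rate
TYPING_WPM = 90        # typing rate quoted in the text above

speaking_minutes = minutes_for(TARGET_WORDS, SPEAKING_WPM)
typing_minutes = minutes_for(TARGET_WORDS, TYPING_WPM)

print(f"dictating: about {speaking_minutes:.0f} minutes")
print(f"typing:    about {typing_minutes:.0f} minutes")
```

Under those assumptions the target is a little over an hour of talking spread across the day, against nearly two hours of uninterrupted fast typing.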
Static Frameworks Move Marketing Off the Page Builder
A pattern from the consumer side of the room: marketing teams and non-technical founders aren't getting the same benefits coders do. They watch the engineering side compress weeks of work with AI, then turn to update the website and hit Elementor, copy-paste, file uploads, image management — the same drag they've had for a decade.
A founder rebuilt his stack around exactly this gap. He moved his WordPress sites to Astro — a static framework Claude can read and write without translation — then built an in-browser UI layer on top so the marketing team never sees code:
it's removed all of the friction for updating my site, or I can have a customer call and immediately build a landing page, or, like, a pitch for that specific person.
Thirty seconds, customer call to landing page. The downstream play is integrations: Google Sheets for live data, Fathom for analytics. Many non-technical operators are stuck in drudgery because every workflow ends in a copy-paste fight with a CMS. There is an opportunity, internally or as a product, to integrate the AI prompt more tightly with the marketing and sales tools of the trade.
The desktop-metaphor gap is real. Marketing leads excited about Claude don't know what Git is. They don't use command lines. They think in files and folders, not commits and branches. Products that translate AI capability into desktop-shaped surfaces — clickable, drag-and-drop, no terminal — unlock buyers that the current tooling stack ignores.
For the larger pattern of AI compressing team-level work down to single-operator scale, see Why Bootstrappers Now Need an AI Chief of Staff.
Talk Through the Screen as You'd Talk to a Designer
An attendee wired a prototype together in an afternoon. Tailscale to create a fake LAN between his Mac and his iPhone. Screen-sharing from the phone. A listening agent on the receiving end. He starts a call, shares his iPhone screen, and talks through the changes he wants — move the banner, expand the section, adjust the color. The agent watches the screen and listens.
The strategic bet was unambiguous:
I think this is going to be a feature in Replit and Claude Code and Codex and all these in the next 3 to 9 months.
So the question isn't whether the workflow works — it does. The question is what to do with a six-month head start when you don't have a following. One attendee named it directly:
I don't have a following of devs, I don't have a following of indie hackers, I don't have a following of normies, I don't have a following. So it's, like, that is an open question in my mind.
The real moat in AI tooling is the audience. Build a Squarespace plugin and ride someone else's distribution? Sell the workflow back to the platforms? Anyone with distribution can probably build whatever you built for less than they'd pay to license it. That is the challenge of selling AI tools: the product is small enough to vibe-code in a weekend, yet large enough to matter to anyone whose customers don't write code.
For the inverse — the B2B SaaS incumbent defending against the customer who threatens to vibe-code their product instead of renewing — see Addressing the Vibe-Coding Objection.
Under both of these patterns is the same theme: the keyboard is no longer the only interface to AI, and the people who design around that get a window before the rest of the market catches up.
Quality Is Whatever the CEO Says It Is
What does quality mean for your product?
A CEO who can't answer that question explicitly is asking the organization to guess. Engineering will optimize for one thing, marketing for another, customer support for a third. The Tesla Model 3 case read like a parable: high-end features that satisfy luxury buyers fail to satisfy mass-market customers, who measure quality by whether the car starts every morning.
John's framing of the gap:
nobody has any idea what quality means, unless you, as the CEO, say what it means, because it can mean 100 different things depending on what you value.
For SAP, quality is reliability and documentation; the people who use the UI aren't the people who see the output, so polish doesn't matter. For some consumer products, design is quality: the Apple Magic Mouse's charging port placement is a quality failure precisely because it violates what Apple's customers expect of the brand. Both definitions are correct. Different markets, different yardsticks, and the CEO sets which one applies, or at least transmits it to the rest of the organization.
This connects to a hidden gap operators face when their team ships AI features before the company has put a stake in the ground on what quality means. Every feature ships, each one gets measured against a different yardstick, and the CEO wonders why customer satisfaction stays low even though features are shipping quickly.
The cost of being early is real:
There's no such thing as a best practice yet, because the products aren't even finished yet, right? The innovation's still coming at a rapid pace.
Conservative buyers want a Salesforce template they can copy. There isn't one. Geoffrey Moore named the underlying structure decades ago: small businesses are uniquely bad at innovation because they can't afford the risk that hedge funds and large enterprises absorb routinely. Dharmesh Shah described HubSpot's growth inflection as the moment the product became boring enough for SMBs to trust it. AI is nowhere near boring yet. Small businesses end up forced into innovation they didn't sign up for, on tools that change weekly. The CEOs on this call largely see these challenges as the cost of investing in a new technology that's improving by the day.
For how CEO-defined quality compounds into shipping speed as an organization scales, see Growing Software Quality.
Where to Start
Here are some ideas you can try before our next roundtable.
- Move the source of truth out of the model. GitHub Issues, Linear, a shared doc — anywhere the AI can read state in and write state out without carrying it in context.
- Train the brief. Iterate on the skill that produces your morning memo until the five questions it asks are ones only you can answer.
- Cross model boundaries on review. Author with one model, review with another. The principle is distance; the implementation is whichever pair you have access to.
- Buy a button. A Stream Deck Mini, a programmable keyboard, or anything that converts a keyboard shortcut into a visible affordance. Build a habit of dictating instead of typing.
- Pick the right interface for your buyer. If your customers are non-technical, every workflow that ends in CLI or Git is a market you're not selling into. Build for a metaphor that your customers will understand, or get pushed out by something that does.
- Define quality for your organization. Write it down. Show it to engineering, marketing, and support. If they disagree on what it means, you've found your real problem.
Resources From the Roundtable
- Astro — static site framework Claude can read and write directly; foundation for the friction-free marketing-team workflow.
- Wispr Flow — voice-to-text dictation tool used in the cognitive-interface pattern; one-year subscription removes the optimization spiral.
- Stream Deck Mini — Elgato hardware (~$60) with programmable buttons; the dictate button pattern.
- Tailscale — VPN that creates a fake LAN between devices; foundation for the iPhone-screen-sharing prototype.
- GitHub Issues — the external source-of-truth layer in the graveyard-shift pattern.
- Claude Code — Anthropic's coding agent; the primary author in the author/reviewer split.
- OpenAI Codex — OpenAI's coding agent; the primary reviewer in the same split.