Eva · legaltech-brain

Codex is easy to underestimate. At first glance it looks like another AI coding tool; if you’re not an engineer, a natural conclusion is that it’s not for you. That reading misses how much Codex makes possible.

Picture a Monday morning: A request for a launch plan lands in your inbox. You forward it to Codex, which has its own email account, and close your laptop while Codex runs tasks in the cloud, or on a machine like a Mac Mini that you keep active. On your commute to the office, you get an email notification on your phone: Codex has read the relevant Slack threads, pulled customer notes out of Google Drive, checked last quarter’s numbers in PostHog, and started a go-to-market plan in a shared Notion document. It just needs you to confirm one detail about timing, which you do with a thumbs-up. By the time you reach your desk, a draft is waiting for review.

This is a day in the life of an agent-pilled knowledge worker. It all runs on OpenAI’s agent, Codex, in the Codex desktop app. We use “Codex” to refer to the app throughout this guide.

Codex is a workspace for you and your AI agents. Give Codex access to the files, apps, and tools it needs, and it gathers context and moves through the task across every surface it can reach, including your connected apps, the browser, and your computer. That makes it useful not just for code, but for a broad range of knowledge work.

There are two ways to work with agents in Codex: Delegate or collaborate.

Delegate tasks that are predictable, repeatable, and low-risk. With clear, well-specified instructions, the agent can execute autonomously and bring back finished work for your review.

Collaborate on tasks that are judgment-heavy, exploratory, or iterative. You work alongside the model toward an outcome that matches your vision.

AI progress has reached a point where expertise is easy to replicate. Each new model can do more of what used to require rare skill, which creates both more opportunity and more noise.
The people who work best in this environment know how to direct AI’s capability without losing their personal judgment. They ride the models rather than being overwhelmed by them. Expert Codex users are one of the clearest examples of what that looks like in practice. This guide is about becoming one of those people. It covers how to set up a workspace, run high-leverage knowledge-work tasks, and turn repeated work into durable systems that get better over time. If you’re ready to think of your work in terms of systems instead of one-off tasks, this guide is for you. Part 1: Understanding Codex What Codex is Codex is a tool-using agentic workspace: You give it a goal and it plans the work, uses available tools and context, and produces a result for you to review. It can read and write files on your computer, connect to external services through plugins and other integrations, run multi-step tasks without asking for guidance, generate code and scripts when a task needs them, and maintain context across a persistent workspace. Specific capabilities that make Codex worth using: Works alongside you on multiple tasks in parallel Pulls context from the apps and files you connect Uses a supported browser and desktop workflows when a task needs on-screen action Checks its own work, revises, and keeps going Holds a persistent goal across a long-running session, instead of treating each message as a one-off request Turns repeatable tasks into recurring workflows Helps route shared requests from places like Slack, email, or forms Lets you start, steer, approve, and review work from your phone while Codex works in the cloud or on a machine, such as a Mac Mini, that you keep awake These capabilities make Codex useful both for delegating well-specified tasks and as a shared workspace for human-agent collaboration. Deciding which mode fits which needs is the meta-skill of modern knowledge work. 
A note on Goals A Goal in Codex, initiated using the /goal command, is a persistent objective that shapes an entire session rather than living and dying with a single message. Instead of re-briefing the agent on every turn, you tell it what “done” looks like, how success gets checked, and which constraints to respect. Codex then keeps working toward that outcome across interruptions and session breaks. Goals let you delegate long-horizon work, collaborate without losing the thread, and compound progress over time instead of restarting from scratch. A simple test for when to use /goal: If you’d type the same sentence into three prompts in a row—“cite every factual claim, match the house style, never send without my review”—make it a goal instead. Goals versus skills. A skill is a reusable set of packaged instructions (sometimes with scripts) that teaches Codex how to handle a recurring kind of task well. A goal, on the other hand, is what you’re trying to accomplish in a given stretch of work. It guides one session until the objective is met, then it’s done. Codex on mobile Codex also runs from your phone through the ChatGPT mobile app, remotely controlling the machine where your work is happening. The mobile app suits the lightweight parts of a workflow: You can kick off a task, answer a question, approve an action, or review a draft from anywhere. Heavier review still deserves a real screen. What Codex isn’t Codex isn’t a magic intern that can safely act without supervision. It isn’t a replacement for taste, judgment, or ownership. It isn’t a replacement for human review or fact-checking. It isn’t useful for tasks where the source data is inaccessible, the criteria for success are entirely subjective, or the stakes of an error are too high to allow autonomous action. Useful rules A task is a good candidate for Codex if it has at least two of the following traits: It requires pulling data from multiple sources. It involves repeated steps you do regularly. 
It can be checked against objective criteria. It produces a durable artifact—a document, a plan, a report, a script. It benefits from synthesis across many inputs. It’s annoying enough that you routinely delay or avoid it. Delegate tasks when they are: Repeatable Objective Checkable Low-risk Collaborate on tasks that are: Ambiguous Judgment-heavy Exploratory Iterative Codex, Claude Code, and Claude Cowork If you’ve used Claude Code, you already have a mental model for an agent that works on your machine. For broader knowledge work, OpenAI and Anthropic have arrived at a similar experience from different directions. Anthropic packages everything into one Claude app with three modes: Chat, Code, and Cowork. Code began as a terminal tool for developers (Claude Code) and now has a graphical version inside the app—no terminal required. It’s built for code repositories, but with the right connectors it handles a lot of general knowledge work too. Cowork takes the same engine and aims it at non-coding work, with folder access, Chrome browsing, computer use, scheduled tasks, and persistent project memory. Codex is OpenAI’s counterpart, but rather than split the work across modes, it puts coding and knowledge work in a single workspace. A few things give Codex an edge for knowledge work today: One surface, not two. Anthropic splits agentic work between Code and Cowork; Codex handles both in the same place, so you’re never deciding which mode a task belongs in. A browser that works beside you. Codex renders the pages inside the app itself as a shared view between you and the agent. The Claude app operates a stand-alone Chrome window or your full screen instead. For logged-in sites, both rely on a Chrome extension. In our experience, Codex’s built-in browser tends to be faster, more reliable, and more useful for collaborative work. Connectors out of the box. 
Codex comes with a catalog of connectors you authorize in a click; in the Claude app you add tools as MCP servers, which requires a bit more assembly. Which surface is right comes down to model preference and workflow habits; Codex has the edge for us today—but the labs ship fast, and that can change. The Codex knowledge work loop Every sustainable Codex workflow follows the same five-step pattern: Connect → Contextualize → Delegate/collaborate → Review → Compound Connect: Give Codex access to the systems you use for work—Gmail, Slack, Notion, Google Drive, your calendar, your analytics tools, your support platform, and/or local files. Without connected apps or source access, Codex is limited to the local/project files it can access, uploaded or linked materials, and context you provide in the thread. With connections, it can find what it needs on its own. Contextualize: Put your goals, preferences, project details, source links, review standards, and standing rules in files Codex can access, then cite those files in Codex’s AGENTS.md file to make them readily available. This is the difference between an agent that has to be re-briefed every time and one that already understands who you are, what you’re working on, and how you like to work. Delegate/collaborate: Decide whether the task needs close collaboration or can run on its own. Either way, specify inputs, output format, and acceptance criteria, then let it work. Review: Check the output in the destination app. If Codex drafted Slack messages, review them in Slack. If it wrote a strategy document, review it in your word processor of choice, such as Google Docs, Notion, or Proof. Content that looks fine in a terminal or the Codex app may read differently in the space where it will ultimately be used. Compound: Turn what works into something reusable. Save the prompt. Document the workflow. Add mistakes to your review checklist and keep your context files up to date. 
Each session should make future sessions faster. Part 2: Setup Connect your systems Connect the tools you want Codex to have access to. This includes Gmail, Slack, Notion, Google Drive, your calendar, analytics tools, support platforms, or anything else for which Codex has an integration. Once the relevant tools are connected, Codex can look at your actual work context and suggest workflows based on your messages, files, meetings, and recurring tasks. Connecting a tool isn’t the same thing as letting Codex act on it. Across everything you connect, Codex can read and draft while still asking for your approval before it sends, posts, archives, or deletes. That makes broad access low-risk early on: Connect generously so Codex can find workflows worth building. Then, once you know which ones you’ll keep, disconnect the tools you don’t need to reduce risk and limit unnecessary data exposure. Three ways Codex reaches your tools Codex can touch the same tool in more than one way, and knowing which access path is which saves a lot of confusion: Connectors (plugins) give Codex structured, API-level access to an app—Gmail, Slack, Notion, your analytics tools. This is the most reliable and repeatable option, so use it whenever a connector exists. Browser use lets Codex operate a web page directly through its in-app browser—useful for local previews, public pages, and anything you want to watch it do on a shared screen. For sites that require you to be signed in, like your email client, the Codex Chrome extension works inside your logged-in browser. Computer use lets Codex see and operate your desktop the way a person would—clicking through an app, changing a setting, or working with software that only exists as a graphical interface. The rule of thumb: Reach for a connector first, the browser next, and computer use when nothing else can get to the task. 
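The rule of thumb above can be sketched as a small decision function. This is only an illustration of the ordering (connector, then browser, then computer use); the `Tool` record, its fields, and `choose_access_path` are hypothetical names invented for this sketch and are not part of any Codex API.

```python
from dataclasses import dataclass
from enum import Enum


class AccessPath(Enum):
    CONNECTOR = "connector"
    BROWSER = "browser"
    COMPUTER_USE = "computer use"


@dataclass
class Tool:
    """Hypothetical description of a tool you want Codex to reach."""
    name: str
    has_connector: bool  # assumption: you know whether a connector exists for it
    has_web_ui: bool     # assumption: the task can be done on a web page


def choose_access_path(tool: Tool) -> AccessPath:
    """Apply the rule of thumb: connector first, browser next, computer use last."""
    if tool.has_connector:
        return AccessPath.CONNECTOR
    if tool.has_web_ui:
        return AccessPath.BROWSER
    return AccessPath.COMPUTER_USE
```

For example, a tool with both a connector and a web interface resolves to the connector, because the structured path is preferred whenever it exists.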
Starting prompt (use this once your integrations are set up):

Connect to the tools I use for work: [List your tools: Gmail, Slack, Notion, Drive, etc.]. Then look at my work patterns across those tools and suggest three workflows I should set up first. For each one, describe the input sources, the output artifact, how often it should run, what approval looks like, and what would make the workflow worth keeping long-term.

Once the relevant tools are connected and permissioned, this prompt lets Codex inspect the available work context and suggest automation candidates rather than forcing you to invent them.

Build your Codex workspace

Build Codex’s workspace before running any workflows. Skip this step and you’ll likely stall. A Codex workspace is a folder (local on your machine, synced to GitHub if you want version control) that contains the context files, workflow instructions, and review standards Codex reads at the start of each session. Think of it as the agent’s onboarding manual.

An example workspace structure

your-workspace/
├── README.md                 # Start here: orientation
├── identity/                 # About you
│   ├── context.md
│   ├── preferences.md
│   └── rules.md
├── playbooks/                # Process: repeatable workflows
│   ├── workflows/
│   ├── inbox-sweep.md
│   └── research-brief.md
├── sources/                  # Source shelf: inputs
│   ├── sources/
│   ├── key-links.md
│   └── recurring-docs.md
├── outputs/                  # Finished work
│   ├── outputs/
│   ├── drafts/
│   └── reports/
└── reviews/                  # Quality checks: guardrails
    ├── data-checklist.md
    └── writing-checklist.md

What you’re doing here has a name: context engineering, a term popularized by Shopify CEO Tobi Lütke and prominent AI engineer Andrej Karpathy. Getting the right context to the model at the right time accounts for at least half of its performance.

At the start of each session, Codex looks at AGENTS.md, which works as the table of contents.
You can write your standing instructions directly in it, but we recommend keeping AGENTS.md short and pointing it at more detailed files: context.md for who you are and what you’re working on, preferences.md for how you want the work done, and rules.md for what it may and may not do without asking.

What to put in your context files

context.md should cover:
- Your role and the function you own
- Active projects and their current status
- The tools you use daily and what each one is for
- The people or teams you work with most closely
- How decisions typically get made in your context

preferences.md should cover:
- Writing style and tone (formal or conversational, terse or thorough)
- Communication preferences (what you like to review before it goes out and what can be drafted and queued without your involvement)
- Decision-making preferences (when to ask before acting and when to proceed and report back)

rules.md should cover:
- What Codex may never do without explicit approval: Send, post, archive, delete, modify a source of truth, or move money
- What Codex may do without asking: Draft, summarize, research, outline, organize
- Any standing constraints specific to your work (e.g., client confidentiality rules, brand standards, data handling requirements)

Starting prompt (use this to have Codex create your workspace structure):

[First: Create a folder on your desktop called “Codex”]

Set up this folder as a simple Codex workspace for knowledge work. Create three starter files:
1. context.md: who I am, what I’m working on, what tools I use, and who I work with
2. preferences.md: how I like work to be written, reviewed, and handled
3. rules.md: what you may do without asking, what you must ask before doing, and what you must never do

Interview me one question at a time to gather the information you need to fill in each file.

The “one pinned chat per project” rule

The workspace folder is for your context; pinned chats are for your work.
You can find the option to pin a chat next to the chat name in the app’s left-hand navigation bar. A useful habit from day one is to keep one persistent, pinned thread per project or area of responsibility (one for the product launch, one for weekly reporting, one for recruiting) rather than spinning up a fresh chat for every request. A standing thread accumulates context as you go, so Codex remembers what you have already established and you don’t have to re-explain the project each time. Pair a pinned chat with a goal, and the thread itself becomes a reliable home for that stream of work.

Part 3: The five levels of Codex use

Nobody becomes a Codex power user all at once. People get there in stages, and each stage calls for a different way of thinking about what Codex is doing and what it’s good for. Skip ahead too quickly, and you’ll get frustrated: either you don’t trust it yet, or you haven’t built the infrastructure for more autonomous work. At every level, you should know when to hand work to Codex and when to stay in the loop as its collaborator.

Level 1: One-off knowledge work

Mental model: Codex as a capable, thorough research and drafting assistant.

Mode: Collaborate. At this level, nothing is automated. You run single-session tasks, review everything before it leaves your hands, and build familiarity with how Codex handles different types of work.

Best first tasks:
- Summarize a meeting transcript and extract decisions, open questions, and follow-up actions.
- Turn scattered notes into a structured outline.
- Build a research brief from a set of links and documents.
- Rewrite a draft against a style guide.
- Create a review checklist for a document, launch plan, or strategy memo.
- Convert a written draft into an audio file for editing on the go.

Starting prompt:

Use the attached [documents/links/notes] to produce [specific artifact]. Prioritize accuracy over elegance. Include source links for any factual claims. Flag anything uncertain or that requires my verification.
End with the three questions I should answer before this artifact is ready to use. Review habit: Before polishing any output, ask Codex to list the assumptions it made and where it is least confident. This surfaces problems before you invest time in refinement. Move to Level 2 when: You keep wishing Codex remembered what you told it last time. Level 2: Multi-source workflows Mental model: Codex as a cross-system analyst that can assemble information you could never pull together manually in a reasonable amount of time. Mode: Collaborate. At this level, Codex can synthesize outputs from multiple connected systems—Slack threads, Notion pages, email archives, analytics dashboards, and Google Drive documents—but it still needs guidance and feedback. Example multi-source tasks: A go-to-market plan built from internal meeting transcripts, Slack discussions, customer notes, and a strategy template A weekly KPI report from analytics, revenue data, support volume, and social metrics A summary synthesized from Slack, Notion, Drive links, and past drafts A weekly leadership brief assembled from team standups, metrics, and open decisions I need [specific artifact]. Sources to use: - [Tool 1]: [what to look for there] - [Tool 2]: [what to look for there] - [Tool 3]: [what to look for there] Output format: [describe the structure you want] Before you start, give me a short plan: Identify the sources you will inspect, the artifact you will produce, any gaps or unknowns you anticipate, and the checks you will run before calling it done. If anything requires sending, posting, archiving, or modifying a source of truth, ask first. A warning about data: A one-shot attempt at pulling data from multiple systems can be wrong because of stale data, mismatched definitions, permissions gaps, or join errors. For any metric that informs business decisions or agent actions, verify column by column against your primary source. 
The closer a number is to a source of truth, the more carefully it needs to be checked. Make your outputs agent-readable: Plans and reports you generate in Codex will be read by other people—but also, increasingly, by their agents. Write them in plain, structured language that a human can scan and an agent can query. Clear section headers, explicit decisions, and labeled action items make the artifact useful in both directions. Move to Level 3 when: You keep running the same multi-source workflow more than once a week and wishing it happened automatically. Level 3: Repeated chores into persistent workflows Mental model: Codex as an automated operations layer that handles predictable, recurring work so you don’t have to. Mode: Hybrid. Some tasks are fully predictable and can run without back-and-forth. These tasks are ripe for delegation. Tasks that involve judgment, strategy, or creative decisions suit collaboration. A useful heuristic: If you could write a checklist that covers 90 percent of the cases, delegate it. If you would need to think about it differently each time, collaborate. In either case, look for “computer chores”—recurring tasks that take time and attention, but don’t require human judgment at every single touchpoint. Common chore candidates: End-of-day check for unanswered Slack messages and emails, with drafted replies Weekly metrics brief from analytics, revenue, and support data Meeting-note cleanup and action-item extraction after each recorded call Customer support pattern detection and issue routing Draft-to-review package that formats a piece for editor handoff Recruiting research for an open role Before building any persistent workflow, fill out this template. It becomes the instruction file Codex reads every time the workflow runs. (The workflows in Part 4 are each an example of this canvas applied.) 
Workflow name:
Trigger or cadence:
Input sources:
Output artifact:
Approval rules:
What Codex may do without asking:
What Codex must ask before doing:
Verification steps:
Where the final output lives:
When to retire or revise this workflow:

Review discipline for automated workflows: Don’t review automated output inside Codex. Draft in Codex, then review in the destination app: Slack for Slack messages, Gmail for email drafts, word processors for documents. Content that looks fine in a terminal often reads differently in the tool where it’s ultimately used, and the context switch catches things a Codex review pass would miss.

Move to Level 4 when: Your prompt-based workflow hits a ceiling; the task is too complex or too custom to handle in text alone, and a small script or local tool would make it reliable.

Level 4: Build small tools when prompts are not enough

Mental model: Codex as a builder that creates lightweight infrastructure to make your workflows more reliable, faster, or more repeatable. Sometimes the best Codex output is a small script, a local app, a custom dashboard, or a review surface that makes a recurring workflow easier, rather than pure text.

Mode: Hybrid. In some cases, Codex may generate an artifact independently for you to review and then move on. In others, the artifact it produces may become a space where you and the agent iterate together.

Examples of when a small tool helps:
- A recurring workflow that requires pulling from an API that has no Codex integration. A short script handles the connection reliably.
- A review process where you need to see formatted output side by side with the source. A simple local app gives you the view.
- A task that needs to run on a schedule without your involvement. A script set to run on a timer (a cron job) handles the timing.
- A workflow that accumulates structured data over time. A lightweight database or structured file tracks it persistently.
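To make the last example concrete, here is a minimal sketch of the kind of small tool Codex might build for a workflow that accumulates data over time. It uses Python’s built-in sqlite3 module; the table name, column names, and helper functions (`open_tracker`, `record_metric`, `history`) are all hypothetical choices for illustration, not something the guide or Codex prescribes.

```python
import sqlite3
from datetime import date


def open_tracker(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the tracker. Pass a file path to persist between runs."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS weekly_metrics (
               week_start TEXT NOT NULL,
               metric     TEXT NOT NULL,
               value      REAL NOT NULL,
               source     TEXT NOT NULL,
               PRIMARY KEY (week_start, metric)
           )"""
    )
    return conn


def record_metric(conn: sqlite3.Connection, week_start: date,
                  metric: str, value: float, source: str) -> None:
    """Store one value per (week, metric); rerunning a week overwrites it."""
    conn.execute(
        "INSERT OR REPLACE INTO weekly_metrics VALUES (?, ?, ?, ?)",
        (week_start.isoformat(), metric, value, source),
    )
    conn.commit()


def history(conn: sqlite3.Connection, metric: str) -> list[tuple[str, float]]:
    """Return (week_start, value) pairs for one metric, oldest first."""
    rows = conn.execute(
        "SELECT week_start, value FROM weekly_metrics "
        "WHERE metric = ? ORDER BY week_start",
        (metric,),
    )
    return rows.fetchall()
```

Keeping the source alongside each value is a deliberate choice in this sketch: it gives you something to check against when you review a number later. The composite primary key means a rerun of the same week replaces the old row instead of duplicating it.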
Practical approach for non-engineers:
1. Run the task manually in Codex once to confirm the output is what you want.
2. Ask Codex: “Which steps in this workflow could be made more reliable with a small script or tool?”
3. Have Codex prototype the tool and explain what it does in plain language.
4. Run it on your data and verify the output matches what the manual process produced.
5. Keep only the parts that reduce friction. Discard what adds complexity without benefit.

You don’t need to understand every line of code to use a tool Codex built. You do need to understand what data it touches, what it produces, and where the review step is. If you can’t explain those three things, the tool isn’t ready to run autonomously.

Move to Level 5 when: You give Codex the same feedback repeatedly and have standing preferences that you’d prefer it to apply on its own.

Level 5: Compound your Codex system

Mental model: Codex as a system that can improve over time when you save useful workflows, maintain review rules, and use memories or skills to codify preferences where available.

Mode: Hybrid. Some instructions will dictate how the agent approaches autonomous work; others will guide how the model interacts with you in collaboration mode.

The idea of “compounding” work comes from compound engineering, the AI-native coding methodology coined by Kieran Klaassen and Nityesh Agarwal while building Cora, Every’s email client. The canonical example is a product requirements document (PRD) that writes the scaffolding for the next one: The artifact you produce becomes the tool that speeds up the next round. The four habits below are how you put it into practice as a knowledge worker, not just an engineer.

Remember: Each useful session should make future sessions faster and more reliable. In practice, that requires doing four things consistently after completing any significant piece of work:

1. Save successful prompts as workflow files. When a prompt produces exactly the right output, document it.
Write down the input sources, the exact prompt, the output format, and the review step. Save it in your workflows/ folder. The next time you need the same output, the agent will have that reference to work from. 2. Add mistakes to review checklists. When Codex gets something wrong—a number that was off, a tone that missed the mark, or an assumption it should not have made—add a specific check to your relevant review file, and instruct Codex to check its work against those guardrails. 3. Update your context files after major projects. When a project ends, update context.md to reflect what changed—new priorities, new tools, what worked, and what didn’t. Codex can use this when you point it to the file, turn it into a skill/workflow, or store the pattern in Codex memory where available. 4. Ask Codex to identify compounding opportunities. At the end of any session where you did something useful, run this prompt: Based on what we just did, what parts of this workflow should become a reusable skill, an automation, or a small tool? What context should I add to my project files so we don’t have to re-establish this next time? Forking for your discipline: The compound engineering plugin, Every’s open-source system for structured agent workflows, installable in Codex with one command, works for knowledge work out of the box, but its review agents are optimized for coding needs like establishing frontend patterns and reviewing for code performance. Knowledge workers can fork it into a version with reviewers tuned for strategic alignment, data accuracy, writing quality, and communication standards. A forked version, compound knowledge, is publicly available on Every’s GitHub, and is designed to be readable and editable by non-engineers. Part 4: Workflow library These workflows are meant as inspiration to get you started. Adapt the inputs, outputs, and approval rules to your specific tools and standards. 1. 
Inbox zero review queue

Best for: Anyone whose email backlog is a recurring source of anxiety or dropped balls.

Input sources: Gmail or your email client of choice.

Output artifact: A structured list of draft replies, proposed actions (archive, delegate, flag), and any emails flagged for your personal attention because the draft alone isn’t sufficient.

Dan Shipper kept inbox zero for 10 days straight with Codex. To use this workflow, have Codex:
- Gather email through Cora running in the in-app browser.
- Render the email queue as a single page.
- Go through each item with you as you dictate the action the AI should take (e.g., “research this,” “draft that,” “pull the documents our lawyers asked for”). You can do this via chat or by voice with a dictation tool like Monologue (we recommend voice).

First prompt:

Go through my inbox for the past [time period]. For each email that needs a response or action:
1. Categorize it: needs reply/needs action/can archive/already handled
2. If it needs a reply, draft one in my voice using the style in preferences.md
3. If it needs action, describe the action clearly
4. Flag any email where a draft reply isn’t enough, where I need to think about this personally before responding

Don’t send anything. Create drafts only. I will review in Gmail.

Review step: Review all drafts in Gmail before sending. Don’t approve from inside Codex.

How to compound: After a few sessions, add a rule file describing your categorization preferences: which senders always get priority, which topics can be archived without reply, and which types of requests need a human-written response.

2. Daily unanswered message roundup

Best for: Anyone who communicates across Slack, email, and other channels and loses track of what still needs a response.

Input sources: Slack, Gmail, any other communication tool you use.

Output artifact: A list of unanswered items with drafted replies or proposed reactions, organized by urgency.
First prompt:

Look across my Slack and Gmail for the past 24 hours. Find everything that was directed at me that I have not responded to. For each item:
1. Draft a reply, or suggest a reaction (thumbs up, etc.) if a short acknowledgment is enough
2. Flag items where a more considered response is needed
3. Flag anything time-sensitive

Present the list organized by urgency. Don’t send anything.

Review step: Review in Slack and Gmail.

How to compound: After a few runs, save a rules file specifying which Slack channels are high-priority, which senders always warrant a human response, and which types of messages can be handled with a reaction rather than a reply.

3. Research brief creation

Best for: Anyone preparing for a meeting, a pitch, a content piece, or a strategic decision and needing a thorough, sourced summary of a topic.

Input sources: Provided links, Notion, Drive, web search.

Output artifact: A structured brief with background, key facts, open questions, and source links.

First prompt:

Build a research brief on [topic]. Sources to prioritize: [List any specific links, documents, or databases]. Structure the brief as:
- Background: what I need to know to have a smart conversation about this
- Key facts and data points, each with a source link
- Competing perspectives or significant disagreements in the field
- Open questions I should be able to answer before [meeting/decision/deadline]
- Three things I should read next if I want to go deeper

Flag any claims you are less than confident about.

Review step: Check source links. Verify any statistics against the original source before using them.

How to compound: Save a brief template in your workflows/ folder. After each brief, add any recurring sources (newsletters, databases, key authors) to your sources/key-links.md so Codex checks them by default.

4.
Writing with a parallel review loop Best for: Writers who want Codex running alongside them as they draft—checking the work, flagging issues, and responding in parallel without interrupting the writing session. Input sources: Your draft (open in your word processor through Codex’s in-app browser), any relevant style guides, source documents, or review standards in your workspace. Output artifact: An annotated draft with inline feedback, flagged issues, and suggested revisions—produced continuously as you write rather than in a single pass at the end. Setup: Open your draft in Proof or the in-app browser. Start a Codex session with your workspace context loaded. Give Codex standing instructions for what to monitor and how to respond. First prompt: I am writing [describe the piece—type, audience, purpose]. As I draft, run a continuous review loop. Check for: - Claims that need a source or are stated with more confidence than the evidence supports - Passages where the argument loses clarity or the logic has a gap - Sentences that violate the style preferences in preferences.md - Anything that reads as filler, throat-clearing, or AI-generated phrasing Don’t rewrite anything without being asked. Flag issues as I go with a brief note on what the problem is and what would fix it. Check in every [X minutes / X paragraphs] or when I ask. Review step: Read the flagged issues at natural stopping points—the end of a section or session. Decide which to address and which to dismiss. Don’t let the feedback loop interrupt the drafting flow; the value is in the accumulation, not in responding to every flag in real time. How to compound: After each writing session, add any recurring flags to your reviews/writing-checklist.md. Patterns that come up repeatedly are candidates for a standing rule in your preferences file, so Codex catches them automatically next time. 5. 
Source management for research Best for: Writers and researchers who need to organize source material before drafting. Input sources: Links, PDFs, past drafts, notes, transcripts. Output artifact: A structured document with the core argument, supporting evidence organized by claim, counterarguments, and a gap analysis (what is still missing). First prompt: I am writing a piece on [topic]. The core argument I want to make is [argument]. Here are my source materials: [links/documents]. Build an evidence room that: 1. States the core argument clearly 2. Lists the strongest supporting evidence for each main point, with source links 3. Lists the strongest counterarguments and how I might address them 4. Identifies any gaps—claims I am making that lack strong evidence 5. Flags any sources that conflict with each other Review step: Read the evidence room before drafting. Verify any statistics or quotes you plan to use directly. How to compound: Save the evidence format as a workflow template. Add a standing note to your context file about your writing voice and recurring themes so Codex calibrates its framing. 6. Information via audio Best for: Anyone who processes information better by listening than reading, or who wants to take time away from a screen but stay on top of work. Input sources: Any written content: drafts, research briefs, meeting summaries, strategy documents, reports, lengthy emails, articles. Output artifact: An audio file saved to a location accessible from your phone (Dropbox, Drive, etc.). First prompt: Convert the attached [document/draft/report] into a clear audio file. Read it at a natural pace—not rushed, not slow. Save it to [Dropbox/Drive location] as [filename]. Review step: Listen on your commute, walk, or wherever you have time away from a screen. Take notes on your phone as things come up. Return to the source material with whatever you noticed. 
How to compound: Add a standing instruction to your context file covering your audio preferences—such as speed, file format, naming convention, and preferred save location—so you do not have to specify each time. You can also prompt Codex to convert content automatically at the end of certain workflows: “After generating the weekly metrics report, convert it to audio and save to [location].” 7. Go-to-market plan generator Best for: Anyone responsible for launching a product, feature, or initiative and who has done the thinking in meetings and Slack but has not had time to formalize it. Input sources: Meeting transcripts, Slack threads, customer notes, a preferred strategy template. Output artifact: A complete go-to-market plan, structured for human review and agent querying. First prompt: Build a go-to-market plan for [product/initiative]. Sources to pull from: - Meeting transcripts: [Notion location or links] - Slack discussions: [channels or search terms] - Customer research: [document or location] - Template to follow: [link or paste template] The plan should be readable by a human in five minutes and structured so that an agent can answer specific questions about it (e.g., “What is the target ICP?” “What is the launch timeline?”). Start with a compound engineering brainstorm step. Give me a draft in Proof or Notion. Flag anything in the plan you added that was not in the source material—I only want synthesis of what we have already decided, not new suggestions baked in. Review step: Review in Notion or Proof. Verify that every major claim traces to something in the source material. Anything the model added that was not in your sources should be flagged for your decision. How to compound: Save the template and the prompt. After each launch, add a retrospective note to your context file about what the plan got right and wrong. Future plans will be calibrated by past ones. 8. 
KPI report Best for: Anyone responsible for tracking metrics and needing a regular, reliable view across multiple data sources. Input sources: Analytics (PostHog, Mixpanel, Amplitude), revenue data (Stripe), support volume, social metrics, saved past reports. Output artifact: A one-page report covering headlines, usage metrics, system health, and follow-up items. First prompt: Generate a product pulse report for [time period]. Data sources: - Product analytics: [tool and what to pull] - Revenue: [tool and what to pull] - Support: [tool and what to pull] - Social: [tool and what to pull] Structure: 1. Headlines (three to five bullets summarizing what matters most) 2. Usage (primary engagement metric, value-realization metric, conversions, deltas vs. prior period) 3. System health (error rates, latency, top error signatures) 4. Follow-ups (one to five things worth investigating, specific enough to act on) Flag any number that differs significantly from the prior report. If something is anomalous, investigate one level deeper before including it. Review step: Verify every number in the report against its source. Don’t use this report as a business source of truth until you have confirmed accuracy column by column. In practice, one-shot metrics pulls are often five to 10 percent off—a common result of definition mismatches and join errors across multi-source pulls. How to compound: Save each report as a dated file in your outputs/reports/ folder. Over time, Codex can compare reports, identify trends, and flag when something has changed. The folder becomes the working memory of your product. 9. Customer support for product work Best for: Teams where support patterns should feed into product decisions and small fixes. Input sources: Support platform (Intercom, Zendesk), issue tracker (Linear, GitHub Issues). Output artifact: A deduplicated list of issues with suggested priority, plus small issues ready to hand off for fixes. 
First prompt: Go through my support queue for the past [time period]. For each support thread: 1. Identify the underlying issue or request. 2. Check whether a similar issue already exists in [Linear/GitHub Issues]. 3. If it does, link them. If it doesn’t, draft a new issue. 4. Flag any issue that appears more than [threshold] times—these are priorities. 5. For issues that appear straightforward to fix, note that they are candidates for direct implementation. Don’t create issues in the tracker yet. Give me the list to review first. Review step: Review the issue list before anything goes into the tracker. Confirm deduplication is accurate—support tickets often describe the same underlying problem in different words. How to compound: After each session, add a note about recurring issue types so Codex can categorize faster next time. Build a persistent list of known issues so deduplication improves over time. 10. Pull requests for non-engineers Best for: Anyone who needs to make a small, well-scoped change to a codebase—such as copy updates, configuration changes, or content edits—without deep engineering knowledge. Input sources: The relevant files or repository, and a clear description of the change. Output artifact: A pull request (PR) that is reviewer-friendly and doesn’t touch anything outside the intended scope. First prompt: I need to make the following change: [describe the change clearly]. Before making any changes: 1. Show me which files are affected 2. Confirm the scope of the change—nothing outside these files should be touched 3. Explain what you are going to do in plain language before doing it After making the change: 1. Summarize what was changed and why 2. List every file that was touched 3. Explain how you verified the change is correct 4. Flag anything a reviewer should look at carefully Make the smallest useful change. Don’t refactor or improve anything adjacent. Review step: Review the Codex preview before the PR is opened. 
Review the PR itself in GitHub or your code review tool. Ask a technical colleague to approve before merging if you are uncertain. How to compound: Save a template of your preferred PR format. After each PR, add a note about anything that requires correction so future PRs avoid the same issue. 11. Recruiting research Best for: Anyone doing outbound recruiting for a role with a specific background profile. Input sources: LinkedIn, Twitter/X, company websites, alumni databases, public professional networks. Output artifact: A list of candidates with background summaries and contact information or connection points. First prompt: I am hiring for [role]. The ideal candidate has [background profile—experience, prior companies, skills, career trajectory]. Search for candidates who match this profile. For each candidate: 1. Summarize their background in two to three sentences 2. Note why they match the profile 3. Identify any connection point (mutual connections, follows, shared affiliations) 4. Provide a link to their public profile Return the top [number] candidates, ranked by how closely they match the profile. Review step: Review each candidate before any outreach. Verify that the background summaries are accurate by checking the linked profiles. Don’t send any outreach through Codex. How to compound: Save the role profile as a template. After a successful hire, document what the actual background looked like versus the initial profile to calibrate future searches. 12. Strategy and planning agent Best for: Leaders and operators who need to compress OKR planning, quarterly planning, or strategic reviews from days to hours. Input sources: Past planning documents, meeting transcripts, leadership context notes, relevant metrics. Output artifact: A draft plan or OKR set, structured for review and iteration. First prompt: I need to draft [quarterly plan / OKR set / strategic review] for [scope]. 
Pull from: - Past plans: [location] - Recent meeting transcripts: [location] - Current metrics: [location or description] - Leadership context: [document or description] Structure the output as [desired format]. Flag any goal or initiative you are recommending that doesn’t have explicit support in the source material. I want synthesis of what has already been decided, not new recommendations baked in without my review. Review step: Review in Notion or Proof. Before sharing with leadership or the team, confirm that every major commitment traces to a decision that was actually made. How to compound: After each planning cycle, add a retrospective to your context file. Did the goals prove achievable? What was missing from the original plan? Future planning sessions will be informed by past ones. 13. Personal learning tool Best for: Anyone who wants to use Codex to support skill-building, practice, or self-directed learning. Input sources: External APIs, files, structured practice materials, your own notes. Output artifact: A custom interactive tool—like a tutor, a quiz, or a practice environment—built for your learning goal. Example: A musician wants to practice chord identification. They connect a MIDI keyboard and describe what they want, and Codex builds a small app that listens to what they play, identifies the chord, and tracks progress over time. First prompt: I want to build a personal learning tool for [skill or subject]. My current level: [beginner/intermediate/what I know already]. What I want to practice: [specific aspect of the skill]. How I want feedback: [immediate/after each session/scored]. Build a prototype I can use locally. Explain what it does and how to use it before I start. Review step: Try the tool on real practice material before committing to it. Verify it is actually testing what you intended. How to compound: After each practice session, ask Codex to update the tool based on what you found most and least useful. 
The tool improves as your needs become clearer.

Part 5: Operating Codex well

How to steer Codex
Operating Codex well is management work. You evaluate talent (which prompts, agents, and workflows to trust), set vision (what to point Codex at, and what “done” should look like), exercise taste (catching output that is technically correct but wrong for the moment), and know when to let it run and when to take the wheel.
Give Codex an outcome. Describe what you want to end up with, not how to get there. “Build a research brief on [topic] with these sources and this structure” produces better results than “First search Slack, then search Notion, then...”
Ask for a plan before long-running work. For any task that will take more than a few minutes or touch multiple systems, ask Codex to explain what it’s about to do before it starts. This catches misunderstandings early and gives you a chance to redirect it before it gets too far along.
Ask Codex what it needs before it starts. For complex tasks, a short briefing prompt saves time: Before you start, tell me what additional context would help you do this better. What are the most important things you would want to know?
Require citations and audit trails for important claims. Any document that will be shared or used for decisions should have source links for factual claims. Make this a standing rule in your preferences file.
Don’t over-manage every micro-step once the plan is good. Once you have confirmed the approach, let Codex work. Interrupting undermines autonomous operation and produces worse results than reviewing the completed output.
Review in the destination app. Always.
Set explicit no-send/no-post/no-archive/no-modify rules in your rules file. These should apply by default to any sensitive workflow. Make Codex ask before taking any action that can’t easily be undone.
Three questions to ask before approving any significant output:
What was the hardest decision you made in producing this?
What alternatives did you consider and reject?
Where are you least confident?
These questions surface the judgment calls the model made, the options it dismissed, and the places most likely to contain errors.

Safety, trust, and risks

Risk categories
Green—proceed with standard review: Summaries, outlines, internal drafts, research briefs, personal notes, low-stakes scripts.
Yellow—review carefully before sharing or acting: Strategy documents, customer-support drafts, product specs, recruiting research, non-destructive data pulls, PR drafts for small changes.
Red—don’t proceed without explicit human verification: Sending messages to clients or customers, changing source-of-truth data, making production code changes, moving money, legal or compliance claims, unreconciled metrics used for business decisions.

Common failure modes and how to handle them
Confident wrongness. Codex can state incorrect facts with high confidence. For any factual claim that matters, verify against the source. Never pass a statistic or claim to another person without checking it.
Metrics errors. Joining data from multiple sources introduces definition mismatches and calculation errors. Verify column by column for any metric used in decisions.
Out-of-scope changes. Codex sometimes modifies files or makes improvements adjacent to the task you assigned. Review the line-by-line changes (called a “diff”), not just the final output, especially for any task involving code.
Automations that break. Persistent workflows stop working when tools update their APIs, credentials expire, or context files become stale. Every automation needs an owner who checks it periodically. Sever that connection—stop tending it—and the agent stops being useful. “Set it and forget it” isn’t a stable operating mode.
Plugin and integration failure. Plugins and integrations need maintenance: Permissions expire, APIs change, configurations need updates, and some changes require restarting Codex.
Integration failures—particularly with Notion and Gmail—happen and aren’t always obvious. If a workflow produces strange output, check whether the connection is still working before assuming the prompt is wrong. Usage limits. Long-running sessions can hit usage limits and stop mid-task. For complex workflows, break work into stages rather than attempting everything in a single session. Untrusted input. Anything Codex reads—an email, a web page, a shared document, a support ticket—can contain instructions aimed at the agent rather than at you, sometimes hidden from human eyes. If Codex is browsing untrusted sites or processing external messages while holding broad write access, those buried instructions can turn into actions—like sending data where it shouldn’t go. So keep destructive actions behind approval, and scope each workflow to the least access it needs, so a hijacked instruction has nowhere to go. The human ownership standard: Codex can touch any artifact in your workflow, but a human must direct the work, stand behind the output, and be able to discuss any specific decision in it. If someone asks you about a bullet point in a document Codex drafted, you should be able to answer. An AI-drafted document is fine—expected, even—but if someone talks it through with you and it’s clear you have no idea what’s in it, that’s a problem. Team workflows: From personal Codex to shared operating system Individual Codex workflows compound over time. Team workflows compound faster but require coordination. What changes when a team uses Codex Teams build trust in agents through the humans who operate them. When a colleague receives a document or plan that Codex drafted, they trust it to the degree they trust the person who shared it. Infrastructure that makes team Codex work Shared review surfaces. A shared document review tool (Proof, Notion, Google Docs) makes agent-generated documents easier to inspect and comment on than outputs reviewed only inside Codex. 
Codex-mediated routing. Teams can combine Codex threads, automations, Slack or GitHub integrations, remote connections, and app-server APIs to build routing workflows: Requests arrive in Slack, email, or another shared intake surface; Codex helps triage them, creates reviewable tasks or drafts, and routes the work to the right human or Codex workspace for execution. Each route needs clear ownership, permissions, review rules, and a source of truth. For teams doing a lot of cross-functional requests, such as legal reviews, data pulls, or copy approvals, this pattern removes significant coordination overhead. A key mechanic to making this style of work possible is giving Codex its own email address. Codex doesn’t come with one—you set it up with a tool like Nylas that gives an agent an inbox. Once it has that address, you can treat it like another teammate. Routes built on an email address still need the same discipline as any other: a clear owner, scoped permissions, and a review step before anything goes back out. Agent-readable shared documentation. Plans, strategy documents, and operational guides written for both human and agent readers become shared infrastructure. Any team member—or any team member’s agent—can query them for specific information without interrupting the author. Explicit ownership. Every persistent workflow needs a named owner. That person is responsible for monitoring output quality, updating the workflow when it breaks, and retiring it when it’s no longer useful. Automation degrades without ownership. A simple way to get a team to use Codex Don’t try to convert everyone. As a rule of thumb, a tenth of any team will adopt a new tool no matter what, a tenth never will, and the other 80 percent come along once someone shows them how it helps their own job. Aim at that 80 percent. 
Three things, done together, help adoption along:
- A note from a leader that makes using AI the expectation, not a nice-to-have
- A weekly meeting where anyone can show a prompt or workflow they’ve built
- A regular message that names the people whose work stood out
Set the expectation, give people a place to share what works, and recognize them for it—that’s most of the battle.

Part 6: Getting started

The seven-day Codex power-user plan
Day 1: Connect and inspect. Install the Codex desktop app. Connect your primary tools—Gmail, Slack, Notion, Drive, and any analytics or support tools you use. Run the workflow discovery prompt from Part 2 and review the three automation suggestions Codex returns. Don’t build anything yet. Just read the suggestions and identify which one is most useful.
Day 2: Create your context files. Create your codex-workspace/ folder. Write context.md, preferences.md, and rules.md. Keep each one to one page. The goal is to capture the most important things Codex should know about you—not to be exhaustive.
Day 3: Run three one-off tasks. Choose one summary task, one research brief, and one draft or plan. Use the prompt patterns from Level 1. Review each output carefully and note where Codex got things right and where it needed correction.
Day 4: Build your first workflow. Take the most useful automation suggestion from Day 1 and fill out the workflow canvas from Level 3. Save it to workflows/ in your workspace. Run it once manually and verify the output.
Day 5: Add review rules. Create reviews/data-checklist.md, reviews/writing-checklist.md, and reviews/comms-checklist.md. Start each one with five checks based on what you noticed during Days 3 and 4. These will grow over time.
Day 6: Turn one workflow into a reusable artifact. Take the workflow from Day 4 and document the prompt, the output format, the review step, and any known edge cases. Save it as a complete workflow file. Run it again and verify the documentation is accurate.
Day 7: Compound.
Run the compounding prompt at the end of your Codex session:
Based on everything we have done this week, what should become a reusable skill, an automation, or a small tool? What context should I add to my project files so future sessions start from a better baseline?
Review Codex’s suggestions and implement the one that would save the most time over the next month.

30-day extension:
Week 1: One personal workflow running reliably
Week 2: One multi-source workflow pulling from at least three connected tools
Week 3: One small tool or automation that handles a chore without your involvement
Week 4: One shared or team workflow with explicit ownership and review cadence

Start today. Connect the tools you’re comfortable permissioning and ask Codex what recurring workflows it can see from the available context. That question, and what you do with the answer, is the gateway to the Codex universe.
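
If you would rather not create the workspace files from Days 2, 4, and 5 by hand, a short script can lay them out in one pass. The following is a minimal Python sketch, not an official Codex setup step. The file and folder names are the ones this guide uses; placing all of them under a single codex-workspace/ root, the starter text inside each file, and the names `WORKSPACE_FILES`, `WORKSPACE_DIRS`, and `scaffold_workspace` are assumptions made for illustration.

```python
from pathlib import Path

# File and folder names come from the guide. The starter text inside each file
# is placeholder wording (an assumption), meant to be replaced with your own.
WORKSPACE_FILES = {
    "context.md": "# Context\n\nWho I am, what I work on, and who I work with.\n",
    "preferences.md": "# Preferences\n\nStyle, tone, and format preferences.\n",
    "rules.md": "# Rules\n\nDon't send, post, archive, or modify anything without asking.\n",
    "reviews/data-checklist.md": "# Data checklist\n",
    "reviews/writing-checklist.md": "# Writing checklist\n",
    "reviews/comms-checklist.md": "# Comms checklist\n",
    "sources/key-links.md": "# Key links\n",
}

# Folders that start empty and fill up as you save workflows and dated reports.
WORKSPACE_DIRS = ["workflows", "outputs/reports"]


def scaffold_workspace(root: Path) -> list[Path]:
    """Create missing workspace files and folders; never overwrite existing files."""
    created = []
    for relative_path, starter_text in WORKSPACE_FILES.items():
        path = root / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            # Only write the starter text when the file is not already there,
            # so edits you have made by hand are preserved on later runs.
            path.write_text(starter_text, encoding="utf-8")
            created.append(path)
    for folder in WORKSPACE_DIRS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    return created
```

Calling `scaffold_workspace(Path("codex-workspace"))` returns the list of files it created. Running it again returns an empty list and leaves your edits untouched, which makes it safe to rerun after you add a new checklist or folder to the dictionaries above.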

Codex for Knowledge Work
