Working with OpenAI's Codex CLI: Commands, Agents, and Advanced Workflows
A full guide to Codex CLI that covers why it matters, how to get started, the core commands, the role of AGENTS.md, everyday workflows, long running sessions, and advanced features like MCP and CI/CD
Why Codex?
Codex may look like a simple CLI, but it is built to be much more. It works as a coding partner that can fit into different parts of your workflow, whether you are in the terminal or GUI (Graphical User Interface), running jobs in CI, or connecting it to other systems. What makes Codex stand out is how it handles iteration, keeps work reproducible, and gives you control over automation.
In this post, I will cover the core commands, show how AGENTS.md helps guide Codex, walk through common workflows, and look at some of the more advanced features like long running sessions and integration with pipelines.
Getting Started with Codex
When you first launch Codex, you’re greeted with a simple welcome screen. You can sign in two different ways:
Sign in with ChatGPT
If you already have a ChatGPT Plus, Pro, or Team plan, Codex ties right into that subscription. It is the easiest path since everything just works out of the box.
Provide your own API key
If you prefer usage-based billing or want more control over environments, you can drop in an API key. This is handy if you are running Codex inside CI/CD pipelines or keeping it separate from your main ChatGPT account.
Both options land you in the same place, a CLI that feels familiar but has a few extra powers. From here, Codex is ready to act as your coding agent, whether you are testing it locally, building agents, or wiring it into a workflow.
Codex Commands 101
Codex comes with a handful of commands that shape how you work with it. Each one has a purpose, and when you put them together you start to see the bigger picture of how Codex is meant to be used.
/model
Switch models depending on what you need. Sometimes you want speed, other times you want deeper reasoning. This lets you make that tradeoff on the fly.
/approvals
Set how cautious Codex should be when it comes to shell commands. You can lock it down so nothing runs without your say, or loosen it up when you trust the workflow.
/review
Ask Codex to look over your code and point out issues. It’s not a replacement for a full review, but it’s great for quick feedback and catching low-hanging fruit.
/new
Start fresh without leaving the CLI. If a session gets messy, this is a clean slate.
/init
Kick off an AGENTS.md file. This is where you define how Codex should behave in your repo, which helps keep things consistent and reproducible.
/compact
Condense the conversation when it starts getting long. It helps Codex stay focused without losing the thread.
/diff
Show the changes Codex made before you commit them. It’s a simple way to keep visibility and control.
/mention
Pull specific files into the conversation so Codex knows exactly what you’re working on.
The commands are simple on the surface, but each one reinforces a principle: be clear about intent, keep humans in control, and make it easy to iterate without losing context.
Understanding AGENTS.md
When you use Codex or any coding agent, there is always a tension between freeform prompts and staying grounded in your project’s rules. That is where AGENTS.md comes in. It is a dedicated place, right in your repo, for giving agents the context and instructions they need to act responsibly.
What is AGENTS.md?
Think of it as a README written for agents. People can read it too, but it is designed specifically to guide AI agents.
AGENTS.md complements your README. The README stays human friendly, while AGENTS.md contains things that are more relevant to automated agents such as build instructions, style rules, and security guidelines.
It is free-form Markdown. There is no rigid schema, yet the format is widely adopted. (agents.md)
In large or monorepo setups, you can have nested AGENTS.md files. Agents will pick the nearest file in the directory hierarchy. (agents.md)
Why use AGENTS.md in your project?
Gives your agent a predictable location for instructions. Instead of sprinkling context everywhere, it is centralized.
Prevents clutter in the README. You avoid mixing build or security rules into high level project descriptions.
It is expressive. You can include things like:
Build commands
Test commands
Style or linting rules
File structure conventions
Security caveats or data handling rules
Deployment hints or CI steps
Easy to evolve. If requirements change, update AGENTS.md and agents will adapt. (agents.md)
Minimal Example
Here is a small AGENTS.md sketch you might use in a Python repo:
# AGENTS.md
## Overview
This project is a web scraper and data processor. Use Python 3.11, target output is CSV.
## Setup commands
- `pip install -r requirements.txt`
- `poetry run pytest`
## Code style
- Use Black with default settings
- Run `flake8` with a max line length of 88 (rule `E501`)
## Testing instructions
- Always run `pytest --maxfail=1 --disable-warnings -q`
- Add tests for edge cases (timeouts, empty pages)
## Security / data rules
- Never commit API keys or credentials
- Use environment variables for secrets
- Sanitize all external inputs before parsing
When an agent is asked to write code or tests, it will read that file and follow the rules you set.
How AGENTS.md works with Codex
Codex can scaffold an AGENTS.md for you with /init.
During a session, Codex considers the closest AGENTS.md file as part of its context and will follow constraints and style rules from there.
If there are multiple AGENTS.md files, agents prefer the nearest one in the same folder or parent over global ones. (agents.md)
If instructions conflict, explicit prompts from you override what is in AGENTS.md. (agents.md)
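To make the "nearest file wins" rule concrete, here is a minimal Python sketch of how such a lookup could work. It is my own illustration of the rule described above, not Codex's actual implementation, and the function name `find_nearest_agents_md` and its `repo_root` parameter are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_nearest_agents_md(start: Path, repo_root: Path) -> Optional[Path]:
    """Walk from `start` up to `repo_root` and return the first AGENTS.md found.

    Hypothetical helper: it only illustrates the "closest file wins" idea.
    """
    root = repo_root.resolve()
    current = start.resolve()
    if current.is_file():
        current = current.parent

    # Refuse to search outside the repository.
    if current != root and root not in current.parents:
        return None

    while True:
        candidate = current / "AGENTS.md"
        if candidate.is_file():
            return candidate
        if current == root:
            return None
        current = current.parent
```

In a monorepo with one AGENTS.md at the root and another under `services/`, a call that starts in `services/api` would return the `services/` file, while a call that starts at the root would return the root file.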
Workflows in Practice
Commands and AGENTS.md are most useful when you put them together into everyday tasks. Here are a few simple workflows that show how Codex fits into real development.
1. Safe Refactor Loop
Ask Codex to clean up a piece of code. Start in Read Only mode so it can only suggest changes. Use /diff to preview the edits. If you are comfortable, switch to Auto so Codex can apply the changes directly in your workspace.
This pattern gives you the best of both worlds: proposals you can review first, and automation once you are ready.
2. Generate Docs from Code
Run /init to create an AGENTS.md that lays out your repo’s rules. Then ask Codex to generate docstrings or update your README. If the conversation gets too long, use /compact to keep things clear. For this workflow, Auto is often enough since the changes are low risk and easy to review.
3. Quick Bug Audit
Use /review to have Codex scan your code for potential issues. If the problem is local, narrow it down with /mention file.py so it focuses only where you need it. Keep this in Read Only mode so Codex just points out problems instead of trying to fix them automatically.
4. When to Consider Full Access
Full Access lets Codex edit files and run commands without asking. It can be useful in controlled environments, like inside a sandbox or a CI pipeline, but it is not meant to be your default. Treat it as an advanced setting for tasks you trust Codex to handle end to end.
Long Running Contexts for Complex Tasks
Most coding assistants are built for short bursts of help. They can write a function, fix a bug, or clean up a file, but then the session ends and you have to start again. Codex is different. It can stay on a single task for hours, carrying the context forward, testing its own work, and refining the result until the job is finished.
OpenAI has shown Codex running continuously for up to seven hours on a single task. That could mean taking on a large refactor, running tests until everything passes, or building out a bigger feature one step at a time. The key is that Codex does not stop after one attempt. It loops, checks itself, and keeps making progress, which feels closer to a teammate working through the details than a tool that only answers once.
Why does this matter?
Bigger scope: Codex can work on real-world problems like untangling legacy code or scaffolding an entire feature.
Iteration built in: It has the room to write, test, and fix code without you restarting the process.
Better fit for agents: A longer memory makes it easier for Codex to be part of a system where one tool handles planning and Codex handles the coding grind.
The guardrails still matter here. Approval modes and AGENTS.md become even more important, because you do not want a long process drifting into places it should not touch. But the fact that Codex can sustain this kind of run changes how you think about it. It is less of a quick helper and more of a collaborator that can stay with you on complex work.
With recent updates like GPT-5-Codex, Codex can now adjust its thinking time mid-task, spending less on simple steps and more on complex ones. That makes multi-hour jobs more practical to run with confidence.
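One way to picture the write, test, and fix loop described in this section, along with the boundaries it needs, is a small bounded retry loop. The sketch below is a conceptual model only and says nothing about how Codex works internally. The names `iterate_until_green`, `attempt_fix`, and `tests_pass` are hypothetical, and the attempt and time limits are arbitrary defaults I picked for illustration.

```python
import time
from typing import Callable


def iterate_until_green(
    attempt_fix: Callable[[int], None],
    tests_pass: Callable[[], bool],
    max_attempts: int = 5,
    budget_seconds: float = 3600.0,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once tests pass, False if attempts or time run out."""
    deadline = clock() + budget_seconds
    for attempt in range(1, max_attempts + 1):
        if tests_pass():
            return True
        if clock() >= deadline:
            # Out of time: stop instead of drifting on indefinitely.
            return False
        attempt_fix(attempt)
    # One last check after the final attempt.
    return tests_pass()
```

The point of the explicit limits is the same as the point of approval modes: a long run should have a clear stopping condition other than "it eventually worked".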
Example: Building a Project From a Prompt
You can also hand Codex a clean slate and ask it to scaffold an entire project. For example, you could prompt it with “Build a simple note-taking app with a web frontend, a backend API, and local file storage.”
Codex would begin by setting up the project structure. It might create folders for the frontend, backend, and data, then generate the HTML, CSS, and JavaScript for the interface. After that, it could spin up a backend in Python or Node, define the API endpoints, and connect them to the frontend. Once the core pieces are talking to each other, Codex can move on to writing tests and adding basic documentation so the project is easier to understand.
Because it can run for hours, Codex is able to refine as it goes. If something breaks, it can retry, fix the bug, and keep moving. It does not stop after one attempt. Instead, it works through the project step by step until the foundation is solid.
This is not about producing a perfect, production-ready system. It is about getting most of the scaffolding, boilerplate, and wiring in place quickly so you can focus your effort on the parts that make the project unique.
Advanced Features
Once you are comfortable with commands and workflows, Codex has a few advanced options that open it up beyond single CLI sessions. These are the tools that let you plug Codex into bigger systems or make it part of your pipelines.
Running as an MCP Server
Codex can run as an MCP server, which means other agents or tools can call it as a capability. You can launch it with the MCP Inspector and send prompts to Codex as if it were just another service in your stack. This is useful if you are building multi agent systems or want to wire Codex into your own automation framework.
Start the server with:
npx @modelcontextprotocol/inspector codex mcp
Use the inspector to list tools, send prompts, and adjust sandbox modes.
Increase request timeouts for long tasks so Codex has room to run.
Configuration with config.toml
Codex reads defaults from a config file, usually ~/.codex/config.toml. You can set things like which model to use, your approval mode, or sandbox settings. This keeps Codex predictable across sessions and makes it easier to share setup across a team.
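As a rough illustration, the snippet below renders the kind of baseline a team might share. The key names (`model`, `approval_policy`, `sandbox_mode`) and their values are my assumptions about the config schema, so check them against the Codex configuration docs before relying on them. `TEAM_DEFAULTS` and `render_config` are hypothetical helpers, and the script only prints the result instead of touching your real config file.

```python
# Assumed key names and values; verify against the Codex configuration docs.
TEAM_DEFAULTS = {
    "model": "gpt-5-codex",
    "approval_policy": "on-request",
    "sandbox_mode": "workspace-write",
}


def render_config(settings: dict) -> str:
    """Render flat string settings as TOML `key = "value"` lines."""
    lines = []
    for key, value in settings.items():
        # Escape backslashes and double quotes for a TOML basic string.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{key} = "{escaped}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the baseline so it can be reviewed before anyone copies it.
    print(render_config(TEAM_DEFAULTS), end="")
```

Keeping the baseline in version control next to AGENTS.md gives the team one place to review changes to both the rules and the defaults.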
Sandbox and Approval Modes
Codex supports different sandbox levels and approval modes that decide how much freedom it has. Start in read only if you want zero risk. Switch to workspace write when you trust the flow. Reserve full access for controlled environments. These modes become even more important in long sessions where Codex is making many changes over time.
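The escalation path above can be sketched as a small helper that builds a launch command for each trust level. The flag names (`--sandbox`, `--ask-for-approval`) and their values reflect my understanding of the CLI and may differ between versions, so treat the mapping as an assumption and confirm it with `codex --help`. `TRUST_LEVELS` and `codex_command` are hypothetical names.

```python
# Assumed mapping from the three trust levels in this post to CLI flag values.
# Each entry is (sandbox mode, approval policy).
TRUST_LEVELS = {
    "read-only": ("read-only", "on-request"),
    "workspace-write": ("workspace-write", "on-request"),
    "full-access": ("danger-full-access", "never"),
}


def codex_command(trust: str = "read-only") -> list:
    """Build the argument list for launching Codex at a given trust level."""
    if trust not in TRUST_LEVELS:
        raise ValueError(f"unknown trust level: {trust!r}")
    sandbox, approval = TRUST_LEVELS[trust]
    return ["codex", "--sandbox", sandbox, "--ask-for-approval", approval]


if __name__ == "__main__":
    # Show the command for each level without running anything.
    for level in TRUST_LEVELS:
        print(level, "->", " ".join(codex_command(level)))
```

Defaulting to the most restrictive level means that widening access is always a deliberate choice.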
Headless in CI/CD
You can run Codex headless in a pipeline instead of interacting with it directly. This works well for:
Drafting changelogs from commit history
Reviewing pull requests and pointing out risks
Generating test stubs or updating docs
The key is to treat Codex as a producer of artifacts, not something that edits your repo on its own. Write the output to files, upload them as artifacts, and let humans decide what gets merged.
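Here is a minimal sketch of that artifact-first pattern as a pipeline step. It assumes the non-interactive `codex exec` subcommand and a `--sandbox read-only` flag are available in your version, and that the final answer is printed to standard output; verify all three before using it. The prompt text, the output path, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_exec_command(prompt: str) -> list:
    """Assumed non-interactive invocation; read-only so the repo is untouched."""
    return ["codex", "exec", "--sandbox", "read-only", prompt]


def write_artifact(text: str, out_path: Path) -> Path:
    """Save Codex's output as a file the pipeline can upload."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(text, encoding="utf-8")
    return out_path


def draft_changelog(out_path: Path, runner=subprocess.run) -> Path:
    """Ask Codex for a changelog draft and store it for human review."""
    prompt = "Draft a changelog entry from the commits since the last release tag."
    result = runner(
        build_exec_command(prompt),
        capture_output=True,
        text=True,
        check=True,  # fail the pipeline step if Codex exits with an error
    )
    return write_artifact(result.stdout, out_path)


if __name__ == "__main__":
    print(draft_changelog(Path("artifacts") / "CHANGELOG_DRAFT.md"))
```

The `runner` parameter exists so the step can be exercised with a stub in tests, without calling Codex or spending API credits.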
Tracing and Verbose Logs
When Codex takes on long or complex tasks, you can enable tracing to see the steps it took. This gives you a paper trail of its reasoning and helps you debug when something does not go as expected.
Best Practices Recap
As you start working with Codex, a few principles make the difference between a fun demo and something you can rely on:
Write prompts around intent, not syntax. Tell Codex what you want it to achieve, not just what code to write.
Stay in control with approval modes. Start in read only, move to workspace write when you trust the workflow, and save full access for controlled cases.
Use AGENTS.md to set the rules. Codify your project’s style, build commands, and constraints so Codex always has the right context.
Keep sessions lean. Use /compact when conversations get long and /new when you need a fresh start.
Think in workflows, not one-off commands. Chain commands together so Codex can help you iterate, review, and refine instead of just generating snippets.
Leverage long sessions wisely. Let Codex grind through tests, scaffolding, or migrations, but always set clear boundaries.
Use advanced features where they fit. MCP makes Codex part of larger systems, and headless runs in CI/CD are great for artifacts like changelogs, reviews, or docs.
These habits keep Codex safe, predictable, and actually useful in day-to-day work.
Closing Thoughts
Codex has grown well beyond being a simple CLI. It is now a coding partner that can adapt to the way you work, whether that means quick refactors in the terminal, long running sessions that handle complex tasks, or integrations into larger systems. With commands, AGENTS.md, and a few practical workflows, you have everything you need to start small and build confidence. From there, advanced features like MCP and CI/CD pipelines open the door to real-world automation and scale.
The release of GPT-5-Codex makes this even more compelling. The model is better at refactoring, more reliable in code reviews, and capable of working continuously for hours on end. That combination of adaptability and persistence makes it feel less like a quick assistant and more like a true collaborator.
The best way to learn Codex is still the simplest: try it in your own projects. Set it up in a safe environment, define a few rules in AGENTS.md, and give it a repetitive or time-consuming task. You will quickly see where it adds the most value and where you want to stay more hands-on.