How I Get Things Done While I Sleep: My Loop Workflow

Watch the full walkthrough on YouTube.

I used to think loops were just a fancy way of saying "keep asking the AI to do stuff."

Then I tried one properly, and it changed how I maintain Synara. Not because Codex suddenly became magic. Not because I can blindly trust an agent. But because of one small shift that makes the whole thing different:

the prompt is not just the request anymore. The prompt is the workflow.

Where It Started

The idea came from a post by @theo. He'd let Codex run through a pile of stale PRs overnight: close the useless ones, revive the out-of-date ones, give each revived PR its own thread to build it and a second thread to review it. He said it changed how he thought about loops, and that he wasn't even a loop guy before this.

The original idea traces back to a post from @steipete, so credit goes to both of them. Theo's framing was dead simple:

Tell Codex to maintain your repos. Wake up every 5 minutes and direct work to threads.

That clicked for me, so I tested it on Synara's issues and PRs. It was worth it.

Quick Context: What Synara Is

Synara is an open source desktop app I maintain. It's a GUI for agentic coding tools, basically a single place to use all your providers and subscriptions: Codex, Gemini, OpenCode, Cursor, Grok, Kilo Code, and more. We're at almost 5,700 downloads and closing in on 1,000 stars.

That also means there is always something to do. Issues, PRs, small bugs, old branches, features I want to try but never have time to start.

If you maintain an open source project, you know the feeling. The hard part isn't only writing the code. It's keeping everything moving without losing your mind. You open GitHub and there are five things that are all "small," but each one needs context, testing, review, cleanup, and a decision.

That's exactly the pile loops are good at.

My First Real Overnight Run

The first real test happened at night, around 1:40am.

I had a few Synara PRs and issues I wanted to handle, but I didn't want to sit there going one by one. So before bed, I wrote one long prompt. The idea was simple: hand Codex four PRs and let it run the whole engineering loop on each one while I slept. No fast mode, because the goal was to let it run for hours.

The prompt basically said:

create a separate worktree thread for every PR
keep every branch isolated, never mix changes
fix the issue in that worktree only, then run the tests
start a second same-worktree thread that runs /review to find problems, then fix them
refactor any duplicated logic with the proper skills so the code stays clean
push the branch and open or update the PR
tag the Codex bot and check back every 5 minutes for its findings
fix bot findings and repeat until the latest head is clean
write a final report at the end

The single most important line was this:

DO NOT MIX BRANCHES OR CHANGES.

I repeated that idea a lot. Once you run multiple agents at the same time, isolation is everything. One PR, one worktree, one thread. It sounds boring, but boring rules are exactly what make overnight automation possible. If you skip this, anything unstaged on main leaks into the new worktrees and you wake up to a mess.
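To make "one PR, one worktree" concrete, here is a minimal sketch that plans one `git worktree add` command per PR. It only builds the commands and does not run them. The branch naming (`loop/pr-N`), the `../worktrees` directory, and the `origin/main` start point are assumptions for illustration, not what Codex does internally.

```python
def worktree_commands(pr_numbers, root="../worktrees", base="origin/main"):
    """Plan one isolated worktree (own branch, own directory) per PR.

    Returns a list of argv lists; nothing is executed here.
    Branch names, directory layout, and base ref are hypothetical.
    """
    if len(set(pr_numbers)) != len(pr_numbers):
        # Two plans for the same PR would share a branch and a directory.
        raise ValueError("duplicate PR number: each PR needs its own worktree")

    commands = []
    for number in pr_numbers:
        branch = f"loop/pr-{number}"
        path = f"{root}/pr-{number}"
        # `git worktree add -b <new-branch> <path> <commit-ish>`
        commands.append(["git", "worktree", "add", "-b", branch, path, base])
    return commands


if __name__ == "__main__":
    for argv in worktree_commands([101, 102]):
        print(" ".join(argv))
```

Each command could be passed to `subprocess.run(argv, check=True)` from the repository root. Refusing duplicates up front is the code version of the "never mix changes" rule.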

What I Woke Up To

In the morning, I had four PRs done. Four issues fixed. Separate branches, separate worktrees, review passes, and PRs ready to inspect.

And, most importantly, a report explaining what happened.

That final report is underrated. When an agent runs for hours, the morning question isn't only "did it work?" The real question is "can I understand what happened without reading the entire thread?"

The report answered the things I actually needed:

what was fixed
which PR belonged to which issue
what files changed and what tests ran
which review findings were fixed
what the bot said
what was still risky or blocked

It even ended with a clean little table: per PR, is it clean, is it mergeable, what's left. Without that, you wake up to chaos. With it, you wake up to a status page. Huge difference.
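The shape of that closing table is easy to pin down. This is a rough sketch of a per-PR status table in plain text. The `PrStatus` fields and the column labels are guesses at what such a report could contain, not the format Codex actually produced.

```python
from dataclasses import dataclass


@dataclass
class PrStatus:
    pr: str          # label or link for the PR (hypothetical field)
    clean: bool      # latest head has no actionable findings
    mergeable: bool  # no conflicts, checks passing
    remaining: str   # free text: what's left, empty if nothing


def render_status_table(rows):
    """Render the per-PR summary as an aligned plain-text table."""
    header = ("PR", "Clean", "Mergeable", "What's left")
    body = [
        (
            row.pr,
            "yes" if row.clean else "no",
            "yes" if row.mergeable else "no",
            row.remaining or "nothing",
        )
        for row in rows
    ]
    table = [header, *body]
    widths = [max(len(line[i]) for line in table) for i in range(len(header))]

    def fmt(cells):
        return " | ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    lines = [fmt(header), "-+-".join("-" * w for w in widths)]
    lines.extend(fmt(cells) for cells in body)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_status_table([
        PrStatus("PR A", True, True, ""),
        PrStatus("PR B", False, False, "bot finding still open"),
    ]))
```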

The Real Workflow

This is the part that matters.

The value isn't in writing a giant prompt. The value is in encoding the actual engineering process. When I tell Codex "fix this PR," I get a patch. Sometimes good, sometimes messy. But when I describe the loop, I get something closer to how I'd actually work:

Inspect the current branch, PR state, and latest head.
Understand the issue.
Implement the fix.
Run the relevant checks.
Review the diff.
Fix review findings.
Refactor duplicated or messy logic.
Review again.
Push.
Ask the GitHub bot to review.
Watch for feedback.
Fix feedback.
Stop only when the latest head is clean.

That last line is the important one. Stop only when the latest head is clean. Not when an old review comment said it was clean three commits ago. Not when the agent "feels done." The latest pushed head has to be the thing that was reviewed.
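The stop rule can be sketched as a small loop: push, review that exact head, fix, repeat. The three callables (`push`, `review`, `fix`) and the `max_rounds` cap are hypothetical stand-ins for whatever the agent actually does. The point is only that the loop returns when the head it just pushed comes back clean.

```python
from typing import Callable, List, Tuple


def run_review_loop(
    push: Callable[[], str],
    review: Callable[[str], List[str]],
    fix: Callable[[List[str]], None],
    max_rounds: int = 10,
) -> Tuple[str, int]:
    """Repeat push -> review -> fix until the latest pushed head is clean.

    push() returns the SHA it just pushed, review(sha) returns the findings
    for that SHA, fix(findings) applies changes locally.
    Returns (clean_head_sha, rounds_used).
    """
    for round_number in range(1, max_rounds + 1):
        head = push()
        findings = review(head)  # always reviews the head we just pushed
        if not findings:
            return head, round_number
        fix(findings)  # changes exist locally now; next round pushes them
    raise RuntimeError(f"still not clean after {max_rounds} rounds")
```

In a real run, `review` would be the slow part: ask the bot, then check back every five minutes until it answers for that SHA. The cap is a design choice so a PR that never converges ends up in the report as blocked instead of looping all night.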

Why Latest Head Matters

Review bots can fool you if you don't check the commit they actually reviewed.

Picture this: the bot reviews commit A and says everything's good. Then Codex pushes commit B. If you only look at the old comment, you'd think the PR is clean. But that clean review belongs to A, not B. That's fake confidence.

So now I explicitly tell Codex: do not trust stale review results. Only count a clean review if it applies to the latest pushed head. This is the kind of small detail that makes loops feel reliable instead of random.
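The commit A versus commit B trap fits in a few lines. This sketch assumes each review is a dict with a `commit_id` (the SHA it looked at) and a `findings` count, oldest first. GitHub's REST review objects do carry a `commit_id`, but the dict shape here is made up for illustration.

```python
def clean_on_latest_head(head_sha, reviews):
    """True only if the most recent review of the current head has no findings.

    reviews: list of {"commit_id": str, "findings": int}, oldest first
    (hypothetical shape). A clean review of an older commit does not count.
    """
    on_head = [r for r in reviews if r["commit_id"] == head_sha]
    if not on_head:
        return False  # nobody has reviewed the latest head yet
    return on_head[-1]["findings"] == 0


if __name__ == "__main__":
    reviews = [{"commit_id": "A", "findings": 0}]
    print(clean_on_latest_head("A", reviews))  # review matches the head
    print(clean_on_latest_head("B", reviews))  # stale: review was for A
```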

The Prompt Pattern

This is the actual prompt I run before going to sleep:

I want you to create a separate worktree thread for every PR shown in the image.
 
Use GPT-5.5 with extra-high reasoning for every thread.
 
Important:
DO NOT MIX BRANCHES OR CHANGES.
Each PR must have its own isolated worktree, branch, and thread.
Do not touch main unless the workflow explicitly requires it.
Do not edit sibling worktrees.
Keep everything separated.
 
For every PR, start a dedicated worktree thread and run this as the /goal:
 
/goal Try to fix the issue described in this PR. First inspect the current
branch, PR state, latest head, and relevant files. Then implement the fix in
this worktree only.
 
After the fix:
1. Run the relevant tests/checks.
2. Create a new same-worktree thread and run a /review pass on the changes.
3. If the review finds issues, fix them in the same worktree.
4. After that, look for ways to clean up, optimize, or refactor the code you
   created, especially if there is duplicated logic, messy abstractions, or
   avoidable complexity. Use the appropriate skills for this.
5. Run another same-worktree /review pass after the refactor.
6. Fix any remaining review findings.
 
Once the PR is clean:
1. Push the branch.
2. Create or update the PR for that fixed work.
3. Ask codex-bot to review it.
4. Keep checking every 5 minutes for codex-bot feedback.
5. If codex-bot responds with findings, fix them, push again, and ask for
   another review.
6. Only consider the PR done when the latest pushed head has no actionable
   findings.
 
Do not trust stale review results.
Always verify that the clean review applies to the latest pushed head.
 
When a PR is fully handled, close that thread with a concise final status:
PR link
branch/worktree
what was fixed
what was changed
tests/checks run
review status
anything still risky or blocked
 
After every PR is done, create one new final report thread.
That report should explain, in a comprehensive way:
every PR handled
what changed in each one
why the changes were needed
what tests/checks/reviews were run
which branches/PRs were created
what is fully done
what, if anything, is still pending or blocked

It's not perfect, and you can obviously tighten it for your own repo. But it works because it tells the agent exactly what "done" means. That's the real trick.

Good Prompts Define Done

A vague prompt asks for output. A good loop asks for a state.

For example:

tests pass
git diff --check is clean
the PR exists
the latest head was reviewed
the bot has no actionable findings
the final report exists

Those are states. They're checkable. They make the agent far less likely to stop at a random "looks good to me" moment.
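One way to read "asks for a state": every item in that list is a yes/no flag, and "done" means none of them is missing. A minimal sketch, with flag names invented to mirror the list above:

```python
# Hypothetical flag names, one per condition in the list above.
REQUIRED_STATES = [
    "tests_pass",
    "diff_check_clean",      # `git diff --check` reported nothing
    "pr_exists",
    "latest_head_reviewed",
    "no_actionable_findings",
    "final_report_exists",
]


def unmet_conditions(state):
    """Return the names of required conditions that are not yet true.

    state: dict mapping condition name -> bool. Missing keys count as unmet,
    so the loop can't finish by simply never checking something.
    """
    return [name for name in REQUIRED_STATES if not state.get(name, False)]


def is_done(state):
    return not unmet_conditions(state)


if __name__ == "__main__":
    state = {name: True for name in REQUIRED_STATES}
    state["latest_head_reviewed"] = False
    print(unmet_conditions(state))
```

Treating a missing flag as unmet is the design choice that matters: "never checked" must not look the same as "passed."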

This is probably the biggest unlock for me. I don't want Codex to stop because it's tired, confused, or satisfied. I want it to stop because the workflow hit a clear condition.

What Still Goes Wrong

This isn't magic. You still need judgment, you still need to read the PR, and you still need to know when the architecture is just wrong.

I learned that one fast. With one Synara feature, the first direction was too narrow. The model was thinking about multiple Codex accounts, but the real abstraction needed to be multiple provider instances: a structure that works for Codex, Claude, Gemini, and whatever comes next. Not "account A vs account B."

In that case the right move wasn't "keep coding." It was: stop, diagnose the architecture, restart with a better goal. That's still progress. Sometimes the best loop is the one that realizes it's solving the wrong problem.

Why This Is So Good For Open Source

Open source maintenance is made of tiny loops. Review this. Fix that. Check if this PR is still valid. Update the branch. Run the test. Fix the review. Write the summary.

None of it is impossible. It's just expensive in attention, and attention is the real bottleneck. This workflow lets me spend less energy babysitting the maintenance loop and more energy on the decisions that actually matter: what gets merged, what gets rejected, and where Synara should go next.

That's where I want my brain to be. Not refreshing the same PR every five minutes waiting for a bot comment.

How I Think About Codex Now

I don't think of Codex as a chat box anymore. I think of it more like a worker that needs a production system around it.

Give it a vague task, get vague work. But give it context, boundaries, isolation rules, review loops, checks, stop conditions, and reporting requirements, and you get something much closer to real engineering output. That doesn't mean I trust it blindly. It means I give it a better process to operate inside.

If You Want To Try It

Start with one issue. Not ten. One.

Create one isolated worktree. Ask Codex to fix the issue. Then ask a second same-worktree thread to review the diff. Fix the findings. Ask the bot to review the PR. Make Codex watch for feedback. Only once that works should you scale it to multiple PRs.

The number of agents isn't the interesting part. The loop is.

One practical tip: if you're running this overnight on a Mac, the machine needs to stay awake or the agents stop. Keep something like Amphetamine running, or leave the lid open and turn off automatic sleep in the system settings, because an open lid on its own won't stop the Mac from idle-sleeping.
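If you'd rather not install anything, macOS ships a `caffeinate` command that blocks idle sleep. Here's a small sketch that builds the command for a fixed number of hours and only launches it on a Mac. The helper names and the eight-hour figure are assumptions, and closing the lid can still put the machine to sleep.

```python
import platform
import subprocess


def keep_awake_command(hours):
    """Build a `caffeinate` invocation that blocks idle sleep for `hours`.

    -i prevents idle sleep, -t sets the timeout in seconds.
    """
    seconds = int(hours * 3600)
    return ["caffeinate", "-i", "-t", str(seconds)]


def start_keep_awake(hours):
    """Start caffeinate in the background on macOS; do nothing elsewhere."""
    if platform.system() != "Darwin":
        return None
    return subprocess.Popen(keep_awake_command(hours))


if __name__ == "__main__":
    print(" ".join(keep_awake_command(8)))
```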

Final Thought

The crazy part isn't that Codex can work while I sleep.

The crazy part is that I can define a process before bed and wake up to work that's already moved through implementation, review, cleanup, bot feedback, and reporting. That feels less like prompting and more like designing a tiny operating system for my work.

For maintaining Synara, that's a genuine game changer. Because I don't want to spend my energy shoving maintenance tasks from one tab to another. I want to build, I want to decide, I want to keep the project moving. Loops let me do that.

Honestly, I think every open source maintainer should try it at least once.

Sources

Theo's video: I guess we're writing loops now?
Theo's post: x.com/theo
Peter's post: x.com/steipete

I'm documenting the full build-in-public journey on X/Twitter, including what I ship, what breaks, and what I learn.