Orbit | Introducing Nightshift: An Autonomous Overnight Agent That Actually Works

Most "autonomous coding agents" need you to sit and watch them. They drift off-task after a few minutes. They overwrite files they shouldn't. They produce changes you can't audit.

We built Nightshift to solve this. It's an open-source Python orchestrator that runs AI agents overnight — unattended, for hours — with real enforcement, not prompt discipline. You go to sleep. It works. You wake up to a reviewed worktree, a shift log, and a machine-readable record of what it actually did.

935 tests. 80 PRs merged. 28 modules. MIT license. Available on GitHub now.

The Problem with Autonomous Agents

The current generation of AI coding tools (Claude Code, Codex, Cursor) is powerful in interactive mode. You ask, they deliver. But leave them running autonomously and things break down fast.

The failure modes are predictable:

Tunnel vision. The agent fixates on one file or one pattern, producing 40 commits that all touch the same module.
No guardrails. It modifies lockfiles, deletes files it shouldn't, or drifts into infrastructure code.
No audit trail. You come back to a diff with no context on why changes were made or what was verified.
No verification. The agent "fixes" code in ways that leave the build broken, and nothing checks.

These aren't model limitations. They're orchestration failures. The models are capable — but they need a control plane that enforces policy, tracks state, and knows when to stop.

Two Loops, One Daemon

Nightshift ships with two autonomous loops and a unified daemon that coordinates them.

Loop 1: Hardening. The agent reads your repo's instructions and shift log, picks a discovery strategy (security, error resilience, test coverage, accessibility, code hygiene, performance, or production polish), and finds issues. Small fixes get committed. Larger issues get logged for human review. After each fix, the runner executes your verification command — pnpm build, cargo check, pytest, whatever your repo uses. Failed verification reverts the cycle.
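To make the verify-then-revert step concrete, here is a minimal sketch. It is not Nightshift's actual code: the function names and the choice of `git reset --hard` as the revert mechanism are assumptions made for illustration.

```python
import subprocess
from typing import Callable, Sequence

# A runner is anything that executes a command and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command in the current directory and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def verify_or_revert(
    verify_cmd: Sequence[str],
    cycle_start_sha: str,
    run: Runner = run_command,
) -> bool:
    """Return True if verification passed; otherwise roll back the cycle."""
    if run(verify_cmd) == 0:
        return True
    # Assumed revert policy: discard everything committed since the cycle began.
    run(["git", "reset", "--hard", cycle_start_sha])
    return False
```

Passing the runner in as a parameter keeps the decision logic testable without touching a real repository.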

Loop 2: Feature Builder. Give it a natural-language feature description and it profiles your repo, decomposes the work into buildable waves, coordinates sub-agents, and maintains build state. You can pause, check status, and resume. The full CLI surface:

python3 -m nightshift plan "add rate limiting to the API"
python3 -m nightshift build "add rate limiting to the API" --yes
python3 -m nightshift build --status
python3 -m nightshift build --resume

Both loops run through the same runner, same verification gates, and same policy enforcement. The only difference is what the agent is doing.

Real Enforcement, Not Prompt Discipline

The original version of Nightshift relied on prompts to keep the agent in line. It worked sometimes. The current version has a Python orchestrator with typed configs, verification gates, cycle state tracking, and halt conditions.

Seven verification stages run after every cycle:

Did the cycle include a commit and a shift log update? No commit, no credit.
Did it touch blocked files or lockfiles? Instant rejection.
Did the repo verification command pass? pnpm build, cargo check, or whatever you configure.
Did it delete files? Zero tolerance.
Is the work balanced across categories and paths? Anti-tunnel-vision steering.
Is it exploring different areas of the codebase? Breadth enforcement.
Did it modify prompts or control files? Flagged explicitly.

Failures revert or halt the shift. The runner enforces policy — the agent provides intelligence.
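A few of these gates can be sketched as plain functions over a record of what the cycle did. The `CycleResult` and `Policy` types below are hypothetical, and the sketch covers only four of the seven stages (commit, blocked files, verification, deletions). It shows the general shape of a gate pipeline, not the project's implementation.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from pathlib import PurePosixPath


@dataclass
class CycleResult:
    """Hypothetical summary of what one cycle did."""
    committed: bool
    shift_log_updated: bool
    changed_files: list[str] = field(default_factory=list)
    deleted_files: list[str] = field(default_factory=list)
    verify_exit_code: int = 0


@dataclass
class Policy:
    """Hypothetical policy, mirroring the blocked_paths/blocked_globs config keys."""
    blocked_paths: list[str] = field(default_factory=list)
    blocked_globs: list[str] = field(default_factory=list)


def is_blocked(path: str, policy: Policy) -> bool:
    """True if the path sits under a blocked prefix or matches a blocked glob."""
    if any(path.startswith(prefix) for prefix in policy.blocked_paths):
        return True
    name = PurePosixPath(path).name
    return any(fnmatch(path, g) or fnmatch(name, g) for g in policy.blocked_globs)


def check_cycle(result: CycleResult, policy: Policy) -> list[str]:
    """Return a list of violations; an empty list means the cycle is accepted."""
    violations = []
    if not (result.committed and result.shift_log_updated):
        violations.append("missing commit or shift log update")
    blocked = [p for p in result.changed_files if is_blocked(p, policy)]
    if blocked:
        violations.append("touched blocked files: " + ", ".join(blocked))
    if result.verify_exit_code != 0:
        violations.append("verification command failed")
    if result.deleted_files:
        violations.append("deleted files: " + ", ".join(result.deleted_files))
    return violations
```

Returning the full list of violations, rather than stopping at the first one, gives the runner enough detail to log why a cycle was rejected.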

Pluggable Agents

Nightshift doesn't lock you into one model. The same runner, same verification, and same policy engine work with both Codex and Claude Code. The only difference is the CLI adapter.

python3 -m nightshift run --agent claude
python3 -m nightshift run --agent codex

Adding a new agent means writing one adapter module. The orchestrator doesn't care which model is behind the CLI — it cares about verification results.
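One way to picture the adapter boundary is an interface with a single job: turn a prompt into a command line. Everything below is hypothetical, including the `AgentAdapter` protocol, the registry, and the `echo` stand-in command. It does not reflect the real adapter modules or the flags of any agent CLI.

```python
from typing import Protocol, Sequence


class AgentAdapter(Protocol):
    """Hypothetical interface the orchestrator would depend on."""
    name: str

    def build_command(self, prompt: str) -> Sequence[str]: ...


class TemplateAdapter:
    """Hypothetical adapter that fills a prompt into a fixed argv template."""

    def __init__(self, name: str, template: Sequence[str]) -> None:
        self.name = name
        self._template = list(template)

    def build_command(self, prompt: str) -> list[str]:
        # Replace the "{prompt}" placeholder; leave every other argument alone.
        return [prompt if part == "{prompt}" else part for part in self._template]


ADAPTERS: dict[str, AgentAdapter] = {}


def register(adapter: AgentAdapter) -> None:
    ADAPTERS[adapter.name] = adapter


def get_adapter(name: str) -> AgentAdapter:
    try:
        return ADAPTERS[name]
    except KeyError:
        raise ValueError(f"unknown agent: {name!r}") from None


# Stand-in agent that just echoes the prompt, so the sketch runs anywhere.
register(TemplateAdapter("echo-agent", ["echo", "{prompt}"]))
```

In this shape, supporting another agent comes down to registering one more adapter. The runner and the verification gates never see the difference.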

What You Wake Up To

Everything happens in an isolated git worktree. Your working directory is untouched. In the morning, you get four artifacts:

docs/Nightshift/YYYY-MM-DD.md — Human-readable shift log with executive summary, numbered fixes with reasoning, logged issues that exceeded autonomous scope, and recommendations.
docs/Nightshift/YYYY-MM-DD.state.json — Machine-readable cycle state: cycle counts, categories touched, files changed, verification status, halt reasons. Audit a shift programmatically.
docs/Nightshift/YYYY-MM-DD.runner.log — Raw runner output: every orchestrator decision, verification result, and policy check.
nightshift/YYYY-MM-DD branch — Isolated review branch with atomic, prefixed commits. Cherry-pick individual fixes or merge the whole thing.

The morning review is simple:

cat docs/Nightshift/2026-04-06.md
git log nightshift/2026-04-06 --oneline
git merge nightshift/2026-04-06
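Because the state file is JSON, a shift can also be audited with a few lines of Python. The post does not document the schema, so the field names below (`cycles`, `category`, `verification`, `files_changed`, `halt_reason`) are assumptions. Adjust them to match the real file.

```python
import json
from collections import Counter
from pathlib import Path


def load_state(path: str) -> dict:
    """Read a shift state file from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_state(state: dict) -> dict:
    """Condense a shift's cycle records into a short audit summary."""
    cycles = state.get("cycles", [])
    categories = Counter(c.get("category", "unknown") for c in cycles)
    # 1-based indices of cycles whose verification did not pass.
    failed = [
        i for i, c in enumerate(cycles, start=1)
        if c.get("verification") != "passed"
    ]
    files = sorted({f for c in cycles for f in c.get("files_changed", [])})
    return {
        "cycle_count": len(cycles),
        "categories": dict(categories),
        "failed_cycles": failed,
        "files_changed": files,
        "halt_reason": state.get("halt_reason"),
    }
```

A call such as `summarize_state(load_state("docs/Nightshift/2026-04-06.state.json"))` would then give a quick picture of the night before you open the diff.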

Five Daemon Roles

Beyond the two loops, Nightshift runs a unified daemon that auto-selects from five rotating roles:

Builder — Writes code and commits fixes.
Reviewer — Audits existing changes and open PRs.
Overseer — Process auditor that checks task queue health and session fidelity.
Strategist — Produces strategic reports on codebase health and priorities.
Achiever — Drives task completion across the queue.

The daemon maintains task queues, documentation, learnings databases, and cost tracking between sessions. Handoffs (docs/handoffs/) carry context between sessions. Learnings (docs/learnings/) accumulate hard-won knowledge — things like "mypy rejects .get() on required TypedDict fields" or "sessions die at 500 max turns without warning."

The system manages its own operations.

Security Hardening

Running an agent autonomously for 8 hours means taking security seriously. Nightshift includes:

After-task injection protection via environment variables
PR title sanitization against adversarial input
XML boundary escaping for pentest reports
Self-modification guard with snapshot recovery
Watchdog service with rate-limited auto-restart
Blocked paths and globs to keep the agent away from infrastructure code

The runner starts each session with an internal red-team preflight. Exploit paths and brittle automation edges are surfaced before the agent writes any code.
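Two of the protections listed above, PR title sanitization and XML boundary escaping, can be illustrated in a few lines. This is one reading of what those protections might involve, not the project's code. The function names, the 72-character limit, and the wrapper-tag approach are all assumptions.

```python
import re
from xml.sax.saxutils import escape

# ASCII control characters, including newlines and tabs.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def sanitize_pr_title(raw: str, max_len: int = 72) -> str:
    """Collapse a title to one line of printable text with a length cap."""
    cleaned = _CONTROL_CHARS.sub(" ", raw)
    cleaned = " ".join(cleaned.split())  # collapse runs of whitespace
    return cleaned[:max_len].rstrip()


def wrap_untrusted(tag: str, text: str) -> str:
    """Wrap untrusted text in a tag, escaping it so it cannot close the tag early."""
    return f"<{tag}>{escape(text)}</{tag}>"
```

The aim in both cases is the same: text that came from outside the runner should not be able to change the structure of whatever it gets embedded in.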

Configuration

Drop a .nightshift.json in your repo root to override defaults. If verify_command is omitted, Nightshift infers one from package.json, Cargo.toml, go.mod, or pyproject.toml.

{
  "agent": "claude",
  "hours": 8,
  "cycle_minutes": 30,
  "verify_command": null,
  "blocked_paths": [".github/", "deploy/"],
  "blocked_globs": ["*.lock", "pnpm-lock.yaml"],
  "max_fixes_per_cycle": 3,
  "score_threshold": 3
}
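The fallback for an omitted verify_command can be pictured as a lookup over manifest files. The sketch below is an assumption about how such inference might work. In particular, the command chosen for each manifest is a guess, and only the 8-hour and 30-minute defaults come from this post.

```python
import json
from pathlib import Path
from typing import Optional

# Defaults stated in this post; other keys are left to the config file.
DEFAULTS = {"hours": 8, "cycle_minutes": 30, "verify_command": None}

# Assumed manifest-to-command mapping, checked in order.
MANIFEST_COMMANDS = [
    ("package.json", "pnpm build"),
    ("Cargo.toml", "cargo check"),
    ("go.mod", "go build ./..."),
    ("pyproject.toml", "pytest"),
]


def infer_verify_command(repo: Path) -> Optional[str]:
    """Pick a verification command from the first manifest found, if any."""
    for manifest, command in MANIFEST_COMMANDS:
        if (repo / manifest).is_file():
            return command
    return None


def load_config(repo: Path) -> dict:
    """Merge .nightshift.json over the defaults and fill in verify_command."""
    config = dict(DEFAULTS)
    config_file = repo / ".nightshift.json"
    if config_file.is_file():
        config.update(json.loads(config_file.read_text(encoding="utf-8")))
    if not config.get("verify_command"):
        config["verify_command"] = infer_verify_command(repo)
    return config
```

An explicit verify_command in the file always wins here. Inference only runs when the key is missing or null.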

Default: 8 hours, 30-minute cycles. Multi-repo support runs a full shift on each repo sequentially:

python3 -m nightshift multi /repo1 /repo2 --agent claude

Related Reading

The AI Coding Toolchain Wars: Why Every Company Wants to Own Your Workflow — Context on the autonomous agent landscape and where Codex and Claude Code are heading.
The Plan-Execute-Critique Loop — The architectural pattern behind Nightshift's cycle design.
AI Code Security: The Vulnerabilities Nobody's Talking About — Why autonomous agents need real security guardrails, not just prompt warnings.

Install and Run Tonight

One command:

curl -sL https://raw.githubusercontent.com/Recusive/Nightshift/main/scripts/install.sh | bash

Then:

python3 -m nightshift run --agent claude

Open source. MIT license. 935 tests passing across 28 modules. Not a script — an engineering system.

View on GitHub or explore all Orbit skills.

Sources & Further Reading

Nightshift on GitHub — Source code, documentation, and installation
Orbit Skills — Browse the full skill catalog including Nightshift
Nightshift product page — Technical details, architecture, and roadmap
