From Chatbots to Digital Workers
Building Autonomous Infrastructure with Computer Science Principles
From chat to work
The shift is small in words, large in consequence:
yesterday → "answer my question"
today → "finish this job"
A chatbot returns text. A digital worker returns a completed artifact — a draft, a ticket, a report, a closed loop.
A paradigm shift
| Messages API | Managed Agents | |
|---|---|---|
| Infrastructure | You build the loop, manage the sandbox, handle tools by hand. | Pre-built agent harness running in managed cloud infrastructure. |
| State & Memory | Stateless. You resend the whole story every time. | Stateful sessions. Filesystem and history survive sleep. |
| Capability | Answers and fine-grained control. | Long-running async work with built-in tools (Bash, files, web). |
From hand-rolled loops over stateless prompts to managed, stateful agents that finish.
Six tools on the table
Agent
The job description. Who they are, what they're allowed to touch.
Environment
The private office. Clean desk, locked doors, pre-installed software.
Session
The workday. Starts, takes breaks, comes back with the papers still on the desk.
Skills
A table of contents — not a textbook. Read only the chapters you need.
Vaults
A safe deposit box. Agent knows the lock; the session brings the key.
Outcomes
The grader. Checks the work against the rubric until it's right.
The next six slides go one tool at a time. Analogy first. Code second.
The Agent — a job description
Think: hiring paperwork
An Agent is who the worker is and what tools the role is allowed to touch. Same Agent can be hired into many jobs — the description doesn't change between shifts.
Label: create an agent
curl -X POST https://api.anthropic.com/v1/agents \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2026-04-01" \
-H "content-type: application/json" \
-d '{
"name": "social-asset-generator",
"model": {"id": "claude-opus-4-7"},
"system": "You draft social posts...",
"tools": [{"type": "agent_toolset_20260401"}]
}'
One Agent definition, versioned and reused. The brain, separated from any single task.
The Environment — a private office
Think: a clean desk in a locked room
A pre-built workspace with the right software already installed and locked doors to systems the worker shouldn't touch. Same room shape, fresh for every workday.
Label: create an environment
curl -X POST https://api.anthropic.com/v1/environments \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2026-04-01" \
-H "content-type: application/json" \
-d '{
"os": "ubuntu-22.04",
"packages": ["python3.12", "pandas2.2.0"],
"networking": "limited",
"allowed_hosts": ["api.internal-data.com"]
}'
A reproducible container — secure, isolated, predictable. Your core systems stay untouched.
The Session — a workday
Think: a desk that remembers
A worker clocks in, does the job, takes a break — and when they return, the papers are still on the desk. Sessions checkpoint when idle and resume exactly where they left off.
Label: start a session
curl -X POST https://api.anthropic.com/v1/sessions \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2026-04-01" \
-d '{"agent_id": "agt_...", "environment_id": "env_..."}'
# send a message
curl -X POST https://api.anthropic.com/v1/sessions/$ID/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-d '{"content": "Draft next weeks campaign."}'
# container checkpoints on idle. resume tomorrow.
Long jobs don't need to fit in one conversation. State survives sleep.
Skills — a table of contents, not a textbook
Think: scanning the index
Skills are folders of expertise. The agent scans the titles, opens only the chapters it needs, ignores the rest. The whole library is available; the context window stays light.
Label: attach a skill
curl -X PATCH https://api.anthropic.com/v1/agents/$AGENT_ID \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2026-04-01" \
-d '{
"version": 3,
"skills": [
{"type": "anthropic", "skill": "docx"},
{"type": "custom", "skill_id": "skl_brand_voice"}
]
}'
Progressive disclosure: load on demand, not all at once. Deep expertise without token bloat.
Vaults — a safe deposit box
Think: the lock vs the key
The Agent knows the shape of the lock — it knows it needs Slack. The Session brings the user's actual key. Build the product once; serve thousands of users without ever co-mingling their credentials.
Label: store a credential, then use it
# store the user's credential in a vault
curl -X POST https://api.anthropic.com/v1/vaults/$VAULT_ID/credentials \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-d '{"name": "slack_oauth", "value": "xoxb-..."}'
# attach it at session creation — agent never sees the secret
curl -X POST https://api.anthropic.com/v1/sessions \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-d '{"agent_id": "agt_...", "vault_ids": ["vlt_steve_slack"]}'
Manage your product at the agent level. Manage your users at the session level.
Outcomes — replace the back-and-forth
Direct prompt
You ask. It answers. You read it, decide if it's right, and re-ask until it is.
You are the grader. You can't go to dinner.
Outcome
You state the rubric once. An independent grader checks each draft and sends it back until it passes.
You read the final draft only.
Label: define an outcome
curl -X POST https://api.anthropic.com/v1/sessions/$ID/outcomes \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2026-04-01" \
-d '{
"rubric": "10 LinkedIn posts. Each under 280 chars. Each ends with a question.",
"max_iterations": 5
}'
# returns: status = satisfied | needs_revision | max_iterations_reached
Conversation becomes work the moment you can name "done."
Webhooks — call me when it's done
Think: a tap on the shoulder
You don't sit and wait. You hand off the job, go to dinner, and the agent calls you back when the artifact is ready. Hours of work happen in the background.
Label: register a webhook + receive it
# tell the platform where to call you
curl -X POST https://api.anthropic.com/v1/webhooks \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-d '{"url": "https://yourapp.com/hook", "events": ["session.outcome.satisfied"]}'
# later, you receive:
# POST https://yourapp.com/hook
# { "id": "evt_...", "type": "session.outcome.satisfied", "session_id": "sess_..." }
# fetch the artifact with a GET on receipt.
Long jobs no longer block humans. Work that finishes itself.
Permissions — how much rope?
- Always ask — human approves every action — training wheels
- Ask once — approve at the start of a session, then run free
- Always allow — read-only or well-tested tasks — full autonomy
Label: set a permission policy
curl -X PATCH https://api.anthropic.com/v1/sessions/$ID/tools/slack \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-d '{"permission_policy": "always_ask"}'
Trust isn't binary. Turn it tool-by-tool, agent-by-agent, as confidence grows.
A crew, not a soloist
Manager
└── Coordinator Agent
├── Drafter Agent # writes the post copy
└── Reviewer Agent # checks tone + brand
Coordinator
The manager. Splits the work, hands out tasks, gathers results.
Specialists
Each agent has its own job description, tools, and rubric.
Shared substrate
Same office, same files. Agents work in parallel without stepping on each other.
Parallelization and specialization. More hands, sharper work.
Steve ships a week of campaigns over the weekend
The human story
- Friday 5:30pm — Steve kicks off the agent: "Draft next week's launch posts for LinkedIn, X, and Instagram."
- 5:35pm — He closes his laptop and goes home for the weekend.
- Monday 8:00am — His phone buzzes. The agent is done.
- 8:10am — He reviews 15 posts and 5 images. Approves them. Coffee.
Under the hood
POST /sessionswith the social-asset-generator agent- Session checkpoints when Steve disconnects
- Grader returns
satisfiedon iteration 3 - Webhook fires; Steve gets the artifact link
The agent worked most of the weekend. Steve worked ten minutes.
What just changed
Work that finishes itself. Permissions you control. Memory that survives the meeting.
Ten patterns you can ship Monday
Each is a pre-built starting point — a job description, a toolbelt, a system prompt — ready to clone and customize.
- Research — Blank Agent · Deep Researcher · Structured Extractor
- Marketing & Ops — Social Asset Generator · Sprint Retro · Field Monitor
- Customer — Support Agent · Support-to-Eng · Contract Tracker · Data Analyst
Three groups. Steal what fits.
Turn information into answers
Blank Agent
The core toolset, nothing more. A foundation to build any custom agent from scratch.
no MCP
Deep Researcher
Breaks a question into sub-questions, hunts authoritative sources, synthesizes with citations.
no MCP
Structured Extractor
Messy text in, typed JSON out. Validated against your schema.
no MCP
Best when the input is text or web data and the output is structured truth.
Ship the recurring work
Social Asset Generator
Drafts posts across platforms, generates images, schedules the week.
Figma · Buffer · Slack
Sprint Retro Facilitator
Pulls a closed sprint, synthesizes themes, writes the retro doc before the meeting.
Linear · Slack
Field Monitor
Scans blogs on a topic, writes a weekly "what changed" brief.
Notion
Best when work spans multiple tools and happens on a cadence.
Closer to the customer
Support Agent
Answers from docs and the knowledge base. Escalates when it's stuck.
Notion · Slack
Support-to-Eng Escalator
Reads an Intercom thread, reproduces the bug, files a Jira ticket with repro steps.
Intercom · Atlassian · Slack
Contract Tracker
Extracts clauses, sets deadline reminders, tracks obligations in Asana.
Box · Asana
Data Analyst
Loads, explores, visualizes. Answers ad-hoc questions from datasets.
Amplitude
Best when there's a human on the other end waiting for an answer.
Social Asset Generator — the full template
Label: social-asset-generator.yaml
name: Social asset generator
model: claude-sonnet-4-6
system: |
You draft a week of social posts
across LinkedIn, X, and Instagram
with images and schedules them.
1. Read the brand brief
2. Draft posts per platform tone
3. Generate images in Figma
4. Schedule via Buffer
5. Notify the team in Slack
mcp_servers:
- figma
- buffer
- slack
tools:
- agent_toolset_20260401
Why this template
claude-sonnet-4-6 — fast and cost-effective. This work is volume, not depth.
Three MCP servers — the toolbelt is the whole point. Each one is a tab a marketer would otherwise switch between.
Numbered system prompt — five clear steps. The agent has a playbook, not a vibe.
Clone it. Swap Buffer for your scheduler. Ship Monday.
You came for chatbots.
