As engineers, we’re obsessed with efficiency, and with automating anything we find ourselves doing more than twice. If you’ve ever done this, you know the happy path is always easy, but the moment the inputs get complex, automation gets hard. That’s because computers have traditionally required extremely specific instructions to execute anything.
The state of AI models available to us today has changed that. We now have access to computers that can reason and make judgement calls, in lieu of specifying every edge case under the sun.
That’s what AI agents are all about.
Today we’re excited to share a few announcements on how we’re making it even easier to build AI agents on Cloudflare, including:
- agents-sdk — a new JavaScript framework for building AI agents
- Updates to Workers AI, Cloudflare’s serverless inference engine: structured outputs, tool calling, and longer context windows
- An update to the workers-ai-provider for the AI SDK
We truly believe that Cloudflare is the ideal platform for building Agents and AI applications (more on why below), and we’re constantly working to make it better — you can expect to see more announcements from us in this space in the future.
Before we dive deep into the announcements, we wanted to give you a quick primer on agents. If you are familiar with agents, feel free to skip ahead.
Agents are AI systems that can autonomously execute tasks by making decisions about tool usage and process flow. Unlike traditional automation that follows predefined paths, agents can dynamically adapt their approach based on context and intermediate results. Agents are also distinct from co-pilots (e.g. traditional chat applications) in that they can fully automate a task, as opposed to simply augmenting and extending human input.
- Agents → non-linear, non-deterministic (can change from run to run)
- Workflows → linear, deterministic execution paths
- Co-pilots → augmentative AI assistance requiring human intervention
If this is your first time working with or interacting with agents, the following example illustrates how an agent operates in a familiar context: booking a vacation.
Imagine you’re trying to book a vacation. You need to research flights, find hotels, check restaurant reviews, and keep track of your budget.
Traditional workflow automation
A traditional automation system follows a predetermined sequence: it can take inputs such as dates, location, and budget, and make calls to predefined APIs in a fixed order. However, if an unexpected situation arises, such as flights being sold out or the specified hotels being unavailable, it cannot adapt.

AI co-pilot
A co-pilot acts as an intelligent assistant that can provide hotel and itinerary recommendations based on your preferences. If you have questions, it can understand and respond to natural language queries and offer guidance and suggestions. However, it is unable to take the next steps to execute the end-to-end action on its own.

Agent
An agent combines AI’s ability to make judgement calls with the ability to call the relevant tools to execute the task. An agent’s output is nondeterministic because of real-time availability and pricing changes, dynamic prioritization of constraints, the ability to recover from failures, and adaptive decision-making based on intermediate results. In other words, if flights or hotels are unavailable, an agent can reassess and suggest a new itinerary with altered dates or locations, and continue executing your travel booking.

You can now add agent powers to any existing Workers project with just one command:
$ npm i agents-sdk
… or if you want to build something from scratch, you can bootstrap your project with the agents-starter template:
$ npm create cloudflare@latest agents-starter --template=cloudflare/agents-starter
# ... and then deploy it
$ npm run deploy
agents-sdk is a framework that allows you to build agents — software that can autonomously execute tasks — and deploy them directly into production on Cloudflare Workers.
Your agent can start with the basics and act on HTTP requests…
import { Agent } from "agents-sdk";

export class IntelligentAgent extends Agent {
  async onRequest(request) {
    // Transform intention into response
    return new Response("Ready to assist.");
  }
}
Although this is just the initial release of agents-sdk, we wanted to ship more than just a thin wrapper over an existing library. Agents can communicate with clients in real time, persist state, execute long-running tasks on a schedule, send emails, run asynchronous workflows, browse the web, query data from your Postgres database, call AI models, and support human-in-the-loop use-cases. All of this works today, out of the box.
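Take scheduling: an Agent can arrange for its own methods to run later. Here’s a minimal sketch, assuming a this.schedule(delayInSeconds, methodName, payload) shape for the scheduling API — the method name and payload here are illustrative:

import { Agent } from "agents-sdk";

export class ReminderAgent extends Agent {
  async onRequest(request: Request) {
    // Run the sendReminder method below in 60 seconds
    // (a Date or a cron expression could be passed instead of a delay)
    await this.schedule(60, "sendReminder", { to: "user@example.com" });
    return new Response("Reminder scheduled.");
  }

  async sendReminder(payload: { to: string }) {
    // ... send an email or call a tool here
  }
}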
For example, you can build a powerful chat agent with the AIChatAgent class:
// src/index.ts
// (imports below assume the module layout of the agents-starter template)
import { AIChatAgent } from "agents-sdk/ai-chat-agent";
import { routeAgentRequest, type Schedule } from "agents-sdk";
import { createOpenAI } from "@ai-sdk/openai";
import {
  createDataStreamResponse,
  generateId,
  streamText,
  type StreamTextOnFinishCallback,
} from "ai";
import { tools, executions } from "./tools"; // tool definitions and executions
import { processToolCalls } from "./utils"; // human-in-the-loop helper
import { agentContext } from "./server"; // AsyncLocalStorage holding the agent

export class Chat extends AIChatAgent<Env> {
  /**
   * Handles incoming chat messages and manages the response stream
   * @param onFinish - Callback function executed when streaming completes
   */
  async onChatMessage(onFinish: StreamTextOnFinishCallback<any>) {
    // Create a streaming response that handles both text and tool outputs
    return agentContext.run(this, async () => {
      const dataStreamResponse = createDataStreamResponse({
        execute: async (dataStream) => {
          // Process any pending tool calls from previous messages
          // This handles human-in-the-loop confirmations for tools
          const processedMessages = await processToolCalls({
            messages: this.messages,
            dataStream,
            tools,
            executions,
          });

          // Initialize OpenAI client with API key from environment
          const openai = createOpenAI({
            apiKey: this.env.OPENAI_API_KEY,
          });

          // Cloudflare AI Gateway
          // const openai = createOpenAI({
          //   apiKey: this.env.OPENAI_API_KEY,
          //   baseURL: this.env.GATEWAY_BASE_URL,
          // });

          // Stream the AI response using GPT-4o
          const result = streamText({
            model: openai("gpt-4o-2024-11-20"),
            system: `You are a helpful assistant that can do various tasks. If the user asks, then you can also schedule tasks to be executed later. The input may have a date/time/cron pattern to be input as an object into a scheduler. The time is now: ${new Date().toISOString()}.`,
            messages: processedMessages,
            tools,
            onFinish,
            maxSteps: 10,
          });

          // Merge the AI response stream with tool execution outputs
          result.mergeIntoDataStream(dataStream);
        },
      });
      return dataStreamResponse;
    });
  }

  async executeTask(description: string, task: Schedule<string>) {
    await this.saveMessages([
      ...this.messages,
      {
        id: generateId(),
        role: "user",
        content: `scheduled message: ${description}`,
      },
    ]);
  }
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext) {
    if (!env.OPENAI_API_KEY) {
      console.error(
        "OPENAI_API_KEY is not set, don't forget to set it locally in .dev.vars, and use `wrangler secret bulk .dev.vars` to upload it to production"
      );
      return new Response("OPENAI_API_KEY is not set", { status: 500 });
    }
    return (
      // Route the request to our agent or return 404 if not found
      (await routeAgentRequest(request, env)) ||
      new Response("Not found", { status: 404 })
    );
  },
} satisfies ExportedHandler<Env>;
… and connect to your Agent from any React-based front-end with the useAgent hook, which can automatically establish a bidirectional WebSocket, sync client state, and let you build Agent-based applications without a mountain of bespoke code:
// src/app.tsx
import { useAgent } from "agents-sdk/react";

const agent = useAgent({
  agent: "chat",
});
We spent some time thinking about the production story here too: an agent framework that absolves itself of the hard parts — durably persisting state, handling long-running tasks & loops, and horizontal scale — is only going to get you so far. Agents built with agents-sdk can be deployed directly to Cloudflare and run on top of Durable Objects — which you can think of as stateful micro-servers that can scale to tens of millions — and are able to run wherever they need to: close to a user for low latency, close to your data, or anywhere in between.
agents-sdk also exposes:
- Integration with React applications via a useAgent hook that can automatically set up a WebSocket connection between your app and an agent
- An AIChatAgent extension that makes it easier to build intelligent chat agents
- State management APIs via this.setState, as well as a native sql API for writing and querying data within each Agent (see the sketch after this list)
- State synchronization between frontend applications and the agent state
- Agent routing, enabling agent-per-user or agent-per-workflow use-cases. Spawn millions (or tens of millions) of agents without having to think about how to make the infrastructure work, provision CPU, or scale out storage.
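To make the state APIs concrete, here’s a minimal sketch of an Agent that persists rows with the sql tagged-template API and publishes derived state with setState — the visits table and the state shape are hypothetical:

import { Agent } from "agents-sdk";

export class CounterAgent extends Agent {
  async onRequest(request: Request) {
    // Each Agent gets its own embedded SQL storage
    this.sql`CREATE TABLE IF NOT EXISTS visits (at TEXT)`;
    this.sql`INSERT INTO visits (at) VALUES (${new Date().toISOString()})`;

    // Read the count back and sync it to any connected clients
    const [row] = this.sql`SELECT COUNT(*) AS n FROM visits`;
    this.setState({ ...this.state, visits: row.n });

    return new Response(`Visits so far: ${row.n}`);
  }
}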
Over the coming weeks, expect to see even more here: tighter integration with email APIs to enable more human-in-the-loop use-cases, hooks into WebRTC for voice & video interactivity, a built-in evaluation (evals) framework, and the ability to self-host agents on your own infrastructure.
We’re aiming high here: we think this is just the beginning of what agents are capable of, and we think we can make Workers the best place (but not the only place) to build & run them.
When users express needs conversationally, tool calling converts these requests into structured formats like JSON that APIs can understand and process, allowing the AI to interact with databases, services, and external systems. This is essential for building agents, as it allows users to express complex intentions in natural language, and AI to decompose these requests, call appropriate tools, evaluate responses and deliver meaningful outcomes.
When using tool calling or building AI agents, the text generation model must respond with valid JSON objects rather than natural language. Today, we’re adding JSON mode support to Workers AI, enabling applications to request a structured output response when interacting with AI models. Here’s a request to @cf/meta/llama-3.1-8b-instruct-fp8-fast using JSON mode:
{
  "messages": [
    {
      "role": "system",
      "content": "Extract data about a country."
    },
    {
      "role": "user",
      "content": "Tell me about India."
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "capital": {
          "type": "string"
        },
        "languages": {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      },
      "required": [
        "name",
        "capital",
        "languages"
      ]
    }
  }
}
And here’s how the model will respond:
{
  "response": {
    "name": "India",
    "capital": "New Delhi",
    "languages": [
      "Hindi",
      "English",
      "Bengali",
      "Telugu",
      "Marathi",
      "Tamil",
      "Gujarati",
      "Urdu",
      "Kannada",
      "Odia",
      "Malayalam",
      "Punjabi",
      "Sanskrit"
    ]
  }
}
As you can see, the model complies with the JSON schema definition in the request and responds with a validated JSON object. JSON mode is compatible with OpenAI’s response_format implementation:
response_format: {
  title: "JSON Mode",
  type: "object",
  properties: {
    type: {
      type: "string",
      enum: ["json_object", "json_schema"],
    },
    json_schema: {},
  }
}
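To send the same request from a Worker, you can use the Workers AI binding directly — a minimal sketch, assuming an AI binding named AI in your wrangler configuration, with the schema carried over from the example above:

export default {
  async fetch(request: Request, env: Env) {
    const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct-fp8-fast", {
      messages: [
        { role: "system", content: "Extract data about a country." },
        { role: "user", content: "Tell me about India." },
      ],
      response_format: {
        type: "json_schema",
        json_schema: {
          type: "object",
          properties: {
            name: { type: "string" },
            capital: { type: "string" },
            languages: { type: "array", items: { type: "string" } },
          },
          required: ["name", "capital", "languages"],
        },
      },
    });
    // The response arrives as a validated object matching the schema
    return Response.json(result);
  },
} satisfies ExportedHandler<Env>;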
JSON mode is supported on a growing list of models in our catalog, and we will continue extending it to keep up with new and requested models.
Lastly, we are changing how we restrict the size of AI requests to text generation models: moving from byte counts to token counts, introducing the concept of a context window, and raising the limits of the models in our catalog.
In generative AI, the context window is the total number of input, reasoning, and completion (response) tokens a model supports. You can now find the context window limit on each model page in our developer documentation and decide which model suits your requirements and use case.
JSON mode is also the perfect companion when using function calling. You can use structured JSON outputs with traditional function calling, or with the Vercel AI SDK via the workers-ai-provider.
One of the most common ways to build with AI tooling today is by using the popular AI SDK. Cloudflare’s provider for the AI SDK makes it easy to use Workers AI the same way you would call any other LLM, directly from your code.
In the most recent version, we’ve shipped the following improvements (a usage sketch follows the list):
- Tool calling enabled for generateText
- Streaming now works out of the box
- Usage statistics are now enabled
- You can now use AI Gateway, even when streaming
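Here’s a minimal sketch of tool calling with generateText through the provider — assuming a Worker with an AI binding named AI, and with a hypothetical weather tool standing in for your own:

import { createWorkersAI } from "workers-ai-provider";
import { generateText, tool } from "ai";
import { z } from "zod";

export default {
  async fetch(request: Request, env: Env) {
    const workersai = createWorkersAI({ binding: env.AI });

    const { text } = await generateText({
      model: workersai("@cf/meta/llama-3.1-8b-instruct-fp8-fast"),
      tools: {
        // A hypothetical tool the model can choose to call
        getWeather: tool({
          description: "Get the current weather for a city",
          parameters: z.object({ city: z.string() }),
          execute: async ({ city }) => `It is sunny in ${city}.`,
        }),
      },
      maxSteps: 2, // let the model call the tool, then answer
      prompt: "What's the weather like in Lisbon?",
    });

    return new Response(text);
  },
} satisfies ExportedHandler<Env>;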
A key part of building agents is using LLMs for routing, making decisions about which tools to call next, and summarizing structured and unstructured data. All of this needs to happen quickly, as it sits on the critical path of the user-facing experience.
Workers AI, with its globally distributed fleet of GPUs, is a perfect fit for smaller, low-latency LLMs, so we’re excited to make it easy to use with tools developers are already familiar with.
Since launching Workers in 2017, we’ve been building a platform to allow developers to build applications that are fast, scalable, and cost-efficient from day one. We took a fundamentally different approach from the way code was previously run on servers, making a bet about what the future of applications was going to look like — isolates running on a global network, in a way that was truly serverless. No regions, no concurrency management, no managing or scaling infrastructure.
The release of Workers was just the beginning, and we continued shipping primitives to extend what developers could build. Some were more familiar, like a key-value store (Workers KV), and some, like Durable Objects, we thought would play a role in enabling net-new use cases. While we didn’t quite predict AI agents (though “Agents” was one of the proposed names for Durable Objects), we inadvertently created the perfect platform for building them.
What do we mean by that?
To be able to run agents efficiently, you need a system that can seamlessly scale up and down to support the constant stop, go, wait patterns. Agents are basically long-running tasks, sometimes waiting on slow reasoning LLMs and external tools to execute. With Cloudflare, you don’t have to pay for long-running processes when your code is not executing. Cloudflare Workers is designed to scale down and only charge you for CPU time, as opposed to wall-clock time.
In many cases, especially when calling LLMs, the difference can be in orders of magnitude — e.g. 2–3 milliseconds of CPU vs. 10 seconds of wall-clock time. When building on Workers, we pass that difference on to you as cost savings.
We took a similar serverless approach when it comes to inference itself. When you need to call an AI model, you need it to be instantaneously available. While the foundation model providers offer APIs that make it possible to just call the LLM, if you’re running open-source models, LoRAs, or self-trained models, most cloud providers today require you to pre-provision resources for what your peak traffic will look like. This means that the rest of the time, you’re still paying for GPUs to sit there idle. With Workers AI, you can pay only when you’re calling our inference APIs, as opposed to unused infrastructure. In fact, you don’t have to think about infrastructure at all, which is the principle at the core of everything we do.
Durable Objects and Workflows provide a robust programming model that ensures guaranteed execution for asynchronous tasks that require persistence and reliability. This makes them ideal for handling complex operations like long-running deep thinking LLM calls, human-in-the-loop approval processes, or interactions with unreliable third-party APIs. By maintaining state across requests and automatically handling retries, these tools create a resilient foundation for building sophisticated AI agents that can perform complex, multistep tasks without losing context or progress, even when operations take significant time to complete.
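For example, a multistep agent task maps naturally onto Workflow steps. Here’s a minimal sketch, assuming the Workflows APIs from cloudflare:workers — the step names, model, and booking endpoint are illustrative:

import { WorkflowEntrypoint, WorkflowEvent, WorkflowStep } from "cloudflare:workers";

export class BookingWorkflow extends WorkflowEntrypoint<Env, { query: string }> {
  async run(event: WorkflowEvent<{ query: string }>, step: WorkflowStep) {
    // Each step's result is durably persisted; if a step fails,
    // only that step is retried — earlier progress is never lost
    const plan = await step.do("ask the model for a plan", async () => {
      return await this.env.AI.run("@cf/meta/llama-3.1-8b-instruct-fp8-fast", {
        prompt: event.payload.query,
      });
    });

    // An unreliable third-party call with explicit retry configuration
    const booking = await step.do(
      "call booking API",
      { retries: { limit: 5, delay: "30 seconds", backoff: "exponential" } },
      async () => (await fetch("https://example.com/book", { method: "POST" })).json(),
    );

    return { plan, booking };
  }
}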
Did you catch all of that?
No worries if not: we’ve updated our agents documentation to include everything we talked about above, from breaking down the basics of agents, to showing you how to tackle foundational examples of building with agents.
We’ve also updated our Workers prompt with knowledge of the agents-sdk library, so you can use Cursor, Windsurf, Zed, ChatGPT or Claude to help you build AI Agents and deploy them to Cloudflare.
We’re just getting started, and we can’t wait to see what you build. Please join our Discord, ask questions, and tell us what you’re building.