Streaming
LangChain implements a streaming system to surface real-time updates.
Streaming is crucial for enhancing the responsiveness of applications built on LLMs. By displaying output progressively, even before a complete response is ready, streaming significantly improves the user experience, particularly given the latency of LLMs.
Overview
LangChain's streaming system lets you surface live feedback from agent runs to your application.
What's possible with LangChain streaming:
- Stream agent progress — get state updates after each agent step.
- Stream LLM tokens — stream language model tokens as they're generated.
- Stream custom updates — emit user-defined signals (e.g., "Fetched 10/100 records").
- Stream multiple modes — choose from `updates` (agent progress), `messages` (LLM tokens + metadata), or `custom` (arbitrary user data).
Agent progress
To stream agent progress, use the `stream` or `astream` methods with `stream_mode="updates"`. This emits an event after every agent step.
For example, if you have an agent that calls a tool once, you should see the following updates:
- LLM node: `AIMessage` with tool call requests
- Tool node: `ToolMessage` with execution result
- LLM node: Final AI response
```python
from langchain.agents import create_agent

def get_weather(city: str) -> str:
    """Get weather for a given city."""
    return f"It's always sunny in {city}!"

agent = create_agent(
    model="gpt-5-nano",
    tools=[get_weather],
)

for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode="updates",
):
    for step, data in chunk.items():
        print(f"step: {step}")
        print(f"content: {data['messages'][-1].content_blocks}")
```
```
step: model
content: [{'type': 'tool_call', 'name': 'get_weather', 'args': {'city': 'San Francisco'}, 'id': 'call_OW2NYNsNSKhRZpjW0wm2Aszd'}]
step: tools
content: [{'type': 'text', 'text': "It's always sunny in San Francisco!"}]
step: model
content: [{'type': 'text', 'text': "It's always sunny in San Francisco!"}]
```
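The example above uses the synchronous `stream` method. In an async application, `astream` accepts the same arguments and yields the same per-step updates without blocking the event loop. A minimal sketch of the async variant, using the same agent and tool:

```python
import asyncio

from langchain.agents import create_agent

def get_weather(city: str) -> str:
    """Get weather for a given city."""
    return f"It's always sunny in {city}!"

agent = create_agent(
    model="gpt-5-nano",
    tools=[get_weather],
)

async def main() -> None:
    # astream emits the same update events as stream
    async for chunk in agent.astream(
        {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
        stream_mode="updates",
    ):
        for step, data in chunk.items():
            print(f"step: {step}")
            print(f"content: {data['messages'][-1].content_blocks}")

asyncio.run(main())
```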
LLM tokens
To stream tokens as they are produced by the LLM, use `stream_mode="messages"`. Below you can see the output of the agent streaming tool calls and the final response.
```python
from langchain.agents import create_agent

def get_weather(city: str) -> str:
    """Get weather for a given city."""
    return f"It's always sunny in {city}!"

agent = create_agent(
    model="gpt-5-nano",
    tools=[get_weather],
)

for token, metadata in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode="messages",
):
    print(f"node: {metadata['langgraph_node']}")
    print(f"content: {token.content_blocks}")
    print("\n")
```
```
node: model
content: [{'type': 'tool_call_chunk', 'id': 'call_vbCyBcP8VuneUzyYlSBZZsVa', 'name': 'get_weather', 'args': '', 'index': 0}]
node: model
content: [{'type': 'tool_call_chunk', 'id': None, 'name': None, 'args': '{"', 'index': 0}]
node: model
content: [{'type': 'tool_call_chunk', 'id': None, 'name': None, 'args': 'city', 'index': 0}]
node: model
content: [{'type': 'tool_call_chunk', 'id': None, 'name': None, 'args': '":"', 'index': 0}]
node: model
content: [{'type': 'tool_call_chunk', 'id': None, 'name': None, 'args': 'San', 'index': 0}]
node: model
content: [{'type': 'tool_call_chunk', 'id': None, 'name': None, 'args': ' Francisco', 'index': 0}]
node: model
content: [{'type': 'tool_call_chunk', 'id': None, 'name': None, 'args': '"}', 'index': 0}]
node: model
content: []
node: tools
content: [{'type': 'text', 'text': "It's always sunny in San Francisco!"}]
node: model
content: []
node: model
content: [{'type': 'text', 'text': 'Here'}]
node: model
content: [{'type': 'text', 'text': "'s"}]
node: model
content: [{'type': 'text', 'text': ' what'}]
node: model
content: [{'type': 'text', 'text': ' I'}]
node: model
content: [{'type': 'text', 'text': ' got'}]
node: model
content: [{'type': 'text', 'text': ':'}]
node: model
content: [{'type': 'text', 'text': ' "'}]
node: model
content: [{'type': 'text', 'text': "It's"}]
node: model
content: [{'type': 'text', 'text': ' always'}]
node: model
content: [{'type': 'text', 'text': ' sunny'}]
node: model
content: [{'type': 'text', 'text': ' in'}]
node: model
content: [{'type': 'text', 'text': ' San'}]
node: model
content: [{'type': 'text', 'text': ' Francisco'}]
node: model
content: [{'type': 'text', 'text': '!"\n\n'}]
```
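In a chat-style UI you typically want only the text of the final answer, not tool-call chunks or tool output. A minimal sketch (reusing the `agent` defined above) that uses the `langgraph_node` metadata shown above to keep only the model's text tokens and render them incrementally:

```python
# Reuses `agent` from the example above.
for token, metadata in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode="messages",
):
    # Skip tokens emitted by the tools node; keep only the model's output.
    if metadata["langgraph_node"] != "model":
        continue
    for block in token.content_blocks:
        # Ignore tool_call_chunk blocks; print text deltas as they arrive.
        if block["type"] == "text":
            print(block["text"], end="", flush=True)
print()
```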
Custom updates
To stream updates from tools as they are executed, you can use `get_stream_writer`.
```python
from langchain.agents import create_agent
from langgraph.config import get_stream_writer

def get_weather(city: str) -> str:
    """Get weather for a given city."""
    writer = get_stream_writer()
    # stream any arbitrary data
    writer(f"Looking up data for city: {city}")
    writer(f"Acquired data for city: {city}")
    return f"It's always sunny in {city}!"

agent = create_agent(
    model="claude-sonnet-4-5-20250929",
    tools=[get_weather],
)

for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode="custom",
):
    print(chunk)
```
```
Looking up data for city: San Francisco
Acquired data for city: San Francisco
```
If you add `get_stream_writer` inside your tool, you won't be able to invoke the tool outside of a LangGraph execution context.
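If you also need to call such a tool as a plain function (for example in a unit test), one workaround is to fall back to a no-op writer when no run is active. This is a sketch under the assumption that `get_stream_writer` raises a `RuntimeError` outside a LangGraph execution context; check the LangGraph docs for the exact behavior in your version:

```python
from langgraph.config import get_stream_writer

def get_weather(city: str) -> str:
    """Get weather for a given city."""
    try:
        writer = get_stream_writer()
    except RuntimeError:
        # Assumption: no LangGraph run is active, so discard updates.
        def writer(_update: object) -> None:
            pass
    writer(f"Looking up data for city: {city}")
    return f"It's always sunny in {city}!"

# Works both inside an agent run and as a direct call:
print(get_weather("SF"))
```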
Stream multiple modes
You can specify multiple streaming modes by passing `stream_mode` as a list: `stream_mode=["updates", "custom"]`:
```python
from langchain.agents import create_agent
from langgraph.config import get_stream_writer

def get_weather(city: str) -> str:
    """Get weather for a given city."""
    writer = get_stream_writer()
    writer(f"Looking up data for city: {city}")
    writer(f"Acquired data for city: {city}")
    return f"It's always sunny in {city}!"

agent = create_agent(
    model="gpt-5-nano",
    tools=[get_weather],
)

for stream_mode, chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode=["updates", "custom"],
):
    print(f"stream_mode: {stream_mode}")
    print(f"content: {chunk}")
    print("\n")
```
```
stream_mode: updates
content: {'model': {'messages': [AIMessage(content='', response_metadata={'token_usage': {'completion_tokens': 280, 'prompt_tokens': 132, 'total_tokens': 412, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 256, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-5-nano-2025-08-07', 'system_fingerprint': None, 'id': 'chatcmpl-C9tlgBzGEbedGYxZ0rTCz5F7OXpL7', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--480c07cb-e405-4411-aa7f-0520fddeed66-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'San Francisco'}, 'id': 'call_KTNQIftMrl9vgNwEfAJMVu7r', 'type': 'tool_call'}], usage_metadata={'input_tokens': 132, 'output_tokens': 280, 'total_tokens': 412, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 256}})]}}
stream_mode: custom
content: Looking up data for city: San Francisco
stream_mode: custom
content: Acquired data for city: San Francisco
stream_mode: updates
content: {'tools': {'messages': [ToolMessage(content="It's always sunny in San Francisco!", name='get_weather', tool_call_id='call_KTNQIftMrl9vgNwEfAJMVu7r')]}}
stream_mode: updates
content: {'model': {'messages': [AIMessage(content="San Francisco weather: It's always sunny in San Francisco!\n\n", response_metadata={'token_usage': {'completion_tokens': 764, 'prompt_tokens': 168, 'total_tokens': 932, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 704, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-5-nano-2025-08-07', 'system_fingerprint': None, 'id': 'chatcmpl-C9tljDFVki1e1haCyikBptAuXuHYG', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='lc_run--acbc740a-18fe-4a14-8619-da92a0d0ee90-0', usage_metadata={'input_tokens': 168, 'output_tokens': 764, 'total_tokens': 932, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 704}})]}}
```
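Since each item is a `(stream_mode, chunk)` tuple, a consumer can route the two modes to different sinks, e.g. custom progress strings to a status line and state updates to a structured log. A minimal sketch, reusing the `agent` defined above:

```python
# Reuses `agent` from the example above.
def handle_custom(update: object) -> None:
    # e.g. push to a progress bar or status line
    print(f"[progress] {update}")

def handle_updates(update: dict) -> None:
    # e.g. record which step finished in a structured log
    for step in update:
        print(f"[log] completed step: {step}")

handlers = {"custom": handle_custom, "updates": handle_updates}

for stream_mode, chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode=["updates", "custom"],
):
    handlers[stream_mode](chunk)
```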
Disable streaming
In some applications you might need to disable streaming of individual tokens for a given model. This is useful in multi-agent systems to control which agents stream their output. See the Models guide to learn how to disable streaming.
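As a rough sketch of one way to do this (the Models guide is authoritative; the `disable_streaming` model parameter used here is an assumption), you can construct the model yourself and pass it to the agent:

```python
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model

def get_weather(city: str) -> str:
    """Get weather for a given city."""
    return f"It's always sunny in {city}!"

# Assumption: with disable_streaming set, the model answers in a single
# non-streaming call even when the run is consumed with stream_mode="messages",
# so this agent's tokens won't stream while other agents' tokens still do.
model = init_chat_model("gpt-5-nano", disable_streaming=True)

agent = create_agent(
    model=model,
    tools=[get_weather],
)
```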

