Overview
Pipecat provides two variants of the OpenAI Responses API LLM service:

- `OpenAIResponsesLLMService` (WebSocket-based, recommended): Maintains a persistent WebSocket connection for lower-latency inference and automatically uses `previous_response_id` to send only incremental context when possible.
- `OpenAIResponsesHttpLLMService` (HTTP-based): Uses server-sent events (SSE) via HTTP streaming. Each request opens a new connection. Use this when WebSocket is not available or preferred.
Both services work with the universal `LLMContext` and `LLMContextAggregatorPair`.
The Responses API is a newer OpenAI API designed for conversational AI
applications. It differs from the Chat Completions API in its request/response
structure and streaming format. See OpenAI Responses API
documentation for
more details.
WebSocket vs HTTP
Use WebSocket (`OpenAIResponsesLLMService`) when:
- You need the lowest possible latency for real-time conversations
- Your workflow involves frequent tool/function calls
- You want automatic incremental context optimization without server-side storage
Use HTTP (`OpenAIResponsesHttpLLMService`) when:
- WebSocket connections are blocked by your infrastructure
- You prefer stateless request/response patterns
- You don’t need the incremental context optimization
The `previous_response_id` optimization works with `store=False` (the default) using a connection-local in-memory cache, so no conversations are stored on OpenAI's servers. The HTTP variant does not offer this optimization by default, as it would require `store=True` (30-day OpenAI-side conversation storage).
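The cache-based decision can be sketched in plain Python. The class and method names below are hypothetical; the real logic lives inside `OpenAIResponsesLLMService`:

```python
class IncrementalContextCache:
    """Connection-local memory of what was already sent on this WebSocket."""

    def __init__(self):
        self.last_response_id = None
        self.sent_items = []

    def build_request(self, items):
        # If the new context starts with exactly what we already sent, we can
        # reference the previous response and transmit only the new items.
        n = len(self.sent_items)
        if self.last_response_id is not None and items[:n] == self.sent_items:
            return {
                "previous_response_id": self.last_response_id,
                "input": items[n:],
                "store": False,
            }
        # Prefix changed (e.g. an interruption rewrote history): resend all.
        return {"input": list(items), "store": False}

    def record_response(self, items, response_id):
        self.sent_items = list(items)
        self.last_response_id = response_id


cache = IncrementalContextCache()
req1 = cache.build_request(["user: hi"])  # no prior response: full context
cache.record_response(["user: hi", "assistant: hello"], "resp_1")
req2 = cache.build_request(["user: hi", "assistant: hello", "user: bye"])
# req2 references resp_1 and carries only the new user turn
```

Because the cache lives on the connection, a reconnect simply falls back to sending the full context once, then resumes incremental sends.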
- OpenAI Responses API Reference: Pipecat's API methods for OpenAI Responses integration
- Example Implementation: Interruptible conversation example
- OpenAI Documentation: Official OpenAI Responses API documentation
- OpenAI Platform: Access models and manage API keys
Installation
To use OpenAI services, install the required dependencies.

Prerequisites
OpenAI Account Setup
Before using OpenAI Responses LLM services, you need:

- OpenAI Account: Sign up at OpenAI Platform
- API Key: Generate an API key from your account dashboard
- Model Selection: Choose from available models (GPT-4.1, GPT-4o, GPT-4o-mini, etc.)
- Usage Limits: Set up billing and usage limits as needed
Required Environment Variables
`OPENAI_API_KEY`: Your OpenAI API key for authentication
Configuration
Common Parameters
These parameters are available for both `OpenAIResponsesLLMService` and `OpenAIResponsesHttpLLMService`:
- OpenAI API key. If `None`, uses the `OPENAI_API_KEY` environment variable.
- Custom base URL for the OpenAI API. Override for proxied or self-hosted deployments.
- OpenAI organization ID.
- OpenAI project ID.
- Additional HTTP headers to include in every request.
- Service tier to use (e.g., "auto", "flex", "priority").
- Runtime-configurable model settings. See Settings below.
WebSocket-Specific Parameters
The following parameter is only available for `OpenAIResponsesLLMService` (WebSocket variant):
WebSocket endpoint URL. Override for custom deployments or proxies.
Settings
Runtime-configurable settings passed via the `settings` constructor argument using `OpenAIResponsesLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | str | `"gpt-4.1"` | OpenAI model identifier. (Inherited from base settings.) |
| `system_instruction` | str | `None` | System instruction/prompt for the model. (Inherited from base settings.) |
| `temperature` | float | `NOT_GIVEN` | Sampling temperature (0.0 to 2.0). Lower values are more focused, higher values more creative. |
| `top_p` | float | `NOT_GIVEN` | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. |
| `frequency_penalty` | float | `None` | Penalty for frequent tokens (-2.0 to 2.0). Positive values discourage repetition. |
| `presence_penalty` | float | `None` | Penalty for new topics (-2.0 to 2.0). Positive values encourage the model to talk about new topics. |
| `seed` | int | `None` | Random seed for deterministic outputs. |
| `max_completion_tokens` | int | `NOT_GIVEN` | Maximum completion tokens to generate. |
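The distinction between `NOT_GIVEN` and `None` in the table can be sketched with a stand-in sentinel (the real sentinel comes from the `openai` SDK):

```python
# Stand-in for the openai SDK's NOT_GIVEN sentinel.
NOT_GIVEN = object()


def build_params(**settings):
    # NOT_GIVEN entries are dropped entirely, so the API applies its own
    # defaults; None entries are kept and sent explicitly as null.
    return {k: v for k, v in settings.items() if v is not NOT_GIVEN}


params = build_params(temperature=NOT_GIVEN, seed=None, top_p=0.9)
# temperature is absent from the request; seed is sent as null; top_p as 0.9
```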
`NOT_GIVEN` values are omitted from the API request entirely, letting the OpenAI API use its own defaults. This is different from `None`, which would be sent explicitly.

Usage
Basic Setup
WebSocket variant (recommended).

With Custom Settings
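A minimal construction sketch. The module path and exact parameter names are assumptions based on the sections above; check them against your Pipecat version:

```python
import os

# Import path is an assumption; adjust to your Pipecat version.
from pipecat.services.openai.llm import OpenAIResponsesLLMService

llm = OpenAIResponsesLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),  # or omit to use the env var
    settings=OpenAIResponsesLLMService.Settings(
        model="gpt-4.1",
        temperature=0.7,
        max_completion_tokens=1000,
    ),
)
```

Swapping in `OpenAIResponsesHttpLLMService` requires no other changes, since both variants share the same constructor arguments and settings class.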
Both `OpenAIResponsesLLMService.Settings` and `OpenAIResponsesHttpLLMService.Settings` use the same `OpenAIResponsesLLMSettings` class, so settings are identical between variants.

Updating Settings at Runtime
Model settings can be changed mid-conversation using `LLMUpdateSettingsFrame`:
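For example (a sketch assuming a running `PipelineTask` named `task` inside an async context; `LLMUpdateSettingsFrame` lives in `pipecat.frames.frames`):

```python
from pipecat.frames.frames import LLMUpdateSettingsFrame

# Lower the temperature for the rest of the conversation. Only the listed
# settings change; everything else keeps its current value.
await task.queue_frames([
    LLMUpdateSettingsFrame(settings={"temperature": 0.3}),
])
```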
Out-of-Band Inference
Run a one-shot inference without pushing frames through the pipeline.

Notes
- WebSocket is the new default: As of the Pipecat version with PR #4141, `OpenAIResponsesLLMService` uses WebSocket transport by default. If you need the HTTP streaming behavior, use `OpenAIResponsesHttpLLMService` instead. Both have identical constructor args and settings.
- Persistent WebSocket connection: The WebSocket variant maintains a persistent connection to `wss://api.openai.com/v1/responses` and automatically reconnects on connection loss. Connection lifetime is limited to 60 minutes server-side, after which automatic reconnection occurs.
- Incremental context optimization: The WebSocket variant uses `previous_response_id` to send only incremental context when the conversation prefix hasn't changed, reducing latency and costs. This works with `store=False` (no server-side storage) via a connection-local cache.
- Responses API vs Chat Completions API: The Responses API has a different request/response structure compared to the Chat Completions API. Use `OpenAILLMService` for the Chat Completions API and `OpenAIResponsesLLMService` or `OpenAIResponsesHttpLLMService` for the Responses API.
- Universal LLM Context: Both services work with the universal `LLMContext` and `LLMContextAggregatorPair`, making it easy to switch between different LLM providers.
- Function calling: Supports OpenAI's tool/function calling format. Register function handlers on the pipeline task to handle tool calls automatically.
- Usage metrics: Automatically tracks token usage, including cached tokens and reasoning tokens.
- Service tiers: Supports OpenAI’s service tier system for prioritizing requests.
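The function-calling note above can be sketched as follows, assuming an `llm` service instance; the tool name and handler body are hypothetical, and the handler signature follows Pipecat's params-with-`result_callback` callback style (verify against your version):

```python
# Hypothetical weather tool. In Pipecat, handlers receive a params object
# and report results via its result_callback.
async def fetch_weather(params):
    # A real handler would read arguments from params and call a weather API.
    await params.result_callback({"conditions": "sunny", "temp_f": 72})


# Register the handler so matching tool calls are dispatched automatically.
llm.register_function("get_weather", fetch_weather)
```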
Event Handlers
Both `OpenAIResponsesLLMService` and `OpenAIResponsesHttpLLMService` support the following event handlers, inherited from `LLMService`:
| Event | Description |
|---|---|
| `on_completion_timeout` | Called when an LLM completion request times out |
| `on_function_calls_started` | Called when function calls are received and execution is about to start |
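Handlers are registered with the `event_handler` decorator. A sketch, assuming an `llm` service instance and the handler signature shown (check your version's reference for the exact arguments):

```python
import logging

logger = logging.getLogger(__name__)


@llm.event_handler("on_function_calls_started")
async def on_function_calls_started(service, function_calls):
    # A good place to queue a short filler phrase while the tools execute.
    logger.info("Starting %d function call(s)", len(function_calls))
```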