# Streaming
Streaming allows you to receive responses token by token as they’re generated, enabling real-time output in your applications.

## Basic Streaming
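A minimal sketch of the consuming loop, assuming an OpenAI-style chunk shape. The `fake_stream` generator below is a stand-in for a real streaming call (e.g. `client.chat.completions.create(..., stream=True)`); the iteration pattern is the part that carries over:

```python
from types import SimpleNamespace

def fake_stream():
    """Stand-in for a real streaming call; yields chunk-shaped objects."""
    for piece in ["Hello", ", ", "world", "!"]:
        yield SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=piece))])

received = []
for chunk in fake_stream():
    content = chunk.choices[0].delta.content
    if content is not None:  # some chunks may carry no content
        print(content, end="", flush=True)
        received.append(content)
print()  # newline after the stream finishes
```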
## Stream Response Structure
Each chunk is a `ChatCompletionChunk` object:
| Field | Type | Description |
|---|---|---|
| `id` | `str` | Completion identifier |
| `object` | `str` | Always `"chat.completion.chunk"` |
| `created` | `int` | Unix timestamp |
| `model` | `str` | Model used |
| `choices` | `list` | List of streaming choices |
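As a concrete instance of the fields above, a single chunk serialized to a dict might look like this (the `id`, `created`, and `model` values are illustrative, not taken from a real response):

```python
# Illustrative chunk payload matching the table above (values are made up).
chunk = {
    "id": "chatcmpl-abc123",          # completion identifier (hypothetical)
    "object": "chat.completion.chunk",
    "created": 1700000000,            # Unix timestamp (hypothetical)
    "model": "example-model",         # placeholder model name
    "choices": [
        {"index": 0, "delta": {"content": "Hel"}, "finish_reason": None},
    ],
}
```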
## Choice Delta
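Each element of `choices` carries a `delta` holding only what changed in that chunk. A sketch of the typical progression, assuming the common OpenAI-style layout where the first chunk sets the `role` and later chunks carry `content`:

```python
# Typical delta progression across a stream (shapes are illustrative).
first_delta = {"role": "assistant", "content": ""}  # first chunk establishes the role
mid_delta = {"content": " world"}                   # subsequent chunks carry text fragments
last_delta = {}                                     # the final chunk is often empty

def delta_text(delta):
    """Return the printable text from a delta, or '' if it has none."""
    return delta.get("content") or ""
```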
## Collecting Full Response
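A sketch of reassembling the complete text from a stream. The chunks are simulated here for self-containment; with a real client you would iterate the stream object the same way:

```python
from types import SimpleNamespace

def fake_stream(pieces):
    """Simulated stream yielding chunk-shaped objects."""
    for piece in pieces:
        yield SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=piece))])

def collect(stream):
    """Accumulate delta fragments into the full response text."""
    parts = []
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)

full = collect(fake_stream(["The ", "quick ", "brown ", "fox"]))
```

Appending to a list and joining once at the end avoids repeated string concatenation on long responses.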
## Async Streaming
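With an async client the pattern is identical but uses `async for`. A self-contained sketch with a simulated async generator standing in for an async streaming call:

```python
import asyncio
from types import SimpleNamespace

async def fake_astream(pieces):
    """Stand-in for an async streaming call (e.g. an async client with stream=True)."""
    for piece in pieces:
        await asyncio.sleep(0)  # yield control, as real network chunks would
        yield SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=piece))])

async def main():
    parts = []
    async for chunk in fake_astream(["async ", "stream"]):
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)

result = asyncio.run(main())
```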
## Detecting Stream End
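The end of a stream is typically signaled by a non-`None` `finish_reason` on the final chunk (e.g. `"stop"` or `"length"` in OpenAI-style APIs). A sketch, again with simulated chunks:

```python
from types import SimpleNamespace

def make_chunk(content, finish_reason=None):
    """Build a chunk-shaped object for illustration."""
    choice = SimpleNamespace(delta=SimpleNamespace(content=content), finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])

chunks = [make_chunk("Hi"), make_chunk(" there"), make_chunk(None, finish_reason="stop")]

finish, text = None, ""
for chunk in chunks:
    choice = chunk.choices[0]
    if choice.delta.content:
        text += choice.delta.content
    if choice.finish_reason is not None:  # the stream is done
        finish = choice.finish_reason
```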
## With Fallbacks
Streaming works with fallbacks. If the primary model fails, the SDK automatically tries the next model.

## Streaming with Error Handling
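Errors can surface mid-iteration, after some chunks have already arrived, so wrap the consuming loop itself and keep partial output only if the attempt completes. The manual fallback loop below illustrates the retry idea that the automatic fallback described above performs for you; `StreamError`, `run_stream`, and the model names are hypothetical stand-ins:

```python
from types import SimpleNamespace

class StreamError(Exception):
    """Stand-in for a provider/network error raised mid-stream."""

def run_stream(model):
    """Hypothetical helper: 'primary' fails mid-stream, 'fallback' succeeds."""
    def gen():
        if model == "primary":
            yield SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="par"))])
            raise StreamError("connection dropped")
        for piece in ["fall", "back ok"]:
            yield SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=piece))])
    return gen()

models = ["primary", "fallback"]  # hypothetical fallback order
text, used = None, None
for model in models:
    parts = []
    try:
        for chunk in run_stream(model):
            content = chunk.choices[0].delta.content
            if content:
                parts.append(content)
        text, used = "".join(parts), model
        break  # completed without error
    except StreamError:
        continue  # discard partial output, try the next model
```

Note that in a real UI, tokens already rendered from a failed attempt may need to be cleared before replaying the fallback's output.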
## When to Use Streaming
| Use Case | Stream? |
|---|---|
| Chat interfaces | Yes |
| Real-time output | Yes |
| Long-form generation | Yes |
| Batch processing | No |
| API integrations | Depends |
## Performance Considerations
- TTFT (Time to First Token): Streaming provides faster perceived response time
- Total latency: Similar to non-streaming
- Memory: Streaming uses less memory for long responses
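TTFT can be measured directly by timestamping the first chunk. A sketch with a simulated stream, where `time.sleep` stands in for per-chunk network latency:

```python
import time
from types import SimpleNamespace

def fake_stream():
    """Simulated stream; the sleep stands in for network latency per chunk."""
    for piece in ["a", "b", "c"]:
        time.sleep(0.01)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=piece))])

start = time.perf_counter()
ttft = None
for chunk in fake_stream():
    if ttft is None:
        ttft = time.perf_counter() - start  # time to first token
total = time.perf_counter() - start
```

Since total generation time is roughly the same either way, the win is that `ttft` is much smaller than `total`, which is what users perceive as responsiveness.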