API Reference
Base URL
Authentication
All requests require an API key via header:Endpoints
POST /v1/chat/completions
Create a chat completion. Request:| Field | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model identifier |
messages | array | Yes | Conversation messages |
max_tokens | integer | No | Maximum tokens to generate |
temperature | float | No | Randomness (0-2) |
top_p | float | No | Nucleus sampling |
stream | boolean | No | Enable streaming |
stop | array | No | Stop sequences |
POST /v1/completions
Create a text completion. Request:| Field | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model identifier |
prompt | string | Yes | Text prompt |
max_tokens | integer | No | Maximum tokens |
temperature | float | No | Randomness |
stop | array | No | Stop sequences |
GET /v1/models
List available models. Response:GET /v1/providers
List providers for a model. Query Parameters:| Field | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model identifier |
Message Object
| Field | Type | Description |
|---|---|---|
role | string | system, user, or assistant |
content | string | Message content |
Usage Object
| Field | Type | Description |
|---|---|---|
prompt_tokens | integer | Input token count |
completion_tokens | integer | Output token count |
total_tokens | integer | Total tokens |
input_cost_usd | float | Input cost (USD) |
output_cost_usd | float | Output cost (USD) |
cache_input_cost_usd | float | Cached input cost |
total_cost_usd | float | Total cost (USD) |
latency_ms | float | Request latency (ms) |
ttft_ms | float | Time to first token (ms) |
Streaming
Setstream: true to receive Server-Sent Events:
Rate Limits
Rate limits are applied per API key. When exceeded, you’ll receive a 429 response withretry_after header.