Deployments

PureRouter is a completely independent product from PureCPP. You can use PureRouter without needing PureCPP and vice versa.
In addition to automatic routing through profiles, PureRouter allows you to directly access specific models through their deployment IDs. This is useful when you need a particular model for a specific use case.

Before You Start

To use deployments, you will need:
  • A PureRouter API key (the router_key passed to the client)
  • The unique ID of a deployed model (its deployment ID)

Instance Type Pricing

Machine                    Configuration     Price/h (USD)
NVIDIA T4 (Small)          1 GPU (16 GB)     $1.41/h
NVIDIA A10G (Medium)       1 GPU (24 GB)     $1.21/h
NVIDIA A10G (Large)        1 GPU (24 GB)     $1.62/h
NVIDIA A10G (XL)           1 GPU (24 GB)     $2.45/h
NVIDIA A10G (Multi-GPU)    4 GPUs (96 GB)    $3.64/h
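
For budgeting, the hourly rates translate to rough monthly figures. A back-of-the-envelope sketch (assumption: an always-on instance, 24 hours a day for 30 days):

HOURS_PER_MONTH = 24 * 30  # assumption: always-on, 30-day month

rates_usd_per_hour = {
    "NVIDIA T4 (Small)": 1.41,
    "NVIDIA A10G (Medium)": 1.21,
    "NVIDIA A10G (Large)": 1.62,
    "NVIDIA A10G (XL)": 2.45,
    "NVIDIA A10G (Multi-GPU)": 3.64,
}

for machine, rate in rates_usd_per_hour.items():
    # e.g. NVIDIA A10G (Medium): ~$871/month
    print(f"{machine}: ~${rate * HOURS_PER_MONTH:,.0f}/month")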

Accessing Deployments

Synchronous Deployment Call

To access a specific deployment, you need its unique ID:
from purerouter import PureRouter
from purerouter.types import InvokeRequest

client = PureRouter(
    router_key="sk_...",
    base_url="https://api.purerouter-api.com",
    timeout=300.0
)
deployment_id = "your-deployment-id"  # replace with your deployment's unique ID

req = InvokeRequest(
    prompt="Tell a brief story about PureAI.",
    max_tokens=250,
    temperature=0.8
)

result = client.deployments.invoke(deployment_id, req)
text = result["choices"][0]["text"]
print(text)

Asynchronous Deployment Call

For high-performance applications, use async operations:
import asyncio
from purerouter import AsyncPureRouter
from purerouter.types import InvokeRequest

async def main():
    client = AsyncPureRouter(
        router_key="sk_...",
        base_url="https://api.purerouter-api.com",
        timeout=300.0
    )
    deployment_id = "your-deployment-id"  # replace with your deployment's unique ID

    req = InvokeRequest(
        prompt="Tell a brief story about PureAI.",
        max_tokens=250,
        temperature=0.8,
        stream=False  # default; may omit
    )

    result = await client.deployments.ainvoke(deployment_id, req)
    text = result["choices"][0]["text"]
    print(text)

asyncio.run(main())
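
The async client pays off when you fan out several requests at once. A minimal sketch, reusing the client setup and a placeholder deployment ID from above, that sends three prompts concurrently with asyncio.gather:

import asyncio
from purerouter import AsyncPureRouter
from purerouter.types import InvokeRequest

async def main():
    client = AsyncPureRouter(
        router_key="sk_...",
        base_url="https://api.purerouter-api.com",
        timeout=300.0
    )
    deployment_id = "your-deployment-id"  # placeholder

    prompts = [
        "Summarize PureRouter in one sentence.",
        "List two benefits of direct deployments.",
        "What is a deployment ID?",
    ]

    # Fire all requests concurrently instead of awaiting them one by one
    results = await asyncio.gather(
        *(client.deployments.ainvoke(deployment_id, InvokeRequest(prompt=p, max_tokens=100))
          for p in prompts)
    )

    for prompt, result in zip(prompts, results):
        print(f"{prompt} -> {result['choices'][0]['text'][:80]!r}")

asyncio.run(main())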

Invocation Parameters

When invoking a specific deployment, you can configure various parameters:
req = InvokeRequest(
    prompt="Explain the concept of machine learning.",
    max_tokens=500,      # Limits response size
    temperature=0.7,     # Controls randomness (0.0 to 1.0)
    top_p=0.95,         # Nucleus sampling
    stream=False        # Set to True for streaming responses
)

response = client.deployments.invoke("deployment-id", req)
text = response["choices"][0]["text"]
print(text)

Response Streaming

To get real-time responses (streaming):
import asyncio
import json
import sys
from purerouter import AsyncPureRouter
from purerouter.types import InvokeRequest

async def main():
    client = AsyncPureRouter(
        router_key="sk_...",
        base_url="https://api.purerouter-api.com",
        timeout=300.0
    )
    deployment_id = ""

    req = InvokeRequest(
        prompt="Hi.",
        max_tokens=250,
        temperature=0.8,
        stream=True
    )

    final = []
    # Each server-sent event's data is either a JSON chunk with
    # choices[0]["text"] or a raw line such as the [DONE] terminator
    async for ev in client.deployments.astream(deployment_id, req):
        line = (ev.data or "").strip()
        if not line or line == "[DONE]":
            continue

        if line.startswith("data:"):
            line = line[len("data:"):].strip()

        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            sys.stdout.write(line)
            sys.stdout.flush()
            final.append(line)
            continue

        choices = obj.get("choices") or []
        if choices and isinstance(choices[0], dict) and "text" in choices[0]:
            tok = choices[0]["text"] or ""
            sys.stdout.write(tok)
            sys.stdout.flush()
            final.append(tok)

    print()  # end the streamed line with a newline
    return "".join(final)  # full response assembled from the streamed tokens

full_text = asyncio.run(main())

Use Cases for Direct Deployments

  • Consistent responses: always using the same model through its deployment ID gives predictable results, without the variations that can occur with automatic routing.
  • Fine-tuned models: if you've deployed a model fine-tuned for a specific task, you can access it directly by ID to leverage its specialized training.
  • Model comparison: to compare the performance of different models on the same task, invoke each one directly and evaluate the results (see the sketch below).
  • Compliance: where regulatory requirements restrict which models may be used, direct access ensures the right model is always called.
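
A minimal sketch of such a comparison, assuming the synchronous client configured earlier; the two deployment IDs are placeholders:

from purerouter.types import InvokeRequest

prompt = "Summarize the trade-offs of model quantization."
req = InvokeRequest(prompt=prompt, max_tokens=200, temperature=0.0)  # low temperature for comparability

# Placeholder IDs: substitute the deployments you want to compare
for deployment_id in ["model-a-deployment-id", "model-b-deployment-id"]:
    result = client.deployments.invoke(deployment_id, req)
    print(f"--- {deployment_id} ---")
    print(result["choices"][0]["text"])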

Practical Example: Fallback System

from purerouter.types import InvokeRequest, InferRequest

# Assumes `client` is the PureRouter client configured earlier
def process_query_with_fallback(query):
    # Try first with the preferred model
    try:
        response = client.deployments.invoke(
            "primary-model-id",
            InvokeRequest(
                prompt=query,
                temperature=0.7
            )
        )
        return response["choices"][0]["text"]
    except Exception as e:
        print(f"Error in primary model: {e}")

        # Fallback to an alternative model
        try:
            response = client.deployments.invoke(
                "fallback-model-id",
                InvokeRequest(
                    prompt=query,
                    temperature=0.7
                )
            )
            return response["choices"][0]["text"]
        except Exception as e:
            print(f"Error in fallback model: {e}")

            # Last resort: use automatic routing
            response = client.router.infer(InferRequest(
                prompt=query,
                profile="balanced"
            ))
            return response.output_text
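
Calling the helper is then a single line (the query is illustrative):

answer = process_query_with_fallback("Explain what a deployment ID is.")
print(answer)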

Deployment ID

After you create a deployment, you will receive a unique ID that identifies your specific model. This ID is what you use to communicate directly with the deployment through the API. The Deployment ID is essential for:
  • Making direct calls to a specific model
  • Ensuring consistent responses
  • Accessing specialized or fine-tuned models
  • Implementing fallback systems

Next Steps