Deployments

PureRouter is a completely independent product from PureCPP. You can use PureRouter without needing PureCPP and vice versa.
In addition to automatic routing through profiles, PureRouter allows you to directly access specific models through their deployment IDs. This is useful when you need a particular model for a specific use case.

Before You Start

To use deployments, you will need:
  • A PureRouter API key (the router_key passed to the client)
  • The unique ID of a deployed model (its deployment ID)

Instance Type Pricing

Machine                    Configuration     Price/h (USD)
NVIDIA T4 (Small)          1 GPU (16 GB)     $1.41/h
NVIDIA A10G (Medium)       1 GPU (24 GB)     $1.21/h
NVIDIA A10G (Large)        1 GPU (24 GB)     $1.62/h
NVIDIA A10G (XL)           1 GPU (24 GB)     $2.45/h
NVIDIA A10G (Multi-GPU)    4 GPUs (96 GB)    $3.64/h
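
For budgeting, the hourly rates translate to rough monthly figures. A back-of-the-envelope sketch (assumption: an always-on instance, 24 hours a day for 30 days):

HOURS_PER_MONTH = 24 * 30  # assumption: always-on, 30-day month

rates_usd_per_hour = {
    "NVIDIA T4 (Small)": 1.41,
    "NVIDIA A10G (Medium)": 1.21,
    "NVIDIA A10G (Large)": 1.62,
    "NVIDIA A10G (XL)": 2.45,
    "NVIDIA A10G (Multi-GPU)": 3.64,
}

for machine, rate in rates_usd_per_hour.items():
    # e.g. NVIDIA A10G (Medium): ~$871/month
    print(f"{machine}: ~${rate * HOURS_PER_MONTH:,.0f}/month")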

Accessing Deployments

Synchronous Deployment Call

To access a specific deployment, you need its unique ID:
from purerouter import PureRouter
from purerouter.types import InvokeRequest

client = PureRouter(
    router_key="sk_...",
    base_url="https://api.purerouter-api.com",
    timeout=300.0
)
deployment_id = "your-deployment-id"  # replace with your deployment's unique ID

req = InvokeRequest(
    prompt="Tell a brief story about PureAI.",
    max_tokens=250,
    temperature=0.8
)

result = client.deployments.invoke(deployment_id, req)
text = result["choices"][0]["text"]
print(text)

Asynchronous Deployment Call

For high-performance applications, use async operations:
import asyncio
from purerouter import AsyncPureRouter
from purerouter.types import InvokeRequest

async def main():
    client = AsyncPureRouter(
        router_key="sk_...",
        base_url="https://api.purerouter-api.com",
        timeout=300.0
    )
    deployment_id = "your-deployment-id"  # replace with your deployment's unique ID

    req = InvokeRequest(
        prompt="Tell a brief story about PureAI.",
        max_tokens=250,
        temperature=0.8,
        stream=False  # default; may omit
    )

    result = await client.deployments.ainvoke(deployment_id, req)
    text = result["choices"][0]["text"]
    print(text)

asyncio.run(main())
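
The async client pays off when you fan out several requests at once. A minimal sketch, reusing the client setup and a placeholder deployment ID from above, that sends three prompts concurrently with asyncio.gather:

import asyncio
from purerouter import AsyncPureRouter
from purerouter.types import InvokeRequest

async def main():
    client = AsyncPureRouter(
        router_key="sk_...",
        base_url="https://api.purerouter-api.com",
        timeout=300.0
    )
    deployment_id = "your-deployment-id"  # placeholder

    prompts = [
        "Summarize PureRouter in one sentence.",
        "List two benefits of direct deployments.",
        "What is a deployment ID?",
    ]

    # Fire all requests concurrently instead of awaiting them one by one
    results = await asyncio.gather(
        *(client.deployments.ainvoke(deployment_id, InvokeRequest(prompt=p, max_tokens=100))
          for p in prompts)
    )

    for prompt, result in zip(prompts, results):
        print(f"{prompt} -> {result['choices'][0]['text'][:80]!r}")

asyncio.run(main())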

Invocation Parameters

When invoking a specific deployment, you can configure various parameters:
req = InvokeRequest(
    prompt="Explain the concept of machine learning.",
    max_tokens=500,      # Limits response size
    temperature=0.7,     # Controls randomness (0.0 to 1.0)
    top_p=0.95,         # Nucleus sampling
    stream=False        # Set to True for streaming responses
)

response = client.deployments.invoke("deployment-id", req)
text = response["choices"][0]["text"]
print(text)

Response Streaming

To get real-time responses (streaming):
import asyncio
import json
import sys
from purerouter import AsyncPureRouter
from purerouter.types import InvokeRequest

async def main():
    client = AsyncPureRouter(
        router_key="sk_...",
        base_url="https://api.purerouter-api.com",
        timeout=300.0
    )
    deployment_id = ""

    req = InvokeRequest(
        prompt="Hi.",
        max_tokens=250,
        temperature=0.8,
        stream=True
    )

    final = []
    # Each server-sent event's data is either a JSON chunk with
    # choices[0]["text"] or a raw line such as the [DONE] terminator
    async for ev in client.deployments.astream(deployment_id, req):
        line = (ev.data or "").strip()
        if not line or line == "[DONE]":
            continue

        if line.startswith("data:"):
            line = line[len("data:"):].strip()

        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            sys.stdout.write(line)
            sys.stdout.flush()
            final.append(line)
            continue

        choices = obj.get("choices") or []
        if choices and isinstance(choices[0], dict) and "text" in choices[0]:
            tok = choices[0]["text"] or ""
            sys.stdout.write(tok)
            sys.stdout.flush()
            final.append(tok)

    print()  # end the streamed line with a newline
    return "".join(final)  # full response assembled from the streamed tokens

full_text = asyncio.run(main())

Use Cases for Direct Deployments

  • Consistent responses: always using the same model through its deployment ID gives predictable results, without the variations that can occur with automatic routing.
  • Fine-tuned models: if you've deployed a model fine-tuned for a specific task, you can access it directly by ID to leverage its specialized training.
  • Model comparison: to compare the performance of different models on the same task, invoke each one directly and evaluate the results (see the sketch below).
  • Compliance: where regulatory requirements restrict which models may be used, direct access ensures the right model is always called.
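
A minimal sketch of such a comparison, assuming the synchronous client configured earlier; the two deployment IDs are placeholders:

from purerouter.types import InvokeRequest

prompt = "Summarize the trade-offs of model quantization."
req = InvokeRequest(prompt=prompt, max_tokens=200, temperature=0.0)  # low temperature for comparability

# Placeholder IDs: substitute the deployments you want to compare
for deployment_id in ["model-a-deployment-id", "model-b-deployment-id"]:
    result = client.deployments.invoke(deployment_id, req)
    print(f"--- {deployment_id} ---")
    print(result["choices"][0]["text"])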

Practical Example: Fallback System

from purerouter.types import InvokeRequest, InferRequest

# Assumes `client` is the PureRouter client configured earlier
def process_query_with_fallback(query):
    # Try first with the preferred model
    try:
        response = client.deployments.invoke(
            "primary-model-id",
            InvokeRequest(
                prompt=query,
                temperature=0.7
            )
        )
        return response["choices"][0]["text"]
    except Exception as e:
        print(f"Error in primary model: {e}")

        # Fallback to an alternative model
        try:
            response = client.deployments.invoke(
                "fallback-model-id",
                InvokeRequest(
                    prompt=query,
                    temperature=0.7
                )
            )
            return response["choices"][0]["text"]
        except Exception as e:
            print(f"Error in fallback model: {e}")

            # Last resort: use automatic routing
            response = client.router.infer(InferRequest(
                prompt=query,
                profile="balanced"
            ))
            return response.output_text
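
Calling the helper is then a single line (the query is illustrative):

answer = process_query_with_fallback("Explain what a deployment ID is.")
print(answer)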

Deployment ID

After you create a deployment, you will receive a unique ID that identifies your specific model. This ID is what you use to communicate directly with the deployment through the API. The Deployment ID is essential for:
  • Making direct calls to a specific model
  • Ensuring consistent responses
  • Accessing specialized or fine-tuned models
  • Implementing fallback systems

Next Steps