Base URL
All API requests should be made to the Plexor Labs API server:
https://api.plexor.dev
Authentication
Plexor Labs accepts multiple authentication methods for maximum compatibility with existing SDKs and tooling. All authentication is done via HTTP headers.
Supported Authentication Headers
| Header | Format | Description |
|---|---|---|
| Authorization | Bearer <API_KEY> | Standard Bearer token authentication. Works with both Plexor Labs API keys and JWT tokens. |
| x-api-key | <API_KEY> | Anthropic SDK's default header. Fully supported for compatibility with Claude Code and other Anthropic tools. |
| X-Plexor-Key | <API_KEY> | Plexor Labs' native authentication header. Takes highest precedence when multiple auth headers are present. |
API Key Format
Plexor Labs API keys follow the format plx_<user_id>_<secret>. You can create API keys
from your API Keys dashboard.
plx_user_abc123_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6
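The documented key format can be split client-side for logging or sanity checks. A minimal sketch; it assumes the secret contains no underscores (the example above suggests the user_id may), which the format implies but does not guarantee:

```python
def parse_api_key(key: str):
    """Split a plx_<user_id>_<secret> key into its parts.

    Assumes the trailing secret contains no underscore; the user_id
    portion (e.g. "user_abc123") may contain one.
    """
    if not key.startswith("plx_"):
        return None
    body = key[len("plx_"):]
    if "_" not in body:
        return None
    user_id, secret = body.rsplit("_", 1)
    return {"user_id": user_id, "secret": secret}
```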
Anthropic-Compatible Endpoint
POST /gateway/anthropic/v1/messages
The primary endpoint for Plexor Labs' intelligent LLM routing. This endpoint is fully compatible with Anthropic's Messages API, making it a drop-in replacement for existing Anthropic integrations.
When you send a request to this endpoint, Plexor Labs analyzes the request complexity and routes it to the optimal provider based on your configured mode (eco, balanced, quality, or passthrough).
OpenAI-Compatible Endpoint
POST /gateway/openai/v1/chat/completions
For applications using the OpenAI SDK or expecting OpenAI's response format, Plexor Labs provides a fully compatible Chat Completions endpoint.
This endpoint accepts OpenAI's request format and returns OpenAI-compatible responses, while still providing Plexor Labs' intelligent routing and cost optimization.
Additional OpenAI Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /v1/models | List all available models across all providers |
Compression-Only Endpoint
For users who want to control their own routing but leverage Plexor Labs' prompt compression, we provide a standalone compression endpoint. This lets you get compressed prompts without routing to any LLM provider.
This endpoint uses LLMLingua-2 (Microsoft's perplexity-based compression) to intelligently compress prompts while preserving semantic meaning. A heuristic fallback is used when LLMLingua is unavailable.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| prompt | string | One of prompt/messages | Plain text prompt to compress |
| messages | array | One of prompt/messages | Array of message objects to compress |
| mode | string | No | eco (30%), balanced (50%, default), or quality (70%) |
| target_ratio | float | No | Custom compression ratio (0.1-0.9), overrides mode default |
Example Request
curl -X POST https://api.plexor.dev/v1/compress \
  -H "Content-Type: application/json" \
  -H "X-Plexor-Key: YOUR_API_KEY" \
  -d '{
    "prompt": "Please help me write a function that calculates the factorial of a number. The function should handle edge cases like negative numbers and zero.",
    "mode": "balanced"
  }'
Response Format
{
"original": {
"text": "Please help me write a function that calculates...",
"tokens": 52
},
"compressed": {
"text": "Write factorial function. Handle negative, zero.",
"tokens": 26
},
"compression_ratio": 0.5,
"tokens_saved": 26,
"estimated_savings_usd": 0.000078,
"mode_used": "balanced",
"techniques_applied": ["llmlingua-2"],
"classification": {
"category": "code",
"confidence": 0.90,
"task_type": "code_generation",
"reasoning": "Code generation verb with code artifact detected.",
"method": "fast_path",
"suggested_provider": "deepseek"
}
}
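The numeric fields in this response are internally consistent and can be verified client-side. A sketch using the field names from the example above (the helper name is ours):

```python
def summarize_compression(resp: dict) -> str:
    """Derive token savings from a /v1/compress-style response dict."""
    original = resp["original"]["tokens"]
    compressed = resp["compressed"]["tokens"]
    saved = original - compressed
    ratio = compressed / original  # matches compression_ratio in the response
    return f"saved {saved} tokens ({ratio:.0%} of original kept)"

# Values taken from the example response above
resp = {
    "original": {"tokens": 52},
    "compressed": {"tokens": 26},
}
```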
Routing Headers
Plexor Labs provides special headers to control routing behavior. These are optional and allow you to fine-tune how requests are processed.
X-Plexor-Mode
Controls the cost/quality tradeoff for request routing. If not specified, defaults to balanced.
| Mode | Behavior | Best For |
|---|---|---|
| eco | Routes to the cheapest capable model. Maximizes cost savings. | Simple queries, drafts, brainstorming, high-volume tasks |
| balanced | Intelligently balances cost and quality based on request complexity. Default mode. | General use, most applications |
| quality | Prioritizes response quality. Uses premium models for complex tasks. | Complex reasoning, production code, critical analysis |
| passthrough | Routes directly to the requested model with no optimization. Bypasses intelligent routing. | Testing, benchmarks, specific model requirements |
X-Plexor-Provider / X-Force-Provider
Force requests to be routed to a specific provider, overriding intelligent routing. Both header names are supported for compatibility.
| Provider | Description |
|---|---|
| auto | Let Plexor Labs choose the optimal provider (default) |
| claude or anthropic | Force routing to Anthropic's Claude models |
| openai | Force routing to OpenAI models |
| deepseek | Force routing to DeepSeek models |
| mistral | Force routing to Mistral AI models |
| gemini | Force routing to Google Gemini models |
Additional Routing Headers
| Header | Type | Description |
|---|---|---|
| X-Plexor-Session-Id | String | Session ID for conversation continuity. Format: gw_<ULID> or sess_<ULID> |
| X-Plexor-Skip-Optimization | Boolean | Set to true to skip prompt optimization |
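Putting the authentication and routing headers together, a request's header set might be assembled like this. A sketch; the header names are as documented above, but the helper itself is ours:

```python
def build_headers(api_key, mode="balanced", provider=None, session_id=None):
    """Assemble Plexor Labs auth + routing headers for one request."""
    headers = {
        "Content-Type": "application/json",
        "X-Plexor-Key": api_key,   # native auth header, highest precedence
        "X-Plexor-Mode": mode,     # eco | balanced | quality | passthrough
    }
    if provider:
        headers["X-Plexor-Provider"] = provider  # force a specific provider
    if session_id:
        headers["X-Plexor-Session-Id"] = session_id
    return headers
```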
Request Body
The request body follows Anthropic's Messages API format. Below is a complete reference of all supported fields.
Required Fields
| Field | Type | Description |
|---|---|---|
| model | string | The model to use. Examples: claude-3-5-sonnet-20241022, claude-opus-4-5, gpt-4o |
| messages | array | Array of message objects. First message must have role user. |
| max_tokens | integer | Maximum tokens to generate. Range: 1-200000. Default: 1024 |
Message Object
| Field | Type | Description |
|---|---|---|
| role | string | The role: user, assistant, system, or tool |
| content | string \| array | Message content. Can be a string or array of content blocks (text, image, tool_use, tool_result) |
Optional Fields
| Field | Type | Default | Description |
|---|---|---|---|
| system | string \| array | null | System prompt to guide the model's behavior |
| temperature | float | 1.0 | Sampling temperature (0.0-1.0). Lower values are more deterministic. |
| top_p | float | null | Nucleus sampling parameter (0.0-1.0) |
| top_k | integer | null | Top-k sampling parameter |
| stop_sequences | array | null | Sequences that will stop generation (max 4) |
| stream | boolean | false | Enable streaming responses (SSE format) |
| tools | array | null | List of tools the model can use |
| tool_choice | string \| object | null | How to use tools: auto, any, or specific tool |
Plexor Labs Extension Fields
You can also include Plexor Labs-specific fields in the request body as an alternative to headers:
| Field | Type | Description |
|---|---|---|
| plexor_mode | string | Same as X-Plexor-Mode header |
| plexor_provider | string | Same as X-Plexor-Provider header |
| plexor_session_id | string | Same as X-Plexor-Session-Id header |
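For SDKs that make custom headers awkward, the same options can ride along in the request body. A sketch of a Messages-API payload carrying the extension fields above (the prompt text is just an illustration):

```python
import json

# Messages-API payload with Plexor extension fields in the body
# instead of headers (field names per the table above).
payload = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Summarize this PR."}],
    "plexor_mode": "eco",        # same effect as X-Plexor-Mode
    "plexor_provider": "auto",   # same effect as X-Plexor-Provider
}

body = json.dumps(payload)
```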
Response Format
Responses follow Anthropic's Messages API format with additional Plexor Labs metadata.
Standard Response Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique message ID (format: msg_<random>) |
| type | string | Always "message" |
| role | string | Always "assistant" |
| content | array | Array of content blocks (text and/or tool_use) |
| model | string | The model that was requested |
| stop_reason | string | Why generation stopped: end_turn, max_tokens, stop_sequence, tool_use |
| usage | object | Token usage: { input_tokens, output_tokens } |
Plexor Labs Extension Fields
| Field | Type | Description |
|---|---|---|
| plexor_provider_used | string | The actual provider that handled the request (e.g., deepseek, anthropic) |
| plexor_session_id | string | Session ID for this conversation |
| plexor_cost_usd | float | Actual cost incurred for this request in USD |
| plexor_savings_usd | float | Cost savings compared to using Claude directly |
| classification | object \| null | Task classification with category, confidence, and suggested provider. See Task Classification |
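The cost fields make savings easy to report client-side. A sketch using the extension field names above (the helper and the rounding choice are ours):

```python
def savings_percent(resp: dict) -> float:
    """Percent saved vs. the Claude baseline, from response extension fields."""
    cost = resp.get("plexor_cost_usd", 0.0)
    saved = resp.get("plexor_savings_usd", 0.0)
    baseline = cost + saved          # what the request would have cost
    if baseline == 0:
        return 0.0
    return round(100 * saved / baseline, 2)
```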
Response Headers
Plexor Labs also returns detailed metrics in response headers:
| Header | Description |
|---|---|
| X-Plexor-Request-Id | Unique request identifier for debugging |
| X-Plexor-Session-Id | Session ID for conversation continuity |
| X-Plexor-Mode | The routing mode that was used |
| X-Plexor-Provider-Used | The provider that handled the request |
| X-Plexor-Actual-Cost | Actual cost in USD |
| X-Plexor-Baseline-Cost | What the request would have cost with Claude |
| X-Plexor-Savings | Cost savings in USD |
| X-Plexor-Savings-Percent | Percentage cost savings |
| X-Plexor-Latency-Ms | Total request latency in milliseconds |
Example Response
{
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Hello! I'm doing well, thank you for asking. How can I help you today?"
}
],
"model": "claude-3-5-sonnet-20241022",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 12,
"output_tokens": 24
},
// Plexor Labs extension fields
"plexor_provider_used": "deepseek",
"plexor_session_id": "gw_01HQVX8K9JNMKQP3R4STUVWX",
"plexor_cost_usd": 0.00012,
"plexor_savings_usd": 0.00288
}
Task Classification
Every Plexor Labs API response now includes a task classification object that reveals how Plexor categorized your prompt. This classification drives routing decisions, compression strategy selection, and provider recommendations.
Classification Object
The classification object is included in responses from all prompt-processing endpoints:
/v1/optimize, /v1/complete, /v1/completions, and /v1/compress.
| Field | Type | Description |
|---|---|---|
| category | string | Semantic category: code, reasoning, creative, factual, general, tool |
| confidence | float | Classification confidence from 0.0 to 1.0 |
| task_type | string | Legacy routing type for backward compatibility (e.g., code_generation, analysis) |
| reasoning | string | Human-readable explanation of the classification decision |
| method | string | Which classification path was used: fast_path (heuristic, <1ms), semantic (BERT embeddings, ~10ms), or fallback (keyword-based) |
| suggested_provider | string | Recommended provider for this task category |
Category Reference
| Category | Description | Suggested Provider |
|---|---|---|
| code | Programming, implementation, code generation | deepseek |
| reasoning | Analysis, design, evaluation, comparison | claude |
| creative | Writing, documentation, content generation | claude |
| factual | Information retrieval, definitions, lists | mistral |
| general | Simple questions, clarifications | mistral |
| tool | GitHub operations, filesystem, CI/CD | claude |
Example Classification Response
{
"optimized_messages": [...],
"original_tokens": 12,
"optimized_tokens": 10,
"tokens_saved": 2,
"compression_ratio": 0.833,
"recommended_provider": "deepseek",
"classification": {
"category": "code",
"confidence": 0.90,
"task_type": "code_generation",
"reasoning": "Code generation verb with code artifact detected.",
"method": "fast_path",
"suggested_provider": "deepseek"
}
}
Response Location by Endpoint
| Endpoint | Response Location |
|---|---|
| POST /v1/optimize | response.classification |
| POST /v1/complete | response.plexor.classification |
| POST /v1/completions | response.classification |
| POST /v1/compress | response.classification |
A common pattern is to route code tasks with confidence > 0.75 to DeepSeek for cost savings, while sending reasoning tasks to Claude for quality. When confidence is below 0.55, fall back to Claude for safety.
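That policy can be expressed as a small routing function. The 0.75 and 0.55 thresholds come from the guidance above; everything else is a sketch, not part of the API:

```python
def pick_provider(classification: dict) -> str:
    """Choose a provider from a classification object, per the thresholds above."""
    category = classification["category"]
    confidence = classification["confidence"]
    if confidence < 0.55:
        return "claude"      # low confidence: fall back to Claude for safety
    if category == "code" and confidence > 0.75:
        return "deepseek"    # confident code tasks: route for cost savings
    if category == "reasoning":
        return "claude"      # reasoning tasks: prioritize quality
    # Otherwise defer to Plexor's own recommendation
    return classification.get("suggested_provider", "claude")
```

The returned string can then be sent in the X-Plexor-Provider header or the plexor_provider body field.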
See the full classification API docs for detailed examples.
Streaming
Plexor Labs supports streaming responses via Server-Sent Events (SSE). Enable streaming by setting
stream: true in your request body.
Streaming Request
curl -X POST https://api.plexor.dev/gateway/anthropic/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a haiku about programming"}
    ]
  }'
Stream Events
Streaming responses include the following event types:
| Event Type | Description |
|---|---|
| message_start | Initial message metadata |
| content_block_start | Start of a content block |
| content_block_delta | Incremental text content |
| content_block_stop | End of a content block |
| message_delta | Final usage information |
| message_stop | End of message |
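A raw SSE stream can be consumed without an SDK by splitting on blank lines and reading the event/data fields. A minimal parser sketch; the event names match the table above, but the wire-level details (one data line per event, JSON payloads) are assumptions:

```python
import json

def parse_sse_events(raw: str):
    """Parse SSE text into (event_type, data_dict) pairs."""
    events = []
    for block in raw.strip().split("\n\n"):   # events are blank-line separated
        event_type, data = None, None
        for line in block.splitlines():
            if line.startswith("event: "):
                event_type = line[len("event: "):]
            elif line.startswith("data: "):
                data = json.loads(line[len("data: "):])
        if event_type:
            events.append((event_type, data))
    return events

# Hypothetical two-event stream for illustration
raw = (
    "event: content_block_delta\n"
    'data: {"type": "content_block_delta", "delta": {"text": "Hi"}}\n'
    "\n"
    "event: message_stop\n"
    'data: {"type": "message_stop"}\n'
)
```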
Error Handling
Plexor Labs uses standard HTTP status codes and returns detailed error information in a consistent format.
HTTP Status Codes
| Code | Meaning | Description |
|---|---|---|
| 200 | OK | Request succeeded |
| 400 | Bad Request | Invalid request parameters or body format |
| 401 | Unauthorized | Missing or invalid API key |
| 403 | Forbidden | API key valid but lacks required permissions |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Server error - retry with exponential backoff |
| 502 | Bad Gateway | Upstream provider error |
| 503 | Service Unavailable | Service temporarily unavailable |
Error Response Format
{
"type": "error",
"error": {
"type": "invalid_request_error",
"message": "Invalid model name: 'gpt-5-ultra'. See supported models at https://docs.plexor.ai/models"
}
}
Error Types
| Type | Description |
|---|---|
| authentication_error | Invalid or missing API key |
| invalid_request_error | Malformed request or invalid parameters |
| rate_limit_error | Too many requests |
| api_error | Internal server error |
| overloaded_error | Service temporarily overloaded |
Rate Limiting
Plexor Labs implements rate limiting to ensure fair usage and service stability. Rate limits are applied per API key.
Rate Limit Headers
Rate limit information is returned in response headers:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed per time window |
| X-RateLimit-Remaining | Remaining requests in current window |
| X-RateLimit-Reset | Unix timestamp when the rate limit resets |
| Retry-After | Seconds to wait before retrying (only on 429) |
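The 429/5xx guidance above can be combined into a small retry helper that honors Retry-After when present and otherwise backs off exponentially. A sketch; the send callable, attempt count, and base delay are our choices, not part of the API:

```python
import time

RETRYABLE = {429, 500, 502, 503}  # statuses worth retrying, per the table above

def request_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() -> (status, headers, body) until a non-retryable status.

    Honors the server's Retry-After header when present; otherwise
    sleeps base_delay * 2**attempt seconds between tries.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt == max_attempts - 1:
            break
        delay = float(headers.get("Retry-After", base_delay * 2 ** attempt))
        sleep(delay)
    return status, body
```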
Code Examples
Complete examples for making API requests in various languages.
cURL
curl -X POST https://api.plexor.dev/gateway/anthropic/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Plexor-Mode: balanced" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": "Explain the concept of recursion in programming."
      }
    ],
    "system": "You are a helpful programming tutor. Explain concepts clearly with examples."
  }'
Python
import requests

# Configuration
API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.plexor.dev"

# Make the request
response = requests.post(
    f"{BASE_URL}/gateway/anthropic/v1/messages",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
        "X-Plexor-Mode": "balanced",
    },
    json={
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }
)

# Handle the response
if response.status_code == 200:
    data = response.json()
    print(f"Response: {data['content'][0]['text']}")
    print(f"Provider used: {data.get('plexor_provider_used', 'unknown')}")
    print(f"Cost: ${data.get('plexor_cost_usd', 0):.6f}")
else:
    print(f"Error: {response.status_code} - {response.text}")
Python with Anthropic SDK
import anthropic

# Point the Anthropic SDK at Plexor Labs
client = anthropic.Anthropic(
    api_key="YOUR_PLEXOR_API_KEY",
    base_url="https://api.plexor.dev/gateway/anthropic"
)

# Use the SDK normally - requests are routed through Plexor Labs
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ]
)

print(message.content[0].text)
Node.js
const API_KEY = 'YOUR_API_KEY';
const BASE_URL = 'https://api.plexor.dev';

async function chat(message) {
  const response = await fetch(
    `${BASE_URL}/gateway/anthropic/v1/messages`,
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${API_KEY}`,
        'X-Plexor-Mode': 'balanced',
      },
      body: JSON.stringify({
        model: 'claude-3-5-sonnet-20241022',
        max_tokens: 1024,
        messages: [
          { role: 'user', content: message }
        ]
      })
    }
  );

  if (!response.ok) {
    throw new Error(`HTTP error: ${response.status}`);
  }

  const data = await response.json();
  console.log('Response:', data.content[0].text);
  console.log('Provider:', data.plexor_provider_used);
  console.log('Cost: $' + (data.plexor_cost_usd || 0).toFixed(6));
  return data;
}

// Example usage
chat('Explain JavaScript closures in simple terms.')
  .catch(console.error);
Node.js with OpenAI SDK
import OpenAI from 'openai';

// Point the OpenAI SDK at Plexor Labs' OpenAI-compatible endpoint
const client = new OpenAI({
  apiKey: 'YOUR_PLEXOR_API_KEY',
  baseURL: 'https://api.plexor.dev/gateway/openai/v1',
});

async function main() {
  const completion = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'user', content: 'Write a haiku about coding.' }
    ],
  });
  console.log(completion.choices[0].message.content);
}

main();
Streaming Example (Python)
import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_PLEXOR_API_KEY",
    base_url="https://api.plexor.dev/gateway/anthropic"
)

# Stream the response
with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a short story about a robot learning to paint."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print()  # Final newline
To migrate an existing Anthropic SDK integration, the only change needed is the base_url pointing at Plexor Labs; all SDK features, including streaming, tool use, and vision, work automatically.