
HTTP API Reference

Complete reference for the Plexor Labs REST API. Drop-in compatible with both Anthropic's Messages API and OpenAI's Chat Completions API, with intelligent routing and cost optimization features.

Base URL

All API requests should be made to the Plexor Labs API server:

Base URL
https://api.plexor.dev

Authentication

Plexor Labs accepts multiple authentication methods for maximum compatibility with existing SDKs and tooling. All authentication is done via HTTP headers.

Supported Authentication Headers

| Header | Format | Description |
| --- | --- | --- |
| Authorization | Bearer <API_KEY> | Standard Bearer token authentication. Works with both Plexor Labs API keys and JWT tokens. |
| x-api-key | <API_KEY> | Anthropic SDK's default header. Fully supported for compatibility with Claude Code and other Anthropic tools. |
| X-Plexor-Key | <API_KEY> | Plexor Labs' native authentication header. Takes highest precedence when multiple auth headers are present. |

API Key Format

Plexor Labs API keys follow the format plx_<user_id>_<secret>. You can create API keys from your API Keys dashboard.

Example API Key
plx_user_abc123_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6
Keep Your API Key Secret
Never expose your API key in client-side code, public repositories, or logs. Treat it like a password.

Anthropic-Compatible Endpoint

The primary endpoint for Plexor Labs' intelligent LLM routing. This endpoint is fully compatible with Anthropic's Messages API, making it a drop-in replacement for existing Anthropic integrations.

POST /gateway/anthropic/v1/messages

When you send a request to this endpoint, Plexor Labs analyzes the request complexity and routes it to the optimal provider based on your configured mode (eco, balanced, quality, or passthrough).

OpenAI-Compatible Endpoint

For applications using the OpenAI SDK or expecting OpenAI's response format, Plexor Labs provides a fully compatible Chat Completions endpoint.

POST /gateway/openai/v1/chat/completions

This endpoint accepts OpenAI's request format and returns OpenAI-compatible responses, while still providing Plexor Labs' intelligent routing and cost optimization.

Additional OpenAI Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /v1/models | List all available models across all providers |
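As a sketch, the model listing can be fetched with Python's requests library. The response envelope shown in the comment (an OpenAI-style list wrapper with a data array) is an assumption, not confirmed by this reference:

```python
import requests

BASE_URL = "https://api.plexor.dev"

def list_models(api_key: str, base_url: str = BASE_URL) -> list:
    """Fetch the models available across all providers via GET /v1/models."""
    resp = requests.get(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    # Assumed OpenAI-style envelope: {"object": "list", "data": [...]}
    return resp.json().get("data", [])
```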

Compression-Only Endpoint

For users who want to control their own routing but leverage Plexor Labs' prompt compression, we provide a standalone compression endpoint. This lets you get compressed prompts without routing to any LLM provider.

POST /v1/compress

This endpoint uses LLMLingua-2 (Microsoft's perplexity-based compression) to intelligently compress prompts while preserving semantic meaning. A heuristic fallback is used when LLMLingua is unavailable.

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | One of prompt/messages | Plain text prompt to compress |
| messages | array | One of prompt/messages | Array of message objects to compress |
| mode | string | No | Compression mode: eco (retain ~30% of tokens), balanced (~50%, default), or quality (~70%) |
| target_ratio | float | No | Custom compression ratio (0.1-0.9); overrides the mode default |

Example Request

cURL - Compress Prompt
curl -X POST https://api.plexor.dev/v1/compress \
  -H "Content-Type: application/json" \
  -H "X-Plexor-Key: YOUR_API_KEY" \
  -d '{
    "prompt": "Please help me write a function that calculates the factorial of a number. The function should handle edge cases like negative numbers and zero.",
    "mode": "balanced"
  }'

Response Format

JSON Response
{
  "original": {
    "text": "Please help me write a function that calculates...",
    "tokens": 52
  },
  "compressed": {
    "text": "Write factorial function. Handle negative, zero.",
    "tokens": 26
  },
  "compression_ratio": 0.5,
  "tokens_saved": 26,
  "estimated_savings_usd": 0.000078,
  "mode_used": "balanced",
  "techniques_applied": ["llmlingua-2"],
  "classification": {
    "category": "code",
    "confidence": 0.90,
    "task_type": "code_generation",
    "reasoning": "Code generation verb with code artifact detected.",
    "method": "fast_path",
    "suggested_provider": "deepseek"
  }
}
Self-Routing Use Case
Use this endpoint when you want to compress prompts but route requests to providers yourself. Perfect for multi-provider strategies, A/B testing, or custom routing logic.
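A minimal sketch of that flow in Python: call /v1/compress, then decide whether the compressed prompt is worth forwarding to your own provider. The helper names and the 0.9 cutoff are illustrative choices, not part of the API:

```python
import requests

def compress(prompt: str, api_key: str, mode: str = "balanced") -> dict:
    """Call the standalone compression endpoint."""
    resp = requests.post(
        "https://api.plexor.dev/v1/compress",
        headers={"X-Plexor-Key": api_key},
        json={"prompt": prompt, "mode": mode},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def pick_prompt(compress_resp: dict, max_ratio: float = 0.9) -> str:
    """Use the compressed text only when it meaningfully shrank the prompt;
    the 0.9 cutoff is an illustrative threshold, not part of the API."""
    if compress_resp["compression_ratio"] <= max_ratio:
        return compress_resp["compressed"]["text"]
    return compress_resp["original"]["text"]
```

The chosen prompt can then be sent to whichever provider your own routing logic selects.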

Routing Headers

Plexor Labs provides special headers to control routing behavior. These are optional and allow you to fine-tune how requests are processed.

X-Plexor-Mode

Controls the cost/quality tradeoff for request routing. If not specified, defaults to balanced.

| Mode | Behavior | Best For |
| --- | --- | --- |
| eco | Routes to the cheapest capable model. Maximizes cost savings. | Simple queries, drafts, brainstorming, high-volume tasks |
| balanced | Intelligently balances cost and quality based on request complexity. Default mode. | General use, most applications |
| quality | Prioritizes response quality. Uses premium models for complex tasks. | Complex reasoning, production code, critical analysis |
| passthrough | Routes directly to the requested model with no optimization. Bypasses intelligent routing. | Testing, benchmarks, specific model requirements |

X-Plexor-Provider / X-Force-Provider

Force requests to be routed to a specific provider, overriding intelligent routing. Both header names are supported for compatibility.

| Provider | Description |
| --- | --- |
| auto | Let Plexor Labs choose the optimal provider (default) |
| claude or anthropic | Force routing to Anthropic's Claude models |
| openai | Force routing to OpenAI models |
| deepseek | Force routing to DeepSeek models |
| mistral | Force routing to Mistral AI models |
| gemini | Force routing to Google Gemini models |

Additional Routing Headers

| Header | Type | Description |
| --- | --- | --- |
| X-Plexor-Session-Id | String | Session ID for conversation continuity. Format: gw_<ULID> or sess_<ULID> |
| X-Plexor-Skip-Optimization | Boolean | Set to true to skip prompt optimization |

Request Body

The request body follows Anthropic's Messages API format. Below is a complete reference of all supported fields.

Required Fields

| Field | Type | Description |
| --- | --- | --- |
| model | string | The model to use. Examples: claude-3-5-sonnet-20241022, claude-opus-4-5, gpt-4o |
| messages | array | Array of message objects. First message must have role user. |
| max_tokens | integer | Maximum tokens to generate. Range: 1-200000. Default: 1024 |

Message Object

| Field | Type | Description |
| --- | --- | --- |
| role | string | The role: user, assistant, system, or tool |
| content | string \| array | Message content. Can be a string or array of content blocks (text, image, tool_use, tool_result) |

Optional Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| system | string \| array | null | System prompt to guide the model's behavior |
| temperature | float | 1.0 | Sampling temperature (0.0-1.0). Lower values are more deterministic. |
| top_p | float | null | Nucleus sampling parameter (0.0-1.0) |
| top_k | integer | null | Top-k sampling parameter |
| stop_sequences | array | null | Sequences that will stop generation (max 4) |
| stream | boolean | false | Enable streaming responses (SSE format) |
| tools | array | null | List of tools the model can use |
| tool_choice | string \| object | null | How to use tools: auto, any, or specific tool |

Plexor Labs Extension Fields

You can also include Plexor Labs-specific fields in the request body as an alternative to headers:

| Field | Type | Description |
| --- | --- | --- |
| plexor_mode | string | Same as X-Plexor-Mode header |
| plexor_provider | string | Same as X-Plexor-Provider header |
| plexor_session_id | string | Same as X-Plexor-Session-Id header |
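For instance, a request body that carries the routing hints inline rather than in headers might look like this (a sketch; the message content is illustrative):

```python
# Request payload with Plexor extension fields in lieu of X-Plexor-* headers
payload = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Summarize this changelog entry."}
    ],
    # Plexor Labs extension fields
    "plexor_mode": "eco",
    "plexor_provider": "auto",
}
```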

Response Format

Responses follow Anthropic's Messages API format with additional Plexor Labs metadata.

Standard Response Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique message ID (format: msg_<random>) |
| type | string | Always "message" |
| role | string | Always "assistant" |
| content | array | Array of content blocks (text and/or tool_use) |
| model | string | The model that was requested |
| stop_reason | string | Why generation stopped: end_turn, max_tokens, stop_sequence, tool_use |
| usage | object | Token usage: { input_tokens, output_tokens } |

Plexor Labs Extension Fields

| Field | Type | Description |
| --- | --- | --- |
| plexor_provider_used | string | The actual provider that handled the request (e.g., deepseek, anthropic) |
| plexor_session_id | string | Session ID for this conversation |
| plexor_cost_usd | float | Actual cost incurred for this request in USD |
| plexor_savings_usd | float | Cost savings compared to using Claude directly |
| classification | object \| null | Task classification with category, confidence, and suggested provider. See Task Classification |

Response Headers

Plexor Labs also returns detailed metrics in response headers:

| Header | Description |
| --- | --- |
| X-Plexor-Request-Id | Unique request identifier for debugging |
| X-Plexor-Session-Id | Session ID for conversation continuity |
| X-Plexor-Mode | The routing mode that was used |
| X-Plexor-Provider-Used | The provider that handled the request |
| X-Plexor-Actual-Cost | Actual cost in USD |
| X-Plexor-Baseline-Cost | What the request would have cost with Claude |
| X-Plexor-Savings | Cost savings in USD |
| X-Plexor-Savings-Percent | Percentage cost savings |
| X-Plexor-Latency-Ms | Total request latency in milliseconds |
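These metrics can be collected into one place with a small helper (a sketch; the function name and the numeric defaults for missing headers are illustrative):

```python
def cost_summary(headers) -> dict:
    """Summarize Plexor cost and latency metrics from response headers."""
    return {
        "request_id": headers.get("X-Plexor-Request-Id"),
        "provider": headers.get("X-Plexor-Provider-Used"),
        "mode": headers.get("X-Plexor-Mode"),
        "cost_usd": float(headers.get("X-Plexor-Actual-Cost", 0.0)),
        "savings_usd": float(headers.get("X-Plexor-Savings", 0.0)),
        "latency_ms": int(headers.get("X-Plexor-Latency-Ms", 0)),
    }
```

With requests, pass response.headers directly; it behaves like a case-insensitive dict.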

Example Response

JSON Response
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! I'm doing well, thank you for asking. How can I help you today?"
    }
  ],
  "model": "claude-3-5-sonnet-20241022",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 12,
    "output_tokens": 24
  },
  // Plexor Labs extension fields
  "plexor_provider_used": "deepseek",
  "plexor_session_id": "gw_01HQVX8K9JNMKQP3R4STUVWX",
  "plexor_cost_usd": 0.00012,
  "plexor_savings_usd": 0.00288
}

Task Classification

Responses from Plexor Labs' prompt-processing endpoints include a task classification object that shows how Plexor categorized your prompt. This classification drives routing decisions, compression strategy selection, and provider recommendations.

Classification Object

The classification object is included in responses from all prompt-processing endpoints: /v1/optimize, /v1/complete, /v1/completions, and /v1/compress.

| Field | Type | Description |
| --- | --- | --- |
| category | string | Semantic category: code, reasoning, creative, factual, general, tool |
| confidence | float | Classification confidence from 0.0 to 1.0 |
| task_type | string | Legacy routing type for backward compatibility (e.g., code_generation, analysis) |
| reasoning | string | Human-readable explanation of the classification decision |
| method | string | Which classification path was used: fast_path (heuristic, <1ms), semantic (BERT embeddings, ~10ms), or fallback (keyword-based) |
| suggested_provider | string | Recommended provider for this task category |

Category Reference

| Category | Description | Suggested Provider |
| --- | --- | --- |
| code | Programming, implementation, code generation | deepseek |
| reasoning | Analysis, design, evaluation, comparison | claude |
| creative | Writing, documentation, content generation | claude |
| factual | Information retrieval, definitions, lists | mistral |
| general | Simple questions, clarifications | mistral |
| tool | GitHub operations, filesystem, CI/CD | claude |

Example Classification Response

JSON - Classification in /v1/optimize
{
  "optimized_messages": [...],
  "original_tokens": 12,
  "optimized_tokens": 10,
  "tokens_saved": 2,
  "compression_ratio": 0.833,
  "recommended_provider": "deepseek",
  "classification": {
    "category": "code",
    "confidence": 0.90,
    "task_type": "code_generation",
    "reasoning": "Code generation verb with code artifact detected.",
    "method": "fast_path",
    "suggested_provider": "deepseek"
  }
}

Response Location by Endpoint

| Endpoint | Response Location |
| --- | --- |
| POST /v1/optimize | response.classification |
| POST /v1/complete | response.plexor.classification |
| POST /v1/completions | response.classification |
| POST /v1/compress | response.classification |
Using Classification for Routing
Use the classification to build task-aware pipelines. For example, route code tasks with confidence > 0.75 to DeepSeek for cost savings, while sending reasoning tasks to Claude for quality. When confidence is below 0.55, fall back to Claude for safety. See the full classification API docs for detailed examples.
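That guidance can be sketched as a small routing function. The cutoffs mirror the thresholds above; the function itself is illustrative, not part of any SDK:

```python
def choose_provider(classification: dict) -> str:
    """Pick a downstream provider from a Plexor classification object."""
    category = classification["category"]
    confidence = classification["confidence"]
    if confidence < 0.55:
        return "claude"        # low confidence: fall back to quality
    if category == "code" and confidence > 0.75:
        return "deepseek"      # confident code task: optimize for cost
    if category in ("reasoning", "creative", "tool"):
        return "claude"
    # factual / general (and mid-confidence code): follow the API's suggestion
    return classification.get("suggested_provider", "mistral")
```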

Streaming

Plexor Labs supports streaming responses via Server-Sent Events (SSE). Enable streaming by setting stream: true in your request body.

Streaming Request

cURL with Streaming
curl -X POST https://api.plexor.dev/gateway/anthropic/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a haiku about programming"}
    ]
  }'

Stream Events

Streaming responses include the following event types:

| Event Type | Description |
| --- | --- |
| message_start | Initial message metadata |
| content_block_start | Start of a content block |
| content_block_delta | Incremental text content |
| content_block_stop | End of a content block |
| message_delta | Final usage information |
| message_stop | End of message |
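Once the SSE events are parsed into JSON objects, the full response text can be assembled from the deltas. This sketch assumes the Anthropic-style text_delta shape for content_block_delta payloads:

```python
def collect_text(events) -> str:
    """Concatenate the text carried by content_block_delta events."""
    parts = []
    for event in events:
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
    return "".join(parts)
```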

Error Handling

Plexor Labs uses standard HTTP status codes and returns detailed error information in a consistent format.

HTTP Status Codes

| Code | Meaning | Description |
| --- | --- | --- |
| 200 | OK | Request succeeded |
| 400 | Bad Request | Invalid request parameters or body format |
| 401 | Unauthorized | Missing or invalid API key |
| 403 | Forbidden | API key valid but lacks required permissions |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Server error - retry with exponential backoff |
| 502 | Bad Gateway | Upstream provider error |
| 503 | Service Unavailable | Service temporarily unavailable |

Error Response Format

Error Response
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "Invalid model name: 'gpt-5-ultra'. See supported models at https://docs.plexor.ai/models"
  }
}

Error Types

| Type | Description |
| --- | --- |
| authentication_error | Invalid or missing API key |
| invalid_request_error | Malformed request or invalid parameters |
| rate_limit_error | Too many requests |
| api_error | Internal server error |
| overloaded_error | Service temporarily overloaded |

Rate Limiting

Plexor Labs implements rate limiting to ensure fair usage and service stability. Rate limits are applied per API key.

Rate Limit Headers

Rate limit information is returned in response headers:

| Header | Description |
| --- | --- |
| X-RateLimit-Limit | Maximum requests allowed per time window |
| X-RateLimit-Remaining | Remaining requests in current window |
| X-RateLimit-Reset | Unix timestamp when the rate limit resets |
| Retry-After | Seconds to wait before retrying (only on 429) |
Handling Rate Limits
Implement exponential backoff when you receive a 429 response. Start with a 1 second delay and double it on each retry, up to a maximum of 60 seconds.
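That retry policy can be sketched with requests as follows (the function name and the retry budget are illustrative; 429 and transient 5xx codes trigger a retry, honoring Retry-After when the server sends it):

```python
import time
import requests

def post_with_backoff(url: str, *, headers: dict, json: dict,
                      max_retries: int = 6):
    """POST with exponential backoff on 429 and transient 5xx errors."""
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(url, headers=headers, json=json, timeout=60)
        if resp.status_code not in (429, 500, 502, 503):
            return resp
        # Prefer the server's Retry-After hint when present
        retry_after = resp.headers.get("Retry-After")
        time.sleep(min(float(retry_after) if retry_after else delay, 60.0))
        delay = min(delay * 2, 60.0)  # 1s, 2s, 4s, ... capped at 60s
    return resp
```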

Code Examples

Complete examples for making API requests in various languages.

cURL

cURL
curl -X POST https://api.plexor.dev/gateway/anthropic/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Plexor-Mode: balanced" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": "Explain the concept of recursion in programming."
      }
    ],
    "system": "You are a helpful programming tutor. Explain concepts clearly with examples."
  }'

Python

Python (requests)
import requests

# Configuration
API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.plexor.dev"

# Make the request
response = requests.post(
    f"{BASE_URL}/gateway/anthropic/v1/messages",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
        "X-Plexor-Mode": "balanced",
    },
    json={
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }
)

# Handle the response
if response.status_code == 200:
    data = response.json()
    print(f"Response: {data['content'][0]['text']}")
    print(f"Provider used: {data.get('plexor_provider_used', 'unknown')}")
    print(f"Cost: ${data.get('plexor_cost_usd', 0):.6f}")
else:
    print(f"Error: {response.status_code} - {response.text}")

Python with Anthropic SDK

Python (Anthropic SDK)
import anthropic

# Point the Anthropic SDK at Plexor Labs
client = anthropic.Anthropic(
    api_key="YOUR_PLEXOR_API_KEY",
    base_url="https://api.plexor.dev/gateway/anthropic"
)

# Use the SDK normally - requests are routed through Plexor Labs
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ]
)

print(message.content[0].text)

Node.js

Node.js (fetch)
const API_KEY = 'YOUR_API_KEY';
const BASE_URL = 'https://api.plexor.dev';

async function chat(message) {
  const response = await fetch(
    `${BASE_URL}/gateway/anthropic/v1/messages`,
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${API_KEY}`,
        'X-Plexor-Mode': 'balanced',
      },
      body: JSON.stringify({
        model: 'claude-3-5-sonnet-20241022',
        max_tokens: 1024,
        messages: [
          { role: 'user', content: message }
        ]
      })
    }
  );

  if (!response.ok) {
    throw new Error(`HTTP error: ${response.status}`);
  }

  const data = await response.json();

  console.log('Response:', data.content[0].text);
  console.log('Provider:', data.plexor_provider_used);
  console.log('Cost: $' + (data.plexor_cost_usd || 0).toFixed(6));

  return data;
}

// Example usage
chat('Explain JavaScript closures in simple terms.')
  .catch(console.error);

Node.js with OpenAI SDK

Node.js (OpenAI SDK)
import OpenAI from 'openai';

// Point the OpenAI SDK at Plexor Labs' OpenAI-compatible endpoint
const client = new OpenAI({
  apiKey: 'YOUR_PLEXOR_API_KEY',
  baseURL: 'https://api.plexor.dev/gateway/openai/v1',
});

async function main() {
  const completion = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'user', content: 'Write a haiku about coding.' }
    ],
  });

  console.log(completion.choices[0].message.content);
}

main();

Streaming Example (Python)

Python (Streaming)
import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_PLEXOR_API_KEY",
    base_url="https://api.plexor.dev/gateway/anthropic"
)

# Stream the response
with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a short story about a robot learning to paint."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print()  # Final newline
SDK Compatibility
Plexor Labs is designed to work seamlessly with the official Anthropic and OpenAI SDKs. Simply change the base_url to point to Plexor Labs, and all SDK features including streaming, tool use, and vision work automatically.