Quick Start

Get started with the Hontoni API in under 2 minutes. Hontoni provides a unified API gateway for Claude Sonnet 4.x, Opus 4.x, GPT-5.x, and Gemini models through OpenAI-compatible and Anthropic-compatible endpoints.

1. Get your API key

Register an account and create an API key from the dashboard, or use the API directly:

```bash
# Register
curl -X POST https://api.hontoni.vn/api/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "you@example.com", "password": "your-password", "name": "Your Name"}'

# Create an API key
curl -X POST https://api.hontoni.vn/api/keys \
  -H "Authorization: Bearer <access_token>" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-key"}'
```

2. Make your first request

```bash
curl https://api.hontoni.vn/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'
```

That's it!

You're now using Claude Sonnet 4 through the OpenAI-compatible API. Switch models by changing the model field.

Authentication

All API requests require authentication via an API key. You can pass it in two ways:

```bash
# Option 1: Authorization header (recommended)
Authorization: Bearer sk-your-api-key

# Option 2: X-API-Key header
X-API-Key: sk-your-api-key
```

API Key Format

API keys are prefixed with sk-gw- followed by a unique identifier. Keep your keys secure and never expose them in client-side code.
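
As a sketch, a small TypeScript helper (the function name is illustrative, not part of any SDK) that builds either header style:

```typescript
// Build request headers for either authentication style.
// `useXApiKey` switches to the X-API-Key header; Bearer is the recommended default.
function authHeaders(apiKey: string, useXApiKey = false): Record<string, string> {
  const headers: Record<string, string> = { 'Content-Type': 'application/json' };
  if (useXApiKey) {
    headers['X-API-Key'] = apiKey;
  } else {
    headers['Authorization'] = `Bearer ${apiKey}`;
  }
  return headers;
}
```

Pass the result as the `headers` option of `fetch` when calling any endpoint.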

Base URL

All AI API endpoints are relative to your deployment base URL:

```text
${apiUrl}/v1
```

| Endpoint | Description |
|----------|-------------|
| /v1/chat/completions | OpenAI Chat Completions API |
| /v1/messages | Anthropic Messages API |
| /v1/responses | OpenAI Responses API |
| /v1/models | List available models |
| /api/keys | Manage API keys |
| /api/billing | Balance & billing |
| /api/subscription | Subscription plans |
| /api/addons | Rate limit add-ons |

Tool Integrations

Hontoni works as a drop-in replacement for OpenAI and Anthropic APIs. Configure your favorite AI coding tool to use Hontoni as the backend.

Claude Code

Configure Claude Code to use Hontoni as the API provider:

```bash
# Set environment variables
export ANTHROPIC_BASE_URL=https://api.hontoni.vn/v1
export ANTHROPIC_API_KEY=sk-your-api-key
```

Or configure in ~/.claude/config.json:

```json
{
  "apiBaseUrl": "https://api.hontoni.vn/v1",
  "apiKey": "sk-your-api-key"
}
```

Cursor

In Cursor Settings > Models > OpenAI API Key:

```text
API Key: sk-your-api-key
Base URL: https://api.hontoni.vn/v1
Model: claude-sonnet-4
```

Cursor Tip

Enable "Override OpenAI Base URL" in settings, then enter the Hontoni base URL. All OpenAI-compatible models will work automatically.

Windsurf

Configure in Windsurf settings:

```json
{
  "ai.provider": "openai",
  "ai.openai.baseUrl": "https://api.hontoni.vn/v1",
  "ai.openai.apiKey": "sk-your-api-key",
  "ai.openai.model": "claude-sonnet-4"
}
```

Continue

Add to your ~/.continue/config.json:

```json
{
  "models": [
    {
      "title": "Hontoni - Claude Sonnet 4",
      "provider": "openai",
      "model": "claude-sonnet-4",
      "apiBase": "https://api.hontoni.vn/v1",
      "apiKey": "sk-your-api-key"
    }
  ]
}
```

Cline

In VS Code, open Cline settings and configure the API provider:

```text
Provider: OpenAI Compatible
Base URL: https://api.hontoni.vn/v1
API Key: sk-your-api-key
Model: claude-sonnet-4
```

Aider

Set environment variables or use command-line flags:

```bash
# Environment variables
export OPENAI_API_BASE=https://api.hontoni.vn/v1
export OPENAI_API_KEY=sk-your-api-key

# Or use flags
aider --openai-api-base https://api.hontoni.vn/v1 \
      --openai-api-key sk-your-api-key \
      --model claude-sonnet-4
```

OpenCode

Configure in opencode.json:

```json
{
  "provider": {
    "openai": {
      "apiKey": "sk-your-api-key",
      "baseURL": "https://api.hontoni.vn/v1"
    }
  },
  "model": {
    "default": "claude-sonnet-4"
  }
}
```

Chat Completions API

OpenAI-compatible Chat Completions endpoint, a drop-in replacement for OpenAI's POST /v1/chat/completions.

POST /v1/chat/completions

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Model ID (e.g. claude-sonnet-4) |
| messages | array | Yes | Array of message objects |
| stream | boolean | No | Enable SSE streaming (default: false) |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | integer | No | Maximum tokens to generate |
| top_p | number | No | Nucleus sampling parameter |
| tools | array | No | Tool/function definitions |
| tool_choice | string \| object | No | Tool selection behavior |

Message Object

| Field | Type | Description |
|-------|------|-------------|
| role | string | system, user, assistant, or tool |
| content | string \| null | Message content |
| name | string | Optional sender name |
| tool_calls | array | Tool calls (assistant messages) |
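
Since the endpoint is OpenAI-compatible, the tools array should follow OpenAI's function-calling schema. A sketch with a hypothetical get_weather tool (name and parameters are ours, purely for illustration):

```typescript
// A minimal tool definition in OpenAI's function-calling format.
// The tool name and its parameters are hypothetical examples.
const tools = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get the current weather for a city',
      parameters: {
        type: 'object',
        properties: {
          city: { type: 'string', description: 'City name' },
        },
        required: ['city'],
      },
    },
  },
];
```

Send this as the tools field of the request body alongside messages; the model may then respond with tool_calls on an assistant message.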

Example: Non-streaming

```bash
curl -X POST https://api.hontoni.vn/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in 3 sentences."}
    ],
    "max_tokens": 200
  }'
```

Example: Streaming

```bash
curl -X POST https://api.hontoni.vn/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Write a haiku about coding"}
    ],
    "stream": true
  }'
```

Response

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "claude-sonnet-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 50,
    "total_tokens": 75
  }
}
```

Messages API

Anthropic-compatible Messages endpoint, a drop-in replacement for Anthropic's POST /v1/messages.

POST /v1/messages

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Model ID (e.g. claude-sonnet-4) |
| messages | array | Yes | Conversation messages |
| max_tokens | integer | Yes | Maximum tokens to generate |
| system | string | No | System prompt |
| stream | boolean | No | Enable SSE streaming |
| temperature | number | No | Sampling temperature |
| tools | array | No | Tool definitions |
| thinking | object | No | Extended thinking config |

Thinking (Extended Reasoning)

For models that support reasoning (Claude Sonnet 4, Claude Opus 4):

```json
{
  "model": "claude-sonnet-4",
  "messages": [{"role": "user", "content": "Solve: x^2 + 5x + 6 = 0"}],
  "max_tokens": 16000,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 10000
  }
}
```

Example Request

```bash
curl -X POST https://api.hontoni.vn/v1/messages \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4",
    "max_tokens": 1024,
    "system": "You are a helpful coding assistant.",
    "messages": [
      {"role": "user", "content": "Write a Python fibonacci function"}
    ]
  }'
```

Response

```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4",
  "content": [
    {
      "type": "text",
      "text": "Here's a Python fibonacci function..."
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 30,
    "output_tokens": 150
  }
}
```

Responses API

OpenAI Responses API (newer format). Supports reasoning effort control and simplified input format.

POST /v1/responses

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Model ID |
| input | string \| array | Yes | Simple string or structured input items |
| instructions | string | No | System instructions |
| stream | boolean | No | Enable SSE streaming |
| max_output_tokens | integer | No | Maximum output tokens |
| temperature | number | No | Sampling temperature |
| reasoning | object | No | Reasoning configuration |

Reasoning Effort

Control how much reasoning the model uses:

```json
{
  "model": "gpt-5.2",
  "input": "What is the meaning of life?",
  "reasoning": {
    "effort": "high",
    "summary": "auto"
  }
}
```

Effort levels: none, minimal, low, medium, high, xhigh

Streaming Events

When stream: true, the API sends Server-Sent Events:

| Event | Description |
|-------|-------------|
| response.created | Response object created |
| response.in_progress | Generation started |
| response.output_text.delta | Text content chunk |
| response.reasoning_text.delta | Reasoning content chunk |
| response.completed | Generation finished |
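
A minimal TypeScript sketch of consuming these events from a raw SSE body. It assumes each response.output_text.delta event carries a JSON payload with a string delta field, as in OpenAI's Responses API; the function name is ours:

```typescript
// Collect the full output text from a raw SSE stream body.
// Assumes each `response.output_text.delta` event's data payload is JSON
// with a string `delta` field (OpenAI Responses API convention).
function collectOutputText(sseBody: string): string {
  let text = '';
  let currentEvent = '';
  for (const line of sseBody.split('\n')) {
    if (line.startsWith('event:')) {
      currentEvent = line.slice('event:'.length).trim();
    } else if (line.startsWith('data:') && currentEvent === 'response.output_text.delta') {
      const payload = JSON.parse(line.slice('data:'.length).trim());
      text += payload.delta;
    }
  }
  return text;
}
```

In a real client you would read the response body incrementally rather than buffering the whole stream, but the event handling is the same.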

Models & Pricing

All available models with capabilities and per-token pricing (per 1M tokens in USD).

GET /v1/models

Anthropic Models

| Model | Context | Input | Output | Cache Read | Cache Write | Reasoning | Capabilities |
|-------|---------|-------|--------|------------|-------------|-----------|--------------|
| claude-sonnet-4 | 216K | $3.00 | $15.00 | $0.30 | $3.75 | $15.00 | reasoning, tools, vision, code, chat |
| claude-sonnet-4.5 | 200K | $3.00 | $15.00 | $0.30 | $3.75 | $15.00 | reasoning, tools, vision, code, chat |
| claude-sonnet-4.6 | 200K | $3.00 | $15.00 | $0.30 | $3.75 | $15.00 | reasoning, tools, vision, code, chat |
| claude-opus-4.5 | 200K | $5.00 | $25.00 | $0.50 | $6.25 | $25.00 | reasoning, tools, vision, code, chat |
| claude-opus-4.6 | 200K | $5.00 | $25.00 | $0.50 | $6.25 | $25.00 | reasoning, tools, vision, code, chat |
| claude-haiku-4.5 | 200K | $1.00 | $5.00 | $0.10 | $1.25 | $5.00 | reasoning, tools, vision, code, chat |

OpenAI Models

| Model | Context | Input | Output | Reasoning | Capabilities |
|-------|---------|-------|--------|-----------|--------------|
| gpt-4o | 128K | $2.50 | $10.00 | - | tools, vision, code, chat |
| gpt-4o-mini | 128K | $0.15 | $0.60 | - | tools, vision, code, chat |
| gpt-4.1 | 128K | $2.00 | $8.00 | - | tools, vision, code, chat |
| gpt-5.1 | 264K | $5.00 | $15.00 | $15.00 | reasoning, tools, vision, code, chat |
| gpt-5.2 | 264K | $5.00 | $20.00 | $20.00 | reasoning, tools, vision, code, chat |
| gpt-5.2-codex | 400K | $5.00 | $20.00 | $20.00 | reasoning, tools, vision, code, chat |
| gpt-5.3-codex | 400K | $5.00 | $20.00 | $20.00 | reasoning, tools, vision, code, chat |
| gpt-5.4 | 400K | $3.00 | $12.00 | $12.00 | reasoning, tools, vision, code, chat |
| gpt-5.4-mini | 400K | $0.40 | $1.60 | $1.60 | reasoning, tools, vision, code, chat |
| gpt-5-mini | 264K | $1.00 | $4.00 | $4.00 | reasoning, tools, vision, code, chat |

Google Models

| Model | Context | Input | Output | Reasoning | Capabilities |
|-------|---------|-------|--------|-----------|--------------|
| gemini-2.5-pro | 128K | $1.25 | $10.00 | $10.00 | reasoning, tools, vision, code, chat |
| gemini-3-flash-preview | 128K | $0.15 | $0.60 | $0.60 | reasoning, tools, vision, code, chat |
| gemini-3.1-pro-preview | 128K | $1.25 | $5.00 | $5.00 | reasoning, tools, vision, code, chat |

Live Model Data

For the most up-to-date model list and capabilities, use the GET /api/models endpoint.

Model Variants

Use model variant suffixes to control reasoning effort and context size without extra parameters.

Reasoning Effort Suffixes

Append :high or :max to any thinking-capable model to set reasoning effort automatically:

| Suffix | Budget Models (Claude/Gemini) | Effort Models (o-series) |
|--------|-------------------------------|--------------------------|
| :high | thinking_budget = 32,768 tokens | reasoning_effort = "high" |
| :max | thinking_budget = 65,536 tokens | reasoning_effort = "xhigh" |

Extended Context Suffix

Append -1m to request an extended 1M token context window:

| Variant | Effect |
|---------|--------|
| claude-sonnet-4-1m | Extended context via pay-as-you-go |
| gpt-4o-1m | Extended context via pay-as-you-go |

Combining Suffixes

Reasoning and context suffixes can be combined:

```bash
curl -X POST ${baseUrl}/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -d '{"model": "claude-sonnet-4:high-1m", "messages": [{"role": "user", "content": "Analyze this large codebase..."}]}'
```

Note: Variant suffixes are convenience shortcuts. You can still use the explicit thinking_budget or reasoning_effort parameters; explicit parameters take priority over variant suffixes.
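
To make the suffix grammar concrete, here is a hypothetical client-side parser in TypeScript. It is purely illustrative (the gateway does the real parsing server-side, and this helper is not part of any SDK):

```typescript
// Split a model string such as "claude-sonnet-4:high-1m" into base model,
// optional reasoning-effort suffix, and the extended-context flag.
function parseModelVariant(model: string): {
  base: string;
  effort?: string;
  extendedContext: boolean;
} {
  let rest = model;
  const extendedContext = rest.endsWith('-1m');
  if (extendedContext) rest = rest.slice(0, -'-1m'.length);
  const colon = rest.indexOf(':');
  if (colon === -1) return { base: rest, extendedContext };
  return { base: rest.slice(0, colon), effort: rest.slice(colon + 1), extendedContext };
}
```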

Rate Limits

Rate limits protect the API from abuse and ensure fair usage. Limits apply per API key.

Plan-Based Limits

| Plan | Price | RPM | Per 5h | Daily | Weekly | Monthly | Concurrent | Thinking Budget |
|------|-------|-----|--------|-------|--------|---------|------------|-----------------|
| Basic | $6/mo | 8 | 20 | 100 | 120 | 480 | 1 | 8,000 |
| Standard | $13/mo | 12 | 45 | 250 | 275 | 1,100 | 2 | 16,000 |
| Premium | $39/mo | 20 | 200 | 1,000 | 1,250 | 5,000 | 2 | 32,000 |
| Ultimate | $79/mo | 40 | 800 | 3,000 | 5,000 | 20,000 | 4 | 64,000 |

Rate Limit Headers

Every API response includes rate limit headers:

```text
X-RateLimit-Limit-RPM: 12
X-RateLimit-Remaining-RPM: 11
X-RateLimit-Limit-Daily: 250
X-RateLimit-Remaining-Daily: 248
X-RateLimit-Limit-5h: 45
X-RateLimit-Remaining-5h: 43
X-RateLimit-Limit-Weekly: 275
X-RateLimit-Remaining-Weekly: 270
X-RateLimit-Limit-Monthly: 1100
X-RateLimit-Remaining-Monthly: 1095
X-RateLimit-Concurrent-Limit: 2
X-RateLimit-Concurrent-Active: 1
```
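
A client can watch these headers to throttle itself before hitting a 429. A small TypeScript sketch (the helper name is ours; header names are as listed above):

```typescript
// Read the remaining per-minute and daily quota from a response's headers.
// Returns NaN for any header the response did not include.
function remainingQuota(headers: Headers): { rpm: number; daily: number } {
  return {
    rpm: Number(headers.get('X-RateLimit-Remaining-RPM') ?? NaN),
    daily: Number(headers.get('X-RateLimit-Remaining-Daily') ?? NaN),
  };
}
```

For example, pausing new requests when rpm drops to 0 avoids burning retries on guaranteed 429 responses.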

Rate Limit Add-ons

Boost your rate limits with add-ons:

| Add-on | Price | Effect |
|--------|-------|--------|
| Rate Limit 2x | $4.99/mo | Double all rate limits |
| Rate Limit 5x | $9.99/mo | 5x all rate limits |
| Rate Limit 10x | $19.99/mo | 10x all rate limits |

Billing

Hontoni uses a prepaid balance system. Top up your account and pay per token used.

Cost Calculation

Cost is calculated per request based on token usage and model pricing:

```text
cost = (input_tokens × input_price / 1,000,000)
     + (output_tokens × output_price / 1,000,000)
     + (reasoning_tokens × reasoning_price / 1,000,000)
```

Example

Using claude-sonnet-4 with 1,000 input tokens and 500 output tokens:

```text
Input cost:  1,000 × $3.00 / 1,000,000  = $0.003
Output cost:   500 × $15.00 / 1,000,000 = $0.0075
Total cost:  $0.0105
```
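
The same formula as a TypeScript function, for estimating costs client-side (prices are per 1M tokens in USD, as in the pricing tables; the function name is ours):

```typescript
// Per-request cost from token counts and per-1M-token prices (USD).
// Reasoning tokens and their price default to zero when absent.
function requestCost(
  tokens: { input: number; output: number; reasoning?: number },
  price: { input: number; output: number; reasoning?: number },
): number {
  const M = 1_000_000;
  return (
    (tokens.input * price.input) / M +
    (tokens.output * price.output) / M +
    ((tokens.reasoning ?? 0) * (price.reasoning ?? 0)) / M
  );
}
```

Plugging in the claude-sonnet-4 example above (1,000 input and 500 output tokens at $3.00/$15.00) reproduces the $0.0105 total.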

Billing Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/billing/balance | Check current balance |
| POST | /api/billing/topup | Add funds (direct credit) |
| POST | /api/billing/topup/checkout | Create Stripe Checkout session |
| POST | /api/billing/topup/intent | Create Stripe PaymentIntent |
| GET | /api/billing/transactions | Transaction history |
| GET | /api/billing/invoices | List Stripe invoices |
| GET | /api/billing/invoices/:id/pdf | Get invoice PDF URL |
| GET | /api/billing/referral | Referral info & stats |
| POST | /api/billing/referral | Apply referral code |
| POST | /api/billing/promo-code | Validate promo code |

Insufficient Balance

Requests will be rejected with a 402 Payment Required error when your balance is too low. Top up your account to continue using the API.

Error Handling

The API uses standard HTTP status codes and returns structured error responses.

Error Response Format

```json
{
  "error": {
    "message": "Invalid API key provided",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}
```
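
A TypeScript type and guard for this envelope can make error handling explicit. This is a sketch based only on the shape shown above (the names ApiError and isApiError are ours):

```typescript
// Shape of the structured error envelope shown above.
interface ApiError {
  error: { message: string; type: string; code: string };
}

// Narrow an unknown parsed response body to the error envelope.
function isApiError(body: unknown): body is ApiError {
  return (
    typeof body === 'object' &&
    body !== null &&
    typeof (body as ApiError).error === 'object' &&
    (body as ApiError).error !== null &&
    typeof (body as ApiError).error.message === 'string'
  );
}
```

After `const body = await response.json()`, checking `isApiError(body)` lets you branch on `body.error.code` with full type safety.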

Status Codes

| Code | Description | Common Cause |
|------|-------------|--------------|
| 400 | Bad Request | Invalid request body or parameters |
| 401 | Unauthorized | Missing or invalid API key |
| 402 | Payment Required | Insufficient balance |
| 404 | Not Found | Invalid model or endpoint |
| 409 | Conflict | Duplicate resource |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Upstream provider error |

Handling Rate Limits

```typescript
async function callWithRetry(url: string, request: RequestInit, maxRetries = 3): Promise<Response> {
  for (let i = 0; i < maxRetries; i++) {
    const response = await fetch(url, request);

    if (response.status === 429) {
      // Honor Retry-After when present; otherwise back off exponentially.
      const retryAfter = response.headers.get('retry-after');
      const delay = retryAfter ? parseInt(retryAfter, 10) * 1000 : Math.pow(2, i) * 1000;
      await new Promise(r => setTimeout(r, delay));
      continue;
    }

    return response;
  }
  throw new Error('Max retries exceeded');
}
```

FAQ

What models are supported?

Hontoni supports Claude (Sonnet 4, Sonnet 4.5, Sonnet 4.6, Opus 4.5, Opus 4.6, Haiku 4.5), GPT (4o, 4o-mini, 4.1, 5.1, 5.2, 5.4, 5-mini, codex variants), and Gemini (2.5 Pro, 3 Flash, 3.1 Pro). See the Models & Pricing section for the full list.

Is the API compatible with OpenAI SDKs?

Yes. Hontoni provides an OpenAI-compatible API at /v1/chat/completions. You can use the official OpenAI SDK by setting the base URL:

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://api.hontoni.vn/v1',
});

const response = await client.chat.completions.create({
  model: 'claude-sonnet-4',
  messages: [{ role: 'user', content: 'Hello!' }],
});
```

Is the API compatible with Anthropic SDKs?

Yes. Hontoni provides an Anthropic-compatible API at /v1/messages. Use the official Anthropic SDK:

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://api.hontoni.vn/v1',
});

const message = await client.messages.create({
  model: 'claude-sonnet-4',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }],
});
```

How does billing work?

Hontoni uses a prepaid balance system. You top up your account with funds, and each API request deducts costs based on token usage and the model's pricing. See the Billing section for the cost calculation formula.

What happens when I exceed rate limits?

You'll receive a 429 Too Many Requests response with rate limit headers indicating when limits reset. Implement exponential backoff in your client. Consider upgrading your plan or adding rate limit add-ons for higher limits.

Can I use multiple models in the same project?

Absolutely. Simply change the model parameter in each request. Use cheaper models like gpt-4o-mini or claude-haiku-4.5 for simple tasks and powerful models like claude-opus-4.6 for complex reasoning.