Quick Start

Get started with the Hontoni API in under 2 minutes. Hontoni provides a unified API gateway for Claude Sonnet 4.x, Opus 4.x, GPT-5.x, and Gemini models through OpenAI-compatible and Anthropic-compatible endpoints.

1. Get your API key

Register an account and create an API key from the dashboard, or use the API directly:

```bash
# Register
curl -X POST https://api.hontoni.vn/api/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "you@example.com", "password": "your-password", "name": "Your Name"}'

# Create an API key
curl -X POST https://api.hontoni.vn/api/keys \
  -H "Authorization: Bearer <access_token>" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-key"}'
```

2. Make your first request

```bash
curl https://api.hontoni.vn/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'
```

That's it!

You're now using Claude Sonnet 4 through the OpenAI-compatible API. Switch models by changing the model field.

Authentication

All API requests require authentication via an API key. You can pass it in two ways:

```bash
# Option 1: Authorization header (recommended)
Authorization: Bearer sk-your-api-key

# Option 2: X-API-Key header
X-API-Key: sk-your-api-key
```

API Key Format

API keys are prefixed with sk-gw- followed by a unique identifier. Keep your keys secure and never expose them in client-side code.
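
As a sketch, a small TypeScript helper (the function name is illustrative, not part of any SDK) that builds either header style:

```typescript
// Build request headers for either authentication style.
// `useXApiKey` switches to the X-API-Key header; Bearer is the recommended default.
function authHeaders(apiKey: string, useXApiKey = false): Record<string, string> {
  const headers: Record<string, string> = { 'Content-Type': 'application/json' };
  if (useXApiKey) {
    headers['X-API-Key'] = apiKey;
  } else {
    headers['Authorization'] = `Bearer ${apiKey}`;
  }
  return headers;
}
```

Pass the result as the `headers` option of `fetch` when calling any endpoint.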

Base URL

All AI API endpoints are relative to your deployment base URL:

```text
${apiUrl}/v1
```

| Endpoint | Description |
|----------|-------------|
| /v1/chat/completions | OpenAI Chat Completions API |
| /v1/messages | Anthropic Messages API |
| /v1/responses | OpenAI Responses API |
| /v1/models | List available models |
| /api/keys | Manage API keys |
| /api/billing | Balance & billing |
| /api/subscription | Subscription plans |
| /api/addons | Rate limit add-ons |

Tool Integrations

Hontoni works as a drop-in replacement for OpenAI and Anthropic APIs. Configure your favorite AI coding tool to use Hontoni as the backend.

Claude Code

Configure Claude Code to use Hontoni as the API provider:

```bash
# Set environment variables
export ANTHROPIC_BASE_URL=https://api.hontoni.vn/v1
export ANTHROPIC_API_KEY=sk-your-api-key
```

Or configure in ~/.claude/config.json:

```json
{
  "apiBaseUrl": "https://api.hontoni.vn/v1",
  "apiKey": "sk-your-api-key"
}
```

Cursor

In Cursor Settings > Models > OpenAI API Key:

```text
API Key: sk-your-api-key
Base URL: https://api.hontoni.vn/v1
Model: claude-sonnet-4
```

Cursor Tip

Enable "Override OpenAI Base URL" in settings, then enter the Hontoni base URL. All OpenAI-compatible models will work automatically.

Windsurf

Configure in Windsurf settings:

```json
{
  "ai.provider": "openai",
  "ai.openai.baseUrl": "https://api.hontoni.vn/v1",
  "ai.openai.apiKey": "sk-your-api-key",
  "ai.openai.model": "claude-sonnet-4"
}
```

Continue

Add to your ~/.continue/config.json:

```json
{
  "models": [
    {
      "title": "Hontoni - Claude Sonnet 4",
      "provider": "openai",
      "model": "claude-sonnet-4",
      "apiBase": "https://api.hontoni.vn/v1",
      "apiKey": "sk-your-api-key"
    }
  ]
}
```

Cline

In VS Code, open Cline settings and configure the API provider:

```text
Provider: OpenAI Compatible
Base URL: https://api.hontoni.vn/v1
API Key: sk-your-api-key
Model: claude-sonnet-4
```

Aider

Set environment variables or use command-line flags:

```bash
# Environment variables
export OPENAI_API_BASE=https://api.hontoni.vn/v1
export OPENAI_API_KEY=sk-your-api-key

# Or use flags
aider --openai-api-base https://api.hontoni.vn/v1 \
      --openai-api-key sk-your-api-key \
      --model claude-sonnet-4
```

OpenCode

Configure in opencode.json:

```json
{
  "provider": {
    "openai": {
      "apiKey": "sk-your-api-key",
      "baseURL": "https://api.hontoni.vn/v1"
    }
  },
  "model": {
    "default": "claude-sonnet-4"
  }
}
```

Chat Completions API

OpenAI-compatible Chat Completions endpoint, a drop-in replacement for OpenAI's POST /v1/chat/completions.

POST /v1/chat/completions

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Model ID (e.g. claude-sonnet-4) |
| messages | array | Yes | Array of message objects |
| stream | boolean | No | Enable SSE streaming (default: false) |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | integer | No | Maximum tokens to generate |
| top_p | number | No | Nucleus sampling parameter |
| tools | array | No | Tool/function definitions |
| tool_choice | string \| object | No | Tool selection behavior |

Message Object

| Field | Type | Description |
|-------|------|-------------|
| role | string | system, user, assistant, or tool |
| content | string \| null | Message content |
| name | string | Optional sender name |
| tool_calls | array | Tool calls (assistant messages) |
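
Since the endpoint is OpenAI-compatible, the tools array should follow OpenAI's function-calling schema. A sketch with a hypothetical get_weather tool (name and parameters are ours, purely for illustration):

```typescript
// A minimal tool definition in OpenAI's function-calling format.
// The tool name and its parameters are hypothetical examples.
const tools = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get the current weather for a city',
      parameters: {
        type: 'object',
        properties: {
          city: { type: 'string', description: 'City name' },
        },
        required: ['city'],
      },
    },
  },
];
```

Send this as the tools field of the request body alongside messages; the model may then respond with tool_calls on an assistant message.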

Example: Non-streaming

```bash
curl -X POST https://api.hontoni.vn/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in 3 sentences."}
    ],
    "max_tokens": 200
  }'
```

Example: Streaming

```bash
curl -X POST https://api.hontoni.vn/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Write a haiku about coding"}
    ],
    "stream": true
  }'
```

Response

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "claude-sonnet-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 50,
    "total_tokens": 75
  }
}
```

Messages API

Anthropic-compatible Messages endpoint, a drop-in replacement for Anthropic's POST /v1/messages.

POST /v1/messages

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Model ID (e.g. claude-sonnet-4) |
| messages | array | Yes | Conversation messages |
| max_tokens | integer | Yes | Maximum tokens to generate |
| system | string | No | System prompt |
| stream | boolean | No | Enable SSE streaming |
| temperature | number | No | Sampling temperature |
| tools | array | No | Tool definitions |
| thinking | object | No | Extended thinking config |

Thinking (Extended Reasoning)

For models that support reasoning (Claude Sonnet 4, Claude Opus 4):

```json
{
  "model": "claude-sonnet-4",
  "messages": [{"role": "user", "content": "Solve: x^2 + 5x + 6 = 0"}],
  "max_tokens": 16000,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 10000
  }
}
```

Example Request

```bash
curl -X POST https://api.hontoni.vn/v1/messages \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4",
    "max_tokens": 1024,
    "system": "You are a helpful coding assistant.",
    "messages": [
      {"role": "user", "content": "Write a Python fibonacci function"}
    ]
  }'
```

Response

```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4",
  "content": [
    {
      "type": "text",
      "text": "Here's a Python fibonacci function..."
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 30,
    "output_tokens": 150
  }
}
```

Responses API

OpenAI Responses API (newer format). Supports reasoning effort control and simplified input format.

POST /v1/responses

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Model ID |
| input | string \| array | Yes | Simple string or structured input items |
| instructions | string | No | System instructions |
| stream | boolean | No | Enable SSE streaming |
| max_output_tokens | integer | No | Maximum output tokens |
| temperature | number | No | Sampling temperature |
| reasoning | object | No | Reasoning configuration |

Reasoning Effort

Control how much reasoning the model uses:

```json
{
  "model": "gpt-5.2",
  "input": "What is the meaning of life?",
  "reasoning": {
    "effort": "high",
    "summary": "auto"
  }
}
```

Effort levels: none, minimal, low, medium, high, xhigh

Streaming Events

When stream: true, the API sends Server-Sent Events:

| Event | Description |
|-------|-------------|
| response.created | Response object created |
| response.in_progress | Generation started |
| response.output_text.delta | Text content chunk |
| response.reasoning_text.delta | Reasoning content chunk |
| response.completed | Generation finished |
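
A minimal TypeScript sketch of consuming these events from a raw SSE body. It assumes each response.output_text.delta event carries a JSON payload with a string delta field, as in OpenAI's Responses API; the function name is ours:

```typescript
// Collect the full output text from a raw SSE stream body.
// Assumes each `response.output_text.delta` event's data payload is JSON
// with a string `delta` field (OpenAI Responses API convention).
function collectOutputText(sseBody: string): string {
  let text = '';
  let currentEvent = '';
  for (const line of sseBody.split('\n')) {
    if (line.startsWith('event:')) {
      currentEvent = line.slice('event:'.length).trim();
    } else if (line.startsWith('data:') && currentEvent === 'response.output_text.delta') {
      const payload = JSON.parse(line.slice('data:'.length).trim());
      text += payload.delta;
    }
  }
  return text;
}
```

In a real client you would read the response body incrementally rather than buffering the whole stream, but the event handling is the same.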

Models & Pricing

All available models with capabilities and per-token pricing (per 1M tokens in USD).

GET /v1/models

Anthropic Models

| Model | Context | Input | Output | Cache Read | Cache Write | Reasoning | Capabilities |
|-------|---------|-------|--------|------------|-------------|-----------|--------------|
| claude-sonnet-4 | 216K | $3.00 | $15.00 | $0.30 | $3.75 | $15.00 | reasoning, tools, vision, code, chat |
| claude-sonnet-4.5 | 200K | $3.00 | $15.00 | $0.30 | $3.75 | $15.00 | reasoning, tools, vision, code, chat |
| claude-sonnet-4.6 | 200K | $3.00 | $15.00 | $0.30 | $3.75 | $15.00 | reasoning, tools, vision, code, chat |
| claude-opus-4.5 | 200K | $5.00 | $25.00 | $0.50 | $6.25 | $25.00 | reasoning, tools, vision, code, chat |
| claude-opus-4.6 | 200K | $5.00 | $25.00 | $0.50 | $6.25 | $25.00 | reasoning, tools, vision, code, chat |
| claude-haiku-4.5 | 200K | $1.00 | $5.00 | $0.10 | $1.25 | $5.00 | reasoning, tools, vision, code, chat |

OpenAI Models

| Model | Context | Input | Output | Reasoning | Capabilities |
|-------|---------|-------|--------|-----------|--------------|
| gpt-4o | 128K | $2.50 | $10.00 | - | tools, vision, code, chat |
| gpt-4o-mini | 128K | $0.15 | $0.60 | - | tools, vision, code, chat |
| gpt-4.1 | 128K | $2.00 | $8.00 | - | tools, vision, code, chat |
| gpt-5.1 | 264K | $5.00 | $15.00 | $15.00 | reasoning, tools, vision, code, chat |
| gpt-5.2 | 264K | $5.00 | $20.00 | $20.00 | reasoning, tools, vision, code, chat |
| gpt-5.2-codex | 400K | $5.00 | $20.00 | $20.00 | reasoning, tools, vision, code, chat |
| gpt-5.3-codex | 400K | $5.00 | $20.00 | $20.00 | reasoning, tools, vision, code, chat |
| gpt-5.4 | 400K | $3.00 | $12.00 | $12.00 | reasoning, tools, vision, code, chat |
| gpt-5.4-mini | 400K | $0.40 | $1.60 | $1.60 | reasoning, tools, vision, code, chat |
| gpt-5-mini | 264K | $1.00 | $4.00 | $4.00 | reasoning, tools, vision, code, chat |

Google Models

| Model | Context | Input | Output | Reasoning | Capabilities |
|-------|---------|-------|--------|-----------|--------------|
| gemini-2.5-pro | 128K | $1.25 | $10.00 | $10.00 | reasoning, tools, vision, code, chat |
| gemini-3-flash-preview | 128K | $0.15 | $0.60 | $0.60 | reasoning, tools, vision, code, chat |
| gemini-3.1-pro-preview | 128K | $1.25 | $5.00 | $5.00 | reasoning, tools, vision, code, chat |

Live Model Data

For the most up-to-date model list and capabilities, use the GET /api/models endpoint.

Model Variants

Use model variant suffixes to control reasoning effort and context size without extra parameters.

Reasoning Effort Suffixes

Append :high or :max to any thinking-capable model to set reasoning effort automatically:

| Suffix | Budget Models (Claude/Gemini) | Effort Models (o-series) |
|--------|-------------------------------|--------------------------|
| :high | thinking_budget = 32,768 tokens | reasoning_effort = "high" |
| :max | thinking_budget = 65,536 tokens | reasoning_effort = "xhigh" |

Extended Context Suffix

Append -1m to request an extended 1M token context window:

| Variant | Effect |
|---------|--------|
| claude-sonnet-4-1m | Extended context via pay-as-you-go |
| gpt-4o-1m | Extended context via pay-as-you-go |

Combining Suffixes

Reasoning and context suffixes can be combined:

```bash
curl -X POST ${baseUrl}/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -d '{"model": "claude-sonnet-4:high-1m", "messages": [{"role": "user", "content": "Analyze this large codebase..."}]}'
```

Note: Variant suffixes are convenience shortcuts. You can still use the explicit thinking_budget or reasoning_effort parameters; explicit parameters take priority over variant suffixes.
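
To make the suffix grammar concrete, here is a hypothetical client-side parser in TypeScript. It is purely illustrative (the gateway does the real parsing server-side, and this helper is not part of any SDK):

```typescript
// Split a model string such as "claude-sonnet-4:high-1m" into base model,
// optional reasoning-effort suffix, and the extended-context flag.
function parseModelVariant(model: string): {
  base: string;
  effort?: string;
  extendedContext: boolean;
} {
  let rest = model;
  const extendedContext = rest.endsWith('-1m');
  if (extendedContext) rest = rest.slice(0, -'-1m'.length);
  const colon = rest.indexOf(':');
  if (colon === -1) return { base: rest, extendedContext };
  return { base: rest.slice(0, colon), effort: rest.slice(colon + 1), extendedContext };
}
```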

Rate Limits

Rate limits protect the API from abuse and ensure fair usage. Limits apply per API key.

Plan-Based Limits

| Plan | Price | RPM | Per 5h | Daily | Weekly | Monthly | Concurrent | Thinking Budget |
|------|-------|-----|--------|-------|--------|---------|------------|-----------------|
| Basic | $6/mo | 8 | 20 | 100 | 120 | 480 | 1 | 8,000 |
| Standard | $13/mo | 12 | 45 | 250 | 275 | 1,100 | 2 | 16,000 |
| Premium | $39/mo | 20 | 200 | 1,000 | 1,250 | 5,000 | 2 | 32,000 |
| Ultimate | $79/mo | 40 | 800 | 3,000 | 5,000 | 20,000 | 4 | 64,000 |

Rate Limit Headers

Every API response includes rate limit headers:

```text
X-RateLimit-Limit-RPM: 12
X-RateLimit-Remaining-RPM: 11
X-RateLimit-Limit-Daily: 250
X-RateLimit-Remaining-Daily: 248
X-RateLimit-Limit-5h: 45
X-RateLimit-Remaining-5h: 43
X-RateLimit-Limit-Weekly: 275
X-RateLimit-Remaining-Weekly: 270
X-RateLimit-Limit-Monthly: 1100
X-RateLimit-Remaining-Monthly: 1095
X-RateLimit-Concurrent-Limit: 2
X-RateLimit-Concurrent-Active: 1
```
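
A client can watch these headers to throttle itself before hitting a 429. A small TypeScript sketch (the helper name is ours; header names are as listed above):

```typescript
// Read the remaining per-minute and daily quota from a response's headers.
// Returns NaN for any header the response did not include.
function remainingQuota(headers: Headers): { rpm: number; daily: number } {
  return {
    rpm: Number(headers.get('X-RateLimit-Remaining-RPM') ?? NaN),
    daily: Number(headers.get('X-RateLimit-Remaining-Daily') ?? NaN),
  };
}
```

For example, pausing new requests when rpm drops to 0 avoids burning retries on guaranteed 429 responses.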

Rate Limit Add-ons

Boost your rate limits with add-ons:

| Add-on | Price | Effect |
|--------|-------|--------|
| Rate Limit 2x | $4.99/mo | Double all rate limits |
| Rate Limit 5x | $9.99/mo | 5x all rate limits |
| Rate Limit 10x | $19.99/mo | 10x all rate limits |

Billing

Hontoni uses a prepaid balance system. Top up your account and pay per token used.

Cost Calculation

Cost is calculated per request based on token usage and model pricing:

```text
cost = (input_tokens × input_price / 1,000,000)
     + (output_tokens × output_price / 1,000,000)
     + (reasoning_tokens × reasoning_price / 1,000,000)
```

Example

Using claude-sonnet-4 with 1,000 input tokens and 500 output tokens:

```text
Input cost:  1,000 × $3.00 / 1,000,000  = $0.003
Output cost:   500 × $15.00 / 1,000,000 = $0.0075
Total cost:  $0.0105
```
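
The same formula as a TypeScript function, for estimating costs client-side (prices are per 1M tokens in USD, as in the pricing tables; the function name is ours):

```typescript
// Per-request cost from token counts and per-1M-token prices (USD).
// Reasoning tokens and their price default to zero when absent.
function requestCost(
  tokens: { input: number; output: number; reasoning?: number },
  price: { input: number; output: number; reasoning?: number },
): number {
  const M = 1_000_000;
  return (
    (tokens.input * price.input) / M +
    (tokens.output * price.output) / M +
    ((tokens.reasoning ?? 0) * (price.reasoning ?? 0)) / M
  );
}
```

Plugging in the claude-sonnet-4 example above (1,000 input and 500 output tokens at $3.00/$15.00) reproduces the $0.0105 total.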

Billing Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/billing/balance | Check current balance |
| POST | /api/billing/topup | Add funds (direct credit) |
| POST | /api/billing/topup/checkout | Create Stripe Checkout session |
| POST | /api/billing/topup/intent | Create Stripe PaymentIntent |
| GET | /api/billing/transactions | Transaction history |
| GET | /api/billing/invoices | List Stripe invoices |
| GET | /api/billing/invoices/:id/pdf | Get invoice PDF URL |
| GET | /api/billing/referral | Referral info & stats |
| POST | /api/billing/referral | Apply referral code |
| POST | /api/billing/promo-code | Validate promo code |

Insufficient Balance

Requests will be rejected with a 402 Payment Required error when your balance is too low. Top up your account to continue using the API.

Error Handling

The API uses standard HTTP status codes and returns structured error responses.

Error Response Format

```json
{
  "error": {
    "message": "Invalid API key provided",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}
```
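
A TypeScript type and guard for this envelope can make error handling explicit. This is a sketch based only on the shape shown above (the names ApiError and isApiError are ours):

```typescript
// Shape of the structured error envelope shown above.
interface ApiError {
  error: { message: string; type: string; code: string };
}

// Narrow an unknown parsed response body to the error envelope.
function isApiError(body: unknown): body is ApiError {
  return (
    typeof body === 'object' &&
    body !== null &&
    typeof (body as ApiError).error === 'object' &&
    (body as ApiError).error !== null &&
    typeof (body as ApiError).error.message === 'string'
  );
}
```

After `const body = await response.json()`, checking `isApiError(body)` lets you branch on `body.error.code` with full type safety.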

Status Codes

| Code | Description | Common Cause |
|------|-------------|--------------|
| 400 | Bad Request | Invalid request body or parameters |
| 401 | Unauthorized | Missing or invalid API key |
| 402 | Payment Required | Insufficient balance |
| 404 | Not Found | Invalid model or endpoint |
| 409 | Conflict | Duplicate resource |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Upstream provider error |

Handling Rate Limits

```typescript
async function callWithRetry(url: string, request: RequestInit, maxRetries = 3): Promise<Response> {
  for (let i = 0; i < maxRetries; i++) {
    const response = await fetch(url, request);

    if (response.status === 429) {
      // Honor Retry-After when present; otherwise back off exponentially.
      const retryAfter = response.headers.get('retry-after');
      const delay = retryAfter ? parseInt(retryAfter, 10) * 1000 : Math.pow(2, i) * 1000;
      await new Promise(r => setTimeout(r, delay));
      continue;
    }

    return response;
  }
  throw new Error('Max retries exceeded');
}
```

FAQ

What models are supported?

Hontoni supports Claude (Sonnet 4, Sonnet 4.5, Sonnet 4.6, Opus 4.5, Opus 4.6, Haiku 4.5), GPT (4o, 4o-mini, 4.1, 5.1, 5.2, 5.4, 5-mini, codex variants), and Gemini (2.5 Pro, 3 Flash, 3.1 Pro). See the Models & Pricing section for the full list.

Is the API compatible with OpenAI SDKs?

Yes. Hontoni provides an OpenAI-compatible API at /v1/chat/completions. You can use the official OpenAI SDK by setting the base URL:

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://api.hontoni.vn/v1',
});

const response = await client.chat.completions.create({
  model: 'claude-sonnet-4',
  messages: [{ role: 'user', content: 'Hello!' }],
});
```

Is the API compatible with Anthropic SDKs?

Yes. Hontoni provides an Anthropic-compatible API at /v1/messages. Use the official Anthropic SDK:

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://api.hontoni.vn/v1',
});

const message = await client.messages.create({
  model: 'claude-sonnet-4',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }],
});
```

How does billing work?

Hontoni uses a prepaid balance system. You top up your account with funds, and each API request deducts costs based on token usage and the model's pricing. See the Billing section for the cost calculation formula.

What happens when I exceed rate limits?

You'll receive a 429 Too Many Requests response with rate limit headers indicating when limits reset. Implement exponential backoff in your client. Consider upgrading your plan or adding rate limit add-ons for higher limits.

Can I use multiple models in the same project?

Absolutely. Simply change the model parameter in each request. Use cheaper models like gpt-4o-mini or claude-haiku-4.5 for simple tasks and powerful models like claude-opus-4.6 for complex reasoning.