Configure your first plan
Design realistic plan tiers for your product, submit them via the API, and update them as you learn.
Plans encode the economics of your product: the dollar caps, model access, and warning thresholds that define each tier. The sections below walk through designing a realistic two-tier setup, submitting it via the API, and tuning it in production.
Decide your tiers before writing config
Before making any API calls, answer four questions for each tier you plan to offer.
1. What's the worst-case dollar cost you can absorb on a single user?
This is your max_spend_per_period. Be honest about what would still be profitable. If a $49 plan needs to keep margin even on a heavy user, the cap might be $20–25, not $49.
2. Which models should this tier have access to, and how much?
This is your model_limits. Free tier might only get GPT-4o-mini. Pro tier gets GPT-4o-mini + GPT-4o with separate quotas. Enterprise gets everything. Per-model quotas are the lever that lets the same dollar cap mean very different things across tiers.
3. What's the longest single conversation you want to permit?
This is your max_spend_per_session. It stops a runaway agent loop or one bad prompt from burning a user's whole monthly allocation in one go. A common rule of thumb: 10–20% of max_spend_per_period.
4. When should soft warnings fire?
This is your soft_gate_at (default 0.80, fires at 80%). Earlier means more time to act — prompt an upgrade, switch models, warn the user. Later means fewer warnings and tighter enforcement.
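Each answer maps directly onto a plan field. As a rough sketch (make_plan is an illustrative helper, not part of the API; field names match the config format used below):

```python
def make_plan(name, worst_case_dollars, model_quotas, session_cap, soft_gate=0.80):
    """Translate the four tier decisions into a plan object (illustrative helper)."""
    return {
        "name": name,
        "max_spend_per_period": worst_case_dollars,   # Q1: worst-case cost you can absorb
        "model_limits": {                             # Q2: model access and quotas
            model: {"max_tokens_per_period": quota}
            for model, quota in model_quotas.items()
        },
        "max_spend_per_session": session_cap,         # Q3: longest single conversation
        "soft_gate_at": soft_gate,                    # Q4: when soft warnings fire
    }

free = make_plan("free", 5.00, {"gpt-4o-mini": 10_000}, session_cap=0.50)
```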
Build a realistic two-tier setup
Here's a complete config for a chatbot product with Free and Pro tiers, with reasoning for each number.
{
  "soft_gate_at": 0.80,
  "hard_gate_at": 1.00,
  "plans": [
    {
      "name": "free",
      "max_spend_per_period": 5.00,
      "max_spend_per_session": 0.50,
      "model_limits": {
        "gpt-4o-mini": {"max_tokens_per_period": 10000}
      },
      "cost_rates": {
        "gpt-4o-mini": {"input": 0.00015, "output": 0.0006}
      },
      "session_timeout_minutes": 30.0
    },
    {
      "name": "pro",
      "max_spend_per_period": 49.00,
      "max_spend_per_session": 5.00,
      "model_limits": {
        "gpt-4o": {"max_tokens_per_period": 50000},
        "gpt-4o-mini": {"max_tokens_per_period": 200000}
      },
      "cost_rates": {
        "gpt-4o": {"input": 0.0025, "output": 0.01},
        "gpt-4o-mini": {"input": 0.00015, "output": 0.0006}
      },
      "session_timeout_minutes": 30.0
    }
  ]
}
The reasoning behind each tier:
Free — capped at $5/period because that's the most you're willing to lose on a free user. Locked to gpt-4o-mini only (no GPT-4o access at all — leaving the model out of model_limits and cost_rates means calls to it won't be metered or allowed against this plan's tokens). 10K-token quota gives the user a real taste of the product without burning through the cap. Per-session cap of $0.50 stops a long conversation from eating the entire monthly allocation.
Pro — $49/period reflects what you charge plus your margin. Two models with separate quotas: 50K GPT-4o tokens and 200K GPT-4o-mini tokens. A user can lean on the cheap model for routine tasks and the expensive one for the hard ones, without one starving the other. Per-session cap of $5 protects against runaway loops while still allowing genuinely complex multi-call sessions.
Both tiers use the top-level soft_gate_at and hard_gate_at (80% / 100%). You could override either per-plan if you wanted Free users to see warnings earlier (say, at 60%) — but consistent thresholds across tiers make the user experience predictable.
Cost rates are yours, not OpenAI's
The numbers in cost_rates are the prices you charge against your plan's dollar cap — they don't have to match what OpenAI bills you.
Most teams use one of three approaches:
- Pass-through — copy OpenAI's published rates verbatim (what's shown above). Simplest, lowest friction, smallest margin protection.
- Marked up — multiply OpenAI's rates by 1.5–3x. Builds margin headroom into the cap itself, so a "$49 plan" gives the user roughly $30 of OpenAI compute and absorbs the rest.
- Flat tier rates — round numbers (e.g., $0.01/1K tokens regardless of model). Easier to communicate to users; less precise about real costs.
You can always update cost_rates later as OpenAI changes their pricing or as you learn what your users actually consume.
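To see what these choices mean concretely, here's a quick sketch comparing pass-through and marked-up rates for a single call. It assumes rates are dollars per 1K tokens, consistent with the config above; verify the unit against your own setup:

```python
def call_cost(input_tokens, output_tokens, rates):
    # Dollars charged against the plan cap for one call (rates per 1K tokens).
    return input_tokens / 1000 * rates["input"] + output_tokens / 1000 * rates["output"]

passthrough = {"input": 0.00015, "output": 0.0006}      # gpt-4o-mini, as configured above
marked_up = {k: v * 2 for k, v in passthrough.items()}  # 2x markup

# A typical chat turn: 800 input tokens, 400 output tokens.
cost_pt = call_cost(800, 400, passthrough)  # roughly $0.00036
cost_mu = call_cost(800, 400, marked_up)    # twice that -- burns the cap twice as fast
```

At marked-up rates the same dollar cap buys roughly half the raw OpenAI compute, which is exactly the margin headroom described above.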
Submit your plans
Send the config to the Paygent API:
curl
curl -X POST https://api.paygent.to/api/v1/config/plans \
  -H "Authorization: Bearer pg_live_..." \
  -H "Content-Type: application/json" \
  -d @plans.json
Python (httpx)
import httpx

with open("plans.json") as f:
    config = f.read()

resp = httpx.post(
    "https://api.paygent.to/api/v1/config/plans",
    headers={
        "Authorization": "Bearer pg_live_...",
        "Content-Type": "application/json",
    },
    content=config,
)
plans = {p["name"]: p["id"] for p in resp.json()["plans"]}
# {'free': '...', 'pro': '...'}
The response includes each plan's UUID — save these; you'll need them when assigning users.
{
  "plans": [
    {"id": "01HF...", "name": "free", "max_spend_per_period": 5.00, ...},
    {"id": "01HG...", "name": "pro", "max_spend_per_period": 49.00, ...}
  ],
  "duplicates": []
}
Note
If you re-run this call, plans whose names already exist are silently skipped and listed under duplicates. To change an existing plan, use PATCH (next section), not another POST.
Verify with GET /plans
Confirm what's actually configured:
curl https://api.paygent.to/api/v1/config/plans \
  -H "Authorization: Bearer pg_live_..."
Returns the full list of plans for your product, including all defaults that were filled in (e.g., soft_gate_at: 0.80 if you didn't override it).
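If you want to check the result programmatically, a small helper can flag plans whose thresholds drifted from what you expect. This sketch assumes the response shape matches the POST response shown earlier:

```python
def plans_off_default(payload, expected_soft_gate=0.80):
    """Names of plans whose soft_gate_at differs from the expected default."""
    return [
        p["name"]
        for p in payload["plans"]
        if p.get("soft_gate_at", expected_soft_gate) != expected_soft_gate
    ]

# Feed it the parsed JSON from GET /config/plans:
#   plans_off_default(resp.json())
```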
Update a plan as you learn
Once a plan is in production, you'll find numbers you want to tune — Free users hitting the limit too quickly, Pro users not getting enough GPT-4o, costs needing markup. Use PATCH to update one field at a time without re-sending the whole plan:
curl
curl -X PATCH https://api.paygent.to/api/v1/config/plans/PRO_PLAN_UUID \
  -H "Authorization: Bearer pg_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model_limits": {
      "gpt-4o": {"max_tokens_per_period": 75000},
      "gpt-4o-mini": {"max_tokens_per_period": 300000}
    }
  }'
Python (httpx)
httpx.patch(
    f"https://api.paygent.to/api/v1/config/plans/{plans['pro']}",
    headers={"Authorization": "Bearer pg_live_..."},
    json={
        "model_limits": {
            "gpt-4o": {"max_tokens_per_period": 75000},
            "gpt-4o-mini": {"max_tokens_per_period": 300000},
        }
    },
)
Only the fields you include are updated. Everything else stays as it was. The change takes effect on the next session bootstrap for users on that plan (typically within seconds).
Common patterns
Freemium with locked models
Free tier has a tiny dollar cap and only cheap-model access. Paying tiers unlock more models. Users hit the gate naturally and self-serve into a paid plan.
Tiered quotas at the same dollar cap
Two plans with the same max_spend_per_period but different model_limits — Pro gets premium models, Standard gets only mini. Same price, different capabilities.
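Sketched as plan objects (names and numbers are illustrative, not recommendations):

```python
# Same dollar cap, different capabilities (illustrative numbers).
standard = {
    "name": "standard",
    "max_spend_per_period": 19.00,
    "model_limits": {"gpt-4o-mini": {"max_tokens_per_period": 150_000}},
}
pro = {
    "name": "pro",
    "max_spend_per_period": 19.00,  # same cap as standard...
    "model_limits": {               # ...but premium-model access on top
        "gpt-4o": {"max_tokens_per_period": 30_000},
        "gpt-4o-mini": {"max_tokens_per_period": 150_000},
    },
}
```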
Per-session caps for high-stakes products
For document processing or research agents where one task can spiral, set max_spend_per_session aggressively (e.g., $1–2) regardless of plan. Prevents single-task disasters.
Custom rates for cost pass-through
If you want users to see OpenAI's actual rates instead of marked-up ones, set cost_rates to OpenAI's published numbers. Combined with a thin max_spend_per_period, this becomes a "metered with a ceiling" experience.
What's next
- Assign users to plans — register end users with Paygent and put them on the plans you just created
- Backend API reference — every field on POST /config/plans, including default_cost_rate, pre_call_estimate, session timeouts, and tool costs
- Callbacks and events — react to soft warnings and hard blocks from your application code