OpenAI Rate Limits Explained – Complete Guide by Tier
Getting “Rate limit exceeded” errors from OpenAI? This guide explains exactly what your limits are, why you’re hitting them, and how to fix it.
Understanding OpenAI Rate Limits
OpenAI rate limits control how many requests you can make to the API. They’re measured in two ways:
- RPM (Requests Per Minute) – How many API calls you can make per minute
- TPM (Tokens Per Minute) – How many tokens you can process per minute
You’ll hit a rate limit if you exceed _either_ threshold.
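You can check your live headroom against both limits: every API response carries x-ratelimit-* headers. Here's a minimal sketch using the openai v1 Python SDK's with_raw_response helper (assumes OPENAI_API_KEY is set in your environment):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# with_raw_response exposes the HTTP headers alongside the parsed result
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
)
print(raw.headers.get("x-ratelimit-remaining-requests"))  # RPM headroom
print(raw.headers.get("x-ratelimit-remaining-tokens"))    # TPM headroom
completion = raw.parse()  # the usual ChatCompletion object
```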
Rate Limits by Tier
OpenAI uses a tier system based on your account history and spending. Here are the current limits:
Tier 1 (New accounts, $5+ spent)
| Model | RPM | TPM |
|---|---|---|
| GPT-4o | 500 | 30,000 |
| GPT-4o-mini | 500 | 200,000 |
| o1 | 500 | 30,000 |
| o1-mini | 500 | 200,000 |
| o3-mini | 500 | 30,000 |
Tier 2 ($50+ spent, 7+ days)
| Model | RPM | TPM |
|---|---|---|
| GPT-4o | 5,000 | 450,000 |
| GPT-4o-mini | 5,000 | 2,000,000 |
| o1 | 5,000 | 450,000 |
| o1-mini | 5,000 | 2,000,000 |
| o3-mini | 5,000 | 450,000 |
Tier 3 ($100+ spent, 7+ days)
| Model | RPM | TPM |
|---|---|---|
| GPT-4o | 5,000 | 800,000 |
| GPT-4o-mini | 5,000 | 4,000,000 |
| o1 | 5,000 | 800,000 |
| o1-mini | 5,000 | 4,000,000 |
| o3-mini | 5,000 | 800,000 |
Tier 4 ($250+ spent, 14+ days)
| Model | RPM | TPM |
|---|---|---|
| GPT-4o | 10,000 | 2,000,000 |
| GPT-4o-mini | 10,000 | 10,000,000 |
| o1 | 10,000 | 2,000,000 |
| o1-mini | 10,000 | 10,000,000 |
| o3-mini | 10,000 | 2,000,000 |
Tier 5 ($1,000+ spent, 30+ days)
| Model | RPM | TPM |
|---|---|---|
| GPT-4o | 10,000 | 30,000,000 |
| GPT-4o-mini | 30,000 | 150,000,000 |
| o1 | 10,000 | 30,000,000 |
| o1-mini | 30,000 | 150,000,000 |
| o3-mini | 10,000 | 150,000,000 |
Reasoning Models (o1, o3) – Special Considerations
The o1 and o3 series are OpenAI’s reasoning models. They have some unique characteristics:
Higher Latency, Different Use Cases
- o1/o3 models take longer to respond (they “think” before answering)
- Rate limits match GPT-4o's (see the tables above), but effective throughput is lower because each request takes longer
- Best for complex reasoning tasks, not high-volume chat
Availability
- o1 and o1-mini: Available to all paid API users
- o3 and o3-mini: May require qualification or higher tiers
- Check your dashboard for which models you can access
When to Use Reasoning Models
Use o1/o3 when you need:
- Complex multi-step reasoning
- Math and logic problems
- Code generation with careful planning
- Tasks where accuracy matters more than speed
Use GPT-4o/GPT-4o-mini for:
- High-volume applications
- Real-time chat
- Tasks where speed matters more than deep reasoning
Why Am I Hitting Rate Limits?
1. Too Many Parallel Requests
If you’re making concurrent API calls, you might exceed RPM limits even with low overall traffic.
Fix: Implement request queuing or reduce parallelism.
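One way to bound parallelism is a semaphore around the API call. A minimal sketch using the openai v1 async client (MAX_CONCURRENT is an illustrative value; tune it to your tier's RPM):

```python
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

# Keep in-flight requests well below your tier's RPM limit
MAX_CONCURRENT = 5
semaphore = asyncio.Semaphore(MAX_CONCURRENT)

async def bounded_call(prompt: str) -> str:
    async with semaphore:  # at most MAX_CONCURRENT calls in flight
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

async def main(prompts: list[str]) -> list[str]:
    return await asyncio.gather(*(bounded_call(p) for p in prompts))

# asyncio.run(main(["prompt one", "prompt two"]))
```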
2. Prompts Are Too Long
Long system prompts or input contexts consume TPM quickly.
Fix: Optimize prompts, use summarization, or truncate context.
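For example, you can enforce a token budget before sending with the tiktoken library. A sketch: truncate_to_budget and the 4,000-token budget are illustrative, and o200k_base is the encoding used by the GPT-4o model family:

```python
import tiktoken

def truncate_to_budget(text: str, max_tokens: int = 4000) -> str:
    """Trim text to a token budget before sending, to conserve TPM."""
    # o200k_base is the tokenizer used by GPT-4o-family models
    enc = tiktoken.get_encoding("o200k_base")
    tokens = enc.encode(text)
    if len(tokens) <= max_tokens:
        return text
    return enc.decode(tokens[:max_tokens])
```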
3. Burst Traffic
Sudden spikes (like all users hitting the API at once) can trigger limits.
Fix: Implement request smoothing or rate limiting on your end.
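A token bucket is a common way to smooth bursts. The class below is a hypothetical client-side limiter (not part of the OpenAI SDK), with illustrative rates:

```python
import threading
import time

class TokenBucket:
    """Client-side limiter: bursts up to `capacity`, refilled at `rate`/sec."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # permits refilled per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until a permit is available, then consume it."""
        while True:
            with self.lock:
                now = time.monotonic()
                elapsed = now - self.updated
                self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
            time.sleep(0.05)  # brief pause before rechecking

# Smooth to ~8 requests/second with bursts of up to 10:
# limiter = TokenBucket(rate=8, capacity=10)
# limiter.acquire()  # call before each API request
```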
4. Wrong Model Choice
Using GPT-4o when GPT-4o-mini would do the job means operating under far lower TPM limits (and higher per-token costs) for the same task.
Fix: Use the smallest model that meets your quality needs.
5. New Account on Tier 1
Fresh accounts have the lowest limits.
Fix: Spend $50+ and wait 7 days for automatic upgrade to Tier 2.
6. Using Reasoning Models for High-Volume Tasks
o1/o3 models have the same RPM limits as GPT-4o but take longer per request.
Fix: Use GPT-4o-mini for high-volume tasks, reserve o1/o3 for complex reasoning.
How to Handle 429 Errors
When you hit a rate limit, OpenAI returns a 429 “Too Many Requests” error. Here’s how to handle it:
Implement Exponential Backoff
```python
import random
import time

import openai

def call_with_retry(prompt, model="gpt-4o", max_retries=5):
    """Call the Chat Completions API, retrying 429s with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return openai.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
        except openai.RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # Exponential backoff (1s, 2s, 4s, ...) plus random jitter
            wait_time = (2 ** attempt) + random.random()
            print(f"Rate limited. Waiting {wait_time:.1f}s...")
            time.sleep(wait_time)
```
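The random.random() jitter staggers retries so that many clients rate-limited at the same moment don't all retry in lockstep and trip the limit again.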
Check the Retry-After Header
OpenAI sometimes includes a Retry-After header telling you exactly how long to wait. Inside the retry loop above, you can honor it instead of the computed backoff:

```python
        except openai.RateLimitError as e:
            # e.response is the underlying httpx.Response; fall back to
            # 5 seconds if the header is absent
            retry_after = e.response.headers.get("retry-after", "5")
            time.sleep(float(retry_after))
```
Use the Batch API
For non-time-sensitive tasks, OpenAI’s Batch API has much higher limits and 50% lower costs.
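A minimal sketch of submitting a batch with the v1 SDK (batch_input.jsonl is a hypothetical file holding one JSON request per line):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# batch_input.jsonl holds one request per line, e.g.:
# {"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions",
#  "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hi"}]}}
batch_file = client.files.create(
    file=open("batch_input.jsonl", "rb"),
    purpose="batch",
)

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # results are returned within 24 hours
)
print(batch.id, batch.status)  # poll client.batches.retrieve(batch.id) later
```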
How to Upgrade Your Tier
OpenAI automatically upgrades your tier based on:
- Total spend – More spending = higher tier
- Account age – Older accounts get higher limits
- Usage patterns – Consistent usage helps
To speed up the upgrade:
- Prepay for credits (this counts toward spending threshold)
- Wait for the time requirement (7-30 days depending on tier)
- Contact OpenAI support for enterprise needs
Tier Upgrade Requirements
| Tier | Spending Requirement | Time Requirement |
|---|---|---|
| Tier 1 | $5+ | None |
| Tier 2 | $50+ | 7+ days |
| Tier 3 | $100+ | 7+ days |
| Tier 4 | $250+ | 14+ days |
| Tier 5 | $1,000+ | 30+ days |
Check Your Current Tier
- Go to platform.openai.com
- Navigate to Settings > Limits
- View your current tier and limits
Rate Limit vs Quota
Don’t confuse rate limits with billing quotas:
- Rate limits = requests and tokens per minute (reset every minute)
- Billing quota = monthly spending cap (resets monthly)
If you’re getting “You exceeded your current quota” errors, that’s a billing issue, not a rate limit. Check your billing settings.
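If you want to tell the two cases apart in code: the v1 SDK surfaces both as RateLimitError (HTTP 429), but with different error codes. A sketch, assuming quota exhaustion carries the insufficient_quota code as it does at the time of writing:

```python
import openai
from openai import OpenAI

client = OpenAI()

try:
    client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "ping"}],
    )
except openai.RateLimitError as e:
    # Both cases arrive as HTTP 429; the error code tells them apart
    if e.code == "insufficient_quota":
        print("Billing problem: add credits or raise your spending limit.")
    else:
        print("True rate limit: back off and retry.")
```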
FAQ
What happens when I hit a rate limit?
You’ll receive a 429 HTTP error with a message like “Rate limit exceeded. Please retry after X seconds.” Your request is not processed, and you’re not charged for it.
Can I request higher rate limits?
Yes, for enterprise needs. Contact OpenAI sales for custom rate limits. Otherwise, spend more and wait for automatic tier upgrades.
Do different models have different limits?
Yes. GPT-4o-mini and o1-mini have much higher TPM limits than GPT-4o and o1. Check the tables above for specifics.
Are rate limits per API key or per organization?
Rate limits are per organization, shared across all API keys.
What are the rate limits for o1 and o3?
o1 and o3 models have similar RPM limits to GPT-4o (500-10,000 depending on tier). The main difference is response latency – reasoning models take longer to respond, so effective throughput is lower.
Related Tools
- How to Handle OpenAI 429 Errors – Detailed guide with code examples
- AI Error Decoder – Decode any API error message
- AI Status Page – Check if OpenAI is down
- AI Pricing Calculator – Compare costs if you need to switch providers
- All AI Developer Tools – Browse all free tools