Last updated: May 2026 · GPT-5.5 Flagship · GPT-5.4 / GPT-4.1 / o3 / o4-mini · Responses API
GPT-5.5 Flagship · Chat · API · Assistants · Responses API · Realtime · Vision · Audio · Images
Current OpenAI Models — May 2026
GPT-5.5 Flagship
gpt-5.5 · gpt-5.5-pro
Most capable. Complex reasoning and coding. Use for highest-quality outputs and agentic workflows.
Vision + tools · Prompt cache 90% off · Priority tier available
GPT-5.4 Series Balanced
gpt-5.4 · gpt-5.4-mini · gpt-5.4-nano · gpt-5.4-pro
Professional work and coding. Adaptive reasoning, improved latency. Mini/nano for cost-sensitive apps.
GPT-5.4-mini: lower latency · GPT-5.4-nano: cheapest
GPT-4.1 Series Best for code
gpt-4.1 · gpt-4.1-mini · gpt-4.1-nano
Significantly better at coding than GPT-4o. Pricing per 1M tokens (input/output): GPT-4.1 $2/$8 · Mini $0.40/$1.60 · Nano: cheap.
1M context · 33K output (mini) · Prompt cache 75% off
GPT-4o / 4o-mini Proven
gpt-4o · gpt-4o-mini
Reliable, widely integrated. 4o-mini: $0.15/$0.60/M. Prompt cache 50% off. Vision + audio.
128K context · Multimodal
o3 Reasoning
o3
Frontier reasoning. 80% price cut — now $0.40/$1.60/M. Best for math, science, coding, vision tasks.
200K context · Reasoning tokens billed
o4-mini Fast Reasoning
o4-mini
Compact, efficient reasoning. Excels at math, coding, visual tasks. Great price/performance ratio.
200K context · Fine-tunable · $1.10/$4.40/M
💡 Start with gpt-5.5 for complex tasks; use gpt-5.4-mini or gpt-4.1-nano for cost-sensitive or high-volume workloads.
🧠 o-series reasoning tokens: billed as output but not returned. A response with 500 visible tokens may be billed for 2,000+ tokens. Monitor usage carefully.
💰 Prompt caching: GPT-5 family 90% off · GPT-4.1 family 75% off · GPT-4o/o-series 50% off. Cache persists 5–10 min.
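The billing notes above can be turned into a rough cost estimate. The sketch below is illustrative only: `estimate_cost` and its parameter names are hypothetical, the prices are the GPT-4.1 and o4-mini figures quoted in this sheet, and it assumes hidden reasoning tokens are billed at the output rate, as the o-series note says.

```python
def estimate_cost(input_tokens, cached_tokens, output_tokens, reasoning_tokens,
                  input_price, output_price, cache_discount):
    """Rough request cost in dollars. Prices are per 1M tokens.

    cached_tokens is the part of input_tokens served from the prompt cache;
    reasoning_tokens are hidden tokens assumed to be billed as output.
    """
    uncached = input_tokens - cached_tokens
    input_cost = (uncached * input_price
                  + cached_tokens * input_price * (1 - cache_discount))
    output_cost = (output_tokens + reasoning_tokens) * output_price
    return (input_cost + output_cost) / 1_000_000


# GPT-4.1 prices from this sheet: $2 in, $8 out, 75% cache discount.
cached_cost = estimate_cost(
    input_tokens=10_000, cached_tokens=8_000,
    output_tokens=500, reasoning_tokens=0,
    input_price=2.00, output_price=8.00, cache_discount=0.75,
)

# o4-mini prices from this sheet: 500 visible tokens plus 2,000 hidden reasoning tokens.
reasoning_cost = estimate_cost(
    input_tokens=0, cached_tokens=0,
    output_tokens=500, reasoning_tokens=2_000,
    input_price=1.10, output_price=4.40, cache_discount=0.50,
)

print(f"cached request: ${cached_cost:.4f}")
print(f"reasoning request: ${reasoning_cost:.4f}")
```

In the second call the hidden reasoning tokens account for four fifths of the output charge, which is why the sheet recommends monitoring them.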
Quickstart Code
Python (openai SDK)
pip install openai

from openai import OpenAI

client = OpenAI(api_key="sk-...")
resp = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Hi"}],
)
print(resp.choices[0].message.content)
JavaScript / Node
npm install openai

import OpenAI from "openai"

const client = new OpenAI({ apiKey: "sk-..." })
const resp = await client.chat.completions.create({
  model: "gpt-5.5",
  messages: [{ role: "user", content: "Hi" }]
})
cURL
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-5.5","messages":[{"role":"user","content":"Hi"}]}'
📡 API Endpoints
Base URL: https://api.openai.com/v1/
Method  Path                                    Description
POST    /chat/completions                       Main chat endpoint
POST    /responses                              Responses API (newer, recommended)
POST    /assistants · /threads · /runs          Assistants API
POST    /images/generations                     DALL-E / gpt-image-1
POST    /audio/speech · /audio/transcriptions   TTS + Whisper STT
POST    /embeddings                             Text embeddings
POST    /fine_tuning/jobs                       Fine-tuning (GPT-4.1, 4o, o4-mini)
POST    /batches                                Async batch, 50% off, 24h SLA
POST    /moderations                            Content moderation check
GET     /models                                 List available models
POST    /vector_stores                          File search vector stores
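The quickstart only covers /chat/completions, so here is a minimal sketch of a request to the recommended /responses endpoint using only the standard library. It builds the request without sending it. `build_responses_request` is a hypothetical helper name, and the body assumes the simplest form of the Responses API, a `model` plus a plain-string `input`.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.openai.com/v1"


def build_responses_request(prompt, model="gpt-5.5"):
    """Build (but do not send) a POST request for the /responses endpoint."""
    body = {"model": model, "input": prompt}
    return urllib.request.Request(
        url=f"{BASE_URL}/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumes the key is exported as OPENAI_API_KEY, as in the cURL example.
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_responses_request("Hi")
print(req.get_method(), req.full_url)
# To send it, pass req to urllib.request.urlopen and parse the JSON reply.
```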
⚙️ Key Parameters
Chat Completions
model: string, required
messages: array of {role, content} objects
temperature: 0–2 · 0 = most deterministic, 1 = balanced
stream: bool, SSE streaming
response_format: {type:"json_schema"} for structured output
tools: array of function definitions
tool_choice: "auto" | "none" | force a specific function
parallel_tool_calls: bool, allow multiple tool calls at once
seed: int, deterministic outputs (best effort)
reasoning_effort: "low" | "medium" | "high", for reasoning models (e.g. o-series)
max_completion_tokens: int, max response tokens (includes reasoning tokens)
logprobs: bool, return token log probabilities
user: string, unique user ID for abuse detection
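As a rough illustration of how these parameters combine, the sketch below assembles a request body and checks the temperature range before anything is sent. `build_chat_params` is a hypothetical helper, not part of the SDK. It only covers a few of the parameters listed above.

```python
def build_chat_params(model, messages, temperature=None,
                      max_completion_tokens=None, seed=None, stream=False):
    """Assemble a Chat Completions request body, omitting unset options."""
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    params = {"model": model, "messages": messages}
    optional = {
        "temperature": temperature,
        "max_completion_tokens": max_completion_tokens,
        "seed": seed,
    }
    # Only include options the caller actually set.
    params.update({k: v for k, v in optional.items() if v is not None})
    if stream:
        params["stream"] = True
    return params


params = build_chat_params(
    "gpt-5.5",
    [{"role": "user", "content": "Classify: 'great product'"}],
    temperature=0,
    max_completion_tokens=50,
    seed=42,
)
print(params)
# With the SDK client from the quickstart: client.chat.completions.create(**params)
```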
Structured Output
📐 Use json_schema with strict:true so the output always conforms to your schema. This avoids hallucinated fields.
💎 ChatGPT Plans / Tiers
Free $0/mo
GPT-4o limited · Limited searches · Basic voice
Plus $20/mo
GPT-5.5 access · o3/o4-mini · Canvas · Image gen · Advanced voice · Search
Pro $200/mo
Unlimited GPT-5.5 · o3 extended · Priority access · All features
Team $25/user/mo
All Plus features · Admin console · Data privacy · Higher limits
Enterprise Custom
SSO/SAML · SOC 2 Type II · Custom retention · Dedicated support
API Rate Tiers
Tier 1 ($5 spent): 500 RPM · 30K TPM
Tier 2 ($50): 5K RPM · 450K TPM
Tier 3 ($100): 5K RPM · 800K TPM
Tier 4 ($250): 10K RPM · 2M TPM
Tier 5 ($1K): 30K RPM · 150M TPM
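Both caps apply at once, so the effective request rate depends on how many tokens each request uses. A small worked example, using the Tier 1 figures listed above and a hypothetical average of 1,500 tokens per request (`max_requests_per_minute` is an illustrative name):

```python
def max_requests_per_minute(rpm_limit, tpm_limit, tokens_per_request):
    """Requests per minute allowed once both the RPM and TPM caps apply."""
    return min(rpm_limit, tpm_limit // tokens_per_request)


# Tier 1: 500 RPM, 30K TPM.
tier1 = max_requests_per_minute(500, 30_000, tokens_per_request=1_500)
print(tier1)  # 20: the token cap, not the request cap, is the bottleneck
```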
💬 ChatGPT Web Features
🧠 Memory: Remembers info across conversations (Plus+)
🎨 Canvas: Collaborative writing and code editing (Plus+)
🤖 GPTs / Custom GPTs: Build and share custom assistants (All)
📁 Projects: Organise chats, share files + instructions (All)
🔍 ChatGPT Search: Real-time web search with citations (Plus+)
🖼️ Image Generation: DALL-E 3 + gpt-image-1 in chat (Plus+)
🎤 Advanced Voice Mode: Real-time voice with emotion detection (Plus+)
💻 Code Interpreter: Python sandbox, charts, data analysis (Plus+)
📄 File Uploads: PDFs, images, code, spreadsheets (All)
🤖 Assistants API
🧵 Persistent AI with threads, tools, and file access. In beta; accessed via client.beta.assistants.
Assistant: Persistent AI with instructions, model, tools
Thread: Conversation history; add messages, then run the assistant
Run: Execution on a thread; poll for completion
Code Interpreter: Run Python, create files, process data
File Search: Vector store semantic search over files (RAG)
Vector Store: Store + search file embeddings for File Search
Quick Create
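# Create an assistant with the Code Interpreter tool enabled
assistant = client.beta.assistants.create(
    model="gpt-5.5",
    instructions="You are a helpful analyst",
    tools=[{"type": "code_interpreter"}],
)
# Create an empty thread to hold the conversation
thread = client.beta.threads.create()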
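Since a run has to be polled for completion, a loop along these lines is one way to wait for it. This is a sketch under assumptions: `wait_for_run` and `get_status` are hypothetical names, the status lookup is injected so the example runs offline, and the terminal statuses listed are the ones I would expect a run to end in. With the real SDK, `get_status` would wrap something like `client.beta.threads.runs.retrieve(...).status`. A run that needs tool outputs would need extra handling.

```python
import time

TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}


def wait_for_run(get_status, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll get_status() until the run reaches a terminal status or times out."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"run still '{status}' after {timeout}s")
        sleep(interval)
        waited += interval


# Stand-in for a real status lookup, so the sketch runs without network access.
fake_statuses = iter(["queued", "in_progress", "in_progress", "completed"])
final_status = wait_for_run(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final_status)
```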
Audio APIs
🎵 Whisper STT: gpt-4o-transcribe, audio → text, multi-language
🔊 TTS: tts-1, tts-1-hd, gpt-4o-mini-tts, 6 voices, MP3/Opus
Realtime API: gpt-4o-realtime-preview, bidirectional audio over WebSocket, sub-300 ms latency
🔧 Function Calling, Structured Output & Prompting Tips
Function Calling
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
# Check resp.choices[0].message.tool_calls
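Once the model returns a tool call, your code has to run the function and send the result back. The sketch below shows one way to do that step offline. `get_weather`, `LOCAL_FUNCTIONS`, and `run_tool_call` are hypothetical names, and the example call is a plain dict shaped like one entry of `message.tool_calls` (the SDK returns objects with the same fields).

```python
import json


def get_weather(city):
    """Hypothetical local implementation backing the get_weather tool."""
    return {"city": city, "forecast": "sunny"}


LOCAL_FUNCTIONS = {"get_weather": get_weather}


def run_tool_call(tool_call):
    """Execute one tool call and return the 'tool' message to send back."""
    name = tool_call["function"]["name"]
    # The arguments arrive as a JSON string, not a dict.
    args = json.loads(tool_call["function"]["arguments"])
    result = LOCAL_FUNCTIONS[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }


example_call = {
    "id": "call_123",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}
tool_message = run_tool_call(example_call)
print(tool_message)
```

The returned message is appended to the conversation, after the assistant message that requested the call, before asking the model for its final answer.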
Structured JSON Output
resp = client.chat.completions.create(
    model="gpt-5.5",
    messages=messages,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "result",
            "schema": {...your schema...},
            "strict": True,
        },
    },
)
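For a concrete picture of what the schema placeholder might hold, here is a sketch with a made-up sentiment schema. The field names are illustrative, not from the original. It follows the strict-mode convention of listing every property under `required` and setting `additionalProperties` to false, and it parses a stand-in response string rather than calling the API.

```python
import json

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "sentiment_result",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "label": {"type": "string",
                          "enum": ["positive", "negative", "neutral"]},
                "confidence": {"type": "number"},
            },
            "required": ["label", "confidence"],
            "additionalProperties": False,
        },
    },
}

# Stand-in for resp.choices[0].message.content, which is a JSON string.
raw_content = '{"label": "positive", "confidence": 0.93}'
result = json.loads(raw_content)

# Light sanity check that every required key came back.
schema = response_format["json_schema"]["schema"]
missing = [key for key in schema["required"] if key not in result]
print(result["label"], missing)
```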
Keyboard Shortcuts
Ctrl+Shift+O: New conversation
Ctrl+Shift+S: Toggle sidebar
Ctrl+K: Search conversations
Ctrl+Shift+;: Copy last code block
Shift+Enter: New line (not send)
Esc: Stop generating
Prompting Tips for GPT / o-series
🎯 System prompt first. Put all persistent instructions in the system role, where they take highest priority and do not need repeating in each user message.
📋 Few-shot examples. Give 2–5 input/output pairs. GPT-5 follows patterns very quickly.
🌡️ Temperature. Use 0 for extraction/classification and 0.7–1 for creative writing. Values above 1 tend to degrade quality.
📐 Structured output. Use json_schema with strict:true whenever you need machine-readable output. This eliminates parsing errors.
🧠 o-series. Don't add chain-of-thought instructions; the model reasons internally. Use reasoning_effort:"high" for hard problems.
💾 Prompt caching. Keep static content at the start of your prompt for best cache hit rates.
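Several of the tips above (system prompt first, few-shot pairs, static content at the front) come down to how the messages array is ordered. A minimal sketch, where `build_messages` and the example reviews are hypothetical:

```python
def build_messages(system_prompt, examples, user_input):
    """System prompt first, then few-shot pairs, then the new input.

    Keeping the static parts (system prompt, examples) at the front means
    repeated calls share the same prefix, which helps cache hit rates.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": user_input})
    return messages


messages = build_messages(
    "Classify the sentiment of each review as positive or negative.",
    [("Loved it, works perfectly.", "positive"),
     ("Broke after two days.", "negative")],
    "Shipping was slow but the product is great.",
)
print(len(messages))  # 6: one system, two example pairs, one new user message
```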
Troubleshooting
🔑 401 Unauthorized: Invalid API key. Check OPENAI_API_KEY and make sure it has no trailing whitespace.
⏱️ 429 Rate Limit: Implement exponential backoff. Consider the Batch API (50% off, async).
Timeout: Use stream:true for long outputs and increase the client timeout.
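Exponential backoff for 429 responses could look like the sketch below. `with_backoff` and `RateLimited` are hypothetical stand-ins so the example runs without network access. With the openai SDK you would catch `openai.RateLimitError` instead. The delay doubles on each attempt, with a little random jitter added.

```python
import random
import time


class RateLimited(Exception):
    """Stand-in for a 429 error."""


def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry call() on RateLimited, doubling the delay each attempt, with jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries, surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)


# Simulated endpoint that fails twice before succeeding.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = with_backoff(flaky_call, sleep=delays.append)
print(result, len(delays))
```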