Last updated: May 2026 · Gemini 3.1 Pro · Gemini 2.5 Pro · 2.5 Flash · 2.5 Flash-Lite · 1M token context
Current Gemini Models — May 2026
Gemini 3.1 Pro Latest
gemini-3.1-pro-preview
State-of-the-art. Complex tasks, deep reasoning, coding, agentic workflows. Replaces Gemini 3 Pro (deprecated March 2026).
Frontier · Preview
Gemini 2.5 Pro Flagship GA
gemini-2.5-pro
Advanced multimodal reasoning. Best for coding and agentic tasks. Steepest growth of any Google model.
1M ctx · Thinking model
Gemini 2.5 Flash Best Value
gemini-2.5-flash
Best price/performance. Thinking capabilities. High-throughput applications. GA stable (05-20 model).
1M ctx · 66K out · $0.30 / $2.50 per 1M tokens (input / output)
Gemini 2.5 Flash-Lite Cheapest
gemini-2.5-flash-lite
Lowest latency and cost in the 2.5 family. Upgrade from 1.5/2.0 Flash. Thinking off by default.
1M ctx · $0.10 / $0.40 per 1M tokens (input / output)
🧠 Thinking models: all 2.5 series. Set thinkingBudget to 0 to disable thinking (2.5 Pro does not accept 0) or to -1 for a dynamic budget; a positive value caps thinking tokens and therefore cost.
💰 Free tier: Gemini 2.5 Flash: 15 RPM, 1M TPD · Gemini 2.5 Pro: 5 RPM, 25 req/day · All free at aistudio.google.com
⚠️ Deprecated: Gemini 3 Pro Preview shut down March 9, 2026. Migrate to Gemini 3.1 Pro Preview.
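The per-1M-token prices on the cards above convert to a per-request cost with simple arithmetic. A minimal sketch, assuming each price pair is input / output in USD and that thinking tokens are billed as output; `PRICES` and `estimate_cost` are illustrative names, not part of any SDK.

```python
# Prices copied from the model cards above: (input, output) in USD per 1M tokens.
PRICES = {
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.5-flash-lite": (0.10, 0.40),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Example: a 200K-token prompt with a 2K-token answer on each model.
for name in PRICES:
    print(name, round(estimate_cost(name, 200_000, 2_000), 4))
```

Prices change often, so treat the table as a snapshot of this sheet rather than a billing source.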
Quickstart Code
Python (google-genai)
pip install google-genai

from google import genai

client = genai.Client(api_key="YOUR_KEY")
resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Hello Gemini!",
)
print(resp.text)
JavaScript (@google/genai)
npm install @google/genai

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: "KEY" });
const r = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Hello!",
});
REST / cURL
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=KEY" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"parts":[{"text":"Hi"}]}]}'
Official SDKs
🐍 pip install google-genai
🟨 npm install @google/genai
🔵 go get google.golang.org/genai
🌐 curl -H "x-goog-api-key: KEY"
📡 API Endpoints & Auth
Base URL
generativelanguage.googleapis.com/v1beta/
Key Endpoints
POST · models/{model}:generateContent · Generate text/multimodal response
POST · models/{model}:streamGenerateContent · Streaming response (SSE)
POST · models/{model}:embedContent · Generate text embeddings
GET · models · List all available models
POST · files · Upload files via File API (2GB max)
POST · cachedContents · Context caching (prefix + TTL)
Authentication
🔑 API Key: x-goog-api-key header or GOOGLE_API_KEY env var. Get one free at aistudio.google.com
🔒 OAuth / ADC: For Vertex AI and Google Cloud. gcloud auth application-default login
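The base URL, endpoint path, and header auth above combine into one request. A minimal standard-library sketch that builds the request without sending it; `build_generate_request` is an illustrative helper name, and the body mirrors the cURL quickstart.

```python
import json
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def build_generate_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent POST request."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,  # header auth instead of ?key= in the URL
        },
        method="POST",
    )


req = build_generate_request("gemini-2.5-flash", "Hi", "YOUR_KEY")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; header auth keeps the key out of URLs, and so out of most access logs.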
⚙️ Core API Parameters
Generation Config
temperature · float 0–2 · randomness control
topP · float 0–1 · nucleus sampling
topK · int · top-K sampling tokens
maxOutputTokens · int · max response tokens
stopSequences · string[] · stop generation here
responseMimeType · "text/plain" or "application/json"
responseSchema · object · JSON schema for structured output
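The parameters above can be assembled into a REST-style `generationConfig` object with the documented ranges checked up front. A minimal sketch; the helper name and its default values are assumptions chosen for illustration, not API defaults.

```python
from typing import List, Optional


def make_generation_config(
    temperature: float = 1.0,          # assumed default, for illustration only
    top_p: float = 0.95,               # assumed default, for illustration only
    max_output_tokens: int = 1024,     # assumed default, for illustration only
    stop_sequences: Optional[List[str]] = None,
    json_output: bool = False,
) -> dict:
    """Return a generationConfig dict, rejecting out-of-range values."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0 and 2")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("topP must be between 0 and 1")
    config = {
        "temperature": temperature,
        "topP": top_p,
        "maxOutputTokens": max_output_tokens,
    }
    if stop_sequences:
        config["stopSequences"] = list(stop_sequences)
    if json_output:
        config["responseMimeType"] = "application/json"
    return config


print(make_generation_config(temperature=0.0, json_output=True))
```

Validating locally surfaces a bad value before a request is spent on it.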
Thinking Config (2.5 models)
thinkingBudget · 0=off · -1=dynamic · int=fixed tokens
includeThoughts · bool · return thinking trace in response
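The two thinking fields can be expressed as a small fragment merged into the generation config. A minimal sketch, assuming the REST layout where `thinkingConfig` is nested under `generationConfig`; `make_thinking_config` is an illustrative helper, not an SDK function.

```python
def make_thinking_config(budget: int = -1, include_thoughts: bool = False) -> dict:
    """Return a thinkingConfig fragment: 0 = off, -1 = dynamic, N > 0 = token cap."""
    if budget < -1:
        raise ValueError("thinkingBudget must be -1, 0, or a positive integer")
    return {
        "thinkingConfig": {
            "thinkingBudget": budget,
            "includeThoughts": include_thoughts,
        }
    }


# Merge the fragment into an existing generationConfig dict.
generation_config = {"temperature": 0.2, **make_thinking_config(budget=1024)}
print(generation_config)
```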
Safety Thresholds
BLOCK_NONE  ·  BLOCK_LOW_AND_ABOVE
BLOCK_MEDIUM_AND_ABOVE  ·  BLOCK_ONLY_HIGH
Safety Categories
HARM_CATEGORY_HARASSMENT
HARM_CATEGORY_HATE_SPEECH
HARM_CATEGORY_SEXUALLY_EXPLICIT
HARM_CATEGORY_DANGEROUS_CONTENT
HARM_CATEGORY_CIVIC_INTEGRITY
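Thresholds and categories pair up as a list of `{category, threshold}` objects. A minimal sketch that applies one default threshold to every category listed above and allows per-category overrides; `make_safety_settings` is an illustrative helper name.

```python
CATEGORIES = [
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
    "HARM_CATEGORY_CIVIC_INTEGRITY",
]
THRESHOLDS = {
    "BLOCK_NONE",
    "BLOCK_LOW_AND_ABOVE",
    "BLOCK_MEDIUM_AND_ABOVE",
    "BLOCK_ONLY_HIGH",
}


def make_safety_settings(default="BLOCK_MEDIUM_AND_ABOVE", overrides=None):
    """Return a safetySettings list covering every category exactly once."""
    overrides = overrides or {}
    unknown = set(overrides) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    settings = []
    for category in CATEGORIES:
        threshold = overrides.get(category, default)
        if threshold not in THRESHOLDS:
            raise ValueError(f"unknown threshold: {threshold}")
        settings.append({"category": category, "threshold": threshold})
    return settings


settings = make_safety_settings(
    overrides={"HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_ONLY_HIGH"}
)
print(len(settings), settings[3])
```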
Key Features
🔍 Google Search Grounding · Real-time web grounding for accurate answers · GA
⚙️ Function Calling / Tools · Define custom functions, structured outputs · GA
💻 Code Execution · Run Python in sandboxed environment · GA
🧠 Long Context · Up to 1M token context (2.5 Pro) · GA
💾 Context Caching · Cache prompt prefixes to reduce cost and latency · GA
Live API · Real-time bidirectional audio/video streaming · New
🖼️ Image Generation (Imagen 3) · Native image generation via API · New
🔗 System Instructions · Persistent context and persona across turns · GA
🔊 Text-to-Speech (Native) · Multiple voices, languages, fine style control · New
🌈 Multimodal Inputs
📝 Text · All models
🖼️ Images · JPEG PNG WebP GIF
🎵 Audio · MP3 WAV OGG FLAC
🎬 Video · MP4 AVI MOV ~1hr
📄 Documents · PDF TXT HTML CSS JS
💻 Code · All languages
File API
📁 Upload up to 2GB per file, 20GB per project. Files stored 48 hours. Reference by URI in prompts.
Inline (small files): base64-encode and pass directly. Max ~20MB inline.
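The size limits above suggest a simple routing rule: inline small payloads, upload larger ones. A minimal sketch, assuming the REST `inlineData` part shape and treating the ~20MB figure as approximate; `choose_upload_path` and `inline_part` are illustrative helpers.

```python
import base64

INLINE_LIMIT_BYTES = 20 * 1024 * 1024   # "~20MB inline" from the note above (approximate)
FILE_API_LIMIT_BYTES = 2 * 1024 ** 3    # 2GB per file via the File API


def choose_upload_path(size_bytes: int) -> str:
    """Return "inline" or "file_api" depending on payload size."""
    if size_bytes <= INLINE_LIMIT_BYTES:
        return "inline"
    if size_bytes <= FILE_API_LIMIT_BYTES:
        return "file_api"
    raise ValueError("file exceeds the 2GB File API limit")


def inline_part(data: bytes, mime_type: str) -> dict:
    """Build a request part carrying base64-encoded bytes."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"inlineData": {"mimeType": mime_type, "data": encoded}}


print(choose_upload_path(50 * 1024 * 1024))
print(inline_part(b"hi", "text/plain"))
```

Base64 inflates the payload by about a third, so leaving headroom under the inline limit is a reasonable precaution.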
Function Calling Example
# Define function schema
get_weather = {
    "name": "get_weather",
    "description": "Get current weather",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Pass to model
resp = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Weather in Sydney?",
    config={"tools": [{"function_declarations": [get_weather]}]},
)
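With function calling, the model returns the name and arguments of a declared function instead of running it; your code executes the call and sends the result back. A minimal dispatch sketch, assuming the name and args have already been read from the response (in the google-genai SDK this is expected to be available as `resp.function_calls`, which is an assumption to verify). `get_weather_impl` is a hypothetical local function returning canned data.

```python
def get_weather_impl(city: str) -> dict:
    """Hypothetical local implementation; a real one would call a weather service."""
    return {"city": city, "forecast": "sunny", "temp_c": 22}


# Map declared function names to the local callables that implement them.
REGISTRY = {"get_weather": get_weather_impl}


def dispatch(name: str, args: dict) -> dict:
    """Run the local function the model asked for and return its result."""
    if name not in REGISTRY:
        raise KeyError(f"model requested an undeclared function: {name}")
    return REGISTRY[name](**args)


# Example with the name and arguments a model might return for "Weather in Sydney?"
print(dispatch("get_weather", {"city": "Sydney"}))
```

Routing through a registry means the model can only trigger functions you have explicitly declared.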
💾 Context Caching · Streaming · Structured Output
Context Caching
# Cache a large system prompt
cache = client.caches.create(
    model="gemini-2.5-flash",
    config={
        "contents": [very_long_doc],
        "system_instruction": "You are...",
        "ttl": "3600s",  # 1 hour TTL
    },
)

# Use cached content (much cheaper!)
resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarise key findings",
    config={"cached_content": cache.name},
)
Streaming
for chunk in client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Write a long essay...",
):
    print(chunk.text, end="", flush=True)
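Without the SDK, the streaming endpoint can be consumed as server-sent events. A minimal parser sketch, assuming the request was made with `?alt=sse` so each event arrives as a `data: {json}` line holding a partial response; `iter_sse_text` is an illustrative helper and the sample lines are made up.

```python
import json


def iter_sse_text(lines):
    """Yield text fragments from an iterable of SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = json.loads(line[len("data:"):].strip())
        for candidate in payload.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                if "text" in part:
                    yield part["text"]


# Made-up sample events in the assumed format.
sample = [
    'data: {"candidates":[{"content":{"parts":[{"text":"Hel"}]}}]}',
    "",
    'data: {"candidates":[{"content":{"parts":[{"text":"lo"}]}}]}',
]
print("".join(iter_sse_text(sample)))
```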
Structured JSON Output
resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="List 3 capitals",
    config={
        "response_mime_type": "application/json",
        "response_schema": {
            "type": "array",
            "items": {"type": "string"},
        },
    },
)
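With a JSON response type, the reply text still arrives as a string and needs parsing. A minimal sketch that parses it and re-checks the array-of-strings shape requested above; `parse_string_array` is an illustrative helper and the sample text stands in for `resp.text`.

```python
import json


def parse_string_array(text: str) -> list:
    """Parse JSON text and confirm it is a list of strings."""
    data = json.loads(text)
    if not isinstance(data, list) or not all(isinstance(item, str) for item in data):
        raise ValueError("expected a JSON array of strings")
    return data


# Stand-in for resp.text from the structured-output request.
sample_text = '["Paris", "Tokyo", "Canberra"]'
print(parse_string_array(sample_text))
```

Re-validating on the client is cheap and guards against truncated output when maxOutputTokens is hit mid-array.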
Prompting Tips
🎯 Be specific. State format, length, and constraints explicitly. Gemini follows detailed instructions closely.
📋 Use system instructions for a persistent persona; this avoids repeating context every turn.
🧠 Thinking mode (2.5): Enable for complex reasoning. Budget controls cost. Returns thinking trace if includeThoughts:true.
🌡️ Temperature: 0 for facts, 0.7–1.0 for creative work.
📏 Long context: Place key content at the start. Quality can degrade toward the end of very long contexts.
🔧 AI Studio · Vertex AI · Troubleshooting
Google AI Studio (aistudio.google.com)
🏠 Browser-based IDE. Free. Get API keys. No setup needed. Test multimodal prompts, export as code.
💬 Freeform / Structured / Chat prompt modes
⚙️ Live temperature / topP / topK tuning
📊 Compare same prompt across models
🔑 Create and manage API keys
📁 File upload for multimodal testing
💻 Export as Python · JS · curl
Vertex AI (Google Cloud)
gcloud auth application-default login
pip install google-cloud-aiplatform
🔒 IAM auth (service accounts, Workload Identity)
🌍 Multi-region: US / EU / Asia-Pacific
🛡️ Enterprise security: VPC SC, CMEK, DLP
🔧 Fine-tuning: supervised + RLHF
📊 Batch predictions (async, large datasets)
Troubleshooting
🔑 API_KEY_INVALID: Check the key at aistudio.google.com. Verify the GOOGLE_API_KEY env var.
⏱️ 429 Rate Limit: Add exponential backoff. Upgrade to pay-as-you-go tier.
📦 RESOURCE_EXHAUSTED: Daily quota hit. Wait 24h or upgrade plan.
🚫 SAFETY_BLOCKED: Adjust safety thresholds or rephrase prompt.
📏 Context too long: Use context caching for repeated prefixes. 2.5 Pro supports 1M tokens.
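The exponential backoff suggested for 429 errors can be sketched as a small retry wrapper. This assumes your client raises some exception on rate limiting; `RateLimitError` is a hypothetical stand-in for it, and the delay values are illustrative defaults.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever error your client raises on HTTP 429."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from clients that failed at the same time.
            sleep(delay + random.uniform(0, delay / 2))
```

Wrap the request in a zero-argument callable, for example `call_with_backoff(lambda: client.models.generate_content(...))`, and catch the real SDK exception in place of the stand-in.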