Google Gemini — Cheat Sheet 2026

✦ Current Gemini Models (May 2026)

Gemini 3.1 Pro Latest

gemini-3.1-pro-preview

State-of-the-art. Complex tasks, deep reasoning, coding, agentic workflows. Replaces Gemini 3 Pro (deprecated March 2026).

Frontier · Preview

Gemini 2.5 Pro Flagship GA

gemini-2.5-pro

Advanced multimodal reasoning. Best for coding and agentic tasks. Steepest growth of any Google model.

2M ctx · Thinking model

Gemini 2.5 Flash Best Value

gemini-2.5-flash

Best price/performance. Thinking capabilities. High-throughput applications. GA stable (05-20 model).

1M ctx · 66K out · $0.30 in / $2.50 out per 1M tokens

Gemini 2.5 Flash-Lite Cheapest

gemini-2.5-flash-lite

Lowest latency and cost in the 2.5 family. Upgrade from 1.5/2.0 Flash. Thinking off by default.

1M ctx · $0.10 in / $0.40 out per 1M tokens
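The price pairs above can be turned into a quick budget check. This is a minimal sketch, assuming each pair means USD per one million input tokens and USD per one million output tokens; `PRICES` and `estimate_cost` are illustrative names, not part of any SDK.

```python
# Hypothetical helper: estimate the cost of one request from the prices listed above.
# Assumption: each pair is (USD per 1M input tokens, USD per 1M output tokens).
PRICES = {
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.5-flash-lite": (0.10, 0.40),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Compare both models for 200K tokens in and 4K tokens out.
    for name in PRICES:
        print(name, round(estimate_cost(name, 200_000, 4_000), 4))
```

Because output tokens cost several times more than input tokens in both rows, capping `maxOutputTokens` usually moves the estimate more than trimming the prompt does.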

🧠 Thinking models: All 2.5 series. Configure thinkingBudget: 0 disables, -1 is dynamic. Budget controls cost.

💰 Free tier: Gemini 2.5 Flash: 15 RPM, 1M TPD · Gemini 2.5 Pro: 5 RPM, 25 req/day · All free at aistudio.google.com

⚠️ Deprecated: Gemini 3 Pro Preview shut down March 9, 2026. Migrate to Gemini 3.1 Pro Preview.

⚡ Quickstart Code

Python (google-genai)

    # Install first: pip install google-genai
    from google import genai

    client = genai.Client(api_key="YOUR_KEY")
    resp = client.models.generate_content(
        model="gemini-2.5-flash",
        contents="Hello Gemini!",
    )
    print(resp.text)

JavaScript (@google/genai)

    // Install first: npm install @google/genai
    import { GoogleGenAI } from "@google/genai";

    const ai = new GoogleGenAI({ apiKey: "KEY" });
    const r = await ai.models.generateContent({
      model: "gemini-2.5-flash",
      contents: "Hello!",
    });
    console.log(r.text);

REST / cURL

    curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=KEY" \
      -H "Content-Type: application/json" \
      -d '{"contents":[{"parts":[{"text":"Hi"}]}]}'

Official SDKs

🐍 pip install google-genai
🟨 npm install @google/genai
🔵 go get google.golang.org/genai
🌐 curl -H "x-goog-api-key: KEY"

📡 API Endpoints & Auth

Base URL

generativelanguage.googleapis.com/v1beta/

Key Endpoints

POST · models/{model}:generateContent · Generate text/multimodal response
POST · models/{model}:streamGenerateContent · Streaming response (SSE)
POST · models/{model}:embedContent · Generate text embeddings
GET · models · List all available models
POST · files · Upload files via File API (2GB max)
POST · cachedContents · Context caching (prefix + TTL)
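The endpoint paths above combine with the base URL in a regular way. This is a minimal sketch of a URL builder; `endpoint_url` is a hypothetical helper, and the `alt=sse` query parameter used to request SSE framing is stated here as an assumption about the REST interface, not something this sheet spells out.

```python
from urllib.parse import quote

BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def endpoint_url(model: str, method: str = "generateContent", sse: bool = False) -> str:
    """Build the URL for a model method such as generateContent or embedContent."""
    url = f"{BASE_URL}/models/{quote(model, safe='')}:{method}"
    if sse:
        # Assumption: SSE framing for streamGenerateContent is requested with alt=sse.
        url += "?alt=sse"
    return url


if __name__ == "__main__":
    print(endpoint_url("gemini-2.5-flash"))
    print(endpoint_url("gemini-2.5-flash", "streamGenerateContent", sse=True))
```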

Authentication

🔑 API Key: Pass it in the x-goog-api-key header or set the GOOGLE_API_KEY env var. Get a free key at aistudio.google.com

🔒 OAuth / ADC: For Vertex AI and Google Cloud. Run gcloud auth application-default login
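For the API-key route, the header can be assembled once and reused across requests. This is a sketch using only the standard library; `auth_headers` is an illustrative name, and falling back to the GOOGLE_API_KEY environment variable mirrors the note above but is not a documented SDK behaviour.

```python
import os


def auth_headers(api_key=None):
    """Return HTTP headers for an API-key request, preferring an explicit key."""
    key = api_key or os.environ.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("Set GOOGLE_API_KEY or pass api_key explicitly")
    return {"x-goog-api-key": key, "Content-Type": "application/json"}


if __name__ == "__main__":
    print(sorted(auth_headers("demo-key")))
```

Sending the key in a header keeps it out of URLs, which tend to end up in logs and shell history.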

⚙️ Core API Parameters

Generation Config

temperature · float 0–2 · randomness control
topP · float 0–1 · nucleus sampling
topK · int · sample from the K most likely tokens
maxOutputTokens · int · max response tokens
stopSequences · string[] · generation stops at any of these strings
responseMimeType · "text/plain" or "application/json"
responseSchema · object · JSON schema for structured output
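The parameters above map onto a `generationConfig` object in a REST request body. This is a minimal sketch that checks the ranges listed here before building it; `generation_config` is a hypothetical helper, and the range checks encode only what this sheet states.

```python
def generation_config(temperature=None, top_p=None, top_k=None,
                      max_output_tokens=None, stop_sequences=None,
                      response_mime_type=None, response_schema=None):
    """Build a generationConfig dict, omitting any parameter left unset."""
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("topP must be between 0 and 1")
    fields = {
        "temperature": temperature,
        "topP": top_p,
        "topK": top_k,
        "maxOutputTokens": max_output_tokens,
        "stopSequences": stop_sequences,
        "responseMimeType": response_mime_type,
        "responseSchema": response_schema,
    }
    return {name: value for name, value in fields.items() if value is not None}


if __name__ == "__main__":
    body = {
        "contents": [{"parts": [{"text": "Hi"}]}],
        "generationConfig": generation_config(temperature=0, max_output_tokens=256),
    }
    print(body)
```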

Thinking Config (2.5 models)

thinkingBudget · 0=off · -1=dynamic · int=fixed tokens

includeThoughts · bool · return thinking trace in response
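In a REST body the two thinking fields sit in a `thinkingConfig` object inside `generationConfig`. This is a sketch under that assumption; `thinking_config` is an illustrative helper, and the only validation is the -1 / 0 / positive convention listed above.

```python
def thinking_config(budget: int = -1, include_thoughts: bool = False) -> dict:
    """Return a fragment to merge into generationConfig.

    budget: 0 turns thinking off, -1 lets the model decide, a positive int fixes the token budget.
    """
    if budget < -1:
        raise ValueError("thinkingBudget must be -1, 0, or a positive integer")
    return {
        "thinkingConfig": {
            "thinkingBudget": budget,
            "includeThoughts": include_thoughts,
        }
    }


if __name__ == "__main__":
    config = {"temperature": 0.2, **thinking_config(budget=1024, include_thoughts=True)}
    print(config)
```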

Safety Thresholds

BLOCK_NONE · BLOCK_LOW_AND_ABOVE
BLOCK_MEDIUM_AND_ABOVE · BLOCK_ONLY_HIGH

Safety Categories

HARM_CATEGORY_HARASSMENT
HARM_CATEGORY_HATE_SPEECH
HARM_CATEGORY_SEXUALLY_EXPLICIT
HARM_CATEGORY_DANGEROUS_CONTENT
HARM_CATEGORY_CIVIC_INTEGRITY
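Thresholds and categories are paired one-to-one in a `safetySettings` list. This is a minimal sketch that applies one default threshold to every category listed above and allows per-category overrides; `safety_settings` is a hypothetical helper name.

```python
CATEGORIES = [
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
    "HARM_CATEGORY_CIVIC_INTEGRITY",
]
THRESHOLDS = {
    "BLOCK_NONE",
    "BLOCK_LOW_AND_ABOVE",
    "BLOCK_MEDIUM_AND_ABOVE",
    "BLOCK_ONLY_HIGH",
}


def safety_settings(default="BLOCK_MEDIUM_AND_ABOVE", overrides=None):
    """Build a safetySettings list with one entry per category."""
    overrides = overrides or {}
    unknown = set(overrides) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    settings = []
    for category in CATEGORIES:
        threshold = overrides.get(category, default)
        if threshold not in THRESHOLDS:
            raise ValueError(f"unknown threshold: {threshold}")
        settings.append({"category": category, "threshold": threshold})
    return settings


if __name__ == "__main__":
    for entry in safety_settings(overrides={"HARM_CATEGORY_HARASSMENT": "BLOCK_ONLY_HIGH"}):
        print(entry)
```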

✨ Key Features

🔍 Google Search Grounding · Real-time web grounding for accurate answers
⚙️ Function Calling / Tools · Define custom functions, structured outputs
💻 Code Execution · Run Python in a sandboxed environment
🧠 Long Context · Up to 2M token context (2.5 Pro)
💾 Context Caching · Cache prompt prefixes to reduce cost + latency
⚡ Live API · Real-time bidirectional audio/video streaming · New
🖼️ Image Generation (Imagen 3) · Native image generation via API · New
🔗 System Instructions · Persistent context and persona across turns
🔊 Text-to-Speech (Native) · Multiple voices, languages, fine style control · New

🌈 Multimodal Inputs

📝 Text · All models
🖼️ Images · JPEG PNG WebP GIF
🎵 Audio · MP3 WAV OGG FLAC
🎬 Video · MP4 AVI MOV · up to ~1hr
📄 Documents · PDF TXT HTML CSS JS
💻 Code · All languages

File API

📁 Upload up to 2GB per file, 20GB per project. Files are stored for 48 hours. Reference by URI in prompts.

⚡ Inline (small files): base64-encode and pass directly. Max ~20MB inline.
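The inline route amounts to wrapping base64 data in a request part. This is a sketch, assuming the REST part shape `inlineData` with `mimeType` and `data` fields; `inline_part` and the exact 20 MiB cutoff are illustrative stand-ins for the "~20MB" guideline above.

```python
import base64

# Assumption: treat the "~20MB" guideline as 20 MiB of raw bytes.
MAX_INLINE_BYTES = 20 * 1024 * 1024


def inline_part(data: bytes, mime_type: str) -> dict:
    """Wrap raw bytes as an inline request part, refusing oversized payloads."""
    if len(data) > MAX_INLINE_BYTES:
        raise ValueError("payload too large for inline use; upload via the File API instead")
    encoded = base64.b64encode(data).decode("ascii")
    return {"inlineData": {"mimeType": mime_type, "data": encoded}}


if __name__ == "__main__":
    part = inline_part(b"hi", "text/plain")
    print(part)
```

Base64 inflates the payload by roughly a third, so the request on the wire is larger than the raw file.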

Function Calling Example

    # Define function schema
    get_weather = {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }

    # Pass to model
    resp = client.models.generate_content(
        model="gemini-2.5-pro",
        contents="Weather in Sydney?",
        config={"tools": [{"function_declarations": [get_weather]}]},
    )
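The model does not run the function itself; it returns a structured call that the application executes and answers. This is a minimal sketch of that second half, assuming the REST-style part shapes `functionCall` (with `name` and `args`) and `functionResponse`; `get_weather_impl`, `REGISTRY`, and `handle_function_call` are hypothetical names, and the weather data is a stub.

```python
def get_weather_impl(city: str) -> dict:
    """Stand-in implementation; a real one would query a weather service."""
    return {"city": city, "forecast": "sunny"}


# Map declared function names to local callables.
REGISTRY = {"get_weather": get_weather_impl}


def handle_function_call(part: dict) -> dict:
    """Run the function the model asked for and wrap the result for the next turn."""
    call = part["functionCall"]
    name = call["name"]
    if name not in REGISTRY:
        raise KeyError(f"model requested unknown function: {name}")
    result = REGISTRY[name](**call.get("args", {}))
    return {"functionResponse": {"name": name, "response": result}}


if __name__ == "__main__":
    # Example of what a model reply part might look like for "Weather in Sydney?"
    model_part = {"functionCall": {"name": "get_weather", "args": {"city": "Sydney"}}}
    print(handle_function_call(model_part))
```

The returned part goes back to the model in a follow-up request so it can phrase the final answer.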

💾 Context Caching · Streaming · Structured Output

Context Caching

    # Cache a large system prompt
    cache = client.caches.create(
        model="gemini-2.5-flash",
        config={
            "contents": [very_long_doc],
            "system_instruction": "You are...",
            "ttl": "3600s",  # 1 hour TTL
        },
    )

    # Use cached content (much cheaper!)
    resp = client.models.generate_content(
        model="gemini-2.5-flash",
        contents="Summarise key findings",
        config={"cached_content": cache.name},
    )
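The TTL in the caching example is a duration string in whole seconds followed by `s`. This is a small sketch for producing that string from a `timedelta`; `ttl_string` is an illustrative helper, not an SDK function.

```python
from datetime import timedelta


def ttl_string(ttl: timedelta) -> str:
    """Format a positive duration as a seconds string such as "3600s"."""
    seconds = int(ttl.total_seconds())
    if seconds <= 0:
        raise ValueError("TTL must be at least one second")
    return f"{seconds}s"


if __name__ == "__main__":
    print(ttl_string(timedelta(hours=1)))
```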

Streaming

    for chunk in client.models.generate_content_stream(
        model="gemini-2.5-flash",
        contents="Write a long essay...",
    ):
        print(chunk.text, end="", flush=True)
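When calling the streaming endpoint over plain REST instead of the SDK, the response arrives as SSE lines. This is a sketch of extracting the text, assuming each event is a `data:` line carrying a JSON chunk shaped like a regular response (`candidates` → `content` → `parts` → `text`); `iter_stream_text` is a hypothetical helper and the sample lines are made up.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        chunk = json.loads(line[len("data:"):].strip())
        for candidate in chunk.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                if "text" in part:
                    yield part["text"]


if __name__ == "__main__":
    sample = [
        'data: {"candidates":[{"content":{"parts":[{"text":"Hello"}]}}]}',
        "",
        'data: {"candidates":[{"content":{"parts":[{"text":" world"}]}}]}',
    ]
    print("".join(iter_stream_text(sample)))
```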

Structured JSON Output

    resp = client.models.generate_content(
        model="gemini-2.5-flash",
        contents="List 3 capitals",
        config={
            "response_mime_type": "application/json",
            "response_schema": {"type": "array", "items": {"type": "string"}},
        },
    )
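With a JSON mime type the reply text is still a string, so it needs parsing and is worth checking before use. This is a minimal sketch for the array-of-strings schema above; `parse_string_array` is an illustrative name, and the sample text stands in for `resp.text`.

```python
import json


def parse_string_array(text: str) -> list:
    """Parse model output expected to be a JSON array of strings."""
    data = json.loads(text)
    if not isinstance(data, list) or not all(isinstance(item, str) for item in data):
        raise ValueError("expected a JSON array of strings")
    return data


if __name__ == "__main__":
    sample_text = '["Paris", "Tokyo", "Canberra"]'  # stand-in for resp.text
    print(parse_string_array(sample_text))
```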

Prompting Tips

🎯 Be specific. State format, length, constraints explicitly. Gemini follows detailed instructions closely.

📋 Use system instructions for a persistent persona. This avoids repeating context every turn.

🧠 Thinking mode (2.5): Enable for complex reasoning. Budget controls cost. Returns thinking trace if includeThoughts:true.

🌡️ Temperature: 0 for facts, 0.7–1.0 for creative work.

📏 Long context: Place key content at the start. Quality can degrade toward the end of very long prompts.

🔧 AI Studio · Vertex AI · Troubleshooting

Google AI Studio (aistudio.google.com)

🏠 Browser-based IDE. Free. Get API keys. No setup needed. Test multimodal prompts, export as code.

💬 Freeform / Structured / Chat prompt modes
⚙️ Live temperature / topP / topK tuning
📊 Compare same prompt across models
🔑 Create and manage API keys
📁 File upload for multimodal testing
💻 Export as Python · JS · curl

Vertex AI (Google Cloud)

    gcloud auth application-default login
    pip install google-cloud-aiplatform

🔒 IAM auth (service accounts, Workload Identity)
🌍 Multi-region: US / EU / Asia-Pacific
🛡️ Enterprise security: VPC SC, CMEK, DLP
🔧 Fine-tuning: supervised + RLHF
📊 Batch predictions (async, large datasets)

Troubleshooting

🔑 API_KEY_INVALID: Check the key at aistudio.google.com. Verify the GOOGLE_API_KEY env var.

⏱️ 429 Rate Limit: Add exponential backoff. Upgrade to pay-as-you-go tier.

📦 RESOURCE_EXHAUSTED: Daily quota hit. Wait 24h or upgrade plan.

🚫 SAFETY_BLOCKED: Adjust safety thresholds or rephrase prompt.

📏 Context too long: Use context caching for repeated prefixes. 2.5 Pro supports 2M tokens.
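The exponential backoff suggested for 429 responses can be sketched as a small retry wrapper. `RateLimitError` is a placeholder for whatever exception your client raises on HTTP 429, and the delays, jitter, and attempt count are illustrative defaults, not recommended values from Google.

```python
import random
import time


class RateLimitError(Exception):
    """Placeholder for the error your client raises on HTTP 429."""


def with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=32.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponentially growing, jittered waits."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 50% random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_call():
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RateLimitError("429")
        return "ok"

    # Pass a no-op sleep so the demo finishes instantly.
    print(with_backoff(flaky_call, sleep=lambda seconds: None))
```

The `sleep` parameter is injectable so the wrapper can be exercised in tests without real waiting.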