Model Guide • 13 min read
How to Prompt ChatGPT: The Complete Guide
To prompt ChatGPT effectively, follow the instruction hierarchy (System → User → Tools → Examples), use function calling for structured JSON output (99%+ reliability), tune temperature by model (GPT-4o: 0.0-1.0; o3: always 1.0 with reasoning_effort), and structure system prompts in four blocks (Identity, Rules, Output Format, Boundaries). Function calling is GPT's biggest differentiator — use it for any structured output need.
Core Prompting Techniques
Model Selection Guide
| Model | Speed | Context | Temperature | Cost | Best For |
|---|---|---|---|---|---|
| GPT-4o | Fast | 128K | 0.0-1.0 (conventional) | $$ | General-purpose, multimodal, speed-sensitive tasks |
| GPT-4o-mini | Very fast | 128K | 0.0-1.0 | $ | High-volume, cost-sensitive tasks, classification |
| o3-mini | Moderate | 200K | 1.0 (fixed) | $$$ | Complex reasoning, math, code, analysis |
| o3 | Slower | 200K | 1.0 (fixed) | $$$$ | Hardest problems, research, multi-step reasoning |
Common Pitfalls
- ✗ Lowering temperature on o3 models — use
reasoning_effortinstead - ✗ Using prompt-based JSON when function calling is available — 10× more reliable
- ✗ System prompts over 500 tokens — instruction following degrades with length
- ✗ Not testing with adversarial inputs — users will try to override system prompts
- ✗ Using the same prompt structure as Claude — GPT prefers markdown over XML tags
📌 Key Takeaways
- Function calling is GPT's superpower — use it for any structured output need.
- Temperature works differently across models: GPT-4o (conventional), o3 (always 1.0 + reasoning_effort).
- Structure system prompts: Identity → Rules → Output Format → Boundaries.
- Compare approaches: How to Prompt Claude · How to Prompt Gemini · Prompt Formulas · Structured Output
- See the evidence behind these techniques on the Evidence Hub.
- Calculate prompt optimisation ROI with the ROI Calculator.
Frequently Asked Questions
What is the best prompt format for ChatGPT?
ChatGPT responds best to a clear instruction hierarchy: (1) System prompt — define role, rules, and output format. (2) User prompt — provide context, task, and constraints. (3) Examples — include 2-3 few-shot examples for consistent formatting. (4) Tools/functions — use function calling for structured JSON output. Use markdown headers (##) for section separation within prompts, and number multi-step instructions explicitly.
How does function calling work in ChatGPT?
Function calling lets you define a JSON schema for your desired output format. GPT generates a structured response matching your schema exactly — no parsing, no retries. Define functions with name, description, and parameters (JSON Schema). GPT decides when to "call" the function and returns structured arguments. This is the most reliable way to get JSON from ChatGPT — 99%+ structural validity vs 85-90% with prompt-only approaches.
What is the difference between GPT-4o and o3-mini?
GPT-4o is optimised for speed, multimodality (text + vision + audio), and general-purpose tasks — use temperature 0.7-1.0 for creative work, 0.0-0.3 for deterministic tasks. o3-mini is optimised for complex reasoning, math, and code — it uses internal chain-of-thought and performs best with temperature 1.0 (its default) and explicit "think step by step" instructions. Choose GPT-4o for speed and multimodal, o3-mini for hard reasoning.
How do I write better ChatGPT system prompts?
Structure your system prompt in four blocks: (1) Identity — "You are a [role] with [experience]. You [traits]." (2) Rules — 3-5 explicit behavioural rules. (3) Output format — exact format specification with example. (4) Boundaries — what the model should refuse or redirect. Keep system prompts under 500 tokens for GPT-4o (longer prompts reduce instruction following) and use markdown formatting for readability.
Generate GPT-Optimised Prompts
AI Prompt Architect builds prompts with the right instruction hierarchy, function schemas, and model-specific tuning — one click.
Prompt ChatGPT Better →🔬 The Research Behind This
Function calling's 99%+ structural validity rate comes from OpenAI's own benchmarks comparing prompt-based JSON extraction with tool-use schemas. System prompt length recommendations (≬500 tokens) are based on empirical testing showing instruction-following degradation beyond this threshold.
Temperature guidance for o3 models reflects OpenAI's technical documentation specifying that reasoning models use fixed temperature with reasoning_effort as the primary quality lever. The instruction hierarchy (System → User → Tools) aligns with the priority model documented across GPT-4o and o3 series.
Explore 500+ cited data points on our Prompt Engineering Evidence Hub →
ChatGPT Prompting: The Evidence
Every claim below is sourced from peer-reviewed research and industry reports.Browse all 141 citations →
Few-shot extraction minimizes context window usage vs zero-shot verbose.
3 well-crafted few-shot examples (150 tokens) outperform a 600-token verbose instruction block, saving 75% on input costs per request.
Without concise few-shot examples, developers write lengthy prose instructions that consume 4x more tokens for equivalent or inferior output quality.
Brown et al., 'Language Models are Few-Shot Learners', NeurIPS 2020JSON Schema enforcement eliminates parse errors.
OpenAI structured outputs with JSON Schema achieve 99.9% schema adherence vs <70% with unconstrained generation — a 30x reduction in parse failures.
Without schema enforcement, every 1M requests generate 300K+ malformed responses requiring retries, error handling, and downstream data corruption.
OpenAI, 'Structured Outputs: JSON Schema' documentation, 2024Chain-of-thought prompting improves complex reasoning accuracy.
Adding 'Let's think step by step' improves accuracy on GSM8K math benchmarks from 17.7% to 78.7% — a 4.4x improvement on multi-step reasoning tasks.
Without chain-of-thought, models attempt to produce answers in a single leap, failing on problems requiring intermediate steps.
Wei et al., 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models', Google Research, 2022