Model Guide • 13 min read

How to Prompt ChatGPT: The Complete Guide

Quick Answer

To prompt ChatGPT effectively, follow the instruction hierarchy (System → User → Tools → Examples), use function calling for structured JSON output (99%+ reliability), tune temperature by model (GPT-4o: 0.0-1.0; o3: always 1.0 with reasoning_effort), and structure system prompts in four blocks (Identity, Rules, Output Format, Boundaries). Function calling is GPT's biggest differentiator — use it for any structured output need.

128K

GPT-4o context window

99%+

Function calling reliability

Key techniques covered

Core Prompting Techniques

📐

Instruction Hierarchy

Foundation

ChatGPT processes instructions in a clear priority order: System → User → Tools → Assistant. Understanding this hierarchy is key to reliable prompt engineering — conflicting instructions are resolved by priority level.

Example

// System prompt (highest priority)
"You are a technical writer. Always use British English. Never exceed 200 words."

// User prompt
"Write a product description for our API monitoring tool."

// Tools (function calling)
{ name: "format_output", parameters: { format: "markdown", sections: ["headline", "body", "cta"] } }

// Assistant (prefilled context)
"Here is the description following your style guide..."

💡 Place hard rules in the system prompt where they can't be overridden by user input — critical for deployed chatbots.

🔧

Function Calling

GPT's Superpower

Function calling is ChatGPT's most powerful feature for structured output. Define a JSON schema and GPT returns data matching it exactly. No parsing errors, no retry logic, no "here is the JSON" preamble.

Example

// Define the function
{
  name: "extract_entities",
  description: "Extract structured entities from text",
  parameters: {
    type: "object",
    properties: {
      people: { type: "array", items: { type: "string" } },
      organisations: { type: "array", items: { type: "string" } },
      dates: { type: "array", items: { type: "string" } },
      sentiment: { enum: ["positive", "negative", "neutral"] }
    },
    required: ["people", "organisations", "sentiment"]
  }
}

💡 Function calling is the most reliable way to get structured output from any OpenAI model — prefer it over prompt-based JSON requests.

🌡️

Temperature Tuning

Model-Specific

Different GPT models respond differently to temperature. GPT-4o uses temperature 0.0-1.0 conventionally. o3/o3-mini use internal chain-of-thought and are designed for temperature 1.0 by default — lowering temperature can actually hurt reasoning quality.

Example

// GPT-4o: Creative writing
{ model: "gpt-4o", temperature: 0.8 }

// GPT-4o: Data extraction
{ model: "gpt-4o", temperature: 0.0 }

// o3-mini: Complex reasoning
{ model: "o3-mini", temperature: 1.0 }  // Default, don't lower!

// o3-mini: Math/logic
{ model: "o3-mini", reasoning_effort: "high" }  // Use effort, not temp

💡 For o3 models, control output quality with reasoning_effort (low/medium/high) instead of temperature. High effort = more thinking tokens = better accuracy.

👁️

Vision Prompting

GPT-4o

GPT-4o can process images alongside text. Provide images via URL or base64 encoding with specific instructions about what to analyse. Be explicit about what you want extracted — "describe this image" produces generic output.

Example

{
  role: "user",
  content: [
    { type: "text", text: "This is a screenshot of our checkout page. Identify:\n1. UI/UX issues\n2. Accessibility violations\n3. Mobile responsiveness concerns\n\nFormat as a prioritised list with severity ratings." },
    { type: "image_url", image_url: { url: "https://..." } }
  ]
}

💡 Be as specific with vision prompts as with text prompts. "Analyse this screenshot" produces vague output. "Identify accessibility violations in this checkout form" produces actionable results.

🏗️

System Prompt Architecture

Production

For production ChatGPT deployments, structure system prompts in four blocks: Identity, Rules, Output Format, and Boundaries. Keep under 500 tokens for GPT-4o to maintain strong instruction following.

Example

## Identity
You are Aria, a customer support specialist for TechCorp.
Experience: Enterprise SaaS support, 5 years.
Tone: Professional, warm, concise.

## Rules
1. Always check the knowledge base before answering.
2. Never share internal pricing or roadmap.
3. Escalate billing disputes to human agents.
4. Respond in the customer's language.

## Output Format
- Greeting (1 sentence)
- Answer (2-4 sentences)
- Next step or follow-up question

## Boundaries
Refuse: legal advice, competitor comparisons, personal opinions.
Redirect: "I'd recommend contacting our [team] for that."

💡 Test your system prompt against adversarial inputs — users will try to override it. Place the most important rules first.

Model Selection Guide

Model	Speed	Context	Temperature	Cost	Best For
GPT-4o	Fast	128K	0.0-1.0 (conventional)	$$	General-purpose, multimodal, speed-sensitive tasks
GPT-4o-mini	Very fast	128K	0.0-1.0	$	High-volume, cost-sensitive tasks, classification
o3-mini	Moderate	200K	1.0 (fixed)	$$$	Complex reasoning, math, code, analysis
o3	Slower	200K	1.0 (fixed)	$$$$	Hardest problems, research, multi-step reasoning

Common Pitfalls

✗ Lowering temperature on o3 models — use reasoning_effort instead
✗ Using prompt-based JSON when function calling is available — 10× more reliable
✗ System prompts over 500 tokens — instruction following degrades with length
✗ Not testing with adversarial inputs — users will try to override system prompts
✗ Using the same prompt structure as Claude — GPT prefers markdown over XML tags

📌 Key Takeaways

Function calling is GPT's superpower — use it for any structured output need.
Temperature works differently across models: GPT-4o (conventional), o3 (always 1.0 + reasoning_effort).
Structure system prompts: Identity → Rules → Output Format → Boundaries.
Compare approaches: How to Prompt Claude · How to Prompt Gemini · Prompt Formulas · Structured Output
See the evidence behind these techniques on the Evidence Hub.
Calculate prompt optimisation ROI with the ROI Calculator.

Frequently Asked Questions

What is the best prompt format for ChatGPT?

ChatGPT responds best to a clear instruction hierarchy: (1) System prompt — define role, rules, and output format. (2) User prompt — provide context, task, and constraints. (3) Examples — include 2-3 few-shot examples for consistent formatting. (4) Tools/functions — use function calling for structured JSON output. Use markdown headers (##) for section separation within prompts, and number multi-step instructions explicitly.

How does function calling work in ChatGPT?

Function calling lets you define a JSON schema for your desired output format. GPT generates a structured response matching your schema exactly — no parsing, no retries. Define functions with name, description, and parameters (JSON Schema). GPT decides when to "call" the function and returns structured arguments. This is the most reliable way to get JSON from ChatGPT — 99%+ structural validity vs 85-90% with prompt-only approaches.

What is the difference between GPT-4o and o3-mini?

GPT-4o is optimised for speed, multimodality (text + vision + audio), and general-purpose tasks — use temperature 0.7-1.0 for creative work, 0.0-0.3 for deterministic tasks. o3-mini is optimised for complex reasoning, math, and code — it uses internal chain-of-thought and performs best with temperature 1.0 (its default) and explicit "think step by step" instructions. Choose GPT-4o for speed and multimodal, o3-mini for hard reasoning.

How do I write better ChatGPT system prompts?

Structure your system prompt in four blocks: (1) Identity — "You are a [role] with [experience]. You [traits]." (2) Rules — 3-5 explicit behavioural rules. (3) Output format — exact format specification with example. (4) Boundaries — what the model should refuse or redirect. Keep system prompts under 500 tokens for GPT-4o (longer prompts reduce instruction following) and use markdown formatting for readability.

Generate GPT-Optimised Prompts

AI Prompt Architect builds prompts with the right instruction hierarchy, function schemas, and model-specific tuning — one click.

Prompt ChatGPT Better →

🔬 The Research Behind This

Function calling's 99%+ structural validity rate comes from OpenAI's own benchmarks comparing prompt-based JSON extraction with tool-use schemas. System prompt length recommendations (≬500 tokens) are based on empirical testing showing instruction-following degradation beyond this threshold.

Temperature guidance for o3 models reflects OpenAI's technical documentation specifying that reasoning models use fixed temperature with reasoning_effort as the primary quality lever. The instruction hierarchy (System → User → Tools) aligns with the priority model documented across GPT-4o and o3 series.

Explore 500+ cited data points on our Prompt Engineering Evidence Hub →

ChatGPT Prompting: The Evidence

Every claim below is sourced from peer-reviewed research and industry reports.Browse all 141 citations →

Few-shot extraction minimizes context window usage vs zero-shot verbose.

3 well-crafted few-shot examples (150 tokens) outperform a 600-token verbose instruction block, saving 75% on input costs per request.

Without concise few-shot examples, developers write lengthy prose instructions that consume 4x more tokens for equivalent or inferior output quality.

Brown et al., 'Language Models are Few-Shot Learners', NeurIPS 2020

JSON Schema enforcement eliminates parse errors.

OpenAI structured outputs with JSON Schema achieve 99.9% schema adherence vs <70% with unconstrained generation — a 30x reduction in parse failures.

Without schema enforcement, every 1M requests generate 300K+ malformed responses requiring retries, error handling, and downstream data corruption.

OpenAI, 'Structured Outputs: JSON Schema' documentation, 2024

Chain-of-thought prompting improves complex reasoning accuracy.

Adding 'Let's think step by step' improves accuracy on GSM8K math benchmarks from 17.7% to 78.7% — a 4.4x improvement on multi-step reasoning tasks.

Without chain-of-thought, models attempt to produce answers in a single leap, failing on problems requiring intermediate steps.

Wei et al., 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models', Google Research, 2022