Skip to Main Content
Reliabilitype-citation-106P0

Chain-of-thought prompting dramatically improves multi-step reasoning in large language models.

CoT prompting improved GSM8K math…CoT prompting improved GSM8K math benchmark accuracy from 17.7% to 58.1% on PaLM 540B — a 3.3x improvement with zero model changes.

Context & Methodology

By adding 'Let's think step by step' or providing reasoning exemplars, models allocate compute to intermediate reasoning rather than jumping to answers.

Applies To

openaianthropicgoogle

Confidence Level

High

Implementation Effort

low

Recommendation

follow

Execution Priority

P0

Put This Evidence to Work

Use the STCO framework to implement findings like this in structured, testable prompts.

Structured prompt templates cut development time from 4 hours to 20 minutes per prompt (8x reduction) by separating inst.LangChain, 'Prompt Templates' documentation, 2024