Chain-of-thought prompting dramatically improves multi-step…

Q: What does research say about the claim that chain-of-thought prompting dramatically improves multi-step reasoning in large language models?

Chain-of-thought (CoT) prompting improved GSM8K math benchmark accuracy from 17.7% to 58.1% on PaLM 540B, a roughly 3.3x improvement with no changes to the model itself. (Source: Wei et al., 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models', NeurIPS 2022.)
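As a quick check on the headline figure, the snippet below recomputes the relative and absolute gain from the two accuracy values quoted above. The variable names are illustrative only; the numbers are taken from this page as stated, not re-derived from the paper.

```python
# Accuracy values (%) as quoted on this page for PaLM 540B on GSM8K.
baseline_accuracy = 17.7  # standard prompting
cot_accuracy = 58.1       # chain-of-thought prompting

# Relative improvement: how many times higher the CoT accuracy is.
ratio = cot_accuracy / baseline_accuracy

# Absolute improvement in percentage points.
gain_points = cot_accuracy - baseline_accuracy

print(f"{ratio:.2f}x relative, +{gain_points:.1f} percentage points absolute")
```

The ratio rounds to the 3.3x quoted above; the absolute gain is about 40 percentage points, which is often the more useful number when comparing against other benchmarks.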

Context & Methodology

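The method requires no fine-tuning. Either a trigger phrase such as 'Let's think step by step' is appended to the question (the zero-shot variant, introduced by Kojima et al., 2022), or the prompt includes a few worked exemplars that spell out their reasoning (the few-shot approach of Wei et al.). In both cases the model generates intermediate reasoning steps before committing to a final answer rather than jumping straight to it.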
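The two prompt styles described here can be sketched as plain string builders. This is a minimal illustration, not the prompt format used in the cited paper: the function names `build_zero_shot_cot` and `build_few_shot_cot`, the `Q:`/`A:` layout, and the pen-counting exemplar are all assumptions made for the example, and no model API is called.

```python
# Hypothetical helpers for illustration; they only build prompt strings.
ZERO_SHOT_TRIGGER = "Let's think step by step."


def build_zero_shot_cot(question: str) -> str:
    """Append the trigger phrase so the model writes out its reasoning first."""
    return f"Q: {question}\nA: {ZERO_SHOT_TRIGGER}"


def build_few_shot_cot(exemplars, question: str) -> str:
    """Prepend worked (question, reasoning, answer) exemplars to a new question."""
    blocks = [
        f"Q: {q}\nA: {reasoning} The answer is {answer}."
        for q, reasoning, answer in exemplars
    ]
    # Leave the final answer slot open for the model to complete.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# A made-up exemplar (not taken from the paper) with explicit reasoning.
exemplars = [
    (
        "A box holds 4 pens. How many pens are in 3 boxes?",
        "Each box holds 4 pens and there are 3 boxes, so 3 * 4 = 12.",
        "12",
    )
]

new_question = "A shelf holds 6 books. How many books are on 5 shelves?"
zero_shot_prompt = build_zero_shot_cot(new_question)
few_shot_prompt = build_few_shot_cot(exemplars, new_question)

print(zero_shot_prompt)
print("---")
print(few_shot_prompt)
```

Either string would then be sent to whichever model is being evaluated. Keeping the answer phrasing fixed ("The answer is ...") in the exemplars makes the final answer easier to extract from the completion when scoring.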

Applies To

OpenAI, Anthropic, Google

Confidence Level

High

Implementation Effort

Low

Recommendation

Execution Priority

