What does research say about whether automated prompt optimisation outperforms human-written prompts?

Prompts generated by automatic prompt optimisation (APO) outperformed human expert prompts by 3-8% on BIG-Bench Hard tasks while requiring zero human iteration time. (Source: Pryzant et al., "Automatic Prompt Optimization with 'Gradient Descent' and Beam Search", EMNLP 2023.)

Context & Methodology

Automatic prompt optimisation uses the LLM itself to generate, evaluate, and refine prompts, which replaces the trial-and-error cycle of manual prompt engineering.
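
The generate, evaluate, and refine cycle can be illustrated with a minimal sketch. It is not the cited paper's implementation; it borrows only the idea of a beam search over candidate prompts. The names `optimise_prompt`, `toy_propose`, and `toy_score` are hypothetical, and the two toy functions are deterministic stand-ins for what would normally be LLM calls (rewriting a prompt) and a scored run over labelled examples.

```python
from typing import Callable, List, Tuple


def optimise_prompt(
    seed: str,
    propose: Callable[[str], List[str]],
    score: Callable[[str], float],
    beam_width: int = 2,
    rounds: int = 3,
) -> Tuple[str, float]:
    """Generate, evaluate, and refine prompts with a small beam search."""
    beam = [(score(seed), seed)]
    for _ in range(rounds):
        # Generate: keep the current prompts and add rewrites of each one.
        candidates = {prompt for _, prompt in beam}
        for _, prompt in beam:
            candidates.update(propose(prompt))
        # Evaluate: score every candidate, best first.
        ranked = sorted(((score(p), p) for p in candidates), reverse=True)
        # Refine: only the top candidates survive into the next round.
        beam = ranked[:beam_width]
    best_score, best_prompt = beam[0]
    return best_prompt, best_score


# Toy stand-ins (assumptions) so the sketch runs without any API access.
HINTS = ["Think step by step.", "Answer with one word.", "Cite the input."]


def toy_propose(prompt: str) -> List[str]:
    # A real system would ask the LLM to critique and rewrite the prompt.
    return [f"{prompt} {hint}" for hint in HINTS if hint not in prompt]


def toy_score(prompt: str) -> float:
    # A real system would run the prompt on held-out examples and
    # return a task metric such as accuracy.
    return 0.5 + 0.2 * ("step by step" in prompt) + 0.1 * ("one word" in prompt)


if __name__ == "__main__":
    best_prompt, best_score = optimise_prompt(
        "Classify the review.", toy_propose, toy_score
    )
    print(best_prompt, best_score)
```

In practice the quality of `score` matters most: the loop can only find prompts that do well on whatever evaluation set it is given, so that set should be held out from the examples used to write the seed prompt.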

Applies To

OpenAI, Anthropic, Google

Confidence Level

Medium

Implementation Effort

Medium

Recommendation

Test

Execution Priority
