Reflexion improved pass@1 on the HumanEval coding benchmark from 80.1% to 91.0% by prompting the model to reflect on test failures. Shinn et al., 'Reflexion: Language Agents with Ver…
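
The reflect-on-failure idea can be illustrated with a minimal sketch. This is not the authors' implementation: the names `reflexion_loop`, `run_tests`, `stub_generate`, and `stub_reflect` are hypothetical, and the two stubs stand in for real language-model calls so the example runs on its own. The loop generates code, runs the tests, and on failure stores a short written reflection that is passed to the next attempt.

```python
from typing import Callable, List, Optional, Tuple


def run_tests(code: str, tests: List[str]) -> Optional[str]:
    """Run the candidate code, then each test; return the first failure message, or None."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # define the candidate functions
    except Exception as exc:
        return f"code failed to load: {exc!r}"
    for test in tests:
        try:
            exec(test, namespace)  # each test is a plain assert statement
        except Exception as exc:
            return f"{test} -> {type(exc).__name__}: {exc}"
    return None


def reflexion_loop(
    generate: Callable[[str, List[str]], str],
    reflect: Callable[[str, str, str], str],
    task: str,
    tests: List[str],
    max_trials: int = 3,
) -> Tuple[str, bool, int]:
    """Retry generation, feeding accumulated reflections into each new attempt."""
    reflections: List[str] = []
    code = ""
    for trial in range(1, max_trials + 1):
        code = generate(task, reflections)
        failure = run_tests(code, tests)
        if failure is None:
            return code, True, trial
        # Turn the raw test failure into a written note for the next attempt.
        reflections.append(reflect(task, code, failure))
    return code, False, max_trials


# Hypothetical stand-ins for model calls (assumptions for illustration only).
def stub_generate(task: str, reflections: List[str]) -> str:
    if not reflections:
        return "def add(a, b):\n    return a - b\n"  # buggy first attempt
    return "def add(a, b):\n    return a + b\n"  # corrected after reflecting


def stub_reflect(task: str, code: str, failure: str) -> str:
    return f"Previous attempt failed ({failure}); re-check the operator."


code, passed, trials = reflexion_loop(
    stub_generate, stub_reflect, "Write add(a, b).", ["assert add(2, 3) == 5"]
)
```

With these stubs the first attempt fails its test, a reflection is recorded, and the second attempt passes. In a real setting `generate` and `reflect` would each prompt a model, and untrusted generated code should run in a sandbox rather than through `exec`.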