Accuracy-Cost Tradeoff on HumanEval
Full reproducibility materials for the paper titled "AI Agents That Matter"
Input ($/1M tokens)
Output ($/1M tokens)
GPT-4
GPT-3.5
Llama-3-8B
Llama-3-70B
Log Scale
Legend
Complex agent
✕
Baseline agent
Zero-shot model