[Figure: Accuracy-Cost Tradeoff on HumanEval. Labels: Input ($/1M tokens), Output ($/1M tokens). Models: GPT-4, GPT-3.5, Llama-3-8B, Llama-3-70B. Log scale. Legend: complex agent, baseline agent, zero-shot model.]