I'm a PhD student at
Princeton University, advised by
Arvind Narayanan. Previously, I earned a B.Sc. from the Technical University
of Munich (TUM) and an M.Sc. from the Hertie School.
I work on AI agents, with a focus on improving their real-world
usefulness and reliability. This includes developing rigorous evaluation frameworks and studying the limitations of inference scaling techniques.
Inference Scaling 𝙛Laws: The Limits of LLM Resampling with Imperfect Verifiers. arXiv preprint 2411.17501 (2024)
CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark. arXiv preprint 2409.11363 (2024)
AI Agents That Matter. arXiv preprint 2407.01502 (2024)
(* indicates equal contribution)
Is AI progress slowing down? Making sense of recent technology trends and claims. AI Snake Oil (2024)
AI leaderboards are no longer useful. It's time to switch to Pareto curves. AI Snake Oil (2024)
Workshop on Useful and Reliable AI Agents. Princeton University. 600+ attendees. Virtual Workshop. August 2024.
AI agents that matter. Weaviate Podcast. Podcast. September 2024.
AI agent benchmarks are misleading, study warns. VentureBeat. News article. June 2024.
The perils of evaluating AI agents. Meta (Core Applied Sciences). Invited talk. May 2024.