Hao Lab Test AI Models On Super Mario

Hao AI Lab, part of the University of California San Diego, recently put a variety of AI models through their paces by testing them on the original Super Mario Bros. from 1985. The objective was to build gaming agents that can serve as a live benchmark for current and future AI models.
Hao AI Lab’s main aim is to democratize large machine learning systems so that anyone can use them, including for purposes beyond their original design. In the Super Mario test, Claude 3.7, a recent superstar of the AI-plays-games arena, performed best, while GPT-4o was less impressive.
“We threw AI gaming agents into LIVE Super Mario games and found Claude-3.7 outperformed other models with simple heuristics,” Hao AI Lab said in a recent post on X (Twitter).
“We believe games provide challenging and dynamic environments for testing LLM (large language model) agents.”
Gaming Models
LMGames is the name of the Hao Lab team behind the recent Super Mario test, and as part of what it’s calling the GamingAgent project the team has released the relevant source code. This is published under an MIT licence, meaning it’s free to use, modify and redistribute, provided the original copyright and licence notice are kept intact.
At the moment, GamingAgent works with 2048 and Tetris, as well as Super Mario, and it can handle a few AI models from OpenAI, Anthropic and Google (Gemini). As such, there’s plenty of room for the code to be expanded to run on other games and AI models.
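To give a rough idea of the pattern such an agent follows, here is a minimal sketch of a screenshot-to-action loop against Anthropic’s API. The prompt wording, key mapping and overall flow are illustrative assumptions for this article, not the GamingAgent project’s actual code.

```python
# Minimal sketch of a screenshot-to-action agent loop, in the spirit of
# GamingAgent. The prompt, control scheme and timing are illustrative
# assumptions, not the project's actual implementation.
import base64
import io
import time

import anthropic   # pip install anthropic
import pyautogui   # pip install pyautogui

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def choose_action(frame_png: bytes) -> str:
    """Send one game frame to the model and get back a single keypress."""
    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=10,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/png",
                            "data": base64.b64encode(frame_png).decode()}},
                {"type": "text",
                 "text": "You are playing Super Mario Bros. "
                         "Answer with exactly one key: left, right, a or s."},
            ],
        }],
    )
    return response.content[0].text.strip().lower()

while True:
    # Grab the current frame from the (already running) emulator window.
    buffer = io.BytesIO()
    pyautogui.screenshot().save(buffer, format="PNG")
    key = choose_action(buffer.getvalue())
    if key in ("left", "right", "a", "s"):  # ignore anything unexpected
        pyautogui.press(key)
    time.sleep(0.2)  # the game keeps running while the model "thinks"
```

Because the game runs live, any time a model spends deliberating counts against it, which is part of what makes real-time play a more demanding benchmark than a turn-based puzzle like 2048.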
AI Benchmarks
Training AI by getting it to play games is not a new idea. Back in 2019, Greg Brockman, a co-founder of OpenAI, was already using games to test and refine the reasoning capabilities of the company’s AI systems.
“Games have always been a benchmark for AI. If you can’t solve games, you can’t expect to solve anything else,” he said in a New York Times article.