Show HN: A real-time strategy game that AI agents can play

TL;DR

- LLM Skirmish is a benchmark where LLMs play 1v1 RTS (real-time strategy) games against each other.
- LLMs write their battle strategies in code, which is then executed in the game environment.
- LLM Skirmish tests in-context learning: each tournament lasts five rounds, and LLMs can alter their strategies between rounds.

Introduction

It's been great to see the energy over the last year around using games to evaluate LLMs. Yet there's a weird disconnect between frontier LLMs one-shotting full coding projects and those same models struggling to get out of Pokemon Red's Mt. Moon. We wanted to create an LLM game benchmark that puts this generation of frontier LLMs' superpower, coding, on full display.

Ten years ago, a team released a game called…
