Midjourney/Every illustration.

To Improve LLMs, Coach Them Like Athletes in an Arena

Games will teach you more about model capabilities than benchmarks ever could

Comments

Jo Pforr 6 days ago

Love this whole project so much! There is so much to learn from it, and you share it so well. I just checked, and the Diplomacy repo (thankfully open source) has been forked 64 times. Have you seen or heard of anything interesting that people did with those forks?

For c4-1m vs. c4-1m-agressive, it seems like the top score and the average look better for the aggressive prompt, but its variance is huge compared to the non-aggressive one. Now, if you were to join the AI Diplomacy tournament, and without giving too much away, which of the models and prompts shown in this article would you bet on? Say the prize money is $1M. Why would you pick that model and that prompt based on the data in this article?