A new chess-based benchmark, ChessArena, has revealed that large language models (LLMs) still struggle with genuine strategic reasoning: none surpassed human amateur level, and some lost to ...