Introduction to Large Language Models
Large language models (LLMs) excel at understanding the context of a document and giving logical answers about its contents. Yet the same models often struggle to answer even simple math problems correctly, because the textual reasoning they rely on is not always well suited to computational or algorithmic tasks.
The Limitations of Textual Reasoning
Textual reasoning is usually a poor way to work through computational or algorithmic tasks. Some LLMs can generate code, such as Python, to handle symbolic queries, but the models don't always know when to use code, or what kind of code would work best. This can lead to incorrect or inefficient solutions.
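To see why delegating to code helps, consider a query that is error-prone when reasoned through token by token but trivial once written as a short program. A minimal illustration (the operands below are arbitrary examples, not taken from the research):

```python
# Illustrative only: a symbolic query that is error-prone for step-by-step
# textual reasoning but trivial once delegated to code.

def exact_product(a: int, b: int) -> int:
    """Exact integer multiplication via Python's arbitrary-precision ints."""
    return a * b

if __name__ == "__main__":
    # Operands with enough digits that digit-by-digit textual reasoning
    # frequently slips; the program answers exactly.
    print(exact_product(48_172_391, 90_284_765))
```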
The Development of CodeSteer
To address this limitation, researchers at MIT developed a smart assistant called CodeSteer. CodeSteer is a smaller LLM that guides a larger LLM to switch between code and text generation until it correctly answers a query. It automatically generates a series of prompts that iteratively steer the larger model, reviewing its current and previous answers after each round and offering guidance on how to fix or refine the solution until it deems the answer correct.
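One way to picture the information CodeSteer carries between rounds is a small record per attempt. The field names below are illustrative assumptions for the sake of exposition, not the authors' actual implementation:

```python
# Illustrative sketch (assumed field names, not the authors' code): the kind
# of per-round record a steering model could review before writing its next
# guidance prompt.
from dataclasses import dataclass, field

@dataclass
class SteeringRound:
    round_index: int
    mode: str             # "code" or "text": which approach was requested
    guidance_prompt: str  # the prompt CodeSteer sent to the larger LLM
    answer: str           # the larger LLM's response for this round
    looks_correct: bool   # CodeSteer's verdict after reviewing the answer

@dataclass
class SteeringHistory:
    query: str
    rounds: list[SteeringRound] = field(default_factory=list)

    def latest(self) -> SteeringRound | None:
        return self.rounds[-1] if self.rounds else None
```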
How CodeSteer Works
CodeSteer works in conjunction with the larger LLM. It first reviews a query and determines whether text or code is better suited to the problem and, if code, which sort of code would be best. It then generates a prompt for the larger LLM, telling it to answer the query with a coding method or with textual reasoning. The larger model follows this prompt and sends its result back to CodeSteer, which reviews it. If the answer is not correct, CodeSteer keeps prompting the LLM to try fixes, such as incorporating a search algorithm or a constraint into its Python code, until the answer is correct.
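Putting those steps together, the interaction can be sketched as a simple loop. Everything here is a hedged reconstruction from the description above: decide_mode, review, and refine_guidance are hypothetical stand-ins for the smaller CodeSteer model's prompting logic, and Solver for the larger LLM's interface; they are not the authors' code.

```python
# A minimal sketch of the steering loop, under stated assumptions: the
# Steerer and Solver protocols are hypothetical interfaces standing in for
# CodeSteer (the smaller LLM) and the larger LLM, respectively.
from typing import Protocol

class Verdict(Protocol):
    correct: bool
    feedback: str

class Steerer(Protocol):
    def decide_mode(self, query: str) -> str: ...
    def review(self, query: str, guidance: str, answer: str) -> Verdict: ...
    def refine_guidance(self, query: str, answer: str, feedback: str) -> str: ...

class Solver(Protocol):
    def answer(self, query: str, guidance: str) -> str: ...

def codesteer_loop(query: str, solver: Solver, steerer: Steerer,
                   max_rounds: int = 8) -> str:
    # 1. Decide whether text or code suits the query, and what kind of code.
    guidance = steerer.decide_mode(query)   # e.g. "write Python that uses search"
    answer = ""
    for _ in range(max_rounds):
        # 2. The larger LLM follows the guidance prompt to answer the query.
        answer = solver.answer(query, guidance)
        # 3. CodeSteer reviews the current answer.
        verdict = steerer.review(query, guidance, answer)
        if verdict.correct:
            return answer
        # 4. Otherwise, steer toward a fix, e.g. add a search algorithm or a
        #    constraint to the generated Python code, and try another round.
        guidance = steerer.refine_guidance(query, answer, verdict.feedback)
    return answer  # best attempt once the round budget is exhausted
```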
The Benefits of CodeSteer
The researchers found that augmenting a larger LLM with CodeSteer boosted its accuracy on symbolic tasks, like multiplying numbers, playing Sudoku, and stacking blocks, by more than 30 percent. It also enabled less sophisticated models to outperform more advanced models with enhanced reasoning skills. This advance could improve the problem-solving capabilities of LLMs for complex tasks that are especially difficult to solve with textual reasoning alone.
Tackling Complex Tasks
As the researchers designed CodeSteer, they couldn’t find suitable symbolic datasets to fine-tune and test the model, since many existing benchmarks don’t point out whether a certain query could be best solved with text or code. So, they gathered a corpus of 37 complex symbolic tasks, including spatial reasoning, mathematics, order reasoning, and optimization, and built their own dataset, called SymBench. They implemented a fine-tuning approach that leverages SymBench to maximize the performance of CodeSteer.
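The article does not describe SymBench's exact format, but conceptually each entry pairs a symbolic query with information about how it is best solved. A purely illustrative record, with assumed field names rather than the real schema:

```python
# Purely illustrative -- the real SymBench schema is not described here;
# these fields are assumptions sketching what such an entry would need.
example_task = {
    "task": "blocksworld_stacking",     # one of the 37 symbolic task types
    "category": "spatial reasoning",    # spatial, math, order reasoning, optimization
    "query": "Stack the blocks so that A ends up on B and B on C ...",
    "preferred_approach": "code",       # whether code or text tends to solve it best
    "reference_answer": "...",          # used to score the steered model's output
}
```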
Results and Future Directions
In their experiments, CodeSteer outperformed all nine baseline methods they evaluated and boosted average accuracy from 53.3 percent to 86.4 percent. It maintained similar performance on unseen tasks and across a variety of LLMs. In addition, a general-purpose model augmented with CodeSteer achieved higher accuracy than state-of-the-art models designed for complex reasoning and planning, while requiring much less computation. The researchers now want to streamline CodeSteer to speed up its iterative prompting, and to study how to effectively fine-tune a single unified model that can switch between textual reasoning and code generation on its own.
Conclusion
CodeSteer represents a significant advancement in the development of large language models. By guiding LLMs to switch between code and text generation, CodeSteer can improve the accuracy of these models on complex tasks. This technology has the potential to enhance the problem-solving capabilities of LLMs and enable them to tackle a wide range of tasks that are currently challenging for them. As the field of artificial intelligence continues to evolve, the development of tools like CodeSteer will play an important role in unlocking the full potential of LLMs.