Researchers from Google DeepMind, Stanford University, and the University of California, Berkeley have developed Chain of Code (CoC), which addresses the problem of improving the code-driven reasoning of language models. Chain of Code encourages LMs to format semantic sub-tasks in a program as flexible pseudocode so that the interpreter can explicitly catch undefined behaviors and hand them off to an LM to simulate (as an "LMulator"). CoC scales well with both large and small models and broadens the scope of reasoning questions LMs can correctly answer by thinking in code.
Works like Chain of Thought, least-to-most prompting, and ScratchPad have leveraged prompting to improve reasoning by breaking tasks down into intermediate steps or maintaining a trace of intermediate results. LMs trained on GitHub have been prompted to write and execute code, which helps solve complex questions involving numeric or symbolic reasoning.
To solve a given problem, CoC generates reasoning substeps in the structure of code. This code provides the framework for reasoning through the problem and may take the form of explicit code, pseudocode, or natural language. CoC enables code use in entirely new regimes by combining the advantages of code with the powerful semantic and commonsense knowledge of LMs, which can easily express rules that are challenging to express in code (e.g., which foods are fruits?).
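As a rough illustration (not taken from the paper's actual prompts), a CoC-style reasoning trace for a question like "I have an apple, a pencil, a banana, and a stapler. How many fruits do I have?" might interleave ordinary executable Python with a semantic helper such as `is_fruit`, a hypothetical call that no interpreter can resolve and that would therefore be simulated by the LM:

```python
# Hypothetical CoC-style reasoning trace (illustrative only).
items = ["apple", "pencil", "banana", "stapler"]  # executable: Python runs this

fruit_count = 0
for item in items:
    # is_fruit is pseudocode: it is undefined, so the interpreter raises a
    # NameError and the LM (the "LMulator") simulates its result instead.
    if is_fruit(item):
        fruit_count += 1

answer = fruit_count  # expected program state after simulation: answer == 2
```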
A core contribution of CoC is not just the generation of reasoning code but how it is executed. Once the code is written, a code interpreter attempts to run it; in this work, the researchers use Python, but the approach is general to any interpreter. If the code executes successfully, the program state is updated and execution continues. If the code is not executable or raises an exception, the language model is used instead to simulate the execution; its outputs update the program state, and execution continues.
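The control flow can be sketched as a simple execute-or-simulate loop. The snippet below is a minimal sketch under stated assumptions, not the paper's implementation: `llm_simulate` is a hypothetical helper that asks a language model to predict the updated program state for a line it cannot run.

```python
# Minimal sketch of the execute-or-simulate loop described above.
# `llm_simulate` is a hypothetical callable that takes a line of code and the
# current program state and returns the variable updates the LM predicts.

def run_chain_of_code(lines, llm_simulate):
    state = {}  # program state shared across lines
    for line in lines:
        try:
            # Try the real interpreter first (Python here, but the idea
            # generalizes to any interpreter).
            exec(line, {}, state)
        except Exception:
            # Non-executable code or undefined behavior: fall back to the LM,
            # which supplies the new values of any variables the line sets.
            state.update(llm_simulate(line, state))
    return state
```

In the paper's setting the interleaving is managed at the granularity of the maintained program state rather than this naive line-by-line loop; the sketch only conveys the general control flow of deferring to the LM whenever the interpreter fails.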
The overall performance of CoC surpasses other methods, exceeding the human baseline both in the number of tasks on which it does so and in the overall margin by which it does. CoC achieves state-of-the-art performance in several studies. Its performance improves as model size increases, similar to Chain of Thought prompting. Cross-task prompting results in a drop in performance for all methods, but CoC still outperforms Chain of Thought and direct prompting at scale, approaching average human performance.
CoC is an approach to reasoning with language models by writing code and executing it either with an interpreter or with a language model that simulates the execution when the code is not executable. CoC can thus leverage both the expressive structure of code and its powerful tooling. Beyond this, by simulating the execution of non-executable code, CoC can apply to problems nominally outside the scope of code (e.g., semantic reasoning problems).
Check out the Paper and Project. All credit for this research goes to the researchers of this project.