
MIT Researchers Introduce LILO: A Neuro-Symbolic Framework for Learning Interpretable Libraries for Program Synthesis

Nov 8, 2023


Large language models (LLMs) are becoming increasingly skilled at programming in various contexts, such as completing partially written code, collaborating with human programmers, and even solving competition-level programming problems. Software developers, however, are less interested in finishing the task at hand than in building libraries that can solve entire problem domains. To this end, refactoring, the skill of finding abstractions that make a codebase more legible (intuitive to other programmers), reusable (generalizing to new tasks), and compact (consolidating shared structure), is a crucial component of software development. Addressing this multi-objective optimization through library learning will require expanding the capabilities of today's code completion tools, which millions of programmers already use.
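To make the refactoring objective concrete, here is a minimal, hypothetical Python sketch (not from the paper, though it echoes the 2D Logo graphics domain discussed later): two task-specific solutions are consolidated into one reusable abstraction.

```python
# Before refactoring: two task solutions duplicate the same loop structure.
def draw_square(t, size):
    for _ in range(4):
        t.forward(size)
        t.left(90)

def draw_triangle(t, size):
    for _ in range(3):
        t.forward(size)
        t.left(120)

# After refactoring: one abstraction consolidates the shared structure,
# making the library more compact and each caller more legible and reusable.
def draw_polygon(t, size, n_sides):
    for _ in range(n_sides):
        t.forward(size)
        t.left(360 / n_sides)

def draw_square_refactored(t, size):
    return draw_polygon(t, size, 4)

def draw_triangle_refactored(t, size):
    return draw_polygon(t, size, 3)
```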

In this study, the researchers integrate language models with recent algorithmic advances in automatic refactoring from the programming languages (PL) literature to learn libraries of reusable function abstractions. Researchers from MIT CSAIL, MIT Brain and Cognitive Sciences, and Harvey Mudd College present LILO, a neurosymbolic framework for Library Induction from Language Observations comprising three interrelated modules (Fig. 1; a sketch of the full loop follows the list):

• A dual-system synthesis module that searches for solutions to programming tasks in two complementary ways: LLM-guided search introduces strong domain-general priors into the system, while enumerative search discovers domain-specific expressions

• A compression module that uses STITCH, a high-performance symbolic compression system, to identify useful abstractions in the current set of solutions

• An auto-documentation (AutoDoc) module that generates human-readable function names and docstrings, improving interpretability and aiding subsequent LLM-guided search.
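The following Python sketch shows how these three modules could fit together in one iteration. It is a minimal illustration based on the description above; the function names and signatures (llm_search, enum_search, compress, autodoc) are hypothetical stand-ins, not the authors' actual API.

```python
from dataclasses import dataclass

@dataclass
class Abstraction:
    body: str            # λ-abstraction body, e.g. "(lambda (x) ...)"
    name: str = ""       # human-readable name filled in by AutoDoc
    docstring: str = ""  # one-line description filled in by AutoDoc

def lilo_iteration(tasks, library, llm_search, enum_search, compress, autodoc):
    """One round of LILO's search-compress-document loop (illustrative)."""
    # (A) Dual-system synthesis: LLM-guided search brings strong
    # domain-general priors; enumerative search catches domain-specific
    # expressions the LLM misses.
    solutions = []
    for task in tasks:
        program = llm_search(task, library) or enum_search(task, library)
        if program is not None:
            solutions.append(program)

    # (B) Compression: a STITCH-like pass refactors structure shared
    # across solutions into new λ-abstractions.
    new_abstractions = compress(solutions)

    # (C) AutoDoc: the LLM names and documents each abstraction, which
    # improves interpretability and seeds the next round of LLM search.
    for abstraction in new_abstractions:
        abstraction.name, abstraction.docstring = autodoc(abstraction)

    library.extend(new_abstractions)
    return solutions, library
```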

Their design builds on DreamCoder, an iterative wake-sleep algorithm that alternates between searching for solutions to programming tasks (the wake phase) and refactoring shared abstractions into a library (the sleep phase), which in turn helps direct the search. Unlike conventional deep learning techniques, DreamCoder can draw strong generalizations from a small number of examples, and the learned library symbolically represents the model's conceptual knowledge. However, DreamCoder's search procedure is so computationally demanding that learning a single domain takes upwards of two CPU-months.

Figure 1: Overview of the LILO learning loop. (A) LILO synthesizes programs from natural-language task descriptions using a dual-system search model. LILO refactors the set of program solutions with the STITCH compression system (B) and combines it with LLM-generated auto-documentation (C) to produce an interpretable library of λ-abstractions. This search-compress-document cycle simplifies the structure of program solutions (A vs. D), making increasingly difficult tasks solvable in subsequent iterations.

A significant portion of this search time is spent "getting off the ground": discovering a foundational set of abstractions that programmers either already know or could grasp quickly from prior domain-specific problem-solving experience. Moreover, DreamCoder libraries are not always interpretable; deciphering them requires domain expertise and an understanding of lambda calculus. To tackle these problems, LILO uses LLMs in two novel ways: (1) to find program solutions more efficiently during search, and (2) to document the learned libraries so they are easier to understand (a hypothetical sketch of such a documentation step follows). The authors compare LILO to a language-guided DreamCoder equivalent on three challenging program synthesis domains: string editing with regular expressions, scene reasoning on the CLEVR dataset, and graphics composition in the 2D Logo turtle graphics language.
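The sketch below shows what an AutoDoc-style step could look like. The prompt wording and the helpers (make_autodoc_prompt, auto_document, llm_complete) are assumptions for illustration; the actual prompts and interfaces in LILO differ.

```python
# Hypothetical AutoDoc-style prompt builder; llm_complete stands in for
# any text-completion call and is passed in by the caller.
def make_autodoc_prompt(abstraction_body, usage_examples):
    examples = "\n".join(usage_examples)
    return (
        "You are documenting a library of learned program abstractions.\n"
        f"Anonymous λ-abstraction:\n{abstraction_body}\n\n"
        f"Example usages in solved tasks:\n{examples}\n\n"
        "Reply with a short snake_case function name on the first line "
        "and a one-line docstring on the second."
    )

def auto_document(abstraction_body, usage_examples, llm_complete):
    """Ask the LLM for a readable name and docstring (hypothetical helper)."""
    reply = llm_complete(make_autodoc_prompt(abstraction_body, usage_examples))
    name, _, docstring = reply.partition("\n")
    return name.strip(), docstring.strip()
```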

Compared to DreamCoder, LILO solves more tasks across all three domains and learns empirically richer libraries, containing abstractions that are impossible to discover with existing techniques. For example, LILO learns the concept of a vowel, a crucial abstraction in the string editing domain because it eliminates the need to search over 26⁵ possible disjunctions of character primitives. Unlike LLM-only baselines, which can solve comparable tasks, LILO compresses this knowledge into symbolic abstractions that are useful to both conventional search techniques and LLM-guided synthesis. The AutoDoc module is essential to this neurosymbolic integration: it improves interpretability and helps the LLM synthesizer make better use of the library. As the latest development in a long line of work on inductive program synthesis, LILO shows how ideas and tools from the PL community can be combined with recent advances in language modeling.
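To make the search-space claim concrete: enumerating disjunctions of five character primitives drawn from a 26-letter alphabet yields 26⁵ = 11,881,376 candidates, whereas the learned abstraction is a single reusable predicate. A minimal illustration in Python (hypothetical, not the paper's λ-calculus library):

```python
# Without the abstraction: disjunctions of 5 character primitives from a
# 26-letter alphabet give 26**5 candidates to search over.
print(26 ** 5)  # 11881376

# With the learned abstraction: one reusable predicate.
def vowel(c):
    """Return True if c is a lowercase English vowel."""
    return c in "aeiou"

assert vowel("a") and not vowel("b")
```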


Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don’t forget to join our 32k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter.

We are also on Telegram and WhatsApp.

The post MIT Researchers Introduce LILO: A Neuro-Symbolic Framework for Learning Interpretable Libraries for Program Synthesis appeared first on MarkTechPost.


#AIShorts #Applications #ArtificialIntelligence #EditorsPick #LanguageModel #LargeLanguageModel #MachineLearning #Staff #TechNews #Technology #Uncategorized
[Source: AI Techpark]
