Program Synthesis and Semantic Parsing with Learned Code Idioms

Program synthesis of general-purpose source code from natural language specifications is challenging due to the need to reason about high-level patterns in the target program and low-level implementation details at the same time. In this work, we present PATOIS, a system that allows a neural program synthesizer to explicitly interleave high-level and low-level reasoning at every generation step. It accomplishes this by automatically mining common code idioms from a given corpus, incorporating them into the underlying language for neural synthesis, and training a tree-based neural synthesizer to use these idioms during code generation. We evaluate PATOIS on two complex semantic parsing datasets and show that using learned code idioms improves the synthesizer's accuracy.
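To make the idea of mining idioms from a corpus concrete, the following minimal sketch counts shallow syntax-tree fragments that recur across a handful of Python snippets. It is a deliberately simplified stand-in and not the mining procedure PATOIS uses: the functions `fragment_counts` and `mine_idioms`, the `min_count` threshold, and the toy corpus are all hypothetical, and a "fragment" here is only a node type together with the types of its direct children.

```python
import ast
from collections import Counter


def fragment_counts(sources):
    """Count depth-1 AST fragments: (node type, tuple of direct child types)."""
    counts = Counter()
    for src in sources:
        tree = ast.parse(src)
        for node in ast.walk(tree):
            children = tuple(
                type(child).__name__ for child in ast.iter_child_nodes(node)
            )
            if children:  # skip leaves, which carry no structure
                counts[(type(node).__name__, children)] += 1
    return counts


def mine_idioms(sources, min_count=2):
    """Return fragments seen at least `min_count` times, most frequent first.

    Frequency thresholding is an assumption made for illustration only.
    """
    counts = fragment_counts(sources)
    return [(frag, n) for frag, n in counts.most_common() if n >= min_count]


# Hypothetical toy corpus: two loops share the same shape, one snippet does not.
corpus = [
    "for i in range(n):\n    total += i\n",
    "for j in range(m):\n    acc += j\n",
    "x = 1\n",
]
idioms = mine_idioms(corpus)
for (parent, children), n in idioms:
    print(parent, children, n)
```

In this sketch the recurring `for` loop shape surfaces as a candidate idiom, which a synthesizer could then emit as a single step instead of rebuilding it node by node. A realistic miner would need deeper fragments with holes and a more principled selection criterion than raw frequency.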

Program synthesis is the task of translating an incomplete specification (e.g., natural language, input-output examples, or a combination of the two) into the most likely program that satisfies this specification in a given language. In the last decade, it has advanced dramatically thanks to novel neural and neuro-symbolic techniques, the first mass-market applications, and massive datasets. Most successful applications apply program synthesis to manually crafted domain-specific languages (DSLs) such as FlashFill and Karel, or to subsets of general-purpose functional languages such as SQL and Lisp. However, scaling program synthesis to real-life programs in a general-purpose language with complex control flow remains an open challenge.

