LPR: Large Language Models-Aided Program Reduction
Mengxiao Zhang, Yongqiang Tian, Zhenyang Xu, and 3 more authors
In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2024, Vienna, Austria, September 16-20, 2024
Program reduction strives to eliminate bug-irrelevant code elements from a bug-triggering program, so that (1) a smaller and more straightforward bug-triggering program can be obtained, and (2) the difference among duplicates (i.e., different programs that trigger the same bug) can be minimized or even eliminated. With such reduction and canonicalization functionality, program reduction facilitates debugging for software, especially language toolchains such as compilers, interpreters, and debuggers. While many program reduction techniques have been proposed, most of them (especially the language-agnostic ones) overlook the potential reduction opportunities hidden within tokens, so their reduction and canonicalization capabilities are significantly restricted. To fill this gap, we propose T-Rec, a fine-grained language-agnostic program reduction technique guided by lexical syntax. Instead of treating tokens as atomic and irreducible components, T-Rec introduces a fine-grained reduction process that leverages the lexical syntax of programming languages to effectively explore the reduction opportunities within tokens. Through comprehensive evaluations on versatile benchmark suites, we demonstrate that T-Rec significantly improves the reduction and canonicalization capability of two existing language-agnostic program reducers (i.e., Perses and Vulcan). T-Rec enables Perses and Vulcan to further eliminate 1,294 and 1,315 duplicates, respectively, in a benchmark suite that contains 3,796 test cases triggering 46 unique bugs. Additionally, T-Rec reduces up to 65.52% and 53.73% of the bytes in the results of Perses and Vulcan, respectively, on our multi-lingual benchmark suite.
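To make the token-level idea concrete, the following is a minimal sketch (not T-Rec's actual algorithm) of character-level reduction within a single token. The helper names (still_triggers_bug, the oracle command, the toy lexical patterns) are illustrative: a shrunken token is kept only if it still matches a lexical pattern and the reassembled program still triggers the bug.

```python
import re
import subprocess
import tempfile

# Toy lexical syntax: a shrunken token is kept only if it still matches one of
# these patterns, i.e., it remains a well-formed identifier, number, or string.
TOKEN_PATTERNS = [re.compile(p) for p in (r"[A-Za-z_]\w*", r"\d+", r'"[^"]*"')]

def is_valid_token(tok: str) -> bool:
    return any(p.fullmatch(tok) for p in TOKEN_PATTERNS)

def still_triggers_bug(program: str, oracle_cmd: list[str]) -> bool:
    """Run the user-supplied interestingness test (e.g., a script that compiles
    the program and checks for the original crash). Exit code 0 = bug preserved."""
    with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
        f.write(program)
        path = f.name
    return subprocess.run(oracle_cmd + [path]).returncode == 0

def reduce_token(tokens: list[str], idx: int, oracle_cmd: list[str]) -> list[str]:
    """Greedily drop single characters from tokens[idx] while the token stays
    lexically valid and the reassembled program still triggers the bug."""
    tok = tokens[idx]
    i = 0
    while i < len(tok) and len(tok) > 1:
        candidate = tok[:i] + tok[i + 1:]
        trial = tokens[:idx] + [candidate] + tokens[idx + 1:]
        if is_valid_token(candidate) and still_triggers_bug(" ".join(trial), oracle_cmd):
            tok = candidate          # keep the smaller token
        else:
            i += 1                   # this character is necessary; move on
    return tokens[:idx] + [tok] + tokens[idx + 1:]
```

A coarse-grained reducer would treat each token as atomic and stop once no whole token can be removed; a loop like the one above can keep shrinking long identifiers, literals, and strings past that point.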
2023
ASPLOS ’23
Compilation Consistency Modulo Debug Information
Theodore Luo Wang, Yongqiang Tian, Yiwen Dong, and 2 more authors
In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, Vancouver, BC, Canada, Aug 2023
Compilation Consistency Modulo Debug Information (CCMD) is an essential property that a production compiler should support: the compiler should emit the same machine code regardless of whether debug information is enabled. CCMD is vital to developers’ experience when debugging a production binary containing no debug information. To debug such a binary, developers need to build another binary with the same compiler flags but with debug information enabled. Without CCMD, the machine code in the latter binary will be different, which can confuse the debugger, hide the bug, or even cause a miscompilation (as GCC once did with the Linux kernel). This paper is the first to introduce to the research community the validation of CCMD, a research problem that has been overlooked for decades despite its importance. More importantly, we propose Dfusor, the first testing technique to automatically validate CCMD for C compilers. At a high level, given a compilable program P as a seed, Dfusor automatically generates compilable program variants via multiple effective program transformations. Such variants can cause a compiler to emit more debug information than it would when compiling P, thus exercising more code paths in the compiler and increasing the chance of finding CCMD bugs. Our extensive evaluations demonstrate that Dfusor can produce variants that exhibit significant increases in the quantity and complexity of the emitted debug information, and it has found new, real bugs in GCC and LLVM. With a sample of 100 variants derived from distinct seed programs, Dfusor introduces 214% more debug information entries and 36% more distinct debug information entries in the variants than in the seeds, and improves the code coverage of GCC and Clang by up to 6.00% and 6.82%, respectively. More importantly, Dfusor has found CCMD bugs: within 10 months of development and intermittent testing, it has found 23 bugs (9 in GCC and 14 in Clang), with 3 confirmed and 18 fixed.
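As an illustration of what validating CCMD involves (a simplified differential check, not Dfusor itself), one can compile the same program with and without -g under otherwise identical flags and compare the resulting machine code, for example via textual disassembly. The compiler, flags, and output names below are assumptions for the sketch.

```python
import subprocess
import sys

def disassembly(obj_path: str) -> str:
    """Textual disassembly with raw instruction bytes suppressed, so only the
    instruction stream is compared."""
    out = subprocess.run(
        ["objdump", "-d", "--no-show-raw-insn", obj_path],
        capture_output=True, text=True, check=True)
    # Drop header lines that echo the object file's name.
    return "\n".join(l for l in out.stdout.splitlines() if obj_path not in l)

def check_ccmd(source: str, cc: str = "gcc", flags: tuple = ("-O2",)) -> bool:
    """Return True if compiling `source` with and without -g yields the same code."""
    subprocess.run([cc, *flags, "-c", source, "-o", "nodebug.o"], check=True)
    subprocess.run([cc, *flags, "-g", "-c", source, "-o", "debug.o"], check=True)
    return disassembly("nodebug.o") == disassembly("debug.o")

if __name__ == "__main__":
    src = sys.argv[1]
    print("CCMD holds" if check_ccmd(src)
          else "CCMD violated: -g changed the emitted machine code")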
OOPSLA ’23
Pushing the Limit of 1-Minimality of Language-Agnostic Program Reduction
Zhenyang Xu, Yongqiang Tian, Mengxiao Zhang, and 3 more authors
Program reduction has demonstrated its usefulness in facilitating the debugging of language implementations in practice, by minimizing bug-triggering programs. There are two categories of program reducers: language-agnostic program reducers (AGRs) and language-specific program reducers (SPRs). AGRs, such as HDD and Perses, are generally applicable to various languages; SPRs are specifically designed for one language with meticulous thought and significant engineering effort, e.g., C-Reduce for reducing C/C++ programs. Program reduction is an NP-complete problem: finding the globally minimal program is usually infeasible. Thus, all existing program reducers resort to producing 1-minimal results, a special type of local minima. However, 1-minimal results can still be large and contain excessive bug-irrelevant program elements. This is especially the case for AGR-produced results because of the generic reduction algorithms used in AGRs. An SPR often yields smaller results than AGRs for the language for which it has customized reduction algorithms. But SPRs are not language-agnostic, and implementing a new SPR for a different language requires significant engineering effort. This paper proposes Vulcan, a language-agnostic framework that further minimizes AGR-produced results by exploiting the formal syntax of the language to perform aggressive program transformations, in the hope of creating reduction opportunities for other reduction algorithms to make progress or even directly deleting bug-irrelevant elements from the results. Our key insights are two-fold. First, the program transformations in all existing program reducers, including SPRs, are not diverse enough, which traps these program reducers early in 1-minimality. Second, compared with the original program, the results of AGRs are much smaller, and time-wise it is affordable to perform diverse program transformations that change programs but do not necessarily reduce their sizes directly. Within the Vulcan framework, we propose three simple examples of fine-grained program transformations to demonstrate that Vulcan can indeed further push the 1-minimality of AGRs. By performing these program transformations, a 1-minimal program may become non-1-minimal and thus reducible again. Our extensive evaluations on multilingual benchmarks including C, Rust, and SMT-LIBv2 programs strongly demonstrate the effectiveness and generality of Vulcan. Vulcan outperforms the state-of-the-art language-agnostic program reducer Perses in result size on all benchmarks: on average, the result of Vulcan contains 33.55%, 21.61%, and 31.34% fewer tokens than that of Perses on C, Rust, and SMT-LIBv2 subjects, respectively. Vulcan can produce even smaller results if more reduction time is allocated. Moreover, for C programs already reduced by C-Reduce, Vulcan is able to further minimize them by 10.07%.
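The interplay between diversifying transformations and an off-the-shelf reducer can be summarized by a loop like the sketch below. All names (transformations, reduce, still_triggers_bug) are placeholders, not Vulcan's actual API; the point is only that size-neutral rewrites can reopen reduction opportunities for a 1-minimal result.

```python
def vulcan_style_loop(program, transformations, reduce, still_triggers_bug,
                      max_rounds=10):
    """Alternate diversifying transformations with ordinary reduction.

    `transformations` is a list of functions that rewrite the program without
    necessarily shrinking it (e.g., replacing an identifier or hoisting a local
    variable to a global); `reduce` is any existing 1-minimal reducer, such as
    a Perses-like pass. All names here are illustrative placeholders.
    """
    best = reduce(program)
    for _ in range(max_rounds):
        improved = False
        for transform in transformations:
            candidate = transform(best)
            # A transformation is only admitted if it preserves the bug.
            if candidate != best and still_triggers_bug(candidate):
                re_reduced = reduce(candidate)
                # The transformed program may no longer be 1-minimal,
                # so the reducer can make further progress on it.
                if len(re_reduced) < len(best):
                    best = re_reduced
                    improved = True
        if not improved:
            break   # fixed point under this set of transformations
    return best
```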
IJCAI ’23
Revisiting the Evaluation of Deep Learning-Based Compiler Testing
Yongqiang Tian, Zhenyang Xu, Yiwen Dong, and 2 more authors
In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th-25th August 2023, Macao, SAR, China
Mengxiao Zhang, Zhenyang Xu, Yongqiang Tian, and 2 more authors
In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023, San Francisco, CA, USA, December 3-9, 2023
Jia Le Tian, Mengxiao Zhang, Zhenyang Xu, and 3 more authors
In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023, San Francisco, CA, USA, December 3-9, 2023
The security of smart contracts has attracted increasing attention in recent years, and many researchers have devoted themselves to devising testing tools for vulnerability detection. Each published tool has demonstrated its effectiveness through evaluations in its own experimental setting. However, inconsistent evaluation settings, such as different datasets or performance metrics, may lead to biased conclusions. In this paper, based on an empirical evaluation of widely used smart contract testing tools, we propose a unified standard to eliminate bias in the assessment process. First, we collect 46,186 source-available smart contracts from four influential organizations. This comprehensive dataset is open to the public and covers different code characteristics, vulnerability patterns, and application scenarios. We then propose a four-step evaluation process and summarize how relevant work differs in each step. We use nine representative tools to carry out extensive experiments. The results demonstrate that different choices of experimental settings can significantly affect tool performance and lead to misleading or even opposite conclusions. Finally, we identify common problems of existing testing tools and propose possible directions for further improvement.
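One ingredient of such a unified standard is scoring every tool with the same metrics on the same labeled data. The sketch below (with illustrative field names, not the paper's artifact) computes precision, recall, and F1 for one tool against a shared ground-truth set.

```python
from dataclasses import dataclass

@dataclass
class Report:
    contract: str          # contract identifier
    vulnerability: str     # e.g., "reentrancy"

def score(tool_reports: list[Report], ground_truth: set[tuple[str, str]]):
    """Uniform precision/recall/F1 over one labeled dataset, so every tool is
    judged by the same metric. `ground_truth` holds (contract, vulnerability)
    pairs from the labeled benchmark; all names here are illustrative."""
    predicted = {(r.contract, r.vulnerability) for r in tool_reports}
    tp = len(predicted & ground_truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1
```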
TSE
Pluto: Exposing Vulnerabilities in Inter-Contract Scenarios
Fuchen Ma, Zhenyang Xu, Meng Ren, and 7 more authors
IEEE Transactions on Software Engineering, Apr 2021
Attacks on smart contracts have caused considerable losses of digital assets. Many techniques based on symbolic execution, fuzzing, and static analysis are used to detect contract vulnerabilities, but most current analyzers only consider vulnerability detection in intra-contract scenarios. However, Ethereum contracts usually interact with other contracts by calling their functions. A bug hidden in a path that depends on information from external contract calls is defined as an inter-contract vulnerability. Failing to deal with this kind of bug can result in false negatives and false positives. In this work, we propose Pluto, which supports vulnerability detection in inter-contract scenarios. It first builds an Inter-contract Control Flow Graph (ICFG) to extract semantic information among contract calls. Afterward, it symbolically explores the ICFG and deduces Inter-Contract Path Constraints (ICPC) to check the reachability of execution paths more accurately. Finally, Pluto detects whether there is a vulnerability based on predefined rules. For evaluation, we compare Pluto with five state-of-the-art tools, namely Oyente, Mythril, Securify, ILF, and Clairvoyance, on a labeled benchmark and 39,443 real-world Ethereum smart contracts. The results show that the other tools can detect only 10% of the inter-contract vulnerabilities on the labeled dataset, while Pluto can detect 80% of them. Beyond that, Pluto has detected 451 confirmed vulnerabilities in real-world contracts, including 36 vulnerabilities in inter-contract scenarios. Two bugs have been assigned unique CVE identifiers by the US National Vulnerability Database (NVD). On average, Pluto takes 16.9 seconds to analyze a contract, which is as fast as the state-of-the-art tools.
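To illustrate the ICFG idea on which the analysis rests (a toy sketch, not Pluto's implementation), per-contract control flow graphs can be merged by adding an edge from each external call site to the callee contract's entry block, so that path constraints can be propagated across contract boundaries. The input shapes and node naming scheme below are assumptions made for the sketch.

```python
from collections import defaultdict

def build_icfg(cfgs, call_sites):
    """Merge per-contract CFGs into one inter-contract CFG (ICFG).

    `cfgs` maps a contract name to its edge list of (block, successor_block);
    `call_sites` maps a (contract, block) holding an external CALL to the
    (callee_contract, callee_entry_block) it targets. Both inputs and the
    node naming scheme are illustrative, not Pluto's actual data structures.
    """
    icfg = defaultdict(set)
    # Intra-contract edges, with nodes qualified by contract name.
    for contract, edges in cfgs.items():
        for src, dst in edges:
            icfg[(contract, src)].add((contract, dst))
    # Inter-contract edges: connect each CALL site to the callee's entry block,
    # so constraints can flow across the call (the basis of inter-contract
    # path constraints).
    for (contract, block), (callee, entry) in call_sites.items():
        icfg[(contract, block)].add((callee, entry))
    return icfg
```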