RISC-V Fuzzing for GCC and LLVM
- Rajeev Gadgil

- Sep 8, 2025
Updated: Dec 3, 2025
Fuzzing compilers that target RISC-V, such as GCC and LLVM, is a crucial practice for ensuring the correctness and security of the entire software ecosystem built on this architecture. The goal is not to find vulnerabilities in a particular compiled program, but to discover bugs within the compiler itself that could lead to incorrect code generation, unexpected behavior, or even exploitable flaws in everything it compiles.
Why Compiler Fuzzing is a Unique Challenge
Fuzzing compilers is different from fuzzing a typical application. Instead of feeding random data to a program, you generate random source code that is still syntactically and semantically valid, and feed that to the compiler. A dumb fuzzer that just mutates bytes will mostly produce input that is rejected by the parser, so it rarely reaches the optimizer and code generator where the deeper bugs live.
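The difference can be sketched in a few lines of Python. This is only an illustration: the function names (mutate_bytes, gen_expr, gen_program) and the tiny expression grammar are assumptions made up for this example, and real generators such as csmith are far more sophisticated. Unsigned literals are used so that arithmetic wraps around instead of triggering undefined behavior.

```python
import random

SEED_PROGRAM = b"int main(void) { return (1 + 2) * 3; }\n"

def mutate_bytes(data: bytes, rng: random.Random, flips: int = 4) -> bytes:
    """'Dumb' mutation: overwrite a few random bytes with random values."""
    buf = bytearray(data)
    for _ in range(flips):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def gen_expr(rng: random.Random, depth: int = 3) -> str:
    """Grammar-aware generation: build a random but well-formed C expression."""
    if depth == 0 or rng.random() < 0.3:
        # Unsigned literal, so overflow wraps instead of being undefined.
        return f"{rng.randrange(1, 100)}u"
    op = rng.choice(["+", "-", "*", "&", "|", "^"])
    return f"({gen_expr(rng, depth - 1)} {op} {gen_expr(rng, depth - 1)})"

def gen_program(rng: random.Random) -> str:
    """Wrap the expression in a complete C program that prints its value."""
    return (
        "#include <stdio.h>\n"
        "int main(void) {\n"
        f"    unsigned x = {gen_expr(rng)};\n"
        '    printf("%u\\n", x);\n'
        "    return 0;\n"
        "}\n"
    )

if __name__ == "__main__":
    rng = random.Random(0)
    print(mutate_bytes(SEED_PROGRAM, rng))  # usually no longer valid C
    print(gen_program(rng))                 # always parses and compiles
```

The mutated bytes will typically fail in the lexer or parser, while every program from gen_program gets all the way through optimization and code generation.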
The primary goal of compiler fuzzing is to detect two main types of bugs:

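Crashes and Hangs: The fuzzer generates code that causes the compiler to crash, hang, or abort with an internal compiler error (ICE) during compilation. An ordinary diagnostic for invalid input is expected behavior, but a crash, hang, or ICE on any input indicates a compiler bug that needs to be fixed.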
Miscompilations: This is the most dangerous type of bug. The compiler successfully compiles the fuzzed code, but the generated machine code (the RISC-V assembly) is incorrect. This can lead to silent data corruption, security vulnerabilities, or unpredictable program behavior. Finding these requires a technique called differential fuzzing.
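Detecting the first category is mostly a matter of watching how the compiler process ends. Below is a minimal sketch, assuming the compiler is run as a subprocess. The Outcome names, the classify and compile_once helpers, and the 60-second budget are hypothetical choices for this example. The marker strings are phrases that, to the best of my knowledge, GCC and Clang print on internal errors, so treat them as a starting point rather than a complete list.

```python
import subprocess
from enum import Enum

class Outcome(Enum):
    OK = "ok"
    REJECTED = "rejected"  # ordinary diagnostic, not a compiler bug by itself
    CRASH = "crash"        # killed by a signal or reported an internal error
    HANG = "hang"          # exceeded the time budget

# Assumed crash markers; extend them for the compilers you actually test.
CRASH_MARKERS = ("internal compiler error", "PLEASE submit a bug report")

def classify(returncode: int, stderr: str) -> Outcome:
    """Map a finished compiler run to an outcome."""
    if returncode == 0:
        return Outcome.OK
    # On POSIX, a negative return code means the process died on a signal.
    if returncode < 0 or any(marker in stderr for marker in CRASH_MARKERS):
        return Outcome.CRASH
    return Outcome.REJECTED

def compile_once(cmd: list[str], timeout_s: float = 60.0) -> Outcome:
    """Run one compiler command line and classify how it ended."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, errors="replace",
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return Outcome.HANG
    return classify(proc.returncode, proc.stderr)
```

Miscompilations never show up here, because the compiler exits successfully. That is why they need the differential approach described next.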
The Power of Differential Fuzzing for RISC-V Compilers
Differential fuzzing is an exceptionally powerful technique for finding miscompilations in RISC-V compilers. Here's how it works:

A fuzzer, often one that generates valid C or C++ code (like csmith), creates a unique program.
This program is compiled by at least two different compilers (e.g., GCC and LLVM) or with different optimization flags (e.g., -O0 and -O3).
The compiled binaries are then executed, and their outputs are compared.
If the outputs don't match, and the program is deterministic and free of undefined behavior (which csmith is designed to guarantee), at least one of the builds has been miscompiled. The fuzzer then saves this specific source code as a test case for a developer to analyze.
In effect, the different compilers and optimization levels act as test oracles for one another, so bugs can be identified automatically without knowing the correct output beforehand. It's a key reason why so many compiler bugs have been found in both GCC and LLVM.
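The comparison step above can be sketched as follows. This is a simplified illustration, not the implementation used by any particular tool: differential_test, is_mismatch, and the run_config callback are hypothetical names, and the callback stands in for "compile with this configuration, run the binary, return its output". The sketch assumes the test program is deterministic and free of undefined behavior.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# run_config(label, source_path) builds the program with the configuration
# named by `label`, runs it, and returns its observable output as a string.
RunConfig = Callable[[str, str], str]

def differential_test(
    source_path: str,
    configs: Iterable[str],
    run_config: RunConfig,
) -> Dict[str, List[str]]:
    """Group configurations by the output their binary produced."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for label in configs:
        groups[run_config(label, source_path)].append(label)
    return dict(groups)

def is_mismatch(groups: Dict[str, List[str]]) -> bool:
    """More than one distinct output means some build disagrees."""
    return len(groups) > 1

if __name__ == "__main__":
    # Stand-in for real compile-and-run results, for demonstration only.
    fake_outputs = {
        "gcc-O0": "checksum = 1A\n",
        "gcc-O3": "checksum = 1A\n",
        "clang-O3": "checksum = 2B\n",
    }
    groups = differential_test(
        "case.c", fake_outputs, lambda label, _path: fake_outputs[label]
    )
    if is_mismatch(groups):
        print("mismatch, keep case.c for triage:", groups)
```

Grouping by output, rather than just reporting "different", shows which configurations agree with each other. That does not prove which side is wrong, but a lone outlier is usually the first place to look.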
Key Tools and Repositories for RISC-V Compiler Fuzzing
While general-purpose fuzzers such as AFL++ can be pointed at a compiler like any other program, specialized tools are often needed to generate valid inputs that actually exercise RISC-V-specific code generation.
csmith: A well-known random test-case generator for C programs. It creates complex, valid C code that avoids undefined behavior, which makes it well suited to differential testing of C compilers like GCC and LLVM. While not RISC-V-specific, it's an essential part of the workflow for fuzzing any C compiler's RISC-V backend.
GitHub Repo: https://github.com/csmith-project/csmith
RISCV-DV: An open-source random instruction generator originally developed at Google. It is primarily a tool for design verification of RISC-V processors, but the complex instruction sequences it generates can also be used to exercise the rest of the toolchain. It's highly configurable and can target specific ISA extensions.
GitHub Repo: https://github.com/google/riscv-dv
IRFuzzer: A specialized fuzzer for the LLVM backend. Instead of generating C/C++ source code, it generates LLVM's Intermediate Representation (IR), allowing it to directly test the backend code generation without worrying about frontend bugs. This is a very targeted approach for finding issues in LLVM's RISC-V code generator.
GitHub: IRFuzzer is a research tool, so resources are often found on arXiv and university websites. Searching for "IRFuzzer" on GitHub will lead to related projects.
RISCV-Vector-Intrinsic-Fuzzing (RIF): A specific fuzzer designed to generate random code using the RISC-V Vector Extension (RVV) intrinsics. This is crucial for verifying that compilers like GCC and LLVM correctly implement this complex and performance-critical part of the RISC-V ISA.
Patrick-rivos/compiler-fuzz-ci: This GitHub repository provides a great example of a Continuous Integration (CI) setup for fuzzing RISC-V compilers. It demonstrates how to combine tools like csmith with a CI pipeline to automatically fuzz GCC and LLVM and report bugs.
GitHub Repo: https://github.com/patrick-rivos/compiler-fuzz-ci
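To give a feel for what a csmith-based workflow involves, here is a rough sketch of the command lines for one fuzzing iteration. It is not taken from any of the repositories above. The include path, the cross-compiler name, the target triple, and the use of qemu-riscv64 with static linking are all assumptions about a typical local setup, so adjust them to your installation. The helper names are hypothetical as well.

```python
from pathlib import Path

# Assumed locations and tool names; change these for your environment.
CSMITH_INCLUDE = "/opt/csmith/include"   # hypothetical directory holding csmith.h
GCC = "riscv64-unknown-linux-gnu-gcc"    # a common name for a RISC-V cross GCC
CLANG_TARGET = "--target=riscv64-unknown-linux-gnu"

def csmith_cmd(seed: int, out: Path) -> list[str]:
    """Generate one random C program; the seed makes the case reproducible."""
    return ["csmith", "--seed", str(seed), "--output", str(out)]

def compile_cmds(src: Path, opt_levels=("-O0", "-O3")) -> dict[str, list[str]]:
    """One command line per compiler and optimization level."""
    cmds: dict[str, list[str]] = {}
    for opt in opt_levels:
        common = [opt, "-march=rv64gc", "-static", "-w",
                  f"-I{CSMITH_INCLUDE}", str(src)]
        cmds[f"gcc{opt}"] = [GCC, *common, "-o", f"{src.stem}.gcc{opt}"]
        cmds[f"clang{opt}"] = ["clang", CLANG_TARGET, *common,
                               "-o", f"{src.stem}.clang{opt}"]
    return cmds

def run_cmd(binary: str) -> list[str]:
    """Run a statically linked RISC-V binary under user-mode QEMU."""
    return ["qemu-riscv64", binary]

if __name__ == "__main__":
    src = Path("case_42.c")
    print(" ".join(csmith_cmd(42, src)))
    for label, cmd in compile_cmds(src).items():
        print(label, "->", " ".join(cmd))
```

Recording the seed alongside each saved test case makes a failure easy to regenerate later, which matters when a CI job runs thousands of iterations unattended.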
Fuzzing RISC-V compilers is an ongoing and critical effort. It helps ensure that the toolchain developers build on is reliable and secure, and that their code is correctly translated for the underlying hardware, strengthening the entire RISC-V ecosystem.




