
Tuning Compiler Flags for Custom Hardware

  • Writer: Sayali Tamane
  • Jul 21
  • 2 min read

Benchmarking SPECint on FPGA

Introduction

With the growing interest in AI hardware for high-performance, power-efficient computing, understanding how industry-standard benchmarks perform on such platforms is critical. In this post, we focus on the SPECrate®2017 Integer workloads, a widely used CPU benchmark suite, and share a case study comparing two runs on an FPGA target: a base run and a tuned run that achieved better performance. We describe how the tuning and benchmarking procedure was executed, the challenges we faced, and what we learned from this hands-on analysis.


Why SPECint on FPGA?


SPECrate®2017 Integer evaluates the integer processing capabilities of a CPU through a set of compute-intensive, single-threaded programs run as multiple concurrent copies. Running these on an FPGA (with soft or hardened CPU cores) helps evaluate and tune how custom logic performs in realistic software scenarios, especially workloads like compilers, compression, and AI preprocessing.


Benchmarking Setup

  • Platform: FPGA

  • Emulation: Run on QEMU for pre-validation, native execution on FPGA target

  • Benchmark Suite: SPEC CPU2017

  • Cross-compilation: All benchmarks built using a target toolchain with specmake, applying -static, and a base set of flags

  • Base Run: No tuning; baseline compiler flags, minimal memory tuning

  • Optimized Run: Enhanced compiler flags, better memory layout, cache tuning
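To illustrate the difference between the two runs, a SPEC config file can keep the base and tuned flag sets in separate sections. The snippet below is a simplified sketch, not the exact configuration used in this study; the flags shown are generic GCC options chosen for illustration:

```
# Illustrative only -- flag sets and section syntax simplified
default=base:
  OPTIMIZE = -O2 -static

default=peak:
  OPTIMIZE = -O3 -static -flto -funroll-loops
```

Keeping both sets in one config file lets the same specmake-driven build produce comparable base and tuned binaries.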


Here’s how the benchmarking was carried out:


  • Cross-Compilation of SPECrate®2017 Integer Benchmarks

    • Ensured static linking for portability

    • Verified ELF binaries using file and readelf
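The file/readelf check above can also be scripted for many binaries at once. The sketch below reads the ELF header directly to confirm a file is an ELF binary and report its target machine; `check_elf` is a hypothetical helper (not part of the SPEC tooling), and it assumes a little-endian ELF, which covers the common FPGA soft-core targets:

```python
import struct

def check_elf(path):
    """Report whether a file is an ELF binary and which machine it
    targets, similar in spirit to `file` and `readelf -h` output."""
    with open(path, "rb") as f:
        hdr = f.read(20)
    if len(hdr) < 20 or hdr[:4] != b"\x7fELF":
        return (False, None)
    # e_machine is a u16 at byte offset 18; "<H" assumes little-endian ELF
    (machine,) = struct.unpack_from("<H", hdr, 18)
    return (True, machine)
```

The machine code (e.g. 0xF3 for RISC-V, 0xB7 for AArch64) confirms the cross-compiler produced binaries for the intended architecture.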


  • Execution with runspec

    • Invoked with runspec --config=target.cfg --tune=base --size=test,train,ref for initial testing
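When scripting many runs, the invocation above can be assembled by a small helper. The sketch below only builds the argument list; `runspec_cmd` is a hypothetical wrapper, and the exact flags accepted depend on your SPEC CPU2017 installation:

```python
def runspec_cmd(config, tune="base", sizes=("test", "train", "ref"),
                benchmarks=("intrate",)):
    """Build an argument list for a runspec invocation, e.g. to pass
    to subprocess.run() on the target or under QEMU emulation."""
    cmd = ["runspec", f"--config={config}", f"--tune={tune}",
           "--size=" + ",".join(sizes)]
    cmd.extend(benchmarks)
    return cmd
```

For example, `runspec_cmd("target.cfg", sizes=("test",))` yields the short pre-validation run, while the default sizes reproduce the full test/train/ref sequence.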


  • Data Collection

    • Captured runtime, SPEC score, and individual benchmark outputs

    • Tracked CPU MHz and instruction counts using perf or hardware counters
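Counter output from perf can be post-processed with a few lines of scripting, for example to compute instructions per cycle. The sample text and regex below are assumptions modeled on typical perf stat output; adjust them to your perf version's exact format:

```python
import re

def parse_perf_counters(text):
    """Extract 'value  event-name' pairs from perf-stat-style output."""
    counters = {}
    for line in text.splitlines():
        m = re.match(r"\s*([\d,]+)\s+(\S+)", line)
        if m:
            counters[m.group(2)] = int(m.group(1).replace(",", ""))
    return counters

# Hypothetical perf stat excerpt for one benchmark run
sample = """\
 1,234,567,890      cycles
 2,345,678,901      instructions
"""
c = parse_perf_counters(sample)
ipc = c["instructions"] / c["cycles"]  # instructions per cycle
```

Comparing IPC between the base and tuned runs helps separate compiler-flag gains from clock-frequency differences on the FPGA.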


  • Tricks

    • Use math models to reduce the run times of SPEC workloads

    • Get a sense for the test, train, and ref workloads and find a relation between them, so there is no need to run ref every time.
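The second trick above can be made concrete with a simple linear model: measure the ref/train runtime ratio once per benchmark from a full run, then extrapolate future ref runtimes from train runs alone. A minimal sketch, assuming runtime scales roughly linearly with input size (the ratio value below is hypothetical):

```python
def estimate_ref_seconds(train_seconds, ratio):
    """Extrapolate a ref-workload runtime from a train run using a
    per-benchmark ratio (ref_time / train_time) calibrated once from
    a full run. Valid only if runtime scales uniformly with input."""
    return train_seconds * ratio

# Calibrate once from a full run (value is hypothetical), then reuse:
ratios = {"505.mcf_r": 12.5}
estimate = estimate_ref_seconds(40.0, ratios["505.mcf_r"])
```

This keeps iteration fast during flag tuning, with a full ref run reserved for confirming the final configuration.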


