Tuning Compiler Flags for Custom Hardware
Benchmarking SPECint on FPGA: Introduction. With the growing interest in AI hardware for high-performance and power-efficient computing, understanding how industry-standard benchmarks perform on such platforms is critical. In this paper, we focus on SPECrate®2017 Integer workloads, a widely used CPU benchmark suite, and share a case study comparing two runs on an FPGA target: a base run and a tuned run that achieved better performance. This paper describes how the tuning

Sayali Tamane
Jul 21, 2025 · 2 min read


To get maximum tokens generated for target CPU
LLMs are Getting Better and Smaller. Let’s look at Llama as an example. The rapid evolution of these models highlights a key trend in AI: prioritizing efficiency and performance. When Llama 2 70B launched in August 2023, it was considered a top-tier foundational model. However, its massive size demanded powerful hardware like the NVIDIA H100 accelerator. Less than nine months later, Meta introduced Llama 3 8B, shrinking the model by almost 9x. This enabled it to run on smaller

Archana Barve
Jun 9, 2025 · 1 min read


Benchmarking Meta Llama 4 Scout on CPU-Only Systems: Performance, Quantization, and Architecture Tuning
Meta’s Llama 4 Scout, released in April 2025, is a general-purpose language model with 17 billion active parameters that brings powerful reasoning to a broader range of applications, including those running without GPUs. This blog focuses on benchmarking Llama 4 Scout on CPU-only systems, covering: tokens per second; latency per token; prompt handling efficiency; quantization techniques; architecture-specific optimization for x86, ARM, and RISC-V (RV64); converting to GGUF format for efficient

Rajeev Gadgil
May 26, 2025 · 3 min read


Cross-Compiling SPEC CPU2017 for RISC-V (RV64): A Practical Guide
SPEC CPU2017 is a well-known benchmark suite for evaluating CPU-intensive performance. Although it assumes native compilation and execution, there are cases—especially with RISC-V (RV64) platforms—where cross-compilation is the only feasible route. This guide walks through the steps to cross-compile SPEC CPU2017 for RISC-V, transfer the binaries to a target system, and optionally use the --fake option to simulate runs where execution isn't possible or needed during develop

Rajeev Gadgil
May 12, 2025 · 3 min read