

Top CPU Performance Benchmarking Toolkits You Should Know
Modern compute platforms - from cloud hyperscale CPUs to edge processors - deliver unprecedented parallelism and instruction-set capabilities. But to truly understand performance, you need the right benchmarking tools. Whether you're comparing cloud instances, evaluating Arm-based servers like Ampere, or validating x86, RISC-V, or AI-accelerated hardware, the ecosystem offers several battle-tested frameworks. In this blog, we explore the most widely used CPU benchmarking too

Rajeev Gadgil
Nov 3, 2025 · 2 min read


Unleashing Performance Insights on ARM: Bringing Intel's PerfSpect to the Entire Ecosystem
Performance analysis can often feel like searching for a needle in a haystack. When your application isn't running as fast as you'd like, where do you even begin to look? Is it a memory bottleneck? Are you stalling in the CPU's front-end? Answering these questions is critical, but traditional tools can be complex and overwhelming. This is where Intel's PerfSpect comes in. And now, thanks to some recent contributions, this powerful tool is no longer just for x86 systems. I'm h

Sameer Natu
Sep 15, 2025 · 2 min read


Benchmarking Meta Llama 4 Scout on CPU-Only Systems: Performance, Quantization, and Architecture Tuning
Meta’s Llama 4 Scout, released in April 2025, is a general-purpose mixture-of-experts language model with 17 billion active parameters that brings powerful reasoning to a broader range of applications, including those running without GPUs. This blog focuses on benchmarking Llama 4 Scout on CPU-only systems, covering: tokens per second; latency per token; prompt handling efficiency; quantization techniques; architecture-specific optimization for x86, ARM, and RISC-V (RV64); converting to GGUF format for efficient

Rajeev Gadgil
May 26, 2025 · 3 min read


GCP Cloud Performance: Time-Based Score Variations
In May 2022, one of our customers asked us to tune Elasticsearch with Esrally for cloud providers. We started by trying multiple combinations of manual runs on all cloud providers, collecting scaling runs with 2/4/8/16 cores. In that data, the scores did not scale proportionately with the core count. Hence, we decided to experiment with running the Elasticsearch Esrally benchmark throughout the day. As Esrally doesn’t run for a fixed duration, we carried out the r

Archana Barve
Apr 3, 2023 · 1 min read


Oracle Optimized BLIS Libraries for Ampere Altra Family
Basic Linear Algebra Subprograms (BLAS) and BLAS-like Library Instantiation Software (BLIS) are libraries that can accelerate mathematical operations on current CPU microarchitectures. As a part of the FLAME project, BLIS was introduced to handle the dense linear algebra software stack. The framework was designed to isolate essential kernels of computation that, when optimized, immediately enable optimized implementations of most of its commonly used and computationally intensive operatio

Rahul Bapat
Mar 19, 2023 · 1 min read

