Unleashing Performance Insights on ARM: Bringing Intel's PerfSpect to the Entire Ecosystem
- Sameer Natu

- Sep 15
Updated: Nov 21
Performance analysis can often feel like searching for a needle in a haystack. When your application isn't running as fast as you'd like, where do you even begin to look? Is it a memory bottleneck? Are you stalling in the CPU's front-end? Answering these questions is critical, but traditional tools can be complex and overwhelming.
This is where Intel's PerfSpect comes in. Thanks to some recent contributions, this powerful tool is no longer limited to x86 systems. I'm happy to share how I natively compiled PerfSpect on the ARM architecture, enabling deep performance analysis on platforms like the Neoverse series, with processors such as Ampere, AWS Graviton, Google Axion, NVIDIA Grace, and Microsoft Cobalt.
Why PerfSpect? A Simpler Path to Performance Insights
PerfSpect is a lightweight, command-line performance analysis tool. Its primary strength lies in its use of the Top-Down Microarchitecture Analysis (TMA) methodology.
Instead of drowning you in hundreds of raw performance counters, TMA provides a structured, hierarchical way to identify the primary bottleneck in your system. It breaks down CPU cycles into a few high-level categories:

- Front-End Bound: The CPU isn't getting instructions fast enough.
- Back-End Bound: Instructions are available, but the execution units are stalled. This is further broken down into:
  - Core Bound: The computation units are the bottleneck.
  - Memory Bound: The CPU is waiting on data from memory or caches.
- Retiring: The CPU is successfully executing instructions. This is the "good" category.
- Bad Speculation / Miss: The CPU wasted work on instructions that were ultimately discarded (e.g., due to branch misprediction).
By presenting performance data through this lens, PerfSpect makes it much easier to identify the nature of your bottleneck and shows where to focus your optimization efforts.
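To make the hierarchy concrete, here is a minimal Python sketch of how a top-down reading works: normalize the four top-level categories into shares, then pick the largest non-Retiring share as the place to look first. The function name and the slot counts are hypothetical and made up for illustration; this is not PerfSpect's code or output format.

```python
from typing import Dict, Tuple


def dominant_bottleneck(slots: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Normalize per-category slot counts and return the largest stall category.

    `slots` maps a top-level TMA category name to a count of pipeline slots.
    "Retiring" is useful work, so it is excluded when picking the bottleneck.
    """
    total = sum(slots.values())
    if total <= 0:
        raise ValueError("slot counts must sum to a positive number")
    shares = {name: count / total for name, count in slots.items()}
    stalls = {name: share for name, share in shares.items() if name != "Retiring"}
    if not stalls:
        raise ValueError("no stall categories supplied")
    return max(stalls, key=stalls.get), shares


if __name__ == "__main__":
    # Illustrative numbers only, not measured on any real machine.
    example = {
        "Front-End Bound": 150,
        "Back-End Bound": 450,
        "Retiring": 300,
        "Bad Speculation": 100,
    }
    name, shares = dominant_bottleneck(example)
    for category, share in shares.items():
        print(f"{category:16s} {share:.0%}")
    print("Look first at:", name)
```

With numbers like these, Back-End Bound dominates, so the next step would be to check whether the stalls are Core Bound or Memory Bound before changing any code.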
The Competitive Landscape: How Does PerfSpect Compare?
PerfSpect doesn't exist in a vacuum. The Linux ecosystem is rich with powerful profiling tools like perf, Intel VTune Profiler, and AMD uProf.
PerfSpect's unique value is its combination of simplicity, structured TMA methodology, and now, cross-architecture support. It provides actionable insights without the steep learning curve of raw perf or the complexity of a full-blown GUI profiler.
My Contribution: Native Support for ARM
My primary contribution was to port PerfSpect, enabling it to build and run natively on ARMv8/ARMv9 architectures.
This involved mapping the ARM Performance Monitoring Unit (PMU) events to the TMA categories, allowing the same intuitive reporting to work seamlessly on platforms from Ampere, Amazon, and Microsoft. Now, developers can use a single, familiar tool to analyze workloads across different server fleets.
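As a rough illustration of what such a mapping can look like, the Python sketch below derives the four top-level categories from a handful of Arm PMU events (CPU_CYCLES, STALL_SLOT, STALL_SLOT_FRONTEND, STALL_SLOT_BACKEND, OP_SPEC, OP_RETIRED). It is a simplified approximation under stated assumptions, not the mapping used in the port: the slot width per cycle differs between cores and is passed in as a parameter, the formulas used in practice typically add corrections (for example for branch mispredictions), and the `PmuSample` type and counter values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PmuSample:
    """Raw counts for one measurement interval (field names mirror Arm PMU events)."""
    cpu_cycles: int
    stall_slot: int
    stall_slot_frontend: int
    stall_slot_backend: int
    op_spec: int
    op_retired: int


def topdown_level1(sample: PmuSample, slots_per_cycle: int) -> Dict[str, float]:
    """Approximate the top-level breakdown as fractions of all pipeline slots.

    Assumption: `slots_per_cycle` is the core's slot width and must be
    looked up per core; it is not discovered by this sketch.
    """
    total_slots = sample.cpu_cycles * slots_per_cycle
    if total_slots <= 0 or sample.op_spec <= 0:
        raise ValueError("cycle and speculative-op counts must be positive")

    frontend = sample.stall_slot_frontend / total_slots
    backend = sample.stall_slot_backend / total_slots
    # Slots that were not stalled did some work; split that work into
    # operations that retired and operations that were thrown away.
    busy = 1.0 - sample.stall_slot / total_slots
    retired_ratio = sample.op_retired / sample.op_spec

    return {
        "Front-End Bound": frontend,
        "Back-End Bound": backend,
        "Retiring": busy * retired_ratio,
        "Bad Speculation": busy * (1.0 - retired_ratio),
    }


if __name__ == "__main__":
    # Made-up counts and a made-up slot width of 4, for illustration only.
    sample = PmuSample(
        cpu_cycles=1000,
        stall_slot=2000,
        stall_slot_frontend=800,
        stall_slot_backend=1200,
        op_spec=2500,
        op_retired=2000,
    )
    for category, share in topdown_level1(sample, slots_per_cycle=4).items():
        print(f"{category:16s} {share:.1%}")
```

Keeping the slot width as an explicit parameter makes the core-specific part of the mapping visible instead of burying it in a constant.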
Get Started: Build and Run PerfSpect on ARM
Ready to try it on your ARM machine? Here’s how you can get it up and running.
Prerequisites
Ensure you have Python, pip, and the standard Linux performance tools installed.
# For Debian/Ubuntu-based systems
$ sudo apt-get update
$ sudo apt-get install -y python3 python3-pip linux-tools-common linux-tools-generic
Step 1: Clone the Repository
$ git clone -b Neoverse-native-support https://github.com/Whileone-Techsoft/PerfSpect.git
$ cd PerfSpect
Step 2: Build Tools Docker for aarch64
$ ./builder/build.sh
Step 3: Build PerfSpect natively on aarch64
$ make -j
Sample TMA image for Graviton4
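If one of the steps above fails, a quick preflight check can rule out the basics. The Python sketch below only confirms that the machine reports an ARM64 architecture and that the command-line tools the steps rely on (perf, git, make, docker) are on the PATH. The `preflight` helper is hypothetical and not part of PerfSpect, and the tool list is my reading of the steps above rather than an official requirements list.

```python
import platform
import shutil
from typing import Callable, List, Optional

REQUIRED_TOOLS = ("perf", "git", "make", "docker")


def preflight(
    machine: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed.

    `machine` and `which` can be overridden, which makes the check easy to test
    without touching the real system.
    """
    machine = machine or platform.machine()
    problems = []
    # Linux reports "aarch64"; some other systems report "arm64".
    if machine not in ("aarch64", "arm64"):
        problems.append(f"expected an ARM64 machine, found {machine!r}")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    issues = preflight()
    if issues:
        for issue in issues:
            print("MISSING:", issue)
    else:
        print("All preflight checks passed.")
```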