

The Architecture of Speed: Agentic AI & The New Era of Performance
Discover how agentic AI is flipping the performance engineering paradigm. Learn how to evolve from a "loop tuner" to a "constraints governor" with our checklist for AI-proof SLOs.
Rajeev Gadgil
Jan 12 · 3 min read


From Innovation to Impact: Aligning ER&D with Marketing and Sales
Engineering R&D in a Changing Landscape
Engineering Research and Development has always been at the heart of innovation. But today, its role is evolving rapidly. What was once primarily about pushing technical boundaries is now equally about speed, efficiency, and alignment with business outcomes. As industries grow more complex and interconnected, Engineering R&D teams are being asked to deliver faster and smarter, with less margin for error. From a marketing and sales…
Shruti Gadgil
Dec 29, 2025 · 3 min read


Porting Math Primitives on a Custom RISC-V
Introduction
As RISC-V expands into accelerator domains, software readiness becomes as critical as hardware innovation. This work focused on implementing a set of mathematical and BLAS primitives for a custom RISC-V architecture, forming foundational building blocks for numerical computing. The implementation included vector and matrix operations with careful attention to numerical correctness and floating-point behavior.
Challenges and Constraints
A key challenge was the…
Anup Halarnkar
Dec 22, 2025 · 1 min read
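The excerpt stresses numerical correctness when porting BLAS-style primitives. One common way to validate a ported kernel is to compare it against a more accurate reference. The sketch below is hypothetical and is not the authors' test harness. It contrasts a naive dot product with one that applies Neumaier (Kahan-Babuska) compensated summation to the products. Both are cross-checked against Python's correctly rounded `math.fsum`. The function names and the tolerance are illustrative assumptions.

```python
import math
from typing import Sequence


def dot_naive(x: Sequence[float], y: Sequence[float]) -> float:
    """Left-to-right accumulation, as a straightforward port might do."""
    if len(x) != len(y):
        raise ValueError("length mismatch")
    acc = 0.0
    for a, b in zip(x, y):
        acc += a * b
    return acc


def dot_compensated(x: Sequence[float], y: Sequence[float]) -> float:
    """Neumaier compensated summation of the products.

    Each product a*b is still rounded once; only the accumulation error is compensated.
    """
    if len(x) != len(y):
        raise ValueError("length mismatch")
    s = 0.0  # running sum
    c = 0.0  # low-order bits lost from s so far
    for a, b in zip(x, y):
        v = a * b
        t = s + v
        if abs(s) >= abs(v):
            c += (s - t) + v
        else:
            c += (v - t) + s
        s = t
    return s + c


def within_tolerance(got: float, ref: float, rel_tol: float = 1e-12) -> bool:
    """Hypothetical acceptance check for a ported kernel against a reference."""
    return math.isclose(got, ref, rel_tol=rel_tol, abs_tol=0.0)


x = [1e16, 1.0, -1e16]
y = [1.0, 1.0, 1.0]
naive = dot_naive(x, y)  # the 1.0 is absorbed by 1e16 and lost
compensated = dot_compensated(x, y)
reference = math.fsum(a * b for a, b in zip(x, y))  # correctly rounded sum of products
```

Here the naive result is 0.0, while the compensated and reference results are both 1.0. This is the kind of discrepancy a correctness suite for a new target should catch.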


QEMU vs. FPGA: Understanding the Differences in Emulating and Prototyping Any ISA
With the evolution of hardware design and development, two tools have become fundamental for those working on Instruction Set Architectures (ISAs): QEMU and FPGA boards. Although both are key resources for developing, testing, and experimenting with different ISAs (such as RISC-V, ARM, and x86), they operate in significantly different ways. This blog highlights the key distinctions between QEMU and FPGA boards and their use cases across various architectures…
Sayali Tamane
Dec 8, 2025 · 3 min read


Network Latency Study in OCI Cloud
Network testing tools such as netperf can run latency tests, throughput tests, and more. In netperf, the TCP_RR and UDP_RR (RR = request-response) tests report round-trip latency. With the -o flag, the output metrics can be customized to display exactly the information you need. Here's an example of using the test-specific -o flag so that netperf outputs several latency statistics:
Google has extensive practical experience in latency benchmarking, and as per the blog using-netperf-and-ping-to-me…
Archana Barve
Dec 1, 2025 · 4 min read
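netperf's omni output selectors include MIN_LATENCY, MEAN_LATENCY, P50_LATENCY, P90_LATENCY, P99_LATENCY, MAX_LATENCY and STDDEV_LATENCY. The sketch below is a rough illustration of what those statistics mean, computed from a list of per-transaction round-trip times in microseconds. It assumes nearest-rank percentiles and a population standard deviation. netperf's own internal method may differ, so treat the function names and choices here as hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence


def percentile_nearest_rank(sorted_samples: Sequence[float], p: float) -> float:
    """Smallest sample such that at least p% of samples are <= it."""
    if not sorted_samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    rank = math.ceil(p * len(sorted_samples) / 100)  # 1-based rank
    return sorted_samples[rank - 1]


def latency_summary(rtt_us: Sequence[float]) -> Dict[str, float]:
    """Summarize request-response round-trip times (microseconds)."""
    s = sorted(rtt_us)
    return {
        "min_latency": s[0],
        "max_latency": s[-1],
        "mean_latency": statistics.fmean(s),
        "stddev_latency": statistics.pstdev(s),
        "p50_latency": percentile_nearest_rank(s, 50),
        "p90_latency": percentile_nearest_rank(s, 90),
        "p99_latency": percentile_nearest_rank(s, 99),
    }


# Hypothetical samples: 100 round trips of 1..100 microseconds.
summary = latency_summary([float(i) for i in range(1, 101)])
```

The percentiles are reported alongside the mean because a small fraction of slow round trips can dominate tail latency while barely moving the average.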


Understanding DLRM with PyTorch
DLRM stands for Deep Learning Recommendation Model. It is a neural network architecture developed by Facebook AI (Meta) for large-scale personalized recommendation systems. DLRM is widely used in real-world applications that need personalized recommendations or ranking predictions, and it is designed for click-through rate (CTR) prediction and ranking tasks. Examples: online advertising, e-commerce recommendations, social media feed ranking, streaming services, online…
Mrinal Kshirsagar
Nov 24, 2025 · 2 min read
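To make the CTR-prediction idea concrete, here is a toy PyTorch model in the general shape of DLRM. It is a hypothetical sketch, not Meta's reference implementation. A bottom MLP embeds the dense features, and one embedding table per categorical feature embeds the sparse ones. Pairwise dot products model feature interactions, and a top MLP ends in a sigmoid that yields a click probability. The class name, layer widths and table sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class TinyDLRM(nn.Module):
    """Toy DLRM-style model: bottom MLP + embeddings + dot interactions + top MLP."""

    def __init__(self, num_dense: int = 4, table_sizes=(100, 50), dim: int = 8):
        super().__init__()
        # Bottom MLP maps dense (continuous) features into the embedding space.
        self.bottom_mlp = nn.Sequential(
            nn.Linear(num_dense, 16), nn.ReLU(), nn.Linear(16, dim), nn.ReLU()
        )
        # One table per categorical feature; "sum" pools multi-hot ids per sample.
        self.tables = nn.ModuleList(
            nn.EmbeddingBag(n, dim, mode="sum") for n in table_sizes
        )
        num_vectors = len(table_sizes) + 1
        num_pairs = num_vectors * (num_vectors - 1) // 2
        # Top MLP sees the dense vector plus every pairwise dot product.
        self.top_mlp = nn.Sequential(
            nn.Linear(dim + num_pairs, 16), nn.ReLU(), nn.Linear(16, 1)
        )

    def forward(self, dense, sparse_ids, sparse_offsets):
        x = self.bottom_mlp(dense)  # (B, dim)
        vectors = [x] + [
            table(ids, offsets)
            for table, ids, offsets in zip(self.tables, sparse_ids, sparse_offsets)
        ]
        stacked = torch.stack(vectors, dim=1)  # (B, F, dim)
        dots = torch.bmm(stacked, stacked.transpose(1, 2))  # (B, F, F)
        f = stacked.shape[1]
        i, j = torch.triu_indices(f, f, offset=1)  # each unordered pair once
        interactions = dots[:, i, j]  # (B, F*(F-1)/2)
        logits = self.top_mlp(torch.cat([x, interactions], dim=1))
        return torch.sigmoid(logits).squeeze(1)  # predicted CTR per sample


torch.manual_seed(0)
model = TinyDLRM()
dense = torch.randn(2, 4)
# One id per sample per table; offsets mark where each sample's ids begin.
sparse_ids = [torch.tensor([3, 7]), torch.tensor([1, 42])]
sparse_offsets = [torch.tensor([0, 1]), torch.tensor([0, 1])]
ctr = model(dense, sparse_ids, sparse_offsets)
```

Using dot products instead of concatenating all embeddings keeps the interaction layer small. This is one reason DLRM-style models scale to very large embedding tables.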


Top CPU Performance Benchmarking Toolkits You Should Know
Modern compute platforms, from hyperscale cloud CPUs to edge processors, deliver unprecedented parallelism and instruction-set capabilities. But to truly understand performance, you need the right benchmarking tools. Whether you're comparing cloud instances, evaluating Arm-based servers like Ampere, or validating x86, RISC-V, or AI-accelerated hardware, the ecosystem offers several battle-tested frameworks. In this blog, we explore the most widely used CPU benchmarking…
Rajeev Gadgil
Nov 3, 2025 · 2 min read


Major Takeaways from RISC-V NA Summit 2025
1. The Software Ecosystem Is Now the Core Focus
The most significant shift was the overwhelming emphasis on software, tools, and developer experience. Platform Mindset: Keynote speakers, including executives from major players, stressed the need to view RISC-V not just as an ISA (Instruction Set Architecture) but as an ecosystem that requires platform-level thinking. The message was clear: no single company can build the entire software stack alone; continued, sustained…
Anup Halarnkar
Oct 27, 2025 · 6 min read


Predicting Differential Loss at the Edge: Lightweight ML for Real-Time Test Intelligence
Inspiration
In high-throughput production environments, every sensor reading tells a story. Test systems continuously record Pressure, Temperature, and Differential Loss (DL) across thousands of cycles, but much of this data remains passive: observed but not interpreted. We set out to change that by deploying machine learning directly at the edge on a BeagleBone Black board. The goal was not anomaly detection but live inference: to compute what the ideal DL should be…
Alisha Bhale
Oct 20, 2025 · 3 min read
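The excerpt describes live inference of an ideal DL value from the test system's readings. It does not say which model was used. As a rough sketch, assume a linear model DL ≈ w0 + w1·Pressure + w2·Temperature fitted offline by ordinary least squares. Inference on a small board then reduces to a handful of multiply-adds. The function names and the synthetic training data are hypothetical.

```python
import numpy as np


def fit_dl_model(pressure, temperature, dl):
    """Fit DL ~ w0 + w1*P + w2*T by ordinary least squares; returns [w0, w1, w2]."""
    P = np.asarray(pressure, dtype=float)
    T = np.asarray(temperature, dtype=float)
    A = np.column_stack([np.ones_like(P), P, T])  # design matrix with intercept column
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(dl, dtype=float), rcond=None)
    return coeffs


def predict_ideal_dl(coeffs, pressure: float, temperature: float) -> float:
    """Per-reading inference: three multiply-adds, cheap enough for an edge board."""
    return float(coeffs[0] + coeffs[1] * pressure + coeffs[2] * temperature)


# Hypothetical training data generated from DL = 3 + 0.5*P - 0.2*T.
rng = np.random.default_rng(0)
P_train = rng.uniform(10, 50, size=200)
T_train = rng.uniform(20, 80, size=200)
DL_train = 3 + 0.5 * P_train - 0.2 * T_train
coeffs = fit_dl_model(P_train, T_train, DL_train)

# Live reading of P = 25, T = 40: the model's ideal DL is 3 + 12.5 - 8 = 7.5.
ideal = predict_ideal_dl(coeffs, 25.0, 40.0)
```

A nonlinear model such as a small tree ensemble could replace the linear fit. The split between training off-device and inference on-device would stay the same.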


Debugging the Debugger: A Deep Dive into GDB and RISC-V
In the world of software development, the GNU Debugger (GDB) is an essential tool for programmers. It allows us to peer inside a running program, find bugs, and understand complex code. As new hardware architectures emerge, it's crucial that our tools keep pace. One such rising star is RISC-V, an open-standard instruction set architecture that is rapidly gaining popularity, particularly with its new vector extensions for high-performance computing.
The Challenge: An Unknown…
Soham Gargote
Oct 13, 2025 · 2 min read


Neoverse-V2 Support for Intel PerfSpect
We recently worked on extending Intel PerfSpect (https://github.com/Whileone-Techsoft/PerfSpect/tree/Neoverse-native-support), a robust command-line performance analysis tool that implements the Top-Down Microarchitecture Analysis Method (TMAM). The extended tool fully supports the Arm Neoverse-V2 architecture. This project required mapping the Performance Monitoring Unit (PMU) events on the Arm cores to the metrics of the TMAM methodology. We can now get the Level 1 breakdown…
Ruchi Joshi
Oct 6, 2025 · 1 min read
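TMAM's Level 1 splits every issue slot into four categories that sum to 100%: Frontend Bound, Bad Speculation, Retiring, and Backend Bound. The sketch below follows the classic formulation originally published for Intel cores, parameterized by issue width. The counter arguments are generic placeholders, not actual PMU event names. The Neoverse-V2 event mapping used in the PerfSpect fork is not reproduced here, so treat this only as an illustration of the arithmetic.

```python
import math
from typing import Dict


def topdown_level1(
    cycles: float,
    issue_width: int,
    slots_not_delivered: float,
    uops_issued: float,
    uops_retired: float,
    recovery_cycles: float,
) -> Dict[str, float]:
    """Classic Level 1 top-down breakdown, as fractions of all issue slots."""
    total_slots = issue_width * cycles  # each cycle offers issue_width slots
    # Slots the front end left empty although the back end could accept work.
    frontend_bound = slots_not_delivered / total_slots
    # Slots used by uops that never retired, plus slots lost during recovery.
    bad_speculation = (
        uops_issued - uops_retired + issue_width * recovery_cycles
    ) / total_slots
    # Slots that did useful, retired work.
    retiring = uops_retired / total_slots
    # The remainder stalled on execution resources or memory.
    backend_bound = 1.0 - frontend_bound - bad_speculation - retiring
    return {
        "frontend_bound": frontend_bound,
        "bad_speculation": bad_speculation,
        "retiring": retiring,
        "backend_bound": backend_bound,
    }


# Hypothetical counter values for a 4-wide core over 1000 cycles.
breakdown = topdown_level1(
    cycles=1000, issue_width=4, slots_not_delivered=800,
    uops_issued=2200, uops_retired=2000, recovery_cycles=50,
)
```

With these made-up counts the breakdown is 20% Frontend Bound, 10% Bad Speculation, 50% Retiring and 20% Backend Bound. Level 2 and deeper split each bucket further.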


From Classroom to Code: Our Transformative Journey as Interns at WhileOne
The Leap into the Unknown
Stepping out of the academic bubble and into the professional world is often painted as a daunting transition. For us, it was less a leap of faith and more an excited dive into the deep end, specifically into the innovative waters of WhileOne. Our motivation to join was simple yet profound: we sought a place where curiosity was celebrated, challenges were seen as growth opportunities, and real-world impact was a daily pursuit. Little did we know that…
Tanaya Ajgar
Sep 22, 2025 · 4 min read


Unleashing Performance Insights on ARM: Bringing Intel's PerfSpect to the Entire Ecosystem
Performance analysis can often feel like searching for a needle in a haystack. When your application isn't running as fast as you'd like, where do you even begin to look? Is it a memory bottleneck? Are you stalling in the CPU's front end? Answering these questions is critical, but traditional tools can be complex and overwhelming. This is where Intel's PerfSpect comes in. And now, thanks to some recent contributions, this powerful tool is no longer just for x86 systems…
Sameer Natu
Sep 15, 2025 · 2 min read


RISC-V Fuzzer for GCC and LLVM
Fuzzing RISC-V compilers like GCC and LLVM is a crucial practice for ensuring the correctness and security of the entire software ecosystem built on this architecture. The goal is not to find vulnerabilities in the final compiled code, but to discover bugs within the compiler itself that could lead to incorrect code generation, unexpected behavior, or even exploitable flaws.
Why Compiler Fuzzing Is a Unique Challenge
Fuzzing compilers is different from fuzzing a…
Rajeev Gadgil
Sep 8, 2025 · 3 min read
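A key difficulty in compiler fuzzing is that inputs must be valid programs with well-defined behavior. Otherwise a "miscompilation" may just be undefined behavior. The sketch below is a minimal generator in the spirit of Csmith, not the fuzzer described in the post. It emits C programs that use only `uint32_t` arithmetic, where wraparound is defined, and masks shift amounts below 32. It also computes the expected output, so the same program compiled by GCC and by Clang/LLVM for RISC-V can be checked against that oracle and against each other. All names here are hypothetical.

```python
import random

MASK = 0xFFFFFFFF  # emulate C uint32_t wraparound


def gen_expr(rng: random.Random, depth: int, vars_: dict):
    """Return (c_source, expected_value) for a random uint32_t expression."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            name = rng.choice(sorted(vars_))
            return name, vars_[name]
        k = rng.randrange(0, 1 << 32)
        return f"{k}u", k
    op = rng.choice(["+", "-", "*", "^", "&", "|", ">>"])
    lhs, lv = gen_expr(rng, depth - 1, vars_)
    rhs, rv = gen_expr(rng, depth - 1, vars_)
    if op == ">>":
        # Mask the shift amount so it is always < 32 (avoids undefined behavior).
        rhs, rv = f"(({rhs}) & 31u)", rv & 31
        value = lv >> rv
    else:
        value = {
            "+": lv + rv, "-": lv - rv, "*": lv * rv,
            "^": lv ^ rv, "&": lv & rv, "|": lv | rv,
        }[op] & MASK
    return f"({lhs} {op} {rhs})", value


def gen_program(seed: int, depth: int = 4):
    """Build a self-checking C test case; returns (source, expected_output)."""
    rng = random.Random(seed)
    vars_ = {f"v{i}": rng.randrange(0, 1 << 32) for i in range(3)}
    expr, expected = gen_expr(rng, depth, vars_)
    # volatile keeps the compiler from constant-folding the whole expression.
    decls = "\n".join(f"    volatile uint32_t {n} = {v}u;" for n, v in vars_.items())
    src = (
        "#include <stdint.h>\n#include <stdio.h>\n\n"
        "int main(void) {\n"
        f"{decls}\n"
        f"    uint32_t r = {expr};\n"
        '    printf("%u\\n", (unsigned)r);\n'
        "    return 0;\n}\n"
    )
    return src, expected


src, expected = gen_program(seed=1)
```

Each seed reproduces the same program, so any mismatch between compiler outputs can be replayed and minimized later.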


Success Story: How We Built a Trusted SRE Partnership with Our Client
In the world of Site Reliability Engineering (SRE), trust, knowledge, and execution matter more than anything else. When our team was presented with the opportunity to support one of the leading clients in the inference systems domain, we knew the competition would be fierce. Many well-established and much larger organizations were bidding for the same project. Yet we saw this as an opportunity to prove that expertise, dedication, and the right approach can outweigh size and…
Akshay Bhide
Sep 1, 2025 · 4 min read


AWS Graviton4 vs. GCP Axion
This blog post dives into a head-to-head performance comparison of two leading contenders: AWS Graviton4 (powering AWS r8g instances) and Google Axion (powering GCP Axion instances), both built on the Arm Neoverse-V2 architecture. We'll examine their performance with Valkey 8.0.1, a popular in-memory data store.
The Contenders: AWS Graviton4 and Google Axion
AWS Graviton and Google Axion represent the latest generation of Arm-based server processors from Amazon and…
Rahul Bapat
Aug 25, 2025 · 3 min read


SPDK AIO bdevperf Performance Report: Analyzing Workload on AWS Graviton4
We conducted SPDK bdevperf tests on an AWS EC2 r8gd.metal-24xl instance, focusing on single-CPU-core performance under high I/O load. Our objective was to demonstrate a CPU-bound workload. The results show low I/O wait and high CPU utilization, confirming that the CPU is the limiting factor. The 2-disk configuration achieved the highest throughput, indicating a CPU saturation point.
1. Performance Results Summary (100-second duration)
Below is a consolidated view of our 100-second…
Rahul Bapat
Aug 11, 2025 · 4 min read
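The excerpt calls the workload CPU-bound because I/O wait is low while CPU utilization is high. It does not say how those numbers were collected. On Linux, one common way is to diff two snapshots of the per-core `cpuN` line in `/proc/stat`, whose fields are cumulative ticks (user, nice, system, idle, iowait, irq, softirq, steal, ...). The sketch below makes that assumption and uses made-up sample lines. The function names are hypothetical.

```python
from typing import Dict

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Parse a 'cpuN ...' line from /proc/stat (cumulative USER_HZ ticks)."""
    parts = line.split()
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))


def cpu_breakdown(before: str, after: str) -> Dict[str, float]:
    """Busy and iowait percentages over the interval between two snapshots."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    delta = {k: a[k] - b[k] for k in FIELDS}
    total = sum(delta.values())
    busy = total - delta["idle"] - delta["iowait"]
    return {
        "busy_pct": 100.0 * busy / total,
        "iowait_pct": 100.0 * delta["iowait"] / total,
    }


# Hypothetical snapshots of the single core driving the I/O.
before = "cpu3 1000 0 200 5000 10 0 50 0 0 0"
after = "cpu3 10700 0 350 5100 20 0 90 0 0 0"
result = cpu_breakdown(before, after)
```

In this made-up interval the core is about 98.9% busy with only 0.1% iowait. That matches the CPU-bound pattern the post describes. Note that guest time is already included in the user and nice fields, which is why the sketch ignores the trailing columns.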


Building Observability-Driven Performance Benchmarking Frameworks
In complex computing environments spanning cloud, HPC, AI, and edge workloads, observability is no longer optional. With multiple layers of hardware and software working together, traditional monitoring alone cannot surface the insights needed to optimize performance or prevent downtime. At Whileone Techsoft Pvt. Ltd., we help companies go beyond monitoring by building deep observability frameworks that connect performance benchmarking, system analytics, telemetry…
Nandita Gadgil
Aug 4, 2025 · 3 min read


MySQL Cloud Workload Brief
Overview
MySQL is an open-source relational database management system (RDBMS) that stores and organizes data in tables, rows, and columns, and lets you query and manage that data using SQL (Structured Query Language). MySQL Database Server is fast, reliable, scalable, and easy to use. It continues to rank highly in popularity among databases, according to DB-Engines. SysBench is a multi-threaded benchmark tool. The tool can create a simple database schema, populate…
Mrinal Kshirsagar
Jul 28, 2025 · 3 min read


Tuning Compiler Flags for Custom Hardware
Benchmarking SPECint on FPGA: Introduction
With the growing interest in AI hardware for high-performance and power-efficient computing, understanding how industry-standard benchmarks perform on such platforms is critical. In this paper, we focus on SPECrate®2017 Integer workloads, a widely used CPU benchmark suite, and share a case study comparing two runs on an FPGA target: a base run and a tuned run that achieved better performance. This paper describes how the tuning…
Sayali Tamane
Jul 21, 2025 · 2 min read
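Base and tuned runs are compared through SPEC's scoring arithmetic. For a rate run, each benchmark's ratio is the number of copies times the reference time divided by the measured time. The overall score is the geometric mean of those ratios. The sketch below shows that arithmetic only. SPEC's handling of repeated runs (taking the median) is omitted. The benchmark labels and timings are hypothetical, not results from the paper.

```python
import statistics
from typing import Dict


def spec_rate_ratio(copies: int, ref_seconds: float, run_seconds: float) -> float:
    """Per-benchmark ratio: copies * reference time / measured time."""
    return copies * ref_seconds / run_seconds


def overall_score(ratios: Dict[str, float]) -> float:
    """Overall metric: geometric mean of the per-benchmark ratios."""
    return statistics.geometric_mean(ratios.values())


# Hypothetical single-copy timings (seconds) for two placeholder benchmarks.
base = {
    "bench_a": spec_rate_ratio(1, 1000.0, 500.0),  # ratio 2.0
    "bench_b": spec_rate_ratio(1, 1000.0, 125.0),  # ratio 8.0
}
tuned = {
    "bench_a": spec_rate_ratio(1, 1000.0, 400.0),  # ratio 2.5
    "bench_b": spec_rate_ratio(1, 1000.0, 100.0),  # ratio 10.0
}
base_score = overall_score(base)      # geometric mean of 2 and 8
tuned_score = overall_score(tuned)    # geometric mean of 2.5 and 10
speedup = tuned_score / base_score
```

Because the score is a geometric mean, a given percentage gain moves the result by the same amount whichever benchmark it comes from. Flag tuning that helps only one workload therefore shows up proportionally rather than being swamped by long-running benchmarks.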

