Boost Software Efficiency with Software Performance Optimization
Software efficiency is more critical than ever. Users expect applications to be fast, reliable, and scalable. Achieving this requires more than just writing clean code; it demands a strategic approach known as software performance optimization. This process ensures that software not only meets functional requirements but also performs optimally under various conditions. By focusing on performance engineering, businesses can deliver superior user experiences, reduce operationa

Archana Barve
Feb 23 · 3 min read


Chaos Engineering in the Production Stack
Chaos Engineering: Enhancing System Resilience Chaos engineering is the discipline of intentionally introducing controlled faults to validate system resilience. In any production ecosystem, spanning silicon validation, system integration, and software stacks, it helps uncover performance, reliability, and scalability risks long before production deployment. Understanding Kubernetes Pods Modern validation and benchmarking workloads increasingly run on Kubernetes. Pods, the sma

Rajeev Gadgil
Feb 9 · 2 min read


Stop Starting, Start Resuming: Quickly starting dockers
Cold starting Docker containers is expensive. Before a Dockerized application does anything useful, it pulls images, initializes the runtime, loads classes or modules, allocates memory, opens files and sockets, and slowly warms into a steady operating state. In modern infrastructure, this cost shows up everywhere: pod restarts, scale-outs, rollouts, autoscaling events. Each time, the same warm-up work is paid for again. Capture and restore offers a different idea: instead of

Ojas Natu
Jan 27 · 2 min read


The Evolution of Software Performance with Agentic AI
Discover how agentic AI is flipping the performance engineering paradigm. Learn how to evolve from a "loop tuner" to a "constraints governor" with our checklist for AI-proof SLOs.

Rajeev Gadgil
Jan 12 · 2 min read


From Innovation to Impact: Aligning ER&D with Marketing and Sales
Engineering R&D in a Changing Landscape Engineering Research and Development has always been at the heart of innovation. But today, its role is evolving rapidly. What was once primarily about pushing technical boundaries is now equally about speed, efficiency, and alignment with business outcomes. As industries grow more complex and interconnected, Engineering R&D teams are being asked to deliver faster, smarter, and with fewer margins for error. From a marketing and sales po

Shruti Gadgil
Dec 29, 2025 · 3 min read


RISC-V: Accelerating Software Readiness for Numerical Computing
Introduction to RISC-V and Software Readiness As RISC-V expands into accelerator domains, software readiness becomes as critical as hardware innovation. This work focuses on implementing a set of mathematical and BLAS primitives for a custom RISC-V architecture. These primitives form foundational building blocks for numerical computing. The implementation includes vector and matrix operations, with careful attention to numerical correctness and floating-point behavior. Overco

Anup Halarnkar
Dec 22, 2025 · 2 min read


QEMU vs. FPGA: Understanding the Differences in Emulating and Prototyping Any ISA
With the evolution of hardware design and development, two tools have become fundamental for those working on Instruction Set Architectures (ISAs): QEMU and FPGA boards. Although both serve as key resources for developing, testing, and experimenting with different ISAs (such as RISC-V, ARM, x86, etc.), they operate in significantly different ways. This blog highlights the key distinctions between QEMU and FPGA boards and their use cases across various architectures. Key Featur

Sayali Tamane
Dec 8, 2025 · 3 min read


Network Latency Study in OCI Cloud
Network testing tools such as netperf can run latency tests, throughput tests, and more. In netperf, the TCP_RR and UDP_RR (RR = request-response) tests report round-trip latency. With the -o flag, the output metrics can be customized to display exactly the statistics you need. Here’s an example of using the test-specific -o flag so netperf outputs several latency statistics: Google has lots of practical experience in latency benchmarking and as per blog using-netperf-and-ping-to-me

Archana Barve
Dec 1, 2025 · 4 min read


Understanding DLRM with PyTorch
DLRM stands for Deep Learning Recommendation Model. It is a neural network architecture developed by Facebook AI (Meta) for large-scale personalized recommendation systems. DLRM is widely used in real-world applications where personalized recommendations or ranking predictions are needed. DLRM is designed for click-through rate (CTR) prediction and ranking tasks. Examples: Online Advertising, E-commerce Recommendations, Social Media Feed Ranking, Streaming Services, Online Mark

Mrinal Kshirsagar
Nov 24, 2025 · 2 min read


Top CPU Performance Benchmarking Toolkits You Should Know
Modern compute platforms - from cloud hyperscale CPUs to edge processors - deliver unprecedented parallelism and instruction-set capabilities. But to truly understand performance, you need the right benchmarking tools. Whether you're comparing cloud instances, evaluating Arm-based servers like Ampere, or validating x86, RISC-V, or AI-accelerated hardware, the ecosystem offers several battle-tested frameworks. In this blog, we explore the most widely used CPU benchmarking too

Rajeev Gadgil
Nov 3, 2025 · 2 min read


Major Takeaways from RISCV NA Summit 2025
1. The Software Ecosystem is Now the Core Focus The most significant shift was the overwhelming emphasis on software, tools, and developer experience. Platform Mindset: Keynote speakers, including executives from major players, stressed the need to view RISC-V not just as an ISA (Instruction Set Architecture) but as an ecosystem that requires platform-level thinking. The message was clear: no single company can build the entire software stack alone; continued, sustained commu

Anup Halarnkar
Oct 27, 2025 · 6 min read


Predicting Differential Loss at the Edge: Lightweight ML for Real-Time Test Intelligence
Inspiration In high-throughput production environments, every sensor reading tells a story. Test systems continuously record Pressure, Temperature, and Differential Loss (DL) across thousands of cycles, but much of this data remains passive, observed but not interpreted. We set out to change that by deploying machine learning directly at the edge on a BeagleBone Black board. The goal was not anomaly detection, but live inference: to compute what the ideal DL should be (

Alisha Bhale
Oct 20, 2025 · 3 min read


Debugging the Debugger: A Deep Dive into GDB and RISC-V
In the world of software development, the GNU Debugger (GDB) is an essential tool for programmers. It allows us to peer inside a running program, find bugs, and understand complex code. As new hardware architectures emerge, it's crucial that our tools keep pace. One such rising star is RISC-V, an open-source instruction set architecture that is rapidly gaining popularity, particularly with its new vector extensions for high-performance computing. The Challenge: An Unknown Ins
Soham Gargote
Oct 13, 2025 · 2 min read


Neoverse -V2 Support to Intel Perfspect
We recently worked on extending Intel Perfspect (https://github.com/Whileone-Techsoft/PerfSpect/tree/Neoverse-native-support), a robust, command-line performance analysis tool that implements the Top-Down Microarchitecture Analysis Method (TMAM). It now fully supports the Arm Neoverse-V2 architecture. This project required mapping the Performance Monitoring Unit (PMU) events on the Arm cores to the metrics of the TMAM methodology. We can now get the Level 1 breakdown (Fronte
Ruchi Joshi
Oct 6, 2025 · 1 min read


From Classroom to Code: Our Transformative Journey as Interns at WhileOne
The Leap into the Unknown Stepping out of the academic bubble and into the professional world is often painted as a daunting transition. For us, it was less a leap of faith and more an excited dive into the deep end, specifically, into the innovative waters of WhileOne. Our motivation to join was simple yet profound: we sought a place where curiosity was celebrated, challenges were seen as growth opportunities, and real-world impact was a daily pursuit. Little did we know that
Tanaya Ajgar
Sep 22, 2025 · 4 min read


Unleashing Performance Insights on ARM: Bringing Intel's PerfSpect to the Entire Ecosystem
Performance analysis can often feel like searching for a needle in a haystack. When your application isn't running as fast as you'd like, where do you even begin to look? Is it a memory bottleneck? Are you stalling in the CPU's front-end? Answering these questions is critical, but traditional tools can be complex and overwhelming. This is where Intel's PerfSpect comes in. And now, thanks to some recent contributions, this powerful tool is no longer just for x86 systems. I'm h

Sameer Natu
Sep 15, 2025 · 2 min read


RISCV Fuzzer for GCC and LLVM
Fuzzing RISC-V compilers like GCC and LLVM is a crucial practice for ensuring the correctness and security of the entire software ecosystem built on this architecture. It's not about finding vulnerabilities in the final compiled code, but rather about discovering bugs within the compiler itself that could lead to incorrect code generation, unexpected behavior, or even exploitable flaws. Why Compiler Fuzzing is a Unique Challenge Fuzzing compilers is different from fuzzing a

Rajeev Gadgil
Sep 8, 2025 · 3 min read


Success Story: How We Built a Trusted SRE Partnership with Our Client
In the world of Site Reliability Engineering (SRE), trust, knowledge, and execution matter more than anything else. When our team was presented with the opportunity to support one of the leading clients in the inference systems domain, we knew the competition would be fierce. Many well-established and much larger organizations were bidding for the same project. Yet, we saw this as an opportunity to prove that expertise, dedication, and the right approach can outweigh size and

Akshay Bhide
Sep 1, 2025 · 4 min read


AWS Graviton4 vs. GCP Axion
This blog post dives into a head-to-head performance comparison of two leading contenders: AWS Graviton4 (powering AWS r8g instances) and Google Axion (powering GCP Axion instances), both built on the advanced Arm Neoverse-V2 architecture. We'll examine their performance with Valkey 8.0.1, a popular in-memory data store. The Contenders: AWS Graviton4 and Google Axion AWS Graviton and Google Axion represent the latest generation of ARM-based server processors from Amazon and G

Rahul Bapat
Aug 25, 2025 · 3 min read


SPDK AIO bdevperf Performance Report: Analyzing Workload on AWS Graviton4
We conducted SPDK bdevperf tests on an AWS EC2 r8gd.metal-24xl instance, focusing on single CPU core performance under high I/O load. Our objective was to demonstrate a CPU-bound workload. Results show low I/O wait and high CPU utilization, confirming the CPU is the limiting factor. The 2-disk configuration achieved the highest throughput, indicating a CPU saturation point. 1. Performance Results Summary (100-second duration) Below is a consolidated view of our 100-second bd

Rahul Bapat
Aug 11, 2025 · 4 min read