

From Classroom to Code: Our Transformative Journey as Interns at WhileOne
The Leap into the Unknown Stepping out of the academic bubble and into the professional world is often painted as a daunting transition. For us, it was less a leap of faith and more an excited dive into the deep end, specifically, into the innovative waters of WhileOne. Our motivation to join was simple yet profound: we sought a place where curiosity was celebrated, challenges were seen as growth opportunities, and real-world impact was a daily pursuit. Little did we know that
Tanaya Ajgar
Sep 22, 2025 · 4 min read


SPDK AIO bdevperf Performance Report: Analyzing Workload on AWS Graviton4
We conducted SPDK bdevperf tests on an AWS EC2 r8gd.metal-24xl instance, focusing on single CPU core performance under high I/O load. Our objective was to demonstrate a CPU-bound workload. Results show low I/O wait and high CPU utilization, confirming the CPU is the limiting factor. The 2-disk configuration achieved the highest throughput, indicating a CPU saturation point. 1. Performance Results Summary (100-second duration) Below is a consolidated view of our 100-second bd

Rahul Bapat
Aug 11, 2025 · 4 min read


Building Observability-Driven Performance Benchmarking Frameworks
In complex computing environments spanning cloud, HPC, AI, and edge workloads, observability is no longer optional. With multiple layers of hardware and software working together, traditional monitoring alone cannot surface the insights needed for optimizing performance or preventing downtime. At Whileone Techsoft Pvt. Ltd., we help companies go beyond monitoring by building deep observability frameworks that connect performance benchmarking, system analytics, telemetry, an

Nandita Gadgil
Aug 4, 2025 · 3 min read


MySQL Cloud Workload Brief
Overview MySQL is an open-source relational database management system (RDBMS) that stores and organizes data using tables, rows, and columns, and allows you to query and manage that data using SQL (Structured Query Language). MySQL Database Server is fast, reliable, scalable, and easy to use. It continues to rank highly in popularity among databases, according to DB-Engines. SysBench is a multi-threaded benchmark tool. The tool can create a simple database schema, populate d

Mrinal Kshirsagar
Jul 28, 2025 · 3 min read


Tuning Compiler Flags for Custom Hardware
Benchmarking SPECint on FPGA: Introduction With the growing interest in AI hardware for high-performance and power-efficient computing, understanding how industry-standard benchmarks perform on such platforms is critical. In this paper, we focus on SPECrate®2017 Integer workloads, a widely used CPU benchmark suite, and share a case study comparing various runs on an FPGA target: a base run and a tuned run that achieved better performance. This paper describes how the tuning

Sayali Tamane
Jul 21, 2025 · 2 min read


Benchmarking and Validation of Workloads on Emulators
In this case study, we describe our systematic approach to benchmarking and validating workloads on FPGA platforms using HAPS (High-performance ASIC Prototyping System) models. The workflow involves compiling and cross-compiling a diverse set of workloads using both native QEMU and the open source toolchain, executing them on FPGA hardware, and capturing detailed performance metrics such as instructions executed and cycle counts. 1. Benchmark Preparation and Build Process W

Sayali Tamane
Jun 30, 2025 · 2 min read


ARM64 Benchmarking with DeathStarBench: A Porting Journey
Delivering Modernization, Benchmarking & Cost Efficiency Migrating workloads from AMD64 to ARM64 allows organizations to harness the...

Alisha Bhale
Jun 23, 2025 · 2 min read


CPU-Centric HPC Benchmarking with miniFE and GROMACS
Benchmarks are vital for evaluating High-Performance Computing (HPC) system performance, guiding hardware choices, and optimizing software. This whitepaper focuses on understanding and overcoming bottlenecks in HPC benchmarks for CPU environments, specifically considering ARM/AARCH64 architectures, using miniFE and GROMACS as examples. 1. Introduction to miniFE and GROMACS Benchmarks 1.1. miniFE: A Finite Element Mini-Application miniFE, part of the Mantevo suite, simulates i

Rahul Bapat
Jun 16, 2025 · 5 min read


To get maximum tokens generated for target CPU
LLMs are Getting Better and Smaller Let’s look at Llama as an example. The rapid evolution of these models highlights a key trend in AI: prioritizing efficiency and performance. When Llama 2 70B launched in August 2023, it was considered a top-tier foundational model. However, its massive size demanded powerful hardware like the NVIDIA H100 accelerator. Less than nine months later, Meta introduced Llama 3 8B, shrinking the model by almost 9x. This enabled it to run on smaller

Archana Barve
Jun 9, 2025 · 1 min read


Benchmarking Meta Llama 4 Scout on CPU-Only Systems: Performance, Quantization, and Architecture Tuning
Meta’s Llama 4 Scout, released in April 2025, is a general-purpose mixture-of-experts language model with 17 billion active parameters that brings powerful reasoning to a broader range of applications, including those running without GPUs. This blog focuses on benchmarking Llama 4 Scout on CPU-only systems, covering: Tokens per second; Latency per token; Prompt handling efficiency; Quantization techniques; Architecture-specific optimization for x86, ARM, and RISC-V (RV64); Converting to GGUF format for efficient

Rajeev Gadgil
May 26, 2025 · 3 min read


Migrating JetStream 2.2 to Node.js: Challenges, Design, and What We Learned
JetStream is a JavaScript benchmark suite that evaluates web application performance by measuring the execution latency and throughput of complex workloads. With the release of JetStream 2.2, we at WhileOne Techsoft undertook the task of migrating its harness to a modern Node.js-based setup. The work came out of a recent engagement with a customer who wanted to benchmark their CPU using JavaScript workloads. This post dives into why we did it, how we did it, and what you can expect from th

Rajeev Gadgil
May 19, 2025 · 3 min read


Cross-Compiling SPEC CPU2017 for RISC-V (RV64): A Practical Guide
SPEC CPU2017 is a well-known benchmark suite for evaluating CPU-intensive performance. Although it assumes native compilation and execution, there are cases—especially with RISC-V (RV64) platforms—where cross-compilation is the only feasible route. This guide walks through the steps to cross-compile SPEC CPU2017 for RISC-V, transfer the binaries to a target system, and optionally use the --fake option to simulate runs where execution isn't possible or needed during develop

Rajeev Gadgil
May 12, 2025 · 3 min read


Performance Modelling: How to Predict and Optimize System Efficiency
1. Introduction In today’s fast-paced digital world, system performance is critical to the success of applications ranging from cloud computing platforms to high-performance computing (HPC) workloads. Performance modelling is a powerful technique used to predict, analyze, and optimize the efficiency of computing systems. By simulating and understanding system behavior, developers, engineers, and IT managers can make informed decisions about design, scaling, and optimization s

Nandita Gadgil
Apr 14, 2025 · 2 min read


Understanding SPEC HPC Benchmarks: A Comprehensive Guide for Beginners
1. Introduction High-Performance Computing (HPC) is at the core of solving complex computational problems in scientific research, engineering, and large-scale data analysis. Benchmarking plays a critical role in evaluating and optimizing HPC system performance. The Standard Performance Evaluation Corporation (SPEC) provides widely recognized benchmarking suites tailored for different computing environments, helping researchers, businesses, and hardware vendors assess system c

Nandita Gadgil
Apr 7, 2025 · 2 min read


Automating Web Application Deployment on AWS EC2 with GitHub Actions
Introduction Deploying web applications manually can be time-consuming and error-prone. Automating the deployment process ensures consistency, reduces downtime, and improves efficiency. In this blog, we will explore how to automate web application deployment on AWS EC2 using GitHub Actions. By the end of this guide, you will have a fully automated CI/CD pipeline that pushes code from a GitHub repository to an AWS EC2 instance, ensuring smooth and reliable deployments. Seamles

Sameer Natu
Mar 17, 2025 · 3 min read


Performance Testing with NeoLoad: A Detailed Exploration of WebFocus Use Case
In today’s software-driven world, ensuring the seamless performance of applications under varying workloads is a necessity. For performance testing, tools like NeoLoad empower testers to simulate real-world conditions and optimize application behavior. In this blog, we’ll delve into the practical use of NeoLoad for WebFocus performance testing, focusing on scenarios like chart rendering, page loads, data uploads, and resource utilization. Understanding the Scope of Performa
Manasi Bansode
Jan 27, 2025 · 3 min read


Debugging Tool for workloads using Java
GCeasy, a debugging tool, is used when generating performance engineering reports.

Samruddhi Gole
Oct 28, 2024 · 2 min read


Android on RiscV Part - I
The Problem statement: Our customer wanted to know whether Android (AOSP) had already been ported by the community to the RISC-V platform, and whether we could provide a detailed summary of the current status of AOSP compilation/build and QEMU emulation progress for RISC-V. Introduction to AOSP: Android is an open source operating system for mobile devices and an open source project led by Google. Android Open Source Project (AOSP) repository offers the information and source code

Anup Halarnkar
Aug 22, 2024 · 2 min read


Server Performance Prediction using ML Models - Part 2
In the first part of the blog, we described the problem that we intend to solve, the data gathering, post-processing, and generation of the final training data. In this second part, we will take a look at the Machine Learning model we used for training and for inference with new data. Correlation between various counters We have captured various counters for various benchmarks. Here is a graph that shows the correlation of each counter with every other counter. K Neighbors Re

Rajeev Gadgil
Jul 26, 2023 · 2 min read


Server Performance Prediction using ML Models - Part 1
This blog is the first part of the series "Server Performance Prediction using Machine Learning Models". OVERVIEW: In the semiconductor industry, a Silicon Cycle takes nearly a year or more from conception to the silicon being available. As soon as the concept comes in, there is immense pressure on the marketing and sales teams to come up with performance numbers for this new generation of the silicon, i.e., the processor. There is a need for a way to find out what the score coul

Rajeev Gadgil
Jul 20, 2023 · 7 min read

