Whileone Tech Space

Understanding DLRM with PyTorch

DLRM stands for Deep Learning Recommendation Model. It is a neural network architecture developed by Facebook AI (Meta) for large-scale personalized recommendation systems. DLRM is widely used in real-world applications that need personalized recommendations or ranking predictions. It is designed for click-through rate (CTR) prediction and ranking tasks. Examples: Online Advertising, E-commerce Recommendations, Social Media Feed Ranking, Streaming Services, Online Mark

Mrinal Kshirsagar

Nov 24, 2025 · 2 min read

Predicting Differential Loss at the Edge: Lightweight ML for Real-Time Test Intelligence

Inspiration: In high-throughput production environments, every sensor reading tells a story. Test systems continuously record Pressure, Temperature, and Differential Loss (DL) across thousands of cycles, but much of this data remains passive, observed but not interpreted. We set out to change that by deploying machine learning directly at the edge on a BeagleBone Black board. The goal was not anomaly detection, but live inference: to compute what the ideal DL should be (

Alisha Bhale

Oct 20, 2025 · 3 min read

Benchmarking Meta Llama 4 Scout on CPU-Only Systems: Performance, Quantization, and Architecture Tuning

Meta’s Llama 4 Scout, released in April 2025, is a general-purpose language model with 17 billion active parameters that brings powerful reasoning to a broader range of applications, including those running without GPUs. This blog focuses on benchmarking Llama 4 Scout on CPU-only systems, covering: tokens per second; latency per token; prompt handling efficiency; quantization techniques; architecture-specific optimization for x86, ARM, and RISC-V (RV64); converting to GGUF format for efficient

Rajeev Gadgil

May 26, 2025 · 3 min read

Understanding SPEC HPC Benchmarks: A Comprehensive Guide for Beginners

1. Introduction: High-Performance Computing (HPC) is at the core of solving complex computational problems in scientific research, engineering, and large-scale data analysis. Benchmarking plays a critical role in evaluating and optimizing HPC system performance. The Standard Performance Evaluation Corporation (SPEC) provides widely recognized benchmarking suites tailored for different computing environments, helping researchers, businesses, and hardware vendors assess system c

Nandita Gadgil

Apr 7, 2025 · 2 min read

YOLOX on RISC-V QEMU

Goal of this project: This project aims to determine RISC-V's readiness for running YOLOX for the latest edge requirements. Target Application: Running YOLOX on RISC-V QEMU involves setting up a RISC-V virtual machine and then configuring the environment needed to compile and run YOLOX. Please note that this is a complex process, and it's essential to have prior experience with virtualization and RISC-V development. From the RISC-V website, this is a blog ( https://riscv.or

Sameer Natu

Sep 19, 2023 · 3 min read

Server Performance Prediction using ML Models - Part 2

In the first part of this blog, we described the problem we intend to solve, the data gathering, the post-processing, and the generation of the final training data. In this second part, we look at the Machine Learning model we used for training and for inference on new data. Correlation between various counters: We captured various counters for various benchmarks. Here is a graph that shows the correlation of each counter with every other counter. K Neighbors Re

Rajeev Gadgil

Jul 26, 2023 · 2 min read

Server Performance Prediction using ML Models - Part 1

This blog is the first part of the series "Server Performance Prediction using Machine Learning Models". OVERVIEW: In the semiconductor industry, a silicon cycle takes a year or more from conception to the silicon being available. As soon as the concept comes in, there is immense pressure on the marketing and sales teams to come up with performance numbers for this new generation of silicon, i.e., the processor. There is a need for a way to find out what the score coul

Rajeev Gadgil

Jul 20, 2023 · 7 min read