

Understanding DLRM with PyTorch
DLRM stands for Deep Learning Recommendation Model. It is a neural network architecture developed by Facebook AI (Meta) for large-scale personalized recommendation systems. DLRM is widely used in real-world applications where personalized recommendations or ranking predictions are needed. DLRM is designed for click-through rate (CTR) prediction and ranking tasks. Examples: Online Advertising, E-commerce Recommendations, Social Media Feed Ranking, Streaming Services, Online Mark

Mrinal Kshirsagar
Nov 24, 2025 · 2 min read


Predicting Differential Loss at the Edge: Lightweight ML for Real-Time Test Intelligence
Inspiration: In high-throughput production environments, every sensor reading tells a story. Test systems continuously record Pressure, Temperature, and Differential Loss (DL) across thousands of cycles, but much of this data remains passive, observed but not interpreted. We set out to change that by deploying machine learning directly at the edge on a BeagleBone Black board. The goal was not anomaly detection, but live inference: to compute what the ideal DL should be (

Alisha Bhale
Oct 20, 2025 · 3 min read


Benchmarking Meta Llama 4 Scout on CPU-Only Systems: Performance, Quantization, and Architecture Tuning
Meta’s Llama 4 Scout, released in April 2025, is a general-purpose language model with 17 billion active parameters that brings powerful reasoning to a broader range of applications, including those running without GPUs. This blog focuses on benchmarking Llama 4 Scout on CPU-only systems, covering: tokens per second; latency per token; prompt handling efficiency; quantization techniques; architecture-specific optimization for x86, ARM, and RISC-V (RV64); converting to GGUF format for efficient

Rajeev Gadgil
May 26, 2025 · 3 min read


Understanding SPEC HPC Benchmarks: A Comprehensive Guide for Beginners
1. Introduction: High-Performance Computing (HPC) is at the core of solving complex computational problems in scientific research, engineering, and large-scale data analysis. Benchmarking plays a critical role in evaluating and optimizing HPC system performance. The Standard Performance Evaluation Corporation (SPEC) provides widely recognized benchmarking suites tailored for different computing environments, helping researchers, businesses, and hardware vendors assess system c

Nandita Gadgil
Apr 7, 2025 · 2 min read


YOLOX on RISC-V QEMU
Goal of this project: This project aims to determine RISC-V's readiness for running YOLOX for the latest edge requirements. Target Application: Running YOLOX on RISC-V QEMU involves setting up a RISC-V virtual machine and then configuring the necessary environment to compile and run YOLOX. Please note that this is a complex process, and it's essential to have prior experience with virtualization and RISC-V development. From the RISC-V website, this is a blog (https://riscv.or

Sameer Natu
Sep 19, 2023 · 3 min read


Server Performance Prediction using ML Models - Part 2
In the first part of the blog, we described the problem that we intend to solve, the data gathering, the post-processing, and the generation of the final training data. In the second part, we will take a look at the Machine Learning model we used for training and for inference with new data. Correlation between various counters: We have captured various counters for various benchmarks. Here is a graph that shows the correlation of each counter with every other counter. K Neighbors Re

Rajeev Gadgil
Jul 26, 2023 · 2 min read


Server Performance Prediction using ML Models - Part 1
This blog is the first part of the series "Server Performance Prediction using Machine Learning Models". OVERVIEW: In the semiconductor industry, a Silicon Cycle takes nearly a year or more from conception to the silicon being available. As soon as the concept comes in, there is immense pressure on the marketing and sales teams to come up with performance numbers for this new generation of silicon, i.e., the processor. There is a need for a way to find out what the score coul

Rajeev Gadgil
Jul 20, 2023 · 7 min read

