Understanding DLRM with PyTorch
DLRM stands for Deep Learning Recommendation Model. It is a neural network architecture developed by Facebook AI (Meta) for large-scale personalized recommendation systems. DLRM is widely used in real-world applications where personalized recommendations or ranking predictions are needed. It is designed for click-through rate (CTR) prediction and ranking tasks. Examples: Online Advertising, E-commerce Recommendations, Social Media Feed Ranking, Streaming Services, Online Mark

Mrinal Kshirsagar
Nov 24 · 2 min read


Predicting Differential Loss at the Edge: Lightweight ML for Real-Time Test Intelligence
Inspiration: In high-throughput production environments, every sensor reading tells a story. Test systems continuously record Pressure, Temperature, and Differential Loss (DL) across thousands of cycles, but much of this data remains passive, observed but not interpreted. We set out to change that by deploying machine learning directly at the edge on a BeagleBone Black board. The goal was not anomaly detection, but live inference: to compute what the ideal DL should be (

Alisha Bhale
Oct 20 · 3 min read


To get maximum tokens generated for target CPU
LLMs are Getting Better and Smaller: Let’s look at Llama as an example. The rapid evolution of these models highlights a key trend in AI: prioritizing efficiency and performance. When Llama 2 70B launched in August 2023, it was considered a top-tier foundational model. However, its massive size demanded powerful hardware like the NVIDIA H100 accelerator. Less than nine months later, Meta introduced Llama 3 8B, shrinking the model by almost 9x. This enabled it to run on smaller

Archana Barve
Jun 9 · 1 min read


Benchmarking Meta Llama 4 Scout on CPU-Only Systems: Performance, Quantization, and Architecture Tuning
Meta’s Llama 4 Scout, released in April 2025, is a general-purpose language model with 17 billion active parameters that brings powerful reasoning to a broader range of applications, including those running without GPUs. This blog focuses on benchmarking Llama 4 Scout on CPU-only systems, covering: tokens per second; latency per token; prompt handling efficiency; quantization techniques; architecture-specific optimization for x86, ARM, and RISC-V (RV64); converting to GGUF format for efficient

Rajeev Gadgil
May 26 · 3 min read


Automating Web Application Deployment on AWS EC2 with GitHub Actions
Introduction: Deploying web applications manually can be time-consuming and error-prone. Automating the deployment process ensures consistency, reduces downtime, and improves efficiency. In this blog, we will explore how to automate web application deployment on AWS EC2 using GitHub Actions. By the end of this guide, you will have a fully automated CI/CD pipeline that pushes code from a GitHub repository to an AWS EC2 instance, ensuring smooth and reliable deployments. Seamles

Sameer Natu
Mar 17 · 3 min read
