Whileone Tech Space

Understanding DLRM with PyTorch

DLRM stands for Deep Learning Recommendation Model. It is a neural network architecture developed by Facebook AI (Meta) for large-scale personalized recommendation systems. DLRM is widely used in real-world applications where personalized recommendations or ranking predictions are needed. DLRM is designed for click-through rate (CTR) prediction and ranking tasks. Examples: Online Advertising, E-commerce Recommendations, Social Media Feed Ranking, Streaming Services, Online Mark

Mrinal Kshirsagar

Nov 24, 2025 · 2 min read
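To make the DLRM description above concrete, here is a minimal, framework-free sketch of one forward pass. It assumes toy sizes, random untrained weights, and a single id per categorical feature. The structure follows the one described in the DLRM paper: a bottom MLP for dense (continuous) features, embedding lookups for sparse (categorical) features, pairwise dot-product interactions, and a top MLP whose sigmoid output is a click probability. All names (`dlrm_forward`, `TABLE_SIZES`, `EMB_DIM`, and so on) are hypothetical. A PyTorch version would typically build the same pieces from `torch.nn.EmbeddingBag` and `torch.nn.Linear`. Plain Python is used here only so the sketch runs without dependencies.

```python
import math
import random

random.seed(0)  # fixed seed so the toy weights are reproducible

EMB_DIM = 4  # hypothetical embedding size shared by every feature vector


def linear(x, weights, bias):
    """Fully connected layer; weights is a list of rows (out_dim x in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def mlp(x, layers, final_activation=True):
    """Apply (weights, bias) layers with ReLU between them (and optionally after the last)."""
    for i, (w, b) in enumerate(layers):
        x = linear(x, w, b)
        if i < len(layers) - 1 or final_activation:
            x = relu(x)
    return x


# Hypothetical toy model: 3 dense features and 2 categorical features.
NUM_DENSE = 3
TABLE_SIZES = [10, 20]  # number of rows in each embedding table

bottom_layers = [(rand_matrix(8, NUM_DENSE), [0.0] * 8),
                 (rand_matrix(EMB_DIM, 8), [0.0] * EMB_DIM)]
tables = [rand_matrix(n, EMB_DIM) for n in TABLE_SIZES]

num_vectors = 1 + len(TABLE_SIZES)               # bottom-MLP output + one per table
num_pairs = num_vectors * (num_vectors - 1) // 2
top_in = EMB_DIM + num_pairs                     # dense vector + pairwise dot products
top_layers = [(rand_matrix(8, top_in), [0.0] * 8),
              (rand_matrix(1, 8), [0.0])]


def dlrm_forward(dense, sparse_ids):
    # 1. Bottom MLP maps the continuous features to an EMB_DIM vector.
    x = mlp(dense, bottom_layers)
    # 2. Embedding lookup: one row per categorical feature (single id each here).
    embs = [tables[t][idx] for t, idx in enumerate(sparse_ids)]
    # 3. Feature interaction: dot product of every distinct pair of vectors.
    vecs = [x] + embs
    dots = [sum(a * b for a, b in zip(vecs[i], vecs[j]))
            for i in range(len(vecs)) for j in range(i)]
    # 4. Top MLP on [dense vector, dot products], then sigmoid -> click probability.
    logit = mlp(x + dots, top_layers, final_activation=False)[0]
    return 1.0 / (1.0 + math.exp(-logit))


ctr = dlrm_forward([0.2, 1.5, -0.3], [7, 12])
print(f"predicted CTR: {ctr:.4f}")
```

Because every table shares `EMB_DIM`, all vectors can be dotted with one another; with three vectors there are 3 × 2 / 2 = 3 interaction terms, so the top MLP receives 4 + 3 = 7 inputs.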

Site Reliability Engineering (SRE) Support for System Infrastructure

Operational Excellence for Service-Driven Enterprises. As businesses increasingly deploy services in production environments, server reliability and uptime have become critical. These workloads are often hosted in hybrid setups spanning dedicated data centers and public clouds, where even brief outages can affect performance, user trust, and business outcomes. To meet these demands, a dedicated Site Reliability Engineering (SRE) team provides comprehensive

Akshay Bhide

Jul 14, 2025 · 3 min read

Maximizing Tokens Generated on a Target CPU

LLMs Are Getting Better and Smaller. Let’s look at Llama as an example. The rapid evolution of these models highlights a key trend in AI: prioritizing efficiency and performance. When Llama 2 70B launched in July 2023, it was considered a top-tier foundational model. However, its massive size demanded powerful hardware like the NVIDIA H100 accelerator. About nine months later, Meta introduced Llama 3 8B, shrinking the model by almost 9x. This enabled it to run on smaller

Archana Barve

Jun 9, 2025 · 1 min read
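The "almost 9x" figure for Llama 2 70B versus Llama 3 8B is easy to check, and a back-of-the-envelope weight-memory estimate shows why the smaller model suits more modest hardware. This sketch assumes nominal parameter counts (70B and 8B), 2 bytes per parameter for 16-bit weights, and 0.5 bytes per parameter for a 4-bit quantized variant. It ignores the KV cache, activations, and runtime overhead, so real memory use is higher. `weight_gb` is a hypothetical helper name.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumptions: nominal parameter counts, 16-bit weights (2 bytes each);
# KV cache, activations, and runtime overhead are ignored.
BYTES_PER_PARAM_FP16 = 2


def weight_gb(num_params, bytes_per_param=BYTES_PER_PARAM_FP16):
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


llama2_70b = 70e9
llama3_8b = 8e9

print(f"Llama 2 70B, 16-bit: {weight_gb(llama2_70b):.0f} GB")    # 140 GB
print(f"Llama 3 8B, 16-bit:  {weight_gb(llama3_8b):.0f} GB")     # 16 GB
print(f"Llama 3 8B, 4-bit:   {weight_gb(llama3_8b, 0.5):.0f} GB")  # 4 GB
print(f"Size ratio:          {llama2_70b / llama3_8b:.2f}x")     # 8.75x
```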

Benchmarking Meta Llama 4 Scout on CPU-Only Systems: Performance, Quantization, and Architecture Tuning

Meta’s Llama 4 Scout, released in April 2025, is a mixture-of-experts model with 17 billion active parameters (109 billion in total) that brings powerful reasoning to a broader range of applications, including those running without GPUs. This blog focuses on benchmarking Llama 4 Scout on CPU-only systems, covering: tokens per second; latency per token; prompt handling efficiency; quantization techniques; architecture-specific optimization for x86, ARM, and RISC-V (RV64); converting to GGUF format for efficient

Rajeev Gadgil

May 26, 2025 · 3 min read
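The first metrics in the Llama 4 Scout outline (tokens per second, latency per token, and prompt handling efficiency) can all be derived from timestamps taken as tokens stream out of the model. Below is a minimal sketch, assuming a hypothetical `stream` callable that yields one token at a time; it is not tied to any particular runtime. Time to first token approximates prompt handling, and the gaps between the remaining tokens give per-token latency and decode throughput.

```python
import time
from typing import Callable, Iterable


def measure_generation(stream: Callable[[], Iterable[str]],
                       clock: Callable[[], float] = time.perf_counter) -> dict:
    """Time a streaming generator and derive common CPU-inference metrics.

    `stream` is a hypothetical callable yielding one token at a time;
    swap in whatever streaming interface your inference runtime exposes.
    """
    start = clock()
    stamps = []
    for _ in stream():
        stamps.append(clock())  # arrival time of each token
    if not stamps:
        raise ValueError("generator produced no tokens")

    ttft = stamps[0] - start            # prompt handling + first token
    n_decode = len(stamps) - 1          # tokens produced after the first
    decode_time = stamps[-1] - stamps[0]
    return {
        "tokens": len(stamps),
        "time_to_first_token_s": ttft,
        "latency_per_token_s": decode_time / n_decode if n_decode else float("nan"),
        "decode_tokens_per_s": n_decode / decode_time if decode_time > 0 else float("nan"),
    }


if __name__ == "__main__":
    def demo_stream():
        # Stand-in for a real model: sleeps mimic prompt processing and decoding.
        time.sleep(0.05)
        for tok in ["Hello", ",", " world"]:
            time.sleep(0.01)
            yield tok

    print(measure_generation(demo_stream))
```

Keeping the first token separate from the rest is deliberate: on CPUs, prompt processing and token-by-token decoding can have very different costs, and averaging them together would hide both.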

Understanding SPEC HPC Benchmarks: A Comprehensive Guide for Beginners

1. Introduction
High-Performance Computing (HPC) is at the core of solving complex computational problems in scientific research, engineering, and large-scale data analysis. Benchmarking plays a critical role in evaluating and optimizing HPC system performance. The Standard Performance Evaluation Corporation (SPEC) provides widely recognized benchmarking suites tailored for different computing environments, helping researchers, businesses, and hardware vendors assess system c

Nandita Gadgil

Apr 7, 2025 · 2 min read