

  • UI/UX Design Isn’t Just About Aesthetic Appeal

    When people hear the term "UI/UX design," they often envision sleek interfaces, vibrant colour palettes, and visually appealing layouts. Although aesthetics play a significant role, UI/UX design is much more than that. The process is far from straightforward, and the goal is to create smooth experiences that feel natural and effortless to users.

    The Intricacy of Simplicity
    A well-designed product is not created by chance. Every button location, transition, and navigation flow has been carefully considered. To build an experience that meets user expectations, designers balance a number of factors, including usability, accessibility, responsiveness, user behaviour, and even psychology. The aim is to make sure people can finish their tasks without difficulty or confusion.

    An Experience, Not Just a Set of Screens
    UI/UX design is more than simply what people see; it's also about how they feel while interacting with the product. A well-designed interface helps users navigate naturally, lowering cognitive burden and removing friction points. This necessitates extensive study, prototyping, user testing, and frequent iteration. It's about knowing real users' needs, anticipating their pain points, and designing solutions that feel second nature to them.

    Organised Chaos: The Designer's World
    A great experience may appear straightforward to users, but behind the scenes, UI/UX designers oversee a complex web of panels, flows, and interactions. Designing an app or website can feel like mapping out a completely new dimension, with every possible path taken into account. There are numerous decisions to be made, from selecting the appropriate typography and colour schemes to creating intricate user journeys and micro-interactions.

    Beyond Aesthetics: The Role of Functionality
    A visually pleasing design without utility is of little value. UI/UX design strikes a balance between form and function. It ensures that people can not only admire the beauty of a product but also use it with ease. This involves reducing loading times, making content accessible to all users, and guaranteeing consistency across devices and platforms.

    The Ultimate Goal: Effortlessness
    At the core of UI/UX design is the desire to make digital interactions as seamless as possible. Users should never have to struggle to find what they need or question their next step. If they do, the design has failed. The true mark of outstanding UI/UX design is when users don't notice it; it simply works.

    Final Thoughts
    The next time you come across a beautifully designed app or website that just 'feels right,' keep in mind that it is the result of a lot of strategy, research, and problem-solving. UI/UX design is more than just producing visually pleasing interfaces; it is also about creating experiences that empower users, solve problems, and make technology feel more human.

  • Why Firmware Security Is Critical

    The spotlight often shines on software and hardware security. Yet lurking beneath the surface lies a critical layer that is often overlooked: firmware. This low-level software embedded in our devices, from routers and smart thermostats to industrial control systems and medical devices, acts as the vital link between hardware and operating systems. Its security, or lack thereof, can have profound consequences.

    The proliferation of Internet of Things (IoT) devices has exponentially expanded the attack surface. Each connected device represents a potential entry point for malicious actors. Compromised firmware can grant attackers complete control over a device, allowing them to:
    - Conduct espionage: Access sensitive data, monitor activities, and eavesdrop on communications. Imagine a compromised smart camera feeding live footage to a malicious server.
    - Launch wider network attacks: Use compromised devices as botnets to execute Distributed Denial of Service (DDoS) attacks, crippling websites and online services. Think of thousands of hacked smart bulbs overwhelming a target server.
    - Cause physical harm: In industrial or medical settings, compromised firmware can manipulate critical functions, leading to equipment malfunction or even endangering lives. Consider a hacked insulin pump delivering incorrect dosages.

    Common firmware vulnerabilities often arise from:
    - Insecure default configurations: Weak or easily guessable passwords and open ports.
    - Lack of proper input validation: Allowing attackers to inject malicious code.
    - Outdated or unpatched firmware: Failing to address known security flaws.
    - Insufficient encryption: Leaving sensitive data transmitted by the firmware vulnerable to interception.

    Securing firmware is no longer optional; it's a necessity. Best practices include:
    - Secure by Design principles: Building security into the firmware development lifecycle from the outset.
    - Regular security audits and penetration testing: Identifying and addressing potential vulnerabilities.
    - Robust and secure update mechanisms: Ensuring timely patching of security flaws (see the verification sketch below).
    - Strong authentication and authorization: Protecting access to device functionalities.
    - Data encryption at rest and in transit: Safeguarding sensitive information handled by the firmware.

    Ignoring firmware security is akin to leaving the back door of your digital infrastructure wide open. As our world becomes increasingly interconnected, recognizing and addressing the security of this unsung hero is paramount to protecting our data, our systems, and ultimately, our safety. Investing in secure firmware development and proactive updates is not just a technical necessity, but a fundamental requirement for a secure and trustworthy connected future.
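    As a small illustration of the "robust and secure update mechanisms" practice above, here is a minimal sketch that verifies a detached signature on a firmware image before it is flashed. It uses standard openssl tooling; the file names (fw_update.bin, fw_update.sig, vendor_pub.pem) and the flashing step are placeholders, not part of any specific vendor's update flow.

        #!/bin/bash
        # Verify a firmware image against the vendor's public key before flashing.
        set -euo pipefail

        IMAGE=fw_update.bin        # candidate firmware image (placeholder name)
        SIGNATURE=fw_update.sig    # detached signature shipped alongside the image
        PUBKEY=vendor_pub.pem      # vendor signing key provisioned on the device

        if openssl dgst -sha256 -verify "$PUBKEY" -signature "$SIGNATURE" "$IMAGE"; then
            echo "Signature OK, applying update..."
            # flash_tool "$IMAGE"   # replace with your device's actual update command
        else
            echo "Signature check FAILED, refusing to flash." >&2
            exit 1
        fi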

  • Cloud Cost Management Tools: How CloudNudge Outperforms the Competition

    In today's cloud-first world, managing spend isn't just a finance problem, it's a strategic advantage. Enterprises and startups are turning to cost optimization platforms to maximize ROI and reduce waste. Tools like Densify, IBM Turbonomic, FinOut, Granulate, Datadog, nOps, and Virtana offer various solutions, but most fall short of delivering strategic, intelligent, and business-aligned optimization. CloudNudge is a platform built to track costs and reshape how organizations think about and act on cloud spending. Let's break down the market and explore why CloudNudge is redefining what true cloud cost intelligence looks like.

    Competitive Feature Analysis Table
    (Legend: ✅ = fully supported, ❌ = not supported, ❓ = not clearly mentioned, — = no data available.)

    Why Do These 5 Key Features Influence Buying Decisions?

    1. Workload-Based VM Suggestions (Right-Sizing)
    Over-provisioned VMs are a major source of cloud waste. Buyers want tools that analyze real usage patterns (CPU, memory, I/O) and recommend exact instance types to reduce cost without degrading performance. This feature provides immediate, measurable savings, often up to 30%.
    Impact: Tangible cost savings and better resource utilization.

    2. Container (Kubernetes) Optimization
    Kubernetes is now the standard for deploying scalable applications, especially in microservices environments. Orchestrated environments can quickly accumulate hidden costs due to poor resource allocation (requests/limits). Buyers look for tools that can optimize container usage automatically or with clear suggestions.
    Impact: Saves money and reduces developer overhead in container-heavy environments.

    3. Cross-Cloud Support (AWS, Azure, GCP)
    Most modern businesses run multi-cloud or hybrid-cloud strategies, either by design or acquisition. Buyers want one centralized view of all cloud costs to avoid tool sprawl and enable consistent governance. A platform that handles all major cloud vendors is significantly more appealing.
    Impact: Simplifies management, reduces vendor lock-in, and ensures full visibility.

    4. Anomaly Detection
    Sudden spikes in cloud bills (due to misconfigurations, rogue scripts, etc.) can be financially devastating. Buyers need proactive detection of outliers before they impact budgets. AI-driven anomaly detection shows maturity and prevents surprises.
    Impact: Protects against unplanned spend and improves forecasting accuracy.

    5. Jira Integration (Actionable Workflows)
    FinOps insights are only valuable if they lead to real action. Many teams struggle to operationalize cost recommendations; insights get stuck in reports. Integrating with Jira (or similar tools) ensures that optimization tasks become part of the team's natural workflow.
    Impact: Drives actual change, not just reporting, and boosts ROI from the tool.

    In summary, these five features deliver tangible financial savings, address real operational challenges, meet the needs of both technical and financial stakeholders, and demonstrate maturity and practical value in today's competitive tool landscape.

    Why CloudNudge Stands Out
    - All 5 features are covered natively; most competitors miss at least one.
    - Jira integration gives CloudNudge a unique edge by turning insights into actions.
    - Cross-cloud support ensures unified visibility and optimization across AWS, Azure, and GCP, still a major gap in tools like Virtana.
    - Workload-specific VM recommendations provide right-sizing based on real usage.
    - Kubernetes optimization ensures deep, resource-level efficiency.

    Bottom Line: Smarter Cloud Spending Starts Here
    Most tools in the market offer monitoring or reactive alerts. CloudNudge is different. It delivers:
    - Strategic insights rooted in real-world benchmarks
    - Smart automation integrated into your daily workflow
    - Forward-looking controls that empower your teams
    Stop reacting to cloud bills. Take control with smart, strategic cloud spending. Experience the CloudNudge difference now.

  • Cross-Compiling SPEC CPU2017 for RISC-V (RV64): A Practical Guide

    SPEC CPU2017 is a well-known benchmark suite for evaluating CPU-intensive performance. Although it assumes native compilation and execution, there are cases, especially with RISC-V (RV64) platforms, where cross-compilation is the only feasible route. This guide walks through the steps to cross-compile SPEC CPU2017 for RISC-V, transfer the binaries to a target system, and optionally use the --fake option to simulate runs where execution isn't possible or needed during development.

    Cross-compiling is essential when:
    - Your RISC-V target system (e.g., dev board or emulator) lacks compiler tools.
    - You're benchmarking an emulator (e.g., QEMU) or a minimal Linux image.
    - Native builds are too slow or memory-constrained.

    Prerequisites
    - A working RISC-V cross-toolchain (e.g., riscv64-linux-gnu-gcc).
    - The SPEC CPU2017 suite installed on your host machine.
    - Access to a RISC-V target environment (real or emulated).
    - Optional: knowledge of the --fake flag in SPEC CPU2017 (explained below).

    Step-by-Step Guide

    1. Install SPEC CPU2017 on the Host Machine
    Install SPEC on your x86_64 development system as usual:

        ./install.sh

    2. Set Up the Cross-Toolchain
    Make sure the RISC-V toolchain is installed and available:

        export CROSS_COMPILE=riscv64-linux-gnu-
        export CC=${CROSS_COMPILE}gcc
        export CXX=${CROSS_COMPILE}g++

    Make sure the compiler binaries are in your $PATH.

    3. Create a RISC-V SPEC Config File
    Copy and modify an existing config:

        cd $SPEC_DIR/config
        cp linux64-gcc.cfg linux-rv64-cross.cfg

    Then edit linux-rv64-cross.cfg:

        default=default=base,peak
        CC            = riscv64-linux-gnu-gcc
        CXX           = riscv64-linux-gnu-g++
        COPTIMIZE     = -O2 -static
        CXXOPTIMIZE   = -O2 -static
        PORTABILITY   = -DSPEC_CPU_LINUX
        EXTRA_LDFLAGS = -static

    Use --sysroot or target-specific flags if needed. The -static flag is highly recommended to avoid runtime issues on minimal RISC-V Linux systems.

    4. Build the Benchmarks (Without Running)
    This step compiles the benchmarks using the cross toolchain, but does not attempt to run them:

        cd $SPEC_DIR
        ./bin/runcpu --config=linux-rv64-cross --action=build --tune=base --size=ref all

    This will create executable binaries in the benchmark run/ directories.

    5. (Optional) Simulate Benchmark Runs Using --fake
    If you only want to verify that the binaries were built correctly and prepare result directories for later manual execution, you can use:

        ./bin/runcpu --config=linux-rv64-cross --action=run --fake --tune=base --size=ref all

    This does not execute the binaries. Instead, it fakes a successful run and populates the result directories and reports. Use cases for --fake:
    - Validate the build structure without requiring target hardware.
    - Automate CI pipelines for SPEC builds.
    - Pre-generate result directories to collect logs from target systems later.
    Important: --fake is not a benchmark run. It's a metadata operation. You still need to run the binaries on the actual hardware to get performance data.

    6. Transfer Binaries to the Target System
    Find the executables in:

        $SPEC_DIR/benchspec/CPU/*/run/*

    Use scp, rsync, or embed them into a disk image. On your RISC-V target:

        cd /run/path
        ./<benchmark>_base.riscv64

    Capture performance stats using /usr/bin/time, perf, or another profiler. A transfer-and-run sketch follows below.
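    The script below is a minimal sketch of step 6: it pushes the cross-built run directories to the target and times one binary there. The target hostname, remote path, and the <benchmark> placeholders are assumptions for illustration; adjust them to your environment (the target needs ssh/rsync access and GNU time).

        #!/bin/bash
        # Copy cross-built SPEC run directories to the RISC-V target and time one run.
        set -euo pipefail

        SPEC_DIR=${SPEC_DIR:-/opt/cpu2017}   # host-side SPEC installation (assumption)
        TARGET=riscv-board                   # hostname/IP of the RV64 target (assumption)
        REMOTE_DIR=/home/user/spec-run       # destination on the target (assumption)

        # Push every populated base run directory, one folder per benchmark.
        for d in "$SPEC_DIR"/benchspec/CPU/*/run/run_base_ref*; do
            [ -d "$d" ] || continue
            bench=$(basename "$(dirname "$(dirname "$d")")")
            rsync -a "$d/" "$TARGET:$REMOTE_DIR/$bench/"
        done

        # Example: time one benchmark binary on the target and pull back the log.
        ssh "$TARGET" "cd $REMOTE_DIR/<benchmark> && /usr/bin/time -v ./<benchmark>_base.riscv64 > run.log 2>&1"
        scp "$TARGET:$REMOTE_DIR/<benchmark>/run.log" .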
    Troubleshooting
    - Illegal instruction: the cross-compiler may be targeting the wrong ISA; use -march=rv64gc.
    - Segmentation fault: missing libraries or stack size issues; try -static or ulimit -s unlimited.
    - Missing libstdc++: use -static-libstdc++ or provide the shared libs manually.
    - QEMU hangs or crashes: upgrade the QEMU version or run on real hardware.

    Summary
    With proper configuration, cross-compiling SPEC CPU2017 for RISC-V is not only feasible, but it's also a powerful way to bring industrial-grade performance testing to emerging architectures. The --fake flag is a valuable tool when you're preparing runs in a disconnected or staged workflow.

    Bonus: CI/CD Pipeline Tip
    If you're integrating into CI:
    - Use --action=build and --fake together to validate builds.
    - Export binaries as artifacts.
    - Deploy them onto your RISC-V target for actual execution.

  • Migrating JetStream 2.2 to Node.js: Challenges, Design, and What We Learned

    JetStream is a JavaScript benchmark suite that evaluates web application performance by measuring the execution latency and throughput of complex workloads. With the release of JetStream 2.2, we at WhileOne Techsoft undertook the task of migrating its harness to a modern Node.js-based setup. The effort grew out of recent work with a customer who was looking to benchmark their CPUs using JavaScript workloads. This post dives into why we did it, how we did it, and what you can expect from the JetStream-on-Node.js v2 repo.

    Background: Why Move to Node.js?
    JetStream is browser-focused and depends heavily on JavaScriptCore, V8, or other JS engines. Traditionally it's driven via browser-based test harnesses, but for backend benchmarking and automation (especially CPU benchmarking on headless systems), a Node.js-based execution layer is far more practical. Key motivators:
    - Automated benchmarking in CI/CD environments.
    - Cross-platform compatibility, especially for non-GUI servers (ARM, RISC-V, etc.).
    - Easier integration with custom harnesses or profiling tools (like perf, time, etc.).
    - Removing reliance on browser UIs and moving toward CLI-based, headless benchmarks.

    Architecture Overview
    We restructured JetStream 2.2 to run under Node.js with minimal dependency changes. The project now:
    - Loads JetStream benchmarks as CommonJS/ES modules.
    - Mocks browser-specific globals (window, document, etc.) only where needed.
    - Handles timing and result reporting within Node.
    - Adds CLI support for automated runs.

    Project structure highlights:

        JetStream-Node/
        ├── benchmarks/
        ├── driver/
        │   ├── main.js          ← CLI runner
        │   ├── harness.js       ← Benchmark orchestrator
        │   └── fake-browser.js  ← Global mocks
        ├── results/
        ├── utils/
        ├── package.json

    Migration Strategy
    We broke the migration into several focused steps:

    1. Browser Shim Implementation
    JetStream's benchmarks expect a browser-like environment. We built a lightweight shim that defines:
    - window, document, performance.now()
    - setTimeout, clearTimeout
    - Custom stubs for HTML elements (e.g., CanvasRenderingContext2D)
    This shim lives in driver/fake-browser.js and is injected before benchmarks are loaded.

    2. Rewriting the Test Harness
    Instead of using JetStream's HTML runner, we built a new runner in driver/harness.js that:
    - Iterates over all benchmark modules.
    - Loads and runs them with a warm-up + main execution loop.
    - Times each run with performance.now() or process.hrtime.

    3. CommonJS Compatibility Fixes
    Some benchmarks had inline scripts or relied on document.write. We:
    - Wrapped them in modules where possible.
    - Rewrote some legacy benchmark entry points using require() or import().
    - Adjusted globals to match what the benchmark expects.

    4. Result Logging and Aggregation
    Each benchmark result is recorded in JSON format in the results/ folder. We compute:
    - Raw latency times.
    - Geometric means.
    - Per-benchmark scores.
    This structure allows easy post-processing or integration into other tools.

    What Works (and What's Left)
    Working:
    - Full benchmark execution under Node 20+.
    - Logging, scoring, and isolation of benchmark outputs.
    - Runs across x86 and ARM64.
    Still to refine:
    - Some benchmarks that depend on browser layout (e.g., DOM-heavy tests) are disabled or stubbed.
    - Parallel execution and profiling hooks are planned.
    - Rewriting the result UI for visualization (low priority for CLI users).
    Usage
    To try it yourself:

        git clone https://github.com/Whileone-Techsoft/JetStream-Node.git
        cd JetStream-Node
        git checkout jetstream-on-node-js-v2
        npm install
        node driver/main.js

    This will run the benchmark suite and save output under results/.

    Performance Use Cases
    - Server CPU benchmarking: run JetStream as part of CPU regression testing on headless servers.
    - CI integration: track JS performance changes across commits or platforms.
    - Cross-architecture comparison: run the same benchmark on x86, ARM64, RISC-V, and compare results meaningfully.

    Future Work
    - Add --filter and --repeat CLI flags.
    - Support native engine plugins (e.g., run with SpiderMonkey via CLI).
    - Add CSV and HTML result output formats.

    Final Thoughts
    Migrating JetStream 2.2 to Node.js wasn't just a port; it was about transforming it into a modern, scriptable, backend-compatible benchmarking tool. If you're looking to run JetStream without a browser, or to integrate it into your infrastructure, this project is a clean, extensible starting point. You can check out the source and contribute on GitHub: https://github.com/Whileone-Techsoft/JetStream-Node/tree/jetstream-on-node-js-v2
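    Because the post above recommends pairing the runner with profiling tools such as perf and time, here is a minimal sketch of a wrapped run. It assumes the repo layout described above (driver/main.js, results/) and a Linux host with perf installed; the counter list is just an example.

        #!/bin/bash
        # Wrap the Node-based JetStream runner with time and perf for headless CPU profiling.
        set -euo pipefail

        cd JetStream-Node

        # Wall-clock time and peak memory of a full suite run.
        /usr/bin/time -v node driver/main.js

        # Hardware counters (cycles + instructions give IPC) for a second run.
        perf stat -e cycles,instructions,branch-misses -- node driver/main.js

        # Per-benchmark JSON results land in results/, per the harness design.
        ls results/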

  • Benchmarking Meta Llama 4 Scout on CPU-Only Systems: Performance, Quantization, and Architecture Tuning

    Meta's Llama 4 Scout, released in April 2025, is a 17-billion parameter general-purpose language model that brings powerful reasoning to a broader range of applications, including those running without GPUs. This blog focuses on benchmarking Llama 4 Scout on CPU-only systems, covering:
    - Tokens per second
    - Latency per token
    - Prompt handling efficiency
    - Quantization techniques
    - Architecture-specific optimization for x86, ARM, and RISC-V (RV64)
    - Converting to GGUF format for efficient deployment

    Why Benchmark on CPU?
    While most LLMs are deployed on GPUs, CPU-only inference is often necessary for:
    - Edge devices
    - Cloud VMs with no GPU access
    - Open hardware ecosystems (e.g., RISC-V)
    - Cost-conscious deployments
    That makes Llama 4 Scout a strong candidate, especially with quantized variants.

    Key Benchmark Metrics
    - Tokens/sec: overall throughput, critical for long completions.
    - Latency/token: time to generate one token; important for chats.
    - Prompt size sensitivity: how inference speed degrades with longer inputs.
    - Memory usage: the RAM footprint determines whether the model can run at all.

    Why Quantization Is Essential
    Quantization reduces the memory and compute requirements of large models. Llama 4 Scout quantized to int4 or int8 can run comfortably on CPUs with 8–16 GB of RAM. Impact on Llama 4 Scout:
    - Memory savings: from 34 GB to ~5–7 GB (int4).
    - Speedup: up to 3× faster than float16.
    - Hardware fit: allows ARM and RV64 CPUs to host inference.
    Tools like ggml, llama.cpp, and MLC support quantized Llama 4 models, including CPU backends.

    Architecture-Specific Performance Considerations

    x86-64 (Intel, AMD)
    - Vector support: AVX2 or AVX-512 preferred.
    - Threading: mature OpenMP and NUMA support.
    - Performance: high; well optimized in llama models.

    ARM (Graviton, Apple Silicon, Neoverse)
    - Vector ISA: NEON (128-bit) on all, SVE/SVE2 on newer chips.
    - Threading: requires tuning due to core heterogeneity.
    - Quantization: NEON handles int8 and int4 efficiently.
    Tip: use taskset and numactl to pin threads for optimal performance.

    RISC-V (RV64 with RVV)
    - Vector ISA: RISC-V Vector Extension (RVV), variable width.
    - Quantization: essential; float32 models are impractical on RV64 edge devices.
    - Tooling: llama.cpp support is experimental but growing.
    For RV64, memory layout and cache-friendly quantization are critical due to limited bandwidth.

    Sample Inference Results (Hypothetical)
    - x86_64, Llama 4 Scout int4, prompt size 512: 11.2 tokens/sec, ~6.5 GB RAM
    - ARM Neoverse, Llama 4 Scout int4, prompt size 512: 8.7 tokens/sec, ~6.5 GB RAM
    - RISC-V RV64, Llama 4 Scout int4, prompt size 512: 3.2 tokens/sec, ~6.5 GB RAM
    These results assume multi-threaded CPU inference with quantized weights using llama.cpp or similar.

    From Raw Model to GGUF: Why and How?
    To run Meta Llama 4 Scout efficiently on CPU-only systems, especially with tools like llama.cpp, the model must be in GGUF format.

    Why Convert to GGUF?
    GGUF is the compact, memory-optimized model file format of the GGML/llama.cpp ecosystem, designed for CPU and edge inference using:
    - llama.cpp
    - mlc-llm
    - text-generation-webui

    GGUF advantages:
    - Memory efficient: packs quantized weights and metadata.
    - Fast load times: no need to re-tokenize or parse configs.
    - Metadata preserved: tokenizer, vocab, and model type included.
    - Simplified use: a single file usable across many tools.

    How to Convert Llama 4 Scout to GGUF

    1. Download the Raw Model (HF Format)
    Get the original model from Hugging Face (e.g., meta-llama/Meta-Llama-4-Scout-17B).
    2. Install the Conversion Tools
    Install transformers and the llama.cpp tooling:

        pip install transformers huggingface_hub
        git clone https://github.com/ggerganov/llama.cpp
        cd llama.cpp
        make

    3. Run the GGUF Conversion Script
    From the llama.cpp/scripts directory:

        python convert.py \
            --outfile llama4-scout.gguf \
            --model meta-llama/Meta-Llama-4-Scout-17B \
            --dtype q4_0

    4. Load It in Your Inference Tool
    Once converted, the .gguf file can be run directly:

        ./main -m llama4-scout.gguf -p "Hello, world"

    GGUF + Quantization = CPU Superpowers
    Converting to GGUF enables you to quantize during the conversion:
    - q4_0, q4_K, q5_1, and q8_0 are supported.
    - You reduce size dramatically, from ~34 GB to ~5–7 GB for q4.
    - It ensures compatibility with CPU SIMD instructions like AVX, SVE, or RVV.
    On RISC-V or ARM boards with limited memory, GGUF + int4 is often the only way to get Llama 4 Scout running at all.

    Pro Tip: GGUF Conversion Options
    You can fine-tune conversion settings:
    - --vocab-type to customize the tokenizer structure
    - --trust-remote-code if the Hugging Face repo uses custom loading
    - --quantize q4_K for better int4 accuracy

    Final Thoughts
    Meta's Llama 4 Scout is one of the most practical open-source LLMs for CPU inference in 2025. With quantization and SIMD-aware deployment, it can serve:
    - Edge applications (IoT, phones)
    - Sovereign compute platforms (RISC-V)
    - Cloud-native environments without GPUs
    If you're interested in pushing the limits of open LLMs on CPU architectures, Llama 4 Scout is one of the best starting points.
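    To turn the converted file into actual throughput numbers, a quick sketch like the one below can be used. It assumes a recent llama.cpp build that ships the llama-bench tool (older builds may only provide ./main) and that llama4-scout.gguf sits in the build directory; the thread counts are examples only.

        #!/bin/bash
        # Measure prompt processing and generation speed of the converted GGUF model.
        MODEL=llama4-scout.gguf

        # 512-token prompt, 128 generated tokens, at a few thread counts.
        for t in 4 8 "$(nproc)"; do
            ./llama-bench -m "$MODEL" -p 512 -n 128 -t "$t"
        done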

  • Why Every Company Needs Robust Demos And How WhileOne Can Help

    Building a great product is only half the battle. Demonstrating its capabilities convincingly, whether in front of customers, at an exhibition, or during a PoC, is often what seals the deal. Yet, for many companies, setting up demos ends up as a side project that falls through the cracks. At WhileOne, we understand this challenge. That's why helping companies build reliable, repeatable demos has been part of our mission since day one.

    The Problem with Ad-Hoc Demos
    Most companies start strong when building a product, but creating demos usually gets delegated to engineers as a "when-you-have-time" task. This often results in:
    - Inconsistent setups that don't reflect the product's full potential
    - Broken environments due to configuration drift or missing dependencies
    - Missed opportunities at conferences, sales pitches, or proof-of-concept trials
    The reality is that demos are critical, and they deserve dedicated engineering effort.

    Our Role in Fixing It
    Since our inception, WhileOne has been the go-to partner for companies needing production-quality demo setups. Whether it's for exhibitions, PoC engagements, or internal experimentation frameworks, we build environments that work, every time.

    We've Supported Demos At:
    - ComputeX
    - Open Compute Project (OCP) Summit
    - CloudFest
    - RISC-V Summits
    - SuperCompute
    - And many more
    These demos are often used in booths, technical sessions, or partner showcases, and they just work, because we build them with reliability, repeatability, and reproducibility in mind.

    Kubernetes and Container-Based Environments
    Our demo environments are often built on Kubernetes or Docker, ensuring they are:
    - Easily reproducible across developer machines and exhibition floors
    - Modular and maintainable, for rapid iteration and updates
    - Cloud-ready and on-prem compatible
    This allows your team to focus on what matters: engaging your audience rather than wrestling with deployment issues.

    Demo Infrastructure Vision
    We believe demo infrastructure should be treated like production infrastructure:
    - Version-controlled
    - Testable
    - Portable
    - With every new demo built up on a previous version or iteration
    By working with WhileOne, your demos will never be an afterthought again.

  • Getting the Maximum Tokens Generated for a Target CPU

    LLMs Are Getting Better and Smaller
    Let's look at Llama as an example. The rapid evolution of these models highlights a key trend in AI: prioritizing efficiency and performance. When Llama 2 70B launched in August 2023, it was considered a top-tier foundational model. However, its massive size demanded powerful hardware like the NVIDIA H100 accelerator. Less than nine months later, Meta introduced Llama 3 8B, shrinking the model by almost 9x. This enabled it to run on smaller AI accelerators and even optimized CPUs, drastically reducing the required hardware costs and power usage. Impressively, Llama 3 8B surpassed its larger predecessor in accuracy benchmarks.

    Setup Details
    Tested with llama.cpp on:
    - Machine: AWS r8g.24xlarge (Graviton4)
    - OS: Ubuntu 22.04, kernel 6.8 (AWS)
    - Model: Meta-Llama-3.1-8B-Instruct-Q8_0.gguf
    - Test sweep: nproc × nthreads × batch size [1–32] (see the sweep sketch below)

    Graphs with Observations Highlighting Benefits
    Token generation is done in an auto-regressive manner and is highly sensitive to the length of the output that needs to be generated. Arm optimizations help here with larger batch sizes, increasing throughput by more than 2x.

    Conclusion
    For Meta-Llama-3.1-8B-Instruct-Q8_0.gguf, Graviton4 can generate 161 tokens per second, which translates to 102,486 tokens per dollar.
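    For readers who want to reproduce the sweep, a minimal sketch is shown below. It assumes a llama.cpp build that provides llama-bench with thread (-t) and batch-size (-b) options; the thread counts and model path are examples, not the exact harness used for the published numbers.

        #!/bin/bash
        # Thread x batch-size sweep for token-generation throughput with llama.cpp.
        MODEL=Meta-Llama-3.1-8B-Instruct-Q8_0.gguf

        for threads in 16 32 48 96; do
            for bs in 1 2 4 8 16 32; do
                # -n 128: generate 128 tokens per configuration; output reports tokens/sec.
                ./llama-bench -m "$MODEL" -t "$threads" -b "$bs" -n 128
            done
        done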

  • Site Reliability Engineering (SRE) Support for System Infrastructure

    Operational Excellence for Service-Driven Enterprises
    As businesses increasingly deploy services in production environments, the reliability and uptime of servers have become a critical need. These workloads are often hosted in hybrid setups, including dedicated data centers and public clouds, where even brief outages can impact performance, user trust, and business outcomes. To meet these demands, a dedicated Site Reliability Engineering (SRE) team provides comprehensive support, combining real-time incident management, infrastructure optimization, and operational discipline to maintain high availability, typically targeting 99.9% uptime.

    The WhileOne Approach to SRE Excellence
    At WhileOne, we specialize in keeping critical systems up and running with minimal disruption. Our team blends hands-on expertise in Linux, server management, and cloud platforms to deliver consistent, high-availability support. From alert response to root cause analysis and resolution, we follow a disciplined SRE approach that ensures incidents are handled swiftly and systematically. We take pride in being the steady hand behind your infrastructure: proactive and reliable.

    Core Capabilities and Technical Expertise
    The SRE team operates with a diverse skill set tailored to high-performance, always-on environments:
    - Operating Systems & Systems-Level Engineering: deep understanding of Linux-based systems, including process management, disk and memory diagnostics, kernel tuning, system services, networking, and security configurations.
    - Physical and Virtual Server Management: experience with both bare-metal server environments and virtualized compute platforms, ensuring reliability from hardware up to the OS and service layer.
    - Cloud and Hybrid Infrastructure: proficient in managing cloud-native workloads and integrating cloud services with on-premise infrastructure across platforms such as AWS, Azure, Google Cloud, and Oracle Cloud.
    - Monitoring and Observability: skilled in leveraging observability stacks to monitor key metrics, application health, and system-level behavior, enabling proactive detection and rapid triage of issues.
    - Process Engineering and Benchmarking: the team implements standardized incident handling workflows and continuously refines processes to improve detection, diagnosis, and recovery times.
    - Full Stack of Operational Support (L1–L4): the team provides structured, in-house coverage across all support levels, from basic alert triage (L1), to systems analysis (L2), code-level debugging (L3), and infrastructure-level resolution or architectural remediation (L4).
    - Cross-Functional Collaboration: workflows are integrated with enterprise-grade tools that support alerting, team coordination, ticketing, documentation, and shift-based communication.

    Shift-Based Support and Observational Handoffs
    The team operates in rotating shifts to ensure 24/7 coverage. Each shift is responsible for ongoing incident management, proactive health checks, and noting key system behaviors or deviations. At the end of each shift, outgoing engineers document their observations. The first shift of each day consolidates these notes into a comprehensive report, highlighting unresolved issues, recurring patterns, and system performance trends. This ensures that both technical and leadership teams remain informed and aligned.

    Structured Incident Response Lifecycle
    1. Alert Detection & Acknowledgement: monitoring tools flag anomalies; engineers acknowledge and initiate an investigation immediately.
    2. System Diagnosis & Log Review: teams inspect logs, resource metrics, and system health to identify stalls, failures, or contention.
    3. Collaborative Communication: a live incident thread is established to coordinate response and ensure full team visibility.
    4. Corrective Actions: engineers take steps like restarting services, isolating nodes, or reallocating load to stabilize systems.
    5. Documentation & Run-log Update: the incident is formally logged with actions and findings for traceability and future reference.
    6. Escalation When Required: complex issues are smoothly handed off to higher-tier specialists with full context and diagnostics.

    Operational Readiness and In-House Autonomy
    All support services, from initial alert handling to the most advanced system-level debugging, are managed by a fully autonomous in-house team. This includes:
    - Immediate L1 triage and alert response.
    - Deep L2 and L3 systems troubleshooting.
    - L4 infrastructure decision-making and optimization.
    With expertise spanning operating systems, cloud platforms, observability, automation, and performance engineering, the team is self-sufficient and minimizes external dependencies. This allows for faster resolution times and better control over long-term infrastructure health. This Site Reliability Engineering function provides robust operational support across hybrid and cloud-native environments. With a combination of hands-on technical depth, well-defined processes, and structured escalation paths, the team ensures stability, uptime, and resilience for complex production systems.

  • MySQL Cloud Workload Brief

    Overview
    MySQL is an open-source relational database management system (RDBMS) that stores and organizes data using tables, rows, and columns, and allows you to query and manage that data using SQL (Structured Query Language). MySQL Database Server is fast, reliable, scalable, and easy to use. It continues to rank highly in popularity among databases, according to DB-Engines.

    Sysbench is a multi-threaded benchmark tool. It can create a simple database schema, populate database tables with data, and generate multi-threaded load (SQL queries) against the database server. Sysbench works very well with MySQL because it was originally designed specifically to benchmark MySQL and MariaDB under various OLTP workloads. Sysbench ships with Lua scripts to simulate:
    - Read/write transactions (oltp_read_write.lua)
    - Read-only workloads (oltp_read_only.lua)
    - Write-only workloads (oltp_write_only.lua)
    - Point-select workloads (oltp_point_select.lua)

    Setup Details
    - MySQL Server Version: 8.0.36
    - Sysbench Version: 1.0.20

    Example: MySQL Benchmark (OLTP)

        # load data
        sysbench oltp_point_select \
            --db-driver=mysql \
            --mysql-user=root \
            --mysql-password=yourpass \
            --mysql-db=test \
            prepare

        # run benchmark
        sysbench oltp_point_select \
            --db-driver=mysql \
            --mysql-user=root \
            --mysql-password=yourpass \
            --mysql-db=test \
            --tables=10 \
            --table-size=100000 \
            --threads=8 \
            --time=60 run

        # remove data
        sysbench oltp_point_select \
            --db-driver=mysql \
            --mysql-user=root \
            --mysql-password=yourpass \
            --mysql-db=test \
            cleanup

    MySQL - Competitive Analysis
    Compare MySQL on AMD, Intel, and Arm machines, with each platform using the same Linux distribution (Fedora 38) and the same kernel version (6.4.13-200.fc38), with a 4K page size.

    Test Purpose
    oltp_point_select.lua performs single-row SELECTs by primary key. It is used to measure pure read throughput, memory/cache efficiency, and indexing performance.

    Sysbench Command:

        sysbench/oltp_point_select.lua --table-size=10000000 --tables=8 --mysql-port=3000 \
            --mysql-db=sbtest --threads=64 --events=0 --time=600 --report-interval=10 \
            --thread-init-timeout=5 --rate=0 --rand-type=uniform --rand-seed=1 \
            --mysql-host=`ip_address` --mysql-user=sbtest --mysql-password=`yourpass` \
            --mysql-ssl=REQUIRED --mysql-ssl-cipher=AES128-SHA256 --db-ps-mode=disable \
            --mysql-ignore-errors=1213,1205,1020,2013 --db-driver=mysql run

    Observations:
    - High QPS with very low average latency, typical of optimized read workloads.
    - Performance scales well with threads up to a point.
    - Depends heavily on index lookups and buffer pool hits.
    - Minor spikes in latency may occur with disk I/O or buffer pool misses.

    Performance Insights
    - Arm outperforms others in high-concurrency scenarios, ideal for thread-heavy OLTP workloads.
    - Icelake performs best at low-to-mid thread counts, likely due to strong single-threaded or cache performance.
    - Genoa is consistent, with good compatibility with all MySQL features and the best overall performance.
    - Milan may need tuning, or the results reflect a less optimized test environment.

    MySQL Tunings for Better Performance
    Tuning MySQL for better performance involves adjusting configuration settings based on your workload type (OLTP, analytics, mixed), system resources (RAM, CPU, disk), and traffic pattern (read-heavy, write-heavy, etc.). Also tune parameters in my.cnf such as innodb_write_io_threads, innodb_read_io_threads, and max_connections; an illustrative fragment follows below.
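    As an illustration of those parameters, the commands below append example values to my.cnf and restart the server. The values are generic starting points (not the settings used for the benchmark numbers above); size innodb_buffer_pool_size to the RAM actually available on your database host.

        #!/bin/bash
        # Append illustrative InnoDB I/O and connection settings, then restart MySQL.
        sudo cp /etc/mysql/my.cnf /etc/mysql/my.cnf.bak
        printf '%s\n' \
            '[mysqld]' \
            'innodb_buffer_pool_size = 16G' \
            'innodb_read_io_threads  = 16' \
            'innodb_write_io_threads = 16' \
            'max_connections         = 1000' | sudo tee -a /etc/mysql/my.cnf
        sudo systemctl restart mysql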
    We also used PGO (Profile-Guided Optimization), a compiler optimization technique used with GCC to improve binary performance by collecting execution profiles and optimizing accordingly.

    MySQL - Competitive Analysis over GCP Cloud
    Compare MySQL on AMD, Intel, and Arm machines on GCP.

    Performance Insights
    - GCP-N2 is the best choice for maximum performance in read-intensive benchmarks like the sysbench OLTP point-select workload.
    - GCP-N2D and T2A offer a good balance of performance and likely cost.
    - GCP-T2D may not scale well for high-QPS workloads; it is best suited for light-duty use.

    Conclusion
    MySQL handled concurrent connections efficiently with steady TPS and low latency. The oltp_point_select.lua test demonstrated MySQL's ability to handle high-throughput primary key lookups with exceptional performance, achieving 18,000 QPS at ~0.5 ms average latency. The results indicate efficient use of the InnoDB buffer pool and minimal I/O waits. To sustain performance under higher concurrency, further tuning of buffer size, read threads, and CPU parallelism may be beneficial. This workload is a good indicator of how MySQL will perform in read-heavy applications like caching layers or real-time dashboards.

  • SPDK AIO bdevperf Performance Report: Analyzing Workload on AWS Graviton4

    We conducted SPDK bdevperf tests on an AWS EC2 r8gd.metal-24xl instance, focusing on single CPU core performance under high I/O load. Our objective was to demonstrate a CPU-bound workload. Results show low I/O wait and high CPU utilization, confirming the CPU is the limiting factor. The 2-disk configuration achieved the highest throughput, indicating a CPU saturation point.

    1. Performance Results Summary (100-second duration)
    Below is a consolidated view of our 100-second bdevperf runs across 1, 2, and 3 local NVMe disks. These figures include throughput, latency, CPU utilization (from mpstat), and Instructions Per Cycle (IPC, from perf stat) for the dedicated CPU core.

    2. Test Setup and Environment
    Tests were conducted on an AWS EC2 r8gd.metal-24xl instance, a bare-metal machine.
    - Processor: AWS Graviton4 (ARM-based).
    - Local Storage: three 1900 GiB NVMe SSDs (instance store).
    bdevperf parameters used to focus on CPU utilization include:
    - SPDK Driver: AIO (Asynchronous I/O), which uses the Linux kernel's native AIO interfaces.
    - Queue Depth (QD): 384, set high to keep the storage busy.
    - I/O Size (IO_SIZE): 4096 bytes (4 KiB), a block size typical of transactional workloads.
    - Workload Type: randrw (mixed random read/write) with 70% reads / 30% writes.
    - Test Duration: 100 seconds per run.
    - CPU Core Dedication: bdevperf was affinity-set to a single CPU core (core 0) to measure that core's I/O processing capacity.

    3. Our Script's Test Methodology
    Our custom automation script executes and monitors bdevperf tests as follows (a simplified sketch of one such run appears at the end of this post):
    1. Device Identification & Selection: the script identifies available, unmounted NVMe block devices, excluding system partitions. We then select devices for testing.
    2. SPDK AIO bdev Configuration: for each test (incrementally adding selected disks), a JSON configuration file is generated. This configures SPDK to use AIO block devices backed by the physical NVMe drives.
    3. Performance Execution with Monitoring: bdevperf runs with the generated configuration. mpstat concurrently monitors the dedicated CPU core(s), capturing CPU utilization percentages (user, system, idle, I/O wait). perf stat wraps bdevperf, targeting the dedicated CPU core(s); it collects hardware performance counter data (instructions, cycles) and directly extracts Instructions Per Cycle (IPC), a measure of CPU efficiency. All raw outputs are saved to a unique, timestamped directory.
    4. Results Aggregation & Summary: after each test, the script parses bdevperf output (IOPS, throughput, latency) and CPU metrics. A summary table is presented, highlighting the configuration with the highest total CPU utilization.

    4. Key Findings: CPU-Bound Workload Confirmation
    Our tests confirm that the workload is CPU-bound on the single dedicated Graviton4 core, not bottlenecked by NVMe storage.
    - Low I/O Wait: multi-disk configurations show I/O wait at 0.02% to 0.06%, indicating NVMe storage provides data faster than the CPU can process it. The single-disk I/O wait is 5.49%.
    - High CPU Utilization: total CPU utilization on the dedicated core remained high (84.87% for 1 disk, nearly 99% for 2 and 3 disks), confirming the single core as the performance bottleneck.
    - Dominant System CPU: high system CPU (76–87%) is expected with SPDK AIO bdevs under heavy load, reflecting kernel overhead in processing numerous asynchronous I/O requests.
    - IPC Values: IPC values (2.21 to 2.53) indicate the Graviton4 core's efficiency.
    The slight IPC increase with more disks suggests improved pipeline utilization as CPU saturation increases.

    5. Performance Dynamics: 2 Disks vs. 3 Disks - Identifying the Optimal Point
    The comparison between the 2-disk and 3-disk scenarios shows the CPU's saturation point:
    - 2 Disks (Optimal Throughput): this configuration achieved the highest throughput (658.59 kIOPS / 20.09 Gbps), with nearly 99% total CPU utilization and minimal I/O wait. Two NVMe devices provide I/O that a single Graviton4 core can optimally handle, maximizing throughput without excessive contention.
    - 3 Disks (Beyond Optimal Saturation): adding a third disk maintained high CPU utilization (98.93%) and low I/O wait (0.02%). However, total throughput slightly decreased (638.92 kIOPS / 19.50 Gbps), and average latency significantly increased to 1802.86 μs (from 1166.00 μs). This indicates that beyond CPU saturation, additional I/O sources increase contention and queueing, leading to higher latencies without throughput gain.

    6. Limitations and Future Work: The Role of SPDK Drivers (VFIO/UIO)
    Our current methodology utilizes the SPDK AIO bdev driver, which passes all I/O through the Linux kernel's I/O stack. This incurs kernel overhead, contributing to our observed high system CPU utilization. SPDK offers VFIO (Virtual Function I/O) and UIO (Userspace I/O) drivers for direct, zero-copy access to NVMe devices from user space, bypassing kernel overhead. These drivers typically offer higher IOPS and lower latency. We were unable to utilize VFIO or UIO drivers in this test series due to setup constraints. Using these drivers could yield higher performance (more user CPU, less system CPU), further pushing the single Graviton4 core's capabilities.
    Future Work: investigating SPDK performance with VFIO or UIO drivers to fully assess the r8gd.metal-24xl instance's potential by minimizing kernel involvement.

    Conclusion
    Our experiments confirm the CPU-bound nature of the SPDK AIO bdevperf workload on a single Graviton4 core. The r8gd.metal-24xl instance's local NVMe storage is sufficient to saturate a single CPU core with high-volume, small-block random I/O. The 2-disk configuration represents the optimal point for throughput before latency increases. Future tests with user-space drivers like VFIO or UIO could demonstrate even higher performance.
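    For reference, the commands below sketch what one monitored run from the methodology above might look like. The SPDK build path, device names, and config file are placeholders, and the bdevperf flag spellings (-q, -o, -w, -M, -t, -m, --json) should be verified against your SPDK version before use.

        #!/bin/bash
        # One monitored bdevperf run: AIO bdevs on local NVMe, pinned to core 0.
        # Example JSON config (/tmp/aio_bdev.json), one bdev_aio_create entry per disk:
        #   { "subsystems": [ { "subsystem": "bdev", "config": [
        #       { "method": "bdev_aio_create",
        #         "params": { "name": "aio0", "filename": "/dev/nvme1n1", "block_size": 4096 } } ] } ] }
        set -euo pipefail

        mpstat -P 0 1 100 > mpstat.log &     # utilization of the dedicated core during the run
        sudo perf stat -C 0 -e cycles,instructions -- \
            ~/spdk/build/examples/bdevperf --json /tmp/aio_bdev.json \
            -q 384 -o 4096 -w randrw -M 70 -t 100 -m 0x1 | tee bdevperf.log
        wait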

  • Automating Web Application Deployment on AWS EC2 with GitHub Actions

    Introduction
    Deploying web applications manually can be time-consuming and error-prone. Automating the deployment process ensures consistency, reduces downtime, and improves efficiency. In this blog, we will explore how to automate web application deployment on AWS EC2 using GitHub Actions. By the end of this guide, you will have a fully automated CI/CD pipeline that pushes code from a GitHub repository to an AWS EC2 instance, ensuring smooth and reliable deployments.

    Prerequisites
    Before we begin, ensure you have the following:
    - An AWS account
    - An EC2 instance with SSH access
    - A GitHub repository containing your web application
    - A domain name (optional)
    - Basic knowledge of AWS, Linux, and GitHub Actions

    Step 1: Set Up Your EC2 Instance
    Log in to your AWS account and navigate to the EC2 dashboard. Launch a new EC2 instance with your preferred operating system (Ubuntu recommended). Create a new security group and allow inbound SSH (port 22) and HTTP/HTTPS traffic (ports 80, 443). Connect to your EC2 instance using SSH:

        ssh -i /path/to/your-key.pem ubuntu@your-ec2-ip

    Update the system and install the necessary packages (on Ubuntu the Docker package is docker.io):

        sudo apt update && sudo apt upgrade -y
        sudo apt install -y git nginx docker.io

    Ensure your application dependencies are installed.

    Step 2: Configure SSH Access from GitHub Actions
    To allow GitHub Actions to SSH into your EC2 instance and deploy the code, generate a new SSH key on your local machine:

        ssh-keygen -t rsa -b 4096 -C "github-actions"

    Copy the public key to your EC2 instance:

        cat ~/.ssh/id_rsa.pub | ssh ubuntu@your-ec2-ip 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'

    Store the private key as a GitHub Actions secret:
    - Go to your repository on GitHub.
    - Navigate to Settings > Secrets and variables > Actions.
    - Add a new secret named EC2_SSH_PRIVATE_KEY and paste the private key.
    - Also, add a secret named EC2_HOST with your EC2 public IP address.
    - Add a secret named EC2_USER with the value ubuntu (or your EC2 username).

    Step 3: Clone the Repository on EC2
    SSH into your EC2 instance:

        ssh ubuntu@your-ec2-ip

    Navigate to the /var/www/html directory and clone your repository:

        cd /var/www/html
        git clone https://github.com/your-username/your-repo.git myapp

    Step 4: Configure Docker (If Using Docker)
    Navigate to the project directory:

        cd myapp

    Create a docker-compose.yml file:

        version: '3'
        services:
          app:
            image: myapp:latest
            build: .
            ports:
              - "80:80"

    Run the application using Docker:

        docker-compose up -d --build

    Step 5: Create a GitHub Actions Workflow
    In your GitHub repository, create a new directory for workflows:

        mkdir -p .github/workflows

    Create a new file named deploy.yml inside .github/workflows:

        name: Deploy to AWS EC2

        on:
          push:
            branches:
              - main

        jobs:
          deploy:
            runs-on: ubuntu-latest
            steps:
              - name: Checkout Code
                uses: actions/checkout@v3

              - name: Set up SSH
                run: |
                  echo "${{ secrets.EC2_SSH_PRIVATE_KEY }}" > private_key.pem
                  chmod 600 private_key.pem

              - name: Deploy to EC2
                run: |
                  ssh -o StrictHostKeyChecking=no -i private_key.pem ${{ secrets.EC2_USER }}@${{ secrets.EC2_HOST }} << 'EOF'
                  cd /var/www/html/myapp
                  git pull origin main
                  docker-compose down
                  docker-compose up -d --build
                  exit
                  EOF

    Step 6: Test the CI/CD Pipeline
    Push some changes to the main branch of your repository. Navigate to Actions in your GitHub repository to see the workflow running. After the deployment completes, visit your EC2 instance's public IP in a browser.
    Step 7: Configure Nginx as a Reverse Proxy (Optional)
    Install Nginx on your EC2 instance if not already installed:

        sudo apt install nginx -y

    Create a new Nginx configuration file:

        sudo nano /etc/nginx/sites-available/myapp

    Add the following configuration:

        server {
            listen 80;
            server_name yourdomain.com;

            location / {
                proxy_pass http://localhost:80;
                proxy_set_header Host $host;
                proxy_set_header X-Real-IP $remote_addr;
            }
        }

    Enable the configuration and restart Nginx:

        sudo ln -s /etc/nginx/sites-available/myapp /etc/nginx/sites-enabled/
        sudo systemctl restart nginx

    Step 8: Enable HTTPS with Let's Encrypt (Optional)
    Install Certbot:

        sudo apt install certbot python3-certbot-nginx -y

    Obtain an SSL certificate:

        sudo certbot --nginx -d yourdomain.com -d www.yourdomain.com

    Verify SSL renewal:

        sudo certbot renew --dry-run

    Step 9: Set Up Auto-Restart for Services
    Ensure Docker services restart on reboot:

        sudo systemctl enable docker

    If using a Node.js or Python application, use PM2 or Supervisor to keep it running.

    Step 10: Implement a Rollback Strategy
    Keep older versions of your application in a backup directory. In case of failure, manually switch to a previous version by checking out an older commit (a minimal rollback sketch follows below):

        git checkout <previous-commit>
        docker-compose up -d --build

    Conclusion
    By following this guide, you have successfully automated the deployment of your web application on AWS EC2 using GitHub Actions. This setup ensures that every time you push code to the main branch, your application gets automatically updated on the server. For further improvements, consider:
    - Adding rollback strategies for failed deployments.
    - Implementing automated tests before deployment.
    - Using AWS CodeDeploy for more complex deployment workflows.
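    Building on step 10, here is a minimal rollback sketch for the layout used in this guide (/var/www/html/myapp with docker-compose). The commit placeholder and the health-check URL are assumptions; substitute the commit hash you recorded before the failed deployment.

        #!/bin/bash
        # Roll the application back to a known-good commit and rebuild the containers.
        set -euo pipefail

        cd /var/www/html/myapp
        git fetch origin
        git checkout <previous-commit>     # placeholder: the last known-good commit hash
        docker-compose down
        docker-compose up -d --build

        # Quick health check; investigate and roll forward once the issue is fixed.
        curl -fsS http://localhost/ >/dev/null && echo "Rollback is serving traffic."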
