
  • Tuning Compiler Flags for Custom Hardware

Benchmarking SPECint on FPGA: Introduction

With the growing interest in AI hardware for high-performance and power-efficient computing, understanding how industry-standard benchmarks perform on such platforms is critical. In this paper, we focus on the SPECrate®2017 Integer workloads, a widely used CPU benchmark suite, and share a case study comparing two runs on an FPGA target: a base run and a tuned run that achieved better performance. This paper describes how the tuning and benchmarking procedure was executed, the challenges we faced, and what we learned from this hands-on analysis.

Why SPECint on FPGA?

SPECrate®2017 Integer evaluates the integer processing capabilities of a CPU through a set of compute-intensive, single-threaded programs. Running these on an FPGA (with soft or hardened CPU cores) helps evaluate and tune how custom logic performs in realistic software scenarios, especially workloads like compilers, compression, and AI preprocessing.

Benchmarking Setup

Platform: FPGA
Emulation: Pre-validation on QEMU, native execution on the FPGA target
Benchmark Suite: SPEC CPU2017
Cross-compilation: All benchmarks built with a target toolchain using specmake, applying -static and a base set of flags
Base Run: No tuning; baseline compiler flags, minimal memory tuning
Optimized Run: Enhanced compiler flags, better memory layout, cache tuning

Here’s how the benchmarking was carried out:

Cross-Compilation of SPECrate®2017 Integer Benchmarks: Ensured static linking for portability; verified the ELF binaries using file and readelf.
Execution with runspec: Invoked with runspec --config=target.cfg --tune=base --size=test,train,ref for initial testing.
Data Collection: Captured runtime, SPEC score, and individual benchmark outputs; tracked CPU MHz and instruction counts using perf or hardware counters.
Tricks: Use math models to reduce the run times of SPEC workloads. Get a sense of the test, train, and ref workloads and find a relation between them, so there is no need to run ref every time.
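One way to realize the "math models" trick above is a simple linear fit: calibrate a scale factor between train and ref runtimes on a few benchmarks, then predict ref runtimes from train runs alone. A minimal sketch (all runtimes below are illustrative placeholders, not measured SPEC data):

```python
# Estimate ref runtimes from train runtimes via a least-squares scale factor.
# All numbers are illustrative placeholders, not measured SPEC data.

def fit_scale(train_times, ref_times):
    """Scale factor k minimizing sum of (ref - k * train)^2."""
    num = sum(t * r for t, r in zip(train_times, ref_times))
    den = sum(t * t for t in train_times)
    return num / den

# Calibration runs: train and ref runtimes (seconds) for a few benchmarks
train = [120.0, 95.0, 210.0]
ref   = [1150.0, 900.0, 2050.0]

k = fit_scale(train, ref)

def predict_ref(train_time):
    """Predicted ref runtime for a benchmark given its train runtime."""
    return k * train_time

print(f"scale factor ~ {k:.1f}")
print(f"predicted ref time for a 150 s train run: {predict_ref(150.0):.0f} s")
```

Once the fit is validated against one or two full ref runs, day-to-day tuning iterations can rely on the much shorter train workloads.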

  • Performance Modelling: How to Predict and Optimize System Efficiency

1. Introduction

In today’s fast-paced digital world, system performance is critical to the success of applications ranging from cloud computing platforms to high-performance computing (HPC) workloads. Performance modelling is a powerful technique used to predict, analyze, and optimize the efficiency of computing systems. By simulating and understanding system behavior, developers, engineers, and IT managers can make informed decisions about design, scaling, and optimization strategies.

2. What is Performance Modelling?

Performance modelling is the process of creating abstract representations (models) of a system's behavior under various workloads and configurations. These models help predict how systems respond to changes in usage, hardware, software, or architecture. Performance models can be analytical, simulation-based, or empirical, each offering unique insights into system behavior.

3. Objectives of Performance Modelling

Prediction: Estimate system behavior before deployment.
Bottleneck Identification: Locate components that limit performance.
Optimization: Inform design choices to improve efficiency.
Capacity Planning: Guide resource allocation for current and future needs.
Cost Efficiency: Avoid over-provisioning and reduce operational expenses.

4. Key Techniques in Performance Modelling

Analytical Models: Use mathematical formulas to describe system performance.
Simulation Models: Create detailed simulations that mimic system behavior over time. These can be as simple as equations with simplifying assumptions, or models available online.
Empirical Models: Rely on real-world data and benchmarks to build predictive models. This is more involved, since it requires in-depth knowledge of the system architecture.

5. Steps in Developing a Performance Model

Define Goals: Determine what you want to achieve (e.g., optimize response time or throughput).
Collect Data: Gather metrics from logs, monitoring tools, or benchmarks.
Choose a Modelling Technique: Decide between analytical, simulation, or empirical models.
Build the Model: Construct the performance model using appropriate tools or software.
Validate the Model: Compare predictions with actual performance to ensure accuracy.
Analyze & Optimize: Use the model to explore different configurations and identify optimal settings.

6. Tools for Performance Modelling

Queuing models for analyzing response times
Simulators for detailed, event-based modeling
Benchmarking suites for real-world performance data
Profiling tools for low-level performance metrics

7. Applications of Performance Modelling

High-Performance Computing (HPC): Optimize cluster performance and parallel job scheduling.
Cloud Computing: Predict performance under varying loads and optimize resource allocation.
Software Engineering: Improve application architecture and identify inefficient code paths.
Enterprise IT: Plan for infrastructure upgrades and disaster recovery.

8. Challenges and Best Practices

Challenges:
Model accuracy vs. complexity trade-off
Data collection overhead
Environmental variability

Best Practices:
Keep models as simple as possible while maintaining accuracy
Continuously validate models against real performance
Use a combination of modelling techniques when necessary

9. Conclusion

Performance modelling is an indispensable approach for understanding, predicting, and optimizing system efficiency. Whether you're designing a new application, upgrading infrastructure, or managing a complex cloud environment, performance models can help you make better, data-driven decisions. By embracing the right modelling techniques and tools, organizations can improve performance, reduce costs, and deliver superior user experiences.
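To make the analytical-model category concrete, the classic M/M/1 queue predicts mean response time from just the arrival and service rates. A minimal sketch, assuming Poisson arrivals and exponential service times:

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time W = 1 / (mu - lambda) for a stable M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)

# A server that can handle 100 req/s, offered 80 req/s of load:
w = mm1_response_time(80.0, 100.0)
print(f"mean response time: {w * 1000:.0f} ms")  # 50 ms
```

Even a model this simple shows the characteristic non-linear blow-up in latency as utilization approaches 100%, which is often all that is needed for first-pass capacity planning.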

  • Unleashing Performance Insights on ARM: Bringing Intel's PerfSpect to the Entire Ecosystem

Performance analysis can often feel like searching for a needle in a haystack. When your application isn't running as fast as you'd like, where do you even begin to look? Is it a memory bottleneck? Are you stalling in the CPU's front-end? Answering these questions is critical, but traditional tools can be complex and overwhelming. This is where Intel's PerfSpect comes in. And now, thanks to some recent contributions, this powerful tool is no longer just for x86 systems. I'm happy to share how I've been able to natively compile PerfSpect on the ARM architecture, enabling deep performance analysis on Neoverse-based platforms such as Ampere, AWS Graviton, Google Axion, NVIDIA Grace, and the Microsoft Cobalt series of processors.

Why PerfSpect? A Simpler Path to Performance Insights

PerfSpect is a lightweight, command-line performance analysis tool. Its primary strength lies in its use of the Top-Down Microarchitecture Analysis (TMA) methodology. Instead of drowning you in hundreds of raw performance counters, TMA provides a structured, hierarchical way to identify the primary bottleneck in your system. It breaks down CPU cycles into a few high-level categories:

Front-End Bound: The CPU isn't getting instructions fast enough.
Back-End Bound: Instructions are available, but the execution units are stalled. This is further broken down into:
  Core Bound: The computation units are the bottleneck.
  Memory Bound: The CPU is waiting on data from memory or caches.
Retiring: The CPU is successfully executing instructions. This is the "good" category.
Bad Speculation: The CPU wasted work on instructions that were ultimately discarded (e.g., due to branch misprediction).

By presenting performance data through this lens, PerfSpect makes it easy to pinpoint the character of your bottleneck and tells you exactly where to focus your optimization efforts.

The Competitive Landscape: How Does PerfSpect Compare?

PerfSpect doesn't exist in a vacuum. The Linux ecosystem is rich with powerful profiling tools like perf, Intel VTune Profiler, and AMD uProf. PerfSpect's unique value is its combination of simplicity, structured TMA methodology, and now, cross-architecture support. It provides actionable insights without the steep learning curve of raw perf or the complexity of a full-blown GUI profiler.

My Contribution: Native Support for ARM

My primary contribution was to port PerfSpect, enabling it to build and run natively on ARMv8/ARMv9 architectures. This involved mapping ARM Performance Monitoring Unit (PMU) events to the TMA categories, allowing the same intuitive reporting to work seamlessly on platforms from Ampere, Amazon, and Microsoft. Now, developers can use a single, familiar tool to analyze workloads across different server fleets.

Get Started: Build and Run PerfSpect on ARM

Ready to try it on your ARM machine? Here's how you can get it up and running.

Prerequisites: Ensure you have Python, pip, and the standard Linux performance tools installed.

# For Debian/Ubuntu-based systems
$ sudo apt-get update
$ sudo apt-get install -y python3 python3-pip linux-tools-common linux-tools-generic

Step 1: Clone the repository
$ git clone -b Neoverse-native-support https://github.com/Whileone-Techsoft/PerfSpect.git
$ cd PerfSpect

Step 2: Build the tools Docker image for aarch64
$ ./builder/build.sh

Step 3: Build PerfSpect natively on aarch64
$ make -j

Sample TMA output for Graviton4
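The top-level TMA split described earlier can be sketched in a few lines; the slot counts here are illustrative values, not the actual PMU events PerfSpect programs:

```python
def tma_level1(retiring, bad_spec, frontend_bound, backend_bound):
    """Return each top-level TMA category as a fraction of total pipeline slots."""
    total = retiring + bad_spec + frontend_bound + backend_bound
    return {
        "Retiring": retiring / total,
        "Bad Speculation": bad_spec / total,
        "Front-End Bound": frontend_bound / total,
        "Back-End Bound": backend_bound / total,
    }

# Illustrative slot counts from a hypothetical profile:
breakdown = tma_level1(retiring=420, bad_spec=60,
                       frontend_bound=120, backend_bound=400)
for category, share in breakdown.items():
    print(f"{category:16s} {share:5.1%}")
```

In this hypothetical profile the Back-End Bound share dominates the non-retiring slots, so the next step would be drilling into its Core Bound vs. Memory Bound children.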

  • Building Observability-Driven Performance Benchmarking Frameworks

In complex computing environments spanning cloud, HPC, AI, and edge workloads, observability is no longer optional. With multiple layers of hardware and software working together, traditional monitoring alone cannot surface the insights needed to optimize performance or prevent downtime. At Whileone Techsoft Pvt. Ltd., we help companies go beyond monitoring by building deep observability frameworks that connect performance benchmarking, system analytics, telemetry, and profiling. This integrated approach helps engineering teams gain complete visibility into their systems, enabling faster debugging, reduced operational costs, and enhanced end-user experiences.

Why Observability Matters

As infrastructures scale and workloads diversify, blind spots emerge. Observability addresses these challenges by providing end-to-end visibility into the state and behavior of your systems. This means you can detect issues earlier, understand their root causes, and fix them before they impact users.

Observability vs. Traditional Monitoring

Traditional monitoring answers the “what”, for example, CPU utilisation or error counts. Observability goes deeper and answers the “why” behind performance issues. It focuses on three core pillars:

Metrics – Quantifiable measurements (e.g., latency, throughput)
Logs – Detailed event records for context
Traces – Understanding requests as they travel across distributed systems

At Whileone Techsoft, we layer these pillars with analytics, telemetry, and profiling to deliver actionable insights.

Performance Benchmarking: The Foundation

Our performance benchmarking services form the cornerstone of observability. This data-driven approach uncovers bottlenecks early, before they become costly in production.

System Analytics for Deeper Understanding

Benchmarking generates performance data, but analytics transforms that data into insights. System analytics helps teams understand:

How workloads utilize CPU, memory, I/O, and network resources
The correlation between resource consumption and performance outcomes
Trends and anomalies in system behavior over time

Our analytics frameworks leverage advanced models to identify optimization opportunities, ensuring your workloads perform consistently and reliably.

Telemetry for Real-Time Visibility

Telemetry extends observability by collecting live data from hardware, firmware, middleware, and applications. It:

Captures fine-grained performance metrics continuously
Enables proactive alerts for deviations from benchmarks
Allows visualization of live system health through unified dashboards

Whileone's use of open standards like OpenTelemetry makes this telemetry layer scalable and interoperable with your existing tools.

Profiling for Root Cause Analysis

Even the best benchmarking and telemetry setups cannot replace profiling when you need detailed root cause analysis.

System-level profiling: Identifies hotspots in the kernel, drivers, or hardware interfaces
Code-level profiling: Finds inefficient functions, loops, or algorithms in the application stack

By correlating profiling data with benchmark and telemetry insights, we help engineering teams quickly diagnose and resolve performance regressions.

An Integrated Observability Framework

At Whileone Techsoft, we integrate benchmarking, analytics, telemetry, and profiling into a single observability framework:

Unified dashboards to correlate data across layers
Automated workflows for continuous testing and monitoring
Cross-silo visibility that spans hardware, system software, and applications

This holistic approach ensures reliable, high-performance outcomes for pre-silicon validation, cloud workload optimization, and edge deployments.
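As a sketch of the correlation analysis mentioned under system analytics, a Pearson coefficient between a telemetry metric (say, memory bandwidth) and a benchmark score indicates whether the two move together. The data points are illustrative:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative samples: memory bandwidth (GB/s) vs. benchmark score
bandwidth = [45, 52, 60, 71, 80]
score     = [310, 360, 400, 480, 540]

r = pearson(bandwidth, score)
print(f"correlation: {r:.3f}")
```

A coefficient near 1 suggests the workload is bandwidth-sensitive, which is exactly the kind of hypothesis profiling can then confirm or refute.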
Benefits of Observability-Driven Benchmarking

Real-World Example

One of our semiconductor customers was struggling with inconsistent performance in their post-silicon validation phase. By deploying Whileone's observability-driven benchmarking framework, they were able to:

Pinpoint compiler-level inefficiencies using code profiling: https://www.whileone.in/post/tuning-compiler-flags-for-custom-hardware
Correlate memory bandwidth metrics from telemetry data with workload performance: https://www.whileone.in/post/investigating-performance-discrepancy-in-hpl-test-on-arm64-machines

Best Practices for Building Observability

Observability is no longer a “nice-to-have”; it's essential for ensuring reliable, high-performance systems. Whileone Techsoft Pvt. Ltd. brings together performance benchmarking, system analytics, telemetry, and profiling to build observability frameworks tailored for semiconductor companies, cloud providers, and software enterprises. Ready to take your performance engineering efforts to the next level? Reach out to us to learn how our observability-driven services can help you reduce costs, accelerate time-to-market, and achieve industry-leading performance.

  • Understanding SPEC HPC Benchmarks: A Comprehensive Guide for Beginners

1. Introduction

High-Performance Computing (HPC) is at the core of solving complex computational problems in scientific research, engineering, and large-scale data analysis. Benchmarking plays a critical role in evaluating and optimizing HPC system performance. The Standard Performance Evaluation Corporation (SPEC) provides widely recognized benchmarking suites tailored for different computing environments, helping researchers, businesses, and hardware vendors assess system capabilities.

2. What is SPEC HPC Benchmarking?

SPEC HPC benchmarks are designed to measure the performance of high-performance computing systems under real-world workloads. Unlike general performance testing, SPEC HPC benchmarks focus on evaluating scalability, efficiency, and computational power across various hardware and software configurations. Key metrics include execution time, scalability efficiency, and energy consumption.

3. Why SPEC HPC Benchmarks Matter

Evaluating Scalability & Efficiency: SPEC benchmarks measure how well HPC systems scale with increasing workloads.
Benchmarking Real-World Applications: Unlike synthetic benchmarks, SPEC HPC benchmarks reflect real-world HPC workloads used in scientific and industrial applications.
Standardization & Comparability: They enable fair performance comparisons between different architectures, compilers, and system configurations.

4. Key SPEC HPC Benchmark Suites

SPEC MPI: Measures parallel computing performance using MPI-based workloads.
SPEC OMP: Evaluates OpenMP-based applications for multi-threaded workloads.
SPEC ACCEL: Assesses performance on GPUs and other accelerators.
SPEC CPU: Focuses on single-thread and multi-thread performance in computational workloads.

5. How SPEC HPC Benchmarks Work

Benchmark execution process: Benchmarks are executed under controlled conditions to ensure reproducibility.
Setting up the testing environment: Includes configuring system parameters, compilers, and libraries.
Running SPEC benchmarks on various HPC hardware: Executing the benchmark suite on HPC hardware to collect performance data.
Collecting and analyzing results.

Factors that impact benchmarking results:
CPU, GPU, and memory performance
Compiler optimizations and software configurations
Networking and storage bottlenecks

6. Understanding Benchmark Results

Interpreting SPEC Scores: Higher scores indicate better performance.
Comparing Results: Performance ratios help compare different architectures and software configurations.
Case Studies: SPEC benchmarks are widely used in industries like climate modeling, genomics, and engineering simulations to evaluate and improve HPC systems.

7. Best Practices for Running SPEC HPC Benchmarks

Preparing an Optimized Benchmarking Environment: Ensure system settings and compiler options align with best practices.
Choosing the Right SPEC Benchmark: Select the benchmark that aligns with the intended workload.
Avoiding Common Mistakes: Properly setting up software and avoiding misinterpretation of results ensures accurate assessments.

8. Future Trends in SPEC HPC Benchmarking

AI, ML, and Cloud Computing: Emerging workloads in artificial intelligence and machine learning are shaping future benchmarks.
Heterogeneous Computing: SPEC is evolving to benchmark performance across GPUs, FPGAs, and new architectures like RISC-V.
Upcoming Developments: Continuous updates in benchmarking methodologies are expected to keep pace with next-generation HPC innovations.

9. Conclusion

SPEC HPC benchmarks provide a standardized way to evaluate and compare HPC system performance. Businesses, researchers, and hardware vendors can leverage these benchmarks to optimize their computing infrastructure. For further exploration, SPEC's official website and research publications offer in-depth insights into benchmarking methodologies.
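To make the score interpretation above concrete: a SPEC score is built from per-benchmark ratios of a reference machine's time to the measured time, combined via a geometric mean. A sketch with illustrative runtimes:

```python
import math

def spec_ratio(reference_time, measured_time):
    """Per-benchmark performance ratio: reference runtime over measured runtime."""
    return reference_time / measured_time

def spec_score(ratios):
    """Overall score: geometric mean of the per-benchmark ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Illustrative (reference s, measured s) pairs for three benchmarks:
runs = [(1000, 250), (1600, 320), (700, 350)]
ratios = [spec_ratio(ref, t) for ref, t in runs]
print(f"ratios: {[round(r, 2) for r in ratios]}, score: {spec_score(ratios):.2f}")
```

The geometric mean is used so that no single benchmark dominates the suite: doubling performance on one benchmark contributes the same factor to the score regardless of its absolute runtime.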

  • YOLOX on RISC-V QEMU

Goal of this project: to determine RISC-V's readiness for running YOLOX for the latest edge requirements.

Target Application: Running YOLOX on RISC-V QEMU involves setting up a RISC-V virtual machine and then configuring the necessary environment to compile and run YOLOX. Please note that this is a complex process, and it's essential to have prior experience with virtualization and RISC-V development. The RISC-V website hosts a blog post ( https://riscv.org/blog/2023/07/yolox-for-object-detection/ ) which describes the steps to build and run YOLOX for a development board. Those steps did not work as-is when running on QEMU. This blog assumes its readers are comfortable with a Linux-based host system (this guide is based on Ubuntu 22.04).

Step 1: Install QEMU and Set Up a RISC-V Virtual Machine

First, you need to install QEMU and the RISC-V toolchain. You can do this by running:

sudo apt-get install qemu-system-riscv

In this step, you'll create a RISC-V virtual machine using QEMU. You'll need a RISC-V disk image for this. You can find pre-built RISC-V images for various Linux distributions online, or build your own if you prefer.

wget https://cdimage.ubuntu.com/releases/22.04/release/ubuntu-22.04.3-preinstalled-server-riscv64+unmatched.img.xz
xz -d ubuntu-22.04.3-preinstalled-server-riscv64+unmatched.img.xz
# Rename the QEMU image
mv ubuntu-22.04.3-preinstalled-server-riscv64+unmatched.img riscv-ubuntu2204.img
qemu-img resize riscv-ubuntu2204.img +16G

Launch the QEMU VM as follows:

qemu-system-riscv64 -nographic -machine virt -m 16G -append "root=/dev/vda rw" -drive file=riscv-ubuntu2204.img,if=none,format=raw,id=hd0 -device virtio-blk-device,drive=hd0 -device virtio-net-device,netdev=net0 -netdev user,id=net0

This boots the RISC-V VM with 16 GB of RAM.

Step 2: Configure the Python Environment

Once the VM is up and running, log in and set up your RISC-V development environment. You may need to install additional dependencies, which vary depending on the distribution and version. Most of the packages that Python software depends on can be installed with pip. Run the following command to install pip:

apt install python3-pip

Before installing other Python packages, install the venv package, which is used to create a Python virtual environment:

apt install python3.11-venv

Create a Python virtual environment and activate it:

cd /root
python3 -m venv yolox
source /root/yolox/bin/activate

Step 3: Install the Necessary whl Packages

The Python ecosystem for the RISC-V architecture is still maturing. We have created prebuilt wheel (whl) packages that can be installed directly on Python 3.11.

Step 4: Build and Run YOLOX

Next, clone the YOLOX repository inside your RISC-V QEMU VM:

git clone https://github.com/Megvii-BaseDetection/YOLOX

Navigate to the YOLOX directory and build the YOLOX code. This step may involve installing additional dependencies and configuring the build for the RISC-V architecture:

cd YOLOX
make

With YOLOX successfully built, you can now run it on your RISC-V system. You'll need to adapt the YOLOX commands to your specific use case and input data. Standard models: https://github.com/Megvii-BaseDetection/YOLOX#standard-models

In this example, yolox_s is downloaded:

wget https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_s.pth -P /home/ubuntu/
python3 tools/demo.py image -n yolox-s -c /home/ubuntu/yolox_s.pth --path assets/demo.png --conf 0.25 --nms 0.45 --tsize 640 --save_result --device cpu

# Output logs
2023-09-15 17:05:49.803 | INFO | __main__:main:269 - Model Summary: Params: 8.97M, Gflops: 26.93
2023-09-15 17:05:49.860 | INFO | __main__:main:282 - loading checkpoint
2023-09-15 17:05:53.884 | INFO | __main__:main:286 - loaded checkpoint done.
2023-09-15 17:06:24.598 | INFO | __main__:inference:165 - Infer time: 30.0775s
2023-09-15 17:06:24.708 | INFO | __main__:image_demo:202 - Saving detection result in ./YOLOX_outputs/yolox_s/vis_res/2023_09_15_17_05_53/demo.png

We would like to hear from you if this blog was useful. Please contact us at info@whileone.in. We would be happy to discuss your requirements and showcase our expertise in a variety of cloud and edge technologies.

  • Bring up Yocto for RISC-V deployment

We at Whileone Techsoft Pvt. Ltd. understood the requirements of our customer, who wanted a basic Yocto-based RISC-V deployment for their custom SoC chip. The customer intended to share this basic deployment with their clients, who wished to make use of the customer's SoC in their products. Our customer was unfamiliar with Yocto and what was needed to ensure a successful deployment. They had their own custom-patched Linux kernel, root file system, toolchain, and custom bootloader, as well as a custom simulator to boot the final image. Their client insisted on Yocto instead of their default Buildroot deployment. As Yocto has its own tools, compiler, and dependencies, the challenge was to ensure the final image generated by Yocto was compatible enough to be run by their custom simulator.

Introduction to Yocto:

With the open-source Yocto Project, we can create custom Linux-based systems for embedded products. It is quite possible to tailor the Linux images as per requirements with a set of flexible tools and friendly, customizable scripts. Yocto provides a reference embedded distribution called ‘Poky’, which was used for this project.

The customer's custom-patched Linux kernel was much older than the current version available on kernel.org. So, when we initially went with the latest Yocto version (Mickledore, v4.2), which featured GCC 12.x, we got errors during the kernel build. The errors pointed to some unknown assembly instructions; the reason was that our custom kernel version was old and had not been updated. As the customer was already using GCC 11.x in their build infrastructure, we searched for the Yocto version that provided the nearest GCC 11.x, which turned out to be Yocto Honister (v3.4). An initial test build was successful with Honister, so we finalized this version before moving ahead.

Yocto uses Bitbake as its build tool. Whenever we plan to create recipes in Yocto, we should create a separate folder inside poky that starts with “meta-”, as per the Yocto manual. Referring to similar meta folders like meta, meta-yocto-bsp, and meta-poky, we came up with our own “meta-riscv-custom”. To add a new meta layer, use bitbake commands such as the one below:

$> bitbake-layers add-layer meta-riscv-custom

Yocto Configuration options:

As we were using the sample Poky distribution of Yocto, to let Poky know that we intend to use our custom “meta-riscv-custom” folder in the build process, we have to update the file “bblayers.conf” in the build/conf directory. This build/conf directory is generated after we initialize the environment by executing “source oe-init-build-env” in the Poky root folder. We also have to modify the variable “MACHINE”, among others, in “build/conf/local.conf” to “qemuriscv64” and comment out the default value. There are other options in “local.conf” that we can modify to get image output in a desired format; the variable IMAGE_FSTYPES = “tar cpio” will generate the image in both tar and cpio formats. This is especially useful when we want to generate a root file system in these formats.

Creating recipes:

Recipes are script-like files created under the meta- folders, such as “riscv-linux.bb”, a recipe for building the Linux kernel, and “riscv-boot.bb” for building the bootloader.

Custom changes:

The customer was also interested in knowing how one could add a custom directory and files, and make custom changes to existing files in the file system. Yocto has its own package group recipe file, “packagegroup-core-boot.bb”, that can be modified. For example:

1. We can disable UDEV by commenting it out
2. Similarly, we can comment out HWCLOCK in the same file

To create a custom folder “custom-riscv” inside root (“/”) and a file named “custom.conf” with some configuration options and comments, we had to modify the recipe file “base-files_3.0.14.bb”.

Build:

To build, we make use of the following commands:

$> bitbake -c clean riscv-linux
$> bitbake riscv-linux

Note that the recipe name is given without its “.bb” extension. Also, if we had not added the folder path of “meta-riscv-custom” to bblayers.conf, running the above command would produce an error.

Build artifacts:

The artifacts are generated in the work directory under the path:
Poky/build/tmp/work/riscv64-poky-linux/riscv-linux/1.0-r0/custom-linux/*
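To illustrate what such a recipe can look like, here is a hypothetical minimal sketch of a kernel recipe for the custom layer; the source URI, branch, revision, and checksum are placeholders, not the customer's actual values:

```
# meta-riscv-custom/recipes-kernel/riscv-linux/riscv-linux.bb
# Hypothetical sketch only -- URI, branch, revision, and checksum are placeholders.
SUMMARY = "Customer-patched Linux kernel for the custom RISC-V SoC"
LICENSE = "GPLv2"
LIC_FILES_CHKSUM = "file://COPYING;md5=<md5-of-COPYING>"

# The customer's patched kernel tree (placeholder location)
SRC_URI = "git://example.com/riscv-linux.git;protocol=https;branch=custom"
SRCREV = "<pinned-commit-hash>"
S = "${WORKDIR}/git"

# Inherit the kernel class so do_compile/do_install follow the kernel build flow.
# A real recipe also needs a defconfig and machine compatibility settings.
inherit kernel
COMPATIBLE_MACHINE = "qemuriscv64"
```

A real recipe for this project would additionally carry the customer's patches via SRC_URI file:// entries and pin the kernel configuration, but the skeleton above is the shape Bitbake expects.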

  • GCP Cloud Performance: Time-Based Score Variations

In May 2022, one of our customers asked us to tune Elasticsearch with Esrally across cloud providers. We started by trying multiple combinations of manual runs on all cloud providers, collecting scaling runs with 2/4/8/16 cores. In this data collection, the scores did not scale proportionately. Hence, we decided to experiment with running the Elasticsearch Esrally benchmark throughout the day. As Esrally doesn't run for a fixed duration, we carried out the runs 50 times so that they would span a whole day. And here is what we saw!

The configurations used were:

Altra - t2a-standard-16
Intel Icelake - n2d-standard-16
Milan - c2d-standard-16
Elasticsearch 8.4.1
Esrally 2.6.0

Server: Altra | Intel Icelake | Milan
Client: Altra | Altra | Altra

Variation is observed according to the time of day. AMD is the best, with the lowest standard deviation, but Intel and Altra show large standard deviations.

The NGINX-wrk benchmark also shows such behaviour on GCP. NGINX-wrk runs were carried out 1440 times, keeping each run's duration at 60 seconds. Variation in p95 latency is observed through the time of day: both Intel and Altra show a 10% standard deviation in p95 latency numbers.

Do consider time-based score variations before running network applications:

Time of day does affect latency, since neighboring VMs might be busy or idle depending on the time of day.
Run-to-run variation is a function of the time of day.

Eventually, we were able to help the customer figure out where the performance difference was coming from. To ensure consistent output throughout the day, scaling the VMs was suggested.
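A quick way to quantify the time-of-day effect described above is the coefficient of variation (standard deviation over mean) across the repeated runs. A sketch with illustrative scores, not our measured data:

```python
import statistics

def coefficient_of_variation(scores):
    """Sample standard deviation as a fraction of the mean -- higher means noisier runs."""
    return statistics.stdev(scores) / statistics.mean(scores)

# Illustrative per-run benchmark scores collected across a day:
scores = [100, 104, 97, 110, 92, 105, 99, 108]
cv = coefficient_of_variation(scores)
print(f"run-to-run variation: {cv:.1%}")
```

Computing this per hour-of-day bucket makes the busy/idle-neighbor pattern visible and tells you how many repetitions a fair comparison actually needs.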

  • Network compute agnostic Performance Analysis for Cloud workloads

At Whileone we take pride in our customers' success. We help customers achieve their goals and execute the out-of-the-box ideas that are necessary for success. One such project was to get IPC (Instructions Per Cycle) numbers for cloud applications on different architectures, completely omitting the network stack. This would give the RISC-V chip-designing customer a good picture of whether their architecture's IPC is in line with competition like Intel or ARM.

To achieve this, we modified cloud applications to profile and benchmark their performance with no network or socket calls. The idea was to compare the performance of different architectures with vanilla versions and lite (modified) versions. This would let the customer run these applications on their simulator, get the IPC number of their architecture for each application, and compare it with the competition.

To give an example, one of the applications we picked was Redis, a cache server application. Redis takes SET/GET requests from clients and processes them internally to keep a cached copy for quick response. To do away with the network part, we simulated the client side so that, to Redis, it appears that N SET/GET requests have arrived and been processed. The performance numbers we then obtain are solely for the application's processing on that architecture. This eliminated network noise and gave a good picture of the IPC for core application processing.

The table below shows the IPC for Redis vs. Redis-Lite. The drop in instruction count per packet can be attributed to the networking sockets being removed.

SET             | Redis (Graviton2) | Redis (Intel 8275 Cascade Lake) | Redis-Lite (Graviton2)
IPC             | 0.94              | 0.69                            | 1.76
Icount / packet | ~39000            | ~30000                          | ~20200

In doing so, we made sure that we did not modify the program logic or core behavior of the application in any way. We could see a similar call stack for Redis and Redis-Lite. Below are the snapshots.

REDIS Flamegraph
REDIS-LITE Flamegraph

As evident from the flame graphs, the call stack of the core application is not altered. In the Redis-Lite flamegraph, the network component is absent.

Redis is a single-threaded application. We also helped the customer port various multi-threaded / multi-process applications. The customer was able to cross-compile these applications and run them on their RISC-V simulator. This was an interesting experiment from the performance-numbers point of view and useful for the customer in the early phase of chip development, helping them understand where they are placed with respect to the competition.
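IPC itself is just retired instructions divided by elapsed core cycles, and combined with an instruction count per request (like the Icount/packet row above) it yields cycles spent per request. A sketch with illustrative counter values:

```python
def ipc(instructions, cycles):
    """Instructions per cycle from two hardware counter readings."""
    return instructions / cycles

def cycles_per_request(icount_per_request, measured_ipc):
    """Core cycles spent per request, given instructions/request and IPC."""
    return icount_per_request / measured_ipc

# Illustrative counter values in the spirit of the table above:
print(f"IPC: {ipc(1_760_000, 1_000_000):.2f}")
print(f"cycles/request: {cycles_per_request(20_200, 1.76):,.0f}")
```

Cycles per request is the architecture-comparison metric that matters here: it folds both the instruction-count reduction and the IPC change into one number.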

  • Oracle Optimized BLIS Libraries for Ampere Altra Family

Basic Linear Algebra Subprograms (BLAS) and the BLAS-like Library Instantiation Software (BLIS) are libraries that accelerate mathematical operations on current CPU microarchitectures. As part of the FLAME project, BLIS was introduced to handle the dense linear algebra software stack. The framework was designed to isolate essential kernels of computation that, when optimized, immediately enable optimized implementations of most of its commonly used and computationally intensive operations. BLIS offers enhanced performance for matrix multiplications where the operands are small, and supports both single- and multi-threaded modes of operation. Oracle has optimized the BLIS libraries for exceptional performance on the Ampere Altra family of processors. Let us look at how we can leverage this to our benefit and what sort of performance boost can be expected.

Step 1: Getting BLIS Sources

git clone https://github.com/flame/blis.git
cd blis
git checkout ampere

Step 2: Building BLIS

./QuickStart.sh altramax
# Change ./QuickStart.sh altramax to ./QuickStart.sh altra if building for Ampere Altra processors
source ./blis_build_altramax.sh
source blis_setenv.sh
export LD_LIBRARY_PATH=/lib/altramax

Note: BLIS can be built for OpenMP (default) or pthreads. Details can be found in documentation/tutorial.

Step 3: Performance Experiments

For our test, we will be using HPL 2.3, the High Performance Linpack benchmark that is commonly used to test systems. We will be comparing the performance of the Oracle BLIS library with OpenBLAS and Arm-PL.

System Config:
OS: Ubuntu 22.04
Kernel: 5.19.0-46-generic
Toolchain: gcc (GCC) 12.3.0
Memory: 16x32GB

Results: For HPL, the Oracle-optimized BLIS libraries provide a 1.2x boost in performance.
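The 1.2x figure is a simple ratio of HPL throughput, and HPL efficiency against theoretical peak is computed the same way. A sketch with illustrative GFLOPS numbers, not our measured results:

```python
def speedup(gflops_new, gflops_baseline):
    """Throughput ratio between two HPL runs."""
    return gflops_new / gflops_baseline

def hpl_efficiency(rmax_gflops, rpeak_gflops):
    """Measured HPL Rmax as a fraction of theoretical peak Rpeak."""
    return rmax_gflops / rpeak_gflops

# Illustrative GFLOPS numbers:
print(f"speedup: {speedup(1440.0, 1200.0):.2f}x")
print(f"efficiency: {hpl_efficiency(1440.0, 1600.0):.0%}")
```

Tracking efficiency alongside speedup is useful because it shows how much of the gain comes from the BLAS kernels versus how much theoretical headroom remains.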

  • Using Google Charts in a Dynamic Way - How Using Google Charts Allowed Flexibility in a Short Dev Time Window

    We had a requirement to build a charting facility that could provide several charts. The requirement implied that we needed to support several different kinds of charts, but those charts weren't fully defined yet, so flexibility was required. Our setup was a headless CMS (Strapi.io) and NextJS for server-side-rendered and statically generated pages. We found react-google-charts to be an interesting library: for any chart, it requires the chart type, data, width, and height as inputs. Our workaround to deliver this overnight was to add a JSON field in the CMS for accepting these parameters and, on the frontend, pass them to react-google-charts. Implementing it this way meant we could support everything react-google-charts supports. Given that most content-managing stakeholders weren't from a web development background, we documented the feature and trained them on how to use it. This was made much simpler by the react-google-charts live examples: starting from any example that suited their requirements, the content managers could view the chart live, open the sandbox, tweak the data as required, and then create a JSON object from the data and chart type. The initial few charts did take development time to figure out details like tweaking the colors, having two y-axes with values in different units, and modifying the content of the hover bubble. After the initial 3-4 charts, however, such tweaks became repetitive, and the content managers could easily figure things out on their own by referring to those earlier charts and the sandboxes that react-google-charts provides.
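    The CMS-to-chart flow described above can be sketched as follows. This is a minimal illustration, not the production code: the JSON field shape and the `parseChartConfig` helper are assumptions, modelled on the props (chartType, data, width, height) that the react-google-charts Chart component accepts.

```typescript
// Shape assumed for the JSON field stored in the CMS (illustrative).
interface ChartConfig {
  chartType: string;                // e.g. "PieChart", "LineChart"
  data: (string | number)[][];      // header row followed by data rows
  width?: string;
  height?: string;
}

// Parse the raw JSON from the CMS field, validate the required fields,
// and apply defaults, so the frontend can pass the result straight to
// react-google-charts, e.g. <Chart {...cfg} />.
function parseChartConfig(raw: string): ChartConfig {
  const cfg = JSON.parse(raw) as Partial<ChartConfig>;
  if (!cfg.chartType || !Array.isArray(cfg.data)) {
    throw new Error("chart config must include chartType and data");
  }
  return {
    chartType: cfg.chartType,
    data: cfg.data,
    width: cfg.width ?? "100%",
    height: cfg.height ?? "400px",
  };
}
```

    Validating in one place like this keeps the frontend simple: any chart the content managers can express in the sandbox becomes a JSON blob the page can render without a code change.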

  • UI/UX and Graphic Design: Creating Seamless Digital Solutions

    Effective UI/UX and graphic design are crucial for building user-friendly applications in today's digital world. At Whileone Techsoft Pvt. Ltd., we blend creativity with technical expertise to deliver visually appealing, high-performance products that drive business success. Why us? We leverage a diverse set of design and development tools to create high-quality visual and interactive content. Our expertise spans the whole creative process, ensuring that each project is enhanced with tailored, impactful visuals and elements that align with the client's goals. We deliver UI/UX and development together: as UI/UX and development are interconnected, our designers create wireframes and prototypes that serve as blueprints for developers. Our expertise in tools like React and Vue.js facilitates this transition, ensuring a smooth development process, and collaboration between the design and development teams results in visually consistent, high-performing applications. Our experience in the semiconductor industry: our UI/UX designs help simplify complex systems. Our services, where development and design go hand in hand, helped customers deliver data with precision; the intuitive interfaces we created made visualizing and interacting with intricate technical data easier, reducing user error and enhancing productivity. At Whileone Techsoft Pvt. Ltd., we specialize in user-centric UI/UX and graphic design that aligns with business goals and technical needs. By working closely with development teams, we deliver cohesive digital products that enhance user engagement, improve workflows, and boost performance. Whatever your technical field, our design solutions simplify complex ideas with precision and clarity.
