Requirements
Focus: "Hot loop" optimization.
Key Responsibilities:
Kernel Optimization: Identify and extract performance-critical "hot loops" from AI kernels.
Low-Level Dev: Write high-efficiency C/C++ code targeted at specific hardware architectures.
Profiling: Use hardware debuggers and profilers to find and reduce latency in AI inference paths.
Requirements:
Expertise in C/C++ and Assembly.
Background in computer architecture or compiler design.
Experience with performance tuning and memory management.
