Job Description
1. GPU/NPU Software Development and Optimization
Implement high-performance kernels, operators, and libraries for GPU/NPU.
Profile with Nsight Systems/Compute, VTune, Perf, TensorBoard, etc.; identify
bottlenecks and apply code-level optimizations.
2. Robotics AI System Prototyping, Development, and Tuning
Collaborate with Algorithm and Hardware teams to deploy models on GPU/NPU-based
development platforms under real-time performance constraints.
Build automated benchmarks, generate performance reports, and propose
optimization strategies.
3. Agentic System Research (KV Cache, etc.)
Design, implement, and accelerate KV Cache and related components for
large-model inference.
Explore and prototype Agentic (agent-based, self-adapting) inference
frameworks and evaluate them in robotic AI scenarios.
Qualifications
* Currently enrolled in a Master's or Ph.D. program (Computer Science,
Electrical Engineering, AI, Mathematics, or related fields).
* Proficiency in C/C++ and Python, with the ability to write clean,
maintainable code.
* Solid understanding of the CUDA programming model; 1 year of hands-on CUDA
experience (kernel development, streams, memory management, optimization).
* Experience with profiling tools such as Nsight, VTune, Perf, and TensorBoard.
* Familiarity with Transformers, CNNs, and RNNs, and their typical performance
bottlenecks during inference.
* Good English reading and writing skills; ability to work effectively across
multidisciplinary teams.
* Strong passion for pushing the boundaries of GPU/NPU acceleration,
robotics AI, and Agentic systems.
Preferred Skills:
* Experience with KV Cache, attention mechanism optimization, or model
compression (quantization, pruning, distillation).
* Hands-on experience with Agentic/agent-based AI frameworks and techniques
(e.g., ReAct, tool use, AutoGPT).
* Development experience on NPUs or other heterogeneous accelerators.
* Contributions to open-source projects such as TensorRT, ONNX Runtime, or
oneAPI.
* Linux system tuning, driver development, or knowledge of low-level hardware
interfaces.