CV
Director of Engineering at Qualcomm. Researcher working on AI systems, inference optimization, and hardware-software co-design.
Contact Information
| Name | Alex (Wei) Chen |
| Professional Title | Director of Engineering |
| Email | alexchen4ai@gmail.com |
Professional Summary
Director of Engineering at Qualcomm (via acquisition of Nexa AI). Researcher and engineer focused on AI systems, LLM/VLM inference optimization, hardware-software co-design, and robotics. PhD from Stanford University (2024).
Experience
-
2026 - present Santa Clara, CA
Director of Engineering (Principal Engineer/Manager)
Qualcomm
- Joined Qualcomm following the acquisition of Nexa AI. Lead on-device AI optimization research and engineering focused on accelerating generative AI inference on Qualcomm Snapdragon NPU (Hexagon HTP) and edge SoCs.
- Drive end-to-end optimization of LLMs and VLMs for Qualcomm Hexagon HTP NPU, encompassing hardware-aware quantization (INT4/INT8/FP16 mixed-precision), operator fusion, memory bandwidth scheduling, and kernel-level tuning.
- Collaborate with silicon, compiler, and runtime teams to co-design neural network architectures and inference pipelines that fully exploit Hexagon NPU vector and tensor acceleration.
- Manage a cross-functional team spanning model research, runtime optimization, and SDK development.
-
2024 - 2026 Palo Alto, CA
Founder, CEO and Chief Scientist
Nexa AI
- Founded Nexa AI, specializing in efficient AI research and deployment. Built the NexaML engine and NexaSDK (7.6K GitHub stars) for generative AI model deployment on NPU, GPU, and CPU. Acquired by Qualcomm in 2026.
- Principal architect and first author of AutoNeural, Octopus V1–V4, OmniVLM, OmniAudio, and NexaQuant. Octopus V2 represents ~2% of total Hugging Face downloads since 2022.
- Solutions deliver ~55% faster throughput and ~28% better output quality vs. existing AI inference stacks. Trusted by Geely, HP, Lenovo, and İşbank.
- Official partner of Qualcomm, AMD, NVIDIA, IBM, Google, Microsoft, Intel, Docker, HP, Lenovo, and Dell.
-
2021 - 2023 Palo Alto, CA
Investment Scout
Sequoia Capital
- Sourced and evaluated early-stage startups in the Bay Area, focusing on the Stanford ecosystem. Conducted market research and due diligence, and facilitated founder-partner connections.
Education
Awards
-
2025 Best Paper Runner-Up Award
IEEE ICDM
For the paper: DP-FedLoRA: Privacy-Enhanced Federated Fine-Tuning for On-Device Large Language Models.
Publications
-
2025 AutoNeural: Co-Designing Vision-Language Models for NPU Inference
arXiv:2512.02924
-
2024 Octopus v4: Graph of language models
arXiv:2404.19296
-
2024 Octopus v2: On-device language model for super agent
arXiv:2404.01744
-
2024 On-Device Language Models: A Comprehensive Review
arXiv:2409.00088
Skills
Programming Languages (Expert): Python, C/C++, Java, JavaScript
Machine Learning (Expert): LLM/VLM Pretraining & Post-training, Quantization, Model Optimization, On-device Inference, PyTorch, ONNX
Hardware Acceleration (Expert): Qualcomm Hexagon HTP, CUDA, Vulkan, OpenCL, Kernel Optimization for NPU/GPU/CPU
Systems (Proficient): Linux, Docker
Languages
English: Fluent
Chinese (Mandarin): Native