Machine Learning Engineer
InfiVR
Bengaluru · Karnataka · India
Full-time
Job description
Job Title: ML Engineer — Edge AI (Vision, Voice & On-Device LLMs)
Company: InfiVR
Location: Bangalore, India (On-Site)
Employment Type: Full-time
Experience Level: 2–5 Years
About InfiVR
InfiVR delivers AI, computer vision, and immersive digital solutions for industrial enterprises across Oil & Gas, Defense, Healthcare, and Aerospace. We build intelligent applications that run in real-world operational environments — on edge devices, in the field, often without connectivity. Our clients include Fortune 500 companies and leading global organizations.
About the Role
We are looking for an ML Engineer who can take AI models from research to production on mobile and edge hardware. This is not a training-in-the-cloud role — you will be selecting, optimizing, and deploying models that run entirely on-device under tight compute, memory, and power constraints. You will work across computer vision, speech-to-text, and small language models, shipping them on Qualcomm chipsets with NPU acceleration for industrial field applications.
Responsibilities
Evaluate and benchmark pre-trained models for on-device deployment across object detection, OCR, speech-to-text, and conversational AI.
Quantize and optimize models (INT8, INT4, W4A16) using AIMET or equivalent tools, targeting ONNX, TFLite, and QNN formats.
Profile and optimize inference latency, memory usage, and thermal performance on the actual target hardware.
Integrate models into Android applications using ONNX Runtime (QNN Execution Provider), whisper.cpp, llama.cpp, and the Android NDK.
Build and maintain on-device RAG pipelines using local vector stores and small language models.
Collaborate with Android developers on camera, audio, and sensor integration for AI-powered field applications.
Requirements
2+ years of experience deploying ML models on mobile or edge devices: not just training, but shipping on real hardware.
Hands-on experience with model quantization and optimization for constrained environments.
Strong working knowledge of at least two of the following: object detection (YOLO family, EfficientDet), speech-to-text (Whisper, wav2vec), and small language models (Phi, Gemma, Llama).
Proficiency in Python and PyTorch for model preparation.
Comfortable working with C/C++ and the Android NDK for on-device integration.
Understanding of the ONNX model format and runtime ecosystem.
Good to Have
Experience with the Qualcomm AI stack (AI Hub, QNN SDK, Hexagon NPU).
Familiarity with PaddleOCR or similar mobile OCR frameworks.
Prior work on industrial, field-deployed, or offline-first applications.
Exposure to on-device embedding models and vector search.