Job Summary:
We are seeking experienced Platform Engineers with expertise in MLOps and
distributed systems, particularly Kubernetes, along with a strong background in managing
multi-GPU, multi-node deep learning job and inference scheduling. Candidates should be
proficient in Linux (Ubuntu) systems, able to write complex shell scripts, comfortable working
with configuration management tools, and have a sound understanding of deep learning workflows.
Required Skills & Qualifications:
● Experience:
○ 3+ years of experience in platform engineering, DevOps, or systems
engineering, with a strong focus on machine learning and AI workloads.
○ Proven experience working with LLM workflows and GPU-based machine
learning infrastructure.
○ Hands-on experience in managing distributed computing systems, training
large-scale models, and deploying AI systems in cloud environments.
○ Knowledge of GPU architectures (e.g., NVIDIA A100, V100), multi-GPU
systems, and optimization techniques for AI workloads.
● Technical Skills:
○ Proficiency in Linux systems and command-line tools. Strong scripting skills
(Python, Bash, or similar).
○ Expertise in containerization and orchestration technologies (e.g., Docker,
Kubernetes, Helm).
○ Experience with a cloud platform (AWS), infrastructure-as-code tools such as
Terraform/Terragrunt or similar, and exposure to automating CI/CD
pipelines using Jenkins, GitLab, GitHub, etc.
○ Familiarity with machine learning frameworks (TensorFlow, PyTorch, etc.) and
deep learning model deployment pipelines. Exposure to vLLM or the NVIDIA
software stack for data and model management is preferred.
○ Expertise in performance optimization tools and techniques for GPUs, including
memory management, parallel processing, and hardware acceleration.
● Soft Skills:
○ Strong problem-solving skills and ability to work on complex system-level
challenges.
○ Excellent communication skills, with the ability to collaborate across technical
and non-technical teams.
○ Self-motivated and capable of driving initiatives in a fast-paced environment.
Good to Have Skills:
● Experience in building or managing machine learning platforms, specifically for
generative AI models or large-scale NLP tasks.
● Familiarity with distributed computing frameworks (e.g., Dask, MPI, PyTorch DDP) and
data pipeline orchestration tools (e.g., AWS Glue, Apache Airflow).
● Knowledge of AI model deployment frameworks such as TensorFlow Serving,
TorchServe, vLLM, Triton Inference Server.
● Good understanding of LLM inference and how to optimize self-managed infrastructure.
● Understanding of AI model explainability, fairness, and ethical AI considerations.
● Experience in automating and scaling the deployment of AI models on a global
infrastructure.