Expoint – all jobs in one place

eBay AI Infrastructure DevOps Engineer
Netherlands, North Holland, Amsterdam 
999576035

10.06.2025
What you will accomplish:
  • Architect and design a high-performance storage system for GPU clusters, supporting large checkpoints and low-latency context preemption and reloads.

  • Develop monitoring and observability tools for GPU clusters.

  • Maintain high availability, fault tolerance, and disaster recovery strategies for AI infrastructure.

  • Work closely with AI/ML engineers, data scientists, and DevOps teams to streamline AI workflows.

What you will bring:

  • Master's or PhD in EE or CS

  • Over 5 years of experience building HPC systems

  • C/C++ programming – for performance-critical components and integration tasks. Lustre (a parallel filesystem) is written in C.

  • Linux Kernel and OS internals – to optimize system behavior and support kernel-level customization for filesystems and networking

  • Filesystems knowledge – with a strong preference for experience in Lustre or similar distributed filesystems

  • Kubernetes – for container orchestration and management at scale

  • Hardware and Networking familiarity – to work effectively with low-level infrastructure and tuning

Good to have:

  • Strong understanding of RDMA and RoCE v2 protocols

  • Hands-on experience with GPUs

  • Understanding of AI workflows, including training and inference

  • Understanding of AI/ML Python frameworks (TensorFlow, PyTorch)