Expoint – all jobs in one place

NVIDIA Engineering Manager - Rack Scale AI Systems
United States, California 
160873492

Yesterday
US, CA, Santa Clara
Time type: Full time
Posted on: 14 days ago
What you'll be doing:

  • Build and lead an engineering organization focused on rack-scale system onboarding and bring-up execution, along with external and internal partner engagement.

  • Work with Engineering, Product Management, and Customer Program Management teams to define, prioritize, and implement features, infrastructure, processes, and workflows.

  • Work with NVIDIA product teams to understand new product requirements, including those for HPC and AI/ML products.

  • Collaborate with multi-functional teams, including system engineering, software engineering, mechanical/thermal engineering, operations, data center teams, external vendors, and other partners, to deliver a reliable and robust platform from concept through prototype to deployment.

  • Help identify potential or observed weaknesses in the current process and propose actions that can improve quality.

  • Drive the overall quality of deployments and improve time to market for next-generation products.

  • Lead the on-ground team in collecting data on SOL deployments, physical touch information, and failure patterns.

  • Drive triage and recovery execution during product bring-up, and maintain support through the product sustaining phase.

What we need to see:

  • Bachelor's or Master's degree in Computer Science or Software Engineering, or equivalent experience.

  • 8+ years of overall experience, including 5+ years of management experience in large, cross-matrix, geographically dispersed technology organizations focused on the server and data center space, and strong experience in operations product engineering.

  • Strong technical skills and understanding of embedded systems, orchestration and automation systems, and data center and cloud architecture, as well as excellent communication and planning skills.

  • Deep understanding of cloud design, including virtualization, global infrastructure, distributed systems, load balancing, and security.

  • Strong ability to identify risks and develop robust mitigations.

  • Strong collaboration and interpersonal skills, with a proven track record of effectively guiding and influencing a diverse team.

Ways to stand out from the crowd:

  • Experience in large-scale QA environments for product bring-up.

  • Experience with high-performance or large-scale computing environments, parallel computing, or CUDA.

  • Special skills in large-scale and cluster computing (MPI) and in data center design, including high-speed InfiniBand interconnects, cluster storage, and scheduling-related design and/or management experience.

  • Experience with converged and hyper-converged hardware and servers.

  • Strong background in Windows and Linux administration.

You will also be eligible for equity and .