NVIDIA Engineering Manager - Rack Scale AI Systems
United States, California
160873492

US, CA, Santa Clara

What you'll be doing:

Build and lead an engineering organization focused on rack-scale system onboarding and bring-up execution, along with external and internal partner engagement.
Work with Engineering, Product Management, and Customer Program Management teams to define, prioritize, and implement features, infrastructure, processes, and workflows.
Work with NVIDIA product teams to understand new product requirements, including HPC and AI/ML products.
Collaborate with multi-functional teams, including system engineering, software engineering, mechanical/thermal engineering, operations, data center teams, external vendors, and other partners, to deliver a reliable and robust platform from concept to prototype to deployment.
Help identify potential or observed weaknesses in the current process and propose actions that can improve quality.
Drive the overall quality of deployments and improve time to market for next-generation products.
Lead the on-ground team in collecting data on SOL deployments, physical touch information, and failure patterns.
Drive overall triage and recovery execution during product bring-up, and maintain support through the product sustaining phase.

What we need to see:

Bachelor's or Master's Degree in Computer Science or Software Engineering, or equivalent experience.
5+ years of management experience in large, cross-matrix, geo-dispersed technology organizations focused on the server and data center space, with strong experience in operations product engineering and 8+ years of overall experience.
Strong technical skills and understanding of embedded systems, orchestration and automation systems, data centers, and cloud architecture, as well as excellent communication and planning skills.
Deep understanding of cloud design in the areas of virtualization, global infrastructure, distributed systems, load balancing, and security.
Strong ability to identify risks and develop robust mitigations.
Strong collaborative and interpersonal skills, specifically a proven track record of effectively guiding and influencing a diverse team.

Ways to stand out from the crowd:

Experience in large-scale QA environments for product bring-ups.
Experience with high-performance or large-scale computing environments, parallel computing, or CUDA.
Special skills in large-scale and cluster computing (MPI), and in data center design including high-speed InfiniBand interconnect, cluster storage, and scheduling, with related design and/or management experience.
Experience with converged and hyper-converged hardware and servers.
Strong background in Windows and Linux administration.

You will also be eligible for equity and benefits.