
Nvidia Senior AI Network System Architect 
Israel, North District 
886348886

Locations: Israel, Yokneam; Israel, Tel Aviv
Time type: Full time
Posted: 7 days ago

What You’ll Be Doing:

  • Define, develop, and execute cutting-edge benchmarks and workloads to analyze system performance, identify bottlenecks, and drive optimizations across our hardware and software stack.

  • Drive the direction of our future products by performing deep-dive analysis of system architectures and solutions to assess their performance, efficiency, and value proposition.

  • Develop and validate sophisticated performance and network simulation models, correlating them with real-world hardware to predict and analyze the behavior of future systems.

  • Analyze and optimize the entire AI stack, from communication libraries (such as NCCL) and system software down to the underlying network fabric, and develop Proofs of Concept (POCs) for new features and improvements.

  • Conceptualize next-generation networking architectures driven by emerging deep learning (DL) and AI technologies.

  • Collaborate with cross-functional teams, including other architecture teams, logic design, system software, firmware, and DL research teams, to ensure the successful execution of our vision.

What We Need To See:

  • M.Sc. or Ph.D. degree in Computer Science, Computer Engineering, or Electrical Engineering, or equivalent experience.

  • 6+ years of relevant industry or research experience in high-performance computing, computer architecture, or computer networks.

  • Excellent understanding of large-scale system behavior and the effect of distributed computing workloads on network and system performance.

  • Proven experience in simulation-based performance analysis or benchmarking.

  • Exceptional analytical, problem-solving, and systems-thinking skills, with the ability to translate complex technical data into strategic architectural insights.

  • Hands-on programming skills in Python and/or AI frameworks for system analysis, automation, and modeling.

  • Ability to thrive in a fast-paced, dynamic environment and work concurrently with multiple groups across the organization.

Ways To Stand Out From The Crowd:

  • Expertise in the architecture and system-level requirements of large-scale, distributed DL workloads (e.g., LLMs, Generative AI for vision).

  • Deep understanding of communication libraries such as NCCL, UCX, or UCC.

  • Expertise in network protocols (Ethernet, InfiniBand, RoCE) and large-scale network topologies.

  • Experience with industry-standard AI benchmarks (e.g., MLPerf) and NVIDIA's frameworks (e.g., NeMo) on large-scale clusters.