What you will be doing:
Leading the overall architecture and design of our distributed storage service optimized for AI/ML
Building features for a distributed storage service to enhance availability and reliability for large-scale deployments
Engaging and collaborating with NVIDIA Research, Computing, and Product teams, other cross-functional teams, and external customers to deliver cloud services
Automating the distributed storage service end to end, including deployment, management, and monitoring
What we need to see:
Strong track record of delivering distributed services in a variety of distributed computing environments
Experience designing, implementing, and operating distributed systems at a multi-petabyte scale
Experience in implementing storage services and interfaces to ensure scalable, high-performance, and reliable solutions
History of ownership of product delivery from inception to support
Great communication and presentation skills
Prior experience developing distributed systems with Kubernetes, Golang, Python, and Cloud Service Provider integrations
Bachelor of Science in Computer Science or a related field (or equivalent experience), with 5+ years of industry experience
Ways to stand out from the crowd:
You have architected, built, and deployed a distributed service that runs on large-scale clusters, multi-petabyte to exabyte in size, with millions of users
Experience owning all stages of software development and delivery
Passionate about innovating and investing in groundbreaking technologies, and interested in working with accelerated computing environments such as GPUDirect Storage, DPU, and RDMA
Skilled in building and delivering cloud services, with a specific focus on distributed systems
You will also be eligible for equity and .