NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people.
What you'll be doing:
Develop and manage enterprise-scale platforms that unify storage infrastructure and services, integrating enterprise appliances, networks, and open-source technologies.
Develop and scale REST APIs in Python/Go, enabling thousands of engineers to seamlessly run on-demand storage workflows.
Automate storage operations, including provisioning, monitoring, metrics, telemetry, and troubleshooting, to ensure high reliability and performance.
Integrate intelligent observability and tracing into workflows to improve accuracy, reduce latency, and optimize efficiency across infrastructure services.
Implement agentic workflows that empower self-healing, proactive remediation, and automation to decrease operational overhead.
Build proof-of-concept integrations between infrastructure services and emerging agentic AI frameworks, laying the foundation for intelligent infrastructure platforms.
Document practices and procedures, evaluate new technologies, and drive adoption of next-gen automation in enterprise storage services.
What we need to see:
BS in Computer Science (or equivalent experience) with 12+ years of relevant experience, MS with 10+ years, or Ph.D. with 8+ years.
Extensive expertise building large-scale, multi-threaded, distributed backend systems.
Experience designing and building RESTful APIs using Python or Go.
Familiarity with containerization & orchestration (Docker, Kubernetes).
Exposure to cloud platforms (AWS, Azure, GCP).
Experience with telemetry stacks (Prometheus, Grafana, Alertmanager, ELK/Kibana).
Ability to collaborate across teams and communicate technical solutions effectively.
Growth mindset to quickly adopt new frameworks in observability, AI automation, and infrastructure management.
Ways to stand out from the crowd:
Contributions to open-source projects (infrastructure, storage, or Python-based libraries).
Strong background in Linux storage systems and troubleshooting at enterprise scale.
Experience with Enterprise NAS (NetApp, Pure Storage), distributed filesystems (Lustre, GPFS, Ceph), or S3-compatible object storage.
Experience with GenAI/agentic application frameworks (LangChain, LlamaIndex, AutoGen) or observability platforms (LangSmith, Arize Phoenix, W&B Weave).
Proven track record of prototyping and productionizing intelligent automation workflows, especially for self-service, large-scale infrastructure.