Perform deep-dive debugging of multi-rack, multi-tenant clusters, including scheduler behavior, container runtime issues, device-plugin crashes, and RDMA/IB fabric anomalies. Gather customer requirements and prototype feature extensions for Kubernetes operators, Slurm plugins,...