CLOUD INFRA AND PLATFORM MONITORING
As a Cloud Infrastructure and Platform Monitoring Analyst, you will be responsible for monitoring cloud infrastructure, responding to alerts, and ensuring platform stability. Your primary role will involve real-time incident detection, initial troubleshooting, and escalation for cloud and application environments. You will collaborate with L2/L3 support and engineering teams to ensure compliance with SLAs, improve monitoring workflows, and enhance platform observability.
Your key responsibilities
- Monitor cloud infrastructure and platform health using predefined observability tools
- Acknowledge and validate alerts from monitoring tools and determine the next course of action
- Follow standard operating procedures (SOPs) and runbooks for initial triage before escalating
- Reassign unresolved incidents and requests in the ITSM tool to the correct product teams
- Ensure accurate ticket documentation with timestamps, initial analysis, and handoff details
- Communicate clearly with L2/L3 teams and stakeholders to facilitate smooth issue escalation
- Perform shift handovers with detailed updates to ensure continuity in monitoring
Skills and attributes for success
- Basic understanding of cloud services such as AWS (EC2, S3, IAM, Secrets Manager).
- Familiarity with observability tools like Prometheus, Grafana, Datadog, OpenTelemetry (OTel), Splunk, and AWS CloudWatch.
- Ability to strictly follow SOPs and predefined triage steps without deviation.
- Strong attention to detail for accurate documentation and handoffs.
- Good communication skills to effectively collaborate with L3 teams and stakeholders.
To qualify for the role, you must have
- 0-2 years of experience in cloud infrastructure or application monitoring.
- Basic understanding of the AWS cloud platform and of operating systems (Windows/Linux).
- Knowledge of monitoring tools like Azure Monitor, AWS CloudWatch, or third-party tools such as New Relic, Datadog, or Grafana.
- Experience working with ITSM tools.
- Basic knowledge of DevOps tools such as Jenkins, Kubernetes, and Git.
Must haves
- Knowledge of any cloud platform, with a preference for Azure.
- Experience with monitoring tools such as CloudWatch, Datadog, Splunk, and Grafana.
- Ability to respond to, troubleshoot, and resolve cloud infrastructure and platform-level issues and alerts.
- Experience with any ITSM tool, with a preference for ServiceNow.
- Strong written and verbal communication skills to document and convey technical issues effectively.
- Presentation skills to communicate updates, persuade stakeholders, and facilitate discussions at different levels.
- Interpersonal skills to establish and maintain effective working relationships across teams and stakeholders.
Good to have
- Any cloud certification, preferably AWS.
- Containerization & Orchestration: Docker, Kubernetes (EKS).
- Log Analysis & Alert Handling: ELK Stack (Elasticsearch, Logstash, Kibana), Cribl.
- Infrastructure as Code (IaC) & Configuration Management: Helm, GitOps Framework.
- CI/CD & Source Control: ArgoCD, Harness.io.
- Understanding of DevOps tools such as Jenkins and Infrastructure-as-Code (IaC) tools like Terraform and Ansible.
What we look for
- Enthusiastic learners with a passion for cloud technologies and DevOps practices.
- Problem solvers with a proactive approach to troubleshooting and optimization.
- Team players who can collaborate effectively in a remote or hybrid work environment.
- Detail-oriented professionals with strong documentation skills.
What we offer
EY Global Delivery Services (GDS) is a dynamic and truly global delivery network. We work across six locations – Argentina, China, India, the Philippines, Poland and the UK – and with teams from all EY service lines, geographies and sectors, playing a vital role in the delivery of the EY growth strategy. From accountants to coders to advisory consultants, we offer a wide variety of fulfilling career opportunities that span all business disciplines. In GDS, you will collaborate with EY teams on exciting projects and work with well-known brands from across the globe. We’ll introduce you to an ever-expanding ecosystem of people, learning, skills and insights that will stay with you throughout your career.
- Continuous learning: You’ll develop the mindset and skills to navigate whatever comes next.
- Success as defined by you: We’ll provide the tools and flexibility, so you can make a meaningful impact, your way.
- Transformative leadership: We’ll give you the insights, coaching and confidence to be the leader the world needs.
- Diverse and inclusive culture: You’ll be embraced for who you are and empowered to use your voice to help others find theirs.