- Radical customer centricity
- Ownership-driven culture
- Keeping everything simple
- Long-term thinking
- Complete transparency
- Bridging the gaps between the core infra, security, QA and development teams.
- Managing application deployments and GKE platforms; automating and improving development and release processes.
- Working with the Dev team to understand the application architecture and its bottlenecks in depth.
- Creating, managing and maintaining data stores & data platform infra using IaC.
- Owning the end-to-end availability, performance and capacity of applications and their infrastructure, and building and maintaining the corresponding observability with Prometheus/New Relic/ELK/Loki.
- Creating, managing and maintaining the internal infrastructure platform, which manages the CI/CD process, data stores, Kubernetes, etc.
- Providing 24x7 infra and app support, building processes and documenting "tribal" knowledge around it.
- Managing the SLOs, error budgets and alerts of the internal infrastructure platform.
- Working with Core Infra, Dev and Product teams to implement the platform.
- Managing outages, doing detailed RCAs with developers, and identifying ways to prevent a recurrence.
- Mentoring and training L1 engineers and continually improving app and infra support processes.
- Automating toil and repetitive work.
- 5 to 8 years of experience in managing high-traffic, large-scale microservices and infrastructure.
- Experience in deploying, managing and troubleshooting containerised environments using Docker/containerd and Kubernetes is a must.
- Must be proficient with Helm and have experience with a service mesh such as Istio or Linkerd.
- Must have hands-on experience in managing and troubleshooting Kubernetes environments.
- Extensive experience with Linux administration and an understanding of the various Linux kernel subsystems (memory, storage, network, etc.).
- Extensive experience in DNS, TCP/IP, Routing and Load Balancing.
- Expertise in GitOps, Infrastructure as Code tools such as Terraform, Pulumi and Crossplane, and configuration management tools such as Chef, Puppet, SaltStack and Ansible.