Key Responsibilities
· Own the design, deployment, and lifecycle management of the Splunk Enterprise platform, including indexer and search head clustering, forwarders, and knowledge objects.
· Define and implement best practices for data onboarding, parsing, enrichment, and storage to support observability use cases.
· Collaborate with infrastructure, DevOps, security, and application teams to build reliable, scalable observability solutions.
· Develop advanced SPL searches, correlation rules, alerts, and performance dashboards.
· Improve alert quality and reduce noise through smarter event correlation and visualization.
· Drive observability maturity initiatives including logging standardization, automation, and self-service access to telemetry data.
· Evaluate and integrate additional observability and monitoring tools (e.g., Prometheus, Grafana, LogicMonitor, AppDynamics, Dynatrace) to complement existing capabilities.
· Lead troubleshooting and incident response efforts where visibility and telemetry data are required.
· Mentor junior engineers and influence platform and observability architecture decisions.
Qualifications
· 8–12 years of progressive experience in observability, infrastructure monitoring, or SRE roles.
· Minimum 7 years of hands-on experience with Splunk Enterprise in large-scale deployments.
· Deep knowledge of Splunk architecture, including clustering, ingestion pipelines, search performance tuning, and index lifecycle policies.
· Advanced proficiency with SPL (Search Processing Language) and dashboarding.
· Experience building and scaling log pipelines using technologies such as syslog, Fluentd, Logstash, or Cribl.
· Familiarity with cloud platforms (AWS, Azure, or GCP) and hybrid infrastructure environments.
· Experience working with configuration management and infrastructure-as-code tools (e.g., Terraform, Ansible).
· Excellent collaboration, problem-solving, and communication skills.
Must Have
· Splunk certifications (e.g., Certified Architect, Consultant, or Admin).
· Experience with APM and tracing tools (e.g., OpenTelemetry, Jaeger, New Relic).