Responsibilities:
- Implement ELK-based monitoring
- Implement Loki logging
- Implement alert generation with ElastAlert and Prometheus Alertmanager
- Develop service dashboards (Kibana/Grafana)
- Develop custom scripts for monitoring
Qualifications:
- B.S. in Computer Science or equivalent
- 4+ years of experience as a software engineer
- Working knowledge of ELK stack, Prometheus, Loki, and Grafana
- Working knowledge of statistical functions used for real-time monitoring (e.g., averages, rate of change)
- Nagios and SolarWinds experience is a plus
- Strong programming skills in Python
- Solid understanding of and experience with web services, databases, networking, and related infrastructure and architectures as they relate to monitoring and alerting
- Experience with Google Cloud Platform
- Excellent troubleshooting skills
- Ability to creatively solve complex problems
- Raspberry Pi skills are a plus
- Ability to work independently from written requirements and verbal instructions