Role description
About the Role
We are seeking a skilled Monitoring & Observability Engineer with hands-on experience in enterprise-grade monitoring and observability platforms such as LogicMonitor, BigPanda, AppDynamics, and other performance monitoring tools.
This role plays a key part in ensuring system reliability, performance, and proactive incident management across our infrastructure and applications. The ideal candidate will contribute to the design, implementation, and continuous improvement of our monitoring ecosystem.
Key Responsibilities
Manage and support monitoring, alerting, and observability platforms for infrastructure and applications.
Configure and maintain integrations with tools like LogicMonitor, AppDynamics, BigPanda, and others to ensure full-stack visibility.
Collaborate with DevOps, application, and infrastructure teams to define monitoring KPIs, SLAs, and thresholds.
Onboard new services and systems into the monitoring ecosystem, ensuring complete observability from day one.
Implement intelligent alerting, event correlation, and noise reduction strategies to improve incident response.
Automate monitoring configurations, alerts, and dashboards using scripting (Python, PowerShell) and REST APIs.
Perform root cause analysis and contribute to performance tuning efforts using telemetry and trend data.
Create and maintain operational dashboards, reports, and health checks to support proactive IT operations.
Ensure all monitoring tools are secure, up to date, and optimized for performance.
Document monitoring configurations and processes; provide knowledge-sharing to other teams.
Qualifications
Bachelor’s degree in Computer Science, Information Systems, or a related field.
3–5 years of experience in IT operations, systems monitoring, or infrastructure observability.
Hands-on experience with at least two monitoring tools such as:
LogicMonitor, AppDynamics, BigPanda, Dynatrace, Splunk, Datadog, etc.
Solid understanding of infrastructure, APM, synthetic, log, and event monitoring concepts.
Experience with event correlation and noise reduction strategies.
Familiarity with ITIL processes, especially related to incident and problem management.
Strong scripting and automation skills using Python or PowerShell, including work with REST APIs.
Experience with cloud environments (AWS, Azure, GCP) and related monitoring integrations is a plus.
Strong analytical, problem-solving, and communication skills.
Preferred Certifications
LogicMonitor Certified Professional
AppDynamics Certified Performance Analyst
BigPanda Practitioner Certification
ITIL Foundation Certification
AWS / Azure Cloud Practitioner or Associate-level Certification
Skills
Network Monitoring, Security Monitoring, ITSM