Lead Engineer - IT
Guardian Life
Job Description:
JOB SUMMARY: Observability Engineer with deep expertise in Zenoss
The Observability Engineer is a member of the IT team with a strong technical understanding of Zenoss. He/she needs to be familiar with the techniques used to monitor the health and well-being of a growing IT infrastructure. This position will administer, upgrade, maintain, and evolve our enterprise suite of monitoring, alerting, reporting, and analysis tools in the IT Enterprise division. The overall responsibility of the Observability Engineer is to design, configure, implement, and maintain the Enterprise Management (EMS/NMS) tool suite. The EMS tool suite provides compute, cloud, SaaS, network, voice, and security monitoring capabilities covering infrastructure components such as data center infrastructure, servers (Linux/Windows/AIX), network equipment, AWS services, appliances, storage, databases, and applications.
Qualifications:
ROLES AND RESPONSIBILITIES:
Zenoss Administration
- Administer and manage the Zenoss SaaS environment
- Configure and optimize Zenoss for performance monitoring, event correlation, and alerting
- Implement custom monitoring solutions, including SNMP, API-based, and agent-based monitoring
- Maintain ZenPacks, ensuring seamless integration with various IT systems
- Troubleshoot Zenoss-related issues and optimize event processing
- Manage user roles, permissions, and integrations with external tools
- Understand and write transforms for event enrichment and management
- Perform administration and life cycle management of other Enterprise Monitoring tools, including Splunk and AppDynamics
- Collaborate with various internal technical teams to deploy new monitoring and alerting conditions
- Respond to and manage trouble tickets related to the Enterprise Monitoring tools
- Set up logging and application performance monitoring using Zenoss and AppDynamics
- Analyze processes, identify weaknesses, and develop and implement improvements
- Attend daily operations team meetings; create, maintain, and update daily operations status reports
- Create, maintain, and update relevant documentation and runbooks
- Conduct independent assessments of technical challenges, perform architectural trade-offs, and carry out other analyses
Monitoring & Observability Tools
- Work with Splunk for log monitoring, dashboarding, and alerting
- Collaborate on AppDynamics configurations for application performance monitoring
- Assist in integrating Zenoss with Splunk and AppDynamics for end-to-end observability
Incident & Problem Management
- Ensure proactive monitoring and incident response to minimize downtime
- Conduct root cause analysis (RCA) for performance issues
- Provide recommendations for system improvements based on monitoring insights
REQUIRED QUALIFICATIONS:
- Degree or higher, with a preference for Computer Engineering/Technology
- A total of 10 years’ experience, with a minimum of 3 years as a Zenoss SME
- Strong knowledge of Zenoss event processing, ZenPacks, and integrations
- Experience with monitoring protocols (SNMP, WMI, API-based, etc.)
- Familiarity with Splunk (log analysis, dashboards, alerting)
- Experience with AppDynamics (basic administration and monitoring setup)
- Scripting skills in Python, Shell, or PowerShell for automation
- Strong troubleshooting and problem-solving skills
- Knowledge of ITIL processes (Incident, Problem, and Change Management) is a plus
PREFERRED QUALIFICATIONS:
- Experience working in enterprise IT environments with large-scale monitoring
- Exposure to cloud platforms (AWS, Azure) and Kubernetes monitoring
Location:
This position can be based in any of the following locations:
Gurgaon
Current Guardian Colleagues: Please apply through the internal Jobs Hub in Workday