- - - - - - - - - - - -
KEY EXPECTED ACHIEVEMENTS:
Incident Management:
Track and manage the status of major incidents, ensuring timely updates and communication to stakeholders.
Minimize business impact by ensuring efficient incident resolution through coordination with the appropriate support teams.
Monitor adherence to SLAs, ensuring incidents are resolved within agreed timelines.
Provide clear and concise updates to senior leadership on the status and progress of major incidents.
Problem Management:
Drive high-quality root cause analysis (RCA) to prevent recurrence of incidents.
Ensure thorough documentation of problem records and RCAs, following industry best practices.
Monitor and validate the implementation of corrective and preventive actions.
Process Improvement:
Continuously assess and improve incident and problem management processes to enhance efficiency and effectiveness.
Develop and implement best practices, leveraging the ITIL framework where applicable.
Identify trends and patterns in incidents and problems and recommend proactive solutions.
Collaboration:
Act as the primary point of contact for major incidents, coordinating with cross-functional teams and external partners.
Collaborate with teams across different time zones to ensure seamless resolution of incidents.
Foster strong relationships with internal and external stakeholders, including vendors and third-party support teams.
24x7 Incident Support:
Ensure 24x7 availability to manage critical incidents by coordinating with dedicated support teams.
Establish and maintain an on-call schedule to address major incident escalations promptly.
Reporting and Metrics:
Develop and present incident and problem management performance reports, highlighting trends and areas for improvement.
Track and report on KPIs, including mean time to resolution (MTTR) and first-time fix rates.
Required Technical Skills:
Strong knowledge of the ITIL framework (certification preferred).
Proficiency in incident and problem management tools such as ServiceNow, Remedy, or similar platforms.
Experience with root cause analysis techniques and tools.
Familiarity with infrastructure technologies, including networking, servers, databases, and cloud environments.
Knowledge of monitoring and alerting tools like Splunk, Dynatrace, or SolarWinds.
Understanding of cybersecurity principles and their impact on incident resolution.
Ability to analyze and interpret technical data to identify trends and patterns.
Availability:
Flexibility to work 3-4 days per week from the office while managing cross-country collaboration remotely.
Availability to oversee and coordinate 24x7 support for major incidents.