Bangalore
Associate III - Production Support

Role Proficiency:

Independently supports customer applications by monitoring and resolving system issues. Guides other associates and assists Lead 1 – Production Support.

Outcomes:

• Understand the application/feature/component and the related issues raised by business users in order to resolve them and create the required SOPs/runbooks
• Monitor, triage, and resolve higher-severity issues pertaining to systems/applications/infrastructure tools for end users via phone, chat, and/or email
• Identify problem patterns and suggest better resolution techniques
• Optimise efficiency, cost, and quality by identifying opportunities for automation/process improvements
• Proactively identify issues/defects/flaws in the application and take the necessary measures to address them
• Assist Lead 1 – Production Support with troubleshooting and issue resolution; begin demonstrating Lead 1 capabilities in making critical decisions
• Mentor Trainee Associates and Associates I and II – Production Support to become more effective in their roles
• Act as the technical SME for troubleshooting/resolving any reported production incident/ticket/issue
• Learn the business domain, technology, and system domain individually and as recommended by the project/account

Measures of Outcomes:

• Adherence to engineering process and standards
• Adherence to schedule/timelines
• Adherence to SLAs where applicable
• Number of issues resolved
• Number of non-compliance issues with respect to SOPs
• Reduction in recurrence of known defects
• Quick turnaround of production bugs
• Defined productivity standards for the team
• Number of new runbooks created
• Number of production jobs automated
• Number of new monitoring dashboards introduced
• Completion of applicable technical/domain certifications
• Completion of all mandatory training requirements

Outputs Expected:

Issue Resolution:

Analyse and resolve incidents/tickets within the optimal MTTR (Mean Time To Resolve)


Training:

Attend one-on-one, need-based domain/project/technical training. Provide need-based training to juniors on the team.


Escalation:

Escalate problems to the appropriate individuals/support team based on established guidelines and procedures. Where applicable, monitor progress of requests for support and ensure users and other interested parties are kept informed.


Document:

Create documentation for one's own work


Automation:

Identify opportunities for automation/process improvements that help in optimising cost and improving quality


Mentoring:

Mentor juniors on the team. Set FAST goals and provide feedback on FAST goals to mentees.


Status Reporting:

Report the status of assigned tasks. Comply with project-related reporting standards/processes.


Manage knowledge:

Absorb and contribute to project-related documents, SharePoint libraries, and client universities


Release:

Adhere to release management process

Skill Examples:

• Identify, triage, and resolve issues reported by customers
• Log, monitor, and report issues as defined by SLAs
• Develop runbooks, SOPs, and dashboards
• Problem-solving approach
• Manage and guarantee high levels of quality
• Team player
• Good written and verbal communication abilities
• Proactively ask for help and offer help

Knowledge Examples:

• Appropriate software programs/modules/tools
• Operating systems and software platforms
• Integrated development environment (IDE)
• DBMS
• Programming languages
• Software life cycle methodology, e.g. Agile methods
• Knowledge base of the customer domain and of the sub-domain where the problem is solved
• Proactively ensure the highest levels of systems availability

Additional Comments:

Position Overview:

The Production Support Engineer is responsible for maintaining, monitoring, and enhancing production environments for various big data systems. The role demands strong expertise in supporting and troubleshooting complex data pipelines and in ensuring high availability and performance of systems. You will collaborate with cross-functional teams to address technical issues and provide solutions that enhance system efficiency.

Key Responsibilities:

Event Management Monitoring:

• Monitor in-scope applications and applicable servers
• Application stability, capacity, and threshold monitoring
• Respond to system alerts and thresholds; raise incident tickets and SWAT as appropriate
• Participate in SWAT and engage all required teams (through TOC) for incident recovery
• Manage, monitor, and optimize operations/application performance, including recreation of online and batch issues in the test environment to identify the root cause
• Monitor execution of daily data transfer jobs
• Monitor interface files, ensuring the naming convention is maintained
• Notify source system owners of data quality issues
• Monitor batch streams, DL/I, and batch jobs
• Backfill jobs across regions after production/off-cycle releases
• Coordinate scheduled region refreshes, including ad-hoc data copy refreshes, and perform the required job runs/data masking as applicable
• Send health check status as per the Standard Operating Procedure document
• Perform proactive restarts
• Coordinate with the vendor on releases/changes

Incident Management:

• Monitor and log incidents in the ticketing system (ServiceNow) for in-scope applications when applicable
• Determine wherever possible whether an application incident should be treated as a “problem” (e.g., whether preventive action may be necessary to avoid incident recurrence) and, in conjunction with the appropriate support level, raise a “problem” record to initiate action
• Perform and participate in root cause analysis for recurring issues and open problem tickets for permanent fixes where applicable
• Coordinate with third-party suppliers and other vendors for resolution as appropriate
• Document solutions to resolved application incidents in the central knowledge management database (including all information pertinent to the trouble ticket: general verbiage, codes, etc.) when applicable
• Maintain current and historical records of all application incidents and their resolution for the life of the contract, and provide reporting and trend capabilities
• Review reported issues and risks and their corrective action plans
• Monitor the mailbox for alerts and notifications
• Receive incident reports from the incident management system
• Act as the single point of contact for in-scope application incident resolutions
• Facilitate the resolution of incidents
• Triage incidents to the appropriate in-scope application team
• Generate RCAs for incidents triaged for in-scope applications
• Maintain the status of each incident
• Engage appropriate support teams that provide supporting services to in-scope applications when necessary
• Establish and operate a communication plan with end users for updates on incidents affecting user experience
• Provide on-call production support/incident resolution
• Resolve incidents with in-scope applications and perform root cause analysis (where required)
• Update the AppLab for Priority 0, Priority 1, and Priority 2 incidents
• Oversee the resolution of P2, P3, and P4 incidents and report status
• Manage, monitor, and resolve incidents
• Contact the source system owner with data quality issues where applicable

Problem Management:

• Conduct impact analysis of the problem ticket and identify affected systems
• Deep dive on problems [through JIRA tasks] related to application functional issues
• Deep dive on problems [through JIRA tasks] related to data issues
• Implement workarounds or permanent fixes resulting from data issues (up to 100 hours)
• Perform advanced configuration changes (under 100 hours) for in-scope COTS, SaaS, and ERP applications
• Perform application problem management services in conformance with the defined change management procedures that ELV follows
• Track and report on application problems, trends, and failures, and identify the associated consequences of application problems
• Follow up on tickets raised with third-party/SaaS infrastructure vendors until resolution
• Create and submit RFCs in SNOW when applicable
• Provide Level 2/3 support

Preventive Maintenance:

• Identify and document recurring problems and raise problem tickets as necessary to reduce future incidents
• Conduct proactive trend analysis of application incidents and application problems to identify recurring situations that are or may be indicative of future application problems and points of failure when applicable
• Develop and recommend application corrective actions or solutions to address recurring application incidents and application problems or failures, as well as
