Role Proficiency:
Resolve L1 incidents and service requests within the agreed SLA
Outcomes:
1) Monitor customer infrastructure using tools or defined SOPs to identify failures, and mitigate them by raising tickets with the defined priority and severity
2) Update SOPs with revised troubleshooting instructions and process changes
3) Mentor new team members in understanding customer infrastructure and processes
4) Perform analysis to drive incident reduction
5) Resolve L1 incidents and service requests
Measures of Outcomes:
1) SLA adherence
2) Compliance with the runbook-based troubleshooting process
3) Time-bound elevation and routing of tickets (OLA adherence)
4) Schedule adherence in managing ticket backlogs
5) Number of NCs in internal/external audits
6) Number of KB changes suggested
7) Production readiness of new joiners within the agreed timeline through one-on-one mentorship
8) Percentage completion of all mandatory training requirements
9) Number of tickets reduced through analysis
10) Number of installation SRs handled for endpoints / change tasks completed for infrastructure
11) Number of L1 tickets closed
Outputs Expected:
Monitoring:
Understand priority and severity based on ITIL practice. Understand the SLA agreed with the customer and adhere to it. Perform recurring analysis to identify the CIs that generate the most tickets. Adhere to ITIL best practices.
Runbook Reference/Change:
Record troubleshooting steps and provide inputs for runbook changes.
Escalation/Elevation/Routing of tickets:
Escalate or elevate tickets to the relevant level (L2, L3, etc.). Adhere to OLA. Route tickets to the relevant queue. Notify the respective teams/customer based on the defined process.
Tickets Backlog/Resolution:
Manage ticket backlogs/last activity as per the defined process. Resolve incidents and SRs within agreed timelines. Execute change tasks for infrastructure.
Collaboration:
Document learnings for self-reference. Close/resolve L1 tickets with help from the respective tower. Actively participate in team/organization-wide initiatives.
Installation:
Stakeholder Management:
Process Adherence:
Training:
Performance Management:
Track, report, and seek continuous feedback from peers and manager. Set goals and provide feedback for mentees. Assist new team members in understanding the customer environment.
Skill Examples:
1) Good communication skills (written, verbal, and email etiquette) to interact with different teams and customers
2) Networking:
a. Good with monitoring tools and device backup scheduling
b. Basic DHCP and DNS configuration on routers and switches
c. Basic troubleshooting skills with commands such as ‘show ip route’ and ‘sh mac address-table’
d. Basics of static and dynamic IP routing protocols
3) Server:
a. Basic to intermediate PowerShell/Bash/Python scripting skills
b. Manual patching of QA servers
c. Analyse space alerts from a server and engage the Capacity Management team for disk expansion
4) Storage and Backup:
a. Ability to handle storage and backup issues independently
b. Ability to handle vendor management, device management, and storage array management
c. Perform hardware upgrades, firmware upgrades, and vulnerability remediation
d. Ticket analysis, storage and backup performance management, and general troubleshooting
5) Database:
a. Patching and upgrading the DB server and application tools
b. Tune queries so they run as fast as possible
c. Logical and physical schema design (indexing, constraints, partitioning, etc.)
d. Ability to visualize and debug the end-to-end flow of the business transaction model and applications
e. DB migration and export/import
Knowledge Examples:
1) Fair understanding of customer infrastructure and the ability to correlate failures
2) Knowledge of infrastructure monitoring tools
3) Networking:
a. IP addressing and subnetting knowledge
b. Preferably certified in Cisco's basic certification track
c. IOS upgrade and patching knowledge
4) Server:
a. Intermediate-level knowledge of Active Directory, DNS, DHCP, DFS, IIS, and patch management
b. Strong knowledge of backup tools such as Veritas, Commvault, and Windows Backup, plus storage concepts
c. Strong virtualization and basic cloud knowledge
d. AD group policy management, group policy tools, and troubleshooting GPOs
e. Basic AD object creation, DNS concepts, DHCP, and DFS
f. Knowledge of tools such as SCCM and SCOM administration
5) Storage and Backup:
a. In-depth knowledge of storage and backup technology, storage allocation and reclamation, and backup policy creation and management
b. Strong knowledge of server, network, and virtualization technologies
6) Tools:
a. Knowledge of infrastructure and application technologies
b. Understanding of monitoring concepts and processes
c. Understanding of key network monitoring protocols such as SNMP, NetFlow, WMI, and syslog
d. Knowledge of administering tools such as SCOM, SolarWinds, CA UIM, Nagios, and ServiceNow
7) Monitoring:
a. Good understanding of networking concepts and protocols
b. Knowledge of server, backup, and storage technologies
c. Knowledge of SQL scripting is desirable
d. Knowledge of ITIL processes
8) Database:
a. Knowledge of database security
9) Quality Analysis:
a. Exposure to FMEA audit practices
b. Exposure to technology/processes as per audit requirements
10) Working knowledge of MS Excel, Word, PPT, Outlook, etc.
Additional Comments:
Mandatory Skills: data pipeline, AWS SageMaker, AWS Glue, AWS, Python, AWS Lambda, Amazon EC2, deployment, Athena, DevOps, Ansible, web application, AWS Cloud, AWS CloudFormation, Terraform, Boto3, CodePipeline, CodeDeploy
Skill to Evaluate: AWS, Data Pipeline, Cloud Infra, Glue, Python, PySpark, Athena, ETL, DevOps, CodePipeline
Experience: 4 to 6 Years
Location: Bengaluru
Job Description:
We are seeking an experienced Data Pipeline & Cloud Infrastructure Engineer to join our team. The ideal candidate will be responsible for building and maintaining robust data pipelines, managing cloud infrastructure (primarily EC2 and S3), supporting machine learning models, and ensuring smooth operations for web applications and analytics systems. You will work closely with data scientists and various teams to resolve issues, handle deployments, and maintain a secure environment.
Education Qualification:
Bachelor of Engineering
Roles & Responsibilities:
Data Pipeline Development & Maintenance:
Build and maintain data pipelines for data enrichment and loading into base tables for ML model integration. Leverage AWS services (Glue, Athena, S3) and Python (90%) + PySpark (10%) to ensure efficient data flow. Implement error handling and alerts with appropriate notifications. Collaborate with SRI Data Scientists to troubleshoot and resolve data-related issues.
Workload Management:
Maintain EC2 workloads, including ensuring security, setting up alerts, managing Linux environments, and handling deployments for machine learning models.
Web Application Hosting:
Host web applications on EC2 for analytics purposes using Python and the Streamlit framework. Manage Linux-based EC2 instances, Elastic Load Balancers (ELB), and SSL/TLS certificates.
Reporting & Monitoring:
Generate and maintain various reports, including billing, bill consolidation, data stats, and daily activity reports.
Ad Hoc Research Workloads:
Provision EC2 instances for short-term, ad hoc research tasks by external users (e.g., interns). Manage the lifecycle of such instances, including provisioning, access, and decommissioning.
SageMaker User Management:
Add users and manage domains within AWS SageMaker (permissions and infrastructure support). The role involves no direct workload management, but provides support for infrastructure issues (e.g., access permissions).
Network & Access Management:
Manage Virtual Private Clouds (VPCs) and access controls. Provide VPN access on an ad hoc basis and handle IP whitelisting requests.
CI/CD & Infrastructure Deployment:
Maintain CloudFormation (CFN) templates and deploy infrastructure for machine learning workflows. Manage version control of the ML codebase through Git.
Qualifications:
1) Strong experience with AWS services (Glue, Athena, S3, EC2, VPCs)
2) Proficiency in Python (90%) and PySpark (10%) for data processing and automation
3) Experience with web application hosting, particularly with the Python/Streamlit framework
4) Familiarity with CI/CD pipelines, CloudFormation, and version control (Git)
5) Experience with machine learning infrastructure and tools, particularly SageMaker, is a plus
6) Strong troubleshooting skills and the ability to collaborate with cross-functional teams
Preferred Skills:
1) Linux system administration experience
2) Experience with security best practices, monitoring, and alerting on cloud infrastructure
3) Knowledge of networking concepts (VPCs, VPN, IP whitelisting)
4) Experience managing billing and report generation tasks
Additional Information:
Flexible working hours based on project deadlines. Collaborative team environment with an opportunity to work with data scientists and engineers across various domains.