Pune, IND
Data Engineer - Senior
**DESCRIPTION**

Leads projects for the design, development, and maintenance of a data and analytics platform. Ensures efficient processing, storage, and availability of data for analysts and other consumers. Collaborates with key business stakeholders, IT experts, and subject-matter experts to plan, design, and deliver optimal analytics and data science solutions. Works on one or multiple product teams simultaneously.

**Note: Although the role is categorized as Remote, it follows a hybrid work model.**

**Key Responsibilities:**

+ Design and automate the deployment of distributed systems for ingesting and transforming data from various sources (relational, event-based, unstructured).
+ Develop frameworks for continuous monitoring and troubleshooting of data quality and integrity issues.
+ Implement data governance processes for metadata management, data access, and retention policies for internal and external users.
+ Provide guidance on building reliable, efficient, scalable, and high-quality data pipelines with monitoring and alert mechanisms using ETL/ELT tools or scripting languages.
+ Design and implement physical data models to define database structures and optimize performance through indexing and table relationships.
+ Optimize, test, and troubleshoot data pipelines.
+ Develop and manage large-scale data storage and processing solutions using distributed and cloud-based platforms such as data lakes, Hadoop, HBase, Cassandra, MongoDB, Accumulo, and DynamoDB.
+ Use modern tools and architectures to automate common, repeatable, and tedious data preparation and integration tasks.
+ Drive automation in data integration and management by modernizing the data management infrastructure.
+ Ensure the success of critical analytics initiatives by employing agile development methodologies such as DevOps, Scrum, and Kanban.
+ Coach and mentor less experienced team members.

**RESPONSIBILITIES**

**Technical Skills:**

+ Expert-level proficiency in Spark, including optimizing, debugging, and troubleshooting Spark jobs.
+ Solid knowledge of Azure Databricks for scalable, distributed data processing.
+ Strong coding skills in Python and Scala for data processing.
+ Experience with SQL, especially on large datasets.
+ Knowledge of data formats such as Iceberg, Parquet, ORC, and Delta Lake.
+ Experience developing CI/CD processes.
+ Deep understanding of Azure data services (e.g., Azure Blob Storage, Azure Data Lake, Azure SQL Data Warehouse, Synapse Analytics).
+ Familiarity with data lakes, data warehouses, and modern data architectures.

**Competencies:**

+ **System Requirements Engineering** - Translates stakeholder needs into verifiable requirements, establishing acceptance criteria and assessing the impact of requirement changes.
+ **Collaborates** - Builds partnerships and works collaboratively with others to meet shared objectives.
+ **Communicates effectively** - Develops and delivers clear, audience-specific communications.
+ **Customer focus** - Builds strong customer relationships and delivers customer-centric solutions.
+ **Decision quality** - Makes timely and informed decisions to keep the organization moving forward.
+ **Data Extraction** - Performs ETL activities from various sources using appropriate tools and technologies.
+ **Programming** - Develops, tests, and maintains computer code and scripts to meet business and compliance requirements.
+ **Quality Assurance Metrics** - Uses IT Operating Model (ITOM) and SDLC standards to assess solution quality.
+ **Solution Documentation** - Documents solutions for improved productivity and knowledge transfer.
+ **Solution Validation Testing** - Ensures configuration changes and solutions meet customer requirements.
+ **Data Quality** - Identifies, understands, and corrects data flaws to enhance information governance.
+ **Problem Solving** - Uses systematic analysis to identify root causes and implement robust solutions.
+ **Values differences** - Recognizes and appreciates diverse perspectives and cultures.

**QUALIFICATIONS**

**Education, Licenses, Certifications:**

+ Bachelor's or master's degree in Computer Science, Information Technology, Engineering, or a related field.

**Experience:**

+ 8+ years of experience in data engineering or a related field, including experience in a leadership role.
+ Intermediate experience in relevant disciplines is required.
+ Knowledge of the latest data engineering technologies and trends is preferred, including:
  + Analyzing complex business systems, industry requirements, and data regulations.
  + Processing and managing large datasets.
  + Designing and developing big data platforms using open-source and third-party tools.
  + Spark, Scala/Java, MapReduce, Hive, HBase, and Kafka or equivalent.
  + SQL query language.
  + Cloud-based clustered compute implementations.
  + Developing applications that require large file movement in a cloud-based environment.
  + Building analytical solutions.
+ Intermediate experience in the following is preferred:
  + IoT technology.
  + Agile software development.

**Job** Systems/Information Technology

**Organization** Cummins Inc.

**Role Category** Remote

**Job Type** Exempt - Experienced

**ReqID** 2411156

**Relocation Package** No