Pune, IND
Data Engineer
**DESCRIPTION**

Although the role category specified in the GPP is Remote, this position is Hybrid.

**Key Responsibilities:**

+ Implement and automate deployment of distributed systems for ingesting and transforming data from various sources (relational, event-based, unstructured).
+ Continuously monitor and troubleshoot data quality and integrity issues.
+ Implement data governance processes and methods for managing metadata, access, and retention for internal and external users.
+ Develop reliable, efficient, scalable, and quality data pipelines with monitoring and alert mechanisms, using ETL/ELT tools or scripting languages (a rough sketch of this kind of pipeline follows this list).
+ Develop physical data models and implement data storage architectures per design guidelines.
+ Analyze complex data elements and systems, data flow, dependencies, and relationships to contribute to conceptual, physical, and logical data models.
+ Participate in testing and troubleshooting of data pipelines.
+ Develop and operate large-scale data storage and processing solutions using distributed and cloud-based platforms (e.g., data lakes, Hadoop, HBase, Cassandra, MongoDB, Accumulo, DynamoDB).
+ Use agile development methods, such as DevOps, Scrum, Kanban, and continuous improvement cycles, for data-driven applications.
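As a rough illustration of the pipeline work described above, here is a minimal PySpark sketch of a batch ETL job with a simple data-quality gate. The paths, column names, and quality threshold are hypothetical placeholders for illustration only, not details from this posting.

```python
# Minimal illustrative sketch; paths, columns, and the DQ threshold
# below are hypothetical placeholders, not from this posting.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl-sketch").getOrCreate()

# Extract: ingest raw event data (hypothetical source path).
raw = spark.read.json("/data/raw/orders/")

# Transform: normalize types and derive a daily aggregate.
orders = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
)
daily = orders.groupBy(F.to_date("order_ts").alias("order_date")).agg(
    F.sum("amount").alias("total_amount"),
    F.count("*").alias("order_count"),
)

# Data-quality gate: fail loudly (an alerting hook) if too many null amounts.
null_ratio = orders.filter(F.col("amount").isNull()).count() / max(orders.count(), 1)
if null_ratio > 0.01:  # hypothetical threshold
    raise ValueError(f"DQ check failed: {null_ratio:.2%} null amounts")

# Load: write curated output, partitioned for downstream consumers.
daily.write.mode("overwrite").partitionBy("order_date").parquet("/data/curated/daily_orders/")
```

In practice the failed check would notify a monitoring system rather than raise an exception, and on Azure Databricks the write target would more likely be a Delta Lake table; this sketch keeps to core PySpark APIs only.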
**RESPONSIBILITIES**

**Qualifications:**

+ College, university, or equivalent degree in a relevant technical discipline, or equivalent relevant experience, required.
+ This position may require licensing for compliance with export controls or sanctions regulations.

**Competencies:**

+ **System Requirements Engineering:** Uses appropriate methods and tools to translate stakeholder needs into verifiable requirements.
+ **Collaborates:** Builds partnerships and works collaboratively with others to meet shared objectives.
+ **Communicates effectively:** Develops and delivers multi-mode communications that convey a clear understanding of the unique needs of different audiences.
+ **Customer focus:** Builds strong customer relationships and delivers customer-centric solutions.
+ **Decision quality:** Makes good and timely decisions that keep the organization moving forward.
+ **Data Extraction:** Performs ETL activities from various sources and transforms the data for consumption by downstream applications and users.
+ **Programming:** Creates, writes, and tests computer code, test scripts, and build scripts using industry standards and tools.
+ **Quality Assurance Metrics:** Applies measurement science to assess whether a solution meets its intended outcomes.
+ **Solution Documentation:** Documents information and solutions based on knowledge gained during product development activities.
+ **Solution Validation Testing:** Validates configuration item changes or solutions using defined best practices.
+ **Data Quality:** Identifies, understands, and corrects flaws in data to support effective information governance.
+ **Problem Solving:** Solves problems using systematic analysis processes and industry-standard methodologies.
+ **Values differences:** Recognizes the value that different perspectives and cultures bring to an organization.

**QUALIFICATIONS**

**Knowledge/Skills:**

**Must-Have:**

+ 3-5 years of experience in data engineering, with a strong background in Azure Databricks and Scala/Python.
+ Hands-on experience with Spark (Scala/PySpark) and SQL.
+ Experience with Spark Streaming, Spark internals, and query optimization.
+ Proficiency in Azure cloud services.
+ Agile development experience.
+ Unit testing of ETL.
+ Experience creating ETL pipelines with ML model integration.
+ Knowledge of Big Data storage strategies (optimization and performance).
+ Critical problem-solving skills.
+ Basic understanding of data models (SQL/NoSQL), including Delta Lake or Lakehouse.
+ Quick learner.

**Nice-to-Have:**

+ Understanding of the ML lifecycle.
+ Exposure to Big Data open-source technologies.
+ Experience with Spark, Scala/Java, MapReduce, Hive, HBase, and Kafka.
+ SQL query language proficiency.
+ Experience with clustered compute cloud-based implementations.
+ Familiarity with developing applications requiring large file movement in a cloud-based environment.
+ Exposure to Agile software development.
+ Experience building analytical solutions.
+ Exposure to IoT technology.

**Experience:**

+ Relevant experience preferred, such as temporary student employment, internships, co-ops, or other extracurricular team activities.
+ Knowledge of the latest technologies in data engineering is highly preferred, covering the Nice-to-Have items listed above or equivalent college coursework.

**Work Schedule:** Most of the work will be with stakeholders in the US; a 2-3 hour overlap during EST hours is expected on a need basis.

**Job:** Systems/Information Technology

**Organization:** Cummins Inc.

**Role Category:** Remote

**Job Type:** Exempt - Experienced

**ReqID:** 2410604

**Relocation Package:** No