Job Title: Data Engineer
Location: Trivandrum
---
Job Summary:
We are seeking a skilled and motivated Data Engineer to join our dynamic team. You will design, develop, and maintain scalable data pipelines using Apache Spark and PySpark; implement and orchestrate data workflows with Apache Airflow; optimize data storage and retrieval using your SQL and NoSQL expertise; and collaborate closely with data scientists and analysts to deliver robust analytical solutions.
---
Key Responsibilities:
- Data Pipeline Development: Design, develop, and maintain scalable data pipelines using Apache Spark and PySpark to process large volumes of data efficiently.
- Workflow Orchestration: Implement and manage data workflows and orchestration using Apache Airflow to ensure reliable and efficient execution of data pipelines.
- Data Optimization: Optimize data storage and retrieval processes, leveraging strong SQL skills and a solid understanding of NoSQL databases (e.g., MongoDB, Cassandra).
- Collaboration: Work closely with data scientists and analysts to understand data requirements and deliver solutions that meet analytical needs.
- Monitoring & Troubleshooting: Monitor and troubleshoot data pipelines to ensure data quality, reliability, and performance.
- Data Modeling: Design and implement data models and schemas that support business requirements and ensure data consistency.
- Technology Evaluation: Evaluate and recommend new technologies and tools to enhance data processing and analytics capabilities.
- Documentation: Document technical designs, procedures, and operating processes related to data engineering activities.
---
Qualifications:
- Education: Bachelor’s degree in Computer Science, Engineering, Mathematics, or a related field. Advanced degree preferred.
- Experience: Proven experience in data engineering roles, with hands-on expertise in Apache Spark, PySpark, Apache Airflow, SQL, and NoSQL databases (e.g., MongoDB, Cassandra).
- Technical Skills:
  - Strong proficiency in Apache Spark and PySpark.
  - Experience with Apache Airflow for workflow orchestration.
  - Advanced SQL skills and knowledge of NoSQL databases.
  - Familiarity with data modeling and schema design.
  - Ability to troubleshoot and optimize data pipelines.
- Soft Skills:
  - Excellent problem-solving and analytical abilities.
  - Strong communication skills and the ability to collaborate effectively with cross-functional teams.
  - Attention to detail and a commitment to data quality and performance.
- Additional Skills: Familiarity with cloud platforms (e.g., AWS, Azure, Google Cloud) and data warehousing solutions is a plus.
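As one small illustration of the SQL optimization skills listed above, the sketch below uses Python's built-in `sqlite3` module to show how adding an index changes a query plan from a full-table scan to an index search. The table and index names are hypothetical.

```python
import sqlite3

# Hypothetical events table with 1,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO events (user_id, payload) VALUES (?, ?)",
    [(i % 100, f"event-{i}") for i in range(1000)],
)

# Without an index, filtering on user_id scans the whole table.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = 7"
).fetchall()

# After creating an index, the planner uses an index search instead.
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = 7"
).fetchall()

# The last column of each plan row is the human-readable plan detail.
scan_detail = plan_before[-1][-1]    # a SCAN of the table
search_detail = plan_after[-1][-1]   # a SEARCH using idx_events_user
```

The same scan-versus-seek trade-off applies to production databases, though each engine (PostgreSQL, MongoDB, Cassandra) exposes its query plans differently.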