Data Engineering Lead - AWS Glue & PySpark Specialist
Location: Bangalore

We are seeking a skilled and experienced Data Engineering Lead to join our team. The ideal candidate will have expertise in Apache Spark, PySpark, Python, and AWS services, particularly AWS Glue. You will be responsible for designing, building, and optimizing ETL processes and data workflows on the AWS platform, using Spark-based frameworks and Python to process and manage large datasets efficiently.

Key Responsibilities:
- Spark & PySpark Development: Design and implement scalable data processing pipelines using Apache Spark and PySpark to support large-scale data transformations.
- ETL Pipeline Development: Build, maintain, and optimize ETL processes for seamless data extraction, transformation, and loading across various data sources and destinations.
- AWS Glue Integration: Use AWS Glue to create, run, and monitor serverless ETL jobs for data transformations and integrations in the cloud (a representative sketch follows this list).
- Python Scripting: Develop efficient, reusable Python scripts to support data manipulation, analysis, and transformation within the Spark and Glue environments.
- Data Pipeline Optimization: Ensure that all data workflows are optimized for performance, scalability, and cost efficiency on AWS.
- Collaboration: Work closely with data analysts, data scientists, and other engineering teams to deliver reliable data solutions that support business analytics and decision-making.
- Documentation & Best Practices: Maintain clear documentation of processes, workflows, and code while adhering to best practices in data engineering, cloud architecture, and ETL design.

Required Skills:
- Expertise in Apache Spark and PySpark for large-scale data processing and transformation.
- Hands-on experience with AWS Glue for building and managing ETL workflows in the cloud.
- Strong programming skills in Python, with experience in data manipulation, automation, and integration with Spark and Glue.
- In-depth knowledge of ETL principles and data pipeline design, including optimization techniques.
- Proficiency with AWS services such as S3, Glue, Lambda, and Redshift.
- Strong skills in writing optimized SQL queries, with a focus on performance tuning.
- Ability to translate complex business requirements into practical technical solutions.
- Familiarity with Apache Airflow for orchestrating data workflows (see the DAG sketch below).
- Knowledge of data warehousing concepts and cloud-native analytics tools.
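For candidates unfamiliar with the stack, here is a minimal sketch of the kind of Glue PySpark job this role owns: read raw data from S3, clean and type it with Spark, and write curated Parquet back out. The bucket paths, column names (order_id, order_total, order_date), and job parameters are hypothetical placeholders for illustration, not details from this posting.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Resolve the job name that Glue passes in at run time.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Extract: read raw CSV data from S3 (hypothetical bucket/prefix).
raw = spark.read.option("header", "true").csv("s3://example-bucket/raw/orders/")

# Transform: cast types, drop malformed rows, stamp the load time.
cleaned = (
    raw.withColumn("order_total", F.col("order_total").cast("double"))
       .dropna(subset=["order_id", "order_total"])
       .withColumn("ingested_at", F.current_timestamp())
)

# Load: write partitioned Parquet for downstream querying
# (assumes the raw data carries an order_date column).
cleaned.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/orders/"
)

job.commit()
```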
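And a hedged sketch of how Apache Airflow could orchestrate such a job using the Amazon provider's GlueJobOperator. The DAG id, schedule, Glue job name, and region are assumed for illustration; the `schedule` argument applies to Airflow 2.4+ (older versions use `schedule_interval`).

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

# A daily DAG that triggers the (hypothetical) Glue job sketched above.
with DAG(
    dag_id="orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_glue_job = GlueJobOperator(
        task_id="run_orders_glue_job",
        job_name="orders-etl",    # assumed Glue job name
        region_name="us-east-1",  # placeholder region
    )
```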