PySpark Developer - Indore

Job Title: PySpark Developer

Location: Indore

Job Type: Full-time

Years of Experience: 4 to 12 years.

Job Description:

We are seeking an experienced PySpark Developer to join our data engineering team. In this role, you will be responsible for designing, developing, and optimizing large-scale data processing pipelines using PySpark and other big data technologies. The ideal candidate will have expertise in distributed computing, data processing frameworks, and working with large datasets in cloud-based or on-premises environments. You will collaborate with data engineers, data scientists, and business analysts to build robust, scalable, and efficient data solutions.

Key Responsibilities:

• Data Processing & Transformation: Design, develop, and implement distributed data processing and transformation workflows using PySpark to handle large-scale datasets across various storage systems (HDFS, S3, etc.).

• ETL Development: Build and manage ETL (Extract, Transform, Load) pipelines using PySpark, integrating data from multiple sources such as databases, flat files, cloud storage, and other data platforms (a minimal ETL sketch follows this list).

• Data Wrangling & Cleansing: Perform data cleaning, data wrangling, and data transformations to ensure the integrity, accuracy, and completeness of the data before feeding it into analytical models or reports.

• Optimization & Performance Tuning: Optimize PySpark jobs for better performance, such as minimizing memory usage, optimizing partitioning, and tuning Spark configurations for faster data processing.

• Collaboration with Data Scientists: Work closely with data scientists to help preprocess large datasets, manage data pipelines, and support machine learning model deployment and experimentation.

• Big Data Technologies Integration: Integrate PySpark with other big data technologies (e.g., Hadoop, Hive, Kafka, NoSQL databases) to process structured and unstructured data in real-time or batch modes.

• Data Modeling: Work with data engineers to design and implement data models that support efficient storage and querying, ensuring data can be leveraged for analytics, BI, and machine learning use cases.

• Testing & Debugging: Ensure the accuracy and reliability of data processing by conducting unit tests, integration tests, and debugging PySpark jobs in a distributed environment.

• Documentation: Create and maintain documentation for PySpark applications, data workflows, and procedures to ensure clarity and knowledge transfer across teams.

• Monitoring & Support: Monitor data pipelines and jobs, ensuring they run efficiently and handle exceptions or errors effectively. Provide support for production systems as needed.
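
As an illustration of the pipeline work described above, the following is a minimal PySpark ETL sketch. The bucket paths, column names, and schema are hypothetical placeholders, not details of this role:

    # Minimal PySpark ETL sketch; all paths and columns are illustrative.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("orders-etl")
             .getOrCreate())

    # Extract: read raw CSV data from cloud storage (hypothetical bucket)
    raw = spark.read.option("header", True).csv("s3a://example-bucket/raw/orders/")

    # Transform: deduplicate, enforce types, and drop incomplete records
    orders = (raw
              .dropDuplicates(["order_id"])
              .filter(F.col("order_id").isNotNull())
              .withColumn("amount", F.col("amount").cast("double"))
              .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd")))

    # Load: write partitioned Parquet for efficient downstream querying
    (orders.write
           .mode("overwrite")
           .partitionBy("order_date")
           .parquet("s3a://example-bucket/curated/orders/"))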

Required Skills and Qualifications:

• PySpark Expertise: Strong experience with PySpark for developing distributed data processing workflows, transformations, and optimizations on large datasets.

• Big Data Frameworks: Proficiency with big data technologies such as Hadoop, Hive, Spark, Kafka, or other distributed processing frameworks.

• Programming Skills: Solid knowledge of Python for data manipulation, scripting, and automating tasks. Familiarity with other languages like Scala or Java is a plus.

• SQL Skills: Proficient in SQL for querying databases and integrating with PySpark to extract and manipulate structured data.

• Data Storage: Experience with cloud storage systems (e.g., Amazon S3, Azure Blob Storage) and distributed file systems (e.g., HDFS).

• Data Processing & Integration: Experience in building data pipelines and integrating disparate data sources for processing, analysis, and reporting.

• Performance Tuning & Troubleshooting: Expertise in optimizing PySpark jobs for performance and troubleshooting issues in a distributed computing environment (see the tuning sketch after this list).

• Cloud Platforms: Experience working with cloud platforms like AWS, Azure, or Google Cloud, specifically their big data offerings (e.g., AWS EMR, Azure Databricks, Google Dataproc).

• Version Control: Familiarity with Git or other version control tools for collaborative development and deployment.

• Problem-Solving: Strong analytical skills with the ability to break down complex problems into manageable components and design effective solutions.
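
As a rough illustration of the tuning skills listed above, the sketch below shows two common PySpark optimizations: broadcasting a small dimension table to avoid a shuffle-heavy join, and caching an intermediate result that is reused. The table paths, column names, and shuffle-partition value are assumptions for illustration only:

    # Sketch of common PySpark tuning techniques; names and values are assumed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("tuning-demo")
             # Shuffle parallelism should be sized to the cluster; 200 is illustrative
             .config("spark.sql.shuffle.partitions", "200")
             .getOrCreate())

    orders = spark.read.parquet("s3a://example-bucket/curated/orders/")       # large fact table
    customers = spark.read.parquet("s3a://example-bucket/curated/customers/") # small dimension

    # Broadcast the small table so the join avoids a full shuffle
    joined = orders.join(F.broadcast(customers), on="customer_id", how="left")

    # Cache an intermediate result that feeds multiple aggregations
    joined.cache()

    daily_revenue = joined.groupBy("order_date").agg(F.sum("amount").alias("revenue"))
    orders_per_customer = joined.groupBy("customer_id").agg(F.count("*").alias("orders"))

    daily_revenue.show()
    orders_per_customer.show()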

Job Summary

We are seeking a highly skilled Sr. Developer with 4 to 8 years of experience to join our team. The ideal candidate will have expertise in Python, Databricks SQL, Databricks Workflows, and PySpark. Experience in Park Operations is a plus. This role involves developing and optimizing data workflows to support our business objectives and enhance operational efficiency.

Responsibilities

• Develop and maintain data workflows using Databricks Workflows to ensure seamless data integration and processing.

• Utilize Python to create efficient and scalable data processing scripts.

• Implement and optimize SQL queries within Databricks to support data analysis and reporting needs (a brief sketch follows this list).

• Leverage PySpark to handle large-scale data processing tasks and improve performance.

• Collaborate with cross-functional teams to understand business requirements and translate them into technical solutions.

• Provide technical guidance and support to junior developers to foster skill development and knowledge sharing.

• Conduct code reviews to ensure code quality and adherence to best practices.

• Troubleshoot and resolve technical issues related to data workflows and processing.

• Monitor and optimize the performance of data workflows to ensure they meet business requirements.

• Develop and maintain documentation for data workflows, processes, and best practices.

• Stay updated with the latest industry trends and technologies to continuously improve data processing capabilities.

• Work closely with stakeholders to gather requirements and provide regular updates on project progress.

• Ensure data security and compliance with company policies and industry regulations.
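
As a brief illustration of combining Databricks SQL and PySpark in one workflow, here is a hedged sketch; the table names and columns (e.g., ops.park_daily_stats) are hypothetical, and on Databricks the spark session is supplied by the runtime:

    # Hypothetical Databricks SQL + PySpark sketch; table names are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # On Databricks the runtime supplies `spark`; getOrCreate() returns it
    spark = SparkSession.builder.getOrCreate()

    # Databricks SQL: aggregate an assumed managed table
    park_stats = spark.sql("""
        SELECT park_id, SUM(visitors) AS total_visitors
        FROM ops.park_daily_stats
        GROUP BY park_id
    """)

    # PySpark: continue the pipeline with DataFrame transformations
    summary = (park_stats
               .filter(F.col("total_visitors") > 0)
               .orderBy(F.col("total_visitors").desc()))

    # Persist the result as a table for reporting (hypothetical name)
    summary.write.mode("overwrite").saveAsTable("ops.park_visitor_summary")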

Qualifications

• Possess strong experience in Python for data processing and automation.

• Demonstrate expertise in Databricks SQL for data analysis and reporting.

• Have hands-on experience with Databricks Workflows for data integration.

• Show proficiency in PySpark for large-scale data processing.

• Experience in Park Operations is a plus, providing valuable domain knowledge.

• Exhibit excellent problem-solving skills and attention to detail.

• Display strong communication skills to collaborate effectively with team members and stakeholders.

• Have a proactive approach to learning and staying updated with new technologies.

Certifications Required

• Databricks Certified Associate Developer for Apache Spark

• Python Certification

The Cognizant community:
We are a high-caliber team whose members appreciate and support one another. Our people uphold an energetic, collaborative and inclusive workplace where everyone can thrive.

Cognizant is a global community with more than 300,000 associates around the world. We don’t just dream of a better way – we make it happen. We take care of our people, clients, company, communities and climate by doing what’s right. We foster an innovative environment where you can build the career path that’s right for you.

About us:
Cognizant is one of the world's leading professional services companies, transforming clients' business, operating, and technology models for the digital era. Our unique industry-based, consultative approach helps clients envision, build, and run more innovative and efficient businesses. Headquartered in the U.S., Cognizant (a member of the NASDAQ-100 and one of Forbes World’s Best Employers 2024) is consistently listed among the most admired companies in the world. Learn how Cognizant helps clients lead with digital at www.cognizant.com

Our commitment to diversity and inclusion:
Cognizant is an equal opportunity employer that embraces diversity, champions equity and values inclusion. We are dedicated to nurturing a community where everyone feels heard, accepted and welcome. Your application and candidacy will not be considered based on race, color, sex, religion, creed, sexual orientation, gender identity, national origin, disability, genetic information, pregnancy, veteran status or any other protected characteristic as outlined by federal, state or local laws.

Disclaimer: 
Compensation information is accurate as of the date of this posting. Cognizant reserves the right to modify this information at any time, subject to applicable law.

Applicants may be required to attend interviews in person or by video conference. In addition, candidates may be required to present their current state or government issued ID during each interview.
