Tampa, FL, USA
8 days ago
IT - Technology Lead | Big Data - Data Processing | Spark
Job Seekers, Please send resumes to resumes@hireitpeople.com

Detailed Job Description:

- 5+ years of total development experience with Big Data, Hive, Hadoop, Spark, Scala, Python/PySpark, AWS, and other cloud-related technologies.
- Independent/lead developer who can work with minimal supervision.
- Solid understanding of distributed systems fundamentals.
- Solid understanding of Hadoop security, familiarity with Kerberos/keytabs, and hands-on experience running Spark/Hive/Oozie/Kafka on a Kerberized cluster.
- Experience developing, troubleshooting, diagnosing, and performance-tuning distributed batch and real-time data pipelines using Spark/PySpark at scale.
- Develop scalable and reliable data solutions to move data across systems from multiple sources in real time (NiFi, Kafka) as well as in batch mode (Sqoop); a minimal illustrative sketch follows this list.
- Demonstrated professional experience with components of the Big Data ecosystem (Spark/Spark Streaming, Hive, Kafka/KSQL, Hadoop or a similar NoSQL ecosystem) and with orchestrating these pipelines in production using Oozie or similar tools.
- Construct data staging layers and fast real-time systems to feed BI applications and machine learning algorithms.
- Strong software engineering skills with Python or Scala/Java.
- Knowledge of at least one flavor of SQL (MySQL, Oracle, Hive, Impala), including the fundamentals of data modeling and performance.
- Skills in real-time streaming applications.
- Experienced in data engineering with a good understanding of data warehouses, data lakes, data modeling, parsing, data wrangling, cleansing, transformation, and sanitization.
- Agile work experience; build CI/CD pipelines using Jenkins, Git, Artifactory, Ansible, etc.
- Hands-on development experience with Scala and Python using Spark 2.0, Spark internals, and Spark job performance improvement.
- Good understanding of YARN, the Spark UI, Spark and Hadoop resource management, and efficient Hadoop storage mechanisms.
- Good understanding of and experience with performance tuning in cloud environments for complex software projects, mainly around large scale and low latency.
- AWS knowledge is essential, with good working experience in AWS technologies (EMR, S3, cluster management, Airflow automation); Snowflake knowledge is a plus.
- AWS development certification or Spark certifications are an advantage.
- Expert in data analysis in Python (NumPy, SciPy, scikit-learn, Pandas, etc.).
- Strong UNIX shell scripting experience to support data warehousing solutions.
- Process oriented, focused on standardization, streamlining, and implementation of best-practice delivery.
- Excellent problem-solving and analytical skills; excellent verbal and written communication skills.
- Proven teamwork in multi-site/multi-geography organizations.
- Ability to multi-task and function efficiently in a fast-paced environment.
- Strong background in Scala or Java and experience with streaming technologies such as Flink, Kafka, Kinesis, and Firehose; experience with EMR, Spark, Parquet, and Airflow.
- Excellent interpersonal skills; ability to handle ambiguity and learn quickly.
- Exposure to data architecture and governance is helpful.
- A degree in Computer Science or a related technical field, or equivalent work experience.
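As a rough illustration of the Kafka-to-staging pattern the role describes (real-time ingestion feeding BI and ML consumers), a minimal PySpark Structured Streaming sketch is shown below. The broker address, topic name, event schema, and S3 paths are placeholder assumptions for illustration only, not values from this posting.

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Requires the spark-sql-kafka connector package on the classpath.
spark = (
    SparkSession.builder
    .appName("kafka-to-parquet-sketch")
    .getOrCreate()
)

# Hypothetical event schema for the incoming JSON payload.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", StringType()),
])

# Read a Kafka topic as a streaming source (broker and topic are placeholders).
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "transactions")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka values arrive as bytes; cast to string and parse the JSON into columns.
events = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(from_json(col("json"), schema).alias("e"))
    .select("e.*")
)

# Land the parsed stream as Parquet in a staging layer (S3 paths are placeholders).
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/staging/transactions/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/transactions/")
    .outputMode("append")
    .start()
)

query.awaitTermination()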

Minimum years of experience*: 5+