Data Engineer
IBM
Data Engineer in IBM's CIO organization, supporting a data warehouse that consolidates IBM's global real estate data into IBM's Cognitive Enterprise Data Platform (CEDP), including integrations with enterprise systems such as TRIRIGA, Maximo, and Envizi. The data includes comprehensive information about IBM's global internal real estate portfolio: properties, space, leases, energy consumption, construction and renovation projects, and environmental compliance.
Responsibilities:
- Manage integrations for data ingestion from multiple source systems via APIs, queries, and Apache Spark and Airflow workflows. Launch Spark jobs in Airflow as needed.
- Support data transformations and aggregations in a Cloud Object Storage (COS) integration zone, and the subsequent data feeds to a DB2 warehouse.
- Maintain a Cirrus platform hosting Cloud Object Storage, along with its inbound and outbound data flows. Initiate activities such as opening firewall flows, defining entitlements, and managing roles and user access as needed. Identify and perform any other activities required to ensure reliable operation of the cluster.
- Identify and promptly address issues with data and integrations. Implement and optimize monitoring, and troubleshoot errors through in-depth reviews of logs, code, data, and integration components.
- Document and share data architectures and flows (including schemas, tables, queries, and scheduled activities) with data analysts and Cognos developers.
- Develop new capabilities for data ingestion and transformation as needed. Create algorithms and develop new transfers using Spark/Airflow, APIs, SQL, etc. Perform comprehensive testing of individual components as well as the end-to-end solution.