· Participate in team activities, design discussions, stand-up meetings, and planning reviews with the team.
· Perform data analysis, data profiling, data quality checks, and data ingestion across various layers using big data/Hadoop/Hive/Impala queries, PySpark programs, and UNIX shell scripts (a profiling sketch appears after this list).
· Follow the organization's coding standards document; create mappings, sessions, and workflows as per the mapping specification document.
· Perform gap and impact analysis of ETL and IOP jobs for new requirements and enhancements.
· Create jobs in Hadoop using Sqoop, PySpark, and StreamSets to meet business user needs.
· Create mock-up data, perform unit testing, and capture result sets for jobs developed in the lower environments.
· Update the production support runbook and Control-M schedule document as part of each production release.
· Create and update design documents, providing a detailed description of workflows after every production release.
· Continuously monitor production data loads, fix issues, update the issue tracker, and identify performance problems.
· Performance-tune long-running ETL/ELT jobs by creating partitions, enabling full loads, and applying other standard approaches (see the partitioning sketch after this list).
· Perform quality assurance checks and reconciliation after data loads, and communicate with the vendor to obtain corrected data.
· Participate in ETL/ELT code reviews and design re-usable frameworks.
· Create Remedy/ServiceNow tickets to fix production issues, and create support requests to deploy database, Hadoop, Hive, Impala, UNIX, ETL/ELT, and SAS code to the UAT environment.
· Create Remedy/ServiceNow tickets and/or incidents to trigger Control-M jobs for FTP and ETL/ELT jobs on an ad hoc, daily, weekly, monthly, or quarterly basis as needed.
· Model and create STAGE/ODS/data warehouse Hive and Impala tables as needed (see the table-creation sketch after this list).
· Create change requests, work plans, test results, and BCAB checklist documents for code deployment to the production environment, and perform code validation post-deployment.
· Work with the Hadoop, ETL, and SAS admin teams on code deployments and health checks.
· Create re-usable UNIX shell scripts for file archival, file validation, and Hadoop workflow looping (see the archival sketch after this list).
· Create a re-usable Audit Balance Control framework that captures reconciliation results, mapping parameters, and variables, and serves as a single point of reference for workflows (see the Audit Balance Control sketch after this list).
· Create PySpark programs to ingest historical and incremental data (see the ingestion sketch after this list).
· Create Sqoop scripts to ingest historical data from the EDW Oracle database into Hadoop IOP, and create Hive table and Impala view creation scripts for dimension tables.
· Participate in meetings to continuously upgrade functional and technical expertise.
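
Profiling sketch: a minimal PySpark example of the kind of data profiling and quality checks referenced above. The database, table, and key column names (stage_db.claims_stage, claim_id) are hypothetical placeholders, not the actual objects.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("profiling_sketch").enableHiveSupport().getOrCreate()

    # Hypothetical staging table; replace with the actual Hive table name.
    df = spark.table("stage_db.claims_stage")

    # Row count and per-column null counts as a basic quality profile.
    total_rows = df.count()
    null_counts = df.select([
        F.sum(F.col(c).isNull().cast("int")).alias(c) for c in df.columns
    ])
    null_counts.show()

    # Distinct-count check on the assumed primary key to flag duplicates.
    distinct_keys = df.select("claim_id").distinct().count()
    print(total_rows, distinct_keys)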
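Partitioning sketch: one common approach to tuning long-running ETL/ELT loads, shown in PySpark. The partition column (load_date) and the table names are assumptions for illustration, not the actual job design.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("partition_tuning_sketch").enableHiveSupport().getOrCreate()
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

    # Write the data partitioned by a date column so downstream jobs can
    # prune partitions instead of scanning the whole table.
    src = spark.table("stage_db.claims_stage")
    (src.repartition("load_date")
        .write.mode("overwrite")
        .partitionBy("load_date")
        .format("parquet")
        .saveAsTable("ods_db.claims_ods"))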
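Table-creation sketch: stage/ODS table modeling via Hive DDL issued from PySpark. The schema, storage format, and object names are illustrative assumptions.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ddl_sketch").enableHiveSupport().getOrCreate()

    # Hypothetical ODS dimension table, partitioned by load_date and stored as
    # Parquet so it can be queried from both Hive and Impala.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS ods_db.member_dim (
            member_id BIGINT,
            member_name STRING,
            effective_dt DATE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
    """)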
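Archival sketch: the actual archival/validation scripts are UNIX shell; the sketch below expresses the same idea in Python to stay consistent with the other examples. The directory paths and the simple non-empty-file check are assumptions.

    import gzip
    import shutil
    from pathlib import Path

    LANDING = Path("/data/landing")    # hypothetical landing directory
    ARCHIVE = Path("/data/archive")    # hypothetical archive directory

    for f in LANDING.glob("*.dat"):
        # Basic validation: skip zero-byte files before archiving.
        if f.stat().st_size == 0:
            print(f"SKIP empty file: {f.name}")
            continue
        # Compress the validated file into the archive directory, then remove the original.
        ARCHIVE.mkdir(parents=True, exist_ok=True)
        with f.open("rb") as src, gzip.open(ARCHIVE / (f.name + ".gz"), "wb") as dst:
            shutil.copyfileobj(src, dst)
        f.unlink()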
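Audit Balance Control sketch: a minimal version of the idea, capturing source and target row counts per load and recording whether they reconcile. The audit table name and columns are assumptions.

    from datetime import datetime
    from pyspark.sql import SparkSession, Row

    spark = SparkSession.builder.appName("abc_sketch").enableHiveSupport().getOrCreate()

    def record_balance(source_table, target_table, audit_table="audit_db.abc_log"):
        """Compare row counts and append a reconciliation record to the audit table."""
        src_count = spark.table(source_table).count()
        tgt_count = spark.table(target_table).count()
        row = Row(run_ts=datetime.now().isoformat(),
                  source_table=source_table, target_table=target_table,
                  source_count=src_count, target_count=tgt_count,
                  balanced=(src_count == tgt_count))
        spark.createDataFrame([row]).write.mode("append").saveAsTable(audit_table)
        return src_count == tgt_count

    # Example usage with hypothetical table names.
    record_balance("stage_db.claims_stage", "ods_db.claims_ods")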
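Ingestion sketch: historical versus incremental ingestion in PySpark, assuming a last-updated timestamp column (upd_ts) serves as the incremental watermark. Source paths and table names are placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("ingest_sketch").enableHiveSupport().getOrCreate()

    def ingest(mode, src_path="/data/landing/claims", target="ods_db.claims_ods"):
        df = spark.read.parquet(src_path)
        if mode == "historical":
            # Full history: overwrite the target table.
            df.write.mode("overwrite").saveAsTable(target)
        else:
            # Incremental: keep only records newer than the current high-water mark.
            watermark = spark.table(target).agg(F.max("upd_ts")).first()[0]
            delta = df if watermark is None else df.filter(F.col("upd_ts") > F.lit(watermark))
            delta.write.mode("append").saveAsTable(target)

    ingest("incremental")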