Data Engineer

Company: Horizontal Talent
Location: Copperas Cove
Posted on: May 3, 2021

Job Description:


As a Data Engineer, you will work cross-functionally with data scientists, data analysts, product managers, and stakeholders to understand business needs and to develop, maintain, and optimize the data sets, data models, and large-scale data pipelines, primarily in the Azure Databricks Spark cloud stack, used for data science models and visualizations. You will partner with the Optum Technology team to drive best practices and set standards for data engineering patterns and optimization. You are a key influencer in data engineering strategy. This is a unique, high-visibility opportunity for someone who wants to have business impact, dive deep into large-scale data pipelines, and work closely with cross-functional teams.

Purpose of Position:

1) Design and develop ETL/ELT solutions on Azure Databricks, Delta Lake, and Spark to support OptumRx Digital MBOs.
2) Develop, implement, and deploy large-scale data pipelines powering machine learning algorithms, insights generation, business intelligence dashboards, reporting, and new data products.
3) Partner with Optum Technology to create and maintain the technical architecture of the Enterprise Delta Lake to consolidate data from many systems into a single source for machine learning and reporting analytics.

Major Responsibilities:

- Design, build, optimize, and manage modern large-scale data pipelines and ETL/ELT processing to support data integration for analytics, machine learning features, and predictive modelling.
- Consume data from a variety of sources (RDBMS, APIs, FTPs, and other cloud storage) and formats (Excel, CSV, XML, JSON, Parquet, unstructured).
- Write advanced, complex SQL with performance tuning and optimization.
- Identify ways to improve data reliability, data integrity, system efficiency, and quality.
- Participate in the architectural evolution of data engineering patterns, frameworks, systems, and platforms, including defining best practices and standards for managing data collections and integration.
- Work with data scientists to deploy machine learning models to real-time analytics systems.
- Design and build data service APIs.
- Mentor other data engineers and provide significant technical direction by teaching them how to leverage cloud data platforms.

Required Qualifications:

- An undergraduate degree in Computer Science, Engineering, Mathematics, Statistics, Economics, or a related discipline.
- 2+ years of experience in data engineering, data integration, data modeling, data architecture, and ETL/ELT processes to provide quality data and analytics solutions.
- 2+ years of experience in SQL, including designing complex data schemas and optimizing query performance.
- 2+ years of experience in Apache Spark (PySpark / Spark SQL).
- 2+ years of experience in Python.
- Experience integrating semi-structured data.
- Experience with at least one of the following cloud platforms: Azure, AWS, or GCP.
- Excellent collaborator, able to work effectively with cross-functional teams such as leadership, product management, and engineering, and willing to inspire other data engineers, data scientists, and analysts.
- Excellent communication skills: ability to communicate technical concepts to both technical and non-technical audiences.

Preferred Qualifications:

- Experience working with large data sets using Big Data frameworks (Hadoop, EMR, Databricks, Spark, Hive, etc.).
- Experience in Big Data processing.
- Experience in Databricks.
- Experience in regular expressions.
- Experience in REST APIs.
- Experience in NoSQL.
- Experience in Kafka.
- Experience in CI/CD technology.
- Experience in Git.
- Extensive knowledge of data architecture principles (e.g., Data Lake, Databricks Delta Lake, Data Warehousing).
- Extensive knowledge of data modelling techniques, including slowly changing dimensions, aggregation, partitioning, and indexing strategies.
- Ability to independently troubleshoot and performance-tune large-scale enterprise systems.
- Strong understanding of Lambda architecture patterns.

