Data Engineer
Ref No.: 18-69902
Start Date: 09/24/2018
Location: Deerfield, IL
Contract

As a Data Engineer, you will provide engineering expertise to create and enhance data solutions that enable seamless delivery of data across our enterprise. You will be on the cutting edge of integrating new technology and tools for data-centric projects. You will provide lead-level technical consulting to peer data engineers during the design and development of highly complex and critical data projects. Some of these projects will include designing and developing data ingestion and processing/transformation frameworks leveraging tools and formats such as AWS Athena, Hive, Java, Scala, Spark APIs and ETL, AWS Glue, Parquet, Avro, and ORC (in addition to traditional tools such as Informatica ETL). Additionally, you may work on real-time processing solutions using tools such as Spark Streaming, Kafka, and AWS Kinesis. You will deploy application code using CI/CD tools and techniques.
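To make the batch side of that work concrete, here is a minimal PySpark sketch of the ingest-transform-write-Parquet pattern described above. The S3 bucket, paths, and the `amount` column are hypothetical placeholders for illustration, not details from the role itself.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-ingest-sketch").getOrCreate()

# Read raw delimited files from S3 (bucket and prefix are illustrative).
raw = (spark.read
       .option("header", "true")
       .csv("s3://example-bucket/raw/claims/"))

# Light transformation: cast types and stamp each row with a load date.
cleaned = (raw
           .withColumn("amount", F.col("amount").cast("double"))
           .withColumn("load_ts", F.current_timestamp())
           .withColumn("load_date", F.to_date("load_ts")))

# Persist as columnar Parquet, partitioned so engines such as AWS Athena
# or Hive can prune partitions at query time.
(cleaned.write
 .mode("overwrite")
 .partitionBy("load_date")
 .parquet("s3://example-bucket/curated/claims/"))
```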
- Develop data driven solutions utilizing current and next generation technologies to meet evolving business needs.
- Quickly identify opportunities and recommend possible technical solutions.
- Utilize multiple development languages/tools such as Python, Spark, HBase, Hive, and Java to build prototypes and evaluate results for effectiveness and feasibility.
- Operationalize open source data-analytic tools for enterprise use.
- Utilize the tools available across AWS services.
- Develop real-time data ingestion and stream-analytic solutions leveraging technologies such as Kafka, Apache Spark, NiFi, Python, HBase, and Hadoop (see the streaming sketch after this list).
- Provide subject matter expertise in the analysis, preparation of specifications and plans for the development of data processes.
- Ensure proper data governance policies are followed by implementing or validating Data Lineage, Quality checks, classification, etc.
- Independently use your own judgment and experience to identify data and data-integration requirements and influence the detailed solution design.
- Influence design for solutions involving structured data, big data, and difficult-to-structure data sets.
- Influence design to enable efficient operations including recommending automated QCs, metrics for data quality and data integration, and parameterized approaches to allow for future flexibility.
- Provide leadership for complex data analysis; use and explore data, languages, tools, and software to construct data sets for predictive modeling, test models, and train and deploy them within a complex commercial pharmaceutical environment.
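As referenced in the real-time ingestion bullet above, the following is a hedged sketch of that pattern using Spark Structured Streaming with a Kafka source. The broker address, topic name, and S3 paths are assumptions, and the Kafka source requires the external spark-sql-kafka connector on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stream-ingest-sketch").getOrCreate()

# Subscribe to a Kafka topic as an unbounded streaming DataFrame
# (broker and topic below are illustrative).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load())

# Kafka delivers key/value as binary; cast the value to a string payload.
payload = events.select(F.col("value").cast("string").alias("body"),
                        F.col("timestamp"))

# Append each micro-batch to a Parquet sink, with checkpointing so the
# query can recover where it left off after a restart.
query = (payload.writeStream
         .format("parquet")
         .option("path", "s3://example-bucket/streams/events/")
         .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
         .start())

query.awaitTermination()
```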
Required:
- Bachelor's degree in Computer Science or equivalent
- 7+ years' experience and/or relevant project/coursework
- Up-to-date specialized knowledge of data wrangling, manipulation, and management technologies.
- Ability to manipulate voluminous data with varying degrees of structure across disparate sources to build and communicate actionable insights for internal or external parties.
- Strong communication skills to present information clearly.
- Ability to work in an agile environment with high quality deliverables.
- Hands-on experience with Informatica ETL tools (PowerCenter, ICS, IICS)
- Must have hands-on experience with the AWS ecosystem (EMR, Redshift, S3, etc.).
- Working knowledge of SQL and Relational Databases
- Experience with Hadoop and Spark concepts (a short Spark SQL sketch follows)
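For the SQL and Spark items above, here is a short illustrative sketch of running familiar relational SQL through Spark; the table and column names are invented for the example.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-sketch").getOrCreate()

# Build a tiny in-memory DataFrame standing in for a relational table.
orders = spark.createDataFrame(
    [(1, "north", 120.0), (2, "south", 80.0), (3, "north", 45.5)],
    ["order_id", "region", "amount"])

# Expose the DataFrame to Spark's SQL engine as a temporary view.
orders.createOrReplaceTempView("orders")

# Standard relational SQL runs unchanged against the view.
spark.sql("""
    SELECT region, SUM(amount) AS total_amount
    FROM orders
    GROUP BY region
    ORDER BY total_amount DESC
""").show()
```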
Desired:
- Knowledge of pharmaceutical Commercial data, including Patient Data, and associated KPIs/Metrics.
- Experience with Machine Learning / Predictive Analytics
- Knowledge of at least one of the following languages: Python, Scala, R, SAS
- Knowledge of Sqoop, Oozie, and AWS Glue
- Experience with data formats including Parquet, ORC, or Avro (see the sketch after this list)
- Experience with SAP Business Objects / BI suite
- Experience with data virtualization tools such as Denodo or Composite.
- Experience with data governance and data catalog concepts and tools.
- Experience with Master Data Management.
- Knowledge of DataOps and DevOps, and their interdependencies in a cloud environment.
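As noted in the data formats bullet above, the following is a brief sketch of writing the three named formats from Spark. The output paths are placeholders, and the Avro writer assumes the external spark-avro package is available.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("formats-sketch").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])

# Parquet and ORC writers are built into Spark.
df.write.mode("overwrite").parquet("/tmp/formats_demo/parquet")
df.write.mode("overwrite").orc("/tmp/formats_demo/orc")

# Avro requires the external spark-avro package on the classpath
# (e.g., submitted with --packages org.apache.spark:spark-avro_2.12:<version>).
df.write.mode("overwrite").format("avro").save("/tmp/formats_demo/avro")
```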