AWS Big Data Engineer
Ref No.: 18-67310
Location: Malvern, Pennsylvania
Position Type: Full Time/Contract
Start Date: 09/13/2018
Experience and Qualifications:
2+ years of hands-on experience designing and deploying an AWS-based application (native/re-factored)
5+ years with Python, Scala, Spark, Oozie, Big Data
Expertise in the core AWS services, uses, automation, and architecture best practices
Proficiency in designing, developing, and deploying cloud-based Big Data solutions using AWS
Experience developing and maintaining applications written for Amazon Simple Storage Service, Amazon Simple Queue Service, Amazon Simple Notification Service, Amazon Simple Workflow Service, Amazon API Gateway, AWS Elastic Beanstalk, and AWS CloudFormation
Proficiency in Amazon Compute and Storage Instances
Experience with S3 server-side encryption, IAM and policies, CloudTrail, and CloudWatch
Experience with EMR and serverless (Lambda) architecture
Experience setting up Kinesis streams and integrating them with CDC (Attunity preferred)
5+ years working with Big Data (Hadoop, Cloudera, HBase)
Proficiency in highly available, fault-tolerant, and DR architecture
Good working knowledge of and experience with data stores such as DynamoDB and S3
Experience working with Google DoubleClick is highly preferred
Experience with DevOps CI/CD using Jenkins, Bamboo, or AWS CodeDeploy
AWS Certified Developer or Solutions Architect certification a plus but not required
Experience with the Atlassian stack highly preferred

Job responsibilities:
Design, develop, and deliver scalable and automated data pipelines to ingest Google DoubleClick data
Ingest and load data using the Oozie workflow manager and cloud-native ingestion services
Code and enable the data store on S3
Leverage IAM roles & policies for service authentication
Build load, transformation, and validation logic in EMR (Spark/Scala)
Build the necessary infrastructure to provision a query cluster using the existing architecture
Migrate on-premises Hadoop data and queries to AWS
Promote serverless code where appropriate
Perform data quality evaluations based on the source data