Reference # : 18-00289
Title : Data Engineer
Location : Owings Mills, MD
Experience Level :
Start Date / End Date : 01/15/2018 / 03/31/2018
Contract position
Interview type: phone and in-person
Will consider H1B candidates

Job Duties:
• Design, develop, implement, and maintain code, information architecture, and conceptual models to support data processing and data flows through the data lake:
o Landing Zone – ingest raw data or capture streaming data
o Discovery Zone – evaluate data quality, transform raw data, and cleanse data
o Enterprise Zone – transform into data model for external consumption by reporting & self-service BI
• Develop data and metadata policies and procedures
• Recommend, design, implement, and maintain the various file formats (e.g. XML/XSD, SequenceFiles, Avro files, or Parquet files) for information interchange between applications, external systems, and/or the data lake.
• Review and evaluate database performance, risk and financial analysis feasibility studies
• Investigate and repair application defects regardless of component, including platform, business logic, data process logic, or database (SQL and data modeling).
• All other duties as assigned or directed

• Bachelor of Science (or higher) in computer science or a related field
• 5+ years of systems/application analysis & design experience
• 3+ years of data modeling & database administration experience
• 3+ years of experience in designing, building, and using a big data distribution, preferably Cloudera (or Hortonworks or MapR), for:
o data ingestion, cleansing, and transformation (e.g. Talend, Sqoop)
o data discovery & analysis using querying tools (e.g. Impala, Hive)
o data storage using distributed databases (e.g. HBase, Kudu)
o data streaming (e.g. Kafka, Apache Spark)
o data visualization (e.g. Tableau, Birst, Qlik)
o process monitoring (e.g. Cloudera Manager, Hue)
Technical Skills
• Demonstrable knowledge of Java, Java MapReduce, Apache Spark, distributed file systems (HDFS), and concurrent programming
o Spring Framework (XD, Boot, Cloud, Security) experience desirable
• Experience with cluster management technologies such as YARN, Mesos, Kubernetes
• Excellent knowledge of relational databases (PostgreSQL), SQL and ORM technologies
• Preferred experience with ATDD (Acceptance Test-Driven Development) and associated technologies (FitNesse, DBSlim, JUnit)
• Preferred experience delivering code using Continuous Integration and Continuous Delivery (CI/CD) best practices and DevOps tooling (Jenkins Pipelines, Docker, Groovy, Ansible)