Big Data Architect
Ref No.: 18-65981
Location: Madison, Wisconsin
Position Type: Contract
Start Date: 09/10/2018

Big Data Architect with 10-15 years of experience. Some of the prerequisites for the role:

Overall understanding of Big Data technologies, analytics, data warehousing concepts, business intelligence, cloud platforms, and support. Demonstrable knowledge of Hadoop, Spark, MapReduce (MR), HDFS, HBase, Hive, ZooKeeper, Sqoop, Flume, Ambari, and Oozie. Exposure to Hadoop on-premises as well as in the cloud: AWS S3, EC2, EMR, Lambda, multi-node clusters in the cloud, and Hortonworks (Ambari, HDP). Understanding of CI/CD pipelines using Jenkins; involvement in cloud infrastructure solution design, research, defining access patterns, etc.
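The MapReduce model named among the requirements above can be illustrated with a minimal pure-Python sketch of the map, shuffle, and reduce phases (no Hadoop required; the function names here are illustrative, not part of any Hadoop API):

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data architect", "data pipelines for big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
# counts == {'big': 2, 'data': 3, 'architect': 1, 'pipelines': 1, 'for': 1}
```

In a real Hadoop job the map and reduce phases run on different nodes and the shuffle moves data across the network; the sketch only shows the data flow.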

  • Interact with business analysts and application managers to gather requirements, guide implementations, and perform production deployments for real-time, interactive, and batch/ETL applications.
  • Enhance performance and execution time of deliverables with Spark, HBase, Hive LLAP, Tez, and MapReduce programs. Build data pipelines using Flume, Sqoop, Storm, and Kafka.
  • Create real-time data flow management using NiFi and MiNiFi; install and configure HDF for high-volume event processing, enabling immediate analysis and action.
  • Automate data ingestion jobs into HDFS and Hive using Sqoop from relational databases such as Oracle, SQL Server, Teradata, DB2, and Informix across business channels.
  • Implement joins, dynamic partitioning, bucketing, file formats, and compression techniques in HDFS for efficient data access. Troubleshoot and tune Hadoop clusters; process data with Hive and Spark.
  • Automate workflow scheduling with shell scripting. Seamlessly handle migrations and cluster downtimes. Load-balance based on resource availability (memory, CPU) and data availability across Hadoop clusters.
  • Establish a streaming platform that processes streams of records as they occur, using Kafka and Spark, a customizable real-time (in-memory) cluster-computing framework.
  • Create an end-to-end data lake solution and move clusters from on-premises to the cloud, running them in active-active mode to ensure high availability.
  • Analyze and profile source data to create detailed data designs, data flow diagrams, and data lineage.
  • Translate technical aspects of data design (facts, dimensions, "snowflaking", etc.) into language that non-technical users can understand. Work with both technologists and non-technical users with ease and approachability.
  • Design tools required to measure data accuracy, latency, etc.; work with the tools team to ensure implementation and delivery. Evangelize technical architecture across functional teams.
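The bucketing technique mentioned in the responsibilities above distributes rows into a fixed number of files by hashing a chosen column. A minimal pure-Python sketch of that assignment logic (Hive uses its own internal hash function; the hash and names below are illustrative only):

```python
NUM_BUCKETS = 4  # in Hive: CLUSTERED BY (col) INTO 4 BUCKETS

def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    # A row lands in bucket hash(bucketing_column) % num_buckets.
    # Simple deterministic string hash, for illustration only.
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) & 0x7FFFFFFF
    return h % num_buckets

rows = ["oracle", "sqlserver", "teradata", "db2", "informix"]
buckets = {}
for row in rows:
    buckets.setdefault(bucket_for(row), []).append(row)
```

Because the assignment is deterministic, two bucketed tables clustered on the same key into the same bucket count can be joined bucket-by-bucket, which is what makes bucket-map joins efficient.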
Be involved in conceptual and logical data modeling as part of data architecture; evaluate the core and supporting technologies, the application technology stack, and development tools.
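The streaming responsibility above (Kafka feeding Spark) reduces, at its core, to windowed aggregation over an unbounded stream of records. A minimal pure-Python sketch of tumbling-window counting, with hypothetical event shapes, gives a feel for what such a pipeline computes:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    # Group (timestamp, key) events into fixed, non-overlapping windows
    # and count occurrences per key, much as a Spark Streaming
    # micro-batch aggregation would.
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        window_start = (ts // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {w: dict(counts) for w, counts in sorted(windows.items())}

# Hypothetical event stream: (epoch-second timestamp, event type).
events = [(0, "click"), (3, "click"), (5, "view"), (12, "click")]
result = tumbling_window_counts(events, window_seconds=10)
# result == {0: {'click': 2, 'view': 1}, 10: {'click': 1}}
```

A production pipeline would consume the events from a Kafka topic and handle late or out-of-order data with watermarks; the sketch shows only the windowing arithmetic.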