Lead Data Engineer
Ref No.: 18-01391
Location: New York, NY
Position Type: Full Time
Pay Rate: $175,000.00/Year
Data Engineer

We're using cloud and big data technologies to get the job done faster and more efficiently. You'll join a team of talented engineers who enjoy solving tough problems and improving the healthcare industry. It's a fantastic opportunity to learn from both engineering and product teams in a growing company with a get-it-done attitude!

We believe in supporting both personal and professional growth. That means we pay attention to goals, celebrate milestones, and strongly encourage learning opportunities.

1) Assume ownership of the data ingestion and ETL pipeline. This means ensuring that data is munged and ingested correctly, capturing exceptions, implementing a system to notify stakeholders when those exceptions occur, and instrumenting all elements of the pipeline stack. You will also iterate on our current architecture to make it more efficient and fault-tolerant; the goal is a pipeline that can ingest new data from any source with minimal reconfiguration.

2) Maintain concurrency between our services and our data warehouse, and work with our data scientist to provide engineering support for our machine-learning engine, which matches providers to their credentials.

3) Build tools to facilitate onboarding our customers' data and to enable our operations team to view and change that data. You'll be the point of contact for the operations team when technical requests specifically impact our data pipeline.

As our platform grows, so too does the need for a high-performance ETL pipeline and a scalable framework for storing that data and making it accessible for our customers and operations team.

To have an immediate impact, you'll need to be skilled in some (not necessarily all) of the technologies in our stack: Django, Python, Ruby on Rails, Postgres, MongoDB, AWS, Chef, RabbitMQ, and Redis. Bonus points if you have experience with the Hadoop ecosystem, Python libraries used for machine learning (scikit-learn, pandas, NumPy), or distributed computing.

  • Demonstrated experience (2-5 years) building and maintaining data ingestion pipelines with large data sets
  • Working knowledge of, and opinions on, the latest tools for data collection, analysis, warehousing, and transformation
  • Understanding of, and intuition for, data modeling and enterprise data management
  • You love data as much as we do!
  • Good, open communicator. Your code and scripts should not be the only things that speak for your work. Documentation, pull request comments, sprint planning sessions, conversations with customers' technical staff, and explaining complex data model issues to other functional teams within the company all require excellent written and spoken communication skills