Technical Lead - Senior Dev - Python (PySpark)
Ref No.: 17-00425
Location: McLean, Virginia
Position Type: Contract
CW Requisition – Big Data Engineer (PySpark)


Job Description
• Responsible for delivery in the areas of big data engineering with Hadoop, Python, and Spark (PySpark), with a high-level understanding of machine learning
• Develop scalable and reliable data solutions to move data across systems from multiple sources in real-time (Kafka) as well as batch (Sqoop) modes
• Construct data staging layers and fast real-time systems to feed BI applications and machine learning algorithms
• Utilize expertise in technologies and tools, such as Python, Hadoop, Spark, Azure/AWS, as well as other cutting-edge tools and applications for Big Data
• Demonstrated ability to quickly learn new tools and paradigms to deploy cutting-edge solutions
• Develop both deployment architecture and scripts for automated system deployment in Azure/AWS
• Create large-scale deployments using newly researched methodologies
• Work in Agile environment
Basic Qualifications
• Bachelor's degree in Mathematics, Statistics, or Computer Science
• Solid experience with Hadoop including Hive, HDFS, Kafka and PySpark
• At least 3 years' experience in Python (NumPy, Pandas, PySpark) and other open-source programming languages for large-scale data analysis
• At least 5 years' experience with relational databases
Preferred Qualifications
• Master's Degree in Computer Science
• 3+ years of experience working with AWS/Azure
• 5+ years' experience in Java
• 2+ years of experience working with financial data
• Familiarity with modern statistical learning methods & machine learning (SciPy, scikit-learn)
• Familiarity with one or more streaming technologies, e.g., Kafka, NiFi
• Experience with NoSQL databases
• 3+ years of experience in Python (including NLP) for large-scale data analysis
• 5+ years of experience with SQL
• Strong communication skills, with the ability to work both independently and in project teams

Other Skills:
Python, PySpark, Hadoop, Hive, HDFS, Sqoop, Oozie