Data Engineer
Ref No.: 18-12701
Location: South San Francisco, California
Position Type: Contract
The Genentech Research and Early Development (gRED) Early Clinical Development Operations (ECD Ops) department is seeking an experienced, motivated Data Engineer with a background in data architecture to help further the development of ECD Operations data services. This individual will work in the ECD Ops Information Management Office (IMO) and will be accountable for providing engineering expertise in the delivery and optimization of gCORE, the organization’s data lake and data warehouse.
The role will require cross-functional interactions with Data Management Leads, Clinical Study Teams, Predictive Analytics, Artificial Intelligence and Information Technology teams to drive data acquisition and data operations projects as well as data platform technology needs. A great candidate is eager to solve complex problems with data, is skilled in managing databases and developing data pipelines, and has a passion for learning new skills to meet organization-wide data needs.
Responsibilities
· Architect solutions that will transform data into an analyzable format for data scientists, data operations processes and analytical tools / dashboards
· Work with external suppliers including clinical sites and CROs to define and design data integrations
· Develop and optimize big data pipelines for data scientists
· Develop ETL workflows using data warehouse ETL tools for production processes, such as data quality monitoring and cleansing in coordination with IT
· Perform hands-on infrastructure design of ECD’s data lake and data warehouse environment (gCORE) including continuous exploration and recommendation of new technologies and best practices
· Communicate synthesized data quality findings to business and technical team members, senior leaders and external stakeholders
· Research and recommend new innovative methods and systems to manage data for business improvement
· Contribute to internal governance teams to drive the data quality business cycle and roadmap

Qualifications
Bachelor’s or Master’s degree in computer science or software engineering
5+ years of programming experience in one or more languages such as Java, Python, C++, or Scala
Experience with relational SQL and NoSQL databases, including Postgres and Cassandra
Experience building and optimizing big data pipelines using Spark or other similar technologies
Experience with AWS cloud services: EC2, EMR, RDS, Redshift
Solid understanding of how to design robust data workflows including optimization and user experience
Strong analytical and problem-solving skills
Excellent oral and written communication skills
Able to work in teams and collaborate with others to clarify requirements
Strong coordination and project management skills to handle complex projects
Experience developing and working with XML, JSON, and external web services
Preferred Qualifications
Clinical drug development domain knowledge
Experience with clinical data and systems such as Medidata RAVE, Siebel CTMS, and IxRS
Experience with scientific data such as genomics and imaging data
Experience with data quality software such as Informatica, Paxata, Alteryx, Data Monarch or a similar class of tools
Competencies in applied statistics to solve business needs
Knowledge of industry data standards used in drug development, particularly in clinical development

Education:
Bachelor’s or Master’s degree in computer science or software engineering