Site Reliability Engineer
Ref No.: 18-16170
Location: Waltham, Massachusetts
7 Months - Temp to Perm Possible


This is an exciting opportunity to join a rapidly growing healthcare information technology business as a Site Reliability Engineer. You will play an important role in supporting and transforming the core technology of a leading healthcare SaaS product company, bringing thought leadership and new techniques to a challenging and exciting role. We are looking for a dynamic individual who has a demonstrated passion for the art of Site Reliability Engineering, DevOps, Release Engineering, and Production Environment support and standards. This is a tremendous opportunity to help transform the public-facing user experience of a leading healthcare technology company.

The Site Reliability Engineer will play a key role in helping the client support existing technology project infrastructure and develop the strategy for our next generation efforts. This person will support and augment the current team, bridge the current and next generation technologies, and help lead the strategy and implementation of new technologies. These improved tools, technologies, and processes will enable us to deliver Product and Infrastructure Roadmaps in a more nimble and predictable way and at higher quality. In addition, this role will strengthen our ability to provide round-the-clock operational system support for current and new systems.


• Assume the responsibilities and perform the duties of a Site Reliability Engineer (SRE) to support and deliver SaaS / IaaS solutions.
• Design and implement future state SaaS / IaaS architecture.
• Enable and implement an OpenShift / Kubernetes platform and associated services.
• Enable platform services that support continuous delivery and continuous integration.
• Analyze a variety of approaches to Site Reliability Engineering and provide the pros and cons of each to enable the team to arrive at an agreed-upon direction.
• Develop and administer tools to enable rapid Micro-Service based software deployments.
• Create operations handbook as required for others to assist in the administration.
• Collaborate to incorporate automated unit, integration, functional, and performance testing into the Continuous Integration process across multiple projects.
• Collaborate with the Development, Project Management, and Product Management teams to align projects and other efforts.
• Evolve and automate processes to increase flexibility within the development and testing of multiple simultaneous projects.
• Provide Development Project level Support.
• Build, maintain and deploy the application level software in our development and test regions.
• Prioritize and troubleshoot development and test region issues.
• Develop runbooks that detail building, deploying, and troubleshooting processes.
• Promote and contribute to best practices.
• Plan and execute tasks within an agile environment.
• Provide Production Support and monitor Production Regions and Environments.
• Provide first level support for application software issues in all environments.
• Prioritize and rapidly troubleshoot issues to ensure maximum uptime and optimal performance for customers in our production environment.


Provide assistance as needed to other teams in the areas where existing technical skills and experience are applicable.


Education: Bachelor of Science degree in Computer Science or equivalent job experience.

Desired Qualifications:

• 7+ years of experience in SRE, DevOps, Release Engineering, System Operations, or Software Development
• 3+ years of experience in operating/developing large scale distributed services/applications
• Excellent organizational, verbal, and written communication skills
• Demonstrate strong collaboration skills, within function and across peer stakeholders.
• Extensive experience with Linux and UNIX System Administration. Experience in using Windows.
• Proficiency with Linux Containers, Docker, Container Solutions, associated Management Tools and challenges.
• Hands-on experience with scripting languages, including Bash, Python, Groovy, etc.
• Proficiency working within a Java Software Development Team.
• Experience with and deep commitment to the transformation to a DevOps culture focusing on continuous integration – the full lifecycle of building, automated and performance testing, and automated deployment.
• Experience with VMware provisioning of Virtual Machines, Virtual Networking and Storage Resources.
• Experience with Ansible (preferred), Chef, Puppet or other Configuration Management tools.
• Experience with Jenkins (preferred), TeamCity or other Continuous Integration tools.
• Deep knowledge of build tools like Gradle (preferred), Maven, and Ant.
• Hands on experience with SQL, and DB Release Management.
• Usage of Jira (preferred), Rally, or other tracking tools.
• Usage of Confluence (preferred), or other documentation tools.
• Demonstrate strong problem analysis, problem resolution, and decision making and judgment skills.
• Demonstrate understanding of complex software architecture, and ability to enhance, support, and troubleshoot same.
• Demonstrate excellent and effective interpersonal and communication skills (written, verbal and listening), with ability to build positive working relationships with all levels of the organization.
• Ability to leverage technical know-how to find viable compromises amidst competing business needs.
• Demonstrate ability to plan and excel in a fast-paced and demanding environment.
• Solid understanding of agile methodology and Release Engineering, with the ability to leverage what has worked and adapt it to fit new situations.
• Knowledge of cloud compute technologies, network monitoring, data processing and analytics.

Additional Qualifications:

• Knowledge of Site Reliability Engineering.
• Network monitoring, networking protocols, SNMP, syslog, network telemetry, REST APIs.
• Exposure to Grafana, Prometheus, Alertmanager, Kafka, Elasticsearch, and other platforms.
• Prior experience with Datacenter Monitoring, Service Oriented Systems, and Micro-Services.
• Contributions to open source projects are a big plus.
• Knowledge of and practice with Scrum and Agile methodologies.
• Master's degree in Computer Science or related field.

Other Knowledge, Skills, Abilities or Certifications:

• Agile scrum master experience.
• Able to plan and execute projects as part of a collaborative team.



Able to lift server equipment if needed.