Site Reliability Engineer - 18-00234
Ref No.: 18-00234
Location: Cambridge, Massachusetts
Position Type: Contract
Start Date: 01/25/2018
Job Description:
  • As a site reliability engineer, you will ensure the proper accessibility and functionality of the production environment. 
  • You will be a key member of the Technical Operations team, tackling complex assignments that require independent action and a high degree of initiative to resolve problems and develop recommendations.
  • You will understand the challenges around rapidly creating, scaling, and managing distributed applications and services, and will be able to collaborate with talented engineers across multiple disciplines to address those challenges. 
  • You will design operational processes and solutions to proactively address issues before they become customer-facing.
  • This is a hands-on position with a strong emphasis on system automation and production support.
  • You should be comfortable learning new technologies. You thrive in a truly agile, fast-paced, production-facing environment.
  • You have a low tolerance for mediocrity. 
  • Provide advanced support for incident resolution for technical problems involving the full application stack
  • Track down defects and come up with innovative solutions to improve reliability and availability
  • Identify persistent or recurring problems and recommend creative solutions
  • Design, write and deliver software to improve the availability and reliability of our cloud services
  • Propose, design, test, and implement strategic operational system solutions
  • Automate Operations functions (Cloud deployments, upgrades, capacity additions, operational processes)
Required Skills
  • BS degree in Computer Science or related technical field, or equivalent practical experience. 
  • 7+ years in a Windows/UNIX-based large-scale web operations role
  • 5+ years' experience with at least one high-level programming language such as Python, PowerShell, or Java
  • 4+ years advanced-level experience with Windows/Linux
  • Knowledge of various SQL and NoSQL databases such as MS SQL, DynamoDB, RDS, and Aurora
  • Well versed in cloud orchestration using Terraform, CloudFormation, or similar technologies
  • Experience with AWS cloud services such as S3, EC2, auto scaling groups, SNS, and IAM
  • Ability to quickly learn and develop expertise in highly complex existing applications and architectures
  • Solid understanding of the challenges with creating, scaling, and managing distributed applications and services
  • Familiarity with application profiling, system scalability, monitoring and performance
  • Knowledge of automation tools, such as Puppet, Chef, Ansible, or Salt, in a production environment
  • Troubleshooting skills that span systems, network, storage, and code
  • Ability to obtain complete knowledge of our complex applications
  • Ability to transfer knowledge, train, and coach others in your area of expertise
We are looking for people who thrive in an all-hands-on-deck startup mentality, just like us! You should be energetic, confident, and ready to contribute in a fast-paced, production-facing environment.
Required Soft-Skills
  • Strong analytical, problem-solving, and troubleshooting skills are a must
  • Exceptional ability and motivation to solve problems and learn fast
  • Must be able to perform at a high level within a technical team
  • Ability to work independently with minimal supervision
  • Excellent communication and relationship skills
  • Distributed team collaboration
Desired Skills
  • AWS DevOps or SysOps certification
  • Experience with custom applications, deployments, and configuration
  • Experience with automation of cloud services is a strong plus