Site Reliability Engineer
Ref No.: 18-50210
Location: Westport, Connecticut
Position Type: Direct Placement
Start Date: 07/17/2018
Site Reliability Engineer - Cloud SRE

Job Description:
Seeking a sys admin/SRE who will serve as second-level support. Candidates should have in-depth, hands-on experience troubleshooting configuration management systems like Ansible, AWS using Linux OS and native AWS tools, and Git source code management. Additionally, they should be familiar with scripting in Bash and Python (not just running manual commands off a KB) and able to run, maintain, and write scripts related to cloud development and administration. Experience with Linux systems administration, other cloud providers, and virtualization technologies is a plus.
  • The position is for an operational (run) group. The sys admin/SRE will respond to incidents as well as handle support requests for a variety of technologies related to our cloud services.
  • The sys admin/SRE will be responsible for resolving tickets, identifying trends, finding the technical cause of problems, recommending process/system changes to eliminate incoming workload, optimizing team processes, documenting procedures, and escalating where appropriate.
  • The sys admin/SRE will also have project responsibilities (i.e., executing some of the improvements identified by them or others on the team) and will own small projects within their respective area.
  • The position is for a primary day shift (8-10 hours per day). While there may be on-call responsibilities during evenings and weekends, these should be infrequent and brief. More extended off-shift time may be needed, but this would be planned in advance and would generally be associated with project work where maintenance windows need to be observed.
  • This is not a low-level position. The person will need an understanding of their respective area, but this is also a hands-on operations position. Engineers who only want to do projects and/or design and architect solutions will not be happy with this role.
  • Break/Fix, provisioning, decom, and patching of cloud servers and their associated management systems
  • Documentation
  • Analyzing, troubleshooting and resolving system OS, software, and performance issues
  • At least 2 years of experience in relevant technologies
  • Hands-on knowledge of troubleshooting in a Linux operating system environment
  • Must be able to work well in an ambiguous and changing environment
  • Core Skills/Technologies:
    • AWS
    • Moderate Linux and Windows
    • Ansible, Chef, or similar configuration management systems
    • Source code management (SVN, Git)
    • Bash, Python, or similar scripting languages
  • Additional Skills/Technologies:
    • Other cloud services
    • Server patching with yum and Red Hat Satellite
    • Centrify, SSSD