Ref No.: 17-14502
Location: Tampa, Florida
Senior AWS Reliability Engineer
Onsite in Tampa, FL
6+ Month Contract

Position Summary
IT Architecture is seeking an AWS Reliability Engineer with extensive experience in AWS Cloud. The Cloud Reliability Engineer role will focus on the design and implementation of application resiliency in AWS and/or other cloud providers' environments. The role requires familiarity with current and emerging cloud technologies, tools, and system architectures. The position includes developing resiliency validation and verification automation, monitoring, and alerting for AWS technologies such as ECS, EC2, S3, Aurora, DynamoDB, and Lambda. The role requires an individual who can work closely with application developers, architects, and infrastructure engineers to deliver resilient solutions in a team-oriented environment. This role will have the opportunity to influence application architecture and infrastructure direction as we establish a significant presence within the Cloud landscape.

Principal Responsibilities
• Understand business goals and drivers and translate those into an appropriate technical solution
• Focus on continuous improvement practices as required to meet system resiliency imperatives
• Build automation tools to conduct chaos experiments and uncover system vulnerability
• Provide input and expert guidance early in major architectural designs to ensure high availability, scalability, and fault tolerance across the overall system architecture
• Design, test and support solution resiliency in the Cloud and on premises
• Measure everything, providing critical operational insight into the performance, scalability, availability, and reliability of our applications.
• Work with application development and infrastructure engineering to provide root cause analysis of production resiliency issues.
• Support the adoption of DevOps methodology and Agile project management
• Provide mentoring and knowledge transfer, and assist in training other team members.

• Minimum of 10 years' experience in the design & implementation of distributed applications
• Minimum of 5 years' experience in networking, infrastructure, and database architecture
• Minimum of 5 years' experience in hands-on production administration of large system environments
• Ability to do low-level debugging and problem analysis by examining logs and running Linux commands
• Experience establishing and improving procedures within a mission-critical environment

Knowledge/Skills
• Ability to work independently, with minimal supervision.
• Strong knowledge of AWS cloud environment
• Knowledge of cloud monitoring tools (CloudWatch, CloudTrail, Splunk, and other application monitoring tools)
• Knowledge of AWS Identity and Access Management (IAM)
• Must be comfortable working in an open, highly collaborative team
• Full stack software development experience, with strong troubleshooting skills
• In-depth, hands-on expertise in Bash, MySQL, and Linux
• Ability to write scripts (Bash, Python, etc.) for automation of solution resiliency validation and verification.
• Excellent oral and written communication skills, along with the ability to communicate at all levels.
• Experience with provisioning and configuration tools such as AWS CloudFormation or Terraform a plus
• Chaos engineering experience a huge plus.

• Bachelor's Degree in a technical discipline or equivalent work experience