Site Reliability Engineer II
Ref No.: 18-03982
Location: Plano, Texas
Position Type: Right to Hire
6+ month contract to hire

The Position: GTI-Application Software Engineering (ASE) is focused on developing and delivering services that integrate software solutions with infrastructure in innovative, cost-effective, and efficient ways. The ASE team provides services for use by business-aligned application delivery organizations, across all layers of the software stack, which may include application interface services, productivity and collaboration tools, and data integration solutions.

We are looking for a Site Reliability Engineer (SRE) who runs, maintains, and improves the service/product against established Service Level Objectives by applying software engineering practices. The SRE is responsible for the availability, performance, change management, monitoring, and capacity management of their services.

What You'll Do: 
  • Designs, develops, tests, and delivers software to automate manual operational work
  • Troubleshoots priority incidents, conducts blameless post-mortems and ensures permanent closure of the incidents
  • Engages with the development team throughout the life cycle to help build software for reliability
  • Applies analytics to historical data, such as incidents and usage patterns, to predict issues and take proactive action
  • Drives adoption of self-healing and resiliency patterns such as circuit breaker and bulkhead
  • Designs and conducts performance tests to identify bottlenecks, optimization opportunities, and capacity demand
  • Defines and drives adoption of best-in-class monitoring frameworks to achieve end-to-end flow monitoring and noiseless alerting
  • Deploys the software and product upgrades
  • Adds value to team delivery, works with the team to complete tasks to a high standard, and actively learns new skills
  • Facilitates maximum speed of delivery by objectively adhering to the service's error budgets
  • Manages the effort split between manual operational work and engineering work
  • Participates in 24x7 support coverage as needed
  • Coaches other team members and manages teams as needed

Skills/Experience You'll Need: 
  • Bachelor's degree (or equivalent experience) in Computer Science/Engineering
  • 6+ years of experience developing enterprise software, with proficiency in multiple technologies, preferably Java, Python, and shell scripting
  • 3+ years of experience in performance engineering and monitoring using tools such as AppDynamics, Splunk, Apica, JMeter, and BlazeMeter
  • 2+ years of incident resolution experience in a large-scale operations environment
  • Experience with configuration management tools such as Ansible, Puppet, Chef, or PowerShell
  • Proven ability to understand and troubleshoot complex problems under pressure
  • Experience working in an Agile development environment
  • Experience/knowledge administering application servers, web servers, and databases (Tomcat, WebSphere, Nginx, Microsoft IIS, Oracle, MySQL, etc.)
  • Experience with private and public cloud environments is a plus