Site Reliability Engineer
Ref No.: 23-49882
Location: Bentonville, Arkansas
Position Type: Contract
Start Date: 05/03/2023
Job Description

Title: Senior Site Reliability Engineer (Reliability Engineering and Retail Payments)
Location: Fully Remote
Duration: 6-month contract (Potential extension to 18 months or conversion to FTE with strong performance)

Job Summary:
  • We are looking for a Site Reliability Engineer (SRE) who will be responsible for ensuring the reliability, availability, and performance of our production systems.
  • As an SRE, you will work closely with cross-functional development and engineering teams to design and implement tools and processes that automate deployment, observability, and troubleshooting of the applications and infrastructure supporting the rollout of new Android tablets to our stores.
  • You must have professional experience with the core functions of Site Reliability Engineering, including deployments, observability, monitoring, telemetry, and automation.

Responsibilities:
  • Ensure the reliability, availability, and performance of our production systems as we scale
  • Develop and maintain monitoring and alerting systems to detect and respond to incidents in a timely manner
  • Occasionally support planned deployment rollouts that may require working off-hours during store closures (there is no on-call rotation)
  • Work with cross-functional teams to plan and execute scaling initiatives
  • Develop and maintain documentation of processes, procedures, and technical configurations

Requirements:
  • Strong written and verbal communication skills with peers, technical leads, project managers and product owners
  • Must be able to collaborate with customers and cross-functional teams to design, test, and validate deliverables that meet or exceed expectations
  • Highly motivated, well-organized self-starter
  • Bachelor's degree in Computer Science or related field
  • 5+ years of experience as a Site Reliability Engineer
  • Strong experience with automation tools and with automation scripting in Python
  • Experience with containerization technologies such as Docker and Kubernetes
  • Experience with cloud platforms such as Azure or AWS
  • Experience with monitoring and logging tools such as Datadog, Prometheus, Grafana or Splunk
  • Strong understanding of networking, security, and systems administration
  • Excellent problem-solving skills and attention to detail
  • Must be available to work core hours in PST

Preferred Qualifications:
  • Experience with distributed systems and supporting a large retail business
  • Experience with infrastructure as code tools such as Terraform or CloudFormation
  • Experience with CI/CD tools such as Jenkins
  • Experience with incident ticketing systems such as ServiceNow, and with Jira for tracking stories
  • Familiarity with Agile/Scrum methodologies and DevOps principles

If you are passionate about ensuring the reliability and availability of systems in our stores and enjoy collaborating with cross-functional teams to solve complex problems, we encourage you to apply for this exciting opportunity as an SRE.