Agency Senior SRE/DevOps Engineer
Ref No.: 18-15370
Location: Sunnyvale, California
Title: Agency Senior SRE/DevOps Engineer
Duration: 3+ Months
 
About Client:
Our Client is an industry-leading vendor of network gear and software. We are a fast-growing group focused exclusively on developing a multi-tenant Linux server software platform, based on Kubernetes, for the Client's internal use. The platform supports multiple cloud providers, on-premise data centers, and hybrid configurations. We are mostly agnostic to the purposes of the apps running on our platform, so deep networking expertise is not required.


Responsibilities:
  • Build complex distributed on-premise systems from the ground up, including automated infrastructure provisioning and configuration pipelines
  • Design the overall networked data storage architecture for complex distributed systems with multiple NoSQL databases and diverse workloads
  • Rack, stack, and configure new hardware, such as servers, TORs, and storage arrays
  • Champion cross-team SRE methodologies, technologies, and tools
  • Automate, configure, manage, and monitor hypervisors and VMs
  • Engage with development teams to facilitate their transition onto our distributed Kubernetes-based software platform
Qualifications:
  • Advanced knowledge of configuration and automation tools, especially Ansible
  • Advanced knowledge of hypervisors, especially KVM
  • Advanced knowledge of storage area networks (SANs) for iSCSI-connected storage arrays
  • Advanced knowledge of troubleshooting complex distributed systems
  • Hands-on experience with PXE-based provisioning tools, such as Digital Rebar
  • Proficiency in Python
  • Prior experience developing systems delivered as physical appliances for data centers
  • Hands-on experience racking, stacking, and configuring servers, TORs, and storage arrays
  • Hands-on experience with multiple distributed NoSQL databases such as Elasticsearch, MongoDB, Kafka, Zookeeper, Etcd, and ClickHouse
  • Ability to work collaboratively with teams and influence their technological direction
     
Additional Requirements:
  • Operations expertise (hypervisor, monitoring, logging, RBAC, upgrades, runbooks, maintaining development environments)
  • AWS expertise (provision, build, deploy, customize)
  • Networking expertise (L4 LB, L7 Ingress, CNI, VM network, Contrail, VPC, DNS)
  • Networked storage expertise (iSCSI, Ceph, EBS, etc.)
  • Expertise deploying, maintaining, backing up, restoring, and scaling the following systems:
      • ClickHouse
      • Consul (or punt)
      • Elasticsearch
      • Etcd
      • Kafka
      • MongoDB
      • Postgres
      • Vault (or punt)
      • Zookeeper