Site Reliability Engineer
Ref No.: 17-14907
Location: Cambridge, Massachusetts
Position Type: Contract
About the Job
The Site Performance and Reliability Engineer is responsible for ensuring optimal performance and uptime of our portal services and infrastructure. The candidate will analyze system performance end-to-end, from the clients (browsers) to the database, with a special focus on the back-end and infrastructure elements. Once bottlenecks and potential failure points are identified, the candidate will propose and validate solutions that the operations and development teams will implement and test under his/her guidance.
About the Team
Luna Control Center is Akamai's public portal. It is the critical system that all of our customers interact with to gather web traffic analytics, manage configurations, fetch support materials and plan their online events. We are committed to dramatically enhancing our customers' user experience and creating a unique competitive differentiator in terms of feature set, self-service, speed, availability and security.
* Be fluent in systems programming and/or automation, and leverage that experience to solve complex problems associated with running production environments at massive scale in multi-tenant environments
* Implement proactive monitoring, alerting, trend analysis and self-healing systems
* Participate in incident resolution processes, driving restoration and repair of service-impacting issues
* Instrument existing code and/or write performance-dedicated applications to enable fine-grained tracking of speed bottlenecks. Graphically report, in near real-time, Luna Control Center performance as perceived by our customers
* Define non-functional requirements as part of the product lifecycle to influence new designs, standards, and methods for scalable, highly available distributed systems

Basic Qualifications
* 5+ years of experience as an SRE, or in operations or administration of customer-facing, high-availability, large-scale web-based applications.
* 5+ years of experience with PHP, Perl, or another scripting language.
* 5 years of experience in Java-based technologies
* 3 years of experience working with Cassandra databases – Oracle is a plus
Desired Qualifications
* Prior successful experience as a systems performance or site reliability engineer
* Mastery of Linux/Unix
* Mastery of PHP, Perl or Python programming.
* Administrative experience installing, configuring, troubleshooting, monitoring and maintaining Linux infrastructure.
* Experience in writing SQL and PL/SQL procedures.
* Experience with a log analysis tool such as Splunk or the ELK products (Elasticsearch, Logstash, Kibana)
* Experience with orchestration tools such as Ansible.
* Experience with monitoring tools such as Sensu, Collectd and Grafana.
* Desire to work in a fast-paced and dynamic environment.
* A passion for performance excellence