SITE RELIABILITY ENGINEER
Endevis has partnered with a world-class organization to fill a Site Reliability Engineer opportunity. This individual will be responsible for planning, deploying, and troubleshooting application stacks, and for ensuring the digital space is up and running at all times and performing at peak capability. The Site Reliability Engineer will interact with several functional areas across all levels of the organization.
- Collaborate with the development teams to design the application stack, enhancing the digital presence, performance, and availability on multiple cloud services.
- Conduct post-mortem reviews of system down-time with internal stakeholders to put short- and long-term solutions in place to eliminate repeat occurrences.
- Conduct risk analysis to review system shortcomings that present risk of downtime for application stacks.
- Implement DevOps changes and rollouts, and manage deployments in a manner leading to optimal results.
- Combine software and systems engineering to build and run large-scale, distributed, fault-tolerant systems.
- Ensure internally critical and externally visible systems have appropriate reliability and uptime.
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
- Use configuration management tools to create repeatable environments.
- Create dashboards that communicate overall system health to less technical colleagues and alert on problems.
- Develop system configuration management templates, and audit systems.
- Work with developers to quickly identify and address issues to provide smooth code rollouts and seamless change back-out when there are problems.
- Perform code deployments.
- Install, configure, and maintain middleware and ESB environments.
- Install, configure, and maintain “cloud” hosting technologies.
- Install, configure, and maintain API gateways.
- Perform routine load testing of our systems.
- Optimize platform builds and automation.
- Perform system and service performance tuning, troubleshooting, and debugging.
- Use tools like Puppet, Satellite, Jenkins, Hudson, ELK Stack, Terraform, Ansible, Salt, and Splunk.
- Minimum of four (4) years of Linux systems administration experience.
- Experience with Apache, Tomcat, Wildfly, .NET or .NET Core hosting.
- Working knowledge of, and experience with, networking fundamentals.
- Expert skill level in scripting and automation.
- Expert in high-availability and load-balancing technologies.
- Someone driven to get an “extra 9” of availability.
- QSR experience.
- This is a full-time position that provides Level 2 & 3 support, on a 24 x 7 schedule, for all operational and outage issues relating to the infrastructure.
- Ability to participate in an on-call rotation performing weekend and after-hours support is required.
- Bachelor's degree in Computer Science or a similar area. Experience may be considered in lieu of a degree.
Endevis, LLC. and all companies represented are Equal Opportunity Employers and do not discriminate against any employee or applicant for employment because of age, race, color, sex, religion, national origin, sexual orientation, gender identity and/or expression, status as a veteran, disability, or membership in any other federal, state, or local protected class.