Functions and Responsibilities

Manage production environments by monitoring availability and taking a holistic view of system health
Automate reliability, quality, and repeatability of cloud environments
Proactively ensure the highest levels of systems and infrastructure availability
Maintain tools, systems, and platforms for cloud services
Automate infrastructure operations
Collaborate with engineering and development teams to evaluate and identify optimal cloud solutions
Provide primary operational support and engineering for multiple software applications
Support incident escalation and troubleshooting

Skills & Knowledge

Ability to multitask and complete regular duties in a time-efficient manner
Strong skills in Windows and Linux server systems
Experience with MS SQL Server and PostgreSQL
Experience with Azure and Amazon Web Services (AWS)
Experience with Python and PowerShell scripting
Experience with deploying, supporting, and monitoring new and existing services, platforms, and application stacks
Agile mindset and familiarity with DevOps practices

Minimum Qualifications

4+ years of experience in site reliability engineering
4+ years of experience with IaaS environments such as MS Azure, AWS, or GCP
Advanced knowledge of TCP/IP networking, architecture, and core technologies (such as DNS, DHCP, HTTP, routing, and VPN)
3+ years of experience implementing process automation
Experience with Infrastructure as Code (IaC)
Experience implementing and administering large, scalable cloud environments

Job Type
Full-Time Regular

Location
Remote
Atlanta, GA

Site Reliability Engineers