Senior Site Reliability Engineer
Reston, VA or Remote
What we’re looking for…
We are looking for a Senior Site Reliability Engineer who is well versed in building cloud technologies securely, has an automation mindset, and is an ardent follower of the SRE discipline. If this sounds like you and you are a US citizen, then our team will benefit from your skillset!
Who we are…
ScienceLogic is going through a product transformation, and the Site Reliability Engineering (SRE) team is at the forefront of it. We are responsible for the design, deployment, and maintenance of the cloud infrastructure that runs the company’s revenue-generating, go-forward SaaS product line.
Overall, we’re passionate about automation and solving complex business and technology challenges. Our team combines SRE, DevOps, Software Development, and Information Security knowledge to help make cloud operations agile and elastic within the boundaries of our security and governance frameworks.
What you’ll be doing…
- Enhance the company’s SaaS infrastructure security protocols.
- Collaborate across the organization to design, build, and operationalize SaaS services conforming to FedRAMP compliance standards.
- Participate in architecture, security, and operations reviews for SaaS Government Cloud.
- Lead design reviews and buildout of secure systems for delivering various SaaS services with 99.99% uptime.
- Design, automate, test, and monitor the use of cloud native technologies as a foundation for a service platform.
- Investigate and resolve customer and operational issues with the mentality of fixing, not just mitigating, issues.
- Identify and automate the measurement of operations SLAs and SLOs.
- Triage incident response, document SOPs and runbooks, and train NOC team members.
- Spend 75% of your time on forward-looking priorities, designing and building SaaS systems, and the remaining time supporting the operations and maintenance of the current SaaS infrastructure.
- Write automation that can be easily supported and extended by others.
- Participate in the on-call rotation as assigned.
- Work on special projects as assigned.
Qualities you possess…
Here at Site Reliability, we believe that if you are hungry to learn, passionate about technology, and like building tools, then you are a good fit. Experience with the following skills is an added plus:
- 10+ years of site reliability engineering or cloud operations experience, or equivalent experience.
- Proven track record of operating production SaaS environments within security standards like SOC2, ISO, and PCI. Experience operating FedRAMP certified SaaS production systems is an added plus.
- Bachelor's or Master's degree in Computer Science, Information Systems, or a similar field.
- Skilled at problem solving, algorithms, and data structures conforming to modern SaaS security requirements.
- Building tools and scripting frameworks from scratch.
- Working with cloud automation tools like CloudFormation, Terraform, CDK, and aws-cli.
- Scripting languages like Python, Groovy, PowerShell, Bash, Perl, etc.
- Exposure to Windows and Linux administration.
- Familiarity with basic networking, security, and cloud engineering concepts.
- Highly collaborative, with effective written and verbal communication skills.
- Ability to work against tight deadlines, occasionally work off-hours, and participate in the weekly on-call schedule.
- Take full responsibility for the availability and performance of the platform.
About ScienceLogic
ScienceLogic is a leader in IT Operations Management, providing modern IT operations with actionable insights to resolve and predict problems faster in a digital, ephemeral world. Its solution sees everything across cloud and distributed architectures, contextualizes data through relationship mapping, and acts on this insight through integration and automation.
www.sciencelogic.com