The Company:
Marigold helps brands foster customer relationships through the science and art of connection. Marigold Relationship Marketing is a suite of world-class martech solutions that help marketers create long-term customer love and loyalty. Marigold provides the most comprehensive set of use cases for marketers at any level. Headquartered in Nashville, Tennessee, Marigold has offices globally across the United States, Europe, Australia, New Zealand, South America and Central America, as well as in Japan.
What You’ll Do:
Help build a Site Reliability Engineering culture by sharing your best practices, approaches, documentation, and code with other engineering teams
Apply automation and software to tasks or parts of the system that are performed manually or would otherwise benefit from it
Troubleshoot complex OS, networking, and database issues in cloud-based SaaS and on-premises environments; handle live production incidents; debug application and infrastructure issues; and follow and implement SRE best practices
Monitor application performance, identify steps to improve overall performance and stability, and follow through with implementation
Conduct system analysis and configuration management, and develop improvements to system software performance, availability, and reliability
Work closely with software and QA engineers to ensure the system meets non-functional requirements such as performance, security, and availability
Document your system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available to those who need it
Maintain and monitor deployments, orchestration, databases, and general backend infrastructure
Keep up to date on security and proactively identify, diagnose, and solve complex security issues
Participate in an on-call rotation supporting the global platform and ensuring an excellent customer experience
Ideal Qualifications:
Degree in Computer Science or equivalent combination of education and experience
7+ years of experience in a DevOps or SRE role
7+ years of Linux experience
5+ years managing production environments in AWS
5+ years of experience with Kubernetes, preferably EKS
3+ years creating and maintaining infrastructure with Terraform
Experience applying infrastructure-as-code principles to design, build, and maintain cloud platforms using Terraform/OpenTofu
Experience working with database and data store technologies such as RDS/MySQL, ElastiCache/Redis, or equivalent
Knowledge of core server-side concepts and experience working with cloud networking, load balancers, HTTP or gRPC protocols, and large-scale microservice environments
Experience with observability stacks, instrumenting environments for logging and monitoring, and designing and building dashboards and alerts
Knowledge of DevOps methodologies, basic programming and the tools involved in CI/CD automation
Nice to Have:
Experience managing high-scale web application platforms or SaaS platforms
Strong Kubernetes, EKS or ECS/Fargate experience
Deep understanding of security principles
History of contributing to FOSS projects
Experience with AWS networking concepts such as VPC peering and Transit Gateway
Experience with multi-geography, multi-tenant applications
Experience designing and performing disaster recovery
Experience programming with Go or Python
Experience with cost management
Experience with NoSQL databases such as ScyllaDB
Experience working with stream processing and big data technology stacks such as Kafka or Trino
What We Offer:
The competitive salary and benefits you’d expect!
Generous time off (we call it Open Time Away) as well as paid holidays and a birthday benefit day off.
Retirement contributions.
Employee-centric and supportive remote work environment with flexibility.
Support for life events including paid parental leave.