By combining Artificial and Human Intelligence, Crisp’s Extended Intelligence delivers 24/7/365 protection, continually fighting the weaponization of social communications whatever the source, the language or the online harm.
Crisp tracks and understands what ‘Bad Actors’ are saying, builds accurate profiles, and uses predictive and behavioural analytics to identify trends and detect new, previously unknown harms. Crisp’s proprietary technology then scans the web continuously, capturing billions of pieces of data every week.
Crisp protects over $4 trillion of aggregate market capitalisation across our current customer base. This demonstrates both the value and uniqueness of our service and the trust our customers place in us to protect them against reputational risk.
SCOPE AND REMIT OF THE ROLE
This role sits within the Site Reliability Engineering (SRE) team and is part of the wider Crisp Technical Services business unit. Our suite of SaaS, distributed systems and product integrations helps our internal stakeholders run their critical business operations and, in turn, provides customers with industry-leading threat detection technology products. You’ll play a key role in forming a new area within Crisp that aims to bring operational excellence and customer focus to the operation of our SaaS-hosted application suite.
As a Technical Operations Engineer, you will use your skills and expertise in cloud platforms to maintain and improve the availability of our services running on the Google Cloud platform, and to support our industry-leading SaaS solution. As part of the SRE team, you will be integral to ensuring our platforms are highly available and resilient, through continual monitoring and by suggesting improvements. You will work closely with engineering teams in Development and Delivery to uphold contracted Service Level Objectives (SLOs). You will be tasked with ensuring that our internal and externally available systems have reliability and uptime appropriate to user needs.
ROLE DUTIES AND REQUIREMENTS
- Work in a team to provide third line support for the infrastructure and application
- Take responsibility for, own, and coordinate fault resolution
- Work alongside a team of engineers where necessary to fix faults raised against the supported elements, networks, or applications
- For service-impacting incidents, lead the investigation into the RCA, producing any reports and coordinating the delivery of any fixes to mitigate further occurrences
- Use and maintain our monitoring platforms
- Operate 24/7 to a response SLA
- Adhere to ITIL practices
- Respond to first line alerts from our alerting platform(s)
- Tune alerting thresholds to reduce false positive alerts
- Follow runbooks/SOPs for alert resolution
- Create and update SOPs/runbooks to help with repeat resolutions
- Write up postmortems/RCA documents for P1/P2 incidents
- Close the loop on Service Cloud cases for incidents
- Escalate any issue that cannot be resolved
- Contribute to problem management and root cause diagnosis following an incident
- Manage escalation of any outstanding issues, working with the SDO
- IAM / GCP permissions (not platform permissions)
- BAU routine examples:
  - Elastic index snapping
  - Security alert reviews
  - Daily volume alerts triage
  - Persistent failures reprocessing
Essential Experience
Desirable Experience
- Google Cloud Certifications
- Release and Deployment Tooling
- Octopus Deploy
- Elasticsearch
- Consul
- Docker
- Linux (Debian/Ubuntu)
- SQL
- Software Defined Networking
- Cloud & Platform Security