Senior Site Reliability Incident Analyst at Waddle

Reliability, Permanent, Melbourne, AU melbourne analytics full-time
Description
Posted 2 days ago

Xero is a beautiful, easy-to-use platform that helps small businesses and their accounting and bookkeeping advisors grow and thrive. 

At Xero, our purpose is to make life better for people in small business, their advisors, and communities around the world. This purpose sits at the centre of everything we do. We support our people to do the best work of their lives so that they can help small businesses succeed through better tools, information and connections. Because when they succeed they make a difference, and when millions of small businesses are making a difference, the world is a more beautiful place.


How you’ll make an impact 
You will be a trusted advisor to engineering teams and engineering leadership, helping them gain deeper insight into the production operations of their systems through data and in-depth analysis of incidents. You will help improve the reliability and resilience of Xero’s services by cultivating a deeper understanding of incidents and operational surprises. You will collaborate with stakeholders across Xero to enhance situational awareness of production operations and enable learning from incidents.
You will be able to frame the questions that need answering, identify critical data points and apply methods to produce incident reporting that provides clear, meaningful insights. You will produce compelling narrative descriptions of incidents and surprising events with the purpose of maximising post-incident learning.

What you’ll do

  • We make it beautiful: Develop, deliver and enhance incident reporting for consumption at all levels of Xero, and provide recommendations and enhancements to existing processes.
  • We make it happen: Conduct routine and ad hoc analysis of incidents to enable learning from them, provide recommendations and improvements, and lead enhancements.
  • We make it human: Collaborate with relevant stakeholders to automate and dashboard incident data and insights where possible. Share expertise in knowledge of human factors, safety science and resilience engineering with the ability to produce compelling narrative descriptions of incidents and operational surprises that encourage learning and engagement.
  • We make it together: Collaborate with teams across Xero to improve the quality and usability of data, supported by excellent written and verbal communication skills for communicating technical concepts to technical and business stakeholders.
About you

  • Demonstrated knowledge, skills and expertise as a software, platform or site reliability engineer, or similar, and strong exposure to incident analysis with knowledge of safety-related incident investigations
  • Demonstrated learnings and confidence in influencing and advocating on the topics of systems reliability engineering, incident learning and safety culture
  • Skilled in delivering data and analytics in the form of visualisations, interactive dashboards and reports with data and trends, and in producing recommendations for stakeholders, with a solid foundational knowledge of techniques for data analysis, particularly qualitative
  • Collaborating with stakeholders at all levels of Xero to create strong connections and opportunities to make it together
  • Expertise in complex distributed systems environments, working with large, diverse and disparate data sources and data sets, delivering analytics and reporting to stakeholders, and providing recommendations and enhancements
  • Technical proficiency in modern enterprise analytics platforms, such as MicroStrategy and Microsoft SQL, and proficiency in Python or R, with any of the following technologies a plus: Mixpanel, Google Analytics, New Relic
Recruitment assessment and selection process for this specific position at Xero

  • 30-minute video call with a Talent Specialist to align on suitability to progress/shortlist for formal interview
  • 60-minute technical assessment with the hiring manager
  • 60-minute interview with the SRE team for you to share a relevant technical example of a site reliability incident and your role in it
  • 30-minute final interview with a key non-technical stakeholder
Why Xero?
Xero offers very generous paid leave to use however you’d like (plus statutory holidays!), dedicated paid leave to care for your physical and mental wellbeing, an Employee Assistance Program to access mental health care for you and your family, health insurance, life insurance and income protection, wellbeing and sports programmes, employee resource groups, 26 weeks of paid parental leave for primary caregivers, an Employee Share Plan, beautiful offices, flexible working, career development, and many other benefits that reflect our human value. You’ll do the best work of your life at Xero.