Monitoring: contribute to the improvement of the monitoring and measurement systems that support our operational scale and continuous delivery. This ranges from setting up and maintaining the right tools to helping the engineering teams instrument their code correctly;
Availability: work to measure and increase the mean time between failures (MTBF) and decrease the mean time to repair (MTTR) of public-facing systems;
Operations: help the engineering teams operate their systems;
Performance, Efficiency & Latency: contribute to the measurement techniques used to tune the application stack, recommend and implement performance improvements, and use the monitoring systems to keep application performance at acceptable levels;
Security & Risk: participate in the ongoing process of identifying and mitigating risks to our systems, ensuring that compliance requirements and standards are met;
Capacity Planning: use our monitoring suite to advise on capacity requirements;
Engineering Tools: create and maintain tools that help engineering teams improve their day-to-day work.