OVERVIEW

Highly skilled, hands-on technical engineer with demonstrable success maintaining high-availability, large-scale enterprise/cloud services. Innovative problem solver with proven leadership and mentoring abilities. Long track record of delivering substantial return on investment to employers and clients. Committed to keeping up to date with the latest developments in the industry.

EXPERIENCE

Senior Site Reliability Engineer

AJW Group

2018 - present

Sussex, UK

Responsibilities

• Work alongside a geographically distributed team of Developers and Infrastructure Engineers for AJW Group, a world-leading independent specialist in the global management of commercial and business aircraft spares

• Lead and develop the SRE culture within the organisation, including implementation of automated incident management across services

• Lead development of tools and automation to support production system uptime and achieve product SLAs

• Define SLAs / SLOs for services

• Feature development and enhancements for the Kubernetes PaaS platform

• On-call duties, incident management, and postmortems for the platform

• Lead one-click deployment of PaaS infrastructure and auto-remediation / repair of infrastructure

• Ensure health of production systems, investigate anomalous behaviour and triage outages, shepherd code changes from development to production, develop and enhance automation and monitoring tools

• Provide technical leadership in cross-organizational projects

• Serve as escalation point for troubleshooting critical problems and unexpected operational issues

Accomplishments

• Documented achievement of service availability exceeding 99.99%

• Produced detailed service metrics, allowing consistently accurate utilization projections; variance from norm in metrics used as an early-warning mechanism for detecting problems/changes in behaviour

• Developed benchmarking tools for system analysis and optimization; allowed detailed performance testing of new hardware and software configurations outside of actual production environment

• Established a common monitoring and reporting framework which facilitated the rapid development and deployment of new services

• Established a configuration management toolkit for enforcing operational best-practices throughout the organization

• Originally joined AJW Group as a Cloud Engineer; promoted to Senior SRE within a year of hire

• Successfully transitioned production deployment and on-call/triage responsibilities to SRE team; created documentation for SRE ramp-up and critical job functions, including prod deployment process/checklist

• Managed successful delivery of new production cloud architecture; developed system validation and performance benchmarking tools; streamlined validation and deployment processes

• Became highly proficient with Kubernetes, an open-source system for automating deployment, scaling, and management of containerized applications; helped develop Kubernetes best practices, identified bugs and suggested new features

• Implemented standards for incident tracking, documentation, and post-mortems

• Awarded Certified Kubernetes Administrator (CKA) certification

Cloud Engineer

AJW Group

2017 - 2018

Sussex, UK

Responsibilities

• Architected new services, re-architected existing services, and conceived new features and functionality

• Incident Management and resolution

• Troubleshooting and triaging operational and application issues and fixing them within the defined SLA

• Infrastructure Capacity Management

• Infrastructure and application monitoring / logging

• Production upgrades / updates / patching

• Ensuring that support calls were logged and handled effectively / efficiently within agreed Service Level Agreements using ITIL compliant service desk applications

Accomplishments

• Implemented monitoring, alerting, and code delivery mechanisms which stabilized service reliability and reduced downtime by an order of magnitude in less than 1 month after taking over responsibility for AWS.

• Led effort to establish common Terraform infrastructure for all AJW Group cloud services.

Support Engineer

Equinix / Telecity

2013 - 2017

London, UK

Responsibilities

• Ensuring that support calls were logged and handled effectively / efficiently within agreed Service Level Agreements using ITIL compliant service desk applications.

• Worked in a team as part of the 24/7 network operations centre for Equinix, a global managed services provider, supporting mission-critical datacenter infrastructure across the globe.

• Ensuring health of production systems, investigating anomalous behaviour and triaging outages.

• Monitoring the progress of live support tickets with third-party maintenance contract suppliers.

• Monitoring of internal and customer hardware, working with external hardware vendors and internal teams to remediate hardware and configuration issues.

• Working with network carriers to troubleshoot customer and internal networks. Carried out configuration changes on a broad range of core Cisco network equipment, including ASR Service Provider border routers and access switches.

• Rule checks on customer security hardware including Cisco and Check Point firewalls.

• Deployment of new physical and virtual servers. OS patching, configuration and troubleshooting of VMware ESXi hypervisors and virtual infrastructure management for both internal and customer environments.

• DDoS attack mitigation and threat management of customer and internal IP Networks.

Accomplishments

• Implemented standards for source code management using Git and GitLab.

EDUCATION

CNCF - Certified Kubernetes Administrator (CKA)

2020

Arborventure LTD, UK - (CS38) Tree Climbing and Aerial Rescue

2006 – 2007

Solent University, UK - Cisco Certified Network Associate

2005 – 2006

University of Portsmouth, UK - Computer Network Management & Design

2003 – 2005

TECHNICAL EXPERTISE

Software - Git, GitHub, GitLab, Docker, Kubernetes, Terraform, CloudFormation, Grafana, Prometheus

Operating Systems - Linux, Docker, Windows Server, macOS

Programming - Go, Bash, Python, SQL, HTML, CSS

Cloud Vendors - AWS, GCP, Azure