OVERVIEW

Highly skilled, hands-on technical engineer with demonstrable success maintaining high-availability, large-scale enterprise and cloud services. Innovative problem solver with proven leadership and mentoring abilities. Long track record of delivering substantial return on investment to employers and clients. Committed to keeping up to date with the latest developments in the industry.

EXPERIENCE

Senior Site Reliability Engineer, 2018 - present
AJW Group, Sussex, UK

Responsibilities
• Work alongside a geographically distributed team of Developers and Infrastructure Engineers for AJW Group, a world-leading independent specialist in the global management of commercial and business aircraft spares.
• Ensure health of production systems, investigate anomalous behaviour and triage outages, shepherd code changes from development to production, and develop and enhance automation and monitoring tools.
• Manage systems and maintain servers spread across multiple cloud platforms with a total annual operating budget of nearly £400K.
• Architect new services, re-architect existing services, and conceive new features and functionality.
• Provide technical leadership in cross-organizational projects.
• Serve as escalation point for troubleshooting critical problems and unexpected operational issues.

Accomplishments
• Achieved documented service availability exceeding 99.99%.
• Produced detailed service metrics that enabled consistently accurate utilization projections; used variance from the norm in those metrics as an early-warning mechanism for detecting problems and changes in behaviour.
• Developed benchmarking tools for system analysis and optimization, allowing detailed performance testing of new hardware and software configurations outside the production environment.
• Established a common monitoring and reporting framework, including templates that facilitated the rapid development and deployment of new reporting mechanisms.
• Designed and maintained Docker images for build environments used by multiple teams. Established a configuration management toolkit for enforcing operational best practices throughout the organization.
• Originally joined AJW Group as a Cloud Engineer; promoted to Senior SRE within a year of hire.
• Successfully transitioned production deployment and on-call/triage responsibilities to the SRE team; created documentation for SRE ramp-up and critical job functions, including the production deployment process and checklist.
• Managed successful delivery of a new production cloud architecture; developed system validation and performance benchmarking tools; streamlined validation and deployment processes.
• Became highly proficient with Kubernetes, an open-source system for automating deployment, scaling, and management of containerized applications; helped develop Kubernetes best practices, identified bugs, and suggested new features.
• Implemented standards for incident tracking, documentation, and post-mortems.
• Earned the Certified Kubernetes Administrator (CKA) certification.

Cloud Engineer, 2017 - 2018
AJW Group, Sussex, UK

Responsibilities
• Architected new services, re-architected existing services, and conceived new features and functionality.
• Ensured that support calls were logged and handled effectively and efficiently within agreed Service Level Agreements using ITIL-compliant service desk applications.

Accomplishments
• Implemented monitoring, alerting, and code delivery mechanisms that stabilized service reliability and reduced downtime by an order of magnitude in less than one month after taking over AWS.
• Led effort to establish common Terraform infrastructure for all AJW Group cloud services.

Support Engineer, 2013 - 2017
Equinix / Telecity, London, UK

Responsibilities
• Ensured that support calls were logged and handled effectively and efficiently within agreed Service Level Agreements using ITIL-compliant service desk applications.
• Worked in a team as part of the 24/7 network operations centre for Equinix, a global managed services provider, supporting mission-critical datacenter infrastructure across the globe.
• Ensured health of production systems, investigated anomalous behaviour, and triaged outages.
• Monitored the progress of live support tickets with third-party maintenance contract suppliers.
• Monitored internal and customer hardware, working with external hardware vendors and internal teams to remediate hardware and configuration issues.
• Worked with network carriers to troubleshoot customer and internal networks. Carried out configuration changes on a broad range of core Cisco network equipment, including ASR service provider border routers and access switches.
• Performed rule checks on customer security hardware, including Cisco and Check Point firewalls.
• Deployed new physical and virtual servers. Handled OS patching, configuration and troubleshooting of VMware ESXi hypervisors, and virtual infrastructure management for both internal and customer environments.
• Handled DDoS attack mitigation and threat management for customer and internal IP networks.

Accomplishments
• Implemented standards for source code management using Git and GitLab.

EDUCATION

Arborventure LTD, UK - (CS38) Tree Climbing and Aerial Rescue, 2006 – 2007
Solent University, UK - Cisco Certified Network Associate, 2005 – 2006
University of Portsmouth, UK - Computer Network Management & Design, 2003 – 2005

TECHNICAL EXPERTISE

Software - Git, GitHub, GitLab, Docker, Kubernetes, Terraform, CloudFormation, Grafana, Prometheus
Operating Systems - Linux, Docker, Windows Server, macOS
Programming - Go, Bash, Python, SQL, HTML, CSS
Cloud Vendors - AWS, GCP, Azure