Senior DevOps Engineer - GCP & AI Systems
Must live in the Eastern or Central Time Zone
Direct-hire, full-time employment
Company
• Software for people with disabilities
• Make a difference in people's lives
• Competitive salary, comprehensive benefits, 401(k)
• Fast-growing company with opportunities for career growth
• Passionate team committed to bringing out the best in each other
• Artificial Intelligence and automation leveraged to scale
• Industry leader, laid-back culture, open-door policy
Summary
We are looking for a Senior DevOps Engineer to lead the evolution of our high-performance infrastructure. You will own a diverse environment that bridges immutable Golden Image VM architectures with modern Kubernetes (GKE) orchestration and Managed Instance Groups (MIGs). Your mission is to ensure that our systems, ranging from large-scale file processing to GPU-accelerated AI inference, are secure, cost-optimized, and resilient against any disaster.
Key Responsibilities
• Hybrid Infrastructure & IaC: Design and maintain a mix of VM-based (Alpine/Ubuntu) and containerized (GKE) environments using Terraform and Packer for cloud and on-premises deployments
• Storage & Networking: Optimize high-performance NFSv3 (Filestore) mounts and manage complex VPC networking, including subnets, firewalls, and secure internal routing
• Database Scaling & Reliability: Manage the lifecycle of Google Cloud Datastore, MySQL, and MongoDB, with a heavy focus on high availability, replica set tuning, and automated backups
• AI & GPU Operations: Scale GPU-backed AI infrastructure and manage its lifecycle for high-concurrency ML workloads
• Security & Hardening: Enforce strict security standards across Alpine, Ubuntu, and Debian systems, including OS-level hardening, IAM least-privilege, and responding to complex security compliance requirements
• Disaster Recovery (DR) & Observability: Develop and test robust DR strategies and implement comprehensive monitoring to ensure system health
• Cloud Cost Governance: Proactively optimize GCP spend by rightsizing resources, managing idle costs, and leveraging efficient autoscaling policies
Skills
• Cloud (GCP): GKE (Standard/Autopilot), Managed Redis, Cloud Storage, Artifact Registry, and expert-level gcloud CLI
• Infrastructure (Immutable Patterns): Terraform (state management), Ansible, Packer (HCL), and Managed Instance Groups (MIGs)
• Systems (Multi-Distro Linux): Bash/Python scripting across Alpine (OpenRC), Ubuntu, and Debian
• Networking (Storage Protocols): Deep understanding of NFS (v3 vs. v4), HTTP, TLS, WebSocket, SSH tunneling (ProxyJump), and load balancing (GFE/URL maps)
• Databases (Data Operations): MongoDB (replica sets/WiredTiger), MySQL, and Redis-backed task queues
• Containers (Kubernetes Operations): Managing Pod autoscaling (HPA/VPA), node pools, taints, and tolerations
Preferred Certifications
• Google Professional Cloud DevOps Engineer
• Google Professional Cloud Architect
• Certified Kubernetes Administrator (CKA)
• MongoDB Certified DBA
Next Step
• Email your resume to Sean.Zetts@RiversideRecruiting.com for more information
Sean Zetts
440-447-0001
Riverside Recruiting
Sr. Recruiter & President
www.RiversideRecruiting.com
www.LinkedIn.com/in/SeanZetts
Sean.Zetts@RiversideRecruiting.com