We are seeking highly experienced Senior DevOps Engineers based in the EU region to lead the implementation of an active/active multi-region architecture for our Autonomics platform. This critical project requires delivering a complete active/active solution within 12 weeks, demanding exceptional expertise and a proven track record in similar high-stakes implementations.
Project Critical Requirements
Critical Delivery Requirements
- Target Delivery: 12 weeks for complete active/active implementation
- Mission-Critical SLA: 99.95% uptime requirement
- RTO/RPO Target: ideally 0 minutes; a few minutes is acceptable within the SLA
- Geographic Requirement: EU-based candidates preferred for timezone alignment and collaboration
Technical Scope
- Transform existing single-region platform to active/active multi-region deployment
- Implement intelligent user routing to closest region with seamless disaster failover
- Maintain business continuity during regional disasters
- All implementations must use open-source technologies: Kubernetes, Percona MySQL, OpenSearch, Cassandra, Redis
Required Qualifications
Essential Experience
- 8+ years of DevOps/Infrastructure Engineering with demonstrable active/active implementations
- Proven track record of delivering similar projects under tight timelines
- Expert-level Kubernetes multi-cluster and multi-region management
- Deep expertise in distributed systems, CAP theorem, and conflict resolution patterns
- Production experience with 99.9%+ SLA environments and disaster recovery
- Data model review and adaptation experience for multi-region architectures
- Network design expertise for cross-region topology and capacity planning
- Conflict resolution strategy implementation in distributed environments
Database Expertise
- Production-scale Percona MySQL multi-master replication and conflict resolution
- Enterprise-level Apache Cassandra multi-datacenter deployments with performance tuning
- Large-scale OpenSearch cluster federation and cross-region synchronization
- High-availability Redis clustering and cross-region replication with monitoring
- Hands-on experience with database performance optimization under multi-region load
- Replication monitoring and alerting system implementation
Infrastructure & Architecture
- Proven experience designing and implementing multi-region network architectures
- Advanced knowledge of latency optimization and cross-region traffic management
- Expert-level load balancing and DNS-based failover mechanisms
- Production experience with chaos engineering and disaster recovery testing
- Advanced monitoring and observability implementation for distributed systems
- Kubernetes configuration updates for multi-region deployments
- Region affinity rules and pod topology spread constraints implementation
- Cross-region service discovery and health check refinements
- Deployment pipeline updates for multi-region automation
Project Delivery Skills
- Proven ability to work under tight deadlines without compromising quality
- Experience with parallel development and implementation strategies
- Strong collaboration skills for working with core development teams
- Excellent communication for rapid decision-making and issue escalation
- Risk management experience for identifying and mitigating project bottlenecks
- Load testing across regions and performance validation experience
- Chaos testing implementation for regional failure scenarios
- Runbook development and verification for operational procedures