NetForce Ukraine

DevOps Engineer

Employment type

Full-time

Level

Middle

Skills

Docker
Kubernetes
CI/CD
Azure
GitOps
Azure Container Registry


Requirements

Your primary responsibility is hands-on infrastructure automation and operations for an Agentic AI platform and its supporting services.

Cloud Infrastructure & Automation

· Design, implement, and maintain cloud infrastructure in Azure using Terraform with reusable modules and environment-based configurations

· Build and manage Azure resources including AKS, VMs, VNets, Load Balancers, Key Vault, API Management, Azure Container Registry, storage, and database-related services

· Implement scalable and secure networking topologies, including hub-spoke architecture, private endpoints, firewalls, routing, and WAF

· Support infrastructure readiness for future multi-region setup and disaster recovery scenarios

· Define and improve backup, restore, and recovery processes for critical infrastructure and databases

CI/CD, GitOps & Platform Enablement

· Design and maintain automated CI/CD pipelines in Azure DevOps using Pipeline as Code principles

· Implement multi-stage YAML pipelines with approvals, environments, variables, secrets, and deployment strategies

· Enable automated container build and deployment workflows using Docker, ACR, AKS, and Helm

· Develop reusable pipeline templates for consistent delivery practices across teams

· Support gradual transition toward GitOps-based deployment workflows, preferably using ArgoCD

· Maintain deployment configuration in Git where appropriate and help improve traceability of infrastructure and application changes

Kubernetes & Runtime Operations

· Operate and scale AKS clusters, including node pool management, network policies, autoscaling, and cluster security

· Deploy microservices and supporting components using Helm

· Support runtime reliability, troubleshooting, resource optimization, and incident investigation

· Improve operational readiness of services running in Kubernetes environments

Databases, Backups & Reliability

· Support PostgreSQL-based application infrastructure, including access, connectivity, backups, restore validation, and operational reliability

· Understand how backend services use Prisma, including schema changes, migrations, and database interaction patterns

· Help improve backup strategy, recovery procedures, and documentation for critical services

· Contribute to disaster recovery planning, including RTO/RPO considerations and future multi-region readiness

· Support systems using Kafka or similar event-driven components where applicable

Security, Compliance & Observability

· Implement industry best practices for cloud and DevOps security, including least privilege, identity federation, secret governance, and artifact signing

· Apply security guardrails across Azure and delivery pipelines, including automated scanning and policy-as-code using Azure Policy

· Set up and maintain monitoring, logging, dashboards, and alerting using Azure Monitor, Log Analytics, Application Insights, Grafana, Prometheus, Alloy, Loki, and OpenTelemetry

· Improve visibility into system health, application performance, infrastructure usage, and deployment stability

Collaboration & Internal Tooling

· Collaborate closely with software engineering and AI teams to enable fast and reliable development workflows

· Participate in architectural discussions around infrastructure scalability, reliability, disaster recovery, and operational readiness

· Contribute to documentation for infrastructure architecture, Terraform modules, pipelines, backup/restore processes, GitOps workflows, and operational runbooks

Nice to have

· Experience with AWS/GCP

· Experience integrating LLM-powered systems or high-throughput AI/ML pipelines

· Experience with GitHub Actions or other CI/CD platforms

· Experience with event-driven systems: Azure Event Grid, Service Bus, Kafka

· Automation using Python, TypeScript, or Bash

· Strong experience with GitOps, preferably ArgoCD

· Experience with PostgreSQL administration, backup automation, restore testing, and performance troubleshooting

· Familiarity with Prisma ORM from an infrastructure/DevOps perspective

· Experience designing backup and disaster recovery strategies

· Experience with multi-region cloud architecture and high-availability systems

· Experience with cloud cost optimization and FinOps practices

· Familiarity with on-call operations, incident response, and SRE practices