GitLab

1001-5000 employees

Software Development
DevOps
Cloud Computing
Information Technology
About GitLab

GitLab is a comprehensive DevOps platform delivered as a single application, enabling organizations to manage the entire software development lifecycle, from planning and source code management to CI/CD, monitoring, and security. Founded in 2014, GitLab's mission is to make it possible for everyone to contribute to software development by providing a collaborative, open-source platform that supports remote work and transparency. The company offers both cloud-based and self-managed deployments that integrate with other tools to streamline development workflows, improve productivity, and accelerate software delivery. GitLab is publicly traded on NASDAQ under the symbol GTLB and serves a global customer base ranging from startups to large enterprises.

2 months ago

Site Reliability Engineer

Full-time
Mid Level

📋

Description
  • GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. Our mission is to enable everyone to contribute to and co-create the software that powers our world. When everyone can contribute, consumers become contributors, significantly accelerating human progress. Our platform unites teams and organizations, breaking down barriers and redefining what's possible in software development. Thanks to products like Duo Enterprise and Duo Agent Platform, customers get AI benefits at every stage of the SDLC. Our high-performance culture is driven by our values and continuous knowledge exchange, enabling our team members to reach their full potential while collaborating with industry leaders to solve complex problems. Co-create the future with us as we build technology that transforms how the world develops software.
  • You'll join the Dedicated team as a Site Reliability Engineer focused on Environment Automation, where your work will help power hundreds of isolated GitLab environments for our customers. In this role, you'll help keep these environments reliable, scalable, secure, and consistent by treating everything as code and contributing to automation across the entire lifecycle, from initial provisioning to day-to-day operations. You'll collaborate with senior SREs to solve the challenges of managing many tenant environments in parallel, each with its own constraints and integration points.
  • You will define, deploy, and maintain GitLab environments across cloud providers using infrastructure as code, deployment packages, and Kubernetes. Your automation will reduce manual work: you'll build tooling for upgrades and configuration changes and support the observability used to monitor environment health. Your work will directly affect customer experience by keeping environments production-ready.
  • Responsibilities include designing infrastructure automation with Terraform, Ansible, and Kubernetes; debugging production issues; creating deployment and orchestration tools; automating operational tasks; building observability stacks; responding to incidents; planning infrastructure changes; developing scripts and workflows; applying best practices for Kubernetes and cloud platforms; participating in on-call rotations; and documenting operational procedures.

🎯

Requirements
  • Experience working as an SRE or in a similar role operating production infrastructure, with an interest in automating the lifecycle of many environments or tenants in parallel.
  • Hands-on experience running Kubernetes-based workloads in production, including deployments, rollouts, and debugging issues like crash loops and failed health checks.
  • Familiarity with infrastructure automation and configuration management tools such as Terraform and Ansible, including managing modules, variables, and state.
  • Solid understanding of Git-based workflows and infrastructure-as-code practices, with the ability to contribute to reusable modules, templates, and pipelines.
  • Experience working in distributed systems or cloud-based production environments, ideally in SaaS or managed service settings, with incident response and on-call experience.
  • A proactive mindset focused on automation and documentation, seeking to remove manual steps and improve runbooks.
  • Comfort working asynchronously across distributed teams, with a commitment to collaborative values.
  • Basic programming skills in languages such as Go or Ruby are valuable but not required; experience with infrastructure tooling is a plus.

🏖️

Benefits
  • Flexible Paid Time Off
  • Team Member Resource Groups
  • Equity Compensation & Employee Stock Purchase Plan
  • Growth and Development Fund
  • Parental Leave
  • Home Office Support