Senior Site Reliability Engineer
BillingPlatform
Serbia
Contract
5-10
Job description
BillingPlatform is an industry-leading, fast-growing SaaS company. Our award-winning, cloud-based revenue lifecycle management platform is leveraged by leading global enterprises to automate and streamline the entire quote-to-cash process. At BillingPlatform, our employees are our most valuable asset, and we believe deeply in a culture of collaboration, accountability, innovation, and transparency. We seek bright, enthusiastic, and creative professionals looking to be part of our incredible team focused on challenging the status quo and driving transformational value to customers.
Backed by leading private equity firms FTV Capital and Columbia Capital, we have earned industry recognition for our growth, including a place on Deloitte’s Technology Fast 500™ list of fastest-growing technology companies for the fifth consecutive year and on the Inc. 5000 list for four years running.
Our ability to innovate market-leading solutions has been validated by all major industry analyst firms, including being named a Leader in the first-ever Gartner® Magic Quadrant™ for Recurring Billing Applications, and being recognized as the Leader in Forrester Research’s “The Forrester Wave™: SaaS Recurring Billing Solutions.” To learn more about us, visit billingplatform.com.
Responsibilities
Own and improve on-call processes, incident response playbooks, and post-mortem culture
Define, track, and manage SLOs, SLIs, and error budgets for critical services
Lead blameless post-mortems and drive systematic reliability improvements
Respond to production incidents and coordinate cross-functional resolution
Design, build, and maintain scalable AWS infrastructure using IaC (Terraform, Pulumi)
Manage Kubernetes clusters and containerized workloads in production
Build and maintain CI/CD pipelines to improve deployment speed and reliability
Evaluate and implement tooling to enhance developer productivity and system stability
Implement monitoring, alerting, and distributed tracing (Prometheus, Grafana, Datadog, Jaeger)
Identify and resolve performance bottlenecks across services, networks, and databases
Build dashboards and runbooks for self-service operational insights
Partner with engineering teams to embed reliability practices (load testing, capacity planning, chaos engineering)
Conduct architecture reviews with a focus on reliability and operability
Qualifications
5+ years of experience in SRE, DevOps, or infrastructure engineering
Deep expertise with AWS and cloud-native architectures
Strong experience with Kubernetes and container orchestration at scale
Hands-on experience with infrastructure-as-code tools (Terraform or Pulumi)
Proficiency in Python, Go, or Bash
Experience with observability tools (Prometheus, Grafana, Datadog, or similar)
Strong understanding of SLOs, SLIs, and error budgets
Experience with service mesh technologies (Istio, Linkerd)
Familiarity with chaos engineering tools (Chaos Monkey, Gremlin, LitmusChaos)
Background in Oracle database reliability and administration
Contributions to open-source infrastructure projects
Experience in a high-growth SaaS or product-led environment
Excellent English communication skills (written and spoken)
Incentives
Join a team working on global initiatives
A high-impact role at a growing SaaS company that values personal growth, accountability, and teamwork
A culture of open collaboration and problem-solving
100% remote
Competitive pay
This position is based in Serbia and is not eligible for relocation.
BillingPlatform provides equal employment opportunity (EEO) to all persons regardless of age, color, national origin, citizenship status, physical or mental disability, race, religion, creed, gender, sex, pregnancy, sexual orientation, gender identity and/or expression, genetic information, marital status, status with regard to public assistance, veteran status, or any other characteristic protected by federal, state, or local law.