Software Test Engineer - (Contract)

wwwcarbon3aicareers

United Kingdom (Occasional office visit may be required) Contract 1h ago

Job description

Role Summary:

We are seeking a Software Engineer in Test to join our fast-scaling team. This role sits within Product but works across Product, Engineering and Operations. You will design and build automated testing, validation and benchmarking capabilities that continuously verify our cloud platform, Kubernetes environments, GPU infrastructure and customer-facing services. You will help ensure that every platform release, infrastructure change or hardware deployment is tested, validated and production-ready before reaching customers.

This is an opportunity to join a mission-led AI business that is redefining infrastructure, intelligence and impact for enterprise customers.

- Initial 6-month contract
- Start date: 13th July
- Competitive day rate
- If you are a contractor open to a permanent role, please include your salary expectations in your application.

Key Responsibilities:

Test Automation, Validation and Benchmarking:

- Design, build and maintain automated test frameworks for cloud infrastructure and platform services.
- Develop automated validation suites covering APIs, infrastructure workflows, Kubernetes environments and customer-facing services.
- Integrate automated testing into CI/CD pipelines to support continuous validation and release readiness.
- Design, build and maintain automated test frameworks for regression, smoke, integration and performance testing.
- Develop performance, scalability and resilience testing capabilities.
- Create benchmarking frameworks to validate infrastructure behaviour under realistic customer workloads.
- Partner with Product Managers to define measurable acceptance criteria and quality gates for new features and platform capabilities.
- Work with Platform Engineering teams to improve testability, automation and release confidence.
- Contribute to service readiness, release reviews and post-incident analysis.

Infrastructure Validation:

- Create and develop automated validation for GPU platforms, storage systems, networking services, identity and access management, platform APIs and Kubernetes clusters.
- Develop repeatable tests, including qualification tests, for new environments, platform releases and infrastructure upgrades.
- Support cluster qualification and operational readiness testing.

Observability & Reliability:

- Investigate failures using logs, metrics and telemetry.
- Work with Engineering and Operations teams to identify root causes and improve platform resilience.
- Develop automated monitoring and health-check capabilities that continuously assess service readiness.

Essential Experience:

- Experience testing AI, GPU or HPC platforms and workloads.
- Experience with HPC technologies such as Slurm and InfiniBand, and with performance benchmarking tools (DCGM, NCCL, NIXL, HPL).
- Strong software engineering skills (including CI/CD), with proficiency in Python or a comparable programming language.
- Experience analysing logs, metrics and monitoring data using an observability stack such as Prometheus, Grafana, Loki or OpenTelemetry.
- Hands-on experience with infrastructure automation tools such as Terraform or Ansible.
- Strong communication skills, with the ability to work effectively across Product, Engineering and Operations.
- Comfortable operating in a fast-moving start-up environment and balancing quality, risk and delivery priorities.

Why Join Era4:

You’ll be joining a mission-driven start-up building critical national infrastructure, where operational excellence directly enables growth. This role offers high visibility with leadership, real autonomy, and the chance to shape how a next-generation company operates at scale.

Diversity & Inclusion:

Era4 is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Era4 develops, owns and operates AI infrastructure across the UK, powered by renewable energy. By converting legacy industrial and energy sites into modern data-centre facilities, Era4 combines brownfield regeneration with cleaner, more efficient and scalable compute capacity for healthcare, research, finance, enterprise and public-sector organisations.