Site Reliability Engineer (SRE), DevOps, system architect
Your New Company
Our client, a global retail group, is seeking a Site Reliability Engineering (SRE) Architect to drive the reliability, scalability, and efficiency of its large-scale systems. The role focuses on observability frameworks and automation, ensuring proactive system management and continuous improvement.
Your New Role
- Design and implement observability solutions using tools such as Dynatrace, Prometheus, Grafana, and ELK Stack.
- Advocate for observability best practices and integrate monitoring into infrastructure.
- Automate infrastructure and optimize performance using tools such as Ansible and Jenkins, along with AI-driven anomaly detection.
- Mentor team members, fostering a culture of innovation and continuous improvement.
- Collaborate with technical partners on proofs of concept (PoCs), tool evaluation, license management, and training.
What You'll Need to Succeed
- 5+ years of experience in SRE, DevOps, or systems architecture roles, including hands-on project deployment.
- Hands-on experience with observability tools and automation frameworks.
- Strong scripting/programming skills for automation.
- Knowledge of AI/ML-driven observability, predictive analytics, and anomaly detection.
- Exceptional problem-solving skills, a data-driven mindset, and the ability to communicate effectively with technical and non-technical stakeholders.
What You Need to Do Now
If you're ready to take on this exciting role, send your CV to cherry.ho@hays.com.hk, or call Cherry Ho at +852 2230 7493 for a confidential discussion.