My client, a global hedge fund, is actively seeking a hands-on, highly skilled, and motivated SRE to join their team. As an SRE, you will play a critical role in driving the adoption of Site Reliability Engineering practices within their organization. The ideal candidate will have a strong technical background and a passion for operational efficiency and continuous improvement.
 The role:
  - Drive the adoption of SRE principles, methodologies, and best practices across the organization.
  - Collaborate closely with application development teams to ensure the successful deployment and operation of applications, including early-stage support during development.
  - Establish and monitor key metrics, performance indicators, and service level objectives (SLOs) to ensure the reliability and availability of critical systems.
  - Identify opportunities to eliminate toil through automation, code improvements, and process optimizations.
  - Conduct root cause analyses for system failures and incidents, and implement engineering solutions to prevent future occurrences.
  - Lead incident management and resolution efforts, ensuring timely and effective response to incidents, and driving post-incident reviews and process improvements.
  - Work closely with cross-functional teams, including infrastructure, networking, and security, to optimize system performance, scalability, and security.
  - Collaborate with stakeholders to define and refine service-level agreements (SLAs) and operational requirements.
  - Stay abreast of industry trends and emerging technologies in Site Reliability Engineering, and leverage them to drive innovation and enhance operational efficiency.
  
 What you offer:
  - Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
  - Extensive experience in Site Reliability Engineering or a related field, with a strong understanding of SRE principles, practices, and tools.
  - Strong technical background with expertise in areas such as distributed systems, cloud computing, network architecture, and software development.
  - Strong experience in Python, as well as with automation and configuration management tools.
  - Solid understanding of monitoring and observability frameworks, incident management, and post-incident analysis.
  - Excellent problem-solving and troubleshooting skills, with the ability to analyze complex systems and identify areas for improvement.
  - Strong leadership skills, with the ability to inspire and motivate a team, and foster a culture of collaboration, innovation, and continuous improvement.
  - Excellent communication and interpersonal skills, with the ability to effectively communicate technical concepts to both technical and non-technical stakeholders.
  - Experience in the financial industry or a hedge fund environment is highly preferred.
  
 The sell:
  - Extremely competitive compensation and medical benefits
  - Brand name that will open doors for your career in the future
  - Many opportunities for internal mobility and long-term career growth