

Site Reliability Engineer



Software Engineering
Posted on Sunday, July 7, 2024

Company Description

Onum is a data optimization and analytics company based in Madrid. We specialize in real-time data analysis to enable rapid decision-making regarding cybersecurity, network performance, and infrastructure management. Onum helps you optimize your data analytics costs by reducing data volume, avoiding vendor lock-in, and aligning the value of each dataset with the actions taken on it.

About the Role

At Onum, we're pioneering real-time data optimization and analytics to empower organizations with actionable insights. As a Site Reliability Engineer (SRE) at Onum, you'll play a crucial role in ensuring the reliability, scalability, and performance of our cutting-edge data analytics platform. You'll help shape our technology platform so that it aligns with our business and product objectives and delivers value to our customers and partners. This means addressing complex challenges within a multi-tenant, distributed infrastructure, where you'll design, deploy, and refine automated solutions. These solutions will be rigorously tested to ensure they meet the scalability, reliability, and performance benchmarks demanded by customer-facing applications.

Responsibilities

  • Uphold the reliability and availability of our platform, which is indispensable for swift decision-making across crucial domains like cybersecurity, network performance, and infrastructure management.
  • Collaborate with our engineering and operations teams to design and implement scalable infrastructure solutions, allowing us to handle large volumes of data and support our growing client base.
  • Continuously monitor and optimize the performance of our platform, leveraging your expertise in real-time data analysis to identify and address bottlenecks and inefficiencies.
  • Implement robust security measures to protect our platform and our clients' data from security threats and vulnerabilities. This includes conducting regular security audits, implementing security best practices, and staying updated on emerging threats and mitigation strategies.
  • Establish comprehensive monitoring and observability frameworks to gain insights into the health and performance of our platform. This involves designing and implementing monitoring tools, dashboards, and alerts to proactively identify and resolve issues before they impact our clients.
  • Work closely with our compliance team to ensure that our platform meets regulatory requirements and industry standards for data security and privacy. This includes implementing controls and processes to safeguard sensitive data and facilitate compliance audits.
  • Respond to incidents and outages in a timely manner, coordinating with cross-functional teams to minimize downtime and mitigate the impact on our clients. Conduct post-incident reviews to identify root causes and implement measures to prevent recurrence.
  • Drive continuous improvement initiatives to enhance the reliability, performance, and security of our platform. This includes evaluating new technologies and tools, implementing best practices, and optimizing processes to streamline operations and reduce risk.

Requirements

  • Bachelor's degree in Engineering or relevant technical field, or equivalent military experience.
  • 5+ years of experience with Unix/Linux, including proficiency with shell scripting, system tools, kernel management, networking, and storage.
  • 2+ years of experience working with microservice architectures deployed on Kubernetes and containerized environments.
  • Strong understanding of architecture and design principles for fault tolerance, scalability, and stability, and familiarity with the AWS Well-Architected Framework or similar frameworks.
  • Demonstrated proficiency in building tools and automation using Python or Golang for large-scale production environments.
  • Proven experience with Configuration Management and Infrastructure as Code tools such as Terraform, Ansible, Chef, or Puppet.
  • Experience with Kubernetes and tools like Helm, Argo, Flux, or other cloud-native CI/CD platforms.
  • Experience working with public cloud platforms (preferably AWS or GCP) at medium to large scale.
  • Experience in designing and implementing large-scale metrics and monitoring systems is a plus.
  • Highly organized with a focus on building, improving, and delivering results.
  • Excellent communication skills, with the ability to collaborate effectively within and across teams and to take a leadership role.
  • Commitment to contributing to the success of SRE and DevOps initiatives.
  • Ability to quickly develop expertise in new technologies.
  • Collaborative approach, working closely with developers, researchers, data scientists, and security experts.
  • Proficiency in designing, building, and operating reliable and secure cloud infrastructure.
  • Proven ability to automate deployment processes for robust services.
  • Experience in orchestrating end-to-end monitoring and alerting systems.
  • Mentoring experience and a commitment to championing SRE culture.
  • Participation in design reviews to ensure robustness and scalability of systems.

Benefits

  • Private medical insurance
  • Flexible remuneration
  • Hybrid work model
  • Ongoing training with access to e-learning platforms
  • English lessons

Our recruiting process

For this position, you can generally expect a hiring process similar to the following one:

  • Application review: qualified candidates receive feedback on their application from our Recruiting team. Based on your role and experience, we move you to the next stage.
  • Technical interviews: candidates go through two technical rounds with different engineers from the hiring team, assessing technical knowledge, coding abilities, and problem-solving skills relevant to the role.
  • Meet with the CTO: a discussion about technical topics, past experiences, and cultural fit.
  • We give feedback to all candidates via email.