Senior DevOps Support Engineer / Site Reliability Engineer (SRE)

About Our Company:

We are a dynamic, fully remote international company at the forefront of the future of work. Our distributed team spans multiple time zones and countries, allowing us to leverage a global talent pool and diverse perspectives.

Our mission is to revolutionise prepaid mobile services in Africa and Asia through innovative, omnichannel solutions. We empower Mobile Network Operators to optimise their entire value chain, balancing self-care options with agent-assisted services to reach all customers effectively in diverse economic environments.

We deliver consistent, hyper-personalised offers across all points of sale, transforming how operators create, manage, and distribute services. Our seamless integration of USSD and smart app technologies ensures accessibility and uniform experiences for all users. We help operators bundle and sell benefits effectively to subscribers, enhancing sales, distribution, and customer interactions.

Our goal is to spearhead telecommunications advancement in emerging markets, bridging technology gaps and fostering digital inclusion while adapting to regional challenges. Our integrated portfolio includes innovative solutions such as the OSG USSD gateway, CoaleSCE menu service environment, SmartShop bundle management system, and Crediverse EVD solution.

Job Overview:

We’re seeking an experienced and adaptable Senior DevOps Support Engineer to join our dynamic remote team. This role is critical to maintaining and optimising our complex, high-volume, transaction-based software systems. The ideal candidate will excel in fast-paced environments, possess strong problem-solving skills, and be able to navigate and improve evolving system architectures.

You’ll play a key role in bridging traditional support engineering with modern DevOps practices, contributing significantly to our mission of revolutionising prepaid mobile services in emerging markets. This position offers unique challenges and opportunities for those who thrive on solving complex problems, driving system improvements, and collaborating closely with development teams.

Key Responsibilities:

  • Respond swiftly to incidents by connecting to customer sites over VPN, providing rapid troubleshooting and resolution
  • Deploy and upgrade our complex, high-volume, transaction-based software in diverse customer environments
  • Implement and maintain robust monitoring and observability solutions to proactively identify potential issues
  • Develop and improve automated deployment processes using GitHub Actions and Ansible, enhancing system reliability and efficiency
  • Spearhead our GitOps-based configuration management and infrastructure as code initiatives
  • Participate in on-call rotations for critical incident response
  • Lead efforts in improving system documentation and knowledge sharing within the team
  • Lead the development and maintenance of our customer-facing knowledge base and self-service portal, ensuring comprehensive documentation and intuitive navigation for optimal customer experience
  • Conduct in-depth analysis of transaction data records, log files, and database tables to identify and resolve complex issues
  • Develop and execute complex SQL queries to investigate data anomalies, troubleshoot system behavior, and generate comprehensive reports
  • Create and maintain data analysis scripts and tools to automate routine investigations and improve efficiency in problem resolution
  • Design and implement solutions to address intricate challenges related to transaction processing and subscriber management
  • Perform root cause analysis on critical system issues and develop long-term solutions
  • Produce comprehensive incident reports for internal teams and external stakeholders, including detailed incident descriptions, thorough root cause analyses, and actionable recommendations for system improvements
  • Manage clustered environments and implement redundancy measures to ensure high availability
  • Develop and maintain disaster recovery (DR) sites and procedures, conducting regular DR drills
  • Collaborate closely with our distributed team to drive continuous improvement in our systems and processes
  • Mentor junior team members and contribute to building a culture of engineering excellence
  • Collaborate closely with developers to troubleshoot and resolve incidents, ensuring quick resolution and knowledge sharing between teams
  • Foster a DevOps culture by bridging the gap between development and operations, promoting shared responsibility for system reliability

Required skills/experience:

  • Proven experience (5+ years) in DevOps or Site Reliability Engineering roles
  • Strong knowledge of Unix operating systems and hardware
  • Extensive experience with telecom systems and protocols
  • Advanced proficiency in database management, particularly MariaDB
  • Strong proficiency in SQL and experience with advanced database querying techniques
  • Expertise in log analysis and the ability to extract meaningful insights from large volumes of log data
  • Experience with data visualisation tools to effectively communicate findings from data analysis
  • Familiarity with scripting languages (e.g., Python, Bash) for automating data analysis tasks
  • Expertise in DevOps tools and practices, including GitHub Actions and Ansible
  • Demonstrated ability to work with and improve complex system architectures
  • Strong analytical skills for processing and interpreting large volumes of transaction data
  • Excellent troubleshooting and problem-solving skills, especially in high-pressure situations
  • Ability to work effectively in a remote environment with minimal supervision
  • Strong communication skills for coordinating with team members and customers
  • Proficiency in creating and maintaining system architecture documentation
  • Solid understanding of high-availability concepts and implementation in telecom environments
  • Experience with clustering technologies and redundancy strategies
  • Knowledge of disaster recovery planning and implementation
  • Strong collaborative skills, with experience working closely with development teams to resolve complex issues
  • Understanding of software development processes and ability to read and understand code for troubleshooting purposes
  • Experience in fostering a DevOps culture and promoting cross-team collaboration

Preferred skills/experience:

  • Experience working with systems handling millions of daily transactions
  • Background in fintech or telecom industries
  • Familiarity with USSD and smart app technologies
  • Knowledge of software deployment best practices in diverse environments
  • Experience with automated testing and continuous integration/deployment (CI/CD) pipelines
  • Background in customer support or technical account management for critical systems
  • Certifications related to high-availability systems or disaster recovery (e.g., CDCP, CBCP)
  • Experience in roles that bridged development and operations teams
  • Experience with big data technologies and distributed system analysis
  • ITIL v3 or v4 certification

What We Offer:

  • Opportunity to work on challenging projects that directly impact millions of users in emerging markets
  • Remote work environment that values work-life balance and independent problem-solving
  • Chance to be a key player in a dynamic team, with significant opportunity for individual impact and growth
  • Competitive compensation package, tailored to your location and experience
  • Exposure to cutting-edge technologies in the mobile services industry
  • Potential for rapid career advancement as you help drive our company’s growth and evolution
  • Flexible working hours to accommodate different time zones and operational needs
  • Regular team-building activities and virtual events to foster connections among remote team members

Our Values:

  • Customer-Centric Approach: We passionately serve our customers by delivering innovative solutions that address complex challenges and create lasting value.
  • Engineering Elegance: We believe in purposeful design, intuitive usability, refined simplicity, and maintainability in all our solutions, even in the face of complex system landscapes.
  • Continuous Improvement: We’re committed to constantly enhancing our systems, processes, and skills to stay at the forefront of our industry.

How to Apply:

If you’re excited by the challenge of optimising critical systems and driving technological advancement in emerging markets, we want to hear from you. Please submit your resume, a brief cover letter explaining your interest in the role and how you’ve tackled complex system challenges in the past, and any relevant portfolio or project examples to careers@concurrent.systems.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.