Your Role
Serve as Subject Matter Expert (SME) for distributed applications on hybrid cloud platforms, documenting best practices and providing guidance to peers.
Champion continuous operational improvements informed by metrics analysis and customer feedback.
Lead incident management, troubleshooting, and response coordination, and conduct comprehensive post-incident reviews.
Clearly communicate complex technical issues to development teams, document root causes, and collaborate internally to create robust solutions.
Manage, deploy, and maintain enterprise applications and cloud-based systems using secure, scalable, and reliable frameworks.
Proactively monitor, troubleshoot, and optimize the health, performance, and reliability of applications and platforms.
Perform detailed log analysis and utilize stack traces to debug and resolve issues reported by partners and end-users.
Develop comprehensive documentation covering operational procedures, system configurations, and environment setups.
Continuously identify and implement automation opportunities to reduce manual tasks and operational overhead.
Train junior engineers across your areas of expertise.
Participate in a 24x7 shift rotation.
Your Qualifications
Bachelor's degree in Information Technology, Engineering, or a related technical field.
Minimum 5 years of experience supporting critical, high-availability production systems with a focus on automation, reliability, and operational excellence.
5+ years of hands-on experience with at least 1-2 tools per domain:
Linux Administration & Troubleshooting: RHEL, CentOS, Ubuntu, or similar Unix-based OS.
Distributed Applications: Microservices architecture and distributed application support.
Logging & Monitoring: Splunk, Grafana, Prometheus.
Incident Management: PagerDuty, ServiceNow.
Version Control: Git, GitHub, GitLab.
Plus points if you have:
Certifications such as CKA, CKAD, or cloud certifications (AWS, Azure, GCP).
Experience supporting and maintaining PaaS environments, CDNs, Messaging Queues, API Gateways, and Proxies in scalable, resilient architectures.
Proven success in cross‑functional collaboration within modern DevOps environments.
Ability to drive operational efficiency through automation, using Bash, Python, or similar scripting languages.