Job Description
As a core member of the team, you will provide cloud operational support (including code-level fixes), own incident management, and continuously improve system reliability and operational excellence across production and non-production environments.
Working Hours: Mon-Fri
Working Location: Central
Salary Package: Up to $8,800 (basic) + AWS (Annual Wage Supplement)
Job Type: Contract
Key Responsibilities
- Monitor and analyse production and non-production environments using full-stack observability tools to ensure optimal performance, uptime, and user experience.
- Own incident management end-to-end: detect, triage, resolve incidents, conduct root cause analysis (RCA), coordinate across teams/vendors, and produce post-incident reports.
- Drive continuous improvement initiatives through data-driven insights in collaboration with product, development, and security teams.
- Build and maintain operations documentation, runbooks, and SOPs to support audit compliance and knowledge sharing.
- Automate repetitive operational and infrastructure tasks using Infrastructure-as-Code and scripting tools to reduce downtime and human error.
- Implement and enhance monitoring, alerting, and logging across application and infrastructure layers (APM).
- Manage day-to-day operational activities, produce performance and availability reports, and present insights to stakeholders and leadership.
- Lead and coordinate 24/7 operations support, working with internal teams and external vendors to meet SLAs.
Requirements
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- Minimum 3 years of experience in Operations Support, Site Reliability Engineering, DevOps, or similar roles.
- Hands-on experience providing L1–L3 support, including troubleshooting at application and infrastructure levels.
- Strong experience with incident, problem, and change management using ITSM tools (e.g. ServiceNow, Jira Service Management, PagerDuty).
- Experience implementing security controls and privileged access management for test and production environments.
- Proven experience in full-stack monitoring and observability, including cloud-native and open-source tools (e.g. CloudWatch, Stackdriver, Prometheus/Grafana, OpenTelemetry).
- Experience with automation and Infrastructure-as-Code (e.g. Terraform, Ansible, scripting).
- Familiarity with Agile/DevOps practices, CI/CD pipelines, test-driven development, and information security best practices.
- Experience managing cloud infrastructure and services (AWS, Azure, Google Cloud); cloud certifications are a plus.
- Strong problem-solving, analytical, and communication skills, with the ability to explain technical issues to non-technical stakeholders.
- A collaborative mindset, proactive attitude, and ability to thrive in a fast-paced, high-performance environment.
By submitting your resume, you consent to the collection, use, and disclosure of your personal information per ScienTec’s Privacy Policy (scientecconsulting.com/privacy-policy).
This authorizes us to:
Contact you about potential opportunities.
Delete personal data as it is not required at this application stage.
All applications will be processed in strict confidence. Only shortlisted candidates will be contacted.
Elane Yap Theng Yu - R1989397
ScienTec Consulting Pte Ltd - 11C5781