Job Description
AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI / ML, and our people-first culture has earned us multiple Best Place to Work awards.
If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you!
WHAT YOU WILL DO
- Shift: Monday – Thursday, 8 AM – 7 PM PST (11 AM – 10 PM EST), with rotating on-call;
- Manage alerts daily, check systems, and escalate issues as needed;
- Be part of a team that provides 24×7 on-call support for critical SaaS events;
- Be available in case of emergencies when team members are not available or need help;
- Document issues and remediation steps;
- Proactively create appropriate monitors in the EKS / K8s ecosystem;
- Deploy to EKS / K8s clusters using Terraform and Helm;
- Learn and maintain existing infrastructure running under Docker Swarm;
- Improve existing infrastructure health by implementing checks and scripts to correct known issues;
- Maintain and develop deployment code;
- Automate manual tasks;
- Implement / integrate new technologies in our Cloud Infrastructure;
- Collaborate with other teams and departments to provide the highest level of support and assistance;
- Keep the customer front of mind when planning deployments / updates, and consider the impact on them before making changes;
- Work closely on solutions with Support, Customer Success, Migration, and Professional Services teams to provide best-in-class SaaS service to our customers;
- Perform RCA and take necessary corrective actions to prevent the recurrence of issues;
- Create and assign alert-related actions to the appropriate team after the investigation;
- Handle support requests for environment-specific actions;
- Identify and provide automation requirements to improve RCA.
MUST HAVES
- 2+ years of professional experience;
- Experience working with Datadog;
- Hands-on experience as an AWS Cloud Engineer;
- Working knowledge of EKS / Terraform / Helm;
- Working experience with Docker and Docker Swarm;
- Good understanding of AWS IAM roles and policies;
- Experience logging and monitoring AWS resources using CloudWatch Logs;
- Experience working in a Linux environment;
- Proficient in Bash and / or Python scripting;
- A strong understanding of web technologies such as REST APIs;
- Working experience with monitoring solutions, such as Grafana and Prometheus;
- Excellent oral and written communication skills;
- Customer-facing communication skills to effectively explain issues and RCAs to customers;
- Experience in Product / Application Support for SaaS-based products;
- Understanding of APIs, Databases, Systems Architecture, and Design;
- Experience designing, implementing, and operating in a DevSecOps environment;
- Ability to work independently as well as within a collaborative environment;
- A technical aptitude with the desire to learn new and evolving technologies;
- Upper-Intermediate English level.
NICE TO HAVES
- Experience
THE BENEFITS OF JOINING US
- Professional growth: Accelerate your professional journey with mentorship, TechTalks, and personalized growth roadmaps;
- Competitive compensation: We match your ever-growing skills, talent, and contributions with competitive USD-based compensation and budgets for education, fitness, and team activities;
- A selection of exciting projects: Join projects with modern solutions development and top-tier clients that include Fortune 500 enterprises and leading product brands;
- Flextime: Tailor your schedule for an optimal work-life balance, with the option of working from home or going to the office, whichever makes you happiest and most productive.
Your application doesn't end here! To unlock the next steps, check your email and complete your registration on our Applicant Site. An incomplete registration will result in the termination of your application process.