Site Reliability Engineer (Senior)

Details of the offer

The next generation of our digital products is delivering engaging, adaptive, and personalized learning experiences to optimally support every student. We are hiring a Senior Site Reliability Engineer who will work with system and software engineers to build reliable, high-capacity, and high-performance systems in support of our mission to reimagine learning for millions of students and learners worldwide.

We aim to break down walls between development and operations, and to participate in finding and building solutions that enable teams to deliver software updates in a way that is highly stable and operationally sound. We are strongly invested in the AWS Cloud, infrastructure-as-code, and monitoring-as-code. We favor the practical and pragmatic over the ideal, including finding right-sized solutions. We are anticipatory and forward-looking, reliable, and have a bias toward taking action. We understand that without our customers our efforts are worthless, and that operational changes are likely to have a direct impact on user experience. We understand that uptime is paramount, and we work backwards from there.

Essential Accountabilities:

Leadership:

  • Listening to the needs of our teams, learning how they work best, and delivering solutions.
  • Collaborating with product teams and technical leads to prioritize our efforts.
  • Staying current on industry trends; conceiving and presenting to management ways to improve current practices, improve our standing in the marketplace, and remain on the cutting edge of technology.
  • Taking ownership of a project, driving it forward, “selling” it to other teams inside the company as a solution for a given problem, and working with teams to drive adoption.
  • Taking the initiative when you see an opportunity to solve a problem or otherwise make something better.
  • Mentoring team members; fostering growth by setting high-reaching goals and providing support as needed to achieve them.

Technical:

  • Hands-on design, understanding, and troubleshooting of highly distributed, large-scale production systems (both modern and legacy, monolithic and micro).
  • Co-ownership with the development teams over reliability, uptime, capacity, and performance.
  • Ensuring the repeatability, traceability, and transparency of our infrastructure automation.
  • Identifying highest-impact opportunities to optimize existing systems; ensuring “right-sized” and cost-optimized solutions in consideration of technical and business constraints.
  • System design consulting for teams seeking to leverage or improve their production infrastructure.
  • Anticipating, building, and planning capacity for upcoming product/feature launches.
  • Working with application teams and product principals to fully operationalize software/systems projects (including security requirements).
  • Being part of an on-call rotation spread amongst the rest of the team. (The better we do at the things above, the quieter the rotation is!)

Required:

  • MHE is a polyglot organization. Being “conversational” in JavaScript/TypeScript, Node, Python, PHP, Ruby, Golang, Java, Bash, Markdown, reStructuredText, HCL, JSON, YAML, and TOML would be valuable. Must be fluent in 2-3 of them.
  • Must have the skills of a senior (or higher) level software application engineer.
  • Must have the skills of a senior (or higher) level cloud operations engineer.
  • Ability to translate knowledge and ideas into written documentation and 1-pagers.
  • Excellent presentation and communication skills.
  • Mastery of AWS services (IAM, EC2, S3, EBS/EFS, ELB/ALB, AutoScaling, RDS and replication techniques, VPC, Subnets, Elastic IP, Route53, CloudWatch, CloudFront, Lambda, CloudFormation, ECS, SNS, ElastiCache).
  • Expertise in container/container-fleet-orchestration technologies (Kubernetes, ECS, Docker).
  • Expertise integrating continuous-integration and continuous-delivery software development lifecycles (i.e., CI/CD) into one or more applications (using Jenkins, CircleCI, Travis CI, or other modern CI tools).
  • Expertise in infrastructure automation technologies (e.g., Terraform, CloudFormation).
  • Expertise with Lean/Agile deployment processes (e.g., blue/green, zero downtime, canary, and DNS strategies).
  • Significant experience troubleshooting interactions among concurrent and distributed systems.
  • Cloud database operations and deployment experience (e.g., RDS MySQL/Postgres/Aurora), and caching operations and deployments (e.g., Memcached, Redis).
  • Ability to design and manage escalation response plans, from monitoring, to reaction/response/remediation, to retrospection/post-mortem, in culturally aligned (proactive, customer-focused, collaborative, proven-with-data) ways.
  • Familiarity with site and infrastructure monitoring systems (e.g., CloudWatch, Datadog, New Relic, Sumo Logic, ThousandEyes).
  • Cloud and container-native Linux administration/build/management skills (e.g., AMIs, Packer).
  • Strong problem-solving, root cause understanding, and systems engineering skills.
  • Expertise with software development lifecycle branching and distributed source code management systems (e.g., Git/Mercurial, Git-Flow, GitHub-Flow).
  • B.S. degree in Computer Science (or related technical field, or equivalent industry experience).
  • A non-trivial background in open source is a HUGE plus.

Source: Jobs4It


