Site Reliability Engineer (Senior)

Details of the offer

The next generation of our digital products is delivering engaging, adaptive, and personalized learning experiences to optimally support every student. We are hiring a Senior Site Reliability Engineer who will work with system and software engineers to build reliable, high-capacity, and high-performance systems in support of our mission to reimagine learning for millions of students and learners worldwide.

We aim to break down walls between development and operations, and to participate in finding and building solutions that enable teams to deliver software updates in a way that is highly stable and operationally sound. We are strongly invested in the AWS Cloud, infrastructure-as-code, and monitoring-as-code. We favor the practical and pragmatic over the ideal, including finding right-sized solutions. We are anticipatory and forward-looking, reliable, and have a bias toward taking action. We understand that without our customers our efforts are worthless, and that operational changes are likely to have a direct impact on user experience. We understand that uptime is paramount, and we work backwards from there.

Essential Accountabilities:

Leadership:

  • Listening to the needs of our teams, learning how they work best, and delivering solutions.
  • The ability to collaborate with product teams and technical leads to prioritize our efforts.
  • Stay current on industry trends; conceive and present to management ways to improve current practices, improve our standing in the marketplace, and remain on the cutting edge of technology.
  • Ability to take ownership of a project, drive it forward, "sell" it to other teams inside the company as a solution for a given problem, and work with teams to drive adoption.
  • If you see an opportunity to solve a problem or otherwise make something better, take the initiative.
  • Mentor team members; foster growth by setting high-reaching goals and providing support as needed to achieve them.

Technical:

  • Hands-on design, understanding, and troubleshooting of highly distributed, large-scale production systems, both modern and legacy, monolithic and micro.
  • Co-ownership with the development teams over reliability, uptime, capacity, and performance.
  • Ensuring the repeatability, traceability, and transparency of our infrastructure automation.
  • Identifying highest-impact opportunities to optimize existing systems; ensuring "right-sized" and cost-optimized solutions in consideration of technical and business constraints.
  • System design consulting for teams seeking to leverage or improve their production infrastructure.
  • Anticipate, build, and plan capacity for upcoming product/feature launches.
  • Working with application teams and product principals to fully operationalize software/systems projects (including security requirements).
  • Being part of an on-call rotation spread amongst the rest of the team. (The better we do at the things above, the quieter the rotation is!)

Required:

  • MHE is a polyglot organization. Being "conversational" in JavaScript/TypeScript, Node, Python, PHP, Ruby, Golang, Java, Bash, Markdown, reStructuredText, HCL, JSON, YAML, and TOML would be valuable. Must be fluent in 2-3 of them.
  • Must have the skills of a senior (or higher) level software application engineer.
  • Must have the skills of a senior (or higher) level cloud operations engineer.
  • Ability to translate knowledge and ideas into the written word as documentation/1-pagers.
  • Excellent presentation and communication skills.
  • Mastery of AWS services (IAM, EC2, S3, EBS/EFS, ELB/ALB, AutoScaling, RDS and replication techniques, VPC, Subnets, Elastic IP, Route53, CloudWatch, CloudFront, Lambda, CloudFormation, ECS, SNS, ElastiCache).
  • Expertise in container/container-fleet-orchestration technologies (Kubernetes, ECS, Docker).
  • Expertise integrating continuous-integration and continuous-delivery software development lifecycles (i.e., CI/CD) into one or more applications (using Jenkins, CircleCI, Travis CI, or other modern CI tools).
  • Expertise in infrastructure automation technologies (e.g., Terraform, CloudFormation).
  • Expertise with Lean/Agile deployment processes (e.g., blue/green, zero downtime, canary, and DNS strategies).
  • Significant experience troubleshooting interactions among concurrent and distributed systems.
  • Cloud database operations and deployment experience (e.g., RDS MySQL/Postgres/Aurora), caching operations and deployments (e.g., Memcache, Redis).
  • Ability to design and manage escalation response plans, from monitoring, to reaction/response/remediation, to retrospection/post-mortem, in culturally aligned (proactive, customer-focused, collaborative, proven-with-data) ways.
  • Familiarity with site and infrastructure monitoring systems (e.g., CloudWatch, Datadog, New Relic, Sumo Logic, ThousandEyes).
  • Cloud and container-native Linux administration/build/management skills (e.g., AMIs, Packer).
  • Strong problem-solving, root cause understanding, and systems engineering skills.
  • Expertise with software development lifecycle branching and distributed source code management systems (e.g., Git/Mercurial, Git-Flow, GitHub-Flow).
  • B.S. Degree in Computer Science (or related technical field, or equivalent industry experience).
  • A non-trivial background in open source is a HUGE plus.

Source: Jobs4It


  • IT - Information Technology / Programmer



  • DNS
  • Linux
