Site Reliability Engineer (SRE)


Details of the offer

Summary
Posted: Apr 22, 2020
Weekly Hours: 40
Role Number: 200166618
We are establishing a centralized team to enable third-party cloud access and adoption. You will partner with developers, system and site reliability engineers, and adopters to understand their challenges, work through their issues and provide solutions that can be adopted widely. The ideal candidate has a proven track record, sound technical knowledge and skills in delivering large-scale, complex software solutions deployed on third-party clouds (e.g. AWS, GCP).
This role will be responsible for designing, building, running, and monitoring public & private cloud infrastructure to support a variety of mission-critical services. This is a highly technical, hands-on role that requires expertise in supporting systems at enterprise scale. The candidate will deliver innovative solutions in key areas:
Engineering - Continuously optimize secure, scalable and performant security tools and services
Reliability - Drive fault detection and correction, performance and uptime at global scale
Monitoring - Instrument systems to gain visibility and understanding of how they are performing at any time
Automation and orchestration to enable:
- Accelerated infrastructure, application and software configuration deployment
- Automated response to alerts or indicators of performance issues
- Infrastructure as code
Key Qualifications
5+ years of managing services in a distributed, mission-critical *nix environment
Experience supporting infrastructure and services in public and private cloud environments
Expertise with monitoring or log aggregation tools (Prometheus, Splunk, ELK, etc.)
Experience building and supporting containerized application technologies, including Docker and Kubernetes
Familiarity with CI/CD tools and deployment processes
Working knowledge of network protocols and network-based services, including routing and network load balancing
Experience with failure testing and chaos engineering
Experience with virtualization technologies
Solid understanding of Linux/Unix system internals, including kernel tuning
Solid understanding of storage systems, including network filesystems
Proficient in programming languages and build tooling such as Python, Java, Ruby, Perl, Go and Make, for building automation or integrating with APIs
Solid understanding of and experience with centralized configuration management, coordination and provisioning technologies, such as Ansible, Chef, Puppet, etc.
Excellent communication skills; must be capable of working with cross-functional technical and business teams and varying levels of management
Experience implementing and working with open source projects
Understanding of Agile methodologies such as Scrum, and the ability to work in a fast-paced environment
Strong project management skills, including excellent presentation skills
Must be capable of writing detailed solution specifications, diagrams, best practices/standards documentation, operating procedures, test plans/test reports, etc.
Be part of an engineering team scaling the core cloud platform for thousands of applications in a secure manner, focusing on automation of operations and high availability.
You will be a proactive, hands-on engineer who automates operations, improves platform features and observability, and acts as a strategist in key design and architecture initiatives, drawing on broad knowledge of cloud and operational excellence.
Responsibilities include:
- Build, engineer and support cloud platform IaaS and PaaS services
- Partner with application teams to provision scalable workloads reliably across distributed compute resources
- Provide engineering and operational support for distributed systems and network-based information security tools, including configuration management and provisioning
- Implement and maintain security controls
- Work closely with development teams to understand application performance and behavior patterns to proactively monitor, tune and correct issues before they occur
- Identify opportunities to improve the reliability, performance and security of security tooling
- Develop tools and automation to eliminate manual and repetitive efforts
Education & Experience
Bachelor of Science in Computer Science or equivalent experience (4+ years)

Source: Jobs4It



