AI SRE Support - US Contract

Company:

Zortech Solutions


Details of the offer

Role: AI SRE Support
Location: NC/Remote/US
Duration: 6+ Months
Job Description:
Act as the production gatekeeper for all changes (product and infrastructure)
Perform detailed deep dives (root cause analysis) on recurring system issues and work with the engineering team on permanent solutions
Provide Tier 2 application/platform support for Optum AI applications
Participate in periodic on-call rotations and be available outside of normal business hours, including evenings and weekends, during critical production releases or issue escalation periods
The Site Reliability Engineer (SRE) is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning
This role will be a member of a team that focuses on DevOps, DevSecOps and SRE for the Optum AI Organization
The role drives continuous improvement in delivery of resilient, scalable, performant, secure, and high-quality cloud-native services
Collaborating with SecOps and development teams, the SRE identifies cross-team issues that create operational risk across the organization and resolves them through a mix of engineering, troubleshooting expertise, and general operational guidance
Proactively drive improvement of enterprise cloud capabilities while creating best practices and tools to empower developers to create, deploy, and operationally support services
As a key contributor in the organization, this role is responsible for working with the Principal SRE and guiding junior team members in DevOps culture, highly scalable architectures, and lean development using agile practices
Educate yourself and others on anything that helps service teams build, test, deploy, and run more reliable services more quickly and easily
Plan, design, deploy, and operate Site Reliability Engineering capabilities for cloud products & services
Recognize and address sub-standard performance based on key performance indicators (KPIs)
Build monitoring that alerts on symptoms rather than outages
Continuously build, automate, and improve upon capabilities that are secure, scalable, performant, and resilient
Work closely with Infrastructure, Network, Security, Architecture, and Development teams to build highly performing, scalable, and secure Azure/AWS/GCP (cloud) environments
Define needs by documenting processes, including research, planning, and writing supporting documentation
Participate in regulatory and compliance activities as necessary
Remediate security vulnerabilities discovered in non-production and production scans
Participate in new vendor/product/service onboarding and assess partner technical readiness (e.g., Azure AI Studio, Azure model catalog, AWS SageMaker)
Develop or maintain dashboards for operational analysis and status reports
Perform operational readiness testing for every release package to proactively predict performance degradation across all components of a critical asset (for example, portal, workspace creation, project creation, model inference, and API response times)
What does the ideal candidate's background look like (e.g., healthcare-specific background or other industry-specific experience)?
AI platform experience
Cloud experience
Strong SRE background (including CI/CD, automation, and observability)
What skills/attributes are required (please be detailed as to number of years of experience for each skill)?
8+ years in software or operations engineering
3+ years of DevOps and Site Reliability Engineering or similar experience with cloud-native solutions
Proven experience in DevOps culture and site reliability engineering focused on the customer, cross-functional autonomous teams, and continuous improvement
DevOps experience with a cloud-native web application hosted in one of the three major cloud platforms
Familiarity with version control systems (e.g., Git, Azure DevOps)
Extensive database and operating systems experience
Experience designing and implementing continuous integration/continuous delivery (CI/CD) pipelines
May interact with cloud vendors and service providers to resolve system-related problems
Experience in monitoring infrastructure, application uptime, latency, and performance on large distributed systems
Proficiency in troubleshooting various cloud and system-related issues
Demonstrable cross-functional knowledge with systems, storage, networking, security, and databases
Strong verbal and written communication skills with ability to effectively communicate at multiple levels in the organization


Source: Grabsjobs_Co


Built at: 2024-06-17T12:06:01.748Z