- Entry level
- No Education
- Salary to negotiate
- South San Francisco
Scientific Computing Operations Expert
As part of the Global Operations team, we are looking for a full-time infrastructure expert based in South San Francisco for our Scientific Computing Infrastructure environment. Your mission will be to provide multi-site operations support for the scientific infrastructure environments that give our Research partners the agility and performance they need. This includes installing, configuring, administering, monitoring and optimizing our High Performance Computing (HPC) environment and related infrastructure components across the organization in a timely, cost-effective and efficient manner, as well as managing engagements with the external services that support this environment.
Take full responsibility for the operational support of the current scientific infrastructure and provide practical input into its evolution.
Ensure the environments are installed, configured and operated to deliver the performance and agility required by the diverse applications they support (several hundred in total).
Contribute to the concept, planning and execution of projects.
Provide in-depth technical skills and experience to enhance the overall capability of the team.
Act as the point of escalation for managed services and for junior team members, working with them to resolve complex incidents and their underlying problems.
Collaborate in solution design using an agile approach as required to meet specific business objectives.
Create and implement automated solutions, and manage the implementation of major change initiatives with institution-wide impact.
Participate in shaping the long-term strategy.
Provide guidance to others on ways to increase their contribution to the mission, objectives and values of the organization.
Bachelor’s degree in Computer Science or equivalent work experience.
Experience with ITSM and/or Agile/DevOps methodologies.
Technical skills required:
10+ years of Scientific Computing Operations experience, including managing a parallel file system (e.g. GPFS, Lustre).
Experience integrating centralized identity management (AD, LDAP, Centrify) with all scientific infrastructure and overseeing its use.
Experienced in working with Linux engineers to resolve performance issues through kernel tuning and optimization of kernel extensions for scientific infrastructure.
Demonstrated ability to partner with network engineers to troubleshoot and resolve inter-device network performance issues.
Strong Linux administration knowledge (e.g. RH/CentOS).
Technical operational skills, such as troubleshooting, capacity planning and root cause analysis.
Familiar with multiple computational devices (blades, SMP, GPU, etc.) and experienced in optimizing their configurations to support diverse software stacks.
Support the health, patching and maintenance of environments utilizing Compute Cluster Management (e.g. Bright CM).
Demonstrated experience in managing the operations of a workload management scheduler (e.g. SLURM, UGE, Torque, LSF).
Experience in environments using tiered storage and data lifecycle management, including object storage (e.g. NetApp StorageGRID) and data transfer (e.g. AFM with GPFS).
Familiarity with the integration and operation of cloud-based scientific computing resources in a hybrid infrastructure model.
Responsible for hardware vendor management, including SLAs and delivery quality.
Scripting experience: Bash, PowerShell, Perl, Ruby, Python.
DevOps approach: IaC configuration management (e.g. Ansible, Puppet), automated build/release, test and deployment (e.g. Jenkins, Git), and testing frameworks (e.g. pytest, Testinfra).
Monitoring tools/frameworks (e.g. Grafana, Ganglia, ELK, Nagios, Zabbix).
Virtualization and containerization knowledge (e.g. Docker, Mesos-Marathon, Kubernetes, Singularity).
Understanding of Computer Systems Validation and ITIL concepts is desirable.
Excellent customer orientation and delivery focus with a good end-user perspective. Willing and able to engage in detailed conversations with developers and customers to reach agreements and solve problems.
Proactively supports peers in own and other functions. Cares about people and can mentor others.
Collaboration. Provides relevant and timely communication to teams.
Lead people and get people thinking together about solving problems.
Demonstrated experience in
About the company
Roche is a Swiss global health-care company that operates worldwide under two divisions: Pharmaceuticals and Diagnostics. Its holding company, Roche Holding AG, has bearer shares listed on the SIX Swiss Exchange.
The company headquarters are located in Basel, and the company has many pharmaceutical and diagnostic sites around the world, including: Tucson, Arizona; Pleasanton, Vacaville and Oceanside, California; Branchburg, New Jersey; Indianapolis, Indiana; Florence, South Carolina; and Ponce, Puerto Rico in the US; Welwyn Garden City and Burgess Hill in the UK; Clarecastle in Ireland; Mannheim and Penzberg in Germany; Mississauga and Laval in Canada; Shanghai in China; Mumbai and Hyderabad in India; São Paulo and Rio de Janeiro in Brazil; Segrate, Milan in Italy; Johannesburg in South Africa; and Karachi, Islamabad and Lahore in Pakistan. There are 26 manufacturing sites worldwide.