Job description

Requirements

  • Entry level
  • No Education
  • Salary to negotiate
  • South San Francisco

Description

Scientific Computing Operations Expert

South San Francisco, California, United States of America


Job ID: 201905-114017




The Position


Job Overview

As part of the Global Operations team, we are looking for a full-time infrastructure expert based in South San Francisco for our Scientific Computing Infrastructure environment. Your mission will be to provide multi-site operations support for scientific infrastructure environments, supporting our Research partners in their needs for agility and performance. This includes installing, configuring, administering, monitoring and optimizing our High Performance Computing environment and related infrastructure components across the organization in a timely, cost-effective and efficient manner, as well as managing the engagements with external services supporting this environment.


Key responsibilities:


- Take responsibility for the full operational support of the current scientific infrastructure and provide practical input into the evolution of the current environment.


- Ensure installation, configuration and operation of the environments to achieve the performance and agility required by the diverse applications supported (several hundred in total).


- Contribute to the concept, planning and execution of projects.


- Provide in-depth technical skills and experience to enhance the overall capability of the team.


- Serve as the point of escalation for managed services and for more junior members of the team, working with them to resolve complex incidents or associated underlying problems.


- Collaborate in solution design using an agile approach as required to meet specific business objectives.


- Create and implement automated solutions and manage the implementation of major change initiatives which will have institution-wide impact.


- Participate in shaping the long-term strategy.


- Provide guidance to others on ways of increasing their contribution to the mission, objectives, and values of the organization.


Qualifications/Requirements:


- Bachelor’s degree in Computer Science or equivalent work experience.


- Experience with ITSM and/or Agile/DevOps methodology.


- Technical skills required:


- 10+ years of Scientific Computing Operations experience. Experience managing a Parallel File System (e.g. GPFS, Lustre).


- Experience integrating and using centralized identity management (AD, LDAP, Centrify) and overseeing its integration with all scientific infrastructure.


- Experience working with Linux engineers to resolve performance issues through kernel tuning and optimizing kernel extensions for scientific infrastructure.


- Demonstrated ability to partner with network engineers to troubleshoot and resolve inter-device network performance issues.


- Strong Linux administration knowledge (e.g. RH/CentOS).


- Technical operational skills, such as troubleshooting, capacity planning and root cause analysis.


- Familiar with multiple computational devices (blades, SMP, GPU, etc.) and experienced in optimizing their configurations to support diverse software stacks.


- Experience supporting the health, patching and maintenance of environments using compute cluster management tools (e.g. Bright CM).


- Demonstrated experience in managing the operations of a workload management scheduler (e.g. SLURM, UGE, Torque, LSF).


- Experience in environments using tiered storage and data lifecycle management, including object storage (NetApp StorageGRID), and data transfer (AFM with GPFS).


- Familiarity with the integration and operation of cloud-based scientific computing resources in a hybrid infrastructure model.


- Experience with hardware vendor management, including SLAs and delivery quality.


- Scripting experience: Bash, PowerShell, Perl, Ruby, Python.


- DevOps approach: IaC and configuration management (e.g. Ansible, Puppet), automated build/release, test and deployment (e.g. Jenkins, Git) and testing frameworks (e.g. Pytest, testinfra).


- Monitoring tools/frameworks (e.g. Grafana, Ganglia, ELK, Nagios, Zabbix).


- Virtualization and containerization knowledge (e.g. Docker, Mesos-Marathon, Kubernetes, Singularity).


- Desired: understanding of Computer Systems Validation and ITIL concepts.


- Other Requirements:


- Excellent customer orientation and delivery focus with a good end-user perspective. Cares about, and can drill down into, conversations with developers or customers to reach agreements and solve problems.


- Proactively supports peers in own and other functions. Cares abou

About the company

Genentech is a leading biotechnology company that discovers, develops, manufactures and commercializes medicines to treat patients with serious or life-threatening medical conditions. We are among the world's leading biotech companies, with multiple products on the market and a promising development pipeline.
