HPC Systems Admin

Location: Boston
Date Posted: 06/08/2022
Employment Type: Contract
Job ID: 12275
Description: 

JOB DESCRIPTION

HPC Systems Administrator

Skills

HPC

  • Manage multi-vendor filesystems such as XFS and GPFS, including upgrades, patching, space management, GPFS cluster management, and diagnostics
  • Manage the workload schedulers Torque and Slurm, including upgrades, patches, diagnostics, user resource consumption, and configuration
  • Manage Bright cluster management software including creating and maintaining images, managing job schedules and upgrades/patching
  • Installation and configuration of hardware, operating systems, and commercial software packages
  • Managing user accounts, tuning system performance, installing system-wide software, and allocating mass storage space

 

Systems Administration

  • Advanced RHEL systems administration including hardware set up, upgrades, patching
  • Remediation of vulnerabilities
  • Performance tuning and server hardening
  • Disk space management
  • Diagnostics (slowness, nodes down, etc.)

 

Software experience

  • Torque
  • Slurm
  • Bright
  • Moab
  • MATLAB

 

Experience working with containers (Docker, Singularity, Podman, Kubernetes) a plus

Experience working with Git and supporting CI/CD pipelines a plus

 

Job Responsibilities

 

  • Installation, configuration, fine-tuning, and troubleshooting multi-vendor Linux HPC servers
  • Building and deploying open source software and software from vendors/partners
  • Diagnosing and resolving system operational problems quickly and effectively
  • Verifying full operation of systems including network, systems and storage performance
  • Configuration of the scheduling and queuing system
  • Troubleshooting and maintaining InfiniBand and Ethernet networks
  • Understanding, maintaining, and supporting high-performance parallel storage systems
  • Assisting users and research teams running applications on the HPC cluster
  • Managing, maintaining, monitoring, and controlling interactive and batch processes (scheduled and unscheduled)

 

Requirements

  • Expert knowledge of HPC server hardware including HP, Dell
  • Expert knowledge of CentOS and Red Hat
  • Expert knowledge of parallel distributed file systems such as IBM GPFS
  • Advanced knowledge of cluster storage systems including Isilon
  • Advanced knowledge of the Linux operating system, including kernel compilation, boot-time command-line options, SELinux, rpm, and yum
  • Advanced proficiency with NIS, NFS, autofs, TCP/IP, Linux network configuration, local storage, lm_sensors, and IPMI required
  • Intermediate knowledge of HPC resource managers such as PBS, Torque, and Moab
  • Intermediate level of knowledge with Bash, Perl, PHP, awk, sed, grep, HTML
  • Intermediate skill with scripting tools and applying them to build solutions
  • Ability to provide day-to-day 24x7 support and participate in an on-call rotation
  • Must be able to lift and move 40 lbs

Hybrid preferred, but open to remote