Chicago, IL, USA
HPC System Administrator

Department

Provost Research Computing Center


About the Department

The University of Chicago Research Computing Center (RCC), a unit in the Office of Research, provides high-end research computing resources to researchers at the University of Chicago. It is dedicated to enabling research by providing access to centrally managed High-Performance Computing (HPC), storage, and visualization resources. These resources include hardware, software, high-level scientific and technical user support, and the education and training required to help researchers make full use of modern HPC technology and local and national supercomputing resources. The Office of Research oversees the conduct of sponsored research, research program development, and contract management functions.


Job Summary

This role participates in designing automated, scalable, and rapidly deployable solutions for systems infrastructure and server configuration. The administrator installs, configures, and maintains operating systems, monitoring and alerting systems, utility software, and firewalls, and plans and executes hands-on maintenance of production servers, including both Windows and Linux servers.

The University of Chicago Research Computing Center (RCC) is seeking a skilled HPC System Administrator to join its Systems and Operations Team. This position will support the deployment, maintenance, and automation of RCC's HPC systems, including CPU/GPU clusters, storage, and networking infrastructure. The HPC System Administrator will assist in system-level administration, troubleshooting, performance tuning, and automation while collaborating with faculty and researchers to enable cutting-edge computational science.

This is a hybrid position requiring 3 days onsite.

Responsibilities

Administer, install, monitor, and maintain HPC systems, including compute nodes, storage, networking, and software stacks.

Develop and maintain automation tools for system provisioning, configuration management, and monitoring.

Assist in the implementation and management of distributed file systems (e.g., Lustre, BeeGFS, GPFS).

Install, configure, and optimize job scheduling and resource management tools (e.g., Slurm, LSF, PBS).

Assist with system security and patch management, and troubleshoot operational issues.

Contribute to performance benchmarking, system tuning, and capacity planning.

Deploy and maintain commonly used HPC applications and software stacks.

Document system administration procedures and contribute to knowledge-sharing initiatives.

Support researchers by providing technical expertise and resolving escalated support tickets.

Participate in vendor coordination, system procurement, and hardware/software lifecycle management.

Install, configure, and maintain workstation and server operating systems. Perform software installations and upgrades of operating systems and layered software packages. Monitor and tune systems to achieve optimal performance, developing higher-level skills in the process.

Maintain comprehensive supporting documentation for operating system, hardware, and software configurations. Monitor and serve as a primary responder for information technology security incidents and violations. Keep current with new security and network monitoring technologies, applicable laws, and regulations.

Perform other related work as needed.


Minimum Qualifications

Education:

Minimum requirements include a college or university degree in a related field.


Work Experience:

Minimum requirements include knowledge and skills developed through 2-5 years of work experience in a related job discipline.


Certifications:

---

Preferred Qualifications

Technical Skills or Knowledge:

Experience administering Linux-based HPC clusters, including job schedulers (e.g., Slurm, LSF, PBS).

Familiarity with high-speed networking (e.g., InfiniBand, Ethernet).

Scripting/programming skills (Python, Bash, or Perl).

Experience configuring, installing, and troubleshooting MPI and OpenMP applications.

Experience configuring, installing, tuning, and maintaining scientific applications on large-scale systems.

Experience with system automation tools (e.g., Ansible, Puppet).

Experience with system provisioning tools (e.g., xCAT, Confluent, Warewulf).

Knowledge of distributed storage systems (e.g., Lustre, BeeGFS, GPFS).

Experience with containerization (e.g., Docker, Singularity, Apptainer).

Experience configuring, installing, maintaining, and/or using infrastructure and performance monitoring and optimization tools (e.g., Checkmk, Grafana, Prometheus, Icinga).

Experience in setting up and executing benchmarks in an HPC environment and analyzing their results systematically.

Preferred Competencies

Ability to work well with faculty and researchers.

Ability to identify and gain expertise in appropriate new technologies and/or software tools.

Ability to understand and translate researchers’ scientific goals into technical requirements.

Ability to work as part of a collaborative team while demonstrating initiative to achieve project goals and the Research Computing Center's mission.

Strong analytical and problem-solving skills, and attention to detail.

Application Documents

Resume (required)

Cover letter (preferred)


When applying, the document(s) MUST be uploaded via the My Experience page, in the section titled Application Documents of the application.


Job Family

Information Technology


Role Impact

Individual Contributor


Scheduled Weekly Hours

37.5


Drug Test Required

No


Health Screen Required

No


Motor Vehicle Record Inquiry Required

No


Pay Rate Type

Salary


FLSA Status

Exempt


Pay Range

$85,750.00 - $109,500.00

The included pay rate or range represents the University’s good faith estimate of the possible compensation offer for this role at the time of posting.


Benefits Eligible

Yes

The University of Chicago offers a wide range of benefits programs and resources for eligible employees, including health, retirement, and paid time off. Information about the benefit offerings can be found in the Benefits Guidebook.


Posting Statement
 

The University of Chicago is an Affirmative Action/Equal Opportunity/Disabled/Veterans Employer and does not discriminate on the basis of race, color, religion, sex, sexual orientation, gender, gender identity, national or ethnic origin, age, status as an individual with a disability, military or veteran status, genetic information, or other protected classes under the law. For additional information please see the University's Notice of Nondiscrimination.

 

Staff job seekers in need of a reasonable accommodation to complete the application process should call 773-702-5800 or submit a request via the Applicant Inquiry Form.

 

We seek a diverse pool of applicants who wish to join an academic community that places the highest value on rigorous inquiry and encourages a diversity of perspectives, experiences, groups of individuals, and ideas to inform and stimulate intellectual challenge, engagement, and exchange.

 

All offers of employment are contingent upon a background check that includes a review of conviction history. A conviction does not automatically preclude University employment. Rather, the University considers conviction information on a case-by-case basis and assesses the nature of the offense, the circumstances surrounding it, the proximity in time of the conviction, and its relevance to the position.

 

The University of Chicago's Annual Security & Fire Safety Report (Report) provides information about University offices and programs that provide safety support, crime and fire statistics, emergency response and communications plans, and other policies and information. The Report can be accessed online at: http://securityreport.uchicago.edu. Paper copies of the Report are available, upon request, from the University of Chicago Police Department, 850 E. 61st Street, Chicago, IL 60637.
