Kingsville, TX, 78363, USA
3 days ago
High-Performance Computing Systems Engineer
Job Title: High-Performance Computing Systems Engineer
Agency: Texas A&M University - Kingsville
Department: I Tech
Proposed Minimum Salary: Commensurate
Job Location: Kingsville, Texas
Job Type: Staff

Job Description

Job Summary
The High-Performance Computing (HPC) Engineer is a unique role that combines the design, development, and operational management of the institution's high-performance computing resources. This position offers the opportunity to work closely with faculty, researchers, and students, supporting their computational research projects and ensuring the HPC infrastructure meets their needs. The engineer will play a crucial role in optimizing computational methods and facilitating groundbreaking research across disciplines. The HPC Engineer is responsible for HPC cluster administration, unit coordination, maintenance of HPC systems, strategic planning for the University’s HPC infrastructure, and advanced technical support for users of HPC systems.

Essential Duties and Responsibilities

System Architecture and Design
+ Design and implement HPC infrastructure, including compute clusters, storage, and interconnects, to meet the computational needs of our research community.
+ Evaluate and integrate advancements in HPC, cloud, and storage technology to enhance performance.

System Administration and Maintenance
+ Manage and optimize HPC clusters, addressing hardware, software, and networking.
+ Perform system administration tasks on HPC clusters, including configuration, maintenance, and troubleshooting of hardware, software, and networking components.
+ Monitor performance, troubleshoot, and implement security measures.

User Support and Collaboration
+ Provide technical support and training for researchers on HPC tools and best practices.
+ Organize training sessions and workshops on HPC best practices, programming, and optimization techniques.
+ Collaborate with researchers on computational strategies and code optimization.

Strategic Planning
+ Represent the department in strategic planning and advisory roles.
+ Guide IT strategies to support teaching, research, and service goals.
+ Collaborate with and advise the CIO and other executive staff on the information technology needs of Texas A&M – Kingsville.
+ Establish information technology strategy, direction, and strategic plans to achieve the university’s teaching, learning, research, and service goals.

Software and Application Management
+ Deploy and maintain scientific software and development tools.
+ Develop scripts and tools to automate tasks and enhance workflows.
+ Must be fluent in multiple programming languages to meet campus needs.

Disaster Recovery and Continuity
+ Regularly review and document disaster recovery and business continuity procedures.
+ Assess HPC utilization, lifecycle, and performance for improvement opportunities.
+ Ensure alignment with campus and system policies and rules.
+ Design, test, and verify the disaster recovery plan to ensure continuity.

Research and Development Computing
+ Serve as lead administrator for campus HPC systems and document performance analyses.
+ Identify and implement solutions to advance computational research.

Data Management and Storage
+ Develop policies for data integrity, backup, and availability.
+ Design scalable storage solutions for efficient data access and integration.
+ Optimize scalable solutions for efficiency.

Networking and Collaboration
+ Build partnerships with industry, academic institutions, and HPC networks.

Training and Education
+ Create training programs and documentation to support organizational needs.
+ Communicate effectively across all organizational levels.

The above represents the major duties, responsibilities, and authorities of this job, and is not intended to be a complete list of all tasks and functions. Other duties may be assigned.
Additional Responsibilities
Other: 5%
+ May require availability to work some nights, weekends, and holidays.
+ Perform other duties as assigned.

Minimum Requirements
Education – Bachelor’s degree or an equivalent combination of education and experience
Experience – Six years of related experience

Preferred Requirements
Education – Master's degree in Computer or Computational Science, Statistics, or Engineering.
Experience:
+ Ten years or more of HPC experience, including hands-on system administration and management of large-scale supercomputing clusters at all levels; parallelization techniques; programming languages, tools, and techniques such as Fortran, C/C++, Java, or POSIX threads; and mass storage architecture and planning.
+ Five years of management and leadership experience in HPC or research computing centers.
+ Experience with computing clusters in Windows, Linux, and virtualized environments.
+ Experience in enhancing and maintaining the security of HPC resources.
+ Ability to evaluate and benchmark cluster architectures and their key subsystems (e.g., mass storage, interconnect, processor technology).
+ Knowledge of scripting languages such as Bash, Python, and Perl for maintaining HPC systems and scientific computing.
+ Knowledge of C/C++, Fortran, CUDA, OpenCL, OpenMP, and MPI for scientific computing.
+ Knowledge of configuration management tools such as Puppet, Chef, Ansible, and Salt.
+ Knowledge of container technologies such as Docker, Singularity, and Kubernetes.
+ Excellent troubleshooting skills, including quickly recognizing failure modes and corresponding symptoms.
+ Excellent communication skills.
+ Higher education experience.

Licensing / Professional Certifications:
+ Linux/UNIX certifications related to systems administration.
+ Certifications related to managing high-performance storage systems.

Supervision of Others
This position generally does not supervise employees.

All positions are security-sensitive.
Applicants are subject to a criminal history investigation, and employment is contingent upon the institution’s verification of credentials and/or other information required by the institution’s procedures, including the completion of the criminal history check. Equal Opportunity/Affirmative Action/Veterans/Disability Employer.