Eviden, part of the Atos Group, with annual revenue of circa €5 billion, is a global leader in data-driven, trusted and sustainable digital transformation. As a next-generation digital business with worldwide leading positions in digital, cloud, data, advanced computing and security, it brings deep expertise for all industries in more than 47 countries. By uniting unique high-end technologies across the full digital continuum with 47,000 world-class talents, Eviden expands the possibilities of data and technology, now and for generations to come.
HPC Support Engineer:
A High-Performance Computing (HPC) support engineer plays a vital role in maintaining and optimizing computing environments, which are used by research institutions, industries, and organizations for tasks that require significant computational power, such as scientific simulations, large-scale data analysis, machine learning, and engineering computations.
Role Expectations:
HPC systems are often clusters of interconnected servers. The engineer is responsible for the administration of these clusters, which includes installation, configuration, and maintenance of hardware and software.
Linux is the dominant OS in HPC environments. The engineer ensures that the OS is updated, secure, and optimized for high-performance workloads.
Deploy, configure, and maintain parallel file systems (e.g., Lustre, GPFS/Spectrum Scale).
Manage NAS, SAN, and object storage solutions (e.g., Ceph, ZFS, NetApp, Dell EMC Isilon).
Handle RAID configurations and LVM (Logical Volume Manager) setups.
Investigate and resolve disk failures, network congestion, and hardware faults.
Analyze logs from storage controllers, RAID arrays, and filesystems.
Set up snapshots, replication, and erasure coding for data redundancy.
Interact with SMC (Smart Management Center), the foundation for hosting the infrastructure and application microservices dedicated to managing an HPC supercomputer.
Support and maintain technology standards, processes, and policies related to the on-premises and cloud infrastructure in scope.
Produce and maintain appropriate documentation and diagrams describing system setups and overall inventory.
Capabilities and Expertise:
System Administration: Red Hat expertise.
Storage Management: familiarity with large-scale storage systems such as GPFS, Lustre, or NFS, and the ability to troubleshoot file system issues.
Proficiency in hardware diagnostics.
Familiarity with high-availability (HA) storage solutions.
Experience with backup and restore solutions for petabyte-scale data.
Familiarity with firewall rules for securing storage nodes.
Scripting Proficiency: ability to use scripting languages such as Bash, Python, or Perl to automate routine tasks such as storage monitoring and job submission.
Nice to have:
Knowledge and understanding of advanced supercomputing platforms (e.g., Cray, IBM Blue Gene).
IBM Certified Administrator - Spectrum Scale (GPFS) certification; GPFS expertise.
Understanding of how HPC storage is evolving in exascale computing.
What we offer:
Training and Certifications: Access to continuous learning and career development opportunities.
Flexible working environment.
Competitive salary and benefits package.
Reimbursement: Get a yearly fixed amount for reimbursement.
Performance Bonus: Earn an annual performance bonus based on your achievements.
Career Advancement: Explore numerous opportunities for professional growth and career advancement.
Extra Vacation Days: Take advantage of additional vacation days to relax and recharge.
Let’s grow together.