BELGIUM - System Engineer HPC

 

 

Cenaero, located in Gosselies (Belgium), is a private non-profit applied research center that provides companies engaged in technological innovation with numerical simulation methods and tools to invent and design more competitive products. Our ambition is to be internationally recognized as a technology leader in modeling and numerical simulation, a strategic partner of large global industries, and a real support to regional companies, including innovative SMEs.

Cenaero provides expertise and engineering services in multidisciplinary simulation, design, and optimization in the fields of mechanics (including fluid, structure, thermal, and acoustics) and electromagnetics, in the manufacturing of metallic and composite structures, and in the analysis of in-service behavior of complex systems and life prediction. It also provides software through its massively parallel multi-physics platform Argo and its design space exploration and optimization platform Minamo. Cenaero operates the Tier-1 Walloon supercomputing infrastructure, named Lucia, with a capacity close to 4 PFlops on a mixed CPU and GPU architecture.

Cenaero is currently looking for a System Engineer HPC (M/F). This permanent position is available immediately.

 

Responsibilities:

In collaboration with the other HPC team members:

  • Ensure the day-to-day operations of the Walloon Tier-1 supercomputer to guarantee optimal performance, security and availability of the system.
  • Monitor computing resources, networking components and storage systems, and report on their usage.
  • Operate, configure, tune and keep up to date the systems and management tools (operating system, job scheduler, system monitoring and deployment tools, storage and network middleware, etc.).
  • Maintain and deploy the software packages available to the users.
  • Follow up on incidents, diagnose hardware and software issues in collaboration with the vendors, assist vendors in replacing defective parts where needed, and apply software or firmware updates.
  • Engage with vendors on proactive maintenance of the systems and on planning major hardware or software upgrades.
  • Provide end-user support, ranging from helping researchers get started on HPC systems and explaining basic usage to assisting users in troubleshooting software issues and debugging the performance of their codes.
  • Maintain and improve system and user documentation.
  • Stay informed of developments, emerging technologies and best practices in the HPC field.
  • Contribute to the drafting of technical specifications for the supply of new equipment through public tender procedures.

 

Qualifications:

Required:

  • Master's degree in computer science or engineering, or demonstrated equivalent experience.
  • At least 3 years of experience administering and operating Linux computing environments (preferably RHEL or SLE).
  • Working knowledge of HPC environments and architectures.
  • Solid scripting skills (e.g. Bash, Python, Perl).
  • Good communication skills, verbal and written, in French or in English.
  • Good organizational skills and the ability to work independently on daily tasks.
  • Strong analytical skills, especially for quickly identifying and solving problems.

 

Nice to have:

  • Experience with HPC job scheduling tools such as Slurm, PBS, etc.
  • Good understanding of Ethernet and InfiniBand network concepts.
  • Experience with parallel filesystems such as GPFS, Lustre, BeeGFS, etc.
  • Experience with deployment automation and configuration management tools such as xCAT, Foreman, Ansible, Puppet, etc.
  • Experience with monitoring tools such as Nagios, Ganglia, Grafana, Telegraf, etc.
  • Familiar with software management tools such as EasyBuild or Spack.
  • Experience with the configuration, compilation and installation of a variety of Linux software with complex dependencies.
  • Good knowledge of programming languages such as C, C++ and Fortran.
  • Familiar with software programming environments such as GCC, Intel Compilers, CrayPE, etc.
  • Familiar with NVIDIA CUDA libraries and GPU computing.
  • Familiar with distributed and shared memory parallelism (MPI/OpenMP).
  • Experience with container technologies, in particular Apptainer/Singularity.
  • Familiar with web portals such as Open OnDemand.
  • Experience with LDAP, Red Hat Identity Management or FreeIPA.
  • Experience with physical installation of servers, replacing parts, cabling, racking, etc.

 

Offer

Cenaero offers a position in growing and leading technological sectors, direct contact with their business stakeholders and technical experts, a competitive salary package and a stimulating and dynamic work environment. The successful candidate will benefit from outstanding supercomputing capacity with a brand-new Tier-1 facility at the regional level and the possibility to access one of the most powerful supercomputers in the world through the LUMI consortium, in which Belgium has a significant share.

 

Application procedure

Interested candidates should send a cover letter, quoting the reference number of the offer (BE-JO-2023-03), and a resume to rh_be-jo-2023-03@cenaero.be.