LearnHPC: dynamic creation of HPC infrastructure for educational purposes


Abstract

Through a recently approved PRACE-ICEI proposal, E-CAM, FocusCoE, HPC Carpentry and EESSI are joining forces to bring HPC resources to the classroom in a simple, secure and scalable way. Our plan is to reproduce the model developed by Magic Castle, a Canadian open-source software project. The proposed solution creates virtual HPC infrastructure(s) in a public cloud, in this case on the Fenix Research Infrastructure, and generates temporary event-specific HPC clusters for training purposes, each with a complete scientific software stack. The European Environment for Scientific Software Installations (EESSI) will provide this stack, fully optimised for the available hardware.

Description 

Demand for HPC training across the EU is growing rapidly as the wider scientific community adopts HPC. However, very few topics can be covered thoroughly without access to real HPC resources, even at the introductory level. Where such access is available, security concerns and the overhead of provisioning accounts make it questionable whether the approach can scale.

This project aims to provide EU-wide access to HPC resources on the scale required to meet the training needs of all countries. The proposed solution provisions virtual HPC system(s) in a public cloud, in this case on the Fenix Research Infrastructure. The infrastructure will dynamically create temporary event-specific HPC clusters for training purposes, each including a scientific software stack. That stack will be provided by the European Environment for Scientific Software Installations (EESSI), which uses CernVM-FS, a software distribution system developed at CERN, to make a research-grade, scalable software stack available on a wide range of HPC systems, as well as on servers, desktops and laptops (including macOS and Windows!).

The concept builds on Magic Castle, a solution from Compute Canada that recreates the Compute Canada user experience in public clouds (there is even a presentation in which the main developer creates a cluster just by talking to his phone!). Magic Castle uses the open-source tool Terraform and the HashiCorp Configuration Language (HCL) to define the virtual machines, volumes and networks needed to replicate a virtual HPC infrastructure.

In addition to a dynamically provisioned HPC resource, the project will offer a scientific software stack provided by EESSI. This model is also based on a Compute Canada approach and makes it possible to replicate the EESSI software environment independently of any particular physical infrastructure.

Our adaptation of Magic Castle aims to recreate the EESSI HPC user experience on the Fenix Research Infrastructure for training purposes. After deployment, users get a complete HPC cluster software environment, including a Slurm scheduler, a Globus endpoint, JupyterHub, LDAP, DNS, and a wide selection of research software applications built by experts with EasyBuild.

The architecture of the solution is best illustrated by the figure below (taken from the Magic Castle documentation at https://github.com/ComputeCanada/magic_castle/tree/master/docs):

Cloud Cluster Architecture Overview ©Magic Castle (https://github.com/ComputeCanada/magic_castle)

With the resources allocated to the project, we plan to run six HPC training events between January and July 2021. These events are linked to the Centres of Excellence E-CAM and FocusCoE, and to HPC Carpentry.


The launch of the E-CAM Online Training Portal


We are pleased to announce that our E-CAM training portal is now online. Access instructions here.

The goals and expected impacts of our online training infrastructure are to:

  •   Collect the content captured at our Extended Software Development Workshops (ESDWs), so that participants can revisit lectures or demonstrations in their own time, both during and after the meeting. The material can also be used by people who could not attend the ESDW in person (in particular, interested industry partners);
  •   Create an online training module for each ESDW: a set of preparatory materials shared with the participants so that everyone acquires the same basic knowledge before the meeting;
  •   Serve as a repository for the data associated with our events, such as captured lectures, lecture materials, reading materials, tutorial content and software requirements;
  •   Build tutorials on programming best practices for developing software for extreme-scale hardware, which we can offer to the wider E-CAM community;
  •   Partner with other groups and projects with a similar training scope, to cover a broader range of training material.


Information on accessing the portal, terminology and instructions for ESDW participants is available at this link. The content of the training portal is freely available upon registration, and a selection of lectures is also publicly accessible directly from the E-CAM website.
