Upscaling AI with Containers

Artificial Intelligence (AI) has become a foundational building block of the modern world, and considerable effort has gone into making it accessible to researchers and practitioners across a wide range of fields. Training AI models, however, increasingly demands more computational power than laptops and desktop PCs can offer, so the ability to develop and train a neural network on large clusters is becoming essential. This workshop teaches how to scale an AI-powered application on large clusters, i.e., supercomputers.

Prerequisites

Working knowledge of the Unix OS is required. In addition, a basic understanding of Neural Networks (NNs) is desirable. Please follow the link to create a username and password on the DockerHub website, as we will use the freely available Play-with-Docker (PWD) service. Details on accessing and using the cluster are given in the Access to Vega section.

  • 20 min - Introduction to Containers

  • 20 min - Introduction to Docker

  • 20 min - Namespaces and cgroups

  • 20 min - Cleaning Up Containers

  • 20 min - Creating your own container images

  • 20 min - Creating More Complex Container Images

  • 20 min - PWD exercises

  • 20 min - Containers in research workflows

  • 20 min - Singularity: Getting started

  • 20 min - Working with Singularity containers

  • 20 min - Building Singularity images

  • 20 min - Running MPI parallel jobs using Singularity containers

  • 20 min - TensorFlow on a single GPU

  • 20 min - Distributed training in TensorFlow

  • 20 min - Introduction to Horovod

Who is the course for?

About the course

This lesson material is developed by the EuroCC National Competence Center Sweden (ENCCS) and taught in ENCCS workshops. It is aimed at researchers and developers who have experience working with AI and wish to train their applications on supercomputers. The lesson material is licensed under CC-BY-4.0 and can be reused in any form (with appropriate credit) in other courses and workshops. Instructors who wish to teach this lesson can refer to the Instructor’s guide for practical advice.

See also

Docker provides plenty of educational material for users, so checking the official Docker website is highly recommended. The same applies to Singularity, where one can find many compelling examples with relevant details.

The TensorFlow and Horovod documentation are also good sources for learning about commands and their proper use.

Credits

The lesson file structure and browsing layout are inspired by and derived from work by CodeRefinery licensed under the MIT license. We have copied and adapted most of their license text.

Materials from the references below have been used in various parts of this course.

Instructional Material

This instructional material is made available under the Creative Commons Attribution license (CC-BY-4.0). The following is a human-readable summary of (and not a substitute for) the full legal text of the CC-BY-4.0 license. You are free to:

  • share - copy and redistribute the material in any medium or format

  • adapt - remix, transform, and build upon the material for any purpose, even commercially.

The licensor cannot revoke these freedoms as long as you follow these license terms:

  • Attribution - You must give appropriate credit (mentioning that your work is derived from work that is Copyright (c) Hossein Ehteshami and individual contributors and, where practical, linking to https://enccs.se), provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

  • No additional restrictions - You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

With the understanding that:

  • You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.

  • No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.

Software

Except where otherwise noted, the example programs and other software provided with this repository are made available under the OSI-approved MIT license.