Development Environment Setup on Local Machine

We encourage all students to use the Conda package manager for easier environment management. The Conda installer can be found here: https://www.anaconda.com/products/individual.

Create environment

The default (base) environment usually has many pre-installed packages, which may conflict with the packages we want to install. Therefore, we create a brand-new environment with the following command:

conda create -n cs533r python=3.8

This command creates a Python 3.8 environment named cs533r.

Then we can use the command

conda activate cs533r

to activate the cs533r environment.

Install Packages

We have listed all needed packages in requirements.txt. You can use

conda install --file requirements.txt

to automatically install all required packages.

Open JupyterLab

As our assignments are written as Jupyter notebooks, we use

jupyter lab

to open JupyterLab in the browser.

Instructions on xxx@lin*.student.cs.ubc.ca

This machine is used to submit heavy jobs. In general, a submitted job waits in a queue for some time before it is accepted to run. Therefore, running Jupyter on this machine is strongly discouraged.

Setup

To access xxx@lin*.student.cs.ubc.ca, we first need to log into the intermediate gateway by

ssh xxx@remote.student.cs.ubc.ca

Then we can access the server by

ssh xxx@lin*.student.cs.ubc.ca

where * can be any number from 03 to 25.
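Optionally, to avoid running two ssh commands each time, you can add a ProxyJump entry to the ~/.ssh/config file on your local machine. The alias cs533r-gpu and the machine number 12 below are arbitrary example choices:

```
# ~/.ssh/config (local machine); the alias and machine number are examples
Host cs533r-gpu
    HostName lin12.student.cs.ubc.ca
    User xxx
    ProxyJump xxx@remote.student.cs.ubc.ca
```

After this, ssh cs533r-gpu connects through the gateway in a single step.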

Our working directory is /hpc/cs-533r/students. We can use

cd /hpc/cs-533r/students

to access it. Each group needs to create its own working folder under this directory; all packages and data should go in this folder. Please name the folder after your group for convenience. We take xxx as an example.

mkdir xxx
cd xxx
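As a sketch, assuming a group named group01 (replace this with your own group's name), the working folder can be created and entered like this; the -p flag makes mkdir succeed even if the folder already exists:

```shell
mkdir -p group01    # -p: do not fail if the folder already exists
cd group01
pwd                 # prints the path of the new working folder
```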

The conda environment can be set up by following the previous instructions.

A simple example

Let's start with the hello-world script hello.sh:

#!/bin/bash
#SBATCH --time=00:01:00          ##### time limit, in HH:MM:SS (or DD-HH:MM:SS)
#SBATCH --account=hpc-cpsc533r   ##### account should not be changed
#SBATCH --partition=CPSC533R     ##### partition should not be changed

echo 'Hello, world!'
sleep 30

After saving this to hello.sh, we can use

sbatch hello.sh

to submit the job.

You will be able to see your job running (as it sleeps for 30 seconds) by running the command:

squeue

To cancel this job before it finishes, get the job_id from the output of squeue and run the command:

scancel job_id
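By default, sbatch writes the job's output (e.g. the Hello, world! line above) to a file named slurm-<job_id>.out in the directory you submitted from. If you prefer your own log name, you can add an --output directive to the script; the filename below is just an example (%j expands to the job id):

```
#SBATCH --output=hello-%j.out    ##### write stdout/stderr to hello-<job_id>.out
```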

Training script example

A common training script should look like this:

#!/bin/bash

#SBATCH --gres=gpu:1             ##### number of GPUs to use
#SBATCH --cpus-per-task=6        ##### number of CPUs to use
#SBATCH --mem=32G                ##### total memory for the job
#SBATCH --time=00:10:00          ##### time limit for training
#SBATCH --account=hpc-cpsc533r
#SBATCH --partition=CPSC533R

source path_to_your_conda_environment/bin/activate

python test.py

We use source to activate the conda environment and then simply run the Python script.

Submitting multiple jobs simultaneously

In the following example, we want to submit 3 jobs at the same time, with 3 different learning rates.

#!/bin/bash

#SBATCH --gres=gpu:1             ##### number of GPUs to use
#SBATCH --cpus-per-task=6        ##### number of CPUs to use
#SBATCH --mem=32G                ##### total memory for the job
#SBATCH --time=00:10:00          ##### time limit for training
#SBATCH --array=0-2              ##### job array indices
#SBATCH --account=hpc-cpsc533r
#SBATCH --partition=CPSC533R

source path_to_your_conda_environment/bin/activate

learning_rates=(0.1 0.01 0.001)
lr=${learning_rates[$SLURM_ARRAY_TASK_ID]}

python test.py --learning_rate $lr

We use #SBATCH --array=0-2 to indicate that we want 3 jobs with indices 0, 1, and 2. Each job can read its own index from the environment variable SLURM_ARRAY_TASK_ID. In this script, we pass learning rates 0.1, 0.01, and 0.001 to jobs 0, 1, and 2, respectively.
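The indexing logic can be checked locally without Slurm. The loop below simulates the three indices that Slurm would assign via SLURM_ARRAY_TASK_ID (on the cluster, each index runs as a separate job instead of a loop iteration):

```shell
learning_rates=(0.1 0.01 0.001)
# Simulate the three array tasks; on the cluster, Slurm sets
# SLURM_ARRAY_TASK_ID for each job and no loop is needed.
for SLURM_ARRAY_TASK_ID in 0 1 2; do
    lr=${learning_rates[$SLURM_ARRAY_TASK_ID]}
    echo "task $SLURM_ARRAY_TASK_ID -> learning rate $lr"
done
```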