19) Introduction to Batch Jobs#
Last time:
- Introduction to MPI

Today:
- Introduction to Batch Jobs and Job Scripting
- SLURM Demo
1. Introduction to Batch Jobs and Job Scripting#
Batch jobs are, by far, the most common type of job on HPC systems.
Batch jobs are resource allocations that run applications on compute nodes without requiring supervision or interaction.
They are commonly used for applications that run for long periods of time or require little to no user input.
To submit jobs that use multiple resources in parallel, you need a job/task scheduling system.
SLURM is a very popular parallel job scheduler (although not the only one).
Jobs are submitted to and managed by SLURM's scheduling system via the `sbatch` command.
Job scripting#
Even though it is possible to run jobs entirely from the command line, doing so is often tedious and disorganized.
Instead, it is recommended to construct a job script for your batch jobs.
A job script is a set of Linux commands paired with a set of resource requirements that can be passed to the Slurm job scheduler.
Slurm will then generate a job according to the parameters set in the job script.
Any commands that are included with the job script will be run within the job.
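As a point of contrast with the command-line-only approach mentioned above, `sbatch` can also wrap a single command string in a minimal script via its `--wrap` option. A quick sketch (the resource values are placeholders):

```bash
# Submit a one-off command without writing a script file;
# --wrap turns the quoted string into a simple shell script.
sbatch --nodes=1 --ntasks=1 --time=00:05:00 --wrap="hostname"
```

For anything beyond a one-liner, a job script is far easier to maintain and reproduce.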
Running a job script#
Running a job script can be done with the `sbatch` command:

```bash
sbatch <your-job-script-name>
```
Once you submit your batch job, you may see the following message:

```
Submitted batch job <job_ID>
```
You can query the status of your job (waiting in queue, running, ended, etc.) with:

```bash
squeue -u your_user_name
```
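The output looks roughly like the following illustrative sketch (the exact columns depend on your site's configuration; the job ID, name, user, and node name here are made up):

```
JOBID PARTITION     NAME     USER  ST   TIME  NODES NODELIST(REASON)
12345     short   sample     jdoe   R   0:42      1 node001
```

The `ST` column shows the job state, e.g. `PD` for pending in the queue and `R` for running.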
Making a job script#
A job script looks something like this:

```bash
#!/bin/bash
# --- Directives: resource requests read by Slurm ---
#SBATCH --nodes=1                                # number of nodes
#SBATCH --ntasks=1                               # total number of tasks
#SBATCH --time=00:10:00                          # walltime limit (hh:mm:ss)
#SBATCH --partition=<any_needed_specific_partition>
#SBATCH --output=sample-%j.out                   # stdout file; %j expands to the job ID

# --- Loading software: clean the environment, then load what the job needs ---
module purge
module load <necessary_modules_to_be_loaded>     # such as compilers needed etc.

# --- User scripting: the commands to run inside the job ---
echo "== This is the scripting step! =="
sleep 30
./executable
echo "== End of Job =="
```
Normally, job scripts are divided into 3 primary parts:

1. directives
2. loading software
3. user scripting
Directives give the shell and the Slurm daemon instructions on setting up the job.
Loading software involves cleaning out the environment and loading specific pieces of software you need for your job.
User scripting is simply the commands you wish to be executed in your job.
Directives#
The first directive, the shebang directive, is always on the first line of any script.
It indicates which shell will run the commands in your job.
Most users employ bash as their shell, so we will specify bash by typing:

```bash
#!/bin/bash
```
The next directives that must be included with your job script are sbatch directives.
These directives specify resource requirements to Slurm for a batch job.
These directives must come after the shebang directive and before any commands are issued in the job script.
Each directive contains a flag that requests a resource the job would need to complete execution.
You can also have Slurm send you an email alerting you that your job has ended, via the `--mail-type` and `--mail-user` directives shown in the example after the list below.
An sbatch directive is written as such:

```bash
#SBATCH --<resource>=<amount>
```

For example, if you wanted to request 2 nodes with an sbatch directive, you would write:

```bash
#SBATCH --nodes=2
```
Other useful directives:

- `--ntasks=<processes>` to specify the total number of tasks
- `--ntasks-per-node=<processes>` to specify the number of processes you wish to assign to each node
- `--mem=<memory>` to set the total memory (per node requested) required for the job
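Put together with the email alerts mentioned above, a directive block using these options might look like this (a sketch; the resource amounts and the email address are placeholders):

```bash
#SBATCH --nodes=2                    # 2 nodes
#SBATCH --ntasks=8                   # 8 tasks in total
#SBATCH --ntasks-per-node=4          # 4 tasks on each node
#SBATCH --mem=16G                    # 16 GB of memory per node
#SBATCH --mail-type=END,FAIL         # email when the job ends or fails
#SBATCH --mail-user=you@example.com  # address to notify
```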
More elaborate resource layouts (e.g., nodes subdivided into “Resource Sets”) mean having more parameters to specify! For instance, researchers at ORNL put together workshops on how to submit jobs on the Summit supercomputer.
2. SLURM Demo#
You can find a couple of batch job scripts in `batch_scripts/` that you can execute via `sbatch` on the cluster.
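A typical demo session might look like the following sketch (the script name and job ID are placeholders, and the output filename assumes the `--output=sample-%j.out` pattern from the example script above):

```bash
sbatch batch_scripts/<script-name>    # submit one of the provided scripts
# -> Submitted batch job 12345
squeue -u $USER                       # check whether the job is pending or running
cat sample-12345.out                  # inspect the output once the job has ended
```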