LSF for dummies
Jan 25, 2017
(Actually, if you’ve gotten to a point you need to use LSF, you’re far from a dummy ;P)
LSF (Load Sharing Facility) is a system for managing programs that generally cannot be run interactively on a machine because they require too much CPU time, memory, or other system resources. Such programs have to be run as batch jobs instead.
LSF takes care of that batch management. Based on the job specifications, LSF starts a job once enough system resources are available for it to complete. Until then, the job request waits in a queue.
What is a queue and what queues are available to me?
The queue is the basic container for jobs in the LSF system. Queues can have access controls placed on them (so only users in a certain group can use a certain queue, for example), and once a job is submitted, its queue determines how it is scheduled and where it is executed.
bqueues
displays a list of queues
bqueues -u all
shows the list of queues that all users can submit to
How do I submit a job to a queue?
bsub
submits a job to the LSF system
There are a number of options you can use (with examples):
-n 4
Run on four cores (some programs call these “processors” or “threads”)
-e errfile
Send errors (stderr) to the file errfile. If the file exists, append to it.
-N
Send an email when the job finishes (this option takes no argument)
-o outfile
Send screen output (stdout) to the file outfile. If the file exists, append to it.
-R "rusage[mem=10000]"
Resource request. Reserve 10,000 MB of memory
-R "select[transfer]"
Resource request. Only run on “transfer” computers
-W 30
Runlimit. Job will be killed if it runs longer than 30 minutes (30:00 means 30 hours)
Here is a trivial example that runs the program test.sh in the short queue with a runtime limit of 5 minutes:
bsub -q short -W 5 test.sh
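If the option list gets long, LSF can also read options from #BSUB comment lines inside a job script that is fed to bsub on standard input. The sketch below writes such a script for the same test.sh example. The file names test_job.lsf, test.out and test.err are assumptions made for illustration, and the submission step only runs where bsub is actually installed.

```shell
# Write a small job script; bsub reads the #BSUB lines as options.
cat > test_job.lsf <<'EOF'
#!/bin/bash
#BSUB -q short
#BSUB -W 5
#BSUB -o test.out
#BSUB -e test.err
./test.sh
EOF

# Submit only where LSF is available; note the input redirection (<).
if command -v bsub >/dev/null 2>&1; then
    bsub < test_job.lsf
else
    echo "bsub not found; wrote test_job.lsf without submitting"
fi
```

Keeping the options in the script means the resource requests travel with the job and can be version-controlled alongside it.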
Note we can also use bsub commands in combination with workflow management systems such as snakemake:
snakemake --snakefile tophat.snakefile --jobs 999 --cluster 'bsub -o "tophat.snakemake.out" -q short -W 12:00 -R "rusage[mem=4000]"'
Common issues
It’s generally better to over-estimate than to under-estimate run time and memory usage. Otherwise your job may be killed with errors such as ‘TERM_MEMLIMIT: job killed after reaching LSF memory usage limit’.
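One way to over-estimate consistently is to take what a previous run actually used and add a safety margin. The helpers below are a rough sketch of that idea. The function names, the 25% margin, and the example figures (3200 MB, 95 minutes) are all made up for illustration, so pick values that suit your own jobs.

```shell
# pad_estimate VALUE PERCENT: print VALUE increased by PERCENT, rounded up.
pad_estimate() {
    echo $(( ($1 * (100 + $2) + 99) / 100 ))
}

# to_runlimit MINUTES: print MINUTES in the hours:minutes form accepted by -W.
to_runlimit() {
    printf '%d:%02d\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

mem_mb=$(pad_estimate 3200 25)    # illustrative peak of 3200 MB, plus 25%
minutes=$(pad_estimate 95 25)     # illustrative run time of 95 min, plus 25%

# Print the options to add to the bsub command line.
echo "-W $(to_runlimit "$minutes") -R \"rusage[mem=$mem_mb]\""
```

With these example numbers the snippet prints -W 1:59 -R "rusage[mem=4000]".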
Useful tips
If you submit to the wrong queue and would like to switch, try:
bswitch -q long mcore 0
This switches all of your unfinished jobs in the long queue to the mcore queue (the job ID 0 means “all of my jobs”).
Another approach to give you more control is to use bmod:
bmod -q mcore <JOBID>
Thus the following moves every pending job to mcore, much like the bswitch command above:
pend=$(bjobs | grep 'PEND' | awk '{ print $1 }')
for i in $pend; do bmod -q mcore "$i"; done
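The loop above picks up every line containing PEND, whatever queue the job is in. If you only want pending jobs from one queue, you can filter on the STAT and QUEUE columns instead. Below is a sketch of that, assuming the default bjobs column layout (JOBID, USER, STAT, QUEUE, ...). The helper name pending_in_queue and the sample listing are invented for the dry run, and the loop prints the bmod commands rather than running them.

```shell
# pending_in_queue QUEUE: read bjobs output on stdin and print the IDs of
# jobs whose STAT column is PEND and whose QUEUE column equals QUEUE.
pending_in_queue() {
    awk -v q="$1" 'NR > 1 && $3 == "PEND" && $4 == q { print $1 }'
}

# Made-up sample in the default bjobs layout, used here for a dry run.
sample='JOBID   USER    STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
1001    alice   PEND  long    login1                test.sh   Jan 25 10:00
1002    alice   RUN   long    login1     node7      test.sh   Jan 25 10:01
1003    alice   PEND  short   login1                test.sh   Jan 25 10:02'

# Print the bmod commands instead of executing them.
for id in $(printf '%s\n' "$sample" | pending_in_queue long); do
    echo "bmod -q mcore $id"
done
```

On a real cluster you would pipe bjobs into pending_in_queue long in place of the sample text, and drop the echo once the printed commands look right.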
If you messed up and need to kill your jobs:
bkill <JOBID>
kills a single job
bkill 0
kills all of your jobs
Additional resources