Toolchain
How can I interact with my workload globally on the system?
Creating wrapper scripts in your PATH makes it easier to interact with your project from anywhere on the system. For research and development, this lets you work with your workload without navigating to specific directories or hardcoding paths in job scripts. For example, you could run @workload train --help from any directory.
Creating Wrapper Scripts
Store executable scripts in directories like ~/bin or ~/.local/bin that are in your PATH. Consider prefixing them with @ to indicate a wrapper script rather than an official package.
#!/bin/bash
# Wrapper script for project workload
# Runs the specified entrypoint from your project directory
uv run --directory ${HOME}/my-project workload "$@"
The "$@" forwards all arguments to the entrypoint, so you can run @workload train --epochs 100 --batch-size 32 and those arguments will be passed straight through to your training script.
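As a concrete sketch of the setup (the @workload name and the ~/.local/bin location are just the examples from above), you could install the wrapper like this:

```shell
# Install the wrapper script as ~/.local/bin/@workload
mkdir -p "${HOME}/.local/bin"

cat > "${HOME}/.local/bin/@workload" <<'EOF'
#!/bin/bash
# Forward all arguments to the project entrypoint
uv run --directory ${HOME}/my-project workload "$@"
EOF

chmod +x "${HOME}/.local/bin/@workload"

# Make sure ~/.local/bin is on PATH (typically done in ~/.bashrc)
export PATH="${HOME}/.local/bin:${PATH}"
```

After this, @workload resolves from any directory, provided the PATH export is in your shell startup file.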
How do I keep a consistent environment and toolchain configuration across different machines?
A well-defined toolchain makes it a lot easier to have a consistent experience across different machines, which in turn makes HPC systems more engaging and easier to use in general.
A flexible and common way of managing dotfiles is to use a symlink farm manager like stow. The idea is that you keep a repository with all of your dotfiles and use stow to symlink each file to its correct location.
To get started, create a new directory named ~/.dotfiles in your home directory and run git init inside it to initialize the repository. Then begin copying in your original dotfiles and/or directories, such as ~/.bashrc, ~/.ssh/config, ~/.config/wandb, and so on.
From here you can run stow to symlink the files into place.
# From anywhere in the terminal
stow -v -d ~/.dotfiles -t $HOME <package>
# From within the dotfiles repo
cd ~/.dotfiles && stow .
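To make the symlink-farm idea concrete, here is a minimal sketch of a package layout; since stow may not be available everywhere, the link it would create is shown done by hand (the bash package name is an assumption, and stow itself would use a relative symlink):

```shell
# One sub-directory ("package") per tool, mirroring paths relative to $HOME
mkdir -p "${HOME}/.dotfiles/bash"
echo 'export EDITOR=vim' > "${HOME}/.dotfiles/bash/.bashrc"

# `stow -v -d ~/.dotfiles -t $HOME bash` would create roughly this link:
ln -sf "${HOME}/.dotfiles/bash/.bashrc" "${HOME}/.bashrc"

# Editing either path now touches the same file
readlink "${HOME}/.bashrc"
```

The payoff is that the repo stays the single source of truth: pull it on a new machine, run stow, and every tool finds its config where it expects.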
How do I access services on the remote machine?
To access services running on remote HPC nodes, create an SSH tunnel to forward the remote port to your local machine. One example could be running a Jupyter notebook server alongside an Ollama inference service, allowing you to interact with AI models directly from your notebook.
LSF Job Submission Script
#!/bin/bash
#BSUB -J services                   # Job name
#BSUB -n 4                          # Number of CPU cores
#BSUB -W 7:00                       # Walltime (7 hours)
#BSUB -q p1                         # Queue name
#BSUB -R "rusage[mem=8GB]"          # Memory requirement
#BSUB -R "span[hosts=1]"            # Ensure all cores on one host
#BSUB -gpu "num=1:mode=exclusive"   # GPU requirement
#BSUB -u your@email.com             # Email for notifications
#BSUB -B                            # Send email at job begin
#BSUB -N                            # Send email at job end
#BSUB -o %J.out                     # Standard output
#BSUB -e %J.err                     # Standard error

./entrypoint.sh
#!/usr/bin/env bash
set -euo pipefail

: "${NOTEBOOK_APP_IP:=0.0.0.0}"
: "${NOTEBOOK_APP_PORT:=8888}"
: "${NOTEBOOK_APP_OPEN_BROWSER:=False}"

# Start Jupyter service
jupyter notebook --ip="$NOTEBOOK_APP_IP" --port="$NOTEBOOK_APP_PORT" --NotebookApp.open_browser="$NOTEBOOK_APP_OPEN_BROWSER" &
JUPYTER_PID=$!

wait $JUPYTER_PID
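The `: "${VAR:=default}"` lines in the entrypoint are plain shell parameter expansion: `:` is a no-op builtin that still evaluates its arguments, so the assignment happens only when the variable is unset or empty. A quick illustration:

```shell
# Unset variable: the default is assigned
unset NOTEBOOK_APP_PORT
: "${NOTEBOOK_APP_PORT:=8888}"
echo "$NOTEBOOK_APP_PORT"   # → 8888

# Already set: the existing value wins
NOTEBOOK_APP_PORT=9999
: "${NOTEBOOK_APP_PORT:=8888}"
echo "$NOTEBOOK_APP_PORT"   # → 9999
```

This lets callers override the defaults from the environment, e.g. NOTEBOOK_APP_PORT=9000 ./entrypoint.sh.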
SSH Tunnel
Create the tunnel to access services locally:
# Forward the Jupyter port (assumes SSH host 'p1-hpc' in ~/.ssh/config)
ssh -L 8888:localhost:8888 p1-hpc -N -f
# Access the service
# Jupyter: http://localhost:8888
Or add the following to your ~/.ssh/config:
Host p1-hpc
    # ...
    LocalForward 8888 localhost:8888
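If you also run the Ollama service mentioned above, the same entry can carry both forwards. A sketch, where the host alias, hostname, and user are placeholders you would replace with your cluster's details (11434 is Ollama's default port):

```
Host p1-hpc
    HostName login.example-hpc.org
    User your-username
    LocalForward 8888 localhost:8888
    LocalForward 11434 localhost:11434
```

With this in place, a plain ssh -N -f p1-hpc opens both tunnels in the background.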