Using NYU HPC Greene: A Beginner-Friendly Tutorial for R Users
🚀 Getting Started on NYU Greene HPC (with R and SLURM)
This tutorial provides a structured introduction to the NYU Greene High Performance Computing (HPC) environment. It includes:
- How to log in correctly
- The difference between login nodes and compute nodes
- How to load software modules
- How to install R packages into your private /scratch/<NetID>/rpackages library
- How to create and submit SLURM jobs
- A complete working example using the SIS package
This version expands explanations, adds context for new users, and clarifies common pitfalls.
1. 🔑 Logging in to Greene
You log in using SSH. If you are off campus, you must use the NYU VPN or the gateway node.
ssh <NetID>@gw.hpc.nyu.edu # required only when off-campus / no VPN
ssh <NetID>@greene.hpc.nyu.edu
Replace <NetID> with your own NYU NetID.
If you see a known-host key warning (common after a cluster upgrade), remove the old SSH record:
ssh-keygen -R gw.hpc.nyu.edu
ssh-keygen -R greene.hpc.nyu.edu
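If you connect from off campus often, you can let SSH hop through the gateway automatically with a ProxyJump entry in your SSH config. A minimal sketch (the host alias `greene` is an arbitrary name you choose; replace <NetID> with yours):

```
# ~/.ssh/config -- optional convenience for off-campus access
Host greene
    HostName greene.hpc.nyu.edu
    User <NetID>
    ProxyJump <NetID>@gw.hpc.nyu.edu   # hop through the gateway automatically
```

With this in place, `ssh greene` works from anywhere without typing the gateway step first.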
2. 🧠 Understanding the Cluster: Login Node vs Compute Node
Greene uses a shared login node and many compute nodes.
Login node — for light tasks only
- browsing directories
- editing scripts (nano, vim, VS Code SSH remote)
- installing R/Python modules
- transferring files
❌ Do not run long computations here. These may be terminated automatically.
Compute nodes — where your jobs actually run
You access these by submitting a job:
sbatch job.sh
SLURM schedules your job onto a compute node with the resources you request.
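If you want to test commands interactively on a compute node instead of writing a batch script, SLURM's srun can start an interactive shell. A sketch with modest resource requests (adjust to your needs; subject to queue wait times):

```shell
# Request an interactive shell on a compute node: 1 CPU, 4 GB RAM, 1 hour
srun --cpus-per-task=1 --mem=4G --time=01:00:00 --pty /bin/bash
```

Once the allocation is granted, your prompt moves to a compute node, where it is safe to load modules and test R code.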
3. 📦 Loading R via Modules
Greene uses the environment module system.
List available R versions:
module avail r
Load your preferred version:
module load r/gcc/4.2.0
Check that it loaded correctly:
R --version
To unload:
module unload r/gcc/4.2.0
To see currently loaded modules:
module list
4. 📁 Installing R Packages into /scratch/<NetID>/rpackages
System-wide installation is not allowed.
You must install R packages into your personal library.
Create your library directory:
mkdir -p /scratch/<NetID>/rpackages
Tell R to use this library. You must set this variable in every new shell session unless you add the line to your ~/.bashrc.
export R_LIBS="/scratch/<NetID>/rpackages"
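To make this persistent, you can append the export to your ~/.bashrc once. A sketch (the single quotes keep $USER unexpanded in the file, so it resolves to your NetID at login):

```shell
# Add the export to ~/.bashrc only if it is not already there
grep -q 'R_LIBS' ~/.bashrc 2>/dev/null || \
  echo 'export R_LIBS="/scratch/$USER/rpackages"' >> ~/.bashrc
```

Every new login shell will then pick up R_LIBS automatically.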
Start R:
R
Inside R, confirm:
.libPaths()
Expected output:
[1] "/scratch/<NetID>/rpackages"
Now install packages normally:
install.packages("SIS")
library(SIS)
If installation fails, check:
- Did you load the same R module version for both installation & execution?
- Did you correctly set R_LIBS?
Exit R:
q()
5. 📂 Project Setup
Create a clean working directory for this example:
mkdir SIS
cd SIS
6. ✏️ Example R Script (run_SIS.R)
This script:
- accepts a random seed as input
- generates synthetic data
- fits a SIS model
- saves results to
/scratch/<NetID>/SIS_results
Create the file:
nano run_SIS.R
Paste:
# -------------------------------------------------------------------
# Example R script for Greene HPC using SIS package
# -------------------------------------------------------------------
args <- commandArgs(trailingOnly = TRUE)
if (length(args) < 2) stop("Usage: Rscript run_SIS.R <seed> <outdir>")
seed <- as.numeric(args[1])
outdir <- args[2]
set.seed(seed)
# Always load your R package library path if needed
# .libPaths("/scratch/<NetID>/rpackages") # uncomment if necessary
library(SIS)
# Generate correlated synthetic data
n <- 400; p <- 50; rho <- 0.5
corrmat <- diag(rep(1 - rho, p)) + rho
corrmat[4, ] <- corrmat[, 4] <- sqrt(rho)
corrmat[5, ] <- corrmat[, 5] <- 0
corrmat[5, 5] <- 1
cholmat <- chol(corrmat)
x <- matrix(rnorm(n * p), n, p)
x <- x %*% cholmat
# Gaussian response driven by the first five predictors
b <- c(4, 4, 4, -6 * sqrt(2), 4 / 3)
y <- x[, 1:5] %*% b + rnorm(n)
# Run SIS with BIC tuning
model <- SIS(x, y, family='gaussian', tune='bic')
# Save results
out <- list(
seed = seed,
coef = model$coef.est,
selected_vars = model$ix
)
save(
out,
file = file.path(outdir, paste0("SIS_seed_", seed, ".RData"))
)
cat("Finished seed:", seed, "\n")
Save and exit (Ctrl+X, then Y, then Enter).
7. 🖥️ Creating the SLURM Submission Script (submit_SIS.sh)
Create:
nano submit_SIS.sh
Paste:
#!/bin/bash
#SBATCH --job-name=SIS_test
#SBATCH --output=/scratch/%u/logs/out_%A_%a.txt
#SBATCH --error=/scratch/%u/logs/err_%A_%a.txt
#SBATCH --array=1-20
#SBATCH --time=01:00:00
#SBATCH --mem=4G
#SBATCH --cpus-per-task=1
# Load R
module load r/gcc/4.2.0
# Your R library
export R_LIBS="/scratch/$USER/rpackages"
# Logging directory. Note: SLURM opens the --output/--error files when the
# job starts, BEFORE these commands run, so create this directory once
# before submitting as well.
mkdir -p /scratch/$USER/logs
# Output directory for results
OUTDIR="/scratch/$USER/SIS_results"
mkdir -p "$OUTDIR"
echo "Running seed $SLURM_ARRAY_TASK_ID"
# Run the R script
Rscript run_SIS.R $SLURM_ARRAY_TASK_ID "$OUTDIR"
Explanation of important SLURM directives:
- --array=1-20 runs 20 parallel tasks with seeds 1–20
- %A = master job ID
- %a = array index
- --mem=4G requests 4 GB per task
- --cpus-per-task=1 requests 1 CPU
- --time=01:00:00 sets a 1-hour time limit
- Output and error logs are saved under /scratch/<NetID>/logs/
- Results are saved under /scratch/<NetID>/SIS_results/
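To see concretely how the %A/%a placeholders turn into file names, here is a plain-shell illustration (the job ID 1234567 is hypothetical):

```shell
# SLURM replaces %A with the master job ID and %a with the array index
# when it opens the log files. For job 1234567, array task 5:
A=1234567   # stands in for %A
a=5         # stands in for %a
echo "out_${A}_${a}.txt"   # the --output file for this task
echo "err_${A}_${a}.txt"   # the --error file for this task
```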
8. 🚀 Submitting Your Job
First make sure the log directory exists, since SLURM must be able to open the output files the moment the job starts:
mkdir -p /scratch/$USER/logs
Then submit:
sbatch submit_SIS.sh
If successful:
Submitted batch job 1234567
9. 📊 Monitoring Your Jobs
Check all your jobs:
squeue -u <NetID>
Cancel a job:
scancel <jobid>
Cancel all your jobs:
scancel -u <NetID>
10. 📂 Examining Output Files
List your results:
ls /scratch/<NetID>/SIS_results
Load one result in R:
load("/scratch/<NetID>/SIS_results/SIS_seed_5.RData")
out
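A quick sanity check from the shell that all 20 array tasks wrote a result file (the path assumes the OUTDIR used in submit_SIS.sh):

```shell
# Count result files; should report 20 once the whole array has finished
RESULTS="/scratch/$USER/SIS_results"
n=$(ls "$RESULTS"/SIS_seed_*.RData 2>/dev/null | wc -l)
echo "Completed tasks: $n / 20"
```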
🎉 Summary
This guide covers:
✔ login/login-node basics
✔ module system usage
✔ persistent R library installation
✔ project directory setup
✔ clear, well-annotated R example
✔ well-structured SLURM job script
✔ job submission / monitoring
✔ consistent storage under /scratch/<NetID>