Using NYU HPC Greene: A Beginner-Friendly Tutorial for R Users

🚀 Getting Started on NYU Greene HPC (with R and SLURM)

This tutorial walks you from “never used Greene” to “running parallel R jobs with SLURM”. If you follow the sections in order, you will:

  • Log in to Greene from your own computer
  • Understand login vs. compute nodes
  • Load R with the module system
  • Install R packages into your private /scratch/<NetID>/rpackages library
  • Create a small R project that uses the SIS package
  • Submit and monitor an array job on Greene

What you need before you start

  • An NYU NetID and Greene HPC access
  • A terminal application (Terminal on macOS, WSL/PowerShell/MobaXterm on Windows)
  • A connection to the NYU VPN if you are off-campus

0. Greene in One Minute: Storage and Workflow

NYU’s Greene HPC cluster is a shared system used for research and data analysis. It uses the SLURM workload manager to schedule jobs on many compute nodes.

📦 Understanding Greene Storage Spaces (Home, Scratch, Archive)

Greene provides three main storage areas, each with different purposes, backup policies, and size limits:

Filesystem   Environment Variable   Backed up? / Flushed?   Allocation (Space / Files)
/home        $HOME                  Yes / No                50 GB / 30.7K
/scratch     $SCRATCH               No / Yes                5 TB / 1.0M
/archive     $ARCHIVE               Yes / No                2 TB / 20.5K

a. /home — Your Personal Workspace

  • Best for scripts, SLURM files, small code projects
  • Backed up nightly → safe for important scripts
  • Small quota → do not store large data here
  • Not flushed

b. /scratch — Fast Temporary Storage for Active Computation

  • Best place for all SLURM job outputs
  • Very large quota (TB-level)
  • Not backed up → files may be deleted after long inactivity
  • Intended for active simulation, data processing, temporary files

c. /archive — Long-term Storage

  • Best for completed project results you want to keep
  • Backed up and not flushed
  • Larger than /home but smaller than /scratch
  • Not designed for running jobs

✔ Usage Pattern Summary

  • Write code and configuration files in $HOME
  • Run jobs and write results to $SCRATCH
  • Move final results to $ARCHIVE for long-term safekeeping

This separation keeps Greene efficient and ensures your important project outputs are protected.
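
For example, if you want to follow this pattern from inside R, you can build paths from the environment variables in the table above. This is only a minimal sketch, assuming $SCRATCH is set as shown in the table and using a hypothetical project name myproject:

scratch <- Sys.getenv("SCRATCH")                      #e.g. /scratch/<NetID>
outdir  <- file.path(scratch, "myproject", "results") #results live on scratch
dir.create(outdir, recursive = TRUE, showWarnings = FALSE)

#code and scripts stay in $HOME; job output is written under $SCRATCH
saveRDS(data.frame(x = rnorm(10)), file.path(outdir, "example.rds"))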


1. 🔑 Logging in to Greene

First, connect to VPN if you are off-campus.

You log in using SSH in a terminal on your computer.

ssh <NetID>@greene.hpc.nyu.edu

Replace <NetID> with your own NYU NetID.

If you see a known-host key warning/error (common after a cluster upgrade), remove the old SSH record:

ssh-keygen -R greene.hpc.nyu.edu

2. 🧠 Understanding the Cluster: Login Node vs Compute Node

Greene uses a shared login node and many compute nodes.

Login node — for light tasks only

  • browsing directories
  • editing scripts (nano, vim, VS Code SSH remote)
  • installing R/Python packages
  • transferring files

❌ Do not run long computations here. These may be terminated automatically.

Compute nodes — where your jobs actually run
You access these by submitting a job:

sbatch job.sh

SLURM schedules your job onto a compute node with the resources you request.


3. 📦 Loading R via Modules

Greene uses the environment module system.

Run the following commands inside your SSH session on Greene (after Step 1).

List available R versions:

module avail r

Load your preferred version:

module load r/gcc/4.2.0

To see currently loaded modules:

module list
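
As a quick sanity check, you can start R (type R) and confirm the version matches the module you just loaded, then quit with q():

R.version.string   #should report the version provided by the loaded module, e.g. 4.2.0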

4. 📁 Installing R Packages into /scratch/<NetID>/rpackages

Run this step on Greene (not on your local computer), in the same terminal where you loaded the R module.

Before installing R packages, you should request an interactive session on a compute node. This gives you dedicated resources and avoids overloading the shared login node.

srun --pty --mem=4G --time=02:00:00 bash

After a few moments, you should see a new prompt indicating you are on a compute node (e.g. cm004 in the following example).

(base) [yf31@log-3 ~]$ srun --pty --mem=4G --time=02:00:00 bash
srun: job 3306849 queued and waiting for resources
srun: job 3306849 has been allocated resources
(base) [yf31@cm004 ~]$

System-wide installation is not allowed. You must install R packages into your personal library on Greene.

Create your library directory:

mkdir -p /scratch/<NetID>/rpackages

Tell R to use this library. You must do this in every new shell session unless you add the line to your ~/.bashrc.

export R_LIBS="/scratch/<NetID>/rpackages"

Start R:

R

Inside R, confirm:

.libPaths()

Expected output (your scratch library should be listed first; R's system library will also appear after it):

[1] "/scratch/<NetID>/rpackages"

Now install packages normally:

install.packages("SIS")
library(SIS)

If installation fails, check:

  • Did you load the same R module version for both installation and execution?
  • Did you set R_LIBS correctly? (A quick check is shown below.)
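
For a quick check from inside R, the following base R calls confirm that the scratch library is active and that SIS actually landed there (a minimal sketch; replace the <NetID> placeholder):

Sys.getenv("R_LIBS")                 #should print /scratch/<NetID>/rpackages
.libPaths()                          #the scratch library should be listed first
"SIS" %in% rownames(installed.packages(lib.loc = "/scratch/<NetID>/rpackages"))
find.package("SIS")                  #prints where SIS was found; errors if it is missing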

Exit R:

q()

After finishing package installation, exit the interactive session:

exit

Then, you will see the original login node prompt again.

(base) [yf31@cm004 ~]$ exit
exit
(base) [yf31@log-3 ~]$ 

5. 📁 Create a project directory on your own computer

Do this step on your local computer (not on Greene).

Create a folder for this project (e.g. project_SIS), then create an R script named run_SIS.R inside it with the following content:

# -------------------------------------------------------------------
# Example R script for Greene HPC using SIS package
# -------------------------------------------------------------------

args <- commandArgs(trailingOnly = TRUE) #get the command line arguments from SLURM
seed <- as.numeric(args[1]) #the seed number, the 1st argument from the command line
outdir <- args[2] #output directory, the 2nd argument from the command line

set.seed(seed, kind = "L'Ecuyer-CMRG") #the kind is important for parallel RNG

cat("Seed used:", seed, "\n")

library(SIS) #if not installed, install it first as shown in Step 4

# Generate correlated synthetic data
n = 400; p = 50; rho = 0.5
corrmat = diag(rep(1-rho, p)) + rho
corrmat[4, ] = corrmat[, 4] = sqrt(rho)
corrmat[4, 4] = 1 #restore unit variance for variable 4 so corrmat stays positive definite
corrmat[5, ] = corrmat[, 5] = 0
corrmat[5, 5] = 1
cholmat = chol(corrmat)

x = matrix(rnorm(n*p), n, p)
x = x %*% cholmat

# Gaussian response
b = c(4,4,4,-6*sqrt(2),4/3)
y = x[, 1:5] %*% b + rnorm(n)

# Run SIS with BIC tuning
model <- SIS(x, y, family='gaussian', tune='bic')

# Save results
out <- list(
  seed = seed,
  coef = model$coef.est,
  selected_vars = model$ix
)

save(
  out,
  file = file.path(outdir, paste0("SIS_seed_", seed, ".RData"))
) #save output to specified directory containing seed number

cat("Finished seed:", seed, "\n")

Then, in the same local folder, create a SLURM submission script named submit_SIS.sh with the following content:

#!/bin/bash
#SBATCH --job-name=SIS
#SBATCH --output=/scratch/%u/logs/out_%A_%a.txt
#SBATCH --error=/scratch/%u/logs/err_%A_%a.txt
#SBATCH --array=1-20
#SBATCH --time=01:00:00
#SBATCH --mem=4G
#SBATCH --cpus-per-task=1

# Load R
module load r/gcc/4.2.0

# Your R library
export R_LIBS="/scratch/$USER/rpackages"

# Logging directory
mkdir -p /scratch/$USER/logs

# Output directory for results
OUTDIR="/scratch/$USER/SIS_results"
mkdir -p $OUTDIR

echo "Running seed $SLURM_ARRAY_TASK_ID"

# Run the R script
Rscript run_SIS.R $SLURM_ARRAY_TASK_ID "$OUTDIR"

Explanation of important SLURM directives:

  • --array=1-20 runs 20 parallel jobs with seeds 1–20
  • %A = master job ID
  • %a = array index
  • --mem=4G requests 4GB per task
  • --cpus-per-task=1 requests 1 CPU
  • --time=01:00:00 sets a 1-hour time limit
  • Output and error logs are saved under /scratch/<NetID>/logs/ (SLURM does not create this directory for you, so it must exist before the job starts; see Step 7)
  • Results are saved under /scratch/<NetID>/SIS_results/

In practice, you would include all the scripts and data files in this project folder.

6. 📡 Transferring Files Between Your Computer and Greene

Use rsync for all file transfers. It is reliable, resumable, and efficient for large datasets or syncing project folders.

Upload your project folder from your computer to Greene:

IMPORTANT: in your terminal, open a new tab/window on your local computer (not on Greene).

rsync -avhP /local/path/project_SIS <NetID>@greene.hpc.nyu.edu:/home/<NetID>/

Here /local/path/project_SIS is the path to your project folder on your computer. Note that the source path has no trailing slash: rsync then copies the folder itself, so it arrives as /home/<NetID>/project_SIS on Greene. (With a trailing slash, rsync would copy only the folder's contents directly into your home directory.)


7. 🚀 Submitting Your Job

Now, back in your Greene terminal, first create the log directory referenced by submit_SIS.sh (SLURM will not create the directory for its --output/--error files), then navigate to your project directory and submit the job:

mkdir -p /scratch/<NetID>/logs
cd /home/<NetID>/project_SIS
sbatch submit_SIS.sh

If successful:

Submitted batch job 1234567

Here, 1234567 is your master job ID.


8. 📊 Monitoring Your Jobs

While the job is running, you can check the status of your jobs with:

squeue -u <NetID>

Cancel a job:

scancel <jobid>

Cancel all your jobs:

scancel -u <NetID>

9. 📂 Examining Output Files

When your jobs finish, they will disappear from squeue.

List your results:

ls /scratch/<NetID>/SIS_results

You should see files like SIS_seed_1.RData, SIS_seed_2.RData, etc.

If not, check the log files under /scratch/<NetID>/logs/ for errors.

Now, you can download these results back to your local computer using rsync:

rsync -avhP <NetID>@greene.hpc.nyu.edu:/scratch/<NetID>/SIS_results/ /local/path/SIS_results/

Here, /local/path/SIS_results/ is the destination folder on your computer.
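
To summarize the 20 runs, you can load all result files in R and count how often each variable was selected. This is a minimal sketch, assuming each .RData file contains the out list saved by run_SIS.R; point resdir at the folder on Greene or at the copy you just downloaded:

resdir <- "/scratch/<NetID>/SIS_results"   #or the local SIS_results folder
files  <- list.files(resdir, pattern = "^SIS_seed_.*\\.RData$", full.names = TRUE)

results <- lapply(files, function(f) {
  e <- new.env()
  load(f, envir = e)   #each file contains the list 'out'
  e$out
})

#how often each variable index was selected across seeds
selection_counts <- table(unlist(lapply(results, `[[`, "selected_vars")))
print(selection_counts)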


10. 🧹 Archiving Results

After you finish your project, consider moving important results to your archive directory for long-term storage. This matters because /scratch is not backed up and may be flushed periodically.

First tar and compress your results on Greene:

cd /scratch/<NetID>/SIS_results
tar -czvf SIS_results.tar.gz *

Then move the tarball to your archive directory:

mv SIS_results.tar.gz /archive/<NetID>/

When needed, you can retrieve and extract it later.

cd /archive/<NetID>/
tar -xzvf SIS_results.tar.gz -C /desired/path/

For example, /desired/path/ could be a directory under your scratch space, such as /scratch/<NetID>/SIS_results/ (create it first with mkdir -p if it no longer exists).

🎉 Summary

This guide covered:

✔ Logging into Greene
✔ Module system (loading R)
✔ Installing packages in your private library
✔ Writing reproducible R scripts using proper RNG
✔ Submitting SLURM array jobs
✔ Checking logs and outputs
✔ Transferring data with rsync
✔ Archiving results



