Using NYU HPC Greene: A Beginner-Friendly Tutorial for R Users

🚀 Getting Started on NYU Greene HPC (with R and SLURM)

This tutorial walks you from “never used Greene” to “running parallel R jobs with SLURM”. If you follow the sections in order, you will:

  • Log in to Greene from your own computer
  • Understand login vs. compute nodes
  • Load R with the module system
  • Install R packages into your private /scratch/<NetID>/rpackages library
  • Create a small R project that uses the SIS package
  • Submit and monitor an array job on Greene

What you need before you start

  • An NYU NetID and Greene HPC access
  • A terminal application (Terminal on macOS, WSL/PowerShell/MobaXterm on Windows)
  • A connection to the NYU VPN if you are off-campus

0. Greene in One Minute: Storage and Workflow

NYU’s Greene HPC cluster is a shared system used for research and data analysis. It uses the SLURM workload manager to schedule jobs on many compute nodes.

📦 Understanding Greene Storage Spaces (Home, Scratch, Archive)

Greene provides three main storage areas, each with different purposes, backup policies, and size limits:

Filesystem   Environment Variable   Backed up? / Flushed?   Allocation (Space / Files)
/home        $HOME                  Yes / No                50 GB / 30.7K
/scratch     $SCRATCH               No / Yes                5 TB / 1.0M
/archive     $ARCHIVE               Yes / No                2 TB / 20.5K

a. /home — Your Personal Workspace

  • Best for scripts, SLURM files, small code projects
  • Backed up nightly → safe for important scripts
  • Small quota → do not store large data here
  • Not flushed

b. /scratch — Fast Temporary Storage for Active Computation

  • Best place for all SLURM job outputs
  • Very large quota (TB-level)
  • Not backed up → files may be deleted after long inactivity
  • Intended for active simulation, data processing, temporary files

c. /archive — Long-term Storage

  • Best for completed project results you want to keep
  • Backed up and not flushed
  • Larger than /home but smaller than /scratch
  • Not designed for running jobs

✔ Usage Pattern Summary

  • Write code and configuration files in $HOME
  • Run jobs and write results to $SCRATCH
  • Move final results to $ARCHIVE for long-term safekeeping

This separation keeps Greene efficient and ensures your important project outputs are protected.
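
For example, if you want to follow this pattern from inside R, you can build paths from the environment variables in the table above. This is only a minimal sketch, assuming $SCRATCH is set as shown in the table and using a hypothetical project name myproject:

scratch <- Sys.getenv("SCRATCH")                      #e.g. /scratch/<NetID>
outdir  <- file.path(scratch, "myproject", "results") #results live on scratch
dir.create(outdir, recursive = TRUE, showWarnings = FALSE)

#code and scripts stay in $HOME; job output is written under $SCRATCH
saveRDS(data.frame(x = rnorm(10)), file.path(outdir, "example.rds"))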


1. 🔑 Logging in to Greene

First, connect to VPN if you are off-campus.

You log in using SSH in a terminal on your computer.

ssh <NetID>@greene.hpc.nyu.edu

Replace <NetID> with your own NYU NetID.

If you see a known-host key warning/error (common after a cluster upgrade), remove the old SSH record:

ssh-keygen -R greene.hpc.nyu.edu

2. 🧠 Understanding the Cluster: Login Node vs Compute Node

Greene uses a shared login node and many compute nodes.

Login node — for light tasks only

  • browsing directories
  • editing scripts (nano, vim, VS Code SSH remote)
  • installing R/Python packages
  • transferring files

❌ Do not run long computations here. These may be terminated automatically.

Compute nodes — where your jobs actually run
You access these by submitting a job:

sbatch job.sh

SLURM schedules your job onto a compute node with the resources you request.


3. 📦 Loading R via Modules

Greene uses the environment module system.

Run the following commands inside your SSH session on Greene (after Step 1).

List available R versions:

module avail r

Load your preferred version:

module load r/gcc/4.2.0

To see currently loaded modules:

module list
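
As a quick sanity check, you can start R (type R) and confirm the version matches the module you just loaded, then quit with q():

R.version.string   #should report the version provided by the loaded module, e.g. 4.2.0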

4. 📁 Installing R Packages into /scratch/<NetID>/rpackages

Run this step on Greene (not on your local computer), in the same terminal where you loaded the R module.

Before installing R packages, you should request an interactive session on a compute node. This gives you dedicated resources and avoids overloading the shared login node.

srun --pty --mem=4G --time=02:00:00 bash

After a few moments, you should see a new prompt indicating you are on a compute node (e.g. cm004 in the following example).

(base) [yf31@log-3 ~]$ srun --pty --mem=4G --time=02:00:00 bash
srun: job 3306849 queued and waiting for resources
srun: job 3306849 has been allocated resources
(base) [yf31@cm004 ~]$

System-wide installation is not allowed. You must install R packages into your personal library on Greene.

Create your library directory:

mkdir -p /scratch/<NetID>/rpackages

Tell R to use this library. You must do this in every new shell session unless you add the line to your ~/.bashrc.

export R_LIBS="/scratch/<NetID>/rpackages"

Start R:

R

Inside R, confirm:

.libPaths()

Expected output (your scratch library should be listed first; R's system library will also appear after it):

[1] "/scratch/<NetID>/rpackages"

Now install packages normally:

install.packages("SIS")
library(SIS)

If installation fails, check:

  • Did you load the same R module version for both installation and execution?
  • Did you set R_LIBS correctly? (A quick check is shown below.)
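
For a quick check from inside R, the following base R calls confirm that the scratch library is active and that SIS actually landed there (a minimal sketch; replace the <NetID> placeholder):

Sys.getenv("R_LIBS")                 #should print /scratch/<NetID>/rpackages
.libPaths()                          #the scratch library should be listed first
"SIS" %in% rownames(installed.packages(lib.loc = "/scratch/<NetID>/rpackages"))
find.package("SIS")                  #prints where SIS was found; errors if it is missing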

Exit R:

q()

After finishing package installation, exit the interactive session:

exit

Then, you will see the original login node prompt again.

(base) [yf31@cm004 ~]$ exit
exit
(base) [yf31@log-3 ~]$ 

5. 📁 Create a project directory on your own computer

Do this step on your local computer (not on Greene).

Create a folder for this project (e.g. project_SIS), then create an R script named run_SIS.R inside it with the following content:

# -------------------------------------------------------------------
# Example R script for Greene HPC using SIS package
# -------------------------------------------------------------------

args <- commandArgs(trailingOnly = TRUE) #get the command line arguments from SLURM
seed <- as.numeric(args[1]) #the seed number, the 1st argument from the command line
outdir <- args[2] #output directory, the 2nd argument from the command line

set.seed(seed, kind = "L'Ecuyer-CMRG") #the kind is important for parallel RNG

cat("Seed used:", seed, "\n")

library(SIS) #if not installed, install it first as shown in Step 4

# Generate correlated synthetic data
n = 400; p = 50; rho = 0.5
corrmat = diag(rep(1-rho, p)) + rho
corrmat[4, ] = corrmat[, 4] = sqrt(rho)
corrmat[4, 4] = 1 #restore unit variance for variable 4 so corrmat stays positive definite
corrmat[5, ] = corrmat[, 5] = 0
corrmat[5, 5] = 1
cholmat = chol(corrmat)

x = matrix(rnorm(n*p), n, p)
x = x %*% cholmat

# Gaussian response
b = c(4,4,4,-6*sqrt(2),4/3)
y = x[, 1:5] %*% b + rnorm(n)

# Run SIS with BIC tuning
model <- SIS(x, y, family='gaussian', tune='bic')

# Save results
out <- list(
  seed = seed,
  coef = model$coef.est,
  selected_vars = model$ix
)

save(
  out,
  file = file.path(outdir, paste0("SIS_seed_", seed, ".RData"))
) #save output to specified directory containing seed number

cat("Finished seed:", seed, "\n")

Then, in the same local folder, create a SLURM submission script named submit_SIS.sh with the following content:

#!/bin/bash
#SBATCH --job-name=SIS
#SBATCH --output=/scratch/%u/logs/out_%A_%a.txt
#SBATCH --error=/scratch/%u/logs/err_%A_%a.txt
#SBATCH --array=1-20
#SBATCH --time=01:00:00
#SBATCH --mem=4G
#SBATCH --cpus-per-task=1

# Load R
module load r/gcc/4.2.0

# Your R library
export R_LIBS="/scratch/$USER/rpackages"

# Logging directory
mkdir -p /scratch/$USER/logs

# Output directory for results
OUTDIR="/scratch/$USER/SIS_results"
mkdir -p $OUTDIR

echo "Running seed $SLURM_ARRAY_TASK_ID"

# Run the R script
Rscript run_SIS.R $SLURM_ARRAY_TASK_ID "$OUTDIR"

Explanation of important SLURM directives:

  • --array=1-20 runs 20 parallel jobs with seeds 1–20
  • %A = master job ID
  • %a = array index
  • --mem=4G requests 4GB per task
  • --cpus-per-task=1 requests 1 CPU
  • --time=01:00:00 sets a 1-hour time limit
  • Output and error logs are saved under /scratch/<NetID>/logs/ (SLURM does not create this directory for you, so it must exist before the job starts; see Step 7)
  • Results are saved under /scratch/<NetID>/SIS_results/

In practice, you would include all the scripts and data files in this project folder.

6. 📡 Transferring Files Between Your Computer and Greene

Use rsync for all file transfers. It is reliable, resumable, and efficient for large datasets or syncing project folders.

Upload your project folder from your computer to Greene:

IMPORTANT: in your terminal, open a new tab/window on your local computer (not on Greene).

rsync -avhP /local/path/project_SIS <NetID>@greene.hpc.nyu.edu:/home/<NetID>/

Here /local/path/project_SIS is the path to your project folder on your computer. Note that the source path has no trailing slash: rsync then copies the folder itself, so it arrives as /home/<NetID>/project_SIS on Greene. (With a trailing slash, rsync would copy only the folder's contents directly into your home directory.)


7. 🚀 Submitting Your Job

Now, back in your Greene terminal, first create the log directory referenced by submit_SIS.sh (SLURM will not create the directory for its --output/--error files), then navigate to your project directory and submit the job:

mkdir -p /scratch/<NetID>/logs
cd /home/<NetID>/project_SIS
sbatch submit_SIS.sh

If successful:

Submitted batch job 1234567

Here, 1234567 is your master job ID.


8. 📊 Monitoring Your Jobs

While the job is running, you can check the status of your jobs with:

squeue -u <NetID>

Cancel a job:

scancel <jobid>

Cancel all your jobs:

scancel -u <NetID>

9. 📂 Examining Output Files

When your jobs finish, they will disappear from squeue.

List your results:

ls /scratch/<NetID>/SIS_results

You should see files like SIS_seed_1.RData, SIS_seed_2.RData, etc.

If not, check the log files under /scratch/<NetID>/logs/ for errors.

Now, you can download these results back to your local computer using rsync:

rsync -avhP <NetID>@greene.hpc.nyu.edu:/scratch/<NetID>/SIS_results/ /local/path/SIS_results/

Here, /local/path/SIS_results/ is the destination folder on your computer.
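
To summarize the 20 runs, you can load all result files in R and count how often each variable was selected. This is a minimal sketch, assuming each .RData file contains the out list saved by run_SIS.R; point resdir at the folder on Greene or at the copy you just downloaded:

resdir <- "/scratch/<NetID>/SIS_results"   #or the local SIS_results folder
files  <- list.files(resdir, pattern = "^SIS_seed_.*\\.RData$", full.names = TRUE)

results <- lapply(files, function(f) {
  e <- new.env()
  load(f, envir = e)   #each file contains the list 'out'
  e$out
})

#how often each variable index was selected across seeds
selection_counts <- table(unlist(lapply(results, `[[`, "selected_vars")))
print(selection_counts)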


10. 🧹 Archiving Results

After you finish your project, consider moving important results to your archive directory for long-term storage. This matters because /scratch is not backed up and may be flushed periodically.

First tar and compress your results on Greene:

cd /scratch/<NetID>/SIS_results
tar -czvf SIS_results.tar.gz *

Then move the tarball to your archive directory:

mv SIS_results.tar.gz /archive/<NetID>/

When needed, you can retrieve and extract it later.

cd /archive/<NetID>/
tar -xzvf SIS_results.tar.gz -C /desired/path/

For example, /desired/path/ could be a directory under your scratch space, such as /scratch/<NetID>/SIS_results/ (create it first with mkdir -p if it no longer exists).

🎉 Summary

This guide covered:

✔ Logging into Greene
✔ Module system (loading R)
✔ Installing packages in your private library
✔ Writing reproducible R scripts using proper RNG
✔ Submitting SLURM array jobs
✔ Checking logs and outputs
✔ Transferring data with rsync
✔ Archiving results



