Knowledge Base Resources
Contributed by cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers and HPC system administrators), these resources are shared through the ConnectCI community platform. Add resources you find helpful!
Cornell Virtual Workshop
Cornell Virtual Workshop is a comprehensive training resource for high performance computing topics. The Cornell University Center for Advanced Computing (CAC) is a leader in the development and deployment of Web-based training programs. Our Cornell Virtual Workshop learning platform is designed to enhance the computational science skills of researchers, accelerate the adoption of new and emerging technologies, and broaden the participation of underrepresented groups in science and engineering. Over 350,000 unique visitors have accessed Cornell Virtual Workshop training on programming languages, parallel computing, code improvement, and data analysis. The platform supports learning communities around the world, with code examples from national systems such as Frontera, Stampede2, and Jetstream2.
OpenHPC: Beyond the Install Guide
Materials for the "OpenHPC: Beyond the Install Guide" half-day tutorial, first offered at PEARC24. The goal of this repository is to let instructors or self-learners construct one or more OpenHPC 3.x virtual environments that stay as close as possible to the defaults from the OpenHPC installation guide, and then use those environments to demonstrate several topics beyond the basic install guide.
Topics include:
1. Building a login node that's practically identical to a compute node (except for where it needs to be different)
2. Adding more security to the SMS and login node
3. Using node-local storage for the OS and/or scratch
4. De-coupling the SMS and the compute nodes (e.g., independent kernel versions)
5. GPU driver installation (simulated/recorded, not live)
6. Easier management of node differences (GPU or not, diskless/single-disk/multi-disk, Infiniband or not, etc.)
7. Slurm configuration to match some common policy goals (fair share, resource limits, etc.); a brief configuration sketch follows this list
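As a rough illustration of topic 7, here is a minimal sketch (not material from the tutorial) that renders a fair-share fragment for slurm.conf; the priority weights, decay half-life, and output filename are illustrative assumptions, while the parameter names themselves are standard slurm.conf options.

```python
# Sketch: render a minimal fair-share slurm.conf fragment (illustrative values only).
from pathlib import Path

FAIRSHARE_FRAGMENT = """\
# Enable multifactor priority so fair-share influences job ordering
PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityWeightFairshare=10000
PriorityWeightAge=1000
PriorityWeightJobSize=500
# Fair-share needs job accounting, normally via slurmdbd
AccountingStorageType=accounting_storage/slurmdbd
"""

def write_fragment(path: str = "slurm-fairshare.conf") -> None:
    """Write the fragment to a file for review before merging into slurm.conf."""
    Path(path).write_text(FAIRSHARE_FRAGMENT)

if __name__ == "__main__":
    write_fragment()
    print(FAIRSHARE_FRAGMENT)
```

In practice such a fragment would be merged into the cluster's slurm.conf (or pulled in via an Include directive) before reconfiguring slurmctld; per-account limits such as CPU caps are usually applied separately with sacctmgr.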
AHPCC Documentation
This link leads to the documentation website for using AHPCC.
Working with Python on HPC Clusters
This tutorial series and its accompanying documentation cover using Python on HPC clusters. The specific steps are based on the HOPPER cluster at George Mason University in Fairfax, VA, but they should be implementable on most HPC clusters that have the Slurm scheduler, the Environment Modules system for managing packages, and Open OnDemand for web-based access to cluster resources.
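As a small, hedged sketch of the kind of pattern such a tutorial covers (not code from the tutorial itself), the script below sizes a multiprocessing pool from the CPU count Slurm allocates to a job; SLURM_CPUS_PER_TASK is a standard Slurm environment variable, while the work function and inputs are placeholders.

```python
# Sketch: let a Python job respect the CPU allocation Slurm gives it.
import os
from multiprocessing import Pool

def work(x: int) -> int:
    """Placeholder CPU-bound task."""
    return x * x

if __name__ == "__main__":
    # Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is set;
    # fall back to 1 when running outside the scheduler.
    n_cpus = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
    with Pool(processes=n_cpus) as pool:
        results = pool.map(work, range(100))
    print(f"Processed {len(results)} items on {n_cpus} CPUs")
```

On most clusters a script like this would be submitted with sbatch after loading a site-specific Python module, with --cpus-per-task controlling the value the script reads.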
Slurm User Group Mailing List
Spatial Data Science in the Cloud (Alpine HPC) using Python
Spatial Data Science is a growing field across a wide range of industries and disciplines. The open-source programming language Python has many libraries that support spatial analysis, but what do you do when your computer cannot handle the massive file sizes of high-resolution data or supply the computing power your analysis requires?
These materials have been prepared to teach you spatial data science and how to execute your analysis on a high-performance computing (HPC) system.
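As a minimal sketch of the kind of analysis these materials scale up (not code from the materials themselves), the example below assumes geopandas is installed and reads a hypothetical parcels.gpkg file, reprojects it to a metric CRS, and buffers each geometry.

```python
# Sketch: a small geopandas workflow of the sort one might scale up on an HPC system.
import geopandas as gpd

# Hypothetical input file; real datasets may be far larger than local memory allows.
gdf = gpd.read_file("parcels.gpkg")

# Reproject to a metric CRS (UTM zone 13N, reasonable for Colorado) so buffers are in meters.
gdf = gdf.to_crs(epsg=32613)

# Buffer each geometry by 100 m and report the total buffered area.
buffered = gdf.geometry.buffer(100)
print(f"{len(gdf)} features, total buffered area: {buffered.area.sum():.0f} m^2")
```

When datasets outgrow a laptop, the same operations can be moved into a cluster job or handled with out-of-core tooling, which is the scaling problem these materials address.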
Slurm Scheduling Software Documentation
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
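To make those three functions concrete, here is a hedged sketch that submits a short job and polls its state; the wrapped command and time limit are arbitrary, and the flags used (sbatch --wrap and --parsable, squeue -h -o %T) are standard Slurm options.

```python
# Sketch: submit a trivial Slurm job and watch it move through the queue.
import subprocess
import time

# Submit a one-task job; --parsable makes sbatch print only the job ID.
job_id = subprocess.run(
    ["sbatch", "--parsable", "--ntasks=1", "--time=00:01:00", "--wrap", "hostname"],
    check=True, capture_output=True, text=True,
).stdout.strip()
print(f"Submitted job {job_id}")

# Poll the job state (PENDING while it waits for resources, RUNNING once allocated).
while True:
    state = subprocess.run(
        ["squeue", "-h", "-j", job_id, "-o", "%T"],
        capture_output=True, text=True,
    ).stdout.strip()
    if not state:  # the job has left the queue, i.e. it completed or failed
        break
    print(f"Job {job_id}: {state}")
    time.sleep(5)
print("Job finished; see sacct for accounting details.")
```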
Managing and Optimizing Your Jobs on HPC
An overview of tools and methods to manage and optimize jobs and HPC workflows.
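As one small, hedged example of such a method, the snippet below queries Slurm accounting with sacct to compare a finished job's requested and measured memory, a common first step in right-sizing requests; the job ID is a placeholder.

```python
# Sketch: inspect a completed job's resource usage with sacct (job ID is hypothetical).
import subprocess

job_id = "123456"  # placeholder; replace with a real job ID
fields = "JobID,Elapsed,ReqMem,MaxRSS,State"

result = subprocess.run(
    ["sacct", "-j", job_id, "--format", fields, "-P"],  # -P gives pipe-separated output
    check=True, capture_output=True, text=True,
)
for line in result.stdout.splitlines():
    print(line)  # compare MaxRSS against ReqMem to spot over-requested memory
```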
ACES: Charliecloud Containers for Scientific Workflows (Tutorial)
This tutorial introduces containers using the Charliecloud software suite. It provides participants with background and hands-on experience using basic Charliecloud containers for HPC applications. We discuss what containers are, why they matter for HPC, and how they work, and give an overview of Charliecloud, the unprivileged container solution from Los Alamos National Laboratory's HPC Division. Students will learn how to build toy containers, containerize real HPC applications, and run them on a cluster. Exercises are demonstrated on the ACES cluster, a composable accelerator testbed at Texas A&M University; students with an allocation on ACES can follow along with the ACES-specific exercises.
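The hands-on exercises target the ACES cluster, but the general Charliecloud cycle of build, convert, and run can be sketched roughly as below; this is not tutorial code, it assumes the Charliecloud tools are on PATH and a Dockerfile sits in the working directory, and the image name and unpack directory are placeholders.

```python
# Sketch: drive a minimal Charliecloud build-and-run cycle from Python.
import subprocess

def run(cmd):
    """Run a command, echoing it first so the workflow is visible."""
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Build an image named "hello" from a Dockerfile in the current directory
# (unprivileged; ch-image keeps it in its own storage area).
run(["ch-image", "build", "-t", "hello", "-f", "Dockerfile", "."])

# Unpack the image to a plain directory that ch-run can execute from.
run(["ch-convert", "-i", "ch-image", "-o", "dir", "hello", "./hello-dir"])

# Run a command inside the container, again without elevated privileges.
run(["ch-run", "./hello-dir", "--", "echo", "hello from Charliecloud"])
```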