Knowledge Base Resources

These resources have been contributed and “vetted” by the community of cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers, and HPC system administrators) who participate in programs such as this one, which are supported by the ConnectCI community management platform. Additional Knowledge Base Resources are always welcome!


HPC University

HPC University Resources

A comprehensive list of training resources from HPC University. HPCU is a virtual organization whose primary goal is to provide a cohesive, persistent, and sustainable online environment for sharing educational and training materials across a continuum of high performance computing environments, from desktop computing capabilities to the highest-end computing facilities offered by HPC centers.

3 Likes

Type

learning

Level

Cornell Virtual Workshop

Cornell Virtual Workshop is a comprehensive training resource for high performance computing topics. The Cornell University Center for Advanced Computing (CAC) is a leader in the development and deployment of Web-based training programs. Our Cornell Virtual Workshop learning platform is designed to enhance the computational science skills of researchers, accelerate the adoption of new and emerging technologies, and broaden the participation of underrepresented groups in science and engineering. Over 350,000 unique visitors have accessed Cornell Virtual Workshop training on programming languages, parallel computing, code improvement, and data analysis. The platform supports learning communities around the world, with code examples from national systems such as Frontera, Stampede2, and Jetstream2.

jetstream matlab cloud-computing data-analysis performance-tuning parallelization file-transfer globus slurm training cuda python r mpi

1 Like

Type

learning

Level

Introduction to Deep Learning in PyTorch

This workshop series introduces the essential concepts in deep learning and walks through the common steps in a deep learning workflow, from data loading and preprocessing to training and model evaluation. Throughout the sessions, students participate in writing and executing simple deep learning programs using PyTorch, a popular Python library for developing, training, and deploying deep learning models.

ai deep-learning image-processing machine-learning neural-networks pytorch gpu

1 Like

Type

learning

Level

DeapSECURE – Data-Enabled Advanced Computational Training Platform for Cybersecurity Research and Education

DeapSECURE lesson modules

DeapSECURE is a training program to infuse high-performance computational techniques into cybersecurity research and education. It is an NSF-funded project of the ODU School of Cybersecurity along with the Department of Electrical and Computer Engineering and the Information Technology Services at ODU. The DeapSECURE team has developed six non-degree training modules to expose cybersecurity students to advanced CI platforms and techniques rooted in big data, machine learning, neural networks, and high-performance programming. Techniques taught in DeapSECURE workshops are rather general and transferable to other areas including science, engineering, finance, linguistics, etc. All lesson materials are made available as open-source educational resources.

ai deep-learning machine-learning neural-networks visualization big-data data-analysis jekyll batch-jobs slurm bash ssh training workforce-development python scikit-learn cybersecurity

1 Like

Type

learning

Level

Gentle Introduction to Programming With Python

A Gentle Introduction to Programming with Python (MIT OCW)

This course from MIT OpenCourseWare (OCW) covers the basics of getting started with programming in Python. Lectures and practice assignments are available to users at no cost. Python has many applications in tech today, from web frameworks to machine learning. The course also shows users how to set up an IDE, which makes debugging much more efficient.

python

1 Like

Type

learning

Level

Version control with Git

Version Control with Git

Understand the benefits of an automated version control system and the basics of how automated version control systems work. Configure Git the first time it is used on a computer and understand the meaning of the --global configuration flag. Create a local Git repository and describe the purpose of the .git directory. Go through the modify-add-commit cycle for one or more files, explain where information is stored at each stage of that cycle, and distinguish between descriptive and non-descriptive commit messages.

version-control github git

1 Like

Type

learning

Level

Using Linux commands in a Python script (and the difference between the subprocess and os Python modules)

Using Linux Commands in a Python Script

Learn how to use Linux commands in a Python script. Specifically, learn how to use Python's subprocess and os modules to run shell commands (which in turn run Linux commands) from a Python script that is run on a cluster.
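As a rough illustration of the difference between the two modules, here is a minimal sketch, assuming a Linux system where echo, printf, wc, and true are available. The helper names run_captured and run_in_shell are hypothetical and are not taken from the linked tutorial.

```python
import os
import subprocess


def run_captured(args):
    """Run a command given as an argument list (no shell) and return its stdout."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_in_shell(command):
    """Run a command string through the shell so pipes and redirection work."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# subprocess with an argument list: the output comes back as a Python string.
captured = run_captured(["echo", "hello from subprocess"])

# subprocess with shell=True: needed here because of the pipe.
line_count = int(run_in_shell("printf 'a\\nb\\nc\\n' | wc -l"))

# os.system also runs the string through the shell, but it only returns an
# exit status; any output goes straight to the terminal, not to the script.
status = os.system("true")

print(captured, line_count, status)
```

In general, subprocess.run is the more flexible choice when the script needs the command's output or wants an exception on failure, while os.system is enough when only the exit status matters.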

cluster-management programming python

1 Like

Type

learning

Level

Attention, Transformers, and LLMs: a hands-on introduction in PyTorch

This workshop focuses on developing an understanding of the fundamentals of attention and the transformer architecture so that you can understand how LLMs work and use them in your own projects.

ai deep-learning machine-learning neural-networks pytorch

1 Like

Type

learning

Level

Fundamentals of R Programming

This course is an introduction to the R programming language and covers the fundamental concepts needed to operate in the R environment. This course was taught for the ACCESS community on September 26, 2023, but the materials for the course are still available on the ACES cluster and can be completed independently. All materials are presented as learnR notebooks and cover several topics, including data types, variables, built-in functions, data structures, and plotting.

ACES TAMU plotting data-analysis r

0 Likes

Type

learning

Level

GIS: What are Geodetic Datums?

What are Geodetic Datums?

When working with GIS or spatial data, one often encounters the word "datum," and GIS computation tasks may require you to choose one. Below is a short video from NOAA and UCAR explaining what datums are.

arcgis gis

0 Likes

Type

learning

Level

NCSA HPC-Moodle

NCSA HPC-Moodle

Self-paced tutorials on high-end computing topics such as parallel computing, multi-core performance, and performance tools. Some of the tutorials also offer digital badges.

training workforce-development

0 Likes

Type

learning

Level

Metadata Systems

Metadata Systems

Metadata is a vital topic in libraries and librarianship, encompassing structured information used for accessing digital resources. The definition of metadata varies, but it is essentially data about data. The field has evolved beyond simply describing metadata schemas and now focuses on topics like interoperability, non-descriptive metadata (administrative and preservation metadata), and the effective application of metadata schemas for user discovery. Interoperability, the ability to seamlessly exchange metadata between systems, is a major concern. Different levels of interoperability are examined, including schema-level, record-level, and repository-level. Challenges to interoperability include variations in standards, collaboration barriers, and costs.

Metadata management is discussed in terms of the holistic management of metadata across an entire library. Steps include analyzing metadata requirements, adopting a schema, creating metadata content, delivery/access, evaluation, and maintenance. Administrative metadata, which encompasses ownership and production information, is becoming more critical, particularly for electronic resource licensing. Preservation metadata is also gaining importance in ensuring the long-term viability of digital objects.

metadata

0 Likes

Type

learning

Level

File management of Visual Studio Code on clusters

VS Code installation

Visual Studio Code, commonly known as VSCode, is a popular tool used by programmers worldwide. It serves as a text editor and an Integrated Development Environment (IDE) that supports a wide variety of programming languages. One of its key features is its extensive library of extensions. These extensions add on to the basic functionalities of VSCode, making coding more efficient and convenient.

However, there's a catch. When these extensions are installed and used frequently, they generate a multitude of files. These files are typically stored in a folder named .vscode-extension within your home directory. On a cluster computing facility such as the FASTER and Grace clusters at Texas A&M University, there's a limitation on how many files you can have in your home directory. For instance, the file number limit could be 10000, while the .vscode-extension directory can hold around 4000 temporary files even with just a few extensions. Thus, if the number of files in your home directory surpasses this limit due to VSCode extensions, you might face some issues. This restriction can discourage users from taking full advantage of the extensive features and extensions offered by the VSCode editor.

To overcome this, we can shift the .vscode-extension directory to the scratch space. The scratch space is another area in the cluster where you can store files, and it usually has a much higher limit on the number of files compared to the home directory. We can perform this shift smoothly using a feature called symbolic links (or symlinks for short). Think of a symlink as a shortcut or a reference that points to another file or directory located somewhere else.

Here's a step-by-step guide on how to move the .vscode-extension directory to the scratch space and create a symbolic link to it in your home directory:

1. Copy the .vscode-extension directory to the scratch space: Using the cp command, you can copy the .vscode-extension directory (along with all its contents) to the scratch space. Here's how:

    cp -r ~/.vscode-extension /scratch/user

   Don't forget to replace /scratch/user with the actual path to your scratch directory.

2. Remove the original .vscode-extension directory: Once you've confirmed that the directory has been copied successfully to the scratch space, you can remove the original directory from your home space. You can do this using the rm command:

    rm -r ~/.vscode-extension

   It's important to make sure that the directory has been copied to the scratch space successfully before deleting the original.

3. Create a symbolic link in the home directory: Lastly, you'll create a symbolic link in your home directory that points to the .vscode-extension directory in the scratch space. You can do this as follows:

    ln -s /scratch/user/.vscode-extension ~/.vscode-extension

By following this process, all the files generated by VSCode extensions will be stored in the scratch space. This prevents your home directory from exceeding its file limit. Now, when you access ~/.vscode-extension, the system will automatically redirect you to the directory in the scratch space, thanks to the symlink. This method ensures that you can use VSCode and its various extensions without worrying about hitting the file limit in your home directory.
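The same copy, remove, and link sequence could also be scripted. The following is a minimal sketch, not the procedure recommended by the cluster documentation: the function name relocate_with_symlink and the file-count check are assumptions added for illustration, and the demo runs inside a temporary directory with stand-in "home" and "scratch" folders rather than touching real paths.

```python
import shutil
import tempfile
from pathlib import Path


def count_files(root: Path) -> int:
    """Count regular files below root (used as a simple copy check)."""
    return sum(1 for p in root.rglob("*") if p.is_file())


def relocate_with_symlink(source: Path, scratch_root: Path) -> Path:
    """Copy source into scratch_root, remove the original, leave a symlink."""
    if source.is_symlink():
        return source.resolve()  # already relocated, nothing to do

    target = scratch_root / source.name

    # Step 1: copy the directory and its contents (like cp -r).
    shutil.copytree(source, target)

    # Confirm the copy before deleting anything (hypothetical sanity check).
    if count_files(source) != count_files(target):
        raise RuntimeError("copy looks incomplete; original left in place")

    # Step 2: remove the original directory (like rm -r).
    shutil.rmtree(source)

    # Step 3: create the symlink at the old location (like ln -s).
    source.symlink_to(target, target_is_directory=True)
    return target


# Demo with stand-in directories; a real run would pass the actual
# home and scratch paths instead.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp) / "home"
    scratch = Path(tmp) / "scratch"
    ext_dir = home / ".vscode-extension"
    ext_dir.mkdir(parents=True)
    scratch.mkdir()
    (ext_dir / "example.txt").write_text("placeholder")

    relocate_with_symlink(ext_dir, scratch)
    demo_is_link = ext_dir.is_symlink()
    demo_content = (ext_dir / "example.txt").read_text()

print(demo_is_link, demo_content)
```

Checking the copy before deleting the original mirrors the caution in step 2: if the check fails, the home directory is left untouched.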

faster file-limit scratch file-transfer

0 Likes

Type

learning

Level

Awesome Jupyter Widgets (for building interactive scientific workflows or science gateway tools)

Awesome Jupyter Widgets List

A curated list of awesome Jupyter widget packages and projects for building interactive visualizations for Python code.

0 Likes

Type

learning

Level

UNIX/command line basics tutorial

UNIX/command line basics tutorial

Introductory training materials for working on the UNIX command line.

bash

0 Likes

Type

learning

Level

Introduction to Visualization on HPC Using Python

University of Arizona Workshop Series: Introduction to HPC, Visualization

This workshop begins with an introduction to the concepts of visualization, followed by hands-on exercises. The concepts section has Speaker Notes, and the hands-on section has an accompanying Jupyter notebook. The workshop is one in a series of Introduction to HPC workshops.

visualization documentation training jupyterhub

0 Likes

Type

learning

Level

What are LSTMs?

Introduction to LSTMs

This reading explains what a long short-term memory (LSTM) neural network is. LSTMs are a type of neural network that relies on both past and present data to make decisions about future data. They use loops back to previous data to make such decisions. This makes LSTMs very good at predicting time-dependent behavior.
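To make the idea of "looping back to previous data" concrete, here is a minimal sketch of a single-unit LSTM cell in plain Python, using the standard forget, input, and output gate equations. It is not taken from the linked reading; the weights in PARAMS are hand-picked hypothetical values, not trained ones.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell.

    x is the present input; h_prev and c_prev carry information from the past.
    """
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old memory, add new information
    h = o * math.tanh(c)     # output exposed to the next step
    return h, c


# Hypothetical, hand-picked weights for illustration only.
PARAMS = {
    "wf": 0.5, "uf": 0.1, "bf": 1.0,
    "wi": 0.6, "ui": 0.2, "bi": 0.0,
    "wo": 0.4, "uo": 0.3, "bo": 0.0,
    "wg": 0.9, "ug": 0.5, "bg": 0.0,
}


def run_sequence(xs, p):
    """Feed a sequence through the cell, looping the state back each step."""
    h, c = 0.0, 0.0
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


# The final input is 0.0 in both runs, yet the outputs differ because the
# cell remembers the earlier input.
print(run_sequence([1.0, 0.0], PARAMS)[-1])
print(run_sequence([0.0, 0.0], PARAMS)[-1])
```

In practice a library such as PyTorch provides vectorized, trainable LSTM layers; the scalar version here only shows how the state carries past information forward.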

ai deep-learning machine-learning neural-networks

0 Likes

Type

learning

Level

Using Dask on HPC Systems

A tutorial on the effective use of Dask on HPC resources. The four-hour tutorial will be split into two sections, with early topics focused on novice Dask users and later topics focused on intermediate usage on HPC and associated best practices. The knowledge areas covered include (but are not limited to):

Beginner section
- High-level collections including dask.array and dask.dataframe
- Distributed Dask clusters using HPC job schedulers
- Earth Science data analysis using Dask with Xarray
- Using the Dask dashboard to understand your computation

Intermediate section
- Optimizing the number of workers and memory allocation
- Choosing appropriate chunk shapes and sizes for Dask collections
- Querying resource usage and debugging errors

training jupyterhub python

0 Likes

Type

learning

Level

Advanced Compilers: The Self-Guided Online Course

Cornell's Advanced Compilers

This is a self-guided online course on compilers. The topics covered throughout the course include universal compiler topics like intermediate representations, data flow, and “classic” optimizations, as well as more research-focused topics such as parallelization, just-in-time compilation, and garbage collection.

optimization parallelization training compiling

0 Likes

Type

learning

Level

FreeSurfer Tutorials

FreeSurfer Tutorials

The official MGH / Harvard tutorial page for FreeSurfer. The FreeSurfer group has provided and designed a series of tutorials for using FreeSurfer and for getting acquainted with the concepts needed to perform its various modes of analysis and processing of MRI data. The tutorials are designed to be followed along in a terminal window where commands can be copy/pasted instead of typed.

data-analysis image-processing psychology

0 Likes

Type

learning

Level

Federated CI Resources

How do you add your institutional HPC cluster to the Open Science Grid (OSG)?

Discussion about contributing cycles to the Open Science Grid.

open-science-grid

0 Likes

Type

learning

Level

How to use Rclone

Tutorial - Using Rclone to transfer data into the OSN

Learn how to use Rclone to transfer data, specifically from your local drive to the Open Storage Network and vice versa.

data-transfer

0 Likes

Type

learning

Level

Biopython Tutorial

The Biopython Tutorial and Cookbook website is a dedicated online resource for users in the field of computational biology and bioinformatics. It provides a collection of tutorials and practical examples focused on using the Biopython library. The tutorials cover various aspects of Biopython and cater to users with different levels of expertise. The site also includes code snippets, examples, and solutions to common challenges in computational biology.

bioinformatics genomics python

0 Likes

Type

learning

Level

GPU Computing Workshop Series for the Earth Science Community

GPU training series for scientists, software engineers, and students, with emphasis on Earth science applications. The content of this course is coordinated with the 6-month series of GPU Training sessions starting in February 2022. The NVIDIA High Performance Computing Software Development Kit (NVHPC SDK) and CUDA Toolkit are the primary software requirements for this training; both are already available on NCAR's HPC clusters as modules you may load. This software is also free to download from NVIDIA via the NVHPC SDK Current Release Downloads page and the CUDA Toolkit downloads page. Any provided code is written specifically to build and run on NCAR's Casper HPC system but may be adapted to other systems or personal machines. Material will be updated as appropriate for the future deployment of NCAR's Derecho cluster and as technology progresses.

optimization performance-tuning profiling parallelization github pytorch tensorflow oceanography gpu hpc-arch-and-perf training c c++ fortran cuda jupyterhub programming programming-best-practices python

0 Likes

Type

learning

Level

Thrust resources

Thrust is a CUDA C++ template library of parallel algorithms that handles much of the GPU parallelization for you. The Thrust tutorial is great for beginners, and the documentation is helpful for anyone using Thrust.

parallelization gpu resources

0 Likes

Type

learning

Level
