Knowledge Base Resources
Contributed by cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers and HPC system administrators), these resources are shared through the ConnectCI community platform. Add resources you find helpful!
HPC University
3
A comprehensive list of training resources from the HPC University. HPCU is a virtual organization whose primary goal is to provide a cohesive, persistent, and sustainable on-line environment to share educational and training materials for a continuum of high performance computing environments that span desktop computing capabilities to the highest-end of computing facilities offered by HPC centers.
Rust Web Server Tutorial
2
This is a beginner-friendly tutorial on how to set up your web server using Rust!
Cornell Virtual Workshop
1
Cornell Virtual Workshop is a comprehensive training resource for high performance computing topics. The Cornell University Center for Advanced Computing (CAC) is a leader in the development and deployment of Web-based training programs. Our Cornell Virtual Workshop learning platform is designed to enhance the computational science skills of researchers, accelerate the adoption of new and emerging technologies, and broaden the participation of underrepresented groups in science and engineering. Over 350,000 unique visitors have accessed Cornell Virtual Workshop training on programming languages, parallel computing, code improvement, and data analysis. The platform supports learning communities around the world, with code examples from national systems such as Frontera, Stampede2, and Jetstream2.
PyTorch for Deep Learning and Natural Language Processing
1
PyTorch is a Python library that supports accelerated GPU processing for Machine Learning and Deep Learning. In this tutorial, I will teach the basics of PyTorch from scratch. I will then explore how to use it for some ML projects such as Neural Networks, Multi-layer perceptrons (MLPs), Sentiment analysis with RNN, and Image Classification with CNN.
GIS: Geocoding Services
1
Geocoding is the process of taking a street address and converting it into coordinates that can be plotted on a map. This conversion typically requires an API call to a remote server hosted by an organization/institution. The remote server will take the address attributes provided by you and the remote server will compare it to the data it contains and return a best estimate on the coordinates for that location.
There are many geocoding services available with different world coverages, quality of result, and set different rate limits for access. For R, a package called "tidygeocoder" provides an easy way to connect to these different services. As an additional benefit, their documentation provides a good summary of geocoding services available and links to their documentation. The link to the documentation for gecoding services accessible by "tidygeocoder" is provided below.
For Python, geopy package is a library that provides connection to various geocoding services. The link to the documentation for this package is also included below.
Attention, Transformers, and LLMs: a hands-on introduction in Pytorch
1
This workshop focuses on developing an understanding of the fundamentals of attention and the transformer architecture so that you can understand how LLMs work and use them in your own projects.
Open OnDemand Documentation Repository
1
This is the main documentation repo for the Open OnDemand Portal which enables researchers to access HPC resources from a familiar web interface.
ACCESS Pegasus Documentation
1
The documentation provides an overview of using Pegasus, a workflow management system, on ACCESS resources for high throughput computing (HTC) workloads, covering logging in, workflow creation, resource configuration, and monitoring options.
Data Visualization tools for Python
1
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It makes analyzing and presenting your data extremely easy and works with Python which many people already know.
DARWIN Documentation Pages
1
DARWIN (Delaware Advanced Research Workforce and Innovation Network) is a big data and high performance computing system designed to catalyze Delaware research and education
Introduction to Deep Learning in Pytorch
1
This workshop series introduces the essential concepts in deep learning and walks through the common steps in a deep learning workflow from data loading and preprocessing to training and model evaluation. Throughout the sessions, students participate in writing and executing simple deep learning programs using Pytorch – a popular Python library for developing, training, and deploying deep learning models.
Introduction to Python for Digital Humanities and Computational Research
1
This documentation contains introductory material on Python Programming for Digital Humanities and Computational Research. This can be a go-to material for a beginner trying to learn Python programming and for anyone wanting a Python refresher.
ACCESS HPC Workshop Series
1
Monthly workshops sponsored by ACCESS on a variety of HPC topics organized by Pittsburgh Supercomputing Center (PSC). Each workshop will be telecast to multiple satellite sites and workshop materials are archived.
Version control with Git
1
Understand the benefits of an automated version control system and the basics of how automated version control systems work. Configure git the first time it is used on a computer and understand the meaning of the --global configuration flag. Create a local Git repository and describe the purpose of the .git directory. Go through the modify-add-commit cycle for one or more files, explain where information is stored at each stage of that cycle, and distinguish between descriptive and non-descriptive commit messages.
NCSA HPC Training Moodle
1
Self-paced tutorials on high-end computing topics such as parallel computing, multi-core performance, and performance tools. Other related topics include 'Cybersecurity for End Users' and 'Developing Webinar Training.' Some of the tutorials also offer digital badges. Many of these tutorials were previously offered on CI-Tutor. A list of open access training courses are provided below.
Parallel Computing on High-Performance Systems
Profiling Python Applications
Using an HPC Cluster for Scientific Applications
Debugging Serial and Parallel Codes
Introduction to MPI
Introduction to OpenMP
Introduction to Visualization
Introduction to Performance Tools
Multilevel Parallel Programming
Introduction to Multi-core Performance
Using the Lustre File System
Gentle Introduction to Programming With Python
1
This course from MIT OpenCourseWare (OCW) covers very basic information on how to get started with programming using Python. Lectures are available, along with practice assignments, to users at no cost. Python has many applications in tech today, from web frameworks to machine learning. This course will also instruct users on how to get set up with an IDE, which will allow for way more efficient debugging.
Useful R Packages for Data Science and Statistics
1
This Udacity article listed the most frequently used R packages for data science and statistics. For each package, the article provided the link to its official documentation. It will be a great start point if you want to start your data science journey in R.
Using Linux commands in a python script (and the difference between the subprocess and os python modules)
1
Learn how to use Linux commands in a python script. Specifically, learn how to use the subprocess and os modules in python to run shell commands (which run Linux commands) in a python script that is run on a cluster.
Singularity/Apptainer User Manuals
0
Singularity/Apptainer is a free and open-source container platform that allows users to build and run containers on high performance computing resources.
SingularityCE is the community edition of Singularity maintained by Sylabs, a company that also offers commercial Singularity products and services.
Apptainer is a fork of Singularity, maintained by the Linux foundation, a community of developers and users who are passionate about open source software.
A guide to pip in Python
0
pip stands for "pip installs packages". It's the go-to package manager for Python, allowing developers to install, update, and manage software libraries and dependencies used in Python projects. With just a few commands in your terminal or command prompt, pip makes it effortless to fetch libraries from the Python Package Index (PyPI) and integrate them into your projects. This guide will walk you through the basics of pip, from installation to advanced package management.
CyberAmbassadors: Professional Skills for Interdisciplinary Work
0
The CyberAmbassadors project was funded through a workforce development grant from the National Science Foundation (Award #1730137). Starting in 2017, the initial focus of this project was to develop, test, and refine new curriculum to help CyberInfrastructure (CI) Professionals strengthen their communications, teamwork and leadership skills. With support and collaboration from a number of academic and professional organizations, the CyberAmbassadors project was expanded to offer professional skills training to college students and professionals working across STEM (science, technology, engineering, math) disciplines.
Paraview UArizona HPC links (beginner)
0
These links take you to visualization resources supported by the University of Arizona's HPC visualization consultant (rtdatavis.github.io). The following links are specific to the Paraview program and the workflows that have been used my researchers at the U of Arizona. Some of the pages linked are very beginner friendly: getting started, working with cameras and keyframes for rendering, visualizing external files (netcdf climate data), graphs and data exporting.
Many of the workflows involve using remote desktops via the Open On Demand interface, but if this isn't set up at your university you can use paraview locally on a desktop. Feel free to post on access ci https://ask.cyberinfrastructure.org/ if you need assistance getting a paraview gui open for your work on HPC.
NERSC Training and Tutorials
0
A comprehensive collection of NERSC developed training and tutorial events, offered on regular schedules. All sessions are archived, including slide decks, video recordings, and software examples as are available. Some examples of past training and tutorial topics are listed below
Deep Learning for Sciences Webinar Series
BerkeleyGW Tutorial Workshop
VASP Trainings
Timemory Software Monitoring Tutorial, April 2021
HPCToolkit to Measure and Analyzing GPU Applications Performance Tutorial
Totalview Tutorial
NVidia HPCSDK - OpenMP Target Offload Training
Parallelware Training Series
ARM Debugging and Profiling Tools Tutorial
Roofline on NVIDIA GPUs
GPUs for Science events
3-part OpenACC Training Series
9-part CUDA Training Series
GIS: Projections and their distortions
0
In GIS, projections are helpful to take something plotted on a globe and convert it to a flat map that we can print or show on a screen. Unfortunately it also introduces distortions to the objects and features on the map. This not only distorts the objects visually, but the results for any spatial attribute calculations will also reflect this distortion (such as distance and area ). Below is a link to a quick primer on projections, types of distortions that can occur, and suggestions on how to choose a correct projection for your work.
Understanding LLM Fine-tuning
0
With the recent uprising of LLM's many business are looking at way to adopt these LLMs and fine-tuning these models on specfic data sets to ensure accuracy. These models when fine-tuned can be optimal for fulfilling the specific needs of a company. This site explains explicitly when, how, and why models should be trained. It goes over various strategies for LLM fine -tuning.