Knowledge Base Resources
Contributed by cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers and HPC system administrators), these resources are shared through the ConnectCI community platform. Add resources you find helpful!
Raftlib: Open Source library for concurrent data processing pipelines
0
Raftlib is an open-source C++ Library that provides a framework for implementing parallel and concurrent data processing pipelines. It is designed to simplify the development of high-performance data processing applications by abstracting away the complexities of parallelism, concurrency, and data flow management.
It enables stream/data-flow parallel computation by linking parallel compute kernels together using simple right shift operators, similar to C++ streams for string manipulation. RaftLib eliminates the need for explicit usage of traditional threading libraries such as pthreads, std::thread, or OpenMP, which can lead to non-deterministic behavior when misused.
Online Bachelor's in Data Science Program Guide - TechGuide
0
The realm of data science is one that onlookers regard with curiosity and respect. There are a lot of unknowns in this area of study that only recently became hugely relevant. It is important to get the facts on how expertise in data science is transforming the world. This article features what a bachelor’s degree means in today’s market and the future.
NITRC
0
The Neuroimaging Tools and Resources Collaboratory (NITRC) is a neuroimaging informatics knowledge environment for MR, PET/SPECT, CT, EEG/MEG, optical imaging, clinical neuroinformatics, imaging genomics, and computational neuroscience tools and resources.
Fine-tuning LLMs with PEFT and LoRA
0
As LLMs get larger fine-tuning to the full extent can become difficult to train on consumer hardware. Storing and deploying these tuned models can also be quite expensive and difficult to store. With PEFT (parameter -efficent fine tuning), it approaches fine-tune on a smaller scale of model parameters while freezing most parameters of the pretrained LLMs. Basically it is providing full performance that which is similar if not better than full fine tuning while only having a small number of trainable parameters. This source explains that as well as going over LORA diagrams and a code walk through.
RRCoP Resources Page
0
Very helpful list of Regulated Research Community of Practice's collaborating communities.
Framework to help in scaling Machine Learning/Deep Learning/AI/NLP Models to Web Application level
0
This framework will help in scaling Machine Learning/Deep Learning/Artificial Intelligence/Natural Language Processing Models to Web Application level almost without any time.
Pandas - Python
0
pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. It lets you store data in easy to manage and display data frames, with column names and datatypes.
Scipy Lecture Notes
0
Comprehensive tutorials and lecture notes covering various aspects of scientific computing using Python and Scipy.
GPU Acceleration in Python
0
This tutorial explains how to use Python for GPU acceleration with libraries like CuPy, PyOpenCL, and PyCUDA. It shows how these libraries can speed up tasks like array operations and matrix multiplication by using the GPU. Examples include replacing NumPy with CuPy for large datasets and using PyOpenCL or PyCUDA for more control with custom GPU kernels. It focuses on practical steps to integrate GPU acceleration into Python programs.
QGIS Processing Executor
0
Running QGIS tools from the command line
Data Visualization Tools for Julia
0
Plots.jl is the most widely used plotting library for the Julia programming language. It's known for being especially powerful in its versatility and intuitiveness. It's limited set of dependencies and wide applicability across different graphics packages make it especially helpful in visualizing the results of your latest Julia implementation.
However, there are still multiple options available for Julia programmers to visualize their datasets. The second link details a comparison against a variety of Julia packages.
Resource to active inference
0
Active inference is an emerging study field in machine learning and computational neuroscience. This website in particular introduces "active inference institute", which has established a couple of years ago, and contains a wide variety of resources for understanding the theory of active inference and for participating a worldwide active inference community.
MATLAB with other Programming Languages
0
MATLAB is a really useful tool for data analysis among other computational work. This tutorial takes you through using MATLAB with other programming languages including C, C++, Fortran, Java, and Python.
Handwritten Digits Tutorial in PyTorch
0
This tutorial is essentially the "hello world" of image recognition and feed-forward neural network (using PyTorch). Using the MNIST database (filled within images of handwritten digits), the tutorial will instruct how to build a feed-forward neural network that can recognize handwritten digits. A solid understanding of feed-forward and back-propagation is recommended.
Examples of Thrust code for GPU Parallelization
0
Some examples for writing Thrust code. To compile, download the CUDA compiler from NVIDIA. This code was tested with CUDA 9.2 but is likely compatible with other versions. Before compiling change extension from thrust_ex.txt to thrust_ex.cu. Any code on the device (GPU) that is run through a Thrust transform is automatically parallelized on the GPU. Host (CPU) code will not be. Thrust code can also be compiled to run on a CPU for practice.
Ask.CI Q&A Platform for Research Computing
0
Fairness and Machine Learning
0
The "Fairness and Machine Learning" book offers a rigorous exploration of fairness in ML and is suitable for researchers, practitioners, and anyone interested in understanding the complexities and implications of fairness in machine learning.
ACCESS Support Portal
0
Representation Learning in Deep Learning
0
Representation learning is a fundamental concept in machine learning and artificial intelligence, particularly in the field of deep learning. At its core, representation learning involves the process of transforming raw data into a form that is more suitable for a specific task or learning objective. This transformation aims to extract meaningful and informative features or representations from the data, which can then be used for various tasks like classification, clustering, regression, and more.
The Use of High-Performance Computing Services in University Settings: A Usability Case Study of the University of Cincinnati’s High-Performance Computing Cluster
0
This presentation gives a detailed breakdown of the outcome of my master's thesis which was focused on making HPC Clusters accessible across all disciplines in a university setting "Our Case Study was the university of Cincinnati".
Geocomputation with R (Free Reference Book)
0
Below is a link for a book that focuses on how to use "sf" and "terra" packages for GIS computations. As of 5/1/2023, this book is up to date and examples are error free. The book has a lot of information but provides a good overview and example workflows on how to use these tools.
ACCESS Guide (originally given at Duke OIT)
0
A guide for Duke OIT on how to advise users on using ACCESS and allocation credits to jetstream 2 for Duke University members. This can be used for non Duke members. Assumes the reader has basic knowledge of ACCESS.
TensorFlow for Deep Neural Networks
0
TensorFlow is a powerful framework for Deep Learning, developed by google. This specifically is their python package, which is easy to use and can be used to train incredibly powerful models.
Working with Python on HPC Clusters
0
This tutorial series and documentation covers topics on using Python on HPC clusters. The specific steps are based on the HOPPER cluster at George Mason University in Fairfax, VA. They should be implementable on most HPC clusters that have the SLURM scheduler installed, the Environment Modules system for managing packages and Open onDemand for a web-based GUI to access the cluster resources.
Paraview UArizona HPC links (advanced)
0
These links take you to visualization resources supported by the University of Arizona's HPC visualization consultant ([rtdatavis.github.io](http://rtdatavis.github.io/)). The following links are specific to the Paraview program and the workflows that have been used my researchers at the U of Arizona. These links are distinct from the others posted in the beginner paraview access ci links from the University of Arizona in that they are for more complex workflows. The links included explain how to use the terminal with paraview (pvpython), and the steps to leverage HPC resources for headless batch rendering. The batch rendering tutorial is significantly more complex than the others so if you find yourself stuck please post on the https://ask.cyberinfrastructure.org/ and I will try to troubleshoot with you.