Knowledge Base Resources
These resources have been contributed and vetted by the community of cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers, and HPC system administrators) who participate in programs such as this one, which are supported by the ConnectCI community management platform. Additional Knowledge Base Resources are always welcome!
Intro to Statistical Computing with Stan
The Stan language is used to specify a (Bayesian) statistical model with an imperative program calculating the log probability density function. Here are some useful links to start your exploration of this statistical programming language, and a Python interface to Stan.
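As a rough illustration of what "a program calculating the log probability density function" means, here is a minimal plain-Python sketch. It is not Stan code and is not taken from the linked resources. The model (a normal prior on `mu` and a normal likelihood), the data values, and the function names `normal_lpdf` and `log_density` are all assumptions made for illustration. The running `target` variable loosely mirrors the way statements in a Stan program add terms to the log density.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Log density of a Normal(mu, sigma) distribution evaluated at y."""
    return (-0.5 * math.log(2 * math.pi)
            - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)


def log_density(mu, data, sigma=1.0):
    """Log joint density for a hypothetical model:
    mu ~ Normal(0, 10) and y[i] ~ Normal(mu, sigma)."""
    target = 0.0
    target += normal_lpdf(mu, 0.0, 10.0)      # prior term
    for y in data:
        target += normal_lpdf(y, mu, sigma)   # one likelihood term per observation
    return target


data = [1.2, 0.8, 1.0]  # made-up observations
# A value of mu near the data should score higher than one far away.
print(log_density(1.0, data) > log_density(5.0, data))
```

A sampler or optimizer would call a function like `log_density` many times with different parameter values; here it is only evaluated at two points to compare them.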
ACCESS Getting Started Quick-Guide
A step-by-step guide to getting your first allocation for ACCESS computing and storage resources.
AHPCC documentary
This link leads to a documentation website for using AHPCC.
CMake Tutorials
CMake is an open-source, cross-platform tool used to manage the software build process. This tutorial takes you through how to use CMake from the very basics, with example projects.
Data Analysis with R for Educators
This webinar series is an orientation to R. We start with an overview of R’s history and place in the larger data science ecosystem. Next, we introduce the RStudio user interface and how to access R’s excellent documentation. Finally, we present the fundamental concepts you need to use the R environment and language for data analysis. Along the way, we compare R script files (.R) to R Notebook (.Rmd) files and show how the features of R Notebook support better communication and encourage more dynamic engagement with statistical analysis and code. It is helpful to be familiar with tabular data analysis using statistical software, database tools, or spreadsheet programs.
Workshop materials, including setup directions and slides, are available at https://github.com/CornellCAC/r_for_edu/. The RStudio Cloud project used in the workshop is https://rstudio.cloud/project/4044219.
The Official Documentation of Pandas
Pandas is one of the most essential Python libraries for data analysis and manipulation. It provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language. The official documentation serves as an in-depth guide to using this powerful tool, including explanations and examples.
Jetstream2 Docs Site
Jetstream2 makes cutting-edge high-performance computing and software easy to use for your research regardless of your project’s scale, even if you have limited experience with supercomputing systems. Cloud-based and on-demand, the 24/7 system includes discipline-specific apps. You can even create virtual machines that look and feel like your lab workstation or home machine, with thousands of times the computing power.
Tutorial for OpenMP Building up and Utilization
The following link explains how to use the OpenMP API and its syntax. Several exercises are also available to help learners get familiar with this widely used tool for multi-threaded programming.
Research Software Development in JupyterLab: A Platform for Collaboration Between Scientists and RSEs
Iterative programming takes place when you can explore your code and play with your objects and functions without needing to save, recompile, or leave your development environment. This has traditionally been achieved with a REPL or an interactive shell. The magic of Jupyter Notebooks is that the interactive shell is saved as a persistent document, so you don't have to flip back and forth between your code files and the shell in order to program iteratively.
There are several editors and IDEs that are intended for notebook development, but JupyterLab is a natural choice because it is free and open source and most closely related to the Jupyter Notebooks/IPython projects. The chief motivation of this repository is to enable an IDE-like development environment through the use of extensions. There are also expositional notebooks to show off the usefulness of these features.
Understanding LLM Fine-tuning
With the recent rise of LLMs, many businesses are looking at ways to adopt these models and fine-tune them on specific data sets to ensure accuracy. When fine-tuned, these models can be well suited to the specific needs of a company. This site explains when, how, and why models should be trained, and goes over various strategies for LLM fine-tuning.
Neurodesk
Neurodesk provides a containerised data analysis environment to facilitate reproducible analysis of neuroimaging data. Analysis pipelines for neuroimaging data typically rely on specific versions of packages and software, and are dependent on their native operating system. These dependencies mean that a working analysis pipeline may fail or produce different results on a new computer, or even on the same computer after a software update. Neurodesk provides a platform in which anyone, anywhere, using any computer can reproduce your original research findings given the original data and analysis code.
Trusted CI
The mission of Trusted CI is to lead in the development of an NSF Cybersecurity Ecosystem with the workforce, knowledge, processes, and cyberinfrastructure that enables trustworthy science and NSF’s vision of a nation that is a global leader in research and innovation.
MPI Resources
A workshop for beginner and intermediate MPI students that includes helpful exercises, along with the Open MPI documentation.
Applications of Machine Learning in Engineering and Parameter Tuning Tutorial
Slides for a tutorial on machine learning applications in engineering and parameter tuning, given at the 2019 RMACC conference.
MATLAB bioinformatics toolbox
Bioinformatics Toolbox provides algorithms and apps for Next Generation Sequencing (NGS), microarray analysis, mass spectrometry, and gene ontology. Using toolbox functions, you can read genomic and proteomic data from standard file formats such as SAM, FASTA, CEL, and CDF, as well as from online databases such as the NCBI Gene Expression Omnibus and GenBank.
High performance computing 101
An introductory guide to High Performance Computing.
Docker Tutorial for Beginners
This beginner course teaches the basics of Docker, a containerization platform that lets you package an application and its dependencies into a standardized unit for development, shipment, and deployment.
ACCESS HPC Workshop Series
Monthly workshops on a variety of HPC topics, sponsored by ACCESS and organized by the Pittsburgh Supercomputing Center (PSC). Each workshop is telecast to multiple satellite sites, and workshop materials are archived.
The Theory Behind Neural Networks (Very Simplified)
This video by the YouTube channel 3Blue1Brown provides a very simplified introduction to the theory behind neural networks. It is a good fit for those who don't have much background in linear algebra or machine learning and are eager to step into the realm of ML!
Ask.CI Q&A Platform for Research Computing
Intro to Machine Learning on HPC
This tutorial introduces machine learning on high performance computing (HPC) clusters. While it focuses on the HPC clusters at The University of Arizona, the content is generic enough that it can be used by students from other institutions.
Reinforcement Learning For Beginners with Python
This course takes you through the fundamentals required to get started with reinforcement learning using Python, OpenAI Gym, and Stable Baselines. You'll be able to build deep-learning-powered agents to solve a variety of RL problems, including CartPole, Breakout, and CarRacing, and learn how to build your own custom environment!
Introduction to Probabilistic Graphical Models
This website summarizes the notes of Stanford's introductory course on probabilistic graphical models.
It starts from the very basics and concludes by explaining from first principles the variational auto-encoder, an important probabilistic model that is also one of the most influential recent results in deep learning.
Charliecloud User Group
Announcements for users and developers of Charliecloud, which provides lightweight user-defined software stacks for high-performance computing.
Geocomputation with R (Free Reference Book)
Below is a link to a book that focuses on how to use the "sf" and "terra" packages for GIS computations. As of 5/1/2023, this book is up to date and its examples are error-free. The book contains a lot of information, yet it provides a good overview and example workflows showing how to use these tools.