Knowledge Base Resources

Contributed by cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers and HPC system administrators), these resources are shared through the ConnectCI community platform. Add resources you find helpful!

Python Data and Viz Training (CCEP Program)

Five days of recorded training sessions on Python data analysis and visualization.

data-science python

Type: learning

Intro to Statistical Computing with Stan

Stan is a probabilistic programming language for specifying (Bayesian) statistical models: a Stan program is an imperative program that computes the model's log probability density function. These resources are a starting point for exploring the language, along with a Python interface to Stan.
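To make "an imperative program that computes the log density" concrete, here is a plain-Python sketch of the kind of quantity a simple Stan model accumulates. It is not Stan code and not the API of any Python interface to Stan. The model (normal likelihood with a known standard deviation and a normal(0, 10) prior on the mean) and the function names `normal_lpdf` and `log_posterior` are hypothetical choices for illustration. Note that Stan's `~` statements drop constant terms, so Stan's internal value would differ from this one by a constant.

```python
import math


def normal_lpdf(x: float, mu: float, sigma: float) -> float:
    """Log density of a normal distribution with mean mu and std dev sigma, at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_posterior(mu: float, y: list[float], sigma: float = 1.0) -> float:
    """Unnormalized log posterior for a hypothetical model:
        mu   ~ normal(0, 10)
        y[i] ~ normal(mu, sigma), with sigma treated as known
    Like a Stan model block, it adds log-density terms into a running total.
    """
    target = normal_lpdf(mu, 0.0, 10.0)  # prior on mu
    for y_i in y:  # likelihood: one term per observation
        target += normal_lpdf(y_i, mu, sigma)
    return target


if __name__ == "__main__":
    data = [1.2, 0.8, 1.1, 0.9]
    # Crude grid search for the posterior mode. Stan would instead run
    # HMC sampling or an optimizer over a function of this kind.
    grid = [i / 100 for i in range(-300, 301)]
    mode = max(grid, key=lambda m: log_posterior(m, data))
    print(f"approximate posterior mode of mu: {mode:.2f}")
```

The grid search is only there to show the function being used; its whole point is that once the log density is written down, a generic algorithm can work with it, which is the division of labor Stan relies on.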

data-analysis machine-learning monte-carlo python

Type: documentation

Fairness and Machine Learning

The "Fairness and Machine Learning" book offers a rigorous exploration of fairness in ML and is suitable for researchers, practitioners, and anyone interested in understanding the complexities and implications of fairness in machine learning.

ai data-analysis deep-learning machine-learning data-science

Type: documentation

AI-Powered VS Code Editor

Cursor - AI code editor

**Cursor: The AI-Powered Code Editor**

Cursor is an AI-first code editor designed to help developers write, debug, and understand code. Built around the idea of pair programming with artificial intelligence, Cursor uses advanced AI models to offer real-time coding assistance, bug detection, and code generation.

**How Cursor Benefits High-Performance Computing (HPC) Work:**

1. **Efficient Code Development:** With AI-assisted code generation, HPC researchers and developers can write code for simulations, data processing, or modeling tasks more quickly, reducing the time to deployment.
2. **Debugging Assistance:** Complex datasets and simulations often lead to intricate bugs. Cursor can help investigate errors and identify likely root causes, saving time in the HPC workflow.
3. **Tailored Code Suggestions:** Cursor's AI provides context-specific code suggestions informed by the entire codebase. For HPC applications where performance is paramount, this can include recommendations that align with optimization goals.
4. **Improved Code Quality:** AI-driven bug scanning and linter checks help catch common errors, so HPC codes can be robust as well as fast.
5. **Easy Integration:** Because Cursor is a fork of VS Code, developers working in HPC can migrate their existing VS Code setups and extensions with little effort.

In short, for HPC tasks that demand speed, precision, and robustness, Cursor can act as a useful AI pair programmer. It is free to use if you provide your own OpenAI API key.

ai machine-learning workflow natural-language-processing programming python sas

Type: tool

C Programming

C Programming Notes

"These notes are part of the UW Experimental College course on Introductory C Programming. They are based on notes prepared (beginning in Spring, 1995) to supplement the book The C Programming Language, by Brian Kernighan and Dennis Ritchie, or K&R as the book and its authors are affectionately known. (The second edition was published in 1988 by Prentice-Hall, ISBN 0-13-110362-8.) These notes are now (as of Winter, 1995-6) intended to be stand-alone, although the sections are still cross-referenced to those of K&R, for the reader who wants to pursue a more in-depth exposition."

Learning C, a low-level programming language, gives you a deep understanding of how a computer's memory and hardware work. This knowledge is valuable when optimizing applications for performance or working in resource-constrained environments. C is also often used as the foundation for cross-platform libraries and frameworks, so learning it lets you develop libraries that can be used across platforms, including iOS, Android, and desktop environments.

c c++ compiling programming programming-best-practices

Type: learning

Ask.CI Q&A Platform for Research Computing

Ask.CI

resources programming-best-practices

Type: website

Chameleon

Chameleon User Guide

Chameleon is an NSF-funded testbed system for computer science experimentation. It is designed to be deeply reconfigurable, with a wide variety of capabilities for research in systems, networking, distributed and cluster computing, and security.

data-sharing data-reproducibility

Type: documentation

ParaView UArizona HPC Links (Beginner)

These links lead to visualization resources supported by the University of Arizona's HPC visualization consultant (rtdatavis.github.io). They are specific to ParaView and to workflows that researchers at the University of Arizona have used. Some of the linked pages are very beginner-friendly: getting started, working with cameras and keyframes for rendering, visualizing external files (NetCDF climate data), graphs, and data exporting. Many of the workflows use remote desktops through the Open OnDemand interface, but if this isn't set up at your university, you can run ParaView locally on a desktop. Feel free to post on Ask.CI (https://ask.cyberinfrastructure.org/) if you need help getting a ParaView GUI open for your work on HPC.

visualization

Type: documentation

AWS Tutorial For Beginners

This beginner tutorial teaches the basics of Amazon Web Services (AWS), a cloud computing platform that offers a wide range of services, including compute, storage, networking, databases, analytics, machine learning, and artificial intelligence.

aws

Type: video link

Slurm User Group Mailing List

Slurm Community Mailing List

slurm schedulers

Type: mailing list

Managing and Optimizing Your Jobs on HPC

An overview of tools and methods for managing and optimizing jobs and HPC workflows.

memory optimization batch-jobs job-accounting job-submission resources slurm

Type: video link

Official Documentation for PyTorch and NumPy

The official documentation for PyTorch, a tensor-based machine learning framework, and NumPy, which provides the ndarray (n-dimensional array) type that is useful for building tensors when implementing neural networks. Both libraries can be installed with pip.

deep-learning neural-networks pytorch python

Type: documentation

Federated CI Resources

How do you add your institutional HPC cluster to the Open Science Grid (OSG)?

A discussion about contributing computing cycles to the Open Science Grid.

open-science-grid

Type: learning

The Official Documentation of Pandas

pandas documentation

pandas is one of the most widely used Python libraries for data analysis and manipulation. It provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language. The official documentation is an in-depth guide to the library, with explanations and examples.

plotting visualization

Type: documentation

ConnectCI

https://cnct.ci

Connect.Cyberinfrastructure is a family of portals, each representing a program that serves a segment of the research computing and data community. Each portal provides program-specific information, as well as a custom "view" into a common database. The portal was originally developed to support project workflows and a knowledge base of self-service learning resources for the Northeast Cyberteam. It was subsequently expanded to support multiple cyberteams and other research computing communities of practice. We welcome additional communities; please contact us if you are interested in participating. Central to the portal is an extensive, evolving tagging infrastructure that informs every aspect of the portal. The tag taxonomy was initially developed by the Northeast Cyberteam to categorize subject matter relevant to practitioners of Research Computing Facilitation, and it changes continually as new technologies are introduced in the domains that characterize research computing.

community-outreach

Type: website

Long Tales of Science: A podcast about women in HPC

Long Tales of Science

A series of interviews with women in the HPC community.

science-gateway community-outreach professional-development project-management proposal-development training workforce-development xsede

Type: website

GDAL Multi-threading

Multi-threading guidance when using GDAL.

parallelization gis

Type: learning

fast.ai

fast.ai Homepage

fast.ai offers many tools for people working in machine learning and artificial intelligence, including tutorials on PyTorch, its own library built on PyTorch, news articles, and other resources for diving into the field.

ai machine-learning pytorch training

Type: website

Probabilistic Semantic Data Association for Collaborative Human-Robot Sensing

Humans cannot always be treated as oracles for collaborative sensing. Robots thus need to maintain beliefs over unknown world states when receiving semantic data from humans, as well as account for possible discrepancies between human-provided data and these beliefs. To this end, this paper introduces the problem of semantic data association (SDA) in relation to conventional data association problems for sensor fusion. It then develops a novel probabilistic semantic data association (PSDA) algorithm to rigorously address SDA in general settings. Simulations of a multi-object search task show that PSDA enables robust collaborative state estimation under a wide range of conditions.
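The paper's PSDA algorithm is considerably more involved than anything shown here. As a much simpler illustration of the starting idea (keep a belief over unknown world states and treat a human's semantic report as uncertain evidence rather than an oracle), here is a generic discrete Bayes update in Python. It is not PSDA. The candidate locations, the single "the object is in X" report, the uniform-error human model, and the 0.8 accuracy figure are all hypothetical assumptions.

```python
def update_belief(prior: dict[str, float], report: str, p_correct: float) -> dict[str, float]:
    """Bayes update of a belief over candidate object locations after a human
    reports "the object is in <report>".

    Hypothetical human model: the report is right with probability p_correct and
    otherwise names one of the other locations uniformly at random, so the report
    is treated as uncertain evidence rather than ground truth.
    """
    if report not in prior:
        raise ValueError(f"unknown location: {report!r}")
    n_other = len(prior) - 1
    if n_other < 1:
        raise ValueError("need at least two candidate locations")
    unnormalized = {}
    for loc, p in prior.items():
        likelihood = p_correct if loc == report else (1.0 - p_correct) / n_other
        unnormalized[loc] = likelihood * p
    total = sum(unnormalized.values())
    # Normalize so the posterior sums to 1 (Bayes' rule).
    return {loc: w / total for loc, w in unnormalized.items()}


if __name__ == "__main__":
    # The robot's prior strongly favors the garage; the human says "kitchen".
    robot_prior = {"kitchen": 0.1, "hall": 0.1, "garage": 0.8}
    posterior = update_belief(robot_prior, "kitchen", p_correct=0.8)
    print({loc: round(p, 3) for loc, p in posterior.items()})
```

With these numbers the posterior splits evenly between kitchen and garage (about 0.47 each), a small-scale version of the human/robot discrepancy the abstract describes: neither the report nor the prior simply overrides the other.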

ai machine-learning

Type: documentation

How to Get the Most Out of a Mentoring Relationship by The Plank Center

The Plank Center Mentorship Guide

Backed by collegiate white papers, top industry professionals, and researchers, The Plank Center’s Mentorship Guide offers basic tips and tricks on how to get the most out of a mentorship relationship. This easy-to-follow guide supplements mentorship programs, lesson plans, and professional relationships.

mentorship professional-development training workforce-development

Type: tool

ACCESS KB Guide - Anvil

Purdue University is the home of Anvil, a powerful supercomputer that provides advanced computing capabilities to support a wide range of computational and data-intensive research spanning from traditional high-performance computing to modern artificial intelligence applications.

anvil

Type: documentation

NumPy - a Python Library

NumPy Docs

NumPy is a Python package that uses fixed data types and compiled C code to make many math operations in Python efficient. It is especially useful for matrix manipulation and other array operations.

documentation big-data data-analysis deep-learning opencv pytorch tensorflow data-science

Type: tool

RRCoP Resources Page

RRCoP External Resources Page

A helpful list of the Regulated Research Community of Practice's (RRCoP) collaborating communities.

community-outreach cybersecurity

Type: website

Implementing Markov Processes with Julia

Markov Decision Processes in Julia

This resource provides an easy way to implement Markov Decision Processes (MDPs) in the Julia programming language. MDPs model sequential decision making in stochastic situations where the agent has some level of control over the outcome. At a low level, an MDP can be used to control an inverted pendulum; applied to higher-level decision making, it can also decide when to take evasive action in air traffic management. MDPs can be extended to the partially observable domain, giving the Partially Observable Markov Decision Process (POMDP). The resource shows how to implement basic MDP and POMDP models and apply well-known online and offline solvers.
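The linked resource targets Julia; as a language-neutral illustration of what an offline MDP solver does, here is a minimal value-iteration sketch in Python. It is not the Julia package's API. The two-state "safe/conflict" model, its rewards, and its transition probabilities are hypothetical numbers chosen only to echo the evasive-action example, and the helper names `q_value` and `value_iteration` are illustrative.

```python
# Hypothetical model: (state, action) -> [(next_state, probability), ...]
TRANSITIONS = {
    ("safe", "continue"): [("safe", 0.9), ("conflict", 0.1)],
    ("safe", "evade"): [("safe", 1.0)],
    ("conflict", "continue"): [("safe", 0.5), ("conflict", 0.5)],
    ("conflict", "evade"): [("safe", 0.9), ("conflict", 0.1)],
}
# Hypothetical immediate rewards: evading costs 1, ignoring a conflict costs 10.
REWARDS = {
    ("safe", "continue"): 0.0,
    ("safe", "evade"): -1.0,
    ("conflict", "continue"): -10.0,
    ("conflict", "evade"): -1.0,
}


def q_value(s, a, values, transitions, rewards, gamma):
    """Expected discounted return of taking action a in state s, then using `values`."""
    return rewards[(s, a)] + gamma * sum(p * values[s2] for s2, p in transitions[(s, a)])


def value_iteration(states, actions, transitions, rewards, gamma=0.9, tol=1e-8):
    """Solve a finite MDP by value iteration; returns (state values, greedy policy)."""
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman optimality backup, updating the value table in place.
            best = max(q_value(s, a, values, transitions, rewards, gamma) for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:  # stop once no state value changes noticeably
            break
    # Extract the greedy policy with respect to the converged values.
    policy = {
        s: max(actions, key=lambda a: q_value(s, a, values, transitions, rewards, gamma))
        for s in states
    }
    return values, policy


if __name__ == "__main__":
    vals, pol = value_iteration(["safe", "conflict"], ["continue", "evade"], TRANSITIONS, REWARDS)
    print(vals)
    print(pol)
```

With these made-up numbers, the resulting policy is to continue when safe and to evade when in conflict. A POMDP solver generalizes this by planning over beliefs about the state rather than over the state itself.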

ai machine-learning julia

Type: tool

MATLAB Bioinformatics Toolbox

https://www.mathworks.com/products/bioinfo.html

Bioinformatics Toolbox provides algorithms and apps for Next Generation Sequencing (NGS), microarray analysis, mass spectrometry, and gene ontology. Using toolbox functions, you can read genomic and proteomic data from standard file formats such as SAM, FASTA, CEL, and CDF, as well as from online databases such as the NCBI Gene Expression Omnibus and GenBank.

visualization data-analysis bioinformatics genomics matlab

Type: tool
