Knowledge Base Resources
Use these links “vetted” by the community. Additional CI links are always welcome.
The Official Documentation of Pandas
0
Pandas is one of the most essential Python libraries for data analysis and manipulation. It provides high-performance, easy-to-use data structures, and data analysis tools for the Python programming language. The official documentation serves as an in-depth guide to using this powerful tool including explanations and examples.
GPU Computing Workshop Series for the Earth Science Community
0
GPU training series for scientists, software engineers, and students, with emphasis on Earth science applications.
The content of this course is coordinated with the 6 month series of GPU Training sessions starting in Februrary 2022. The NVIDIA High Performance Computing Software Development Kit (NVHPC SDK) and CUDA Toolkit will be the primary software requirements for this training which will be already available on NCAR's HPC clusters as modules you may load. This software is free to download from NVIDIA by navigating to the NVHPC SDK Current Release Downloads page and the CUDA Toolkit downloads page. Any provided code is written specifically to build and run on NCAR's Casper HPC system but may be adapted to other systems or personal machines. Material will be updated as appropriate for the future deployment of NCAR's Derecho cluster and as technology progresses.
An Introduction to the Julia Programming Language
0
The Julia Programming Language is one of the fastest growing software languages for AI/ML development. It writes in manner that's similar to Python while being nearly as fast as C++, while being open source, and reproducible across platforms and environments. The following link provide an introduction to using Julia including the basic syntax, data structures, key functions, and a few key packages.
Contributing cycles to the Open Science Grid
0
Long Tales of Science: A podcast about women in HPC
0
A series of interviews with women in the HPC community
Campus Research Computing Consortium (CaRCC)
0
CaRCC – the Campus Research Computing Consortium – is an organization of dedicated professionals developing, advocating for, and advancing campus research computing and data and associated professions.
Vision: CaRCC advances the frontiers of research by improving the effectiveness of research computing and data (RCD) professionals, including their career development and visibility, and their ability to deliver services and resources for researchers. CaRCC connects RCD professionals and organizations around common objectives to increase knowledge sharing and enable continuous innovation in research computing and data capabilities.
MATLAB bioinformatics toolbox
0
Bioinformatics Toolbox provides algorithms and apps for Next Generation Sequencing (NGS), microarray analysis, mass spectrometry, and gene ontology. Using toolbox functions, you can read genomic and proteomic data from standard file formats such as SAM, FASTA, CEL, and CDF, as well as from online databases such as the NCBI Gene Expression Omnibus and GenBank.
Rockfish at Johns Hopkins University
0
Resources and User Guide available at Rockfish
Spack Documentation
0
Spack is a package manager for supercomputers that can help administrators install scientific software and libraries for multiple complex software stacks.
QGIS Processing Executor
0
Running QGIS tools from the command line
Numpy - a Python Library
0
Numpy is a python package that leverages types and compiled C code to make many math operations in Python efficient. It is especially useful for matrix manipulation and operations.
Resource to active inference
0
Active inference is an emerging study field in machine learning and computational neuroscience. This website in particular introduces "active inference institute", which has established a couple of years ago, and contains a wide variety of resources for understanding the theory of active inference and for participating a worldwide active inference community.
MDAnalysis - Python library for the analysis of molecular dynamics simulations
0
MDAnalysis is a python based library of tools for the analysis of molecular dynamics simulations. It is able to read and write many popular simulation formats including CHARMM, LAMMPS, GROMACS, and AMBER and more. This link contains the documentation pages of all MDAnalysis functions and has links to tutorials using Jupyter Notebooks.
Expanse Home Page
0
Expanse at SDSC is a cluster designed by Dell and SDSC delivering 5.16 peak petaflops, and offers Composable Systems and Cloud Bursting.
Set Up VSCode for Python and Github
0
VSCode is a popular IDE that runs on Windows, MacOS, and Linux. This tutorial will explain how to get set up with VSCode to code in Python. It will also provide a tutorial on how to set up Github integration within VSCode.
Trusted CI Resources Page
0
Very helpful list of external resources from Trusted CI
Fine-tuning LLMs with PEFT and LoRA
0
As LLMs get larger fine-tuning to the full extent can become difficult to train on consumer hardware. Storing and deploying these tuned models can also be quite expensive and difficult to store. With PEFT (parameter -efficent fine tuning), it approaches fine-tune on a smaller scale of model parameters while freezing most parameters of the pretrained LLMs. Basically it is providing full performance that which is similar if not better than full fine tuning while only having a small number of trainable parameters. This source explains that as well as going over LORA diagrams and a code walk through.
What is fairness in ML?
0
This article discusses the importance of fairness in machine learning and provides insights into how Google approaches fairness in their ML models.
The article covers several key topics:
Introduction to fairness in ML: It provides an overview of why fairness is essential in machine learning systems, the potential biases that can arise, and the impact of biased models on different communities.
Defining fairness: The article discusses various definitions of fairness, including individual fairness, group fairness, and disparate impact. It explains the challenges in achieving fairness due to trade-offs and the need for thoughtful considerations.
Addressing bias in training data: It explores how biases can be present in training data and offers strategies to identify and mitigate these biases. Techniques like data preprocessing, data augmentation, and synthetic data generation are discussed.
Fairness in ML algorithms: The article examines the potential biases that can arise from different machine learning algorithms, such as classification and recommendation systems. It highlights the importance of evaluating and monitoring models for fairness throughout their lifecycle.
Fairness tools and resources: It showcases various tools and resources available to practitioners and developers to help measure, understand, and mitigate bias in machine learning models. Google's TensorFlow Extended (TFX) and What-If Tool are mentioned as examples.
Google's approach to fairness: The article highlights Google's commitment to fairness and the steps they take to address fairness challenges in their ML models. It mentions the use of fairness indicators, ongoing research, and partnerships to advance fairness in AI.
Overall, the article provides a comprehensive overview of fairness in machine learning and offers insights into Google's approach to building fair ML models.
Fundamentals of Cloud Computing
0
An introduction to Cloud Computing
InsideHPC
0
InsideHPC is an informational site offers videos, research papers, articles, and other resources focused on machine learning and quantum computing among other topics within high performance computing.
NITRC
0
The Neuroimaging Tools and Resources Collaboratory (NITRC) is a neuroimaging informatics knowledge environment for MR, PET/SPECT, CT, EEG/MEG, optical imaging, clinical neuroinformatics, imaging genomics, and computational neuroscience tools and resources.
Header-only C++ JSON library
0
JSON is a lightweight format for storing and transporting data, for example in a config file. This library is header-only, and has easy-to-read documentation. It is a C++ library.
Data visualization with Matplotlib
0
Data visualization is a critical aspect of data analysis. It allows for a clear and concise representation of data, making it easier for users to understand and interpret complex datasets. One of the most popular libraries for data visualization in Python is Matplotlib. The included website aims to provide a brief overview of Matplotlib, its features, and examples/exercises to dive deeper into its functionalities.
UNIX/command line basics tutorial
0
Introductory training materials for working on the UNIX command line.
ACES: Charliecloud Containers for Scientific Workflows (Tutorial)
0
This tutorial introduces the use of Containers using the Charliecloud software suite. This tutorial will provide participants with background and hands-on experience to use basic Charliecloud containers for HPC applications. We discuss what containers are, why they matter for HPC, and how they work. We'll give an overview of Charliecloud, the unprivileged container solution from Los Alamos National Laboratory's HPC Division. Students will learn how to build toy containers and containerize real HPC applications, and then run them on a cluster. Exercises are demonstrated using the ACES cluster, a composable accelerator testbed at Texas A&M University. Students with an allocation on the ACES cluster can follow along with the ACES-specific exercises.