Knowledge Base Resources
These resources have been contributed and “vetted” by the community of cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers, and HPC system administrators) who participate in programs such as this one, supported by the ConnectCI community management platform. Additional Knowledge Base Resources are always welcome!
The Carpentries
We teach foundational coding and data science skills to researchers worldwide.
DARWIN Documentation Pages
DARWIN (Delaware Advanced Research Workforce and Innovation Network) is a big data and high-performance computing system designed to catalyze Delaware research and education.
PyTorch for Deep Learning and Natural Language Processing
PyTorch is a Python library that supports GPU-accelerated computation for machine learning and deep learning. In this tutorial, I will teach the basics of PyTorch from scratch and then explore how to use it for ML projects such as neural networks, multi-layer perceptrons (MLPs), sentiment analysis with RNNs, and image classification with CNNs.
GIS: Geocoding Services
Geocoding is the process of converting a street address into coordinates that can be plotted on a map. The conversion typically requires an API call to a remote server hosted by an organization or institution. The server compares the address attributes you provide against the reference data it holds and returns a best estimate of the coordinates for that location.
Many geocoding services are available, and they differ in world coverage, result quality, and the rate limits they set for access. For R, the "tidygeocoder" package provides an easy way to connect to these different services. As an additional benefit, its documentation gives a good summary of the available geocoding services, with links to their documentation. The link to the documentation for the geocoding services accessible through "tidygeocoder" is provided below.
For Python, the geopy package provides connections to various geocoding services. The link to its documentation is also included below.
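To make the request/response cycle described above concrete, here is a minimal sketch that uses only the Python standard library against the public Nominatim (OpenStreetMap) search endpoint, one of the services geopy can talk to. It assumes Nominatim's documented search API: the `q`, `format`, and `limit` query parameters, and a JSON response that is a list of candidate matches whose `lat`/`lon` fields are strings. The function and class names are hypothetical, and the one-request-per-second throttle is an assumption based on Nominatim's public usage policy; check your provider's terms before running bulk jobs.

```python
import json
import time
import urllib.parse
import urllib.request

# Public Nominatim search endpoint (assumed; check the service's usage policy).
NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"


def build_geocode_url(address: str) -> str:
    """Build a Nominatim search URL for a free-text street address."""
    params = {"q": address, "format": "json", "limit": 1}
    return NOMINATIM_URL + "?" + urllib.parse.urlencode(params)


def parse_geocode_response(body: str):
    """Return (lat, lon) of the best match, or None if nothing matched.

    The response is assumed to be a JSON list of matches with string lat/lon.
    """
    matches = json.loads(body)
    if not matches:
        return None
    best = matches[0]
    return float(best["lat"]), float(best["lon"])


class Throttle:
    """Enforce a minimum delay between calls (a simple client-side rate limit)."""

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last = None

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def geocode(address: str, throttle: Throttle, user_agent: str):
    """Geocode one address over the network (hypothetical helper)."""
    throttle.wait()
    request = urllib.request.Request(
        build_geocode_url(address), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_geocode_response(response.read().decode("utf-8"))
```

A call might look like `geocode("1600 Pennsylvania Ave NW, Washington, DC", Throttle(1.0), "my-research-project")`, where the user-agent string is a placeholder for one that identifies your application. Packages such as geopy and tidygeocoder wrap these same steps (building the query, respecting rate limits, parsing the result) for many services at once.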
HPC Carpentry
An HPC-focused Carpentry community. Trainings include HPC fundamentals, Python, Chapel, LAMMPS, parallelization with Python, scaling studies, and more.
Data Visualization tools for Python
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It makes analyzing and presenting data straightforward, and because it is a Python library, it is accessible to the many researchers who already know the language.
The Chronicle of Evidence-Based Mentoring
This is a great mentoring resource with many articles related to mentoring. It is a one-stop shop: tags at the bottom of the page group content by topic, so interested users can pick articles and resources on different types of mentorship.
ACCESS Pegasus Documentation
The documentation provides an overview of using Pegasus, a workflow management system, on ACCESS resources for high throughput computing (HTC) workloads, covering logging in, workflow creation, resource configuration, and monitoring options.
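Pegasus represents a workflow as a directed acyclic graph (DAG) of jobs connected by their dependencies and runs each job only after the jobs it depends on have finished. The sketch below is not the Pegasus API (Pegasus provides its own Python API for defining workflows); it only illustrates the DAG idea using the standard library's `graphlib` module (Python 3.9+). The job names are hypothetical, and the jobs are grouped into "waves" whose members have no dependencies on each other and so could run concurrently on an HTC resource.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "preprocess": set(),
    "simulate_a": {"preprocess"},
    "simulate_b": {"preprocess"},
    "analyze": {"simulate_a", "simulate_b"},
    "plot": {"analyze"},
}


def execution_waves(dag):
    """Group jobs into waves; every job in a wave has all its inputs ready."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)  # mark them finished so dependent jobs unlock
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(workflow), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

For this hypothetical workflow the two simulation jobs land in the same wave because both depend only on `preprocess`; a workflow manager exploits exactly this kind of independence when it submits jobs in parallel.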
Managing Python Packages on an HPC Cluster
This workshop covers the different ways Python packages can be managed in a cluster environment using conda and Python virtual environments, both in batch mode from the command line and with Jupyter Notebook and JupyterLab on the cluster. The examples are run on the GMU HOPPER cluster.
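A common first step on a cluster is creating an isolated virtual environment in a directory you own. The usual route is running `python -m venv <dir>` in a login-node shell; the sketch below does the same thing programmatically with the standard-library `venv` module so each step is explicit. The environment name is hypothetical, and `with_pip=False` is used only to keep the example fast and offline; in practice you would normally keep pip enabled (the default for `python -m venv`) and install packages with the environment's own pip. Conda environments are managed with the separate `conda` tool and are not shown here.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its interpreter."""
    # clear=True empties env_dir first if it already exists.
    builder = venv.EnvBuilder(with_pip=with_pip, clear=True)
    builder.create(env_dir)
    # The interpreter lives in bin/ on Linux/macOS and Scripts/ on Windows.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "myproject-env"  # hypothetical environment name
        python = create_env(env_dir, with_pip=False)
        print("interpreter:", python)
        print("pyvenv.cfg present:", (env_dir / "pyvenv.cfg").exists())
```

On a Linux cluster you would then typically run `source myproject-env/bin/activate` and install packages with `pip install ...`. To make the environment selectable in Jupyter, one common approach is installing `ipykernel` into it and running `python -m ipykernel install --user --name myproject-env`; site-specific steps on your cluster may differ.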
Introduction to Python for Digital Humanities and Computational Research
This documentation contains introductory material on Python programming for digital humanities and computational research. It is a good go-to resource for beginners learning Python and for anyone wanting a Python refresher.
Useful R Packages for Data Science and Statistics
This Udacity article lists the most frequently used R packages for data science and statistics and links to the official documentation for each package. It is a great starting point if you want to begin your data science journey in R.
Open OnDemand
Open OnDemand is an easy-to-use web portal that lets students, researchers, and industry professionals use supercomputers from anywhere. It is installed on supercomputing resources at hundreds of sites. By eliminating the need for client software or command-line access, Open OnDemand empowers users of all skill levels and significantly shortens their time to first computation.
Paraview UArizona HPC links (beginner)
These links take you to visualization resources supported by the University of Arizona's HPC visualization consultant (rtdatavis.github.io). The following links are specific to the ParaView program and the workflows that have been used by researchers at the U of Arizona. Some of the linked pages are very beginner friendly: getting started, working with cameras and keyframes for rendering, visualizing external files (NetCDF climate data), graphs, and data exporting.
Many of the workflows involve using remote desktops via the Open OnDemand interface, but if this isn't set up at your university, you can run ParaView locally on a desktop. Feel free to post on Ask.CI (https://ask.cyberinfrastructure.org/) if you need assistance getting a ParaView GUI open for your work on HPC.
Neurostars
A question and answer forum for neuroscience researchers, infrastructure providers and software developers.
Pandas - Python
pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language. It lets you store data in DataFrames, which are easy to manage and display and have named columns with data types.
ACCESS Support Portal
AHPCC documentation
This link leads to the documentation website for using AHPCC.
A visual introduction to Gaussian Belief Propagation
This website is an interactive introduction to Gaussian Belief Propagation (GBP), a probabilistic inference algorithm that operates by passing messages between the nodes of arbitrarily structured factor graphs. A special case of loopy belief propagation, GBP relies only on local information for its updates and will converge independently of the message schedule. The site's key argument is that, given recent trends in computing hardware, GBP has the right computational properties to act as a scalable distributed probabilistic inference framework for future machine learning systems.
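The message-passing idea can be made concrete with a small sketch written for this list (not the site's own code). Assumptions: every variable is a scalar with a Gaussian prior, every factor links two variables through a hypothetical "x_j minus x_i is about d" measurement with precision w, and all messages are updated synchronously once per iteration. Messages are kept in information form (precision λ and information η = λμ), so combining messages is addition and marginalizing a variable out of a factor is a Schur complement. Because the example graph is a chain (a tree), the converged GBP means should match a direct solve of the linear system Λx = η, which the hypothetical `exact_means` helper performs for comparison.

```python
# Minimal synchronous Gaussian belief propagation on scalar variables.
# Everything is in information form: precision lam and information eta = lam * mean.


def _beliefs(prior_eta, prior_lam, msgs):
    """Belief at each variable: its prior plus all incoming factor messages."""
    eta, lam = list(prior_eta), list(prior_lam)
    for (_, v), (e, lm) in msgs.items():
        eta[v] += e
        lam[v] += lm
    return eta, lam


def gbp(prior_means, prior_precisions, pair_factors, iterations=20):
    """Return (means, variances) after synchronous GBP message passing.

    pair_factors: list of (i, j, d, w) meaning x_j - x_i is about d, with precision w.
    """
    prior_eta = [p * m for m, p in zip(prior_means, prior_precisions)]
    prior_lam = list(prior_precisions)
    # Factor-to-variable messages keyed by (factor index, variable), start at zero.
    msgs = {(f, v): (0.0, 0.0)
            for f, (i, j, _, _) in enumerate(pair_factors) for v in (i, j)}

    for _ in range(iterations):
        eta_b, lam_b = _beliefs(prior_eta, prior_lam, msgs)
        new_msgs = {}
        for f, (i, j, d, w) in enumerate(pair_factors):
            # Energy (w/2)(x_j - x_i - d)^2 in information form over (x_i, x_j).
            lam_f = [[w, -w], [-w, w]]
            eta_f = [-w * d, w * d]
            pair = (i, j)
            for t in (0, 1):      # t: local index of the message target
                o = 1 - t         # o: local index of the other variable
                other = pair[o]
                # Variable-to-factor message: belief minus this factor's own message.
                e_in = eta_b[other] - msgs[(f, other)][0]
                l_in = lam_b[other] - msgs[(f, other)][1]
                l_oo = lam_f[o][o] + l_in
                e_o = eta_f[o] + e_in
                # Marginalize out the other variable (Schur complement).
                lam_msg = lam_f[t][t] - lam_f[t][o] * lam_f[o][t] / l_oo
                eta_msg = eta_f[t] - lam_f[t][o] * e_o / l_oo
                new_msgs[(f, pair[t])] = (eta_msg, lam_msg)
        msgs = new_msgs

    eta_b, lam_b = _beliefs(prior_eta, prior_lam, msgs)
    return [e / lm for e, lm in zip(eta_b, lam_b)], [1.0 / lm for lm in lam_b]


def exact_means(prior_means, prior_precisions, pair_factors):
    """Solve Lambda x = eta directly (Gaussian elimination) for comparison."""
    n = len(prior_means)
    a = [[0.0] * n for _ in range(n)]
    b = [p * m for m, p in zip(prior_means, prior_precisions)]
    for k, p in enumerate(prior_precisions):
        a[k][k] += p
    for i, j, d, w in pair_factors:
        a[i][i] += w
        a[j][j] += w
        a[i][j] -= w
        a[j][i] -= w
        b[i] -= w * d
        b[j] += w * d
    # Forward elimination; no pivoting needed because the matrix is positive definite.
    for c in range(n):
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
            b[r] -= factor * b[c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x


if __name__ == "__main__":
    # Hypothetical 1-D example: four points, ends anchored near 0 and 3,
    # neighbours expected to be about 1.0 apart.
    mu = [0.0, 0.0, 0.0, 3.0]
    prec = [10.0, 0.1, 0.1, 10.0]
    factors = [(k, k + 1, 1.0, 1.0) for k in range(3)]
    means, variances = gbp(mu, prec, factors)
    print("GBP means:  ", [round(m, 6) for m in means])
    print("exact means:", [round(x, 6) for x in exact_means(mu, prec, factors)])
```

Each factor only ever reads its neighbours' beliefs and its own previous messages, which is the locality property the site emphasizes; on a chain of four variables the messages stop changing after three synchronous iterations.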
Bridges-2 Home Page
Landing page for Bridges-2 information.
Women in HPC
Through collaboration and networking, WHPC strives to bring together women in HPC and technical computing while encouraging women to engage in outreach activities and improve the visibility of inspirational role models.
Chameleon
Chameleon is an NSF-funded testbed system for Computer Science experimentation. It is designed to be deeply reconfigurable, with a wide variety of capabilities for researching systems, networking, distributed and cluster computing and security.
Ultimate guide to Unix
Unix is incredibly common and useful. This website covers the common commands, with explanations, that you need to get started with a Unix system.
Rockfish at Johns Hopkins University
Resources and user guide for Rockfish.
Research Security Operations Center at IU
The NSF-funded ResearchSOC helps make scientific computing resilient to cyberattacks and capable of supporting trustworthy, productive research through operational cybersecurity services, training, and information sharing necessary to a community as unique and variable as research and education (R&E).
ResearchSOC is a service offering from Indiana University's OmniSOC.
Machine Learning in Astrophysics
Machine learning is becoming increasingly important in fields with large datasets, such as astrophysics. AstroML is a Python module for machine learning and data mining, built on NumPy, SciPy, scikit-learn, Matplotlib, and Astropy, that provides a range of statistical and machine learning routines for analyzing astronomical data in Python. In particular, it has loaders for many open astronomical datasets, with examples of how to visualize such complex and large datasets.