Knowledge Base Resources
Contributed by cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers and HPC system administrators), these resources are shared through the ConnectCI community platform. Add resources you find helpful!
Cornell Virtual Workshop
1
Cornell Virtual Workshop is a comprehensive training resource for high performance computing topics. The Cornell University Center for Advanced Computing (CAC) is a leader in the development and deployment of Web-based training programs. Our Cornell Virtual Workshop learning platform is designed to enhance the computational science skills of researchers, accelerate the adoption of new and emerging technologies, and broaden the participation of underrepresented groups in science and engineering. Over 350,000 unique visitors have accessed Cornell Virtual Workshop training on programming languages, parallel computing, code improvement, and data analysis. The platform supports learning communities around the world, with code examples from national systems such as Frontera, Stampede2, and Jetstream2.
Managing Python Packages on an HPC Cluster
1
This workshop will go into the different ways python packages can be managed in a cluster environment using conda and python virtual environments both in batch mode from the command line and with Jupyter Notebooks and Jupyter Lab on the cluster. The examples will be run on the GMU HOPPER Cluster.
GIS: Geocoding Services
1
Geocoding is the process of taking a street address and converting it into coordinates that can be plotted on a map. This conversion typically requires an API call to a remote server hosted by an organization/institution. The remote server will take the address attributes provided by you and the remote server will compare it to the data it contains and return a best estimate on the coordinates for that location.
There are many geocoding services available with different world coverages, quality of result, and set different rate limits for access. For R, a package called "tidygeocoder" provides an easy way to connect to these different services. As an additional benefit, their documentation provides a good summary of geocoding services available and links to their documentation. The link to the documentation for gecoding services accessible by "tidygeocoder" is provided below.
For Python, geopy package is a library that provides connection to various geocoding services. The link to the documentation for this package is also included below.
HPC Carpentry
1
An HPC focused Carpentry community. Trainings include: HPC fundamentals, python, chapel, LAMMPS, parallelization with python, scaling studies, etc.
Gentle Introduction to Programming With Python
1
This course from MIT OpenCourseWare (OCW) covers very basic information on how to get started with programming using Python. Lectures are available, along with practice assignments, to users at no cost. Python has many applications in tech today, from web frameworks to machine learning. This course will also instruct users on how to get set up with an IDE, which will allow for way more efficient debugging.
PyTorch for Deep Learning and Natural Language Processing
1
PyTorch is a Python library that supports accelerated GPU processing for Machine Learning and Deep Learning. In this tutorial, I will teach the basics of PyTorch from scratch. I will then explore how to use it for some ML projects such as Neural Networks, Multi-layer perceptrons (MLPs), Sentiment analysis with RNN, and Image Classification with CNN.
Displaying Scientific Data with Tableau
0
Tableau is a popular and capable software product for creating charts that present data and dashboards that allow you to explore data. It is typically used to present business or statistical data, but can also create compelling visualizations of scientific data. However, scientific data is often generated or stored in formats that are not immediately accessible by Tableau. This seminar will explore the data formats that work best with Tableau and the available mechanisms for generating scientific data in (or converting it to) those formats so that you can apply the full power of Tableau to create the best possible visualizations of your data.
CI Computing Module For All
0
Computing Module: Introduces fundamental concepts and skills of Cyberinfrastructure (CI) and High-Performance Computing (HPC) to lower the barrier to becoming CI users in disaster management research. The module will cover the critical topics of CI and HPC with hands-on sessions.
Disaster Data Module: Introduces concepts of geospatial big data in disaster management. Students will learn how to access and process disaster data.
Geospatial Analytic Module: Introduces geospatial analytics skills to address real-world challenges in disaster management. The module will use the data introduced in the Disaster Data Module and cover various geospatial analytics topics such as geosimulation, spatial optimization, network analysis, terrain analysis, Geospatial Artificial Intelligence (GeoAI), social sensing, and CyberGIS.
OnShape FeatureScripts: Custom features for everyone
0
OnShape FeatureScripts allow users to create their own features via OnShape's programming language. The user can make these as simple or complex as they need, and they can save tons of time for heavy OnShape users or complex projects!
Open Storage Network
0
The Open Storage Network, a national resource available through the XSEDE resource allocation system, is high quality, sustainable, distributed storage cloud for the research community.
How to Build a Great Relationship with a Mentor
0
Emphasizes benefits of being mentored. Describes how to identify and choose a mentor. Suggests a path forward. Not mentor or two-way focused.
AHPCC documentary
0
This link is a documentary website to use AHPCC.
Bioinformatics Workflow Management with Nextflow
0
Nextflow is an open-source, domain-specific language and workflow manager designed for the execution and coordination of scientific and data-intensive computational workflows. It was specifically created to address the challenges faced by researchers and scientists when dealing with complex and scalable computational pipelines, particularly in fields such as bioinformatics, genomics, and data analysis.
Here provided some links to start with.
Examples of code using JSON nlohmann header only Library for C++
0
This code showcases how to work with the header-only nlohmann JSON library for C++. In order to compile, change the extensions from json_test.txt to json_test.cpp and test.txt to test.json. You must also download the header files from https://github.com/nlohmann/json. Complilation instructions are at the bottom of json_test. This code is very helpful for creating config files, for example.
ACCESS Guide (originally given at Duke OIT)
0
A guide for Duke OIT on how to advise users on using ACCESS and allocation credits to jetstream 2 for Duke University members. This can be used for non Duke members. Assumes the reader has basic knowledge of ACCESS.
Cyber Security
0
learning cybersecurity is crucial for personal protection, safeguarding digital assets, financial security, and national security. It is important when it comes to consumer data protection for business, creating long lasting relationships with customers.
AI for improved HPC research - Cursor and Termius - Powerpoint
0
These slides provide an introduction on how Termius and Cursor, two new and freemium apps that use AI to perform more efficient work, can be used for faster HPC research.
GIS: What is a Geodetic Datums?
0
Often when working with GIS, or spatial data, one encounters the word "datum" and it may require that you choose a "datum" when doing GIS computation tasks. Below is a short video on what are datums from NOAA and UCAR.
Factor Graphs and the Sum-Product Algorithm
0
A tutorial paper that presents a generic message-passing algorithm, the sum-product algorithm, that operates in a factor graph. Following a single, simple computational rule, the sum-product algorithm computes either exactly or approximately various marginal functions derived from the global function. A wide variety of algorithms developed in artificial intelligence, signal processing, and digital communications can be derived as specific instances of the sum-product algorithm, including the forward/backward algorithm, the Viterbi algorithm, the iterative "turbo" decoding algorithm, Pearl's (1988) belief propagation algorithm for Bayesian networks, the Kalman filter, and certain fast Fourier transform (FFT) algorithms
OpenStack Tutorial For Beginners
0
OpenStack Tutorial For Beginners
Educause HEISC-800-171 Community Group
0
The purpose of this group is to provide a forum to discuss NIST 800-171 compliance. Participants are encouraged to collaborate and share effective practices and resources that help higher education institutions prepare for and comply with the NIST 800-171 standard as it relates to Federal Student Aid (FSA), CMMC, DFARS, NIH, and NSF activities.
File management of Visual Studio Code on clusters
0
Visual Studio Code, commonly known as VSCode, is a popular tool used by programmers worldwide. It serves as a text editor and an Integrated Development Environment (IDE) that supports a wide variety of programming languages. One of its key features is its extensive library of extensions. These extensions add on to the basic functionalities of VSCode, making coding more efficient and convenient.
However, there's a catch. When these extensions are installed and used frequently, they generate a multitude of files. These files are typically stored in a folder named .vscode-extension within your home directory. On a cluster computing facility such as the FASTER and Grace clusters at Texas A&M University, there's a limitation on how many files you can have in your home directory. For instance, the file number limit could be 10000, while the .vscode-extension directory can hold around 4000 temporary files even with just a few extensions. Thus, if the number of files in your home directory surpasses this limit due to VSCode extensions, you might face some issues. This restriction can discourage users from taking full advantage of the extensive features and extensions offered by the VSCode editor.
To overcome this, we can shift the .vscode-extension directory to the scratch space. The scratch space is another area in the cluster where you can store files and it usually has a much higher limit on the number of files compared to the home directory. We can perform this shift smoothly using a feature called symbolic links (or symlinks for short). Think of a symlink as a shortcut or a reference that points to another file or directory located somewhere else.
Here's a step-by-step guide on how to move the .vscode-extension directory to the scratch space and create a symbolic link to it in your home directory:
1. Copy the .vscode-extension directory to the scratch space: Using the cp command, you can copy the .vscode-extension directory (along with all its contents) to the scratch space. Here's how:
cp -r ~/.vscode-extension /scratch/user
Don't forget to replace /scratch/user with the actual path to your scratch directory.
2. Remove the original .vscode-extension directory: Once you've confirmed that the directory has been copied successfully to the scratch space, you can remove the original directory from your home space. You can do this using the rm command:
rm -r ~/.vscode-extension
It's important to make sure that the directory has been copied to the scratch space successfully before deleting the original.
3. Create a symbolic link in the home directory: Lastly, you'll create a symbolic link in your home directory that points to the .vscode-extension directory in the scratch space. You can do this as follows:
ln -s /scratch/user/.vscode-extension ~/.vscode-extension
By following this process, all the files generated by VSCode extensions will be stored in the scratch space. This prevents your home directory from exceeding its file limit. Now, when you access ~/.vscode-extension, the system will automatically redirect you to the directory in the scratch space, thanks to the symlink. This method ensures that you can use VSCode and its various extensions without worrying about hitting the file limit in your home directory.
Feed Forward NNs and Gradient Descent
0
Feed-forward neural networks are a simple type of network that simply rely on data to be "fed-forward" through a series of layers that makes decisions on how to categorize datum. Gradient descent is a type of optimization tool that is often used to train machines. These two areas in ML are good starting points and are the easiest types of neural network/optimization to understand.
Machine Learning with sci-kit learn
0
In the realm of Python-based machine learning, Scikit-Learn stands out as one of the most powerful and versatile tools available. This introductory post serves as a gateway to understanding Scikit-Learn through explanations of introductory ML concepts along with implementations examples in Python.
Quick and Robust Data Augmentation with Albumentations Library
0
Data augmentation is a crucial step in the pipeline for image classification with deep learning. Albumentations is an extremely versatile Python library that can be used to easily augment images. Transformations include rotations, flips, downscaling, distortions, blurs, and many more.
Citation:
Buslaev A, Iglovikov VI, Khvedchenya E, Parinov A, Druzhinin M, Kalinin AA. Albumentations: Fast and Flexible Image Augmentations. Information. 2020; 11(2):125. https://doi.org/10.3390/info11020125