Knowledge Base Resources
Contributed by cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers and HPC system administrators), these resources are shared through the ConnectCI community platform. Add resources you find helpful!
Understanding LLM Fine-tuning
0
With the recent uprising of LLM's many business are looking at way to adopt these LLMs and fine-tuning these models on specfic data sets to ensure accuracy. These models when fine-tuned can be optimal for fulfilling the specific needs of a company. This site explains explicitly when, how, and why models should be trained. It goes over various strategies for LLM fine -tuning.
ConnectCI
0
Connect.Cybinfrastructure is a family of portals, each representing a program that is serving a segment of the research computing and data community. Each portal provides program-specific information, as well a custom "view" into a common database. The portal was originally developed to support project workflows and a knowledge base of self service learning resources for the Northeast Cyberteam. Subsequently, it was expanded to provide support to multiple cyberteams and other research computing communities of practice. We welcome additional communities, please contact us if you are interested in participating. Central to the Portal is an extensive and ever-evolving tagging infrastructure which informs every aspect of the Portal. The tag taxonomy was initially developed by the Northeast Cyberteam to categorize subject matter relevant to practitioners of Research Computing Facilitation and is ever changing due to the frequent introduction of new technology in domains that characterize the field of research computing.
A visual introduction to Gaussian Belief Propagation
0
This website is an interactive introduction to Gaussian Belief Propagation (GBP). A probabilistic inference algorithm that operates by passing messages between the nodes of arbitrarily structured factor graphs. A special case of loopy belief propagation, GBP updates rely only on local information and will converge independently of the message schedule. The key argument is that, given recent trends in computing hardware, GBP has the right computational properties to act as a scalable distributed probabilistic inference framework for future machine learning systems.
Solving differential equations with Physics-informed Neural Network
0
Differential equations, the backbone of countless physical phenomena, have traditionally been solved using numerical methods or analytical techniques. However, the advent of deep learning introduces an intriguing alternative: Physics-Informed Neural Networks (PINNs). By leveraging the representational power of neural networks and integrating physical laws (like differential equations), PINNs offer a novel approach to solving complex problems. This guide walks through an implementation of a PINN to solve DEs such as the logistic equation.
Beautiful Soup - Simple Python Web Scraping
0
This package lets you easily scrape websites and extract information based on html tags and various other metadata found in the page. It can be useful for large-scale web analysis and other tasks requiring automated data gathering.
CyberAmbassadors: Professional Skills for Interdisciplinary Work
0
The CyberAmbassadors project was funded through a workforce development grant from the National Science Foundation (Award #1730137). Starting in 2017, the initial focus of this project was to develop, test, and refine new curriculum to help CyberInfrastructure (CI) Professionals strengthen their communications, teamwork and leadership skills. With support and collaboration from a number of academic and professional organizations, the CyberAmbassadors project was expanded to offer professional skills training to college students and professionals working across STEM (science, technology, engineering, math) disciplines.
FSL Lectures
0
This is the official University of Oxford FSL group lecture page. This includes information on upcoming and past courses (online and in-person), as well as lecture materials. Available lecture materials includes slides and recordings on using FSL, MR physics, and applications of imaging data.
Research Security Operations Center at IU
0
The NSF-funded ResearchSOC helps make scientific computing resilient to cyberattacks and capable of supporting trustworthy, productive research through operational cybersecurity services, training, and information sharing necessary to a community as unique and variable as research and education (R&E).
ResearchSOC is a service offering from Indiana University's OmniSOC.
R for Research Scientists
0
A book for researchers who contribute code to R projects: This booklet is the result of my work with the Social Cognition for Social Justice lab. It was developed in response to questions I was getting from students; both grad students that were making software design decisions, and undergraduates who were using things like version control for the first time. Although many tutorials and resources exist for these topics, there was not a single source that I thought covered just enough material to build up to the workflow used by the lab without extraneous detail.
Bioinformatics Workflow Management with Nextflow
0
Nextflow is an open-source, domain-specific language and workflow manager designed for the execution and coordination of scientific and data-intensive computational workflows. It was specifically created to address the challenges faced by researchers and scientists when dealing with complex and scalable computational pipelines, particularly in fields such as bioinformatics, genomics, and data analysis.
Here provided some links to start with.
fast.ai
0
Fastai offers many tools to people working with machine learning and artifical intelligence including tutorials on PyTorch in addition to their own library built on PyTorch, news articles, and other resources to dive into this realm.
Warewulf documentation
0
Warewulf is an operating system provisioning platform for Linux that is designed to produce secure, scalable, turnkey cluster deployments that maintain flexibility and simplicity. It can be used to setup a stateless provisioning in HPC environment.
Bridges-2 Home Page
0
Landing Page for Bridges-2 information
Oakridge Leadership Computing Facility (OLCF) Training Events and Archive
0
Upcoming training events and archives of training materials detailing general HPC best practices as well as how to use OLCF resources and services.
Cybersecurity Guide
0
Cybersecurity Guide is a comprehensive resource for students and early career professionals that provides users with a wide range of resources and up-to-date information on cybersecurity, including cybersecurity degree programs and bootcamps, career guides, as well as online courses and training opportunities. Additionally, it covers trends, best practices, and much more.
Python Tools for Data Science
0
Python has become a very popular programming language and software ecosystem for work in Data Science, integrating support for data access, data processing, modeling, machine learning, and visualization. In this webinar, we will describe some of the key Python packages that have been developed to support that work, and highlight some of their capabilities. This webinar will also serve as an introduction and overview of topics addressed in two Cornell Virtual Workshop tutorials, available at https://cvw.cac.cornell.edu/pydatasci1 and https://cvw.cac.cornell.edu/pydatasci2
Machine Learning in Astrophysics
0
Machine learning is becoming increasingly important in field with large data such as astrophysics. AstroML is a Python module for machine learning and data mining built on numpy, scipy, scikit-learn, matplotlib, and astropy allowing for a range of statistical and machine learning routines to analyze astronomical data in Python. In particular, it has loaders for many open astronomical datasets with examples on how to visualize such complicated and large datasets.
AI powered VsCode Editor
0
**Cursor: The AI-Powered Code Editor**
Cursor is a cutting-edge, AI-first code editor designed to revolutionize the way developers write, debug, and understand code. Built upon the premise of pair-programming with artificial intelligence, Cursor harnesses the capabilities of advanced AI models to offer real-time coding assistance, bug detection, and code generation.
**How Cursor Benefits High-Performance Computing (HPC) Work:**
1. **Efficient Code Development:** With AI-assisted code generation, researchers and developers in the HPC realm can quickly write optimized code for simulations, data processing, or modeling tasks, reducing the time to deployment.
2. **Debugging Assistance:** Handling complex datasets and simulations often lead to intricate bugs. Cursor's capability to automatically investigate errors and determine root causes can save crucial time in the HPC workflow.
3. **Tailored Code Suggestions:** Cursor's AI provides context-specific code suggestions by understanding the entire codebase. For HPC applications where performance is paramount, this means receiving recommendations that align with optimization goals.
4. **Improved Code Quality:** With AI-driven bug scanning and linter checks, Cursor ensures that HPC codes are not only fast but also robust and free of common errors.
5. **Easy Integration:** Being a fork of VSCode, Cursor allows seamless migration, ensuring that developers working in HPC can swiftly integrate their existing VSCode setups and extensions.
In essence, for HPC tasks that demand speed, precision, and robustness, Cursor acts as an invaluable co-pilot, guiding developers towards efficient and optimized coding solutions.
It is free if you provide your own OPEN AI API KEY.
Slurm Scheduling Software Documentation
0
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
DAGMan for orchestrating complex workflows on HTC resources (High Throughput Computing)
0
DAGMan (Directed Acyclic Graph Manager) is a meta-scheduler for HTCondor. It manages dependencies between jobs at a higher level than the HTCondor Scheduler.
It is a workflow management system developed by the High-Throughput Computing (HTC) community, specifically for managing large-scale scientific computations and data analysis tasks. It enables users to define complex workflows as directed acyclic graphs (DAGs). In a DAG, nodes represent individual computational tasks, and the directed edges represent dependencies between the tasks. DAGMan manages the execution of these tasks and ensures that they are executed in the correct order based on their dependencies.
The primary purpose of DAGMan is to simplify the management of large-scale computations that consist of numerous interdependent tasks. By defining the dependencies between tasks in a DAG, users can easily express the order of execution and allow DAGMan to handle the scheduling and coordination of the tasks. This simplifies the development and execution of complex scientific workflows, making it easier to manage and track the progress of computations.
Numpy - a Python Library
0
Numpy is a python package that leverages types and compiled C code to make many math operations in Python efficient. It is especially useful for matrix manipulation and operations.
GIS: Projections and their distortions
0
In GIS, projections are helpful to take something plotted on a globe and convert it to a flat map that we can print or show on a screen. Unfortunately it also introduces distortions to the objects and features on the map. This not only distorts the objects visually, but the results for any spatial attribute calculations will also reflect this distortion (such as distance and area ). Below is a link to a quick primer on projections, types of distortions that can occur, and suggestions on how to choose a correct projection for your work.
Building Anaconda Navigator applications
0
This tutorial explains how to create an Anaconda Navigator Application (app) for JupyterLab. It is intended for users of Windows, macOS, and Linux who want to generate an Anaconda Navigator app conda package from a given recipe. Prior knowledge of conda-build or conda recipes is recommended.
Natural Language Processing with Deep Learning
0
CS244N is a renowned natural language processing course offered by Stanford University and taught by Christopher Manning. It covers a wide range of topics in NLP, including language modeling, machine translation, sentiment analysis, and more. It teaches both foundational concepts and cutting-edge research to gain a comprehensive understanding of NLP techniques and applications.
OpenMP Tutorial
0
OpenMP (Open Multi-Processing) is an API that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran on many platforms, instruction-set architectures and operating systems, including Solaris, AIX, FreeBSD, HP-UX, Linux, macOS, and Windows. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.