Knowledge Base Resources
Contributed by cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers and HPC system administrators), these resources are shared through the ConnectCI community platform. Add resources you find helpful!
What are LSTMs?
0
This reading will explain what a long short-term memory neural network is. LSTMs are a type of neural networks that rely on both past and present data to make decisions about future data. It relies on loops back to previous data to make such decisions. This makes LSTMs very good for predicting time-dependent behavior.
Metadata Systems
0
Metadata is a vital topic in libraries and librarianship, encompassing structured information used for accessing digital resources. The definition of metadata varies but is essentially data about data. It has evolved beyond simply describing metadata schemas and now focuses on topics like interoperability, non-descriptive metadata (administrative and preservation metadata), and the effective application of metadata schemas for user discovery. Interoperability, the ability to seamlessly exchange metadata between systems, is a major concern. Different levels of interoperability are examined, including schema-level, record-level, and repository-level. Challenges to interoperability include variations in standards, collaboration barriers, and costs.Metadata management is discussed in terms of the holistic management of metadata across an entire library. Steps include analyzing metadata requirements, adopting schema, creating metadata content, delivery/access, evaluation, and maintenance. Administrative metadata, which encompasses ownership and production information, is becoming more critical, particularly for electronic resource licensing. Preservation metadata is also gaining importance in ensuring the long-term viability of digital objects.
Probabilistic Semantic Data Association for Collaborative Human-Robot Sensing
0
Humans cannot always be treated as oracles for collaborative sensing. Robots thus need to maintain beliefs over unknown world states when receiving semantic data from humans, as well as account for possible discrepancies between human-provided data and these beliefs. To this end, this paper introduces the problem of semantic data association (SDA) in relation to conventional data association problems for sensor fusion. It then, develops a novel probabilistic semantic data association (PSDA) algorithm to rigorously address SDA in general settings. Simulations of a multi-object search task show that PSDA enables robust collaborative state estimation under a wide range of conditions.
phenoACCESS-24 workshop program materials
0
phenoACCESS-24: Workshop on Research Computing and Plant Phenotyping
High-throughput plant phenotyping is computationally intensive, requiring data storage, data processing and analysis, research computing expertise, and mechanisms for data sharing. This workshop is aimed at research computing workforce development by addressing questions such as what is plant phenotyping; what types of data are collected; what are the preprocessing and analytical needs; what tools and platforms exist for data capture, management, analysis, and storage; and how best to collaborate and engage with phenotyping researchers. The full-day agenda will include speakers (scientists and research compute staff); panel discussions (how to work with research computing staff and facilities; how to engage with phenotyping scientists), and networking opportunities (meet-and-greet, ice breakers, small group discussions). The videos and slide decks for the talks are included on the linked page.
Docker Tutorial for Beginners
0
A Docker tutorial for beginners is a course that teaches the basics of Docker, a containerization platform that allows you to package your application and its dependencies into a standardized unit for development, shipment, and deployment.
Jetstream2 Status
0
Jetstream2 makes cutting-edge high-performance computing and software easy to use for your research regardless of your project’s scale—even if you have limited experience with supercomputing systems.Cloud-based and on-demand, the 24/7 system includes discipline-specific apps. You can even create virtual machines that look and feel like your lab workstation or home machine, with thousands of times the computing power.
CUDA Toolkit Documentation
0
NVIDIA CUDA Toolkit Documentation: If you are working with GPUs in HPC, the NVIDIA CUDA Toolkit is essential. You can access the CUDA Toolkit documentation, including programming guides and API references, at this provided website
Bridges-2 Home Page
0
Landing Page for Bridges-2 information
Benchmarking with a cross-platform open-source flow solver, PyFR
0
What is PyFR and how does it solve fluid flow problems?
PyFR is an open-source Computational Fluid Dynamics (CFD) solver that is based on Python and employs the high-order Flux Reconstruction technique. It effectively solves fluid flow problems by utilizing streaming architectures, making it suitable for complex fluid dynamics simulations.
How does PyFR achieve scalability on clusters with CPUs and GPUs?
PyFR achieves scalability by leveraging distributed memory parallelism through the Message Passing Interface (MPI). It implements persistent, non-blocking MPI requests using point-to-point (P2P) communication and organizes kernel calls to enable local computations while exchanging ghost states. This design approach allows PyFR to efficiently operate on clusters with heterogeneous architectures, combining CPUs and GPUs.
Why is PyFR valuable for benchmarking clusters?
PyFR's exceptional performance has been recognized by its selection as a finalist in the ACM Gordon Bell Prize for High-Performance Computing. It demonstrates strong-scaling capabilities by effectively utilizing low-latency inter-GPU communication and achieving strong-scaling on unstructured grids. PyFR has been successfully benchmarked with up to 18,000 NVIDIA K20X GPUs on Titan, showcasing its efficiency in handling large-scale simulations.
The Use of High-Performance Computing Services in University Settings: A Usability Case Study of the University of Cincinnati’s High-Performance Computing Cluster
0
This presentation gives a detailed breakdown of the outcome of my master's thesis which was focused on making HPC Clusters accessible across all disciplines in a university setting "Our Case Study was the university of Cincinnati".
Astronomy data analysis with astropy
0
Astropy is a community-driven package that offers core functionalities needed for astrophysical computations and data analysis. From coordinate transformations to time and date handling, unit conversions, and cosmological calculations, Astropy ensures that astronomers can focus on their research without getting bogged down by the intricacies of programming. This guide walks you through practical usage of astropy from CCD data reduction to computing galactic orbits of stars.
MDAnalysis - Python library for the analysis of molecular dynamics simulations
0
MDAnalysis is a python based library of tools for the analysis of molecular dynamics simulations. It is able to read and write many popular simulation formats including CHARMM, LAMMPS, GROMACS, and AMBER and more. This link contains the documentation pages of all MDAnalysis functions and has links to tutorials using Jupyter Notebooks.
R for Research Scientists
0
A book for researchers who contribute code to R projects: This booklet is the result of my work with the Social Cognition for Social Justice lab. It was developed in response to questions I was getting from students; both grad students that were making software design decisions, and undergraduates who were using things like version control for the first time. Although many tutorials and resources exist for these topics, there was not a single source that I thought covered just enough material to build up to the workflow used by the lab without extraneous detail.
Header-only C++ JSON library
0
JSON is a lightweight format for storing and transporting data, for example in a config file. This library is header-only, and has easy-to-read documentation. It is a C++ library.
Conda
0
Conda is a popular package management system. This tutorial introduces you to Conda and walks you through managing Python, your environment, and packages.
Regulated Research Community of Practice
0
The daily news clearly shows the increasing threat to safety and privacy of data, personal as well as intellectual property. While the requirements such as DFARS 7012, HIPAA, and Cybersecurity Maturity Model Certification (CMMC) improve the consistency of data handling between agencies and contractors and grantees, it leaves academic institutions to figure out how to meet such requirements in a cost-effective way that fits the research and education mission of the institution. Most institutions, agencies, and companies act in isolation with one-off contract language to address data security and safeguarding concerns. Even though cybersecurity has a clear and uniform goal of protecting data, a onesize solution does not fit all academic institutions.
By supporting this community with development of a community strategic roadmap, regular discussions and workshops, and a repository of generalized and specific resources for handling regulated research programs RRCoP lowers the barrier to entry for institutions handling new regulations.
Neocortex Documentation
0
Neocortex is a new supercomputing cluster at the Pittsburgh Supercomputing Center (PSC) that features groundbreaking AI hardware from Cerebras Systems.
Automated Machine Learning Book
0
The authoritative book on automated machine learning, which allows practitioners without ML expertise to develop and deploy state-of-the-art machine learning approaches. Describes the background of techniques used in detail, along with tools that are available for free.
Moving-Lid-Driven Flow Simulation by Finite Difference Method
0
The listed repository contains code written in C++ to model the flow inside a cavity with a lid moving above from left to right by discretizing incompressible N-S equations with finite difference method. For the governing equations, artificial viscosity has been considered to increase the stability. In terms of solving the resulted algebraic equation system, both the Point Jacobi Method and Symmetric Gauss Seidel methods have been used for the iteration process.
Online Bachelor's in Data Science Program Guide - TechGuide
0
The realm of data science is one that onlookers regard with curiosity and respect. There are a lot of unknowns in this area of study that only recently became hugely relevant. It is important to get the facts on how expertise in data science is transforming the world. This article features what a bachelor’s degree means in today’s market and the future.
Scikit-Learn: Easy Machine Learning and Modeling
0
Scikit-learn is free software machine learning library for Python. It has a variety of features you can use on data, from linear regression classifiers to xg-boost and random forests. It is very useful when you want to analyze small parts of data quickly.
Why 'N How: Martinos Center for Biomedical Imaging:
0
The Why & How seminar series is designed to introduce research assistants, graduate students, and postdoctoral and clinical fellows – really, anyone who is interested – to the many tools used in medical imaging. These include software tools and most of the major imaging modalities wielded by investigators (MRI, PET, EEG, MEG, optical, TMS and others). As the name of the series suggests, the talks cover both the reasons researchers might need a particular tool and the nuts and bolts of how to apply it. You can watch videos of the overviews below.
Introduction to Vizualization on HPC Using Python
0
This workshop has an introduction to the concepts of visualization followed by hands on exercises. The concepts section has Speaker Notes, and the hands on section has an accompanying Jupyter notebook.
The workshop is one in a series of Introduction to HPC
Biopython Tutorial
0
The Biopython Tutorial and Cookbook website is a dedicated online resource for users in the field of computational biology and bioinformatics. It provides a collection of tutorials and practical examples focused on using the Biopython library.
The website offers a series of tutorials that cover various aspects of Biopython, catering to users with different levels of expertise. It also includes code snippets and examples, and common solutions to common challenges in computational biology.
The Theory Behind Neural Networks (Very Simplified)
0
This video by the YouTube channel 3Blue1Brown provides a very simplified introduction to the theory behind neural networks. This tutorial is perfect for those that don't have much linear algebra or machine learning background and are eager to step into the realm of ML!