- Trinity Tutorial for Transcriptome Assembly: Trinity is one of the most popular tools for assembling transcripts from RNA-Seq short reads. This tutorial covers the basic usage of Trinity, best practices, and common problems.
- Natural Language Processing with Deep Learning: CS224N is a renowned natural language processing course offered by Stanford University and taught by Christopher Manning. It covers a wide range of topics in NLP, including language modeling, machine translation, sentiment analysis, and more. It teaches both foundational concepts and cutting-edge research, giving students a comprehensive understanding of NLP techniques and applications.
- Managing and Optimizing Your Jobs on HPC: An overview of tools and methods to manage and optimize jobs and HPC workflows.
- Slurm Scheduling Software Documentation: Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
- DeepChem: DeepChem is an open-source library built on TensorFlow and PyTorch for applying machine learning algorithms to molecular data.
- Building the ArduPilot environment for Linux: This article provides instructions for building AirSim, an open-source simulator for autonomous vehicles, on Linux. It outlines the steps to build Unreal Engine, clone and build the AirSim repository, and set up the Unreal environment. It also includes information on how to use AirSim and optional setups such as remote control for manual flight.
- OnShape Documentation: Documentation for getting started with OnShape for CAD. OnShape is cloud-hosted CAD software that lets you collaborate with others as you would in a Google Doc, with the power and capabilities of other CAD software such as Solidworks or Inventor.
- Data Analysis with R for Educators: This webinar series is an orientation to R. We start with an overview of R’s history and place in the larger data science ecosystem. Next, we introduce the RStudio user interface and how to access R’s excellent documentation. Finally, we present the fundamental concepts you need to use the R environment and language for data analysis. Along the way, we compare R script files (.R) to R Notebook (.Rmd) files and show how the features of R Notebook support better communication and encourage more dynamic engagement with statistical analysis and code. It is helpful to be familiar with tabular data analysis using statistical software, database tools, or spreadsheet programs. Workshop materials, including setup directions and slides, are available at https://github.com/CornellCAC/r_for_edu/. The RStudio Cloud project used in the workshop is https://rstudio.cloud/project/4044219.
- Regular Expressions: Regular expressions (sometimes referred to as RegEx) are an incredibly powerful tool for defining string patterns used in "find" or "find and replace" operations on strings, or in input validation. Regular expressions are used in search engines, in the search and replace dialogs of word processors and text editors, and in text-processing Linux utilities such as sed and awk. They are supported in many programming languages, including Python, R, Perl, Java, and others.
  - Learn Regular Expressions with simple, interactive exercises
  - An online tool to learn, build, & test Regular Expressions
  - An online tool that lets you enter your own text and regular expressions to see what matches
- United Nations Mentor Handbook: The United Nations (UN) is an international organization comprising 193 Member States, including the United States. As a global organization, the UN is the one place on Earth where the world's nations can gather to discuss common problems and find shared solutions that benefit all humanity. This handbook has been produced for UN staff of all backgrounds and levels and provides an overview of how to approach your participation in a mentorship program. This resource is quickly digestible and provides a basic structure that will be helpful to review before the first meeting with your mentee.
- Optimizing Research Workflows - A Documentation of Snakemake: Snakemake is a powerful and versatile workflow management system that simplifies the creation, execution, and management of data analysis pipelines. It uses a user-friendly, Python-based language to define workflows, making it particularly valuable for automating and reproducibly managing complex computational tasks in research and data analysis.
- Singularity/Apptainer User Manuals: Singularity/Apptainer is a free and open-source container platform that allows users to build and run containers on high performance computing resources. SingularityCE is the community edition of Singularity maintained by Sylabs, a company that also offers commercial Singularity products and services. Apptainer is a fork of Singularity, maintained by the Linux Foundation, a community of developers and users who are passionate about open source software.
- Molecular Dynamics Tutorials for Beginners: Links to MD tutorials for beginners across various simulation platforms.
- Termius - Modern SSH platform

  **Termius: The Modern SSH Client for 2023**

  Termius is the future-facing SSH client that's redefining remote server access in 2023. Designed for ease and efficiency, Termius offers a seamless connection experience across all devices, be it mobile or desktop. Gone are the days of re-inputting IP addresses, ports, and passwords; with Termius, one-click connectivity is the new norm.

  **How Termius Elevates Remote Server Access:**

  1. **One-Click Connectivity:** Save the hassle of remembering and re-entering connection details. Termius provides an immediate connection to your infrastructure with a single click.
  2. **Synchronized Across Devices:** Termius ensures that your data, connection settings, and preferences are consistent across all your devices, from mobile to desktop.
  3. **Unparalleled Security:** With the Cloud Vault feature, users can securely store their data in an encrypted environment, accessible only from their specific devices. Shared vaults allow for safe connection sharing within teams.
  4. **AI-Powered Terminal Experience:** Advanced AI-driven autocomplete means users can input command descriptions, and Termius will swiftly convert them into accurate bash commands, simplifying and enhancing the terminal interaction.
  5. **Collaborative Troubleshooting:** Share terminal sessions with teammates, facilitating cooperative problem-solving or knowledge sharing. No additional server-side installations needed.
  6. **Automation and Snippets:** Streamline routine processes with the ability to save and run frequently used shell scripts. Sharing these Snippets with your team can lead to increased productivity and fewer manual errors.
  7. **All-Device Compatibility:** Whether on iPad, iPhone, Android, macOS, Windows, or Linux, Termius ensures a consistent and fluid experience. The platform's synchronization capability means you're always ready to respond swiftly, irrespective of the device in use.

  For professionals and businesses aiming for top-notch server access efficiency, Termius is the gold standard in 2023. Experience the revolution in SSH connectivity and optimize your workflow with Termius.
- DAGMan for orchestrating complex workflows on HTC resources (High Throughput Computing): DAGMan (Directed Acyclic Graph Manager) is a meta-scheduler for HTCondor. It manages dependencies between jobs at a higher level than the HTCondor Scheduler. It is a workflow management system developed by the High-Throughput Computing (HTC) community, specifically for managing large-scale scientific computations and data analysis tasks. It enables users to define complex workflows as directed acyclic graphs (DAGs). In a DAG, nodes represent individual computational tasks, and the directed edges represent dependencies between the tasks. DAGMan manages the execution of these tasks and ensures that they are executed in the correct order based on their dependencies. The primary purpose of DAGMan is to simplify the management of large-scale computations that consist of numerous interdependent tasks. By defining the dependencies between tasks in a DAG, users can easily express the order of execution and allow DAGMan to handle the scheduling and coordination of the tasks. This simplifies the development and execution of complex scientific workflows, making it easier to manage and track the progress of computations.
- Wiki for Onboarding onto the C3DDB Cluster at MGHPCC: This is a resource for researchers and students looking to onboard onto the c3ddb cluster at MGHPCC. In the code section, there are example job submission scripts for the different queues on c3ddb.
- Beautiful Soup - Simple Python Web Scraping: This package lets you easily scrape websites and extract information based on HTML tags and various other metadata found in the page. It can be useful for large-scale web analysis and other tasks requiring automated data gathering.
- GIS: Projections and their distortions: In GIS, projections take something plotted on a globe and convert it to a flat map that we can print or show on a screen. Unfortunately, projecting also introduces distortions to the objects and features on the map. The distortion is not only visual: the results of any spatial attribute calculations (such as distance and area) will also reflect it. Below is a link to a quick primer on projections, types of distortions that can occur, and suggestions on how to choose a correct projection for your work.
- ACCESS KB Guide - Expanse: Expanse at SDSC is a cluster designed by Dell and SDSC that delivers 5.16 peak petaflops and offers Composable Systems and Cloud Bursting. This documentation describes how to use the Expanse cluster, with some specific information for people with ACCESS accounts.
- R for Data Science: R for Data Science is a comprehensive resource for individuals looking to harness the power of the R programming language for data analysis, visualization, and statistical modeling. Whether you're a beginner or an experienced data scientist, this guide will help you unlock the full potential of R in the realm of data science.
- Educause HEISC-800-171 Community Group: The purpose of this group is to provide a forum to discuss NIST 800-171 compliance. Participants are encouraged to collaborate and share effective practices and resources that help higher education institutions prepare for and comply with the NIST 800-171 standard as it relates to Federal Student Aid (FSA), CMMC, DFARS, NIH, and NSF activities.
- Handwritten Digits Tutorial in PyTorch: This tutorial is essentially the "hello world" of image recognition and feed-forward neural networks (using PyTorch). Using the MNIST database (filled with images of handwritten digits), the tutorial shows how to build a feed-forward neural network that can recognize handwritten digits. A solid understanding of feed-forward networks and back-propagation is recommended.
- Framework to help in scaling Machine Learning/Deep Learning/AI/NLP Models to Web Application level: This framework helps scale Machine Learning/Deep Learning/Artificial Intelligence/Natural Language Processing models to the web application level in almost no time.
- Application Fundamentals (Android): This resource discusses various aspects of Android app development fundamentals. It covers key concepts related to app components, the AndroidManifest.xml file, and app resources. Android apps are built using various components, including Activities, Services, Broadcast Receivers, and Content Providers. These components serve different purposes and have distinct lifecycles. Activities are used for user interaction, services for background tasks, broadcast receivers for system-wide event handling, and content providers for managing shared data. The AndroidManifest.xml file is essential for declaring app components, permissions, and other settings. It informs the Android system about the app's components and capabilities. For instance, it specifies the minimum API level, declares hardware and software requirements, and defines intent filters to enable components to respond to specific actions. It's crucial to declare app requirements, such as device features and minimum Android API levels, to ensure compatibility with different devices and configurations. These declarations help in filtering the app's availability on Google Play for users with compatible devices. Android apps rely on resources separate from code, including images, layouts, strings, and more. These resources are stored in various directories and can be tailored for different device configurations. Providing alternative resources allows for optimization across different languages, screen sizes, orientations, and other factors. Understanding these fundamentals is essential for developing Android applications effectively, ensuring compatibility, and providing a consistent user experience across a wide range of devices and configurations.
- Introduction to Probabilistic Graphical Models: This website summarizes the notes of Stanford's introductory course on probabilistic graphical models. It starts from the very basics and concludes by explaining from first principles the variational auto-encoder, an important probabilistic model that is also one of the most influential recent results in deep learning.
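
The Regular Expressions entry in the list above names three typical uses: "find", "find and replace", and input validation. As a rough illustration, here is a minimal sketch of each using Python's standard `re` module. The sample text, the simplified e-mail pattern, and the username rule are assumptions made up for this example and are not taken from any of the linked resources.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "Contact alice@example.org or bob@example.com before 2024-03-15."

# "Find": a deliberately simplified e-mail pattern (not a full address validator).
email_pattern = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
emails = email_pattern.findall(text)

# "Find and replace": rewrite ISO dates (YYYY-MM-DD) as DD/MM/YYYY
# by capturing the three parts and reordering them in the replacement.
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
reformatted = date_pattern.sub(r"\3/\2/\1", text)


def is_valid_username(name: str) -> bool:
    """Input validation under an assumed rule: a lowercase letter followed by
    2 to 15 lowercase letters, digits, or underscores."""
    return re.fullmatch(r"[a-z][a-z0-9_]{2,15}", name) is not None


print(emails)
print(reformatted)
print(is_valid_username("alice_01"), is_valid_username("1alice"))
```

`re.fullmatch` is used for validation because it requires the whole string to match, whereas `findall` and `sub` scan for matches anywhere in the text.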
Knowledge Base Resources
These resources are contributed by researchers, facilitators, engineers, and HPC admins. Please upvote resources you find useful!
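
The DAGMan entry above describes workflows as directed acyclic graphs in which nodes are tasks and edges are dependencies, executed in dependency order. The sketch below illustrates only that ordering idea, using Python's standard `graphlib` module (Python 3.9+). It is not DAGMan's input syntax or its scheduler, and the job names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each key is a job, each value is the set of jobs
# that must finish before it can start.
dependencies = {
    "align": {"download"},
    "count": {"align"},
    "qc": {"download"},
    "report": {"count", "qc"},
}


def execution_order(deps):
    """Return one valid order in which every job runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def has_cycle(deps):
    """Report whether the dependency graph contains a cycle (so it is not a DAG)."""
    try:
        execution_order(deps)
    except CycleError:
        return True
    return False


order = execution_order(dependencies)
print(order)
print(has_cycle({"a": {"b"}, "b": {"a"}}))
```

Several valid orders can exist for the same graph (for example, `qc` may come before or after `align`), so the sketch only guarantees that each job appears after everything it depends on.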