End-to-end learning of protein-protein interactions

Submission information

Submission Number: 61

Submission ID: 92

Submission UUID: ef874fa2-e0e2-4987-a7dc-f0d9cacd71d9

Submission URI: /form/project

Created: Wed, 08/12/2020 - 13:54

Completed: Wed, 08/12/2020 - 15:10

Changed: Wed, 07/06/2022 - 15:10

Remote IP address: 165.230.224.100

Submitted by: Galen Collier

Language: English

Is draft: No

Webform: Project

Project Title End-to-end learning of protein-protein interactions

Program CAREERS

Project Image

Tags bash (242), bioinformatics (277), computational-chemistry (81), debugging (38), machine-learning (272), programming (5), programming-best-practices (49), python (69), scripting (243), slurm (71), software-installation (211), tensorflow (51), tuning (217)

Status Halted

Project Leader

Project Leader Guillaume Lamoureux

Email guillaume.lamoureux@rutgers.edu

Mobile Phone {Empty}

Work Phone {Empty}

Project Personnel

Mentor(s) Galen Collier

Student-facilitator(s) {Empty}

Mentee(s) {Empty}

Project Information

Project Description Protein-protein interactions (PPIs) are involved in numerous fundamental biological processes. A model that can reliably predict whether two proteins interact, and predict the effect of protein variation on an existing interaction, would open up new avenues for systems biology and for protein design. Current state-of-the-art PPI prediction models rely on sequence similarity with proteins known to interact and have intrinsically limited accuracy for the protein variants of interest in cancer or viral/bacterial infection.

The goal of the project is to train deep learning models for PPI prediction in the absence of structural information about the protein complex. We have recently developed models to predict the structure of any complex formed by two proteins A and B of known structure (see our preprint “Protein-protein docking using learned three-dimensional representations”, https://www.biorxiv.org/content/10.1101/738690v2). We now aim to develop models that generate the structure of the AB complex in a single step, without explicitly searching for the optimal relative orientation of the two proteins, and that predict the binding affinity of proteins A and B directly from their structures. Such models have two main advantages: (1) they are much more computationally efficient, since they avoid a costly grid search in the space of translations and rotations, and (2) they are differentiable, which means they can be used as building blocks for larger neural architectures that, for instance, also predict the structures of the individual proteins A and B themselves.
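
To make the cost of the grid search mentioned above concrete, here is a minimal toy sketch in two dimensions. It is an illustration only, not the method of the preprint: the function names (`rotate90`, `place`, `shift_score`, `grid_search`), the 4x4 grids, the cyclic shifts, the restriction to quarter-turn rotations, and the overlap score are all assumptions made for this example. It enumerates every rotation and translation of a small "ligand" shape and scores each pose against a "receptor" grid.

```python
from itertools import product


def rotate90(grid):
    """Rotate a square grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def place(ligand, quarter_turns, dx, dy):
    """Build a toy receptor: the ligand rotated, then cyclically shifted."""
    n = len(ligand)
    rotated = ligand
    for _ in range(quarter_turns):
        rotated = rotate90(rotated)
    receptor = [[0] * n for _ in range(n)]
    for i, j in product(range(n), repeat=2):
        receptor[(i + dx) % n][(j + dy) % n] = rotated[i][j]
    return receptor


def shift_score(receptor, ligand, dx, dy):
    """Overlap of the ligand, cyclically shifted by (dx, dy), with the receptor."""
    n = len(receptor)
    return sum(
        receptor[(i + dx) % n][(j + dy) % n] * ligand[i][j]
        for i, j in product(range(n), repeat=2)
    )


def grid_search(receptor, ligand):
    """Score every (rotation, translation) pose and return the best one."""
    n = len(receptor)
    best = None
    poses = 0
    rotated = ligand
    for quarter_turns in range(4):
        for dx, dy in product(range(n), repeat=2):
            poses += 1
            score = shift_score(receptor, rotated, dx, dy)
            if best is None or score > best[0]:
                best = (score, quarter_turns, dx, dy)
        rotated = rotate90(rotated)
    return best, poses


# Hypothetical L-shaped ligand on a 4x4 grid.
ligand = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Hide the ligand at a known pose: one quarter turn, shifted by (1, 2).
receptor = place(ligand, quarter_turns=1, dx=1, dy=2)

best, poses = grid_search(receptor, ligand)
print(best, poses)  # (3, 1, 1, 2) 64
```

Even this toy case evaluates 4 x 4 x 4 = 64 poses, and the count grows with the grid resolution and with the number of sampled rotations, which in three dimensions is far larger. The search also ends in a discrete maximum over poses, which is the step that a model producing the complex directly would avoid.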

This project is enabled by the development of TorchProteinLibrary, a computationally efficient library of differentiable primitives for deep neural network models of protein structure (see our preprint “TorchProteinLibrary: A computationally efficient, differentiable representation of protein structure”, https://arxiv.org/abs/1812.01108). The library implements the functionality needed to perform end-to-end learning of protein structure prediction.

Project Information Subsection

Project Deliverables Research workflow development: successful training of deep learning models for PPI prediction in the absence of structural information about the protein complex. Communication of the findings in the form of presentations and/or publications.

Project Deliverables {Empty}

Student Research Computing Facilitator Profile
- Grad or undergrad
- Interested in structural biology research
- Experienced Linux or Unix user
- Comfortable working in a remote Linux environment (HPC cluster)
- Some experience with Python programming
- Structural modeling experience (understanding general concepts) will be helpful
- Familiarity with machine learning concepts will be helpful

Mentee Research Computing Profile {Empty}

Student Facilitator Programming Skill Level Practical applications

Mentee Programming Skill Level {Empty}

Project Institution Rutgers University–Camden

Project Address 303 Cooper St
Camden, New Jersey 08102

Anchor Institution CR-Rutgers

Preferred Start Date 09/01/2020

Start as soon as possible. No

Project Urgency Already behind | 3 | Start date is flexible

Expected Project Duration (in months) {Empty}

Launch Presentation {Empty}

Launch Presentation Date {Empty}

Wrap Presentation {Empty}

Wrap Presentation Date {Empty}

Project Milestones {Empty}

Github Contributions {Empty}

Planned Portal Contributions (if any) {Empty}

Planned Publications (if any) {Empty}

What will the student learn? {Empty}

What will the mentee learn? {Empty}

What will the Cyberteam program learn from this project? Effort involved in recruiting and training junior-level research software engineers.

HPC resources needed to complete this project? {Empty}

Notes {Empty}

Final Report

What is the impact on the development of the principal discipline(s) of the project? {Empty}

What is the impact on other disciplines? {Empty}

Is there an impact on physical resources that form infrastructure? {Empty}

Is there an impact on the development of human resources for research computing? {Empty}

Is there an impact on institutional resources that form infrastructure? {Empty}

Is there an impact on information resources that form infrastructure? {Empty}

Is there an impact on technology transfer? {Empty}

Is there an impact on society beyond science and technology? {Empty}

Lessons Learned {Empty}

Overall results {Empty}