Detecting Covid-19 Misinformation on Social Media

Submission information

Submission Number: 148

Submission ID: 408

Submission UUID: 135ffcba-0c6f-4434-a690-3ded031c7f5e

Submission URI: /form/project

Created: Tue, 08/09/2022 - 15:59

Completed: Tue, 08/09/2022 - 16:07

Changed: Fri, 04/14/2023 - 06:37

Remote IP address: 130.215.45.247

Submitted by: Gaurav Khanna

Language: English

Is draft: No

Webform: Project

Project Title Detecting Covid-19 Misinformation on Social Media

Program CAREERS

Project Image

Tags ai, bash, batch-jobs, big-data, biology, cuda

Status Complete

Project Leader

Project Leader Suhong Li

Email sli@bryant.edu

Mobile Phone {Empty}

Work Phone {Empty}

Project Personnel

Mentor(s) Suhong Li

Student-facilitator(s) Jason Michaud

Mentee(s) {Empty}

Project Information

Project Description The ongoing pandemic has heightened the need for tools that flag COVID-19-related misinformation on the internet, particularly on social media platforms such as Twitter. This project is based on 1.6 billion COVID-19 tweets collected between March 2020 and May 2022. The project focuses on developing a machine learning model to detect COVID-19-related misinformation. In addition, the validated model will be applied to all COVID-19 tweets to further understand the misinformation, for example: Who is distributing COVID-19 misinformation? How does the misinformation travel over social media? What are the main topics of the misinformation? How does the misinformation differ by time and by location?
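Once every tweet carries a predicted label, the question of how misinformation differs by time and by location reduces to a grouped proportion. The sketch below is a minimal illustration, not the project's code: the field names `country`, `created_at`, and `is_misinfo` are hypothetical, and timestamps are assumed to be already normalized to ISO 8601 strings (`YYYY-MM-DD...`).

```python
from collections import Counter


def misinformation_share(tweets):
    """Fraction of tweets predicted as misinformation per (country, month).

    Each tweet is a dict with hypothetical keys:
      'country':    e.g. "US", "UK", "IN"
      'created_at': ISO 8601 timestamp string, e.g. "2020-03-15T12:00:00Z"
      'is_misinfo': bool predicted by the classifier
    """
    totals = Counter()
    flagged = Counter()
    for tweet in tweets:
        # The "YYYY-MM" prefix of an ISO timestamp gives a monthly bucket.
        key = (tweet["country"], tweet["created_at"][:7])
        totals[key] += 1
        if tweet["is_misinfo"]:
            flagged[key] += 1
    # Keys are sorted so the output is ordered by country, then month.
    return {key: flagged[key] / totals[key] for key in sorted(totals)}


if __name__ == "__main__":
    sample = [
        {"country": "US", "created_at": "2020-03-15", "is_misinfo": True},
        {"country": "US", "created_at": "2020-03-20", "is_misinfo": False},
        {"country": "UK", "created_at": "2020-04-01", "is_misinfo": False},
    ]
    print(misinformation_share(sample))
    # {('UK', '2020-04'): 0.0, ('US', '2020-03'): 0.5}
```

Monthly buckets are only a choice made for the sketch; the same counting works for weekly or daily buckets by slicing a different prefix of the timestamp.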

The student will work on this project from start to finish using various data analytic methodologies, including data exploration, topic modelling, natural language processing, and machine learning. More specifically, in the context of an RCF skillset, the student will gain experience with accessing a remote computational system, setting up jobs in an HPC environment, working with queuing systems, and performing file I/O with remote systems.

Note: This is a follow-on to a previous project, led by the same PI and RCF Brenna Rojek, that ran in Spring 2022. The current RCF will leverage the tools and workflow developed by Brenna and extend them further.

Project Information Subsection

Project Deliverables {Empty}

Student Research Computing Facilitator Profile {Empty}

Mentee Research Computing Profile {Empty}

Student Facilitator Programming Skill Level Some hands-on experience

Mentee Programming Skill Level {Empty}

Project Institution Bryant University

Project Address Rhode Island

Anchor Institution CR-University of Rhode Island

Preferred Start Date 10/01/2022

Start as soon as possible. No

Project Urgency 5 (on a scale from "Already behind" to "Start date is flexible")

Expected Project Duration (in months) 6

Launch Presentation

Careers.project.launch.pptx (2.05 MB)

Launch Presentation Date {Empty}

Wrap Presentation

Detecting Fake News Wrap Presentation.pptx (3.26 MB)

Wrap Presentation Date 04/12/2023

Project Milestones

Milestone Title: Milestone #1
Milestone Description: Student reviews relevant literature and learns about ML, NLP, and other needed libraries/packages; launch presentation.
Completion Date Goal: 2022-10-01
Actual Completion Date: 2022-11-01
Milestone Title: Milestone #2
Milestone Description: Student reviews the Twitter data set and formats the data for use by the ML, NLP, and other software.
Completion Date Goal: 2022-11-01
Actual Completion Date: 2022-12-01
Milestone Title: Milestone #3
Milestone Description: Student performs extensive analysis of the formatted data using ML and NLP techniques. Specific tasks:
• Build a machine learning model to detect COVID-19 misinformation.
• Use the validated model to make predictions on all tweets and address the following questions:
    • Who is distributing COVID-19 misinformation?
    • How does the misinformation travel over social media?
    • What are the main topics of the misinformation?
    • How does the misinformation differ by time and by location?

Completion Date Goal: 2022-12-01
Actual Completion Date: 2023-02-01
Milestone Title: Milestone #4
Milestone Description: Student works with faculty to interpret the results and writes a report.
Completion Date Goal: 2023-02-01
Actual Completion Date: 2023-03-01
Milestone Title: Milestone #5
Milestone Description: The student presents the results in a poster or a Zoom presentation and submits the project to a conference; wrap presentation.

Completion Date Goal: 2023-03-01
Actual Completion Date: 2023-03-31
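Milestone #2 (formatting the Twitter data for the ML and NLP software) usually begins with normalizing raw tweet text. The function below is a hedged sketch of one such step; its rules (lowercasing; dropping a leading `RT`, URLs, and `@mentions`; keeping hashtags and hyphenated terms such as `covid-19`) are assumptions for illustration, not the project's documented pipeline.

```python
import re

# Patterns applied after lowercasing (illustrative rules, not the project's).
RETWEET_RE = re.compile(r"^rt\s+")          # leading retweet marker
URL_RE = re.compile(r"https?://\S+")        # links such as https://t.co/...
MENTION_RE = re.compile(r"@\w+")            # @username mentions
OTHER_RE = re.compile(r"[^a-z0-9_#\-\s]")   # punctuation, emoji, non-ASCII


def normalize_tweet(text):
    """Lowercase a tweet and return a list of cleaned tokens."""
    text = text.lower()
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = OTHER_RE.sub(" ", text)
    # Drop tokens left with no letters or digits (e.g. a lone "-" or "#").
    return [tok for tok in text.split() if any(ch.isalnum() for ch in tok)]


if __name__ == "__main__":
    print(normalize_tweet(
        "RT @who: COVID-19 vaccines are safe! https://t.co/abc #GetVaccinated"))
    # ['covid-19', 'vaccines', 'are', 'safe', '#getvaccinated']
```

Because the character filter keeps only ASCII letters and digits, it discards text in non-Latin scripts; tweets from India written in, for example, Hindi would need a Unicode-aware rule instead.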

Github Contributions {Empty}

Planned Portal Contributions (if any) {Empty}

Planned Publications (if any) {Empty}

What will the student learn? {Empty}

What will the mentee learn? {Empty}

What will the Cyberteam program learn from this project? {Empty}

HPC resources needed to complete this project? {Empty}

Notes {Empty}

Final Report

What is the impact on the development of the principal discipline(s) of the project? This project built a machine learning model to predict fake news related to COVID-19. It applied the model to COVID-19 tweets from three countries (the United States, the UK, and India) to detect fake news in each country. The project also applied topic modelling to find the dominant topics in fake news in each country.
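The report does not name the model that was trained (the lessons learned mention state-of-the-art NLP algorithms). As a stand-in that makes the train-then-predict workflow concrete, the sketch below implements a much simpler technique: a multinomial Naive Bayes baseline with add-one (Laplace) smoothing. The class name `NaiveBayesBaseline`, the labels `"fake"`/`"real"`, and the toy documents are hypothetical; inputs are token lists such as a tweet normalizer would produce.

```python
import math
from collections import Counter


class NaiveBayesBaseline:
    """Multinomial Naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        """docs: list of token lists; labels: parallel list of class labels."""
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, tokens):
        """Unnormalized log-posterior score for each class."""
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior P(c)
            denom = self.total_words[c] + vocab_size
            for tok in tokens:
                if tok in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            scores[c] = score
        return scores

    def predict(self, tokens):
        """Return the class with the highest score."""
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    docs = [
        ["drinking", "bleach", "cures", "covid"],
        ["vaccine", "causes", "magnetism"],
        ["masks", "reduce", "transmission"],
        ["vaccine", "trial", "results", "published"],
    ]
    labels = ["fake", "fake", "real", "real"]
    model = NaiveBayesBaseline().fit(docs, labels)
    print(model.predict(["bleach", "cures"]))  # fake
    print(model.predict(["masks", "reduce"]))  # real
```

A baseline like this is mainly useful as a reference point against which a stronger model can be compared.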

What is the impact on other disciplines? This project contributes to our knowledge in the fields of communication and health care. It built a machine learning model that predicts COVID-19 misinformation and can be used to detect fake news on Twitter. In addition, the study deepens our understanding of the dominant topics of COVID-19 misinformation on social media and how they differ by country. The results can help in detecting and preventing the spread of misinformation on social media.

Is there an impact on physical resources that form infrastructure? None

Is there an impact on the development of human resources for research computing? The RCF developed a strong awareness of the opportunities and experiences involved in research computing, something the student was completely unaware of previously.

The student learned to use a high-performance computing (HPC) cluster and to request the appropriate resources. In addition, the student learned to run batch jobs when dealing with a high volume of data. He plans to organize his code and share it with the public so that more people can benefit from this experience.

Is there an impact on institutional resources that form infrastructure? None.

Is there an impact on information resources that form infrastructure? None.

Is there an impact on technology transfer? None.

Is there an impact on society beyond science and technology? As mentioned previously, this project can help detect and prevent the spread of misinformation on social media and thereby reduce the potential negative impact of social media on society.

Lessons Learned The student working on this project was able to learn state-of-the-art natural language processing algorithms, learn to use a GPU cluster, and run batch jobs. However, some jobs still took more than 24 hours to run. A better approach to scaling the data processing needs to be developed in the future.
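One common way to shorten jobs that run for more than 24 hours is to split the tweets into shards and process each shard as a separate task of a scheduler job array. The report does not say which scheduler the cluster used; the sketch below assumes Slurm, where each array task (for example, submitted with `sbatch --array=0-99`) receives its index in the `SLURM_ARRAY_TASK_ID` environment variable. The helper names `shard_bounds` and `current_shard_id` are hypothetical.

```python
import os


def shard_bounds(n_items, n_shards, shard_id):
    """Half-open index range [start, stop) of the items assigned to one shard.

    Items are split as evenly as possible: the first n_items % n_shards
    shards each receive one extra item.
    """
    if not 0 <= shard_id < n_shards:
        raise ValueError("shard_id must be in the range [0, n_shards)")
    base, extra = divmod(n_items, n_shards)
    start = shard_id * base + min(shard_id, extra)
    stop = start + base + (1 if shard_id < extra else 0)
    return start, stop


def current_shard_id():
    """Index of this task in a Slurm job array (0 when run outside Slurm)."""
    return int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))


if __name__ == "__main__":
    # e.g. 1,000,000 items (tweets or input files) split across 100 array tasks
    start, stop = shard_bounds(1_000_000, 100, current_shard_id())
    print(f"processing items {start} to {stop - 1}")
```

The array indices must start at 0 to match `shard_id`; if the array is submitted as `--array=1-100`, subtract 1 from the task ID first. Each task then finishes in roughly 1/100 of the single-job wall-clock time, provided the cluster can run the tasks concurrently.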

Overall results The project trained a model to predict fake news and applied the model to COVID-19 tweets collected between March 2020 and May 2022 in three countries (USA, UK, and India). The results of topic modelling show that the dominant topics in fake news in the US are related to Covid Symptom, Politics, Covid Treatment, and Cases/Lock-down, while the dominant topics in real news are Mask Mandate/Social Distancing, Covid Statistic, and Politics. The model had trouble distinguishing between fake news and real news for the India dataset due to the limited training data available for that country.