Detecting Covid-19 Misinformation on Social Media
Submission information
Submission Number: 148
Submission ID: 408
Submission UUID: 135ffcba-0c6f-4434-a690-3ded031c7f5e
Submission URI: /form/project
Created: Tue, 08/09/2022 - 15:59
Completed: Tue, 08/09/2022 - 16:07
Changed: Fri, 04/14/2023 - 06:37
Remote IP address: 130.215.45.247
Submitted by: Gaurav Khanna
Language: English
Is draft: No
Webform: Project

Complete
Project Leader
Project Personnel
Project Information
The ongoing pandemic has heightened the need for tools that flag COVID-19-related misinformation on the internet, especially on social media platforms such as Twitter. This project is based on 1.6 billion COVID-19 tweets collected between March 2020 and May 2022, and it focuses on developing a machine learning model to detect COVID-19-related misinformation. The validated model will then be applied to all of the COVID-19 tweets to better understand the misinformation: Who is distributing COVID-19 misinformation? How does the misinformation travel over social media? What are the main topics of the misinformation? And how does the misinformation differ by time and by location?
The student will work on this project from start to finish using various data-analytic methods, including data exploration, topic modelling, natural language processing, and machine learning. More specifically, in terms of the RCF skill set, the student will gain experience accessing a remote computational system, setting up jobs in an HPC environment, working with queuing systems, and performing file I/O with remote systems, among other tasks.
Note: This is a follow-on to a previous project, led by the same PI with RCF Brenna Rojek, that ran in Spring 2022. The current RCF will build on the tools and workflow that Brenna developed and extend them further.
Project Information Subsection
Some hands-on experience
Bryant University
Rhode Island
CR-University of Rhode Island
10/01/2022
No
Already behind
5
Start date is flexible
6
Careers.project.launch.pptx (2.05 MB)
04/12/2023
- Milestone Title: Milestone #1
Milestone Description: The student reviews relevant literature and learns about ML, NLP, and other needed libraries/packages; launch presentation.
Completion Date Goal: 2022-10-01
Actual Completion Date: 2022-11-01
- Milestone Title: Milestone #2
Milestone Description: The student reviews the Twitter dataset and formats the data for use by the ML, NLP, and other software.
Completion Date Goal: 2022-11-01
Actual Completion Date: 2022-12-01
- Milestone Title: Milestone #3
Milestone Description: The student performs extensive analysis of the formatted data using ML and NLP techniques. Specific tasks:
• Build a machine learning model to detect COVID-19 misinformation.
• Use the validated model to make predictions on all tweets and address the following questions:
  • Who is distributing COVID-19 misinformation?
  • How does the misinformation travel over social media?
  • What are the main topics of the misinformation?
  • How does the misinformation differ by time and by location?
Completion Date Goal: 2022-12-01
Actual Completion Date: 2023-02-01
- Milestone Title: Milestone #4
Milestone Description: The student works with faculty to interpret the results and writes a report.
Completion Date Goal: 2023-02-01
Actual Completion Date: 2023-03-01
- Milestone Title: Milestone #5
Milestone Description: The student presents the results in a poster or a Zoom presentation and submits the project to a conference; wrap presentation.
Completion Date Goal: 2023-03-01
Actual Completion Date: 2023-03-31
Final Report
This project built a machine learning model to predict COVID-19-related fake news. It applied the model to COVID-19 tweets from three countries (the United States, the UK, and India) to detect fake news in each country, and it used topic modelling to find the dominant topics of fake news in each country.
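The report does not specify which model was trained, and the student's work is described elsewhere as using state-of-the-art NLP algorithms, which this sketch does not reproduce. As a minimal illustration of the underlying task, labelling a tweet as fake or real news, the block below swaps in a simple baseline: a multinomial naive Bayes classifier with add-one smoothing, written with the Python standard library only. The training sentences, the "fake"/"real" labels, and the class and function names are hypothetical toy examples, not project data or project code.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Hypothetical baseline for illustration only; not the project's model.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # number of documents per label
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_scores(self, text):
        """Unnormalized log posterior for each label."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for token in tokenize(text):
                # Smoothed likelihood: unseen tokens get count 0 + 1
                numerator = self.word_counts[label][token] + 1
                score += math.log(numerator / (self.total_words[label] + v))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy examples, not drawn from the project's tweet collection.
train_texts = [
    "drinking bleach cures covid",
    "5g towers spread the virus",
    "vaccines contain microchips",
    "masks reduce transmission of the virus",
    "health officials report new covid cases",
    "vaccines reduce severe illness",
]
train_labels = ["fake", "fake", "fake", "real", "real", "real"]

clf = NaiveBayesClassifier().fit(train_texts, train_labels)
print(clf.predict("bleach cures the virus"))       # fake
print(clf.predict("masks reduce severe illness"))  # real
```

A baseline like this is mainly useful as a sanity check against which a stronger model can be compared; with very little labelled data for a class or country, its predictions degrade in the same way the report describes for the India dataset.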
This project contributes to knowledge in the fields of communication and health care. The machine learning model it built to predict COVID-19 misinformation can be used to detect fake news on Twitter. In addition, the study deepens our understanding of the dominant topics of COVID-19 misinformation on social media and how they differ by country. The results can help detect and prevent the spread of misinformation on social media.
None
The RCF developed a strong awareness of the opportunities and experiences involved in research computing, a field the student was previously completely unaware of.
The student learned to use a high-performance computing (HPC) cluster and to request the appropriate resources. In addition, the student learned to run batch jobs to process large volumes of data. He plans to organize his code and share it publicly so that others can benefit from this experience.
None.
None.
None.
As mentioned previously, this project can help detect and prevent the spread of misinformation on social media and thereby reduce the potential negative impact of social media on society.
The student working on this project learned state-of-the-art natural language processing algorithms and learned to use a GPU cluster and run batch jobs. However, some jobs still took more than 24 hours to run, so a better approach is needed to scale to the full dataset in the future.
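One common way to keep individual batch jobs under a scheduler's wall-time limit is to split the tweet collection into shards and process each shard in a separate job. The sketch below assumes a Slurm job array (the document mentions queuing systems but does not name the scheduler), in which each task reads its index from the SLURM_ARRAY_TASK_ID environment variable. The helper names shard_bounds and my_shard are hypothetical, and this is not the workflow the project actually used.

```python
import os


def shard_bounds(n_items, n_shards, shard_index):
    """Return the half-open [start, stop) range of items assigned to one shard.

    Items are split as evenly as possible: the first (n_items % n_shards)
    shards each receive one extra item.
    """
    if not 0 <= shard_index < n_shards:
        raise ValueError("shard_index must be in [0, n_shards)")
    base, extra = divmod(n_items, n_shards)
    start = shard_index * base + min(shard_index, extra)
    stop = start + base + (1 if shard_index < extra else 0)
    return start, stop


def my_shard(items, n_shards):
    """Select this job's slice of `items` using the array-task index.

    Assumes a Slurm job array, where SLURM_ARRAY_TASK_ID holds the task index.
    Falls back to index 0 so the script can also be run interactively.
    """
    index = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    start, stop = shard_bounds(len(items), n_shards, index)
    return items[start:stop]


print([shard_bounds(10, 3, i) for i in range(3)])  # [(0, 4), (4, 7), (7, 10)]
```

For example, submitting a (hypothetical) job script with `sbatch --array=0-19` and passing n_shards=20 would give each of the 20 tasks roughly one twentieth of the input. n_shards should match the number of array tasks: with fewer tasks than shards, some items are never processed, and with more, shard_bounds raises ValueError.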
The project trained a model to predict fake news and applied it to COVID-19 tweets collected between March 2020 and May 2022 in three countries (USA, UK, and India). The topic modelling results show that the dominant topics in US fake news are Covid Symptom, Politics, Covid Treatment, and Cases/Lock-down, while the dominant topics in real news are Mask Mandate/Social Distancing, Covid Statistic, and Politics. The model had trouble distinguishing between fake news and real news in the India dataset because limited training data were available for that country.
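Comparing misinformation across countries and over time, as in the per-country results above and the project's question about how misinformation differs by time and location, comes down to grouping the model's predictions and computing the share labelled fake in each group. A minimal sketch, assuming each prediction is a (country, date, label) tuple with an ISO-format date string; this record layout and the function name fake_share_by_group are illustrative assumptions, not the project's actual data format:

```python
from collections import defaultdict


def fake_share_by_group(predictions):
    """Fraction of tweets predicted "fake" for each (country, month) group.

    `predictions` is an iterable of (country, created_at, label) tuples, where
    created_at is an ISO date string such as "2020-03-15" and label is
    "fake" or "real". This layout is a hypothetical example.
    """
    totals = defaultdict(int)
    fakes = defaultdict(int)
    for country, created_at, label in predictions:
        key = (country, created_at[:7])  # bucket by "YYYY-MM"
        totals[key] += 1
        if label == "fake":
            fakes[key] += 1
    # Sorted keys give a stable country-then-month ordering
    return {key: fakes[key] / totals[key] for key in sorted(totals)}


sample = [
    ("USA", "2020-03-15", "fake"),
    ("USA", "2020-03-20", "real"),
    ("UK", "2020-04-01", "real"),
]
print(fake_share_by_group(sample))  # {('UK', '2020-04'): 0.0, ('USA', '2020-03'): 0.5}
```

Reporting a share rather than a raw count keeps months and countries with very different tweet volumes comparable; at the scale of 1.6 billion tweets, the same grouping would typically be done per shard and the counts summed afterwards.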