NLP Analysis of Covid-19 Related Tweets

With the unprecedented impact of the COVID-19 pandemic, it has become crucial to assess both the effectiveness of government responses and public sentiment. Twitter, with its vast volume of pandemic-related discourse, provides a valuable dataset for this purpose. This project uses NLP to analyze COVID-19-related tweets and to propose possible actions for governments.

Exploratory Data Analysis

We scraped more than 20 million COVID-19-related tweets posted from March to June 2021 using Python. We first generated N-gram word clouds to get a general overview of the public response to COVID-19. After President Trump controversially referred to COVID-19 as “the Chinese Virus”, discussion about China increased. The term ‘Black people’ also emerged as a significant topic in the tweets.
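The N-gram counting behind such word clouds can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the stopword list, the cleaning rules, the sample tweets, and the function names `tokenize` and `ngram_counts` are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real run would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "and", "for", "on"}

def tokenize(tweet):
    """Lowercase a tweet, drop URLs and @mentions, and keep non-stopword tokens."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return [t for t in re.findall(r"[a-z0-9']+", tweet) if t not in STOPWORDS]

def ngram_counts(tweets, n=2):
    """Count how often each n-gram (joined with spaces) occurs across all tweets."""
    counts = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts

# Hypothetical sample tweets, for illustration only.
sample_tweets = [
    "Stay home and wear a mask https://example.com",
    "Please wear a mask in public @someone",
    "Stay home, stay safe",
]
bigrams = ngram_counts(sample_tweets, n=2)
print(bigrams.most_common(3))
```

The resulting frequency table is the kind of input a word-cloud renderer takes. Note that stopwords are removed before the N-grams are formed, so some bigrams join words that were not adjacent in the original tweet.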

Surprisingly, we did not find a relationship between the states with the most positive cases and the states with the highest tweet frequency. We therefore looked at the sentiment of the tweets in each state instead: we ran sentiment analysis and generated a sentiment score for each state.
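One simple way to turn tweet-level scores into a per-state score is to average them. The sketch below assumes that each tweet already carries a state code and a numeric polarity score. The text does not say how the state score was actually aggregated, so the mean, the sample data, and the name `mean_sentiment_by_state` are assumptions.

```python
from collections import defaultdict

def mean_sentiment_by_state(scored_tweets):
    """Average the polarity scores of (state, score) pairs for each state."""
    totals = defaultdict(lambda: [0.0, 0])  # state -> [sum of scores, tweet count]
    for state, score in scored_tweets:
        totals[state][0] += score
        totals[state][1] += 1
    return {state: total / count for state, (total, count) in totals.items()}

# Hypothetical (state, polarity) pairs, for illustration only.
scored = [("NY", 0.5), ("NY", -0.1), ("FL", 0.0), ("FL", 0.2), ("FL", -0.5)]
state_scores = mean_sentiment_by_state(scored)
print(state_scores)
```

A dictionary like `state_scores` is what a choropleth map would consume, with one value per state.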

Darker colors mark states whose tweets reflect a more positive attitude towards COVID-19. Interestingly, these states are all in the northern or eastern parts of the U.S., which may suggest a correlation between tweet positivity and the political inclinations of the states.

Government Actions

We believe that stay-at-home orders and public face-covering requirements help slow both the average increase in positive cases and the increase in the positivity rate.
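A before-and-after comparison of this kind can be sketched as below. The metric is not defined in the text, so the definition used here is an assumption: the relative change in the mean daily increase of a cumulative case series across a policy date. The data and the names `mean_daily_increase` and `change_rate` are hypothetical.

```python
def mean_daily_increase(cumulative):
    """Average day-over-day increase of a cumulative count series."""
    if len(cumulative) < 2:
        raise ValueError("need at least two observations")
    return (cumulative[-1] - cumulative[0]) / (len(cumulative) - 1)

def change_rate(cumulative, policy_index):
    """Relative change in mean daily increase after the policy takes effect.

    policy_index is the position of the policy day in the series; that day
    ends the 'before' window and starts the 'after' window.
    """
    before = mean_daily_increase(cumulative[:policy_index + 1])
    after = mean_daily_increase(cumulative[policy_index:])
    if before == 0:
        raise ValueError("no increase before the policy; rate is undefined")
    return (after - before) / before

# Hypothetical cumulative positive cases; the policy starts on day index 3.
cases = [100, 200, 300, 400, 450, 500, 550]
print(change_rate(cases, policy_index=3))  # -0.5: daily increase halves
```

A comparison like this ignores reporting lags and other factors that change at the same time, so it shows an association and not the effect of the policy.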

Based on the analysis above, we chose New York, California, Florida, Texas, South Carolina, and North Dakota to further explore the relationship between government policy, tweets, and COVID-19 cases using sentiment analysis and topic modeling.

Sentiment Analysis and Topic Modeling

Sentiment analysis refers to the use of natural language processing, text mining, and computational linguistics to identify and extract subjective information from source material. Overall, people showed a weakly positive attitude toward COVID-19. About half of the tweets (45.69%) are neutral, and positive tweets (32.29% in total) outnumber negative ones, which make up less than one-quarter of all tweets.
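The positive, neutral, and negative shares reported here can be derived from per-tweet polarity scores as sketched below. The scoring tool is not named in the text, so this example assumes scores in the range -1 to 1 from some scorer. The 0.05 cutoff, the sample scores, and the names `label` and `sentiment_shares` are assumptions.

```python
from collections import Counter

def label(score, threshold=0.05):
    """Map a polarity score to 'positive', 'negative', or 'neutral'."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

def sentiment_shares(scores, threshold=0.05):
    """Return the percentage of tweets in each sentiment class."""
    if not scores:
        raise ValueError("no scores given")
    counts = Counter(label(s, threshold) for s in scores)
    return {
        name: 100.0 * counts[name] / len(scores)
        for name in ("positive", "neutral", "negative")
    }

# Hypothetical polarity scores, for illustration only.
shares = sentiment_shares([0.6, 0.0, 0.02, -0.4, 0.3, 0.0, -0.01, 0.1])
print(shares)
```

The choice of threshold directly controls how many tweets land in the neutral class, so the reported shares are only comparable across states when the same cutoff is used throughout.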

Among four states, New York has the highest percentage of positive tweets (32.95%), while Florida has the lowest (31.51%). The percentage of positive tweets rises dramatically in every state for tweets about the two policies; in general, people respond more actively to government policies. In New York, up to 40.17% of tweets show a positive attitude towards the stay-at-home order. Comparing the two policies, however, we see a generally less active response to wearing masks than to the stay-at-home order. This is in line with what we observe in daily life, and it suggests that Americans are less accustomed to wearing masks or less willing to do so.

We focus the analysis on three active, leading government figures in COVID-19-related news: Speaker Nancy Pelosi, President Trump, and Dr. Fauci, director of the National Institute of Allergy and Infectious Diseases (NIAID). We use LDA (Latent Dirichlet Allocation) for topic modeling, and we choose the number of topics by reviewing the Intertopic Distance Map and the most relevant terms chart so that the topics are clearly distinct from one another.
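To make the LDA step concrete, here is a small self-contained sketch that fits LDA with collapsed Gibbs sampling in pure Python. It is not the implementation used in this project, which would normally rely on a library. The hyperparameters `alpha` and `beta`, the toy documents, and the names `lda_gibbs` and `top_terms` are assumptions, and the sketch does not reproduce the Intertopic Distance Map.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on a list of token lists.

    Returns (doc_topic, topic_word, vocab): doc_topic[d][k] is the smoothed
    share of topic k in document d, and topic_word[k][w] is the smoothed
    probability of vocab[w] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total tokens per topic
    assignments = []                    # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][wid] -= 1
                n_k[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wid] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wid] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
        for k in range(K)
    ]
    return doc_topic, topic_word, vocab

def top_terms(topic_word, vocab, n=3):
    """List the n most probable words for each topic."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
        for row in topic_word
    ]

# Hypothetical tokenized tweets, for illustration only.
docs = [
    ["mask", "mandate", "mask", "public"],
    ["relief", "bill", "stimulus", "relief"],
    ["mask", "public", "mandate"],
    ["stimulus", "bill", "relief"],
]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
print(top_terms(topic_word, vocab))
```

Rerunning the fit with different values of `num_topics` and inspecting the top terms of each topic is a rough, manual counterpart to the visual review described above.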

Sentiment toward the different government figures varies across states. The darker the color on the map, the stronger the average sentiment in that state; the lighter the color, the more neutral the average sentiment.

Recommendations

1. After analyzing the Positivity Increase Change Rate and the Average Total Positive Change Rate, we find that wearing masks and implementing the stay-at-home order greatly slow the rate at which people become infected.

2. According to the sentiment analysis above, people generally have weakly positive attitudes towards COVID-19. People also respond more actively to government policies, and more so to the stay-at-home order than to wearing masks in public. States in the northern or eastern parts of the U.S. follow government policies more actively. Combined with the COVID-19 confirmed diagnostic rate, we find that actively following government policies is associated with a lower confirmed diagnostic rate.

3. In our politician analysis, we find that Speaker Pelosi’s relief plan is received slightly negatively in most states. Dr. Fauci is linked to multiple topics related to scientific expertise but receives polarized sentiment. Trump is the subject of the most tweets, but those tweets do not fall into distinguishable topics about specific policies. This may be because he rarely talks about the details of a policy and relies mostly on simple words. To our surprise, sentiment toward him is mostly neutral across U.S. states.

4. Using the LDA model on COVID-19 tweets, we can monitor the public’s response to political figures and extract the topics and key terms the public is discussing. The model and process we created can help government offices gauge public opinion quickly when mass polling is not viable.