NLP Analysis of COVID-19-Related Tweets
With the unprecedented impact of the COVID-19 pandemic, it has become crucial to assess both the effectiveness of government responses and public sentiment. Twitter, with its vast volume of pandemic-related discourse, provides a valuable dataset for analysis. This project uses NLP to analyze COVID-19-related tweets and to propose possible actions for governments.
Exploratory Data Analysis

We scraped over 20 million tweets related to COVID-19 from March to June 2021 using Python. As a first step, we generated N-gram word clouds to get a general overview of the public response to COVID-19. After President Trump controversially referred to COVID-19 as “the Chinese Virus”, discussion about China increased. Additionally, the term ‘Black people’ emerged as a prominent topic in the tweets.
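The N-gram counts that feed a word cloud can be sketched roughly as follows. This is a minimal illustration and not the project's actual pipeline: the tokenizer, the tiny stop-word list, and the sample tweets are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real pipeline would use a fuller one).
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "is", "in", "for", "on"}

def tokenize(text):
    """Lowercase the text, drop URLs and @mentions, and keep word tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return [t for t in re.findall(r"[a-z0-9']+", text) if t not in STOP_WORDS]

def ngram_counts(tweets, n=2):
    """Count n-grams over all tweets; an n-gram never spans two tweets."""
    counts = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts

# Made-up tweets, used only to exercise the functions.
sample_tweets = [
    "Stay at home and wear a mask https://example.com",
    "Please wear a mask in public @someone",
    "Stay at home order extended",
]
bigrams = ngram_counts(sample_tweets, n=2)
print(bigrams.most_common(3))
```

A frequency dictionary of this shape is what word cloud libraries typically accept; for instance, the `wordcloud` package offers `WordCloud.generate_from_frequencies`, although the source does not say which tool was used.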

Surprisingly, the states with the most positive cases were not the states with the highest tweet frequency; we found no relationship between the two. We therefore turned to the sentiment of the tweets in each state: we ran sentiment analysis and generated a sentiment score for each state.
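One way to check whether case counts and tweet frequency move together is a rank correlation across states. The sketch below uses Spearman's coefficient, which is an assumption on our part since the write-up does not name the method used; the per-state numbers are invented placeholders, not real data.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-state figures, invented purely for illustration.
positive_cases = {"state_a": 390000, "state_b": 210000, "state_c": 95000,
                  "state_d": 12000, "state_e": 3500}
tweet_counts = {"state_a": 41000, "state_b": 350000, "state_c": 18000,
                "state_d": 220000, "state_e": 96000}

states = sorted(positive_cases)
rho = spearman([positive_cases[s] for s in states],
               [tweet_counts[s] for s in states])
print(f"Spearman rho = {rho:.2f}")
```

A coefficient near zero would be consistent with the finding above that high-case states are not the high-tweet states; values near 1 or -1 would indicate a strong monotonic relationship.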

Darker colors indicate states whose tweets reflect a positive attitude towards COVID-19. Interestingly, these states are all in the northern or eastern parts of the U.S., which may suggest a correlation between tweet positivity and the political inclinations of the states.
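The per-state scores behind a map like this can be sketched as a simple average of tweet-level scores. The snippet assumes each tweet already carries a state label and a polarity score in [-1, 1]; the scoring tool itself is not named in this write-up, and the sample pairs are made up.

```python
from collections import defaultdict

def mean_sentiment_by_state(scored_tweets):
    """Average the polarity scores of tweets grouped by state.

    scored_tweets: iterable of (state, score) pairs.
    """
    totals = defaultdict(lambda: [0.0, 0])  # state -> [sum of scores, tweet count]
    for state, score in scored_tweets:
        totals[state][0] += score
        totals[state][1] += 1
    return {state: total / count for state, (total, count) in totals.items()}

# Made-up (state, polarity) pairs, used only to exercise the function.
scored = [("NY", 0.5), ("NY", 0.1), ("FL", -0.2), ("FL", 0.0), ("FL", 0.2)]
state_scores = mean_sentiment_by_state(scored)

# List states from most to least positive, as a map legend would.
for state, score in sorted(state_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{state}: {score:+.2f}")
```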
Government Actions
We believe that stay-at-home orders and public face-covering requirements will help slow both the average daily increase in positive cases and the increase in the positivity rate.

Based on the analysis above, we chose New York, California, Florida, Texas, South Carolina, and North Dakota to further explore the relationship between government policy, tweets, and COVID-19 cases using sentiment analysis and topic modeling.
Sentiment Analysis and Topic Modeling
Sentiment analysis refers to the use of natural language processing, text mining, and computational linguistics to identify and extract subjective information from source material. Overall, people showed a weakly positive attitude toward COVID-19. About half of the tweets (45.69%) are neutral; positive tweets (32.29% in total) outnumber negative ones, which make up less than one-quarter of the tweets.
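The three-way split reported above can be sketched as a thresholding step on a polarity score followed by a percentage count. The score range of [-1, 1] and the 0.05 cut-off are assumptions for illustration; the write-up does not state which scorer or thresholds were used, and the sample scores are made up.

```python
from collections import Counter

def label(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to positive, neutral, or negative."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

def sentiment_shares(scores, threshold=0.05):
    """Return the percentage of tweets falling in each sentiment class."""
    if not scores:
        raise ValueError("at least one score is required")
    counts = Counter(label(s, threshold) for s in scores)
    return {name: 100 * counts[name] / len(scores)
            for name in ("positive", "neutral", "negative")}

# Made-up polarity scores, used only to exercise the functions.
scores = [0.6, 0.0, -0.4, 0.02, 0.3, 0.0, -0.01, 0.8]
shares = sentiment_shares(scores)
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```

The same function can be applied to a subset of tweets, for example those from one state or those mentioning one policy, to compare shares across groups.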


Among the four states compared, New York has the highest percentage of positive tweets (32.95%), while Florida has the lowest (31.51%). The percentage of positive tweets rises sharply in every state for tweets about the two policies; in general, people respond more positively to the government policies. In New York, up to 40.17% of tweets show a positive attitude towards the stay-at-home order. Comparing the two policies, however, we see a generally less positive response to wearing masks than to the stay-at-home order. This is in line with what we observe in daily life, and it reflects the fact that Americans are less accustomed to wearing masks, or less willing to wear them.
We focus the topic analysis on three leading government figures who are active in COVID-19-related news: Speaker Nancy Pelosi, President Trump, and Dr. Fauci, the director of the National Institute of Allergy and Infectious Diseases (NIAID). The model used for topic modeling is Latent Dirichlet Allocation (LDA), and we select the “best number of topics” parameter by reviewing the Intertopic Distance Map and the most relevant terms chart so that the topics are clearly distinct from one another.

