Python – Twitter Data

This was a course project from December 2022.

Purpose

The healthcare system in the United States has historically been regarded (at least by Americans) as best-in-class. However, in the wake of the COVID-19 pandemic, healthcare systems across the globe have come under more scrutiny than in the past, and none more so than that of the US. With Dr. Anthony Fauci, the face of US healthcare, stepping down as chief medical advisor in December 2022, it is an important time to gauge public discourse on US healthcare using a sample of recent tweets.

Methods

Data was collected using the “search recent tweets” function in the Tweepy library. On 12 December 2022, 500 tweets using the hashtag “healthcare” or “UShealthcare” were collected for analysis. Data analysis and visualization were performed in Python using the pandas, matplotlib, seaborn, and textblob libraries.
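As a rough illustration of the collection step, the sketch below builds a search query for the two hashtags and pages through results with Tweepy's Client.search_recent_tweets. The helper names (build_query, fetch_tweets), the optional retweet and language filters, and the requested tweet fields are assumptions for illustration, not the code used in the project. A valid bearer token with API access is required to actually run the fetch.

```python
def build_query(hashtags, exclude_retweets=False, lang=None):
    """Build a Twitter API v2 search query matching any of the given hashtags."""
    tag_clause = " OR ".join(f"#{tag}" for tag in hashtags)
    parts = [f"({tag_clause})"]
    if exclude_retweets:
        parts.append("-is:retweet")  # optional filter; an assumption, not from the write-up
    if lang:
        parts.append(f"lang:{lang}")  # optional filter; an assumption, not from the write-up
    return " ".join(parts)


def fetch_tweets(bearer_token, query, limit=500):
    """Collect up to `limit` recent tweets matching `query` (hypothetical helper)."""
    import tweepy  # imported here so the query builder works without Tweepy installed

    client = tweepy.Client(bearer_token=bearer_token)
    paginator = tweepy.Paginator(
        client.search_recent_tweets,
        query=query,
        tweet_fields=["created_at", "entities"],
        max_results=100,  # the API returns at most 100 tweets per page
    )
    return list(paginator.flatten(limit=limit))


query = build_query(["healthcare", "UShealthcare"])
print(query)
```

Calling fetch_tweets(token, query) would then return a list of tweet objects whose text attribute holds the tweet body.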

Hashtags Associated with Healthcare

Associated hashtags show a tech focus: healthtech, healthit, ai, data, cybersecurity, machinelearning, iot, artificialintelligence, ransomware, and blockchain were all common hashtags associated with healthcare on Twitter. Surprisingly, fewer tweets carried current-events hashtags such as news, breakingnews, covid, or fauci. The third major group of associated hashtags concerned staying healthy outside of healthcare systems: healthylifestyle, supplement/supplements, and selfcare were all common in the dataset. These aspects of health bypass healthcare systems altogether, on the premise that the healthiest people are those outside of hospitals.
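One way to tally associated hashtags is to extract every hashtag from each tweet, drop the search terms themselves, and count the rest. The sketch below uses only the standard library; the function name count_cohashtags and the three sample tweets are made up for illustration and are not taken from the dataset.

```python
import re
from collections import Counter

HASHTAG_PATTERN = re.compile(r"#(\w+)")
SEARCH_TAGS = {"healthcare", "ushealthcare"}  # the hashtags used to collect the tweets


def count_cohashtags(texts, exclude=SEARCH_TAGS):
    """Count hashtags that appear alongside the search hashtags, once per tweet."""
    counts = Counter()
    for text in texts:
        # Lowercase and de-duplicate so "#AI" and "#ai" in one tweet count once.
        tags = {tag.lower() for tag in HASHTAG_PATTERN.findall(text)}
        counts.update(tags - exclude)
    return counts


# Invented sample tweets, for illustration only.
sample = [
    "New #AI tools for #healthcare #healthtech",
    "#Healthcare #cybersecurity and #AI",
    "Try this #supplement #selfcare #healthcare",
]
counts = count_cohashtags(sample)
print(counts.most_common(1))
```

Counting each hashtag once per tweet keeps a single tweet that repeats a tag from inflating the totals.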

Tweet Sentiment by Polarity & Subjectivity

Polarity and subjectivity metrics were calculated from the text in each tweet. In textblob, polarity ranges from -1 (negative) to 1 (positive) and subjectivity from 0 (objective) to 1 (subjective). Tweets with low polarity and high subjectivity would be expected to have more negative sentiment, while tweets with high polarity and high subjectivity would be expected to have more positive sentiment. From the plot, there does not appear to be any strong trend between polarity/subjectivity and overall sentiment.
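A minimal sketch of how these two metrics can be computed with textblob's default analyzer follows. The helper name score_tweet is hypothetical, and the original analysis may have cleaned the tweet text (for example, stripping links or mentions) before scoring.

```python
def score_tweet(text):
    """Return (polarity, subjectivity) for one tweet using textblob's default analyzer."""
    from textblob import TextBlob  # imported here so the module loads without textblob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


if __name__ == "__main__":
    # Invented example text, for illustration only.
    polarity, subjectivity = score_tweet("Great progress in #healthcare technology")
    print(polarity, subjectivity)
```

Applying score_tweet to every tweet yields the two columns that can then be plotted against each other.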

The most common single value in the dataset was 0 subjectivity and 0 polarity, which suggests that many tweets consisted mostly of hyperlinks or non-text media (a tweet that’s only a hyperlink would get values of 0 for these metrics). Positive and negative sentiments varied widely across the range of both metrics, although tweets with high subjectivity and high polarity more often corresponded to positive sentiment. Overall, the dataset skewed towards positive polarity and middling subjectivity.
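The observations above can be checked numerically as well as visually. The sketch below summarizes a list of (polarity, subjectivity) pairs by the share sitting exactly at (0, 0) and the mean of each metric; the function name summarize_scores and the sample scores are hypothetical and are not results from the project.

```python
from statistics import mean


def summarize_scores(scores):
    """Summarize (polarity, subjectivity) pairs; returns None for an empty list."""
    if not scores:
        return None
    at_origin = sum(1 for polarity, subjectivity in scores
                    if polarity == 0 and subjectivity == 0)
    return {
        "share_at_origin": at_origin / len(scores),
        "mean_polarity": mean(p for p, _ in scores),
        "mean_subjectivity": mean(s for _, s in scores),
    }


# Invented scores, for illustration only.
example_scores = [(0.0, 0.0), (0.0, 0.0), (0.5, 0.5), (0.5, 1.0)]
summary = summarize_scores(example_scores)
print(summary)
```

A large share_at_origin alongside a positive mean_polarity would be consistent with the pattern described in this section.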

