Patterns in U.S. Political Discussions

Natural Language Processing

Executive summary

Sentiment analysis plays a critical role in understanding Reddit’s political discussion climate, revealing the emotional tone of conversations about Presidential candidates Donald Trump and Joe Biden. Across subreddits, most submissions exhibited negative sentiment, but the percentage of positive sentiment fluctuated over time and varied by subreddit. Submissions about Trump showed higher positive sentiment in subreddits aligned with his political party, and the reverse was true for Biden. By comparing sentiment trends with Presidential Job Approval Rates, we gained a nuanced understanding of these dynamics, noting that approval rates often fell between the positive sentiment percentages of politically opposing subreddits.

We then analyzed text data to uncover patterns in language reflecting key themes in political discourse. Using CountVectorizer and Latent Dirichlet Allocation (LDA), we identified trends in word usage that formed distinct topics across subreddits. Cross-referencing these dominant terms with Google Trends confirmed their alignment with broader public interest and search behaviors. Further, calculating Term Frequency-Inverse Document Frequency (TF-IDF) scores highlighted the importance of specific words within individual subreddits and their influence on submission sentiment.

Finally, we synthesized these findings to assess the distinctiveness of each subreddit. By examining correlations between sentiment, scores, and post popularity, we found that r/Libertarian was the most distinct, while r/politics appeared the most moderate among the six subreddits analyzed.

Analysis report

To analyze Reddit sentiment, we concatenated submission titles and bodies, removed special characters and numbers, and applied PySpark NLP models.
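A minimal sketch of the cleaning step described above, written in plain Python rather than PySpark so it runs on its own. The function name `clean_submission` and the exact regular expressions are assumptions for illustration; the source does not show the rules it applied.

```python
import re

def clean_submission(title, body):
    """Join title and body, then strip digits and special characters."""
    text = f"{title or ''} {body or ''}"        # tolerate a missing title or body
    text = re.sub(r"[^A-Za-z\s]", " ", text)    # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace

cleaned = clean_submission("Biden's 2024 plan: $7B for the border!", None)
print(cleaned)  # Biden s plan B for the border
```

Replacing removed characters with a space, instead of deleting them, keeps neighboring words from fusing together; the cost is stray one-letter fragments such as the "s" left by an apostrophe, which a later stopword filter can drop.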

Comparison Between Reddit Sentiment and Presidential Job Approval Rates

Our analysis began by looking at the sentiment distribution across subreddits. As illustrated in Figure 1, nearly 60% of Biden-related submissions had negative sentiment across all subreddits. Trump-related submissions displayed similar trends, except in r/Libertarian, where positive sentiments outweighed negatives, with fewer than 40% negative sentiments.

Figure 1: Sentiment Distribution by two candidates

Then we examined how sentiment trends evolved over time. As shown in Figure 2, the percentage of positive sentiments fluctuates significantly across subreddits. For example, r/politics consistently exhibits low positive sentiment, while r/Libertarian shows the most variability. These differences may relate to subreddit membership sizes, with r/politics having a much larger member base than r/Libertarian.

Figure 2: Positive Sentiments by subreddits over time
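The quantity plotted over time can be sketched as a grouped percentage. The rows below are hypothetical, and the monthly granularity is an assumption; the source does not state the time bucket it used.

```python
from collections import defaultdict

# Hypothetical labeled rows: (subreddit, "YYYY-MM", sentiment label)
rows = [
    ("politics", "2024-01", "negative"),
    ("politics", "2024-01", "positive"),
    ("politics", "2024-01", "negative"),
    ("politics", "2024-01", "neutral"),
    ("Libertarian", "2024-01", "positive"),
    ("Libertarian", "2024-02", "negative"),
]

def positive_share(rows):
    """Percentage of positive submissions for each (subreddit, month)."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for subreddit, month, sentiment in rows:
        totals[(subreddit, month)] += 1
        if sentiment == "positive":
            positives[(subreddit, month)] += 1
    return {key: 100.0 * positives[key] / totals[key] for key in totals}

shares = positive_share(rows)
print(shares[("politics", "2024-01")])  # 25.0
```

With only one submission per month, the toy r/Libertarian rows swing from 100% to 0%. That illustrates one way a small subreddit could show more variability than a large one, though the source offers membership size only as a possible explanation.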

We then filtered submissions to focus on posts about each presidential candidate during specific periods. In Trump-related posts, r/Republican and r/Conservative generally showed higher positive sentiment compared to subreddits aligned with opposing views. Conversely, Biden-related posts exhibited higher positive sentiment in r/Democrats and r/Liberal. Notably, r/politics maintained lower positive sentiment for both candidates, while r/Libertarian showed pronounced peaks favoring Trump over Biden, as seen in Figure 3 and Figure 4.

Figure 3: Positive Sentiments on Trump related posts
Figure 4: Positive Sentiments on Biden related posts

The code used for this section is available here.

Lexical Trends Analysis of Key Terms in Trump and Biden Related Reddit Submissions

In addition to sentiment analysis, we employed three NLP/ML models—CountVectorizer, LDA, and TF-IDF—to explore lexical patterns in Reddit submissions. We processed the text by tokenizing, normalizing, and removing stopwords, ensuring clean data for word frequency analysis and topic modeling.
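The preprocessing described above might look like the following in plain Python. The stopword list here is a tiny illustrative stand-in, not the list the authors used, and lowercasing is assumed to be what "normalizing" refers to.

```python
import re

# Tiny illustrative stopword list; a real pipeline would use a much fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "is", "for"}

def tokenize(text):
    """Lowercase, split on non-letters, and drop stopwords and 1-letter tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]

tokens = tokenize("The FBI report on Hunter and the border")
print(tokens)  # ['fbi', 'report', 'hunter', 'border']
```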

CountVectorizer

As seen in Figure 5, generic terms like case and court initially dominated word frequency counts across all subreddits. After filtering out these terms, subreddit-specific word trends emerged.

Figure 5: Most common words in all 6 subreddits

For instance, in Table 1, terms such as hunter and border were prominent in r/Conservative and r/Republican. In contrast, harris was most frequent in r/Democrats, reflecting Kamala Harris’s role as Vice President. Meanwhile, israel and war stood out in r/Libertarian, highlighting its unique discourse. Common terms like news and administration still appeared across most subreddits.

   r/Conservative          r/Republican     r/democrats             r/Liberal         r/Libertarian          r/politics
0  hunter (6.08%)          hunter (8.07%)   harris (3.87%)          news (4.73%)      convention (6.6%)      harris (2.88%)
1  border (2.67%)          border (3.9%)    voters (3.71%)          voters (4.64%)    israel (6.24%)         poll (2.6%)
2  white (2.31%)           report (2.51%)   administration (3.34%)  right (4.17%)     war (4.81%)            cnn (2.58%)
3  poll (2.14%)            fbi (2.13%)      news (3.03%)            win (4.08%)       right (4.28%)          documents (2.47%)
4  administration (2.05%)  white (2.07%)    state (2.92%)           democracy (3.8%)  administration (4.1%)  georgia (2.47%)
5  media (2.02%)           news (2.06%)     million (2.44%)         voting (3.53%)    want (3.92%)           money (2.41%)
6  news (2.0%)             doj (2.0%)       white (2.14%)           support (3.53%)   hunter (3.92%)         voters (2.41%)
7  voters (1.9%)           illegal (1.94%)  federal (2.11%)         wins (3.43%)      support (3.74%)        presidential (2.41%)
8  state (1.89%)           poll (1.82%)     states (2.07%)          want (3.43%)      crimes (3.57%)         news (2.4%)
9  report (1.85%)          rally (1.73%)    support (2.04%)         legal (3.34%)     billion (3.21%)        classified (2.29%)

Table 1: Top 10 Word Counts in 6 subreddits
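A ranking like Table 1 can be sketched with a simple counter. This is a plain-Python stand-in for PySpark's CountVectorizer, not the authors' code. The `GENERIC` set contains only the two generic terms named in the text, and the percentage here is each term's share of the remaining tokens, which is an assumption because the source does not state the denominator behind its percentages.

```python
from collections import Counter

GENERIC = {"case", "court"}  # generic terms named in the text; the full filter list is not given

def top_terms(token_lists, k=3, exclude=GENERIC):
    """Return the k most frequent terms with their percentage share of kept tokens."""
    counts = Counter(t for tokens in token_lists for t in tokens if t not in exclude)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(term, round(100.0 * n / total, 2)) for term, n in counts.most_common(k)]

# Hypothetical tokenized submissions from one subreddit
docs = [
    ["hunter", "court", "border"],
    ["hunter", "border", "poll"],
    ["hunter", "case", "news"],
]
result = top_terms(docs, k=2)
print(result)  # [('hunter', 42.86), ('border', 28.57)]
```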

Latent Dirichlet Allocation (LDA)

In this part, we used LDA for our topic modeling to explore the underlying themes in the data and understand the distribution of terms across topics. We uncovered themes aligned with CountVectorizer findings. Key terms like hunter, border, and harris were grouped into distinct topics. The term israel, although most prevalent in r/Libertarian, also appeared in a topic alongside war and ukraine, suggesting a thematic focus on international issues.

   Topic 1     Topic 2  Topic 3   Topic 4         Topic 5
0  judge       coming   israel    border          court
1  case        lies     rally     administration  voters
2  hunter      warns    money     texas           democrats
3  trial       house    trial     wall            campaign
4  documents   news     war       crisis          house
5  classified  epstein  campaign  student         debate
6  special     heard    ukraine   gaza            poll
7  court       country  guilty    policy          supreme
8  order       white    hush      plan            harris
9  fbi         hunter   term      netanyahu       presidential

Table 2: Top 10 Words in 5 Topics
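To make the idea behind topic modeling concrete, here is a small collapsed Gibbs sampler for LDA in plain Python. It is a teaching sketch, not the implementation used for Table 2: the report's LDA was presumably run through a library, and the hyperparameters `alpha` and `beta`, the iteration count, and the toy corpus are all assumptions.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA; returns one word-count dict per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]                       # tokens of doc d in topic k
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]  # count of word w in topic k
    topic_total = [0] * n_topics                                     # all tokens in topic k
    assignments = []

    def update(d, w, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        assignments.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, assignments[d]):
            update(d, w, k, +1)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assignments[d][i], -1)  # remove the token's current assignment
                # Weight of topic k: how much doc d likes k times how much k likes word w.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + beta * len(vocab))
                    for k in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                update(d, w, k, +1)
    return topic_word

def top_words(topic_word, n=3):
    """The n highest-count words in each topic."""
    return [sorted(counts, key=counts.get, reverse=True)[:n] for counts in topic_word]

# Toy corpus built from words that appear in Table 2
docs = [
    ["judge", "case", "trial", "documents"],
    ["israel", "war", "ukraine", "gaza"],
    ["case", "court", "judge", "trial"],
    ["war", "israel", "netanyahu", "gaza"],
] * 5
topics = lda_gibbs(docs, n_topics=2)
print(top_words(topics))
```

Because words that co-occur in the same documents tend to be pulled into the same topic, a sampler like this will usually separate the legal vocabulary from the international one on such a corpus, which is the same mechanism that groups israel with war and ukraine in Topic 3.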

Google Trends

Having found that a set of dominant words shaped the political discussion, we compared them with related topics and queries on Google Trends for Trump and Biden. Of those terms, hunter and israel appeared as related queries for Biden; no other overlap was observed.

   Trump            Biden
0  debate (22)      hunter (25)
1  united (17)      united (18)
2  court (17)       israel (13)
3  media (15)       debate (13)
4  shot (14)        impeachment (12)
5  trial (14)       speech (10)
6  indictment (12)  jill (10)
7  speaker (12)     house (9)
8  mug (11)         conference (9)
9  house (11)       union (9)

Table 3: Google Trends related-search terms on Trump and Biden
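The overlap check can be reproduced with set intersections. The Google Trends terms below are copied from Table 3; the choice of `reddit_terms` (the subreddit-specific words highlighted in the CountVectorizer and LDA discussion) is an assumption about which words the comparison used.

```python
# Dominant Reddit terms highlighted in the text (this selection is an assumption)
reddit_terms = {"hunter", "border", "harris", "israel", "war"}

# Related-search terms copied from Table 3
trends = {
    "Trump": {"debate", "united", "court", "media", "shot", "trial",
              "indictment", "speaker", "mug", "house"},
    "Biden": {"hunter", "united", "israel", "debate", "impeachment",
              "speech", "jill", "house", "conference", "union"},
}

overlap = {name: sorted(reddit_terms & terms) for name, terms in trends.items()}
print(overlap)  # {'Trump': [], 'Biden': ['hunter', 'israel']}
```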

The code used for this section is available here.

Impact of Dominant Terms on Shaping Political Discussions on Reddit

Having identified the terms that dominate political discussion in these subreddits, we next focused on how important they are in shaping it. We used TF-IDF to calculate an importance score for each term.
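As a rough illustration of the score, the sketch below multiplies a term's raw count in a submission by a smoothed inverse document frequency, log((N + 1) / (df + 1)). That is the form Spark MLlib documents for its IDF, but the source does not say which TF-IDF variant was used, so treat the formula as an assumption. The corpus is hypothetical.

```python
import math

def smoothed_idf(n_docs, doc_freq):
    """IDF with add-one smoothing: log((N + 1) / (df + 1))."""
    return math.log((n_docs + 1) / (doc_freq + 1))

def tf_idf(term, doc_tokens, corpus):
    """Raw term count in one document times the corpus-level smoothed IDF."""
    df = sum(1 for tokens in corpus if term in tokens)
    return doc_tokens.count(term) * smoothed_idf(len(corpus), df)

# Hypothetical tokenized submissions
corpus = [
    ["border", "crisis", "texas"],
    ["israel", "war", "israel"],
    ["poll", "voters"],
    ["debate", "poll"],
]
once = tf_idf("border", corpus[0], corpus)   # term appears once in its submission
twice = tf_idf("israel", corpus[1], corpus)  # term appears twice in its submission
print(round(once, 4), round(twice, 4))
```

Under this formula a term used twice in a submission scores exactly double a term used once when both have the same document frequency. In Table 5, the 8.138845 for israel in neutral r/politics submissions is twice 4.069423 up to rounding, which would be consistent with that behavior, although the source does not confirm the cause.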

For the term administration (Figure 6), scores were highly skewed across subreddits. Most subreddits had an average score around 4, but in r/Liberal, scores were more evenly distributed among positive-sentiment submissions. Conversely, in r/Libertarian, negative-sentiment submissions showed a smoother score distribution, possibly because the subreddit had fewer positive submissions.

Figure 6: TF-IDF Score on term administration

Next, we examined the terms border and israel. As Table 4 and Table 5 show, each term's score stands out in one specific subreddit and sentiment category. The term border had its highest average importance score (6.09) in r/Democrats for negative-sentiment submissions. Similarly, the term israel had its highest average score (8.14) in r/politics for neutral-sentiment submissions.

TF_IDF Score on ‘Border’, by sentiment
   subreddit     negative  neutral   positive
0  Conservative  4.400743  4.282261  4.429995
1  Liberal       4.096076  0         4.096076
2  Libertarian   4.096076  0         0
3  Republican    4.804039  4.973807  4.232611
4  democrats     6.085598  4.096076  4.096076
5  politics      4.242364  4.247782  4.159093

Table 4: TF_IDF Score on term ‘border’ by Sentiments and Subreddits

TF_IDF Score on “Israel”, by sentiment
   subreddit     negative  neutral   positive
0  Conservative  4.336109  0         4.173768
1  Liberal       5.266312  0         4.069423
2  Libertarian   5.634586  0         0
3  Republican    4.461657  0         4.069423
4  democrats     4.853890  4.069423  4.069423
5  politics      4.385980  8.138845  4.069424

Table 5: TF_IDF Score on term ‘israel’ by Sentiments and Subreddits
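Tables 4 and 5 have the shape of a pivot: the mean score per subreddit and sentiment, with 0 where a cell is empty. A minimal sketch of that aggregation follows. The rows are hypothetical, and reading a 0 as "no matching submissions" rather than a true score of zero is an assumption about how the tables were built.

```python
from collections import defaultdict

SENTIMENTS = ("negative", "neutral", "positive")

def pivot_mean(rows):
    """Mean score per subreddit and sentiment; 0 marks an empty cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for subreddit, sentiment, score in rows:
        sums[(subreddit, sentiment)] += score
        counts[(subreddit, sentiment)] += 1
    subreddits = sorted({r[0] for r in rows})
    return {
        s: [sums[(s, m)] / counts[(s, m)] if counts[(s, m)] else 0 for m in SENTIMENTS]
        for s in subreddits
    }

# Hypothetical per-submission TF-IDF scores for one term
rows = [
    ("Libertarian", "negative", 4.0),
    ("politics", "negative", 4.0),
    ("politics", "negative", 5.0),
    ("politics", "neutral", 8.0),
]
table = pivot_mean(rows)
print(table)  # {'Libertarian': [4.0, 0, 0], 'politics': [4.5, 8.0, 0]}
```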

The code used for this section is available here.

Identifying the Most Extreme Subreddits in Political Discourse

Given the generally negative sentiment, the lexical trends, and the importance scores for selected terms, we next examined how extreme the discussions in these subreddits are. As Figure 1, Figure 3, and Figure 4 show, r/Libertarian has the most volatile sentiment over time, with higher positive sentiment on Trump-related submissions than on Biden-related ones. This may have shaped its discussion topics, as israel and war appear among its top 10 most frequent words in Table 1. In addition, the subreddit shows only negative sentiment for the terms border and israel (Table 4 and Table 5), suggesting that its political discussion was highly contentious.

Next, we considered how sentiment relates to a submission's number of comments and popularity score. Neutral-sentiment submissions about Biden in r/Democrats garnered the highest comment counts. Conversely, Trump-related posts with negative sentiment attracted more comments in r/Conservative, r/Liberal, and r/Libertarian. Submissions in r/Republican had fewer comments overall. One thing to note is that there were no comments on neutral-sentiment submissions about Trump in r/Libertarian, which suggests that this subreddit may be more polarized than the others.

Figure 7: Average Number of Comments by sentiments and subreddits

When it comes to score, submissions in r/Democrats consistently had the highest scores across all sentiment categories. Meanwhile, r/Libertarian again showed extreme variability, with Trump-related neutral-sentiment submissions averaging a score of 1, compared to 89 for neutral Biden-related submissions.

Figure 8: Average Score by sentiments and subreddits

The code used for this section is available here.


Content 2024 by Marion Bauman, Brian Kwon, & Aaron Schwall
Created by Project Group 1 for DSAN 6000 at Georgetown University
All content licensed under a Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0)

 
