Since becoming Prime Minister in 2018, Scott Morrison has spoken publicly almost every day. Transcripts of all speeches, interviews, and press conferences are available on the Prime Minister’s website, which means we can analyse ScoMo’s language over this whole period (August 2018 - March 2021). This dataset includes 988 transcripts.
In this post I will:
- Compare frequently used words and phrases before and during the pandemic.
- Identify common topics using topic modeling and explore how these have changed over time.
- Identify words that ScoMo commonly uses together.
- Use sentiment analysis to see how ScoMo really feels about each of the states and territories.
As a complement to this post, I’ve also made a Shiny app that you can use to explore the dataset for yourself.
Approach
For this analysis, I primarily used the R package quanteda, which makes analysing text fairly straightforward. I began by creating a corpus of documents. In this case, each document was the text of a speech, interview or press conference given by Scott Morrison. For transcripts where other people spoke (such as interviews), I included only Scott Morrison’s words.
The corpus is then processed into tokens, which can be single words or strings of words that are known as n-grams. You can then construct a document-feature matrix (DFM) that tells you how many times each token is used in each document. As we’ll see, this workflow allows you to explore the texts in many different ways.
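To make the token and DFM steps concrete, here is a minimal sketch in plain Python. It is not the quanteda code used for this analysis: the function names (`tokenize`, `ngrams`, `build_dfm`) and the two toy texts are invented for illustration, and real preprocessing would also handle things like stopwords and lemmatisation.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Join every run of n consecutive tokens with underscores."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_dfm(docs, max_n=2):
    """Return (features, rows), where rows[d][j] counts features[j] in document d."""
    counts = []
    for text in docs:
        toks = tokenize(text)
        feats = []
        for n in range(1, max_n + 1):
            feats.extend(ngrams(toks, n))
        counts.append(Counter(feats))
    features = sorted(set().union(*counts))
    rows = [[c[f] for f in features] for c in counts]
    return features, rows


# Toy documents, invented for illustration (not taken from the transcripts).
docs = ["The economy is strong", "The economy needs jobs"]
features, dfm = build_dfm(docs)
```

Each row of `dfm` is one document and each column is one token or n-gram, so `dfm[0][features.index("the_economy")]` gives the count of the bigram ‘the_economy’ in the first text.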
What does ScoMo talk about (and not talk about) during the pandemic?
It can be hard to remember life before the pandemic. Using this dataset, we can take a look at how ScoMo’s language has changed by comparing transcripts from 2018 and 2019 to those from 2020 and 2021. This plot shows words that have had the biggest changes in frequency between the two periods:
Unsurprisingly, all of the words that are much more frequent since 2020 are directly or indirectly related to the pandemic. Conversely, we see much less talk about population growth, the economy, and the surplus than we did before 2020. Bill Shorten was mentioned frequently in 2018 and 2019 but very rarely since, which makes sense as he’s no longer opposition leader.
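One simple way to rank words by how much their usage differs between two periods is a smoothed log-ratio of relative frequencies. The sketch below is an assumption about how such a comparison could be done, not the statistic behind the plot above; the function name `log_ratio`, the smoothing value, and the toy token lists are all invented for illustration.

```python
import math
from collections import Counter


def log_ratio(target_tokens, reference_tokens, smoothing=0.5):
    """Score each word by log2 of its relative frequency in target vs reference.

    Positive scores mean the word is relatively more frequent in the target
    period; negative scores mean it is more frequent in the reference period.
    Additive smoothing avoids dividing by zero for words seen in one period only.
    """
    t, r = Counter(target_tokens), Counter(reference_tokens)
    vocab = set(t) | set(r)
    t_total = sum(t.values()) + smoothing * len(vocab)
    r_total = sum(r.values()) + smoothing * len(vocab)
    return {
        w: math.log2(((t[w] + smoothing) / t_total) / ((r[w] + smoothing) / r_total))
        for w in vocab
    }


# Toy token lists, invented for illustration.
before = "surplus economy surplus population economy shorten".split()
during = "pandemic vaccine economy restriction pandemic vaccine".split()

scores = log_ratio(during, before)
ranked = sorted(scores, key=scores.get, reverse=True)
```

The head of `ranked` holds the words that gained the most in the later period and the tail holds the words that dropped away.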
We can use the same strategy to look at which words were used more or less frequently during different phases of the pandemic. Here we’re comparing transcripts from February 1 - August 30 2020 to those from September 1 2020 onwards:
Again, there are many terms related to the pandemic. There has been much less talk about the CovidSafe app and pandemic-related restrictions since September 2020 and a lot more talk about the vaccine and the recession.
We also see that the phrase ‘rule of law’ has been used much more frequently since September 2020. This phrase has been frequently invoked by Scott Morrison since March 2021 in rejecting calls for an independent investigation into allegations against cabinet minister Christian Porter.
What topics does ScoMo talk about?
I also used topic modeling to explore the different topics ScoMo talks about and how these have changed over time. The specific method I used was latent Dirichlet allocation, which is easily implemented in R using the seededlda package. After some experimentation, I chose to have 18 topics.
Let’s have a look at the top terms associated with each topic to better understand what each topic is:
topic1 | topic2 | topic3 | topic4 | topic5 | topic6 |
---|---|---|---|---|---|
jobkeeper | travel | population | mental | emission | mate |
recession | coronavirus | zealand | mental_health | reduction | coast |
jobseeker | officer | migration | veteran | emission_reduction | stuff |
covid_recession | flight | visa | tasmania | climate | oh |
bring_forward | bank | faith | suicide | gas | neil |
design | qantas | religious | cancer | electricity | guy |
apprentice | dr | terrorist | violence | renewable | game |
employee | medical_officer | immigration | portfolio | reliable | ben |
manufacture | committee | attack | greg | reduction_target | gold |
employer | spread | citizen | young_people | emission_reduction_target | tourism |
young_people | package | threat | affair | beat | sport |
unemployment | chief_medical_officer | permanent | senator | kyoto | weekend |

topic7 | topic8 | topic9 | topic10 | topic11 | topic12 |
---|---|---|---|---|---|
man | space | fire | shorten | farmer | vaccine |
war | excite | disaster | bill_shorten | farm | victorian |
honour | western_sydney | emergency | vote | water | quarantine |
mr | airport | assistance | border_protection | rural | age_care |
john | manufacture | season | boat | town | outbreak |
israel | recycle | bushfire | nauru | michael | professor |
speaker | waste | volunteer | electricity | rural_and_regional | vaccination |
proud | steven | adf | liberal_party | resilience | trace |
memorial | science | firefighter | legislation | council | national_cabinet |
mr_speaker | defence_industry | commissioner | alan | grant | officer |
tim | plastic | flood | malcolm | assistance | victorian_government |
tonight | mine | request | electricity_price | household | paul |

topic13 | topic14 | topic15 | topic16 | topic17 | topic18 |
---|---|---|---|---|---|
royal | strategic | national_cabinet | strong_economy | digital | indigenous |
royal_commission | indo | restriction | bill_shorten | reform | gap |
age_care | indo_pacific | distance | family_business | agendum | indigenous_australian |
recommendation | japan | expert | choice | public_service | aboriginal |
abuse | island | app | shorten | cyber | strait |
police | engagement | social_distance | surplus | strategy | torres |
commissioner | papua | chief_minister | central | productivity | torres_strait |
office | sovereign | save | small_and_family | bank | islander |
leigh | fiji | principle | hospital | development | close_the_gap |
disability | indonesia | jobseeker | high_tax | strength | strait_islander |
conduct | stability | coronavirus | wage | especially | torres_strait_islander |
attorney | prosperity | livelihood | coast | goal | aboriginal_and_torres |
Impressively, LDA has managed to identify coherent topics just from the words used in each text. For example, topic 5 is clearly about climate change, with top terms like ‘climate’, ‘renewable’, and ‘emission’. Topic 1 is about the recession and the economy (‘jobkeeper’, ‘jobseeker’, and ‘recession’) and topic 9 is about bushfires.
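For readers curious about how LDA can recover topics from word counts alone, here is a compact sketch of one common way to fit it, collapsed Gibbs sampling, in plain Python. This is an illustration only and not the seededlda implementation used for the results above; the function names, the hyperparameter values (`alpha`, `beta`, `iters`), and the toy documents are all assumptions.

```python
import random


def lda_gibbs(docs, k, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    doc_topic = [[0] * k for _ in docs]                        # tokens per topic in each doc
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(k)]   # word counts per topic
    topic_total = [0] * k                                      # total tokens per topic

    # Start from a random topic assignment for every token.
    assign = [[rng.randrange(k) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, z in zip(doc, assign[d]):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample: topics popular in this doc and fond of this word win more often.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + len(vocab) * beta)
                    for t in range(k)
                ]
                z = rng.choices(range(k), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def top_terms(topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    return [sorted(tw, key=tw.get, reverse=True)[:n] for tw in topic_word]


# Toy documents, invented for illustration.
docs = [
    "emission climate gas emission renewable".split(),
    "fire bushfire emergency fire volunteer".split(),
    "climate renewable emission gas".split(),
    "bushfire volunteer fire emergency".split(),
]
doc_topic, topic_word = lda_gibbs(docs, k=2)
terms = top_terms(topic_word)
```

The sampler never sees any labels: words end up in the same topic simply because they keep appearing in the same documents, which is why the top terms in the tables above hang together so well.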
This interactive plot shows the number of communications on each topic over time. If you hover over a line, the tooltip will show you the top ten terms associated with that topic.
While a lot of topics appear at a low frequency over the whole time period, there are some that show a more distinctive temporal pattern. For example, the bushfire topic occurs primarily in December 2019 and January 2020.
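One plausible way to turn per-document topic weights into counts over time is to label each transcript with its highest-weighted topic and tally by month. The sketch below assumes that approach, which may differ from how the plot was actually built; the function names and the example records are invented for illustration.

```python
from collections import Counter


def dominant_topic(weights):
    """Return the index of the largest topic weight."""
    return max(range(len(weights)), key=weights.__getitem__)


def topic_counts_by_month(records):
    """Count documents per (month, dominant topic).

    records is an iterable of (iso_date, topic_weights) pairs, where iso_date
    looks like 'YYYY-MM-DD' and topic_weights is a list with one weight per topic.
    """
    counts = Counter()
    for date, weights in records:
        month = date[:7]  # keep 'YYYY-MM'
        counts[(month, dominant_topic(weights))] += 1
    return counts


# Made-up records with two topics, for illustration only.
records = [
    ("2019-12-20", [0.1, 0.9]),
    ("2019-12-28", [0.2, 0.8]),
    ("2020-01-05", [0.7, 0.3]),
]
counts = topic_counts_by_month(records)
```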
We also see a couple of topics that peak prior to the election in May 2019:
Topic 10 and topic 16 are clearly related to policy issues and criticism of the opposition. Both include ‘Bill Shorten’, but topic 10 includes ‘border protection’, ‘boat’ and ‘nauru’, whereas topic 16 includes terms like ‘strong economy’ and ‘high tax’.
Topic 6 also peaks before the election, and is probably best characterised as Scott Morrison trying to be likeable. This topic includes terms like ‘mate’, ‘game’, ‘weekend’, and ‘sport’. Indeed, we see a remarkable drop-off in ScoMo’s use of the word ‘mate’ after his first few months as Prime Minister:
Interestingly, there are three topics related to the pandemic that have distinct profiles:
Topic 2 peaks in March 2020, which is around the time that the WHO officially declared the pandemic. This topic has several terms relating to border and travel restrictions, such as ‘travel’, ‘qantas’, and ‘flight’. Topic 15 peaks slightly later and includes terms more related to domestic restrictions and guidelines, such as ‘national cabinet’, ‘social distance’, and ‘app’ (this refers to the CovidSafe app). Tellingly, mentions of ‘app’ drop off quite quickly.
Several months later we see a third pandemic-related topic appear. Topic 12 peaks in August 2020, which corresponds to the second wave that occurred in Victoria (hence the terms ‘victorian’ and ‘victorian government’). This topic also includes the terms ‘vaccine’ and ‘vaccination’, so this is still an active topic. Heartbreakingly, this final pandemic-related topic is the first that prominently includes the term ‘aged care’.
The three pandemic-related topics neatly capture different phases of the pandemic in Australia: the first phase where the seriousness of the situation was just becoming clear and the main focus was on closing the borders, the second phase where we begin to see more discussion of internal restrictions and best practices as well as hints of the economic effects of the pandemic (‘jobseeker’, ‘livelihood’), and the third phase corresponding to the Victorian second wave and then the vaccine rollout.
What does ScoMo really think about boats?
We can use this dataset to look at which words are most often used along with a particular word or phrase we’re interested in. For example, here are words that ScoMo often uses within ten words of ‘boat’:
If you’ve been paying any attention to Australian politics for the past decade or so, you won’t be shocked to see that words like ‘stop’, ‘nauru’, and ‘illegal’ frequently come up when the Prime Minister mentions boats.
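The idea of counting words within a fixed window of a target word can be sketched in a few lines of plain Python. This is an illustration rather than the code behind the plot above; the function name `window_cooccurrence` and the toy sentence are invented, and a real analysis would usually also remove stopwords and compare against each word's overall frequency.

```python
from collections import Counter


def window_cooccurrence(tokens, target, window=10):
    """Count tokens appearing within `window` positions either side of each `target`.

    If two occurrences of the target sit close together, their windows overlap
    and the shared neighbours are counted once per occurrence.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            start = max(0, i - window)
            counts.update(tokens[start:i])                   # words before the target
            counts.update(tokens[i + 1:i + 1 + window])      # words after the target
    return counts


# Toy sentence, invented for illustration; a small window keeps the example readable.
tokens = "we will stop the boat and we stopped the boat".split()
counts = window_cooccurrence(tokens, "boat", window=2)
```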
Which state or territory is ScoMo’s favourite?
Speaking as a Victorian, it can often seem that the Prime Minister plays favourites with certain states (cough cough NSW). I used sentiment analysis to look at the sentiment scores of sentences mentioning individual states and territories to see if ScoMo really does talk more positively about some states. I used the R package quanteda to calculate a sentiment score for each sentence mentioning a state or territory (excluding sentences mentioning multiple states and territories). This score is based on how negative or positive the words in the sentence are, with a high sentiment score meaning that the sentiment is positive.
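As a rough illustration of sentence-level, lexicon-based scoring, here is a minimal sketch in plain Python. It is a simplification and not the R code used here: the tiny `LEXICON`, the three-entry `STATES` list, the averaging rule, and the example text are all invented, and real sentiment tools use much larger dictionaries and handle negation and multi-word names.

```python
import re

# Hypothetical toy lexicon: word -> polarity. Real lexicons have thousands of entries.
LEXICON = {
    "great": 1.0, "strong": 1.0, "support": 0.5,
    "difficult": -0.5, "terrible": -1.0, "outbreak": -1.0,
}
# Single-word subset for illustration; names like 'New South Wales' need phrase matching.
STATES = ["victoria", "queensland", "tasmania"]


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_score(sentence):
    """Average the polarity of all words; words missing from the lexicon score 0."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if not words:
        return 0.0
    return sum(LEXICON.get(w, 0.0) for w in words) / len(words)


def state_scores(text):
    """Map each state to the scores of sentences that mention it and no other state."""
    scores = {}
    for sentence in split_sentences(text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        mentioned = [s for s in STATES if s in words]
        if len(mentioned) == 1:  # skip sentences naming zero or several states
            scores.setdefault(mentioned[0], []).append(sentence_score(sentence))
    return scores


# Invented example text, not taken from the transcripts.
text = (
    "Queensland has a strong recovery. "
    "The outbreak in Victoria is difficult. "
    "Victoria and Tasmania met today."
)
scores = state_scores(text)
```

The third sentence names two states, so it is dropped, mirroring the exclusion rule described above.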
I should note that differences in sentiment scores don’t really tell us about bias because inevitably the events in each state play a big role in determining how positively ScoMo talks about them. Victoria, for example, had bushfires and then the second wave of COVID-19, so you’d expect the Prime Minister’s comments about Victoria to be fairly negative simply because it’s hard to talk about death, disease, and destruction in a positive way. Nonetheless, the results are interesting.
Just for fun, in this plot I’ve coloured the box plots corresponding to each state or territory based on the party its leader belongs to. ScoMo is, of course, the leader of the Liberal Party.
The horizontal bars in the boxes show the median sentiment score for each state or territory and the boxes show the interquartile range. In general there’s not much difference in the median scores for sentences referencing each state, but it’s interesting to see that sentences mentioning NSW are generally more positive than those referring to the other states and territories.
Explore the dataset for yourself
This is such an interesting dataset that it’s impossible to do it justice with a single blog post. So, I also made a Shiny app called ScoMoSearch for you to explore it for yourself.
Resources
- All transcripts are from the Prime Minister’s media centre.
- I used the R packages quanteda, sentimentr, and seededlda. I particularly recommend the quanteda tutorials if you’d like to learn more about analysing text data.
- Explore the ScoMo dataset with my ScoMoSearch Shiny app. After selecting a word, you can see its frequency over time, associated terms, sentiment scores, and example sentences.
- You can find the scripts I used to scrape, clean, and process the transcripts on GitLab.