Since becoming Prime Minister in 2018, Scott Morrison has spoken publicly almost every day. Transcripts of all speeches, interviews, and press conferences are available on the Prime Minister’s website, which means we can analyse ScoMo’s language over this whole period (August 2018 - March 2021). This dataset includes 988 transcripts.

In this post I will:

  • Compare frequently used words and phrases before and during the pandemic.
  • Identify common topics using topic modeling and explore how these have changed over time.
  • Identify words that ScoMo commonly uses together.
  • Use sentiment analysis to see how ScoMo really feels about each of the states and territories.

As a complement to this post, I’ve also made a Shiny app that you can use to explore the dataset for yourself.

Approach

For this analysis, I primarily used the R package quanteda, which makes analysing text fairly straightforward. I began by creating a corpus of documents. In this case, each document was the text of a speech, interview or press conference given by Scott Morrison. For transcripts where other people spoke (such as interviews), I included only Scott Morrison’s words.
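To give a feel for the "only Scott Morrison’s words" step, here is a minimal Python sketch (not the R code behind this post). It assumes each speaker turn starts with an upper-case label followed by a colon, such as "PRIME MINISTER:"; that format, the function name keep_speaker, and the example transcript are all assumptions made for illustration.

```python
import re

# Assumed format: a turn begins with an upper-case speaker label and a colon.
SPEAKER = re.compile(r"^([A-Z][A-Z .'\-]+):\s*(.*)$")

def keep_speaker(transcript, speaker="PRIME MINISTER"):
    """Return only the text spoken by `speaker`, joined into one string."""
    kept, current = [], None
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        match = SPEAKER.match(line)
        if match:
            # A new turn starts here: remember who is speaking.
            current = match.group(1).strip()
            line = match.group(2)
        if current == speaker and line:
            kept.append(line)
    return " ".join(kept)

# Made-up transcript, for illustration only.
example = """JOURNALIST: Prime Minister, what about the borders?
PRIME MINISTER: We are following the medical advice.
That advice has been consistent.
JOURNALIST: And the app?
PRIME MINISTER: The app is an important tool."""

pm_only = keep_speaker(example)
print(pm_only)
```

Lines without a label are treated as a continuation of the previous speaker, which is why the second sentence of the first answer is kept.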

The corpus is then processed into tokens, which can be single words or strings of words that are known as n-grams. You can then construct a document-feature matrix (DFM) that tells you how many times each token is used in each document. As we’ll see, this workflow allows you to explore the texts in many different ways.
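The corpus-to-tokens-to-DFM workflow can be sketched in a few lines of plain Python. This is only an illustration of the idea with made-up documents, not quanteda itself; the helper names (tokenize, ngrams, build_dfm) are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Join every run of n consecutive tokens with an underscore."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_dfm(docs, n_values=(1, 2)):
    """Map each document name to a Counter of feature counts."""
    dfm = {}
    for name, text in docs.items():
        tokens = tokenize(text)
        features = []
        for n in n_values:
            # Note: this simple version lets n-grams span sentence boundaries.
            features.extend(ngrams(tokens, n))
        dfm[name] = Counter(features)
    return dfm

# Made-up documents, for illustration only.
docs = {
    "speech_1": "We will stop the boats. We will keep the economy strong.",
    "interview_1": "The economy is strong and jobs are growing.",
}
dfm = build_dfm(docs)
print(dfm["speech_1"]["we_will"])
print(dfm["interview_1"]["economy"])
```

Each row of the matrix is a document and each column a feature (a word or an n-gram). A dictionary of counters is just a sparse way of storing the same thing.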

What does ScoMo talk about (and not talk about) during the pandemic?

It can be hard to remember life before the pandemic. Using this dataset, we can take a look at how ScoMo’s language has changed by comparing transcripts from 2018 and 2019 to those from 2020 and 2021. This plot shows words that have had the biggest changes in frequency between the two periods:

Unsurprisingly, all of the words that are much more frequent since 2020 are directly or indirectly related to the pandemic. Conversely, we see much less talk about population growth, the economy, and the surplus than we did before 2020. Bill Shorten was mentioned frequently in 2018 and 2019 but very rarely since, which makes sense as he’s no longer opposition leader.
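One simple way to find words whose frequency has shifted between two periods is to compare their rates per 1,000 tokens. The Python sketch below does this with made-up token lists. The statistic used for the plot isn’t stated here, so treat the plain difference in relative frequency as my assumption, and the function names as hypothetical.

```python
from collections import Counter

def relative_freq(token_lists):
    """Occurrences per 1,000 tokens, pooled across a group of documents."""
    counts = Counter(tok for doc in token_lists for tok in doc)
    total = sum(counts.values())
    return {tok: 1000 * n / total for tok, n in counts.items()}

def biggest_changes(before, after, top=3):
    """Return (words that rose most, words that fell most) between periods."""
    f_before, f_after = relative_freq(before), relative_freq(after)
    vocab = set(f_before) | set(f_after)
    diff = {w: f_after.get(w, 0.0) - f_before.get(w, 0.0) for w in vocab}
    ranked = sorted(diff, key=lambda w: (diff[w], w))  # ascending by change
    return ranked[-top:][::-1], ranked[:top]

# Made-up tokenised documents, for illustration only.
before = [["surplus", "economy", "shorten", "surplus"], ["surplus", "shorten"]]
after = [["pandemic", "vaccine", "economy"], ["pandemic", "vaccine", "pandemic"]]

more, less = biggest_changes(before, after, top=2)
print("more frequent:", more)
print("less frequent:", less)
```

Working with rates rather than raw counts matters because the two periods contain different numbers of transcripts.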

We can use the same strategy to look at which words were used more or less frequently during different phases of the pandemic. Here we’re comparing transcripts from February 1 - August 30 2020 to those from September 1 2020 onwards:

Again, there are many terms related to the pandemic. There has been much less talk about the CovidSafe app and pandemic-related restrictions since September 2020 and a lot more talk about the vaccine and the recession.

We also see that the phrase ‘rule of law’ has been used much more frequently since September 2020. This phrase has been frequently invoked by Scott Morrison since March 2021 in rejecting calls for an independent investigation into allegations against cabinet minister Christian Porter.

What topics does ScoMo talk about?

I also used topic modeling to explore the different topics ScoMo talks about and how these have changed over time. The specific method I used was latent Dirichlet allocation, which is easily implemented in R using the seededlda package. After some experimentation, I chose to have 18 topics.
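For the curious, here is a toy Python sketch of latent Dirichlet allocation fitted by collapsed Gibbs sampling, which is one common way to estimate the model. It is not the seededlda implementation, and the documents, the hyperparameters (alpha, beta), the number of iterations, and the function names are all assumptions chosen for illustration.

```python
import random

def lda_gibbs(docs, k, iters=200, alpha=0.1, beta=0.01, seed=1):
    """Fit LDA by collapsed Gibbs sampling; return (doc_topic, topic_word) counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    doc_topic = [[0] * k for _ in docs]                       # tokens of doc d in topic t
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(k)]  # count of word w in topic t
    topic_total = [0] * k                                     # tokens assigned to topic t
    assign = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            t = rng.randrange(k)
            zs.append(t)
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                t = assign[d][i]
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # Resample its topic in proportion to
                # (topic share in this doc) * (word share in this topic).
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta) / (topic_total[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                assign[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1
    return doc_topic, topic_word

def top_terms(topic_word, n=3):
    """The n most frequent words assigned to each topic."""
    return [sorted(c, key=lambda w: (-c[w], w))[:n] for c in topic_word]

# Made-up tokenised documents, for illustration only.
docs = [
    ["fire", "bushfire", "emergency", "fire", "volunteer"],
    ["vaccine", "quarantine", "outbreak", "vaccine"],
    ["bushfire", "emergency", "firefighter", "fire"],
    ["vaccine", "vaccination", "quarantine", "outbreak"],
]
doc_topic, topic_word = lda_gibbs(docs, k=2)
terms = top_terms(topic_word)
print(terms)
```

The sampler is random, so the topics it finds depend on the seed, and the topic numbers themselves carry no meaning. With a real corpus you would also need far more documents than this toy example uses.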

Let’s have a look at the top terms associated with each topic to better understand what each topic is:

topic1: jobkeeper, recession, jobseeker, covid_recession, bring_forward, design, apprentice, employee, manufacture, employer, young_people, unemployment
topic2: travel, coronavirus, officer, flight, bank, qantas, dr, medical_officer, committee, spread, package, chief_medical_officer
topic3: population, zealand, migration, visa, faith, religious, terrorist, immigration, attack, citizen, threat, permanent
topic4: mental, mental_health, veteran, tasmania, suicide, cancer, violence, portfolio, greg, young_people, affair, senator
topic5: emission, reduction, emission_reduction, climate, gas, electricity, renewable, reliable, reduction_target, emission_reduction_target, beat, kyoto
topic6: mate, coast, stuff, oh, neil, guy, game, ben, gold, tourism, sport, weekend
topic7: man, war, honour, mr, john, israel, speaker, proud, memorial, mr_speaker, tim, tonight
topic8: space, excite, western_sydney, airport, manufacture, recycle, waste, steven, science, defence_industry, plastic, mine
topic9: fire, disaster, emergency, assistance, season, bushfire, volunteer, adf, firefighter, commissioner, flood, request
topic10: shorten, bill_shorten, vote, border_protection, boat, nauru, electricity, liberal_party, legislation, alan, malcolm, electricity_price
topic11: farmer, farm, water, rural, town, michael, rural_and_regional, resilience, council, grant, assistance, household
topic12: vaccine, victorian, quarantine, age_care, outbreak, professor, vaccination, trace, national_cabinet, officer, victorian_government, paul
topic13: royal, royal_commission, age_care, recommendation, abuse, police, commissioner, office, leigh, disability, conduct, attorney
topic14: strategic, indo, indo_pacific, japan, island, engagement, papua, sovereign, fiji, indonesia, stability, prosperity
topic15: national_cabinet, restriction, distance, expert, app, social_distance, chief_minister, save, principle, jobseeker, coronavirus, livelihood
topic16: strong_economy, bill_shorten, family_business, choice, shorten, surplus, central, small_and_family, hospital, high_tax, wage, coast
topic17: digital, reform, agendum, public_service, cyber, strategy, productivity, bank, development, strength, especially, goal
topic18: indigenous, gap, indigenous_australian, aboriginal, strait, torres, torres_strait, islander, close_the_gap, strait_islander, torres_strait_islander, aboriginal_and_torres

Impressively, LDA has managed to identify coherent topics just from the words used in each text. For example, topic 5 is clearly about climate change, with top terms like ‘climate’, ‘renewable’, and ‘emission’. Topic 1 is about the recession and the economy (‘jobkeeper’, ‘jobseeker’, and ‘recession’) and topic 9 is about bushfires.

This interactive plot shows the number of communications on each topic over time. If you hover over a line, the tooltip will show you the top ten terms associated with that topic.

While a lot of topics appear at a low frequency over the whole time period, there are some that show a more distinctive temporal pattern. For example, the bushfire topic occurs primarily in December 2019 and January 2020.

We also see a couple of topics that peak prior to the election in May 2019:

Topic 10 and topic 16 are clearly related to policy issues and criticism of the opposition. Both include ‘Bill Shorten’, but topic 10 includes ‘border protection’, ‘boat’ and ‘nauru’, whereas topic 16 includes terms like ‘strong economy’ and ‘high tax’.

Topic 6 also peaks before the election, and is probably best characterised as Scott Morrison trying to be likeable. This topic includes terms like ‘mate’, ‘game’, ‘weekend’, and ‘sport’. Indeed, we see a remarkable drop-off in ScoMo’s use of the word ‘mate’ after his first few months as Prime Minister:

Interestingly, there are three topics related to the pandemic that have distinct profiles:

Topic 2 peaks in March 2020, which is around the time that the WHO officially declared the pandemic. This topic has several terms relating to border and travel restrictions, such as ‘travel’, ‘qantas’, and ‘flight’. Topic 15 peaks slightly later and includes terms more related to domestic restrictions and guidelines, such as ‘national cabinet’, ‘social distance’, and ‘app’ (this refers to the CovidSafe app). Tellingly, mentions of ‘app’ drop off quite quickly.

Several months later we see a third pandemic-related topic appear. Topic 12 peaks in August 2020, which corresponds to the second wave that occurred in Victoria (hence the terms ‘victorian’ and ‘victorian government’). This topic also includes the terms ‘vaccine’ and ‘vaccination’, so this is still an active topic. Heartbreakingly, this final pandemic-related topic is the first that prominently includes the term ‘aged care’.

The three pandemic-related topics neatly capture different phases of the pandemic in Australia: the first phase where the seriousness of the situation was just becoming clear and the main focus was on closing the borders, the second phase where we begin to see more discussion of internal restrictions and best practices as well as hints of the economic effects of the pandemic (‘jobseeker’, ‘livelihood’), and the third phase corresponding to the Victorian second wave and then the vaccine rollout.

What does ScoMo really think about boats?

We can use this dataset to look at what words are most often used along with a particular word or phrase we’re interested in. For example, here are words that ScoMo often uses within ten words of ‘boat’:

If you’ve been paying any attention to Australian politics for the past decade or so, you won’t be shocked to see that words like ‘stop’, ‘nauru’, and ‘illegal’ frequently come up when the Prime Minister mentions boats.
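Counting the words that appear within a fixed window of a target word is simple enough to sketch in plain Python. This is an illustration with a made-up sentence, not the code behind the plot; the function name collocates is hypothetical, and a real analysis would probably remove very common words like ‘the’ first.

```python
from collections import Counter

def collocates(tokens, target, window=10):
    """Count words appearing within `window` tokens either side of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target occurrence itself
                counts[tokens[j]] += 1
    return counts

# Made-up sentence, for illustration only. A window of 2 keeps the output small.
tokens = "we will stop the boat and we will keep the border secure".split()
nearby = collocates(tokens, "boat", window=2)
print(nearby.most_common())
```

If the target word occurs twice close together, the overlapping windows mean some neighbours are counted twice. That is usually acceptable for a quick look, but worth knowing.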

Which state or territory is ScoMo’s favourite?

Speaking as a Victorian, it can often seem that the Prime Minister plays favourites with certain states (cough cough NSW). I used sentiment analysis to look at the sentiment scores of sentences mentioning individual states and territories to see if ScoMo really does talk more positively about some states. I used the R package quanteda to calculate a sentiment score for each sentence mentioning a state or territory (excluding sentences mentioning multiple states and territories). This score is based on how negative or positive the words in the sentence are, with a high sentiment score meaning that the sentiment is positive.
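Here is a rough Python sketch of dictionary-based sentiment scoring at the sentence level. The tiny word lists, the shortened list of states, the scoring formula (positive words minus negative words, divided by sentence length), and the example text are all assumptions for illustration. The actual analysis used quanteda, and I am not reproducing its dictionary or formula here.

```python
import re

# Hypothetical mini-lexicon, purely for illustration.
POSITIVE = {"great", "strong", "proud", "support", "success"}
NEGATIVE = {"disaster", "outbreak", "crisis", "loss", "terrible"}
# Shortened list of single-word names; multi-word names such as
# "New South Wales" would need phrase matching instead.
STATES = {"victoria", "queensland", "tasmania"}

def sentence_scores(text):
    """Return (state, score) for sentences that mention exactly one state."""
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        mentioned = STATES.intersection(tokens)
        if len(mentioned) != 1:
            continue  # skip sentences naming no state or several states
        pos = sum(t in POSITIVE for t in tokens)
        neg = sum(t in NEGATIVE for t in tokens)
        results.append((next(iter(mentioned)), (pos - neg) / len(tokens)))
    return results

# Made-up text, for illustration only.
text = ("Queensland has been a great success. "
        "The outbreak in Victoria was a terrible crisis. "
        "Victoria and Tasmania are working together.")
scores = sentence_scores(text)
print(scores)
```

The third sentence is dropped because it mentions two states, which mirrors the exclusion described above.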

I should note that differences in sentiment scores don’t really tell us about bias because inevitably the events in each state play a big role in determining how positively ScoMo talks about them. Victoria, for example, had bushfires and then the second wave of COVID-19, so you’d expect the Prime Minister’s comments about Victoria to be fairly negative simply because it’s hard to talk about death, disease, and destruction in a positive way. Nonetheless, the results are interesting.

Just for fun, in this plot I’ve coloured the box plots corresponding to each state or territory based on the party its leader belongs to. ScoMo is, of course, the leader of the Liberal Party.

The horizontal bars in the boxes show the median sentiment score for each state or territory and the boxes show the interquartile range. In general there’s not much difference in the median scores for sentences referencing each state, but it’s interesting to see that sentences mentioning NSW are generally more positive than those referring to the other states and territories.
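The numbers behind each box can be computed with Python’s standard statistics module, as in this small sketch. The scores are made up, the function name box_summary is hypothetical, and different tools use slightly different quartile conventions, so the results may not match a given plotting library exactly.

```python
from statistics import median, quantiles

def box_summary(scores):
    """Median, quartiles and interquartile range for a list of scores."""
    q1, _, q3 = quantiles(scores, n=4)  # three cut points: Q1, Q2, Q3
    return {"median": median(scores), "q1": q1, "q3": q3, "iqr": q3 - q1}

# Made-up sentence-level sentiment scores for one state.
example_scores = [-0.4, -0.1, 0.0, 0.1, 0.2, 0.3, 0.5]
summary = box_summary(example_scores)
print(summary)
```

The box spans q1 to q3, so half of the sentences for that state fall inside it, and the bar sits at the median.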

Explore the dataset for yourself

This is such an interesting dataset that it’s impossible to do it justice with a single blog post. So, I also made a Shiny app called ScoMoSearch for you to explore it for yourself.

Resources