Capta

Hypothesis / thesis / research question

What can sentence topic models help us discover about social media posts in Canaima National Park?

Approach

My approach consists in reframing my work as a construction of a topic model where sentences, instead of words, are the topics. Previous work, such as that done by Balikas et al. (2016), has been aimed at improving LDA so that it is more sensitive to context contained in sentences. My work joins this discussion not only by considering the sensitivity of sentence-based context, but by also enriching topics themselves by including sentences (or texts of posts).

To do this, my sentence-based topic model assumes the following in contrast to LDA:

LDA (Blei 2013) My approach
Each topic is a distribution of words Each topic is a distribution of posts.
Each document is a mixture of corpus-wide topics Each document is a mixture of corpus-wide topics.
Each word is drawn from one of these topics. Each post is drawn from these topics.

Topics are collections of posts/sentences. In particular:

Sentence embeddings

Fundamental assumption (The distributional hypothesis): “Words that occur in similar contexts have similar meaning.” (Jurafsky & Martin 2020)