Uppsala Computational Literary Studies Group

Illustration: Anna-Maria Hällgren

The Computational Literary Studies Group at Uppsala University (UCOL) started in 2017 as a two-year collaborative project, funded by Uppsala University (UU), that brought together scholars in literature, the Scandinavian languages, and computational linguistics. The project was fruitful, and in 2020 a more permanent research group was established.

The main focus of UCOL is to deploy and develop computational methods for the investigation of Swedish literature and its contexts. This covers a broad range of research questions, methods, and materials: from literary stylistics and quantitative approaches to narratives to sociology of literature and textual scholarship; from 19th-century classics to contemporary popular fiction; from small or even single-novel corpora to large-scale datasets; from basic word counts and descriptive statistics to complex machine learning algorithms.

Research Projects

At the moment, the work in UCOL is focused on three larger research projects:

Example of streaming pattern data

"Patterns of Popularity: Towards a Holistic Understanding of Contemporary Bestselling Fiction" aims to investigate the most popular contemporary novels at scale through a combination of empirical approaches, covering digital text material (ebooks), contextual and book trade material, and reader consumption data. The ambition is to find out in what ways bestsellers stand out, and how formats such as the audiobook affect writing styles and narratives. The project includes a collaboration with Storytel, which provides access to data points on real-time book consumption, a dataset that enables new ways to merge publishing studies and readership studies.

Participants: Karl Berglund (PI), Mats Dahllöf
Duration: 2020–2023
Funder: Swedish Research Council

Illustration: Anna-Maria Hällgren

"Fictional Prose and Language Change: The Role of Colloquialization in the History of Swedish 1830–1930" aims to investigate whether language change in Swedish in the 19th century was driven by fiction and its move towards naturalism (the Modern Breakthrough). Since it has been claimed that colloquialization was first expressed in fictional prose, the project focuses on stylistic variability in literary texts. It investigates whether colloquial linguistic features have spread from dialogue to narrative by developing and using digital methods of corpus stylistics on large-scale materials. The empirical point of departure is Litteraturbanken, a corpus of more than 4,200 Swedish works from 1650 to 1940.

Participants: David Håkansson (PI), Sara Stymne, Johan Svedjedal, Carin Östman
Duration: 2021–2023
Funder: Swedish Research Council

Illustration: Jenny Jansson

"The Astrid Lindgren Code: Accessing Astrid Lindgren’s shorthand manuscripts through handwritten text recognition, media history, and genetic criticism" explores material previously untouched by research. It does so primarily through the combination of two digital methods: development and adaptation of algorithms for handwritten text recognition (HTR), and crowd/expert sourcing. The project utilises the joint competences of literary scholars, computer scientists, and professional stenographers to unlock the potential of Lindgren’s original drafts, establish a starting point for full digitalisation and transliteration of Lindgren’s original manuscripts, and provide a general vehicle for methodological development in the analysis of handwritten documents.

Participants: Malin Nauwerck (PI), Karolina Andersdotter, Anders Hast, Raphaela Heil
Duration: 2020–2022
Funder: Riksbankens jubileumsfond (RJ)
Project website

Recent Research Output

  • Karl Berglund & Sarah Allison (forthcoming). Larsson, Remade: A Computational Perspective on the Millennium Trilogy in English. (Conditionally accepted for PMLA, February 2022.)
  • Ann Steiner & Karl Berglund. Barnlitterära strömningar. Om ljudböcker för barn [Streams of Children’s Literature: On Audiobooks for Children], Stockholm: Swedish Publishers’ Association (80 pp.) [text]
  • Karl Berglund & Mats Dahllöf (2021). Audiobook Stylistics: Comparing Print and Audio in the Bestselling Segment. Journal of Cultural Analytics, vol. 11, pp. 1–30 [text]
  • Raphaela Heil, Malin Nauwerck & Anders Hast (2021). Shorthand Secrets: Deciphering Astrid Lindgren's stenographed drafts with HTR methods. Proceedings of the 17th Italian Research Conference on Digital Libraries (IRCDL). February 18–19, 2021, Padua, Italy. [text]
  • Karl Berglund (2021). Introducing the Beststreamer: Mapping Nuances in Digital Book Consumption at Scale. Publishing Research Quarterly 37, no. 2, pp. 135–151. [text]
  • Karl Berglund & Ann Steiner (2021). Is Backlist the New Frontlist? Large-Scale Data Analysis of Bestseller Book Consumption in Streaming Services. LOGOS: Journal of the World Publishing Community 32, no. 1, pp. 7–24. [text]
  • Johan Svedjedal (2021). Kassettboken kommer! Från Strindberg till Storytel – korskopplingar mellan ljud och litteratur. Julia Pennlert & Lars Ilshammar (eds.). Göteborg: Daidalos, 2021, pp. 17–30.
  • Malin Nauwerck (2021). Sagoberättaren, sekreteraren och den spelande linden. Muntlig poetik i Astrid Lindgrens ljudvärldar. Från Strindberg till Storytel – korskopplingar mellan ljud och litteratur. Julia Pennlert & Lars Ilshammar (eds.). Göteborg: Daidalos, 2021, pp. 197–232.
  • Karl Berglund (2021). Strömmade bästsäljare. Litteraturkonsumtion i digitala prenumerationstjänster utifrån Storytels användardata. Från Strindberg till Storytel – korskopplingar mellan ljud och litteratur. Julia Pennlert & Lars Ilshammar (eds.). Göteborg: Daidalos, 2021, pp. 327–362.
  • Karl Berglund (2020). Fjärrläsning. Datorstödda metoder för kvantitativ litteraturforskning. Litteraturvetenskap II, Sigrid Schottenius Cullhed, Andreas Hedberg & Johan Svedjedal (eds.). Lund: Studentlitteratur, pp. 227–246.
  • Sara Stymne & Carin Östman (2020). SLäNDa: An Annotated Corpus of Narrative and Dialogue in Swedish Literary Fiction. Twelfth International Conference on Language Resources and Evaluation (LREC'20). May 13–15, 2020, Marseilles, France. [text]
  • Karl Berglund, Mats Dahllöf & Jerry Määttä (2019). Apples and Oranges? Large-Scale Thematic Comparisons of Contemporary Swedish Popular and Literary Fiction. Samlaren, vol. 140, pp. 228–260. [text]
  • Mats Dahllöf & Karl Berglund (2019). Faces, Fights, and Families: Topic Modeling and Gendered Themes in Two Corpora of Swedish Prose Fiction. DHN 2019 Copenhagen, Proceedings of the 4th Conference of The Association Digital Humanities in the Nordic Countries. March 6–8, 2019, Copenhagen, Denmark, pp. 92–111. [text]
  • David Håkansson & Carin Östman (2019). ”afbröt skolläraren ifrigt”: En diakron studie av anföringssatsen i svensk skönlitteratur. [“the teacher interrupted eagerly”: A Diachronic Study of the Speech-Tag in Swedish Fiction.] Samlaren, vol. 140, pp. 261–280. [text]
  • Sara Stymne, Johan Svedjedal & Carin Östman (2018). Språklig rytm i skönlitterär prosa. En fallstudie i Karin Boyes Kallocain. [Linguistic Rhythm in Narrative Prose: the case of Karin Boye’s Kallocain.] Samlaren, vol. 139, pp. 128–161. [text]


Research Group

Affiliated researchers

  • Sarah Allison (Department of English, Loyola University, New Orleans, USA)
  • Karina van Dalen-Oskam (Department of Literary Studies, Huygens Institute and University of Amsterdam, the Netherlands)

Last modified: 2022-05-30