Uppsala Computational Literary Studies Group
The computational literary studies group at Uppsala University (UCOL) was initiated in 2017 as a two-year collaborative project, funded by Uppsala University (UU), that brought together scholars in literature, the Scandinavian languages, and computational linguistics. The project proved fruitful, and in 2020 a more permanent research group was established.
The main focus of UCOL is to deploy and develop computational methods for the investigation of Swedish literature and its contexts. This covers a broad range of research questions, methods, and materials: from literary stylistics and quantitative approaches to narratives to sociology of literature and textual scholarship; from 19th-century classics to contemporary popular fiction; from small or even single-novel corpora to large-scale datasets; from basic word counts and descriptive statistics to complex machine learning algorithms.
At the moment, the work in UCOL is focused on three larger research projects:
"Patterns of Popularity: Towards a Holistic Understanding of Contemporary Bestselling Fiction" aims to investigate the most popular contemporary novels at scale, and through a combination of empirical approaches, covering digital text material (ebooks), contextual and book trade material, and reader consumption data. The ambition is to find out in what ways bestsellers stand out, and how formats such as the audiobook affect writing styles and narratives. The project includes a collaboration with Storytel that provides access to data points on real-time book consumption, a dataset that enable new ways to merge publishing studies and readership studies.
Participants: Karl Berglund (PI), Mats Dahllöf
Funder: Swedish Research Council
"Fictional Prose and Language Change: The Role of Colloquialization in the history of Swedish 1830–1930" aims to investigate if language change in Swedish in the 19th century was driven by fiction and its move towards naturalism (The Modern Breakthrough). Since it has been claimed that colloquialization first was expressed in fictional prose, the project focuses on stylistic variability in literary texts and investigates whether colloquial linguistic features have spread from dialogue to narrative by developing and using digital methods of corpus stylistics in large scale materials. The empirical point of departure is Litteraturbanken, a corpus of >4200 Swedish works from 1650 to 1940.
Participants: David Håkansson (PI), Sara Stymne, Johan Svedjedal, Carin Östman
Funder: Swedish Research Council
"The Astrid Lindgren Code: Accessing Astrid Lindgren’s shorthand manuscripts through handwritten text recognition, media history, and genetic criticism" explores a material previously untouched by research. It does so primarily through the combination of two digital methods: development and adaptation of algorithms for handwritten text recognition (HTR), and crowd/expert sourcing. The project utilises the joint competences of literary scholars, computer scientists, and professional stenographers to unlock the potential of Lindgren’s original drafts, enable a starting point for full digitalisation and transliteration of Lindgren’s original manuscripts, and provide a general vehicle for methodological development for analysis of handwritten documents.
Participants: Malin Nauwerck (PI), Karolina Andersdotter, Anders Hast, Raphaela Heil
Funder: Riksbankens jubileumsfond (RJ)
Recent research output
- Karl Berglund & Sarah Allison (forthcoming). Larsson, Remade: A Computational Perspective on the Millennium Trilogy in English. (Conditionally accepted for PMLA, February 2022.)
- Ann Steiner & Karl Berglund. Barnlitterära strömningar. Om ljudböcker för barn [Streams of Children’s Literature: On Audiobooks for Children], Stockholm: Swedish Publishers’ Association (80 pp.) [text]
- Karl Berglund & Mats Dahllöf (2021). Audiobook Stylistics: Comparing Print and Audio in the Bestselling Segment. Journal of Cultural Analytics, vol. 11, pp. 1–30 [text]
- Raphaela Heil, Malin Nauwerck & Anders Hast (2021). Shorthand Secrets: Deciphering Astrid Lindgren's stenographed drafts with HTR methods. Proceedings of the 17th Italian Research Conference on Digital Libraries (IRCDL). February 18–19, 2021, Padua, Italy. [text]
- Karl Berglund (2021). Introducing the Beststreamer: Mapping Nuances in Digital Book Consumption at Scale. Publishing Research Quarterly 37, no. 2, pp. 135–151. [text]
- Karl Berglund & Ann Steiner (2021). Is Backlist the New Frontlist? Large-Scale Data Analysis of Bestseller Book Consumption in Streaming Services. LOGOS: Journal of the World Publishing Community 32, no. 1, pp. 7–24. [text]
- Johan Svedjedal (2021). Kassettboken kommer! Från Strindberg till Storytel – korskopplingar mellan ljud och litteratur. Julia Pennlert & Lars Ilshammar (eds.). Göteborg: Daidalos, 2021, pp. 17–30.
- Malin Nauwerck (2021). Sagoberättaren, sekreteraren och den spelande linden. Muntlig poetik i Astrid Lindgrens ljudvärldar. Från Strindberg till Storytel – korskopplingar mellan ljud och litteratur. Julia Pennlert & Lars Ilshammar (eds.). Göteborg: Daidalos, 2021, pp. 197–232.
- Karl Berglund (2021). Strömmade bästsäljare. Litteraturkonsumtion i digitala prenumerationstjänster utifrån Storytels användardata. Från Strindberg till Storytel – korskopplingar mellan ljud och litteratur. Julia Pennlert & Lars Ilshammar (eds.). Göteborg: Daidalos, 2021, pp. 327–362.
- Karl Berglund (2020). Fjärrläsning. Datorstödda metoder för kvantitativ litteraturforskning. Litteraturvetenskap II, Sigrid Schottenius Cullhed, Andreas Hedberg & Johan Svedjedal (eds.). Lund: Studentlitteratur, pp. 227–246.
- Sara Stymne & Carin Östman (2020). SLäNDa: An Annotated Corpus of Narrative and Dialogue in Swedish Literary Fiction. Twelfth International Conference on Language Resources and Evaluation (LREC'20). May 13–15, 2020, Marseilles, France. [text]
- Karl Berglund, Mats Dahllöf & Jerry Määttä (2019). Apples and Oranges? Large-Scale Thematic Comparisons of Contemporary Swedish Popular and Literary Fiction. Samlaren, vol. 140, pp. 228–260. [text]
- Mats Dahllöf & Karl Berglund (2019). Faces, Fights, and Families: Topic Modeling and Gendered Themes in Two Corpora of Swedish Prose Fiction. DHN 2019 Copenhagen, Proceedings of 4th Conference of The Association Digital Humanities in the Nordic Countries. March 6–8, 2019, Copenhagen, Denmark, pp. 92–111. [text]
- David Håkansson & Carin Östman (2019). ”afbröt skolläraren ifrigt”: En diakron studie av anföringssatsen i svensk skönlitteratur. [“the teacher interrupted eagerly”: A Diachronic Study of the Speech-Tag in Swedish Fiction.] Samlaren, vol. 140, pp. 261–280. [text]
- Sara Stymne, Johan Svedjedal & Carin Östman (2018). Språklig rytm i skönlitterär prosa. En fallstudie i Karin Boyes Kallocain. [Linguistic Rhythm in Narrative Prose: the case of Karin Boye’s Kallocain.] Samlaren, vol. 139, pp. 128–161. [text]
- Interview with Karl Berglund in the Norwegian newspaper Morgenbladet.
- Popular article/review by Karl Berglund in Svenska Dagbladet on how the digitalization of the book trade differs between countries.
- Interview with Karl Berglund in SR P1 Kultur: "Så förändras litteraturen när den blir lyssnad på".
- Interview with Malin Nauwerck and Anders Hast in Forskning & Framsteg on the use of AI and HTR in transcribing Astrid Lindgren's shorthand: "AI får pippi på Astrids kråkfötter"
- Popular article/review by Karl Berglund in Svenska Dagbladet on computational analyses of race in literature.
- Karl Berglund (Literature)
- Mats Dahllöf (Computational Linguistics)
- Anders Hast (Information Technology)
- David Håkansson (Scandinavian Languages)
- Malin Nauwerck (Literature, SBI)
- Sara Stymne (Computational Linguistics)
- Johan Svedjedal (Literature)
- Carin Östman (Scandinavian Languages)
- Karl Berglund (group coordinator)