Distinguishing drafts: measuring semantic distances between born-digital short story drafts

doi:10.21203/rs.3.rs-7750385/v1

2025 · doi:10.21203/rs.3.rs-7750385/v1

preprint OA: closed


Research Article

Floor Buschenhenke

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7750385/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review
Version 1 posted 7

Abstract

This study applies computational methods for measuring semantic distances to born-digital drafts. Using text versions leading up to a short story by Flemish author Ellen Van Pelt, we look for relevant entry points into a born-digital genetic dossier. Several methods are applied to consecutive pairs of drafts and compared: a count of unique lemmas entering and leaving the working document, a document cosine similarity measure based on a BERT model of Dutch, and a narrativity measure.
By comparing these different methods with close reading, we can assess whether such computational tools may be of help to textual scholars working with big corpora of text-genetic manuscripts. The working process of Van Pelt could be divided into several stages with a different focus. The methods each partially picked up on these stages, and also highlighted specific drafts. It did not become clear which of the methods was most successful, in part because of the single, small case study used, for a writing process that was characterised by gradual expansion and continuous revision.

Keywords: genetic criticism, writing process, literary drafts, semantic similarity, semantic distance, measuring semantic change

Introduction

In the Track Changes project, we collected materials from eleven Dutch and Flemish writers working on short stories, yielding 240 hours of recorded writing and approximately 14,400 txt files, since one file was saved every minute as well as at the start and end of each writing session. In addition, the Flemish author Gie Bogaert recorded his work on a novella, Roosevelt, between 2014 and 2016, which produced more than 450 ‘session versions’ of his work-in-progress. Textual scholars working with born-digital materials, and in particular with keystroke-logged materials, have to find their way through these kinds of large corpora of fine-grained data. For exploring such corpora, computational tools are an appealing entry point, both because they can handle such large numbers of files and because they can pick up on subtle changes between texts that readers cannot. In the current study, computational methods are applied to assess semantic aspects of the textual development over time. Semantic similarity is relatively understudied in computational literary studies. As Ehrmanntraut et al.
(2022) state: “Though the concept of similarity is ubiquitous in the practice of literary studies it has seldom been analyzed explicitly.” (p. 1) In particular, very few studies so far have applied tools that measure semantic similarity to drafts instead of finished works of literature. The aim of this study, therefore, is to compare computational methods for measuring semantic distances between born-digital drafts. These tools may help in finding relevant entry points into a born-digital genetic dossier. They may pick up on things such as the introduction of new characters, a change in setting, or an extension of the story line by adding new events. By comparing different methods with close reading, we can assess whether such computational tools may be of help to textual scholars working with big corpora of text-genetic manuscripts.

Several methods will be compared and assessed. A first indicator of semantic change is the change in document length. Large changes in document length between drafts carry the potential for large semantic shifts. Secondly, a pairwise comparison of documents will take place. This will be done on the basis of document embeddings created with BERT’s pre-trained Dutch language model. Thirdly, we will look at the unique lemmas that are removed from the previous draft and those newly introduced in the follow-up draft, and both count these unique lemmas and measure the cosine similarity on these sets of lemmas. Fourthly, to bring close reading into the mix, a narrativity measure will be applied (see Vauth et al., 2021) that can track plot development. Through manual annotation of events, a quantitative score of narrativity is created.

These computational methods are applied to a case study. The Flemish writer Ellen Van Pelt participated in the research project Track Changes (NWO-funded, 2018–2024).
From November 2020 to late January 2021, she used the keystroke logger Inputlog (Leijten & Van Waes, 2013) to record her writing process while working on a short story that would be called Dauphin. In 19 recorded work sessions she composed her story. Most text-genetic scholarship relies on close reading, and the proposed exploratory tools are by no means a replacement for this practice. This case study was chosen because it is part of the genetic edition Nanogenesis (Bekius, 2024), allowing me to study the textual developments through close reading, and to compare the computational findings with the outcomes of the close reading.

1. Framework

What is semantic similarity

I’m interested in a tool that can find both additions and deletions of ‘meaning’ in the text/draft, which could take place through substitution, addition and deletion of text. I would like the tool to be able to pick up on the addition of a new scene, for example, or a shift in setting from indoors to outdoors. The study of meaning in texts is captured by what is commonly called ‘semantics’, and the tools I will present to measure differences between drafts are oriented towards semantic differences. Semantic similarity is defined as: “the measure of semantic equivalence between two blocks of text.” (Chandrasekaran & Mago, 2021, p. 2) This definition also captures an operationalisation; it is seen as something measurable quantitatively. Chandrasekaran & Mago further position semantic similarity as “one of the aspects of semantic relatedness”. (ibid.) Furthermore, in the operationalisation of semantic similarity, “methods usually give a ranking or percentage of similarity between texts, rather than a binary decision as similar or not similar.” The semantic relationship between several texts is measured in terms of ‘semantic distance’.
In particular, it is often operationalised as distance in vector space (see 1.1), a method I apply too. However, this captures only a very specific aspect of meaning in and through texts. Meaning arises not just from individual words, but from whole sentences, the text in its entirety, and from looking at the utterance/text in its context (Kroeger, 2023). The selected measures will look at words and at whole texts, for optimal meaning disambiguation.

The boundary between meaning and style is hard to draw. Verhagen (2012) points out that you can distinguish between objects (in the world, but also imaginary and abstract concepts) and the “choices of lexical items and grammatical constructions” used to talk about them. These constructions can be seen as part of a menu of multiple possible formulations. The chosen formulations link to ‘construal’: the way we choose to present an object in language to accommodate our audience. For example, “Marcel”, “my neighbour” and “a middle-aged man carrying a large suitcase” may all refer to the same person, but are more or less adequate ways of describing him depending on what we know and what our audience knows. Although semantics would then mostly be concerned with underlying objects, and style mostly with construals, the two inform each other: “In fact, semantic analyses should be able to support the explanation of stylistic phenomena and the experience of a piece of discourse exhibiting a particular style may be used as evidence supporting or contradicting a semantic analysis.” (Verhagen, 2012).
The definition of style given by Herrmann, Van Dalen-Oskam & Schöch (2015) was created with both computational and close-reading methodologies in mind: “Style is a property of texts constituted by an ensemble of formal features which can be observed quantitatively or qualitatively.” They incorporate semantics into stylistics: under formal features they list “linguistic features at the level of characters, lexicon, syntax, semantics (Stamatatos 2009, 4), but also features going beyond the sentence, such as narrative perspective or textual macro-structure; we differ from some previous definitions in that we conceive of stylistic features as explicitly defined and clearly identifiable.” (p. 44)

There are multiple computational ways to analyse style in literary texts (stylometry). A common one is measuring the relative frequency of high-frequency words in a baseline corpus and a target text or texts. This approach is well suited for authorship attribution, genre and diachronic analyses. Lexical diversity and readability (or textual complexity) are two other measures that are often used (Neal et al., 2017). When selecting high-frequency words (for English these often include words like ‘the’ and ‘it’), the concepts and terms that make each text unique are left out of the equation. High-frequency words like ‘the’ and ‘it’ do not carry the bulk of the meaning of any text. However, the fact that this analysis is good at distinguishing different genres indicates that the distribution of these ‘small’ words does capture themes and topics, and that the incorporation of semantics into a working definition of stylistics makes sense. This study works with measures that look at word and sentence (dis)similarity; I do not exclude style as part of the explanation of any (dis)similarities found.

1.1 Measuring similarity of literary texts

Previous studies have used computational methods to assess semantic similarity in literary texts for various purposes.
Some studies have looked within texts to find differences between segments of the same work. Semantic distance has played an important part in genre analysis and genre attribution too. Furthermore, diachronic shifts in both the meaning of words and the topics that are popular have been studied using measures of semantic distance.

Within texts

Geyer et al. (2020) studied the ‘cut effect’ in haiku, where readers grapple with two contrasting sections of a poem and (re)read one of the parts (the ‘fragment line’) much more intensely in order to bridge the juxtaposition into a whole. The manually placed ‘cut’ matched the semantic distances between lines very well, demonstrating the validity of the method. Fan et al. (2023) see semantic distance between parts of a short story as a proxy for creativity. Their subjects were prompted with an incomplete story, and asked to write a creative ending. Using Word2Vec, both global (whole text) and local (a rolling window of about a sentence long) similarity were calculated. Here, for the whole text, ‘similarity’ meant the space the text took up in the multidimensional vector space. Human ratings of originality correlated with both the global and local distance measurements. Within-text similarity was also measured by Szemes & Nagy (2024). They worked with several of Shakespeare’s plays, to investigate which characters were ‘innovative’, in the sense that they brought new information to the play. Working with sentence-based BERT cosine similarity, each line of each character was compared with the preceding lines from all other characters. This allowed for a kind of social network visualisation, where the strength and direction of the relationships between the characters was based on how similarly they spoke and who echoed whom.

Genre

Semantic similarity measures have also successfully been applied to detect genre boundaries.
Van Cranenburgh, Van Dalen-Oskam & Van Zundert (2019) created a semantic profile of literariness. They use both intra-document and inter-document measures. To test the hypothesis that literary fiction is more lexically rich than other genres of fiction (such as sci-fi and romance), the intra-document paragraph vector (or ‘doc2vec’) method, which preserves some of the co-occurrence information of words, was used to measure the semantic distances between different sections of a novel. The width of the semantic space occupied correlated with the literariness scores that a reader survey provided. The same held true for three inter-document measures, which looked at the differences and similarities between novels belonging to different genres.

Sobchuk & Šela (2024) looked at ‘thematic’ similarity for genre detection of novels. They compared different methods on two corpora: first a small corpus with four pre-tagged genres (detectives, fantasy, romance, science fiction), followed by a large and untagged corpus from Gutenberg. What is quite unique to their paper is the variety not just of analytical methods for assessing similarity, but also of pre-processing steps and textual features. In total they tested 291 combinations from the three ‘menus’, each time on a random sample of 100 text fragments. The final step in their analytical pipelines was an unsupervised clustering. The performance of each pipeline was tested against the genre tags from the small corpus. The ‘winner’ was a pipeline in which the texts were strongly pre-processed for theme (by lemmatizing, then using only nouns, verbs, adverbs and adjectives, removing named entities and applying lexical simplification) and then represented with a Doc2Vec, LDA or bag-of-words approach, all of which led to good outcomes. As similarity measurement, Jensen-Shannon divergence outperformed the other metrics, although cosine similarity worked very well in combination with doc2vec.
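
As a minimal sketch of the two similarity measures named above, the following Python compares two token lists as bag-of-words distributions using Jensen-Shannon divergence and cosine similarity. It is illustrative only and not the pipeline of Sobchuk & Šela: the function names and the toy token lists are assumptions, and the tokens are assumed to be already lemmatized and filtered.

```python
import math
from collections import Counter


def distribution(tokens, vocab):
    """Relative frequency of each vocabulary item in a non-empty token list."""
    if not tokens:
        raise ValueError("token list must not be empty")
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in vocab]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with zero probability contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cosine_similarity(p, q):
    """Cosine of the angle between two vectors: 1 for the same direction, 0 for no overlap."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def compare(tokens_a, tokens_b):
    """Return (Jensen-Shannon divergence, cosine similarity) for two token lists."""
    vocab = sorted(set(tokens_a) | set(tokens_b))
    p = distribution(tokens_a, vocab)
    q = distribution(tokens_b, vocab)
    return js_divergence(p, q), cosine_similarity(p, q)


# Hypothetical toy fragments, already lemmatized.
fragment_a = ["ship", "sea", "island", "ship"]
fragment_b = ["ship", "sea", "dragon", "sword"]
print(compare(fragment_a, fragment_b))
```

Note that the two measures point in opposite directions: a divergence grows as texts become less alike, whereas a similarity shrinks.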
Ehrmanntraut et al. (2022) compared realist, naturalist and modernist German poetry on between-genre and within-genre textual similarity. Following Bär et al. (2011), they used a multidimensional model to operationalise textual similarity. Next to Bär’s content, style and structure, they added emotion, as they found this to be a relevant category for poetry. With several human annotators, they rated the poems’ similarity on all dimensions. Ratings of content similarity and style similarity correlated highly. They then tested the performance of several models: tf-idf (based on words), BERT on words, and sentence-BERT, with each poem treated like a sentence and compared in its entirety to other poems. The cosine similarity measure based on sentence-BERT embeddings showed the best results out of the box, with a correlation of 0.82 with human annotation. Results from the computational similarity ratings were then mapped onto a heuristic model of distances between poems within and between three genres (realism, naturalism, modernism). This supported the claims made in literary scholarship about the shift from realism to modernism.

Diachronic change

Beausang (2021) proposed the measure ‘diachronic Delta’, composed of relative word frequencies, in a corpus where English literary texts were grouped by year of production for a period spanning the 18th to early 20th century. Underwood (2019) had previously demonstrated that gradual changes rather than drastic innovation shaped literary history over the centuries. Beausang tested this finding by searching for ‘break years’ where the word frequencies were both significantly different from the years before and similar to the years following. For the genre of prose, no break years were found. A similar approach was taken by Griebel et al. (2024), who tracked cultural change in three fields of study, including fiction.
They applied document embeddings but indicated these needed to be fine-tuned to perform adequately on their (historical) sources. Like Beausang, they use a formula from Barron et al. (2018) to conceptualise change: novelty - transience = resonance. A year or period with resonance is both markedly different from the preceding period and similar to the following period, showing an ‘anticipation of future change’ (Griebel et al., 2024, p. 232). Working with a corpus similar to Beausang’s, of 18th to 20th century English-language fiction, they applied topic models and document embeddings. Both measures worked similarly well.

Writing process materials can also be approached as a type of diachronic change, and the concept of ‘resonance’ could be applied to a writing process corpus as well. Work sessions in which the text departs from its predecessors more strongly than earlier versions did, as well as sessions that already closely resemble later versions, would both indicate a textual development worthy of further inspection. These methods, however, require a much larger corpus than the current case study can offer.

It is encouraging that several studies have shown a correlation between computational approaches to textual similarity and human expert annotation. Document and sentence embeddings are frequently used in this field and thus show promise for application to drafts. Only a few studies so far have applied computational tools to measure textual development between drafts. They will be presented in the next section.

1.2 Similarity between drafts

The question of how drafts differ from each other is traditionally answered through hermeneutical practices. In the genetic critic’s workflow, these consist of text transcription (to a digital format), followed by pairwise (automatic or manual) collation, followed by reading and interpretation of the textual differences (Grésillon, 2016).
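
The automatic variant of the pairwise collation step can be sketched with Python’s standard library. This is a minimal illustration, assuming whitespace tokenisation and two plain-text drafts; the function name `collate` and the four count labels are hypothetical, and real collation tools make more careful alignment decisions.

```python
from difflib import SequenceMatcher


def collate(old_draft: str, new_draft: str) -> dict:
    """Count words kept, deleted, added and modified between two drafts."""
    old_words = old_draft.split()
    new_words = new_draft.split()
    counts = {"kept": 0, "deleted": 0, "added": 0, "modified": 0}
    # autojunk=False keeps frequent words from being ignored in longer drafts.
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            counts["kept"] += i2 - i1
        elif tag == "delete":
            counts["deleted"] += i2 - i1
        elif tag == "insert":
            counts["added"] += j2 - j1
        else:  # "replace": a stretch of words substituted at the same location
            counts["modified"] += max(i2 - i1, j2 - j1)
    return counts


# Hypothetical one-sentence drafts.
print(collate("fish and vegetables", "baked fish and magical vegetables"))
```

Such counts describe how much text changed, not what the change means, which is why they are only a starting point for interpretation.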
Once a set of drafts and materials has been brought into chronological relations, and represented in a digital format, quantitative visualisations and analyses do become possible. In the Beckett Digital Manuscript Project (Van Hulle & Nixon, 2024), for example, a visualisation is given of the number of words that stayed constant, that were deleted, added and modified, between each set of two consecutive drafts of each work. In the field of writing process research, similar statistics are used to characterise digital keystroke-logged writing processes (e.g. Inputlog’s summary analysis, Leijten & Van Waes, 2013).

Santosh et al. (2024) developed a ‘condensed bird eye’s view of edits’ in the form of a ‘thematic summary of changes’ to help collaborating authors quickly see the overarching effects of what the track changes function shows in great detail for each individual revision. Their purpose was to help reviewers and co-authors approve or reject batches of thematically linked edits with one click, rather than reading through each individual edit. They developed a human-written set of 45 thematic summaries, based on reviewing a corpus of academic papers on the subject of natural language processing, where each paper had two versions, a submitted draft and a published document. They then let GPT-4 create thematic descriptions of each edit (given in the form of a diff output) and cluster the edits into themes. GPT-4 was not able to perform this task: the human annotations and the algorithmic results did not overlap sufficiently. The LLM was not able to provide adequate summaries of the edits, and the concept of ‘theme’ was probably left too open-ended in the process.

A fully algorithmic approach was taken by Ketzan & Schöch (2021) with their tool ‘Coleto’. Coleto was designed to compare three (published) versions of a novel (The Martian by Andy Weir). It is a software pipeline that starts off with an automatic collation of two text versions and then tags each edit found.
The edit classification has potential for a semantic analysis. There are two large categories: script-identifiable edits and semantically open edits. The latter are subclassified into deletions, insertions and changes (or: substitutions at the same location). The ‘changes’ are further subdivided based on Levenshtein distance, a measure of how many single-character edits (insertions, deletions or substitutions) are needed to transform one string into another. A ‘major’ edit is defined as having a Levenshtein distance of more than 5, but naturally this threshold is not set in stone. The distinction between major and minor could be a rough indicator of semantic distance too (see Sarkar et al., 2016), although Han et al. (2021) demonstrated that deep learning methods were more effective at assessing semantic distances than Levenshtein distance is.

1.3 Use case: Ellen Van Pelt's story Dauphin

Ellen Van Pelt (1980) is a Flemish writer and psychologist, who has published two novels (Drift, 2015; Zwaluwstaarten, 2025), a biography of Flemish writer Roger Van de Velde (Deze wereld is geen ergernis waard, 2020) and many short stories. She participated in the research project Track Changes, in which writers recorded their digital writing process using keystroke logging software. Inputlog (Leijten & Van Waes, 2013), the specific software we used, tracks key presses, mouse actions and screen/window switches, and works with Microsoft Word on Windows operating systems. Writers can work in Word, while Inputlog captures their cursor position, the length of their document, and characters typed and deleted, all with accurate timestamps. Inputlog stores this detailed process information in a tabular format, and also captures text snapshots once a minute. Van Pelt installed Inputlog on her own laptop, on which she worked at her own convenience over a period of three months, from November of 2020 until mid-January of 2021.
(18 writing sessions, fifteen and a half hours in total). In January of 2023 she made a few additional changes to her story, which were no longer registered by the keystroke logger. This final version has also been included in our corpus.

In Van Pelt’s story, Dauphin, a mother and young son are taking an all-inclusive vacation to Tunis. It is told in first person, from the mother's perspective. The discourse is mostly chronological and captures the events during a single holiday excursion, a day trip from Djerba to Flamingo Island, with a ‘pirate ship’. The mother feels totally out of place.[1] It is the first summer that the mother has to be ‘both father and mother’. The trip was a gift from her father, and not something she would have picked out herself. On the ship, one man stands out from the crowd because of his brooding demeanour and his dark-coloured, warm clothes despite the hot weather. On Flamingo Island, the son becomes too tired to walk and the mysterious man from their ship offers a piggyback ride. Then, at the beach, mother and son collect sea shells. Over lunch, Finn (the son) accidentally throws sand onto the plate of Mustafa (the brooding man). After the reprimand from his mother, he tells Mustafa that his father has died. Mustafa offers consolation. Sailing back from the island, they see two dolphins jumping out of the water. Finn is delighted and says ‘they are always happy’, while the mother finds them beautiful and thinks about how dolphins are the only animals that commit suicide. Mustafa smiles at her for the first time.

Text-genetic process description

The working document and the keystroke logging files show that Van Pelt started out her project with some notes and a pasted text from a travel agency’s website, about an excursion to a Tunisian island. Both the textual development and the nature of the notes in the working document show that Van Pelt did not have a fully fledged design for her story in advance of composition.
The scenes, interactions between characters and the backstory of the main character were all developed during writing. No drastic changes or cuts were made to the plot, characterisation or setting in the process.

Van Pelt's process can be subdivided into several stages, although revision is a constant companion throughout the process. A first stage is that of rapid text expansion. It runs from the first to the 7th session. In this first expansion, the narrative grows chronologically, with consecutive events in the story being composed in their order of occurrence. Meanwhile, Van Pelt is revising her pre-existing draft in each writing session too. In session id7, the single note on the dramatic core of the situation, 'husband dead?', is deleted from the draft, and a scene in which this information is revealed (to Mustafa and the reader) is added at the bottom of the draft-so-far. The way in which the mother, son and the helpful stranger relate to each other has now been established too.

In session id8, she introduces a list of new notes at the top of the document with eleven shipping-related sayings and expressions. She also indicates through notes that she questions the beginning, and is unsure about how to end the story. Up to and including session 14, she reworks and makes modest additions within the existing scenes. For example, by session id12 she has already described the lunch, with the pirates banging on pots and pans and calling out one table at a time, and the mother and son approaching the counter and receiving full plates. Van Pelt now adds descriptions of the tables, as well as the food, adding 'baked' to fish and 'magical' to the vegetables. She also makes a characterisation move by adding that the mother waits until most people have been served, similar to how she waited until most tourists had moved away from the docks before she set out. She is portrayed as not fitting in with the other tourists and coolly observing them.
In session id14, the opening is changed by switching the order of two scenes, and situating the present moment of the narrative at a different point. After this session, Van Pelt prints her document and revises it on paper. We can see this as an important point in her process. She has now resolved the 'how to start' question from her notes. In sessions 15 and 16, she revises as well as completes the story, resolving the note about the ending at the end of session 16, at least for the time being. So after session id16, she continues on a full draft. She prints this version of her document too, and starts revising it on paper first, then implementing these changes in session 17. From session 17 throughout the remaining process, the textual changes made are smaller in scale and mostly stylistic, with a focus on word variation, clarifying references, and creating coherence.

Expectations

As Van Pelt incorporates extensive revision throughout her work process, I expect some semantic changes between all drafts. However, based on the key plot decisions regarding the underlying situation (the death of the husband), the ending and the beginning of the story, as well as on the shifts in document length, I distinguish a number of stages, each with a different focus. The first, text expansion stage, from session 1 up to 7, adds many scenes and events. At the end of that stage, two thirds of the final document length is achieved. I would expect to see substantial changes in document similarity during this stage. In the second stage, from id8 up to and including id14, she continues gradually filling out her story, often from within existing scenes, but also tackling the beginning of the narrative. In 15 and 16 she completes her first draft by adding an ending. In the final stage, from id17 onwards, many revisions still take place, but their scope is smaller than previously. I would expect this stage to show the smallest semantic shifts between consecutive drafts.
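
To make the comparison of consecutive drafts concrete, the sketch below computes, for each pair of successive drafts, the number of unique lemmas entering and leaving and a cosine similarity. It is a minimal illustration under stated assumptions, not the implementation used in this study: the drafts are assumed to be already lemmatized, the lemma lists are invented, and a simple bag-of-lemmas count vector stands in for the BERT document embeddings.

```python
import math
from collections import Counter


def lemma_turnover(old_lemmas, new_lemmas):
    """Unique lemmas that enter and leave between two drafts."""
    old_set, new_set = set(old_lemmas), set(new_lemmas)
    return {"entering": new_set - old_set, "leaving": old_set - new_set}


def cosine_similarity(u, v):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def compare_consecutive(drafts):
    """Compare each draft with its successor; `drafts` is a list of lemma lists."""
    rows = []
    for old, new in zip(drafts, drafts[1:]):
        turnover = lemma_turnover(old, new)
        rows.append({
            "entering": len(turnover["entering"]),
            "leaving": len(turnover["leaving"]),
            # Assumption: bag-of-lemmas counts replace a real document embedding.
            "similarity": cosine_similarity(Counter(old), Counter(new)),
        })
    return rows


# Hypothetical lemma lists for three successive drafts.
drafts = [
    ["moeder", "zoon", "boot"],
    ["moeder", "zoon", "boot", "eiland", "man"],
    ["moeder", "zoon", "schip", "eiland", "man"],
]
for row in compare_consecutive(drafts):
    print(row)
```

Under the expectations stated above, pairs from the expansion stage should show many entering lemmas and lower similarity, while pairs from the final stage should show little turnover and similarity close to 1.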
What is semantic similarity I’m interested in a tool that can find both additions and deletions of ‘meaning’ to the text/draft, which could take place through substitution, addition and deletion of text. I would like the tool to be able to pick up on the addition of a new scene, for example, or a shift in setting from indoors to outdoors. The study of meaning in texts is captured by what is commonly called ' semantics', and the tools I will present to measure differences between drafts are oriented towards semantic differences. Semantic similarity is defined as: “the measure of semantic equivalence between two blocks of text.” (Chandrasekaran & Mago, 2021 , p.2) This definition also captures an operationalisation; it is seen as something measurable quantitatively. Chandrasekaran & Mago further position semantic similarity as “one of the aspects of semantic relatedness”. (ibid.) Furthermore, in the operationalisation of semantic similarity, “methods asually give a ranking or percentage of similarity between texts, rather than a binary decision as similar or not similar.” The semantic relationship between several texts is measured in terms of ‘semantic distance’. In particular, it is often operationalised as distance in vector space (see 1.1), a method I am applying too - however, this captures only a very specific aspect of meaning in and through texts. Meaning arises not just from individual words, but from whole sentences, the text in its entirety, and from looking at the utterance/text in its context. (Kroeger, 2023 ) The selected measures will look at words and the whole texts, for optimal meaning disambiguation. The boundary between meaning and style is hard to draw. Verhagen ( 2012 ) points out that you can distinguish between objects (in the world, but also imaginary and abstract concepts) and the “choices of lexical items and grammatical constructions” used to talk about them. 
These constructions can be seen as part of a menu of multiple possible formulations. The chosen formulations link to ‘construal’: the way we choose to present an object in language to accomodate our audience. For example, “Marcel”, “my neighbour” and “a middle-aged man carrying a large suitcase” may all refer to the same person, but are more or less adequate ways of describing him depending on what we know and what our audience knows. Although semantics would then mostly be concerned with underlying objects, and style mostly with construals: “In fact, semantic analyses should be able to support the explanation of stylistic phenomena and the experience of a piece of discourse exhibiting a particular style may be used as evidence supporting or contradicting a semantic analysis.” (Verhagen, 2012 ). The definition of style given by Herrmann, Van Dalen-Oskam & Schöch (2015) was created with both computational and close-reading methodologies in mind: “Style is a property of texts constituted by an ensemble of formal features which can be observed quantitatively or qualitatively.” They incorporate semantics into stylistics: under formal features they list “linguistic features at the level of characters, lexicon, syntax, semantics (Stamatatos 2009, 4), but also features going beyond the sentence, such as narrative perspective or textual macro-structure; we differ from some previous definitions in that we conceive of stylistic features as explicitly defined and clearly identifiable.” (p. 44) There are multiple computational ways to analyse style in literary texts (stylometry). A common one is measuring the relative frequency of high-frequent words in a baseline corpus and a target text or texts. This approach is well suited for authorship attribution, genre and diachronic analyses. Looking at lexical diversity and readability/textual complexity are two other methods often used. (Neal et al, 2017 ). 
When selecting high-frequency words (for English, these often include words like ‘the’ and ‘it’), the concepts and terms that make each text unique are left out of the equation. High-frequency words like ‘the’ and ‘it’ do not carry the bulk of the meaning of any text. However, the fact that this analysis is good at distinguishing different genres indicates that the distribution of these ‘small’ words does capture themes and topics, and that the incorporation of semantics into a working definition of stylistics makes sense. This study works with measures that look at word and sentence (dis)similarity; I do not exclude style as part of the explanation of any (dis)similarities found.

1.1 Measuring similarity of literary texts

Previous studies have used computational methods to assess semantic similarity in literary texts for various purposes. Some studies have looked within texts to find differences between segments of the same work. Semantic distance has played an important part in genre analysis and genre attribution too. Furthermore, diachronic shifts in both the meaning of words and the popularity of topics have been studied using measures of semantic distance.

Within texts

Geyer et al (2020) studied the ‘cut effect’ in haiku, where readers grapple with two contrasting sections of a poem and (re)read one of the parts (the ‘fragment line’) much more intensely in order to bridge the juxtaposition into a whole. The manually placed ‘cut’ matched the semantic distances between lines very well, demonstrating the validity of the method. Fan et al (2023) see semantic distance between parts of a short story as a proxy for creativity. Their subjects were prompted with an incomplete story and asked to write a creative ending. Using Word2Vec, both global (whole text) and local (a rolling window of about a sentence long) similarity were calculated.
Here, for the whole text, ‘similarity’ meant the space the text took up in the multidimensional vector space. Human ratings of originality correlated with both the global and local distance measurements. Within-text similarity was also measured by Szemes & Nagy (2024). They worked with several of Shakespeare’s plays to investigate which characters were ‘innovative’, in the sense that they brought new information to the play. Working with sentence-based BERT cosine similarity, each line of each character was compared with the preceding lines from all other characters. This allowed for a kind of social network visualisation, where the strength and direction of the relationships between the characters was based on how similarly they spoke and who echoed whom.

Genre

Semantic similarity measures have also successfully been applied to detect genre boundaries. Van Cranenburgh, Van Dalen-Oskam & Van Zundert (2019) created a semantic profile of literariness. They use both intra-document and inter-document measures. To test the hypothesis that literary fiction is more lexically rich than other genres of fiction (such as sci-fi and romance), the intra-document paragraph vector (or ‘doc2vec’) method, which preserves some of the co-occurrence information of words, was used to measure the semantic distances between different sections of a novel. The width of the semantic space occupied correlated with the literariness scores that a reader survey provided. The same held true for three inter-document measures, which looked at the differences and similarities between novels belonging to different genres. Sobchuk & Šela (2024) looked at ‘thematic’ similarity for genre detection of novels. They compared different methods on two corpora: first a small corpus with four pre-tagged genres (detectives, fantasy, romance, science fiction), followed by a large and untagged corpus from Gutenberg.
What is quite unique to their paper is the variety not just of analytical methods for assessing similarity, but also of pre-processing steps and textual features. In total they tested 291 combinations from the three ‘menus’, each time on a random sample of 100 text fragments. The final step in their analytical pipelines was an unsupervised clustering. The performance of each pipeline was tested against the genre tags from the small corpus. The ‘winner’ was a pipeline where the texts were strongly pre-processed for theme (by lemmatizing, then only using nouns, verbs, adverbs and adjectives, removing named entities and applying lexical simplification), followed by a Doc2Vec, LDA or bag-of-words approach, all of which led to good outcomes. As a similarity measurement, Jensen-Shannon outperformed the other metrics, although cosine similarity worked very well in combination with doc2vec.

Ehrmanntraut et al (2022) compared realist, naturalist and modernist German poetry on between-genre and within-genre textual similarity. Following Bär et al (2011), they used a multidimensional model to operationalise textual similarity. Next to Bär’s content, style and structure, they added emotion, as they found this to be a relevant category for poetry. With several human annotators, they rated the poems’ similarity on all dimensions. Ratings of content similarity and style similarity correlated highly. They then tested the performance of several models: tf-idf (based on words), BERT on words, and sentence-BERT, with each poem treated like a sentence and compared in its entirety to other poems. The cosine similarity measure based on sentence-BERT embeddings showed the best results out of the box, with a correlation of 0.82 with human annotation. Results from the computational similarity ratings were then mapped onto a heuristic model of distances between poems within and between three genres (realism, naturalism, modernism).
This supported the claims made in literary scholarship about the shift from realism to modernism.

Diachronic change

Beausang (2021) proposed the measure 'diachronic Delta', composed of relative word frequencies, in a corpus where English literary texts were grouped by year of production for a period spanning the 18th to early 20th century. Underwood (2019) had previously demonstrated that gradual change rather than drastic innovation shaped literary history over the centuries. Beausang tested this finding by searching for 'break years' where the word frequencies were both significantly different from the years before and similar to the years following. For the genre of prose, no break years were found. A similar approach was taken by Griebel et al (2024), who tracked cultural change in three fields of study, including fiction. They applied document embeddings but indicated these needed to be finetuned to perform adequately on their (historical) sources. Like Beausang, they use a formula from Barron et al (2018) to conceptualise change: novelty - transience = resonance. A year or period with resonance is both markedly different from the preceding period and similar to the following period, showing an 'anticipation of future change' (Griebel et al, 2024, p. 232). Working with a corpus similar to Beausang's, of 18th to 20th century English-language fiction, they applied topic models and document embeddings. Both measures worked similarly well.

Writing process materials can also be approached as a type of diachronic change, and the concept of 'resonance' could be applied to a writing process corpus as well. Work sessions in which the text diverges more from its predecessors than earlier versions did from theirs, as well as sessions that are 'already' much like later versions, would both be indicators of a textual development worthy of further inspection.
These methods, however, require a much larger corpus than the current case study can offer.

It is encouraging that several studies have shown a correlation between computational approaches to textual similarity and human expert annotation. Document and sentence embeddings are frequently used in this field and thus show promise for application to drafts. Only a few studies so far have applied computational tools to measure textual development between drafts. They are presented in the next section.

1.2 Similarity between drafts

The question of how drafts differ from each other is traditionally answered through hermeneutical practices. In the genetic critic's workflow, these consist of text transcription (to a digital format), followed by pairwise (automatic or manual) collation, followed by reading and interpretation of the textual differences (Grésillon, 2016). Once a set of drafts and materials has been brought into chronological relations and represented in a digital format, quantitative visualisations and analyses do become possible. In the Beckett Digital Manuscript Project (Van Hulle & Nixon, 2024), for example, a visualisation is given of the number of words that stayed constant, were deleted, were added and were modified between each pair of consecutive drafts of each work. In the field of writing process research, similar statistics are used to characterise digital keystroke-logged writing processes (e.g. Inputlog's summary analysis, Leijten & Van Waes, 2013).

Santosh et al (2024) developed a ‘condensed bird eye’s view of edits’ in the form of a ‘thematic summary of changes’ to help collaborating authors quickly see the overarching effects of what the track changes function shows in great detail for each individual revision. Their purpose was to help reviewers and co-authors approve or reject batches of thematically linked edits with one click, rather than reading through each individual edit.
They developed a human-written set of 45 thematic summaries, based on reviewing a corpus of academic papers on the subject of natural language processing, where each paper had two versions: a submitted draft and a published document. They then let GPT-4 create thematic descriptions of each edit (in the form of a diff output) and cluster the edits into themes. GPT-4 was not able to do this task: the human annotations and the algorithmic results did not overlap sufficiently. The LLM was not able to provide adequate summaries of the edits, and the concept of ‘theme’ was probably left too open-ended in the process.

A fully algorithmic approach was taken by Ketzan & Schöch (2021) with their tool 'Coleto'. Coleto was designed to compare three (published) versions of a novel (The Martian by Andy Weir). It is a software pipeline that starts off with an automatic collation of two text versions and then tags each edit found. The edit classification has potential for a semantic analysis. There are two large categories: script-identifiable edits and semantically open edits. The latter are subclassified into deletions, insertions and changes (or: substitutions at the same location). The ‘changes’ are further subdivided based on Levenshtein distance, a measure of how many characters need to be changed to transform one string into another string. A ‘major’ edit is defined as having a Levenshtein distance of more than 5, but naturally this threshold is not set in stone. The distinction between major and minor could be a rough indicator of semantic distance too (see Sarkar et al, 2016), although Han et al (2021) demonstrated that deep learning methods are more effective at assessing semantic distances than Levenshtein distance is.
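To make the major/minor distinction concrete, the following is a minimal sketch of a character-level Levenshtein distance with the threshold of 5 mentioned above. It is not Coleto's actual code; the function names `levenshtein` and `classify_change` and the example strings are hypothetical and only serve as illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


# Threshold taken from the text: a distance of more than 5 counts as 'major'.
MAJOR_THRESHOLD = 5


def classify_change(old: str, new: str, threshold: int = MAJOR_THRESHOLD) -> str:
    """Label a substitution as 'major' or 'minor' (hypothetical helper)."""
    return "major" if levenshtein(old, new) > threshold else "minor"


# Invented example substitutions, not taken from any corpus.
small_edit = classify_change("baked fish", "fried fish")
large_edit = classify_change("ship", "pirate vessel")
```

As the two invented examples suggest, the label depends only on how many characters differ, not on how far apart the meanings are, which is why the measure can be no more than a rough indicator of semantic distance.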
1.3 Use case: Ellen Van Pelt's story Dauphin

Ellen Van Pelt (1980) is a Flemish writer and psychologist who has published two novels (Drift, 2015, Zwaluwstaarten, 2025), a biography of Flemish writer Roger Van de Velde (Deze wereld is geen ergernis waard, 2020) and many short stories. She participated in the research project Track Changes, in which writers recorded their digital writing process using keystroke logging software. Inputlog (Leijten & Van Waes, 2013), the specific software we used, tracks key presses, mouse actions and screen/window switches, and operates on Microsoft Word on Windows operating systems. Writers can work in Word while Inputlog captures their cursor position, the length of their document, and characters typed and deleted, all with accurate timestamps. Inputlog stores this detailed process information in a tabular format, and also captures text snapshots once a minute. Van Pelt installed Inputlog on her own laptop, on which she worked at her own convenience over a period of three months, from November 2020 until mid-January 2021 (18 writing sessions, 15 and a half hours in total). In January 2023 she made a few additional changes to her story, which were no longer registered by the keystroke logger. This final version has also been included in our corpus.

In Van Pelt’s story, Dauphin, a mother and young son are taking an all-inclusive vacation to Tunisia. It is told in first person, from the mother's perspective. The discourse is mostly chronological and captures the events during a single holiday excursion, a day trip from Djerba to Flamingo Island on a ‘pirate ship’. The mother feels totally out of place. [note 1] It is the first summer that the mother has to be ‘both father and mother’. The trip was a gift from her father, and not something she would have picked out herself. On the ship, one man stands out from the crowd because of his brooding demeanour and his dark-coloured, warm clothes despite the hot weather.
On Flamingo Island, the son becomes too tired to walk and the mysterious man from their ship offers a piggyback ride. Then, at the beach, mother and son collect sea shells. Over lunch, Finn (the son) accidentally throws sand onto the plate of Mustafa (the brooding man). After the reprimand from his mother, he tells Mustafa that his father has died. Mustafa offers consolation. Sailing back from the island, they see two dolphins jumping out of the water. Finn is delighted and says ‘they are always happy’, while the mother finds them beautiful and thinks about how dolphins are the only animals that commit suicide. Mustafa smiles at her for the first time.

Textgenetic process description

The working document and the keystroke logging files show that Van Pelt started out her project with some notes and a pasted text from a travel agency’s website about an excursion to a Tunisian island. Both the textual development and the nature of the notes in the working document show that Van Pelt did not have a fully fledged design for her story in advance of composition. The scenes, the interactions between characters and the backstory of the main character were all developed during writing. No drastic changes or cuts were made to the plot, characterisation or setting in the process. Van Pelt's process can be subdivided into several stages, although revision is a constant companion throughout the process. The first stage is one of rapid text expansion. It runs from the first to the 7th session. During this first expansion, the narrative is extended chronologically, with consecutive events in the story being composed in their order of occurrence. Meanwhile, Van Pelt is revising her pre-existing draft in each writing session too. In session id7, the single note on the dramatic core of the situation, 'husband dead?', is deleted from the draft, and a scene in which this information is revealed (to Mustafa and the reader) is added at the bottom of the draft-so-far.
The way in which the mother, son and the helpful stranger relate to each other has now been established too. In session id8, she introduces a list of new notes at the top of the document with eleven shipping-related sayings and expressions. She also indicates through notes that she questions the beginning, and is unsure about how to end the story. Up to and including session 14, she reworks and makes modest additions within the existing scenes. For example, by session id12 she has already described the lunch, with the pirates banging on pots and pans and calling out one table at a time, and the mother and son approaching the counter and receiving full plates. Van Pelt now adds descriptions of the tables, as well as the food, adding 'baked' to fish and 'magical' to the vegetables. She also makes a characterisation move by adding that the mother waits until most people have been served, similar to how she waited until most tourists had moved away from the docks before she set out. She is portrayed as not fitting in with the other tourists and coolly observing them. In session id14, the opening is changed by switching the order of two scenes, and situating the present moment of the narrative at a different point. After this session, Van Pelt prints her document and revises it on paper. We can see this as an important point in her process. She has now resolved the 'how to start' question from her notes. In sessions 15 and 16, she revises as well as completes the story, and at the end of session 16 she also resolves the note on the ending, at least for the time being. After session id16, she therefore continues working on a full draft. She prints this version of her document too, and starts revising it on paper first, then implements these changes in session 17. From session 17 throughout the remaining process, the textual changes made are smaller in scale and mostly stylistic, with a focus on word variation, clarifying references, and creating coherence.
Expectations

As Van Pelt incorporates extensive revision throughout her work process, I expect some semantic changes between all drafts. However, based on the key plot decisions with regard to the underlying situation (the death of the husband), the ending and the beginning of the story, as well as on the shifts in document length, I see a number of stages that each have a different focus. The first, text expansion stage, from session 1 up to 7, adds many scenes and events. At the end of that stage, two thirds of the final document length is achieved. I would expect to see substantial changes in document similarity during this stage. In the second stage, from id8 up to and including id14, she continues gradually filling out her story, often from within existing scenes, but also tackling the beginning of the narrative. In 15 and 16 she completes her first draft by adding an ending. In the final stage, from id17 onwards, many revisions still take place, but their scope is smaller than previously. I would expect this stage to show the smallest semantic shifts between consecutive drafts.

2. Method(s)

As stated above, there is no standard metric available for comparing different drafts (and very little computational work has been done in general). Ketzan & Schöch (2021) used a Levenshtein distance on the edits only, and Santosh et al (2024) combined human annotation with LLM interpretation to group edits into themes, with mixed results. In the broader field of literary studies, Doc2Vec and BERT cosine similarity measures are often applied to measure semantic distance, and align with human judgement of textual similarity. As this is an exploratory study, I have opted both to apply a whole-document measure and to isolate just the textual changes. In order to compare the computational methods with a close-reading approach, I have manually annotated the events in the drafts based on Vauth et al (2021), leading to a narrativity score, as will be discussed in section 2.4.
2.1 Document length

Using Frog (Van der Sloot et al, 2018), a parser for modern Dutch, the texts were tokenized and lemmatized. As a basic first proxy for semantic change, the difference in word (token) count of each pair of drafts was used.

2.2 Counting lemmas

A genetic dossier collected through keystroke logging starts off with an empty document that grows and develops into a fully formed text, which may then also be reworked and rewritten. The individual drafts are therefore of varying sizes and in varying states of completeness. Simply measuring shifts in document size can give a first indicator of meaningful changes. However, isolating just the parts that have been added to or deleted from the draft in between versions allows me to focus on the semantic shifts more clearly. I have furthermore opted for the isolation of words (and their lemmas) as a unit of analysis, instead of the entire fragments that came and went. For each consecutive pair of documents two lists were made: one of lemmas unique to the first version (so removed from the document), and one of lemmas unique to the second version (so added to the document). Then, a simple count was performed of how many lemmas were brought into and taken out of the document.

2.3 Document cosine similarity

On the basis of a chronological ordering of the drafts, semantic similarity is measured for each pair of texts through the document cosine similarity using an off-the-shelf, pre-trained Dutch BERT model. BERT looks at words in their sentence context, and then creates a vector for each word in multidimensional space, based on the semantic similarities between words. These similarities are based on the similarities of their contexts of occurrence, that is, the words usually surrounding the target word. BERT is available for multiple languages, and these numeric representations have been created using a large corpus of texts.
I have used the Dutch model GroNLP/bert-base-dutch-cased, loaded through the Python transformers package, and then created a document-based embedding. The pairwise cosine similarity was calculated using the sklearn.metrics.pairwise module. So when using this tool to measure 'semantic' differences, what we are looking at is how far away the words of one draft are from the words in the following draft. This may not always align with human interpretation. To give an example, in this multidimensional space, 'king' will be closer to 'queen' than to 'jester', whereas for a gender-themed analysis of textual changes, you might consider a shift from king to jester less important than a shift from king to queen.

2.4 Narrativity measure

To compare the computational approaches to human interpretation of semantic shifts between drafts, we opted to annotate the differences between the drafts with a focus on narrativity. As the drafts in this corpus start from an empty document and work up to a completed narrative, shifts in narrativity say something about the furthering of the plot and the expansion of the world-building. By assessing the types and number of events that take place in each text version, we obtain a metric that gives a holistic score of the 'eventfulness' or narrativity of each version of the draft. As a basis for this annotation, we used Gius & Vauth's (2022; see also Vauth et al, 2021) work on event classification. They see events as the 'minimal narrative units' (p. 333) and take each verbal phrase (a finite verb and everything connected to it) as a unit of analysis. This small-scale approach is suitable for our draft corpus, where the changes between drafts may lie in a few sentences only, making a holistic or macrogenetic characterisation of the draft less suitable. They distinguish four categories of events: non-events, stative events, process events and change-of-state events.
Non-events are those verbal phrases that do not represent an event at all, but rather general facts, questions or counterfactual statements (like 'I should have seen it coming.'). Stative events are descriptions that do not encompass a time or duration aspect, and often provide information about the setting of a story. An example is 'His trousers were black.' Process events consist of actions and mental processes, such as walking, thinking, seeing or speaking. Change-of-state events contain 'physical or mental state changes'. These four types have different intensities of narrativity. The change of state is the most eventful, and most likely to propel the plot forwards, followed by the process events, and then the stative events, with the non-events coming in last, as they do not contain events that pertain to the plot at all. Gius & Vauth (2022) then add numerical weights to each type, to calculate shifts in narrativity throughout a set of novels. I have used their scoring system to attribute weights in the same way. Each draft then receives a narrativity score based on the number of verbal phrases for each type of event, times the weight of that type of event. Increases and decreases of this score between consecutive pairs of drafts indicate shifts in narrativity.

2.5 Data

The raw keystroke corpus of Ellen Van Pelt's work process was used. Although the keystroke source material captures all textual changes made in each session, for this study I am only looking at the changes between the draft at the start and at the end of each writing session. I selected the Word files from the end of each writing session, which came to 19 files in total. Then, consecutive pairs of documents were used for the measurements. Version 1 was compared with version 2, version 2 with version 3, et cetera. The documents contained the story draft, but most of them also a modest amount of notes.
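The lemma count from section 2.2, applied to consecutive pairs as described in 2.5, can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's own script: it assumes each draft has already been lemmatised (in the study this was done with Frog, which is not reproduced here), the function name `lemma_turnover` is hypothetical, the toy lemma lists are invented, and the pair labels are built from list positions rather than from the session ids used in the figures.

```python
def lemma_turnover(drafts):
    """Count lemmas entering and leaving the text for each consecutive pair.

    drafts: list of lemma lists, in chronological order.
    Returns a list of (pair_label, n_added, n_removed) tuples.
    """
    results = []
    for index in range(len(drafts) - 1):
        earlier = set(drafts[index])    # unique lemmas of the older draft
        later = set(drafts[index + 1])  # unique lemmas of the newer draft
        added = later - earlier         # lemmas new to the document
        removed = earlier - later       # lemmas that disappeared entirely
        label = f"{index + 1}_{index + 2}"
        results.append((label, len(added), len(removed)))
    return results


# Invented toy data: three tiny 'drafts' as lists of Dutch lemmas.
toy_drafts = [
    ["schip", "zee", "piraat"],
    ["schip", "zee", "piraat", "eiland", "flamingo"],
    ["boot", "zee", "piraat", "eiland", "flamingo"],
]
turnover = lemma_turnover(toy_drafts)
```

Because the comparison works on sets, a lemma only counts as removed when its last occurrence leaves the document, which matches the observation that deleting one of several instances of a word does not change the count.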
3 Exploration/Results

3.1 Document length & Count of unique lemmas

For the tokenized texts, I performed a word count first; this already gives some insight into potentially pivotal moments in the genesis. As seen in Fig. 1, over the first 7 sessions, the document size gradually expands. This is followed by a series of sessions where the size is much more stable (8–14). In session 15, the draft increases in size again, and following a small expansion in session 16, the draft remains fairly similar in length for the final sessions.

[Figure 1. Document word count for each writing session]

[Figure 2. Barplot with the count of unique lemmas coming into (black bars) and disappearing from the text (grey bars) per session, on the filtered lemma set: only nouns, adjectives, verbs & adverbs.]

Following Sobchuk & Šela (2024), the set of unique lemmas going in and out of the document in each session has been further pared down to increase semantic load; only nouns, verbs, adjectives and adverbs have been taken into account. The counts of unique lemmas entering and exiting the document, as shown in Fig. 2, show parallels with the word count plot above. In the first sessions, many more new lemmas enter the document than leave it. We also see the marked influx of new lemmas in session 16 (the 15_16 comparison set), which runs parallel with a large expansion of the document length, as well as the completion of the first draft. Comparing the two metrics, we see that the period between sessions 8 and 16 is characterised by substitutions (of unique lemmas) but not by text expansion. We also see that in the later stage, sessions 17–22, although shifts in the document length are equally modest, far fewer unique lemmas are entering and exiting the drafts. This suggests a focus on proof-reading and other less semantically rich textual changes, which is supported by the close reading.
Van Pelt for example switches out instances of words that she used multiple times, such as 'pirates' [piraten] and 'ship' [schip], to increase lexical variety when referring to the same concept. She also re-orders sentences without shifts in lemmas, for example in session 21: Door zijn zonnebril kan ik het niet met zekerheid weten ('Because of his sunglasses I cannot know for sure'). ('Al' still occurs in other locations, so removing it here does not remove a unique lemma from the document.) Based on this lemma visualisation, the textual scholar could select those writing sessions where many lemmas appear in or disappear from the document as entry points into the textual development. In contrast to the document length visualisation, the analysis of unique lemmas allowed for a distinction between two periods of revision. However, these counts do not yet indicate the semantic distance of those lemmas; for that I have applied a pairwise cosine similarity measure based on BERT.

3.2 Pairwise cosine document similarity

The document texts at the end of each writing session were compared with their predecessor. [note 2]

[Figure 4. Barplot showing the pairwise cosine similarity of the document texts.]

Figure 4 shows these pair-based similarity scores. The first marked difference is between sessions 1 and 2. As the draft is still very short at the end of session 1, it has more than doubled in size by the end of session 2, leading to a proportionally large number of new lemmas coming into the document too. [short texts, type-token ratio, etc] The other patterns are more telling. The middle stage of extensive revision, starting from 6_7 and running until 14_15, shows the lowest pairwise similarity. From the unique lemma counts, we know that equal numbers of lemmas are entering and exiting the document here, whereas in the first stage of expansion, many more new lemmas are added. Session 8 (pair 7_8 in the graph) and session 15 (pair 14_15) show the biggest differences in the cosine similarity with their predecessor.
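The pairwise comparison reported here can be illustrated with a small, dependency-free sketch. It is not the pipeline used in the study, which relied on the transformers package and sklearn.metrics.pairwise. Two things are assumptions made for illustration only: the document embedding is taken to be the mean of the token vectors (the pooling strategy is not specified in the text), and the three-dimensional toy vectors stand in for real BERT output. The helper names `mean_pool`, `cosine_similarity` and `pairwise_consecutive` are hypothetical.

```python
import math


def mean_pool(token_vectors):
    """Average equal-length token vectors into one document vector.
    Mean pooling is an assumption; the study's pooling is not specified."""
    dimensions = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[d] for vec in token_vectors) / count
            for d in range(dimensions)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def pairwise_consecutive(doc_vectors):
    """Similarity of each document vector with its direct predecessor."""
    return [cosine_similarity(doc_vectors[i], doc_vectors[i + 1])
            for i in range(len(doc_vectors) - 1)]


# Made-up 3-dimensional 'token vectors' for three drafts.
toy_tokens = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0]],
]
doc_vectors = [mean_pool(tokens) for tokens in toy_tokens]
similarities = pairwise_consecutive(doc_vectors)
```

In the toy data the second draft adds a token that points in the same overall direction as the first draft, so the first pair is almost identical, while the third draft points elsewhere entirely and the second pair scores close to zero. Note that cosine similarity ignores vector length, so under mean pooling a draft that merely grows in the same semantic direction does not lower the score.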
In session 8, Van Pelt removes a distinctive passage that served as a note rather than as part of the narrative, namely a text from a tourist brochure on the pirate-boat excursion, which she had copied from a travel agency's website and pasted into the document. She does integrate information from this section into her narrative, but many of the unique lemmas leaving the document in this work session were still part of that brochure fragment. She also adds a distinctive section, with 11 shipping terms and expressions, again as a note rather than as part of the narrative. She adds two new segments to the narrative itself, one on reading, and one on the tourists' reactions during a comedy performance by the pirates, and further expands a scene where the tourists are being served lunch, which explains the food-related lemmas entering the text at this point. It is worth further study to see if this is a genre effect on the similarity measure, as two distinctive non-story parts are playing a role here. A similar shift takes place in session 15 (14_15), where Van Pelt removes all remaining notes from the draft: both the shipping terms and the short notes on the ending and beginning. However, she integrates many of the shipping expressions into the story, by substitution. She also adds short vignettes of two new characters, both of whom speak, which is represented in indirect speech. A trinket seller on the beach as well as an angry tourist lady are given the stage. This is the last time in the process that new characters are introduced. Also, in a new scene at the end of the draft so far, the main characters find themselves back on the pirate boat; I would not expect too many new words to be introduced here, as there have been scenes on the boat before.

3.3 Narrativity measure

Using the narrativity annotation system from Vauth et al (2021), all verbal phrases were manually annotated as one of four types of events.
Each type had a different weighting of eventfulness, ranging from 0 to 7. Non-events received 0 points, stative events 2, process events 5 and change-of-state events 7 points. Some example sentences from the drafts can be found in Table 1.

Event type & weighting | Sentence from the corpus [note 3] | English translation
Non-event (0 points) | Misschien verandert dat wanneer Finn volgend schooljaar naar het eerste leerjaar gaat. | Maybe that will change when Finn starts first grade next year.
Stative event (2 points) | Ons schip heet Elysa. | Our ship is called Elysa.
Process event (5 points) | 'Wanneer zien we de flamingo's, mama?' vroeg Finn. | "When are we going to see the flamingos, Mom?" Finn asked.
Change of state event (7 points) | Vanmorgen pikte een tourbus ons op vlak voor het hotel. | This morning a tour bus picked us up in front of the hotel.

[Table 1: Example sentences for each type of narrative event.]

[Figure 5. A count of each type of narrative event for each session.]

The change-of-state events can be considered to be most closely aligned with plot development. From Fig. 5, we can glean that in the first few sessions, all types of events are added to the draft, but then from session 3 onwards, Van Pelt mainly adds change-of-state and process events, the most ‘narrative’ categories. After the completion of the first draft, in session 16, relatively many process events are still added to the draft. These often consisted of dialogue and actions like ‘walking’ or ‘seeing’.

[Figure 6. Bar plot with the difference in narrativity scores between consecutive pairs of text versions, where the score of the earlier version is subtracted from the score of the later version.]

Adding up all scores for each session gave a global narrativity score. Then, for each consecutive pairing of sessions, the score of the older session was subtracted from that of the newer session. Figure 6 visualises the resulting shifts in overall narrativity score between each pair of sessions.
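The scoring just described can be written out as a short sketch. The weights (0, 2, 5 and 7) are the ones reported above; everything else is an assumption made for illustration: the label strings, the function names `narrativity_score` and `narrativity_shifts`, and the toy annotations, which are invented and not taken from the corpus.

```python
# Weights per event type as reported in the text.
EVENT_WEIGHTS = {
    "non_event": 0,
    "stative": 2,
    "process": 5,
    "change_of_state": 7,
}


def narrativity_score(event_labels):
    """Sum the weights of all annotated verbal phrases in one draft."""
    return sum(EVENT_WEIGHTS[label] for label in event_labels)


def narrativity_shifts(annotated_drafts):
    """Score of the newer draft minus the score of the older draft,
    for each consecutive pair of drafts."""
    scores = [narrativity_score(labels) for labels in annotated_drafts]
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


# Invented annotations for three tiny drafts.
toy_annotations = [
    ["stative", "process"],                          # 2 + 5 = 7
    ["stative", "process", "change_of_state"],       # 2 + 5 + 7 = 14
    ["stative", "stative", "non_event", "process"],  # 2 + 2 + 0 + 5 = 9
]
shifts = narrativity_shifts(toy_annotations)
```

The third toy draft has more annotated phrases than the second but a lower score, because a change-of-state event was swapped for lower-weighted types. A negative shift therefore does not have to mean that text was cut.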
The first stage, text expansion (sessions 1–7), has the highest growth in eventfulness, and sessions 19 and onwards, which were part of the final revision stage with proofreading and lower-level revisions, have the lowest scores. The sessions in between those two distinguishable stages are not as easily subdivided. It is useful to see that after completion of the first draft, in session 16, the eventfulness still increases. By zooming in on the individual event types (Fig. 5) we can tell that this increase is mainly caused by process events. The overall pattern is very similar to Fig. 2, the count of unique lemmas. In particular, the lemmas coming into the document seem highly aligned with the narrativity scores. One marked difference occurs for the pair 11_12, though: it has a similar number of lemmas going in and out of the document as pair 8_11, a similar cosine document similarity and a similarly small fluctuation in document length, but quite a different narrativity score, with the eventfulness showing a modest increase in 8_11 but a decrease from 11 to 12. On closer inspection of the drafts, the total number of sentences does increase over session 12, but several substitutions swap the most plot-advancing 'change events' (7 points) for less active events, such as statives (2 points) and non-events (0 points) (see Fig. 5). One example is the following rewrite, where instead of the change event of food actively being placed on the plates of the main characters, a more descriptive, stative version is given. With these types of revisions and additions, Van Pelt prioritises setting the scene over driving the plot forwards.

Session 11: We krijgen een bord, waarop ze vis en couscous met groenten scheppen. [7]
We are given a plate on which they serve fish and couscous with vegetables. [7]

Substitution in Session 12: Net als in het buffetrestaurant in ons hotel, wordt het eten hier voor jou op je bord geschept.
[2] Er is gebakken vis, couscous en een [s]toverij (footnote 4) van groenten. [2]
Just like in the buffet restaurant in our hotel, the food here is served to you on your plate. [2] There is fried fish, couscous and a vegetable stew. [2]

The narrativity scores provide a meaningful portrait of the story’s plot development, but seem to overlap with the patterns shown for the unique lemmas. In the conclusion, the four measures will be compared more closely to assess which ones would complement each other the most.

Conclusion

In this study, computational methods for measuring semantic changes were applied to a chronological series of short story drafts. Based on close reading, we expected certain sessions to contain a more drastic semantic shift than others. With the count of unique lemmas entering and leaving the documents, the final proofreading stage was easily distinguishable. The document similarity measure picked up on the revision-rich stage, as the stage where the semantic shifts between sessions were consistently larger than in either the stage focussed on text expansion or the stage focussed on proofreading. The narrativity scores presented quite a similar picture to the counts of new lemmas coming into the document, and clearly differentiated the first stage of text expansion, where narrativity increased a lot each session, from the final proofreading stage, where few fluctuations in narrativity take place. The revision-rich stage had a more mixed presentation.

In order to get an idea of the overlap between these measures, I computed a correlation matrix over the shifts in document length, the unique lemma counts, the narrativity score and the cosine similarity score. The differences in narrativity scores and in word count had a high correlation (0.92, p < .001).
Word count differences also correlated highly with the count of unique lemmas entering the document (0.94, p < .0001). The narrativity scores, too, correlated highly with the number of unique lemmas coming into the document (0.82, p < .001). The cosine document similarity correlated with the counts of lemmas both in (-0.66, p < .01) and out (-0.50, p < .05) of the document, but not with the narrativity score or the difference in word count. The narrativity score, for this specific corpus of drafts, seems to pick up more on plot expansion, whereas the cosine similarity mainly flagged the stage of intensive revision. Concerning the operationalisation of semantic change, the lack of correlation between the narrativity score and the cosine similarity supports the idea that these two variables measure different, complementary aspects of meaning in the text. The counts of unique lemmas entering and exiting the document formed a bridge, as they correlated with all other measures.

All three computational measures picked up on parts of the macrogenesis of this story. Looking at the sessions where the semantic change was highest also offered relevant insights into the textual history. However, although all measures brought something to the table, they did not measure the same 'thing'. The narrativity score was designed to track plot and plot development; the word embeddings measure distances between document vectors in a multidimensional vector space. The counts of unique lemmas, a homemade measure added for its simplicity, showed partial correlations with both narrativity and cosine similarity. There was no clear 'winner' in terms of capturing the aspects of the macrogenesis that were found through close reading, but all measures provided entry points into the draft corpus by highlighting those writing sessions that stood out from the others in one way (plot, text expansion) or another (intense revision, or proofreading).
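As an illustration of the correlation analysis reported above, the following minimal sketch computes a pairwise correlation matrix over per-pair measures. It assumes Pearson's r, which the text does not specify, and it does not compute p-values. All measure names and values are hypothetical, not the data of this study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(measures):
    """Nested dict with the correlation of every measure with every other."""
    names = list(measures)
    return {a: {b: pearson(measures[a], measures[b]) for b in names} for a in names}


# Hypothetical values, one entry per consecutive pair of drafts.
measures = {
    "word_count_diff": [120, 80, 10, -5, 40],
    "lemmas_in": [30, 22, 4, 1, 12],
    "cosine_similarity": [0.99, 0.97, 0.98, 0.995, 0.96],
}

matrix = correlation_matrix(measures)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

With only one value per pair of consecutive sessions, such a matrix rests on a small sample, so coefficients of this kind are best read as exploratory.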
As the narrativity measure was hand-annotated and, in this specific case, correlated with document growth, it would cause scalability issues when looking at semantic shifts in a much larger corpus. The cosine similarity based on document embeddings is a well-established method in digital literary studies which had not previously been applied to draft materials. It was already known to be good at distinguishing textual genres, and it appeared to do so in this corpus as well, by flagging sessions where notes in a specific style were added or deleted. It was rather intriguing that the sessions focussed on revision and expansion of the existing scenes were less similar to their predecessors than those sessions where the story was expanded faster by adding new material in the form of new scenes, furthering the plot. Finally, the count of unique lemmas, although it is a quirky and simplistic idea, held up rather well compared with the more established methods. By reading through the lists of unique lemmas themselves, the researcher can also connect the quantitative results with the close reading.

As is perhaps often the case, I found the instances where a writing session was different from the rest, or where two analyses presented contrasting findings, to be just as informative about the writing process of Ellen Van Pelt as the search for macrogenetic stages and global impressions. For example, the difference in narrativity score between 8_11 and 11_12 suggests that we can tell when the forward drive of plot development is paused by looking at the narrativity score, even when the other measures do not show differences between these two pairs.

This study was a first exploration of the possibilities of computational methods for text-genetic materials. Its findings are tightly connected to the specific case study, which had a high amount of revision throughout the entire process.
Despite this high frequency of revision, no drastic changes were made; the plot was expanded gradually, no large sections were deleted from the drafts, and no big changes were made to the characters, setting, events, perspective and timing. Future studies could curate, or perhaps even construct, artificial drafts that incorporate more marked changes, to check whether the semantic measures would pick up on those adequately.

Declarations

The author has no relevant financial or non-financial interests to disclose.

Author Contribution

F.B. conducted the research and wrote the manuscript.

Acknowledgement

This research was performed while I was a post-doc at Huygens Institute (KNAW), Amsterdam, the Netherlands. I would like to thank Karina van Dalen-Oskam for mentoring me during this time. My post-doc was part of the CLS-Infra project, funded by the European Union through its Horizon 2020 program, under grant agreement No 101004984.

Data Availability

All data used for my analyses has been published in the digital edition https://nanogenesis-digital.github.io/index.html, which was created by Lamyk Bekius.

References

Barron, A. T. J., Huang, J., Spang, R. L. and DeDeo, S. (2018). Individuals, institutions, and innovation in the debates of the French revolution. Proceedings of the National Academy of Sciences of the United States of America, 115(18): 4607–12.

Bär, Daniel, et al. Composing Measures for Computing Text Similarity. Technical Report TUD-CS-2015-0017, Technische Universität Darmstadt, 2015.

Beausang, Chris. ‘Diachronic Delta: A Computational Method for Analysing Periods of Accelerated Change in Literary Datasets’. Digital Scholarship in the Humanities, vol. 37, no. 3, Sept. 2022, pp. 644–59, https://doi.org/10.1093/llc/fqab041.

Bekius, Lamyk L. (2023). Behind the Computer Screens. University of Amsterdam & Antwerp University, Doctoral thesis, https://hdl.handle.net/11245.1/07ab9a89-89b6-44cd-84c5-ddc46c9cdf60

Bekius, Lamyk L. (2024). Nanogenesis Digital. Retrieved August 21, 2025, from https://nanogenesis-digital.github.io

Buschenhenke, Floor. (2025). Entering Stories: Decoding Born-Digital Writing through Keystroke Logging. University of Amsterdam & Antwerp University, Doctoral thesis, https://hdl.handle.net/11245.1/16460778-7a64-4134-8df5-321e8ece96ef

Chandrasekaran, D., & Mago, V. (2021). Evolution of Semantic Similarity—A Survey. ACM Comput. Surv., 54(2), 41:1-41:37. https://doi.org/10.1145/3440755

Ehrmanntraut, Anton, et al. ‘Modeling and Measuring Short Text Similarities. On the Multi-Dimensional Differences between German Poetry of Realism and Modernism’. Journal of Computational Literary Studies, vol. 1, no. 1, Dec. 2022, https://doi.org/10.48694/jcls.116.

Fan, Li, et al. ‘Exploring the Behavioral and Neural Correlates of Semantic Distance in Creative Writing’. Psychophysiology, vol. 60, no. 5, 2023, p. e14239, https://doi.org/10.1111/psyp.14239.

Geyer, Thomas, et al. ‘Reading Haiku: Semantic Distance and the “Cut Effect”’. ‘To Sing the Haiku the American Way Is a Beautiful Thing’: The Haiku of Etheridge Knight, 2020, p. 9.

Grésillon, Almuth. (2016). Eléments de critique génétique: Lire les manuscrits modernes. CNRS éditions.

Griebel, Sarah, et al. Locating the Leading Edge of Cultural Change. https://ceur-ws.org/Vol-3834/paper70.pdf.

Han, M., Zhang, X., Yuan, X., Jiang, J., Yun, W., & Gao, C. (2021). A survey on the techniques, applications, and performance of short text semantic similarity. Concurrency and Computation: Practice and Experience, 33(5), e5971. https://doi.org/10.1002/cpe.5971

Herrmann, J. Berenike, et al. ‘Revisiting Style, a Key Concept in Literary Studies’. Journal of Literary Theory, vol. 9, no. 1, Jan. 2015, https://doi.org/10.1515/jlt-2015-0003.

Ketzan, Erik, and Christof Schöch. ‘Classifying and Contextualizing Edits in Variants with Coleto: Three Versions of Andy Weir’s The Martian’. Digital Humanities Quarterly, vol. 15, no. 4, Dec. 2021.

Kroeger, Paul R. Analyzing Meaning: An Introduction to Semantics and Pragmatics. Language Science Press, 2023, https://doi.org/10.5281/zenodo.6855854.

Leijten, Mariëlle, and Luuk Van Waes. ‘Keystroke Logging in Writing Research: Using Inputlog to Analyze and Visualize Writing Processes’. Written Communication, vol. 30, no. 3, July 2013, pp. 358–92, https://doi.org/10.1177/0741088313491692.

Neal, T., Sundararajan, K., Fatima, A., Yan, Y., Xiang, Y., & Woodard, D. (2017). Surveying Stylometry Techniques and Applications. ACM Comput. Surv., 50(6), 86:1-86:36. https://doi.org/10.1145/3132039

Peverelli, Andrea, et al. ‘Tracking Textual Similarities in Neo-Latin Drama Networks’. Proceedings of the Thirteenth Language Resources and Evaluation Conference, edited by Nicoletta Calzolari et al., European Language Resources Association, 2022, pp. 5295–303, https://aclanthology.org/2022.lrec-1.567.

Posthuma, Jente. ‘En Daarom Haten Ze Zichzelf’. De Gids, vol. 21, no. 1, 2021, https://www.de-gids.nl/artikelen/en-daarom-haten-ze-zichzelf.

van der Sloot, Ko, Iris Hendrickx, Maarten van Gompel, Antal van den Bosch and Walter Daelemans. Frog, A Natural Language Processing Suite for Dutch, Reference Guide. Language and Speech Technology Technical Report Series 18-02, Radboud University, Nijmegen, December 2018. Available from https://frognlp.readthedocs.io/en/latest/ and https://webservices.cls.ru.nl/frog

Sobchuk, Oleg, and Artjoms Šeļa. ‘Computational Thematics: Comparing Algorithms for Clustering the Genres of Literary Fiction’. Humanities and Social Sciences Communications, vol. 11, no. 1, Mar. 2024, pp. 1–12, https://doi.org/10.1057/s41599-024-02933-6.

Szemes, Botond, and Mihály Nagy. Repetition and Innovation in Dramatic Texts.

T.y.s.s, Santosh, et al. ‘A Tale of Two Revisions: Summarizing Changes Across Document Versions’. Findings of the Association for Computational Linguistics ACL 2024, edited by Lun-Wei Ku et al., Association for Computational Linguistics, 2024, pp. 3195–211, https://aclanthology.org/2024.findings-acl.190.

Van Hulle, D., & Nixon, M. (2024). Beckett Digital Manuscript Project [online resource]. University of Oxford.

van Cranenburgh, Andreas, et al. ‘Vector Space Explorations of Literary Language’. Language Resources and Evaluation, Feb. 2019, https://doi.org/10.1007/s10579-018-09442-4.

Verhagen, Arie. ‘Construal and Stylistics–within a Language, across Contexts, across Languages’. Stylistics across Disciplines. Conference Proceedings, 2012, http://arieverhagen.nl/cms/files/2012_Verhagen_ConstrualStylistics.pdf.

Footnotes

1. This paragraph is taken from Buschenhenke, 2025, p. 127.
2. A cosine similarity measure was also run on the sets of unique lemmas. However, the sessions where one of these sets was very small (between 1 and 3 lemmas) had a very low similarity to the rest, which is possibly related to this sample size.
3. Taken from the story draft, in session id14.
4. 'toverij' means 'magic' but 'stoverij' means 'stew'; although Van Pelt uses 'toverij', it is quite possible that this is a typo.

Additional Declarations

No competing interests reported.

Status: Under Review (Version 1). Editorial decision: Revision requested, 05 Nov, 2025. Reviews received at journal: 01 Nov, 2025. Reviewers agreed at journal: 08 Oct, 2025. Reviewers invited by journal: 06 Oct, 2025. Editor assigned by journal: 03 Oct, 2025. Submission checks completed at journal: 02 Oct, 2025. First submitted to journal: 30 Sep, 2025.
Bar plot with the difference in narrativity scores between consecutive pairs of text versions, where the score of the earlier version is subtracted from the score of the later version.\u003c/p\u003e","description":"","filename":"6.jpg","url":"https://assets-eu.researchsquare.com/files/rs-7750385/v1/f8c45a54c683dadeba03937c.jpg"},{"id":93958756,"identity":"77df8883-6237-4ce8-b767-15cac218c863","added_by":"auto","created_at":"2025-10-20 16:40:46","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1166982,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7750385/v1/e1fddecb-725b-4c92-929e-0dc14b6fd91f.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Distinguishing drafts: measuring semantic distances between born-digital short story drafts","fulltext":[{"header":"Introduction","content":"\u003cp\u003eIn the Track Changes project, we collected materials from eleven Dutch and Flemish writers working on short stories, leading to 240 hours and +/- 14.400 txt-files, as one file was saved every minute, as well as at the start and end of each writing session. In addition, the Flemish author Gie Bogaert recorded his work on a novella, \u003cem\u003eRoosevelt\u003c/em\u003e, between 2014\u0026ndash;2016, which created 450+ \u0026lsquo;session versions\u0026rsquo; of his work-in-progress. Textual scholars working with born-digital, and in particular with keystroke logged materials, have to find their way through these kinds of large corpora of finegrained data. 
In order to explore these types of corpora, computational tools are an appealing entrypoint, both because they can handle such large amounts of files as well as because they can pick up on subtle changes between texts that readers cannot.\u003c/p\u003e\u003cp\u003eIn the current study, computational methods are applied to assess semantic aspects of the textual development over time. Semantic similarity is relatively understudied in computational literary studies. As Ehrmanntraut et al. (2022) state: \u0026ldquo;Though the concept of similarity is ubiquitous in the practice of literary studies it has seldom been analyzed explicitly.\u0026rdquo; (p.1) In particular, very few studies so far have applied tools that measure semantic similarity on drafts instead of finished works of literature.\u003c/p\u003e\u003cp\u003eThe aim of this study therefore, is to compare computational methods for measuring semantic distances between born-digital drafts. These tools may help in finding relevant entry points into a born-digital genetic dossier. They may pick up on things such as the introduction of new characters, a change in setting, or an extension of the story line by adding new events.\u003c/p\u003e\u003cp\u003eBy comparing different methods with close reading, we can assess whether such computational tools may be of help to textual scholars working with big corpora of text-genetic manuscripts.\u003c/p\u003e\u003cp\u003eSeveral methods will be compared and assessed. A first indicator for semantic change is the change in document length. Large changes in document length between drafts carry the potential for large semantic shifts. Secondly, a pair-wise comparison of documents will take place. This will be done on the basis of document embeddings created with BERT\u0026rsquo;s pre-trained Dutch language model. 
Thirdly, we will look at the unique lemmas that are removed from the previous draft and those newly introduced in the follow-up draft, and both count these unique lemmas and measure the cosine similarity on these sets of lemmas. Fourthly, to bring close reading into the mix, a narrativity measure will be applied (see Vauth et al, 2021) that can track plot development. Through manual annotation of events, a quantative score of narrativity is created.\u003c/p\u003e\u003cp\u003eThese computational methods are applied to a casestudy. The Flemish writer Ellen Van Pelt participated in the research project Track Changes (NWO-funded, 2018\u0026ndash;2024). From November 2020 to late January 2021, she used the keystroke logger Inputlog (Leijten \u0026amp; Van Waes, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2013\u003c/span\u003e) to record her writing process while working on a short story that would be called \u003cem\u003eDauphin\u003c/em\u003e. In 19 recorded work sessions she composed her story.\u003c/p\u003e\u003cp\u003eMost of textgenetic scholarship relies on close reading, and the proposed exploratory tools are by no means a replacement of this practice. This casestudy was chosen because it is part of the genetic edition Nanogenesis (Bekius, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2024\u003c/span\u003e), allowing me to study the textual developments through close reading, and compare the computational findings with the outcomes of the close reading.\u003c/p\u003e"},{"header":"1. Framework","content":"\u003cp\u003e\u003cem\u003eWhat is semantic similarity\u003c/em\u003e\u003c/p\u003e\u003cp\u003eI\u0026rsquo;m interested in a tool that can find both additions and deletions of \u0026lsquo;meaning\u0026rsquo; to the text/draft, which could take place through substitution, addition and deletion of text. I would like the tool to be able to pick up on the addition of a new scene, for example, or a shift in setting from indoors to outdoors. 
The study of meaning in texts is captured by what is commonly called ' semantics', and the tools I will present to measure differences between drafts are oriented towards semantic differences.\u003c/p\u003e\u003cp\u003eSemantic similarity is defined as: \u0026ldquo;the measure of semantic equivalence between two blocks of text.\u0026rdquo; (Chandrasekaran \u0026amp; Mago, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2021\u003c/span\u003e, p.2) This definition also captures an operationalisation; it is seen as something measurable quantitatively. Chandrasekaran \u0026amp; Mago further position semantic similarity as \u0026ldquo;one of the aspects of semantic relatedness\u0026rdquo;. (ibid.) Furthermore, in the operationalisation of semantic similarity, \u0026ldquo;methods asually give a ranking or percentage of similarity between texts, rather than a binary decision as similar or not similar.\u0026rdquo;\u003c/p\u003e\u003cp\u003eThe semantic relationship between several texts is measured in terms of \u0026lsquo;semantic distance\u0026rsquo;. In particular, it is often operationalised as distance in vector space (see 1.1), a method I am applying too - however, this captures only a very specific aspect of meaning in and through texts. Meaning arises not just from individual words, but from whole sentences, the text in its entirety, and from looking at the utterance/text in its context. (Kroeger, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) The selected measures will look at words and the whole texts, for optimal meaning disambiguation.\u003c/p\u003e\u003cp\u003eThe boundary between meaning and style is hard to draw. 
Verhagen (\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2012\u003c/span\u003e) points out that you can distinguish between objects (in the world, but also imaginary and abstract concepts) and the \u0026ldquo;choices of lexical items and grammatical constructions\u0026rdquo; used to talk about them. These constructions can be seen as part of a menu of multiple possible formulations. The chosen formulations link to \u0026lsquo;construal\u0026rsquo;: the way we choose to present an object in language to accommodate our audience. For example, \u0026ldquo;Marcel\u0026rdquo;, \u0026ldquo;my neighbour\u0026rdquo; and \u0026ldquo;a middle-aged man carrying a large suitcase\u0026rdquo; may all refer to the same person, but are more or less adequate ways of describing him depending on what we know and what our audience knows. Although semantics would then mostly be concerned with underlying objects, and style mostly with construals, Verhagen adds: \u0026ldquo;In fact, semantic analyses should be able to support the explanation of stylistic phenomena and the experience of a piece of discourse exhibiting a particular style may be used as evidence supporting or contradicting a semantic analysis.\u0026rdquo; (Verhagen, \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2012\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eThe definition of style given by Herrmann, Van Dalen-Oskam \u0026amp; Sch\u0026ouml;ch (2015) was created with both computational and close-reading methodologies in mind: \u0026ldquo;Style is a property of texts constituted by an ensemble of formal features which can be observed quantitatively or qualitatively.\u0026rdquo; They incorporate semantics into stylistics: under formal features they list \u0026ldquo;linguistic features at the level of characters, lexicon, syntax, semantics (Stamatatos 2009, 4), but also features going beyond the sentence, such as narrative perspective or textual macro-structure; we differ from some previous definitions in that we 
conceive of stylistic features as explicitly defined and clearly identifiable.\u0026rdquo; (p. 44)\u003c/p\u003e\u003cp\u003eThere are multiple computational ways to analyse style in literary texts (stylometry). A common one is measuring the relative frequency of high-frequency words in a baseline corpus and a target text or texts. This approach is well suited for authorship attribution, genre and diachronic analyses. Looking at lexical diversity and readability/textual complexity are two other methods often used (Neal et al, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). When selecting high-frequency words (these often include, for English, words like \u0026lsquo;the\u0026rsquo; and \u0026lsquo;it\u0026rsquo;) the concepts and terms that make each text unique are left out of the equation. High-frequency words like \u0026lsquo;the\u0026rsquo; and \u0026lsquo;it\u0026rsquo; do not carry the bulk of the meaning of any text. However, the fact that this analysis is good at distinguishing different genres indicates that the distribution of these \u0026lsquo;small\u0026rsquo; words does capture themes and topics, and that the incorporation of semantics into a working definition of stylistics makes sense.\u003c/p\u003e\u003cp\u003eThis study works with measures that look at word and sentence (dis)similarity \u0026ndash; I do not exclude style as part of the explanation of any dis/similarities found.\u003c/p\u003e\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003e1.1 Measuring similarity of literary texts\u003c/h2\u003e\u003cp\u003ePrevious studies have used computational methods to assess semantic similarity in literary texts for various purposes. Some studies have looked \u003cem\u003ewithin\u003c/em\u003e texts to find differences between different segments of the same work. Semantic distance has played an important part in genre analysis and genre attribution too. 
Furthermore, diachronic shifts in both the meaning of words and the topics that are popular have been studied using measures of semantic distance.\u003c/p\u003e\u003cp\u003e\u003cem\u003eWithin texts\u003c/em\u003e\u003c/p\u003e\u003cp\u003eGeyer et al (2020) studied the \u0026lsquo;cut effect\u0026rsquo; in haiku, in which readers grapple with two contrasting sections of a poem, and (re)read one of the parts (the \u0026lsquo;fragment line\u0026rsquo;) much more intensely in order to bridge the juxtaposition into a whole. The manually placed \u0026lsquo;cut\u0026rsquo; matched the semantic distances between lines very well, demonstrating the validity of the method.\u003c/p\u003e\u003cp\u003eFan et al (2023) see semantic distance between parts of a short story as a proxy for creativity. Their subjects were prompted with an incomplete story, and asked to write a creative ending. Using Word2Vec, both global (whole text) and local (a rolling window about one sentence long) similarity were calculated. Here, for the whole text, \u0026lsquo;similarity\u0026rsquo; meant the space the text took up in the multidimensional vector space. Human ratings of originality correlated with both the global and local distance measurements.\u003c/p\u003e\u003cp\u003eWithin-text similarity was also measured by Szemes \u0026amp; Nagy (2024). They worked with several of Shakespeare\u0026rsquo;s plays, to investigate which characters were \u0026lsquo;innovative\u0026rsquo;, in the sense that they brought new information to the play. Working with sentence-based BERT cosine similarity, each line of each character was compared with the preceding lines from all other characters. 
This allowed for a kind of social network visualisation, where the strength and direction of the relationships between the characters was based on how similarly they spoke and who echoed whom.\u003c/p\u003e\u003cp\u003e\u003cem\u003eGenre\u003c/em\u003e\u003c/p\u003e\u003cp\u003eSemantic similarity measures have also successfully been applied to detect genre boundaries. Van Cranenburgh, Van Dalen-Oskam \u0026amp; Van Zundert (2019) created a semantic profile of literariness. They use both intra-document and inter-document measures. To test the hypothesis that literary fiction is more lexically rich than other genres of fiction (such as sci-fi and romance), the intra-document paragraph vector (or \u0026lsquo;doc2vec\u0026rsquo;) method, which preserves some of the co-occurrence information of words, was used to measure the semantic distances between different sections of a novel. The width of the semantic space occupied correlated with the literariness scores that a reader survey provided. The same held true for three inter-document measures, which looked at the differences and similarities between novels belonging to different genres.\u003c/p\u003e\u003cp\u003eSobchuk \u0026amp; Šela (2024) looked at \u0026lsquo;thematic\u0026rsquo; similarity for genre detection of novels. They compared different methods on two corpora: a first small corpus with four pre-tagged genres (detectives, fantasy, romance, science fiction), followed by a large and untagged corpus from Gutenberg. What is distinctive about their paper is the variety not just of analytical methods for assessing similarity, but also of pre-processing steps and textual features. In total they tested 291 combinations from the three \u0026lsquo;menus\u0026rsquo;, each time on a random sample of 100 text fragments. The final step in their analytical pipelines was an unsupervised clustering. The performance of each pipeline was tested against the genre tags from the small corpus. 
The \u0026lsquo;winner\u0026rsquo; was a pipeline where the texts were strongly pre-processed for theme (by lemmatizing, and then only using nouns, verbs, adverbs and adjectives, removing named entities and applying lexical simplification), followed by a Doc2Vec, LDA or bag-of-words approach, all of which led to good outcomes. As similarity measurement, Jensen-Shannon outperformed the other metrics, although cosine similarity worked very well in combination with doc2vec.\u003c/p\u003e\u003cp\u003eEhrmanntraut et al (2022) compared realist, naturalist and modernist German poetry on between-genre and within-genre textual similarity. Following B\u0026auml;r et al (2011), they used a multidimensional model to operationalise textual similarity. Next to B\u0026auml;r\u0026rsquo;s content, style and structure, they added emotion, as they found this to be a relevant category for poetry. With several human annotators, they rated the poems\u0026rsquo; similarity on all dimensions. Ratings of content similarity and style similarity correlated highly. They then tested the performance of several models: a tf-idf (based on words), BERT on words, and sentence-BERT, with each poem treated like a sentence and compared in its entirety to other poems. The cosine similarity measure based on sentence-BERT embeddings showed the best results out of the box, with a correlation of 0.82 with human annotation. Results from the computational similarity ratings were then mapped onto a heuristic model of distances between poems within and between three genres (realism, naturalism, modernism). 
This supported the claims made in literary scholarship about the shift from realism to modernism.\u003c/p\u003e\u003cp\u003e\u003cem\u003eDiachronic change\u003c/em\u003e\u003c/p\u003e\u003cp\u003eBeausang (2021) proposed the measure 'diachronic Delta', based on relative word frequencies, in a corpus where English literary texts were grouped by year of production for a period spanning the 18th to early 20th century. Underwood (2019) had previously demonstrated that gradual change rather than drastic innovation shaped literary history over the centuries. Beausang tested this finding by searching for 'break years' where the word frequencies were both significantly different from the years before and similar to the years following. For the genre of prose, no break years were found.\u003c/p\u003e\u003cp\u003eA similar approach was taken by Griebel et al (2024), who tracked cultural change in three fields of study, including fiction. They applied document embeddings but indicated these needed to be finetuned to perform adequately on their (historical) sources. Like Beausang, they use a formula from Barron et al (\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2018\u003c/span\u003e) to conceptualise change: novelty - transience\u0026thinsp;=\u0026thinsp;resonance. A year or period with resonance is both markedly different from the preceding period and similar to the following period, showing an 'anticipation of future change' (Griebel et al, 2024, p. 232). Working with a corpus similar to Beausang's, of 18th- to 20th-century English-language fiction, they applied topic models and document embeddings. Both measures worked similarly well.\u003c/p\u003e\u003cp\u003eWriting process materials can also be approached as a type of diachronic change, and the concept of 'resonance' could be applied to a writing process corpus as well. 
Work sessions in which the text diverges more from its predecessors than earlier versions did from theirs, as well as sessions that are 'already' much like later versions, would both be indicators of a textual development worthy of further inspection. These methods, however, require a much larger corpus than the current case study can offer.\u003c/p\u003e\u003cp\u003eIt is encouraging that several studies have shown a correlation between computational approaches to textual similarity and human expert annotation. Document and sentence embeddings are frequently used in this field and thus show promise to be applied to drafts. Only a few studies so far have applied computational tools to measure textual development between drafts. They will be presented in the next section.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\u003ch2\u003e1.2 Similarity between drafts\u003c/h2\u003e\u003cp\u003eThe question of how drafts differ from each other is traditionally answered through hermeneutical practices. In the genetic critic's workflow, these consist of text transcription (to a digital format), followed by pairwise (automatic or manual) collation, followed by reading and interpretation of the textual differences (Gr\u0026eacute;sillon, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). Once a set of drafts and materials has been brought into chronological relations, and represented in a digital format, quantitative visualisations and analyses do become possible. In the Beckett Digital Manuscript Project (Van Hulle \u0026amp; Nixon, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2024\u003c/span\u003e) for example, a visualisation is given of the number of words that stayed constant, that were deleted, added and modified, between each set of two consecutive drafts of each work. 
In the field of writing process research, similar statistics are used to characterise digital keystroke-logged writing processes. (e.g. Inputlog's summary analysis, Leijten \u0026amp; Van Waes, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2013\u003c/span\u003e)\u003c/p\u003e\u003cp\u003eSantosh et al (\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2024\u003c/span\u003e) developed a \u0026lsquo;condensed bird eye\u0026rsquo;s view of edits\u0026rsquo; in the form of a \u0026lsquo;thematic summary of changes\u0026rsquo; to help collaborating authors quickly see the overarching effects of what the track changes function shows in great detail for each individual revision. Their purpose was to help reviewers and co-authors to approve or reject batches of thematically linked edits with one click, rather than reading through each individual edit. They developed a human-written set of 45 thematic summaries, based on reviewing a corpus of academic papers on the subject of natural language processing, where each paper had two versions, a submitted draft and a published document. They then let GPT-4 create thematic descriptions of each edit (in the form of a diff output) and cluster the edits into themes, but the model was not able to do this task: the human annotations and the algorithmic results did not overlap sufficiently. The LLM was not able to provide adequate summaries of the edits, and the concept of \u0026lsquo;theme\u0026rsquo; was probably left too open-ended in the process.\u003c/p\u003e\u003cp\u003eA fully algorithmic approach was taken by Ketzan \u0026amp; Sch\u0026ouml;ch (2021) with their tool 'Coleto'. Coleto was designed to compare three (published) versions of a novel (\u003cem\u003eThe Martian\u003c/em\u003e by Andy Weir). It is a software pipeline that starts off with an automatic collation of two text versions and then tags each edit found. The edit classification has potential for a semantic analysis. 
There are two large categories: \u003cem\u003escript-identifiable edits\u003c/em\u003e and \u003cem\u003esemantically open edits\u003c/em\u003e. The latter are subclassified into deletions, insertions and changes (or: substitutions at the same location). The \u0026lsquo;changes\u0026rsquo; are further subdivided based on Levenshtein distance. This is the minimum number of single-character insertions, deletions or substitutions needed to transform one string into another. A \u0026lsquo;major\u0026rsquo; edit is defined as having a Levenshtein distance of more than 5, although this threshold is naturally not set in stone. The distinction between major and minor could be a rough indicator of semantic distance too (see Sarkar et al, 2016), although Han et al (\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2021\u003c/span\u003e) demonstrated that deep learning methods were more effective at assessing semantic distances than Levenshtein distance is.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\u003ch2\u003e1.3 Use case: Ellen Van Pelt's story \u003cem\u003eDauphin\u003c/em\u003e\u003c/h2\u003e\u003cp\u003eEllen Van Pelt (b. 1980) is a Flemish writer and psychologist, who has published two novels (Drift, 2015, Zwaluwstaarten, 2025), a biography of Flemish writer Roger Van de Velde (Deze wereld is geen ergernis waard, 2020) and many short stories. She participated in the research project \u003cem\u003eTrack Changes\u003c/em\u003e, in which writers recorded their digital writing process using keystroke logging software. Inputlog (Leijten \u0026amp; Van Waes, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2013\u003c/span\u003e), the specific software we used, tracks key presses, mouse actions and screen/window switches, and works with Microsoft Word on Windows operating systems. Writers can work in Word, while Inputlog captures their cursor position, the length of their document, and characters typed and deleted, all with accurate timestamps. 
Inputlog stores this detailed process information in a tabular format, and also captures text snapshots once a minute. Van Pelt installed Inputlog on her own laptop, on which she worked at her own convenience over a period of three months, from November of 2020 until mid-January of 2021 (18 writing sessions, 15 and a half hours in total). In January of 2023 she made a few additional changes to her story, which were no longer registered by the keystroke logger. This final version has also been included in our corpus.\u003c/p\u003e\u003cp\u003eIn Van Pelt\u0026rsquo;s story, \u003cem\u003eDauphin\u003c/em\u003e, a mother and young son are taking an all-inclusive vacation to Tunis. It is told in first person, from the mother's perspective. The discourse is mostly chronological and captures the events during a single holiday excursion, a day trip from Djerba to Flamingo Island, with a \u0026lsquo;pirate ship\u0026rsquo;.\u003c/p\u003e\u003cp\u003eThe mother feels totally out of place.\u003csup\u003e1\u003c/sup\u003e It is the first summer that the mother has to be \u0026lsquo;both father and mother\u0026rsquo;. The trip was a gift from her father, and not something she would have picked out herself. On the ship, one man stands out from the crowd because of his brooding demeanour and his dark-coloured, warm clothes despite the hot weather. On Flamingo Island, the son becomes too tired to walk and the mysterious man from their ship offers a piggyback ride. Then, at the beach, mother and son collect sea shells. Over lunch, Finn (the son) accidentally throws sand onto the plate of Mustafa (the brooding man). After a reprimand from his mother, he tells Mustafa that his father has died. Mustafa offers consolation. Sailing back from the island, they see two dolphins jumping out of the water. Finn is delighted and says \u0026lsquo;they are always happy\u0026rsquo; while the mother finds them beautiful and thinks about how dolphins are the only animals that commit suicide. 
Mustafa smiles at her for the first time.\u003c/p\u003e\u003cp\u003eTextgenetic process description\u003c/p\u003e\u003cp\u003eThe working document and the keystroke logging files show that Van Pelt started out her project with some notes and a pasted text from a travel agency\u0026rsquo;s website, about an excursion to a Tunisian island. Both the textual development and the nature of the notes in the working document show that Van Pelt did not have a fully fledged design for her story in advance of composition. The scenes, interactions between characters and the backstory of the main character were all developed during writing. No drastic changes or cuts were made to the plot, characterisation or setting in the process.\u003c/p\u003e\u003cp\u003eVan Pelt's process can be subdivided into several stages, although revision is a constant companion throughout the process.\u003c/p\u003e\u003cp\u003eA first stage is that of rapid text expansion. It runs from the first to the seventh session. In this first expansion, the narrative grows chronologically, with consecutive events in the story being composed in their order of occurrence. Meanwhile, Van Pelt is revising her pre-existing draft in each writing session too. In session id7, the single note on the dramatic core of the situation, 'husband dead?', is deleted from the draft, and a scene in which this information is revealed (to Mustafa and the reader) is added at the bottom of the draft-so-far. The way in which the mother, son and the helpful stranger relate to each other has now been established too.\u003c/p\u003e\u003cp\u003eIn session id8, she introduces a list of new notes at the top of the document with eleven shipping-related sayings and expressions. She also indicates through notes that she questions the beginning, and is unsure about how to end the story. Up to and including session 14, she reworks and makes modest additions within the existing scenes. 
For example, in session id12, she has already described the lunch, with the pirates banging on pots and pans, and calling out one table at a time, and with the mother and son approaching the counter and receiving full plates. Van Pelt adds descriptions of the tables, as well as the food, adding 'baked' to the fish and 'magical' to the vegetables. She also makes a characterisation move by adding that the mother waits until most people have been served, similar to how she waited until most tourists had moved away from the docks before she set out. She is portrayed as not fitting in with the other tourists and coolly observing them. In session id14, the opening is changed by switching the order of two scenes, and situating the present moment of the narrative at a different point. After this session, Van Pelt prints her document and revises it on paper. We can see this as an important point in her process. She has now resolved the 'how to start' question from her notes.\u003c/p\u003e\u003cp\u003eIn sessions 15 and 16, she revises as well as completes the story; at the end of session 16 she also resolves the note about the ending, at least for the time being. After session id16, she therefore continues on a full draft. She prints this version of her document too, and starts revising it on paper first, then implementing these changes in session 17. From session 17 through the remainder of the process, the types of textual changes made are smaller-scale and mostly stylistic, with a focus on word variation, clarifying references, and creating coherence.\u003c/p\u003e\u003cp\u003eExpectations\u003c/p\u003e\u003cp\u003eAs Van Pelt incorporates extensive revision throughout her work process, I expect some semantic changes between all drafts. 
However, based on the key points in both plot decisions with regards to the underlying situation (the death of the husband), the ending and the beginning of the story, as well as based on the shifts in document length, I see a number of stages that have a different focus. The first, text expansion stage, from session 1 up to 7, adds many scenes and events. At the end of that stage, two thirds of the final document length is achieved. I would expect to see substantial changes in document similarity during this stage.\u003c/p\u003e\u003cp\u003eThe second stage, from id8 up and including id14, she continues gradually filling out her story, often from within existing scenes, but also tackling the beginning of the narrative. In 15 and 16 she completes her first draft by adding an ending. The final stage, from id17 onwards, many revisions still take place, but their scope is smaller than previously. I would expect this stage to show the smallest semantic shifts between consecutive drafts.\u003c/p\u003e\u003c/div\u003e\u003cp\u003e\u003cem\u003eWhat is semantic similarity\u003c/em\u003e\u003c/p\u003e\u003cp\u003eI\u0026rsquo;m interested in a tool that can find both additions and deletions of \u0026lsquo;meaning\u0026rsquo; to the text/draft, which could take place through substitution, addition and deletion of text. I would like the tool to be able to pick up on the addition of a new scene, for example, or a shift in setting from indoors to outdoors. 
The study of meaning in texts is captured by what is commonly called ' semantics', and the tools I will present to measure differences between drafts are oriented towards semantic differences.\u003c/p\u003e\u003cp\u003eSemantic similarity is defined as: \u0026ldquo;the measure of semantic equivalence between two blocks of text.\u0026rdquo; (Chandrasekaran \u0026amp; Mago, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2021\u003c/span\u003e, p.2) This definition also captures an operationalisation; it is seen as something measurable quantitatively. Chandrasekaran \u0026amp; Mago further position semantic similarity as \u0026ldquo;one of the aspects of semantic relatedness\u0026rdquo;. (ibid.) Furthermore, in the operationalisation of semantic similarity, \u0026ldquo;methods asually give a ranking or percentage of similarity between texts, rather than a binary decision as similar or not similar.\u0026rdquo;\u003c/p\u003e\u003cp\u003eThe semantic relationship between several texts is measured in terms of \u0026lsquo;semantic distance\u0026rsquo;. In particular, it is often operationalised as distance in vector space (see 1.1), a method I am applying too - however, this captures only a very specific aspect of meaning in and through texts. Meaning arises not just from individual words, but from whole sentences, the text in its entirety, and from looking at the utterance/text in its context. (Kroeger, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) The selected measures will look at words and the whole texts, for optimal meaning disambiguation.\u003c/p\u003e\u003cp\u003eThe boundary between meaning and style is hard to draw. 
Verhagen (\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2012\u003c/span\u003e) points out that you can distinguish between objects (in the world, but also imaginary and abstract concepts) and the \u0026ldquo;choices of lexical items and grammatical constructions\u0026rdquo; used to talk about them. These constructions can be seen as part of a menu of multiple possible formulations. The chosen formulations link to \u0026lsquo;construal\u0026rsquo;: the way we choose to present an object in language to accomodate our audience. For example, \u0026ldquo;Marcel\u0026rdquo;, \u0026ldquo;my neighbour\u0026rdquo; and \u0026ldquo;a middle-aged man carrying a large suitcase\u0026rdquo; may all refer to the same person, but are more or less adequate ways of describing him depending on what we know and what our audience knows. Although semantics would then mostly be concerned with underlying objects, and style mostly with construals: \u0026ldquo;In fact, semantic analyses should be able to support the explanation of stylistic phenomena and the experience of a piece of discourse exhibiting a particular style may be used as evidence supporting or contradicting a semantic analysis.\u0026rdquo; (Verhagen, \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2012\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eThe definition of style given by Herrmann, Van Dalen-Oskam \u0026amp; Sch\u0026ouml;ch (2015) was created with both computational and close-reading methodologies in mind: \u0026ldquo;Style is a property of texts constituted by an ensemble of formal features which can be observed quantitatively or qualitatively.\u0026rdquo; They incorporate semantics into stylistics: under formal features they list \u0026ldquo;linguistic features at the level of characters, lexicon, syntax, semantics (Stamatatos 2009, 4), but also features going beyond the sentence, such as narrative perspective or textual macro-structure; we differ from some previous definitions in that we 
conceive of stylistic features as explicitly defined and clearly identifiable.\u0026rdquo; (p. 44)\u003c/p\u003e\u003cp\u003eThere are multiple computational ways to analyse style in literary texts (stylometry). A common one is measuring the relative frequency of high-frequent words in a baseline corpus and a target text or texts. This approach is well suited for authorship attribution, genre and diachronic analyses. Looking at lexical diversity and readability/textual complexity are two other methods often used. (Neal et al, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). When selecting high-frequency words (these often include, for English, words like \u0026lsquo;the\u0026rsquo; and \u0026lsquo;it\u0026rsquo;) the concepts and terms that make each text unique are left out of the equation. The high-frequent words like \u0026lsquo;the\u0026rsquo; and \u0026lsquo;it\u0026rsquo; do not carry the bulk of the meaning of any text. However, the fact that this analysis is good at distinguishing different genres indicates that the distribution of these \u0026lsquo;small\u0026rsquo; words does capture themes and topics, and that the incorporation of semantics into a working definition of stylistics makes sense.\u003c/p\u003e\u003cp\u003eThis study works with measures that look at word and sentence (dis)similarity \u0026ndash; I do not exclude style as part of the explanation of any dis/similarities found.\u003c/p\u003e\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003e1.1 Measuring similarity of literary texts\u003c/h2\u003e\u003cp\u003ePrevious studies have used computational methods to assess semantic similarity in literary texts for various purposes. Some studies have looked \u003cem\u003ewithin\u003c/em\u003e texts to find differences between different segments of the same work. Semantic distance has played an important part of genre-analysis and genre attribution too. 
Furthermore, diachronic shifts in both the meaning of words and the topics that are popular have been studied using measures of semantic distance.\u003c/p\u003e\u003cp\u003e\u003cem\u003eWithin texts\u003c/em\u003e\u003c/p\u003e\u003cp\u003eGeyer et al (2020) studied the \u0026lsquo;cut effect\u0026rsquo; in haiku, which is where the readers grapple with two contrasting sections of a poem, and (re)read one of the parts (the \u0026lsquo;fragment line\u0026rsquo;) much more intensely in order to bridge the juxtaposition into a whole. The manually placed \u0026lsquo;cut\u0026rsquo; matched the semantic distances between lines very well, demonstrating the validity of the method.\u003c/p\u003e\u003cp\u003eFan et al (2023) see semantic distance between parts of a short story as a proxy for creativity. Their subjects were prompted with an incomplete story, and asked to write a creative ending. Using Word2Vec, both global (whole text) and local (a rolling window of about a sentence long) similarity were calculated. Here, for the whole text, \u0026lsquo;similarity\u0026rsquo; meant; the space the text took up in the multidimensional vector space. Human ratings of originality correlated with both the global and local distance measurements.\u003c/p\u003e\u003cp\u003eWithin text similarity was also measured by Szemes \u0026amp; Nagy (2024). They worked with several of Shakespeare\u0026rsquo;s plays, to investigate which characters were \u0026lsquo;innovative\u0026rsquo;, in the sense that they brought new information to the play. Working with sentence-based BERT cosine similarity, each line of each character was compared with the preceding lines from all other characters. 
This allowed for a kind of social network visualisation, where the strength and direction of the relationships between the characters was based on how similar they spoke and who echoed who.\u003c/p\u003e\u003cp\u003e\u003cem\u003eGenre\u003c/em\u003e\u003c/p\u003e\u003cp\u003eSemantic similarity measures have also succesfully been applied to detect genre boundaries. Van Cranenburgh, Van Dalen-Oskam \u0026amp; Van Zundert (2019) created a semantic profile of literariness. They use both intra-document and inter-document measures. To test the hypothesis that literary fiction is more lexically rich than other genres of fiction (such as sci-fi and romance), the intra-document paragraph vector (or \u0026lsquo;doc2vec\u0026rsquo;) method, which preserves some of the co-occurence information of words, was used to measure the semantic distances between different sections of a novel. The width of the semantic space occupied correlated with the literariness scores that a reader survey provided. The same held true for three inter-document measures, which looked at the differences and similarities between novels belonging to different genres.\u003c/p\u003e\u003cp\u003eSobchuk \u0026amp; Šela (2024) looked at \u0026lsquo;thematic\u0026rsquo; similarity for genre detection of novels. They compared different methods on two corpora; a first small corpus with four pre-tagged genres (detectives, fantasy, romance, science fiction), followed by a large and untagged corpus from Gutenberg. What is quite unique to their paper is the variety not just of analytical methods for assessing similarity, but also of pre-processing steps and textual features. In total they tested 291 combinations from the three \u0026lsquo;menus\u0026rsquo;, each time on a random sample of 100 text fragments. The final step in their analytical pipelines was an unsupervised clustering. The performance of each pipeline was tested against the genre tags from the small corpus. 
The \u0026lsquo;winner\u0026rsquo; was a pipeline where the texts were strongly pre-processed for theme (by lemmatizing, and then only using nouns, verbs, adverbs and adjectives, removing named entities and applying lexical simplification), followed by a Doc2Vec, LDA or bag-of-words approach, all of which led to good outcomes. As a similarity measure, Jensen-Shannon divergence outperformed the other metrics, although cosine similarity worked very well in combination with doc2vec.\u003c/p\u003e\u003cp\u003eEhrmanntraut et al (2022) compared realist, naturalist and modernist German poetry on between-genre and within-genre textual similarity. Following B\u0026auml;r et al (2011), they used a multidimensional model to operationalise textual similarity. Next to B\u0026auml;r\u0026rsquo;s content, style and structure, they added emotion, as they found this to be a relevant category for poetry. Together with several human annotators, they rated the poems\u0026rsquo; similarity on all dimensions. Ratings of content similarity and style similarity correlated highly. They then tested the performance of several models: tf-idf (based on words), BERT on words, and sentence-BERT, with each poem treated like a sentence and compared in its entirety to other poems. The cosine similarity measure based on sentence-BERT embeddings showed the best results out of the box, with a correlation of 0.82 with human annotation. Results from the computational similarity ratings were then mapped onto a heuristic model of distances between poems within and between three genres (realism, naturalism, modernism). 
This supported the claims made in literary scholarship about the shift from realism to modernism.\u003c/p\u003e\u003cp\u003e\u003cem\u003eDiachronic change\u003c/em\u003e\u003c/p\u003e\u003cp\u003eBeausang (2021) proposed the measure 'diachronic Delta', composed of relative word frequencies, in a corpus where English literary texts were grouped by year of production for a period spanning the 18th to early 20th century. Underwood (2019) had previously demonstrated that gradual change, rather than drastic innovation, shaped literary history over the centuries. Beausang tested this finding by searching for 'break years' where the word frequencies were both significantly different from the years before and similar to the years following. For the genre of prose, no break years were found.\u003c/p\u003e\u003cp\u003eA similar approach was taken by Griebel et al (2024), who tracked cultural change in three fields of study, including fiction. They applied document embeddings but indicated these needed to be fine-tuned to perform adequately on their (historical) sources. Like Beausang, they used a formula from Barron et al (\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2018\u003c/span\u003e) to conceptualise change: novelty - transience\u0026thinsp;=\u0026thinsp;resonance. A year or period with resonance is both markedly different from the preceding period and similar to the following period, showing an 'anticipation of future change' (Griebel et al, 2024, p. 232). Working with a corpus similar to Beausang's, of 18th- to 20th-century English-language fiction, they applied topic models and document embeddings. Both measures worked similarly well.\u003c/p\u003e\u003cp\u003eWriting process materials can also be approached as a type of diachronic change, and the concept of 'resonance' could be applied to a writing process corpus as well. 
Work sessions in which the text diverges more from its predecessors than earlier versions diverged from theirs, as well as sessions that are 'already' much like later versions, would both be indicators of a textual development worthy of further inspection. These methods, however, require a much larger corpus than the current case study can offer.\u003c/p\u003e\u003cp\u003eIt is encouraging that several studies have shown a correlation between computational approaches to textual similarity and human expert annotation. Document and sentence embeddings are frequently used in this field and thus show promise for application to drafts. Only a few studies so far have applied computational tools to measure textual development between drafts. They will be presented in the next section.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\u003ch2\u003e1.2 Similarity between drafts\u003c/h2\u003e\u003cp\u003eThe question of how drafts differ from each other is traditionally answered through hermeneutical practices. In the genetic critic's workflow, these consist of text transcription (to a digital format), followed by pairwise (automatic or manual) collation, followed by reading and interpretation of the textual differences (Gr\u0026eacute;sillon, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). Once a set of drafts and materials has been brought into chronological relations, and represented in a digital format, quantitative visualisations and analyses do become possible. In the Beckett Digital Manuscript Project (Van Hulle \u0026amp; Nixon, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2024\u003c/span\u003e), for example, a visualisation is given of the number of words that stayed constant, that were deleted, added and modified, between each set of two consecutive drafts of each work. 
In the field of writing process research, similar statistics are used to characterise digital keystroke-logged writing processes (e.g. Inputlog's summary analysis; Leijten \u0026amp; Van Waes, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2013\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eSantosh et al (\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2024\u003c/span\u003e) developed a \u0026lsquo;condensed bird eye\u0026rsquo;s view of edits\u0026rsquo; in the form of a \u0026lsquo;thematic summary of changes\u0026rsquo; to help collaborating authors quickly see the overarching effects of what the track changes function shows in great detail for each individual revision. Their purpose was to help reviewers and co-authors to approve or reject batches of thematically linked edits with one click, rather than reading through each individual edit. They developed a human-written set of 45 thematic summaries, based on reviewing a corpus of academic papers on the subject of natural language processing, where each paper had two versions, a submitted draft and a published document. GPT-4 was not able to perform this task. The authors let the LLM create thematic descriptions of each edit (in the form of a diff output), and then cluster the edits into themes, but the human annotations and the algorithmic results did not overlap sufficiently. The LLM was not able to provide adequate summaries of the edits, and the concept of \u0026lsquo;theme\u0026rsquo; was probably left too open-ended in the process.\u003c/p\u003e\u003cp\u003eA fully algorithmic approach was taken by Ketzan \u0026amp; Sch\u0026ouml;ch (2021) with their tool 'Coleto'. Coleto was designed to compare three (published) versions of a novel (\u003cem\u003eThe Martian\u003c/em\u003e by Andy Weir). It is a software pipeline that starts off with an automatic collation of two text versions and then tags each edit found. The edit classification has potential for a semantic analysis. 
There are two large categories: \u003cem\u003escript-identifiable edits\u003c/em\u003e and \u003cem\u003esemantically open edits\u003c/em\u003e. The latter are subclassified into deletions, insertions and changes (or: substitutions at the same location). The \u0026lsquo;changes\u0026rsquo; are further subdivided based on Levenshtein distance. This is the minimum number of single-character insertions, deletions or substitutions needed to transform one string into another string. A \u0026lsquo;major\u0026rsquo; edit is defined as having a Levenshtein distance of more than 5, but naturally this is not set in stone. The distinction between major and minor could be a rough indicator of semantic distance too (see Sarkar et al, 2016), although Han et al (\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2021\u003c/span\u003e) demonstrated that deep learning methods were more effective at assessing semantic distances than Levenshtein distance is.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\u003ch2\u003e1.3 Use case: Ellen Van Pelt's story \u003cem\u003eDauphin\u003c/em\u003e\u003c/h2\u003e\u003cp\u003eEllen Van Pelt (1980) is a Flemish writer and psychologist, who has published two novels (Drift, 2015; Zwaluwstaarten, 2025), a biography of Flemish writer Roger Van de Velde (Deze wereld is geen ergernis waard, 2020) and many short stories. She participated in the research project \u003cem\u003eTrack Changes\u003c/em\u003e, in which writers recorded their digital writing process using keystroke logging software. Inputlog (Leijten \u0026amp; Van Waes, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2013\u003c/span\u003e), the specific software we used, tracks key presses, mouse actions and screen/window switches, and operates with Microsoft Word on Windows operating systems. Writers can work in Word, while Inputlog captures their cursor position, the length of their document, and characters typed and deleted, all with accurate timestamps. 
Inputlog stores this detailed process information in a tabular format, and also captures text snapshots once a minute. Van Pelt installed Inputlog on her own laptop, on which she worked at her own convenience over a period of three months, from November 2020 until mid-January 2021 (18 writing sessions, 15 and a half hours in total). In January 2023 she made a few additional changes to her story, no longer registered by the keystroke logger. This final version has also been included in our corpus.\u003c/p\u003e\u003cp\u003eIn Van Pelt\u0026rsquo;s story, \u003cem\u003eDauphin\u003c/em\u003e, a mother and young son are taking an all-inclusive vacation to Tunisia. It is told in first person, from the mother's perspective. The discourse is mostly chronological and captures the events during a single holiday excursion, a day trip from Djerba to Flamingo Island, with a \u0026lsquo;pirate ship\u0026rsquo;.\u003c/p\u003e\u003cp\u003eThe mother feels totally out of place.\u003csup\u003e1\u003c/sup\u003e It is the first summer that the mother has to be \u0026lsquo;both father and mother\u0026rsquo;. The trip was a gift from her father, and not something she would have picked out herself. On the ship, one man stands out from the crowd because of his brooding demeanour and his dark-coloured, warm clothes despite the hot weather. On Flamingo Island, the son becomes too tired to walk and the mysterious man from their ship offers a piggyback ride. Then, at the beach, mother and son collect sea shells. Over lunch, Finn (the son) accidentally throws sand onto the plate of Mustafa (the brooding man). After the reprimand from his mother, he tells Mustafa that his father has died. Mustafa offers consolation. Sailing back from the island, they see two dolphins jumping out of the water. Finn is delighted and says \u0026lsquo;they are always happy\u0026rsquo; while the mother finds them beautiful and thinks about how dolphins are the only animals that commit suicide. 
Mustafa smiles at her for the first time.\u003c/p\u003e\u003cp\u003eTextgenetic process description\u003c/p\u003e\u003cp\u003eThe working document and the keystroke logging files show that Van Pelt started out her project with some notes and a pasted text from a travel agency\u0026rsquo;s website, about an excursion to a Tunisian island. Both the textual development and the nature of the notes in the working document show that Van Pelt did not have a fully fledged design for her story in advance of composition. The scenes, interactions between characters and the backstory of the main character were all developed during writing. No drastic changes or cuts were made to the plot, characterisation or setting in the process.\u003c/p\u003e\u003cp\u003eVan Pelt's process can be subdivided into several stages, although revision is a constant companion throughout the process.\u003c/p\u003e\u003cp\u003eA first stage is that of rapid text expansion. It runs from the first to the 7th session. During this first expansion, the narrative is extended chronologically, with consecutive events in the story being composed in their order of occurrence. Meanwhile, Van Pelt is revising her pre-existing draft in each writing session too. In session id7, the single note on the dramatic core of the situation, 'husband dead?', is deleted from the draft, and a scene in which this information is revealed (to Mustafa and the reader) is added at the bottom of the draft-so-far. The way in which the mother, son and the helpful stranger relate to each other has now been established too.\u003c/p\u003e\u003cp\u003eIn session id8, she introduces a list of new notes at the top of the document with eleven shipping-related sayings and expressions. She also indicates through notes that she questions the beginning, and is unsure about how to end the story. Up to and including session 14, she reworks and makes modest additions within the existing scenes. 
For example, by session id12 she has already described the lunch, with the pirates banging on pots and pans and calling out one table at a time, and the mother and son approaching the counter and receiving full plates. Van Pelt adds descriptions of the tables, as well as the food, adding 'baked' to fish and 'magical' to the vegetables. She also makes a characterisation move by adding that the mother waits until most people have been served, similar to how she waited until most tourists had moved away from the docks before she set out. She is portrayed as not fitting in with the other tourists and coolly observing them. In session id14, the opening is changed by switching the order of two scenes, and situating the present moment of the narrative at a different point. After this session, Van Pelt prints her document and revises it on paper. We can see this as an important point in her process. She has now resolved the 'how to start' question from her notes.\u003c/p\u003e\u003cp\u003eIn sessions 15 and 16, she revises and, at the end of session 16, completes the story, also resolving the ending-note at least for the time being. So after session id16, she continues on a full draft. She prints this version of her document too, and starts revising it on paper first, then implementing these changes in session 17. From session 17 throughout the remaining process, the types of textual changes made are smaller-scale and mostly stylistic, with a focus on word variation, clarifying references, and creating coherence.\u003c/p\u003e\u003cp\u003eExpectations\u003c/p\u003e\u003cp\u003eAs Van Pelt incorporates extensive revision throughout her work process, I expect some semantic changes between all drafts. 
However, based on the key plot decisions with regard to the underlying situation (the death of the husband), the ending and the beginning of the story, as well as on the shifts in document length, I see a number of stages that each have a different focus. The first, text expansion stage, from session 1 up to 7, adds many scenes and events. At the end of that stage, two thirds of the final document length is achieved. I would expect to see substantial changes in document similarity during this stage.\u003c/p\u003e\u003cp\u003eIn the second stage, from id8 up to and including id14, she continues gradually filling out her story, often from within existing scenes, but also tackling the beginning of the narrative. In 15 and 16 she completes her first draft by adding an ending. In the final stage, from id17 onwards, many revisions still take place, but their scope is smaller than previously. I would expect this stage to show the smallest semantic shifts between consecutive drafts.\u003c/p\u003e\u003c/div\u003e"},{"header":"2. Method(s)","content":"\u003cp\u003eAs stated above, for comparing different drafts, there is no standard metric available (and very little computational work done in general). Ketzan \u0026amp; Sch\u0026ouml;ch (2021) used a Levenshtein distance on the edits only, and Santosh et al (2024) combined human annotation with LLM interpretation to group edits into themes, with mixed results. 
In the broader field of literary studies, Doc2Vec and BERT cosine similarity measures are often applied to measure semantic distance, and align with human judgement of textual similarity.\u003c/p\u003e\u003cp\u003eAs this is an exploratory study, I have opted both to apply a whole-document measure and to isolate just the textual changes.\u003c/p\u003e\u003cp\u003eIn order to compare the computational methods with a close-reading approach, I have manually annotated the events in the drafts based on Vauth et al (2021), leading to a narrativity score, as will be discussed in section \u003cspan refid=\"Sec10\" class=\"InternalRef\"\u003e2.4\u003c/span\u003e.\u003c/p\u003e\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\u003ch2\u003e2.1 Document length\u003c/h2\u003e\u003cp\u003eUsing Frog (Van der Sloot et al, 2018), a parser for modern Dutch, the texts were tokenized and lemmatized. As a basic first proxy for semantic change, the difference in word (token) count of each pair of drafts was used.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003e2.2 Counting lemmas\u003c/h2\u003e\u003cp\u003eAs a genetic dossier collected through keystroke logging starts off with an empty document that grows and develops into a fully formed text, which may then also be reworked and rewritten, the individual drafts are of varying sizes, and in varying states of completeness. Simply measuring shifts in document size can give a first indicator of meaningful changes. However, isolating just the parts that have been added to or deleted from the draft in between versions allows me to focus on the semantic shifts more clearly. 
I have furthermore opted for the isolation of words (and their lemmas) as a unit of analysis, instead of the entire fragments that came and went.\u003c/p\u003e\u003cp\u003eThen, for each consecutive pair of documents two lists were made: lemmas unique to the first version (so removed from the document), and lemmas unique to the second version (so added to the document). Then, a simple count was performed of how many lemmas were brought into and taken out of the document.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\u003ch2\u003e2.3 Document cosine similarity\u003c/h2\u003e\u003cp\u003eOn the basis of a chronological ordering of the drafts, for each pair of texts, semantic similarity was measured through the document cosine similarity using (off-the-shelf, pre-trained) Dutch BERT.\u003c/p\u003e\u003cp\u003eBERT looks at words in their sentence-context, and then creates a vector for each word in multidimensional space, based on the semantic similarities between words. These similarities are based on the similarities of their contexts of occurrence, so the words usually surrounding the target word. BERT is available for multiple languages, and these numeric representations have been created using a large corpus of texts. I have used the Dutch model \u003cem\u003eGroNLP/bert-base-dutch-cased\u003c/em\u003e, loaded through the Python \u003cem\u003etransformers\u003c/em\u003e package, and then created a document-based embedding. The pairwise cosine similarity was computed using the \u003cem\u003esklearn.metrics.pairwise\u003c/em\u003e module.\u003c/p\u003e\u003cp\u003eSo when using this tool to measure 'semantic' differences, what we are looking at is how far away the words of one draft are from the words in the following draft. This may not always align with human interpretation. 
To give an example, in this multidimensional space, 'king' will be closer to 'queen' than to 'jester', whereas for a gender-themed analysis of textual changes, you might consider a shift from king to jester less important than a shift from king to queen.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\u003ch2\u003e2.4 Narrativity measure\u003c/h2\u003e\u003cp\u003eTo compare the computational approaches with human interpretation of semantic shifts between drafts, we opted to annotate the differences between the drafts with a focus on narrativity. As the drafts in this corpus start from an empty document and work up to a completed narrative, shifts in narrativity say something about the furthering of the plot and the expansion of the world-building.\u003c/p\u003e\u003cp\u003eBy assessing the types and number of events that take place in each text version, we have a metric that gives a holistic score of the 'eventfulness' or narrativity of each version of the draft. As a basis for this annotation, we used Gius \u0026amp; Vauth's (2022; see also Vauth et al, 2021) work on event classification. They see events as the 'minimal narrative units' (p.333) and take each verbal phrase (a finite verb and everything connected to it) as the unit of analysis. This small-scale approach is suitable for our draft corpus, where the changes between drafts may lie in a few sentences only, making a holistic / macrogenetic characterisation of the draft less suitable.\u003c/p\u003e\u003cp\u003eThey distinguish four categories of events: non-events, stative events, process events and change-of-state events. Non-events are those verbal phrases that do not represent an event at all, but rather a general fact, a question or a counterfactual statement (like 'I should have seen it coming.'). Stative events are descriptions that do not encompass a time or duration aspect, and often provide information about the setting of a story. An example is 'His trousers were black.' 
Process events consist of actions and mental processes, such as walking, thinking, seeing or speaking. Change of state events contain 'physical or mental state changes'.\u003c/p\u003e\u003cp\u003eThese four types have different intensities of narrativity. The change of state is the most eventful, and most likely to propel the plot forwards, followed by the process events, and then the stative events, with the non-events coming in last, as they do not contain events that pertain to the plot at all. Gius \u0026amp; Vauth (2022) then add numerical loadings to each type, to calculate shifts in narrativity throughout a set of novels. I have used their scoring system to attribute weights in the same way. Each draft then receives a narrativity score: the number of verbal phrases of each event type multiplied by the weight of that type, summed over all types. Increases and decreases of this score between consecutive pairs of drafts indicate shifts in narrativity.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\u003ch2\u003e2.5 Data\u003c/h2\u003e\u003cp\u003eThe raw keystroke corpus of Ellen Van Pelt's work process was used. Although the keystroke source material captures all textual changes made in each session, for this study I am only looking at the changes between the draft at the start and end of each writing session. I selected the Word files from the end of each writing session, which came to 19 files in total. Then, consecutive pairs of documents were used for the measurements. Version 1 was compared with version 2, version 2 with version 3, et cetera. 
The documents contained the story draft, and most of them also contained a modest amount of notes.\u003c/p\u003e\u003c/div\u003e"},{"header":"3 Exploration/Results","content":"\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\u003ch2\u003e3.1 Document length \u0026amp; Count of unique lemmas\u003c/h2\u003e\u003cp\u003eFor the tokenized texts, I performed a word count first \u0026ndash; this already gives some insight into potentially pivotal moments in the genesis. As seen in Fig.\u0026nbsp;1, over the first 7 sessions, the document size gradually expands. This is followed by a series of sessions where the size is much more stable (8\u0026ndash;14). In session 15, the draft increases in size again, and following a small expansion in session 16, the draft remains fairly similar in length for the final sessions.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e[Figure 1. Document word count for each writing session]\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e[Figure 2. Barplot with the count of unique lemmas coming into (black bars) and disappearing from the text (grey bars) per session, on the filtered lemma set, only nouns, adjectives, verbs \u0026amp; adverbs.]\u003c/p\u003e\u003cp\u003eFollowing Sobchuk \u0026amp; Šela (2024), the set of unique lemmas going in and out of the document in each session has been further pared down to increase semantic load; only nouns, verbs, adjectives and adverbs have been taken into account.\u003c/p\u003e\u003cp\u003eThe counts of unique lemmas entering and exiting the document, as shown in Fig.\u0026nbsp;2, show parallels with the word count plot above. In the first sessions, many more new lemmas enter the document than leave it. We also see the marked influx of new lemmas in session 16 (the 15_16 comparison set), which runs parallel with a large expansion of the document length, as well as the completion of the first draft. 
Comparing the two metrics, we see that the period between sessions 8 and 16 is characterised by substitutions (of unique lemmas) but not by text expansion. We also see that in the later stage, sessions 17\u0026ndash;22, although shifts in the document length are equally modest, far fewer unique lemmas are entering and exiting the drafts. This suggests a focus on proof-reading and other less semantically rich textual changes, which is supported by the close reading. Van Pelt for example switches out instances of words that she used multiple times, such as 'pirates'[piraten] and 'ship'[schip], to increase lexical variety when referring to the same concept. She also re-orders sentences without shifts in lemmas, for example in session 21:\u003c/p\u003e\u003cp\u003eDoor zijn zonnebril kan ik het niet met zekerheid weten. ['Because of his sunglasses I cannot know it for certain.']\u003c/p\u003e\u003cp\u003e('Al' - occurs in other locations still, so removing it here does not remove a unique lemma from the document.)\u003c/p\u003e\u003cp\u003eBased on this lemma-visualisation, the textual scholar could select those writing sessions where many lemmas appear in or disappear from the document as entry points into the textual development. Contrary to the document length visualisation, the analysis of unique lemmas allowed for a distinction between two periods of revision. However, these counts do not yet indicate the semantic distance of those lemmas; for that I have applied a pairwise cosine similarity measure based on BERT.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\u003ch2\u003e3.2 Pairwise cosine document similarity\u003c/h2\u003e\u003cp\u003eThe document texts at the end of each writing session were compared with their predecessor.\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e[Figure 4. Barplot showing the pairwise cosine similarity of the document texts.]\u003c/p\u003e\u003cp\u003eFigure 4 shows these pair-based similarity scores. 
The first marked difference is between sessions 1 and 2. As the draft is still very short at the end of session 1, it has more than doubled in size by the end of session 2, leading to a proportionally large number of new lemmas coming into the document too. [short texts, type-token ratio, etc]\u003c/p\u003e\u003cp\u003eThe other patterns are more telling. The middle stage of extensive revision, starting from 6_7, and running until 14_15, shows the lowest pair-wise similarity. From the unique lemma counts, we know that equal numbers of lemmas are entering and exiting the document here, whereas in the first stage of expansion, many more new lemmas are added.\u003c/p\u003e\u003cp\u003eSession 8 (pair 7_8 in the graph) and session 15 (pair 14_15) show the biggest differences in the cosine similarity with their predecessor.\u003c/p\u003e\u003cp\u003eIn session 8, Van Pelt removes a distinctive passage that served as a note, rather than part of the narrative, namely a text from a tourist brochure on the pirate-boat excursion, which she had copied from a travel agency's website and pasted into the document. She does integrate information from this section into her narrative, but many of the unique lemmas leaving the document in this work session were still part of that brochure fragment. She also adds a distinctive section, with 11 shipping terms and expressions, also as a note rather than part of the narrative. She adds two new segments to the narrative itself, one on reading, and one on the tourists' reactions during a comedy performance by the pirates, and further expands a scene where the tourists are being served lunch, which explains the food-related lemmas entering the text at this point. 
It is worth further study to see if this is a genre-effect on the similarity measure, as two distinctive non-story parts are playing a role here.\u003c/p\u003e\u003cp\u003eA similar shift takes place in session 15 (14_15), where Van Pelt removes all remaining notes from the draft, both the shipping terms and the short ones on ending and beginning. However, she integrates many of the shipping expressions into the story, by substitution. She also adds short vignettes of two new characters, both speaking, which is represented in indirect speech. A trinket seller on the beach as well as an angry tourist lady are given the stage. This is the last time in the process that new characters are introduced. Also, in a new scene at the end of the draft so far, the main characters find themselves back on the pirate boat; I wouldn't expect too many new words to be introduced here, as there have been scenes on the boat before.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\u003ch2\u003e3.3 Narrativity measure\u003c/h2\u003e\u003cp\u003eUsing the narrativity annotation system from Vauth et al (2021), all verbal phrases were manually annotated as one of four types of events. Each type had a different weighting of eventfulness, ranging from 0 to 7. Non-events received 0 points; stative events 2, process events 5 and change of state events 7 points. 
Some example sentences from the drafts can be found in Table\u0026nbsp;1.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Taba\" border=\"1\"\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eEvent type \u0026amp; weighing\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSentence from the corpus\u003csup\u003e3\u003c/sup\u003e\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eEnglish translation\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNon-event (0 points)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eMisschien verandert dat wanneer Finn volgend schooljaar naar het eerste leerjaar gaat.\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eMaybe that will change when Finn starts first grade next year.\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eStative event (2 points)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eOns schip heet Elysa.\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eOur ship is called Elysa.\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eProcess event (5 points)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e'Wanneer zien we de flamingo's, mama?' 
vroeg Finn.\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\"When are we going to see the flamingos, Mom?\" Finn asked.\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eChange of state event (7 points)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eVanmorgen pikte een tourbus ons op vlak voor het hotel.\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eThis morning a tour bus picked us up in front of the hotel.\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e[Table\u0026nbsp;1: Example sentences for each type of narrative event.]\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable border=\"1\"\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e[Figure 5. A count of each type of narrative event for each session.]\u003c/p\u003e\u003cp\u003eThe change of state events can be considered to be most closely aligned with plot development. From Fig.\u0026nbsp;5, we can glean that in the first few sessions, all types of events are added to the draft, but then from session 3 onwards, Van Pelt mainly adds change of state and process events, the most \u0026lsquo;narrative\u0026rsquo; categories. After the completion of the first draft, in session 16, relatively many process events are still added to the draft. These often consisted of dialogue and actions like \u0026lsquo;walking\u0026rsquo; or \u0026lsquo;seeing\u0026rsquo;.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e[Figure 6. Bar plot with the difference in narrativity scores between consecutive pairs of text versions, where the score of the earlier version is subtracted from the score of the later version. ]\u003c/p\u003e\u003cp\u003eAdding up all scores for each session gave a global narrativity score. 
Then, for each consecutive pairing of sessions, the score of the older session was subtracted from that of the newer session. Figure\u0026nbsp;6 visualises the resulting shifts in overall narrativity score between each pair of sessions.\u003c/p\u003e\u003cp\u003eThe first stage, text expansion (sessions 1\u0026ndash;7), has the highest growth in eventfulness, and sessions 19 and onwards, which were part of the final revision stage with proofreading and lower-level revisions, have the lowest scores. The sessions in between those two distinguishable stages are not as easily subdivided. It is useful to see that after completion of the first draft, in session 16, the eventfulness still increases. By zooming in on the individual event types (Fig.\u0026nbsp;5), we can tell that this increase is mainly caused by process events.\u003c/p\u003e\u003cp\u003eThe overall pattern is very similar to Fig.\u0026nbsp;2, the count of unique lemmas. The lemmas coming into the document in particular seem highly aligned with the narrativity scores. One marked difference occurs for the pair 11_12, though: it has a similar number of lemmas going in and out of the document as pair 8_11, a similar cosine document similarity and a similarly small fluctuation in document length, but quite a different narrativity score, with the eventfulness showing a modest increase in 8_11 but a decrease from 11 to 12. On closer inspection of the drafts, the total number of sentences does increase over session 12, but several substitutions swap the most plot-advancing 'change events' (7 points) for less active events, such as statives (2 points) and non-events (0 points) (see Fig.\u0026nbsp;5). One example is this rewrite, where instead of the change events of food actively being placed on the plates of the main characters, a more descriptive, stative version is given. 
Van Pelt prioritises setting the scene over driving the plot forwards with these types of revisions and additions.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Tabb\" border=\"1\"\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSession 11:\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eWe krijgen een bord, waarop ze vis en couscous met groenten scheppen.[7]\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eWe are given a plate on which they serve fish and couscous with vegetables.[7]\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eSubstitution in Session 12\u003c/b\u003e:\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eNet als in het buffetrestaurant in ons hotel, wordt het eten hier voor jou op je bord geschept. [2] Er is gebakken vis, couscous en een [s]toverij\u003csup\u003e4\u003c/sup\u003e van groenten. [2]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eJust like in the buffet restaurant in our hotel, the food here is served to you on your plate. [2] There is fried fish, couscous and a vegetable stew. [2]\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eThe narrativity scores provide a meaningful portrait of the story\u0026rsquo;s plot development, but seem to overlap with the patterns shown for the unique lemmas. 
In the conclusion, the four measures will be compared more closely to assess which ones would complement each other the most.\u003c/p\u003e\u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn this study, computational methods for measuring semantic changes were applied to a chronological series of short story drafts. Based on close reading, we expected certain sessions to contain a more drastic semantic shift than others. With the count of unique lemmas entering and leaving the documents, the final proofreading stage was easily distinguishable. The document similarity measure picked up on the revision-rich stage, as the stage where the semantic shifts between sessions were consistently larger than in either the stage focussed on text expansion or the stage focussed on proofreading. The narrativity scores presented quite a similar picture to the counts of new lemmas coming into the document, and clearly differentiated the first stage of text expansion, where narrativity increased a lot each session, and the final proofreading stage, where few fluctuations in narrativity took place. The revision-rich stage had a more mixed presentation.\u003c/p\u003e\u003cp\u003eIn order to get an idea of the overlap between these measures, I ran a correlation matrix on the shifts in document length, the unique lemma counts, the narrativity score and the cosine similarity score. The differences in narrativity scores and in word count had a high correlation (0.92, p\u0026thinsp;\u0026lt;\u0026thinsp;.001). Word count differences also correlated highly with the count of unique lemmas entering the document (0.94, p\u0026thinsp;\u0026lt;\u0026thinsp;.0001). The narrativity scores also correlated highly with the number of unique lemmas coming into the document (0.82, p\u0026thinsp;\u0026lt;\u0026thinsp;.001). 
The cosine document similarity correlated both with the counts of lemmas in (-0.66, p\u0026thinsp;\u0026lt;\u0026thinsp;.01 ) and out (-0.50, p\u0026thinsp;\u0026lt;\u0026thinsp;.05) of the document, but not with the narrativity score or the difference in word count. The narrativity score, for this specific corpus of drafts, seems to pick up more on plot expansion, whereas the cosine similarity mainly flagged the stage of intensive revision. Concerning the operationalisation of semantic change, the lack of quantitative correlation supports the idea that these two variables measure different, complementary aspects of meaning in the text. The counts of unique lemmas entering and exiting the document formed a bridge as they correlated with all other measures.\u003c/p\u003e\u003cp\u003eAll three computational measures picked up on parts of the macrogenesis of this story. Looking at the sessions where the semantic change was highest also offered relevant insights into the textual history. However, although all measures brought something to the table, they did not measure the same 'thing'. The narrativity score was designed to track plot and plot development; the word embeddings measure distances between document vectors in a multidimensional vector space. The counts of unique lemmas, a homemade measure added for its simplicity, showed partial correlations with both narrativity and cosine similarity. There was not a clear 'winner' in terms of capturing aspects of the macrogenesis that were found through close reading, but all provided entrance points into the draft corpus by highlighting those writing sessions that stood out from others in one way (plot, text expansion) or the other (intense revision, or proofreading).\u003c/p\u003e\u003cp\u003eAs the narrativity measure was hand-annotated, and in this specific case, correlated with the document growth, it would cause scalability issues when looking at semantic shifts in a much larger corpus. 
The cosine similarity based on document embeddings is a well-established method in digital literary studies which had not previously been applied to draft materials. It was already known that it is good at distinguishing textual genres, and it appeared to do so in this corpus as well, by flagging sessions where notes in a specific style were added or deleted. It was rather intriguing that the sessions focussed on revision and expansion of the existing scenes were less similar to their predecessors than those sessions where the story was expanded faster by adding new material in the form of new scenes, furthering the plot. Finally, the count of unique lemmas, although it is a quirky and simplistic idea, held up rather well compared with the more established methods. By reading through the lists of unique lemmas themselves, the researcher can also connect the quantitative findings with the close reading.\u003c/p\u003e\u003cp\u003eAs is perhaps often the case, I found the instances where a writing session was different from the rest, or where two analyses presented contrasting findings, to be just as informative about the writing process of Ellen Van Pelt as the search for macrogenetic stages and global impressions. For example, the difference in narrativity score between 8_11 and 11_12 suggests we can tell when the forward drive of plot development is paused by looking at the narrativity score, even when the other measures do not show differences between these two sets.\u003c/p\u003e\u003cp\u003eThis study was a first exploration of the possibilities of computational methods for text-genetic materials. Its findings are tightly connected to the specific case study, which had a high amount of revision throughout the entire process. Despite this high frequency of revision, no drastic changes were made: the plot was expanded gradually, no large sections were deleted from the drafts, and no big changes were made to the characters, setting, events, perspective and timing. 
Future studies could curate, or perhaps even artificially construct, drafts that incorporate more marked changes, to check whether the semantic measures would pick up on those adequately.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eThe author has no relevant financial or non-financial interests to disclose.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eF.B. conducted the research and wrote the manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgement\u003c/h2\u003e\u003cp\u003eThis research was performed while I was a post-doc at Huygens Institute (KNAW), Amsterdam, the Netherlands. I would like to thank Karina van Dalen-Oskam for mentoring me during this time. My post-doc was part of the CLS-Infra project, funded by the European Union through its Horizon 2020 program, under grant agreement No 101004984.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eAll data used for my analyses has been published in the digital edition https://nanogenesis-digital.github.io/index.html, which was created by Lamyk Bekius.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eBarron, A. T. J., Huang, J., Spang, R. L. and DeDeo, S. (2018). Individuals, institutions, and innovation in the debates of the French revolution. Proceedings of the National Academy of Sciences of the United States of America, 115(18): 4607\u0026ndash;12.\u003c/li\u003e\n\u003cli\u003eB\u0026auml;r, Daniel, et al. \u003cem\u003eComposing Measures for Computing Text Similarity\u003c/em\u003e. Technical Report TUD-CS-2015-0017, Technische Universit\u0026auml;t Darmstadt, 2015.\u003c/li\u003e\n\u003cli\u003eBeausang, Chris. \u0026lsquo;Diachronic Delta: A Computational Method for Analysing Periods of Accelerated Change in Literary Datasets\u0026rsquo;. \u003cem\u003eDigital Scholarship in the Humanities\u003c/em\u003e, vol. 37, no. 3, Sept. 2022, pp. 644\u0026ndash;59. 
\u003cem\u003eSilverchair\u003c/em\u003e, https://doi.org/10.1093/llc/fqab041.\u003c/li\u003e\n\u003cli\u003eBekius, Lamyk L. (2023) \u003cem\u003eBehind the Computer Screens\u003c/em\u003e. 2023. University of Amsterdam \u0026amp; Antwerp University, Doctoral thesis, https://hdl.handle.net/11245.1/07ab9a89-89b6-44cd-84c5-ddc46c9cdf60\u003c/li\u003e\n\u003cli\u003eBekius, Lamyk L. (2024). \u003cem\u003eNanogenesis Digital\u003c/em\u003e. Retrieved August 21, 2025, from \u003cu\u003ehttps://nanogenesis-digital.github.io \u003c/u\u003e\u003c/li\u003e\n\u003cli\u003eBuschenhenke, Floor. (2025) \u003cem\u003eEntering Stories: Decoding Born-Digital Writing through Keystroke Logging\u003c/em\u003e. University of Amsterdam \u0026amp; Antwerp University, Doctoral thesis, https://hdl.handle.net/11245.1/16460778-7a64-4134-8df5-321e8ece96ef\u003c/li\u003e\n\u003cli\u003eChandrasekaran, D., \u0026amp; Mago, V. (2021). Evolution of Semantic Similarity\u0026mdash;A Survey. ACM Comput. Surv., 54(2), 41:1-41:37. https://doi.org/10.1145/3440755\u003c/li\u003e\n\u003cli\u003eEhrmanntraut, Anton, et al. \u0026lsquo;Modeling and Measuring Short Text Similarities. On the Multi-Dimensional Differences between German Poetry of Realism and Modernism\u0026rsquo;. \u003cem\u003eJournal of Computational Literary Studies\u003c/em\u003e, vol. 1, no. 1, 1, Dec. 2022. \u003cem\u003ejcls.io\u003c/em\u003e, https://doi.org/10.48694/jcls.116.\u003c/li\u003e\n\u003cli\u003eFan, Li, et al. \u0026lsquo;Exploring the Behavioral and Neural Correlates of Semantic Distance in Creative Writing\u0026rsquo;. \u003cem\u003ePsychophysiology\u003c/em\u003e, vol. 60, no. 5, 2023, p. e14239. \u003cem\u003eWiley Online Library\u003c/em\u003e, https://doi.org/10.1111/psyp.14239.\u003c/li\u003e\n\u003cli\u003eGeyer, Thomas, et al. \u0026lsquo;Reading Haiku: Semantic Distance and the \u0026ldquo;Cut Effect\u0026rdquo;\u0026rsquo;. 
\u003cem\u003e\u0026lsquo;To Sing the Haiku the American Way Is a Beautiful Thing\u0026rsquo;: The Haiku of Etheridge Knight\u003c/em\u003e, 2020, p. 9.\u003c/li\u003e\n\u003cli\u003eGr\u0026eacute;sillon, Almuth. (2016). \u003cem\u003eEl\u0026eacute;ments de critique g\u0026eacute;n\u0026eacute;tique: Lire les manuscrits modernes\u003c/em\u003e. CNRS \u0026eacute;ditions.\u003c/li\u003e\n\u003cli\u003eGriebel, Sarah, et al. \u003cem\u003eLocating the Leading Edge of Cultural Change\u003c/em\u003e. \u003cem\u003eZotero\u003c/em\u003e, https://ceur-ws.org/Vol-3834/paper70.pdf.\u003c/li\u003e\n\u003cli\u003eHan, M., Zhang, X., Yuan, X., Jiang, J., Yun, W., \u0026amp; Gao, C. (2021). A survey on the techniques, applications, and performance of short text semantic similarity. \u003cem\u003eConcurrency and Computation: Practice and Experience\u003c/em\u003e, 33(5), e5971. https://doi.org/10.1002/cpe.5971\u003c/li\u003e\n\u003cli\u003eHerrmann, J. Berenike, et al. \u0026lsquo;Revisiting Style, a Key Concept in Literary Studies\u0026rsquo;. \u003cem\u003eJournal of Literary Theory\u003c/em\u003e, vol. 9, no. 1, Jan. 2015. \u003cem\u003eCrossRef\u003c/em\u003e, https://doi.org/10.1515/jlt-2015-0003.\u003c/li\u003e\n\u003cli\u003eKetzan, Erik, and Christof Sch\u0026ouml;ch. \u0026lsquo;Classifying and Contextualizing Edits in Variants with Coleto: Three Versions of Andy Weir\u0026rsquo;s The Martian\u0026rsquo;. \u003cem\u003eDigital Humanities Quarterly\u003c/em\u003e, vol. 015, no. 4, Dec. 2021.\u003c/li\u003e\n\u003cli\u003eKroeger, Paul R. \u003cem\u003eAnalyzing Meaning: An Introduction to Semantics and Pragmatics.\u003c/em\u003e Language Science Press, 2023. \u003cem\u003elibrary.oapen.org\u003c/em\u003e, https://doi.org/10.5281/zenodo.6855854.\u003c/li\u003e\n\u003cli\u003eLeijten, Mari\u0026euml;lle, and Luuk Van Waes. \u0026lsquo;Keystroke Logging in Writing Research: Using Inputlog to Analyze and Visualize Writing Processes\u0026rsquo;. 
\u003cem\u003eWritten Communication\u003c/em\u003e, vol. 30, no. 3, July 2013, pp. 358\u0026ndash;92, https://doi.org/10.1177/0741088313491692.\u003c/li\u003e\n\u003cli\u003eNeal, T., Sundararajan, K., Fatima, A., Yan, Y., Xiang, Y., \u0026amp; Woodard, D. (2017). Surveying Stylometry Techniques and Applications. \u003cem\u003eACM Comput. Surv.\u003c/em\u003e, \u003cem\u003e50\u003c/em\u003e(6), 86:1-86:36. https://doi.org/10.1145/3132039\u003c/li\u003e\n\u003cli\u003ePeverelli, Andrea, et al. \u0026lsquo;Tracking Textual Similarities in Neo-Latin Drama Networks\u0026rsquo;. \u003cem\u003eProceedings of the Thirteenth Language Resources and Evaluation Conference\u003c/em\u003e, edited by Nicoletta Calzolari et al., European Language Resources Association, 2022, pp. 5295\u0026ndash;303. \u003cem\u003eACLWeb\u003c/em\u003e, https://aclanthology.org/2022.lrec-1.567.\u003c/li\u003e\n\u003cli\u003ePosthuma, Jente. \u0026lsquo;En Daarom Haten Ze Zichzelf\u0026rsquo;. De Gids, vol. 21, no. 1, 2021, https://www.de-gids.nl/artikelen/en-daarom-haten-ze-zichzelf.\u003c/li\u003e\n\u003cli\u003eKo van der Sloot, Iris Hendrickx, Maarten van Gompel, Antal van den Bosch and Walter Daelemans. Frog, A Natural Language Processing Suite for Dutch, Reference Guide, Language and Speech Technology Technical Report Series 18-02, Radboud University, Nijmegen, December 2018, Available from https://frognlp.readthedocs.io/en/latest/ + https://webservices.cls.ru.nl/frog\u003c/li\u003e\n\u003cli\u003eSobchuk, Oleg, and Artjoms \u0026Scaron;eļa. \u0026lsquo;Computational Thematics: Comparing Algorithms for Clustering the Genres of Literary Fiction\u0026rsquo;. \u003cem\u003eHumanities and Social Sciences Communications\u003c/em\u003e, vol. 11, no. 1, Mar. 2024, pp. 1\u0026ndash;12. \u003cem\u003ewww.nature.com\u003c/em\u003e, https://doi.org/10.1057/s41599-024-02933-6.\u003c/li\u003e\n\u003cli\u003eSzemes, Botond, and Mih\u0026aacute;ly Nagy. 
\u003cem\u003eRepetition and Innovation in Dramatic Texts\u003c/em\u003e.\u003c/li\u003e\n\u003cli\u003eT.y.s.s, Santosh, et al. \u0026lsquo;A Tale of Two Revisions: Summarizing Changes Across Document Versions\u0026rsquo;. \u003cem\u003eFindings of the Association for Computational Linguistics ACL 2024\u003c/em\u003e, edited by Lun-Wei Ku et al., Association for Computational Linguistics, 2024, pp. 3195\u0026ndash;211. \u003cem\u003eACLWeb\u003c/em\u003e, https://aclanthology.org/2024.findings-acl.190.\u003c/li\u003e\n\u003cli\u003eVan Hulle, D., \u0026amp; Nixon, M. (2024). \u003cem\u003eBeckett Digital Manuscript Project\u003c/em\u003e [online resource]. University of Oxford.\u003c/li\u003e\n\u003cli\u003evan Cranenburgh, Andreas, et al. \u0026lsquo;Vector Space Explorations of Literary Language\u0026rsquo;. \u003cem\u003eLanguage Resources and Evaluation\u003c/em\u003e, Feb. 2019. \u003cem\u003eSpringer Link\u003c/em\u003e, https://doi.org/10.1007/s10579-018-09442-4.\u003c/li\u003e\n\u003cli\u003eVerhagen, Arie. \u0026lsquo;Construal and Stylistics\u0026ndash;within a Language, across Contexts, across Languages\u0026rsquo;. \u003cem\u003eStylistics across Disciplines. Conference Proceedings\u003c/em\u003e, 2012. \u003cem\u003eGoogle Scholar\u003c/em\u003e, http://arieverhagen.nl/cms/files/2012_Verhagen_ConstrualStylistics.pdf.\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Footnotes","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003e \u003cspan type=\"SmallCaps\" class=\"SmallCaps\" name=\"Emphasis\"\u003eThis paragraph is taken from Buschenhenke, 2025, p.127\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e \u003cspan type=\"SmallCaps\" class=\"SmallCaps\" name=\"Emphasis\"\u003eA cosine similarity measure was also run on the sets of unique lemmas. 
However, the sessions where one of these sets was very small (between 1 and 3 lemmas) had a very low similarity to the rest, which is possibly related to this small sample size.\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e \u003cspan type=\"SmallCaps\" class=\"SmallCaps\" name=\"Emphasis\"\u003eTaken from the story draft, in session id14\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e 'toverij' means 'magic' but 'stoverij' means 'stew'; although Van Pelt uses 'toverij', it is quite possible that this is a typo.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"international-journal-of-digital-humanities","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"ijdh","sideBox":"Learn more about [International Journal of Digital Humanities](http://link.springer.com/journal/42803)","snPcode":"42803","submissionUrl":"https://submission.nature.com/new-submission/42803/3","title":"International Journal of Digital Humanities","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"Genetic criticism, writing process, literary drafts, semantic similarity, semantic distance, measuring semantic change","lastPublishedDoi":"10.21203/rs.3.rs-7750385/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7750385/v1","license":{"name":"CC BY 
4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThis study applies computational methods for measuring semantic distances to born-digital drafts. Using text versions leading up to a short story by Flemish author Ellen Van Pelt, we are looking for relevant entry points into a born-digital genetic dossier. Several methods are applied and compared to consecutive pairs of drafts; a count of unique lemmas entering and leaving the working document, a document cosine similarity measure based on a BERT model of Dutch, and a narrativity measure. By comparing these different methods with close reading, we can assess whether such computational tools may be of help to textual scholars working with big corpora of text-genetic manuscripts. The working process of Van Pelt could be divided into several stages with a different focus. The methods each partially picked up on these stages, and also highlighted specific drafts. It did not become clear which of the methods was most succesful, in part because of the single, small case study used, for a writing process that was characterised by gradual expansion and continuous revision.\u003c/p\u003e","manuscriptTitle":"Distinguishing drafts: measuring semantic distances between born-digital short story drafts","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-10-17 12:34:40","doi":"10.21203/rs.3.rs-7750385/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision 
requested","date":"2025-11-05T16:01:55+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-11-01T22:48:12+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"101208499891683426777307947193929694237","date":"2025-10-08T14:07:41+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-10-06T14:04:52+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-10-03T08:31:06+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-10-02T12:04:48+00:00","index":"","fulltext":""},{"type":"submitted","content":"International Journal of Digital Humanities","date":"2025-09-30T10:18:38+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"international-journal-of-digital-humanities","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"ijdh","sideBox":"Learn more about [International Journal of Digital Humanities](http://link.springer.com/journal/42803)","snPcode":"42803","submissionUrl":"https://submission.nature.com/new-submission/42803/3","title":"International Journal of Digital Humanities","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"2cd3a58f-492f-472d-90cd-e607b2bb194b","owner":[],"postedDate":"October 17th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2026-04-21T17:54:45+00:00","versionOfRecord":[],"versionCreatedAt":"2025-10-17 
12:34:40","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7750385","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7750385","identity":"rs-7750385","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

