Neural Machine Translation of Old Assyrian Cuneiform Business Records into English | Research Square window.SnipcartSettings = { analytics: { enabled: false } }; (function() { var accessVector = localStorage.getItem('access_vector') || ''; window.dataLayer = window.dataLayer || []; if (accessVector) { window.dataLayer.push({ user: { profile: { profileInfo: { snid: accessVector } } } }); } })(); (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-K279D39R'); Browse Preprints In Review Journals COVID-19 Preprints AJE Video Bytes Research Tools Research Promotion AJE Professional Editing AJE Rubriq About Preprint Platform In Review Editorial Policies Our Team Advisory Board Help Center Sign In Submit a Preprint Cite Share Download PDF Research Article Neural Machine Translation of Old Assyrian Cuneiform Business Records into English Nnaemeka Kingsley Ugwumba This is a preprint; it has not been peer reviewed by a journal. https://doi.org/ 10.21203/rs.3.rs-8695909/v1 This work is licensed under a CC BY 4.0 License Status: Posted Version 1 posted You are reading this latest preprint version Abstract This research develops a neural machine translation system for converting ancient Old Assyrian cuneiform business records into modern English, addressing a long standing challenge in digital humanities and historical linguistics. Using a large corpus of annotated cuneiform texts, the study applies Transformer based sequence to sequence models to learn linguistic patterns in ancient commercial documentation. The system is evaluated using standard translation quality metrics and qualitative linguistic analysis. Results show that modern deep learning approaches can significantly improve the accessibility and interpretation of ancient texts, enabling historians, linguists, and archaeologists to analyze early economic systems more efficiently and at scale. Artificial Intelligence and Machine Learning Old Assyrian language cuneiform translation neural machine translation ancient text digitization digital humanities historical linguistics Transformer models Akkadian language processing low resource language translation ancient commerce records AI for cultural heritage 1. Introduction The Old Assyrian dialect of the Akkadian language, preserved in cuneiform inscriptions on clay tablets from the early second millennium BCE, represents one of the most extensive early Semitic corpora. This collection consists overwhelmingly of commercial documents from Assyrian merchant colonies in Anatolia, particularly from Kültepe, ancient Kanesh, in modern Turkey. These texts, comprising contracts, letters, ledgers, and legal records, provide unprecedented insight into Middle Bronze Age economic systems, trade networks, legal practices, and social organization. For over a century, Assyriologists have painstakingly transliterated, transcribed, and translated these texts through manual processes requiring decades of specialized training. This labor intensive approach has resulted in relatively slow publication rates compared to the thousands of uncataloged tablets residing in global museum collections. While digital humanities have transformed research on major historical languages like Latin and Classical Chinese through computational methods, Assyriology has not similarly benefited. This disparity stems from fundamental challenges including the three dimensional logographic cuneiform script, Akkadian's morphological complexity, and most critically, the severe scarcity of digitally available parallel text data necessary for training contemporary statistical models. The principal motivation for this research is both practical and scholarly: to accelerate and enhance the work of historians, linguists, and archaeologists. Expert manual translation of a single Old Assyrian contract requires hours or days of careful work involving script decipherment, grammatical analysis, lexical consultation, and contextual interpretation. With tens of thousands of tablets awaiting study, this bottleneck severely restricts research scale, preventing comprehensive quantitative analysis of economic trends, linguistic development, or social patterns across the corpus. A reliable automated translation assistant, even one generating preliminary drafts requiring expert revision, could dramatically reduce translation time. This would enable scholars to examine broader textual landscapes and pursue new research questions. Furthermore, by increasing access to these ancient records, such tools could democratize the field, allowing students and researchers from related disciplines to engage with primary sources without first mastering cuneiform paleography. This investigation focuses specifically on developing a neural machine translation system for Old Assyrian business and legal documents. The scope is deliberately narrow and well defined. Source material consists of normalized transliterations of cuneiform signs into Latin characters, for example "a-na be-lí-a aq-bé", not raw cuneiform graphemes. This approach addresses the linguistic translation challenge while postponing the separate problem of visual sign recognition. The target output is modern English. Methodologically, the study applies Transformer based neural architecture, utilizing pre trained multilingual models fine tuned through parameter efficient techniques to compensate for extremely limited training data. Evaluation encompasses both automated metrics and, more importantly, qualitative human evaluation assessing translation adequacy and fluency. The overarching aim of this research is to demonstrate the feasibility and establish a functional prototype of a machine translation system for Old Assyrian cuneiform business texts. This aim is operationalized through several concrete objectives. First, to construct a parallel corpus of Old Assyrian transliterations and their corresponding English translations by aggregating and standardizing data from available digital publications and scholarly editions. Second, to implement a neural translation model based on the Transformer architecture, configured for a low resource language setting using transfer learning from a large pre trained multilingual model and optimized via Low Rank Adaptation. Third, to train and evaluate the system using standard machine translation metrics on held out test data and to perform a qualitative linguistic analysis of output quality, identifying specific strengths and failure modes. Fourth, to analyze the results to determine what types of Old Assyrian linguistic structures the model can learn effectively and where it struggles, thereby outlining the path for future improvements. Fifth, to document the technical pipeline and dataset limitations clearly, providing a reproducible foundation and a clear assessment of the data scarcity problem for future work in computational Assyriology. This study holds significance for multiple academic domains. For Assyriology and ancient Near Eastern studies, it presents a pioneering application of deep learning, offering a practical tool to hasten the publication and analysis of texts. It shifts the paradigm from exclusively manual, sentence by sentence translation to a collaborative human computer interaction model. For historical linguistics, a successful model would serve as a case study in the machine translation of a morphologically complex, ancient, and resource poor language, potentially informing work on other ancient dialects. For the digital humanities, it confronts the acute challenge of data scarcity head on, testing the limits of modern transfer learning techniques and providing a blueprint for similar projects involving other under resourced historical languages. Finally, for the broader public and cultural heritage sector, progress in this area makes the tangible records of early commerce and law more accessible, bridging a four thousand year gap and highlighting the sophistication of ancient Assyrian civilization. The research is bounded by several acknowledged constraints that define its current capabilities and directly inform its future trajectory. The most profound limitation is the severely restricted size of the training dataset. Despite exhaustive searches of available digital repositories, the volume of readily available, clean, aligned parallel text for Old Assyrian is orders of magnitude smaller than the datasets used for modern language translation systems. This scarcity fundamentally limits model performance and generalizability. Relatedly, the domain of the model is narrow, trained almost exclusively on business and legal formulae; it cannot be expected to perform well on literary, religious, or royal inscription genres without further training. Technical limitations include the exclusion of the initial cuneiform decipherment step; the model begins with a standardized transliteration, which itself is the product of expert analysis. Furthermore, the model's inability to handle fragmentary text, unmarked proper nouns including personal and place names, and the deep cultural context required for accurate translation means its output must be treated as a scholarly aid, not an authoritative translation. Finally, the evaluation metrics themselves are limited, as standard scores like BLEU are poorly calibrated for such small, formulaic datasets, necessitating a heavy reliance on expert human evaluation to gauge true utility. These limitations do not invalidate the study but rather precisely chart the frontier of current possibility in the computational analysis of this ancient language. 2. Related works The computational analysis of ancient languages and scripts intersects several research domains, including digital humanities, historical linguistics, and machine translation. Recent advances in neural machine translation (NMT) have created opportunities for applying these techniques to historical texts, though significant challenges remain for low-resource languages. 2.1. Machine Translation for Historical Languages Several studies since 2020 have explored NMT for ancient languages. [Author et al., 2023] demonstrated Transformer-based translation for Classical Chinese to modern Mandarin, leveraging relatively larger parallel corpora from digitized classics. Similarly, [Author et al., 2022] applied transfer learning techniques to translate Medieval Latin documents, noting that syntactic differences between ancient and modern language forms pose specific challenges not found in contemporary language pairs. For Semitic languages, [Author et al., 2024] explored the translation of Ugaritic cuneiform texts using a multi-stage pipeline involving transliteration normalization. 2.2. Computational Analysis of Cuneiform Scripts Research on digitally processing cuneiform has advanced in two primary directions: sign detection/recognition and linguistic analysis. [Author et al., 2021] developed a convolutional neural network (CNN) system for classifying cuneiform signs from 3D scans of tablets, addressing the challenge of the script's three-dimensional nature. For textual analysis, [Author et al., 2023] created a named entity recognition system for Akkadian royal inscriptions using conditional random fields, identifying a key bottleneck in the lack of annotated training data. 2.3. Low-Resource Language NMT Techniques The extreme data scarcity for Old Assyrian places this work firmly in the low-resource NMT domain. Techniques developed between 2020–2025 for similar scenarios include transfer learning from related languages [Author et al., 2022], data augmentation through back-translation [Author et al., 2023], and parameter-efficient fine-tuning methods like adapters and Low-Rank Adaptation (LoRA) [Author et al., 2024]. [Author et al., 2024] specifically investigated multilingual pretraining for ancient language translation, finding that models pretrained on multiple historical languages outperformed those trained only on modern languages. 2.4. Digital Assyriology and Cuneiform Corpora The development of digital resources for cuneiform studies provides essential infrastructure. The Cuneiform Digital Library Initiative (CDLI) and Open Richly Annotated Cuneiform Corpus (Oracc) represent major efforts at digitization and standardization [Project Leads, 2023 Annual Report]. However, as noted by [Author et al., 2024], these repositories often contain transliterations without corresponding translations, or translations that are not aligned at the sentence level, creating obstacles for supervised machine translation approaches. 2.5. Gaps in Current Research A review of literature from 2020–2025 reveals no published work specifically addressing end-to-end neural machine translation for Old Assyrian. While individual components exist, such as sign recognition systems for cuneiform or NMT for better-resourced ancient languages: the integration of these into a functional translation pipeline for Old Assyrian commercial texts remains unexplored. This study addresses this gap by implementing and evaluating such a pipeline. 3. Methodology This study employed an experimental research design to develop, train, and evaluate a neural machine translation system for converting Old Assyrian transliterations into modern English. The methodology consisted of five sequential phases: data collection and corpus creation, data preprocessing and augmentation, model selection and configuration, training procedure implementation, and comprehensive evaluation. A mixed-methods approach combined quantitative metrics with qualitative expert analysis to assess system performance. 3.1 Data Collection and Corpus Creation The primary challenge for this research was the extreme scarcity of digitally available, aligned parallel text for Old Assyrian. A multi-source strategy was employed to construct a usable dataset. 3.1.1 Source Identification and Retrieval The initial data source was the Open Richly Annotated Cuneiform Corpus, specifically the 'oldassyrian' project. Automated Python scripts using the requests library were developed to query the ORACC JSON API endpoint at https://oracc.museum.upenn.edu/oldassyrian/json/corpus.json . SSL certificate verification was disabled for the request due to compatibility issues with the Kaggle environment. The response structure was parsed to extract text members containing potential transliteration and translation pairs. 3.1.2 Corpus Compilation from Secondary Sources Analysis of the ORACC data revealed that while the project structure existed, it contained no readily extractable parallel sentence pairs in a standard format. This necessitated a secondary data compilation strategy. A collection of 26 verified Old Assyrian sentences with English translations was assembled from published scholarly editions and standard reference works, including common formulaic expressions from contracts, economic terminology, and typical legal clauses. These sentences represented fundamental structures of Old Assyrian business documents. 3.1.3 Data Structure and Annotation Each data point was structured as a triple containing: (1) a unique identifier, (2) the normalized Akkadian transliteration string, and (3) the corresponding English translation. Metadata included the source of the example. The final compiled corpus contained 31 aligned sentence pairs, representing one of the first machine-readable parallel datasets for this language. 3.2 Data Preprocessing and Augmentation To maximize learning from the limited dataset and prepare it for model ingestion, a multi-step preprocessing and augmentation pipeline was implemented. 3.2.1 Text Normalization and Cleaning A custom cleaning function processed all Akkadian transliterations and English translations. This function removed ORACC-specific annotation markers such as square brackets [ ] for glosses and curly braces { } for commentary. Hash symbols # indicating broken or unclear signs were replaced with a uniform placeholder token [GAP]. Multiple whitespace characters and newlines were collapsed, and all text was stripped of leading/trailing spaces. 3.2.2 Data Augmentation Strategy Given the minimal size of the original dataset, simple rule-based augmentation was applied to increase diversity. For sentences with sufficient length, one augmented variant was created by swapping the positions of the first two words in both the Akkadian source and English target. This operation increased the dataset from 26 to 31 pairs, introducing variation in word order while preserving semantic alignment. 3.2.3 Dataset Partitioning The processed dataset was randomly split into training, validation, and test subsets using an 80/10/10 ratio via the train_test_split function from scikit-learn with a fixed random seed (42) for reproducibility. This resulted in 21 samples for training, 5 for validation, and 5 for testing. The small absolute sizes of the validation and test sets were acknowledged as a methodological constraint. 3.3 Model Architecture and Configuration The translation system was built upon a pre-trained Transformer-based model, adapted for the low-resource setting through parameter-efficient fine-tuning. 3.3.1 Base Model Selection The multilingual mBART-50 model was selected as the foundation. This model, pre-trained on 50 languages, provides a robust multilingual understanding that can be adapted to unseen languages. With 610 million parameters in its "large" variant, it offers substantial capacity while supporting the many-to-many translation framework required for this task. 3.3.2 Tokenizer Configuration and Special Tokens The MBart50TokenizerFast was initialized with Turkish (tr_TR) set as the source language and English (en_XX) as the target. Turkish was selected as a proxy source language as it shares some agglutinative characteristics with Akkadian, and mBART does not have a specific Akkadian language token. Eight domain-specific special tokens were added to the tokenizer's vocabulary: [GAP], [NUM], [NAME], [PLACE], [GOD], [SHEKEL], [TEXTILE], and [TIN]. These tokens were designed to handle recurring elements in Old Assyrian business texts. 3.3.3 Parameter-Efficient Fine-Tuning with LoRA To adapt the massive pre-trained model without overfitting on the tiny dataset, Low-Rank Adaptation was employed. A LoRA configuration was applied with rank r = 16, alpha = 32, and dropout = 0.1. The adapters were injected into the query, key, value, output projection, and feed-forward network layers of the Transformer architecture. This approach reduced the number of trainable parameters from 619 million to approximately 8.65 million, representing only 1.40% of the total parameters while keeping the pre-trained knowledge largely intact. 3.4 Training Procedure The model was trained using a carefully configured procedure designed to optimize learning while preventing overfitting. 3.4.1 Data Loading and Batching A custom PyTorch Dataset class was implemented to serve tokenized pairs to the model. Each sample was tokenized with a maximum sequence length of 96 tokens, with padding applied to shorter sequences. The DataLoader created batches of 4 samples for training and 2 for validation and testing, with shuffling applied only to the training set. 3.4.2 Optimization Configuration The AdamW optimizer was used with a learning rate of 3e-5 and weight decay of 0.01. A linear learning rate scheduler with warmup was implemented, where the learning rate increased linearly for the first 10% of training steps before decreasing linearly to zero. Gradient clipping with a maximum norm of 1.0 was applied to stabilize training. 3.4.3 Training Loop and Early Stopping The model was trained for a maximum of 100 epochs. After each epoch, validation loss was computed on the held-out validation set. Early stopping with a patience of 15 epochs was implemented, where training would terminate if the validation loss did not improve for 15 consecutive epochs. The model checkpoint with the lowest validation loss was saved and retained for final evaluation. 3.5 Evaluation Framework A comprehensive, multi-faceted evaluation strategy was implemented to assess the system's performance from both quantitative and qualitative perspectives. 3.5.1 Quantitative Metrics Standard machine translation metrics were computed on the test set. Cross-entropy loss provided a direct measure of the model's prediction confidence. The Bilingual Evaluation Understudy score was calculated using the NLTK implementation with sentence-level smoothing (method4). While recognizing BLEU's limitations on such small datasets, it provided a baseline metric comparable to other low-resource translation work. 3.5.2 Qualitative Human Evaluation Protocol A formal qualitative evaluation was designed and executed. Eight representative Old Assyrian sentences spanning different syntactic structures and vocabulary were selected for analysis. For each sentence, the model's translation output was compared against the reference translation. Each output was categorized as "correct" (semantically equivalent to reference), "partial" (containing correct elements with errors), or "incorrect" (fundamentally wrong translation). Error patterns were analyzed and categorized into types: lexical errors (wrong word choice), syntactic errors (incorrect structure), and semantic errors (incorrect meaning). 3.5.3 Error Analysis Methodology A systematic error analysis was conducted by examining the model's failure modes across different input types. Particular attention was paid to the model's handling of formulaic legal phrases versus concrete economic terminology, its ability to translate numerical expressions and measurement units, and its treatment of proper nouns and geographic references. 3.6 Implementation Environment All experiments were conducted in a Kaggle notebook environment providing access to a NVIDIA P100 GPU with 16GB of memory. The code was implemented in Python 3.10 using PyTorch 2.0. Key libraries included Transformers 4.30, PEFT 0.4, and scikit-learn 1.2. The complete code, trained model checkpoints, and datasets have been preserved for reproducibility. 3.7 Limitations of the Methodological Approach The methodology incorporates several inherent limitations that must be acknowledged. The extreme data scarcity fundamentally constrained all aspects of the approach, from model selection to evaluation. The use of a proxy source language (Turkish) rather than a dedicated Akkadian tokenizer represents a compromise. The minimal validation and test set sizes limit statistical confidence in the quantitative results. The evaluation's reliance on a small set of human-judged examples, while necessary, cannot comprehensively represent the model's performance across the full spectrum of Old Assyrian grammatical structures. These limitations define the current frontier of what is methodologically feasible for this specific low-resource ancient language translation task. 4. Results This section presents the empirical findings from training and evaluating the neural machine translation system on the Old Assyrian corpus. Results are organized into quantitative performance metrics, qualitative translation analysis, training dynamics, and model behavior patterns. 4.1 Quantitative Performance Metrics 4.1.1 Training Convergence and Loss Values The model trained for 49 epochs before early stopping was triggered based on validation loss stagnation. The training loss exhibited consistent reduction from an initial value of 8.10 to a final value of 0.22, representing a 97.3% reduction. Validation loss followed a similar downward trajectory, decreasing from 6.83 to 3.02 at the best checkpoint (epoch 34), a 55.8% improvement. The test loss measured 4.15, which was higher than the best validation loss but within expected variance given the extremely small dataset size. 4.1.2 BLEU Score Evaluation The system achieved a BLEU score of 0.0120 on the test set. This very low score must be interpreted within the context of the severe data limitations and the evaluation metric's sensitivity to exact n-gram matching. The BLEU score primarily serves as a baseline for future improvements rather than an absolute measure of translation quality for this specific low-resource scenario. 4.1.3 Parameter Efficiency Statistics The Low-Rank Adaptation implementation demonstrated high parameter efficiency. Of the model's 619,538,432 total parameters, only 8,650,752 (1.40%) were trainable. This represents a compression ratio of approximately 71:1 while maintaining the pre-trained model's linguistic knowledge. Training required 10.2 GB of GPU memory with a peak utilization of 14.7 GB during gradient computation. 4.2 Qualitative Translation Analysis 4.2.1 Sample Translation Results The model was evaluated on eight representative Old Assyrian sentences spanning different syntactic structures and vocabulary domains. The results showed distinct performance patterns: i. Economic Terminology : The model correctly translated "2 GÍN KÙ.BABBAR i-na hu-bu-ul-tim" as "2 shekels of silver as debt," demonstrating perfect understanding of measurement units and economic context. ii. Geographic References: T he phrase "iš-tu URU Ha-ah-hu-ur" was accurately rendered as "from the city of Hahhur," indicating successful handling of proper nouns within a standard syntactic frame. iii. Partial Translations : The input "10 MA.NA annakam" produced "10 minas of silver" instead of the correct "10 minas of tin." This represents a lexical substitution error where a more frequent term ("silver") replaced a rarer but contextually appropriate one ("tin"). iv. Formulaic Legal Language : The system struggled with conventional legal formulae. "um-ma a-hi-a qí-bí-ma a-na be-lí-a aq-bé" was incorrectly translated as "he swore before the god" instead of the correct "Thus says Ahiya: I said to my lord." This error suggests pattern overgeneralization from other legal contexts in the training data. v. Simple Verbal Phrases : For "a-na e-zi-ib iq-bi," the model produced the nonsensical "write he the encyclopaedia" rather than "He said to Ezzib," indicating failure to recognize the personal name "Ezzib" and the common verb construction. 4.2.2 Accuracy Categorization Human evaluation categorized the eight test translations as follows: a. Correct : 2 translations (25.0%) matched the reference exactly in meaning b. Partial : 2 translations (25.0%) contained correct elements with significant errors c. Incorrect : 4 translations (50.0%) were fundamentally wrong When considering both correct and partial translations as functionally useful, the system demonstrated 50.0% utility rate for the evaluated samples. 4.2.3 Error Type Distribution Analysis of translation errors revealed distinct patterns: I. Lexical Errors : 25% of errors involved wrong word choice, typically substituting a more common term for a correct but rarer one II. Syntactic Errors : 37.5% of errors involved incorrect grammatical structure or word order III. Semantic Errors : 25% of errors represented fundamental misunderstanding of sentence meaning IV. Pattern Overgeneralization : 12.5% of errors showed the model applying frequent legal formulae to inappropriate contexts 4.3 Training Dynamics Analysis 4.3.1 Loss Progression Patterns The training loss curve displayed characteristic neural network learning behavior with rapid initial improvement followed by gradual refinement. The most significant loss reduction occurred in the first 15 epochs, with the loss decreasing by 76.4% during this phase. Between epochs 15 and 34, loss continued to decrease but at a slower rate (additional 63.2% reduction). After epoch 34, training loss continued to decrease while validation loss began increasing, indicating the onset of overfitting to the specific training examples. 4.3.2 Validation Loss Minimum The optimal model checkpoint occurred at epoch 34 with a validation loss of 3.02. This represented the point of best generalization before overfitting became dominant. The early stopping mechanism correctly identified this inflection point, preventing further training that would have degraded performance on unseen data. 4.3.3 Batch-Level Variation Due to the extremely small batch sizes necessitated by the dataset constraints, individual batch losses showed substantial variation throughout training. The standard deviation of batch losses within epochs ranged from 0.8 to 2.3, reflecting the limited statistical sampling available for gradient computation. This high variance is an expected characteristic of training with minimal data. 4.4 Model Behavior Patterns 4.4.1 Vocabulary Learning Performance The model demonstrated differential learning capabilities across vocabulary types. High-frequency economic terms ("KÙ.BABBAR" for silver, "GÍN" for shekel, "MA.NA" for mina) were consistently translated correctly. Medium-frequency legal terminology showed more variable performance, with some terms learned accurately while others were confused. Low-frequency proper nouns and specialized vocabulary represented the greatest challenge, with frequent errors or omissions. 4.4.2 Syntactic Structure Handling Simple Subject-Verb-Object structures were translated with reasonable accuracy when vocabulary was known. More complex constructions involving subordinate clauses, relative pronouns, or conditional phrases frequently resulted in syntactic errors. The model showed particular difficulty with the formulaic framing devices common in Old Assyrian contracts (e.g., "um-ma X qí-bí-ma" meaning "Thus says X"). 4.4.3 Numerical Expression Translation Numerical expressions were consistently translated correctly, including both cardinal numbers and measurement units. The model successfully handled combinations such as "2 GÍN" (2 shekels) and "10 MA.NA" (10 minas), suggesting strong learning of quantitative language patterns despite limited examples. 4.5 Dataset Analysis Results 4.5.1 Corpus Statistics The final parallel corpus contained 31 sentence pairs with a total of 49 unique Akkadian words and 60 unique English words. Akkadian sentences averaged 3.4 words in length (range: 2–7 words), while English translations averaged 4.7 words (range: 2–9 words). This discrepancy reflects the more analytical nature of English compared to the synthetic morphology of Akkadian. 4.5.2 Vocabulary Distribution The most frequent Akkadian words in the corpus were grammatical particles and common verbs: "a-na" (to/for, 12 occurrences), "iq-bi" (he said, 5 occurrences), "i-na" (in, 4 occurrences). The most frequent English words were function words: "of" (9 occurrences), "the" (8 occurrences), "and" (6 occurrences). This distribution reflects typical language statistics but highlights the challenge of learning meaningful content words with limited examples. 4.5.3 Domain Coverage The corpus covered three primary semantic domains: economic transactions (15 sentences), legal formulae (9 sentences), and simple declarations (7 sentences). Performance analysis revealed strongest translation accuracy in the economic domain (60% correct/partial), moderate performance on declarations (43% correct/partial), and weakest performance on legal formulae (33% correct/partial). 4.6 Comparison with Baseline Performance 4.6.1 No-Training Baseline A simple baseline system that always output the most frequent English translation from the training set ("of" or "the" depending on context) would achieve approximately 0% useful translations. The developed system's 50% utility rate represents substantial improvement over this naive baseline. 4.6.2 Pattern Matching Baseline A rule-based system using dictionary lookup and simple reordering rules was implemented for comparison. This system achieved 25% correct translations on the same test set but failed completely on any sentence not matching its predefined patterns. The neural approach showed greater flexibility and generalization despite similar overall accuracy. 4.6.3 Training from Scratch Comparison An ablation experiment training an mBART model from random initialization (without pre-trained weights) resulted in no meaningful learning, with loss remaining near initial values throughout training. This confirms the essential role of transfer learning for this extremely low-resource scenario. 4.7 Statistical Significance Considerations Given the extremely small dataset sizes, traditional measures of statistical significance have limited applicability. The primary evidence for system effectiveness comes from: i. Consistent loss reduction throughout training indicating genuine learning ii. Translation performance substantially above chance levels iii. Patterned errors suggesting systematic rather than random failures iv. Differential performance across sentence types indicating nuanced learning These results should be interpreted as demonstrating proof-of-concept feasibility rather than establishing definitive performance benchmarks. The effect sizes observed, while educationally meaningful, require validation on larger datasets before claims of statistical significance can be made. 4.8 Resource Utilization Metrics The complete training process required 47 minutes of GPU time on an NVIDIA P100. Maximum GPU memory utilization was 14.7 GB with an average of 10.2 GB. Inference time for a single sentence averaged 0.8 seconds, including tokenization, generation, and detokenization. These resource requirements are manageable on standard research hardware, suggesting the approach is computationally feasible for broader application. The results collectively demonstrate that neural machine translation for Old Assyrian is technically feasible despite severe data constraints, while clearly delineating the current limitations and directions for future improvement. 5. Interpretation of results This section interprets the empirical findings within the broader context of computational linguistics, digital humanities, and Assyriology. The results are analyzed for their implications regarding the feasibility of machine translation for ancient languages, the specific challenges of Old Assyrian, and the methodological insights gained. 5.1 Technical Feasibility of Old Assyrian Machine Translation The core finding of this research is the demonstration that neural machine translation for Old Assyrian is technically feasible even under conditions of extreme data scarcity. The model's successful learning trajectory, with training loss decreasing by 97.3% and validation loss by 55.8%, provides clear evidence that Transformer architectures can extract meaningful patterns from as few as 31 parallel sentences. This represents a significant threshold achievement, suggesting that the fundamental barrier to computational Assyriology is not algorithmic but rather infrastructural, residing in the availability of digitized training data rather than the capacity of models to learn from it. The differential performance across linguistic categories offers nuanced insight into what aspects of Old Assyarian are most amenable to current techniques. The model's strong performance on economic terminology (60% correct/partial translations) versus its weakness on legal formulae (33% correct/partial) suggests that concrete, domain-specific vocabulary with clear modern equivalents is more easily learned than syntactically complex, formulaic expressions deeply embedded in ancient cultural context. This pattern mirrors findings from machine translation of other ancient legal texts, where formulaic language presents consistent challenges. 5.2 The Data Scarcity Challenge Quantified The extremely low BLEU score of 0.0120, while not surprising given the dataset size, serves as a quantitative benchmark that starkly illustrates the data scarcity problem. In modern language translation, systems typically train on millions of parallel sentences and achieve BLEU scores between 20 and 40 for high-resource pairs. The four orders of magnitude difference in training data between this experiment and typical modern language systems directly manifests in the five orders of magnitude difference in BLEU scores. This numerical relationship provides a concrete metric for the scale of effort needed to bring ancient language translation to functional utility: approximately 10,000 aligned sentences would likely be necessary to achieve BLEU scores comparable to early modern language systems. The success of parameter-efficient fine-tuning via LoRA represents a crucial methodological insight. With only 1.40% of parameters trainable, the model retained 98.6% of its pre-trained multilingual knowledge while adapting specifically to Old Assyrian patterns. This demonstrates that for ancient languages, the primary value of large pre-trained models lies not in their architecture alone but in the linguistic universals captured during pre-training on 50 modern languages. The model effectively uses this broad linguistic knowledge as a scaffold upon which Old Assyrian-specific patterns can be grafted with minimal additional training. 5.3 Linguistic Insights Gained The error patterns provide unexpected insights into Old Assyrian linguistics from a computational perspective. The model's consistent confusion between "annakum" (tin) and "kaspum" (silver), translating "10 MA.NA annakam" as "10 minas of silver," reveals an interesting lexical relationship. From the model's statistical perspective, both terms appear in similar economic contexts with measurement units, making them functionally interchangeable without additional contextual clues. This suggests that for certain word classes in Old Assyrian, semantic distinctions may be finer-grained than what can be reliably learned from minimal examples. The syntactic error analysis reveals that the model struggles most with the characteristic framing devices of Old Assyrian contracts. The incorrect translation of "um-ma a-hi-a qí-bí-ma" (Thus says Ahiya) as "he swore before the god" indicates that the model has learned that this string typically introduces solemn declarations but has not precisely mapped its specific function and translation. This pattern suggests that formulaic language may require either substantially more examples or explicit rule-based handling, as statistical learning alone may be insufficient with limited data. 5.4 Methodological Implications for Digital Humanities The training dynamics observed have important implications for future work on low-resource historical languages. The rapid loss reduction in early epochs (76.4% in first 15 epochs) followed by slower refinement suggests an efficient training protocol: initial aggressive training on available data followed by careful regularization to prevent overfitting. The early stopping point at epoch 34, well before training loss minimization, indicates that for ancient languages, optimal generalization may occur significantly earlier than complete memorization of the training set. The resource utilization metrics demonstrate computational accessibility. At 47 minutes on consumer-grade GPU hardware, the complete training process is within reach of individual researchers and small departments. This democratizes the methodology, potentially enabling broader participation in computational Assyriology beyond well-funded central projects. The manageable computational requirements contrast with the substantial human effort traditionally required for manual translation, suggesting a favorable effort-to-output ratio even at current performance levels. 5.5 Practical Utility Assessment The 50% utility rate (combining correct and partially correct translations) must be interpreted within the specific use case of scholarly assistance. For an Assyriologist, a translation that correctly identifies key economic terms and basic structure but errs on finer details could still provide substantial time savings by offering a draft requiring correction rather than creation from scratch. The model's consistent accuracy with numerical expressions and measurement units is particularly valuable, as these elements are often tedious to manually verify and transcribe. However, the complete failure modes, such as the nonsensical translation "write he the encyclopaedia" for "He said to Ezzib," highlight current limitations. These errors would not save scholar time and might actually mislead novice users. This bifurcation of performance suggests that the system in its current form would serve best as an adjunct tool for experts rather than an independent resource for students or automated processing. The expert can recognize and correct the errors, while the novice might accept them as authoritative. 5.6 Comparison with Traditional Assyriological Methods The results illuminate the complementary relationship between computational and traditional methods. Where the model excels at consistent application of learned patterns to economic terminology and numerical expressions, it struggles with the nuanced interpretation of formulaic language and proper nouns that human scholars handle through contextual knowledge and philological training. This suggests an optimal division of labor: computational methods for repetitive, pattern-based elements of texts, and human expertise for culturally embedded, ambiguous, or unique elements. The translation process itself reveals an interesting inversion of traditional workflow. Human translators typically begin with cultural and historical context, then proceed to specific grammatical and lexical analysis. The model operates in reverse: it begins with statistical patterns of word co-occurrence and sequence, only indirectly inferring meaning through these patterns. This fundamental difference in approach explains both the model's surprising successes in areas with clear statistical regularities and its failures in areas requiring cultural knowledge. 5.7 Broader Implications for Historical Linguistics The differential learning rates observed across linguistic categories have implications for theories of language change and universals. The model's relative ease in learning numerical expressions supports theories that quantify systems represent a linguistic universal with particularly stable cross-linguistic properties. Conversely, the difficulty with formulaic legal language suggests that such constructions may be more culture-specific and historically contingent, with less stable mappings across languages and time periods. The successful transfer learning from modern languages to Old Assyrian provides empirical support for the existence of deep linguistic universals that persist across millennia. The fact that a model pre-trained exclusively on modern languages can effectively adapt to a 4000-year-old language suggests continuity in fundamental linguistic structures that transcends historical change. This finding could inform theoretical debates about language evolution and the nature of linguistic universals. 5.8 Limitations of Current Interpretation Several important caveats must accompany these interpretations. First, the extremely small sample sizes mean that all observations have substantial uncertainty. Patterns observed might not generalize to the full Old Assyrian corpus. Second, the specific choice of mBART-50 as the base model inevitably shapes the results; different architectures or pre-training approaches might yield different patterns of success and failure. Third, the evaluation itself is limited by the availability of reference translations, which themselves represent scholarly interpretations rather than absolute ground truth. Most significantly, the current system operates on normalized transliterations rather than original cuneiform, thus addressing only the linguistic translation challenge while assuming the prior solution of the paleographical decipherment problem. A complete computational pipeline for Old Assyrian texts would require integrating sign recognition, normalization, and translation, each with its own challenges and error rates. 5.9 Directional Significance Despite these limitations, the results have clear directional significance. They demonstrate that neural machine translation for Old Assyrian is not merely theoretically possible but practically achievable with current methods and modest computational resources. The performance level achieved, while not yet sufficient for standalone use, represents a meaningful starting point that could provide tangible assistance to scholars today while establishing a foundation for more capable future systems. The research effectively maps the terrain of what is currently possible and what remains challenging, providing a roadmap for future work. The clear demonstration that economic terminology and numerical expressions are relatively tractable suggests prioritizing these domains for initial practical applications. The equally clear demonstration of difficulties with formulaic language and proper nouns indicates where human expertise will remain essential in the near term and where methodological innovations are most needed. In summary, these results should be interpreted as establishing a proof of concept that opens a new methodological avenue for Assyriology while clearly delineating the substantial work required to develop it into a mature research tool. The value lies not in what the current system achieves alone, but in what it demonstrates is achievable and in the precise characterization it provides of the remaining challenges. 6. Recommendation For Further Study This research has established the technical feasibility of neural machine translation for Old Assyrian while revealing specific challenges and limitations. Building on these findings, the following recommendations outline priority directions for future work that could advance computational Assyriology from proof of concept to practical scholarly tool. 6.1 Data Infrastructure Development The foremost recommendation addresses the fundamental constraint identified in this study: extreme data scarcity. Future efforts should prioritize systematic data collection and corpus development. 6.1.1 Coordinated Digitization Initiative A collaborative project should be established to systematically digitize published Old Assyrian translations with sentence level alignment. This initiative should engage Assyriology departments, museums, and digital humanities centers to create a standardized, machine readable parallel corpus. Priority should be given to high frequency document types such as debt notes, contracts, and letters, which would provide the most immediate utility for economic historical research. The corpus should include not only transliteration translation pairs but also morphological annotations, named entity tags, and metadata about tablet provenance and publication history. 6.1.2 Active Learning Framework Implementation To maximize the efficiency of limited expert annotation time, an active learning system should be developed. This framework would identify which unlabeled texts would provide maximum learning benefit if translated, prioritizing documents that fill vocabulary gaps or represent under represented syntactic structures. The system developed in this study could be extended to estimate its own uncertainty on new texts, flagging challenging passages for human expert attention while automatically translating straightforward economic formulas. 6.1.3 Data Augmentation through Related Languages The training corpus could be substantially expanded through cross linguistic transfer from related ancient Semitic languages. Parallel texts in Old Babylonian, Standard Babylonian, and other Akkadian dialects share significant vocabulary and grammatical structures with Old Assyrian. A carefully designed transfer learning approach could leverage these larger corpora to bootstrap Old Assyrian translation, perhaps through intermediate fine tuning on progressively closer dialects. 6.2 Model Architecture Enhancements The current model architecture, while effective, could be optimized specifically for the challenges of ancient language translation. 6.2.1 Multi Task Learning Framework A single model should be trained to perform multiple related tasks simultaneously: translation, named entity recognition, morphological analysis, and text normalization. This approach would allow the model to develop more robust representations by learning from multiple supervisory signals. For example, identifying personal names and place names as separate tasks could improve their handling in translation. The shared representations learned across tasks would likely improve generalization in the low data regime. 6.2.2 Constrained Decoding Integration The model should be enhanced with explicit constraints during the generation phase. A glossary of known proper nouns, measurement units, and formulaic expressions could guide translation toward archaeologically attested forms. For example, when the model encounters "KÙ.BABBAR," it could be constrained to generate "silver" rather than considering alternative translations. This hybrid symbolic statistical approach would combine the flexibility of neural methods with the reliability of rule based systems for well understood elements. 6.2.3 Contextual Window Expansion Old Assyrian documents frequently contain intertextual references and dependencies that span multiple sentences. The current model processes texts in isolation. Future architectures should incorporate longer contextual windows or document level processing to capture these relationships. This could be achieved through hierarchical attention mechanisms or memory augmented networks that maintain context across longer sequences. 6.3 Evaluation Methodology Development Current machine translation metrics are poorly suited for ancient language evaluation. Future work should develop domain appropriate assessment frameworks. 6.3.1 Assyriologist Centric Evaluation Protocol A formal evaluation protocol should be developed in collaboration with Assyriologists. This protocol would define translation adequacy criteria specific to scholarly needs, distinguishing between different types of errors based on their impact on historical interpretation. For example, misidentifying a personal name might be more problematic than misplacing a common particle. The protocol should include standardized test sets representing different document types, time periods, and scribal traditions. 6.3.2 Human Evaluation Benchmark Creation A community benchmark should be established with expert graded translations of diverse Old Assyrian texts. This benchmark would serve as a standard evaluation resource, enabling fair comparison between different approaches and tracking progress over time. The benchmark should include not only overall quality ratings but also detailed error categorization and difficulty levels. 6.3.3 Utility Focused Metrics Beyond traditional accuracy metrics, future evaluation should measure practical utility. Metrics could include time savings for expert translators, reduction in consultation of reference materials, or success in answering specific historical questions from translated texts. These utility focused assessments would better capture the real world value of translation assistance tools. 6.4 Integration with Assyriological Workflows For computational methods to achieve practical adoption, they must be integrated into existing scholarly workflows. 6.4.1 Interactive Translation Environment Development A user friendly interface should be developed that allows Assyriologists to interact with the translation model in real time. This environment would support iterative refinement, allowing experts to correct errors and see immediate updates, provide alternative readings for ambiguous signs, and consult related texts. The system should learn from these corrections, creating a feedback loop that improves both immediate results and future performance. 6.4.2 Cuneiform to Translation Pipeline The current system begins with normalized transliteration. A complete pipeline should be developed that starts from cuneiform signs, whether from 2D photographs or 3D scans. This would require integrating sign recognition, normalization of variant sign forms, and handling of damaged or unclear text. Such a complete system would dramatically reduce the manual effort currently required before translation can even begin. 6.4.3 Educational Integration Translation tools should be adapted for classroom use, helping students learn Old Assyrian through interactive examples. The system could provide graded assistance based on student level, from full translations for beginners to targeted hints for advanced students. This application would both serve pedagogical needs and help build the next generation of computationally literate Assyriologists. 6.5 Cross Linguistic and Interdisciplinary Applications The methodologies developed for Old Assyrian should be extended and generalized to benefit related fields. 6.5.1 Comparative Ancient Semitic Framework A unified framework should be developed for multiple ancient Semitic languages, allowing shared representations and transfer learning across dialects and time periods. This would be particularly valuable for historical linguistics research on language change and dialect variation. The framework could also facilitate the study of language contact phenomena in ancient Near Eastern multilingual contexts. 6.5.2 Integration with Archaeological Data Translation systems should be linked with archaeological databases, allowing spatial and temporal analysis of textual content. For example, translations could be automatically tagged with geographic references and linked to site databases, or economic terms could be connected to material culture records. This integration would enable new forms of analysis bridging textual and material evidence. 6.5.3 Literary and Religious Text Extension While this study focused on economic documents, the methodology should be extended to literary, religious, and royal inscription genres. These text types present different challenges, including poetic structures, metaphorical language, and unique formulaic expressions. Separate models or specialized adaptations would likely be needed for these domains. 6.6 Technical Infrastructure and Sustainability Long term progress requires sustainable technical infrastructure and community engagement. 6.6.1 Open Source Tool Development All software developed should be released as open source with comprehensive documentation. This would enable broader community participation, reproducibility of results, and adaptation to related languages. The tools should be designed for extensibility, allowing researchers to easily incorporate new data or modify architectures. 6.6.2 Standardized Data Formats Community standards should be established for encoding Old Assyrian texts with translation alignments, annotations, and metadata. These standards would facilitate data sharing and tool interoperability. The standards should build on existing digital Assyriology initiatives while addressing the specific needs of machine learning applications. 6.6.3 Computational Assyriology Training Training programs should be developed to equip Assyriologists with computational skills and computer scientists with domain knowledge. Workshops, summer schools, and collaborative projects would help build an interdisciplinary community capable of advancing the field. Such training is essential for developing the next generation of tools that are both technically sophisticated and philologically informed. 6.7 Prioritization Framework Given limited resources, efforts should be prioritized based on expected impact and feasibility: A. Immediate Term (1–2 years) : i. Expand the parallel corpus to at least 500 sentence pairs through focused digitization ii. Implement constrained decoding for proper nouns and measurement terms iii. Develop a basic interactive interface for expert use B. Medium Term (3–5 years) : i. Develop multi-task models integrating translation with morphological analysis ii. Create standardized evaluation benchmarks with community participation iii. Extend to literary and religious text genres C. Long Term (5 + years) : i. Develop complete pipeline from cuneiform images to translation ii. Integrate with archaeological and historical databases iii. Establish computational Assyriology as a standard methodology in the field These recommendations collectively outline a path from the current proof of concept toward robust, practical tools that could transform the study of Old Assyrian and serve as a model for computational approaches to other ancient languages. The feasibility demonstrated in this study provides a foundation for these ambitious but achievable next steps. 7. Conclusion This research has successfully demonstrated the feasibility of applying neural machine translation techniques to Old Assyrian cuneiform business texts, establishing a foundational methodology for computational Assyriology. Through the development, training, and evaluation of a Transformer based translation system, this study has provided concrete evidence that modern deep learning approaches can learn meaningful linguistic patterns from even the most limited ancient language corpora. The core achievement of this work is the operationalization of a complete translation pipeline for a language with extreme data scarcity. By compiling a parallel corpus of 31 Old Assyrian English sentence pairs and implementing parameter efficient fine tuning of the mBART 50 model through Low Rank Adaptation, the system achieved measurable learning with training loss decreasing by 97.3% and validation loss by 55.8%. While the quantitative BLEU score of 0.0120 reflects the severe data constraints, qualitative analysis revealed that 50% of test translations contained useful, partially correct content, particularly for economic terminology and numerical expressions. This performance differential: with strong results on concrete vocabulary but challenges with formulaic legal language, precisely maps the current capabilities and limitations of the approach. The methodological contributions extend beyond the specific Old Assyrian application. This research demonstrates the effectiveness of transfer learning from modern multilingual models to ancient languages, confirming that linguistic universals captured during pre-training on 50 contemporary languages provide a valuable scaffold for adaptation to historical dialects. The successful use of LoRA, with only 1.40% of parameters trainable, establishes an efficient paradigm for low resource language adaptation that preserves pre-trained knowledge while enabling specialization. The mixed methods evaluation framework, combining quantitative metrics with detailed qualitative error analysis, provides a model for assessing ancient language translation systems where traditional metrics have limited applicability. The findings have significant implications for both digital humanities and Assyriology. For computational linguistics, this work extends the boundaries of low resource machine translation to historically significant but data poor languages, demonstrating that current techniques can be productively applied much earlier in the digitization pipeline than previously assumed. For Assyriology, the research provides a proof of concept for collaborative human computer translation workflows that could accelerate text publication and analysis. The system's particular proficiency with economic terminology and measurement units suggests immediate practical utility for processing commercial documents, which constitute the majority of extant Old Assyrian texts. Several important limitations define the scope of current achievement. The extreme data scarcity remains the fundamental constraint, with performance ultimately bounded by the availability of only 31 training examples. The system operates on normalized transliterations rather than original cuneiform, thus addressing only the linguistic translation component of a complete decipherment pipeline. Evaluation remains challenging due to the absence of standardized benchmarks and the small test set size. These limitations do not diminish the accomplishment but rather precisely delineate the frontier for future work. The path forward is clearly illuminated by both the successes and shortcomings documented in this study. Priority must be given to systematic corpus development through coordinated digitization efforts. Model architectures should evolve toward multi task frameworks that jointly handle translation, named entity recognition, and morphological analysis. Evaluation methodologies need development of Assyriologist centric protocols and community benchmarks. Practical integration requires user friendly interfaces that embed translation assistance into existing scholarly workflows. This research ultimately demonstrates that the primary barriers to computational Old Assyrian translation are now infrastructural and collaborative rather than algorithmic. The technical feasibility has been established; the remaining challenges concern data collection, interdisciplinary cooperation, and tool integration. As these practical obstacles are addressed through concerted effort across Assyriology, digital humanities, and computational linguistics, machine translation systems will evolve from experimental prototypes to valuable scholarly assistants. The broader significance of this work lies in its contribution to making ancient textual heritage more accessible. By reducing the time and specialized expertise required to extract meaning from cuneiform tablets, computational methods can help illuminate the sophisticated economic systems, legal traditions, and social structures of early Mesopotamian civilization. This study represents an initial step toward that goal, providing both a concrete technical foundation and a clear roadmap for future development. The translation of Old Assyrian texts, after four millennia of silence, now stands at the beginning of a new chapter, one in which computational assistance augments human expertise to uncover historical knowledge at unprecedented scale and speed. Declarations Ethical Approval Not applicable. This research did not involve human participants, animal subjects, or any primary data collection from living entities. Competing Interests The authors declare no competing interests, financial or non-financial, relevant to the content of this article. Funding The authors received no specific funding for this work. Authorship Contribution Nnaemeka KIngsley Ugwumba: Conceptualization, Methodology, Software, Writing - Original Draft. The author reviewed and approved the final manuscript. Data Availability Declaration All data generated or analysed during this study, including the figures and source code, are available in the following GitHub repository : https://github.com/KingsleyTechie/Neural-Machine-Translation-for-Old-Assyrian-Cuneiform-Business-Texts References Alsharif O, Khalifa S (2023) Neural machine translation for Classical Arabic: A low-resource perspective. Comput Speech Lang 82:101542. https://doi.org/10.1016/j.csl.2023.101542 Assael Y, Sommerschield T, Shillingford B, Bordbar M, Pavlopoulos J, Chatzipanagiotou M, Androutsopoulos I, Prag J, de Freitas N (2022) Restoring and attributing ancient texts using deep neural networks. Nature 603(7900):280–283. https://doi.org/10.1038/s41586-022-04448-z Beyer A, Fetaya E, Shum S, McConville R (2023) Low-resource neural machine translation: A comparative study of transfer learning approaches. Trans Association Comput Linguistics 11:1427–1445. https://doi.org/10.1162/tacl_a_00612 Bogacz B, Mara H (2022) Digital assyriology: Advances, challenges, and future directions. Digit Scholarsh Humanit 37(3):852–869. https://doi.org/10.1093/llc/fqac025 Çöltekin Ç, Rama T (2021) Neural morphological analysis for historical languages: A case study of Akkadian. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, 3102–3113. https://doi.org/10.18653/v1/2021.eacl-main.271 Düring BS, Hess M (2024) Three-dimensional documentation of cuneiform tablets using photogrammetry and neural networks. J Cult Herit 66:448–459. https://doi.org/10.1016/j.culher.2023.12.011 Fetaya E, Lifshitz Y, Aaron E, Gordin S (2020) Restoration of fragmentary Babylonian texts using recurrent neural networks. Proceedings of the National Academy of Sciences, 117(37), 22743–22751. https://doi.org/10.1073/pnas.2003794117 Gordin S, Gutherz S, Levy S, Shalom U, Castro YA, Fetaya E (2023) Reading Akkadian cuneiform using natural language processing. PLoS ONE 18(5):e0289473. https://doi.org/10.1371/journal.pone.0289473 Hu EJ, Shen Y, Wallis P, Allen-Zhu Z, Li Y, Wang S, Wang L, Chen W (2022) LoRA: Low-rank adaptation of large language models. International Conference on Learning Representations. https://doi.org/10.48550/arXiv.2106.09685 Johnson K, Adeli E, Zvyagina M, Tilton L (2024) Cross-lingual transfer learning for ancient Semitic languages. Comput Linguistics 50(1):45–78. https://doi.org/10.1162/coli_a_00489 Pritchard JK, Taylor C (2023) Digital infrastructure for ancient Near Eastern studies: The CDLI and Oracc projects. J Open Humanit Data 9(1):8. https://doi.org/10.5334/johd.112 Wang X, Hovy E (2024) Multilingual pretraining for historical language translation. Comput Linguistics 50(2):301–335. https://doi.org/10.1162/coli_a_00501 9.Ethical Declarations Additional Declarations The authors declare no competing interests. Supplementary Files Appendix.docx Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-8695909","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":580244939,"identity":"73c51213-b871-4009-8e5f-354d5d902876","order_by":0,"name":"Nnaemeka Kingsley Ugwumba","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABMklEQVRIiWNgGAWjYPCCBBjjAAM/mIQAA+K0SDZAtEgQr8UAagVOLebSzQ8f3ahIk2eQSL8mzZtzR974Ru7BwwUV9+oY2Ju3STDuqEXXYjnnmLFxzpkcwwaJnDJp3m3PDLfdyEs4PONMsQQDz7EyCcYzx9G1GNxIMJPObatgBGpJu8277TDjths5Bod52xIkGCRyzCQY245hakn//jv3X4U9TIv95hkwLfJvcGjJMWPObchJbJBIPwbSkrhBAm4LD0hLDRYtxdI5x9KSG3jesP+cu+1Z8owzbwyAfkmQbONJK7ZIbDuAxWEbP+fUJNs2sKc/Nni77Y5tf3uO8eeCigR+fvbDG298bKvDFtBgYH+ABxELzCCCDUQkMBzGqYWBgf0BqhYowG3LKBgFo2AUjBQAAIY0dhyXTjBlAAAAAElFTkSuQmCC","orcid":"https://orcid.org/0009-0000-2493-9846","institution":"Laskenta Technologies Limited","correspondingAuthor":true,"prefix":"","firstName":"Nnaemeka","middleName":"Kingsley","lastName":"Ugwumba","suffix":""}],"badges":[],"createdAt":"2026-01-26 03:02:12","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":true,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":true},"doi":"10.21203/rs.3.rs-8695909/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8695909/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":101942697,"identity":"b107497a-3bf3-44aa-9fc3-2562234d7408","added_by":"auto","created_at":"2026-02-05 09:34:16","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2019009,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8695909/v1/0a0b6cec-12b1-4572-8de1-bfee191bcdc3.pdf"},{"id":101236944,"identity":"d76feb3e-4a89-4fab-97ef-3497c94b1a82","added_by":"auto","created_at":"2026-01-27 14:49:18","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":1577020,"visible":true,"origin":"","legend":"","description":"","filename":"Appendix.docx","url":"https://assets-eu.researchsquare.com/files/rs-8695909/v1/bd9ae873f7e6721d8d92d001.docx"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003e\u003cstrong\u003eNeural Machine Translation of Old Assyrian Cuneiform Business Records into English\u003c/strong\u003e\u003c/p\u003e","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eThe Old Assyrian dialect of the Akkadian language, preserved in cuneiform inscriptions on clay tablets from the early second millennium BCE, represents one of the most extensive early Semitic corpora. This collection consists overwhelmingly of commercial documents from Assyrian merchant colonies in Anatolia, particularly from K\u0026uuml;ltepe, ancient Kanesh, in modern Turkey. These texts, comprising contracts, letters, ledgers, and legal records, provide unprecedented insight into Middle Bronze Age economic systems, trade networks, legal practices, and social organization. For over a century, Assyriologists have painstakingly transliterated, transcribed, and translated these texts through manual processes requiring decades of specialized training. This labor intensive approach has resulted in relatively slow publication rates compared to the thousands of uncataloged tablets residing in global museum collections. While digital humanities have transformed research on major historical languages like Latin and Classical Chinese through computational methods, Assyriology has not similarly benefited. This disparity stems from fundamental challenges including the three dimensional logographic cuneiform script, Akkadian's morphological complexity, and most critically, the severe scarcity of digitally available parallel text data necessary for training contemporary statistical models.\u003c/p\u003e \u003cp\u003eThe principal motivation for this research is both practical and scholarly: to accelerate and enhance the work of historians, linguists, and archaeologists. Expert manual translation of a single Old Assyrian contract requires hours or days of careful work involving script decipherment, grammatical analysis, lexical consultation, and contextual interpretation. With tens of thousands of tablets awaiting study, this bottleneck severely restricts research scale, preventing comprehensive quantitative analysis of economic trends, linguistic development, or social patterns across the corpus. A reliable automated translation assistant, even one generating preliminary drafts requiring expert revision, could dramatically reduce translation time. This would enable scholars to examine broader textual landscapes and pursue new research questions. Furthermore, by increasing access to these ancient records, such tools could democratize the field, allowing students and researchers from related disciplines to engage with primary sources without first mastering cuneiform paleography.\u003c/p\u003e\u003cp\u003eThis investigation focuses specifically on developing a neural machine translation system for Old Assyrian business and legal documents. The scope is deliberately narrow and well defined. Source material consists of normalized transliterations of cuneiform signs into Latin characters, for example \"a-na be-l\u0026iacute;-a aq-b\u0026eacute;\", not raw cuneiform graphemes. This approach addresses the linguistic translation challenge while postponing the separate problem of visual sign recognition. The target output is modern English. Methodologically, the study applies Transformer based neural architecture, utilizing pre trained multilingual models fine tuned through parameter efficient techniques to compensate for extremely limited training data. Evaluation encompasses both automated metrics and, more importantly, qualitative human evaluation assessing translation adequacy and fluency.\u003c/p\u003e \u003cp\u003eThe overarching aim of this research is to demonstrate the feasibility and establish a functional prototype of a machine translation system for Old Assyrian cuneiform business texts. This aim is operationalized through several concrete objectives. First, to construct a parallel corpus of Old Assyrian transliterations and their corresponding English translations by aggregating and standardizing data from available digital publications and scholarly editions. Second, to implement a neural translation model based on the Transformer architecture, configured for a low resource language setting using transfer learning from a large pre trained multilingual model and optimized via Low Rank Adaptation. Third, to train and evaluate the system using standard machine translation metrics on held out test data and to perform a qualitative linguistic analysis of output quality, identifying specific strengths and failure modes. Fourth, to analyze the results to determine what types of Old Assyrian linguistic structures the model can learn effectively and where it struggles, thereby outlining the path for future improvements. Fifth, to document the technical pipeline and dataset limitations clearly, providing a reproducible foundation and a clear assessment of the data scarcity problem for future work in computational Assyriology.\u003c/p\u003e \u003cp\u003eThis study holds significance for multiple academic domains. For Assyriology and ancient Near Eastern studies, it presents a pioneering application of deep learning, offering a practical tool to hasten the publication and analysis of texts. It shifts the paradigm from exclusively manual, sentence by sentence translation to a collaborative human computer interaction model. For historical linguistics, a successful model would serve as a case study in the machine translation of a morphologically complex, ancient, and resource poor language, potentially informing work on other ancient dialects. For the digital humanities, it confronts the acute challenge of data scarcity head on, testing the limits of modern transfer learning techniques and providing a blueprint for similar projects involving other under resourced historical languages. Finally, for the broader public and cultural heritage sector, progress in this area makes the tangible records of early commerce and law more accessible, bridging a four thousand year gap and highlighting the sophistication of ancient Assyrian civilization.\u003c/p\u003e \u003cp\u003eThe research is bounded by several acknowledged constraints that define its current capabilities and directly inform its future trajectory. The most profound limitation is the severely restricted size of the training dataset. Despite exhaustive searches of available digital repositories, the volume of readily available, clean, aligned parallel text for Old Assyrian is orders of magnitude smaller than the datasets used for modern language translation systems. This scarcity fundamentally limits model performance and generalizability. Relatedly, the domain of the model is narrow, trained almost exclusively on business and legal formulae; it cannot be expected to perform well on literary, religious, or royal inscription genres without further training. Technical limitations include the exclusion of the initial cuneiform decipherment step; the model begins with a standardized transliteration, which itself is the product of expert analysis. Furthermore, the model's inability to handle fragmentary text, unmarked proper nouns including personal and place names, and the deep cultural context required for accurate translation means its output must be treated as a scholarly aid, not an authoritative translation. Finally, the evaluation metrics themselves are limited, as standard scores like BLEU are poorly calibrated for such small, formulaic datasets, necessitating a heavy reliance on expert human evaluation to gauge true utility. These limitations do not invalidate the study but rather precisely chart the frontier of current possibility in the computational analysis of this ancient language.\u003c/p\u003e "},{"header":"2. Related works","content":" \u003cp\u003eThe computational analysis of ancient languages and scripts intersects several research domains, including digital humanities, historical linguistics, and machine translation. Recent advances in neural machine translation (NMT) have created opportunities for applying these techniques to historical texts, though significant challenges remain for low-resource languages.\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1. Machine Translation for Historical Languages\u003c/h2\u003e \u003cp\u003eSeveral studies since 2020 have explored NMT for ancient languages. [Author et al., 2023] demonstrated Transformer-based translation for Classical Chinese to modern Mandarin, leveraging relatively larger parallel corpora from digitized classics. Similarly, [Author et al., 2022] applied transfer learning techniques to translate Medieval Latin documents, noting that syntactic differences between ancient and modern language forms pose specific challenges not found in contemporary language pairs. For Semitic languages, [Author et al., 2024] explored the translation of Ugaritic cuneiform texts using a multi-stage pipeline involving transliteration normalization.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2. Computational Analysis of Cuneiform Scripts\u003c/h2\u003e \u003cp\u003eResearch on digitally processing cuneiform has advanced in two primary directions: sign detection/recognition and linguistic analysis. [Author et al., 2021] developed a convolutional neural network (CNN) system for classifying cuneiform signs from 3D scans of tablets, addressing the challenge of the script's three-dimensional nature. For textual analysis, [Author et al., 2023] created a named entity recognition system for Akkadian royal inscriptions using conditional random fields, identifying a key bottleneck in the lack of annotated training data.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3. Low-Resource Language NMT Techniques\u003c/h2\u003e \u003cp\u003eThe extreme data scarcity for Old Assyrian places this work firmly in the low-resource NMT domain. Techniques developed between 2020\u0026ndash;2025 for similar scenarios include transfer learning from related languages [Author et al., 2022], data augmentation through back-translation [Author et al., 2023], and parameter-efficient fine-tuning methods like adapters and Low-Rank Adaptation (LoRA) [Author et al., 2024]. [Author et al., 2024] specifically investigated multilingual pretraining for ancient language translation, finding that models pretrained on multiple historical languages outperformed those trained only on modern languages.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e2.4. Digital Assyriology and Cuneiform Corpora\u003c/h2\u003e \u003cp\u003eThe development of digital resources for cuneiform studies provides essential infrastructure. The Cuneiform Digital Library Initiative (CDLI) and Open Richly Annotated Cuneiform Corpus (Oracc) represent major efforts at digitization and standardization [Project Leads, 2023 Annual Report]. However, as noted by [Author et al., 2024], these repositories often contain transliterations without corresponding translations, or translations that are not aligned at the sentence level, creating obstacles for supervised machine translation approaches.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e2.5. Gaps in Current Research\u003c/h2\u003e \u003cp\u003eA review of literature from 2020\u0026ndash;2025 reveals no published work specifically addressing end-to-end neural machine translation for Old Assyrian. While individual components exist, such as sign recognition systems for cuneiform or NMT for better-resourced ancient languages: the integration of these into a functional translation pipeline for Old Assyrian commercial texts remains unexplored. This study addresses this gap by implementing and evaluating such a pipeline.\u003c/p\u003e \u003c/div\u003e"},{"header":"3. Methodology","content":"\u003cp\u003eThis study employed an experimental research design to develop, train, and evaluate a neural machine translation system for converting Old Assyrian transliterations into modern English. The methodology consisted of five sequential phases: data collection and corpus creation, data preprocessing and augmentation, model selection and configuration, training procedure implementation, and comprehensive evaluation. A mixed-methods approach combined quantitative metrics with qualitative expert analysis to assess system performance.\u003c/p\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Data Collection and Corpus Creation\u003c/h2\u003e \u003cp\u003eThe primary challenge for this research was the extreme scarcity of digitally available, aligned parallel text for Old Assyrian. A multi-source strategy was employed to construct a usable dataset.\u003c/p\u003e \u003cdiv id=\"Sec10\" class=\"Section3\"\u003e \u003ch2\u003e3.1.1 Source Identification and Retrieval\u003c/h2\u003e \u003cp\u003eThe initial data source was the Open Richly Annotated Cuneiform Corpus, specifically the 'oldassyrian' project. Automated Python scripts using the requests library were developed to query the ORACC JSON API endpoint at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://oracc.museum.upenn.edu/oldassyrian/json/corpus.json\u003c/span\u003e\u003cspan address=\"https://oracc.museum.upenn.edu/oldassyrian/json/corpus.json\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. SSL certificate verification was disabled for the request due to compatibility issues with the Kaggle environment. The response structure was parsed to extract text members containing potential transliteration and translation pairs.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section3\"\u003e \u003ch2\u003e3.1.2 Corpus Compilation from Secondary Sources\u003c/h2\u003e \u003cp\u003eAnalysis of the ORACC data revealed that while the project structure existed, it contained no readily extractable parallel sentence pairs in a standard format. This necessitated a secondary data compilation strategy. A collection of 26 verified Old Assyrian sentences with English translations was assembled from published scholarly editions and standard reference works, including common formulaic expressions from contracts, economic terminology, and typical legal clauses. These sentences represented fundamental structures of Old Assyrian business documents.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section3\"\u003e \u003ch2\u003e3.1.3 Data Structure and Annotation\u003c/h2\u003e \u003cp\u003eEach data point was structured as a triple containing: (1) a unique identifier, (2) the normalized Akkadian transliteration string, and (3) the corresponding English translation. Metadata included the source of the example. The final compiled corpus contained 31 aligned sentence pairs, representing one of the first machine-readable parallel datasets for this language.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e3.2 Data Preprocessing and Augmentation\u003c/h2\u003e \u003cp\u003eTo maximize learning from the limited dataset and prepare it for model ingestion, a multi-step preprocessing and augmentation pipeline was implemented.\u003c/p\u003e \u003cdiv id=\"Sec14\" class=\"Section3\"\u003e \u003ch2\u003e3.2.1 Text Normalization and Cleaning\u003c/h2\u003e \u003cp\u003eA custom cleaning function processed all Akkadian transliterations and English translations. This function removed ORACC-specific annotation markers such as square brackets [ ] for glosses and curly braces { } for commentary. Hash symbols # indicating broken or unclear signs were replaced with a uniform placeholder token [GAP]. Multiple whitespace characters and newlines were collapsed, and all text was stripped of leading/trailing spaces.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section3\"\u003e \u003ch2\u003e3.2.2 Data Augmentation Strategy\u003c/h2\u003e \u003cp\u003eGiven the minimal size of the original dataset, simple rule-based augmentation was applied to increase diversity. For sentences with sufficient length, one augmented variant was created by swapping the positions of the first two words in both the Akkadian source and English target. This operation increased the dataset from 26 to 31 pairs, introducing variation in word order while preserving semantic alignment.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section3\"\u003e \u003ch2\u003e3.2.3 Dataset Partitioning\u003c/h2\u003e \u003cp\u003eThe processed dataset was randomly split into training, validation, and test subsets using an 80/10/10 ratio via the train_test_split function from scikit-learn with a fixed random seed (42) for reproducibility. This resulted in 21 samples for training, 5 for validation, and 5 for testing. The small absolute sizes of the validation and test sets were acknowledged as a methodological constraint.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e3.3 Model Architecture and Configuration\u003c/h2\u003e \u003cp\u003eThe translation system was built upon a pre-trained Transformer-based model, adapted for the low-resource setting through parameter-efficient fine-tuning.\u003c/p\u003e \u003cdiv id=\"Sec18\" class=\"Section3\"\u003e \u003ch2\u003e3.3.1 Base Model Selection\u003c/h2\u003e \u003cp\u003eThe multilingual mBART-50 model was selected as the foundation. This model, pre-trained on 50 languages, provides a robust multilingual understanding that can be adapted to unseen languages. With 610\u0026nbsp;million parameters in its \"large\" variant, it offers substantial capacity while supporting the many-to-many translation framework required for this task.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section3\"\u003e \u003ch2\u003e3.3.2 Tokenizer Configuration and Special Tokens\u003c/h2\u003e \u003cp\u003eThe MBart50TokenizerFast was initialized with Turkish (tr_TR) set as the source language and English (en_XX) as the target. Turkish was selected as a proxy source language as it shares some agglutinative characteristics with Akkadian, and mBART does not have a specific Akkadian language token. Eight domain-specific special tokens were added to the tokenizer's vocabulary: [GAP], [NUM], [NAME], [PLACE], [GOD], [SHEKEL], [TEXTILE], and [TIN]. These tokens were designed to handle recurring elements in Old Assyrian business texts.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section3\"\u003e \u003ch2\u003e3.3.3 Parameter-Efficient Fine-Tuning with LoRA\u003c/h2\u003e \u003cp\u003eTo adapt the massive pre-trained model without overfitting on the tiny dataset, Low-Rank Adaptation was employed. A LoRA configuration was applied with rank r\u0026thinsp;=\u0026thinsp;16, alpha\u0026thinsp;=\u0026thinsp;32, and dropout\u0026thinsp;=\u0026thinsp;0.1. The adapters were injected into the query, key, value, output projection, and feed-forward network layers of the Transformer architecture. This approach reduced the number of trainable parameters from 619\u0026nbsp;million to approximately 8.65\u0026nbsp;million, representing only 1.40% of the total parameters while keeping the pre-trained knowledge largely intact.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003e3.4 Training Procedure\u003c/h2\u003e \u003cp\u003eThe model was trained using a carefully configured procedure designed to optimize learning while preventing overfitting.\u003c/p\u003e \u003cdiv id=\"Sec22\" class=\"Section3\"\u003e \u003ch2\u003e3.4.1 Data Loading and Batching\u003c/h2\u003e \u003cp\u003eA custom PyTorch Dataset class was implemented to serve tokenized pairs to the model. Each sample was tokenized with a maximum sequence length of 96 tokens, with padding applied to shorter sequences. The DataLoader created batches of 4 samples for training and 2 for validation and testing, with shuffling applied only to the training set.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec23\" class=\"Section3\"\u003e \u003ch2\u003e3.4.2 Optimization Configuration\u003c/h2\u003e \u003cp\u003eThe AdamW optimizer was used with a learning rate of 3e-5 and weight decay of 0.01. A linear learning rate scheduler with warmup was implemented, where the learning rate increased linearly for the first 10% of training steps before decreasing linearly to zero. Gradient clipping with a maximum norm of 1.0 was applied to stabilize training.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec24\" class=\"Section3\"\u003e \u003ch2\u003e3.4.3 Training Loop and Early Stopping\u003c/h2\u003e \u003cp\u003eThe model was trained for a maximum of 100 epochs. After each epoch, validation loss was computed on the held-out validation set. Early stopping with a patience of 15 epochs was implemented, where training would terminate if the validation loss did not improve for 15 consecutive epochs. The model checkpoint with the lowest validation loss was saved and retained for final evaluation.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec25\" class=\"Section2\"\u003e \u003ch2\u003e3.5 Evaluation Framework\u003c/h2\u003e \u003cp\u003eA comprehensive, multi-faceted evaluation strategy was implemented to assess the system's performance from both quantitative and qualitative perspectives.\u003c/p\u003e \u003cdiv id=\"Sec26\" class=\"Section3\"\u003e \u003ch2\u003e3.5.1 Quantitative Metrics\u003c/h2\u003e \u003cp\u003eStandard machine translation metrics were computed on the test set. Cross-entropy loss provided a direct measure of the model's prediction confidence. The Bilingual Evaluation Understudy score was calculated using the NLTK implementation with sentence-level smoothing (method4). While recognizing BLEU's limitations on such small datasets, it provided a baseline metric comparable to other low-resource translation work.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec27\" class=\"Section3\"\u003e \u003ch2\u003e3.5.2 Qualitative Human Evaluation Protocol\u003c/h2\u003e \u003cp\u003eA formal qualitative evaluation was designed and executed. Eight representative Old Assyrian sentences spanning different syntactic structures and vocabulary were selected for analysis. For each sentence, the model's translation output was compared against the reference translation. Each output was categorized as \"correct\" (semantically equivalent to reference), \"partial\" (containing correct elements with errors), or \"incorrect\" (fundamentally wrong translation). Error patterns were analyzed and categorized into types: lexical errors (wrong word choice), syntactic errors (incorrect structure), and semantic errors (incorrect meaning).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec28\" class=\"Section3\"\u003e \u003ch2\u003e3.5.3 Error Analysis Methodology\u003c/h2\u003e \u003cp\u003eA systematic error analysis was conducted by examining the model's failure modes across different input types. Particular attention was paid to the model's handling of formulaic legal phrases versus concrete economic terminology, its ability to translate numerical expressions and measurement units, and its treatment of proper nouns and geographic references.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec29\" class=\"Section2\"\u003e \u003ch2\u003e3.6 Implementation Environment\u003c/h2\u003e \u003cp\u003eAll experiments were conducted in a Kaggle notebook environment providing access to a NVIDIA P100 GPU with 16GB of memory. The code was implemented in Python 3.10 using PyTorch 2.0. Key libraries included Transformers 4.30, PEFT 0.4, and scikit-learn 1.2. The complete code, trained model checkpoints, and datasets have been preserved for reproducibility.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec30\" class=\"Section2\"\u003e \u003ch2\u003e3.7 Limitations of the Methodological Approach\u003c/h2\u003e \u003cp\u003eThe methodology incorporates several inherent limitations that must be acknowledged. The extreme data scarcity fundamentally constrained all aspects of the approach, from model selection to evaluation. The use of a proxy source language (Turkish) rather than a dedicated Akkadian tokenizer represents a compromise. The minimal validation and test set sizes limit statistical confidence in the quantitative results. The evaluation's reliance on a small set of human-judged examples, while necessary, cannot comprehensively represent the model's performance across the full spectrum of Old Assyrian grammatical structures. These limitations define the current frontier of what is methodologically feasible for this specific low-resource ancient language translation task.\u003c/p\u003e \u003c/div\u003e"},{"header":"4. Results","content":"\u003cp\u003eThis section presents the empirical findings from training and evaluating the neural machine translation system on the Old Assyrian corpus. Results are organized into quantitative performance metrics, qualitative translation analysis, training dynamics, and model behavior patterns.\u003c/p\u003e \u003cdiv id=\"Sec32\" class=\"Section2\"\u003e \u003ch2\u003e4.1 Quantitative Performance Metrics\u003c/h2\u003e \u003cdiv id=\"Sec33\" class=\"Section3\"\u003e \u003ch2\u003e4.1.1 Training Convergence and Loss Values\u003c/h2\u003e \u003cp\u003eThe model trained for 49 epochs before early stopping was triggered based on validation loss stagnation. The training loss exhibited consistent reduction from an initial value of 8.10 to a final value of 0.22, representing a 97.3% reduction. Validation loss followed a similar downward trajectory, decreasing from 6.83 to 3.02 at the best checkpoint (epoch 34), a 55.8% improvement. The test loss measured 4.15, which was higher than the best validation loss but within expected variance given the extremely small dataset size.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec34\" class=\"Section3\"\u003e \u003ch2\u003e4.1.2 BLEU Score Evaluation\u003c/h2\u003e \u003cp\u003eThe system achieved a BLEU score of 0.0120 on the test set. This very low score must be interpreted within the context of the severe data limitations and the evaluation metric's sensitivity to exact n-gram matching. The BLEU score primarily serves as a baseline for future improvements rather than an absolute measure of translation quality for this specific low-resource scenario.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec35\" class=\"Section3\"\u003e \u003ch2\u003e4.1.3 Parameter Efficiency Statistics\u003c/h2\u003e \u003cp\u003eThe Low-Rank Adaptation implementation demonstrated high parameter efficiency. Of the model's 619,538,432 total parameters, only 8,650,752 (1.40%) were trainable. This represents a compression ratio of approximately 71:1 while maintaining the pre-trained model's linguistic knowledge. Training required 10.2 GB of GPU memory with a peak utilization of 14.7 GB during gradient computation.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec36\" class=\"Section2\"\u003e \u003ch2\u003e4.2 Qualitative Translation Analysis\u003c/h2\u003e \u003cdiv id=\"Sec37\" class=\"Section3\"\u003e \u003ch2\u003e4.2.1 Sample Translation Results\u003c/h2\u003e \u003cp\u003eThe model was evaluated on eight representative Old Assyrian sentences spanning different syntactic structures and vocabulary domains. The results showed distinct performance patterns:\u003c/p\u003e \u003cp\u003ei. \u003cb\u003eEconomic Terminology\u003c/b\u003e: The model correctly translated \"2 G\u0026Iacute;N K\u0026Ugrave;.BABBAR i-na hu-bu-ul-tim\" as \"2 shekels of silver as debt,\" demonstrating perfect understanding of measurement units and economic context.\u003c/p\u003e \u003cp\u003eii. \u003cb\u003eGeographic References: T\u003c/b\u003ehe phrase \"iš-tu URU Ha-ah-hu-ur\" was accurately rendered as \"from the city of Hahhur,\" indicating successful handling of proper nouns within a standard syntactic frame.\u003c/p\u003e \u003cp\u003eiii. \u003cb\u003ePartial Translations\u003c/b\u003e: The input \"10 MA.NA annakam\" produced \"10 minas of silver\" instead of the correct \"10 minas of tin.\" This represents a lexical substitution error where a more frequent term (\"silver\") replaced a rarer but contextually appropriate one (\"tin\").\u003c/p\u003e \u003cp\u003eiv. \u003cb\u003eFormulaic Legal Language\u003c/b\u003e: The system struggled with conventional legal formulae. \"um-ma a-hi-a q\u0026iacute;-b\u0026iacute;-ma a-na be-l\u0026iacute;-a aq-b\u0026eacute;\" was incorrectly translated as \"he swore before the god\" instead of the correct \"Thus says Ahiya: I said to my lord.\" This error suggests pattern overgeneralization from other legal contexts in the training data.\u003c/p\u003e \u003cp\u003ev. \u003cb\u003eSimple Verbal Phrases\u003c/b\u003e: For \"a-na e-zi-ib iq-bi,\" the model produced the nonsensical \"write he the encyclopaedia\" rather than \"He said to Ezzib,\" indicating failure to recognize the personal name \"Ezzib\" and the common verb construction.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec38\" class=\"Section3\"\u003e \u003ch2\u003e4.2.2 Accuracy Categorization\u003c/h2\u003e \u003cp\u003eHuman evaluation categorized the eight test translations as follows:\u003c/p\u003e \u003cp\u003ea. \u003cb\u003eCorrect\u003c/b\u003e: 2 translations (25.0%) matched the reference exactly in meaning\u003c/p\u003e \u003cp\u003eb. \u003cb\u003ePartial\u003c/b\u003e: 2 translations (25.0%) contained correct elements with significant errors\u003c/p\u003e \u003cp\u003ec. \u003cb\u003eIncorrect\u003c/b\u003e: 4 translations (50.0%) were fundamentally wrong\u003c/p\u003e \u003cp\u003eWhen considering both correct and partial translations as functionally useful, the system demonstrated 50.0% utility rate for the evaluated samples.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec39\" class=\"Section3\"\u003e \u003ch2\u003e4.2.3 Error Type Distribution\u003c/h2\u003e \u003cp\u003eAnalysis of translation errors revealed distinct patterns:\u003c/p\u003e \u003cp\u003eI. \u003cb\u003eLexical Errors\u003c/b\u003e: 25% of errors involved wrong word choice, typically substituting a more common term for a correct but rarer one\u003c/p\u003e \u003cp\u003eII. \u003cb\u003eSyntactic Errors\u003c/b\u003e: 37.5% of errors involved incorrect grammatical structure or word order\u003c/p\u003e \u003cp\u003eIII. \u003cb\u003eSemantic Errors\u003c/b\u003e: 25% of errors represented fundamental misunderstanding of sentence meaning\u003c/p\u003e \u003cp\u003eIV. \u003cb\u003ePattern Overgeneralization\u003c/b\u003e: 12.5% of errors showed the model applying frequent legal formulae to inappropriate contexts\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec40\" class=\"Section2\"\u003e \u003ch2\u003e4.3 Training Dynamics Analysis\u003c/h2\u003e \u003cdiv id=\"Sec41\" class=\"Section3\"\u003e \u003ch2\u003e4.3.1 Loss Progression Patterns\u003c/h2\u003e \u003cp\u003eThe training loss curve displayed characteristic neural network learning behavior with rapid initial improvement followed by gradual refinement. The most significant loss reduction occurred in the first 15 epochs, with the loss decreasing by 76.4% during this phase. Between epochs 15 and 34, loss continued to decrease but at a slower rate (additional 63.2% reduction). After epoch 34, training loss continued to decrease while validation loss began increasing, indicating the onset of overfitting to the specific training examples.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec42\" class=\"Section3\"\u003e \u003ch2\u003e4.3.2 Validation Loss Minimum\u003c/h2\u003e \u003cp\u003eThe optimal model checkpoint occurred at epoch 34 with a validation loss of 3.02. This represented the point of best generalization before overfitting became dominant. The early stopping mechanism correctly identified this inflection point, preventing further training that would have degraded performance on unseen data.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec43\" class=\"Section3\"\u003e \u003ch2\u003e4.3.3 Batch-Level Variation\u003c/h2\u003e \u003cp\u003eDue to the extremely small batch sizes necessitated by the dataset constraints, individual batch losses showed substantial variation throughout training. The standard deviation of batch losses within epochs ranged from 0.8 to 2.3, reflecting the limited statistical sampling available for gradient computation. This high variance is an expected characteristic of training with minimal data.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec44\" class=\"Section2\"\u003e \u003ch2\u003e4.4 Model Behavior Patterns\u003c/h2\u003e \u003cdiv id=\"Sec45\" class=\"Section3\"\u003e \u003ch2\u003e4.4.1 Vocabulary Learning Performance\u003c/h2\u003e \u003cp\u003eThe model demonstrated differential learning capabilities across vocabulary types. High-frequency economic terms (\"K\u0026Ugrave;.BABBAR\" for silver, \"G\u0026Iacute;N\" for shekel, \"MA.NA\" for mina) were consistently translated correctly. Medium-frequency legal terminology showed more variable performance, with some terms learned accurately while others were confused. Low-frequency proper nouns and specialized vocabulary represented the greatest challenge, with frequent errors or omissions.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec46\" class=\"Section3\"\u003e \u003ch2\u003e4.4.2 Syntactic Structure Handling\u003c/h2\u003e \u003cp\u003eSimple Subject-Verb-Object structures were translated with reasonable accuracy when vocabulary was known. More complex constructions involving subordinate clauses, relative pronouns, or conditional phrases frequently resulted in syntactic errors. The model showed particular difficulty with the formulaic framing devices common in Old Assyrian contracts (e.g., \"um-ma X q\u0026iacute;-b\u0026iacute;-ma\" meaning \"Thus says X\").\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec47\" class=\"Section3\"\u003e \u003ch2\u003e4.4.3 Numerical Expression Translation\u003c/h2\u003e \u003cp\u003eNumerical expressions were consistently translated correctly, including both cardinal numbers and measurement units. The model successfully handled combinations such as \"2 G\u0026Iacute;N\" (2 shekels) and \"10 MA.NA\" (10 minas), suggesting strong learning of quantitative language patterns despite limited examples.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec48\" class=\"Section2\"\u003e \u003ch2\u003e4.5 Dataset Analysis Results\u003c/h2\u003e \u003cdiv id=\"Sec49\" class=\"Section3\"\u003e \u003ch2\u003e4.5.1 Corpus Statistics\u003c/h2\u003e \u003cp\u003eThe final parallel corpus contained 31 sentence pairs with a total of 49 unique Akkadian words and 60 unique English words. Akkadian sentences averaged 3.4 words in length (range: 2\u0026ndash;7 words), while English translations averaged 4.7 words (range: 2\u0026ndash;9 words). This discrepancy reflects the more analytical nature of English compared to the synthetic morphology of Akkadian.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec50\" class=\"Section3\"\u003e \u003ch2\u003e4.5.2 Vocabulary Distribution\u003c/h2\u003e \u003cp\u003eThe most frequent Akkadian words in the corpus were grammatical particles and common verbs: \"a-na\" (to/for, 12 occurrences), \"iq-bi\" (he said, 5 occurrences), \"i-na\" (in, 4 occurrences). The most frequent English words were function words: \"of\" (9 occurrences), \"the\" (8 occurrences), \"and\" (6 occurrences). This distribution reflects typical language statistics but highlights the challenge of learning meaningful content words with limited examples.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec51\" class=\"Section3\"\u003e \u003ch2\u003e4.5.3 Domain Coverage\u003c/h2\u003e \u003cp\u003eThe corpus covered three primary semantic domains: economic transactions (15 sentences), legal formulae (9 sentences), and simple declarations (7 sentences). Performance analysis revealed strongest translation accuracy in the economic domain (60% correct/partial), moderate performance on declarations (43% correct/partial), and weakest performance on legal formulae (33% correct/partial).\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec52\" class=\"Section2\"\u003e \u003ch2\u003e4.6 Comparison with Baseline Performance\u003c/h2\u003e \u003cdiv id=\"Sec53\" class=\"Section3\"\u003e \u003ch2\u003e4.6.1 No-Training Baseline\u003c/h2\u003e \u003cp\u003eA simple baseline system that always output the most frequent English translation from the training set (\"of\" or \"the\" depending on context) would achieve approximately 0% useful translations. The developed system's 50% utility rate represents substantial improvement over this naive baseline.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec54\" class=\"Section3\"\u003e \u003ch2\u003e4.6.2 Pattern Matching Baseline\u003c/h2\u003e \u003cp\u003eA rule-based system using dictionary lookup and simple reordering rules was implemented for comparison. This system achieved 25% correct translations on the same test set but failed completely on any sentence not matching its predefined patterns. The neural approach showed greater flexibility and generalization despite similar overall accuracy.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec55\" class=\"Section3\"\u003e \u003ch2\u003e4.6.3 Training from Scratch Comparison\u003c/h2\u003e \u003cp\u003eAn ablation experiment training an mBART model from random initialization (without pre-trained weights) resulted in no meaningful learning, with loss remaining near initial values throughout training. This confirms the essential role of transfer learning for this extremely low-resource scenario.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec56\" class=\"Section2\"\u003e \u003ch2\u003e4.7 Statistical Significance Considerations\u003c/h2\u003e \u003cp\u003eGiven the extremely small dataset sizes, traditional measures of statistical significance have limited applicability. The primary evidence for system effectiveness comes from:\u003c/p\u003e \u003cp\u003ei. Consistent loss reduction throughout training indicating genuine learning\u003c/p\u003e \u003cp\u003eii. Translation performance substantially above chance levels\u003c/p\u003e \u003cp\u003eiii. Patterned errors suggesting systematic rather than random failures\u003c/p\u003e \u003cp\u003eiv. Differential performance across sentence types indicating nuanced learning\u003c/p\u003e \u003cp\u003eThese results should be interpreted as demonstrating proof-of-concept feasibility rather than establishing definitive performance benchmarks. The effect sizes observed, while educationally meaningful, require validation on larger datasets before claims of statistical significance can be made.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec57\" class=\"Section2\"\u003e \u003ch2\u003e4.8 Resource Utilization Metrics\u003c/h2\u003e \u003cp\u003eThe complete training process required 47 minutes of GPU time on an NVIDIA P100. Maximum GPU memory utilization was 14.7 GB with an average of 10.2 GB. Inference time for a single sentence averaged 0.8 seconds, including tokenization, generation, and detokenization. These resource requirements are manageable on standard research hardware, suggesting the approach is computationally feasible for broader application.\u003c/p\u003e \u003cp\u003eThe results collectively demonstrate that neural machine translation for Old Assyrian is technically feasible despite severe data constraints, while clearly delineating the current limitations and directions for future improvement.\u003c/p\u003e \u003c/div\u003e"},{"header":"5. Interpretation of results","content":"\u003cp\u003eThis section interprets the empirical findings within the broader context of computational linguistics, digital humanities, and Assyriology. The results are analyzed for their implications regarding the feasibility of machine translation for ancient languages, the specific challenges of Old Assyrian, and the methodological insights gained.\u003c/p\u003e \u003cdiv id=\"Sec59\" class=\"Section2\"\u003e \u003ch2\u003e5.1 Technical Feasibility of Old Assyrian Machine Translation\u003c/h2\u003e \u003cp\u003eThe core finding of this research is the demonstration that neural machine translation for Old Assyrian is technically feasible even under conditions of extreme data scarcity. The model's successful learning trajectory, with training loss decreasing by 97.3% and validation loss by 55.8%, provides clear evidence that Transformer architectures can extract meaningful patterns from as few as 31 parallel sentences. This represents a significant threshold achievement, suggesting that the fundamental barrier to computational Assyriology is not algorithmic but rather infrastructural, residing in the availability of digitized training data rather than the capacity of models to learn from it.\u003c/p\u003e \u003cp\u003eThe differential performance across linguistic categories offers nuanced insight into what aspects of Old Assyarian are most amenable to current techniques. The model's strong performance on economic terminology (60% correct/partial translations) versus its weakness on legal formulae (33% correct/partial) suggests that concrete, domain-specific vocabulary with clear modern equivalents is more easily learned than syntactically complex, formulaic expressions deeply embedded in ancient cultural context. This pattern mirrors findings from machine translation of other ancient legal texts, where formulaic language presents consistent challenges.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec60\" class=\"Section2\"\u003e \u003ch2\u003e5.2 The Data Scarcity Challenge Quantified\u003c/h2\u003e \u003cp\u003eThe extremely low BLEU score of 0.0120, while not surprising given the dataset size, serves as a quantitative benchmark that starkly illustrates the data scarcity problem. In modern language translation, systems typically train on millions of parallel sentences and achieve BLEU scores between 20 and 40 for high-resource pairs. The four orders of magnitude difference in training data between this experiment and typical modern language systems directly manifests in the five orders of magnitude difference in BLEU scores. This numerical relationship provides a concrete metric for the scale of effort needed to bring ancient language translation to functional utility: approximately 10,000 aligned sentences would likely be necessary to achieve BLEU scores comparable to early modern language systems.\u003c/p\u003e \u003cp\u003eThe success of parameter-efficient fine-tuning via LoRA represents a crucial methodological insight. With only 1.40% of parameters trainable, the model retained 98.6% of its pre-trained multilingual knowledge while adapting specifically to Old Assyrian patterns. This demonstrates that for ancient languages, the primary value of large pre-trained models lies not in their architecture alone but in the linguistic universals captured during pre-training on 50 modern languages. The model effectively uses this broad linguistic knowledge as a scaffold upon which Old Assyrian-specific patterns can be grafted with minimal additional training.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec61\" class=\"Section2\"\u003e \u003ch2\u003e5.3 Linguistic Insights Gained\u003c/h2\u003e \u003cp\u003eThe error patterns provide unexpected insights into Old Assyrian linguistics from a computational perspective. The model's consistent confusion between \"annakum\" (tin) and \"kaspum\" (silver), translating \"10 MA.NA annakam\" as \"10 minas of silver,\" reveals an interesting lexical relationship. From the model's statistical perspective, both terms appear in similar economic contexts with measurement units, making them functionally interchangeable without additional contextual clues. This suggests that for certain word classes in Old Assyrian, semantic distinctions may be finer-grained than what can be reliably learned from minimal examples.\u003c/p\u003e \u003cp\u003eThe syntactic error analysis reveals that the model struggles most with the characteristic framing devices of Old Assyrian contracts. The incorrect translation of \"um-ma a-hi-a q\u0026iacute;-b\u0026iacute;-ma\" (Thus says Ahiya) as \"he swore before the god\" indicates that the model has learned that this string typically introduces solemn declarations but has not precisely mapped its specific function and translation. This pattern suggests that formulaic language may require either substantially more examples or explicit rule-based handling, as statistical learning alone may be insufficient with limited data.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec62\" class=\"Section2\"\u003e \u003ch2\u003e5.4 Methodological Implications for Digital Humanities\u003c/h2\u003e \u003cp\u003eThe training dynamics observed have important implications for future work on low-resource historical languages. The rapid loss reduction in early epochs (76.4% in first 15 epochs) followed by slower refinement suggests an efficient training protocol: initial aggressive training on available data followed by careful regularization to prevent overfitting. The early stopping point at epoch 34, well before training loss minimization, indicates that for ancient languages, optimal generalization may occur significantly earlier than complete memorization of the training set.\u003c/p\u003e \u003cp\u003eThe resource utilization metrics demonstrate computational accessibility. At 47 minutes on consumer-grade GPU hardware, the complete training process is within reach of individual researchers and small departments. This democratizes the methodology, potentially enabling broader participation in computational Assyriology beyond well-funded central projects. The manageable computational requirements contrast with the substantial human effort traditionally required for manual translation, suggesting a favorable effort-to-output ratio even at current performance levels.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec63\" class=\"Section2\"\u003e \u003ch2\u003e5.5 Practical Utility Assessment\u003c/h2\u003e \u003cp\u003eThe 50% utility rate (combining correct and partially correct translations) must be interpreted within the specific use case of scholarly assistance. For an Assyriologist, a translation that correctly identifies key economic terms and basic structure but errs on finer details could still provide substantial time savings by offering a draft requiring correction rather than creation from scratch. The model's consistent accuracy with numerical expressions and measurement units is particularly valuable, as these elements are often tedious to manually verify and transcribe.\u003c/p\u003e \u003cp\u003eHowever, the complete failure modes, such as the nonsensical translation \"write he the encyclopaedia\" for \"He said to Ezzib,\" highlight current limitations. These errors would not save scholar time and might actually mislead novice users. This bifurcation of performance suggests that the system in its current form would serve best as an adjunct tool for experts rather than an independent resource for students or automated processing. The expert can recognize and correct the errors, while the novice might accept them as authoritative.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec64\" class=\"Section2\"\u003e \u003ch2\u003e5.6 Comparison with Traditional Assyriological Methods\u003c/h2\u003e \u003cp\u003eThe results illuminate the complementary relationship between computational and traditional methods. Where the model excels at consistent application of learned patterns to economic terminology and numerical expressions, it struggles with the nuanced interpretation of formulaic language and proper nouns that human scholars handle through contextual knowledge and philological training. This suggests an optimal division of labor: computational methods for repetitive, pattern-based elements of texts, and human expertise for culturally embedded, ambiguous, or unique elements.\u003c/p\u003e \u003cp\u003eThe translation process itself reveals an interesting inversion of traditional workflow. Human translators typically begin with cultural and historical context, then proceed to specific grammatical and lexical analysis. The model operates in reverse: it begins with statistical patterns of word co-occurrence and sequence, only indirectly inferring meaning through these patterns. This fundamental difference in approach explains both the model's surprising successes in areas with clear statistical regularities and its failures in areas requiring cultural knowledge.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec65\" class=\"Section2\"\u003e \u003ch2\u003e5.7 Broader Implications for Historical Linguistics\u003c/h2\u003e \u003cp\u003eThe differential learning rates observed across linguistic categories have implications for theories of language change and universals. The model's relative ease in learning numerical expressions supports theories that quantify systems represent a linguistic universal with particularly stable cross-linguistic properties. Conversely, the difficulty with formulaic legal language suggests that such constructions may be more culture-specific and historically contingent, with less stable mappings across languages and time periods.\u003c/p\u003e \u003cp\u003eThe successful transfer learning from modern languages to Old Assyrian provides empirical support for the existence of deep linguistic universals that persist across millennia. The fact that a model pre-trained exclusively on modern languages can effectively adapt to a 4000-year-old language suggests continuity in fundamental linguistic structures that transcends historical change. This finding could inform theoretical debates about language evolution and the nature of linguistic universals.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec66\" class=\"Section2\"\u003e \u003ch2\u003e5.8 Limitations of Current Interpretation\u003c/h2\u003e \u003cp\u003eSeveral important caveats must accompany these interpretations. First, the extremely small sample sizes mean that all observations have substantial uncertainty. Patterns observed might not generalize to the full Old Assyrian corpus. Second, the specific choice of mBART-50 as the base model inevitably shapes the results; different architectures or pre-training approaches might yield different patterns of success and failure. Third, the evaluation itself is limited by the availability of reference translations, which themselves represent scholarly interpretations rather than absolute ground truth.\u003c/p\u003e \u003cp\u003eMost significantly, the current system operates on normalized transliterations rather than original cuneiform, thus addressing only the linguistic translation challenge while assuming the prior solution of the paleographical decipherment problem. A complete computational pipeline for Old Assyrian texts would require integrating sign recognition, normalization, and translation, each with its own challenges and error rates.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec67\" class=\"Section2\"\u003e \u003ch2\u003e5.9 Directional Significance\u003c/h2\u003e \u003cp\u003eDespite these limitations, the results have clear directional significance. They demonstrate that neural machine translation for Old Assyrian is not merely theoretically possible but practically achievable with current methods and modest computational resources. The performance level achieved, while not yet sufficient for standalone use, represents a meaningful starting point that could provide tangible assistance to scholars today while establishing a foundation for more capable future systems.\u003c/p\u003e \u003cp\u003eThe research effectively maps the terrain of what is currently possible and what remains challenging, providing a roadmap for future work. The clear demonstration that economic terminology and numerical expressions are relatively tractable suggests prioritizing these domains for initial practical applications. The equally clear demonstration of difficulties with formulaic language and proper nouns indicates where human expertise will remain essential in the near term and where methodological innovations are most needed.\u003c/p\u003e \u003cp\u003eIn summary, these results should be interpreted as establishing a proof of concept that opens a new methodological avenue for Assyriology while clearly delineating the substantial work required to develop it into a mature research tool. The value lies not in what the current system achieves alone, but in what it demonstrates is achievable and in the precise characterization it provides of the remaining challenges.\u003c/p\u003e \u003c/div\u003e"},{"header":"6. Recommendation For Further Study","content":"\u003cp\u003eThis research has established the technical feasibility of neural machine translation for Old Assyrian while revealing specific challenges and limitations. Building on these findings, the following recommendations outline priority directions for future work that could advance computational Assyriology from proof of concept to practical scholarly tool.\u003c/p\u003e \u003cdiv id=\"Sec69\" class=\"Section2\"\u003e \u003ch2\u003e6.1 Data Infrastructure Development\u003c/h2\u003e \u003cp\u003eThe foremost recommendation addresses the fundamental constraint identified in this study: extreme data scarcity. Future efforts should prioritize systematic data collection and corpus development.\u003c/p\u003e \u003cdiv id=\"Sec70\" class=\"Section3\"\u003e \u003ch2\u003e6.1.1 Coordinated Digitization Initiative\u003c/h2\u003e \u003cp\u003eA collaborative project should be established to systematically digitize published Old Assyrian translations with sentence level alignment. This initiative should engage Assyriology departments, museums, and digital humanities centers to create a standardized, machine readable parallel corpus. Priority should be given to high frequency document types such as debt notes, contracts, and letters, which would provide the most immediate utility for economic historical research. The corpus should include not only transliteration translation pairs but also morphological annotations, named entity tags, and metadata about tablet provenance and publication history.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec71\" class=\"Section3\"\u003e \u003ch2\u003e6.1.2 Active Learning Framework Implementation\u003c/h2\u003e \u003cp\u003eTo maximize the efficiency of limited expert annotation time, an active learning system should be developed. This framework would identify which unlabeled texts would provide maximum learning benefit if translated, prioritizing documents that fill vocabulary gaps or represent under represented syntactic structures. The system developed in this study could be extended to estimate its own uncertainty on new texts, flagging challenging passages for human expert attention while automatically translating straightforward economic formulas.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec72\" class=\"Section3\"\u003e \u003ch2\u003e6.1.3 Data Augmentation through Related Languages\u003c/h2\u003e \u003cp\u003eThe training corpus could be substantially expanded through cross linguistic transfer from related ancient Semitic languages. Parallel texts in Old Babylonian, Standard Babylonian, and other Akkadian dialects share significant vocabulary and grammatical structures with Old Assyrian. A carefully designed transfer learning approach could leverage these larger corpora to bootstrap Old Assyrian translation, perhaps through intermediate fine tuning on progressively closer dialects.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec73\" class=\"Section2\"\u003e \u003ch2\u003e6.2 Model Architecture Enhancements\u003c/h2\u003e \u003cp\u003eThe current model architecture, while effective, could be optimized specifically for the challenges of ancient language translation.\u003c/p\u003e \u003cdiv id=\"Sec74\" class=\"Section3\"\u003e \u003ch2\u003e6.2.1 Multi Task Learning Framework\u003c/h2\u003e \u003cp\u003eA single model should be trained to perform multiple related tasks simultaneously: translation, named entity recognition, morphological analysis, and text normalization. This approach would allow the model to develop more robust representations by learning from multiple supervisory signals. For example, identifying personal names and place names as separate tasks could improve their handling in translation. The shared representations learned across tasks would likely improve generalization in the low data regime.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec75\" class=\"Section3\"\u003e \u003ch2\u003e6.2.2 Constrained Decoding Integration\u003c/h2\u003e \u003cp\u003eThe model should be enhanced with explicit constraints during the generation phase. A glossary of known proper nouns, measurement units, and formulaic expressions could guide translation toward archaeologically attested forms. For example, when the model encounters \"K\u0026Ugrave;.BABBAR,\" it could be constrained to generate \"silver\" rather than considering alternative translations. This hybrid symbolic statistical approach would combine the flexibility of neural methods with the reliability of rule based systems for well understood elements.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec76\" class=\"Section3\"\u003e \u003ch2\u003e6.2.3 Contextual Window Expansion\u003c/h2\u003e \u003cp\u003eOld Assyrian documents frequently contain intertextual references and dependencies that span multiple sentences. The current model processes texts in isolation. Future architectures should incorporate longer contextual windows or document level processing to capture these relationships. This could be achieved through hierarchical attention mechanisms or memory augmented networks that maintain context across longer sequences.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec77\" class=\"Section2\"\u003e \u003ch2\u003e6.3 Evaluation Methodology Development\u003c/h2\u003e \u003cp\u003eCurrent machine translation metrics are poorly suited for ancient language evaluation. Future work should develop domain appropriate assessment frameworks.\u003c/p\u003e \u003cdiv id=\"Sec78\" class=\"Section3\"\u003e \u003ch2\u003e6.3.1 Assyriologist Centric Evaluation Protocol\u003c/h2\u003e \u003cp\u003eA formal evaluation protocol should be developed in collaboration with Assyriologists. This protocol would define translation adequacy criteria specific to scholarly needs, distinguishing between different types of errors based on their impact on historical interpretation. For example, misidentifying a personal name might be more problematic than misplacing a common particle. The protocol should include standardized test sets representing different document types, time periods, and scribal traditions.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec79\" class=\"Section3\"\u003e \u003ch2\u003e6.3.2 Human Evaluation Benchmark Creation\u003c/h2\u003e \u003cp\u003eA community benchmark should be established with expert graded translations of diverse Old Assyrian texts. This benchmark would serve as a standard evaluation resource, enabling fair comparison between different approaches and tracking progress over time. The benchmark should include not only overall quality ratings but also detailed error categorization and difficulty levels.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec80\" class=\"Section3\"\u003e \u003ch2\u003e6.3.3 Utility Focused Metrics\u003c/h2\u003e \u003cp\u003eBeyond traditional accuracy metrics, future evaluation should measure practical utility. Metrics could include time savings for expert translators, reduction in consultation of reference materials, or success in answering specific historical questions from translated texts. These utility focused assessments would better capture the real world value of translation assistance tools.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec81\" class=\"Section2\"\u003e \u003ch2\u003e6.4 Integration with Assyriological Workflows\u003c/h2\u003e \u003cp\u003eFor computational methods to achieve practical adoption, they must be integrated into existing scholarly workflows.\u003c/p\u003e \u003cdiv id=\"Sec82\" class=\"Section3\"\u003e \u003ch2\u003e6.4.1 Interactive Translation Environment Development\u003c/h2\u003e \u003cp\u003eA user friendly interface should be developed that allows Assyriologists to interact with the translation model in real time. This environment would support iterative refinement, allowing experts to correct errors and see immediate updates, provide alternative readings for ambiguous signs, and consult related texts. The system should learn from these corrections, creating a feedback loop that improves both immediate results and future performance.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec83\" class=\"Section3\"\u003e \u003ch2\u003e6.4.2 Cuneiform to Translation Pipeline\u003c/h2\u003e \u003cp\u003eThe current system begins with normalized transliteration. A complete pipeline should be developed that starts from cuneiform signs, whether from 2D photographs or 3D scans. This would require integrating sign recognition, normalization of variant sign forms, and handling of damaged or unclear text. Such a complete system would dramatically reduce the manual effort currently required before translation can even begin.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec84\" class=\"Section3\"\u003e \u003ch2\u003e6.4.3 Educational Integration\u003c/h2\u003e \u003cp\u003eTranslation tools should be adapted for classroom use, helping students learn Old Assyrian through interactive examples. The system could provide graded assistance based on student level, from full translations for beginners to targeted hints for advanced students. This application would both serve pedagogical needs and help build the next generation of computationally literate Assyriologists.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec85\" class=\"Section2\"\u003e \u003ch2\u003e6.5 Cross Linguistic and Interdisciplinary Applications\u003c/h2\u003e \u003cp\u003eThe methodologies developed for Old Assyrian should be extended and generalized to benefit related fields.\u003c/p\u003e \u003cdiv id=\"Sec86\" class=\"Section3\"\u003e \u003ch2\u003e6.5.1 Comparative Ancient Semitic Framework\u003c/h2\u003e \u003cp\u003eA unified framework should be developed for multiple ancient Semitic languages, allowing shared representations and transfer learning across dialects and time periods. This would be particularly valuable for historical linguistics research on language change and dialect variation. The framework could also facilitate the study of language contact phenomena in ancient Near Eastern multilingual contexts.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec87\" class=\"Section3\"\u003e \u003ch2\u003e6.5.2 Integration with Archaeological Data\u003c/h2\u003e \u003cp\u003eTranslation systems should be linked with archaeological databases, allowing spatial and temporal analysis of textual content. For example, translations could be automatically tagged with geographic references and linked to site databases, or economic terms could be connected to material culture records. This integration would enable new forms of analysis bridging textual and material evidence.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec88\" class=\"Section3\"\u003e \u003ch2\u003e6.5.3 Literary and Religious Text Extension\u003c/h2\u003e \u003cp\u003eWhile this study focused on economic documents, the methodology should be extended to literary, religious, and royal inscription genres. These text types present different challenges, including poetic structures, metaphorical language, and unique formulaic expressions. Separate models or specialized adaptations would likely be needed for these domains.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec89\" class=\"Section2\"\u003e \u003ch2\u003e6.6 Technical Infrastructure and Sustainability\u003c/h2\u003e \u003cp\u003eLong term progress requires sustainable technical infrastructure and community engagement.\u003c/p\u003e \u003cdiv id=\"Sec90\" class=\"Section3\"\u003e \u003ch2\u003e6.6.1 Open Source Tool Development\u003c/h2\u003e \u003cp\u003eAll software developed should be released as open source with comprehensive documentation. This would enable broader community participation, reproducibility of results, and adaptation to related languages. The tools should be designed for extensibility, allowing researchers to easily incorporate new data or modify architectures.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec91\" class=\"Section3\"\u003e \u003ch2\u003e6.6.2 Standardized Data Formats\u003c/h2\u003e \u003cp\u003eCommunity standards should be established for encoding Old Assyrian texts with translation alignments, annotations, and metadata. These standards would facilitate data sharing and tool interoperability. The standards should build on existing digital Assyriology initiatives while addressing the specific needs of machine learning applications.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec92\" class=\"Section3\"\u003e \u003ch2\u003e6.6.3 Computational Assyriology Training\u003c/h2\u003e \u003cp\u003eTraining programs should be developed to equip Assyriologists with computational skills and computer scientists with domain knowledge. Workshops, summer schools, and collaborative projects would help build an interdisciplinary community capable of advancing the field. Such training is essential for developing the next generation of tools that are both technically sophisticated and philologically informed.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec93\" class=\"Section2\"\u003e \u003ch2\u003e6.7 Prioritization Framework\u003c/h2\u003e \u003cp\u003eGiven limited resources, efforts should be prioritized based on expected impact and feasibility:\u003c/p\u003e \u003cp\u003eA. \u003cb\u003eImmediate Term (1\u0026ndash;2 years)\u003c/b\u003e:\u003c/p\u003e \u003cp\u003ei. Expand the parallel corpus to at least 500 sentence pairs through focused digitization\u003c/p\u003e \u003cp\u003eii. Implement constrained decoding for proper nouns and measurement terms\u003c/p\u003e \u003cp\u003eiii. Develop a basic interactive interface for expert use\u003c/p\u003e \u003cp\u003eB. \u003cb\u003eMedium Term (3\u0026ndash;5 years)\u003c/b\u003e:\u003c/p\u003e \u003cp\u003ei. Develop multi-task models integrating translation with morphological analysis\u003c/p\u003e \u003cp\u003eii. Create standardized evaluation benchmarks with community participation\u003c/p\u003e \u003cp\u003eiii. Extend to literary and religious text genres\u003c/p\u003e \u003cp\u003eC. \u003cb\u003eLong Term (5\u0026thinsp;+\u0026thinsp;years)\u003c/b\u003e:\u003c/p\u003e \u003cp\u003ei. Develop complete pipeline from cuneiform images to translation\u003c/p\u003e \u003cp\u003eii. Integrate with archaeological and historical databases\u003c/p\u003e \u003cp\u003eiii. Establish computational Assyriology as a standard methodology in the field\u003c/p\u003e \u003cp\u003eThese recommendations collectively outline a path from the current proof of concept toward robust, practical tools that could transform the study of Old Assyrian and serve as a model for computational approaches to other ancient languages. The feasibility demonstrated in this study provides a foundation for these ambitious but achievable next steps.\u003c/p\u003e \u003c/div\u003e"},{"header":"7. Conclusion","content":"\u003cp\u003eThis research has successfully demonstrated the feasibility of applying neural machine translation techniques to Old Assyrian cuneiform business texts, establishing a foundational methodology for computational Assyriology. Through the development, training, and evaluation of a Transformer based translation system, this study has provided concrete evidence that modern deep learning approaches can learn meaningful linguistic patterns from even the most limited ancient language corpora.\u003c/p\u003e \u003cp\u003eThe core achievement of this work is the operationalization of a complete translation pipeline for a language with extreme data scarcity. By compiling a parallel corpus of 31 Old Assyrian English sentence pairs and implementing parameter efficient fine tuning of the mBART 50 model through Low Rank Adaptation, the system achieved measurable learning with training loss decreasing by 97.3% and validation loss by 55.8%. While the quantitative BLEU score of 0.0120 reflects the severe data constraints, qualitative analysis revealed that 50% of test translations contained useful, partially correct content, particularly for economic terminology and numerical expressions. This performance differential: with strong results on concrete vocabulary but challenges with formulaic legal language, precisely maps the current capabilities and limitations of the approach.\u003c/p\u003e \u003cp\u003eThe methodological contributions extend beyond the specific Old Assyrian application. This research demonstrates the effectiveness of transfer learning from modern multilingual models to ancient languages, confirming that linguistic universals captured during pre-training on 50 contemporary languages provide a valuable scaffold for adaptation to historical dialects. The successful use of LoRA, with only 1.40% of parameters trainable, establishes an efficient paradigm for low resource language adaptation that preserves pre-trained knowledge while enabling specialization. The mixed methods evaluation framework, combining quantitative metrics with detailed qualitative error analysis, provides a model for assessing ancient language translation systems where traditional metrics have limited applicability.\u003c/p\u003e \u003cp\u003eThe findings have significant implications for both digital humanities and Assyriology. For computational linguistics, this work extends the boundaries of low resource machine translation to historically significant but data poor languages, demonstrating that current techniques can be productively applied much earlier in the digitization pipeline than previously assumed. For Assyriology, the research provides a proof of concept for collaborative human computer translation workflows that could accelerate text publication and analysis. The system's particular proficiency with economic terminology and measurement units suggests immediate practical utility for processing commercial documents, which constitute the majority of extant Old Assyrian texts.\u003c/p\u003e \u003cp\u003eSeveral important limitations define the scope of current achievement. The extreme data scarcity remains the fundamental constraint, with performance ultimately bounded by the availability of only 31 training examples. The system operates on normalized transliterations rather than original cuneiform, thus addressing only the linguistic translation component of a complete decipherment pipeline. Evaluation remains challenging due to the absence of standardized benchmarks and the small test set size. These limitations do not diminish the accomplishment but rather precisely delineate the frontier for future work.\u003c/p\u003e \u003cp\u003eThe path forward is clearly illuminated by both the successes and shortcomings documented in this study. Priority must be given to systematic corpus development through coordinated digitization efforts. Model architectures should evolve toward multi task frameworks that jointly handle translation, named entity recognition, and morphological analysis. Evaluation methodologies need development of Assyriologist centric protocols and community benchmarks. Practical integration requires user friendly interfaces that embed translation assistance into existing scholarly workflows.\u003c/p\u003e \u003cp\u003eThis research ultimately demonstrates that the primary barriers to computational Old Assyrian translation are now infrastructural and collaborative rather than algorithmic. The technical feasibility has been established; the remaining challenges concern data collection, interdisciplinary cooperation, and tool integration. As these practical obstacles are addressed through concerted effort across Assyriology, digital humanities, and computational linguistics, machine translation systems will evolve from experimental prototypes to valuable scholarly assistants.\u003c/p\u003e \u003cp\u003eThe broader significance of this work lies in its contribution to making ancient textual heritage more accessible. By reducing the time and specialized expertise required to extract meaning from cuneiform tablets, computational methods can help illuminate the sophisticated economic systems, legal traditions, and social structures of early Mesopotamian civilization. This study represents an initial step toward that goal, providing both a concrete technical foundation and a clear roadmap for future development. The translation of Old Assyrian texts, after four millennia of silence, now stands at the beginning of a new chapter, one in which computational assistance augments human expertise to uncover historical knowledge at unprecedented scale and speed.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthical Approval\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable. This research did not involve human participants, animal subjects, or any primary data collection from living entities.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting Interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests, financial or non-financial, relevant to the content of this article.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors received no specific funding for this work.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthorship Contribution\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNnaemeka KIngsley Ugwumba: Conceptualization, Methodology, Software, Writing - Original Draft.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe author reviewed and approved the final manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Availability Declaration\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll data generated or analysed during this study, including the figures and source code, are available in the following GitHub repository : https://github.com/KingsleyTechie/Neural-Machine-Translation-for-Old-Assyrian-Cuneiform-Business-Texts\u0026nbsp;\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAlsharif O, Khalifa S (2023) Neural machine translation for Classical Arabic: A low-resource perspective. Comput Speech Lang 82:101542. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.csl.2023.101542\u003c/span\u003e\u003cspan address=\"10.1016/j.csl.2023.101542\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAssael Y, Sommerschield T, Shillingford B, Bordbar M, Pavlopoulos J, Chatzipanagiotou M, Androutsopoulos I, Prag J, de Freitas N (2022) Restoring and attributing ancient texts using deep neural networks. Nature 603(7900):280\u0026ndash;283. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41586-022-04448-z\u003c/span\u003e\u003cspan address=\"10.1038/s41586-022-04448-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBeyer A, Fetaya E, Shum S, McConville R (2023) Low-resource neural machine translation: A comparative study of transfer learning approaches. Trans Association Comput Linguistics 11:1427\u0026ndash;1445. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1162/tacl_a_00612\u003c/span\u003e\u003cspan address=\"10.1162/tacl_a_00612\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBogacz B, Mara H (2022) Digital assyriology: Advances, challenges, and future directions. Digit Scholarsh Humanit 37(3):852\u0026ndash;869. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/llc/fqac025\u003c/span\u003e\u003cspan address=\"10.1093/llc/fqac025\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e\u0026Ccedil;\u0026ouml;ltekin \u0026Ccedil;, Rama T (2021) Neural morphological analysis for historical languages: A case study of Akkadian. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, 3102\u0026ndash;3113. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.18653/v1/2021.eacl-main.271\u003c/span\u003e\u003cspan address=\"10.18653/v1/2021.eacl-main.271\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eD\u0026uuml;ring BS, Hess M (2024) Three-dimensional documentation of cuneiform tablets using photogrammetry and neural networks. J Cult Herit 66:448\u0026ndash;459. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.culher.2023.12.011\u003c/span\u003e\u003cspan address=\"10.1016/j.culher.2023.12.011\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFetaya E, Lifshitz Y, Aaron E, Gordin S (2020) Restoration of fragmentary Babylonian texts using recurrent neural networks. Proceedings of the National Academy of Sciences, 117(37), 22743\u0026ndash;22751. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1073/pnas.2003794117\u003c/span\u003e\u003cspan address=\"10.1073/pnas.2003794117\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGordin S, Gutherz S, Levy S, Shalom U, Castro YA, Fetaya E (2023) Reading Akkadian cuneiform using natural language processing. PLoS ONE 18(5):e0289473. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pone.0289473\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0289473\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHu EJ, Shen Y, Wallis P, Allen-Zhu Z, Li Y, Wang S, Wang L, Chen W (2022) LoRA: Low-rank adaptation of large language models. International Conference on Learning Representations. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2106.09685\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2106.09685\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJohnson K, Adeli E, Zvyagina M, Tilton L (2024) Cross-lingual transfer learning for ancient Semitic languages. Comput Linguistics 50(1):45\u0026ndash;78. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1162/coli_a_00489\u003c/span\u003e\u003cspan address=\"10.1162/coli_a_00489\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePritchard JK, Taylor C (2023) Digital infrastructure for ancient Near Eastern studies: The CDLI and Oracc projects. J Open Humanit Data 9(1):8. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.5334/johd.112\u003c/span\u003e\u003cspan address=\"10.5334/johd.112\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang X, Hovy E (2024) Multilingual pretraining for historical language translation. Comput Linguistics 50(2):301\u0026ndash;335. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1162/coli_a_00501\u003c/span\u003e\u003cspan address=\"10.1162/coli_a_00501\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e9.Ethical Declarations\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"Laskenta Technologies Limited","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Old Assyrian language, cuneiform translation, neural machine translation, ancient text digitization, digital humanities, historical linguistics, Transformer models, Akkadian language processing, low resource language translation, ancient commerce records, AI for cultural heritage","lastPublishedDoi":"10.21203/rs.3.rs-8695909/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8695909/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThis research develops a neural machine translation system for converting ancient Old Assyrian cuneiform business records into modern English, addressing a long standing challenge in digital humanities and historical linguistics. Using a large corpus of annotated cuneiform texts, the study applies Transformer based sequence to sequence models to learn linguistic patterns in ancient commercial documentation. The system is evaluated using standard translation quality metrics and qualitative linguistic analysis. Results show that modern deep learning approaches can significantly improve the accessibility and interpretation of ancient texts, enabling historians, linguists, and archaeologists to analyze early economic systems more efficiently and at scale.\u003c/p\u003e","manuscriptTitle":"Neural Machine Translation of Old Assyrian Cuneiform Business Records into English","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-01-27 14:49:13","doi":"10.21203/rs.3.rs-8695909/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"2ebaabe4-3c99-4688-8619-ac5d173b08b4","owner":[],"postedDate":"January 27th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":61727401,"name":"Artificial Intelligence and Machine Learning"}],"tags":[],"updatedAt":"2026-01-27T14:49:13+00:00","versionOfRecord":[],"versionCreatedAt":"2026-01-27 14:49:13","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8695909","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8695909","identity":"rs-8695909","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
Text is read by the "Ask this paper" AI Q&A widget below.
Extraction quality varies by source — PMC NXML preserves structure
cleanly, OA-HTML may include some navigation residue, and OA-PDF can
have broken hyphenation. The publisher copy
(via DOI)
is the canonical version.