A method for Symbol Resolution

doi:10.21203/rs.3.rs-5981027/v1

2025 · doi:10.21203/rs.3.rs-5981027/v1

preprint OA: closed

Sinisa Milivojevic

Research Article. This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-5981027/v1
This work is licensed under a CC BY 4.0 License.
Status: posted (version 1).

Abstract

This paper presents the results of extensive research into a novel set of algorithms that provide a more efficient (and faster) way of resolving symbols, such as reserved words, keywords, operators, functions and similar. A new method is provided, and it has been thoroughly tested. The method relies on hashing. Many hashes were tested, of which only a few produced the required results. This led the author to invent and develop new hashes.
All the necessary information from this research is presented, including the characteristics of some known algorithms, the efficiency of the hashes used, and the total speed of execution for the entire set of symbols on two different CPUs.

Keywords: Software Engineering, symbol resolution, sparse vector, hashing, new hashes

1 Introduction

All compilers and interpreters have their own set of symbols, which comprises reserved words, keywords, function names and similar. To distinguish that set from other literals, a fast method is needed for recognising whether a given literal is one of its symbols. That is exactly what is meant by the term “symbol resolution”. Many methods have been used in the past, one of which is hashing. The crux of this paper is the description of methods that use a family of hashes that are generated with both byte and bit operations on the symbols. That family of hashes is better known as xxhashes.

2 A description of the method

Most compilers and interpreters store their symbols as a packed array of structures, usually written in the C programming language. These structures usually contain the string for the symbol literal, its length, and fields for the attributes that describe the characteristics of the symbol.

This paper presents research that developed a method based on sparse, imperfect vectors that store indices for the entries in the packed symbol vector described above. The method is called indirect indexing, because the sparse, imperfect vector does not hold the symbols themselves: its elements contain valid and invalid values, and each valid value is an index into the packed, perfect vector that stores all the symbols.

To put it in a formula, it looks like this:

Symbol = Vector_of_Symbols[Sparse_Vector[x]];

where x is the index in the sparse vector.
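The two-level layout behind the formula can be sketched in C as follows. This is only an illustration under stated assumptions: the `struct symbol` fields, the three sample symbols, the attribute values and the hand-picked slots 2, 5 and 9 are all hypothetical, and the `UINT16_MAX` marker for unused slots follows the convention described later in this chapter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol record: literal, its length, and one attribute field. */
struct symbol {
    const char *name;
    uint16_t    length;
    uint16_t    attributes;   /* made-up values: 1 = reserved word, 2 = function */
};

/* Packed, "perfect" vector: every entry is a real symbol. */
static const struct symbol vector_of_symbols[] = {
    { "SELECT", 6, 1 },
    { "FROM",   4, 1 },
    { "COUNT",  5, 2 },
};

#define EMPTY UINT16_MAX      /* marks an unused slot */

/* Sparse, "imperfect" vector: most slots are empty, the rest hold indices
   into vector_of_symbols. Slots 2, 5 and 9 are chosen by hand here; in the
   method they are computed from the symbol literal. */
static const uint16_t sparse_vector[11] = {
    EMPTY, EMPTY, 0, EMPTY, EMPTY, 1, EMPTY, EMPTY, EMPTY, 2, EMPTY
};

/* Symbol = Vector_of_Symbols[Sparse_Vector[x]], guarded against empty slots. */
static const struct symbol *symbol_at(size_t x)
{
    assert(x < sizeof sparse_vector / sizeof sparse_vector[0]);
    uint16_t packed_index = sparse_vector[x];
    return packed_index == EMPTY ? NULL : &vector_of_symbols[packed_index];
}
```

With this layout, `symbol_at(5)` yields the record for FROM, while `symbol_at(0)` yields NULL because that slot was never filled.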
This approach required a method for calculating the indices (denoted as ‘x’ in the above formula) in the sparse vector, whose values point to the valid entries in the packed vector containing symbols. These symbols are reserved words, keywords, function names, operators et cetera.

A very strict criterion was set: the indices must be calculated only from the string literals of the symbols, and in such a manner that the sparse vector contains exactly one index for each separate symbol. That is the crux of the method that was developed.

This is better visualised if you start from an example. If you look at one of the latest versions of the MySQL include file, lex.h, you will notice that there are 845 symbols. However, only 803 of these are relevant for this research. Those 803 symbols include all the reserved words, keywords, operators and functions. The rest are optimiser hints, which cannot be treated in the same way because they contain duplicates, and duplicates are incompatible with any symbol resolution. They are also treated completely differently in the current MySQL code.

The tests were executed on the last 8.0 release available at the time. Since then, new versions and releases have come out, with some changes in the symbol table. All of those were tested too, and the results were very similar to the original results (presented in the next chapter).

Using an imperfect, sparse vector required an algorithm that would find the smallest possible vector that resolves the symbols. Hence, another objective of this research was to use an optimum-seeking algorithm that would run symbol resolution until all the symbols fit in the vector without any duplicates. It was decided that this vector should have no more than 64 K entries. For that purpose a separate array was created, which contains all primes between 16 K and 64 K.
Those primes were used as candidate sizes in the search for the smallest vector in which all the entries (in this case 803 of them) would fit without duplicates. This vector was then generated as an obligatory part of the algorithm results. That also means that the hashing algorithm used should provide a 16-bit value as its final result. This was easily achieved by obtaining a 32-bit hash and calculating the remainder of that value divided by the size of the sparse vector.

In such a system, each sparse, imperfect vector has to be initialised. Here 0 (zero) would be a valid, although highly improbable, entry. UINT16_MAX, on the other hand, is an impossible entry: the last prime in the above-mentioned array is definitely smaller than 64 K, so the maximum size of the vector is smaller than that value. That is the reason why every entry in the sparse vector is initialised with the value UINT16_MAX. This made the searching algorithm easier to design. Simply, if the slot at the calculated index still contained UINT16_MAX, the slot was free and the symbol's index was written into it. If the value found was smaller, the algorithm had hit a duplicate entry. The code would then proceed with a new, larger vector, until a vector was found that accommodated all the symbols, in this case 803 of them. The sizes of the vectors for the usable hashes are also included in the results.

It should be mentioned that the results of each successful algorithm included the generated sparse vector containing all entries, including the valid ones (in this case 803 of them), the vector's size (as a number of entries), some information on the successful completion, and the speed of the resolution of all 803 symbols, in microseconds. This also means that the runtime code needs only the vector and the xxhash algorithm that returns the symbol. It also means that the size of the product will include additional storage for the array, whose size is circa 100 Kbytes.
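The search described above might be sketched as below. Several things are assumptions made for illustration: `stand_in_hash` is plain 32-bit FNV-1a rather than one of the hashes from this paper, primes are found by trial division instead of being read from a precomputed array, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FIRST_SIZE  16384u   /* the search starts at 16 K entries */
#define MAX_ENTRIES 65536u   /* and stays below 64 K entries */

/* Stand-in hash (32-bit FNV-1a); NOT one of the paper's hashes. */
static uint32_t stand_in_hash(const char *key, size_t length)
{
    uint32_t hash = 2166136261u;
    for (size_t i = 0; i < length; ++i) {
        hash ^= (uint8_t)key[i];
        hash *= 16777619u;
    }
    return hash;
}

static bool is_prime(uint32_t n)
{
    if (n < 2) return false;
    for (uint32_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

/* Try to place every symbol in a vector of the given size.
   Returns false as soon as two symbols land on the same slot. */
static bool fill_vector(uint16_t *sparse, uint32_t size,
                        const char *const *symbols, uint16_t count)
{
    for (uint32_t i = 0; i < size; ++i)
        sparse[i] = UINT16_MAX;              /* every slot starts empty */
    for (uint16_t s = 0; s < count; ++s) {
        uint32_t x = stand_in_hash(symbols[s], strlen(symbols[s])) % size;
        if (sparse[x] != UINT16_MAX)
            return false;                    /* duplicate: slot already taken */
        sparse[x] = s;                       /* store the packed-vector index */
    }
    return true;
}

/* Walk the primes from 16 K upwards and return the first size that fits,
   or 0 if none does. `sparse` must have room for MAX_ENTRIES entries. */
static uint32_t find_smallest_vector(uint16_t *sparse,
                                     const char *const *symbols, uint16_t count)
{
    assert(sparse != NULL && symbols != NULL);
    for (uint32_t size = FIRST_SIZE; size < MAX_ENTRIES; ++size)
        if (is_prime(size) && fill_vector(sparse, size, symbols, count))
            return size;
    return 0;
}
```

Each entry is a 16-bit value, so a vector of N entries occupies 2 × N bytes. For the vector sizes reported in Table 1 (40823 to 54401 entries) that is roughly 82,000 to 109,000 bytes, which is consistent with the figure of circa 100 Kbytes.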
For contemporary computers, that additional storage is a small price to pay.

During the research it was discovered that the successful hashes managed with fewer than 64 K entries, while the others could not fit even in much larger vectors. The unsuccessful algorithms produced duplicates even with much larger vector sizes; some produced duplicates even with 32-bit vector sizes.

It should be pointed out that the speed and efficiency of searching for and generating the sparse vector, and of filling it with indices in the right places, is not relevant, because this step is performed while the SQL interpreter (or any other interpreter or compiler) is being compiled. The efficiency and speed that matter are those at runtime, which involves only symbol resolution.

At the very start of the research it was clear that classical hashes, which use only byte arithmetic, cannot produce the desired results. It was therefore mandatory to use hashes that combine byte and bit operations. In short, we use the existing term xxhashes for hashes that use operations on both bytes and bits.

The research tested over 30 (thirty) existing xxhash algorithms, but found only 4 (four) that were successful in symbol resolution.

Further analysis showed that many xxhash algorithms require initialisation with some value. Mostly, the authors suggest using the length of the previous symbol as the initialiser. Others suggest initialising with the xxhash obtained from the previous symbol calculation. That approach is impossible with compilers and interpreters, since their syntax is very varied and can combine symbols in many different ways. In that context there is no such thing as a previous or next symbol.

Also, many of the xxhash algorithms require that the final calculation involves a bitwise operation (usually XOR) on the obtained result.
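As an illustration of a hash that mixes arithmetic on bytes (a multiplication) with a bit operation (XOR), the sketch below uses standard 32-bit FNV-1a, the unmodified relative of the fnv_32a_str entry in the results. The `init` and `final_xor` parameters are hypothetical additions that only show where an initial value and a final bitwise operation would enter; they are not the seeds used in this research.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV1A_32_OFFSET_BASIS 2166136261u
#define FNV1A_32_PRIME        16777619u

/* 32-bit FNV-1a with a caller-supplied initial value and final XOR mask.
   Passing FNV1A_32_OFFSET_BASIS and 0 gives the standard FNV-1a result. */
static uint32_t fnv1a_32_seeded(const char *key, size_t length,
                                uint32_t init, uint32_t final_xor)
{
    assert(key != NULL || length == 0);
    uint32_t hash = init;
    for (size_t i = 0; i < length; ++i) {
        hash ^= (uint8_t)key[i];    /* bit operation on the incoming byte */
        hash *= FNV1A_32_PRIME;     /* arithmetic, wraps modulo 2^32 */
    }
    return hash ^ final_xor;        /* final bitwise operation */
}
```

For example, `fnv1a_32_seeded("a", 1, FNV1A_32_OFFSET_BASIS, 0)` returns 0xE40C292C, the standard FNV-1a value for the one-byte string "a".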
Both requirements meant that the method had to supply seeds of its own.

In order to solve both problems, two different algorithms were designed for the random generation of two sets of seeds, one for the initialisation of the xxhash and the other for the final bit operation. One algorithm was used to generate an array of 7 (seven) random seeds, and a second, completely different algorithm was used to generate an array of 2099 (two thousand and ninety-nine) random seeds. It is not a coincidence that both of these numbers are primes.

Since only 4 (four) of the existing algorithms led to the desired results, this research also had to develop a set of new hash algorithms. In total, 8 (eight) new xxhashes were designed, some of which performed significantly better than the known ones. It must be pointed out that, of those 8 (eight) new xxhashes, 3 (three) were loosely inspired by existing xxhashes. The word “inspired” is used because those algorithms required heavy changes in order to meet the criteria described above and provide usable results. This practically created new algorithms with entirely different efficiency. The original algorithms are mentioned in the code listings of the new xxhashes.

As already mentioned, the example on which the algorithms were developed and tested was an SQL interpreter, more precisely the symbols in the MySQL implementation of SQL. One characteristic of SQL is that the language is case-insensitive. Since the symbols are all uppercase in the MySQL implementation, code had to be added that converts all letters to uppercase. This could have been achieved by table lookup, but in the end a simple ternary operator was used. This code is not included in the presentation of the new xxhashes, nor was it included in the calculation of the algorithm efficiency.
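An uppercase conversion with a ternary operator could look like the sketch below. It is an assumption about the general shape of such code, not the code used in the research; the function names are hypothetical and only ASCII letters are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Convert one ASCII letter to uppercase with a ternary operator;
   every other character is returned unchanged. */
static char ascii_upper(char c)
{
    return (c >= 'a' && c <= 'z') ? (char)(c - ('a' - 'A')) : c;
}

/* Copy a literal in uppercase into a caller-supplied buffer
   that holds at least length + 1 bytes. */
static void upper_copy(char *dst, const char *src, size_t length)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < length; ++i)
        dst[i] = ascii_upper(src[i]);
    dst[length] = '\0';
}
```

In a hash function the same expression can be applied to each byte as it is read, so that no separate copy of the literal is needed.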
The conversion was left out because, since SQL is case-insensitive, a programmer or query designer can write symbols in queries with any combination of letter cases.

In order to test the validity of the final result, many tests were included. First of all, the vector and the indices described above were calculated with upper-case symbols, while testing was done with lower-case symbols. The sparse vector was initialised with UINT16_MAX values, so if the lookup returned that value for the calculated index, the entire calculation had failed. After that test, the length of the symbol found was compared with the length of the input symbol, and the two strings were then compared for equality.

3 Presentation of the results

The results presented in this paper cover all 12 (twelve) xxhashes but, for comparison purposes, they also include two other, historically important methods. The first one is used in the current MySQL implementation and is based on the digital search method. It can be found in the MySQL source code directory sql/ in the file sql_lex_hash.cc.

Besides the above method, another existing algorithm was included, namely a double hashing algorithm. It was implemented during this research, but again only in order to compare its efficiency and speed with the new method.

All of the results are presented in Table 1. The table has the following columns:

1. Title of the method
2. The algorithm used
3. Reference for the algorithm, for the existing algorithms only. These are numbers corresponding to the References chapter.
4. The algorithm efficiency, where ‘n’ stands for the number of bytes (characters). Double hashing, however, has a different efficiency calculation.
5. Sparse vector size
6. Total time (in microseconds) for the resolution of all 803 (eight hundred and three) symbols on an Intel i5 CPU @ 4.0 GHz, by the method described in the previous chapter
7. Total time (in microseconds) for the resolution of all 803 (eight hundred and three) symbols on an ARM CPU, namely an M3 @ 4.0 GHz, by the method described in the previous chapter

Table 1 contains all of the data for those seven columns, where the title of each column is fully explained in the above list.

For Table 1, 3 (three) runs were necessary on the Intel i5 to provide average results for the speed of resolution in µsec with a very small standard deviation. On the M3, however, 12 (twelve) runs were necessary to get results with approximately the same reliability, meaning with approximately the same, acceptable standard deviation.

A look at Table 1 also shows that the M3 processor is significantly faster than Intel's i5. That is, actually, expected behaviour. What was unexpected is that the speed rankings of the algorithms were very different between the two CPUs. That could be explained only by different relative costs of the various byte and bit operations on the different CPUs.

Intel's CPU has higher costs for division and remainder than for other operations, including byte operations like addition and multiplication. The ARM CPU has smaller variations among the different operations, and it proved much trickier to get reliable operation costs for it. This started new research with the aim of establishing more reliable operation costs for both CPUs. When that research is designed and finished, its results will be provided in another paper.

4 Presentation of the new hashes

In this chapter we present the hashes in the form in which they were used. They simply return the index in the sparse vector, while doing some necessary checks. Hashes that are used in production do not need to do any checks. They should just return the symbol, as in the formula already provided above, which we repeat here:

Symbol = Vector_of_Symbols[Sparse_Vector[x]];

A hash function can return a pointer to the symbol.
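A pointer-returning lookup with the checks described in chapter 2 might be sketched as follows. Everything here is illustrative: the 11-entry vector with hand-placed slots, the two sample symbols and the `resolve` function are hypothetical, and the 32-bit hash is passed in as a parameter so that the sketch does not depend on any particular hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct symbol { const char *name; uint16_t length; };

/* Packed vector with two sample symbols. */
static const struct symbol vector_of_symbols[] = {
    { "SELECT", 6 },
    { "FROM",   4 },
};

#define SPARSE_SIZE 11u
#define EMPTY UINT16_MAX

/* Toy sparse vector with hand-placed entries: slot 3 -> SELECT, slot 7 -> FROM. */
static const uint16_t sparse_vector[SPARSE_SIZE] = {
    EMPTY, EMPTY, EMPTY, 0, EMPTY, EMPTY, EMPTY, 1, EMPTY, EMPTY, EMPTY
};

/* hash32 is the 32-bit value produced by whichever hash is in use.
   Returns the matching symbol, or NULL if the literal is not a symbol. */
static const struct symbol *resolve(uint32_t hash32,
                                    const char *literal, uint16_t length)
{
    assert(literal != NULL);
    uint16_t x = (uint16_t)(hash32 % SPARSE_SIZE);   /* 32-bit hash -> index */
    uint16_t packed_index = sparse_vector[x];
    if (packed_index == EMPTY)
        return NULL;                                 /* slot was never filled */
    const struct symbol *candidate = &vector_of_symbols[packed_index];
    if (candidate->length != length ||
        memcmp(candidate->name, literal, length) != 0)
        return NULL;                                 /* another literal hashed here */
    return candidate;
}
```

With these toy values, a hash of 14 reduces to slot 3, so `resolve(14, "SELECT", 6)` returns the SELECT record, while `resolve(14, "SELEKT", 6)` returns NULL because the string comparison fails.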
Some programs opt to check for errors, in which case NULL is returned for the pointer. As already indicated, the hashes presented here return the ‘x’ from the above formula. You will also note that these hashes return an unsigned 32-bit integer. For the implementation of this method, that value is converted to a 16-bit unsigned integer by calculating the remainder of the 32-bit value divided by the sparse vector size. This was already explained earlier in this article.

All the hashes, as well as many programs written during this research, were coded in C standard 2017 [6]. C 2023 was, unfortunately, not finished when the research started. It is my personal opinion that the C language, especially its latest standards, is the best tool for system programming. C++ is not suitable for that purpose, in my humble opinion.

Also, as mentioned above, conversions to uppercase are omitted in the presentation of the new hashes.

So, here are the source codes for the new hashes. The code provided here is, from now on, in the public domain.

5 Source code

Only the new hashes from Table 1 are presented, which are the last eight hashes. The listings use the fixed-width types from <stdint.h> and the two seed arrays described in chapter 2, seeds (7 entries) and my_seeds, whose contents are not listed.

    uint32_t sinisa_one_hash (uint16_t length, char key[length])
    {
      register uint32_t hash = 0, i, old_x = seeds[length % 7];
      for (i = 0; i < length; ++i)
      {
        register int8_t x = (int8_t) key[i];
        /* seed the hash on the first byte only */
        hash = (hash ? hash : my_seeds[(old_x * (uint32_t)x) % 2999]);
        hash += (uint32_t)x << (old_x % 11);
        old_x = x;
      }
      return hash ^ my_seeds[hash % 2999];
    }

    uint32_t sinisa_two_hash (uint16_t length, char key[length])
    {
      register uint32_t hash = 0, i, old_x = seeds[length % 7];
      for (i = 0; i < (uint32_t)length; ++i)
      {
        register int8_t x = (int8_t) key[i];
        hash = (hash ? hash : my_seeds[(old_x * (uint32_t)x) % 2999]);
        old_x = x;
        /* the text between '<' and '>' is missing in the source */
        hash = ((hash < ... > 27)) + (uint32_t) x;
      }
      hash ^= my_seeds[hash % 2999];
      return hash;
    }

    uint32_t sinisa_three_hash (uint16_t length, char key[length])
    {
      register uint32_t hash = 0, old_x = seeds[length % 7];
      for (register uint16_t i = 0; i < length; i++)
      {
        register uint8_t x = (uint8_t) key[i];
        hash = (hash ? hash : my_seeds[(old_x * (uint32_t)x) % 2999]);
        hash += (uint32_t)x ^ my_seeds[i * x * old_x % 2999];
        old_x = x;
      }
      hash ^= my_seeds[hash % 2999];
      return hash;
    }

    uint32_t sinisa_four_hash (uint16_t length, char key[length])
    {
      register uint32_t hash = 0;
      for (register uint16_t i = 0; i < length; i++)
      {
        register int8_t x = (int8_t) key[i];
        hash = (hash ? hash : my_seeds[(length * (uint32_t)x) % 2999]);
        hash += (uint32_t)(x) ^ seeds[x % 7];
      }
      hash ^= my_seeds[hash % 2999];
      return hash;
    }

    uint32_t sinisa_five_hash (uint16_t length, char key[length])
    {
      register uint32_t hash, i;
      for (hash = (uint32_t) length, i = 0; i < (uint32_t)length; ++i)
      {
        register int8_t x = (int8_t) key[i];
        /* the text between '<' and '>' is missing in the source */
        hash = ((hash < ... > 27)) ^ (uint32_t) x;
      }
      hash ^= my_seeds[hash % 2999];
      return hash;
    }

    uint32_t sinisa_six_hash (uint16_t length, char key[length])
    {
      /* Loosely inspired by Bernstein hash [4] */
      register uint32_t hash = 0, old_x = seeds[length % 7];
      for (register uint16_t i = 0; i < length; ++i)
      {
        register int8_t x = (int8_t) key[i];
        hash = (hash ? hash : my_seeds[(old_x * (uint32_t)x) % 2999]);
        old_x = x;
        hash = hash * 55 + (uint32_t)x;
      }
      hash ^= my_seeds[hash % 2999];
      return hash;
    }

    uint32_t sinisa_seven_hash (uint16_t length, char key[length])
    {
      /* Loosely based on Zobrist hash [4] */
      register uint32_t hash, i;
      for (hash = length, i = 0; i < length; ++i)
      {
        register uint8_t x = (uint8_t) key[i];
        hash ^= x * my_seeds[i];
      }
      return hash;
    }

    uint32_t sinisa_eight_hash (uint16_t length, char key[length])
    {
      /* Loosely based on CRC hash [4] */
      register uint32_t hash = 0, i, old_x = seeds[length % 7];
      /* the loop condition and the start of the loop body
         are missing in the source */
      for (i = 0; i ... > 11);
        hash ^= (hash << 7);
        old_x = x;
      }
      return hash ^= my_seeds[hash % 2999];
    }

References

[1] Donald E. Knuth, “The Art of Computer Programming”, Volume 3 “Sorting and Searching”, chapter 6.3 “Digital searching”, 1998
[2] Robert Sedgewick, “Algorithms in C”, Addison-Wesley, 1998
[3] Bob Jenkins, https://en.wikipedia.org/wiki/Jenkins_hash_function, 1997
[4] Bob Jenkins, “Hash Functions”, Dr. Dobb's Journal, September 1997
[5] Phong Vo, Landon Curt Noll, https://github.com/haipome/fnv/blob/master/fnv.c, 2013
[6] Jens Gustedt, “Modern C”, Third Edition, December 2023

Tables

Table 1 Presentation of the results

Method title            Algorithm  Reference  Efficiency        Vector size (entries)  i5 @ 4 GHz (µsec)  M3 @ 4 GHz (µsec)
Digital search          Tries      [1]        8*n + 8           -                      36                 46
Double hashing          hash       [2]        m/n(1/(1-n/m))    m=2753, n= 67          46                 44
Bob Jenkins 1           xxhash     [3]        8*n + 10          51347                  23                 15
Bob Jenkins 2           xxhash     [4]        5*n + 35          43331                  22                 16
Paul Hsieh (modified)   xxhash     [4]        5*n + 17          54401                  21                 16
fnv_32a_str (modified)  xxhash     [5]        4*n + 2           49581                  16                 11
Sinisa 1                xxhash     new        4*n + 9           40823                  24                 16
Sinisa 2                xxhash     new        5*n + 9           45061                  20                 13
Sinisa 3                xxhash     new        6*n + 9           45317                  25                 16
Sinisa 4                xxhash     new        4*n + 7           52951                  25                 12
Sinisa 5                xxhash     new        4*n + 4           54401                  20                 13
Sinisa 6                xxhash     new        3*n + 6           42397                  19                 14
Sinisa 7                xxhash     new        4*n + 1           45751                  17                 16
Sinisa 8                xxhash     new        5*n + 5           50951                  19                 17

Additional Declarations

The authors declare no competing
interests. Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-5981027","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":412601915,"identity":"e3c5a6ae-7de2-41b1-899f-d7699f42f1d6","order_by":0,"name":"Sinisa Milivojevic","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA6klEQVRIiWNgGAWjYDACZgaGAwwMNqRrSWNgYAPxEojXeJgELfLtvAcP/qg5L29wv/nZg48/GOT5xQ4wfubBo8XgMF/CYZ5jtw03HGMzN5yRwGA4c3YCszReLcw8BocZ2G4nGBxjMJPmSWBIMLidwMaMT4t8M4/BwR//zgG1sH+T/kOMFobDPAYHeNsOALXwmEkzEKPFAKjlMG9fsuHMYzllkj1pEkC/JDZLzsHnsP4zxh9/fLOT5zt8fJvEDxsbeX7p5IMf3uBzGAwoHABTEkDM2MCE1y9w6xqQOIw/iNEyCkbBKBgFIwUAAEaYSCOP3iqkAAAAAElFTkSuQmCC","orcid":"","institution":"Oracle Inc.","correspondingAuthor":true,"prefix":"","firstName":"Sinisa","middleName":"","lastName":"Milivojevic","suffix":""}],"badges":[],"createdAt":"2025-02-07 
12:15:28","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-5981027/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5981027/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":75884487,"identity":"34461e1b-a535-4aa4-a700-6d9dcecef822","added_by":"auto","created_at":"2025-02-10 08:54:35","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1244637,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5981027/v1/1100127b-905e-4438-b519-e0dc92bfbfa3.pdf"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003eA method for Symbol Resolution\u003c/p\u003e","fulltext":[{"header":"1 Introduction","content":"\u003cp\u003eAll compilers and interpreters, have their own set of symbols, which comprise reserved words, keywords, function names and similar. In order to recognise that set from other literals, there is a need to have fast method to recognise whether a separate literal used is one of it\u0026rsquo;s symbols. That is exactly what is meant by the term \u0026ldquo;symbol resolution\u0026rdquo;. There were many methods used in the past, one of which is hashing. The crux of this paper is dedicated to the description of methods that use a family of hashes, which are generated with both byte and bit operations on the symbols. 
That family of hashes is better known as xxhashes,\u003c/p\u003e"},{"header":"2 A description of the method","content":"\u003cp\u003eMost of the compilers / interpreters are storing their symbols as a packed array of structures, usually written in C programming language. These structures usually contain the string for the symbol literal, it\u0026rsquo;s length and the fields for the attributes which provide more detail on what are the characteristics of the symbol.\u003c/p\u003e \u003cp\u003eThis paper presents the research that has developed a method that involved the usage of the sparse and imperfect vectors that store indices for the entries in the above described packed symbol vector. This method is called indirect indexing, since valid sparse, imperfect vector whose elements contain valid and invalid values, since this is a sparse vector. Each of the valid values is the index in the packed, perfect vector that stores all symbols.\u003c/p\u003e \u003cp\u003eTo put it in a formula, it looks like this:\u003c/p\u003e \u003cp\u003eSymbol\u0026thinsp;=\u0026thinsp;Vector_of_Symbols[Sparse_Vector[x]];\u003c/p\u003e \u003cp\u003ewhere x is (obviously) the index in the sparse vector.\u003c/p\u003e \u003cp\u003eThis approach has necessitated that a method is invented for the calculation of the indices (denoted as \u0026lsquo;x\u0026rsquo; in the above formula) in the sparse vector containing values that point to the valid entries in the packed vector containing symbols. These symbols are reserved words, keywords, function names, operators et cetera.\u003c/p\u003e \u003cp\u003eA very strict criteria has been set that would necessitate that the above indices are calculated only from the string literals of these symbols, but in the manner that the sparse vector contains only one index for each separate symbol. That is the crux of the method that was developed.\u003c/p\u003e \u003cp\u003eThis is better visualised in you start from an example. 
If you look at one of the latest MySQL include file, lex.h, you will notice that there are 845 symbols. However, of these only 803 symbols are relevant for this research. Those 803 symbols include all the reserved words, keywords, operators and functions. The rest are optimiser hints, that can not be treated in the same way, because they contain duplicates, which is incompatible with any symbol resolution. Those are also treated totally differently in the current MySQL code.\u003c/p\u003e \u003cp\u003eThe tests were executed on the last 8.0 release available at the time. Since then, new versions and releases have come out, with some changes in the symbol table. All those were tested too and those were very similar to the the original results (presented in the next chapter).\u003c/p\u003e \u003cp\u003eUsing imperfect, sparse vector necessitated an algorithm that would find the smallest possible vector that would resolve the symbols. Hence, another objective of this research was to use an optimum-seeking algorithm that would run symbol resolution until they would all fit in the vector, without any duplicates. It was decided that this vector should have no more then 64 K entries. For that purpose a separate array was created, which contains all primes between 16 K and 64 K. Those primes were used for the searching of the smallest vector, where all the entries (in this case 803 of them) would fit, without duplicates. This vector was then generated as an obligatory part of the algorithm results. That also means that the hashing algorithm used should provide a 16-bit value as a final result. This was easily achieved by obtaining a 32-bit hash and calculating the remainder of that value with the size of the sparse vector.\u003c/p\u003e \u003cp\u003eIn such a system, each sparse, imperfect vector should have been initialised. 
In this system, 0 (zero) would be a valid, although highly improbable entry, since the last prime, in the above mentioned array, is definitely smaller then 64K. Hence, UINT16_MAX would be totally impossible entry, as the maximum size of the vector would be smaller. That is a reason why every entry in the sparse vector would be initialised with a value of UINT16_MAX. That has made searching algorithm easier to design. Simply, if calculated index for the sparse vector would contain a value of UINT16_MAX, then that entry is valid and it was written into that spot. If the value found was smaller, then it would indicate that the algorithm has hit upon a duplicate entry. The code would then proceed with a new, larger vector, until a vector was found that would accommodate all symbols, in this case 803 of them. The sizes of vectors, for the usable hashes, are also included in the results.\u003c/p\u003e \u003cp\u003eIt should be mentioned that the results of each successful algorithm included the generated sparse vector containing all entries including the valid ones (in this case 803 of them), vector\u0026rsquo;s size (as number of entries), some info on the successful completion and the speed of the resolution of all 803 symbols, in microseconds. This also means that the runtime code only needed the vector and the xxhash algorithm which returns the symbol. This also means that the size of the product will include additional storage for the array, whose size is circa 100 Kbytes. For the contemporary computers that is a small price to pay.\u003c/p\u003e \u003cp\u003eIt was discovered during research that those hashes that were successful, managed to do with less then 64 K, while the others could not fit even in much larger vectors. This is due to the fact that the unsuccessful algorithms have produced duplicates even with much larger vector sizes. 
Some produced duplicates even with 32-bit vector sizes.\u003c/p\u003e \u003cp\u003eIt should be pointed out that the speed and efficiency of searching and generating sparse vector, and filing it up with indices in the right places, is not relevant. That is because this step is performed during compiling of the SQL interpreter (or any other interpreter or compiler). The important efficiency and speed is the one in runtime, which deals only with symbol resolution.\u003c/p\u003e \u003cp\u003eAt the very start of the research it was clear that classical hashes, that only use byte arithmetics, can not produce desirable results. Hence, this is a reason why it was mandatory that it necessitate usage of hashes which use a combination of both byte and bit operations. Shortly, we use the existing term xxhashes for those hashes that use operations on both bytes and bits.\u003c/p\u003e \u003cp\u003eThe research has tested over 30 (thirty) of the existing xxhash algorithms, but found only 4 (four) of those that were successful in symbol resolution.\u003c/p\u003e \u003cp\u003eFurther analysis showed that many xxhash algorithms required initialisation with some value. Mostly, the authors suggest the use of the length of the previous symbol for the initialiser. Some other suggest that initialisation is done by the xxhash obtained from the previous symbol calculation. That approach is impossible with compilers / interpreters, since those have their syntax, which is very varied and can combine symbols in many different ways. In that context there is no such thing as a previous or next symbol.\u003c/p\u003e \u003cp\u003eAlso, many of the xxhash algorithms required that the final calculation involves a bitwise operation (usually XOR) on the obtained result. 
This meant that a method had to devise the usage of some seed.\u003c/p\u003e \u003cp\u003eIn order to solve both of the problems, two different algorithms were designed for the random generation of two sets of seeds, one for the initialisation of the xxhash and the other for the final bit operation. One algorithm was used for the generation of the array of 7 (seven) random seeds and second, completely different, algorithm was used for the generation of the array of 2099 (two thousand and nineteen nine) random seeds. It is not a coincidence that both of these numbers are primes.\u003c/p\u003e \u003cp\u003eSince only 4 (four) of the existing algorithms lead to the desired results, it was necessary that this research also develops a set of new hash algorithms. In total, 8 (eight) new xxhashes were designed, some of which performed significantly better then the known ones. It must be pointed out that, of those 8 (eight) new xxhashes, 3 (three) were loosely inspired by the existing xxashes. Word \u0026ldquo;inspired\u0026rdquo; was used because those algorithms required heavy changes in order to meet the above described criteria and provide usable results. This practically created new algorithms, with entirely different efficiency \u0026hellip;. The original algorithms will be mentioned in the code listings of the new xxhashes.\u003c/p\u003e \u003cp\u003eAs already mentioned, the example on which the algorithms were developed and tested was SQL interpreter and more precisely the symbols in MySQL implementation of SQL. One of the characteristics of SQL is that language is case-insensitive. Since symbols are all uppercase in MySQL implementation, that required the addition of the code that converts all letters into uppercase. This could have been achieved by table lookup, but finished by using a simple ternary operator. This code was not included in the presentation of the new xxhashes, nor was it included in the calculation of the algorithm efficiency. 
The reason is that, since SQL is case-insensitive, a programmer or query designer can write symbols in queries in any combination of letter cases.\u003c/p\u003e \u003cp\u003eIn order to test the validity of the final result, several checks were included. First of all, the vector and indices described above were calculated with upper-case symbols, while testing was done with lower-case symbols. The sparse vector was initialised with UINT16_MAX values, so if a lookup returned that value as the index, the entire calculation was treated as failed. After that check, the length of the found symbol was compared with the length of the input symbol, and the two strings were then compared for equality.\u003c/p\u003e"},{"header":"3 Presentation of the results","content":"\u003cp\u003eThe results presented in this paper cover all 12 (twelve) xxhashes but, for comparison purposes, also include two other, historically important, methods. The first is used in the current MySQL implementation and is based on the digital search method. It can be found in the MySQL source code directory sql/, in the file sql_lex_hash.cc.\u003c/p\u003e \u003cp\u003eBesides the above method, another existing algorithm was included, namely double hashing. It was implemented during this research, but again only to compare its efficiency and speed with the new method.\u003c/p\u003e \u003cp\u003eAll of the results are presented in Table 1, which has the following columns:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eTitle of the method\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eThe algorithm used\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eReference for the algorithm, for the existing algorithms only. These numbers correspond to entries in the References chapter.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eThe algorithm efficiency, where \u0026rsquo;n\u0026rsquo; stands for the number of bytes (characters). 
Double hashing, however, has a different efficiency formula.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eSparse vector size\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eTotal time (in microseconds) for the resolution of all 803 (eight hundred and three) symbols on an Intel i5 CPU @ 4.0 GHz, by the method described in the previous chapter\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eTotal time (in microseconds) for the resolution of all 803 (eight hundred and three) symbols on an ARM-based M3 CPU @ 4.0 GHz, by the method described in the previous chapter\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eTable 1 contains the data for all seven columns; the title of each column is explained in the list above.\u003c/p\u003e \u003cp\u003eOn the Intel i5, 3 (three) runs were sufficient to obtain average resolution times in \u0026micro;sec with a very small standard deviation. On the M3, however, 12 (twelve) runs were necessary to obtain results of approximately the same reliability, that is, with approximately the same, acceptable standard deviation.\u003c/p\u003e \u003cp\u003eTable 1 also shows that, for every xxhash, the M3 processor is faster than Intel\u0026rsquo;s i5. That is, actually, expected behaviour. What was unexpected is that the speed rankings of the algorithms differed considerably between the two CPUs. This can be explained only by different relative costs of byte and bit operations on the two CPUs.\u003c/p\u003e \u003cp\u003eIntel\u0026rsquo;s CPU has higher costs for division and remainder than for other operations, including byte operations like addition and multiplication. The ARM CPU shows smaller variations among operations, and it proved much trickier to obtain reliable operation costs. 
This prompted new research aimed at establishing more reliable operation costs for both CPUs. When that research is designed and finished, its results will be provided in another paper.\u003c/p\u003e"},{"header":"4 Presentation of the new hashes","content":"\u003cp\u003eIn this chapter, we present the hashes in the manner in which they were used. They simply return the index into the sparse vector, while doing some necessary checks. Hashes that are used in production do not need to do any checks. They should just return the symbol, as in the formula already provided above, which we repeat here:\u003c/p\u003e \u003cp\u003eSymbol = Vector_of_Symbols[Sparse_Vector[x]];\u003c/p\u003e \u003cp\u003eA hash function can return a pointer to the symbol. Some programs opt to check for errors, in which case NULL is returned for the pointer.\u003c/p\u003e \u003cp\u003eAs already indicated, the hashes presented here return the \u0026lsquo;x\u0026rsquo; from the above formula. Note also that these hashes return an unsigned 32-bit integer. For the implementation of this method, that value is converted to a 16-bit unsigned integer by calculating the remainder of the 32-bit value divided by the sparse vector size. This was already explained at the start of this article. All the hashes, as well as many programs written during this research, were coded in C standard 2017 [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. C 2023 was, unfortunately, not finished when the research started. It is my personal opinion that the C language, especially its latest standards, is the best tool for system programming. 
C++ is not suitable for that purpose, in my humble opinion.\u003c/p\u003e \u003cp\u003eAlso, as mentioned above, conversions to uppercase are omitted in the presentation of the new hashes.\u003c/p\u003e \u003cp\u003eSo, here is the source code for the new hashes. The code provided here is, from now on, in the public domain.\u003c/p\u003e"},{"header":"5 Source code","content":"\u003cp\u003eOnly the new hashes from Table 1 are presented, which are its last eight entries:\u003c/p\u003e \u003cp\u003e \u003cb\u003e#include \u0026lt;stdint.h\u0026gt;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e/* seeds[7] and my_seeds[2999] are the two random seed arrays described earlier in the paper. */\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e/* Their contents are not listed here; the uint32_t element type is an assumption. */\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eextern const uint32_t seeds[7];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eextern const uint32_t my_seeds[2999];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003euint32_t sinisa_one_hash (uint16_t length, char key[length]) {\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eregister uint32_t hash = 0, i, old_x = seeds[length % 7];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003efor (i = 0; i \u0026lt; length; ++i) {\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eregister int8_t x = (int8_t) key[i];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash = (hash ? 
hash : my_seeds[(old_x*(uint32_t)x) % 2999]);\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash += (uint32_t)x \u0026lt;\u0026lt; (old_x % 11);\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eold_x = x;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e}\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ereturn hash ^ my_seeds[hash % 2999];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e}\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003euint32_t sinisa_two_hash (uint16_t length, char key[length]) {\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eregister uint32_t hash = 0, i, old_x = seeds[length % 7];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003efor (i = 0; i \u0026lt; (uint32_t)length; ++i) {\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eregister int8_t x = (int8_t) key[i];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash = (hash ? 
hash : my_seeds[(old_x*(uint32_t)x) % 2999]);\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eold_x = x;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash = ((hash \u0026lt;\u0026lt; 5) + (hash \u0026gt;\u0026gt; 27)) + (uint32_t) x;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e}\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash ^= my_seeds[hash % 2999];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ereturn hash;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e}\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003euint32_t sinisa_three_hash (uint16_t length, char key[length]) {\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eregister uint32_t hash = 0, old_x = seeds[length % 7];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003efor (register uint16_t i = 0; i \u0026lt; length; i++) {\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eregister uint8_t x = (uint8_t) key[i];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash = (hash ? 
hash : my_seeds[(old_x*(uint32_t)x) % 2999]);\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash += (uint32_t)x ^ my_seeds[i*x*old_x % 2999];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eold_x = x;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e}\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash ^= my_seeds[hash % 2999];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ereturn hash;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e}\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003euint32_t sinisa_four_hash (uint16_t length, char key[length]) {\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eregister uint32_t hash = 0;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003efor (register uint16_t i = 0; i \u0026lt; length; i++) {\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eregister int8_t x = (int8_t) key[i];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash = (hash ? 
hash : my_seeds[(length*(uint32_t)x) % 2999]);\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e/* Note: x % 7 can be negative for bytes above 0x7F, so keys are expected to be 7-bit ASCII. */\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash += (uint32_t)(x) ^ seeds[x % 7];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e}\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash ^= my_seeds[hash % 2999];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ereturn hash;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e}\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003euint32_t sinisa_five_hash (uint16_t length, char key[length]) {\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eregister uint32_t hash, i;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003efor (hash = (uint32_t) length, i = 0; i \u0026lt; (uint32_t)length; ++i) {\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eregister int8_t x = (int8_t) key[i];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash = ((hash \u0026lt;\u0026lt; 5) ^ (hash \u0026gt;\u0026gt; 27)) ^ (uint32_t) x;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e}\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash ^= my_seeds[hash % 2999];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ereturn hash;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e}\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003euint32_t sinisa_six_hash (uint16_t length, char key[length]) {\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e/* Loosely inspired by Bernstein hash\u003c/b\u003e [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e] \u003cb\u003e*/\u003c/b\u003e\u003c/p\u003e \u003cp\u003e \u003cb\u003eregister uint32_t hash = 0, old_x = seeds[length % 7];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003efor (register uint16_t i = 0; i \u0026lt; length; ++i) 
{\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eregister int8_t x = (int8_t) key[i];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash = (hash ? hash : my_seeds[(old_x*(uint32_t)x) % 2999]);\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eold_x = x;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash = hash * 55 + (uint32_t)x;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e}\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash ^= my_seeds[hash % 2999];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ereturn hash;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e}\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003euint32_t sinisa_seven_hash (uint16_t length, char key[length]) {\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e/* Loosely based on Zobrist hash\u003c/b\u003e [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e] \u003cb\u003e*/\u003c/b\u003e\u003c/p\u003e \u003cp\u003e \u003cb\u003eregister uint32_t hash, i;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003efor (hash = length, i = 0; i \u0026lt; length; ++i) {\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eregister uint8_t x = (uint8_t) key[i];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash ^= x*my_seeds[i];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e}\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ereturn hash;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e}\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003euint32_t sinisa_eight_hash (uint16_t length, char key[length]) {\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e/* Loosely based on CRC hash\u003c/b\u003e [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e] \u003cb\u003e*/\u003c/b\u003e\u003c/p\u003e \u003cp\u003e 
\u003cb\u003eregister uint32_t hash = 0, i, old_x = seeds[length % 7];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003efor (i = 0; i \u0026lt; length; ++i) {\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eregister uint8_t x = (uint8_t) key[i];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash = (hash ? hash : my_seeds[(old_x*(uint32_t)x) % 2999]);\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash += (uint32_t)x;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ehash += (hash \u0026gt;\u0026gt; 11); hash ^= (hash \u0026lt;\u0026lt; 7);\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eold_x = x;\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e}\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ereturn hash ^ my_seeds[hash % 2999];\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e}\u003c/b\u003e \u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eDonald E. Knuth, \u0026ldquo;The Art of Computer Programming\u0026rdquo;, Volume 3 \u0026ldquo;Sorting and Searching\u0026rdquo;, chapter 6.3 \u0026ldquo;Digital searching\u0026rdquo;, 1998\u003c/li\u003e\n\u003cli\u003eRobert Sedgewick, \u0026ldquo;Algorithms in C\u0026rdquo;, Addison-Wesley, 1998\u003c/li\u003e\n\u003cli\u003eBob Jenkins, https://en.wikipedia.org/wiki/Jenkins_hash_function, 1997\u003c/li\u003e\n\u003cli\u003eBob Jenkins, Hash Functions, Dr. 
Dobb\u0026apos;s Journal, September 1997\u003c/li\u003e\n\u003cli\u003ePhong Vo, Landon Curt Noll, https://github.com/haipome/fnv/blob/master/fnv.c, 2013\u003c/li\u003e\n\u003cli\u003eJens Gustedt, \u0026ldquo;Modern C\u0026rdquo;, Third Edition, December 2023\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003e\u003cstrong\u003eTable 1 Presentation of the results\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"621\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eMethod title\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eAlgorithm\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eReference\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eAlgorithm efficiency\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eSparse vector size in number of entries\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eTotal time on i5 @ 4 GHz in \u0026micro;sec\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eTotal time on M3 @ 4 GHz in \u0026micro;sec\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eDigital search\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eTries\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e[1]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e8*n + 8\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; -\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;36\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; 46\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eDouble hashing\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003ehash\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e[2]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003em/n(1/(1-n/m))\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003em=2753, n= 67\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;46\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; 44\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eBob Jenkins 1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003exxhash\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e[3]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e8*n + 10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp;51347\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;23\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; 15\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eBob Jenkins 2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003exxhash\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e[4]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e5*n + 35\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; 43331\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;22\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; 16\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003ePaul Hsieh (modified)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003exxhash\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e[4]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e5*n + 17\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; 54401\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; 
\u0026nbsp;21\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; 16\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003efnv_32a_str (modified)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003exxhash\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e[5]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e4*n + 2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; 49581\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;16\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; 11\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eSinisa 1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003exxhash\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003enew\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e4*n + 9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; 40823\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;24\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n 
\u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; 16\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eSinisa 2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003exxhash\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003enew\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e5*n + 9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; 45061\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;20\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; 13\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eSinisa 3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003exxhash\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003enew\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e6*n + 9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; 45317\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;25\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; 16\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" 
style=\"width: 89px;\"\u003e\n \u003cp\u003eSinisa 4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003exxhash\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003enew\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e4*n + 7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; 52951\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;25\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; 12\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eSinisa 5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003exxhash\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003enew\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e4*n + 4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; 54401\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;20\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; 13\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eSinisa 6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n 
\u003cp\u003exxhash\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003enew\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e3*n + 6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; 42397\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;19\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; 14\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eSinisa 7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003exxhash\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003enew\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e4*n + 1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; 45751\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;17\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; 16\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003eSinisa 8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003exxhash\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003enew\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e5*n + 5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; 50951\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;19\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 89px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; 17\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"Sinisa Milivojevic","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"symbol resolution, sparse vector, hashing, new hashes","lastPublishedDoi":"10.21203/rs.3.rs-5981027/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5981027/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThis paper provides the result of an extensive research into a novel set of algorithms that would provide more efficient (and faster) manner of resolving symbols, such as reserved words, keywords, 
operators, functions and similar. A new method has been provided which has been thoroughly tested. That method required the utilisation of the hashing. Many hashes have been tested, of which only few could render required results. This resulted in the the invention and development of new hashes, by the author. All necessary information from this research is presented including characteristics of some known algorithms, and including the efficiency of the utilised hashes and a total speed of execution for the entire set of symbols on two different CPUs.\u003c/p\u003e","manuscriptTitle":"A method for Symbol Resolution","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-02-10 08:30:30","doi":"10.21203/rs.3.rs-5981027/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"652ae0f2-7e1a-4f6a-a724-7cf4c1cdb42d","owner":[],"postedDate":"February 10th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":44001785,"name":"Software Engineering"}],"tags":[],"updatedAt":"2025-02-10T08:30:30+00:00","versionOfRecord":[],"versionCreatedAt":"2025-02-10 
08:30:30","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-5981027","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5981027","identity":"rs-5981027","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
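
The lookup formula from chapter 4, Symbol = Vector_of_Symbols[Sparse_Vector[x]], together with the validity checks described in the paper (UINT16_MAX as the empty-slot value, then a length comparison, then a string comparison), can be sketched in C roughly as follows. This is only an illustrative sketch under several assumptions: the five-symbol table, the names `demo_hash`, `build_sparse_vector` and `resolve_symbol`, the `MAX_SLOTS` limit and the brute-force search for a collision-free size are all hypothetical, and 32-bit FNV-1a stands in for the paper's hashes because the contents of `seeds` and `my_seeds` are not listed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical demo table; the paper works with 803 MySQL symbols. */
static const char *const symbols[] = {"SELECT", "INSERT", "UPDATE", "DELETE", "FROM"};
enum { SYMBOL_COUNT = sizeof symbols / sizeof symbols[0], MAX_SLOTS = 4096 };

static uint16_t sparse_vector[MAX_SLOTS]; /* holds indices into symbols[] */
static uint32_t vector_size = 0;          /* chosen by build_sparse_vector() */

/* Upper-casing with a simple ternary operator, as mentioned in the paper. */
static unsigned char to_upper_ascii(unsigned char c) {
    return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 'a' + 'A') : c;
}

/* Stand-in hash (assumption): 32-bit FNV-1a over the upper-cased bytes. */
static uint32_t demo_hash(size_t length, const char *key) {
    uint32_t hash = 2166136261u;
    for (size_t i = 0; i < length; ++i) {
        hash ^= to_upper_ascii((unsigned char)key[i]);
        hash *= 16777619u;
    }
    return hash;
}

/* Build step (done once, not at runtime): find the smallest vector size
   for which no two symbols share a slot. Returns 0 if none is found. */
static uint32_t build_sparse_vector(void) {
    for (uint32_t size = SYMBOL_COUNT; size <= MAX_SLOTS; ++size) {
        int collision = 0;
        for (uint32_t s = 0; s < size; ++s)
            sparse_vector[s] = UINT16_MAX;          /* mark every slot empty */
        for (uint16_t k = 0; k < SYMBOL_COUNT && !collision; ++k) {
            uint32_t x = demo_hash(strlen(symbols[k]), symbols[k]) % size;
            if (sparse_vector[x] != UINT16_MAX)
                collision = 1;                      /* slot taken: try next size */
            else
                sparse_vector[x] = k;
        }
        if (!collision) {
            vector_size = size;
            return size;
        }
    }
    return 0;
}

/* Runtime step: returns the matching symbol, or NULL if there is none. */
static const char *resolve_symbol(size_t length, const char *text) {
    if (vector_size == 0)
        return NULL;                                /* vector was never built */
    uint16_t index = sparse_vector[demo_hash(length, text) % vector_size];
    if (index == UINT16_MAX)
        return NULL;                                /* empty slot */
    assert(index < SYMBOL_COUNT);
    const char *candidate = symbols[index];
    if (strlen(candidate) != length)
        return NULL;                                /* length check */
    for (size_t i = 0; i < length; ++i)             /* case-insensitive compare */
        if (to_upper_ascii((unsigned char)text[i]) != (unsigned char)candidate[i])
            return NULL;
    return candidate;
}
```

After a single call to `build_sparse_vector()`, a call such as `resolve_symbol(6, "select")` would be expected to return the entry "SELECT", while an unknown word yields NULL. A production variant, as the paper notes, could skip the checks and return the symbol directly.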
