Slava Meskhi
 This note does not pretend to present yet another formal theory (there are quite enough of them already), but rather to point out a mainstream trend in the semantics of natural languages, predetermined by the following factors:
(i) the huge set of indexed digital texts on the Web;
(ii) the increased computational power of the leading search engines, primarily Google;
(iii) the implementation of sophisticated algorithms, such as LSI, Latent Semantic Indexing (see the most important references on LSI in [1]), which can process large collections of digital texts.
 So, what is new in computable semantics of natural languages?
 Let us recall the difference in method between an adult and a child trying to understand the meaning of a new, unknown word. An adult looks for a definition or explanation, using semantic tools such as dictionaries, thesauri, ontologies and classifications. A child collects the situational contexts in which the new word occurs and, after some time, when the set of contexts is large enough, tries out the adopted semantics. As a rule, an adult is limited in time and has no right to make many mistakes, while a child is not limited in time and uses his mistakes in understanding to refine the adopted semantics. As a result, a child arrives at quite satisfactory semantics for the new word without ever understanding the corresponding definition.
 Now let us draw an analogy between a child and a search engine user. Asking Google about some unknown word or word combination, we get a set of contexts for this term, sometimes including a definition from online dictionaries. Using this set of contexts, we associate a semantic value with the term. We are limited in time, but nowadays this limitation is compensated by powerful search engines and by the great collection of previously indexed digital texts. The context-generated semantics of natural language elements is one of the most important achievements available AG (After Google). As often happens, this achievement is a side effect of the indexing of large collections of digital texts and of the interest of search engine owners in detecting semantically related texts. A description of Latent Semantic Indexing is beyond the scope of this note (one can find it in [2]), but the main idea is to associate with an element of natural language the collection of digital texts in which that element occurs, and to declare two elements semantically close if they co-occur in a sufficient number of texts. Thus, in spite of the fact that a search engine understands nothing about mathematics or music, it detects that "n-dimensional", "manifold" and "topology" are semantically close words, and that "Camptown Races" is semantically close to music. Furthermore, the set of contexts, by which I mean the set of texts returned by a search engine using LSI or some other method of filtering information based on the co-occurrence of words, is larger than the set retrieved by keywords alone.
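 To make the co-occurrence idea tangible, here is a minimal sketch in Python, using only numpy. The toy corpus, the choice of k = 2 latent dimensions, and the similarity() helper are assumptions made purely for illustration; real search engines operate at an entirely different scale, and this is not any engine's actual implementation.

import numpy as np

# Toy corpus: each "document" stands for a context in which words occur.
docs = [
    "n-dimensional manifold topology",
    "manifold topology theorem",
    "camptown races music song",
    "music song melody",
]

# Build the term-document matrix of occurrence counts.
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[index[w], j] += 1.0

# Truncated SVD: keep k latent dimensions; each term becomes a row
# of U_k scaled by the singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # assumption: two latent dimensions suffice for this toy corpus
term_vecs = U[:, :k] * s[:k]

def similarity(w1, w2):
    # Cosine similarity of two terms in the latent space.
    a, b = term_vecs[index[w1]], term_vecs[index[w2]]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The program understands nothing about mathematics or music, yet:
print(similarity("manifold", "topology"))  # high: shared contexts
print(similarity("manifold", "music"))     # near zero: no shared contexts

 The truncated SVD step is what distinguishes LSI from raw co-occurrence counting: terms that never appear together directly, but share neighbouring contexts, still land near each other in the latent space, which is why the set of texts retrieved this way is larger than a purely keyword-based one.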
 So, what is the new reality in the semantics of natural languages AG (After Google)?
(1) Definitions are being supplanted by contexts.
(2) Semantically related words and word combinations are computable from co-occurrence in a large collection of digital texts, without understanding the meaning of the words.
(3) Context-generated semantics of natural languages, which at first glance looks fuzzier and obeys the laws of fuzzy logic [3], is in fact more informative and more precise than the traditional semantics based on dictionaries, thesauri, ontologies and classifications.
(4) Using powerful computational resources and sophisticated algorithms, we are going back to our childhood, and this is the right way.
 References
 1. "Latent semantic analysis", Wikipedia, http://en.wikipedia.org/wiki/Latent_semantic_analysis
 2. Clara Yu, John Cuadrado, Maciej Ceglowski, J. Scott Payne. “Patterns in Unstructured Data”, National Institute for Technology and Liberal Education, http://www.seobook.com/lsi/lsa_definition.htm
 3. Slava Meskhi. "Fuzzy Propositional Logic. Algebraic Approach", Studia Logica: An International Journal for Symbolic Logic, Vol. 36, No. 3 (1977), pp. 189-194