Novel Phonetic Name Matching Algorithm with a Statistical Ontology for Analysing Names Given in Accordance with Thai Astrology
Snae, Chakkrit; Bruckner, Michael. Issues in Informing Science & Information Technology
Names are used for referring to people, places, things, and even ideas or concepts, and in many cases for identifying them. Names serve as labels of categories or classes as well as of individual items or instances. They are properties of individuals, which are of major importance in most communities, and this is also the case in Thailand. The ways in which Thai parents obtain names for their children vary, e.g. naming by monks or grandparents. The traditional naming of Thai children has developed continuously from the past to the present into a variety of patterns. Each pattern has its own rules, with regional variations and depending on the beliefs developed over the centuries. The basic goal of naming in Thai society is to provide good fortune and progress during life, and this is done by choosing given names (or first names) carefully. Most first names have a meaning. The naming methodology used in this research is the most widely used naming system, which uses Thai astrology according to the weekday of birth, unlike Western astrology, which is based on the zodiac and the date of birth. Most Thai believe that the individual has a set of 8 attributes called "name of the angles" referred to in Thai astrology (Snae & Brueckner, 2006a), which influence each person's livelihood, fortune, and so on. The attributes are called Servant, Age, Power, Honour, Property, Diligence, Patron, and Misfortune. Each attribute has its own letters that can be used for constructing good names.
Nowadays, Thai naming systems can be found on the Internet, but most of them are static web pages that keep indexes of names (according to birthdates) and their meanings in a database. The current systems have some disadvantages: (1) the databases contain only a small number of names, around 3,000 to 4,000, and (2) they provide no opportunity to change an old name to a good one that is similar to the old name but has a better meaning.
To tackle names and their variations, phonetic algorithms have been used for a long time. A phonetic algorithm is an algorithm for indexing words by their pronunciation. Most phonetic algorithms were developed for use with the English language; consequently, applying their rules to words in other languages might not give a meaningful result. They are necessarily complex algorithms with many rules and exceptions, because English spelling and pronunciation are complicated by historical changes in pronunciation and by words borrowed from many languages. Among the algorithms in use are the Soundex, the Metaphone and the NYSIIS algorithms together with their numerous variations.
Soundex is a phonetic algorithm that enables the retrieval of information from data processing systems. R. C. Russell developed the Soundex algorithm to process data collected from the 1890 census. Known as the Russell Soundex algorithm, it has numerous variants that have been employed for genealogy studies and retrieval systems. The Soundex algorithm was adapted to the Thai language by Lorchirachoonkul (1982), taking into account the specific characteristics of the Thai language.
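The excerpt does not spell out the Soundex rules, so the following is only a minimal sketch of the commonly documented American Soundex variant for ASCII names. It is not necessarily identical to Russell's original formulation, and it is not the Thai adaptation by Lorchirachoonkul (1982). The function name `soundex` and the table `_SOUNDEX_CODES` are illustrative choices, not names taken from the article.

```python
# Digit codes for consonants in the commonly documented American Soundex variant.
# Vowels, Y, H and W have no code (assumption: ASCII names only).
_SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return a fixed-length key: the first letter followed by three digits."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    # The code of the first letter also takes part in duplicate suppression.
    prev_code = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W are skipped and do not separate letters with equal codes.
            continue
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev_code:
            key += code
        # A vowel resets prev_code to "", so an equal code after a vowel is kept.
        prev_code = code
    # Pad with zeros and truncate to four characters.
    return (key + "000")[:4]


if __name__ == "__main__":
    for example in ("Robert", "Rupert", "Ashcraft", "Tymczak", "Pfister"):
        print(example, soundex(example))
```

Names that sound alike in English, such as "Robert" and "Rupert", map to the same key under these rules, which is what makes the key usable as a retrieval index.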
Metaphone is a phonetic algorithm for indexing words by their sound when pronounced in English. The algorithm produces variable-length keys as its output, as opposed to Soundex's fixed-length keys. Similar-sounding words share the same keys. Metaphone was developed by Lawrence Philips as a response to deficiencies in the Soundex algorithm. It is more accurate than Soundex because it uses a larger set of rules for English pronunciation. Metaphone is available as a built-in operator in a number of systems, including later versions of PHP.
In 1970 the New York State Identification and Intelligence project, headed by Robert L. Taft, published the paper "Name Search Techniques". In this paper, Taft compared Soundex with a new phonetic routine (NYSIIS) that was designed through rigorous empirical analysis.
The term ontology has been widely used in recent years in the fields of artificial intelligence and computer and information science, especially in domains such as cooperative information systems, intelligent information integration, information retrieval and extraction, knowledge representation, and database management systems. Many different definitions of the term have been proposed. One of the most widely quoted and well-known definitions of ontology is Gruber's (1993): an ontology is an explicit specification of a conceptualization.
The concept of ontologies can be combined with statistics, in which case it is called a statistical ontology; it is mostly used in data analysis, data mining, or the clustering of relational data and their display as statistical values (Denk, Froeschl, & Grossmann, 2002; Hert & Haas, 2003). Marchionini, Haas, Plaisant, Shneiderman, and Hert (2003) developed a statistical ontology for finding relations between statistical terms and linking their concepts together by constructing definitions and illustrative graphs. The ontology was used for constructing and illustrating support. Pasquier, Girardot, Jevardat de Fombelle, and Christen (2004) developed a tool called THEA (Tool for High-throughput Experiments Analysis) for analyzing large volumes of experimental work. This tool used the statistical ontology concept to construct general meaning from basic knowledge clustering and retrieval. Data retrieval used word explanations to search knowledge in biology, and data mining was used for data and knowledge clustering, with the output automatically presented in a tree pattern. In this research we use a statistical ontology for analyzing and checking good names.
Recently, Snae and Brueckner (2006a, 2006b) pioneered the development of a dynamic online Thai naming system based on Thai astrology, which uses letters according to the day of birth to construct a melodious-sounding name with a meaning and adopts the name matching algorithm LIG3 (Levenshtein, Index of Similarity Group (called ISG), and Guth) for finding similar names and variants (Snae, 2007). However, LIG3 seems to be more complex and time consuming because it contains many functions, e.g. a function to calculate the distance between two …
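The excerpt names Levenshtein as one component of LIG3 but does not show how LIG3 computes it or combines it with ISG and Guth. As an illustration only, the standard Levenshtein edit distance between two strings can be sketched as follows; the function name `levenshtein` is an assumption made for this example and is not taken from the LIG3 implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            substitution_cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,  # delete ch_a
                    current[j - 1] + 1,  # insert ch_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

The nested loops make the cost proportional to the product of the two string lengths, which gives some intuition for why a matcher built from several such functions can be time consuming over a large name list.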
Publication information: Article title: Novel Phonetic Name Matching Algorithm with a Statistical Ontology for Analysing Names Given in Accordance with Thai Astrology. Contributors: Snae, Chakkrit - Author, Bruckner, Michael - Author. Journal title: Issues in Informing Science & Information Technology. Volume: 6. Publication date: Annual 2009. Page number: 497+. © 2008 Informing Science Institute. COPYRIGHT 2009 Gale Group.