Why Need a Thesaurus?
What is a Thesaurus?

Definition

Before tackling the question of why a thesaurus is needed, let us define what a thesaurus is. In documentation standards, a thesaurus is a group of canonized words called 'keywords', with possible links between pairs of keywords that form a hierarchy. This group of keywords (the hierarchy) is created to accurately describe ideas, people, locations, and similar elements in documents. The same object or idea can have many names, and names can sometimes be misleading. One objective of a thesaurus is to canonize terms, that is, to decide which terms will be used for which objects and ideas. Using canonized terms also unifies the way the names of persons and organizations are written.
Using Thesaurus in Documentation

This type of document description is intended to provide an accurate means of finding specific documents in a large set, based on the keywords used to describe each document. One or more keywords describe a single document. The more ideas, people, locations, and other elements a document contains, the more keywords are needed to describe it. This is essential because nobody can know in advance which keyword will be the key element of a later search.

Relations

Using a set of logical relations is an essential part of creating a thesaurus. Some of the relations are as follows:

- Equivalent Term
- Narrow Term
- Wide Term
- Used For
- Not Used
- Top Term
Logical relations come in opposite pairs (e.g. Narrow Term is the opposite of Wide Term, and Used For is the opposite of Not Used). Relations provide a logical map of how keywords are inter-related. This is a key element in guiding thesaurus users to find and use keywords simply and accurately.

A documenter describing an article about politicians and corruption, for example, would initially come up with various words to describe it. He or she cannot, however, use whichever words come to mind: if every documenter chose words freely, articles in the same category would end up under many different keywords, and a search would need all of them to retrieve the category completely. A thesaurus with relations changes this documentation process by filtering out "Unused Keywords" and referring the documenter to the proper ones; for example, the thesaurus will suggest "Political Reform" instead of "Political Conduct Rehabilitation". By the same token, people retrieving articles through a thesaurus-based search are guided during search and retrieval; for example, a user who enters "Political Rehabilitation" is shown the related term "Political Reform", with a clear indication that "Political Reform" is "Used For" "Political Rehabilitation", while the latter is "Not Used".
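The relations described above can be sketched as a small data structure. This is only an illustrative sketch, not how any particular thesaurus product works. The class name `Thesaurus`, its method names, the exact-match lookup, the restriction to one Wide Term per keyword, and the top-level keyword "Politics" are all assumptions made for the example. Only "Political Reform" and "Political Rehabilitation" come from the text.

```python
from typing import Dict, Optional, Set


class Thesaurus:
    """Toy thesaurus: canonized keywords plus a few paired relations."""

    def __init__(self) -> None:
        # Canonized keyword -> its Wide Term (None marks a Top Term).
        # Assumption: each keyword has at most one Wide Term.
        self.wide: Dict[str, Optional[str]] = {}
        # "Not Used" term -> the canonized keyword to use in its place.
        self.use_instead: Dict[str, str] = {}

    def add_keyword(self, keyword: str, wide: Optional[str] = None) -> None:
        """Register a new canonized keyword, optionally under a wider one."""
        if keyword in self.wide:
            raise ValueError(f"keyword already exists: {keyword!r}")
        if wide is not None and wide not in self.wide:
            raise KeyError(f"unknown wide term: {wide!r}")
        self.wide[keyword] = wide

    def add_not_used(self, unused: str, keyword: str) -> None:
        """Record that `unused` is replaced by the canonized `keyword`."""
        if keyword not in self.wide:
            raise KeyError(f"unknown keyword: {keyword!r}")
        self.use_instead[unused] = keyword

    def narrow_terms(self, keyword: str) -> Set[str]:
        """Narrow Term is the opposite of Wide Term, so it is derived."""
        return {k for k, w in self.wide.items() if w == keyword}

    def used_for(self, keyword: str) -> Set[str]:
        """Used For is the opposite of Not Used, so it is derived too."""
        return {u for u, k in self.use_instead.items() if k == keyword}

    def top_term(self, keyword: str) -> str:
        """Follow Wide Term links up to the top of the hierarchy."""
        parent = self.wide[keyword]
        while parent is not None:
            keyword, parent = parent, self.wide[parent]
        return keyword

    def resolve(self, term: str) -> str:
        """Return the canonized keyword for `term`; KeyError if unknown."""
        if term in self.wide:
            return term
        return self.use_instead[term]


thesaurus = Thesaurus()
thesaurus.add_keyword("Politics")  # hypothetical Top Term
thesaurus.add_keyword("Political Reform", wide="Politics")
thesaurus.add_not_used("Political Rehabilitation", "Political Reform")

print(thesaurus.resolve("Political Rehabilitation"))  # Political Reform
print(thesaurus.used_for("Political Reform"))  # {'Political Rehabilitation'}
print(thesaurus.top_term("Political Reform"))  # Politics
```

Storing only one direction of each pair and deriving the opposite keeps the two sides from drifting apart. A term that is neither a keyword nor a recorded "Not Used" term raises `KeyError`, which mirrors the idea that a documenter cannot enter arbitrary words.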
Subject Headings

Keywords are sometimes misleading when used alone. They can remain misleading even when placed among the other keywords describing a single document. For example, trying to describe the exports from [...]

Subject headings solve this problem. A subject heading is a combination of two or more keywords describing a state. The order of the keywords in this combination is very important. For our example above, [...]

People trying to retrieve documents are shown a list of matching subject headings that describe certain documents; they can choose one or more subject headings that fit the search objectives and thereby retrieve documents with great accuracy. Forming subject headings follows strict rules, and documenters should abide by them for proper documentation. The rules are few, simple, and easy.

Directory Tree

Using a thesaurus and its related subjects can lead to classifying documents in a directory-tree-like model. This model is the new standard model for 'crawlers' that regularly scan web sites to retrieve and store [...]

On the other hand, crawlers consider classified keywords and document titles to have a higher value than other words found in the full text.

Why Use a Thesaurus instead of Full Text Search

Full text search has improved greatly thanks to the different algorithms embedded in most full text search engines. It has evolved to account for frequency, relevance, order, and so on, making the search results more relevant to the searcher. Still, full text search cannot solve some important problems. One of these problems is that a text can talk about people, locations, and ideas without ever mentioning their names. Another is that some words in a text can have different meanings depending on context. Such problems lead to a lot of noise in the search results, which confuses the searcher and wastes his or her time. We see this very clearly on Internet search engines: the results almost always contain far more than we are looking for, including many insignificant articles.

A thesaurus-based search does not lead to insignificant results in this manner. Only relevant documents are returned, since the results are based on well-defined subject headings built from canonized keywords. Moreover, the search process is guided in steps, so that the searcher sees all the related terms and restrictions before the search takes place.

Another important aspect of thesaurus-based documentation is guiding a searcher who has little knowledge of the subject sought. The thesaurus helps the searcher find the keywords in use (a keyword consists of one or more regular words); for example, the searcher will find all used keywords that contain the word 'political', such as 'political reform', 'political war', 'political assembly', 'American Political Science', etc. The thesaurus also points out the related keywords so that the searcher can cover related material in his or her search.
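The guided lookup just described can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a real search engine: the function names `find_keywords` and `guide`, the `RELATED` table, and every related-term link in it are hypothetical. Only the four 'political' keywords come from the text; 'economic reform' is an invented extra used to show a non-matching keyword.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical data: each used keyword mapped to its related keywords.
RELATED: Dict[str, Set[str]] = {
    "political reform": {"political assembly"},
    "political war": set(),
    "political assembly": {"political reform"},
    "American Political Science": set(),
    "economic reform": set(),
}


def find_keywords(word: str, keywords: Iterable[str]) -> List[str]:
    """Return the used keywords that contain `word` as a whole word.

    Matching ignores letter case; results are sorted alphabetically.
    """
    needle = word.casefold()
    matches = [k for k in keywords if needle in k.casefold().split()]
    return sorted(matches, key=str.casefold)


def guide(word: str) -> Dict[str, List[str]]:
    """Map every matching keyword to its related keywords."""
    return {k: sorted(RELATED[k]) for k in find_keywords(word, RELATED)}


for keyword, related in guide("political").items():
    print(keyword, "| related:", ", ".join(related) or "(none)")
```

Because the lookup runs over the keyword list rather than over document text, the searcher only ever sees terms that documenters were actually allowed to use.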
A thesaurus also comes in handy during the documentation process. Misspelled, nonexistent, or fragmented keywords cannot be entered at will. Descriptive keywords must belong to the thesaurus; otherwise, noise creeps into the data as different keywords are used for the same subject. A thesaurus does accept new keywords, but each addition must go through a thesaurus expert, who enters the new keyword together with its relations to existing keywords.

In short, for accurate documentation, search, and retrieval, a thesaurus is essential. For approximate documentation, full text is acceptable.

Articles

The following articles also shed some light on the importance of using a thesaurus.

- http://www.dlib.org/dlib/november98/11batty.html
- http://www.ariadne.ac.uk/issue23/metadata/

Note

- The images and tables used to the right are taken from this document: http://www.willpower.demon.co.uk/thesprin.htm