Why Need a Thesaurus?

What is a Thesaurus?

Definition

Before discussing why a thesaurus is needed, let us define what a thesaurus is.  In documentation standards, a thesaurus is a set of canonized words called 'keywords', with optional links between pairs of keywords that together form a hierarchy.  This set of keywords (the hierarchy) is created to accurately describe the ideas, people, locations, and similar elements found in documents.

 

The same object or idea can have many 'names', and names can sometimes be misleading.  One of the objectives of a thesaurus is to canonize terms, that is, to decide which term will be used for each object, idea, etc.

 

Using canonized terms also unifies the way the names of persons and organizations are written.

 

Objects can have many names

 

Using a Thesaurus in Documentation

This type of document description is intended to provide an accurate means of finding specific documents within a large collection, based on the keywords used to describe each document.

 

One or more keywords are used to describe a single document.  The more ideas, people, locations, and other elements a document contains, the more keywords are needed to describe it.  This is essential because no one can know in advance which keyword will be the key element of a later search.
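Describing each document with several keywords amounts to building a small inverted index. The sketch below is only an illustration: the document identifiers, the keywords, and the function names are hypothetical and do not come from any particular system.

```python
from collections import defaultdict

# Hypothetical documents, each described by several keywords.
documents = {
    "doc-1": {"Bahrain", "Foreign Trade", "Indonesia"},
    "doc-2": {"Political Reform", "Bahrain"},
    "doc-3": {"Political Reform"},
}

def build_index(docs):
    """Map each keyword to the set of documents it describes."""
    index = defaultdict(set)
    for doc_id, keywords in docs.items():
        for keyword in keywords:
            index[keyword].add(doc_id)
    return index

def find(index, *keywords):
    """Return the documents described by all of the given keywords."""
    if not keywords:
        return set()
    result = set(index.get(keywords[0], set()))
    for keyword in keywords[1:]:
        result &= index.get(keyword, set())
    return result

index = build_index(documents)
print(sorted(find(index, "Bahrain")))
print(sorted(find(index, "Bahrain", "Political Reform")))
```

Because every keyword of a document is indexed, the document can be found whichever of its keywords the searcher happens to use later.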

 

 

 

Relations

Using a set of logical relations is an essential part of creating a thesaurus.  Some of the relations are as follows:

 

·      Equivalent Term

·      Narrow Term

·      Wide Term

·      Used For

·      Not Used

·      Top Term

 

 

 

Broader and narrower terms
Hierarchical relationships

·         Relationships must be independent of context

·         Terms must represent the same type of entity

Valid?  Relationship         Reciprocal relationship
Yes     Mice BT Rodents      Rodents NT Mice
Yes     Shoes BT Footwear    Footwear NT Shoes
No      Mice BT Pests        Pests NT Mice
No      Shoes BT Shoemaking  Shoemaking NT Shoes

 

 

 

Each logical relation has an opposite relation (e.g., Narrow Term is the opposite of Wide Term, Used For is the opposite of Not Used, etc.).
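One way to keep opposite relations consistent is to record both directions whenever a link is added. The following is a minimal sketch under that assumption; the class and method names are hypothetical, and Equivalent Term and Top Term are left out because the text does not name their opposites.

```python
# Each relation and its opposite, using the relation names from the text.
INVERSE = {
    "Narrow Term": "Wide Term",
    "Wide Term": "Narrow Term",
    "Used For": "Not Used",
    "Not Used": "Used For",
}

class Thesaurus:
    """Hypothetical container that stores every relation in both directions."""

    def __init__(self):
        # relations[term][relation] -> set of related terms
        self.relations = {}

    def _link(self, term, relation, other):
        self.relations.setdefault(term, {}).setdefault(relation, set()).add(other)

    def add(self, term, relation, other):
        """Record 'term relation other' and the opposite link on 'other'."""
        self._link(term, relation, other)
        self._link(other, INVERSE[relation], term)

    def related(self, term, relation):
        """Return the terms linked to 'term' by 'relation' (possibly empty)."""
        return self.relations.get(term, {}).get(relation, set())

thesaurus = Thesaurus()
thesaurus.add("Rodents", "Narrow Term", "Mice")  # Rodents NT Mice
print(thesaurus.related("Mice", "Wide Term"))    # {'Rodents'}
```

Adding "Rodents NT Mice" is enough; the reciprocal "Mice BT Rodents" entry is created automatically, so the two can never disagree.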

 

Relations provide a logical map of how keywords are interrelated.  This is a key element in guiding thesaurus users to find and use keywords simply and accurately.  A documenter trying to describe an article about politicians and corruption, for example, would initially come up with several different words to describe it.  He or she cannot, however, use whichever words come to mind: if every documenter chose words freely, articles in the same category would end up under many different keywords, and a searcher would need all of them to retrieve the complete set.

 

A thesaurus with relations improves this documentation process by filtering out unused keywords and referring the documenter to the proper ones; for example, the thesaurus will suggest "Political Reform" instead of "Political Conduct Rehabilitation".  By the same token, people retrieving articles through a thesaurus-based search are guided during search and retrieval; for example, a user who enters "Political Rehabilitation" will be shown the related term "Political Reform", with a clear indication that "Political Reform" is "Used For" "Political Rehabilitation", which is itself "Not Used".
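The guidance described above can be sketched as a simple lookup that redirects a "Not Used" term to the keyword that is used in its place. The two dictionaries and the resolve function are hypothetical names chosen for illustration; only the example terms come from the text.

```python
# Hypothetical "Not Used" entries, each pointing at the keyword to use instead.
USE_INSTEAD = {
    "Political Rehabilitation": "Political Reform",
    "Political Conduct Rehabilitation": "Political Reform",
}

# Hypothetical set of keywords that are in use.
USED_KEYWORDS = {"Political Reform"}

def resolve(term):
    """Return (keyword_to_use, note) for a term typed by a documenter or searcher."""
    if term in USED_KEYWORDS:
        return term, "used keyword"
    if term in USE_INSTEAD:
        preferred = USE_INSTEAD[term]
        return preferred, f'"{term}" is Not Used; "{preferred}" is Used For it'
    return None, "not in thesaurus"

print(resolve("Political Rehabilitation"))
print(resolve("Political Reform"))
```

The same lookup serves both sides: the documenter is steered to the proper keyword at entry time, and the searcher is steered to it at retrieval time.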

 

Table 1: Sample thesaurus - hierarchical sequence

knitwear
> cardigans
> pullovers
outerwear
> blouses
> cardigans
> coats
> > raincoats
> dresses
> jackets
> > anoraks
> > blazers
> > dinner jackets
> > donkey jackets
> > reefer jackets
> leggings
> pullovers
> rainwear
> > raincoats
> shawls
> shirts
> skirts
> suits
> trousers
> > jeans
> > shorts
> > slacks
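A hierarchical listing like Table 1 can be turned into a structure that a search can expand. The sketch below assumes that each ">" marks one level of depth, as in the table, and uses only an excerpt of it; the function names are hypothetical.

```python
# Excerpt of Table 1; each ">" marks one level of depth.
TABLE_1_EXCERPT = """\
knitwear
> cardigans
> pullovers
outerwear
> blouses
> cardigans
> coats
> > raincoats
> jackets
> > anoraks
> > blazers
> rainwear
> > raincoats
"""

def parse_hierarchy(text):
    """Return {term: set of direct narrower terms} from '>'-indented lines."""
    narrower = {}
    path = []  # path[d] is the most recent term seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = line.count(">")
        term = line.replace(">", "").strip()
        narrower.setdefault(term, set())
        del path[depth:]  # drop terms at this depth or deeper
        if path:
            narrower[path[-1]].add(term)
        path.append(term)
    return narrower

def expand(narrower, term):
    """Return the term together with all of its narrower terms, at any depth."""
    found = {term}
    for child in narrower.get(term, ()):
        found |= expand(narrower, child)
    return found

narrower = parse_hierarchy(TABLE_1_EXCERPT)
print(sorted(expand(narrower, "coats")))
print(sorted(t for t, kids in narrower.items() if "cardigans" in kids))
```

Note that a term may sit under more than one broader term: in Table 1, "cardigans" appears under both "knitwear" and "outerwear", and "raincoats" under both "coats" and "rainwear".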

 

 

 

Subject Headings

Keywords can be misleading when used alone, and they can remain misleading even when combined with other keywords describing the same document.  For example, describing exports from Bahrain to Indonesia using keywords alone would require at least three keywords: "Bahrain", "Indonesia", and "Foreign Exports".  However, this is ambiguous: the same three keywords could equally describe a story about exports from Indonesia to Bahrain, which is not the case.

 

Subject headings solve this problem.

 

Subject headings are combinations of two or more keywords describing a state.  The order of the keywords in the combination is very important.  In the example above, "Bahrain – Foreign Trade – Indonesia" is different from "Indonesia – Foreign Trade – Bahrain".

 

People trying to retrieve documents are presented with a list of matching subject headings; they can choose one or more headings that fit their search objectives and thereby retrieve documents very accurately.
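Since order matters, a subject heading can be modelled as an ordered tuple of keywords rather than an unordered set. The catalogue contents, document identifiers, and function names below are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical catalogue: each subject heading is an ordered tuple of keywords.
catalogue = {
    ("Bahrain", "Foreign Trade", "Indonesia"): ["doc-17"],
    ("Indonesia", "Foreign Trade", "Bahrain"): ["doc-42"],
    ("Bahrain", "Political Reform"): ["doc-08"],
}

def matching_headings(catalogue, *keywords):
    """List the subject headings that contain every given keyword, in any position."""
    return [h for h in catalogue if all(k in h for k in keywords)]

def retrieve(catalogue, heading):
    """Return the documents filed under exactly this heading (order matters)."""
    return catalogue.get(tuple(heading), [])

# The searcher enters keywords and is shown the matching headings to choose from.
for heading in matching_headings(catalogue, "Bahrain", "Indonesia"):
    print(" - ".join(heading), "->", retrieve(catalogue, heading))
```

Both trade headings match the keywords "Bahrain" and "Indonesia", but each leads to a different document, which is exactly the distinction that keywords alone cannot express.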

 

Forming subject headings follows strict rules, and documenters should abide by them for proper documentation.  The rules are few and simple.

 

Directory Tree

Using a thesaurus and its related subject headings makes it possible to classify documents in a directory-tree-like model.  This model has become a standard for 'crawlers', programs that regularly scan web sites to retrieve and store information about them, which automates the documentation process of indexing the many web sites online.  In the future, indexing technology is expected to use internal crawlers to index the documents and databases held on an organization's own computers.  A directory tree allows a crawler to follow each subject and retrieve the related documents.

 

Crawlers also treat classified keywords and document titles as having a higher value than other words found in the full text.
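The idea that keywords and titles count for more than body text can be sketched as a weighted score. The weights below are purely illustrative assumptions, not values used by any real crawler, and the field names and score function are hypothetical.

```python
# Hypothetical field weights: illustrative values only, not taken from any crawler.
FIELD_WEIGHTS = {"keywords": 3.0, "title": 2.0, "body": 1.0}

def score(document, query_word):
    """Sum the weights of the fields that contain the query word as a whole word."""
    word = query_word.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = document.get(field, "")
        if word in text.lower().split():
            total += weight
    return total

doc_a = {"title": "Trade report", "keywords": "exports trade", "body": "annual figures"}
doc_b = {"title": "Annual figures", "keywords": "statistics", "body": "trade grew this year"}

print(score(doc_a, "trade"))  # 5.0 (keywords + title)
print(score(doc_b, "trade"))  # 1.0 (body only)
```

Under these assumed weights, a document classified under a keyword ranks above one that merely mentions the same word in its text.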

 

Why Use a Thesaurus instead of Full Text Search

Full text search has improved greatly thanks to the algorithms embedded in most full text search engines.  It has evolved to take into account frequency, relevance, word order, etc., making the search results more relevant to the searcher.

 

Still, full text search cannot solve some important problems.  One is that a text may discuss people, locations, and ideas without ever mentioning them by name.  Another is that some words have different meanings depending on context.  Such problems introduce a lot of noise into the search results, confusing the searcher and wasting his or her time.  We see this clearly with Internet search engines: the results almost always include far more articles, many of them insignificant, than what we are looking for.

 

A thesaurus-based search does not lead to insignificant results in this manner.  Only relevant documents are returned, since the results are based on well-defined subject headings built from canonized keywords.  Moreover, the search process is guided step by step, showing the searcher all the related terms and restrictions before the search takes place.

 

Another important aspect of thesaurus-based documentation is guiding a searcher who has little knowledge of the subject sought.  The thesaurus helps the searcher find the keywords in use (a keyword consists of one or more regular words); for example, the searcher can find all used keywords that contain the word 'political', such as 'political reform', 'political war', 'political assembly', 'American Political Science', etc.  The thesaurus also points out related keywords so that the searcher can cover related material in the search.
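Finding the used keywords that contain a given word is a simple filter over the keyword list. In this sketch the list holds the examples from the text plus one assumed extra entry, and the function name is hypothetical.

```python
# Used keywords from the examples in the text, plus one assumed extra entry.
USED_KEYWORDS = [
    "political reform",
    "political war",
    "political assembly",
    "American Political Science",
    "foreign trade",  # assumed entry, included to show a non-match
]

def keywords_containing(word, keywords):
    """Return the used keywords that contain `word` as a whole word, ignoring case."""
    wanted = word.lower()
    return [k for k in keywords if wanted in k.lower().split()]

print(keywords_containing("political", USED_KEYWORDS))
```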

 

A thesaurus also comes in handy during the documentation process.  Misspelled, nonexistent, or fragmented keywords cannot be entered at will.  Descriptive keywords must belong to the thesaurus; otherwise, noise creeps into the data as different keywords are used for the same subject.  A thesaurus can of course accept new keywords, but this must go through a thesaurus expert, who enters the new keyword together with its relations to existing keywords.
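The entry-time check can be sketched as a filter that accepts only keywords already in the thesaurus and sets the rest aside for the thesaurus expert. The thesaurus contents and the function name are hypothetical.

```python
# Hypothetical thesaurus contents.
THESAURUS = {"Political Reform", "Foreign Trade", "Bahrain", "Indonesia"}

def check_keywords(entered, thesaurus=THESAURUS):
    """Split entered keywords into accepted ones and ones to refer to the expert."""
    accepted = [k for k in entered if k in thesaurus]
    needs_review = [k for k in entered if k not in thesaurus]
    return accepted, needs_review

# "Politcal Reform" is deliberately misspelled to show a rejected entry.
accepted, needs_review = check_keywords(["Bahrain", "Politcal Reform", "Foreign Trade"])
print(accepted)      # ['Bahrain', 'Foreign Trade']
print(needs_review)  # ['Politcal Reform']
```

Nothing in the review list reaches the data until the expert either corrects it or adds it to the thesaurus with its relations.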

 

In short, for accurate documentation, search, and retrieval, a thesaurus is essential.  For approximate documentation, full text is acceptable.

Articles

The following articles also shed some light on the importance of using a thesaurus.

 

·      http://www.dlib.org/dlib/november98/11batty.html

·      http://www.ariadne.ac.uk/issue23/metadata/

 

Note

·      The images and tables used to the right are taken from this document: http://www.willpower.demon.co.uk/thesprin.htm