More and more search systems, especially web search engines, incorporate relevancy ranking of results. In a traditional boolean search, several large sets can be combined to get one small result: only those documents fitting the specified criteria are kept, and they are typically sorted by date. A full relevancy search can produce very large result sets because every matching document is kept, and the results are sorted by a formula related to the query you entered.
What the relevancy ranks really mean is that most of the top-ranked documents will be of interest to you, while only a few of the lowest-ranked documents will be. Note that this does not bring all the good articles to the top, just a higher percentage of them.
The relevancy ranking in bibliographic and textual material is usually based on word frequency distribution: the more often the search terms appear, the higher the ranking. This is fine-tuned in any number of ways, such as factoring in how often a word appears in the entire database as well as in the document, and the proximity of the search words to each other.
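As a rough illustration of frequency-based ranking, here is a minimal Python sketch. It is not the formula used by any particular system; the function names `tokenize` and `rank` and the logarithmic weighting are assumptions chosen for the example. Each search term counts once per appearance in a document, and terms that are rare across the whole collection count for more.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(query, documents):
    """Return (score, document index) pairs, highest score first.

    Assumed scoring (illustrative only): for each query term, multiply
    the number of times it appears in the document by a weight that
    grows as the term gets rarer in the whole collection.
    """
    counts_per_doc = [Counter(tokenize(d)) for d in documents]
    n_docs = len(documents)
    scores = []
    for index, counts in enumerate(counts_per_doc):
        score = 0.0
        for term in set(tokenize(query)):
            # Number of documents in the collection containing the term.
            doc_freq = sum(1 for c in counts_per_doc if term in c)
            if doc_freq == 0:
                continue  # the term appears nowhere, so it cannot help
            score += counts[term] * math.log(1 + n_docs / doc_freq)
        scores.append((score, index))
    # Sort by descending score; ties keep the original document order.
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))
```

Calling `rank("lake erie zebra", docs)` on a small list of strings puts the document that mentions the terms most often first. Proximity of words, which real systems may also weigh, is left out of this sketch.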
While there are some hybrid systems, generally you no longer have the boolean AND or OR. Instead you specify which words must be present, which are important, and which must be excluded. You can often truncate as well. For example:
| Feature | TRELLIS Keyword anywhere | Alta Vista or Google Basic searches | EI Compendex Simplified - basic |
|---|---|---|---|
| Word Must be Present | + | + | |
| Word is Important | * | n.a. | n.a. |
| Phrase | "word word" | "word word" | "word word" |
| Not | ! | - | n.a. |
| Truncation | ? | * | * |
| Parentheses | n.a. | n.a. | Use separate query boxes |
| Example | "lake erie" +zebra | "lake erie" +zebra | "lake erie" zebra |
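To make the operator columns concrete, the following Python sketch splits a query written in the `+` / `-` / `"phrase"` style of the Alta Vista or Google Basic column into required, excluded, and optional terms. The function name `parse_query` and the dictionary layout are hypothetical, and this is not how any of the listed systems actually parse a query.

```python
import re

# One token: an optional + or - prefix, then a quoted phrase or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')

def parse_query(query):
    """Sort query terms into required, excluded, and optional lists."""
    parsed = {"required": [], "excluded": [], "optional": []}
    for prefix, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if prefix == "+":
            parsed["required"].append(term)   # word must be present
        elif prefix == "-":
            parsed["excluded"].append(term)   # word must not be present
        else:
            parsed["optional"].append(term)   # word only affects ranking
    return parsed

print(parse_query('"lake erie" +zebra -mussel'))
# {'required': ['zebra'], 'excluded': ['mussel'], 'optional': ['lake erie']}
```

Truncation symbols are passed through unchanged here; a fuller version would need to expand them against the index.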
Each record often comes with a percent figure, which indicates how well that record fits the query.
If you were to work your way through the entire list you would find the first few might be the equivalent of boolean ANDing all the search keys together; further along more terms are dropped until, at the end, you have documents each containing only one of the search keys, as though all the terms had been ORed together.
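The drift from AND-like results at the top to OR-like results at the bottom can be imitated with a very simple ordering: sort the records by how many of the search keys each one contains. The Python sketch below does only that; the names `matched_terms` and `and_to_or_order` are made up for the example, and real ranking formulas are more elaborate.

```python
def matched_terms(query_terms, document):
    """Return the query terms that occur as whole words in the document."""
    words = set(document.lower().split())
    return [term for term in query_terms if term in words]

def and_to_or_order(query_terms, documents):
    """Order documents from 'all terms present' down to 'one term present'."""
    hits = [(doc, matched_terms(query_terms, doc)) for doc in documents]
    hits = [hit for hit in hits if hit[1]]  # drop documents matching nothing
    # Most matched terms first; ties keep their original order.
    return sorted(hits, key=lambda hit: len(hit[1]), reverse=True)
```

With the terms `wine`, `sulfur`, and `yeast`, a record containing all three sorts first (the boolean AND case), and records containing only one of them sort last (the boolean OR case).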
Constructing a search is pretty much the same as for a boolean search, except I would leave out long lists of synonyms, since articles containing many of those synonyms would be brought to the top. This often means you will have to repeat the search several times to pick up all synonyms and variant spellings. So rather than
wine and (sulfur or sulphur)
You may need to search:
wine sulfur
Then
wine sulphur
The above is a simple example, but it illustrates how a complicated search might have to be broken up. A longer search, such as (fuel or fuels or coal or oil or "natural gas" or hydrogen or electric*) and ("greenhouse gas" or "carbon dioxide" or methane), could become unmanageable.
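A short Python sketch shows why the longer search gets out of hand. It treats a boolean search as a list of synonym groups (terms ORed within a group, groups ANDed together) and writes out one separate search per combination. The function name `expand` and this list-of-lists representation are assumptions made for the illustration.

```python
from itertools import product

def expand(groups):
    """Turn ANDed groups of ORed synonyms into one search per combination."""
    return [" ".join(combo) for combo in product(*groups)]

# wine and (sulfur or sulphur)
print(expand([["wine"], ["sulfur", "sulphur"]]))
# ['wine sulfur', 'wine sulphur']

# The longer search: 7 fuel terms and 3 gas terms.
fuels = ["fuel", "fuels", "coal", "oil", '"natural gas"', "hydrogen", "electric*"]
gases = ['"greenhouse gas"', '"carbon dioxide"', "methane"]
print(len(expand([fuels, gases])))
# 21
```

Two searches are easy to run by hand; twenty-one are not, and each additional synonym group multiplies the count again.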
Relevancy ranking systems work best with larger documents as opposed to brief citations. Read the instructions and help pages carefully, because each of these systems works a bit differently.
Remember: one way or the other, the results of the search are still based on the occurrence of words in the records and the meanings you attach to them. If you don't ask the right question, the computer won't find the answer. Relevancy ranking is just a way of sorting the results.
June 20, 2005