I am asked why I lay such stress on the Interesting Words engine.
This page will explain why.
Warning!
Do not be fooled by the Indexer. Powerful though it is, it is but a cheap front-end for the real Thing – the Interesting Words engine.
Other firms spend money buying and then giving away promotional items such as coffee mugs, ballpoint pens, and baseball caps.
Don’t lie to me.
You have them in your office, too!
Me? I’m interested in getting the message out through IDEAS, and doing it in a manner that costs both of us nothing.
That’s why the Indexer is available for you as a free download.
It’s my coffee mug, ballpoint pen and baseball cap all rolled into one, and it is much more useful.
Plus, you can legally share it with your colleagues and friends.
Let’s Start with the Indexer
You have downloaded the Indexer and run it on a 30-page technical or commercial document.
In 30 seconds it has removed 4 to 6 hours of work from your shoulders.
You don’t believe me?
Take a look at the {XE} fields now present in your document. (Choose Tools, Options, View, Field Codes.) Count the number of field codes in your document (or pick a typical page and multiply by the number of pages). Don’t kid yourself; over the long haul of four or more hours you’d be lucky to insert one field per minute. Say 20 fields per page for 30 pages: that is 600 fields, so 600 minutes. Do the math.
Now compare that to 30 seconds.
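If you would rather let a machine do the math, here is a minimal sketch in Python. The figures are the ones quoted above; the names are mine, made up for illustration, and have nothing to do with the Indexer’s own code.

```python
# Figures from the text: 20 fields per page, 30 pages, one field per minute
# by hand, against roughly 30 seconds for the Indexer.
FIELDS_PER_PAGE = 20
PAGES = 30
MINUTES_PER_FIELD = 1
INDEXER_SECONDS = 30


def manual_minutes(fields_per_page, pages, minutes_per_field):
    """Minutes needed to insert every index field by hand."""
    return fields_per_page * pages * minutes_per_field


total_fields = FIELDS_PER_PAGE * PAGES
minutes = manual_minutes(FIELDS_PER_PAGE, PAGES, MINUTES_PER_FIELD)
speedup = (minutes * 60) / INDEXER_SECONDS

print(f"{total_fields} fields by hand: {minutes} minutes ({minutes / 60:.0f} hours)")
print(f"Indexer: {INDEXER_SECONDS} seconds, about {speedup:.0f} times faster")
```

Change the three figures at the top to match your own document and your own typing speed.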
And you saw that being done seconds ago.
Try it with another document.
Nothing up my sleeve!
It’s that good.
Let’s Take a Look at the Concordance Table
You’ll need to find this, but it’s not hard.
You are looking for a file called “Indxr.059”, and it will be in your Documents and Settings area. Under Win7 the path looks like this: “C:\Users\ChrisC\AppData\Roaming\Greaves\Indxr”
(The file can be opened safely in Microsoft Word.)
You see a two-column table of Interesting Words.
(If you wanted to, you could maintain this list as a separate document over a period of several weeks and then use it as the input to Microsoft Word’s indexing.)
This shows you that Interesting Words can be extracted and stored for other uses.
Let’s Take a Look at the Rules Table
You’ll need to find this too, but it’s not hard.
It’s in the same place as the concordance table, and it has a name like “IndxrRules01.doc”.
Take a look at the default rules.
The default rules say “Words between 4 and 24 characters in length, that start with a capital letter …” (so far we have identified what are probably proper nouns) “… and are NOT found in any of these three noise-word files” (you can inspect those files, too; same place) “… and do not contain embedded digits” (that takes care of part-numbers!).
There are more rules not shown, but you get the idea.
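To make those four default rules concrete, here is a rough sketch in Python. It is not the Indexer’s code: the function names, the tiny noise-word list, and the way the text is split into words are all my assumptions for the sake of illustration, and the rules not shown above are left out.

```python
import re

# Hypothetical stand-in for the three noise-word files (assumption: the
# real files hold far more words and may be laid out differently).
NOISE_WORDS = frozenset({"this", "that", "there", "these", "with", "from"})


def is_interesting(word, noise_words=NOISE_WORDS, min_len=4, max_len=24):
    """Apply the four default rules quoted in the text to a single word."""
    if not min_len <= len(word) <= max_len:   # 4 to 24 characters long
        return False
    if not word[0].isupper():                 # starts with a capital letter
        return False
    if word.lower() in noise_words:           # not found in the noise words
        return False
    if any(ch.isdigit() for ch in word):      # no embedded digits
        return False
    return True


def interesting_words(text, noise_words=NOISE_WORDS):
    """Return the Interesting Words of a chunk of text, in order of appearance."""
    # \w+ keeps letters and digits together, so a part number such as
    # AB1234 stays one token and is then rejected by the digit rule.
    return [w for w in re.findall(r"\w+", text) if is_interesting(w, noise_words)]


sample = "This Indexer from Toronto handles Part AB1234 and X9 quickly."
print(interesting_words(sample))
```

Because every limit is a parameter, changing the rules means changing the arguments, not the code.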
Précis Generator
Given that the Interesting Words engine can extract Interesting Words – based on YOUR rules, and given that the Interesting Words engine can extract Interesting Words from any chunk of text …
… start thinking what you might do if you extracted, and counted, the Interesting Words from the first sentence of each paragraph of a document.
You could quantify each First Sentence, and then collect, in sequence, those First Sentences with the highest scores.
That ought to be a good first approximation for a précis.
(I assume that your First Sentence conveys the idea and the rest of the paragraph merely amplifies it.)
Alternatively you might quantify the entire paragraph.
Alternatively you might quantify each sentence in each paragraph.
Alternatively you might quantify each paragraph that is (or is not) a Heading-style paragraph.
And so on. You get the idea.
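The First Sentence variant might be sketched like this in Python. Treat it as one possible reading, not the engine itself: the scoring function is a cut-down version of the default rules with no noise-word files, the sentence splitter is deliberately naive, and the `keep` parameter (how many sentences make the précis) is my own invention.

```python
import re


def score(text):
    """Count capitalized words of 4 to 24 characters with no digits."""
    return sum(
        1
        for w in re.findall(r"\w+", text)
        if 4 <= len(w) <= 24 and w[0].isupper() and not any(c.isdigit() for c in w)
    )


def first_sentence(paragraph):
    """Naive split: text up to the first '.', '!' or '?' followed by a space."""
    text = paragraph.strip()
    match = re.match(r".*?[.!?](?=\s|$)", text, flags=re.S)
    return match.group(0) if match else text


def precis(paragraphs, keep=2):
    """Keep the `keep` highest-scoring First Sentences, in document order."""
    scored = [(score(first_sentence(p)), i, first_sentence(p))
              for i, p in enumerate(paragraphs)]
    # Highest score first; ties go to the earlier paragraph.
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:keep]
    # Put the winners back in their original sequence.
    return [sentence for _, _, sentence in sorted(top, key=lambda item: item[1])]


paragraphs = [
    "The Indexer builds an Index for Microsoft Word. It is fast.",
    "it also runs quietly. Nobody notices.",
    "Rules drive the Interesting Words engine. You can edit them.",
]
print(precis(paragraphs, keep=2))
```

To quantify the whole paragraph instead, pass the paragraph itself to `score` in place of its First Sentence.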
Now suppose I allow you to specify how the précis-generator works – by a set of rules which you modify.
Now you have a rules-based précis-generator which is happily based on a rules-based Interesting Words engine.
TrailBlazer
Another great freebie from www.ChrisGreaves.com
You select a word and TrailBlazer blazes a trail of hyperlinks throughout your document.
How do you determine which words to select?
Well, you could use the Indexer, with its Interesting Words engine, to identify Interesting Words and then tell TrailBlazer to go ahead and blaze a trail for every Interesting Word.
Think
Suppose you are faced with massive amounts of text with no heading-style paragraphs.
Indeed, suppose you are faced with massive amounts of text with NO heading paragraphs.
As we outlined for the Précis Generator, you can quantify each paragraph in the document, using the Interesting Words engine.
That done, you can assign a level to each paragraph by condensing the range of quantities (of Interesting Words) to a grade system, much as is done in some modern educational institutions, where a mark of 80% or more is an “A” grade, 70% or more is a “B”, and so on.
In our case we will assign one of four levels – 1, 2, 3, 4 – to each paragraph.
Then we simply work our way backwards, from the end of the document to the front, and where we detect a change in grade (or “level”), we know we must insert a heading paragraph (text) and assign it an appropriate style (“Heading 1”, “Heading 2” and so on).
But from whence the heading paragraph’s text?
Well, you know those Interesting Words we can dig out of any chunk of text …?
It’s a good start.
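Here is how that backwards walk might look in Python. Much of it is guesswork on my part and is labelled as such: the cut-off points for the four levels, the choice that level 1 goes to the richest paragraphs, the heading before the very first paragraph, and the use of the first three Interesting Words as heading text are all assumptions, not the engine’s rules.

```python
import re


def interesting(text):
    """Capitalized words of 4 to 24 characters with no digits (cut-down rules)."""
    return [w for w in re.findall(r"\w+", text)
            if 4 <= len(w) <= 24 and w[0].isupper()
            and not any(c.isdigit() for c in w)]


def level_for(score, top_score):
    """Condense a score to a level from 1 to 4 (assumed cut-offs, 1 = richest)."""
    if top_score == 0:
        return 4
    share = score / top_score
    if share >= 0.75:
        return 1
    if share >= 0.50:
        return 2
    if share >= 0.25:
        return 3
    return 4


def add_headings(paragraphs, words_in_heading=3):
    """Return (style, text) pairs with a heading wherever the level changes."""
    scores = [len(interesting(p)) for p in paragraphs]
    top = max(scores, default=0)
    levels = [level_for(s, top) for s in scores]
    out = [("Normal", p) for p in paragraphs]
    # Work from the end to the front so that each insertion leaves the
    # positions of the paragraphs still to be visited unchanged.
    for i in range(len(paragraphs) - 1, -1, -1):
        if i == 0 or levels[i] != levels[i - 1]:
            unique = list(dict.fromkeys(interesting(paragraphs[i])))
            title = " ".join(unique[:words_in_heading]) or "Untitled"
            out.insert(i, (f"Heading {levels[i]}", title))
    return out


paragraphs = [
    "Indexer Concordance Rules Engine overview",
    "the Table lists plain words",
    "another Table of words",
    "Toronto Office",
]
for style, text in add_headings(paragraphs):
    print(f"{style}: {text}")
```

The output is a list of style and text pairs; applying those styles inside Microsoft Word is a separate step that this sketch does not attempt.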
CodeText
Now we are going to take a mighty leap of faith. Well, maybe not so mighty …
CodeText in manual mode allows the operator to select a chunk of text and store it in a database, replacing the selected text with a link to the text in the database. We collapse a set of documents to a set of skeletons with links to common phrases.
What if we could identify all the common phrases across all the documents automatically?
We could do that by tagging each paragraph in each document with the set of UNIQUE Interesting Words (or if you prefer a finer-tuned application, “with the set of UNIQUE Interesting Words”).
We quantify our results and bring to the attention of the operator the most popular phrases as prime candidates for inclusion in the database.
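A minimal sketch of that tagging and counting, in Python, might run as follows. The document names, the cut-down Interesting Words rules, and the decision to call a tag “popular” once it turns up in more than one place are assumptions of mine; CodeText’s database is not modelled at all.

```python
import re
from collections import defaultdict


def tag(paragraph):
    """Tag a paragraph with its set of unique Interesting Words (cut-down rules)."""
    return frozenset(
        w for w in re.findall(r"\w+", paragraph)
        if 4 <= len(w) <= 24 and w[0].isupper() and not any(c.isdigit() for c in w)
    )


def candidates(documents):
    """Return (tag, locations) pairs for tags seen more than once, most popular first.

    `documents` maps a document name to its list of paragraphs; each
    location is a (document name, paragraph number) pair.
    """
    seen = defaultdict(list)
    for name, paragraphs in documents.items():
        for number, paragraph in enumerate(paragraphs):
            words = tag(paragraph)
            if words:                      # skip paragraphs with nothing Interesting
                seen[words].append((name, number))
    shared = [(words, places) for words, places in seen.items() if len(places) > 1]
    return sorted(shared, key=lambda item: -len(item[1]))


# Hypothetical documents, invented for this example.
documents = {
    "a.doc": ["Contact Greaves Consulting in Toronto today.", "nothing here"],
    "b.doc": ["intro", "Contact Greaves Consulting, Toronto."],
    "c.doc": ["Contact Greaves Consulting in Toronto today."],
}
for words, places in candidates(documents):
    print(sorted(words), places)
```

Keeping the list of locations alongside each tag is what lets the operator be shown every place the phrase was found.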
No Wait! There’s More!
Because we have automated this process we can automatically select the text for the operator; all the operator need do is confirm, by mouse-click or shortcut key, that the phrase should be processed.
No more “Oops! I misjudged the selection” events.
No Wait! There’s More!
(You knew this was coming)
Because we have automated this process we can process the chosen text in every document in which we found it during our analysis phase. All the operator need do is confirm, by mouse-click or shortcut key, that the phrase should be processed across all documents.
About Clichés
“A sentence or phrase, usually expressing a popular or common thought or idea”.
Clichés like “No Wait! There’s More!” are hackneyed, but in this case, justified.
If you are interested in implementing a novel idea on Knowledge Management, if you have been put in charge of a vast array of documents, if your staff and colleagues spend a great deal of their time fabricating documents, then perhaps you should contact me (416-621-9348) for a discussion of how we can work together to implement a solution to your problems.

