The Interesting Words Engine
We have seen one use of The Interesting Words Engine – to produce a front-end called Indxr, itself a useful end-user tool for indexing documents.
A developer with The Interesting Words Engine at hand can build many other tools for an organization.
A few are listed here.
Précis Generator
Microsoft Word provides a crude Précis Generator (Tools, AutoSummarize) that offers very little adaptability.
A much better Précis Generator can be written using The Interesting Words Engine.
Remember that The Interesting Words Engine processes a chunk of text. That chunk can be a sentence, a paragraph, a section or an entire document.
In particular the developer can single out the first sentence (the message-carrier) of any paragraph, and extract the Interesting Words from each first sentence throughout the document.
For each chunk of text we can count the Interesting Words, the Unique Interesting Words and, of course, the length of the chunk in both words and characters.
We can ask that only chunks matching a specific set of styles be examined.
A simple statistical sieve allows us to score and rank the chunks of text. The user nominates how many chunks (typically 5) are to appear in the Précis, and there you have it: a powerful Précis Generator, based on The Interesting Words Engine, that goes well beyond Microsoft's.
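The sieve described above can be sketched in a few lines of Python. This is only an illustration, not the Engine's own code: the stop-word list standing in for the definition of an Interesting Word, and the names interesting_words, first_sentence and precis, are all assumptions made for the example.

```python
import re

# Assumption: an "Interesting Word" is any word not in this small stop list.
# The Engine's real definition is not given here.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it",
              "that", "we", "for", "on", "was"}

def interesting_words(chunk):
    """Return the Interesting Words of a chunk, in order of appearance."""
    words = re.findall(r"[A-Za-z']+", chunk.lower())
    return [w for w in words if w not in STOP_WORDS]

def first_sentence(paragraph):
    """Return the text up to and including the first sentence terminator."""
    match = re.search(r"[.!?]", paragraph)
    return paragraph[: match.end()] if match else paragraph

def precis(paragraphs, how_many=5):
    """Pick the how_many best first sentences, kept in document order."""
    scored = []
    for position, paragraph in enumerate(paragraphs):
        sentence = first_sentence(paragraph).strip()
        unique = set(interesting_words(sentence))
        scored.append((len(unique), position, sentence))
    # The sieve: rank by Unique Interesting Word count, highest first.
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:how_many]
    # Restore document order so the Précis reads naturally.
    return [sentence for _, _, sentence in sorted(top, key=lambda item: item[1])]
```

Ranking on the count of Unique Interesting Words is one choice among several; the raw count, or a count divided by chunk length, would slot into the same sieve.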
Sensitive Documents
Microsoft Word provides no mechanism for de-sensitizing documents.
What makes a document sensitive? It contains words of interest to your competitors!
If we can detect Interesting Words (by your competitor's definition), then we can de-sensitize the document (before issuing it to an outside third-party or consultant).
Here's another clue: Any word that fails a spell-check (in a completed document) is probably a sensitive word.
Obfuscation: How a word is de-sensitized is once again left for the user to adapt: remove the word, replace the word's characters with a special character, replace the word's characters with a jumbled string, or let the replacement string be of random length, within constraints, but never equal in length to the original word.
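As a rough sketch of those obfuscation patterns, each one can be written as a small interchangeable function. The function names, the "#" mask character and the length limits of 3 to 12 are assumptions for illustration; the set of sensitive words is taken as given, however it was detected.

```python
import random
import re

def remove_word(word, rng):
    """Pattern 1: drop the word entirely."""
    return ""

def mask_word(word, rng, mask_char="#"):
    """Pattern 2: replace every character with a special character."""
    return mask_char * len(word)

def jumble_word(word, rng):
    """Pattern 3: replace the word with its own characters, shuffled."""
    letters = list(word)
    rng.shuffle(letters)
    return "".join(letters)

def random_length_mask(word, rng, min_len=3, max_len=12, mask_char="#"):
    """Pattern 4: a mask of random length, never the original length."""
    choices = [n for n in range(min_len, max_len + 1) if n != len(word)]
    return mask_char * rng.choice(choices)

def desensitize(text, sensitive, strategy, seed=None):
    """Apply one strategy to every word of text found in the sensitive set."""
    rng = random.Random(seed)

    def replace(match):
        word = match.group(0)
        return strategy(word, rng) if word.lower() in sensitive else word

    return re.sub(r"[A-Za-z0-9']+", replace, text)
```

For example, desensitize("Ship Zyntra now.", {"zyntra"}, mask_word) masks the made-up product name with six "#" characters. Note that a shuffle can, by chance, return the original spelling, and that removing a word leaves its surrounding spaces behind; a production version would want to handle both.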
Generating Novel Text from Thin Air
The client received unsolicited text across the internet. The text arrived from any source – pharmaceutical, biotechnology, the world of investments, aerospace, and so on.
The text was all too often poorly developed, with no headings: those stubby paragraphs styled as "Heading 1", "Heading 2", "Heading 3" and so on.
The challenge was to automate the task of producing heading paragraphs, each one at an appropriate level.
Using Interesting Words the task is easy!
Remember that we can analyze each paragraph and produce a count of Unique Interesting Words.
The count for a series of paragraphs might read 17, 3, 10, 4, 10, 15, 14, 5, 5, 3, 5, 9, 10, 3, 5 and so on.
We allow the user to assign a heading level number to each range of counts.
Paragraphs containing 12 or more Interesting Words might be classified as "Heading 1" level paragraphs; paragraphs containing 10 or 11 Interesting Words might be classified as "Heading 2" level paragraphs; paragraphs containing 8 or 9 Interesting Words might be classified as "Heading 3" level paragraphs while paragraphs containing 6 or 7 Interesting Words might be classified as "Heading 4" level paragraphs. All other paragraphs can be classified as level 5 paragraphs.
Reading the counts "17, 3, 10, 4, 10, 15, 14, 5, 5, 3, 5, 9, 10, 3, 5" backwards we can detect breaks in sequence and recognize a position for a heading. For example, after the sequence "5, 5, 3, 5" we see a "9", and figure that immediately preceding the "9" paragraph would be a good place to insert a stubby "Heading 3" paragraph.
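The classification and break detection can be sketched as follows. The ranges are the ones quoted above; the exact break rule is not spelled out in the text, so the rule used here (propose a heading before any non-body paragraph whose predecessor ranks lower) is an assumption, as are the names LEVEL_RANGES, heading_level and heading_positions. Because the rule only compares neighbours, it gives the same answer whether the counts are read forwards or backwards.

```python
# User-assigned ranges from the text: (minimum count, heading level).
LEVEL_RANGES = [(12, 1), (10, 2), (8, 3), (6, 4)]
BODY_LEVEL = 5  # every other paragraph

def heading_level(count, ranges=LEVEL_RANGES):
    """Map a count of Interesting Words to a heading level."""
    for minimum, level in ranges:
        if count >= minimum:
            return level
    return BODY_LEVEL

def heading_positions(counts):
    """Return (paragraph index, level) pairs where a heading could be inserted.

    Assumed rule: a heading goes immediately before a paragraph that is not
    body level and whose predecessor has a numerically greater level.
    """
    levels = [heading_level(c) for c in counts]
    positions = []
    for i, level in enumerate(levels):
        if level == BODY_LEVEL:
            continue
        if i == 0 or levels[i - 1] > level:
            positions.append((i, level))
    return positions

counts = [17, 3, 10, 4, 10, 15, 14, 5, 5, 3, 5, 9, 10, 3, 5]
```

With these counts, heading_positions(counts) includes the pair (11, 3): a "Heading 3" immediately before the "9" paragraph, matching the example in the text.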
So far so good; we can see where we would like to place a heading paragraph, and we know its level, but where's the text?
The Interesting Words Engine provides the text. We can emit, as a paragraph, all the Interesting Words of the paragraph, in sequence, or sorted by length of word, or just the unique words, and we are then in a position to do one of the following:
Build a Table of Contents for the document.
Generate a PowerPoint presentation from the document.
Turn the document over to a technical writer.
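The three ways of emitting heading text (in sequence, sorted by length, or unique words only) might look like this in Python. The stop-word list is a stand-in for the Engine's real definition of an Interesting Word, the function names are hypothetical, and sorting longest-first is an assumption, since the text does not say which direction is meant.

```python
import re

# Assumption: a word is Interesting if it is not in this small stop list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it",
              "that", "we", "for", "on", "was"}

def interesting_words(chunk):
    """Return the Interesting Words of a chunk, in order of appearance."""
    words = re.findall(r"[A-Za-z']+", chunk.lower())
    return [w for w in words if w not in STOP_WORDS]

def heading_text(paragraph, mode="sequence"):
    """Build the text of a stubby heading paragraph from its source paragraph."""
    words = interesting_words(paragraph)
    if mode == "unique":
        # dict.fromkeys drops repeats while keeping first-seen order.
        words = list(dict.fromkeys(words))
    elif mode == "by_length":
        # Longest first; words of equal length keep their original order.
        words = sorted(words, key=len, reverse=True)
    elif mode != "sequence":
        raise ValueError(f"unknown mode: {mode}")
    return " ".join(words)
```

The raw string of words is rarely a finished heading, which is why handing the result to a technical writer remains one of the options.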
Title Generator
The title of a document (File, Properties, Summary) can readily be generated automatically when the Title property is found to be empty.
Key Words Generator
The keywords of a document (File, Properties, Summary) can readily be generated automatically when the Keywords list is found to be empty.
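A minimal sketch of both generators follows, filling in Title and Keywords only when they are empty. The document properties are modelled as a plain dictionary rather than read from Word, and the choices made here (most frequent Interesting Words as keywords, the top three as a title, a stop list as the test of interest) are assumptions for illustration.

```python
import re
from collections import Counter

# Assumption: a word is Interesting if it is not in this small stop list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it",
              "that", "we", "for", "on", "was"}

def interesting_words(chunk):
    """Return the Interesting Words of a chunk, in order of appearance."""
    words = re.findall(r"[A-Za-z']+", chunk.lower())
    return [w for w in words if w not in STOP_WORDS]

def generate_keywords(text, how_many=5):
    """Return the most frequent Interesting Words, most frequent first."""
    counts = Counter(interesting_words(text))
    return [word for word, _ in counts.most_common(how_many)]

def fill_properties(properties, text):
    """Return a copy of properties with empty Title/Keywords filled in."""
    filled = dict(properties)
    if not (filled.get("Keywords") or "").strip():
        filled["Keywords"] = ", ".join(generate_keywords(text))
    if not (filled.get("Title") or "").strip():
        filled["Title"] = " ".join(generate_keywords(text, 3)).title()
    return filled
```

A property that already holds a value is left untouched, so an author's own title or keywords always win over the generated ones.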
Wbwrd – the Web Site Compiler
An application that generates and uploads a web site from a set of hyperlinked Microsoft Word documents can load the meta tags for description, title and keywords by default.
While it is rumored that some major search engines ignore the keywords for ranking purposes, the meta tags are employed in the brief display that identifies each "hit", as shown in the screenshot above.
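For illustration, the head entries for one page could be produced along these lines. This is a sketch, not Wbwrd's actual output: the function name meta_tags is hypothetical, and the title is written as an HTML title element alongside the two meta tags.

```python
from html import escape

def meta_tags(title, description, keywords):
    """Return the title element and description/keywords meta tags for a page."""
    # Escape everything, including quotes, so values are safe inside attributes.
    safe_title = escape(title)
    safe_description = escape(description, quote=True)
    safe_keywords = escape(", ".join(keywords), quote=True)
    return "\n".join([
        f"<title>{safe_title}</title>",
        f'<meta name="description" content="{safe_description}">',
        f'<meta name="keywords" content="{safe_keywords}">',
    ])
```

The title, description and keywords passed in would come from the document's own properties, generated automatically where the author left them empty.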
Other Uses
Applications based on Interesting Words are unlimited in number because, as cognitive beings, we seek order out of chaos, and to that end require information from data.
When the data is words, the top question will usually be "What is so interesting about these words?".
Toronto and Mississauga, Thursday, April 14, 2011 11:51 AM
Copyright © 1996-2011 Chris Greaves. All Rights Reserved.