
Abstract
Three functions are required for text mining:
- Concept Extraction from Texts:
Information extraction from texts (extracting facts, intentions, expectations,
claims, and so on) is performed by applying natural language processing
technology to a specific domain.
We must not only pick up keywords but also handle synonyms and
word-sense ambiguity. We use a synonym dictionary to equate synonymous
words and phrases, such as "PC" and "Personal computer".
An attribute is associated with each word or phrase to distinguish its meaning.
For example, to indicate the meaning of "Washington", we may use
"Washington[person name]" and "Washington[place name]".
Moreover, relationships between nominal and verbal elements are very important when representing the author's intention. For example, the following sentences:
- ThinkPad start up was faster when I installed Win98.
- ThinkPad doesn't start up fast although I installed Win98.
- Does ThinkPad start up faster if I install Win98?
- ThinkPad start up is fast because I uninstalled Win98.
look nearly identical if we only pick up keywords, although they express different facts and intentions. We have to consider the relationship between "start up" and "fast", and between "install/uninstall" and "Win98".
- Mining: We developed mining technologies to discover unknown knowledge from
collections of concepts that are extracted from documents as described
above. In text mining, we regard a document as a transaction of data,
and a concept, which consists of one or more words, as an item.
We can then apply data mining techniques to the text data to discover
association rules and other patterns. In text mining, specialized processing
of the retrieved data is necessary. Some concepts may be very important even
though they are infrequent, and recognizing the importance of such concepts
is a major problem.
- Information Visualization: The information obtained by mining is analyzed from various points of view.
The data obtained from the text are:
- word frequency
- relative frequency
- topicality
- area dependence
- time sequence
and so on.
The interactive mining-visualization mechanism helps us find useful facts in the mining database.
The visualization tool also lets users view the original documents to check mined results.
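The synonym dictionary and word-sense attributes described under Concept Extraction can be illustrated with a minimal sketch. It makes several assumptions. `SYNONYMS` and `SENSE_CUES` are hypothetical toy lexicons, not the authors' resources. Word sense is chosen by counting overlapping cue words, a deliberately simple stand-in for real disambiguation. Words absent from both lexicons are ignored. Phrases are matched longest-first, so "Personal computer" is found as one concept rather than two words.

```python
import string

# Hypothetical synonym dictionary: every surface form (lowercased) maps to one
# canonical concept, so "PC" and "Personal computer" count as the same concept.
SYNONYMS = {
    "pc": "personal computer",
    "personal computer": "personal computer",
}

# Hypothetical sense lexicon for ambiguous words: each attribute lists context
# words (illustrative cues, not a real resource) that suggest that sense.
SENSE_CUES = {
    "washington": {
        "person name": {"george", "president", "general"},
        "place name": {"in", "to", "state", "city"},
    },
}

# Longest phrase in the dictionary, in words, for longest-match lookup.
MAX_PHRASE_LEN = max(len(key.split()) for key in SYNONYMS)


def tokenize(text: str) -> list[str]:
    """Lowercase and strip surrounding punctuation from whitespace-separated tokens."""
    tokens = (t.strip(string.punctuation).lower() for t in text.split())
    return [t for t in tokens if t]


def disambiguate(word: str, context: set[str]) -> str:
    """Attach the attribute whose cue words overlap most with the context."""
    scores = {sense: len(cues & context) for sense, cues in SENSE_CUES[word].items()}
    best = max(scores, key=scores.get)
    # With no supporting cue, leave the word unannotated rather than guess.
    return f"{word}[{best}]" if scores[best] > 0 else word


def extract_concepts(text: str) -> list[str]:
    """Return dictionary concepts found in text; unknown words are ignored."""
    tokens = tokenize(text)
    concepts = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at position i first.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in SYNONYMS:
                concepts.append(SYNONYMS[phrase])
                i += n
                break
        else:
            # No synonym matched here; annotate the word if it is ambiguous.
            if tokens[i] in SENSE_CUES:
                context = set(tokens) - {tokens[i]}
                concepts.append(disambiguate(tokens[i], context))
            i += 1
    return concepts


print(extract_concepts("My PC is slow"))                    # ['personal computer']
print(extract_concepts("George Washington was president"))  # ['washington[person name]']
print(extract_concepts("I moved to Washington state"))      # ['washington[place name]']
```

The output format "washington[person name]" mirrors the attribute notation used in the text. A practical system would need much larger dictionaries and better disambiguation than cue counting.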
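The four ThinkPad sentences in the Concept Extraction item differ in negation, question form, connective, and install versus uninstall. This sketch shows one way such differences could be kept when the nominal-verbal relationships are represented explicitly. It is built entirely on assumptions. The `Relation` fields, the `CONNECTIVES` and `NEGATIONS` word lists, and the clause-splitting rule are all hypothetical. They are hand-written to cover only these four sentences and are not a general parser.

```python
from typing import NamedTuple, Optional

# Hypothetical word lists, sufficient only for the four example sentences.
CONNECTIVES = ("when", "although", "if", "because")
NEGATIONS = {"not", "doesn't", "don't", "didn't", "isn't"}


class Relation(NamedTuple):
    predicate: Optional[str]   # nominal-verbal pair found in the main clause
    negated: bool              # main clause contains a negation word
    question: bool             # sentence is a question
    connective: Optional[str]  # word linking main and subordinate clause
    condition: Optional[str]   # install/uninstall action in the subordinate clause


def relation_features(sentence: str) -> Relation:
    """Hand-written rules for the ThinkPad examples only; not a general parser."""
    text = sentence.lower().strip()
    question = text.endswith("?")
    words = text.rstrip("?.!").split()

    # Split at the first connective into main and subordinate clauses.
    connective = next((w for w in words if w in CONNECTIVES), None)
    if connective is None:
        main, sub = words, []
    else:
        k = words.index(connective)
        main, sub = words[:k], words[k + 1:]

    # Relation between "start up" and "fast" (prefix match also covers "faster").
    has_start_up = "start" in main and "up" in main
    has_fast = any(w.startswith("fast") for w in main)
    predicate = "start up/fast" if has_start_up and has_fast else None

    # Relation between install/uninstall and Win98 in the subordinate clause.
    action = None
    for w in sub:
        if w.startswith(("install", "uninstall")):
            action = "uninstall" if w.startswith("uninstall") else "install"
            break
    condition = f"{action} win98" if action and "win98" in sub else action

    negated = any(w in NEGATIONS for w in main)
    return Relation(predicate, negated, question, connective, condition)


SENTENCES = [
    "ThinkPad start up was faster when I installed Win98.",
    "ThinkPad doesn't start up fast although I installed Win98.",
    "Does ThinkPad start up faster if I install Win98?",
    "ThinkPad start up is fast because I uninstalled Win98.",
]

for s in SENTENCES:
    print(relation_features(s))
```

All four sentences share the same "start up/fast" predicate. Their `Relation` tuples still differ, which is the information a keyword-only representation loses.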
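The Mining item treats each document as a transaction and each concept as an item. Under that view, standard association-rule measures apply directly. The sketch below computes support and confidence for rules with one concept on each side, by brute-force pair counting. It is a swapped-in minimal technique, not Apriori and not the authors' implementation. The toy documents and the threshold values are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for single-item rules."""
    n = len(transactions)
    item_counts = Counter()  # number of documents containing each concept
    pair_counts = Counter()  # number of documents containing each concept pair
    for t in transactions:
        items = set(t)
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # infrequent pairs are dropped here
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Toy transactions: each document reduced to its set of extracted concepts.
DOCS = [
    {"thinkpad", "slow start up", "win98"},
    {"thinkpad", "slow start up", "win98"},
    {"thinkpad", "battery"},
    {"win98", "slow start up"},
]

print(association_rules(DOCS, min_support=0.5, min_confidence=0.8))
# [('slow start up', 'win98', 0.75, 1.0), ('win98', 'slow start up', 0.75, 1.0)]
```

The `min_support` threshold is what discards "battery" in this example. This illustrates why concepts that are important but infrequent are hard to surface with plain frequency-based mining.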
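Three of the measures listed under Information Visualization can be sketched over the same concept-per-document representation: word frequency, relative frequency, and time sequence. Several assumptions are made here, since the text does not define these measures. "Frequency" is taken as the number of documents containing a concept. "Relative frequency" is taken as a concept's document rate in a selected subset divided by its rate in the whole collection. Each document is assumed to carry a "YYYY-MM" period label. The data are invented, and topicality and area dependence are not modeled.

```python
from collections import Counter


def concept_frequency(docs):
    """Number of documents that contain each concept."""
    return Counter(c for _, concepts in docs for c in set(concepts))


def relative_frequency(subset, whole, concept):
    """Rate of the concept in a subset relative to its rate in the whole collection."""
    sub_rate = sum(concept in c for _, c in subset) / len(subset)
    all_rate = sum(concept in c for _, c in whole) / len(whole)
    return sub_rate / all_rate if all_rate else 0.0


def time_sequence(docs, concept):
    """Documents containing the concept per period, including periods with zero."""
    series = {period: 0 for period in sorted({p for p, _ in docs})}
    for period, concepts in docs:
        if concept in concepts:
            series[period] += 1
    return series


# Toy collection of (period, concepts) pairs.
DOCS = [
    ("1999-01", {"win98", "slow start up"}),
    ("1999-01", {"battery"}),
    ("1999-02", {"win98"}),
    ("1999-02", {"win98", "slow start up"}),
]

february = [d for d in DOCS if d[0] == "1999-02"]
print(concept_frequency(DOCS).most_common())
print(relative_frequency(february, DOCS, "win98"))
print(time_sequence(DOCS, "win98"))  # {'1999-01': 1, '1999-02': 2}
```

A relative frequency above 1 means the concept is over-represented in the subset, which is one way an interactive tool could highlight a concept for the analyst.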