About Text Mining
VAL |
Master, proFAN of love
Group: Administrators
Posts: 38059
User No.: 1
Registered: 6.03.2004
QUOTE | Text mining is the process of analyzing collections of textual materials in order to capture key concepts and themes and uncover hidden relationships and trends without requiring that you know the precise words or terms that authors have used to express those concepts. Although they are quite different, text mining is sometimes confused with information retrieval. While the accurate retrieval and storage of information is an enormous challenge, the extraction and management of quality content, terminology, and relationships contained within the information are crucial and critical processes. |
QUOTE | Text Mining and Data Mining
For each article of text, linguistic-based text mining returns an index of concepts, as well as information about those concepts. This distilled, structured information can be combined with other data sources to address questions such as:
Which concepts occur together?
What else are they linked to?
What higher level categories can be made from extracted information?
What do the concepts or categories predict?
How do the concepts or categories predict behavior?
Combining text mining with data mining offers greater insight than is available from either structured or unstructured data alone. This process typically includes the following steps:
1. Identify the text to be mined. Prepare the text for mining. If the text exists in multiple files, save the files to a single location. For databases, determine the field containing the text.
2. Mine the text and extract structured data. Apply the text mining algorithms to the source text.
3. Build concept and category models. Identify the key concepts and/or create categories. The number of concepts returned from the unstructured data is typically very large. Identify the best concepts and categories for scoring.
4. Analyze the structured data. Employ traditional data mining techniques, such as clustering, classification, and predictive modeling, to discover relationships between the concepts. Merge the extracted concepts with other structured data to predict future behavior based on the concepts. |
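To make the steps quoted above more concrete, here is a minimal Python sketch of my own. It is not how IBM SPSS Modeler Text Analytics works internally: real linguistic extraction is replaced by a plain dictionary lookup, and the `CONCEPTS` dictionary, the field names (`customer_id`, `churned`, `comment`) and the sample comments are all made up for illustration. It only shows the general idea of turning text into concept flags, counting which concepts occur together, and merging the flags with structured data.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical concept dictionary (an assumption): surface term -> concept label.
# A real text mining engine would derive concepts linguistically instead.
CONCEPTS = {
    "battery": "battery",
    "charger": "battery",
    "screen": "display",
    "display": "display",
    "price": "price",
    "expensive": "price",
}


def extract_concepts(text):
    """Return the set of concept labels whose terms appear in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {CONCEPTS[t] for t in tokens if t in CONCEPTS}


def cooccurrence(records):
    """Count how often each pair of concepts appears in the same record."""
    pairs = Counter()
    for text in records:
        for pair in combinations(sorted(extract_concepts(text)), 2):
            pairs[pair] += 1
    return pairs


def merge(rows, text_field):
    """Attach one 0/1 flag per concept to each structured row."""
    labels = sorted(set(CONCEPTS.values()))
    merged = []
    for row in rows:
        found = extract_concepts(row[text_field])
        flags = {f"concept_{c}": int(c in found) for c in labels}
        merged.append({**row, **flags})
    return merged


# Made-up sample data: a structured table with one free-text field.
rows = [
    {"customer_id": 1, "churned": 1,
     "comment": "Battery dies fast and the charger is expensive."},
    {"customer_id": 2, "churned": 0,
     "comment": "Great screen, fair price."},
    {"customer_id": 3, "churned": 1,
     "comment": "The display cracked; battery is weak too."},
]

pairs = cooccurrence(r["comment"] for r in rows)
table = merge(rows, "comment")

for pair, count in sorted(pairs.items()):
    print(pair, count)
for row in table:
    print(row["customer_id"], row["churned"],
          {k: v for k, v in row.items() if k.startswith("concept_")})
```

The resulting `table` is ordinary structured data, so the `concept_*` columns could be fed into clustering or classification next to a field such as `churned`, which is the "analyze the structured data" step from the quote.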
--------------------
VAL |
QUOTE | Text Analysis and Categorization
Text analysis, a form of qualitative analysis, is the extraction of useful information from text so that the key ideas or concepts contained within this text can be grouped into an appropriate number of categories. Text analysis can be performed on all types and lengths of text, although the approach to the analysis will vary somewhat.
Shorter records or documents are most easily categorized, since they are not as complex and usually contain fewer ambiguous words and responses. For example, with short, open-ended survey questions, if we ask people to name their three favorite vacation activities, we might expect to see many short answers, such as going to the beach, visiting national parks, or doing nothing. Longer, open-ended responses, on the other hand, can be quite complex and very lengthy, especially if respondents are educated, motivated, and have enough time to complete a questionnaire. If we ask people to tell us about their political beliefs in a survey or have a blog feed about politics, we might expect some lengthy comments about all sorts of issues and positions.
The ability to extract key concepts and create insightful categories from these longer text sources in a very short period of time is a key advantage of using IBM® SPSS® Modeler Text Analytics. This advantage is obtained through the combination of automated linguistic and statistical techniques to yield the most reliable results for each stage of the text analysis process. |
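For the short-answer case from the quote (favorite vacation activities), categorization can be roughed out with simple keyword rules. This is only a sketch under my own assumptions: the `RULES` table and the category names are invented, and keyword matching is a much cruder technique than the combined linguistic and statistical approach the quote describes.

```python
import re

# Hypothetical category rules (an assumption): category -> keywords that signal it.
RULES = {
    "beach": {"beach", "swimming", "ocean"},
    "nature": {"park", "parks", "hiking"},
    "relaxing": {"nothing", "relaxing", "sleeping"},
}


def categorize(answer):
    """Return the sorted list of categories whose keywords occur in the answer."""
    tokens = set(re.findall(r"[a-z]+", answer.lower()))
    matched = sorted(name for name, words in RULES.items() if tokens & words)
    # Answers that match no rule are kept visible instead of being dropped.
    return matched or ["uncategorized"]


answers = [
    "going to the beach",
    "visiting national parks",
    "doing nothing",
    "museums",
]

for answer in answers:
    print(answer, "->", categorize(answer))
```

Short answers like these mostly hit one rule each. Long, free-form responses would match many rules at once or none at all, which is where this kind of hand-written keyword list stops being enough.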
--------------------