An experiment in automatic indexing with Korean texts: A comparison of syntactico-statistical and manual methods
Seo, Eun-Gyoung
Permalink
https://hdl.handle.net/2142/21065
Description
Title
An experiment in automatic indexing with Korean texts: A comparison of syntactico-statistical and manual methods
Author(s)
Seo, Eun-Gyoung
Issue Date
1993
Director of Research (if dissertation) or Advisor (if thesis)
Smith, Linda C.
Doctoral Committee Chair(s)
Smith, Linda C.
Committee Member(s)
Allen, Bryce L.
Davis, Charles H.
Department of Study
Library and Information Science
Discipline
Library and Information Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Library Science
Computer Science
Language
eng
Abstract
This study was undertaken in order to develop practical automatic indexing techniques suitable for Korean natural language texts. The study had four purposes: to develop an automatic indexing system for Korean texts, to evaluate the efficiency of the automatic indexing system as compared with a manual indexing system, to compare the effectiveness of weighting algorithms, and to investigate the effect of abstract length.
The basic method of this automatic indexing system was to determine the syntactic category of each text word by dictionary look-up, and then to match sequences of category symbols against a dictionary of acceptable patterns. Sequences of text words that matched one of the patterns in the dictionary were extracted as content identifiers. Finally, the system selected highly ranked content identifiers from each document based on statistical (frequency of occurrence) information.
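The pipeline described above (category look-up, pattern matching, frequency ranking) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the lexicon, the category symbols, the pattern set, and the function names `tag`, `extract_candidates`, and `top_identifiers` are all hypothetical, and English words stand in for Korean text. Ranking here uses raw frequency of occurrence only.

```python
from collections import Counter

# Hypothetical toy lexicon mapping each word to a syntactic category symbol.
# (A = adjective/modifier, N = noun, D = determiner, V = verb)
LEXICON = {
    "automatic": "A",
    "indexing": "N",
    "system": "N",
    "the": "D",
    "evaluates": "V",
}

# Hypothetical dictionary of acceptable category patterns.
PATTERNS = {("N",), ("A", "N"), ("N", "N"), ("A", "N", "N")}
MAX_LEN = max(len(p) for p in PATTERNS)


def tag(words):
    """Look up the category of each word; unknown words get '?'."""
    return [LEXICON.get(w.lower(), "?") for w in words]


def extract_candidates(words):
    """Return every word sequence whose category sequence matches a pattern."""
    cats = tag(words)
    found = []
    for start in range(len(words)):
        for length in range(1, MAX_LEN + 1):
            end = start + length
            if end > len(words):
                break
            if tuple(cats[start:end]) in PATTERNS:
                found.append(" ".join(w.lower() for w in words[start:end]))
    return found


def top_identifiers(words, k=3):
    """Rank candidate identifiers by frequency and keep the k most frequent."""
    counts = Counter(extract_candidates(words))
    return [term for term, _ in counts.most_common(k)]


words = "The automatic indexing system evaluates the indexing system".split()
selected = top_identifiers(words, k=3)
```

In this toy input, "indexing", "indexing system", and "system" each occur twice and are selected ahead of the single-occurrence phrases. A real system would need a Korean morphological dictionary and a far larger pattern set.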
For this experimental study, the Korean text database was constructed manually based on 100 long abstracts and 200 short abstracts covering business subjects. The study assessed how well the set of index terms produced by an automatic indexing technique reflects the major topics described in an indexed document. For the evaluation, a manual index term list was constructed by consultation between two indexers as an external standard to obtain normalized values.
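One common way to score automatic index terms against a manual list is by set overlap. The sketch below is an assumption for illustration only: the abstract does not say which normalized measures the study used, so recall and precision are stand-ins, and the function name `overlap_scores` and the sample terms are hypothetical.

```python
def overlap_scores(automatic_terms, manual_terms):
    """Compare automatic index terms with a manual standard list.

    Returns (recall, precision), both normalized to the range 0..1.
    Terms are compared after trimming whitespace and lowercasing.
    """
    auto = {t.strip().lower() for t in automatic_terms}
    manual = {t.strip().lower() for t in manual_terms}
    matched = auto & manual
    recall = len(matched) / len(manual) if manual else 0.0
    precision = len(matched) / len(auto) if auto else 0.0
    return recall, precision


# Hypothetical term lists for a single abstract.
automatic = ["indexing system", "Korean text", "dictionary"]
manual = ["Indexing system", "korean text", "weighting", "abstract length"]
recall, precision = overlap_scores(automatic, manual)
```

Here two of the four manual terms are found (recall 0.5) and two of the three automatic terms are correct (precision about 0.67). Exact string matching is the simplest choice; partial matches between concept forms would need a looser comparison.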
The experimental results showed that the performance of the automatic syntactico-statistical indexing system was comparable to that reported in other studies comparing automatic indexing with manual indexing. The WDF system performed better than the IDF system in its ability to present all the correct content identifiers, and the system produced more correct content identifiers for the short abstract group. Overall, many significant concepts represented in the abstracts and recognized by human indexers were effectively extracted automatically. The extracted concept forms are for the most part comparable to those of manual indexing. Possible enhancements of the automatic syntactico-statistical indexing system are identified that could lead to improved indexing performance.