This book constitutes the refereed proceedings of the 5th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2004, held in Seoul, Korea, in February 2004. The 74 revised papers, presented together with 4 invited contributions, were carefully reviewed and selected from 129 submissions. The papers are organized in sections on computational linguistics formalisms, semantics and dialogue, syntax and parsing, lexical analysis, named entity recognition, anaphora resolution, lexicon and corpus, bilingual resources, machine translation, natural language generation, HCI applications, speech recognition, indexing, information retrieval, question answering and sentence retrieval, browsing, filtering, information extraction, text categorization, document clustering, summarization, and language identification.
Computational Linguistics.- Towards an LFG Syntax-Semantics Interface for Frame Semantics Annotation.- Projections from Morphology to Syntax in the Korean Resource Grammar: Implementing Typed Feature Structures.- A Systemic-Functional Approach to Japanese Text Understanding.- Building and Using a Russian Resource Grammar in GF.- An Application of a Semantic Framework for the Analysis of Chinese Sentences.- A Modal Logic Framework for Human-Computer Spoken Interaction.- Agents Interpreting Imperative Sentences.- Intention Retrieval with a Socially-Supported Belief System.- Extracting Domain Knowledge for Dialogue Model Adaptation.- A Probabilistic Chart Parser Implemented with an Evolutionary Algorithm.- Probabilistic Shift-Reduce Parsing Model Using Rich Contextual Information.- Evaluation of Feature Combination for Effective Structural Disambiguation.- Parsing Incomplete Sentences Revisited.- Unlexicalized Dependency Parser for Variable Word Order Languages Based on Local Contextual Pattern.- A Cascaded Syntactic Analyser for Basque.- An Analysis of Sentence Boundary Detection Systems for English and Portuguese Documents.- Towards Language-Independent Sentence Boundary Detection.- Korean Compound Noun Decomposition Using Syllabic Information Only.- Learning Named Entity Classifiers Using Support Vector Machines.- An Internet-Based Method for Verification of Extracted Proper Names.- Boundary Correction of Protein Names Adapting Heuristic Rules.- Word Sense Disambiguation Based on Weight Distribution Model with Multiword Expression.- Combining EWN and Sense-Untagged Corpus for WSD.- Feature Selection for Chinese Character Sense Discrimination.- The Role of Temporal Expressions in Word Sense Disambiguation.- An Empirical Study on Pronoun Resolution in Chinese.- Language-Independent Methods for Compiling Monolingual Lexical Data.- Getting One's First Million ...Collocations.- Automatic Syntactic Analysis for Detection of Word Combinations.- A Small System Storing 
Spanish Collocations.- A Semi-automatic Tree Annotating Workbench for Building a Korean Treebank.- Extracting Semantic Categories of Nouns for Syntactic Disambiguation from Human-Oriented Explanatory Dictionaries.- Hierarchies Measuring Qualitative Variables.- Substring Alignment Using Suffix Trees.- Exploiting Hidden Meanings: Using Bilingual Text for Monolingual Annotation.- Acquisition of Word Translations Using Local Focus-Based Learning in Ainu-Japanese Parallel Corpora.- Sentence Alignment for Spanish-Basque Bitexts: Word Correspondences vs. Markup Similarity.- Two-Level Alignment by Words and Phrases Based on Syntactic Information.- Exploiting a Mono-bilingual Dictionary for English-Korean Translation Selection and Sense Disambiguation.- Source Language Effect on Translating Korean Honorifics.- An Algorithm for Determining DingYu Structural Particle Using Grammar Knowledge and Statistical Information.- Generating Natural Word Orders in a Semi-free Word Order Language: Treebank-Based Linearization Preferences for German.- Guideline for Developing a Software Life Cycle Process in Natural Language Generation Projects.- A Plug and Play Spoken Dialogue Interface for Smart Environments.- Evaluation of Japanese Dialogue Processing Method Based on Similarity Measure Using tf* AoI.- Towards Programming in Everyday Language: A Case for Email Management.- Specifying Affect and Emotion for Expressive Speech Synthesis.- Overcoming the Sparseness Problem of Spoken Language Corpora Using Other Large Corpora of Distinct Characteristics.- A Syllabification Algorithm for Spanish.- Experiments on the Construction of a Phonetically Balanced Corpus from the Web.- Intelligent Text Processing.- Head/Modifier Frames for Information Retrieval.- Performance Analysis of Semantic Indexing in Text Retrieval.- A Model for Extracting Keywords of Document Using Term Frequency and Distribution.- A Combining Approach to Automatic Keyphrases Indexing for Chinese News Documents.- Challenges in 
the Interaction of Information Retrieval and Natural Language Processing.- The Challenge of Creative Information Retrieval.- Using T-Ret System to Improve Incident Report Retrieval.- Spanish Question Answering Evaluation.- Comparative Analysis of Term Distributions in a Sentence and in a Document for Sentence Retrieval.- Contextual Exploration of Text Collections.- Automatic Classification and Skimming of Articles in a News Video Using Korean Closed-Caption.- A Framework for Evaluation of Information Filtering Techniques in an Adaptive Recommender System.- Lexical Chains versus Keywords for Topic Tracking.- Filtering Very Similar Text Documents: A Case Study.- Using Information Extraction to Build a Directory of Conference Announcements.- Unsupervised Event Extraction from Biomedical Text Based on Event and Pattern Information.- Thai Syllable-Based Information Extraction Using Hidden Markov Models.- The Impact of Enriched Linguistic Annotation on the Performance of Extracting Relation Triples.- An kNN Model-Based Approach and Its Application in Text Categorization.- Automatic Learning Features Using Bootstrapping for Text Categorization.- Recomputation of Class Relevance Scores for Improving Text Classification.- Raising High-Degree Overlapped Character Bigrams into Trigrams for Dimensionality Reduction in Chinese Text Categorization.- Information Retrieval and Text Categorization with Semantic Indexing.- Sampling and Feature Selection in a Genetic Algorithm for Document Clustering.- A New Efficient Clustering Algorithm for Organizing Dynamic Data Collection.- Domain-Informed Topic Detection.- Assessing the Impact of Lexical Chain Scoring Methods and Sentence Extraction Schemes on Summarization.- A Term Weighting Method Based on Lexical Chain for Automatic Summarization.- Centroid-Based Language Identification Using Letter Feature Set.