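Corpus linguistics and computational linguistics are two foundational disciplines behind the rapid development of natural language processing. 《英語語料庫與自動語法分析》 (English Corpora and Automated Grammatical Analysis) is a monograph spanning both fields. Set against the background of the International Corpus of English, it concentrates on the grammatical analysis of large corpora, and in particular on the difficulties that spoken English material poses for automatic processing by computer. The book covers two main techniques, probability-based automatic word class identification and example-based automatic syntactic parsing. It devotes a chapter to the evaluation of parsing and gives an in-depth quantitative evaluation of the actual performance of two software systems, AUTASYS and The Survey Parser. It also examines the automatic analysis of prepositional phrases, especially the automatic determination of their syntactic functions, and explains how automatic grammatical analysis can be applied in speech synthesis and speech recognition.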
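The central idea of the book is to turn an already analysed corpus into a syntactic knowledge base, to extract phrase structure grammar rules from it, and to use example-based methods to retrieve from that knowledge base the best syntactic tree for each sentence to be analysed. The book describes the research on each of these components in detail, evaluates the actual performance of the system quantitatively and in depth, and devotes a chapter to the question of how parsing should be evaluated. It also examines the automatic analysis of prepositional phrases, especially the automatic determination of their syntactic functions, because this work is closely related to the analysis of syntactic similarity. Finally, it introduces and explains applications of automatic grammatical analysis in speech synthesis and speech recognition, in the hope that readers will find this useful.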
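The step of extracting phrase structure rules from an analysed corpus can be illustrated with a minimal Python sketch. It is not the book's implementation: the bracketed tree notation, the category labels (S, NP, VP and so on), and the function names `parse_bracketed`, `extract_rules` and `induce_grammar` are all assumptions made for illustration, and the labels are not ICE parsing symbols. The sketch reads each tree, records one rule per non-terminal node, and skips the tag-to-word level so that the rules contain no lexical items. The example-based retrieval of a best tree is not shown.

```python
from collections import Counter


def parse_bracketed(text):
    """Parse '(S (NP (PRON I)) (VP (V ran)))' into nested (label, children) tuples.

    The bracketed notation is an assumption for this sketch, not the ICE format.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read_node()


def extract_rules(tree, rules):
    """Add one 'parent -> child labels' rule per non-terminal node to `rules`."""
    label, children = tree
    # Skip tag -> word nodes so that the extracted rules stay free of words.
    if all(isinstance(child, str) for child in children):
        return
    rhs = tuple(child[0] if isinstance(child, tuple) else child for child in children)
    rules[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            extract_rules(child, rules)


def induce_grammar(treebank):
    """Count phrase structure rules over a list of bracketed trees."""
    rules = Counter()
    for line in treebank:
        extract_rules(parse_bracketed(line), rules)
    return rules


if __name__ == "__main__":
    # A toy two-sentence treebank with hypothetical labels.
    toy_treebank = [
        "(S (NP (PRON I)) (VP (V saw) (NP (ART the) (N dog))))",
        "(S (NP (ART the) (N dog)) (VP (V barked)))",
    ]
    for (lhs, rhs), count in induce_grammar(toy_treebank).most_common():
        print(f"{lhs} -> {' '.join(rhs)}  [{count}]")
```

Keeping a frequency count with each rule is a design choice of the sketch: it gives a simple way to rank competing rules, whereas a real system would use much richer information from the analysed corpus.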
Preface
Preface (in Chinese)
List of Figures
List of Tables
Abstract
1. Introduction
1.1. What is Parsing?
1.2. The Introspective View
1.3. The Retrospective View
1.4. Data-Oriented Parsing
1.5. General Problems
1.6. The Proposed Research
1.6.1. Background to the Proposed Research
1.6.2. The Basic Approach of the Proposed Research
1.6.3. The Strengths and Novelties of the Proposed Approach
1.6.3.1. Automated Grammar Generation
1.6.3.2. De-Lexicalised Terminal Nodes
1.6.3.3. Global Parse with Subcategorisation Features
1.6.3.4. High-Quality Partial Parse
1.6.3.5. Intrinsic Ability to Learn
1.7. The Organisation of the Book
2. The Automatic Analysis of English Word Classes
2.1. An Overview of Word Class Tagging
2.2. Major Word Class Tagging Schemes
2.2.1. The Lancaster-Oslo/Bergen Tagging Scheme
2.2.1.1. The Lancaster-Oslo-Bergen Corpus
2.2.1.2. The Lancaster-Oslo-Bergen Tag Set
2.2.1.3. Summary
2.2.2. The International Corpus of English Tagging Scheme
2.2.2.1. The International Corpus of English
2.2.2.2. The International Corpus of English Tag Set
2.2.3. A Comparison of LOB and ICE
2.3. Word Class Tagging Methodologies
2.3.1. The Rule-Based Approach
2.3.2. The Probabilistic Approach
2.4. AUTASYS: A Hybrid Tagging System
2.4.1. A Probabilistic Approach Using the LOB Tag Set
2.4.1.1. The Tag Assignment Module
2.4.1.1.1. Tokenisation
2.4.1.1.2. The treatment of "."
2.4.1.1.3. The treatment of "'"
2.4.1.1.4. Sentence boundary markers
2.4.1.2. Orthographic Analysis
2.4.1.3. Lexicon Lookup
2.4.1.3.1. The lexicon
2.4.1.3.2. The coverage of the lexicon
2.4.1.4. Morphological Analysis
2.4.2. The Idiom Identification Module
2.4.3. The Probabilistic Tag Selection Module
2.4.3.1. The Bigram Probabilistic Matrix
2.4.3.2. Implementing Probabilistic Tag Selection
2.4.4. The Rule-Based Refinement Module
2.4.5. Empirical Evaluation
2.4.6. Permissive AUTASYS-LOB Disagreements
2.4.6.1. NNP-NPT
2.4.6.2. JJ-JJB
2.4.6.3. NNP-NPL
2.4.6.4. RB-NN
2.4.7. Summary
2.5. A Rule-Based Approach towards LOB to ICE Translation
2.5.1. Solutions for Verbs
2.5.1.1. Auxiliary vs. Lexical
2.5.1.2. Monotransitive vs. Complex Transitive
2.5.1.3. Finite vs. Nonfinite
2.5.2. Closed Sets
2.5.3. Initial Results
2.5.4. Problems
2.5.5. Summary
3. The Automatic Induction of a Formal Grammar
4. Robust Practical Analogy-Based Parsing
5. Extensive Evaluations of the Survey Parser
6. The Resolution of Prepositional Phrases
7. Conclusions and Further Work
References
Appendix A: A List of LOB Tags
Appendix B: A List of ICE Tags
Appendix C: A List of AUTASYS Idioms
Appendix D: A List of ICE Parsing Symbols
Appendix E: A List of ICE Prepositions in Descending Frequency Order
Appendix F: A Distributional Profile of ICE-GB Prepositions
Index