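Memory-Based Language Processing (English reprint edition) is intended for students of computational linguistics and psycholinguistics and for language engineers. It covers memory-based natural language processing techniques in two parts: memory-based machine learning techniques, and the application of those techniques to natural language processing tasks. The book is clearly organized, accessible, and practical. Compared with many existing natural language processing techniques, the memory-based learning it presents is simple and practical. It also describes in detail TiMBL, the memory-based learning software package developed by the authors' team. Its descriptions of the methods and explanations of the underlying principles are intuitive, so even readers who are new to natural language processing, computational linguistics, or machine learning can follow the content.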
Walter Daelemans is a professor at the University of Antwerp, Belgium. Antal van den Bosch is a professor at Tilburg University, the Netherlands.
Preface
1 Memory-Based Learning in Natural Language Processing
1.1 Natural language processing as classification
1.2 A linguistic example
1.3 Roadmap and software
1.4 Further reading
2 Inspirations from linguistics and artificial intelligence
2.1 Inspirations from linguistics
2.2 Inspirations from artificial intelligence
2.3 Memory-based language processing literature
2.4 Conclusion
3 Memory and Similarity
3.1 German plural formation
3.2 Similarity metric
3.2.1 Information-theoretic feature weighting
3.2.2 Alternative feature weighting methods
3.2.3 Getting started with TiMBL
3.2.4 Feature weighting in TiMBL
3.2.5 Modified value difference metric
3.2.6 Value clustering in TiMBL
3.2.7 Distance-weighted class voting
3.2.8 Distance-weighted class voting in TiMBL
3.3 Analyzing the output of MBLP
3.3.1 Displaying nearest neighbors in TiMBL
3.4 Implementation issues
3.4.1 TiMBL trees
3.5 Methodology
3.5.1 Experimental methodology in TiMBL
3.5.2 Additional performance measures in TiMBL
3.6 Conclusion
4 Application to morpho-phonology
4.1 Phonemization
4.1.1 Memory-based word phonemization
4.1.2 TreeTalk
4.1.3 IGTree in TiMBL
4.1.4 Experiments: applying IGTree to word phonemization
4.1.5 TRIBL: trading memory for speed
4.1.6 TRIBL in TiMBL
4.2 Morphological analysis
4.2.1 Dutch morphology
4.2.2 Feature and class encoding
4.2.3 Experiments: MBMA on Dutch wordforms
4.3 Conclusion
5 Application to shallow parsing
5.1 Part-of-speech tagging
5.1.1 Memory-based tagger architecture
5.1.2 Results
5.2 Constituent chunking
5.2.1 Results
5.2.2 Using Mbt and Mbtg for chunking
5.3 Relation finding
5.3.1 Relation finder architecture
5.3.2 Results
5.4 Conclusion
6 Abstraction and generalization
6.1 Lazy versus eager learning
6.1.1 Benchmark language learning tasks
6.1.2 Forgetting by rule induction is harmful in language learning
6.2 Editing examples
6.3 Why forgetting examples can be harmful
6.4 Generalizing examples
6.4.1 Careful abstraction in memory-based learning
6.4.2 Getting started with FAMBL
6.4.3 Experiments with FAMBL
6.5 Conclusion
6.6 Further reading
7 Extensions
7.1 Wrapped progressive sampling
7.1.1 The wrapped progressive sampling algorithm
7.1.2 Getting started with wrapped progressive sampling
7.1.3 Wrapped progressive sampling results
7.2 Optimizing output sequences
7.2.1 Stacking
7.2.2 Predicting class n-grams
7.2.3 Combining stacking and class n-grams
7.2.4 Summary
7.3 Conclusion
7.4 Further reading
Bibliography
Index