Source: a list of NLP learning resources compiled by Melanie Tosik (Twitter: @meltomene)
# Online courses
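Stanford CS224d: Deep Learning for Natural Language Processing [more advanced machine learning algorithms, deep learning, and neural network architectures for NLP]
Coursera: Introduction to Natural Language Processing [NLP course offered by the University of Michigan]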
# Libraries and open source
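spaCy (website, blog) [Python; emerging open-source library with excellent usage examples, API documentation, and demo applications]
Natural Language Toolkit (NLTK) (website, book) [Python; practical introduction to programming for NLP, mainly used for teaching]
Stanford CoreNLP (website) [Java; high-quality natural language analysis toolkit]
AllenNLP (website) [Python; NLP research library built on PyTorch]
fastText (website) [C++; efficient tool for text classification and representation learning]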
# Active blogs
language processing blog (Hal Daumé III)
Language Log (Mark Liberman)
# Books
Speech and Language Processing (Jurafsky and Martin) [classic NLP textbook covering all the basics of NLP; 3rd edition forthcoming]
Foundations of Statistical Natural Language Processing (Manning and Schütze) [more advanced statistical NLP methods]
Introduction to Information Retrieval (Manning, Raghavan and Schütze) [excellent reference on ranking and search]
Neural Network Methods in Natural Language Processing (Goldberg) [in-depth introduction to neural network approaches to NLP, with a companion primer]
Linguistic Fundamentals for Natural Language Processing (Bender) [morphology and syntax basics for more successful NLP]
Deep Learning (Goodfellow, Bengio and Courville) [good introduction to deep learning]
# Miscellaneous
Deep Learning for NLP resources [overview of state-of-the-art deep learning resources, organized by topic]
Last Words: Computational Linguistics and Deep Learning, a look at the importance of Natural Language Processing (Manning) [article]
Natural Language Understanding with Distributed Representation (Cho) [self-contained lecture notes on ML/NN approaches to NLU]
Bayesian Inference with Tears (Knight) [tutorial workbook]
Association for Computational Linguistics (ACL) [anthology of journal and conference papers]
Natural Language Understanding and Computational Semantics (Bowman) [open-source course syllabus and full slides]
fast.ai [“Making neural nets uncool again”]
# DIY projects and data sets
Nicolas Iderhoff has created a public, thorough list of NLP data sets. Beyond these, here are some recommended projects:
Implement a part-of-speech (POS) tagger based on a hidden Markov model (HMM)
Implement the CYK algorithm for parsing context-free grammars
Implement a measure of semantic similarity between two given words in a collection of text, e.g. pointwise mutual information (PMI)
Implement a Naive Bayes classifier to filter spam
Implement a spell checker based on edit distances between words
Implement a Markov chain text generator
Implement a topic model using latent Dirichlet allocation (LDA)
Use word2vec to generate word embeddings from a large text corpus, e.g. Wikipedia
Use k-means to cluster tf-idf vectors of text, e.g. news articles
Implement a named-entity recognizer (NER) (also called a name tagger), e.g. following the CoNLL-2003 shared task
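
As a starting point for the spell checker project above, here is a minimal sketch in Python. It assumes Levenshtein distance (insertions, deletions, and substitutions, each costing 1) as the edit distance; the original list does not say which variant to use. The function names `edit_distance` and `suggest`, the `max_distance` threshold, and the toy vocabulary are all hypothetical, chosen only for illustration.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (unit-cost edits)."""
    # prev[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def suggest(word: str, vocabulary: Iterable[str],
            max_distance: int = 2) -> List[str]:
    """Return vocabulary words within max_distance of word, closest first."""
    scored = [(edit_distance(word, v), v) for v in vocabulary]
    return [v for d, v in sorted(scored) if d <= max_distance]


if __name__ == "__main__":
    # Toy vocabulary; a real checker would load a word list or corpus counts.
    vocab = ["language", "linguistics", "learning", "parsing"]
    print(suggest("langauge", vocab))
```

Comparing the input against every vocabulary word is fine for a small word list but slow for a large one. A fuller version might also rank candidates by corpus frequency, or treat a swap of adjacent letters as a single edit.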
# NLP on social media
Twitter: #nlproc, list of NLPers (by Jason Baldridge)
Reddit: /r/LanguageTechnology
Medium: NLP