Natural Language Processing (NLP) is one of the most interesting sub-fields of data science, and data scientists are increasingly expected to be able to whip up solutions that involve the exploitation of unstructured text data. Despite this, many applied data scientists (both from STEM and social science backgrounds) lack NLP experience.
In this post I explore some fundamental NLP concepts and show how they can be implemented using the increasingly popular spaCy package in Python. This post is for the absolute NLP beginner, but knowledge of Python is assumed.
spaCy, you say?
spaCy is a relatively new package for “Industrial strength NLP in Python” developed by Matt Honnibal at Explosion AI. It is designed with the applied data scientist in mind, meaning it does not weigh the user down with decisions over what esoteric algorithms to use for common tasks and it’s fast. Incredibly fast (it’s implemented in Cython). If…
View original post 1,037 more words