Natural Language Processing (NLP) is concerned with extracting useful information from text. Most introductory NLP demonstrations focus on sentiment analysis, i.e. categorizing an opinion as positive or negative. In this demonstration I tackle a more sophisticated task, namely aspect-entity extraction.
I’ll show how we can build a machine learning pipeline for analysing customer reviews of restaurants that identifies:
- All entities for which opinions are being expressed, e.g. the restaurant, the food, or the service, along with their specific “target” words or “opinion target expressions” (OTEs), e.g. the waiter, the roast beef, the wine, the decor.
- The particular aspect of those entities being discussed, e.g. quality or price.
- And the particular sentiment being expressed towards the aspect-entity pair; i.e. positive, negative, or neutral.
In the process, we’ll see how to implement several important NLP techniques, including:
- Text cleaning, contraction expansion, and lemmatization.
- Converting text sequence elements into pretrained embedding vectors.
- Creating Part-of-Speech (POS) tags.
- Sequence-to-sequence IOB2 tagging for OTE identification.
- Creating an end-to-end inference pipeline.
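To make the IOB2 scheme mentioned above concrete, here is a minimal sketch in plain Python. It assumes the OTE positions are already known as token spans; the helper name `iob2_encode`, the bare `B`/`I`/`O` labels, and the example sentence are illustrative assumptions, not the code used later in the notebook. In IOB2, the first token of every target is tagged `B`, any following tokens of the same target are tagged `I`, and everything else is tagged `O`.

```python
def iob2_encode(tokens, target_spans):
    """Return one IOB2 tag per token.

    target_spans holds (start, end) token indices, end exclusive,
    one pair per opinion target expression (hypothetical input format).
    """
    tags = ["O"] * len(tokens)          # default: token is outside any target
    for start, end in target_spans:
        tags[start] = "B"               # first token of a target
        for i in range(start + 1, end):
            tags[i] = "I"               # remaining tokens of the same target
    return tags


tokens = ["the", "roast", "beef", "was", "great", "but", "the", "wine", "was", "flat"]
spans = [(1, 3), (7, 8)]                # "roast beef" and "wine"
tags = iob2_encode(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token:>6}  {tag}")
```

A tagging model is then trained to predict this tag sequence from the token sequence, and the predicted `B`/`I` runs are read back out as target expressions.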
Our final network will be able to take the following example review sentence…
- “Service was terribly slow and the restaurant was noisy, but the waiter was friendly and the calamari was very delicious.”
…and produce the following output tuples:
- (“service”, SERVICE:GENERAL, negative)
- (“restaurant”, AMBIENCE:GENERAL, negative)
- (“waiter”, SERVICE:GENERAL, positive)
- (“calamari”, FOOD:QUALITY, positive)
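One way to hold these output tuples in code is sketched below. The `OpinionTuple` class and its field names are hypothetical and chosen only for illustration; the one thing taken from the text is the `ENTITY:ASPECT` format of the category label.

```python
from typing import NamedTuple


class OpinionTuple(NamedTuple):
    target: str      # the opinion target expression found in the sentence
    category: str    # entity and aspect joined as "ENTITY:ASPECT"
    polarity: str    # "positive", "negative", or "neutral"

    @property
    def entity(self) -> str:
        return self.category.split(":")[0]

    @property
    def aspect(self) -> str:
        return self.category.split(":")[1]


# The four tuples listed above for the example sentence.
expected = [
    OpinionTuple("service", "SERVICE:GENERAL", "negative"),
    OpinionTuple("restaurant", "AMBIENCE:GENERAL", "negative"),
    OpinionTuple("waiter", "SERVICE:GENERAL", "positive"),
    OpinionTuple("calamari", "FOOD:QUALITY", "positive"),
]

# Example query: which targets received negative opinions?
negative_targets = [t.target for t in expected if t.polarity == "negative"]
print(negative_targets)
```

Keeping the entity and aspect in a single label mirrors the tuples shown above, while the two properties make it easy to filter by either part.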
Notes:
- This IPython notebook is best viewed using Google Chrome; some images and hyperlinks may not work in Mozilla Firefox.