Wine Recommendation Using the Aroma Wheel


In this project, I took a modified version of the famous wine aroma wheel and used aroma word counts to represent the characteristics of French wine. Don’t know how to choose a bottle of wine? Start from your favorite aroma!

Question

How can we describe the flavor of wine using a large amount of review data and make recommendations to consumers?

Quick Summary

This project used aroma word counts to represent the characteristics of wine from different French regions. A system built on these counts can recommend wine with a similar aroma to customers (e.g. "You may also want to try the wine from a particular region, because 30% of the wine there was described using the term “apple”"). A quick project walkthrough presentation can be found here. Below is an introduction to the methodology used.

Dataset

The dataset used for this project contains more than 130,000 wine reviews from the Wine Enthusiast website. The data has been preprocessed and published on Kaggle. Besides the reviews that sommeliers wrote about each wine, it includes columns such as country, designation, points, price, province, region, title, variety, and winery.

The Aroma Wheel

Aroma words, taken from a modified version of the wine aroma wheel (Noble et al., 1987), were used to describe wine from different regions of France. Here are some example words from the aroma wheel, with their category in parentheses: iris, jasmine, rose (flower); apple, pear, peach (tree fruit); grass, tomato (vegetable); butter, cream, and lager (microbial).
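
As a rough illustration, the example words above could be stored as a small category-to-terms mapping with a reverse lookup. This is only a sketch: the names `AROMA_WHEEL`, `TERM_TO_CATEGORY`, and `category_of` are hypothetical, and the mapping contains just the examples listed here, not the full wheel used in the project.

```python
# Only the example terms mentioned in this post; the real wheel is larger.
AROMA_WHEEL = {
    "flower": ["iris", "jasmine", "rose"],
    "tree fruit": ["apple", "pear", "peach"],
    "vegetable": ["grass", "tomato"],
    "microbial": ["butter", "cream", "lager"],
}

# Reverse index: aroma term -> category, for quick lookups while scanning reviews.
TERM_TO_CATEGORY = {
    term: category
    for category, terms in AROMA_WHEEL.items()
    for term in terms
}


def category_of(word):
    """Return the aroma category of a word, or None if it is not an aroma term."""
    return TERM_TO_CATEGORY.get(word.lower())


print(category_of("Apple"))
print(category_of("oak"))
```

With this toy mapping, "Apple" resolves to the tree fruit category, while a word outside the listed examples (such as "oak") returns `None`.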

Use Aroma Word Count To Represent Wine Flavor

In this project, I used wine from different regions in France to demonstrate how a group of wines can be represented using aroma words. The detailed steps follow:

  1. All wine reviews were tokenized, with stop words and punctuation removed.
  2. Aroma terms were extracted from wine reviews and counted per region.
  3. Each aroma term count was divided by the region's total aroma word count. The wine from any region can thus be represented as a profile such as AromaA 5%, AromaB 3%, etc., with the percentages adding up to 100%.
  4. The similarity between wine from different regions can be calculated as the Euclidean distance between the regions' aroma-term percentage profiles, and recommendations can then be made (e.g. wine with a similar or a different aroma).
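
The four steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's actual code: the aroma lexicon, the stop-word list, the region names, and the reviews are all made up for the example, and the helper names (`tokenize`, `aroma_profile`, `euclidean`, `most_similar`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicon; the project used a fuller modified aroma wheel.
AROMA_TERMS = {"apple", "pear", "peach", "rose", "grass", "butter", "cream"}
# Tiny illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"a", "an", "and", "the", "of", "with", "on", "is", "this"}


def tokenize(review):
    """Step 1: lowercase, drop punctuation, remove stop words."""
    words = re.findall(r"[a-z]+", review.lower())
    return [w for w in words if w not in STOP_WORDS]


def aroma_profile(reviews):
    """Steps 2-3: count aroma terms, then divide by the total aroma count."""
    counts = Counter()
    for review in reviews:
        counts.update(t for t in tokenize(review) if t in AROMA_TERMS)
    total = sum(counts.values())
    return {t: (counts[t] / total if total else 0.0) for t in AROMA_TERMS}


def euclidean(p, q):
    """Step 4: Euclidean distance between two aroma profiles."""
    return math.sqrt(sum((p[t] - q[t]) ** 2 for t in AROMA_TERMS))


# Made-up reviews for three placeholder regions.
reviews_by_region = {
    "Region A": ["Crisp apple and pear, with a hint of grass.", "Apple on the nose."],
    "Region B": ["Ripe apple and pear with a touch of peach."],
    "Region C": ["Butter and cream, rich and round.", "Cream and a rose note."],
}
profiles = {r: aroma_profile(revs) for r, revs in reviews_by_region.items()}


def most_similar(region):
    """Recommend the other region whose profile is closest to this one."""
    others = (r for r in profiles if r != region)
    return min(others, key=lambda r: euclidean(profiles[region], profiles[r]))


print(most_similar("Region A"))  # prints: Region B
```

In this toy data, Region A's profile is 50% apple, 25% pear, and 25% grass, so it sits closer to the fruit-driven Region B than to the butter-and-cream Region C. Recommending a contrasting aroma instead would only require swapping `min` for `max`.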

Reference

Noble, A. C., R. A. Arnold, J. Buechsenstein, E. J. Leach, J. O. Schmidt, and P. M. Stern. 1987. Modification of a standardized system of wine aroma terminology. American Journal of Enology and Viticulture 38:143–46.
