Lab 7 Solution: Analyze ratings with Crunch
This is the solution page for Lab 7: Analyze ratings with Crunch.
Steps
1. Build and run the Crunch job
cd /home/cloudera/ratings-crunch
git checkout bimodal
mvn clean package
mvn kite:run-tool
2. Look at the code
Read through the source at src/main/java/org/kitesdk/examples/movies/AnalyzeRatings.java
.
Good questions earn prizes.
3. Find an interesting movie
You can use a SQL join query to view the title and the ratings histogram at once, using Hive.
Using beeline
will produce pretty output. Start beeline
in embedded mode:
beeline -u jdbc:hive2://
select m.title, h.histogram from movies as m, ratings_histograms as h where m.id = h.movie_id;
+----------------------------------------------------------------------------------+--------------------+--+
| m.title | h.histogram |
+----------------------------------------------------------------------------------+--------------------+--+
| ... | |
| Bio-Dome (1996) | [16,5,8,1,1] |
| ... | |
+----------------------------------------------------------------------------------+--------------------+--+
50 rows selected (42.249 seconds)
That can’t be right. Bio-Dome is a classic!