Learning Machine Learning (Recommendation System)

By Trillium Tay

Learning outcomes
• Learnt about Python's pandas and numpy libraries
• Used scikit-learn: sklearn.linear_model for Logistic Regression, sklearn.decomposition for TruncatedSVD, sklearn.neighbors for NearestNeighbors and sklearn.metrics for classification_report
Task 1 (01_02): Recommending places based on popularity (number of ratings).

1. According to the data, the place with ID 135085 (Tortas) has the highest number of ratings; together with the next 4 places it makes up the top 5
2. Next, list the cuisines served by the top 5 places by number of ratings
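The two steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the original notebook: the tiny `ratings` and `cuisine` tables, their column names (`userID`, `placeID`, `rating`, `Rcuisine`) and the place IDs are hypothetical stand-ins for the real dataset.

```python
import pandas as pd

# Hypothetical toy ratings (user, place, rating); IDs are made up.
rows = [
    ("U1", 101, 2), ("U2", 101, 1), ("U3", 101, 2), ("U4", 101, 1),
    ("U1", 102, 1), ("U2", 102, 2), ("U3", 102, 0),
    ("U1", 103, 2), ("U2", 103, 1),
    ("U3", 104, 1), ("U4", 104, 2),
    ("U1", 105, 0), ("U4", 105, 1),
    ("U2", 106, 2),
]
ratings = pd.DataFrame(rows, columns=["userID", "placeID", "rating"])

# Hypothetical cuisine lookup table (one cuisine per place here).
cuisine = pd.DataFrame({
    "placeID": [101, 102, 103, 104, 105, 106],
    "Rcuisine": ["Fast_Food", "Mexican", "Fast_Food", "Cafeteria", "Bar", "Mexican"],
})

# Popularity = how many ratings each place received, most-rated first.
rating_count = (
    ratings.groupby("placeID")["rating"]
    .count()
    .sort_values(ascending=False)
    .rename("rating_count")
)
top5 = rating_count.head(5)

# List the cuisines served by the five most-rated places.
top5_cuisines = top5.reset_index().merge(cuisine, on="placeID", how="left")
print(top5_cuisines)
```

Counting ratings is the simplest popularity signal; it ignores how good the ratings are, which is what the correlation-based approach in the next task addresses.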
Task 2 (01_03): Recommending similar places using the Pearson correlation (PearsonR) of user ratings and the cuisine type of both places.

1. As Tortas (ID 135085) has the highest number of ratings, the following steps find other places that users rate similarly to Tortas and that serve the same cuisine, so they can be recommended to the user

2. The steps are: create a pivot table of users and places, get each user's rating of Tortas, and compute the PearsonR value between Tortas and every other place based on user ratings.

3. Sort places by PearsonR value in descending order, and keep the placeIDs that have at least 10 ratings and a PearsonR value not equal to 1.0
• A value of exactly 1.0 usually means that very few users (as few as two) rated both Tortas and that place and happened to rate them consistently, which is not reliable enough to recommend the place to a Tortas rater (user)

4. Take the placeIDs from Step 3 and look up the cuisine each one serves. A place that serves the same type of cuisine as Tortas, here Restaurante El Reyecito (ID 135046), would be relatively safe to recommend to the user.
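The four steps above can be sketched in pandas as follows, assuming a long-format ratings table with hypothetical columns `userID`, `placeID`, `rating` and a cuisine table with a hypothetical `Rcuisine` column. Place 1 plays the role of Tortas, all IDs and ratings are made up, and `MIN_RATINGS` is lowered to 3 only because the toy data is tiny.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings; place 1 stands in for Tortas (IDs are made up).
rows = [
    ("A", 1, 2), ("B", 1, 1), ("C", 1, 0), ("D", 1, 2), ("E", 1, 1),
    ("A", 2, 2), ("B", 2, 1), ("C", 2, 0), ("D", 2, 1),
    ("A", 3, 2), ("B", 3, 1), ("F", 3, 0),
    ("A", 4, 2), ("B", 4, 0), ("C", 4, 0), ("D", 4, 2), ("E", 4, 0),
]
ratings = pd.DataFrame(rows, columns=["userID", "placeID", "rating"])
cuisine = pd.DataFrame({
    "placeID": [1, 2, 3, 4],
    "Rcuisine": ["Fast_Food", "Fast_Food", "Fast_Food", "Mexican"],
})
target = 1
MIN_RATINGS = 3  # the write-up uses 10 on the full dataset

# Step 1: pivot into a users x places utility matrix (NaN = not rated).
matrix = ratings.pivot_table(index="userID", columns="placeID", values="rating")

# Step 2: every user's rating of the target place.
target_ratings = matrix[target]

# Pearson r between the target and each other place; pandas only uses
# users who rated both places (NaN pairs are dropped).
similar = matrix.corrwith(target_ratings).drop(target)

# Step 3: attach rating counts, then drop thinly rated places and r == 1.0
# (np.isclose guards against r coming out as 0.9999999999999998).
counts = ratings.groupby("placeID")["rating"].count()
summary = pd.DataFrame({"pearsonR": similar, "rating_count": counts}).dropna()
keep = (summary["rating_count"] >= MIN_RATINGS) & ~np.isclose(summary["pearsonR"], 1.0)
candidates = summary[keep].sort_values("pearsonR", ascending=False)

# Step 4: keep only candidates that serve a cuisine the target also serves.
target_cuisines = set(cuisine.loc[cuisine["placeID"] == target, "Rcuisine"])
same_cuisine_ids = cuisine.loc[cuisine["Rcuisine"].isin(target_cuisines), "placeID"]
recommended = candidates[candidates.index.isin(same_cuisine_ids)]
print(recommended)
```

In this toy data, place 3 shows why r = 1.0 is suspicious: it shares only two raters with the target, and two points always lie on a perfect line. Place 4 correlates slightly better than place 2 but serves a different cuisine, so only place 2 survives Step 4.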

Task 3 (02_01): Using Logistic Regression (predicting a categorical outcome from predictor variables) to decide whether to recommend the bank's special term deposit offer to a user.

1. Using "bank_full_w_dummy_vars.csv" as training data, use the 19 other binary variables as predictors to decide whether a user should be recommended the deposit offer (the y_binary column)
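A hedged sketch of this step with scikit-learn's `LogisticRegression`. Since the CSV is not reproduced here, the 19 binary predictors and the binary target are randomly generated, and the rule linking them (only the first three predictors matter) is an assumption made up purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in: 500 customers x 19 binary (dummy) predictor variables.
X = rng.integers(0, 2, size=(500, 19))

# Hypothetical ground truth: only the first three predictors affect the outcome.
logits = 2.0 * X[:, 0] + 1.5 * X[:, 1] - 1.0 * X[:, 2] - 1.0
y = (rng.random(500) < 1 / (1 + np.exp(-logits))).astype(int)  # 1 = subscribed

model = LogisticRegression()
model.fit(X, y)

# Score a new (synthetic) customer: predicted class and probability of class 1.
new_customer = rng.integers(0, 2, size=(1, 19))
recommend = model.predict(new_customer)[0]  # 1 = recommend the offer
prob_yes = model.predict_proba(new_customer)[0, 1]
print(recommend, round(prob_yes, 3))
```

`predict` thresholds the probability at 0.5; using `predict_proba` directly lets a bank rank customers and only contact the most likely subscribers.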
Task 4 (02_02): Using TruncatedSVD to compress the sparse utility matrix (where most entries are unrated) into a few latent components, then using PearsonR to recommend a similar movie by user ratings.

1. As in Task 2 (01_03), make a utility matrix of user ID, movie name and rating value
2. Transpose the matrix so that each row is a movie, then use TruncatedSVD to compress the user dimension into 12 latent components
3. Use numpy's corrcoef on the truncated matrix to get a 1664 x 1664 table of PearsonR values, one row and one column per movie
4. Take the column for a chosen movie (by its index) from the 1664 x 1664 matrix, and recommend the movies with the highest PearsonR values
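A sketch of the same pipeline on a small random utility matrix. The ratings, the `Movie i` titles and the `random_state` values are placeholders (the real matrix has 1664 movies, this one has 20). Unrated cells are filled with 0 because TruncatedSVD does not accept NaN.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(42)
n_users, n_movies = 60, 20
movie_names = [f"Movie {i}" for i in range(n_movies)]  # hypothetical titles

# Step 1: sparse synthetic utility matrix, users x movies, 0 = not rated.
ratings = rng.integers(1, 6, size=(n_users, n_movies)).astype(float)
ratings[rng.random((n_users, n_movies)) < 0.7] = 0.0
utility = pd.DataFrame(ratings, columns=movie_names)

# Step 2: transpose so rows are movies, then compress the user axis to 12 components.
X = utility.T.values  # shape (n_movies, n_users)
svd = TruncatedSVD(n_components=12, random_state=17)
reduced = svd.fit_transform(X)  # shape (n_movies, 12)

# Step 3: Pearson correlation between every pair of movies (rows of `reduced`).
corr = np.corrcoef(reduced)  # shape (n_movies, n_movies)

# Step 4: take one movie's column and rank the other movies by r.
idx = movie_names.index("Movie 3")
scores = pd.Series(corr[:, idx], index=movie_names).drop("Movie 3")
top_similar = scores.sort_values(ascending=False).head(5)
print(top_similar)
```

Correlating the 12-component profiles rather than the raw, mostly-zero rating columns means two movies can look similar because they share latent taste patterns, even when few users rated both.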
Task 5 (02_03): Using Nearest Neighbors, recommend a similar item based on its content (feature values).

1. Read the dataset, then compare a test vector of values for the 4 columns (mpg, disp, hp and wt) against every row to find the row most similar to it (index 22 in this dataset)
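A minimal sketch of the nearest-neighbour lookup with scikit-learn's `NearestNeighbors`. The four rows and the query vector are made-up values in the same four columns (mpg, disp, hp, wt), not the original dataset, so the matched index here is 2 rather than 22.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical rows with columns mpg, disp, hp, wt (values are made up).
X = np.array([
    [21.0, 160.0, 110.0, 2.6],
    [30.0, 80.0, 65.0, 1.9],
    [15.0, 310.0, 155.0, 3.4],
    [18.0, 230.0, 120.0, 3.2],
])
test = [[15.5, 300.0, 160.0, 3.3]]  # hypothetical query item

# Fit on the item features and ask for the single closest row (Euclidean).
neigh = NearestNeighbors(n_neighbors=1)
neigh.fit(X)
distances, indices = neigh.kneighbors(test)
best = indices[0][0]
print(best, distances[0][0])
```

The distances are computed on unscaled columns, so disp (hundreds) dominates wt (single digits). Standardising the columns first, for example with sklearn.preprocessing.StandardScaler, is one option if every feature should count equally.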
Task 6 (02_04): As in Task 3 (02_01), use Logistic Regression to make predictions on the existing dataset, then use classification_report to inspect the model's Precision and Recall (87% and 89% respectively in this scenario)
• Precision: of the recommended items, what percentage the user actually liked (how relevant the recommendations were)
• Recall: of the items the user likes, what percentage were recommended (how completely the recommender found the items the user likes)
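To make the two definitions concrete, here is a small sketch with hand-picked hypothetical labels (1 = liked or recommended), computing precision and recall with sklearn.metrics and reading the same figures from classification_report.

```python
from sklearn.metrics import classification_report, precision_score, recall_score

# Hypothetical ground truth: 1 = the user actually liked the item.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
# Hypothetical recommender output: 1 = the item was recommended.
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]

# Here there are 4 true positives, 2 false positives and 1 false negative.
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 4 / 6
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 4 / 5

# The same numbers appear in the "1" row of the report.
report = classification_report(y_true, y_pred, output_dict=True)
print(classification_report(y_true, y_pred))
```

Recommending more items tends to raise recall (fewer liked items are missed) while lowering precision (more unwanted items slip in), so the two are usually read together.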