Improvements to recommendation systems are low-hanging fruit that would not only help ensure a high customer repeat rate but also improve the customer experience.
Recommendation systems are one of the primary ways in which e-commerce websites tend to generate repeat purchases, that is, getting a purchase from an already registered customer. Repeat purchase is one of the key metrics that e-commerce websites measure, since a repeat customer means less money spent on marketing to get him/her to make a purchase.
Recommendation systems are not a new technology. They have been around for as long as there has been online shopping. The most talked-about recommendation system is Amazon's. Its workings are visible through sections like Recommended For You (shown in the image on next page), Frequently Bought Together or Based On Your Previous Purchases Following Is Recommended For You, among others.
Although the system has been around for a while, it seems not to have evolved with time. By this, I mean that significant changes to recommendation systems have largely been put off. In this article, I wish to discuss what can be done with current recommendation systems within the framework of already existing algorithms.
How recommendation systems work
Recommendation systems use different algorithms. A simple recommendation system looks for users who have made the same purchases as you and rated these items similarly. From the other items that these similar customers purchased and rated highly, the system removes the ones you have already purchased. The remaining items are then recommended to you. Two popular versions of these algorithms are collaborative filtering and cluster models.
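The simple approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ratings data, the 1 to 5 rating scale, the tolerance used to decide that two ratings are "similar" and the function names are all hypothetical and not taken from any real system.

```python
from collections import Counter

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "you":   {"book_a": 5, "book_b": 4},
    "user1": {"book_a": 5, "book_b": 4, "book_c": 5, "book_d": 2},
    "user2": {"book_a": 1, "book_b": 2, "book_e": 5},
}

def similar_users(target, ratings, tolerance=1):
    """Users who rated every item the target rated, within `tolerance` points."""
    mine = ratings[target]
    result = []
    for user, theirs in ratings.items():
        if user == target:
            continue
        if all(item in theirs and abs(theirs[item] - score) <= tolerance
               for item, score in mine.items()):
            result.append(user)
    return result

def recommend(target, ratings, min_rating=4):
    """Items that similar users rated highly and the target does not own yet."""
    owned = set(ratings[target])
    votes = Counter()
    for user in similar_users(target, ratings):
        for item, score in ratings[user].items():
            if item not in owned and score >= min_rating:
                votes[item] += 1
    # Items recommended by more similar users come first
    return [item for item, _ in votes.most_common()]

print(recommend("you", ratings))  # → ['book_c']
```

Here user1 counts as similar because both shared ratings match, while user2 does not. Of user1's other items, only book_c is rated highly enough to be suggested.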
Collaborative filtering
A collaborative filtering algorithm represents a customer as an n-dimensional vector (n being the number of item types that can be considered distinct). Vector components are positive for positively-rated items and negative for negatively-rated ones.
To give less weight to already popular items, the algorithm multiplies each vector component by the inverse of the number of customers who have rated/purchased that item. The algorithm then finds similar customers using cosine similarity. Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space: it is the cosine of the angle between them. Hence, in the n-dimensional space, the cosine of the angle between two customer vectors represents their similarity.
For two customer vectors A and B, the algorithm computes cos θ as the dot product of the vectors divided by the product of their magnitudes: cos θ = (A · B) / (|A| |B|). As cos θ approaches 1, the vectors point in nearly the same direction and the customers are more similar. As cos θ approaches -1, the vectors point in opposite directions and the customers' tastes are opposed. A value near 0 means the customers have little in common.
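As a minimal sketch of this computation, the snippet below implements cosine similarity as the dot product of two vectors divided by the product of their magnitudes, along with the inverse-popularity weighting. The customer vectors, the rater counts and the function names are hypothetical values chosen for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

def weight_by_inverse_popularity(vector, rater_counts):
    """Scale each component by 1 / (number of customers who rated that item)."""
    return [x / count for x, count in zip(vector, rater_counts)]

# Hypothetical customers: one component per item, +1 liked, -1 disliked, 0 unrated
alice = [1, 1, -1, 0]
bob = [1, 1, -1, 1]
carol = [-1, -1, 1, 0]
# Hypothetical number of customers who rated each of the four items
rater_counts = [100, 5, 20, 2]

print(cosine_similarity(alice, bob))    # close to 1: similar tastes
print(cosine_similarity(alice, carol))  # close to -1: opposite tastes

weighted = cosine_similarity(
    weight_by_inverse_popularity(alice, rater_counts),
    weight_by_inverse_popularity(bob, rater_counts),
)
print(weighted)  # rarely rated items now dominate the comparison
```

After weighting, agreement or disagreement on a rarely rated item moves the score far more than agreement on a bestseller, which is the intended effect of the weighting.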
The algorithm gives recommendations based on similarity between customers. Amazon also uses collaborative filtering, but it computes similarity between items rather than between users. So, the system looks for items that are bought together with the item that you have chosen. Amazon maintains a similar-items table that it generates offline for this purpose.
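To illustrate the idea of a similar-items table built offline, here is a rough sketch that counts how often pairs of items are bought by the same customer. It is not Amazon's actual implementation: the purchase data and function name are hypothetical, and raw co-purchase counts stand in for whatever similarity measure a production system would use.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase histories: customer -> set of items bought
purchases = {
    "c1": {"camera", "sd_card", "tripod"},
    "c2": {"camera", "sd_card"},
    "c3": {"camera", "tripod"},
    "c4": {"novel"},
}

def build_similar_items_table(purchases):
    """Offline step: for each item, list the items most often bought with it."""
    co_counts = defaultdict(lambda: defaultdict(int))
    for items in purchases.values():
        for a, b in combinations(sorted(items), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    # Order neighbours by co-purchase count (highest first), then by name
    return {
        item: sorted(neighbours, key=lambda other: (-neighbours[other], other))
        for item, neighbours in co_counts.items()
    }

similar_items = build_similar_items_table(purchases)
print(similar_items["tripod"])  # → ['camera', 'sd_card']
```

At request time, the system only needs a lookup in this table for the item being viewed, which is why the expensive pairwise counting can be done offline.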
Cluster models
Cluster models divide users into pre-defined segments and use classification (of users to these segments) for recommendations. Classification is a specialised field of statistics in which items are assigned to classes on the basis of certain rules. Rules can be pre-defined or developed using machine learning. Spam filters are the most common example of classification.
For recommendation systems, the system begins with certain segments, obtained either with a clustering algorithm or with supervised machine learning. In the latter case, examples of customers that belong to each segment are given to a machine learning algorithm, which learns from them and then classifies future customers based on what it has learnt.
The latter technique uses pre-defined segments. Newer segments are added/merged on the basis of the density of each segment in terms of customers added. The idea is to have neither too many segments (leading to very few customers in each) nor too few (in which case the segments themselves are too general/vague).
Cluster models group numerous customers together in a segment, match a user to a segment and then treat all customers in that segment as similar customers for the purpose of making recommendations.
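A cluster model of this kind might look like the following sketch, which assumes the segments already exist. Each segment is represented by a centroid, a user is matched to the nearest centroid, and recommendations come from what other members of that segment bought. The segment names, centroid values, three-category affinity vectors and function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical segment centroids over (electronics, books, fashion) affinity
segment_centroids = {
    "gadget_fans": (0.9, 0.1, 0.1),
    "book_lovers": (0.1, 0.9, 0.2),
}
# Hypothetical members of each segment and what they bought
segment_members = {
    "gadget_fans": {"u1": {"headphones", "drone"}, "u2": {"drone", "smartwatch"}},
    "book_lovers": {"u3": {"novel", "e_reader"}},
}

def assign_segment(user_vector, centroids):
    """Classify the user into the segment with the nearest centroid."""
    return min(centroids, key=lambda name: math.dist(user_vector, centroids[name]))

def recommend_from_segment(user_vector, owned, centroids, members):
    """Treat everyone in the user's segment as a similar customer."""
    segment = assign_segment(user_vector, centroids)
    counts = Counter()
    for items in members[segment].values():
        for item in items:
            if item not in owned:
                counts[item] += 1
    return [item for item, _ in counts.most_common()]

new_user = (0.8, 0.2, 0.0)
print(assign_segment(new_user, segment_centroids))  # → gadget_fans
print(recommend_from_segment(new_user, {"headphones"},
                             segment_centroids, segment_members))
# → ['drone', 'smartwatch']
```

The trade-off mentioned above shows up directly here: with very few segments, every member's purchases are pooled and suggestions become generic, while with too many segments each pool is too small to say much.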
Search-based methods
This is not really an algorithm in its own right. It treats recommendation as a search for related items. IMDB most likely uses this model, although it tends to consider user ratings too, so it most likely uses a mix of collaborative filtering and search-based methods.
As per this model, if I (as a registered member) highly rate a movie like The Godfather, the system would suggest movies with similar actors (Marlon Brando as the lead) or genres (crime/gangster). The system runs a search query against an inverted index in which tags (most likely genre tags) are associated with documents.
This is a much more precise form of the free-text search that search engines use. The method's flaw is that suggestions are often too broad (being based on genre or popularity).
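A tag-based search of this kind can be sketched with a small inverted index that maps each tag to the titles carrying it. The catalogue, its tags and the function names below are hypothetical and only illustrate the lookup. They make no claim about how IMDB actually stores or queries its data.

```python
from collections import defaultdict

# Hypothetical catalogue: title -> tags (genres, lead actors)
movies = {
    "The Godfather": {"crime", "gangster", "marlon brando"},
    "Goodfellas": {"crime", "gangster"},
    "On the Waterfront": {"crime", "marlon brando", "drama"},
    "Toy Story": {"animation", "family"},
}

def build_inverted_index(catalog):
    """Map each tag to the set of titles that carry it."""
    index = defaultdict(set)
    for title, tags in catalog.items():
        for tag in tags:
            index[tag].add(title)
    return index

def related_titles(liked_title, catalog, index):
    """Rank other titles by how many tags they share with the liked title."""
    scores = defaultdict(int)
    for tag in catalog[liked_title]:
        for title in index[tag]:
            if title != liked_title:
                scores[title] += 1
    return sorted(scores, key=lambda t: (-scores[t], t))

index = build_inverted_index(movies)
print(related_titles("The Godfather", movies, index))
# → ['Goodfellas', 'On the Waterfront']
```

The breadth problem is easy to see in this sketch: a single common tag such as "crime" is enough to pull a title into the results, so coarse tags produce broad suggestions.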
Possibilities with current recommendation systems
Most recommendation systems are built after years of research, and their continued use is proof that the algorithms work. However, there are issues pertaining to these systems that need to be addressed.
One major problem is the lack of communication between different systems. By this, I mean that most customers have distinct accounts on various platforms, and each account captures only some dimensions of the user's interests.
For example, I enjoy watching movies and reading books. I have accounts on IMDB and Goodreads, respectively, for recommendations. These, however, are not synced with my recommendations on Amazon. So ideally, if I add a book to my To Read list on Goodreads, it should be reflected in my recommendations on Amazon, since Amazon owns Goodreads, too.
Similarly, if I read a book that I did not buy from Amazon, my Read list on Goodreads should be a hint to Amazon, and the book should be removed from its suggestions. And a movie that I added to my Watchlist on IMDB should show up as a suggestion on my Amazon Prime account, and so on.
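The kind of syncing proposed here could be sketched as a simple merge step. Everything in the snippet is hypothetical: the function name, its parameters and the sample titles are illustrative, and no real Amazon, Goodreads or IMDB API is being used or implied.

```python
def merge_external_signals(store_suggestions, to_read, already_read):
    """Promote titles from an external To Read list and drop titles already read.

    All three arguments are plain lists of titles; how they would be fetched
    from each platform is outside the scope of this sketch.
    """
    read = set(already_read)
    merged = []
    # External wish-list items come first, then the store's own suggestions
    for title in list(to_read) + list(store_suggestions):
        if title not in read and title not in merged:
            merged.append(title)
    return merged

print(merge_external_signals(
    store_suggestions=["Dune", "Emma"],
    to_read=["Ulysses"],
    already_read=["Emma"],
))
# → ['Ulysses', 'Dune']
```

The same shape would apply to a watchlist on one platform feeding suggestions on another: one list adds candidates, another removes items the user has already consumed.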
Websites that list things and classify these into various tags can be of great use in terms of suggestions. Complex items like books and movies tend to suffer from the problem of classification. Most movies/books fall under multiple genres/sub-genres, so classifying them and finding similar items within such categories are often problematic.
The problem arises because most similar items are loosely coupled in terms of their similarity. For example, on IMDB, people who are recommended The Lord of the Rings are also recommended Pulp Fiction. Any movie buff would recognise that these two movies are extremely different from each other. IMDB tends to recommend these together because of their popularity, as both are part of Top 200 Movies list. It also recommends movies based on the tags allocated to a movie, its stars and the like.
While all of this works, to improve it, we need to dig deeper. Most database websites like IMDB or Goodreads allow users to tag movies/books as per their understanding. For example, 2001: A Space Odyssey (the movie) is labelled with terms like human versus computer, spaceship, spaceship setting, artificial intelligence, etc by users on IMDB (and not system-generated tags).
Such keywords are the best description of items like books/movies. These define the user’s understanding of a book/movie and are much finer than genre specifications. Using natural language processing resources like WordNet, near-duplicate tags can be merged, which makes the tags a powerful basis for finding items with similar content. For instance, WordNet provides relationships between words and phrases (vertical and horizontal). It is a great resource for understanding content.
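As a rough sketch of tag merging, the snippet below maps raw user tags to a canonical form and then compares two items by the overlap of their merged tags (Jaccard similarity, which is my choice here, not something the approach requires). The hand-written CANONICAL map is a hypothetical stand-in for the relationships a resource like WordNet could supply, and the tag sets are illustrative.

```python
# Hypothetical map from raw user tags to a canonical tag; in practice a
# lexical resource such as WordNet could be used to derive these groupings
CANONICAL = {
    "spaceship": "spacecraft",
    "spaceship setting": "spacecraft",
    "starship": "spacecraft",
    "a.i.": "artificial intelligence",
}

def normalise(tags):
    """Lower-case each tag and replace it with its canonical form, if any."""
    return {CANONICAL.get(tag.lower(), tag.lower()) for tag in tags}

def tag_similarity(tags_a, tags_b):
    """Jaccard similarity: shared tags divided by all distinct tags."""
    a, b = normalise(tags_a), normalise(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

film_a = {"spaceship", "artificial intelligence", "human versus computer"}
film_b = {"starship", "A.I."}

# Without merging, the raw tag sets share nothing; after merging they
# share two of three distinct tags
print(len(film_a & film_b))            # → 0
print(tag_similarity(film_a, film_b))
```

Without the merge step, two items described in different words by different users would look unrelated even when the users meant the same thing.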
Another layer that can be added to recommendations is a social network. Say my mother’s birthday is coming up. The recommendation system not only reminds me of it but also suggests items from her wishlist. Would this be compelling enough for me to make a purchase that I know my mother would love? This would help lessen the impact of the cold start problem.
Cold start is a problem that recommendation systems suffer from: when a user has made few purchases/interactions, the system falls back on suggesting the most popular items. These could either be too general or not compelling enough for the user to browse through.
By adding a social layer, or deriving data from other sources (as suggested earlier, IMDB and Goodreads are both subsidiaries of Amazon), the system can gain better knowledge of the customer (in terms of likes, dislikes, etc). None of the e-commerce ventures, however, invest in social networks (except for Amazon – Goodreads). They are largely dependent on ads on such networks.
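One way to picture how a social layer softens cold start is a fallback chain: use personalised suggestions when there are enough of them, then fill in from social signals such as a family member's wishlist, and only then from generic bestsellers. The function, its parameters and the sample items below are hypothetical and purely illustrative.

```python
def recommend_with_fallback(personalised, social, popular, limit=3):
    """Fill up to `limit` suggestions, preferring the most specific source."""
    if limit <= 0:
        return []
    picks = []
    # Sources are tried in order of how much they say about this customer
    for source in (personalised, social, popular):
        for item in source:
            if item not in picks:
                picks.append(item)
            if len(picks) == limit:
                return picks
    return picks

# A new customer with no purchase history, but with a known wishlist
# belonging to a family member (hypothetical data)
print(recommend_with_fallback(
    personalised=[],
    social=["scarf"],
    popular=["phone", "scarf", "tv"],
))
# → ['scarf', 'phone', 'tv']
```

For a brand-new customer, the social source is what keeps the first suggestions from being nothing but the most popular items.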
Improvements to recommendation systems are low-hanging fruit that would not only help ensure a high customer repeat rate but also improve the customer experience. To take an analogy from the offline world, a smart shopkeeper knows through keen observation which customer will make a purchase (and what item would interest him/her). This is the shopkeeper’s way of generating loyalty, but it does not scale.
Online recommendation systems work in a similar way, except that they do scale. Here, the customer displays loyalty because he/she has had a great experience on the website. The experience begins with relevant recommendations, which lead to a purchase and, in turn, lead the customer to choose the platform over the competition.
On a parting note, I feel recommendation systems have a lot of potential. Organisations must keep evolving their systems into smarter systems.
Tamanjit Bindra is an Army Institute of Technology alumnus. Most of his 10+ years of experience in the Internet industry has been around upcoming and open source technologies. He loves working on NLP and free text search-based projects.