
The Wisdom of Crowds: Implementing a Smart Ranking Algorithm

From blog posts to books, sorting a list of items by user rating is a very common task. In this blog post, we discover how the writings of an obscure 18th-century British minister let us implement a “smart” ranking algorithm that takes into account the “wisdom of the crowd” and presents a fairer, more accurate representation of results.


The Rating Problem

Let’s pretend for the moment that you’ve a hankerin’ to pick up a few fresh tips on ASP.NET 3.5. You visit an online bookstore and type “ASP.NET 3.5” into the search box. You obtain the results shown in Figure 1. [1]

[Figure 1: search results for “ASP.NET 3.5”, sorted by average customer rating]

This site, like many others, allows users to rate products on a scale of 1 to 5. The average customer rating is displayed when browsing these products. (I’ve included both image-based and numerical displays for the purposes of this discussion.) By sorting the results in order of customer rating as shown, you hope to discover which books are most popular among your peers. [2]

Examining Figure 1, it would appear that ASP.NET 3.5 Website Programming: Problem – Design – Solution is the most popular among these books, with a perfect 5-star rating; and that Pro ASP.NET 3.5 in C# 2008 is the least popular among the rated books, with but a single star to call its own. But, as I’m sure you already suspect, the average rating alone doesn’t tell us the whole story.

Of course, e-commerce sites understand this too. That’s why you’ll almost always find, in addition to the average customer rating, the number of customers who contributed to that rating, as shown in Figure 2.

[Figure 2: the same results, now showing the number of customers who rated each book]

Leveraging the Wisdom of the Crowd

Knowing the number of times an item has been rated makes the results much more meaningful to us – but have you stopped to consider exactly why this is so? Why do we lend more weight to a rating when an item has been rated by more individuals? This may appear to the casual observer as the manifestation of a “bandwagon effect” of some kind, but that’s not really the case. As it turns out, this is the result of an intuitive mental calculation that mirrors an actual real-world phenomenon - a true example of crowd intelligence in action. What’s more, as we shall soon see, this phenomenon can be modeled in software.

We see from Figure 2 that ASP.NET 3.5 Website Programming: Problem – Design – Solution has the highest average customer rating, and therefore resides at the top of the list. However, since it has only been rated once, we are less inclined to trust this result. Intuitively, we understand that with a small number of ratings, the probability of an anomalous rating (an “outlier” in statistical terms) having a significant impact on the average is greatly increased. In other words, a small number of ratings increases the chance that the overall average rating may be “inaccurate” – not arithmetically inaccurate of course, but inaccurate in the sense that the data doesn’t reflect a true consensus of typical, unbiased readers.

Now take a look at some of the other results. We see that ASP.NET 3.5 Unleashed appears lower on the list by virtue of its lower average rating of 4.8. Nevertheless, we instinctively lend much more weight to its rating data, because far more customers have rated it. If we’re laying down our hard-earned money, it would appear that ASP.NET 3.5 Unleashed is the surer bet (all other things being equal, of course). That 4.8 rating may not be the highest on the list, but a rational observer would invariably conclude that this is a more reliable figure, based on the relatively large proportion of the “crowd” which contributed to it.

Including the number of ratings lets us evaluate the average rating data of each individual item not only on its own, but in relation to all the other items. The more times an item has been rated, the more believable the data, and thus the more relative weight we place upon its average rating.

Wouldn’t it be great if we could somehow leverage the intelligence of the crowd, and reflect it in the results? What if we could rank and sort the results based not solely on the raw average rating, but on the believability of the average rating? Such a ranking algorithm would more accurately reflect how we perceive each item’s ratings in relation to each other. Using this algorithm, an item which received a single 5-star rating wouldn’t immediately jump to the top of the list; likewise, an item that received a single poor rating wouldn’t immediately be forced to the bottom. The “good stuff” (items which receive mostly high ratings from a relatively large proportion of the crowd) would appear near the top, while the “not-so-good” (again, as determined by the crowd) would be pushed lower.

Bayes’ Theorem

As it turns out, we can develop such an algorithm with the help of an 18th-century British Presbyterian minister named Thomas Bayes.

Like many learned men of his day, Reverend Bayes harbored a keen interest in mathematics. In an essay discovered and published in 1764 – three years after his death – he presents a special case of what eventually came to be known as Bayes’ theorem. Today, Bayes’ theorem is widely accepted among statisticians, and applications of it can be found throughout science and engineering, from the evaluation of drug test results, to predicting the outcome of elections, to the development of spam-filtering software.

Bayes’ theorem states in part that probabilities are rationally coherent degrees of belief, or a degree of belief in a proposition given a body of well-specified information. [3] Therefore, Bayes’ theorem can be thought of as a representation of how an ideally rational individual would interpret the believability of an outcome, based on a series of prior outcomes.

Bayes’ theorem can be stated mathematically in terms of how the probability of an outcome A given B is related to the converse probability of outcome B given A:
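P(A|B) = P(B|A) × P(A) / P(B)

In words: our belief in A after observing B equals our prior belief in A, scaled by how strongly A would have predicted the B we actually observed.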

Applying Bayes’ Theorem to the Rating Problem

A Bayesian average is a method of calculating the mean of a set of data that is consistent with Bayes’ theorem.

BA = (C × m + Σx) / (C + n)

Where:

  • C is a “weighting” constant, proportional to the typical size of the data set
  • m is the arithmetic mean taken over all known data
  • n is the size of the current data set
  • Σx is the sum of the values in the current data set

For our particular purposes, we can calculate the Bayesian average of each of our books by applying the formula above. This lets us derive a rating that is based on the “believability” of the raw data.

BA = ((AvgNumOfRatingsForAll * AvgRatingForAll) + TotalRating) / (RateCount + AvgNumOfRatingsForAll)

Where:

  • RateCount is the number of ratings given to this book
  • TotalRating is the sum of all ratings given to this book
  • AvgNumOfRatingsForAll is the average number of ratings for all books shown (where RateCount > 0)
  • AvgRatingForAll is the average unweighted rating for all books shown (where RateCount > 0)
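Here’s a minimal C# sketch of that calculation (a sketch only – the Book class and its properties are hypothetical stand-ins for whatever your data layer provides, not the code from the downloadable solution below):

using System.Collections.Generic;
using System.Linq;

public class Book
{
    public string Title { get; set; }
    public int RateCount { get; set; }    // number of ratings this book has received
    public int TotalRating { get; set; }  // sum of all ratings this book has received
}

public static class BayesianRanker
{
    // Sorts books by Bayesian average, best first.
    // Assumes at least one book in the list has been rated.
    public static IEnumerable<Book> RankBooks(IList<Book> books)
    {
        // Only books that have actually been rated contribute to the crowd-wide averages.
        var rated = books.Where(b => b.RateCount > 0).ToList();

        // C: the average number of ratings per rated book.
        double avgNumOfRatingsForAll = rated.Average(b => b.RateCount);

        // m: the average unweighted rating across all rated books.
        double avgRatingForAll = rated.Average(b => (double)b.TotalRating / b.RateCount);

        return books.OrderByDescending(b =>
            (avgNumOfRatingsForAll * avgRatingForAll + b.TotalRating)
            / (b.RateCount + avgNumOfRatingsForAll));
    }
}

Note that a book with no ratings at all gets a Bayesian average of exactly (C × m) / C = m, the overall average – so brand-new items start in the middle of the pack rather than at the very top or bottom.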

In Figure 3, the results are now sorted by their Bayesian average. The star displays also reflect the calculated Bayesian average.

[Figure 3: the results re-sorted by Bayesian average]

If you compare these results to those shown in Figure 2, several conclusions can be drawn:

  1. The Bayesian average is a truer reflection of how the books are rated in relation to each other.
  2. The Bayesian rank for books with a relatively small number of ratings tends to gravitate toward the average rating of all books.
  3. The more ratings a book has received, the more “believable” the ratings are. Therefore, when a book has received a relatively large number of ratings, its Bayesian rank gravitates toward its unweighted average rating.
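To make these observations concrete, here’s a quick worked example using made-up numbers (not the actual figures from the screenshots). Suppose the rated books in a result set average 10 ratings apiece (so C = 10), with an overall average rating of 4.0 (so m = 4.0):

  • Book A has a single 5-star rating: BA = (10 × 4.0 + 5) / (1 + 10) = 45 / 11 ≈ 4.09
  • Book B has 120 ratings averaging 4.8 (TotalRating = 576): BA = (10 × 4.0 + 576) / (120 + 10) = 616 / 130 ≈ 4.74

Despite its lower raw average, Book B ranks well above Book A, because the crowd’s verdict on it is far more believable; meanwhile Book A is pulled down toward the overall average of 4.0 rather than sitting at a perfect 5.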

Keeping it Fair

As pointed out, the Bayesian average formula uses a constant C that determines how much “correction” is applied to the raw, unweighted average. Examining the formula, you’ll note that C acts as a “dampening” value that sets the “sensitivity” of the calculation.

This means that the higher the value of C, the more customer ratings it takes to influence the Bayesian average. In other words, C can be thought of as the number of ratings required to have a significant impact on an item’s relative ranking: higher values of C make the results less sensitive to the determination of the crowd, while lower values make them more sensitive. Keep this in mind when choosing a value for C.

In practice, this value should be proportional to the typical size of the data set. In our example, we used the average number of ratings given to all rated items in the search result. This works well in most cases in which new items are added on a regular and ongoing basis, such as an inventory of retail items, blog posts, songs, videos, downloadable applications, and the like.

However, consider the case in which new items are added relatively infrequently. When the existing items already have a high number of ratings, a new item will have to receive many ratings in order to move up or down in the list. In your particular application, if you want to give new items a chance to move more quickly, you can reduce the value of C by a given factor, like so:

BA = ((AvgNumOfRatingsForAll * 0.75 * AvgRatingForAll) + TotalRating) / (RateCount + (AvgNumOfRatingsForAll * 0.75))
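In code, this tuning could be exposed as a simple factor applied to C (the variable names below mirror the sketch above, and 0.75 is just an example value):

double weightFactor = 0.75;  // 1.0 reproduces the original formula; smaller values let new items move sooner
double c = avgNumOfRatingsForAll * weightFactor;
double bayesianAverage = (c * avgRatingForAll + totalRating) / (rateCount + c);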

Summary

When we sort a list of items by the “raw” average of the ratings each has received, we often get a distorted view. Results are easily skewed by a small number of ratings (or even a single rating) given to any particular item.

In order to provide a more accurate picture, we need a smarter rating system – one that can accurately quantify the “wisdom of the crowd.” This is something we naturally do when the size of the “crowd” is known to us. When an item has received many ratings, that data should be considered more “reliable,” and therefore its ranking in the list should more closely reflect the crowd’s decision. When an item has received very few ratings, its ranking should approximate the average rating for all the items shown.

We can use Bayes’ theorem to derive a smart rating algorithm to accomplish this task. Furthermore, the weighting applied by the Bayesian average formula can be adjusted to suit different purposes, for example to let new items respond more quickly to the influence of the crowd.

Download the Code

If you’re interested in playing around with this on your own, you can download the solution/database I used to create the screenshots in this post. You can mess around with the database values and/or the weighting constant in order to see how different scenarios affect the results.

[1] Book cover images courtesy of Amazon.com.

[2] All screenshots are for illustrative purposes only, and should not be construed as an actual reflection of the popularity or quality of these titles in “real life,” okay?

[3] From http://en.wikipedia.org/wiki/Bayes%27_theorem

 

Subscribe to this blog for more cool content like this!


    » Similar Posts

    1. Defensive Programming, or Why Exception Handling Is Like Car Insurance
    2. Open Source or Die – The *Real* Future of Graffiti?
    3. Balsamiq – It’s Not Just For Salad Anymore


    » Comments

    1. jbland

      Great post.

      I've been planning on using the Wilson Confidence Interval

      (see www.evanmiller.org/.../how-not-to-sort and www.derivante.com/.../php-content-rat )

I'll have to add this to the toolkit.

      jbland — November 9, 2009 12:26 PM
    2. Jeremy Sterritt

      Hi.

In addition to distilling hardcore mathematics, some history, and a practical lesson in programming into one concise and engaging piece of reading, your article also speaks directly to the problem I'm having.

      A popular Internet radio site recently introduced a rating system to chart the popularity/performance of tens of thousands of emerging artists. The top 5 each week are rewarded with promotion and perks. The system compares artists using a simple average (positive response divided by # of plays). No distinction is made between averages based on 50 plays, 500 plays, 5000 plays. These averages, with no weighting or correction of any kind, represent the whole of the system.

      Is there any value to this system? Validity? Is it fair? Biased? Meaningless?

      If you would be so kind as to help me make sense out of this, I can be reached via email (provided) with any questions, comments, or for further info,

      Thank you very much...

      Jeremy Sterritt

Jeremy Sterritt — April 2, 2010 3:13 AM
    3. Lee Dumond

      Jeremy,

I think this would depend on what the “# of plays” represents. In other words, is it the number of plays that were actually rated, or ALL plays whether they were rated or not? It would also be influenced by whether raters were allowed to give either a positive or negative response (like Pandora’s thumbs-up/thumbs-down system), or positive responses only.

      Lee Dumond — April 2, 2010 8:50 AM
    4. Jeremy Sterritt

      Several million registered users listen for free to music by, and similar to, artists they select. Emerging artists (bands) pay to have their music reach those listeners. Whenever a band's "paid play" starts playing on a listener's computer, an overlay appears on their screen prompting them to rate the song with a thumbs-up/-down. The vast majority of plays go unrated, but all are used as the divisor. Using a week's data, each band's performance is calculated as: positive actions minus negative actions divided by total paid plays (for that band). As I see it, this average indicates a band's popularity among those who had an opportunity to vote.

      Some bands have 5000 plays per week, some have 100 -- and everywhere in between. Regardless, all averages are sorted into the same list. The highest average is assigned a score of 100; all others are ranked on a scale of 0-100 relative to the top score.

      To me, this seems no different from rating MLB hitters by batting average...and the results are as you'd expect: each week's "winners" have the lowest # of at-bats.

      Thanks sooooooo much. Again, I really enjoyed your article and am enjoying learning more about Bayesian methods and applications.

      -Jeremy

      Jeremy Sterritt — April 3, 2010 4:12 AM
    5. Hey, thank you so much. This is the best article about the Bayesian method on the internet.

      Best regards,

      André

      André — December 7, 2010 1:14 PM
