Is Douban's Book Rating and Recommendation Reliable? A Small Data-Analysis Perspective
Douban's book ratings show a very obvious dividing line in time. I have not pinned down exactly when it falls, but long-time Douban users, especially frequent users of Douban Books, have probably had a similar experience: one day you spot an interesting-looking, highly rated book among Douban's new-book recommendations, you buy it, and you are disappointed after reading it. At that point you suddenly realize that the Douban ratings of new books published after a certain date are not credible.

For example, A Brief History of the Future, which I read a while ago, is not a good read, yet it carries a high Douban score of 8.5. On Goodreads I found it rated only 3.69 stars, well below its Douban score (Figure 1). Opinions of this book differ greatly between China and abroad.

Moreover, the short reviews on Amazon are also polarized: some readers call the book a "reliable prediction of the future", while others say it is "grandstanding and has no scientific basis" (Figures 2 and 3).

On the other hand, other books with the same Douban score of 8.5, such as Nine Stories, The Story of Wukong and Joy of Life, are all very good, and their Goodreads ratings reach about 4.15 stars. So the question arises: is the Douban Books score reliable? Are there books whose scores do not reflect their quality, and what are the main influencing factors? To find out, I selected books from different years and different publishers, domestic and foreign, and compared them.

1. Data overview

We selected books published in China between 2001 and 2017, with the Douban score limited to more than 2 weeks. This keeps the discussion on familiar mainstream books and minimizes the influence of paid reviewers (the "water army"). The sample contains 997 books in total. Using the CITIC Cloud machine learning platform, we compared and plotted the book scores; the score distribution is shown in Figure 4.

The sample includes many books we are familiar with, such as Fortress Besieged, The Shawshank Redemption and One Hundred Years of Solitude (Figure 5).

Using the same platform, we also plotted the spread (STD) against the score for novels published by CITIC Publishing House and by other publishers (Figure 6). CITIC's novels are all rated above 7, and their STD falls mainly between 1.5 and 1.75. Let's look more closely at what these differences in ratings mean.

2. Differences in scores

2.1 A Brief History of the Future vs. A Brief History of Time

Take the scores of these two books as an example (Figure 7). They have the same score and both have many raters (6K and 18K), but the shares of 4-star and 2-star ratings are very different. What does this mean?

● A Brief History of Time: almost everyone thinks it is good, so the ratings concentrate on 4 stars.

● A Brief History of the Future: many readers like it and many dislike it, so both 2-star and 4-star ratings are common.

In other words, although the two (average) scores are the same, the opinions behind them are quite different and the individual ratings diverge widely. This matches the two diametrically opposed popular reviews of A Brief History of the Future.
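To make the point concrete, here is a minimal sketch of how two very different star distributions can produce the same average score. It assumes that the displayed score is twice the average star rating (1 to 5 stars mapped to 2 to 10 points); the function names and the rating counts are hypothetical and are not the real data for either book.

```python
def douban_score(star_counts):
    """Average rating on the 2-10 scale from a {stars: count} mapping.

    Assumption: the displayed score is 2 x the mean star rating.
    """
    total = sum(star_counts.values())
    return 2 * sum(stars * n for stars, n in star_counts.items()) / total


def star_share(star_counts, stars):
    """Fraction of all ratings that gave exactly `stars` stars."""
    return star_counts.get(stars, 0) / sum(star_counts.values())


# Hypothetical counts per 100 ratings, invented for illustration only.
consensus_book = {5: 30, 4: 50, 3: 20, 2: 0, 1: 0}
divisive_book = {5: 55, 4: 20, 3: 10, 2: 10, 1: 5}

for name, counts in [("consensus", consensus_book), ("divisive", divisive_book)]:
    print(name,
          "score:", douban_score(counts),
          "4-star share:", star_share(counts, 4),
          "2-star share:", star_share(counts, 2))
```

Both hypothetical books end up with the same average, yet one has no 2-star ratings at all while the other has a visible block of them.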

2.2 How to measure the difference in scores

The spread of a score distribution can be measured by the variance, calculated as follows:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$

where x_i is an individual rating, N is the number of ratings and μ is the average score. That is, the variance measures how far the ratings deviate from the average score μ. In what follows we use the standard deviation (STD), the square root of the variance. Plotting STD against the Douban score gives a scatter plot (Figure 9). For comparison, a reference line marks the 97th percentile of the standard deviation.
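As a rough sketch of this calculation, the snippet below computes the population standard deviation of a book's ratings from its star counts and then finds a 97th-percentile cutoff across a set of books. The 2-to-10 scale, the nearest-rank percentile rule, the function names and all the counts are assumptions made for illustration; the original analysis may have used a different convention.

```python
import math


def rating_std(star_counts):
    """Population standard deviation of ratings on the assumed 2-10 scale."""
    total = sum(star_counts.values())
    mean = sum(2 * s * n for s, n in star_counts.items()) / total
    variance = sum(n * (2 * s - mean) ** 2 for s, n in star_counts.items()) / total
    return math.sqrt(variance)


def percentile_cutoff(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical star counts per 100 ratings, invented for illustration only.
books = {
    "book_a": {5: 30, 4: 50, 3: 20, 2: 0, 1: 0},
    "book_b": {5: 55, 4: 20, 3: 10, 2: 10, 1: 5},
    "book_c": {5: 40, 4: 40, 3: 15, 2: 5, 1: 0},
}

stds = {title: rating_std(counts) for title, counts in books.items()}
cutoff = percentile_cutoff(stds.values(), 97)

# Books at or above the cutoff are the most divisive ones in this sample.
controversial = [title for title, std in stds.items() if std >= cutoff]
print(stds)
print("cutoff:", cutoff, "controversial:", controversial)
```

With only a handful of books the cutoff is simply the largest STD, so the line only becomes informative on a sample of the size used in the article.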

The STD of A Brief History of Time and that of A Brief History of the Future really do differ greatly. The standard deviation of A Brief History of the Future ranks in the top 3%, which marks it as controversial, while that of A Brief History of Time is much smaller. This raises a question.

These books have the same score, but are they equally good or bad?

For example, A Brief History of Time and A Brief History of the Future have the same score, but are they equally good reads?

Of course not.

As the earlier comparison shows, although A Brief History of the Future scores high, its 4-star and 2-star shares are quite different from those of A Brief History of Time. Why? You have probably guessed the reason already, and the comments confirm it. When we talk about a book's score, we usually mean only the average. When readers largely agree, this average is a valuable reference. When the ratings diverge widely (the STD is large), the average is of limited use.

3. Category differences

Within the same category, ratings and standard deviations vary widely from publisher to publisher. So how many distinct rating shapes appear across the different categories of books from a single publisher? We took the books of CITIC Publishing House and ran K-Means on the proportions of the four rating grades. The books can be divided into four representative shapes. The results are shown in Figure 10 and Figure 11.

Note that books with a high STD are not well suited to this classification, because their shapes differ too much.

As the figures show, each shape contains books with both high and low STD, such as Everyone Should Buy Insurance and Second-hand Time. Overall, the scores of CITIC's books lie between 7.6 and 8.8 and the STD is relatively stable, with no particularly large fluctuations, so there is actually little difference between categories.
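The clustering step can be sketched in a few lines of plain Python. This is only an illustration of K-Means on rating-share vectors, not the CITIC platform's implementation: the farthest-first initialization is a simplification chosen to keep the run deterministic, and the book names, the share values and the choice of two clusters are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def farthest_first(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from those chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(max_iterations):
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(mean_point(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical shares of four rating grades per book, invented for illustration.
shares = {
    "book_a": (0.60, 0.30, 0.08, 0.02),
    "book_b": (0.55, 0.35, 0.08, 0.02),
    "book_c": (0.20, 0.50, 0.25, 0.05),
    "book_d": (0.25, 0.45, 0.25, 0.05),
}

labels, centroids = kmeans(list(shares.values()), k=2)
for title, label in zip(shares, labels):
    print(title, "-> shape", label)
```

Each centroid is itself a vector of rating shares, so it can be read directly as the "typical shape" of its cluster.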

4. Differences in the scores of hit books

We compare how reader opinion and the number of hit books from CITIC Publishing House have changed in recent years (Figure 12).

The number of hit books changes as follows (Figure 13).

As Figure 12 and Figure 13 show, CITIC Publishing House's hit books over the years are roughly normally distributed, which suggests that the amount of data used is basically sufficient. The number of hit books in each period shows no clear pattern. The distribution of reader opinion for these hit books is shown in Figure 14.

Figure 14 shows that the STD of CITIC's hit books is concentrated mainly between 1.3 and 1.6. We pick out some of these books and show them below (Figure 15).

As Figure 15 shows, the scores of classic books are strongly correlated with STD: the higher the score, the lower the STD. That is, although rating a book is a very personal matter and everyone judges books differently, Douban has a large number of users and many ratings per book, so the rating STD of classic books remains very small. By contrast, a book's score shows no positive correlation with its publication date or with how big a hit it was.
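One way to check a relationship like "higher score, lower STD" is a Pearson correlation coefficient between the two columns. The sketch below is a plain-Python version under that assumption; the article does not say which measure was actually used, and the score and STD values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical (score, STD) pairs for a few classic books, for illustration only.
scores = [7.8, 8.2, 8.6, 9.0, 9.3]
stds = [1.62, 1.55, 1.48, 1.36, 1.30]

r = pearson(scores, stds)
print("correlation between score and STD:", round(r, 3))
```

A value close to -1 would support the reading that the best-rated books are also the least disputed, while a value near 0 would correspond to the "no correlation" finding for publication date and popularity.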

5. Which books have the most divided ratings?

As the figure above shows, books at every score level can have a high or a low STD, so let's see how large the STD gets. From the sample, we screened out the books with the largest STD, as shown in the following figure:

The large differences in evaluation may have many causes, so I won't discuss them here.

6. Is searching for books with similar content and ratings accurate?

If you have read "Loneliness of Lanzhou University Masters" or a similar book, you may want to keep looking for books with similar content, score and score composition. Douban has its own recommendation mechanism, as shown in the following figure:

We can see that some of the books Douban recommends differ considerably from the target book in score, score composition and content. To check this, we modeled the similarity of Douban books on the CITIC Cloud machine learning platform and used word2vec analysis to find the books closest to the target book in content, score and score composition.

For example, we input Iron Man in Silicon Valley. By modeling and analyzing its data tags, we can produce the word cloud closest to the content of this book, as shown in Figure 21.

We looked up the Douban ratings of the two books, and their scores and score compositions are very similar.

In the recommendations for Shoe Dog, Iron Man in Silicon Valley ranks among the top related books, so Douban's recommendation is consistent with the machine-learning recommendation.
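The idea of ranking books by closeness to a target can be illustrated without a trained embedding model. The sketch below swaps in a much simpler technique, cosine similarity over weighted tag vectors, as a stand-in for the word2vec analysis; it is not the platform's method. The book names, tags and weights are hypothetical, and rating shares could be added to each vector as extra keys in the same way.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {key: weight} dicts."""
    dot = sum(a[key] * b[key] for key in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, catalog, top_n=3):
    """Titles of the top_n books closest to `target`, most similar first."""
    ranked = sorted(
        ((cosine(catalog[target], vector), title)
         for title, vector in catalog.items() if title != target),
        reverse=True,
    )
    return [title for _, title in ranked[:top_n]]


# Hypothetical tag weights per book, invented for illustration only.
catalog = {
    "founder_biography_1": {"biography": 3, "business": 2, "technology": 2},
    "founder_biography_2": {"biography": 3, "business": 3, "sports": 1},
    "popular_science": {"science": 3, "future": 2, "technology": 1},
    "classic_novel": {"fiction": 3, "classic": 2},
}

print(most_similar("founder_biography_1", catalog, top_n=2))
```

Tag overlap only captures shared labels, whereas word2vec-style embeddings can also place related but differently worded tags near each other, which is why the article's approach can find neighbours this sketch would miss.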

Summary

As we all know, Douban displays a book's average score, and the distribution of ratings is also visible. In most cases the average is a useful guide, because readers' ratings are close together (the STD is small), but few people pay attention to how much the ratings differ (that is, the size of the STD). When we come across a book with a large STD and its average score does not match our own impression, we feel confused and conclude that Douban's scores are unreliable.

For book recommendations, Douban recommends the books closest to the target book by comparing the target's content tags, score composition and score range. Judging from the sample data measured so far with machine learning, these recommendations are fairly accurate.

Finally, if anything in the analysis is missing or unclear, please point it out ~

A recommendation: the CITIC Machine Learning Platform. Anyone interested can register and try it.