Algorithm Predicts Best-Sellers with 84% Accuracy

Filigree · Jan 10, 2014

The Amazon samples seemed straightforward to me, even allowing for Amazon's supersecret algorithms. A book with a general Amazon rank of 500 is selling a hell of a lot more copies *on Amazon* than one with a rank of 6,000,000.

For reviews, one can filter out gushing 5-star reviews from family and trollish1-star reviews, and count the rest as reasonable data points.

robjvargas · Jan 10, 2014

MookyMcD said:
Original study: Earth's Orbit has adjusted by inward 1.8 centimeters in the past 100 years.

New Scientist: Study shows slight change in earth's relationship with the sun.

Media headline: Earth hurtling toward giant fireball.

Oh, ye of little imagination. We both know the headline would be all-caps and: EARTH FALLING INTO THE SUN.

Manuel Royal · Jan 10, 2014

Amadan said:
Headlines like "Will Your Book Sell? There's An Algorithm for That" are journalistic flourishes, because that sounds more interesting than "Computer scientists discovered a statistical correlation between certain lexical structures and arbitrary measures of novel 'success'."

Right-o. I haven't read the study, but -- if it's a good study by competent researches -- I'd be very surprised if they're claiming predictive ability rather than retrospective correlation.

Science reporting is mostly poor or terrible. Also, the article linked in the OP wasn't proofread, which always irritates me. (Just running spell-check didn't tell the author he'd used "expect" when he meant "except", or left out of verb here or there.) Now I'm just grumpy.

Anyway, I suggest concentrating on writing compelling stories with engaging characters. I've been quietly failing at that for years, and see no reason to change course now.

Liosse de Velishaf · Jan 10, 2014

robjvargas said:
Oh, ye of little imagination. We both know the headline would be all-caps and: EARTH FALLING INTO THE SUN.

Manuel Royal said:
Right-o. I haven't read the study, but -- if it's a good study by competent researches -- I'd be very surprised if they're claiming predictive ability rather than retrospective correlation.

Science reporting is mostly poor or terrible. Also, the article linked in the OP wasn't proofread, which always irritates me. (Just running spell-check didn't tell the author he'd used "expect" when he meant "except", or left out of verb here or there.) Now I'm just grumpy.

Anyway, I suggest concentrating on writing compelling stories with engaging characters. I've been quietly failing at that for years, and see no reason to change course now.

I'm just popping in to agree with both of these.

Russell Secord · Jan 10, 2014

Here's the thing to remember about statistics (and that's all we're talking about here). The answers you get depend on the questions you ask. Despite the headline, the study doesn't claim to predict whether a book will sell well... as several posts have already pointed out.

My feeling is that you can't predict popularity. It's a random process. You can't say that a particular book (or movie or TV series or song) will become popular unless it comes from someone who's already popular. If there were some process that could generate popular things at will, I believe public tastes would change to invalidate the process. In other words, such a process wouldn't be guaranteed, so there wouldn't be anyone willing to keep redeveloping (in effect, reduplicating) it--it would stay behind the curve.

Amadan · Jan 10, 2014

If I were forced to construct a predictive/diagnostic utility out of this study, it would be something along the lines of: "This can analyze your writing and assess the structure and complexity of your sentences and your word choices, and compare it to that of examples of successful and unsuccessful writing in the same genre."

As written, the algorithm is not nearly sophisticated enough to do even that much (and it's not meant to - remember, this is just a research paper, it's not a white paper for a proposed product), but it is conceivable that such a utility could be built from this. It would not be useful in telling a writer how to write, per se, and certainly would tell you nothing about your characters and your plot, but it could perhaps trigger some red flags indicating overuse of certain grammatical constructs and lexical items or a complexity level that's higher or lower than the "sweet spot" for your genre. Basically, something a few degrees more sophisticated than a grammar checker (and we know how useful current grammar checkers are).

CrastersBabies · Jan 10, 2014

Amadan said:
If I were forced to construct a predictive/diagnostic utility out of this study, it would be something along the lines of: "This can analyze your writing and assess the structure and complexity of your sentences and your word choices, and compare it to that of examples of successful and unsuccessful writing in the same genre."

It seems very plausible to me and a bit fascinating as well.

WaveHopper · Jan 11, 2014

Here's an article (by Karlin Lillington) yesterday from an Irish newspaper about a company using neural networks to predict the success of films. Unfortunately as its a commercial enterprise, the interviewee doesn't reveal a whole lot of detail.

Amadan · Jan 11, 2014

WaveHopper said:
Here's an article (by Karlin Lillington) yesterday from an Irish newspaper about a company using neural networks to predict the success of films. Unfortunately as its a commercial enterprise, the interviewee doesn't reveal a whole lot of detail.

Neural networks are largely being replaced by SVMs, though the basic principle for all of these methods is the same:

1. Take a large amount of data labeled with the characteristics being tested for ("true/false," "successful/unsuccessful", "terrorist/not-terrorist," etc.).
2. Extract features from the data (sentence structures and genres, actors & directors, nationality and itinerary, etc.).
3. Use various learning algorithms (e.g. neural networks, Support Vector Machines) to figure out correlations between features and labels. (The key here is that with large amounts of data, the algorithm's decision-making process is essentially unfathomable. This is why it seems like "BS" to a lot of people, because there no human reasoning going on, it's just juggling thousands and thousands of variables in enormously complex calculations.)
4. Feed unlabeled data into the algorithm, let it run, and it will produce a prediction, based on previous input.

Important things here to make the application actually reliable are 1. Large (and nowadays, millions if not billions is not that large) amounts of training data (which is why a research paper based on a few hundred examples can never be viewed as anything more than proof of concept); 2. Choice of features to extract. You have no way of knowing what will necessarily weigh most heavily in producing an accurate prediction; it could turn out that the number of vowels in the title is a strong predictor of a novel's success, for example, but the algorithm only uses features made available to it.

kuwisdelu · Jan 11, 2014

Amadan said:
Important things here to make the application actually reliable are 1. Large (and nowadays, millions if not billions is not that large) amounts of training data (which is why a research paper based on a few hundred examples can never be viewed as anything more than proof of concept)

I wouldn't say that at all. "Large" is often more trouble than it's worth. We use large datasets because they exist, not necessarily because we need them to be accurate. A few hundred observations is usually more than enough to get good results.

The difference between a classifier trained on 1,000 observations and 100,000 observations will often be maybe 3-5% accuracy, and hundreds of hours of waiting-for-the-computer time.

That's not always true, of course, but in the n >> p situation, it often is.

And lots of things change when p > n, naturally, and when we're talking about sparse datasets, etc...

And if the observations themselves have non-trivial correlation structure...

Edit: Now in the case of the paper, n=100 per genre, and they're using unigrams, so there we probably are looking at a p > n sparse dataset. So in this situation, I do think the selection of those 200 books per genre probably had a noticeable effect on the results. For example, personal identifiers ("I", "me", "my") were indicative of success. I would bet that's just an effect of randomly assigning more successful 1st person novels to their training sample.

Amadan · Jan 11, 2014

kuwisdelu said:
I wouldn't say that at all. "Large" is often more trouble than it's worth. We use large datasets because they exist, not necessarily because we need them to be accurate. A few hundred observations is usually more than enough to get good results.

The difference between a classifier trained on 100 thousand observations and 100 million observations will often be maybe 1-2% accuracy, and hundreds of hours of waiting-for-the-computer time.

That's not always true, of course, but in the n > p situation, it often is.

And lots of things change when p > n, naturally.

Well, it depends a lot on the application and the data. Sometimes 1%-2% is significant. And sometimes you have data for which individual data points have very little variation. But you're right, I should not have said that no classifiers can be accurate with only a few hundred samples.

kuwisdelu · Jan 11, 2014

Amadan said:
Well, it depends a lot on the application and the data. Sometimes 1%-2% is significant.

Absolutely, and companies like Netflix, Google, etc., will pay millions of dollars to get an extra 1%.

But as far as the public cares and needs to understand, 1-2% is not very important.

(Mostly, I'm tired of people on the internet always saying the sample size is too small for studies where sample size isn't really a problem, and want to let people know that lots of times it's other areas of methodology that are problematic instead. Sample size is an easy point of suspicion when someone doesn't like the results of a study, but it's often not the real issue. I don't want people walking away and thinking in the future "oh, because some study used 1,000 instead of 1,000,000, it's worthless," when that's often not the case.)

kuwisdelu · Jan 11, 2014

Amadan said:
the basic principle for all of these methods is the same:

1. Take a large amount of data labeled with the characteristics being tested for ("true/false," "successful/unsuccessful", "terrorist/not-terrorist," etc.).
2. Extract features from the data (sentence structures and genres, actors & directors, nationality and itinerary, etc.).
3. Use various learning algorithms (e.g. neural networks, Support Vector Machines) to figure out correlations between features and labels. (The key here is that with large amounts of data, the algorithm's decision-making process is essentially unfathomable. This is why it seems like "BS" to a lot of people, because there no human reasoning going on, it's just juggling thousands and thousands of variables in enormously complex calculations.)
4. Feed unlabeled data into the algorithm, let it run, and it will produce a prediction, based on previous input.

In the interest of education, this is missing an important step: testing and validation.

You don't just want to develop your classifier on your entire dataset and then see what the accuracy is, because that will always be biased and overconfident.

You want to do some kind of testing and validation, by dividing your dataset with known labels into two or more sets called "training" and "testing", and possibly one called "validation."

You use the "training" set to "train" your classifier. That is, the algorithm looks at all your data and their labels, and spits out a classifier you can use for prediction of new data. If your classifier were a dragon, and you were training it, this is where the montage would be.

Apply that to your "testing" set (for which you know the labels, but your algorithm doesn't) to evaluate the accuracy.

Sometimes, classifiers will have user-set parameters that the experimenter can set (as opposed to the parameters that the algorithm calculates directly from the data), that may effect the results. Basically, ways to "tweak" the algorithm.

This is where the "validation set comes in. You do lots of iterations on the "training" and "testing" set to tweak the parameters to give the best performance. Then finally run your final classifier on the "validation" set (where again, you know the labels but the algorithm doesn't) to give the final accuracy that you should report, that is the least biased.

Another way to accomplish this is "cross-validation," where you split the data into multiple groups (or "folds"), and iteratively use each group as the training set, and test on all of the remaining groups. Then you report the average accuracy.

Never trust a paper about prediction that doesn't do some kind of testing/validation.

(This paper used 5-fold cross-validation. You can find their cross-validation scheme in the "data description" link under "Downloads" on the researchers' site here.)

bearilou · Jan 11, 2014

I love when AW gets their geek on.

James D. Macdonald · Jan 11, 2014

Historical example:

Following the cholera outbreak in London, 1854, the Board of Health did a statistical study. They analyzed:

Atmospheric pressure
Temperature of the Air
Temperature of the Thames Water
Humidity of the Air
Direction of the Wind
Force of the Wind
Velocity of the Air
Electricity
Ozone
Rain
Clouds
Comparison of the Meteorology of London, Worcester, Liverpool, Dunino, and Arbroath
Wind
Progress of the Cholera in the Metropolitan Districts in the Year 1853
Atmospheric Phenomena of the Year 1853
Atmospheric Phenomena in relation to Cholera in the Metropolitan Districts in the Year 1854

Their data were complete, detailed, and accurate. Their math was exhaustive and correct. Their results were useless.

Why? Because at the time the belief among leading scientific men was that disease was caused by miasma -- bad smells. The germ theory of disease was a crank belief held by a few oddballs. But the fact was that cholera wasn't caused by bad smells. It was caused by a water-borne bacterium. No matter how much they might study the miasma in London the Board of Health would never figure out the cholera by the route they chose.

Layla Nahar · Jan 11, 2014

I use a crystal ball. It works pretty well.

Kylabelle · Jan 11, 2014

Layla Nahar said:
I use a crystal ball. It works pretty well.

+1

Amadan · Jan 11, 2014

kuwisdelu said:
In the interest of education, this is missing an important step: testing and validation.

Well yeah. I thought I was probably already going into teal deer.

(Anyone who is actually interested in this stuff, and learning a bit of programming as well the easy way, should try the free, open source Natural Language Toolkit.)

Once! · Jan 12, 2014

An interesting paper. I can see some problems with it, but there is also some confirmation of ideas that we often talk about in AW.

The problems first. A lot comes down to the books in the sample. As far as I can see, the researchers have used a relatively small sample (100 novels per genre). So my first question would be whether the sample was representative. There is a danger that the research would include successful books from the past (which have survived) more than less successful books from the same era. By contrast, they seem to have used a higher proportion of less successful books from the present.

Then we have the changes in writing styles and genre preferences over time. Just because a novel was successful 100+ years ago does not mean that it would be as successful today. Fashions have changed.

So I'm reserving judgement on some of the conclusions because of uncertainties about the samples used.

But ... there's some good news. The report confirms that less successful works over-use descriptive writing such as adverbs and interjections. Successful novels use "said" and "says" more than other words to describe speech. Clarity of speech and readability is important.

Which is what we have been saying all along. Write well and avoid purple prose and you have more chance of being successful. That is also how an editor can spot a good book from a bad one within the first page.

What the report doesn't do is tell you what you need to do in order to guarantee that you will be successful. But then it doesn't claim to do that.

It's not an easy read. Academic papers rarely are. But it has some nuggets buried deep if you are prepared to put in the effort it takes to read it. For example, one of the clearer parts is this:

Interestingly, less successful books rely on verbs that are explicitly descriptive of actions and emotions (e.g., “wanted”, “took”, "promised”, “cried”, “cheered”, etc.), while more successful books favor verbs that describe thought-processing (e.g., “recognized”, “remembered”), and verbs that serve the purpose of quotes

and reports (e.g,. “say”). Also, more successful books use discourse connectives and prepositions more frequently, while less successful books rely more on topical words that could be almost clich´e,

e.g., “love”, typical locations, and involve more extreme (e.g., “breathless”) and negative words (e.g., “risk”).

Or you can dismiss it as BS on principle.

jjdebenedictis · Jan 12, 2014

Once! said:
But ... there's some good news. The report confirms that less successful works over-use descriptive writing such as adverbs and interjections. Successful novels use "said" and "says" more than other words to describe speech. Clarity of speech and readability is important.

Which is what we have been saying all along. Write well and avoid purple prose and you have more chance of being successful. That is also how an editor can spot a good book from a bad one within the first page.

That was something that struck me too, when I first read about this. The study confirmed some of the advice that writers have long been given regarding how to strengthen their prose.

atthebeach · Jan 12, 2014

Bearilou- me too!

I would consider this a form of NLP, and NLP is indeed scientific. That does not mean the media doesn't inflate it for sensationalism -good examples above by MookyMcD and robjvargus.

I agree we should look at academic papers ourselves before deciding what is ridiculous or what is reliable.

That said, I also agree with Macdonald's cholera example, as algorithms are only as good as what and how they are configured.

I think Once! is right, that we at least see some correlations to what we already know is good/ bad for writing success. We have to take the exaggerated media title and reduce it to some nugget of truth, if any remains (and in this case I believe something does).

Also, I am impressed with your discussion Amadan and kuwisdelu (and kuwisdelu what are you getting your Ph.D in, if not this?). I am a linguist, and while I don't do this, I have friends and colleagues that combine linguistics with computer science and do a lot of NLP.

Bottom line- a computer can check text and confirm features of good writing, as defined in an algorithm. But predictions may be taking it too far- may. I find the whole idea fascinating. But no, I do not expect it to predict the next Harry Potter success.... But what if it did? (Twilight zone creepy music playing in background...)

kuwisdelu · Jan 12, 2014

atthebeach said:
(and kuwisdelu what are you getting your Ph.D in, if not this?). I am a linguist, and while I don't do this, I have friends and colleagues that combine linguistics with computer science and do a lot of NLP.

I'm in statistics. My current work is developing methods for analysis of mass spectrometry imaging data, but I hope to get into NLP eventually. I was interested in it before, but the opportunity with my current research was too good to pass up. I like working with large, complex datasets, and I'd love to fit my love of language into my day job, too.

atthebeach · Jan 12, 2014

Excellent! I have my Ph.D (in linguistics), so that is why I asked- when you said you were working on your Ph.D- good luck!!! And there is definitely a demand for combining the two.

Drachen Jager · Jan 12, 2014

Layla Nahar said:
I use a crystal ball. It works pretty well.

I just assume everything I write will be a flop.

So far I have a 100% accurate prediction rate.

Algorithm Predicts Best-Sellers with 84% Accuracy

Mildly Disturbing

nearly perfect

Burninator!

Revolutionize the World

Revolutionize the World

Revolutionize the World

DenturePunk writer

Your Genial Uncle

Seashell Seller

unaccounted for

Still confused by shoelaces

is watching you via her avatar

In my happy place

Revolutionize the World

In my happy place

Professor of applied misanthropy