Algorithm Predicts Best-Sellers with 84% Accuracy

Status
Not open for further replies.
Joined
Jun 29, 2008
Messages
11,042
Reaction score
841
Location
Second star on the right and on 'til morning.
Website
atsiko.wordpress.com
I haven't seen a thread on this yet. It makes me a bit suspicious.

http://www.escapistmagazine.com/news/view/131181-Science-Discovers-the-Secret-to-Successful-Writing

Knowing where to start, where to go and how to get there can be incredibly frustrating; even more so when you consider that there's no way to know if anyone will even like the finished product. Except maybe there is.
Researchers at Stony Brook University in New York have apparently developed an algorithm that can predict whether or not a book is going to be commercially successful.


 

whatsupbuttercup

Super Member
Registered
Joined
Jul 22, 2013
Messages
297
Reaction score
86
Location
New York City
I found this tidbit to be quite interesting

The research found that books primed to fail had a tendency to overuse verbs and adverbs and included more language describing explicit actions. Successful books meanwhile spent more time describing thought processes and also made a habit of using conjunctions more heavily.

Fascinating since the article stated they used the lowest selling books on Amazon as their "critical failures" subjects. I can't help but wonder what books they used for the successes and failures.

Edit: I'm skeptical about the samples they used for novels, short stories, and movie scripts (not the algorithm itself).
 

kuwisdelu

Revolutionize the World
Super Member
Registered
Joined
Sep 18, 2007
Messages
38,197
Reaction score
4,544
Location
The End of the World
It's very interesting from an academic perspective. Completely useless, IMO, from a "how can I write better?" perspective. Okay, well, there is some limited use in their conclusions, but it's pretty obvious stuff, such as successful academic papers being more straightforward and readable (simple sentences) and successful fiction often employing more complex sentences. The distribution of POS tags is definitely interesting.
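
For anyone wondering what a "distribution of POS tags" looks like in practice, here is a minimal sketch. It assumes the text has already been tagged; the function name `pos_distribution`, the toy sentence, and its hand-assigned tags are all made up for illustration and are not taken from the paper.

```python
from collections import Counter


def pos_distribution(tagged_tokens):
    """Return each part-of-speech tag's share of all tokens (tag -> fraction)."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


# Hypothetical, hand-tagged toy input (a real pipeline would use a tagger).
sample = [
    ("She", "PRON"), ("ran", "VERB"), ("quickly", "ADV"),
    ("and", "CONJ"), ("he", "PRON"), ("thought", "VERB"),
    ("but", "CONJ"), ("waited", "VERB"),
]

dist = pos_distribution(sample)
for tag, share in sorted(dist.items()):
    print(f"{tag}: {share:.3f}")
```

Comparing these shares between two groups of books (say, heavily downloaded versus rarely downloaded) is roughly the kind of feature the discussion refers to, though the actual study's features and classifier may differ.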

The magazine article got it wrong that they were trying to predict bestsellers. They used various measures of success including number of downloads on Project Gutenberg and literary awards, neither of which is a great measure of how popular a book is with the general population near its release. They even hypothesized that their methods would favor prediction of literary success over commercial success, due to the nature of their sampling. For example, Dan Brown's The Lost Symbol is "correctly" predicted as unsuccessful (despite commercial success).

Edit: Man, I wish I'd gotten into natural language processing when I started my PhD. Seems so fun. Oh well, I can always do it when I'm done...
 

MookyMcD

I go to eleven
Super Member
Registered
Joined
Sep 26, 2013
Messages
1,560
Reaction score
236
Location
Boise, ID
Website
michaeljmcdonagh.wordpress.com
Last I heard, Netflix had a million dollar bounty out for anyone who could predict what movies people would want to watch based on what they (and, presumably, everyone else in the database) were watching. That was after years of trying.

The books (and authors) that do extremely well, outside closely knit genres, are almost invariably doing something different or something old in a new way. Backward-looking statistical tools will never be able to predict that.
 

K.B. Parker

I've lost my mind
Super Member
Registered
Joined
Jun 7, 2013
Messages
612
Reaction score
62
While I find the findings odd, does anybody remember a similar study done with music a few years back? It was a formula that predicted what songs would become hits. Except that one actually worked, I believe.
 

kuwisdelu

Revolutionize the World
Super Member
Registered
Joined
Sep 18, 2007
Messages
38,197
Reaction score
4,544
Location
The End of the World
This piece of news is ringing the top bell on my bullshit scale.

The reporting is bullshit. The paper itself is pretty interesting and doesn't make such outrageous claims.

Like always.

Last I heard, Netflix had a million dollar bounty out for anyone who could predict what movies people would want to watch based on what they (and, presumably, everyone else in the database) were watching. That was after years of trying.

It's actually really easy to do this kind of stuff really well. But companies are always looking to get that extra 1%, which could still easily translate to millions of dollars.

As a sidenote, the top algorithms for this kind of stuff are always the same bog-standard statistical tools. The actual hard part that usually improves results is in the pre-processing of the data and selection of predictors.
 

Kevin Nelson

Aspiring to authorship since 1975
Super Member
Registered
Joined
Jul 18, 2012
Messages
464
Reaction score
48
Location
Austin, TX
It's actually really easy to do this kind of stuff really well. But companies are always looking to get that extra 1%, which could still easily translate to millions of dollars.

Certainly not my experience. Movie-recommendation engines have been a dismal failure for me. I rated more than 200 movies on movielens.org, and it still kept giving me wildly inaccurate predictions about which movies I'd like. For example, it thought I would love the 2009 Star Trek, but I actually hated that movie.

Maybe my own personal tastes are unusually hard to predict for some reason, but I can definitely see some room for improvement in whatever algorithms these engines use.
 

Rina Evans

Super Member
Registered
Joined
Dec 29, 2011
Messages
533
Reaction score
44
Certainly not my experience. Movie-recommendation engines have been a dismal failure for me. I rated more than 200 movies on movielens.org, and it still kept giving me wildly inaccurate predictions about which movies I'd like. For example, it thought I would love the 2009 Star Trek, but I actually hated that movie.

Maybe my own personal tastes are unusually hard to predict for some reason, but I can definitely see some room for improvement in whatever algorithms these engines use.
Isn't that prediction based on genre and previously watched movies and TV shows with similar themes and such? Netflix could reasonably predict that I would love the new Star Trek movie based on my sci-fi watching history, but it can't parse out the emotional responses to very specific plot points. Right?
 

Amadan

Banned
Joined
Apr 27, 2010
Messages
8,649
Reaction score
1,623
Kuwisdelu has it right - papers like this are hard for laymen to understand, and it's not very sexy to describe statistical correlations between POS tags and commercial success, especially when it does not translate into any material advantage, e.g., "How can I write a best-seller?" But the researchers who came up with this are just doing research, not trying to write a commercial Best-Seller Predictor for publishing houses. (Though I'm sure if it had commercial applications they'd be quick to take advantage of them.)

It is interesting. It is not "bullshit." Neither is it magic.

Netflix and Amazon predictions are actually pretty good, but keep in mind they fiddle with the weights a lot, and of course there will be outliers. If you loved the last 5 Star Trek movies you saw, Netflix will probably predict a high rating for you for the next one. If for whatever reason, the next one turns you off, it's unlikely the rating algorithm will be able to catch the specific reason, like the actors or the director or the plot.
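
To illustrate why a history-based prediction misses that kind of outlier, here is a deliberately naive sketch. It is not Netflix's or Amazon's actual method; the function `predict_rating`, the genre labels, and the ratings are hypothetical. It predicts a rating as the mean of the user's past ratings in the same genre, so it has no way to see actors, director, or plot.

```python
def predict_rating(user_ratings, genre, default=3.0):
    """Predict a rating for an unseen title as the mean of the user's
    past ratings in the same genre; fall back to `default` if none exist."""
    same_genre = [rating for g, rating in user_ratings if g == genre]
    if not same_genre:
        return default
    return sum(same_genre) / len(same_genre)


# Hypothetical viewing history: (genre, rating on a 1-5 scale).
history = [
    ("star_trek", 5), ("star_trek", 5), ("star_trek", 4),
    ("star_trek", 5), ("star_trek", 5), ("romcom", 2),
]

prediction = predict_rating(history, "star_trek")
print(f"Predicted rating for the next Star Trek movie: {prediction:.1f}")
```

Real systems weight many more signals, but the limitation is the same in spirit: the prediction reflects past ratings, not the specific reason a particular movie might disappoint.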
 

robjvargas

Rob J. Vargas
Banned
Joined
Dec 9, 2011
Messages
6,543
Reaction score
511
As a writer, I'd pretty much ignore this algorithm even if it does indeed work.

Somewhat like trying to figure out what people want right now and trying to write that, I just wouldn't do well trying to tie myself into a formula.

So let the publishers and agents use this. Me, I'm going to write what I write.
 

Jamesaritchie

Super Member
Registered
Joined
Feb 13, 2005
Messages
27,863
Reaction score
2,311
Sounds like pure BS, and just from the article, has a long, long, long way to go before it's actually scientific. Nor, just from the article, do I see anything at all new. What I do see is a whole host of things left out.
 

Filigree

Mildly Disturbing
Super Member
Registered
Joined
Jul 16, 2010
Messages
16,441
Reaction score
1,529
Location
between rising apes and falling angels
Website
www.cranehanabooks.com
The paper's abstract - not the reporting - is fairly interesting. It stretches my comprehension to wade through, but I have heard of some of the previous studies they used. The study doesn't precisely indicate immediate commercial success. On a deeper read, I can connect the general information to improvements in my own writing.

Especially in balancing action with introspection.

But it's not ready for commercial predictive use - which means, like astrology and Myers-Briggs, it's going to be snapped up by many businesspeople.
 

Amadan

Banned
Joined
Apr 27, 2010
Messages
8,649
Reaction score
1,623
Sounds like pure BS, and just from the article, has a long, long, long way to go before it's actually scientific. Nor, just from the article, do I see anything at all new. What I do see is a whole host of things left out.


You do not in any way indicate that you actually read/understood the article.
 

kuwisdelu

Revolutionize the World
Super Member
Registered
Joined
Sep 18, 2007
Messages
38,197
Reaction score
4,544
Location
The End of the World
So let the publishers and agents use this. Me, I'm going to write what I write.

I wonder how 50 Shades or Hunger Games would have performed. Probably in the inaccurate 16%. That would be interesting to see.

The authors of the paper don't claim to even try to predict bestsellers. Their measures of success are primarily based on Project Gutenberg download counts, and they hypothesized and confirmed a bias toward predicting literary success over commercial success (see the Dan Brown example).
 

Torgo

Formerly Phantom of Krankor.
Kind Benefactor
Super Member
Registered
Joined
Apr 7, 2005
Messages
7,632
Reaction score
1,204
Location
London, UK
Website
torgoblog.blogspot.com
(I'm very interested in recommendation algorithms. The Netflix bounty no longer stands - partly because the anonymized data sets they let people practice on were assessed as a privacy risk, and partly because the shift from DVD rental to streaming turned out to make a big difference to the way it needed to work.

I did a talk recently on this subject and will find some time to write it up.)
 

Buffysquirrel

Super Member
Registered
Joined
Nov 12, 2008
Messages
6,137
Reaction score
694
I glanced over this article when it appeared in my FB feed. If they're using Gutenberg downloads, I don't see how they select out people who are downloading for class or people like me looking for a specific quote that they can't quite remember accurately.
 

Torgo

Formerly Phantom of Krankor.
Kind Benefactor
Super Member
Registered
Joined
Apr 7, 2005
Messages
7,632
Reaction score
1,204
Location
London, UK
Website
torgoblog.blogspot.com
It looks like the way this research has been reported is somewhat at odds with the actual research (per Kuwi, above.)

I'd encourage anyone wanting to discuss this usefully to read the paper (PDF link) - I don't think we need to be calling BS on the research based on a couple of somewhat click-baity news stories. (Feel free to call BS on it based on reading the research, of course.)
 

Amadan

Banned
Joined
Apr 27, 2010
Messages
8,649
Reaction score
1,623
I'd encourage anyone wanting to discuss this usefully to read the paper (PDF link) - I don't think we need to be calling BS on the research based on a couple of somewhat click-baity news stories. (Feel free to call BS on it based on reading the research, of course.)


Seriously.

Anyone who says "This is BS" without actually reading the paper (and be warned, it is an academic paper with lots of tables and statistics, and terms specific to NLP and prediction algorithms, so if you claim you understood it all and you do not have a background in computer science and/or statistics, I will express skepticism as to your credibility) is basically standing in public shouting "I make uninformed pronouncements about things I do not understand!"

(And if you have read it and still say it is BS, please tell me why, and be specific.)

I read the paper and understood it because I used to actually do research in this area. It's not perfect and I'm sure the peer reviewers had some criticisms, but it's basically a solid experiment that does not make any grandiose claims.

Headlines like "Will Your Book Sell? There's An Algorithm for That" are journalistic flourishes, because that sounds more interesting than "Computer scientists discovered a statistical correlation between certain lexical structures and arbitrary measures of novel 'success'."
 

Buffysquirrel

Super Member
Registered
Joined
Nov 12, 2008
Messages
6,137
Reaction score
694
I read New Scientist, which basically condenses science stories from the major journals, and it can be quite illuminating then to read the same story in other outlets, which as far as I can see condense the NS version even further, often unintentionally introducing hilarious errors.
 

MookyMcD

I go to eleven
Super Member
Registered
Joined
Sep 26, 2013
Messages
1,560
Reaction score
236
Location
Boise, ID
Website
michaeljmcdonagh.wordpress.com
Original study: Earth's orbit has adjusted inward by 1.8 centimeters in the past 100 years.

New Scientist: Study shows slight change in Earth's relationship with the sun.

Media headline: Earth hurtling toward giant fireball.
 