The way some others do it:
Rankings reflect sales of graphic novels, for the week ending May 31, at many thousands of venues where a wide range of books are sold nationwide. These include hundreds of independent book retailers (statistically weighted to represent all such outlets); national, regional and local chains; online and multimedia entertainment retailers; university, gift, supermarket, discount department stores and newsstands. In addition, these rankings also include unit sales reported by retailers nationwide that specialize in graphic novels and comic books. An asterisk (*) indicates that a book’s sales are barely distinguishable from those of the book above. A dagger (†) indicates that some bookstores report receiving bulk orders.
This is from the Arts Blog of the Gray Lady, the venerable New York Times. They’ve been posting a Graphic Books Bestseller Chart for three months now; the pull-quote above is from the footnote to a recent list, after the jump and all the way at the bottom, and then offset by formatting the paragraph in italics so your eye will just glide over it on your way down to the links and the comments appended to the end.
“It’s just a technical note, don’t worry your pretty little head over our methodologies.”
The problem, of course, is that a “New York Times Bestseller List” is like the “Dow Jones Industrial Average”: sure, each is just a listing of top performers compiled for the benefit of a newspaper using a decades-old secret formula, but the impact of the whole is so much more than just a top 10, or top 25, or the estimated value of an imaginary, arbitrary portfolio of 30 near-randomly selected stocks, the components of which are swapped out at whim. Each is designed from the get-go to seem authoritative while its compilers cherry-pick what they care to track. I’m not sure why the Dow continues to get press, other than the fact that there is no good alternative, and tradition and inertia lend the Dow a gravitas that no new index (or mere average with obvious, independently verifiable inclusion standards and a larger data pool) can match.
And ‘Bestsellers’… For an author and publisher, the New York Times Bestseller imprimatur is money in the bank. They proudly emblazon said status on the cover of the book, and the lucky wordsmith will forever bear the sobriquet of “New York Times Bestselling Author”.
In the publishing world, this is a big deal.
Other papers-of-record (The USA Today list, for example, which is not only longer but more inclusive and, on its face at least, much more democratic) and even major retailers also maintain bestseller lists, but they’ll never be able to conjure the same magic as the New York Times. Something about old New York’s status as a publishing centre, and the close to 70 years that the NYT has published their charts, are what make their bestsellers ‘the’ bestsellers, but even Wikipedia can point you to older charts, and the controversy surrounding the term, and the different ways the term ‘bestselling’ is used depending on context, region, and even things like the format of the book and the venue in which it is sold.
It’s all hokum and snake oil. Hell, any wonk with a blog and too much free time on his hands can compile a chart. [*ahem*]
In this case, I can one-up the Times — I am proud to present: Transparency.
Here’s the method and methodologies, sources and scores, how I weight the data and why, and in way more detail than anyone really wants.
I don’t care if you want it or not. It’s not that my inclusion of this information makes my chart better or more accurate than Nielsen BookScan, or USA Today, or the New York Times, or ICv2, or a slate of retailers. (Retailers, for example, know exactly how many copies they’ve sold, and similarly Publishers know exactly how many books they get paid for, and none of that data is forthcoming.) My numbers are still just estimates; my sources are online retailers and, who knows, they could be lying to us. It’s not about ‘proving’ my numbers are better; this is a good faith effort to share with you [and the rest of the uncaring internets] exactly what it is that I do. You are invited to make your own value judgements as to what it’s worth.
In a way, everything is also verifiable. The sticky wicket is that my chart relies on ephemeral data that posts to the internet once, before being replaced by a more current version, so short of exactly duplicating my data collection method, there may not be a way to call me on it. But sources are clearly identified, both here and in each post. Go look at the same websites, wallow in the same data set. Get a feel for the overall geist of online sales in the same way I have. Instead of closing off my sources, and hiding my process in a footnote, here is a great-gobsmacking-big invitation to share with me. Follow along with the home game version. A couple times a year I even post my entire spreadsheet, with 3 or 6 months’ worth of data. Dive In, Math is Fun!
##
Two months ago (at the time of posting: April ’09) I made the decision to change from just a manga chart to a bestseller list for all graphic novels. And I’m still trying to cope.
I’ve hit the limit of what one dedicated person can do on a part-time basis. In fact, we’re past that limit as I’m far less than ‘dedicated’ and will occasionally take a couple of days off to watch a set of newly acquired anime DVDs, or read through a half-dozen manga of a given series, because I am a loser fan boy first before I am a blogger or math nerd.
So. The charts post sporadically. As I get more details for this new listing nailed down, my weekly time commitment will also decrease, slightly, and so I should be able to settle back into a regular posting schedule, but this one minor (on the face of it) adjustment has thrown my overall progress back a year. Maybe more.
Enough editorial…
Hi, my name is Matt. This is RocketBomber.com, and this is where I post a bestseller chart for Graphic Novels.
The Core of the Charts is made up of data from three sites: Amazon, Barnes & Noble, and Borders.
Once a week, I visit each site to check their Graphic Novel categories, and I sort the search results by ‘bestselling’. The links above will pull up exactly that.
I then click through, page after page, and type the titles into a spreadsheet in the order that they are ranked on the sales site. [this is the hard part]
And once I have a full list, I assign points to the books depending on how highly they rank. Add up the points each title earns (and add on similar data from a half-dozen second-tier sales sites) to get a composite score, and there’s your ranking.
In concept, it’s that simple.
##
In practice, because the sites themselves can update as often as once an hour, after I load up a website & sort the search results, I then click open each new page in a new tab until I have 20-100 tabs open, representing a snapshot of the full sales (top 900-1200 titles) of this particular sales site over a relatively short time-frame (10-15 minutes). And then I start the data entry.
For Borders, which handily allows 50 titles to a page, I only have to open 20 or so tabs. For B&N, which can support 100 titles to a page but maddeningly restricts some searches to a mere 10 titles a page with no option for more, I load up 95 tabs. And even though Amazon defaults to 12 a page (a default that no one can change and, currently, one that must be navigated by clicking ‘next’ on each and every page), I also load up 95 tabs (1140 titles) because, even though I only want a top 900, Amazon search results include so much ‘noise’ that I know in advance I’ll need to skip between 175 and 200 titles because they aren’t graphic novels.
Dear Amazon, Newsflash: Just because Gaiman wrote it, doesn’t make it a comic.
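For anyone playing the home game, the tab counts are just division with some headroom. A minimal sketch of that arithmetic follows; `tabs_needed` is a name made up for this illustration, and the 900-title target and the noise figure come from the paragraph above.

```python
import math


def tabs_needed(titles_wanted: int, per_page: int, noise: int = 0) -> int:
    """Pages (tabs) to open so that titles_wanted usable rows remain
    after skipping `noise` rows that are not graphic novels."""
    return math.ceil((titles_wanted + noise) / per_page)


# Borders: 50 titles per page, no noise allowance.
print(tabs_needed(900, 50))             # 18
# Amazon: 12 per page, allowing for the worst case of 200 skipped rows.
print(tabs_needed(900, 12, noise=200))  # 92
# The 95 tabs actually opened at 12 per page:
print(95 * 12)                          # 1140
```

The bare minimums come out a little under the 20 and 95 tabs quoted above; the difference is headroom.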

Let’s go back one half step: the top 900 titles.
To compile my charts, for the top three sites (op cit. Amazon, B&N, Borders) I look up the First 900 graphic novels listed. Yes, I skip a few; as noted, not everything coming up on a search is a graphic novel. I intentionally skip some others.
[currently: I skip most kids’ ‘picture books’ and adaptations of classics and material in the public domain. I ♥ the classics, and also love the comic adaptations as much as the next guy (or more), but with up to 5 different versions of a book, all from different publishers, my spreadsheet (and my poor brain) can’t track them all. Does one consider the source (e.g. Huck Finn) or the imprint (e.g. Papercutz Classics Illustrated) as the ‘series’ in this case? On the one hand, all versions of the source book should be the basis for a title ranking; on the other hand, the consumer presumably would be looking for all adaptations under a particular imprint. There is a third case of course: it’s my chart and this makes my brain hurt so I just skip ‘em]
Matters of inclusion aside…
That was the top 900 titles. Here’s how I score them:
#1 gets 100 points.
Perfectly straightforward. And that’s my benchmark: #1 on Amazon, or B&N or Borders, is worth 100 points and everything else (lower ranked on one of these sites or appearing on a different site) is worth less; some fraction of 100. I only belabour this point because from here it gets messy:
#2 gets 99.7 points, and we proceed down the charts by increments of three-tenths through 234 titles (#234 scores 30.1 points) and then shift to increments of one-tenth (#235 scores 30 points, #236 29.9, and so on) through the next 200 titles and then we switch to increments of five-hundredths of a point…
…yeah, I know. Here, look at this:

#1 gets 100 points. Everything below that only scores some fraction of 100. By the time we’re ⅔ of the way through the source chart, that fraction is the nominal one tenth of a point (not quite zero, but close) and I keep scoring titles until I get sick of looking at the website, or I hit 1000 titles, or both. In practice (and in the chart above, and for all my GN rankings as posted to date) I’m going to push until I hit 900 titles and then (gratefully) stop. But one tenth of a point isn’t going to change anything, and so long as the data looks good I reserve the right to keep on going. A lot of the “long tail” in my posted manga charts (5000+ total titles at the end of ’08, the last 300 or so only appearing once or twice in sources all year) came from this kind of extended data entry.
Why all the decimals? I discovered early on (back in ’07) that if I posted nice round numbers, it didn’t matter how I introduced, qualified, or explained the chart: someone would mistake my score for an actual unit sales number. The simple solution (at the time) of dividing by ten, inserting a decimal point, instantly changed that. Since then I’ve modded the spreadsheet to incorporate the fractions.
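If the curve is easier to read as code, here is a rough sketch of the core-site scoring as described so far. The first two segments come straight from the numbers given (#1 = 100, #234 = 30.1, #235 = 30). The third segment is partly guesswork: the starting value of 10.05 at #435 and the hard floor at one tenth of a point are assumptions made for this illustration, and `core_points` is a made-up name, not a formula lifted from the spreadsheet.

```python
def core_points(rank: int) -> float:
    """Points for a 1-based rank on a core site (Amazon, B&N, Borders).

    Works in hundredths of a point so the decimals stay exact.
    """
    if rank < 1:
        raise ValueError("ranks start at 1")
    if rank <= 234:
        # #1 = 100, dropping three-tenths per place; #234 = 30.1
        hundredths = 10000 - 30 * (rank - 1)
    elif rank <= 434:
        # #235 = 30, dropping one-tenth per place for the next 200 titles
        hundredths = 3000 - 10 * (rank - 235)
    else:
        # ASSUMPTION: the five-hundredths steps start at 10.05 for #435
        hundredths = 1005 - 5 * (rank - 435)
    # ASSUMPTION: scores never drop below the nominal one tenth of a point
    return max(hundredths, 10) / 100


for rank in (1, 2, 234, 235, 236):
    print(rank, core_points(rank))
# 1 100.0
# 2 99.7
# 234 30.1
# 235 30.0
# 236 29.9
```

Under those assumptions the score bottoms out at 0.1 around #634, which is in the neighbourhood of two-thirds of the way down a 900-title chart, so the guess is at least plausible.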
So, that’s three charts that form my core, and the ‘fancy’ math I use to approximate sales.
[Note: scoring methods changed slightly starting with the charts dated 19 July; there is an explanation here. The chief upshot is a ‘fatter’ curve that reflects a greater emphasis on midlist and backlist titles, but the top of the chart does not change — #1 still equals 100 points — and everything else is still just scored at a fraction of that. A full accounting will be presented in the next update to the FAQ; until then please remember that while some of the arbitrarily assigned scores may be different, the Theory and Reasons behind the chart as presented below are still valid]
I also check one other bookstore’s site: Books-a-Million, but given their lower sales volume I discount their results slightly, and also delve less deeply: I check a top 300 (with #1 scored at 30 points, and decreasing by a tenth of a point down a straight line) with the addition of another 100 ranking titles at 0.1 points each, similar to above but stopping at #400.
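The Books-a-Million line is simple enough to write down in full. A small sketch, with `bam_points` as a name invented for this example; anything past #400 is treated as unscored, which is how I read "stopping at #400".

```python
def bam_points(rank: int) -> float:
    """Points for a 1-based rank on the Books-a-Million chart."""
    if rank < 1 or rank > 400:
        return 0.0  # not scored past #400
    if rank <= 300:
        # #1 = 30.0, dropping a tenth per place; #300 = 0.1
        return (300 - (rank - 1)) / 10
    # #301 through #400 get the flat tenth of a point
    return 0.1


print(bam_points(1), bam_points(300), bam_points(400), bam_points(401))
# 30.0 0.1 0.1 0.0
```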
Then there are a lot of top 100 charts from various sites (buy.com, Powell’s, overstock.com, deepdiscount.com, Tower, half.com) and also Amazon’s hourly top 100, which is different from the Graphic Novel ‘bestseller’ search results, oddly enough, and which I check 5 times a week — roughly once a day.
The ‘number ones’ at each of these sites score 10 points, and down the list by increments of 0.1 points until we get to #100, which scores a single tenth of a point. (Tower and half.com are proving to be of marginal utility; I may have to discount them further, i.e. #1 = just 5 points, or in the case of half.com just drop the site entirely, but as of June ’09 they are still components in the rankings)
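The top-100 scoring, sketched the same way. `top100_points` is a hypothetical name, and the possible discount for Tower and half.com is not modelled here, since how the increments would scale under a 5-point cap is not spelled out.

```python
def top100_points(rank: int) -> float:
    """Points for a 1-based rank on one of the top-100 charts."""
    if rank < 1 or rank > 100:
        return 0.0  # these charts only run to #100
    # #1 = 10.0, dropping a tenth per place; #100 = 0.1
    return (100 - (rank - 1)) / 10


print(top100_points(1), top100_points(50), top100_points(100))
# 10.0 5.1 0.1
```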
That’s where all the numbers come from: After looking at ten different sales sites and doing all the data entry and scoring 15 different source charts (with its hourly bestsellers, Amazon gets checked a total of six times) of varied lengths and value, and then doing it all again (each set of posted rankings pulls from two weeks of data), we have only just started.
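To make the tally concrete, here is a sketch that lists one week's 15 source charts with the points each awards its #1, then adds per-chart points into a composite. The chart labels and the `composite` helper are illustrative names only. How the two weeks of data are combined is not spelled out above, so the sketch simply sums whatever charts it is handed.

```python
from collections import defaultdict

# One week's source charts as (label, points awarded to that chart's #1).
WEEKLY_CHARTS = (
    [(site, 100) for site in ("Amazon", "B&N", "Borders")]
    + [("Books-a-Million", 30)]
    + [(site, 10) for site in ("buy.com", "Powell's", "overstock.com",
                               "deepdiscount.com", "Tower", "half.com")]
    + [(f"Amazon hourly top 100, check {n}", 10) for n in range(1, 6)]
)


def weekly_max() -> int:
    """Score of a title that is #1 on every chart in a single week."""
    return sum(top for _, top in WEEKLY_CHARTS)


def composite(scored_charts):
    """Sum points per title across charts.

    scored_charts: iterable of dicts mapping title -> points on one chart.
    Returns (title, total) pairs, highest total first.
    """
    totals = defaultdict(float)
    for chart in scored_charts:
        for title, points in chart.items():
            totals[title] += points
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(len(WEEKLY_CHARTS), weekly_max())  # 15 440

# Toy data: two charts, two titles.
ranking = composite([
    {"Watchmen": 100.0, "Naruto, Vol. 1": 99.7},
    {"Naruto, Vol. 1": 10.0, "Watchmen": 9.9},
])
print([title for title, _ in ranking])  # ['Watchmen', 'Naruto, Vol. 1']
```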
Now that we have data we can run them through the spreadsheet. The trick, of course, is teaching a spreadsheet how to grok book titles, and how to discern what books are part of which series. [It’s a matter of careful formatting more than anything else — the spreadsheet knows how to put things in order, and how to add, and it can also compare two line entries to see if part or all of the line is the same. Using these simple tools, it’s possible to compile a chart of rankings — if you’ve set the sheet up correctly.]
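Outside a spreadsheet, the same "compare part of the line" trick might look something like the sketch below. It is only an analogy: it assumes titles are typed in a consistent "Series, Vol. N" style, and the regular expression and function names are inventions for this example, not a description of the actual sheet.

```python
import re
from itertools import groupby

# ASSUMPTION: volume markers look like ", Vol. 12", " Volume 3", " Book 2" or " #5".
VOLUME_SUFFIX = re.compile(r",?\s+(?:vol\.?|volume|book|#)\s*\d+.*$", re.IGNORECASE)


def series_key(title: str) -> str:
    """Strip the volume marker so volumes of one series compare as equal."""
    return VOLUME_SUFFIX.sub("", title).strip().lower()


def group_by_series(titles):
    """Sort by series key, then collect adjacent entries that share it."""
    ordered = sorted(titles, key=series_key)
    return {key: list(group) for key, group in groupby(ordered, key=series_key)}


groups = group_by_series(["Naruto, Vol. 2", "Watchmen", "Naruto, Vol. 1"])
print(sorted(groups))  # ['naruto', 'watchmen']
```

Sorting first matters because `groupby` only merges neighbours, much as a spreadsheet comparison of one row against the next only works once the rows are in order.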
##
Let’s assume that the same titles are all ranking in the same order on every sales site. Watchmen is number one everywhere, for example. (It sounds ridiculous, I know, but let’s go with this model.) If that were the case, and using the scoring method above, we’d get a top 900 titles that would score like this:

I’d posted a similar chart earlier in the teaser (actually the same chart, scaled differently) and in the comments JRBrown said, “To me this graph looks pretty similar to those charts of Amazon’s overall book sales that were so shocking 5 years ago, only a lot more compressed (with the top 100-150 books accounting for maybe half the sales?).”
Yes, exactly.
What I didn’t tell you is the chart above doesn’t represent actual sales, it’s only a model. This is what my approximation of online sales looks like in my monstrous, steam-powered difference engine computational works. And since I use weighted scores in comparing titles, this is why the Manga and Comics 500 are often referred to as online sales estimates. No one is giving me actual sales numbers, and these are the lengths (and depths, and breadths) to which I’ll resort to figure this out.
If we take the model and plug in the real data (the actual rankings found on online sites) along with a little ice, some lime juice, bar mix, triple sec and tequila and hit frappé, the graph looks a little more like this:

This is for all titles found and scored (2,725 over the two-week period, 4 May to 17 May, charted above) at the ten sites currently tracked, with #1 (Watchmen) scoring close to the theoretical maximum number of points.
This is where the scores lead us. And it all starts with #1 @ Amazon = 100 points.
##
Using the sales estimates (and occasionally, a smidge of extra math) I can then sort, bend, fold, spindle and mutilate the ‘main’ chart into a number of secondary charts:
- The Top 50 Series chart uses the same scores assigned to books for the Comics 500, but with a sprinkling of extra math: A weighted score is determined using the points from the top two ranked volumes of a given series as a base, and only adding one tenth of the scores for all other books in the series. [read more]
- The Publisher’s Scorecard is the most straightforward of the lot (provided I’ve entered the publishing info for the titles) — just look at the Top 500 and count: so many for DC, Marvel, Viz, so many for Tokyopop, Dark Horse, IDW, etc. Actually, I get the spreadsheet to count them for me, but that’s the gist of it.
- New releases and preorders are almost as easy: once the publishing data for the books has been updated, a simple sort by date pulls up the requisite info for the post.
- The “Midlist 500” is a re-ranking of manga volumes after excluding all non-manga, and also the books from the top 5 manga Series: At the time of this posting (13 Jun 08) the top 5 series are Naruto, Fruits Basket, Vampire Knight, Bleach, & Death Note; all together this represents some 150 books of which at least 100 are clogging up my manga chart. After excluding these volumes I then re-run and re-number the Midlist chart with the books that are left.
Actually, The Midlist 500 is the reason I set up the spreadsheet and do the rest of the math.
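The series weighting from the first bullet above is the only one of these with real math in it, so here is a minimal sketch of it. `series_score` is a made-up name and the volume scores in the example are invented numbers, not real chart data.

```python
def series_score(volume_scores):
    """Weighted series score: the top two volumes count in full,
    every other volume in the series counts for one tenth."""
    ranked = sorted(volume_scores, reverse=True)
    return sum(ranked[:2]) + sum(ranked[2:]) / 10


# A four-volume series: 800 + 600 in full, plus a tenth of (100 + 50).
print(series_score([600.0, 50.0, 800.0, 100.0]))  # 1415.0
```

The effect is that a long series cannot top the chart on sheer volume count alone; it needs a couple of strong sellers.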
##
See also: The Old Faq (last updated 7 Mar 09)
Archived Lists:
Reconstructed 2007 Manga Chart
2008 Winter — unavailable; the charts were on hiatus 6 weeks through Jan/Feb while the spreadsheet was retooled.
2008 Spring [manga only]
2008 Summer [manga only]
2008 Autumn/Annual Summary [manga only]
2009 Winter [manga only] (coming July)
2009 Spring [finally, we’re posting all GNs] (coming July/August)
##
boilerplate anti-©:
Graphic Novel estimated online sales rankings compiled by Matt Blind for the benefit of the Comics Fan, Creator, and Publishing Communities and posted in the rankings category at RocketBomber.com. Derived from publicly available information; if you feel your intellectual property has been infringed upon then I’d advise you to chill, consult your lawyers again, maybe grow a thicker skin, and then also recognise that you’re getting a free, weekly link directly to your lovely offerings [right at the top of each post, in case you missed it] on a blog that specifically caters to fans of the medium. Maybe you should be sending me money, or free manga, as opposed to getting your boxers/panties in a bunch over imaginary copyrights.
All data as posted released back into the public domain (be free, little numbers, go frolic and prosper) with merely a humble request that you link rather than steal, and that any derivative works include an attribution and also remain free to all.
##
If you have questions, corrections, or concerns that should be addressed in the body of this post, please send an email to matt [at] rocketbomber [dot] com. Questions, corrections or concerns placed in the comments below will be addressed in a more casual manner after I’ve downed a few beers and am feeling saucy.