Book APIs - the good, the bad, the ugly

Brendan · March 17, 2019, 12:38am

In the process of developing this site (migrating my massive book wish lists from Amazon to WordPress and trying to grab some useful metadata by any means necessary), I’ve grappled with a few different APIs that purport to offer bibliographic data.

I can’t say I’ve tried them all, but I’ve tried several, and none of them is what I’d call excellent. I should probably write at length at some point about my experiences with each (so far: Amazon, Google Books, and Open Library; I’ve also been meaning to try Goodreads), but for now I’m curious whether anyone else has done some kind of programmatic book-data gathering, and if so, what you found useful or not worth it, what you wish these APIs could do better, etc.

For what it’s worth, I would of course love a universal, comprehensive database with clean and complete data on every book ever published. Even just the metadata promised by the thwarted and aborted Google Books project would be an awesome start, but I’m pretty sure nothing like that exists. Open Library has a long way to go, but it is an exciting and commendable project that seems to aim at something like this!

brian · March 20, 2019, 1:43pm

I too have been on the hunt for one of these! Bummer to hear that none seem to completely fit the bill. Have you played around with the Google Knowledge Graph search API at all? I’ve been meaning to give that a spin for linking books and authors to other media/metadata.

The Open Library project looks really neat though! Do you happen to know of any apps that are built on top of it? When it comes to these open, contributor-based, domain-specific databases, The Movie Database is one of the most comprehensive I’ve come across. I think one big factor in its success was the popularity of the apps built on top of it (Plex and Letterboxd, for example). A lot of contributions to TMDB have been driven by passionate users of these client apps who just want to make their favorite websites work better.

Though I’d imagine film and television is also a much smaller domain than all the books ever written, so maybe it’s just that the problem at hand is more difficult.

Brendan · March 20, 2019, 2:31pm

Oh cool, no, I haven’t tried Google’s Knowledge Graph Search API yet! From what I gather, Google Books is sort of abandonware at this point; see e.g.:

And super interesting history here:

It could be that I wasn’t using the Google Books API quite right, but I recall searching for about 25 books from my antilibrary list (by title) and only getting results for about 8. Perhaps this other API will work better…or maybe not, haha. I also think it’s simply a very hard problem!
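For anyone curious what a by-title lookup against Google Books looks like, here is a minimal Python sketch. It assumes the public `volumes` search endpoint with its `intitle:` search keyword and `maxResults` parameter; the helper names are made up for illustration, and this is not a reconstruction of the exact queries described above.

```python
import json
import urllib.parse
import urllib.request

# Google Books volumes search endpoint. Assumption: unauthenticated requests,
# which Google may rate-limit; add a "key" parameter if you have an API key.
VOLUMES_URL = "https://www.googleapis.com/books/v1/volumes"


def build_title_query(title: str, max_results: int = 5) -> str:
    """Build a search URL that uses the intitle: keyword to match against titles."""
    params = {"q": f"intitle:{title}", "maxResults": max_results}
    return f"{VOLUMES_URL}?{urllib.parse.urlencode(params)}"


def parse_volumes(payload: dict) -> list[dict]:
    """Extract title, authors, and ISBNs; 'items' is absent when nothing matched."""
    results = []
    for item in payload.get("items", []):
        info = item.get("volumeInfo", {})
        isbns = [
            ident["identifier"]
            for ident in info.get("industryIdentifiers", [])
            if ident.get("type") in ("ISBN_10", "ISBN_13")
        ]
        results.append({
            "title": info.get("title"),
            "authors": info.get("authors", []),
            "isbns": isbns,
        })
    return results


def search_by_title(title: str) -> list[dict]:
    """Fetch and parse one title search (this makes a live network request)."""
    with urllib.request.urlopen(build_title_query(title), timeout=10) as resp:
        return parse_volumes(json.load(resp))
```

Running `search_by_title` over each title in a list and counting the empty results would be one quick way to measure a hit rate like the 8-out-of-25 above.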

I am really rooting for Open Library, but it seems like the early stages of a decades-long project. I’m not aware of any apps built on this data; if anyone comes across any, let me know.

I hadn’t seen The Movie Database; thanks for the link. Interesting how the app ecosystem drives the feedback loop for contributing data. Agreed: if there were a really popular open-source Goodreads-style service, that would help a lot. I think Open Library is kind of trying to be that too (it has features like user-created book lists and a reading log), but it hasn’t caught on yet.

But yeah, when it comes to “cataloging complete data on all [x]”, I’d guess books are harder than movies by…idk, at least a couple of orders of magnitude? Both because of the much greater overall volume and because of the messiness, e.g. partial historical data, titles with dozens of different editions, etc.

tomcritchlow · March 25, 2019, 7:28pm

Commenting mostly just to subscribe: I too would love to know what the latest and greatest books API is. Seems like a thorny problem (especially if you want to tie it back to commerce). Back in the day when I built 7books, I just used the Amazon API, which seemed “good enough” but also not ideal…

Brendan · March 25, 2019, 7:59pm

Yeah, Amazon seems like it would be the best…but when I tried recently, it seemed like their advertising/affiliate API was the only one available, and due to a recent rule change they deny access if you have an affiliate account without any actual sales in the last 90 days or something like that…so basically a huge pain in the ass. I’m hoping Goodreads has essentially similar data but is friendlier; I’ve yet to try it, but it’s on my to-do list.

eshnil · February 11, 2020, 2:13am

For learnawesome.org, I had to write crawlers that collect information about a book from multiple sources (OpenLibrary, GoodReads, Amazon, Wikipedia, and various summary sites). While I managed to make it work, it wasn’t easy: Amazon actively detects crawlers and locks them out, OpenLibrary doesn’t have a lot of data yet, and GoodReads’ tagging system is useless if you want to categorize books across topics.

The programmers among you can have a look here: https://github.com/learn-awesome/learn/tree/master/app/utilities
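For anyone attempting something similar, here is a rough Python sketch of one way to combine per-source results into a single record: fields are filled from sources in a priority order you choose, empty values are treated as missing so a later source can fill them, and the source of each field is recorded. This is not the learnawesome implementation linked above; the function name, field names, and source ordering are all hypothetical.

```python
from typing import Any, Iterable

EMPTY = (None, "", [], {})  # values treated as "missing" for merge purposes


def merge_book_records(records: Iterable[tuple[str, dict[str, Any]]]) -> dict[str, Any]:
    """Merge (source_name, record) pairs; earlier sources win for each field.

    Hypothetical helper: each record is assumed to be a flat dict of fields
    already normalized to shared names (title, isbn13, ...).
    """
    merged: dict[str, Any] = {}
    provenance: dict[str, str] = {}
    for source, record in records:
        for field, value in record.items():
            if value in EMPTY:
                continue  # skip empties so a lower-priority source can fill the gap
            if field not in merged:
                merged[field] = value
                provenance[field] = source
    merged["_sources"] = provenance  # which source supplied each field
    return merged
```

Keeping the provenance map makes it easier to see which source is actually supplying the data, and a per-field priority list is a natural next step if one source is better for, say, subjects and another for cover images.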

Brendan · February 11, 2020, 3:12am

Ah, nice. I didn’t try a crawler with Amazon, but since I already had tons of stuff in wish lists, I found a JS snippet that scraped the current (already loaded) giant list page into a table, which I copied to CSV and then imported into WordPress with a PHP script. Kind of janky, but it works.
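The CSV step is where titles containing commas or quotes tend to break things when rows are pasted by hand. Here is a minimal Python sketch using the standard `csv` module that quotes such fields correctly and round-trips them; the column set is hypothetical, since the actual scraped table's columns aren't described here.

```python
import csv
import io

# Hypothetical columns; a real wish-list scrape may expose different fields.
FIELDS = ["title", "author", "list_name"]


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize scraped rows to CSV text; fields containing commas get quoted."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty strings
    return buf.getvalue()


def csv_to_rows(text: str) -> list[dict[str, str]]:
    """Parse the CSV back into dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))
```

A PHP import script could read the same file with PHP's own CSV parsing (e.g. `fgetcsv`), which handles the quoted fields the same way.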

Yeah, I’m really hoping Open Library improves over the long term. I love the mission, but unfortunately the data is so messy (and lots of it is just missing entirely). The one I have yet to try is pulling Goodreads data via their API; it seems promising, but idk. Hopefully I’ll give it a try soon!