Looking for a catalog of book spines
December 28, 2020 5:59 PM

I'm looking for pictures of the spines of every book ever published, including every changed version of the art. It's a small ask, I know, but if someone knows of a good place to start, that would be very helpful.

I'll be using it to train a machine to catalog an entire bookshelf from a single picture.
posted by Tell Me No Lies to Technology (12 answers total) 7 users marked this as a favorite
 
'Every' is a large number. Do you have a particular domain in mind, or is this a new (old) uncatalogued library?

These guys [Chiba .jp] are starting on their library, which is funny, as I was thinking last week that drone scanning would be a good idea while getting a stiff neck from scanning titles.

I wish I knew of such an image store; I do judge books by their covers (and spines).
posted by unearthed at 12:06 AM on December 29, 2020


That drone scanning isn't even about cataloguing the books; it's about scanning books on the shelves to see which ones have been misplaced. Ordinarily, that's the job of a library page. The drones are not cataloging the items. That's still a process done by people.

For the OP: cataloging is a labor-intensive process, and I understand your desire to find a shortcut, but I doubt that your process, which is an interesting idea, is doable in the near future. Not only would you need to find an existing catalogue of images of book spines (not just the front of the book), but you'd want a catalogue that already has whatever info you desire, such as title, author, publication date, plus whatever else you need for your personal catalogue. After all, if the image store doesn't include that kind of metadata for you to draw from, then the whole venture is pointless. And you'd have to trust that whoever created it did it accurately.

And saying every book ever published, including all versions of the cover art... that's an unbelievably enormous list. How big is your personal library? (I assume it's your personal library based on your question, but correct me if I'm wrong.) Are all the books in English, or do you need a catalogue that includes books published in other languages? How many of your books are old enough to be in the public domain? A single public-domain novel could have been published 100 times with 100 different examples of cover art. The enormity of what you ask tells me that you're honestly better off cataloging it yourself. There's a reason libraries have full-time cataloguers, and the larger the library (say, a university library), the more likely it is that an entire department is devoted to cataloging the items.

Now, I'm sure there's a decent image store of book covers, though I highly doubt you'll find one devoted to spines. My suggestion: perhaps you could set up a program that matches images of the front cover with those uploaded to Goodreads or something like that. Yes, it's not a single photo of a bookshelf that magically creates a catalogue for you, but it could save you some of the manual labor of entering all the info for each individual book, since Goodreads has a lot of that info available. How many books are you dealing with? Would photographing each cover still save time versus manually building the catalogue?

Perhaps I'm wrong and there's already a giant repository of images of book spines on the internet with all the attendant metadata attached, but I would be surprised to find one.
posted by NotTheRedBaron at 2:43 AM on December 29, 2020 [3 favorites]
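The cover-matching suggestion above could start from a perceptual hash, which tolerates small lighting and cropping differences between a shelf photo and a reference image. This is a minimal average-hash sketch, assuming each image has already been shrunk to a small grayscale grid (for instance 8x8) by some image library; it does not use any Goodreads API, and `average_hash`, `hamming`, and `best_match` are hypothetical names for illustration.

```python
from typing import Dict, Optional, Sequence

Grid = Sequence[Sequence[int]]  # grayscale pixel values, already downscaled (assumption)

def average_hash(pixels: Grid) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the grid's mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def best_match(query: Grid, candidates: Dict[str, Grid], max_distance: int = 10) -> Optional[str]:
    """Return the candidate ID whose hash is closest to the query's hash,
    or None if no candidate is within max_distance bits."""
    q = average_hash(query)
    best_id, best_d = None, max_distance + 1
    for cand_id, pixels in candidates.items():
        d = hamming(q, average_hash(pixels))
        if d < best_d:
            best_id, best_d = cand_id, d
    return best_id
```

In practice the reference hashes would be computed once and stored; the default 10-bit threshold is an illustrative guess for a 64-bit hash, not a tuned value.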


The archetypal library I have in mind has tens of thousands of books, is run almost entirely by volunteers, and has no professional librarians on staff. They’re probably using something like ResourceMate to track everything.

The Chiba project is interesting, but they have barcodes and ICs to work with, which puts them way ahead of libraries like that.

In any case, Open Library has catalog records for 20,000,000+ books, at least a portion of which include cover art (but no spines). If I were feeling super ambitious, I could start crowdsourcing spines and link them from there. Of course, they're short a few hundred million books, but it would be a decent start.

The local library has maybe 16,000 books on the shelves and would make a useful proof of concept. In a pinch I can scan those myself.

When you get down to it, I don’t really need pictures of all of them; there are only so many ways you can place titles and authors on a spine, and from those I can probably identify the book, if not the version. Still, it would be great to have a very large dataset to help with training and verification.
posted by Tell Me No Lies at 7:54 AM on December 29, 2020
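If the goal is identifying the work rather than the edition, as described above, one simple approach is to compare the set of words OCR'd from a spine with each catalog record's title and author, ignoring word order, since spine layouts and orientations vary. This is a minimal sketch using Jaccard similarity on word sets; the catalog format (record ID mapped to a "title author" string), the 0.5 threshold, and the function names are assumptions for illustration.

```python
import re
from typing import Dict, Optional, Set

def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric words, so word order and punctuation don't matter."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def identify_spine(ocr_text: str, catalog: Dict[str, str], threshold: float = 0.5) -> Optional[str]:
    """Return the ID of the catalog record whose words best overlap the OCR'd
    spine text, or None if the best overlap is below the threshold."""
    spine = tokens(ocr_text)
    best_id, best_score = None, 0.0
    for record_id, title_author in catalog.items():
        score = jaccard(spine, tokens(title_author))
        if score > best_score:
            best_id, best_score = record_id, score
    return best_id if best_score >= threshold else None
```

Word-set overlap does not forgive OCR character errors ("DARKNES5"); a character-level fuzzy comparison, such as Python's `difflib.SequenceMatcher`, could be layered on top if that turns out to matter.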


I see, so this is not about a specific collection (your home library) but a proof of concept. If I'm understanding you, you're basically just trying to write a program that replaces the need for professional librarians. I'm sorry, but you don't seem to understand what librarians actually do. After all, who do you think is managing the library's budget, deciding which items to purchase and which should be deaccessioned? What happens when the library runs out of physical space, which happens all the time? Choices have to be made, and volunteers with no training in library science should not be making these decisions. Is this an imaginary public library that has programs that help the community? Story time for children? Job/career advisement? Other educational programs? Should all of this be done by volunteers? Libraries are not just books; resources come in many forms: magazines, newspapers, music, film, brochures. Public libraries are increasingly lending items such as building and gardening tools to people who need them for projects at home but don't need to own them or can't afford to buy them.

Theoretically, your idea may reduce some of the cataloging needs; however, catalogers still need to see the items they work with. They often work with existing catalogue records created by others (such as the records in OCLC) but still make changes or add things based on their own library's user needs. Or, simply, there are errors in the record that need to be fixed. But often there isn't an existing catalogue record available in OCLC, and one must be created from scratch. Or perhaps their specific item is unique in some way that should be reflected in the catalogue, such as a book that's been signed by the author.

You're also assuming that a book in the library has the original dust cover on it, or heck even the original cover (sometimes book repairs require replacing the cover).

This is an interesting idea, but if you are really serious about it, you need to consult actual librarians, because there are flaws in your premise that lead me to believe you're attempting to build something without a full understanding of what the problems are. It's like a person who wants to build a machine that replaces a doctor because they assume all a doctor does is look at a person's vitals and lab results, ignoring the time a doctor spends talking with a patient, learning their history, observing their behavior to see if there's something they might not be saying or understanding, and advising that patient on their health management. Talk to professional catalogers (I'm not one; I've only done a small amount of cataloging in library school) and learn from them what the needs and challenges are.
posted by NotTheRedBaron at 9:48 AM on December 29, 2020 [4 favorites]


NotTheRedBaron's responses are gold for thinking this problem through!

Another consideration is that spines are more likely to wear out and possibly be repaired or rebound (especially for library books) to the extent that they might not look anything like the spine of the book as it was published, adding possibly infinite variability for a single edition of a single book. Using images from a local library would probably also require you to work around the placement of spine labels containing local identifiers.
posted by quatsch at 9:50 AM on December 29, 2020 [1 favorite]


I wonder if for your purposes the specific editions and art even matter. Most books are going to have the author's last name (frequently first as well) and the title words on them in some orientation. That's going to be highly correlated with the ISBN metadata. I'm trying to think if there are any automated book storage/retrieval systems or online retailers that might have the images of the spines but I can't come up with any.
posted by wnissen at 9:54 AM on December 29, 2020


There's just not enough information on a spine to be sure you know what the book is. For example, the first and second edition of a book might look exactly the same from the spine, but they will have different content. If a book has been rebound or re-covered, it won't match any other version of that book. Some spines don't even have any information on them. There are many good reasons why the book spine is not what is used for identifying books for cataloguing.

If you can get a picture of the backs of the books, you'd have a good chance of picking up the ISBN, which is a unique identifier for a book that can be scanned and searched in LibraryThing or WorldCat; that's probably the closest to taking a picture of a book and getting catalogue information about it. However, not every book has an ISBN, not every ISBN is entered correctly, or in a standard format, on those platforms, and not every ISBN is unique (some publishers reuse them). And not every book is available on those platforms. But it would be the better option for automating the documentation of most of a collection via photograph.

And please note that any project like this relies on the work of cataloguers around the world who have put in the time to develop skills around describing books (and other materials) and looked at the entirety of a book to document each item and ensure that accurate, useful information about it is available for public use. If a drone taking pictures of spines were to create a library catalogue, it would likely just end up with a list of authors and titles with a lot of OCR errors.
posted by phlox at 10:15 AM on December 29, 2020 [1 favorite]
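Since, as noted above, not every ISBN is entered correctly, validating the check digit is a cheap first filter before looking a number up in LibraryThing or WorldCat. This sketch implements the standard ISBN-10 (weighted sum mod 11) and ISBN-13 (alternating weights 1 and 3, mod 10) check-digit rules, plus conversion from ISBN-10 to the 978-prefixed ISBN-13; the function names are hypothetical. A valid checksum only shows the number is well formed, not that it belongs to this particular book.

```python
DIGITS = set("0123456789")

def normalize_isbn(raw: str) -> str:
    """Drop hyphens, spaces and anything else except ASCII digits and X."""
    return "".join(ch for ch in raw.upper() if ch in DIGITS or ch == "X")

def is_valid_isbn10(isbn: str) -> bool:
    """Weights 10..1; the last character may be X (= 10); sum must be divisible by 11."""
    if len(isbn) != 10 or not all(c in DIGITS for c in isbn[:9]):
        return False
    if not (isbn[9] in DIGITS or isbn[9] == "X"):
        return False
    total = sum((10 - i) * int(d) for i, d in enumerate(isbn[:9]))
    total += 10 if isbn[9] == "X" else int(isbn[9])
    return total % 11 == 0

def is_valid_isbn13(isbn: str) -> bool:
    """Alternating weights 1, 3, 1, 3, ...; sum must be divisible by 10."""
    if len(isbn) != 13 or not all(c in DIGITS for c in isbn):
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(isbn))
    return total % 10 == 0

def isbn10_to_13(isbn10: str) -> str:
    """Prefix 978, drop the old check digit, and compute a new ISBN-13 one.
    Assumes the input is already a valid, normalized ISBN-10."""
    core = "978" + isbn10[:9]
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)
```

Normalizing both forms to ISBN-13 before lookup avoids treating the same book as two records.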


And there's also the many thousands of books with very thin spines that are nearly impossible to read. Peruse the children's section and see how small those books are!
posted by NotTheRedBaron at 10:28 AM on December 29, 2020


>> ...what librarians actually do. After all, who do you think is managing...

The library I volunteer at does 100% of your list with the exception of tool lending, and it does it with one underpaid administrator (who spends most of her time fundraising) and a raft of volunteers. Undoubtedly a professional librarian could help us do it much better, but that's not in the budget. It would be nice.

>> If I'm understanding you, you're basically just trying to write a program that replaces the need for professional librarians

Not at all. I'm trying to write a program that will catalog books by the shelf, and catalog is probably the wrong word. Inventory, maybe. In any case, the program would tell you what is on the shelf, what is not, and what has been misfiled by the alphabetically/Dewey-decimally challenged.

>> Using images from a local library would probably also require you to work around the placement of spine labels containing local identifiers.

Yeah. Machine learning can help with that but it's not a panacea. It may turn out that spine labels kill the whole idea.

>> There's just not enough information on a spine to be sure you know what the book is.

Hmm, that's a problem. Enough to make me think that trying to identify versions may be a dead end.

>> Peruse the children's section and see how small those books are!

Yes, I fear the children's section. On the other hand it would be hard to screw up inventory monitoring there as badly as volunteers do. :-)
posted by Tell Me No Lies at 1:21 PM on December 29, 2020
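The inventory check described above (what is on the shelf, what is not, what is misfiled) can be framed as comparing the catalog's expected shelf order with the order in which books are recognised in the photos. A minimal sketch, assuming each book is already reduced to a unique ID and appears at most once in the photos: IDs never seen are missing, unknown IDs are unexpected, and a longest increasing subsequence of catalog positions gives the largest set of books already in correct relative order, so everything outside it is a misfiling candidate. `shelf_report` is a hypothetical name, not part of ResourceMate or any other tool.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence

def shelf_report(expected: Sequence[str], observed: Sequence[str]) -> Dict[str, List[str]]:
    """Compare catalog shelf order with the order books were seen in photos.
    Assumes every book ID is unique and is seen at most once."""
    position = {book_id: i for i, book_id in enumerate(expected)}
    seen = set(observed)
    missing = [b for b in expected if b not in seen]
    unexpected = [b for b in observed if b not in position]

    # Recognised books, in photographed order, mapped to catalog positions.
    known = [b for b in observed if b in position]
    keys = [position[b] for b in known]

    # Longest increasing subsequence via patience sorting, O(n log n).
    tail_idx: List[int] = []  # index into keys of the best tail for each run length
    tail_key: List[int] = []  # the key at that index, kept sorted for bisect
    prev = [-1] * len(keys)   # back-pointers to rebuild the subsequence
    for i, k in enumerate(keys):
        j = bisect_left(tail_key, k)
        if j > 0:
            prev[i] = tail_idx[j - 1]
        if j == len(tail_idx):
            tail_idx.append(i)
            tail_key.append(k)
        else:
            tail_idx[j] = i
            tail_key[j] = k

    # Walk the back-pointers from the end of the longest run.
    in_order = set()
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        in_order.add(i)
        i = prev[i]

    misfiled = [b for idx, b in enumerate(known) if idx not in in_order]
    return {"missing": missing, "unexpected": unexpected, "misfiled": misfiled}
```

When two books are simply swapped, either one could be flagged; the subsequence only guarantees that the flagged set is as small as possible.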


>> Enough to make me think that trying to identify versions may be a dead end.

Yeah, even if there were a robust corpus of these already (I don't know of one, and I agree Open Library would be great IF they had the spine data, which I believe they don't, though they do a pretty good job of tracking editions), version information is sometimes indicated on the outside of the book and sometimes not. Also, many libraries, and I don't know if yours is one, regularly re-cover books or remove paper covers that may have identifying information on them, often adding their own spine labels with DDC or LC information on them. In a library of 16,000 items, things like editions are less of an issue, assuming it's a public and not an academic library. In an academic library, knowing the edition is more important.

phlox and NotTheRedBaron really have it. When people have done this using existing technologies, it's usually by scanning the ISBN and using that information to approximate holdings, but cataloging data is more proprietary (so yes, I think you're right that inventorying is what you're after) and requires either more human eyeballs or more money to get the books fully cataloged.
posted by jessamyn at 1:42 PM on December 29, 2020
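If the local spine labels carry DDC numbers, as mentioned above, the label itself could serve as the ordering key for a shelf check rather than only as an obstacle. A minimal sketch, assuming simplified labels of the form "641.5 SMI" (a three-digit Dewey class, optional decimals, then author letters) and treating non-Dewey labels such as "FIC LEG" as shelved after all nonfiction; real local labeling schemes vary, and LC call numbers would need a different parser. `call_number_key` is a hypothetical name.

```python
import re
from decimal import Decimal
from typing import Tuple, Union

# Three-digit Dewey class, optional decimal part, then optional author letters.
DEWEY_LABEL = re.compile(r"^\s*(\d{3}(?:\.\d+)?)(?:\s+(.*))?$")

def call_number_key(label: str) -> Tuple[Union[int, Decimal, str], ...]:
    """Sort key for a simplified spine label.

    Dewey labels compare numerically on the class number, then alphabetically
    on the author letters. Anything else (e.g. 'FIC LEG') sorts after all
    Dewey labels, alphabetically; that placement is an arbitrary choice here.
    """
    m = DEWEY_LABEL.match(label)
    if m:
        author = (m.group(2) or "").strip().upper()
        return (0, Decimal(m.group(1)), author)
    return (1, label.strip().upper())
```

Comparing class numbers as decimals matches Dewey's digit-by-digit filing (641.49 before 641.5 before 641.52), which plain string sorting would also get right here but integer parsing would not.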


Ah, I finally understand where you're actually coming from. You are talking about an actual library you volunteer at, and you want to find ways to help. Basically, I'm getting that what you want is a comparatively easy way to see whether books are missing or have been shelved in the wrong spot, right? That's not a cataloging issue but inventory, as you mentioned in your most recent update, so the cataloging issues I raised don't apply, because the library will have already catalogued and placed barcodes on new books before putting them on the shelf.

As unearthed mentioned in the first comment, the drone scanning of barcodes is an interesting idea. But again, that requires that all the barcodes are on the spine and easily visible by the drone and for thinner books, libraries most commonly put the barcodes on the front cover which would defeat the drone. That would also defeat a photograph. You would also run into the issue of inconsistent lighting on the books themselves which would potentially obscure identifying information in shadows cast by books sticking farther out on the shelf.

I wish I had an idea off the top of my head that would help, but ultimately it's still down to human eyes scanning the shelves. Even libraries with more robust budgets than yours still do this. I do think that in the future there will be more effective ways to keep the shelves organized and avoid the problems of missing and mis-shelved books, but right now I don't have an answer for you.
posted by NotTheRedBaron at 1:51 PM on December 29, 2020


>> That's not a cataloging issue but inventory

I apologize for this confusion. As you say, amateurs can blunder around a long time if they don't consult the experts.

>> Ah, I finally understand where you're actually coming from. You are talking about an actual library you volunteer at

The library I volunteer at is the immediate case I can solve and test solutions on. However we are in contact with a number of other shoestring libraries that have the same problems. ResourceMate is the common denominator and whatever I do will probably hook into it.

(Just as a side note, three years ago all sixteen thousand books were in the card catalog. We're still sorting through many data entry errors from the conversion to digital.)

>> You would also run into the issue of inconsistent lighting on the books themselves which would potentially obscure identifying information in shadows cast by books sticking farther out on the shelf.

The photos/video would almost certainly be gathered by someone with a stabilized selfie stick so a lot of that is avoidable. As for the lighting/partial pictures, images that have a predictable format but have color changes or parts missing are something that machine learning does pretty well.

From the feedback thus far it looks like I'm going to have to use that selfie stick and a big chunk of data entry to create my training library. Thank you to everyone for your comments and education.
posted by Tell Me No Lies at 3:58 PM on December 29, 2020


