Amazon vs. WorldCat – an unscientific test

October 13, 2009

Most people who work in libraries these days know implicitly that users prefer Amazon’s database to the typical library’s online catalog (user testimonial to this effect), even if the data in the Amazon database is, to put it politely, an utter mess. Little did I expect that I would join the ranks of those who prefer Amazon, being an “expert” library catalog user and all, but such is what happens when Amazon delivers and libraries fail.

So, to the test, which wasn’t really a test, but a quick search. I had (finally!) finished reading the book Little, Big by John Crowley and wanted to celebrate this minor milestone (reading contemporary American fiction is something I can rarely accomplish) by tossing up something about it on Facebook to let my family and friends know that I am indeed not illiterate. I remembered the title, but had forgotten Crowley’s name. In a fit of librarianness, I went to worldcat.org to look up his name.

I dutifully typed little big in the large, friendly search box, and got back what I can only describe as a mess, namely 39,353 results sorted by, ahem, relevance. Now granted, Crowley did himself no favors with his title, but still, one would think an exact title match would merit high relevance. No dice. It didn’t even show up in the top 50 search results, after which I stopped looking, as would any rational visitor. Being a kind soul who believes in second chances, I clicked on the advanced search and searched for little big as the title. I figured that if I helped the system out, it would return the favor and put exact title matches at the top. Still no good. 12,575 results, and the book I sought still wasn’t in the top 50. It did no good whatsoever to change the sort (not that a casual user would ever even do this).

Frustrated, I thought, hey, wonder how Amazon does with a generic title such as this. Bullseye. Didn’t even bother to tell Amazon it was a book, and it still pulled the exact title match out of its morass of a database. Not only that, it inferred from my search that I might be a fan of Crowley, and four of the top six results are books of his.

This depressed me. It really did. I mean, OCLC is the freakin’ global flagship of libraries, with the biggest, baddest, and most structured bibliographic database going. And it got smacked down, hard, by someone who sells everything from books to enema bags. OCLC is of course pushing WorldCat Local hard these days. Wonder if we could convince Amazon to offer Amazon Local, where users had the option to buy the book or get it at a library.

The book is great, by the way. Highly recommended.

16 Comments

Jakob permalink

October 13, 2009 20:40

Why does OCLC not publish their search engine as open source, like Wikimedia does with the MediaWiki software? You can do much better with existing open source search engine software, but at the very least people must be able to help improve it.
Alice permalink

October 13, 2009 20:55

Hi Dale. Thanks for sharing your experience and I’m sorry it did not have a happier ending for libraries in general and WorldCat.org in particular. That said, we’d love to hear more of your ideas on how to improve the results ranking algorithm for the average reader.

In response to Jakob, it’s not exactly what you’re asking for…but it is a start. There is a WorldCat Search API that returns results for materials in WorldCat, with holdings information. (Holdings information is what helps you know where the eBook/article/DVD, etc is physically located.)
Jennifer permalink

October 13, 2009 21:37

I find myself relying on amazon.com and amazon.de for all sorts of things: are the books I want to order for my classes next semester in print and available? Easy reference to ISBN, links to other works by the same author. I can do all of those things on Amazon.de for contemporary German fiction easier than I can in WorldCat.
meow.
Dale permalink*

October 13, 2009 21:43

Jakob – cannot agree more. In my previous post I was pleading (again) for open source with regard to another OCLC project.

Alice – an API can be a nice thing, but it is worlds away from open source, as you acknowledged.

In many ways, my experience today with worldcat.org just underscores the point I made in my post yesterday about the limited capacity of one company to develop the best possible software. Google got lucky in this regard with their search technology, but that is a rare exception to the general rule that software developed in a hermetically sealed environment will display certain chronic ills that stem from the limits imposed by a small pool of developers, however talented otherwise.

I am glad to see that OCLC apparently monitors Twitter for mentions of its name. That is a good thing to see. My post already offered a substantive suggestion for improvement, namely, that an exact title match should perhaps be considered a significant factor for relevance ranking.
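
To make that suggestion concrete, here is a minimal sketch in Python of what "exact title match as a significant relevance factor" could look like. It is not WorldCat's actual ranking algorithm, which I have not seen; the function names, the boost weights, and the sample titles are all assumptions made up for illustration.

```python
import re

# Hypothetical weights, chosen only to illustrate the idea.
EXACT_TITLE_BOOST = 10.0   # whole title equals the query
PHRASE_BOOST = 2.0         # query words appear adjacent and in order

def normalize(text):
    """Lowercase and drop punctuation so 'Little, Big' matches 'little big'."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def score(query, title):
    """Score a title against a query: term overlap plus title-match boosts."""
    q = normalize(query)
    t = normalize(title)
    q_terms = q.split()
    t_terms = t.split()
    if not q_terms or not t_terms:
        return 0.0
    # Base score: fraction of query terms found anywhere in the title.
    base = sum(1 for term in q_terms if term in t_terms) / len(q_terms)
    if t == q:
        return base + EXACT_TITLE_BOOST
    # Pad with spaces so the phrase only matches on whole-word boundaries.
    if f" {q} " in f" {t} ":
        return base + PHRASE_BOOST
    return base

def rank(query, titles):
    """Return titles ordered from most to least relevant (ties keep input order)."""
    return sorted(titles, key=lambda title: score(query, title), reverse=True)

# Made-up sample records, not real search results.
sample_titles = [
    "Little House in the Big Woods",
    "Big Trouble",
    "A Little Big Band Music",
    "Little, Big",
]

for title in rank("little big", sample_titles):
    print(f"{score('little big', title):5.1f}  {title}")
```

The point of the design is simply that a title which is the query, once punctuation and case are ignored, should outrank titles that merely contain the same words somewhere. A real system would fold this in alongside many other signals.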

Incidentally, when I toss the search little big at generic Google, Crowley’s book comes up in second, third, and fifth position. Even searching across that utterly massive dataset, they ferret out what I want likely based on word proximity and order. That would lead to my second suggestion: consider using Google’s search technology for worldcat.org. It comes at a dear price, but if you cluster enough Google Search Appliances together, they can handle a database of the size of OCLC’s.

Frankly, though, I prefer Jakob’s implicit suggestion. Open up the search software to library developers. We in the community can make suggestions until the cows come home, but experience has shown that only so much user feedback can make it into a product no matter how good the intentions might be.
Alice permalink

October 13, 2009 22:25

Thanks, Dale. I will take your specific suggestions to improve the results listings to the WorldCat.org developers. I will also start the discussion with my colleagues in the OCLC Developer Network about how we might be able to include more of the library developer/open source community to enhance the actual search experience itself. I can’t promise anything, obviously, but we’ll see where the conversations might lead.
jge permalink

October 16, 2009 13:37

I totally agree that library catalogues should do better. But I still wonder why you, as an expert user, didn’t use the refining or drill-down options. Click “author” in the left column and browse the list of authors. Or click Format “Buch” and “Belletristik”, and you have Crowley in the top 3 if you used “advanced search” and searched for “little big” in the title field.

Amazon uses user data to improve its search; I would very much like to have OPACs do the same.
Dale permalink*

October 16, 2009 13:55

Of course I know how to do those things as an expert user, but you missed my point. For one, why should I have to do those things to find my book when clearly search technology can be good enough to do it for me? But more to the point, how many expert users are there out there? At best, .0006% of humanity, I would suggest, and most of them are librarians. We cannot teach people to be expert searchers, not because that is not possible, but because most people do not want to learn such arcana. Not their cup of tea.
yest permalink

October 18, 2009 01:49

excellent post. just to add a reminder of Jon Udell’s library lookup thing, which lets Amazon users find a book in a library: http://jonudell.net/LibraryLookupGenerator.html
Dale permalink*

October 18, 2009 16:19

Thanks for the kind words, as well as for mentioning Udell’s script. He has created some good tools over the years.

In this context, I should also mention LibX. Among the many good things it does to connect users to library services is to drop an icon on Amazon pages that links the title to the user’s home library catalog. I use this all the time, and cannot believe it did not occur to me to mention this in my post.
Dale permalink*

October 19, 2009 16:56

Oh, should have added that LibX makes use of OCLC’s xISBN lookup to make that work. This is a great example of the value OCLC can provide to the library community by opening up its (our) data to outside developers (aka us).
Yasmin permalink

November 1, 2009 15:58

Sorry for joining this discussion so late, but has anyone taken into account the obvious fact that Amazon wants to *sell* books and therefore makes every effort to make discovery easy and straightforward? If people can’t find books on their platform this has a much more immediate effect (turnover, hard dollars) whereas libraries *only* lose relevance and users in the long run (which is in fact worse). Amazon relies to a great extent on its online discovery interface, while libraries still haven’t quite understood its importance.
Dale permalink*

November 2, 2009 14:08

Thanks for the comment, Yasmin. I agree completely with you that libraries have not understood the importance of presenting things that users want. Many librarians still believe that the catalog is there to help users unearth the obscure bits in the collection, and, more importantly, that this should be the default mode of the catalog. We are “selling” books, too, in a sense, and if we do not wake up to that reality, we will be buried. Actually, we are being buried, daily.

I am all for the obscure and arcane personally, but most members of academia (translation: undergraduate and, frankly, many graduate students) are not performing research that requires such depth. They are writing papers for classes and need a useful library. If we drive them into the hands of Amazon and Google with our pointyheadedness, it won’t really matter what our catalog does.