
Amazon vs. WorldCat revisited: getting crushed by the little, big guns

November 14, 2011

[Image: "I crush your head" – Guiniveretoo, Flickr]

A couple of years ago, I wrote a post criticizing OCLC's handling of title searching and result ranking. Quick synopsis: when searching for the book Little, Big by John Crowley, it wasn't coming up in the top 50 results. Part of the reason was that I was in Germany at the time, and clearly OCLC was tweaking the relevance ranking based on the user's IP address (not a wise idea). I compared the results to Amazon, where a search for the title–without even specifying that it was a book–brought it up in first position.

At the time I made a mental note to rerun the test when back in North America. Took a while, but finally did so today. How did it go?

Well, two years on, WorldCat.org's relevance ranking still gets whooped by Amazon. Amazon serves up the same results: Little, Big comes up front and center. For a new twist, I brought in the other titan of books, Google, and searched their book database. Shazam. It comes up in first position, followed immediately by the other book for which 'little big' is an exact title match, that being the children's book Little, Big!. On WorldCat.org, Crowley is at eighth position, while the kids' book sits at tenth. Number one? Wilder's Little House in the Big Woods. Great book, but why is it there? Being a sport, I tried a keyword search for little big in the proprietary WorldCat database. After scanning the first 100 results without finding Crowley's book, I gave up. It's clear why that is so, since the query is sent with a logical AND operator, but why can't a database for which we pay money do more sophisticated query processing?
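To see why a bare boolean AND buries an exact title match, here is a toy Python sketch (not any vendor's actual algorithm, and the scoring numbers are made up for illustration): every title containing both "little" and "big" ties under AND-style term counting, so the exact-title match has no way to rise to the top unless something like a phrase or exact-match boost is layered on.

```python
# Toy illustration of AND-only matching vs. an exact-title boost.
# Not OCLC's, Solr's, or anyone's real scoring; the boost value is arbitrary.

def _words(title):
    """Lowercase a title and strip simple punctuation for comparison."""
    return title.lower().replace(",", "").replace("!", "").split()

def and_score(query, title):
    """Boolean-AND style: one point per query term found anywhere in the title."""
    words = _words(title)
    return sum(1 for term in query.lower().split() if term in words)

def boosted_score(query, title):
    """Same term score, plus a large bonus when the title is an exact match."""
    score = and_score(query, title)
    if _words(title) == query.lower().split():
        score += 10  # hypothetical phrase/exact-title boost
    return score

titles = [
    "Little House in the Big Woods",
    "Little, Big",
    "Little, Big!",
    "The Big Book of Little Things",
]

query = "little big"
by_and = sorted(titles, key=lambda t: -and_score(query, t))
by_boost = sorted(titles, key=lambda t: -boosted_score(query, t))
```

Under `and_score`, all four titles tie at 2 points, so the ranking is arbitrary (here, whatever order the catalogue returned them in). With the boost, the two exact title matches jump to the top, mirroring what Google Books and the Solr-backed systems did.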

To round out the field, I wondered what would happen if I searched for this title using, say, VuFind, a catalogue overlay/discovery layer currently used by numerous libraries. It's an open source library application with a small crew of developers – in other words, the least high-powered of the four entries in this unscientific test. I chose to search in Yale University's Yufind, because I knew they'd have the book and, more importantly, because it's by far the largest database using VuFind as a frontend. Where does Crowley's book appear? You guessed it: number one. (Ironically, in their Voyager-driven Orbis catalog, it comes up at #81 if you do a keyword search – ouch, but not really surprising given that Voyager was developed when some current librarians were in grade school.) A quick shout-out to Ex Libris, whose Primo product also plucked Crowley's book from 47,000+ results and put it at number one (catalogue: University of Utah).

My point here? Amazon and Google set the bar for our expectations of search. VuFind uses modern tools (Solr et al.) to deliver similar results, as do Ex Libris and Serials Solutions, albeit only in their newer products. Why doesn’t every product?
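For the curious, one common way Solr-backed discovery layers get this behavior is the eDisMax query parser: query terms are matched across several weighted fields, and a phrase boost rewards records where the terms appear together. A hypothetical request might look like the following (the field names and weights are illustrative, not VuFind's actual schema):

```
http://localhost:8983/solr/biblio/select
    ?q=little+big
    &defType=edismax
    &qf=title^10 author^5 fulltext
    &pf=title^50
```

Here `qf` spreads the terms across fields with different weights, and `pf` gives a large bonus to records whose title contains the whole phrase – which is exactly what pushes Little, Big above the thousands of records that merely contain both words somewhere.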

UPDATE: How I wish I had made a screenshot of my OCLC results. As with the last time I wrote about this, they've tinkered with the weighting of Crowley's book to push it up the results (I surmise; other factors may play a role). It now comes up in second position in a Book or Everything search, tucked neatly behind Wilder's book. This solves the immediate problem, of course, but my point is the underlying handling of any search string, not just this one. Other tools do more post-query processing and deliver better results.

  1. November 15, 2011 09:25

    Thanks for continuing to help us refine and improve, Dale. The folks at OCLC who work on search relevancy and rankings want your results to be as good as they can be. This is good feedback.

    • November 15, 2011 21:08

      You’re welcome, and thanks for understanding the purpose of this piece, which is to start conversations and encourage people to do their own exploration. What was interesting in this round is that I discovered that systems backed with Solr all pulled the title up in first position, even though they are all very different in their details.
