What is the difference between EBSCOhost, Archive.org, Google, etc.?
February 24, 2015 5:35 PM

What is the difference between EBSCOhost, Archive.org, Google, WorldCat...etc?

Here's my amateur understanding: When you visit and search these sites, you are searching a database of indexed information collected and organized by the people who built these search tools. This information is updated regularly. One big difference is that Archive.org, EBSCOhost and your local library only return results for things within their own databases; that is, the information they share is about things they own. Google, Yahoo and (maybe) WorldCat, on the other hand, provide a filter mainly for searching other people's things.
Can someone provide me with a clear explanation of the difference?
posted by PHINC to Computers & Internet (4 answers total) 6 users marked this as a favorite
The "clear explanation" is the one you've already written. Archive.org and library search engines and stuff do exactly what you've said: they allow you to search through content that is actually owned by or otherwise available through them. (Archive.org is a little different in that they may not explicitly "own" the content, but they host everything that you get to through there - if you use the Wayback Machine, that copy of that webpage from 2001 is coming from Archive.org's server, for example.) Google and Bing (AFAIK Yahoo uses Bing to do its actual searching now) have generated a gigantic index of things that are available on the web, and that index records where those things live - they do not host the content (by and large). Google and Bing index automatically. They have software that crawls the web and finds data on its own, and then there are software algorithms that sort through it all and make it available for you to search. Archive.org is really more of a library - someone's there to maintain the database and, most importantly, deal with making the actual content available (because they host it), just like you can search for a book at your library and then have it in your hands later.
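
To make the "index that points elsewhere" versus "host that holds the copy" distinction concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `WEB` dictionary stands in for pages a crawler might fetch, and `build_pointer_index` and `HostedArchive` are hypothetical names, not how Google, Bing or Archive.org actually work internally.

```python
from collections import defaultdict

# Hypothetical pages "out on the web"; a real crawler would fetch these over HTTP.
WEB = {
    "http://example.org/a": "history of public libraries",
    "http://example.org/b": "public domain books online",
}


def build_pointer_index(pages):
    """Search-engine style: map each word to the URLs where it appears.

    Only the locations are kept; the page text itself is not stored.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


class HostedArchive:
    """Archive style: keeps its own copy of each item and serves it directly."""

    def __init__(self):
        self._items = {}

    def deposit(self, item_id, text):
        self._items[item_id] = text

    def search(self, word):
        word = word.lower()
        return sorted(
            item_id
            for item_id, text in self._items.items()
            if word in text.lower().split()
        )

    def fetch(self, item_id):
        # The content comes from the archive's own storage.
        return self._items[item_id]


index = build_pointer_index(WEB)
archive = HostedArchive()
archive.deposit("page-2001", WEB["http://example.org/a"])

# The index only says where to look; the archive hands back the content itself.
print(sorted(index["public"]))
print(archive.fetch(archive.search("libraries")[0]))
```

In this toy version, a hit in the index is only a list of addresses you still have to visit, while a hit in the archive can be fetched from the archive itself.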
posted by mrg at 5:56 PM on February 24, 2015 [1 favorite]

One big difference is that the paid databases (eg, EBSCO, JSTOR, ProQuest) contain proprietary information. There's some overlap in coverage, but in many cases a particular journal series, say, or the works of a particular publisher will only be included in a particular platform. As a result, these databases require a subscription and are ruinously expensive.

The companies behind these databases have full-time indexers (I assume) in addition to automated systems to pull out indexing information. They will thus be much more thorough. You might be able to find citation information, for example, from Google Scholar. However, it won't be nearly as comprehensive as using one of these (or ideally, all of these) when performing serious research.

Archive.org contains public-domain information with more-or-less adequate indexing. It depends on programs like Google Books and Project Gutenberg and the donation of time and effort from volunteers. While it's a fantastic resource it is not "complete" in any sense of the term.

WorldCat searches library holdings. Libraries can, but not all do, upload their catalog records to OCLC, the parent of WorldCat.

Does any of that help?
posted by orrnyereg at 5:59 PM on February 24, 2015 [1 favorite]

An additional complication of EBSCOhost in particular is that it is a database of databases as well as a front end for searching content that you might subscribe to as individual journal titles. Your particular library may buy subscriptions to databases A, B, and C but not X, Y, and Z, or journal titles Q and R separately from database S that includes those titles if you buy it. What you can access is impacted accordingly.

Your particular library might also have links in its own catalog that take you to content in EBSCOhost and other databases, or some way of doing one search across all the databases they subscribe to. This might work more or less well than searching whatever database directly because of...reasons. Various reasons. (This is a lot of my job, it's complicated.)
posted by clavicle at 7:14 PM on February 24, 2015 [4 favorites]

Libraries also offer you stuff they don't own - for example, they pay money to subscribe to EBSCO, JSTOR, ProQuest et al who own the content, but let you access it for free.

There are also differences in how the indexing is done, and in the content that's indexed. Libraries usually only catalogue at an object level, not parts-of-object level. So you'll have one catalogue record for a book, not 10 different records for each chapter of the book; one catalogue record to show that a journal is held, but no records for the contents of the journal. Libraries catalogue lots of different types of material too: books and journals but also newspapers, maps, websites, sheet music, audio and/or video recordings, CD-ROMs and DVD-ROMs, kits composed of different things (like a sound cassette + book), pictures, physical objects (realia), manuscripts, microform, memorabilia, pamphlets ... I am sure I'm forgetting something but you get the idea.

Databases like EBSCO etc break one issue of a journal down into its component articles so you can search more precisely for what you need. But they usually only cover content like journal/magazine articles and sometimes reference works. Increasingly, of course, this is changing. So Naxos, for example, has databases that include streaming audio and video. Some database vendors are assembling collections of different kinds of digitised material arranged around a particular theme, so they might include theatre programmes, photographs, recordings, etc as well as reviews from newspapers, biographical material, etc.
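
The object-level versus article-level difference can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the journal, its articles, and the `search` helper are all hypothetical, and real catalogue and database records carry far more fields.

```python
# Hypothetical journal and articles, invented for this sketch.
JOURNAL = "Journal of Examples"
ARTICLES = [
    ("Indexing maps", "v1 n1"),
    ("Cataloguing realia", "v1 n1"),
    ("Streaming audio collections", "v1 n2"),
]

# Library catalogue: one record says the journal is held, nothing about its contents.
catalogue = [{"title": JOURNAL, "type": "journal"}]

# Article database: one record per article inside the journal.
article_db = [
    {"title": title, "journal": JOURNAL, "issue": issue}
    for title, issue in ARTICLES
]


def search(records, term):
    """Return the titles of records whose title contains the term."""
    term = term.lower()
    return [r["title"] for r in records if term in r["title"].lower()]


print(search(catalogue, "realia"))    # the catalogue has no article-level record
print(search(article_db, "realia"))   # the database finds the individual article
print(search(catalogue, "examples"))  # the catalogue does find the journal itself
```

The same search term misses in the catalogue and hits in the article database simply because of how finely each one slices its records.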

But yeah, the main difference between library catalogues and archive.org etc and the proprietary databases is the ownership.

Google, Bing, etc are search engines. They don't actually own the material, they just search it. (Well, apart from side projects like Google Books, Google News Archive etc.) Most databases have their own search software that allows you to search and find their content and their content isn't necessarily open to search engines like Google. Library catalogues' search functions are usually built in third-party software (frequently this is a component of an ILMS, Integrated Library Management System, which has lots of back-end bits) rather than being individual to a library. There are exceptions for all of this, of course.

Does that help? I've tried not to be too technical, but there are a lot of variables at play when you compare these types of things and I find it's hard to explain it simply.
posted by Athanassiel at 7:15 PM on February 24, 2015 [1 favorite]
