Wrong with Search: Content Duplication

I wasn’t at Google’s now annual Searchology event last week because I was at a private equity conference in New York. Here’s a rundown of the announcements at Searchology.

Some commentators said after the fact that a couple of these announcements and demos were motivated by perceived heat from Wolfram|Alpha, which just launched. I would disagree. Wolfram isn’t really a threat to Google, although Google clearly feels more general pressure and the need to evolve its UI and presentation of results. Microsoft’s Live Search successor, codenamed Kumo, will soon launch. I’ve been using it casually and like the look and feel quite a bit. However, I haven’t yet done any systematic testing of relevance.

Regardless of where the pressure is coming from, it’s a good thing.

While I recognize there’s a lot going on and progress happening, I believe search needs to change and evolve further. Here’s a page that I think exemplifies some of the problems right now: the results for a search on a press release put out last week (“AT&T Leads the U.S. in Smartphones and Integrated Devices”):

[Screenshot: Google results page for the AT&T press release search]

Because everyone is trying to rank, everyone picks up these releases and tries to get additional page views from them. Putting aside, for the moment, the need for Google to be evenhanded among competitors, why do I as a user need to see 15 links to the identical information?
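To make the duplication concrete, here’s a rough sketch in Python of how an engine might flag two pages as near-identical copies of the same release, using word shingles and Jaccard overlap. The function names, the 5-word shingles and the 0.8 threshold are illustrative assumptions on my part, not how Google actually deduplicates pages:

```python
# Rough sketch of near-duplicate detection via word shingles and
# Jaccard similarity. Purely illustrative; the parameters are assumptions.

def shingles(text: str, k: int = 5) -> set:
    """Return the set of overlapping k-word windows in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: intersection size over union size."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def is_near_duplicate(page_a: str, page_b: str, threshold: float = 0.8) -> bool:
    """Treat two pages as near-duplicates if their shingle sets mostly overlap."""
    return jaccard(shingles(page_a), shingles(page_b)) >= threshold

# Two sites republishing the same release verbatim score about 1.0
# and would be flagged as copies of each other.
release = "AT&T leads the U.S. in smartphones and integrated devices ..."
syndicated_copy = "AT&T leads the U.S. in smartphones and integrated devices ..."
print(is_near_duplicate(release, syndicated_copy))  # True
```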

This problem often arises in the local context too, especially for a business name search (“Nagano Sushi San Francisco”):

[Screenshot: Google results for “Nagano Sushi San Francisco”]

The map at the top provides the desired information quickly, but the links below essentially duplicate the contact information, much like the press release copies in the example above. In the local category-search context it’s somewhat less of an issue, since the 10-pack provides name and number information; see, e.g., “Boston Dentist” (though there are issues with data accuracy, listing hijacking, etc.):

[Screenshot: Google results for “Boston Dentist” with the local 10-pack]

However, there’s still a lot of duplication and apparent redundancy. Travel used to be the worst, with tons of affiliate sites popping up for every hotel lookup, but Google has largely addressed that problem.

I believe that Google needs to make some changes to the interface so that I don’t have to see this kind of content duplication. Publishers that operate directory or local search sites would likely cry foul if Google picked a single, authoritative site and then buried everyone else below a plus box or some similar device (see, e.g., Google News).

[Screenshot: grouped results in Google News]

But this is, in effect, what I’m calling for: something that makes the interface cleaner and perhaps uses images or other icons to help me get to content more quickly and avoid the click-and-back-button problem.
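The collapsing I have in mind might work something like the sketch below: keep one representative result per cluster of near-identical pages and fold the copies behind a single “N more copies” link, roughly the way Google News groups stories. The Result fields, the crude word-overlap test and the greedy clustering are illustrative assumptions on my part, not Google’s actual method:

```python
# Sketch of collapsing duplicate results behind one representative listing.
# Everything here is assumed for illustration, not Google's implementation.

from dataclasses import dataclass, field


@dataclass
class Result:
    url: str
    text: str


@dataclass
class Cluster:
    representative: Result
    copies: list = field(default_factory=list)


def mostly_same(a: str, b: str, threshold: float = 0.8) -> bool:
    """Crude near-duplicate test: fraction of shared words between two pages."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return False
    return len(wa & wb) / len(wa | wb) >= threshold


def collapse(results: list) -> list:
    """Greedy clustering: attach each result to the first cluster whose
    representative it nearly duplicates, otherwise start a new cluster."""
    clusters = []
    for r in results:
        for c in clusters:
            if mostly_same(r.text, c.representative.text):
                c.copies.append(r)
                break
        else:
            clusters.append(Cluster(representative=r))
    return clusters


# Each cluster would render as a single link plus "N more copies of this page."
```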

Google and Yahoo! have been experimenting with favicons in paid and organic search results. Yahoo!’s SearchMonkey pushes the idea much further, allowing publishers to control the presentation of their listings. I favor experiments like this because users can quickly differentiate between trusted and anonymous sites and get to results faster, although visual clutter becomes a real possibility.

I haven’t thought all of this through as carefully as I probably should have before posting, but I did the AT&T press release search this morning and that’s what got me going.

Do others agree or disagree? Do you think that these problems I complain about don’t really exist or have largely been addressed?

One Response to “Wrong with Search: Content Duplication”

  1. Stever Says:

    On the local side of things there are many cases where the duplication is not full duplication. Taking the Nagano Sushi example, we see the Maps result linking directly to their website, then the first organic result linking to their website again. Some duplication, since they both link to the same place, but fine in the context of Universal Search.

    Next we see Yelp. Sure the original data is pulled from the same data sources Google uses to populate Maps, but they have reviews for added unique content. Again, that’s fine as there is uniqueness there.

    Other IYP sites could be doing the same thing with their own unique reviews, thus again unique content worthy of ranking somewhere. But then when you get down to the YP sites and thin directories you basically just have a business listing with address and phone number. Here the duplication begins.

    Personally I think this is fine. Most searchers will be choosing from the first few results, and 9 times out of 10 those results are the ones we were looking for anyway. The rest are easily ignored.

    Otherwise, for a search as specific as a particular business name, we end up with a situation where Google only shows a handful of results. No page 2, 3, 4, 5, etc., when there exist 26,000 pages that reference that query. Then we get into the realm of stifling information, or pseudo-censorship.

    I know I’ve intentionally gone and looked at everything as deep as page 10 while looking for specific info about a business, particularly when I have a funny feeling that something is fishy and am looking for any negative info that might support my suspicions.

    Instead I prefer that they show it ALL and do their best to rank it in some order of relevance. It then falls on the individual websites to present info worthy of ranking well.

    So by default, this applies to the press release search in your first example. You searched the exact wording of the full title of that particular release, so of course you get a multitude of results with the same release. However, if you had searched more broadly, say “AT&T Press Release”, you’d get the PR page from AT&T itself, most recent release first, then a mix of older releases on various sites, those likely ranking based on how much buzz they generated at the time and thus the links pointing to those pages.

    Basically I’m saying that as long as the first few results are highly relevant to the vast majority of intents behind a specific search phrase, the chaff that may appear in the rest of the results quickly becomes irrelevant, unless you don’t want it to because you have another purpose and are after a particular deeper result. That then comes down to the user refining their query, or scrolling and digging deeper.

    *added this later as I skipped over part of the post before I started writing*

    Sure, they could switch to showing just the top 3 or 5 with a “more” link should you want to delve deeper, but why bother? We basically have that already, except they show 10 (20 when it includes a local 10-pack map). The rest, or the “more”, can be found on pages 2, 3, 4, 5 and beyond.
