Why We Don’t Have Good Local Business Content

The following is a guest post from Long Hill Consulting’s Marty Himmelstein:

Local search’s most significant failure is its inability to provide an accurate stratum of content about neighborhood businesses. The necessity for this base layer arises from the defining characteristic of local search, which is that it is model-based. Local search’s first job is to create an accurate depiction of places in the real world. Being found trumps being reviewed. Being found also trumps search engine optimization. When local search is running on all cylinders it will not make qualitative decisions; if there is a shop on Main Street people will find it. There will be no jousting for position, because the demarcation between fact and advertising will be clear.

To address the failure, a number of Internet companies have either been formed or have started initiatives to aggregate content about brick-and-mortar stores and services, either as their core service or to improve their core service (e.g., user reviews). These initiatives solicit content directly from businesses and often, following a wiki-type model, from individuals who have no direct relationship with the businesses for which they create content. In the latter case, contributors might receive a small financial incentive if the information they submit can be verified, usually when the business it describes ‘claims’ the listing, or when a claimed business buys additional services from the aggregator. To create an initial layer of content, most companies purchase some form of Internet Yellow Pages content from one of several compiled-list vendors. These lists are still derived mainly from phone directories, which the list vendors improve through varying degrees of quality control and enhancement.

Because these efforts proceed from one or more incorrect assumptions about the nature of local search, it is unlikely they will be successful. Most adhere to an erroneous ‘walled garden’ view that business content gathered on the Internet is a defensible asset. But information flows freely on the Internet, and since these services don’t control the information sources they require to assemble and maintain a data asset, no data they aggregate can be defended. (The information sources are the businesses themselves, and ultimately they control their own information.) It will also be hard for any one of these initiatives to gather a critical mass of content. That local content is both valuable and not defensible is an apparent, not a real, contradiction. Local content is hard to create, but once created it is a common data resource: it is best to think of the information about a business as nothing more than a structured web page. Another problem is that Google has already created the technical infrastructure to aggregate and distribute structured business content, and other initiatives have nothing to offer that improves on Google’s technology. Lastly, these initiatives assume the Internet can be used to short-circuit real-world notions of community, but it can’t. Unmediated user-contributed content, so successful for expressing creativity and points of view, is the low-hanging fruit of local search. It is not the organizing principle upon which local search will be built.

From the perspective of a service that requires better business information than is currently available to it, the justification for a walled garden seems simple: “We know people are willing to contribute content. We’ll create tools to make it easy for businesses, or, following a wiki model, anybody, to supply us with business information. These tools have a development cost and it takes effort to solicit and verify content, but having done so, the content we gather will be much better than standard listings data. This content has value, and there is no reason for us to give it away to others. Further, our users are precisely the ones these businesses want to reach. We’ll try a freemium model, and charge businesses for enhanced representation.”

Too many of these services are vying for businesses’ and users’ attention for any of their individual efforts to succeed. Businesses won’t contribute and maintain the same content at multiple services and pay for redundant capabilities at each. Moreover, once a business creates its digital profile, the marginal effort to distribute it to multiple services is (or could be) small. The ‘business content is a defensible asset’ model erroneously conflates business content with the value-added services that rely on that content to be successful. Business content is indeed valuable, at least as much to the services that need it to build compelling sites and capture advertising revenues as to anybody else. The demand for this content will drive the price charged to the businesses that supply it to zero. It’s not even hard to imagine scenarios in which businesses derive revenue from syndicating their content to downstream services. One way to ensure that Google doesn’t become the sole repository of business content is to give businesses the incentive to distribute theirs widely.

From an Internet ecosystem and data modeling perspective, multiple walled gardens of duplicated and separately maintained business content make no sense at all. Popular services might get a continual stream of updates, but new or struggling services won’t, making it even harder for them to gain traction. This ‘each to his own’ approach will perpetuate a morass of inconsistent and obsolete content, much as we have now, to the continuing dismay of consumers.

The adherents to the flawed walled-garden analysis are either unfamiliar with a basic data modeling tenet or think it doesn’t apply on the Internet: data can be distributed, copied, and duplicated, but each occurrence must be traceable to a known provenance that has a unique identity. For the purposes of data modeling, the Internet is nothing more than a very big disk drive. The storage medium has changed; the requirement for sound data engineering has not.
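
To make the provenance tenet concrete, here is a minimal sketch in Python, assuming a hypothetical identifier scheme ("biz:0001") and hypothetical field names; it is an illustration of the principle, not any existing service’s data model. Each downstream copy is keyed on the business’s unique identity and carries its provenance, so a correction from the authoritative source replaces the stale copy instead of spawning a second, conflicting listing.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BusinessRecord:
    business_id: str  # hypothetical globally unique ID for the business
    provenance: str   # the authoritative source: the business or its designee
    name: str
    phone: str


class ServiceCopy:
    """A downstream service's copy of business content (hypothetical)."""

    def __init__(self) -> None:
        self.records: dict[str, BusinessRecord] = {}

    def ingest(self, record: BusinessRecord) -> None:
        # Keyed on the upstream identity: a newer version from the same
        # source overwrites the stale copy rather than adding a duplicate.
        self.records[record.business_id] = record

    def source_of(self, business_id: str) -> str:
        # Every copy can be traced back to whoever is responsible for it.
        return self.records[business_id].provenance


original = BusinessRecord("biz:0001", "owner:Main Street Hardware",
                          "Main Street Hardware", "555-0100")
updated = replace(original, phone="555-0199")  # the business changes its number

search_site, review_site = ServiceCopy(), ServiceCopy()
for site in (search_site, review_site):
    site.ingest(original)
    site.ingest(updated)  # the correction propagates; no second listing appears
```

Copies can be as numerous as the services that want them; what keeps them consistent is the shared identity and the traceable source, not who stores the data.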

Unique identity is not a new concept on the web. Web pages, blog posts, and comments have at least an informal notion of identity, and second-generation content syndication formats support stronger notions still. These formats also support structured content, a requirement for business information on the web. Google Base, a notable example, specifies Atom and RSS 2.0 formats that allow data providers to specify and upload structured content to, well, the Google Universe. Google also provides a query language API so developers can retrieve content from the Google database. Google’s walls are permeable: its interests are served by good content, not by owning it.
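
As a rough illustration of a syndication format carrying both identity and structure, the Python sketch below builds a single Atom entry with the standard library. The Atom specification (RFC 4287) requires an entry to have an `id`, a `title`, and an `updated` timestamp, and an author unless the enclosing feed supplies one; the `biz` namespace, its field names, and the `tag:` identifier are hypothetical stand-ins, not Google Base’s actual schema.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
BIZ = "http://example.org/ns/business"  # hypothetical extension namespace

ET.register_namespace("atom", ATOM)
ET.register_namespace("biz", BIZ)


def business_entry(entry_id: str, name: str, address: str, phone: str,
                   updated: str, author: str) -> ET.Element:
    """Build an Atom entry carrying structured fields for one business."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    # atom:id is meant to be a permanent, universally unique identifier
    ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM}}}title").text = name
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    author_el = ET.SubElement(entry, f"{{{ATOM}}}author")
    ET.SubElement(author_el, f"{{{ATOM}}}name").text = author
    # Structured business fields live in the hypothetical extension namespace
    ET.SubElement(entry, f"{{{BIZ}}}address").text = address
    ET.SubElement(entry, f"{{{BIZ}}}phone").text = phone
    return entry


entry = business_entry(
    entry_id="tag:example.org,2009:biz-0001",
    name="Main Street Hardware",
    address="12 Main St",
    phone="555-0100",
    updated="2009-01-13T13:53:00Z",
    author="Main Street Hardware",
)
xml_text = ET.tostring(entry, encoding="unicode")
```

Any consumer that understands Atom can read the identity and timestamp, and one that also understands the extension namespace can read the structured fields, which is what lets the same record flow to many services.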

Unfortunately, the quality of local business content lags well behind the Internet’s technical capabilities to create, aggregate and distribute it. An important reason for this quality deficiency is that we have relied almost exclusively on the technology that enables the next generation of local search, while underestimating the need to create online representations of the real neighborhoods and relationships within which businesses exist. As I noted in a previous post:

The fundamental role of a community in local search is to establish an environment of trust so that users can rely on the information they obtain from the system. Businesses exist in a network of customers, suppliers, municipal agencies, local media, hobbyists, and others with either a professional or avocational interest in establishing the trustworthiness of local information.

Businesses are responsible for their physical storefronts, and, ultimately, their digital storefronts. But businesses don’t exist in a vacuum, either physically or online. They require the services of the community to which they belong – when online, especially in the formative stages of local search. To create accurate digital storefronts, then, we need to enable the participation of the various constituencies that are part of a community. It is within this framework that a reliable stratum of local content will be created and maintained.

Individuals who contribute content because of a small financial incentive, who are most of the time trustworthy and altruistic but will occasionally be neither, and who have no intrinsic connection with the neighborhoods in which the businesses they describe reside, do not constitute a community. It’s not that their contributions aren’t valuable or even necessary, it’s that they are not sufficient for ensuring an accurate depiction of the local environment. Pick your own war story about how local search failed you in a time of need (everybody has one), assume your need was urgent, and then consider the assurances you would require from the system to trust the information you get from it.

The only way local search can meet these assurances is to build them into its basic fabric. The basic fabric of local search is the community, because the community provides the means to establish the network of trust that is essential to local search. The purely user-contributed content model that works so well for YouTube has shortcomings when applied to local search. The preeminent virtue for YouTube is creativity, for local search veracity. YouTube is whimsical, local search mundane. People use YouTube to pass time, local search to save time.

In an immeasurably weightier circumstance, Winston Churchill remarked “You can always count on Americans to do the right thing – after they’ve tried everything else.” And so it is with local search. Its eventual shape, though tortuously arrived at, seems to me easy to discern. Each business will have its own digital identity and a core of factual information, kept in a standardized format, which it or its designees will maintain. These designees will aggregate content at the community level, as defined above. Designees will include entities, some new, but certainly some that already exist, which are trusted by both consumers and merchants. This core content will feed downstream services. To provide subjective or more detailed information, the basic content will be augmented at various points with user contributed and third party sources of information. Revenue models built around helping businesses and their designees create, maintain, verify, augment and distribute their content make sense; those built around cordoning it off do not.
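
One way to picture the “core content plus augmentation” arrangement is as a layering rule applied by each downstream service. The Python sketch below is a minimal illustration, assuming records are plain dictionaries with hypothetical field names: factual fields maintained by the business or its designee always take precedence, while user-contributed and third-party sources may only add fields the core record does not define.

```python
def build_listing(core: dict, *augmentations: dict) -> dict:
    """Layer user-contributed and third-party data onto a core record.

    Core facts (maintained by the business or its designee) always win;
    augmentations can only add fields the core record does not define.
    Among augmentations, the first source to supply a field wins.
    """
    listing: dict = {}
    for extra in augmentations:
        for key, value in extra.items():
            listing.setdefault(key, value)
    listing.update(core)  # core facts override anything supplied downstream
    return listing


core = {"business_id": "biz:0001", "name": "Main Street Hardware",
        "phone": "555-0100", "hours": "Mon-Sat 9-6"}
reviews = {"rating": 4.5, "phone": "555-9999"}  # includes a stale phone number
photos = {"photo_count": 12}

listing = build_listing(core, reviews, photos)
```

The design choice is deliberate: subjective or detailed material enriches the listing, but it can never silently overwrite the facts a business is accountable for.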

_____

Marty Himmelstein is the principal of Long Hill Consulting, which he founded in 1989. Marty’s interests include databases, Internet search, and web-based information systems.

For the last eleven years, Marty has been active in location-based searching on the web, a field often called Local Search. Marty was an early member of the Vicinity engineering team. Vicinity was a premium provider of Internet Yellow Pages (it provided Yahoo!’s IYP service from 1996 to 1998), business locators, and mapping and geocoding services.

This entry was posted on January 13, 2009 at 1:53 pm and is filed under Guest columns, Internet Yellow Pages, Local Search, Small Business, User-generated content.

29 Responses to “Why We Don’t Have Good Local Business Content”

Will Scott Says:
January 13, 2009 at 2:38 pm
Greg and Marty (and everybody):

This is my favorite statement in this whole thing.

Revenue models built around helping businesses and their designees create, maintain, verify, augment and distribute their content make sense; those built around cordoning it off do not.

And, it so thoroughly distills the difficulty of Yellow Pages appropriately serving in this context.

Will
Scott Jones Says:
January 13, 2009 at 2:50 pm
Hi, yes this is a bit of a plug for my company, however, I think how we collate our data may be of interest in relation to the subject of quality mentioned above.

We work solely in the U.K. and as of yet I haven’t come across another company in the U.S. who works in the same way as us.

To ensure our local content is as accurate as it possibly can be (or can realistically be considering financial implications) we collate the content in person. We literally employ a mini army of street researchers (as well as an office based data verifying team) who physically collate all of our content. They walk the streets of the major towns and cities every day updating our retail database. Have a look at our corporate site http://www.localdatacompany.com for more information.

I’m presuming it’s down to the geographical size of the United States that no one has replicated our way of working over there? In my experience you can only understand “local” if you are local… with street researchers, the quality should be higher than with a telephone team collating the information?
Greg Sterling Says:
January 13, 2009 at 2:54 pm
Yes, local is ultimately about the businesses providing their own information and “feet on the street” to collect and verify data. That’s why it’s so difficult. Top-down approaches are ultimately not as successful.
Chris Silver Smith Says:
January 13, 2009 at 4:17 pm
Data quality and robustness are some of the biggest issues in local search today. Improving them requires significant investment.

Heretofore, the bar for local search has been pretty low — one could compete and be profitable while being very lazy about data quality and comprehensiveness.

However, online user expectations are increasing, and impatience with inaccuracies is also increasing. The local market dominator of tomorrow may be considerably improved over what we see and use today.
AhmedF Says:
January 13, 2009 at 7:39 pm
I’ve read this article about half a dozen times, and I am still trying to make sense of it.

Yes, data is nowhere near where it should be – I’ve been saying that much for a long time. But other than what Scott is doing, there *is* no solution. Aggregators may end up not missing a business – but they also end up with a lot of dead businesses and errors. And there is absolutely no profitable mechanism in opening up the data. Google does it because 1) all the links lead back to them and 2) they are making $10+ billion in other areas and can afford to take losses to ensure their lead position.

The *only way* to get 100% accurate data is to go to a business owner. The logistics alone for such a project are a nightmare – an organized community effort is not the solution (and I say that as someone who actively supports projects like OpenStreetMap).

Scott’s advantage is that suburbia is far less present in the UK, whereas here the distances themselves pose a large challenge. Trust me, we’ve done our studies 🙂
Greg Sterling Says:
January 13, 2009 at 7:42 pm
The economics and incumbents in the industry basically conspire against a single universal database solution such as UniversalBusinessListing.org. That entity doesn’t have the promotional muscle to go direct to SMBs, which it would need to do to fulfill its promise.
AhmedF Says:
January 13, 2009 at 7:43 pm
I should add that the rush of freemium/open sites is already starting to fail as a business model. Indeed, the modus operandi right now seems to be ‘get as much VC as you can’ just so that those companies get more time to figure out how to make a realistic dollar at all.

Not to mention the resistance of the entrenched players. And I don’t mean just the Yellow Pages (who are actually better than most) – municipal governments, BBB, Chambers of Commerce – none of these players see any real returns from going online.
Gib Olander Says:
January 13, 2009 at 8:40 pm
This really is an interesting conversation, and one that I personally know some really smart people are working on (said with a wink). A lot of the points made in this piece were made in Tim O’Reilly’s 2005 post “What Is Web 2.0”. The third idea he presents (http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html?page=3) is:

Data is the intel inside.

Anyway, getting great data is only half the problem. Don’t get me wrong, it’s a hard problem, but getting old, stale, outdated business listing content out of the index is the really interesting challenge.
MiriamEllis Says:
January 13, 2009 at 10:14 pm
Marty,

I would really appreciate it if you could expand on what you mean by this:

“When local search is running on all cylinders it will not make qualitative decisions; if there is a shop on Main Street people will find it. There will be no jousting for position, because the demarcation between fact and advertising will be clear.”

I think I am having trouble understanding what you are envisioning here. As I see it, any list-type listing means competition. What model would eliminate an ordered list of local businesses, and thus eliminate competition, even if it’s only alphabetical competition? I’d really like to understand what you mean.

I enjoyed your post! Good choice of a guest poster, Greg.

Miriam
Greg Sterling Says:
January 13, 2009 at 10:21 pm
Marty said he’s going to respond to these queries/comments.
Michael Bauer Says:
January 13, 2009 at 10:49 pm
I really liked this snippet:

“The necessity for this base layer arises from the defining characteristic of local search, which is that it is model-based.”

But things kind of went down the Long Hill after that. I was thinking this was another SMBmeta initiative, and I would be interested in a discussion along that vector.
BTS Says:
January 14, 2009 at 12:35 am
This really is a fundamental issue. None of us want to spend all our time solving the bad data problem; we would rather help local merchants find customers and express themselves.

I am sure AF will figure it all out.

On another note, freemium is an interesting approach. Here is Fred Wilson’s post on it, which mentions a lot of the winners with this approach. Winners that changed the game.

http://avc.blogs.com/a_vc/2006/03/my_favorite_bus.html
Andrew Shotland Says:
January 14, 2009 at 4:21 am
“Being found also trumps search engine optimization.”

Actually search engine optimization = being found. 🙂
B. Chandra Says:
January 14, 2009 at 1:32 pm
Interesting analysis. However, if the walled garden model is wrong, what incentive do local sites have to get businesses to add content to their pages, whether that be product listings, service descriptions, menu items, etc.? If the content is accessible elsewhere, or usable by other properties, the site cannot differentiate, at least not on that score.
Michael Yacavone Says:
January 14, 2009 at 2:07 pm
Would we say that Dan Bricklin’s SMBmeta proposal may turn out to be part of the end-state? http://www.trellixtech.com/smbmetaintro.html
Tim Cohn Says:
January 14, 2009 at 2:14 pm
@Andrew

…which is a requisite for establishing communication.
Scott Jones Says:
January 14, 2009 at 2:27 pm
Interesting to be part of this discussion and thanks Ahmed for the compliments.

What I have found interesting, in my few dealings with U.S. companies, is the apparent lack of quality data available in the US, and the ready availability of poor but low-cost data.

I think this is why so few American companies are venturing outside their borders (in the local arena, that is).

Although Yelp (who are a client of ours) have recently launched a UK site.
BTS Says:
January 14, 2009 at 4:12 pm
Who is selling quality data outside the US?
Greg Sterling Says:
January 14, 2009 at 4:21 pm
It’s not clear that anyone is on a broad scale. But see Scott Jones’ company: http://www.localdatacompany.com/
Greg Sterling Says:
January 14, 2009 at 4:21 pm
Also Urban Mapping is collecting certain categories of local data in Europe. Not sure what the licensing status is, however.
ian Says:
January 14, 2009 at 5:39 pm
Good thing Greg doesn’t get charged by the number of comments! I vote for AF and Gib as having the most prescient comments. While I understand the essence of the original post, I did let out a small “sigh”–strategically, it’s very easy to say how this gets done, but the devil is in the (massive) details. If RHD and Idearc (sorry, I can’t remember their *new* tickers!!) weren’t buried with debt, they would become extremely attractive as an acquisition and somebody would have the keys to relationships with millions of SMEs.
Nagaraju Says:
January 14, 2009 at 8:33 pm
Interesting post. Rather than wait for the phone company or credit card company to propagate information about a new business, the owners of new businesses are taking the time to create an online profile on various Internet destinations, thereby eliminating data issues for the most part. The challenge, as Gib points out, is getting the bad data out, or removing non-existent businesses altogether.

One of the challenges with SMBs is that the trust they place in physical contact, and their willingness to pay for that trust (even if the results aren’t there), has yet to be matched by the online alternatives (which may yield better results at lower cost). It’ll happen, but it’s a time-consuming process.
Michael Says:
January 14, 2009 at 11:50 pm
We find that people would frequently rather ask neighbors for SMB recommendations than dive into the big anonymous services.
Why We Don’t Have Good Local Business Content « Scott Middleton Says:
January 21, 2009 at 12:55 pm
[…] Have Good Local Business Content Published January 21, 2009 Uncategorized I read a really interesting article a week or so ago on my […]
Why No Google Local Biz Content? (Part II) « Screenwerk Says:
February 5, 2009 at 12:34 pm
[…] Local Biz Content? (Part II) By Greg Sterling Marty Himmelstein’s previous post “Why We Don’t Have Good Local Business Content?” sparked considerable discussion and debate. I’m letting Marty respond (in two […]
Jamie Hutson - jehutson.com » Blog Archive » Google Maps Local is Useless Says:
February 16, 2009 at 8:17 pm
[…] Here is another good example of why Marty Himmelstein doesn’t know what he is talking about when he says SEO doesn’t matter. As good a job as Yelp has done with SEO, they should be showing up in this search term. […]
James Says:
February 17, 2009 at 1:10 am
I don’t necessarily disagree, but I swear… I’ve seen less rhetoric and jargon in legal contracts from the phone company. It’s a blog post for crying out loud, not the NY Times.