Guest columnist Marty Himmelstein is a local search expert who founded Long Hill Consulting. He was with Vicinity Corp. (acquired by Microsoft) and wrote one of the original “local search” patents before that term existed. He asked to write this column based on the flurry of recent local patent activity and related coverage surrounding the Local.com patents in particular. The sentiments in the article are entirely his own. I have not contributed to or edited the piece.
More Matter, With Less (Prior) Art
Local.com was recently awarded a patent for geographic search on the web. This patent will encounter rough sailing ahead as it is neither original nor inventive. The bulk of this post will explore the patent’s deficiencies, and refute CEO Heath Clark’s claim that “the methods covered have subsequently become the de-facto standard for information retrieval in the local search industry.”
But first an aside. Patents without merit, whether because of obviousness or prior art, are unfortunately common in the software industry. Most of them sulk in the obscurity that is their rightful place. The “yet another one” aspect of this Local.com episode is not the most disturbing thing about it. We can find fault with the patent, with the current state of the patent system, and with Local.com’s predictable posturing, but the company is playing by the rules, broken as they may be. The worth of its IP will be evaluated in the light of day. What is most disturbing is that the facile musings of a financial blogger of no special esteem, and without the technical wherewithal to judge the merits of the IP he so effusively lauds, are not likewise ignored. Rather, they have occasioned an untethered credulity that has caused, at its peak, a tripling of Local.com’s (LOCM) stock price and a three-orders-of-magnitude increase in its trading volume.
To review, on June 25, Local.com issued a press release announcing its patent, which was granted on June 12. Nobody noticed. On June 28 the stock (LOCM) closed at 3.94 on 29,932 shares. The next day on the Seeking Alpha financial blog, John Gilliam started his daydreaming. According to Eric Savitz of Barron’s the post was picked up by Yahoo Finance, and the attendant chattering sent the stock price to 6.92 on 8,406,829 shares.
Gilliam asserts the patent “is a very broad patent that seems to encompass what the major players in search are already doing with their local search applications.” By the end of the post, he considers it a good bet that Google or Yahoo! will buy Local.com because “why would Google or Yahoo sign up to pay $10 – $20 million or so per year or pay royalty fees per transaction that could push it to three or four times that level when they might be able to buy the company outright for $100 million or so?” Mr. Gilliam has a position in the stock.
It apparently hasn’t occurred to the commentator that the major players are already doing what they’re doing because they have prior art, and plenty of it. And that an overworked and underqualified patent examiner missed and misunderstood that prior art. Most telling is that nowhere in the original or in a subsequent post does Mr. Gilliam mention anything about the quality of the service Local.com provides. Does he use their service? Does he find it compelling? The trading frenzy, though, is “a very positive development,” because “one of the largest items on the expense side of Local.com’s income statement is its cost of traffic acquisition,” and the exposure will bring lots of people to its site. How silly to suppose that rash speculation will build the user base that several years of the company’s own efforts haven’t. The fundamentals apply; this company’s value will be determined by the quality of its service. But unless they get a pass from the very companies they expect to curtsy to them, they will have licensing fees of their own to deal with.
To start, while at Vicinity Corporation, I co-authored patent 6,701,307, A Method and Apparatus for Expanding Web Searching Capabilities, filed in October 1998 and granted in March 2004. The patent is now owned by Microsoft (with whom I have no affiliation). The Local.com patent was filed provisionally in May 2004. The Microsoft patent covers the essentials of geographic searching on the web in a manner that is more general and more thorough than the Local.com patent. The concepts described in the patent were implemented and made publicly available in a joint project between Vicinity and Northern Light from April 2000 to January 2002.
In June 2002, Google, not unaware of the potential of geographic search (or of Vicinity’s work), awarded its first annual programming prize to Daniel Egnor’s geographic search project, and followed this with its own version, which has been available as part of Google Local since September 2003. Yahoo!, too, has had a similar capability, which, according to a press release, may have been released as early as April 2003. And MetaCarta’s Geographic Text Search is an interesting project that also predates Local.com’s work. This list is not exhaustive.
Local.com did cite two Microsoft/Vicinity patents, both peripheral to their application, pairing the patent number of one with the title of the other. They didn’t cite the patent that directly pertains to their work. The first patent cited by the patent examiner was the right Microsoft patent, but he apparently couldn’t discern the similarities between it and the work in front of him, similarities that would be apparent to a person of ordinary skill in the art.
Here’s the bulk of the Microsoft abstract:
“At index time, a Web page is spidered and the text and metatags returned to a processor. The processor extracts spatial information from the text and metatags. A geocode is generated for the spatial information. The geocode is then indexed along with the remaining contents of the page. A subsequent query during query time can search for entries based on proximity to a known location using the indexed geocode.”
And portions of the Local.com abstract:
“A local search engine geographically indexes information for searching by identifying a geocoded web page of a web site and identifying at least one geocodable web page of the web site. [….] The system indexes content of the geocoded web page and content of the geocodable web page. The indexing including associating the geocode contained within content of the geocoded web page to the indexed content of the geocoded web page and the geocodable web page to allow geographical searching of the content of the web pages.”
A Tale of Two Patents
The fundamental building blocks of geographic search on the web are:
· Parsing text from web pages and other kinds of unstructured documents to find location information.
· Verifying that the candidate text does represent a location.
· Transforming the parsed textual description of a location into geographic coordinates, usually latitude and longitude, that correspond to a point on the earth’s surface, a process called geocoding.
· Indexing the geocoded location. This and the next step require the use of spatial access methods (SAMs) so that two-dimensional coordinates can be transformed into one-dimensional coordinates. SAMs are a practical rather than an absolute requirement, since without them spatial searching is costly. If you choose the right encoding method, a search engine can index and search the transformed coordinate, called a spatial key in the Microsoft patent, in the same way the other terms on the page are indexed.
· Performing spatial proximity queries at search time.
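To make the indexing step concrete, here is a minimal Python sketch of one way to turn a two-dimensional coordinate into a one-dimensional spatial key: interleaving the bits of latitude and longitude (a Z-order, or Morton, code). It is an illustration under stated assumptions, not the encoding claimed in the Microsoft patent; the function names, the `sk` term prefix, and the 16-bit precision are hypothetical choices. The point is that once the key is a string, the search engine can post it in the same inverted index as ordinary words.

```python
from collections import defaultdict


def spatial_key(lat: float, lon: float, bits: int = 16) -> str:
    """Interleave latitude and longitude bits into one Z-order key (hypothetical scheme)."""
    scale = (1 << bits) - 1
    # Normalize each coordinate to an integer grid position in [0, scale].
    y = int((lat + 90.0) / 180.0 * scale)
    x = int((lon + 180.0) / 360.0 * scale)
    key = 0
    for i in range(bits - 1, -1, -1):
        # Take one latitude bit and one longitude bit per level, most significant first.
        key = (key << 2) | (((y >> i) & 1) << 1) | ((x >> i) & 1)
    # Render as a fixed-width hex "word" so it can sit in the index beside ordinary terms.
    return f"sk{key:0{bits // 2}x}"


# Inverted index: term -> set of page URLs.
index: dict[str, set[str]] = defaultdict(set)


def index_page(url: str, text: str, lat: float, lon: float) -> None:
    for term in text.lower().split():
        index[term].add(url)
    index[spatial_key(lat, lon)].add(url)  # the spatial key is just one more term


def search(terms: list[str]) -> set[str]:
    """Return pages containing every term (a plain conjunctive query)."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()
```

A query such as `search(["pizza", spatial_key(40.7128, -74.0060)])` then finds pizza pages in the same grid cell as the search center. A single exact key matches only one cell; real proximity search needs a set of keys covering the search area, which is the job of the quadtree region names discussed below.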
At the start of the Geosearch project at Vicinity, we thought that for the types of public and commerce-oriented location-based searching that would be popular on the Internet, most of the content in which we were interested would include well-formed addresses or telephone numbers. Our hunch was right: between fifteen and twenty percent of the web pages we examined had either a US or Canadian address or telephone number, a percentage that is consistent with other estimates. We also tried hard to find complete addresses, rather than settling for simpler-to-find postal codes. The Microsoft patent discusses some of the methods we used to find and confirm addresses in unstructured content. Our decision to do fine-grained address detection and geocoding anticipated the ever-improving mapping and routing services that Google, Yahoo!, Microsoft and others have made available. Today, a local search service that doesn’t include accurate map placement and door-to-door driving directions is at a marked disadvantage.
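Finding location information starts with spotting candidates. The Python sketch below is a deliberately crude, hypothetical illustration of scanning text for North American telephone numbers, US state-plus-ZIP pairs, and Canadian postal codes with regular expressions. It is not the method described in the Microsoft patent, and it only finds candidates: confirming that a candidate really is a location still requires checking it against reference data.

```python
import re

# Hypothetical patterns for illustration only; real address detection needs far more care.
PHONE = re.compile(r"\(?\b([2-9]\d{2})\)?[-.\s]?([2-9]\d{2})[-.\s](\d{4})\b")
STATE_ZIP = re.compile(r"\b([A-Z]{2})\s+(\d{5}(?:-\d{4})?)\b")
CA_POSTAL = re.compile(r"\b[ABCEGHJ-NPRSTVXY]\d[A-Z] ?\d[A-Z]\d\b")


def find_location_candidates(text: str) -> dict[str, list[str]]:
    """Return raw candidate strings; none of them has been verified yet."""
    return {
        # Normalize phone matches to "AAA-EEE-NNNN".
        "phones": ["-".join(m.groups()) for m in PHONE.finditer(text)],
        # Keep only the ZIP part of a "ST 12345" pair.
        "us_zips": [m.group(2) for m in STATE_ZIP.finditer(text)],
        "ca_postal_codes": [m.group(0) for m in CA_POSTAL.finditer(text)],
    }
```

Requiring a state abbreviation before the ZIP code cuts down on false positives from arbitrary five-digit numbers such as prices or product codes.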
The Local.com patent is vague on the procedure it follows to find and verify addresses. In fact, the author misuses the term ‘geocode’ to mean textual information that represents an address: “the street address has to be present to be considered a valid geocode in one configuration.” How do you know when you have a street address? I don’t even know all the streets in my little town. The patent neither describes how it makes such a determination, nor cites prior art. But I might be missing something, because “The Geocoder as disclosed herein is able find locations (i.e., geocode information such as an address, phone number, etc.) in a similar way as do human beings.” I’ll jump on the bandwagon, too, if Local.com has IP that substantiates this assertion: there is none in the patent.
You know you have a valid street address by using sophisticated geocoding databases, such as Tele Atlas’s MultiNet. (Google Maps and other services work with these databases on your behalf when you map a location or get driving directions.) These databases are dynamic because they need to maintain an accurate model of the street networks they represent. They are capable of resolving an address to within several meters. The Local.com patent makes no provision for working with these services, contenting itself with “a look up table containing all of the US town, state, zip code, latitude and longitude” values, which can do no better than zip-centroid resolution. Of course, a software interface to a third-party software service is not an innovation. Nevertheless, the patent’s glib treatment of an important function is indicative of its general inadequacy for solving the problem it purports to address.
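The difference in resolution is easy to see in a small Python sketch. Assuming a lookup table keyed by ZIP code, as the Local.com patent describes, every address in the same ZIP code geocodes to the same centroid, so two businesses at opposite ends of the area appear to be zero distance apart. The ZIP code `00000`, its centroid, the addresses, and the street-level coordinates are all placeholders, not real data; `haversine_km` is the standard great-circle distance formula.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Placeholder lookup table: ZIP code -> centroid (lat, lon). Illustrative values only.
ZIP_CENTROIDS = {"00000": (40.684, -74.468)}


def geocode_by_zip(address: str) -> tuple[float, float]:
    """Zip-centroid geocoding: treat the last token of the address as the ZIP code."""
    return ZIP_CENTROIDS[address.split()[-1]]


# Two hypothetical addresses in the same ZIP code, with made-up street-level
# coordinates of the kind a street-network geocoder would return.
addr_a, rooftop_a = "100 First St, Anytown, NJ 00000", (40.700, -74.480)
addr_b, rooftop_b = "900 Second St, Anytown, NJ 00000", (40.670, -74.440)

zip_distance = haversine_km(geocode_by_zip(addr_a), geocode_by_zip(addr_b))
street_distance = haversine_km(rooftop_a, rooftop_b)
```

With the lookup table, the search engine cannot tell these two businesses apart by location at all, let alone rank them by distance from a searcher or hand them to a driving-directions service.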
The SAM described in the Microsoft patent is based on quadtrees. The basic idea is to hierarchically decompose space into successively smaller regions and assign a unique name to each region. To create a unique name, each time you decompose an area into smaller regions, you append a new piece onto the end of the name of the parent region, a different piece for each of the smaller regions. (The ‘you’ here is a piece of software.) The idea is the same as creating new subfolders under a parent folder. (In fact, you, dear reader, can think of a region as a folder; the analogy will help in the comparison with the methods used in the Local.com patent.) Region 1342 is contained in region 134, which is contained in region 13, and so forth. A point (a location) is directly in only one region (and indirectly in parent regions). At search time, based on the user’s search center and radius, you figure out the names of the regions that fall wholly or partially within the distance to be searched. Then, quietly, you add these names as additional search terms to the user’s query. The ideal situation is that you only have to add the name of one region, but this doesn’t happen often.
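A minimal Python sketch of the region-naming idea follows. It is an illustration, not the Microsoft patent’s actual encoding: the quadrant digits (1 = southwest, 2 = southeast, 3 = northwest, 4 = northeast), the `q` term prefix, the fixed depth of 8, and the use of a latitude/longitude bounding box in place of a true search circle are all assumptions made for brevity. Each page is indexed under its own region name and every ancestor name, so that a query can use a single coarse region when one happens to fit inside the search area.

```python
from collections import defaultdict

DEPTH = 8  # hypothetical finest level; each step halves the cell in both directions
WORLD = (-90.0, 90.0, -180.0, 180.0)  # (south, north, west, east)


def region_name(lat: float, lon: float, depth: int) -> str:
    """Name of the region at `depth` that contains the point, e.g. '1342'."""
    south, north, west, east = WORLD
    name = ""
    for _ in range(depth):
        mid_lat, mid_lon = (south + north) / 2, (west + east) / 2
        digit = 1
        if lon >= mid_lon:
            digit += 1
            west = mid_lon
        else:
            east = mid_lon
        if lat >= mid_lat:
            digit += 2
            south = mid_lat
        else:
            north = mid_lat
        name += str(digit)  # append one piece per subdivision, like a subfolder
    return name


def spatial_terms(lat: float, lon: float) -> list[str]:
    """Index terms for a point: its own region plus every ancestor region."""
    name = region_name(lat, lon, DEPTH)
    return ["q" + name[:k] for k in range(DEPTH + 1)]


def cover(box, max_depth: int, name: str = "", cell=WORLD) -> list[str]:
    """Names of regions that fall wholly or partially inside the query box."""
    s, n, w, e = cell
    qs, qn, qw, qe = box
    if qs >= n or qn < s or qw >= e or qe < w:
        return []  # no overlap
    if (qs <= s and qn >= n and qw <= w and qe >= e) or len(name) == max_depth:
        return [name]  # wholly inside, or partially inside at the finest level
    mid_lat, mid_lon = (s + n) / 2, (w + e) / 2
    children = {
        "1": (s, mid_lat, w, mid_lon),  # southwest
        "2": (s, mid_lat, mid_lon, e),  # southeast
        "3": (mid_lat, n, w, mid_lon),  # northwest
        "4": (mid_lat, n, mid_lon, e),  # northeast
    }
    names: list[str] = []
    for digit, child in children.items():
        names += cover(box, max_depth, name + digit, child)
    return names


index: dict[str, set[str]] = defaultdict(set)


def add_page(url: str, text: str, lat: float, lon: float) -> None:
    for term in text.lower().split():
        index[term].add(url)
    for term in spatial_terms(lat, lon):
        index[term].add(url)


def geo_search(keyword: str, box) -> set[str]:
    """Keyword AND (any covering region): the region names are the quietly added terms."""
    region_hits: set[str] = set()
    for name in cover(box, DEPTH):
        region_hits |= index.get("q" + name, set())
    return index.get(keyword.lower(), set()) & region_hits
```

For example, `geo_search("pizza", (40.21, 41.21, -74.51, -73.51))` looks for pizza pages in a one-degree box around lower Manhattan. Cells at the finest level may extend past the edge of the search area, so a production system would also filter candidates by true distance.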
Other spatial access methods can be and are used by other web-based geographic search implementations. The problem with the Local.com patent is that it doesn’t really use any spatial indexing method, and therefore efficient proximity searching on large datasets is not practical. The method employed in the Local.com patent is to assign a web page with an address to a folder that represents a geographic region. However, at search time only one folder is searched (from claim 21):
“receiving a user query … using the location to identify a folder in which to search content of web pages indexed with that folder, the folder selected from a plurality of folders…”
To solve the boundary problem of “businesses located in nearby regions … each folder includes overlapping content of web pages from web sites associated with entities located within a certain overlapping distance into the other folder.” This solution doesn’t work, of course, because there are no boundaries in local search. You can’t know beforehand how far a user is willing to search because that depends on many factors, including what the user is searching for (restaurants or balloon rides), mode of transportation, urgency, and so forth.
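A one-dimensional toy model in Python shows why a fixed overlap cannot substitute for a search radius. Picture businesses strung along a single road, folders 20 km wide with 5 km of overlap into their neighbors, and a user at kilometer 19. The widths, business names, and positions are hypothetical, and the folder lookup is a simplified reading of claim 21, not code from the patent.

```python
FOLDER_WIDTH_KM = 20.0  # hypothetical folder size
OVERLAP_KM = 5.0        # hypothetical "overlapping distance" into neighboring folders

# Hypothetical businesses and their positions (km) along one road.
BUSINESSES = {"diner": 2.0, "garage": 22.0, "balloon rides": 40.0}


def folder_of(x_km: float) -> int:
    return int(x_km // FOLDER_WIDTH_KM)


def folder_contents(folder: int, businesses: dict[str, float]) -> set[str]:
    """A folder holds its own span plus a fixed overlap on each side."""
    lo = folder * FOLDER_WIDTH_KM - OVERLAP_KM
    hi = (folder + 1) * FOLDER_WIDTH_KM + OVERLAP_KM
    return {name for name, x in businesses.items() if lo <= x < hi}


def folder_search(user_x: float, businesses: dict[str, float]) -> set[str]:
    """Search only the one folder the user falls in, whatever the user's intent."""
    return folder_contents(folder_of(user_x), businesses)


def radius_search(user_x: float, radius_km: float, businesses: dict[str, float]) -> set[str]:
    """True proximity search: the radius is chosen at query time."""
    return {name for name, x in businesses.items() if abs(x - user_x) <= radius_km}
```

The folder search returns the same two businesses whatever the user intends: too many for someone who wants something within 5 km, too few for someone willing to drive 25 km for a balloon ride.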
To solve the scalability problem, the patent suggests creating different search engines:
“One problem with conventional search engines is that they perform searches on content from web pages collected from all over the world. In contrast, configurations described herein can divide the web into different countries and provide a search engine for each, and each search engine can index content locally for each country. Using this approach, the local search engine disclosed herein deals with data in a certain country and the resources needed to process searches are greatly reduced.”
Apparently, folders only contain pages that are “geocoded” or “geocodable.” Geocodable pages are those that are deemed to be related to pages on which “major geocodes” are found; the example given requires that the pages share, for starters, the same domain. Geocodable pages inherit the location of the page with the major geocode. (See discussion below.)
Unlike the regularly shaped regions associated with quadtrees and related structures, which make for easy math, Local.com folders are based on political criteria, such as zip codes, and state and country names. Instead of doing math, “each folder can have a list of zip codes associated with that folder and zip code near a state border for example might be included in two folders.” Zip codes have irregular boundaries, and people don’t care about them other than to pick up their mail. In many parts of the world, political subdivisions either don’t work at all or don’t provide enough resolution, a matter of practical concern since GPS devices and mapping services like Google’s MyMaps let people chart every move they make, even on the high seas. (And chart they will.) Besides, lots of systems use political boundaries to approximate true proximity searching, so there is plenty of prior artlessness, if you will.
Assigning addresses to pages without any
The Local.com patent devotes considerable space to describing how to associate a ‘major’ address on one page of a website with other pages on the website that don’t also contain a major address. The patent’s Claim 6 describes how it defines a major address:
“The method of claim 3 wherein: the geocoded web page is at least one of: i) a home page of the web site; ii) a contact page of the web site; iii) a direction page of the web site; iv) an about page of the web site; v) a help page of the web site; and vi) a page of the web site that is no deeper than a predetermined number of links below the home page of the web site; and wherein the geocode contains a complete physical address of the entity associated with the web site.”
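Claim 6 amounts to a short heuristic, sketched below in Python under loud assumptions: the keyword list, the two-link depth limit, and the function names are hypothetical stand-ins for the page types and the “predetermined number of links” in the claim, and the check for a complete physical address is assumed to have run already. The second function applies the domain-based inheritance described above, giving every page on a site the major address found on a qualifying page of the same domain.

```python
from typing import Optional
from urllib.parse import urlparse

PAGE_TYPE_WORDS = ("contact", "direction", "about", "help")  # hypothetical stand-ins
MAX_LINK_DEPTH = 2  # hypothetical "predetermined number of links below the home page"


def is_candidate_geocoded_page(url: str, link_depth: int) -> bool:
    """Home page, a page named like contact/directions/about/help, or any shallow page."""
    path = urlparse(url).path.lower()
    if link_depth == 0 or path in ("", "/"):
        return True
    if any(word in path for word in PAGE_TYPE_WORDS):
        return True
    return link_depth <= MAX_LINK_DEPTH


def inherit_major_address(
    pages: list[str], major_address_by_url: dict[str, str]
) -> dict[str, Optional[str]]:
    """Give every page the major address found on a qualifying page of the same domain."""
    by_domain: dict[str, str] = {}
    for url, address in major_address_by_url.items():
        by_domain.setdefault(urlparse(url).netloc, address)  # first address per domain wins
    return {url: by_domain.get(urlparse(url).netloc) for url in pages}
```

Note what the sketch does with a travel agency’s destination pages or a web host’s customer sites: it stamps them with the agency’s or host’s office address.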
There are at least three problems with these claims. The first is that it is conventional practice for web sites, especially business-oriented sites, to include a contact page (etc.) that contains the business’s address. The whole point of pages with obvious names and titles is that they perform the obvious functions denoted by their names. People understand these pages apply to an entire website. It is not original to apply a conventional practice as it is meant to be applied and call it an invention. (I suppose one could argue the process of automation here is itself innovative. Sigh.)
The second problem is there are many cases when the technique doesn’t work or is irrelevant. Travel agencies describe vacation spots far away; the mailing addresses of web hosting providers are mostly irrelevant; suppliers want you to go to stores that sell their products, not where they are headquartered. People are a lot better than computers at disambiguating certain types of information. A geographic search that gets a user to a nearby business’s home or contact page is good enough. Any attempt to do more is as likely to hinder as to help.
The third problem relates to both prior art and industry trends. The lack of structured content standards has been the bane of local search. An early attempt to define a standard was the Small and Medium Business Metadata Initiative, described by Dan Bricklin when he was working for Interland, in 2003. SMBMeta defines “an XML file stored at the top level of a domain that contains machine readable information about the business the web site is connected to.” Location information is part of the file and the file applies to the entire website. A structured file that explicitly describes facts and relationships is not the same thing as a process that tries to infer them. It is better. While the SMBMeta initiative didn’t catch on, equivalent mechanisms are catching on, and will continue to: RSS, Atom, structured blogging, Google Base. The imperfect heuristics we have had to use in an attempt to gather facts about local businesses from unstructured content will be rendered increasingly marginal.
Link Analysis based on Proximity
The basic idea of what the Local.com patent calls ‘georanking’ is that a page’s georank is increased when other pages that link to it within the same folder have an address within the boundaries of the same folder. Claim 11:
“The method of claim 10 wherein performing georanking of content of the web pages comprises: identifying links in content of web pages associated with the folder; and for each identified link, adjusting a georank of a web page referenced by that identified link if the web page identified by that link has a geocode associated with the same folder associated with the web page from which the link was identified.”
As noted, the Local.com concept of geographic folders is flawed, and the flaws are inherited by methods that use them. However, we could generalize the concept of georank and base it purely on distance, so that all pages within a certain distance of a page affect that page’s georank. Such a technique could have merit. I see two possible problems, one related to usefulness, the other to originality. For the former, pages relevant to local search will tend to be referenced by other pages relevant to local search. A town’s Chamber of Commerce site will link to the sites of member businesses. Georanking might not do much more than reiterate the information implicit in the more general link analysis upon which it is based. For some types of content, travel-related say, proximity doesn’t have much value anyway. If I am considering a trip to a particular exotic locale, I want the impressions of other people with tastes similar to mine, wherever they live. In terms of originality, link analysis is well-covered. I assume there is prior art that evaluates various attributes and relationships between pages on both sides of a link. To pass an originality test one would have to demonstrate that a distance attribute is different enough from other kinds of attributes that are used to characterize a link.
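Both variants of georank can be sketched in a few lines of Python. The first follows the wording of claim 11, crediting the linked page only when both ends of the link fall in the same folder; the second is the distance-based generalization suggested above. The page names, coordinates, folder labels, the increment of one per qualifying link, and the 10 km threshold are hypothetical choices for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def georank_by_folder(links: list[tuple[str, str]], folder_of_page: dict[str, str]) -> dict[str, int]:
    """Claim 11 as read here: credit the linked page only if both ends share a folder."""
    rank: dict[str, int] = {}
    for src, dst in links:
        src_folder = folder_of_page.get(src)
        if src_folder is not None and src_folder == folder_of_page.get(dst):
            rank[dst] = rank.get(dst, 0) + 1
    return rank


def georank_by_distance(
    links: list[tuple[str, str]], location: dict[str, tuple[float, float]], max_km: float
) -> dict[str, int]:
    """Generalization: credit the linked page if the two pages are within max_km."""
    rank: dict[str, int] = {}
    for src, dst in links:
        if src in location and dst in location and haversine_km(location[src], location[dst]) <= max_km:
            rank[dst] = rank.get(dst, 0) + 1
    return rank


# Hypothetical sample data.
LOCATION = {
    "chamber": (40.680, -74.470),
    "bakery": (40.690, -74.460),
    "deli": (40.695, -74.455),
    "resort": (18.000, -66.000),
}
FOLDER = {"chamber": "A", "bakery": "A", "deli": "B", "resort": "C"}
LINKS = [("chamber", "bakery"), ("chamber", "resort"), ("deli", "bakery"), ("resort", "bakery")]
```

In the sample data the deli sits less than a kilometer from the bakery but in a different folder, so only the distance-based version credits its link. Whether such a signal adds anything beyond ordinary link analysis is the open question raised above.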
The viability of Local.com’s patent on geographic search is highly suspect. The essential pieces of geographic search on the web have been covered in practice and in patent: parsing content to find location information; geocoding; indexing and searching. Further, Local.com would face an insurmountable task in demonstrating that their solutions are even functionally equivalent to, let alone improvements over, existing ones. The patent makes some subsidiary claims related to associating addresses on web pages with related web pages, and modifications of standard link analysis techniques to accommodate location. The originality and usefulness of these techniques are doubtful at best.