Yale Regulating Search? Conference Recap

By Eric Goldman

I attended the Regulating Search? conference at Yale Law School this weekend. This post contains the notes I took at the conference.

A couple of meta-observations:

* almost everyone on the various panels spoke against government regulation. This was clearly a stacked deck. There are plenty of people who would love to get their regulatory hands on search engines, but their views were not widely represented. The closest thing to a pro-regulation advocate was Barbara van Schewick, but her particular axe to grind (search engines self-promote their own subsidiary offerings too favorably) was comparatively tame.

* the words “click fraud” were not uttered once. The words “adware” and “spyware” were used extremely rarely.

* Google’s representatives repeatedly tried to position Google as “neutral” and “objective.” In my search engine bias paper (and my Deregulating Relevancy paper), I debunk any effort by Google to characterize itself as passive. That characterization may have been true at some point in Google’s history with respect to core search, but Google has become too multi-faceted and too involved in its databases to keep playing the passivity card.

Introduction

James Grimmelmann noted that it’s hard to define search engines because of technological convergence and doctrinal convergence (overlapping bodies of law). He said there are three things we need:

* a definition of “search” that anticipates future developments

* clarity on the “public interest” role of search

* to know which governments or agencies can regulate search

Panel 1: The Search Space

Robin Sloan talked about his experiences producing Epic 2014, a dystopian view of Google remixing NYT content in the future. He wondered what would happen if Google were to become an agent to automate FOIA requests for its customers.

Stefan Bechtold talked about how search engines both compete with each other and try to interpose themselves between firms and their customers. He also noted that the law assumes a search engine has a single targetable database, so he wondered how the law would cope with decentralized search databases (much like Grokster’s distributed file lists).

Andrei Broder said that search was moving away from “syntactic search” (where results match user queries) to “semantic search” (where search tries to satisfy the underlying user needs). Search is also moving away from information retrieval to information supply, where information magically flows to users instead of users initiating a query in a search box.

Andrei described four generations of search:

* 1st generation: engines index on-page text data

* 2nd generation: engines expand to include off-page data [this is the stage we're at now based on search engines indexing third party links and anchor text]

* 3rd generation: using past behavior, search engines will answer users’ needs behind the query. The 3rd generation assumes searchers can’t state their questions very well.

* 4th generation: information supply–information will be delivered to users based on user activity and context

In Q&A, Andrei added that any algorithm factor that search engines consider will be polluted by spammers. He also noted that it’s difficult for search engines to know what consumers will find acceptable from a privacy standpoint–for example, a consumer might not mind a search engine remembering data from within the same search session but might object to a search engine remembering data from 3 weeks ago.

David Caul said that search is in its infancy. Therefore, it is hard to predict the future of search.

Stephen Arnold argued that search is an application platform.

Ed Felten offered a model of search:

Search engines observe => which leads to an observations database

Search engines analyze and learn => which leads to digested observations

Search engines then serve users

Ed said the key step in the process is the analyze/learn stage, but most of the legal fights have focused on the observations stage [or, in my words, the data collection stage], such as Bidder’s Edge and Google Book Search.

Ed’s solution is to decentralize the observations/data collection process, such as through P2P data collection.
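For readers who think in code, here is a minimal sketch of Ed's three-stage model (observe, analyze/learn, serve). This is my own toy illustration, not anything Ed presented: the function names, the inverted-index "analysis" and the example.com-style URLs are all assumptions chosen to keep the example small.

```python
from collections import defaultdict


def observe(pages):
    """Stage 1: collect raw observations (here, page text keyed by URL)."""
    return dict(pages)


def analyze(observations):
    """Stage 2: digest observations into an inverted index (word -> URLs)."""
    index = defaultdict(set)
    for url, text in observations.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def serve(index, query):
    """Stage 3: answer a query using only the digested observations."""
    words = query.lower().split()
    if not words:
        return []
    # A URL matches only if it contains every word in the query.
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Hypothetical pages, for illustration only.
pages = {
    "a.example": "yale law search conference",
    "b.example": "search engine bias",
}
index = analyze(observe(pages))
print(serve(index, "search bias"))
```

Note that serve() never touches the raw pages, only the digested index. That separation is what makes it possible to imagine the observation stage being handled by someone other than the search engine, as in Ed's decentralization proposal.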

Panel 2: Search Engines and Public Regulation

Doug Lichtman gave an unscheduled introduction consisting of three principal points:

* be skeptical of government regulation

* but don’t be blind to differences that warrant differential legal treatment

* principally, the hard legal problems arise from innocent mistakes, not from the bad actors. For example, the problems with 512(c)(3) notices are attributable to people who submit 512(c)(3) notices but don’t know what’s covered by copyright law.

Barbara van Schewick asked whether we need neutrality rules for search engines. She then rephrased the question as: do search engines have incentives to treat listings in discriminatory ways? She offered three principal reasons why search engines might discriminate among search results:

1) manipulation penalty (i.e., search engines acting to protect the integrity of results). She thinks this is protected by First Amendment law (e.g., the Search King case).

2) paid placement (i.e., when search engines try to deceive consumers). She thinks this is covered by the FTC Act.

3) “Leveraging” (i.e., when search engines promote their private offerings in preference to other editorially generated results). She had a real problem with this, seeing it as a type of tying/cross-subsidization that can unfairly increase market share. She doesn’t think the existing laws are adequate to redress leveraging.

[My comment: I was surprised that she thought leveraging was the worst aspect of search engine bias--this definitely struck me as a minority view. I also think these 3 situations are an incomplete universe of the consequences of search engine bias.]

Urs Gasser made 4 points:

* we’ve done a bad job precisely describing the harms caused by search engines

* there’s a risk of too-early regulation

* if intervention is required, we should think beyond legislation/government regulation and consider self-regulation. He described how the European view is that users govern search because there is a contractual relationship between search engines and their users which creates legal rights if violated.

* we need normative criteria to evaluate any regulatory proposal. He offered three values: diversity, information autonomy and quality

Renata Hesse said that her view of the FTC’s goal is to let markets work for themselves.

I spoke about search engine bias. I argued that search engine algorithms inherently lead to biased search results, but (a) these biases are necessary to combat the search engine gamers, (b) if the biases interfere with searchers getting relevant results, searchers won’t tolerate them, and (c) as search engine algorithms move from one-size-fits-all algorithms to personalized algorithms, the bias problem will abate substantially. I will have a separate post on this later.

Panel 3: Search Engines and Intellectual Property

While the first two panels were good, it was pretty clear that the audience really wanted to talk about Google Book Search, so there was a lot of anticipation for this panel.

Before we got to Google Book Search, two speakers addressed search engines, keywords and trademarks.

Marty Schwimmer gave examples of the ubiquity of “diversion of traffic.” He asked “who owns traffic?” He talked about how we’ve moved from the early “cybersquatting years” to the current “keyword years.”

Jon Zieger talked about developing MSN Search’s keyword advertising policy. He said he faced two key uncertainties: what constitutes a “use in commerce,” and when courts will apply initial interest confusion instead of the multi-factor likelihood of consumer confusion test.

Jon said Microsoft’s policy was guided by 4 values:

* respect IP

* give users what they need

* scalability

* stability (i.e., don’t change the policy often)

He then described the implemented policy, which he characterized as a “conservative policy compared to others” [specifically Google]. The policy is that competitors can’t use third party TMs, but comparisons, reviews and other informational uses are OK. The policy applies to the ad copy and description, but it also affects the keyword (meaning that a competitor can block another competitor from buying certain keywords in almost all circumstances). I am planning to blog more about MSN Search’s keyword policy in a separate post.

Unfortunately for Marty and Jon, they were buried on the same panel as the parties discussing Google Book Search. So the rest of this panel focused on that topic.

Jon Baumgarten spoke against Google Book Search. He made the following points:

* “not every good use is a fair/legal use”

* fair use does not cover every intermediate use. He thinks the precedent (e.g., Sega v. Accolade, Sony v. Connectix, Ticketmaster v. Tickets.com) only excuses extraction of unprotected data.

* intermediaries cannot stand in the shoes of their users (cite to Kinko’s)

* we should not overemphasize the 106 distribution right and ignore the 106 reproduction right [arguing that the infringement occurs by the copying into the databases, not the results presentation]

Paul Aiken offered two concerns about Google Book Search: (1) all digitization moves copyrighted works from a secure environment (books) to an insecure environment (the Internet), and (2) the authors want their share of additional revenue. [I found both of these points especially unpersuasive]

Jason Schultz thinks we should be talking about the overarching problem of treating IP as property. He described 4 types of incentives that support copyrights:

* incentives to create/disseminate works

* incentives for broad public access to works

* add-on creativity

* technological innovation

With respect to the incentives to create/disseminate works, he said that there’s no evidence Google Book Search harms this incentive.

Jonathan Band noted that the 512(d) safe harbor is worthless because it codified search engine concerns at the time (concerns about directory-style links to infringing content, rather than hosting infringing content in the search engine databases themselves). He doesn’t favor leaving the Google Book Search issue to Congress, because Congress isn’t time-responsive and does a poor job drafting legislation.

Daphne Keller made the following points:

* search engines create incentives for authors

* in other contexts, governments subsidize efforts to push content out

* there are problems with notice-based liability (see Zeran)

She also claimed: “Google’s goal is [to] provide neutral results–neutrality is central.”

Panel 4: Search Engines and Individual Rights

Brian Marcus talked about the ADL perspective. He said the ADL offers a downloadable search results filtering program for those who want it. The ADL has also asked Google to offer its searchers personal filters that searchers could voluntarily use to filter search results.

Aden Fine asked if the Internet truly can be an open marketplace of ideas. He is torn by conflicting views:

* private censorship is significant

* but private content providers have the right to pick ‘n’ choose the content they want to make available (citing to Tornillo and Hurley).

From his perspective, the government shouldn’t be in the role of deciding what we hear; instead, the better course is to permit more speech to correct problematic content.

He is also concerned about the right to anonymously search the Internet. He thinks searchers should have the right to challenge government subpoenas before search engines disclose personal info.

Mike Godwin had the best line of the day: “USENET is like a national park for weird free speech people.”

Chris Hoofnagle expressed concern about increased availability of public records and personal data contained therein. He noted that some government actors now are removing personal data before releasing public records. From his perspective, we can protect privacy by limiting personal data from getting into the search engine databases. Like Aden, Chris also expressed concern about government actors subpoenaing search engine histories.

Alan Davidson also was concerned about privacy/government access to information. The best privacy policy in the world doesn’t restrict government demands. He favors scalable solutions.

Alan talked about the free flow of information, and said that Google was committed to objectivity. The old model of content regulation focused on publishers and readers; now, regulators focus on regulating intermediaries. He thinks this is a bad direction because it leads to:

* overbreadth–Google will block legitimate content

* lack of due process–Google will honor third party demands even if Google doesn’t understand the content it is blocking

* effectiveness–kicking content out of Google doesn’t eliminate the content from the Internet

He thinks the free flow of information is the greatest threat to tyranny.

Tal Zarsky was the last speaker. He thinks search engines are just like any other media companies. He is not convinced the richness of Internet content moots diversity concerns. He also thinks the Internet is just as concentrated as other media, and that increased transparency by search engines increases their concentration [I didn't understand this].

Whew! It was a long day (24 speakers!) but the talks were all interesting. Many thanks to the conference organizers for their hard work and for putting together a great group of panelists and audience members.

UPDATE: Other recaps of the conference: Robin Sloan, LawMeme, Urs Gasser I, II, III and IV, Michael Zimmer. Photos: LawMeme I and II, LawGeek