“Law & Wikis” Panel at AALS Law & Computers Section Annual Meeting

January 13, 2010 · by Eric Goldman · in Copyright, General

By Eric Goldman

This post recaps the 2010 AALS Law & Computers Section Annual Meeting on the theme of “Law & Wikis.” The program consisted of four papers submitted in response to a Call for Papers from Spring 2009. See my panel announcement from November. AALS will eventually post the session as a podcast.

Tim Armstrong, Crowdsourcing and Open Access. An earlier version of his slides.

Tim is interested in open access to primary legal source materials. Most legal source materials are now born digital, but legacy materials often exist only on paper, and this material would be more valuable online. How can we accomplish that? Tim outlined a four-step implementation process:

Step 1: Scan the materials (Tim expressly skipped over the copyright issues associated with scanning). Current scanning projects include Google Books, the Internet Archive and the Library of Congress.

Step 2: Extract the text from the scan using a free web OCR service such as any2djvu.

Steps 1 and 2 do not scale very well, but steps 3 and 4 can scale.

Step 3: Proofread and correct the extracted text. Two crowdsourced options for this step:

Option 1: Distributed Proofreaders. This is a large community and can get fast results. However, the project is hierarchical/bureaucratic, which prevents new users from adding texts, and the project does not focus on legal materials.

Option 2: Wikisource. Any user can add materials to Wikisource, it has an easier user interface than Distributed Proofreaders, and the project already handles many legal texts. However, it may be slower than Distributed Proofreaders. Tim recommended that academics consider using Wikisource to distribute scholarship. He posted one of his PDFed articles to Wikisource, and in a few weeks, other users had extracted the text and annotated the article with links to source materials. (I believe you can see his results here).

[Eric’s note: I didn’t fully understand the value of posting a PDF to Wikisource. It seems like the author can accomplish a similar result by web publishing the final article version in HTML or a Word file, which eliminates the effort to extract the text (and avoids any errors which might arise in the extraction process). Further, the author can incorporate links him/herself into the article. In fact, I routinely include lots of links in my academic articles. Ideally, law reviews will start publishing articles with complete author-supplied links rather than viewing those links as unnecessary metadata that can be tossed. (Also, it really irks me when law review editors unnecessarily break URLs in the footnotes by adding spaces). Despite this, I could see the value of Wikisource as another distribution channel for articles to increase readership, and I imagine the Wikisource community would add links that never occurred to the author.]

Step 4: Distribute and index the materials.

Jon Garon, The Role of Wiki Authorship for the Curatorial Audience

Jon believes that Wikipedia as a communal authoring tool is struggling, and it will require structural changes to preserve its market share.

He highlighted two wiki norms: (1) eliminate social biases using group deliberation. Ex: Wikipedia’s “neutral point of view.” (2) projects have group goals instead of individual goals. In effect, these norms combine to work against attributed authorship. However, in other author communities, authors have shown that they want attribution and integrity. For example, 97%+ of Creative Commons users chose attribution.

[Eric’s note: I’m generally skeptical that Creative Commons license adoptions provide accurate insight into authorial desires. I believe that CC license adoption statistics are skewed by Flickr’s huge photo database. Flickr has adopted a CC license requiring attribution as its default, and the overwhelming majority of users (not surprisingly) accept Flickr’s default.]

Jon described three communities of users. 1% of users actively create content. 90%+ are totally passive. The remaining ~9% of users are “curators”: they don’t create new content but curate it for others. Ex: the people who reposted videos of Obama’s inauguration.

With that in mind, he noted that Wikipedia has 75,000 regular contributors, compared to 20M amateur bloggers, 450,000 professional bloggers and 200,000 YouTube contributors. Measured this way, Wikipedia has a relatively small contributor base, perhaps because authors’ desires conflict with Wikipedia’s norms.

He concluded by advocating a wiki for academic and professional communities where their contributions could “count” toward their job requirements. To start, an academic-friendly wiki could adopt attribution and integrity norms. It would also benefit from providing ways to measure contributions, such as counting submitted words/footnotes, content resilience (how long the content survives before it is reverted, much as WikiTrust measures), inlinks and pageviews. These measurement tools can help contributions “count” as professional accomplishments for employing educational institutions and public funding sources.

[Eric’s note: Jon’s project is complementary to my paper on Wikipedia, and I generally agree with his concerns about Wikipedia’s labor supply. I also agree that Wikipedia is not very friendly towards academic contributors in a number of respects, a point I explicitly explore in my paper. We did not have time to explore what Jon thinks of Citizendium, which tries to address his concerns.]

Jacqueline Lipton, Wikipedia and the European Union Database Directive

The EU Database Directive, a product of the late 1990s, assumes a content development paradigm of a single database maker aggregating text-based data into a database. It does not contemplate Web 2.0.

Jacqui emphasized jurisdictional questions. Wikipedia’s site disclosures do not expressly address EU law. Regarding copyright, Wikipedia’s site disclosures say that the site is governed by US copyright law, but it respects other countries’ laws (whatever that means). Thus, with respect to IP dispositions, Wikipedia effectively distinguishes between laws that govern information gathering and information dissemination. Wikipedia assumes local IP laws apply to information gathering and US laws apply to information dissemination. Wikipedia doesn’t address IP laws applying to information receipt.

Could a Wikipedia contributor claim a database right in Wikipedia entries? Wikipedia’s submission rules only address contributors’ copyrights, not database rights. Jacqui isn’t sure if a Wikipedia entry could qualify as a protected “database” under the directive.

Could Wikipedia or its contributors have liability for pulling information from a database and republishing via Wikipedia? She can’t answer this question without knowing the specific member state’s directive implementation.

Jacqui summarized her unanswered Qs:

* Jurisdiction/choice of law issues

– where does information gathering occur?

– information dissemination occurs in the US (Wikipedia runs on US servers)

– where is the information received?

* Joint ownership? And ownership of what? What constitutes the database? Presumably the collection of all Wikipedia entries constitutes a “database,” but it’s not clear if any specific entry does.

* How would the directive (or specific implementations) govern databases that are jointly created by people residing in different jurisdictions?

* Which jurisdiction’s fair use/fair dealing laws apply?

Salil Mehra, WikiTruth Through WikiOrder (a paper jointly authored with David Hoffman). Download from SSRN.

This paper considers why people cooperate on Wikipedia. Their animating insight is that Wikipedia develops order just like the Shasta County ranchers profiled by Ellickson, but Wikipedia contributors lack the same kind of organic relationships with each other that the ranchers had. So why has this order developed?

Wikipedia’s dispute resolution methods are not content-based. Instead, they try to foster continued dialectic among contributors. Examples of Wikipedia’s dispute resolution tools:

* informal methods (talk page, reversions) and requests for comments from other contributors

* formal methods: mediation and arbitration. Mediation doesn’t work too well. The site emphasizes arbitration as a solution, but site members are generally anti-lawyer and adhere to an “ignore all rules” ethos.

To study the Wikipedia arbitration process, Salil and David coded 267 arbitrations. They found the most common complaints were personal attacks (94), edit wars (91) and sockpuppetry (76). Common sanctions included cautions and probations (164), article bans (117) and Wikipedia bans (47). Wikipedia bans were most commonly issued for impersonation and anti-social behavior. In contrast, editing violations were negatively correlated with Wikipedia bans; the dispute resolution system tries to rehabilitate contributors rather than kick them out.

Some audience comments to the presentations:

* Wikis are a technology, and Wikipedia is just one implementation of a wiki. We shouldn’t conflate the two.

* Author attribution and integrity are interrelated rights, especially for academics.

* Do we know what types of projects are best run through wiki technologies, and do we know what steps attract the right contributors?

* What copyright foundation would best facilitate wiki activity?
