By: John_Levine

John_Levine — Wed, 19 Jul 2023 17:58:00 +0000

Seems to me that browsewrap is problematic when a site is being scraped by a machine rather than read by a person, since machines have no idea what might look obvious to a person. Back in 2006, in Field v. Google, the court agreed with our theory that the web cache wasn’t infringing, in large part because machines can and do use robots.txt to opt out of the cache and index. Genius is arguing that yes, you can read our site to index it, but not to do other stuff, and there is, at least at this point, no mechanical way to say that.
Given all of the arguments about scraping for LLM training, it would be a good idea for the industry to extend the robots convention to declare some categories of permission, e.g., indexing vs. other stuff.
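To make the limitation concrete, here is a minimal sketch using Python's standard urllib.robotparser. The crawler names ExampleIndexer and ExampleTrainer, the example.com URL, and the helper functions are all hypothetical, made up for illustration. It shows that the existing convention can say yes or no to a named crawler, but has no field for saying what the crawler may do with what it fetches.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The agent names are invented for this sketch
# and do not refer to any real crawler.
ROBOTS_TXT = """\
User-agent: ExampleIndexer
Allow: /

User-agent: ExampleTrainer
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(text):
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def may_fetch(parser, agent, url):
    """Return True if the rules allow this agent to fetch this URL."""
    return parser.can_fetch(agent, url)


parser = build_parser(ROBOTS_TXT)
url = "https://example.com/lyrics/some-song"  # hypothetical page

# The rules are keyed on who the crawler says it is and which path it
# wants. Nothing here expresses "index only" versus "other uses".
print(may_fetch(parser, "ExampleIndexer", url))
print(may_fetch(parser, "ExampleTrainer", url))
```

In this sketch a site can only separate indexing from other uses if each use arrives under a different agent name, which is the gap a permission-category extension would have to fill.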

Comments on: Contractual Control over Information Goods after ML Genius v. Google (Guest Blog Post)
