Hello, You’ve Been Referred Here Because You’re Wrong About Web Scraping Laws (Guest Blog Post, Part 2 of 2)
[Eric’s note: this is the second of a two-part series on the denouement of the hiQ v. LinkedIn case, which ended this week with a total loss for hiQ. The prior part explained the most recent ruling, a devastating but not unexpected loss for hiQ. This part debunks some of the myths that have grown up around the hiQ case during the years of judicial confusion it has caused.]
Welcome! Someone has referred you here because you’ve said something wrong about the laws related to web scraping in the United States.
Don’t worry! You’re not alone. You’re in elite company among many of the most high-profile publications and tech journalists in the country!
I apologize for not writing to you personally. But if I stopped what I was doing to write a personal message every time I saw a blogger with no legal background shilling for a web-scraping company or a half-asleep aggregator journalist [FN] spewing ignorance online about the laws that govern web scraping, I wouldn’t get out much. And since I like to go outside, I figured I’d save everyone some time and provide a condensed takedown of some of the more common but less-informed perspectives about the laws related to web scraping in the United States.
[FN] Increasingly, I think this might be AI bot bloggers defending their fellow bots, but even if that’s true, that doesn’t make it any less wrong!
For the internet-savvy among you, you may recognize that I stole this meme from the great Ken “Popehat” White, who previously wrote “Hello! You’ve Been Referred Here Because You’re Wrong About The First Amendment.” Ken’s brilliant and you should read everything he writes. Not only that, but he also seems nice, and I’ve seen others riff off this meme, and so I am cautiously optimistic that he’ll let me do the same.
Without further ado, here is why you are wrong:
If you said something like “Web scraping is legal, reaffirms US Court”
This one has been so ubiquitous the last few months that it inspired me to write this post.
No court in the United States has ever said, without qualification or caveat, that “web scraping is legal.” The law related to web scraping is far too nuanced to make such a declarative statement. But there is no shortage of bloggers and journalists who seem intent on propagating this nonsense, for obvious clickbait purposes.
One US court, the Ninth Circuit specifically, recently said, “when a computer network generally permits public access to its data, a user’s accessing that publicly available data will not constitute access without authorization under the CFAA.” hiQ Labs II at 40. This is the legal opinion that journalists are mischaracterizing as “reaffirming that web scraping is legal.” Since that is very wrong, we’ll break it down bit by bit.
When you write that “web scraping is legal,” one might reasonably infer that there is an affirmative legal right to scrape public data in the United States. But the Ninth Circuit said something very different in this case. What they said was that one federal statute called the Computer Fraud and Abuse Act (the “CFAA”) does not apply to scraping of publicly available websites that are not protected by log-in or password access credentials (except when it does).
But since that sentence is a mouthful, many journalists paraphrased it as “web scraping is legal.” Which has led many to think that there is an affirmative right to scrape public data in the United States. That’s a very dangerous misreading of the Ninth Circuit’s opinion, but many online publications keep shouting it from the rooftops.
Don’t get me wrong: hiQ Labs II is a very important decision. It is a positive development for advocates of an open internet, among whom I include myself.
But that does not mean that the many different types of web scraping that are so common today—including scraping of what most people would consider public data—are categorically legal. Just ask BrandTotal. Or Skiplagged. Or Booking Holdings. Or Octoparse. Or the Points Guy. Or, for that matter, hiQ Labs, who has effectively been run out of business by their ongoing litigation with LinkedIn, and who has been on the losing end of almost every key legal decision in their dispute with LinkedIn.
Whenever you hear someone say that “web scraping is legal,” remember this: 1) the hiQ Labs II opinion is a narrow holding about the CFAA, which gets narrower the deeper you dig; 2) hiQ Labs II is only governing law in one jurisdiction in the country, the Ninth Circuit; 3) there are lots of other laws that apply to web scraping, and hiQ Labs II, like its predecessor hiQ Labs I, did nothing to resolve those other legal issues; 4) at least for now, courts allow companies to invent property rights through a breach of contract claim in their ToS to kick people, including web scrapers, off their html-based lawn whenever they so choose; and 5) LinkedIn in fact won its summary judgment motion against hiQ Labs on its breach of contract claim, with the court finding that “LinkedIn’s User Agreement unambiguously prohibits scraping and the unauthorized use of scraped data.” Just this week, the parties settled this case, and hiQ Labs agreed to a permanent injunction that it would never again access LinkedIn data. That means the case that has been used to justify the notion that web scraping public data is legal has been permanently resolved, with the final resolution being the insolvency of the scraper and a permanent court-enforced injunction to never again scrape the data that was the subject matter of the litigation.
Add it all together, and web scraping isn’t likely to be presumptively legal any time soon.
If you said something like “web scraping is illegal”
That’s misleading at best. There are millions of websites you can scrape without a hint of a legal entanglement. Web-scraping disputes tend to be context-driven and stem from common fact patterns. And most website-scraper interactions don’t fit within those scraper-litigation patterns. Notably, many websites don’t restrict web scraping in any way. Many websites don’t have IP worth protecting. Heck, most website owners wouldn’t even know what web scraping is. For those kinds of websites, there is essentially no legal risk to scraping them.
If you said or implied that the CFAA is the only law that governs web scraping
The CFAA is the most frequently litigated legal issue with web scraping, at least historically. And because it has a criminal component and it is so terrible, it gets the most scholarly and journalistic attention. But that does not mean that the CFAA is the only legal issue with web scraping.
Here’s a not-so-short list of other laws that might apply to web scraping: Copyright, trademark, breach of contract, unfair competition, unfair and deceptive trade practices, trespass to chattels, conversion, state law trade secrets, the DTSA, tortious interference with a contract, tortious interference with a prospective economic advantage, dilution, false advertising, DMCA anti-circumvention, false designation of origin, unjust enrichment, misappropriation, every state’s computer access law, state privacy laws, and any other law that a creative plaintiff’s lawyer can tie to whatever the heck that a web scraper has done.
Which is why so much of the tech press’s coverage of this issue is bonkers. If you’re looking for legal issues, web scraping is a hornet’s nest on top of a beehive on top of a wasp’s nest resting on an anthill with scorpions. There are legal issues layered on top of legal issues everywhere you look. To declare all of that resolved because of one narrow holding with respect to the CFAA is just a flaming garbage truck full of nonsense.
If you said, “Google breaches a billion websites’ terms of service every day, and they don’t get in trouble.”
This is one of the more common “everyone-else-is-doing-it” defenses of web scraping I have heard.
And it is wrong on a few levels. Even if Google’s conduct could be interpreted as a technical violation of many websites’ notoriously overbroad terms of service, their conduct doesn’t meet the criteria for most online breach of contract disputes.
The reason is simple: Google’s search function is (almost always) welcome, and most scraping traffic on commercial websites isn’t.
Also noteworthy: It’s very easy on a technical level to opt out of Google’s scraping. You just add a “noindex” meta tag to the page or a “noindex” header to the HTTP response. Which means, if someone doesn’t want to have their website searched or indexed by Google, they can make it happen.
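For the technically curious, here is a rough sketch in Python of what honoring that opt-out could look like from the crawler's side. It checks a response for a “noindex” directive in either the X-Robots-Tag header or a robots meta tag. The function name `opts_out_of_indexing` and the helper class are made up for illustration, and the parsing is deliberately simplified (it ignores user-agent-scoped header values, for example). It is not a description of how Google or any other crawler actually implements this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects directives from robots/googlebot meta tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives.extend(d.strip().lower() for d in content.split(","))


def opts_out_of_indexing(headers, html):
    """Return True if the response carries a plain "noindex" directive.

    headers: dict of HTTP response headers
    html:    response body as a string
    Simplification: user-agent-scoped values such as "googlebot: noindex"
    in the header are not handled here.
    """
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            if "noindex" in [d.strip().lower() for d in value.split(",")]:
                return True

    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(opts_out_of_indexing({}, page))
    print(opts_out_of_indexing({"X-Robots-Tag": "noindex, nofollow"}, ""))
```

A well-behaved crawler would run a check like this before adding a page to its index and skip any page for which it returns True.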
Web scraping tends to create a legal conflict when: 1) someone scrapes a site that doesn’t want to be scraped, 2) the host website tells the scraper to stop, and 3) the scraper keeps doing it anyway.
To my knowledge, Google doesn’t do 3. Ever. It doesn’t search and index sites that tell it to stop searching and indexing them. For that simple reason, they avoid most legal entanglements related to their search function. If a host website tells them to stop searching a page, using technical or non-technical means, they stop.
That said, Google has gotten in trouble for web scraping in the past, albeit in a very different context.
But, either way, the dynamics of legal issues for scrapers are different than they are for search engines, particularly for Google. As with most things, the rules that apply to Google might not apply to your business. And with that, it probably doesn’t make sense to base your web-scraping legal strategy on what Google does or does not do.
If you said that if you scrape “ethically,” you won’t have any legal problems
This one reflects a naïve understanding of the power dynamics of data access.
Good people and good companies get sued (or get threatened) all the time by companies and organizations that don’t want to get scraped. Or, for whatever reason, just don’t want people accessing their data.
There is the famous example of NYU data researchers who received a cease-and-desist letter from Facebook/Meta for doing research on Facebook’s algorithm. There are dozens of examples of Southwest suing or threatening to sue companies for the simple act of providing price transparency for Southwest’s flights.
Don’t get me wrong, there is no shortage of shady characters who engage in web scraping. But there are plenty of good folks, too. The legal and policy issues associated with web scraping are far too nuanced to fit into a simple good vs. evil ethical dichotomy.
If you said that “it’s just random who gets sued for scraping”
Once you spend some time watching this space, it’s easy to spot the fact patterns that lead to legal disputes. And it almost always goes something like this:
Host website has a lot of data that’s useful in some way. (LinkedIn, Craigslist, Southwest Airlines, Meta/Facebook). Potential rival takes that data (hiQ Labs, 3taps, Southwest Monkey, Power.com) to either compete against the host website or otherwise threaten their business model. Sometimes the scraper is doing something innovative and valuable with the data, and other times, they’re just ripping off the host’s business model wholesale. Varying levels of conflict ensue.
It isn’t the exact same fact pattern in every industry, but it isn’t random, either. Some companies are very protective of their websites and will aggressively litigate against those who access their data. Some scrapers can navigate scraping in a way that benefits the host websites from which they collect data and so avoid conflict.
Avoiding legal conflict with web scraping is about spotting the common patterns and making sure not to go straight for the vortex.
If you said, “it’s a gray area of the law”
If you said this one, I wouldn’t fault you much. Given the number of different laws that apply to web scraping, it’s easy to get confused by this area of law.
But to call it a gray area of the law is misleading, too.
Most of the laws that apply to web scraping are well established with a long history of legal precedent to support them. Trademark, copyright, trespass to chattels, the law of online contracts—none of this stuff is novel. And while I may disagree with many of the published opinions that interpret the CFAA and the law of online contracts, the longer I observe this space, the less that I find them surprising. There are edge cases where courts struggle, but those are rarely the places where the common commercial disputes arise.
Most of the time, it’s predictable how these cases will unfold.
Someone I spoke to recently described the law of web scraping right now as “Wild West-y.” I think that’s a better way to think of it than “a gray area of the law.”
It’s not that the law related to web scraping is less black and white than other laws—it’s that there’s a total disconnect between the law related to web scraping and social norms for how it’s enforced. There were plenty of things in the Wild West that were technically illegal that were commonplace for people to do. Just as there are lots of kinds of web scraping today that could be construed as illegal but that the persons and companies who might have a claim don’t have the wherewithal to stop it.
Scraping and bots drive nearly half the traffic on the internet. There are not enough sheriffs in town to stop it. And unlike in the Wild West, the “bank robbers” sometimes have proxies that make them invisible to the sheriffs. And sometimes they’re in Bangalore, Singapore, or Lithuania, which makes it very hard for Wyatt Earp to grab them by the ear and drag them down to the Dodge City jail.
I like to say that we haven’t yet reached a stable equilibrium on these issues.
I don’t think that scholars and judges have figured out a sustainable way to draw the lines in this legal domain yet. And it doesn’t help when so much of the information available online is just flat-out wrong.
Hopefully, if you’ve made it this far, you know a bit more about the law of web scraping than when you started. Maybe with better information, all the smart people out there can work toward finding solutions that allow us to let the pro-social and benign forms of scraping prosper while minimizing the harmful kinds of scraping.
[Eric’s closing note: for more on that latter point, see my decade-old thinkpiece on online trespass to chattels.]