Once Again, LinkedIn Can’t Use CFAA To Stop Unwanted Scraping–hiQ v. LinkedIn
The hiQ v. LinkedIn lawsuit started in 2017. In 2019, the Ninth Circuit upheld the district court’s injunction ruling in favor of hiQ. The Supreme Court vacated that decision and told the Ninth Circuit to reconsider its ruling in light of the Supreme Court’s Van Buren ruling. On remand, the Ninth Circuit again says that hiQ is entitled to injunctive relief, because LinkedIn’s claims under the CFAA don’t neutralize hiQ’s colorable tortious interference claims against LinkedIn. (Our blog post on the original Ninth Circuit ruling: “Ninth Circuit Says LinkedIn Wrongly Blocked HiQ’s Scraping Efforts”.)
The court notes that the facts from the record in the original case may have become obsolete, “given the speed at which the internet evolves,” but uses them anyway. It also sidesteps the dispute regarding whether hiQ is still in business.
Irreparable Harm / Balance of Equities: The court confirms that no viable alternative data sources exist for hiQ. The court remains skeptical of LinkedIn’s privacy-based arguments:
LinkedIn has no protected property interest in the data contributed by its users, as the users retain ownership over their profiles.
Viability of hiQ’s Tortious Interference Claim: The court again finds hiQ’s tortious interference claims viable. It’s skeptical that LinkedIn’s motivations are legitimate:
If companies like LinkedIn, whose servers hold vast amounts of public data, are permitted selectively to ban only potential competitors from accessing and using that otherwise public data, the result—complete exclusion of the original innovator in aggregating and analyzing the public information—may well be considered unfair competition . . . .
CFAA: The key question is whether hiQ’s continued access following receipt of LinkedIn’s cease-and-desist letter was “without authorization” under the CFAA. In its original Nosal ruling, the Ninth Circuit held that violation of a contractual limitation could not transform access into access that was “without authorization”. The Supreme Court in Van Buren endorsed this approach. The court says the key question is whether hiQ’s conduct is analogous to “breaking and entering”. The court says the CFAA’s legislative history makes clear that it was intended to “apply only to private information—information delineated as private through use of a permission requirement of some sort.” The court cites to Van Buren’s “gates-up-or-down inquiry”:
In other words, applying the ‘gates’ analogy to a computer hosting publicly available webpages, that computer has erected no gates to lift or lower in the first place. Van Buren therefore reinforces our conclusion that the concept of ‘without authorization’ does not apply to public websites.
The court also looks to the SCA’s similar “without authorization” provision and the rule of lenity.
It appears that the CFAA’s prohibition on accessing a computer ‘without authorization’ is violated when a person circumvents a computer’s generally applicable rules regarding access permissions, such as user name and password requirements, to gain access to a computer.
The court says companies that make their data publicly available are not without recourse. They may look to state hacking laws, trespass to chattels claims, or other causes of action “such as copyright infringement, misappropriation, unjust enrichment, conversion, breach of contract, or breach of privacy.”
This is a long-running dispute and the parties are fighting over whether hiQ is still in business. Yet the dispute continued to be heavily litigated in the district court while the Ninth Circuit was mulling over its ruling in light of Van Buren. This makes me wonder whether the lawyers for hiQ are handling it on an hourly basis (something hiQ surely cannot afford at this point) or if a litigation funder or other benefactor is involved.
The court predictably reaches the same conclusion as in its first ruling. Was this inevitable in light of Van Buren? The Ninth Circuit says Van Buren’s gates analogy is in sync with the Ninth Circuit’s own taxonomy of three types of computers covered by the CFAA: (1) computers for which permission is not required; (2) those for which permission is required but given; and (3) those for which permission is required but has not been given. The court has already said that accessing the second type of computer will not result in a CFAA violation (Nosal).
The court notes that permission means a username and password, but doesn’t list what else fits into this category. I assume something that screens out bots (like a captcha) would not suffice, but the opinion is not explicit.
The two noteworthy things about the ruling are: (1) the court’s continued skepticism of LinkedIn’s privacy rationale and putative ownership over user data, and (2) the court’s concerns over the power wielded by companies that house “vast amounts of public data.”
Five years into this litigation, let’s take stock of all of the things we still don’t know:
- Is hiQ still an operational business? This seems like a question that must be answered before granting equitable relief designed to help it continue operating its business… ¯\_(ツ)_/¯
- Can LinkedIn enjoin hiQ’s scraping on non-CFAA grounds? If LinkedIn wins on any other claim, the CFAA issue becomes an inconsequential distraction. The court (appropriately) generally sidesteps the viability of other claims due to the case’s litigation posture, though it intimates that “it may be that web scraping exceeding the scope of the website owner’s consent gives rise to a common law tort claim for trespass to chattels.” After 5 years of litigation, we still don’t know the answer to this important question.
- Given the possibility of overlapping doctrines, do we even need the CFAA any more? This case is garbled mostly because we don’t understand the interplay between the CFAA and other overlapping doctrines, and the court’s carvebacks of the CFAA may only mean that other doctrines will fill that gap and lead to the same substantive outcomes.
- Then again, the court says: “giving companies like LinkedIn free rein to decide, on any basis, who can collect and use data—data that the companies do not own, that they otherwise make publicly available to viewers, and that the companies themselves collect and use—risks the possible creation of information monopolies that would disserve the public interest.” Does this mean that courts will reject any claim, including a trespass to chattels claim, that could lead to “information monopolies”? (See also the essential facilities discussion below).
- In the gates metaphor, if a service wants to restrict scraping, does it need to raise the gates (erect a fence) or lower the gates (drop the portcullis)? This court treats gates like a portcullis, i.e., the CFAA requires gates-down.
- Are robots.txt, IP address blocks, or cease-and-desist letters still relevant to the CFAA at all? The court says websites’ “publicly available sections lack limitations on access,” which implies that any efforts that don’t create technical barriers to access (such as robots.txt and C&D letters) are irrelevant. Still no idea about IP address blocks (which LinkedIn tried).
- Who else beyond hiQ has an unrestricted right to scrape LinkedIn? It seems like a business can build solely around LinkedIn-sourced data and then weaponize that status to ensure unrestricted access, but that conclusion would motivate terrible behavior.
- Does hiQ really have a prayer of winning its tortious interference claim? Those claims RARELY succeed.
- Is the CFAA just a proxy for antitrust concerns? The court says: “If companies like LinkedIn, whose servers hold vast amounts of public data, are permitted selectively to ban only potential competitors from accessing and using that otherwise public data, the result—complete exclusion of the original innovator in aggregating and analyzing the public information—may well be considered unfair competition under California law.” But…this opinion is about the CFAA, not unfair competition; and this language implies that LinkedIn is an essential facility, a conclusion with wide-reaching implications.
- If LinkedIn can’t do anything to protect its users’ interests in publicly shared data, can anyone else do it? Or is publicly shared data forever free to whoever can grab it?
- If a service doesn’t immediately block a scraper, even if the scraper is inconsequential, does that functionally estop the service from blocking them later due to tacit acquiescence?
- The court says that unauthorized access can only occur with respect to “private information—information delineated as private through use of a permission requirement of some sort.” What user interactions or technological restrictions are sufficient to create a “permission requirement”?
- The court apparently addresses only the “without authorization” part of the CFAA. Which parts, if any, of this court’s analysis would differ under the “exceeds authorized access” provision?
- The court says that, per the terms of the injunction, LinkedIn remains free to take steps to thwart “malicious activity” and “bad actors.” How does LinkedIn know who is a bad actor and who isn’t? Are we sure hiQ isn’t a bad actor?
I think many Internet Law experts were hoping that hiQ would answer the myriad questions left open by the Van Buren decision. As my list of open issues shows, we know very little about the CFAA right now. Contact me if you are willing to teach this topic to my Internet Law students, because today I have no clue how to help them understand the CFAA.
Comments from Kieran McCarthy
- Whenever I read a CFAA case, I pause to remember that this is a federal criminal statute. And for that reason, I will always praise courts that take a narrow view of it. We do not want to open the door for rogue prosecutors to throw programmers in jail for commonplace commercial activity. This opinion may help keep people out of jail for benign conduct, and the importance of that cannot be overstated. I salute the Ninth Circuit for that, first and foremost.
- Like Venkat, I too wonder who is paying hiQ Labs’ legal bills. In motion practice last year, LinkedIn alleged “that hiQ went out of business three years ago.” Case 3:17-cv-03301-EMC, Document 219. Filed, 9/24/21 (emphasis in original). This case is in full-blown discovery, plus a Ninth Circuit appeal, and we are now five years into this dispute, with over 250 docket entries. How does a long-out-of-business startup afford to pay Quinn Emanuel to fight all these fights?
- I agree with Eric that the Ninth Circuit’s insinuation that hiQ Labs could win its tortious interference claim might be the least plausible part of the opinion.
- While I agree with Eric and Venkat that privacy issues are important, I think courts should separate out that issue from the rest of the analysis. The majority of web scrapers don’t collect PII, and so to dictate all web-scraping jurisprudence based on a fact pattern that only applies to a subset of those companies is problematic.
- With that said, the Ninth Circuit did some serious hand-waving on the privacy issues.
- This sentence is great for web scrapers: “[g]iving companies like LinkedIn free rein to decide, on any basis, who can collect and use data—data that the companies do not own, that they otherwise make publicly available to viewers, and that the companies themselves collect and use—risks the possible creation of information monopolies that would disserve the public interest.” But it’s hard to know where it fits within the broader jurisprudence on these issues. The current law of online contracts says that if you have notice of a website’s terms, and you continue to access in violation of those terms, you are in breach of contract. Indeed, just over a year ago, the district court in this case said: “Although currently hiQ is allowed access to the LinkedIn website and to copy or use public profiles posted thereon under the preliminary injunction, this does not mean that hiQ is entitled to a permanent injunction. Based on the counterclaim as pled, LinkedIn has a basis for asserting that it is entitled to a permanent injunction – or at least a declaration – that hiQ is subject to the User Agreement in the future.” hiQ Labs Inc. v. LinkedIn Corp., 2021 WL 1531172 at 6 (N.D. Cal. April 19, 2021).
- The same judge who said that giving companies like LinkedIn free rein to decide who accesses their site disserves the public interest also said that LinkedIn has a basis for asserting that it is entitled to a permanent injunction because of its breach of contract claim.
- So let me get this straight: Information monopolies created by CFAA cease-and-desist letters “disserve the public interest” but information monopolies created by breach-of-contract cease-and-desist letters derived from the exact same facts are A-OK?
- This is why I always say that we are far from a stable equilibrium on these issues. If you have the exact same set of facts, and both sides are entitled to injunctive relief on those facts based on different legal claims, sooner or later that has to be resolved one way or another.
- Eric’s “essential facilities” comment hints at the real underlying policy issue here. We have a company that makes its data available to the public but wants to restrict access to potential competitors who want to access the data and use it for commercial purposes. LinkedIn is giving the whole world a wide-open door to access this data but trying to slam the door (gate?) shut for a few select competitors who want to commercialize the data in a way they don’t like. This is anticompetitive behavior, pure and simple. The question is whether it is legal anticompetitive behavior or whether it implicates antitrust or unfair competition laws.
- Of note, the district court dismissed hiQ Labs’ essential facilities claim two years ago.
- This is an incredibly common fact pattern—it is nearly identical to almost every web scraping dispute from the famous Register.com v. Verio, Inc. case in 2002 to the Points Guy case in 2022. There is an entire multi-billion-dollar industry that is waiting for real guidance here, but courts repeatedly fail to deliver. Courts need to start acknowledging the importance of this specific fact pattern, and they need to provide real clarity on what is appropriate and what is not.
- The trespass to chattels claim is total crap, in my opinion. It’s sooooooo 2003. These days, only the clumsiest and most incompetent of web scrapers puts any burden on hosts’ IT infrastructure—and even so, that IT infrastructure is usually hosted on Amazon Web Services, anyway. At the motion to dismiss stage, the district court took evidence of the aggregate impact of web scraping on LinkedIn as evidence that hiQ Labs specifically had damaged LinkedIn’s infrastructure: “[T]aken in the aggregate, automated scrapers place a substantial burden on LinkedIn’s infrastructure – reaching at present into hundreds of millions of blocked access requests per day.” 2021 WL 1531172 at 10 (N.D. Cal. April 19, 2021). Taken in the aggregate, cars damage roads, but that doesn’t mean that every driver who drives on a public highway is responsible for destruction of public property.
- I’ll write more about this later, but the mainstream press’s coverage of this case has been lazy, reckless, and irresponsible. Forbes, TechCrunch, the Register, and TechRadar all covered the opinion with some version of the headline “web scraping is legal, says US court.” The distinction between “scraping public data is not a violation of the CFAA in the Ninth Circuit” and “there is an affirmative legal right to scrape publicly available data in the United States” was lost on many tech journalists. But that’s a very important distinction.
Case citation: hiQ Labs v. LinkedIn, 17-16783 (9th Cir. Apr. 18, 2022)