After hiQ Labs, Is Scraping Public Data Legal? (Guest Blog Post)

by guest blogger Kieran McCarthy

Last year, the most important case in the history of web scraping—hiQ Labs, Inc. v. LinkedIn Corp.—settled. After two trips to the 9th Circuit, a remand from the Supreme Court, and nearly six years of motions and posturing, the outcome of the litigation was a permanent injunction against hiQ, a win for LinkedIn, and insolvency for scraper hiQ Labs.

But as badly as this case ended for the scraper hiQ Labs, it is still best known for hiQ Labs’ two high-profile wins at the 9th Circuit on the CFAA claims, where the court said, twice:

Although there are significant public interests on both sides, the district court properly determined that, on balance, the public interest favors hiQ’s position. We agree with the district court that giving companies like LinkedIn free rein to decide, on any basis, who can collect and use data—data that the companies do not own, that they otherwise make publicly available to viewers, and that the companies themselves collect and use—risks the possible creation of information monopolies that would disserve the public interest.

hiQ Labs I, 938 F.3d 985 at 1005; hiQ Labs II at 43.

Not unreasonably, lawyers continue to view this language from the 9th Circuit as an invitation to explore the possibility that somewhere in the law is an affirmative right to scrape public data. And since web scraping is a tool that fuels all data-driven industries, and since controlling data is an essential strategy for digital incumbents that wish to protect their incumbency advantage, this issue isn’t going away.

Two cases have emerged that approach this issue from new angles.

The first is Meta v. Bright Data. The case is at the earliest pleading stages, but a lot of scraping folks are looking at it closely. Bright Data is the world’s biggest scraping company, headquartered in Israel. Meta is Meta.

On the surface, this case follows the fact pattern of nearly all web-scraping cases. Bright Data allegedly scraped Meta’s public data and sold it to its clients. Meta sent Bright Data a series of cease-and-desist notices telling it to stop. Bright Data didn’t stop. Meta sued. Bright Data is also counter-suing in Delaware, seeking a declaratory judgment that its present and future scraping activity is not subject to Meta’s TOS.

The two big wrinkles here are: 1) Bright Data revealed that, for six years, Meta was a Bright Data client for its scraping and proxy services, and 2) before it got sued, Bright Data closed all its Meta accounts and terminated/repudiated its online agreements with Meta.

The first wrinkle hits at what many perceive as the hypocrisy of Meta, which, despite owing its foundation to web scraping, has done more to bend the law of web scraping to its will than any other business. Before Meta was Meta, it was Facebook. And before it was Facebook, it was FaceMash. And when it was FaceMash, it got its first users by scraping and publishing a bunch of online facebook directories without the subjects’ consent. And then once it got traction, it pivoted to a more-savory business model. Eventually, it pivoted to a business model that got very fussy and oftentimes litigious with anyone that dared trample on its walled garden.

But if Bright Data’s allegations are true, Facebook never quit scraping other sites. And at the same time it was suing Power Ventures, BrandTotal, Octoparse, and the like for scraping its properties, it was scraping others behind the scenes. And while many have suspected as much, Bright Data claims to have evidence to support an unclean hands defense that other companies did not possess.

The next question is one that seems important but hasn’t received much treatment in the law: Under what circumstances can a user of an online platform terminate their relationship with an online platform? Much digital ink has been spilled on online contract formation; much less on online contract termination.

Based on a quick review of Facebook’s and Instagram’s Terms of Use, those agreements do not provide for a clear path for a user to terminate. The agreements give Facebook and Instagram broad leeway to terminate their agreements, but no obvious reciprocal termination provision exists for users. Surely, that’s not reasonable. Some termination option must exist for users.

From Bright Data’s perspective, the question is twofold: Under what circumstances can Bright Data terminate, and then, does Bright Data still have an ongoing legal right to access those Meta properties for the purpose of scraping and collecting public data?

There is no precedent directly on point here, but Meta will likely point to Register.com, Inc. v. Verio, Inc., and the long line of cases that have followed it, to argue that when a scraper has actual notice of terms and continues to access a site in violation of those terms, it is liable for breach. There are lots of policy and legal arguments why that ought not to be the case, but to date, few courts have been receptive to those arguments. That said, few scraping-related cases pursue these issues past the first few stages of litigation. This case might be unique in that both parties have the resources and motivation to pursue these questions to the bitter end.

Another case that seems to be picking up where hiQ Labs left off is Crowder v. LinkedIn Corp. This case is an antitrust class action alleging that LinkedIn engaged in monopolistic behavior by blocking scraping and restricting access to its public data.

In the hiQ Labs case, the 9th Circuit opinion was awash in vague antitrust rhetoric, the most notable of which I cited earlier in this post. But when the case returned to the district court after the initial ruling on the preliminary injunction, hiQ’s antitrust arguments were dismissed for failure to properly identify the relevant market in which LinkedIn has a monopoly. While the 9th Circuit opinion would seem to create an opening for antitrust arguments against “information monopolies,” no one has effectively made those arguments yet.

Attempting to fill that gap, the plaintiffs in this case brought four new antitrust arguments against LinkedIn.

First, Defendant sells private user data through application programming interfaces (“API”) to exclusive third parties called “partners.” Second, Defendant uses “technological countermeasures” to limit access to public user information. Third, Defendant integrated its user data with Microsoft’s Azure cloud computing system. Fourth, Defendant agreed with Facebook to divide markets to ensure Facebook would not develop a competing product.

Crowder at *1 (internal citations omitted)

Unfortunately for the plaintiffs, the court was no more receptive to these arguments than the district court was to hiQ Labs’ antitrust arguments.

The court dismissed the market division argument on the grounds that it was time barred. Plaintiffs alleged that Facebook and LinkedIn agreed to divvy up the social media market “between 2013 and 2016.” The court said the plaintiffs failed to allege any affirmative acts within the statute of limitations.

Further, the court found that the API agreements did not constitute anticompetitive behavior. The court was not persuaded that LinkedIn had any obligation to share data, and it found that the fact that LinkedIn did share data with some partners weighed against any potential antitrust claim.

Next, on the technological countermeasures, the court concluded that the inference that limiting access to data is irrational but for an anticompetitive purpose is “far from obvious or even intuitive.”

Finally, the court said that plaintiffs failed to allege that LinkedIn’s integration with Azure constituted anticompetitive conduct. According to the court, “Without allegations that integration was not an improvement to LinkedIn’s services or that there was an abuse of monopoly power ‘in some other way’ associated with the introduction of the integration, this conduct is not plausibly anticompetitive.”

All these claims were dismissed without prejudice, with leave to amend. Perhaps the plaintiffs will do better with their next round of pleading. If there is a next round.

The thing that binds these two cases together is that both seem designed to build off the 9th Circuit’s reasoning in the hiQ Labs, Inc. v. LinkedIn Corp. case. The big question is whether the courts adjudicating those cases will be more inclined to follow the 9th Circuit’s rationale or the district court’s rationale in the proceedings that followed it. What is certain is that companies whose businesses rely on scraping will continue to seek out courts that will embrace the 9th Circuit language suggesting that public data is fair game for all.