File Names Can Help Predict File Content in Child Porn Prosecution–US v. Beatty

January 18, 2010 · by Eric Goldman · in Content Regulation, Internet History, Privacy/Security, Search Engines

By Eric Goldman

United States v. Beatty, 2009 WL 5220643 (W.D. Pa. Dec. 31, 2009)

This is a child porn prosecution. Using Phex P2P software, an undercover investigator accessed the Gnutella network and conducted searches using search terms known to be used by child pornographers. The investigator identified IP address 76.188.64.82 as sharing 11 files with troubling titles such as:

* r@ygold-pedo-13yo brother fucks 11yo sister and sperm inside 61943812.mpg

* (Pthc) 14yo Isabel-(Rape and Fuck) (R@ygold).mpg

* Little young girl hardfucked by me-7 yrs R@ygold illegal pedo sex.mpg

* (Hussyfan) (pthc) (r@ygold ) (babyshivid) Jessica 11y o get fucktgood.mpg

The investigator then matched the hash value fingerprints of the 11 files with child porn files in a database maintained by the Wyoming Internet Crimes Against Children (ICAC) Task Force. Subsequently, the investigator connected Beatty to the IP address. Based on this information, the government got a search warrant for Beatty’s home, found hundreds of incriminating files on his home computer, and got incriminating statements in an interview.
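For readers unfamiliar with how fingerprint matching works, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the investigator's actual tooling: the opinion as summarized here does not say which hash algorithm was used, so SHA-1 is an assumption, and `fingerprint_bytes`, `fingerprint_file`, `matches_known_file`, and the `known_fingerprints` set are hypothetical names standing in for whatever the ICAC database actually provides. The point it shows is that the fingerprint is computed from a file's bytes alone, so the file name plays no role in the match.

```python
import hashlib
from pathlib import Path


def fingerprint_bytes(data: bytes, algorithm: str = "sha1") -> str:
    """Return the hex digest of raw bytes (SHA-1 is an assumption here)."""
    return hashlib.new(algorithm, data).hexdigest()


def fingerprint_file(path: Path, algorithm: str = "sha1") -> str:
    """Hash a file's contents in chunks; the file name is never consulted."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_known_file(path: Path, known_fingerprints: set[str]) -> bool:
    """True if the file's fingerprint appears in a (hypothetical) reference set."""
    return fingerprint_file(path) in known_fingerprints
```

Two files with identical contents produce the same fingerprint no matter what they are named, and renaming a file does not change its fingerprint. That is why a fingerprint match says something about content that a file name alone cannot.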

Beatty challenged the government’s right to search his home computer. The judge and the litigants agree that the government can legally conduct remote warrantless searches of P2P share directories, but the government apparently argued that it was free by extension to look through Beatty’s entire computer. The judge rejected such a broad position, saying:

even if the Defendant suffered no Fourth Amendment intrusion by virtue of Trooper Pearson’s conduct in remotely accessing certain shared computer files, the Defendant nevertheless retained a reasonable expectation of privacy in his computer and his home such that he possesses “standing” to challenge the merits of the subject search

This shifts the inquiry to the officers’ probable cause for the warrant. Apparently, the investigator did not download the files to review them or attach the files as evidence when requesting the search warrant. I’m not sure why the investigator took neither step, other than to avoid the toxicity of child porn generally. As a result, Beatty challenged the warrant because the warrant-approving magistrate did not see the files directly or get an affidavit from the investigator stating what he saw in the files. However, the magistrate did have the file names and the matching hash values. Beatty challenged both.

The judge and the litigants agree that file names do not dispositively predict the actual file’s content. As we know, file names can be inaccurate for a variety of reasons: plain error, semantic ambiguity, an effort to surreptitiously install malware, or an attempt to increase the content’s perceived illicit value (see, e.g., the discussion in the uncited Perfect 10 v. CCBill case about websites with names like “illegal.net” and “stolencelebritypics.com”). The court correctly concludes that “common knowledge dictates that actual file content cannot be definitively determined from the file name alone.”

Nevertheless, the court says that file names have some predictive value:

one can also envision circumstances where the file name is so explicit and detailed in its description as to permit at least a reasonable inference as to what the actual file is likely to show. Many, if not most, of the files at issue here had titles that contained highly graphic references to specific sexual acts-including ejaculation, sexual intercourse, oral sex, and anal sex-involving children ranging in age from 7 to 13 years. Several of the files also reference terms such as “child_sex,” “pedofilia,” “illegal pedo sex,” “incest,” or “Lolita.” The unmistakable inference which arises from such highly descriptive file names, is that the content includes material pertaining to the sexual exploitation of children-i.e., evidence of criminal activity, if not outright contraband. Given the number of files in question and the pointed references in their titles to specific sexual acts involving young children-described in the most coarse and vulgar terms, this inference is a strong one.

I’m reminded of the admonishments that airport security is not a joking matter, so don’t make jokes about having a bomb while going through the airport security line. (I’ve seen a few airports, including the New Orleans airport, post reminders about this.) Similarly, child porn is so toxic that no one in their right mind would falsely use a file title suggesting the file is child porn.

The judge also credits the file titles because accurate file titles enable searches by others. So, if you want to distribute child porn in a searchable way (a seemingly illogical proposition because, as this case illustrates, doing so puts you on a fast track to Club Fed), then you need to use keywords that match search terms. The court says:

As a matter of common sense, the very fact that individuals utilize search terms with P2P software to produce results (i.e., file names) consistent with their chosen search terms suggests a substantial degree of correlation between file names and file content; if file names were, as a general rule, completely random and bearing no relation whatsoever to their content, then there would be no point in conducting a search in the first place and the whole purpose of peer-to-peer file sharing would be frustrated because there would be no meaningful method for locating the sought-after file content.

I agree with this only superficially. It’s true that searchable metadata must have some relationship to the underlying content to make a successful match, but community outsiders might think the metadata looks inaccurate or even completely random. Consider how Napster users used alternative spellings to route around the court-ordered blocks on various names. Now, go one step further: if a group of Napster users agree (in an offsite discussion forum) to tag Britney Spears’ songs using “Lolita” (a not wholly inappropriate appellation given some of the videos she made before the age of majority), then a block on searches for “Britney Spears” will eliminate an obvious matchmaking route but will fail to stop matchmaking completely. Indeed, subcommunities can develop multiple synonyms that are opaque to outsiders. For more on this, look at the Urban Dictionary to see how slang can have multiple meanings, and note my article on how a single search term can have dozens of possible meanings. As a result, the search matchmaking process may be more complicated–and the value of “accurate” file descriptors is lower–than the court contemplates.

In any case, it isn’t clear how much traction Beatty expected from reducing the predictive value of file names. Ultimately, the search warrant was issued based on the combination of the file names with the fingerprint matches. It’s not like the investigator or the judge had no idea what the files might contain–they had a hash value fingerprint matching a known child porn file. (Beatty unsuccessfully argued that the underlying fingerprinted files should not be credited as known child porn.) Then again, there is no reason why law enforcement shouldn’t routinely preserve copies of suspect files they think are child porn and describe the file contents (or submit the files) when seeking search warrants, easy steps that would have largely mooted Beatty’s challenges.
