Reddit Files Suit Against Perplexity AI Over Data Scraping

On October 22, 2025, Reddit, Inc. filed a federal lawsuit in the Southern District of New York against Perplexity AI, Inc. and several data-scraping firms. The complaint alleges violations of the anti-circumvention provisions of the Digital Millennium Copyright Act (17 U.S.C. §1201), along with unjust enrichment and unfair competition. Rather than pleading a conventional copyright infringement case, Reddit frames the suit as a challenge to what it describes as “industrial-scale” evasion of the technical controls that protect its content.

The complaint asserts that the defendants employed tactics such as masking their identities, rotating IP addresses, and circumventing access controls to scrape billions of Google search-engine results pages (SERPs) that included Reddit URLs, text, images, and videos. Reddit contends that this data was then incorporated into Perplexity’s “answer engine.”

Two critical allegations stand out in this case. First, Reddit claims to have created a post that was discoverable via Google but not directly accessible, yet Perplexity’s engine reportedly surfaced significant portions of that content within hours. This, Reddit argues, provides evidence that Perplexity, either directly or through its co-defendants, scraped Google results to obtain the data.

Second, Reddit says that after it sent a cease-and-desist letter in May 2024, it observed a forty-fold increase in Perplexity’s citations to Reddit content, despite Perplexity’s public assertions that it respects robots.txt directives. By pleading under §1201(a)(1), Reddit puts the focus on the act of bypassing technological measures that control access to copyrighted works, regardless of whether the ultimate use of the material might be defensible.

The lawsuit also targets the alleged unlawful conduct of the data-scraping partners, claiming that Perplexity collaborated with them to facilitate widespread circumvention of Reddit’s and Google’s access controls. Reddit is pursuing claims for contributing to and assisting circumvention, along with unjust enrichment and unfair competition, and seeks both injunctive relief and damages for business harm.

Reddit positions Perplexity within a larger ecosystem of data brokers and scrapers, contrasting Perplexity’s practices with Reddit’s own paid licensing partnerships with companies like OpenAI and Google. The lawsuit is framed as a defense of a licensing model that, Reddit argues, Perplexity’s competitors adhere to. According to Reddit, Perplexity’s practices undermine the value of these arrangements and divert user engagement away from Reddit by reducing users’ need to visit its website, mobile app, or services, ultimately impairing the platform’s commercial viability.

Moreover, Reddit contends that by circumventing its technical controls, the scraping also reached deleted, private, or otherwise restricted posts. In Reddit’s account, this prevents it from honoring user deletion requests and privacy preferences and weakens users’ ability to manage their own privacy, undermining trust and engagement.

As this case unfolds, several issues will be pivotal: whether Google’s and Reddit’s measures qualify as §1201 “technological measures” that effectively control access; whether scraping SERPs to obtain Reddit content constitutes access to the underlying copyrighted works; what Perplexity knew and intended; and what role the intermediary data-scraping firms played.

Perplexity may argue that because SERP snippets are publicly available, the restrictions at issue limit only the volume of automated requests, not access to protected content. Whatever the outcome, Reddit v. Perplexity illustrates how platforms increasingly rely on access-control and contract-based theories when direct copyright claims are uncertain.

For content owners and rightsholders, this lawsuit underscores the importance of combining technical barriers, such as API gating and rate-limiting, with contractual enforcement and auditable logs, so that circumvention can be detected quickly and substantiated with evidence. For artificial intelligence developers, the case serves as a reminder that publicly available content is not necessarily “free for training” if it is obtained by circumventing access restrictions, an approach that carries growing legal risk under §1201(a)(1) of the DMCA.
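
To make the pairing of rate-limiting and auditable logs more concrete, the following is a minimal Python sketch, not a description of any system named in the complaint. The class name AuditedRateLimiter, its parameters, and the client identifiers are all hypothetical. It assumes a simple sliding-window limit per client and records every allow-or-deny decision in a hash-chained log, so that later edits to the record can be detected.

```python
import hashlib
import json
from collections import defaultdict, deque


class AuditedRateLimiter:
    """Hypothetical sliding-window rate limiter with a hash-chained decision log."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client_id -> timestamps of allowed requests
        self.audit_log = []              # append-only list of decision records
        self._prev_hash = "0" * 64       # seed value for the hash chain

    def allow(self, client_id, now):
        """Return True if the request is within the limit; log the decision either way."""
        hits = self._hits[client_id]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        allowed = len(hits) < self.max_requests
        if allowed:
            hits.append(now)
        self._record(client_id, now, allowed)
        return allowed

    @staticmethod
    def _digest(body):
        # Canonical JSON (sorted keys) so the same record always hashes the same way.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def _record(self, client_id, now, allowed):
        body = {"client": client_id, "ts": now, "allowed": allowed, "prev": self._prev_hash}
        entry = dict(body, hash=self._digest(body))
        self.audit_log.append(entry)
        self._prev_hash = entry["hash"]

    def verify_log(self):
        """Recompute the chain; return False if any entry was altered or reordered."""
        prev = "0" * 64
        for entry in self.audit_log:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

In this sketch, a caller would invoke allow(client_id, now) on each incoming request and could later run verify_log() before relying on the record. A production system would likely also persist the log outside the application and key clients on more than a single identifier, since the complaint itself alleges identity masking and rotating IP addresses.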