Amazon is investigating the controversial AI startup Perplexity for allegedly violating its cloud division’s rules by improperly “scraping” content from other websites without permission, according to a report on Friday.
Wired reports that Perplexity, which was recently valued at $3 billion, is ignoring a well-known web standard called the Robots Exclusion Protocol (commonly known as robots.txt), which news publishers and other sites use to indicate which pages automated bots are not allowed to scrape.
Although adherence to the standard is not required by law, most internet companies choose to follow it, including companies that rely on Amazon Web Services, such as Perplexity.
“AWS terms of service prohibit customers from using our services for any illegal activity, and customers are responsible for complying with our terms and all applicable laws,” an Amazon spokesperson said in a statement.
Perplexity’s practices have come under increased scrutiny since earlier this month, when Forbes magazine accused the company of “directly plagiarizing” articles written by its own reporters, as well as those from CNBC and Bloomberg, including paid content.
Wired contacted Amazon after its own investigation found that Perplexity had been using “hidden IP addresses” to scrape websites run by its parent company, Condé Nast, despite efforts to block access to them.
The outlet said representatives at other media outlets, including Forbes, The New York Times and The Guardian, had detected the same IP address accessing their servers.
The Post has reached out to Amazon for comment.
Perplexity spokeswoman Sarah Platnick disputed Wired’s report, calling it “inaccurate.”
“Our PerplexityBot running on AWS respects robots.txt and we have verified that Perplexity-managed services are not crawling in a manner that violates AWS’s terms of service,” Platnick said in a statement.
“AWS investigated WIRED’s media queries as part of our standard protocol for investigating reports of misuse of AWS resources,” Platnick added. “We had not heard from AWS until a WIRED reporter reached out to us. It would be incorrect for AWS to say it was ‘investigating’ Perplexity outside of this specific WIRED investigation. AWS has been a valued partner to Perplexity, and we appreciate their continued cooperation.”
Platnick told Wired that PerplexityBot bypasses the robots.txt protocol in the “very rare” circumstances when a user includes a specific URL in their query.
Perplexity CEO Aravind Srinivas had previously slammed Wired’s findings, claiming they “reflect a deep and fundamental misunderstanding of how Perplexity and the internet work.”
Forbes took issue with “Perplexity Pages,” a feature that displays “curated” articles drawing details from reporting by third-party news outlets.
Even though the wording in Perplexity’s post closely matched the wording in the source text, the original authors were not credited.
Instead, Perplexity used what Forbes described as a “small, easy-to-miss logo” that linked back to the original source.
In one egregious example, Perplexity’s chatbot churned out versions of an exclusive paid-for Forbes report on former Google CEO Eric Schmidt’s military drone project.
“Our coverage of Eric Schmidt’s stealth drone project was posted this morning by @perplexity_ai,” Forbes editor-in-chief John Paczkowski wrote on X at the time. “They’ve plagiarized most of our coverage, citing us and a few people who reblogged us as sources in the easiest way to ignore it.”
Srinivas said the tool had “rough edges” but otherwise denied any wrongdoing.