The era of AI-powered Internet is already here

An internet dominated by AI-generated content is already taking shape, and it doesn’t look good.

Ever since ChatGPT hit the market, AI-generated content has been gradually infiltrating the internet. Artificial intelligence has been around for decades, but consumer-facing ChatGPT pushed AI into the mainstream, creating unprecedented access to advanced AI models and a demand that businesses are eager to capitalize on.

As a result, businesses and users are leveraging generative AI to produce large volumes of content. While the initial concern is the abundance of content containing inaccuracies, gibberish and misinformation, the long-term effect is a complete degradation of web content into useless garbage.

Garbage in, garbage out

If you’re thinking that the internet already contains plenty of useless garbage, that’s true, but this is different. “There’s a lot of waste out there … but it’s incredibly varied and diverse,” said Nader Henein, vice president analyst at management consulting firm Gartner. As LLMs feed on each other’s content, the quality deteriorates and grows blurrier, like a photocopy of a photocopy of an image.

Think of it this way: the first version of ChatGPT was the last model trained entirely on human-generated content. Every model since has included training data containing AI-generated content that is difficult to verify, or even to track. That makes for unreliable, or to put it bluntly, useless data. When that happens, “we lose content quality and accuracy, and we lose diversity,” said Henein, who researches data protection and artificial intelligence. “Everything starts to look like the same thing.”

“Incestuous learning” is what Henein calls it. “LLMs are just one big family, they just consume each other’s content and cross-pollinate with each other, and with each generation you have … more and more trash, to the point where the trash exceeds the good content and things start to deteriorate from there.”

As more AI-generated content hits the web, and that content is itself produced by LLMs trained on AI-generated content, we are looking at a future web that is thoroughly homogeneous and thoroughly unreliable. Plus, it’s really boring.

Model collapse, internet collapse

Most people can already feel that something is off.

In some of the most high-profile examples, art is being replicated by bots. Books are swallowed whole and regurgitated by LLMs without their authors’ permission. Images and videos that use celebrities’ voices and likenesses are made without their consent or compensation.

But existing copyright and intellectual property laws already offer some protection against such violations. Additionally, some are embracing AI collaboration, like Grimes, who offers revenue-sharing deals to AI music creators, and record labels exploring licensing agreements with AI technology companies. On the legislative front, lawmakers have introduced an anti-impersonation bill to protect public figures from AI replicas. The regulations needed to solve all of these problems are not yet in place, but solving them is at least imaginable.

The drop in the overall quality of everything online, however, is a more insidious phenomenon, and researchers have demonstrated why the situation is about to get worse.

In a study from Johannes Gutenberg University in Germany, researchers found that “this self-consuming training loop initially improves both quality and diversity.” What comes next is less encouraging: “However, after a few generations, the diversity of results inevitably degenerates. We find that the rate of degeneration depends on the proportion of real and generated data.”

Two other academic papers published in 2023 came to the same conclusion about the degradation of AI models trained on synthetic, i.e. AI-generated, data. According to a study by researchers from Oxford, Cambridge, Imperial College London, the University of Toronto, and the University of Edinburgh, “use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear,” a phenomenon the authors call “model collapse.”

Similarly, researchers from Stanford and Rice University stated that “without enough fresh real data in each generation of an autophagous [self-consuming] loop, future generative models are doomed to have their quality (precision) or diversity (recall) progressively decrease.”
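The self-consuming loop these papers describe is easy to sketch in code. The toy simulation below is my own illustration, not taken from any of the studies: each “generation” is a model whose entire output is just a resample, with replacement, of whatever it was trained on, and we track how many distinct items survive each round.

```python
import random

# Toy illustration of a self-consuming training loop (not from the studies
# cited above). Generation 0 is "human" data; every later generation is
# trained only on the previous generation's synthetic output.

def next_generation(data, rng):
    """'Train' on data, then emit the same volume of synthetic content
    by sampling from the training set with replacement."""
    return [rng.choice(data) for _ in data]

rng = random.Random(42)
data = list(range(200))  # 200 distinct "ideas" in the human-made corpus

diversity = [len(set(data))]
for _ in range(30):
    data = next_generation(data, rng)
    diversity.append(len(set(data)))

# The count of distinct ideas can never increase, and in practice it
# shrinks quickly: each resample drops some items for good, like a
# photocopy of a photocopy.
print(diversity[0], "->", diversity[-1])
```

Real generative models are vastly more complex than this resampler, but the mechanism is the same one the researchers point to: with no fresh real data entering the loop, the tails of the distribution disappear and diversity only moves in one direction.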

Lack of diversity, Henein explains, is the fundamental problem: the more AI models try to replicate human creativity, the further they drift from it.

The AI-powered internet at a glance

Even as model collapse looms, the AI-powered internet has already arrived.

Amazon has a new feature that provides AI-generated summaries of product reviews. Tools from Google and Microsoft use AI to make writing emails and documents easier, and in September Indeed launched a tool that lets recruiters create AI-generated job descriptions. Platforms like DALL-E 3 and Midjourney let users create AI-generated images and share them across the web.

Whether they directly produce AI-generated content, like Amazon, or provide a service that lets users publish AI-generated content themselves, like Google, Microsoft, Indeed, OpenAI, and Midjourney, the AI-generated web is already here.

And those are just the tools and features from big tech companies, which at least claim to exercise some sort of oversight. The real culprits are clickbait sites that churn out high volumes of low-quality, regurgitated content to chase SEO rankings and ad revenue.

A recent report from 404 Media uncovered numerous sites “that rip off other media outlets by using AI to rapidly churn out content.” For a sample of this type of content, which avoids plagiarism at the expense of coherence, look at the dubious news site Worldtimetoday.com, where the first line of a 2023 story about Gina Carano’s firing from Star Wars reads, “It’s been a while since Gina Carano went on a tirade against Lucasfilm following her firing from Star Wars so, for better or worse, we were due.”

Obviously this sentence was generated by AI.
Credit: Worldtimetodays.com

On Google Scholar, users have discovered academic articles containing the phrase “as an AI language model,” meaning that parts of those articles, or entire articles for all we know, were written by chatbots like ChatGPT. AI-generated research articles, which are presumed to carry some academic credibility, can then end up cited on news sites and blogs as authoritative references.

Even Google searches sometimes turn up AI-generated celebrity likenesses instead of things like press photos or movie stills. When you Google Israel Kamakawiwo’ole, the late musician known for his ukulele cover of “Somewhere Over the Rainbow,” the top result is an AI-generated rendering of what Kamakawiwo’ole might look like if he were alive today.

Google image searches for Keira Knightley turn up distorted renderings uploaded by users to OpenArt, Playground AI, and Dopamine Girl alongside real photos of the actress.

Keira Knightley's Google Image Search showing an AI-generated image of the actress

Keira doesn’t deserve this.
Credit: Mashable

Not to mention the recent pornographic deepfakes of Taylor Swift, an Instagram ad that used Tom Hanks’ likeness to sell a dental plan, a photo-editing app that used Scarlett Johansson’s face and voice without her consent, and that viral Drake and The Weeknd track that was actually an unauthorized audio deepfake that sounded exactly like them.

If our search engine results are already unreliable, and the models are almost certainly feasting on this garbage, we have crossed the threshold into the AI-garbage era of the web. For now, the web as we’ve known it is still somewhat recognizable, but the warnings are no longer abstract.

The Internet is not completely doomed

Assuming products like ChatGPT never manage to reliably generate the kind of dynamic, exciting content that humans actually find enjoyable or useful to consume, what happens next?

Expect communities and organizations to fight back by shielding their content from the AI models trying to scrape it. The open, ad-supported, search-based web may be dying, but the internet will evolve. Expect more reputable media sites to put their content behind paywalls, and reliable information to retreat into subscriber newsletters.

Expect to see more copyright and licensing battles, like The New York Times’ lawsuit against Microsoft and OpenAI. Expect to see more tools like Nightshade, which invisibly alters copyrighted images so that models trained on them without permission are corrupted. Expect the development of new, sophisticated watermarking and verification tools to deter AI scraping.

On the other hand, you can also expect other news organizations, like the Associated Press, and maybe CNN, Fox, and Time, to embrace generative AI and strike licensing agreements with companies like OpenAI.

As tools like ChatGPT and Google’s generative search become substitutes for traditional search, expect SEO-based revenue models to change.

The upside of model collapse, however, is a collapse in demand. The current proliferation of generative AI is driven by hype, and if models trained on low-quality content stop being useful, demand will dry up. What’s left (hopefully) is us feeble-minded humans with the unquenchable urge to rant, share, inform, and otherwise express ourselves online.
