AI Eats Itself

The 8.27.2024 NYTimes featured this interactive article: “When A.I.’s Output Is a Threat to A.I. Itself – As A.I.-generated data becomes harder to detect, it’s increasingly likely to be ingested by future A.I., leading to worse results,” by Aatish Bhatia. Simply put, AI (artificial intelligence) systems have already eaten virtually everything accessible on the internet to train their models. Now the internet is awash with information created by AI systems. So AI systems are being trained on a flood of information that contains tons of AI-generated material that the AI systems themselves cannot distinguish from human-created material. “Sam Altman, OpenAI’s chief executive, wrote in February that the company generated about 100 billion words per day — a million novels’ worth of text, every day, an unknown share of which finds its way onto the internet.”

The article describes experiments that demonstrate what happens when AI systems are fed a diet of AI-generated information.

The first example in the NYTimes article involved handwritten numbers.

[Image: original handwritten number samples]

After 20 generations of AI eating AI, the numbers had clearly degraded.

After 30 generations, the results were virtually identical blobs.

When AI researchers asked an AI to complete a sentence that started with “To cook a turkey for Thanksgiving, you…,” it at first responded like this:

“To cook a turkey for Thanksgiving, you have to prepare it in the oven. You can do this by adding salt and pepper to the turkey, then cooking it in the oven until it is golden brown on the inside. This method has been shown to be effective in reducing the risk of heart disease due to its use as a substitute for other foods.”

They then repeated the request, feeding the model its own output. After 2 generations, the AI produced this:

“To cook a turkey for Thanksgiving, you have to be able to eat it all at once. However, if you don’t have enough time to do so, you can use other foods such as potatoes, beans, rice, meat, fish, vegetables, fruits, nuts, seeds, etc. In some cases, this may be due to the fact that”

After only 4 generations, it had collapsed to this:

“To cook a turkey for Thanksgiving, you need to know what you are going to do with your life if you don’t know what you are going to do with your life if you don’t know what you are going to do with your life if you don’t know what you are going to do with your life if you don’t know what”
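The mechanics behind this collapse can be sketched in a few lines of code. The toy example below is my own illustration, not the researchers’ code: it stands in for a real language or image model with something far simpler, a Gaussian distribution fitted to its training data, and it generates each new generation’s training data by sampling from the previous generation’s fit.

# Toy sketch of "AI eating AI" (illustrative only, not the study's code):
# each generation's "model" is just a Gaussian fitted to the previous
# generation's output, and the next generation trains only on samples
# drawn from that fit.
import numpy as np

rng = np.random.default_rng(seed=0)

# Generation 0: "human" data, drawn from a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=200)

for generation in range(1, 31):
    # "Train": estimate the distribution from the current training set.
    mu, sigma = data.mean(), data.std()
    # "Generate": the next training set is purely model output.
    data = rng.normal(loc=mu, scale=sigma, size=200)
    if generation in (1, 10, 20, 30):
        print(f"generation {generation:2d}: mean = {mu:+.3f}, std = {sigma:.3f}")

With a modest sample size, the estimated mean and spread drift away from the original data as the generations accumulate, and the spread tends to shrink. Nothing in the loop ever pulls the model back toward the original distribution; only fresh human data would do that. This is a statistical analogue of the blurring digits and looping turkey prose above.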

The NYTimes article reports on various efforts to deal with the obvious dangers of this AI-eating-AI phenomenon. My take generalizes the danger to anonymous information on the web and in our communities.

End Anonymity Everywhere

We need to be able to identify the source of every post, video, article, image, and anything else that passes into general circulation. We need to know when we are in the presence of AI systems and AI-generated information. Anonymous information is a plague. How can we have any confidence in our ability to evaluate the merits of a piece of information without knowing who created it? Why is it appropriate that we have an entire political system built on enormous piles of “dark money”? Why is it appropriate that we allow corporations and others to pass information into the world anonymously?

For 300,000 years, we lived in small groups of 100–150 individuals. It was easy to know where information came from. It was easy to know whose ideas to trust and act on. It was easy to maintain discipline over bullshitters or out-and-out fraudsters. A good thumping, social ostracism, or even death awaited those who violated social norms egregiously. In the last 10,000 years of mass societies, things have become trickier. In the present moment, malicious behavior is hidden by anonymity and by social structures, like corporations, that reinforce and guarantee it.

At this point, we need to end anonymity. Social platforms and mass media must verify the individual human identity of every person who posts information on their platforms. They must provide the name and location of a human being for every post by a corporation, public or private. The same rule should apply to posts by governments. We should have laws that penalize the posting of knowingly false information and that ban people who violate this norm. The sheer fact that posts can be traced back to their originators will immediately decrease the amount of false and vicious nonsense in circulation today.