
The Danger of AI-Generated Information

by Model Tuning, February 17th, 2025

Too Long; Didn't Read

Excessive reliance on AI-generated content narrows public knowledge, curtailing rare or eccentric viewpoints. The extent of knowledge collapse depends on factors such as the discount rate on AI-generated content, the learning rate, and the truncation limits of AI-generated content.

Author:

(1) Andrew J. Peterson, University of Poitiers (andrew.peterson@univ-poitiers.fr).

Abstract and Introduction

Related Work

The media, filter bubbles and echo chambers

Network effects and Information Cascades

Model collapse

Known biases in LLMs

A Model of Knowledge Collapse

Results

Discussion and References


Appendix

Comparing width of the tails

Defining knowledge collapse

Results

Our main concern is with the view that AI, by reducing the cost of access to certain kinds of information, could only make us better off. In contrast to the literature on model collapse, we consider the conditions under which strategic humans may seek out the input data that will maintain the full distribution of knowledge. Thus, we begin with a consideration of different discount rates. First, we present a kernel density estimate of public knowledge at the end of 100 rounds (Figure 3). As a baseline, when there is no discount from using AI (discount rate is 1), public knowledge converges, as expected, to the true distribution.[9] As AI reduces the cost of truncated knowledge, however, the distribution of public knowledge collapses towards the center, with tail knowledge under-represented. Under these conditions, excessive reliance on AI-generated content over time curtails the eccentric and rare viewpoints that maintain a comprehensive vision of the world.
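The collapse dynamic in Figure 3 can be sketched numerically. The following is a minimal illustration, not the paper's code (the function and parameter names are ours): public knowledge is approximated as a mix of samples from the true standard normal and samples from an "AI" source truncated at ±σtr, with the AI share standing in for greater reliance on cheaper AI-generated content.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_public_knowledge(n, ai_share, sigma_tr):
    """Mix full-distribution samples with AI samples truncated at +/- sigma_tr.

    ai_share: fraction of samples drawn from the truncated (AI) source,
    a stand-in for heavier reliance on cheaper AI-generated content."""
    n_ai = int(n * ai_share)
    full = rng.standard_normal(n - n_ai)
    # rejection-sample the truncated normal: keep only draws within +/- sigma_tr
    ai = np.empty(0)
    while ai.size < n_ai:
        draws = rng.standard_normal(n_ai)
        ai = np.concatenate([ai, draws[np.abs(draws) <= sigma_tr]])
    return np.concatenate([full, ai[:n_ai]])

baseline = sample_public_knowledge(20_000, ai_share=0.0, sigma_tr=0.75)
ai_heavy = sample_public_knowledge(20_000, ai_share=0.8, sigma_tr=0.75)
```

As the AI share grows, the mass in the tails (e.g. beyond two standard deviations) shrinks roughly in proportion, which is the "collapse towards the center" visible in the kernel density plot.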


Figure 3: Knowledge collapse: The cheaper it is to rely on AI-generated content, the more extreme the degeneration of public knowledge towards the center.


Fixing specific parameters, we can get a sense of the size of the impact of relying on AI. For example, for our default model,[10] after nine generations, when there is no AI discount the public distribution has a Hellinger distance of just 0.09 from the true distribution.[11] When AI-generated content is 20% cheaper (discount rate of 0.8), the distance increases to 0.22, while a 50% discount increases the distance to 0.40. Thus, while the availability of cheap AI approximations might be thought to only increase public knowledge, under these conditions public knowledge is 2.3 or 3.2 times further away from the truth due to reliance on AI.
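The Hellinger distance used throughout these results has a standard closed form for discrete distributions, H(p, q) = (1/√2)·‖√p − √q‖₂; a small sketch (our own discretization, not the paper's code):

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions
    over the same bins: 0 for identical distributions, 1 for disjoint support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

# e.g. a standard normal density vs. the same density truncated at
# +/- 0.75 sigma, discretized on a grid
x = np.linspace(-4, 4, 801)
full = np.exp(-x**2 / 2)
truncated = np.where(np.abs(x) <= 0.75, full, 0.0)
d = hellinger(full, truncated)
```

Because it is bounded in [0, 1], distances such as 0.09 versus 0.40 can be compared directly across parameter settings.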


For subsequent results illustrating the tradeoff between different parameters, we plot the Hellinger distance between public knowledge at the end of the 100 rounds and the true distribution. First, we examine the importance of updating on the relative value of samples and its relationship to the discount factor in Figure 4. That is, we compare the situation in which individuals barely update on the value of innovation in previous rounds (learning rate near zero, e.g. lr = 0.001) to the case where they update rapidly (here lr = 0.1). As above, the cheaper AI-generated content is (discount rate indicated by colors), the more public knowledge collapses towards the center. At the same time, the more slowly individuals update on the relative value of learning from AI (further to the left in the figure), the more public knowledge collapses. We also observe a tradeoff: faster updating on the relative value of AI-generated content can compensate for more extreme price disparities. Conversely, if the discount rate is not too extreme, even slower updating on the relative values is not too harmful.
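One simple way to picture the role of the learning rate is an exponential-moving-average update, in which each observed payoff pulls the estimated value of a sampling source toward itself. This form is our assumption for illustration, not necessarily the paper's exact rule:

```python
def update_value(old, payoff, lr):
    """One learning step: move the estimated value of a sampling source
    toward the payoff just observed, at speed lr."""
    return old + lr * (payoff - old)

# a slow and a fast learner repeatedly observing that full-distribution
# samples pay off (payoff normalized to 1.0)
slow = fast = 0.0
for _ in range(10):
    slow = update_value(slow, 1.0, lr=0.001)
    fast = update_value(fast, 1.0, lr=0.1)
```

After n identical observations the estimate is 1 − (1 − lr)ⁿ, so a near-zero learning rate leaves individuals choosing almost entirely on price, which is why slow updating amplifies the effect of the AI discount.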


Figure 4: Discount rate and learning rate


Figure 5: Discount rate and truncation limits


In Figure 5, we consider how the extremity of the truncation of AI-generated content affects the collapse of knowledge. Intuitively, extreme truncation (small values of σtr) corresponds to a situation in which AI, for example, summarizes an idea with only the most obvious or common perspective. Less extreme truncation corresponds to the idea that AI manages to represent a variety of perspectives, excluding only extremely rare or arcane ones. Naturally, in the latter case (e.g. if AI truncates the distribution two standard deviations from the mean), the effect is minimal. If AI truncates knowledge outside of 0.25 standard deviations from the mean, the impact is large, though once again this is at least somewhat moderated when the discount is smaller (especially if there is no generational effect).
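The gap between the two truncation regimes follows directly from the standard normal: a quick check of how much probability mass each cutoff retains (the function name is ours):

```python
from math import erf, sqrt

def mass_within(sigma_tr):
    """Probability mass of a standard normal within +/- sigma_tr,
    i.e. the fraction of perspectives that survives the truncation."""
    return erf(sigma_tr / sqrt(2))

mild = mass_within(2.0)      # ~95% of perspectives survive
extreme = mass_within(0.25)  # under 20% survive
```

Truncation at 2σ discards only about 5% of the distribution, while truncation at 0.25σ discards over 80%, which is why the latter produces such a large Hellinger distance in Figure 5.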


We compare the effect of the generational compounding of errors in Figure 6. If there is no generational change, there is at worst only a reduction in the tails of public knowledge outside the truncation limits. In this case the distribution is stable and does not “collapse”; that is, over time the problem does not grow progressively worse. We see a jump from this baseline to the case where there is generational change, though how often generational change occurs (every 3, 5, 10, or 20 rounds) does not have a significant impact.


This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.


[9] Even with no discount, there are occasional samples from the truncated distribution, but only enough to realize that they are of relatively less worth than full-distribution samples.


[10] Truncation at σtr = 0.75 standard deviations from the mean, generations every 10 rounds, learning rate of 0.05.


[11] Even here there are occasional samples from the truncated distribution – just enough to realize that they have less value than the full-distribution samples.