{"id":2354,"date":"2026-06-20T15:58:51","date_gmt":"2026-06-20T15:58:51","guid":{"rendered":"https:\/\/expatcircle.com\/cms\/?p=2354"},"modified":"2026-06-20T15:58:51","modified_gmt":"2026-06-20T15:58:51","slug":"ai-models-collapse-when-trained-on-recursively-generated-data","status":"publish","type":"post","link":"https:\/\/expatcircle.com\/cms\/ai-models-collapse-when-trained-on-recursively-generated-data\/","title":{"rendered":"AI Models Collapse When Trained on Recursively Generated Data"},"content":{"rendered":"<article>\n<header>\n<p><em>Based on: Nature (2024), Volume 631, Pages 755\u2013759<\/em><\/p>\n<p>Original paper:<br \/>\n<a href=\"https:\/\/www.nature.com\/articles\/s41586-024-07566-y\" target=\"_blank\" rel=\"noopener\">https:\/\/www.nature.com\/articles\/s41586-024-07566-y<\/a><\/p>\n<\/header>\n<section>\n<h2>The Core Finding<\/h2>\n<p>A 2024 study published in <em>Nature<\/em> shows a fundamental limitation of modern AI systems:<br \/>\nwhen language models are repeatedly trained on data generated by other AI models, their performance degrades.<\/p>\n<p>Instead of improving over time, the models gradually lose diversity, nuance, and rare statistical patterns.<br \/>\nWhat emerges is a kind of \u201cfeedback loop of imitation,\u201d where models increasingly learn from distorted reflections of reality.<\/p>\n<\/section>\n<section>\n<h2>What \u201cModel Collapse\u201d Means<\/h2>\n<p>The process can be understood as a recursive training loop:<\/p>\n<ul>\n<li>A model is trained on real-world data<\/li>\n<li>It generates synthetic text<\/li>\n<li>That synthetic data is reused for training future models<\/li>\n<li>Each iteration reduces informational diversity<\/li>\n<\/ul>\n<p>Rare and complex information disappears first, leaving behind increasingly generic and homogenized outputs.<\/p>\n<\/section>\n<section>\n<h2>Why This Matters<\/h2>\n<p>This issue becomes especially relevant as more of the internet is now being 
generated or rewritten by AI systems.<br \/>\nIf future training datasets contain a large proportion of synthetic content, models may begin to learn primarily from themselves.<\/p>\n<p>In that scenario, data quality, not just data quantity, becomes the limiting factor for progress in machine learning.<\/p>\n<\/section>\n<section>\n<h2>Interpretation: The Library of Babel<\/h2>\n<p>The phenomenon resembles Jorge Luis Borges\u2019 short story <a href=\"https:\/\/en.wikipedia.org\/wiki\/The_Library_of_Babel\" target=\"_blank\" rel=\"noopener\">\u201cThe Library of Babel\u201d<\/a>.<\/p>\n<p>In Borges\u2019 story, an infinite library contains every possible combination of letters, so true, false, and meaningless texts all sit side by side.<\/p>\n<p>Similarly, recursive AI training risks creating a data environment where signal and noise blur together,<br \/>\nand the model\u2019s internal representation of reality gradually loses grounding in the original source of truth.<\/p>\n<\/section>\n<footer>\n<p><strong>Conclusion:<\/strong> The key risk for future AI systems is not merely insufficient data, but the growing dominance of synthetic data generated by other models, which leads to progressive informational collapse.<\/p>\n<\/footer>\n<\/article>\n","protected":false},"excerpt":{"rendered":"<p>Based on: Nature (2024), Volume 631, Pages 755\u2013759 Original paper: https:\/\/www.nature.com\/articles\/s41586-024-07566-y The Core Finding A 2024 study published in Nature shows a fundamental limitation of modern AI systems: when language models are repeatedly trained on data generated by other AI models, performance begins to degrade. 
Instead of improving over time, the model gradually loses diversity, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"categories":[44],"tags":[],"class_list":["post-2354","post","type-post","status-publish","format-standard","hentry","category-society"],"_links":{"self":[{"href":"https:\/\/expatcircle.com\/cms\/wp-json\/wp\/v2\/posts\/2354","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/expatcircle.com\/cms\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/expatcircle.com\/cms\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/expatcircle.com\/cms\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/expatcircle.com\/cms\/wp-json\/wp\/v2\/comments?post=2354"}],"version-history":[{"count":2,"href":"https:\/\/expatcircle.com\/cms\/wp-json\/wp\/v2\/posts\/2354\/revisions"}],"predecessor-version":[{"id":2356,"href":"https:\/\/expatcircle.com\/cms\/wp-json\/wp\/v2\/posts\/2354\/revisions\/2356"}],"wp:attachment":[{"href":"https:\/\/expatcircle.com\/cms\/wp-json\/wp\/v2\/media?parent=2354"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/expatcircle.com\/cms\/wp-json\/wp\/v2\/categories?post=2354"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/expatcircle.com\/cms\/wp-json\/wp\/v2\/tags?post=2354"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}