AI Ouroboros, Reddit Edition
Last year, if you recall, there was a mod-led protest at Reddit over some ham-fisted changes from the admins. Specifically, the admins implemented significant costs/throttles on API calls such that no 3rd-party Reddit app would have been capable of surviving. Even back then it was known that the admins were snuffing out competition ahead of an eventual Reddit IPO.
Well, that time is nigh. If you want a piece of an 18-year-old social media company that has never posted a profit – $804m revenue, $90m net loss last year – you can (eventually) purchase $RDDT.
But that’s not the interesting thing. What’s interesting is that Google just purchased a license to harvest AI training material from Reddit, to the tune of $60 million/year. And who is Reddit’s 3rd-largest shareholder currently? Sam Altman, of OpenAI (aka ChatGPT) fame. It’s not immediately clear whether OpenAI has or even needs a similar license, but Altman owns twice as many shares as the current CEO of Reddit, so it probably doesn’t matter. In any case, that’s two of the largest AI companies feeding off Reddit.
In many ways, leveraging Reddit was inevitable. It’s been an open secret for years that Google search results have been in decline, even before Google started plastering advertisements six layers deep. Who knew that when you allowed people to get certified in Search Engine Optimization, search results would eventually turn to shit? Yeah, basically everyone. One of the few ways around that, though, was to seed your search with +Reddit, which returned Reddit posts on the topic at hand. Were these intrinsically better results? Actually… yes. A site with weaponized SEO wins when it gets your click. But even though there are bots and karma whores and reposts and all manner of other nonsense on Reddit, fundamentally posts must receive upvotes to rise to the top, which is an added hurdle that SEO alone cannot clear. Real human input from people who otherwise have no monetary incentive to contribute is much more likely to float to the top and be noticed.
Of course, anyone who actually spends any amount of time on Reddit will understand the downsides of using it for AI training purposes. One of the most upvoted comments on the Reddit post about this:
starstarstar42 3237 points 1 day ago*
Good luck with that, because vinyl siding eats winter squid and obsequious ladyhawk construction twice; first on truck conditioners and then with presidential urology.
Edit: I people my found have
That’s all a bit of cheeky fun, which will undoubtedly be filtered away by the training program. Probably.
What may not be filtered away as easily are the many hundreds/thousands of posts made by bot accounts that already repost the same comment from other people in the same thread. I’m not sure how or why it works, but the reposted content sometimes becomes higher rated than the original; perhaps there is some algorithm to detect a trending comment, which then gets copied and boosted with upvotes from other bot accounts? In any case, karma farming in this automated way allows the account to be later sold to others who need such (disposable) accounts to post in more specialized subreddits that otherwise require certain thresholds to post anything (e.g. account has to be 6+ months old and/or have 200+ karma, etc). Posts from these “mature” accounts are less obviously from bots.
While that may not seem like a big deal at first, the endgame is the same as with SEO: gaming the system. The current bots try to hijack human posts to farm karma. The future bots will be posting human-like responses generated by AI to farm karma. Hell, the reinforcement mechanism is already there, i.e. upvotes! Meanwhile, Google and OpenAI will be consuming Reddit content which itself will consist more and more of their own AI output. The mythological Ouroboros was supposed to represent a cycle of death and rebirth, but the AI version is more akin to a dog eating its own shit.
I suppose sometime in the future it’s possible for the tech-bro handlers or perhaps the AI itself to recognize (via reinforcement) that they need to roll back one iteration due to consuming too much self-content. Perhaps long-buried AOL chatroom logs and similar backups would become the new low-background steel, worth its weight in gold Bitcoin.
Then again, it may soon be an open question of how much non-AI content even exists on the internet anymore, by volume. This article mentions experts expect 90% of the internet to be “synthetically generated” by 2026. As in, like, 2 years from now. Or maybe it’s already happened, aka Dead Internet.
[Fake Edit] So… I wrote almost exactly this same post a year ago. I guess the update is: it’s happening.
Posted on February 26, 2024, in Commentary and tagged AI, Bots, ChatGPT, Ouroboros, Reddit, Search Engine Optimization. 5 Comments.
I’m really not sure AI is as far along as people think. Yes, it’s initially impressive at a glance, like ChatGPT giving you something that appears correct, or that new movie AI creating something that looks better than the old Will Smith AI video. But anytime it has to do something non-trivial and you look at the details, it’s often confidently wrong. Text that gets basic facts wrong, movies with 6 fingers, etc. Always wrong? Nope. But enough that if you blindly use it you are a fool? Yup.
Yes and no. What we currently have publicly available is what it is. I’m not necessarily assuming there is some super-secret version making Reddit posts right now, but things are moving quickly. Remember: ChatGPT was released in November 2022 – it’s not even two years old. One day we’re going to hear “yep, we fixed the finger thing” and that will be that.
And as far as the text AI goes, it doesn’t need to be perfect; it just needs to be better than the average person. Which, depending on the subject, it arguably already is.
I feel duty-bound to note that one likely reason dogs eat their own shit is that they are evolutionarily equipped to process it for further nutritional value, so as a metaphor that may actually undermine the point you’re making…
Ha. Ever since I learned about “runts” in writing, I have endeavored (sometimes excessively so) to make sure all paragraphs end at the “correct” spot. That sentence originally had “forever and ever” at the end, but alas it was sacrificed.
Once you realize how much of Reddit is just flat wrong about even simple things, you question the wisdom of trying to utilize it for anything. I guess it’s all about money: who cares if it’s true as long as people buy it?