Last year, if you recall, there was a mod-led protest at Reddit over some ham-fisted changes from the admins. Specifically, the admins implemented significant costs/throttles on API calls such that no 3rd-party Reddit app would have been capable of surviving. Even back then it was known that the admins were snuffing out competition ahead of an eventual Reddit IPO.
Well, that time is nigh. If you want a piece of an 18-year old social media company that has never posted a profit – $18m revenue, -$90m net losses last year – you can (eventually) purchase $RDDT.
But that’s not the interesting thing. What’s interesting is that Google just purchased a license to harvest AI training material from Reddit, to the tune of $60 million/year. And who is Reddit’s 3rd-largest shareholder currently? Sam Altman, of OpenAI (aka ChatGPT) fame. It’s not immediately clear whether OpenAI has or even needs a similar license, but Altman owns twice as many shares as the current CEO of Reddit so it probably doesn’t matter. In any case, that’s two of the largest AI feeding off Reddit.
In many ways, leveraging Reddit was inevitable. It’s been an open secret for years that Google search results have been in decline, even before Google started plastering advertisements six layers deep. Who knew that when you allowed people to get certified in Search Engine Optimization, that eventually search results would turn to shit? Yeah, basically everyone. One of the few ways around that though was to seed your search with +Reddit, which returned Reddit posts on the topic at hand. Were these intrinsically better results? Actually… yes. A site with weaponized SEO wins when they get your click. But even though there are bots and karma whores and reposts and all manner of other nonsense on Reddit, fundamentally posts must receive upvotes to rise to the top, which is an added layer of complexity that SEO itself does not help. Real human input from people who otherwise have no monetary incentive to contribute is much more likely to float to the top and be noticed.
Of course, anyone who actually spends any amount of time on Reddit will understand the downsides of using it for AI training purposes. One of the most upvoted comments on the Reddit post about this:
starstarstar42 3237 points 1 day ago*
Good luck with that, because vinyl siding eats winter squid and obsequious ladyhawk construction twice; first on truck conditioners and then with presidential urology.
Edit: I people my found have
That’s all a bit of cheeky fun, which will undoubtedly be filtered away by the training program. Probably.
What may not be filtered away as easily are the many hundreds/thousands of posts made by bot accounts that already repost the same comment from other people in the same thread. I’m not sure how or why it works, but the reposted content sometimes becomes higher rated than the original; perhaps there is some algorithm to detect a trending comment, which then gets copied and boosted with upvotes from other bot accounts? In any case, karma farming in this automated way allows the account to be later sold to others who need such (disposable) accounts to post in more specialized sub-Reddits that otherwise require certain limits to post anything (e.g. account has to be 6+ months old and/or have 200+ karma, etc). Posts from these “mature” accounts as less obviously from bots.
While that may not seem like a big deal at first, the endgame is the same as with SEO: gaming the system. The current bots try to hijack human posts to farm karma. The future bots will be posting human-like responses generated by AI to farm karma. Hell, the reinforcement mechanism is already there, e.g. upvotes! Meanwhile, Google and OpenAI will be consuming Reddit content which itself will consist of more and more of their own AI output. The mythological Ouroboros was supposed to represent a cycle of death and rebirth, but the AI version is more akin to a dog eating its own shit.
I suppose sometime in the future its possible for the tech-bro handlers or perhaps the AI itself to recognize (via reinforcement) that they need to roll back one iteration due to consuming too much self-content. Perhaps long-buried AOL chatroom logs and similar backups would become the new low-background steel, worth its weight in gold Bitcoin.
Then again, it may soon be an open question of how much non-AI content even exists on the internet anymore, by volume. This article mentions experts expect 90% of the internet to be “synthetically generated” by 2026. As in, like, 2 years from now. Or maybe it’s already happened, aka Dead Internet.
[Fake Edit] So… I wrote almost exactly this same post a year ago. I guess the update is: it’s happening.
AI Ouroboros, Reddit Edition
Feb 26
Posted by Azuriel
Last year, if you recall, there was a mod-led protest at Reddit over some ham-fisted changes from the admins. Specifically, the admins implemented significant costs/throttles on API calls such that no 3rd-party Reddit app would have been capable of surviving. Even back then it was known that the admins were snuffing out competition ahead of an eventual Reddit IPO.
Well, that time is nigh. If you want a piece of an 18-year old social media company that has never posted a profit – $18m revenue, -$90m net losses last year – you can (eventually) purchase $RDDT.
But that’s not the interesting thing. What’s interesting is that Google just purchased a license to harvest AI training material from Reddit, to the tune of $60 million/year. And who is Reddit’s 3rd-largest shareholder currently? Sam Altman, of OpenAI (aka ChatGPT) fame. It’s not immediately clear whether OpenAI has or even needs a similar license, but Altman owns twice as many shares as the current CEO of Reddit so it probably doesn’t matter. In any case, that’s two of the largest AI feeding off Reddit.
In many ways, leveraging Reddit was inevitable. It’s been an open secret for years that Google search results have been in decline, even before Google started plastering advertisements six layers deep. Who knew that when you allowed people to get certified in Search Engine Optimization, that eventually search results would turn to shit? Yeah, basically everyone. One of the few ways around that though was to seed your search with +Reddit, which returned Reddit posts on the topic at hand. Were these intrinsically better results? Actually… yes. A site with weaponized SEO wins when they get your click. But even though there are bots and karma whores and reposts and all manner of other nonsense on Reddit, fundamentally posts must receive upvotes to rise to the top, which is an added layer of complexity that SEO itself does not help. Real human input from people who otherwise have no monetary incentive to contribute is much more likely to float to the top and be noticed.
Of course, anyone who actually spends any amount of time on Reddit will understand the downsides of using it for AI training purposes. One of the most upvoted comments on the Reddit post about this:
That’s all a bit of cheeky fun, which will undoubtedly be filtered away by the training program. Probably.
What may not be filtered away as easily are the many hundreds/thousands of posts made by bot accounts that already repost the same comment from other people in the same thread. I’m not sure how or why it works, but the reposted content sometimes becomes higher rated than the original; perhaps there is some algorithm to detect a trending comment, which then gets copied and boosted with upvotes from other bot accounts? In any case, karma farming in this automated way allows the account to be later sold to others who need such (disposable) accounts to post in more specialized sub-Reddits that otherwise require certain limits to post anything (e.g. account has to be 6+ months old and/or have 200+ karma, etc). Posts from these “mature” accounts as less obviously from bots.
While that may not seem like a big deal at first, the endgame is the same as with SEO: gaming the system. The current bots try to hijack human posts to farm karma. The future bots will be posting human-like responses generated by AI to farm karma. Hell, the reinforcement mechanism is already there, e.g. upvotes! Meanwhile, Google and OpenAI will be consuming Reddit content which itself will consist of more and more of their own AI output. The mythological Ouroboros was supposed to represent a cycle of death and rebirth, but the AI version is more akin to a dog eating its own shit.
I suppose sometime in the future its possible for the tech-bro handlers or perhaps the AI itself to recognize (via reinforcement) that they need to roll back one iteration due to consuming too much self-content. Perhaps long-buried AOL chatroom logs and similar backups would become the new low-background steel, worth its weight in
goldBitcoin.Then again, it may soon be an open question of how much non-AI content even exists on the internet anymore, by volume. This article mentions experts expect 90% of the internet to be “synthetically generated” by 2026. As in, like, 2 years from now. Or maybe it’s already happened, aka Dead Internet.
[Fake Edit] So… I wrote almost exactly this same post a year ago. I guess the update is: it’s happening.
Posted in Commentary
5 Comments
Tags: AI, Bots, ChatGPT, Ouroboros, Reddit, Search Engine Optimization