AI: The largest socialist wealth transfer of the past 50 years

A few months back, Elon Musk, the right-wing owner of Twitter and Grok, his pet Generative AI project, posted something I wrote on his Twitter feed, with the caption “This is the quality of humor we want from Grok.”

He even had it pinned to his profile for a short while.

I wrote this over on Quora in March of 2024. On the one hand, it’s interesting to know that Elon Musk reads my stuff. On the other, do you notice anything funny about the screenshot of his Tweet?

Yup, no credit.

The Tweet went viral, and has since been posted all over Facebook, Tumblr, Twitter, Reddit, and TikTok…all without attribution.

Right now, as I write this, OpenAI, the company behind ChatGPT, carries a valuation of $157,000,000,000, making it more valuable than companies like AT&T, Lowe’s, and Siemens.

It is not a profitable company; in fact, it’s burning cash at a prodigious rate. Other tech companies burned cash early on to achieve economies of scale, but OpenAI’s costs scale directly with usage: every query consumes real compute, so growth doesn’t make the service cheaper to run. That is not at all normal for tech companies. At its current rate of growth, within four years its datacenters will consume more electricity than some entire nations.

But I’m not here to talk about whether AI is the next Apple or the next Pets.com. Instead, let’s talk about what generative AI is, and how it represents the greatest wealth transfer of the last fifty years.

AI is not intelligent. Generative AI does not know anything. Many people imagine that it’s a huge database of all the world’s facts, and when you ask ChatGPT something, it looks up the answer in that immense library of knowledge.

No.

Generative AI is actually more like an immense, staggeringly complex autocomplete. It ingests trillions of words, and it learns “when you see these words, the most likely next words are those words.” It doesn’t understand anything; in a very real sense, it doesn’t even “understand” what words are.
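The idea can be made concrete with a toy model. Here is a deliberately simplified sketch, a bigram “autocomplete” that is nothing like the scale or architecture of a real LLM, but that illustrates the same underlying move: learn which word most often follows each word in the training text, then generate by repeatedly emitting the most likely next word.

```python
# A toy "autocomplete": count, for each word, which words follow it in
# the training text, then generate greedily from those counts. It has
# no notion of meaning; it only knows "after this word, that word is
# most common."
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count how often each word is followed by each next word."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def autocomplete(follows, start, length=5):
    """Repeatedly emit the most common next word."""
    out = [start]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:
            break
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

corpus = (
    "the cat sat on the mat "
    "the cat chased the dog "
    "the dog sat on the mat"
)
model = train_bigrams(corpus)
print(autocomplete(model, "the"))
```

Feed it a different corpus and it produces different “completions”; at no point does it know what any of the words mean, which is exactly the point.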

As the people over at MarkTechPost discovered, many LLMs struggle to answer basic arithmetic questions.

AIs make shit up. They have no knowledge and understand nothing; when presented with text input, they produce text output that follows the basic pattern of the input plus all the text they’ve seen before. That’s it. They will cheerfully produce output that looks plausible but is absolutely wrong—and the more sophisticated they are, the more likely they are to produce incorrect output.

If you want to understand Generative AI, you must, you absolutely must understand that it is not programmed with knowledge or facts. It takes in staggering quantities of text from all over and then it “learns” that these words are correlated with those words, so when it sees these words, it should spit out something that looks like those words.

It doesn’t produce information, it produces information-shaped spaces.

To produce those information-shaped spaces, it must be trained on absolutely staggering quantities of words. Hundreds of billions at least; trillions, preferably. This is another absolutely key thing to understand: the software itself is simple and pretty much valueless. Only the training gives it value. You can download the software for free.

So where does this training data come from?

You guessed it: the Internet.

OpenAI and the other AI companies sucked in trillions of words from hundreds of millions of sites. If you’ve ever posted anything on the Internet—an Amazon review, a blog, a Reddit post, anything—what you wrote was used to train AI.

AI companies are worth hundreds of billions of dollars. All that worth, every single penny of it, comes from unpaid work by people who provided content to the AI companies without their knowledge or consent and without compensation.

This is probably the single largest wealth transfer in modern history, and it went up, not down.

There are a few dirty secrets lurking within the data centers of AI companies. One is the staggering energy requirement. Training GPT-4 reportedly required about 7.2 gigawatt-hours of electricity, roughly what 667 average US homes use in an entire year. (I laugh at conservatives who whine “eLeCtRiC cArS aRe TeRrIbLe WhErE wIlL aLl ThE eLeCtRiCiTy CoMe FrOm” while fellating Elon Musk over how awesome AI is.) Training GPT-4 consumed enough energy to charge a Tesla roughly 144,000 times, and each single ChatGPT query consumes a measurable amount of energy: about 2.9 watt-hours of electricity.
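As a sanity check, here is the arithmetic behind those comparisons worked through in a few lines of Python. Both divisors are my assumptions, not numbers from OpenAI: roughly 10,800 kWh/year for an average US household (the commonly cited EIA figure) and a 50 kWh smaller-pack Tesla battery.

```python
# Working the cited 7.2 GWh training figure through some standard
# consumption numbers. The per-home and per-battery values below are
# assumptions, labeled as such.
TRAINING_KWH = 7.2e6           # 7.2 gigawatt-hours, expressed in kWh
HOME_KWH_PER_YEAR = 10_800     # assumed average US household per year
TESLA_BATTERY_KWH = 50         # assumed smaller Tesla battery pack

homes_for_a_year = TRAINING_KWH / HOME_KWH_PER_YEAR
tesla_charges = TRAINING_KWH / TESLA_BATTERY_KWH

print(f"Homes powered for a year: {homes_for_a_year:.0f}")  # ~667
print(f"Tesla charges: {tesla_charges:,.0f}")               # 144,000
```

With a 100 kWh pack the charge count halves to about 72,000, which is why a range is the honest way to state that figure.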

Image: Jason Mavrommatis

All the large LLMs were trained on copyrighted data, in violation of copyright. Every now and then they spit out recognizable chunks of the copyrighted data they were trained on; pieces of New York Times articles, Web essays, Reddit posts. OpenAI has, last time I checked, something like 47 major and hundreds of smaller copyright lawsuits pending against it, all of which it is fighting. (It might be more by now; there are so many it’s hard to keep up.)

That, I think, is the defining computer science ethical problem of our time: To what extent is it okay to build value and make money from other people’s work without their knowledge or consent?

Elon Musk recognizes the value in what I write. He recognizes that it has both artistic and financial value. He posts my content as an aspirational goal. He doesn’t credit me, even as he praises my work.

That’s a problem.

Those who create things of value are rarely recognized for the value they create, if the things they create can’t immediately be liquidated for cash. That’s not new. What’s new is the scale to which other people’s creativity is commoditized and turned into wealth by those who had nothing whatsoever to do with the work, and are merely profiting from the labor of others without consent.

OpenAI says it would be “impossible” to train their models without using other people’s copyrighted work for free.

“Because copyright today covers virtually every sort of human expression – including blogposts, photographs, forum posts, scraps of software code, and government documents – it would be impossible to train today’s leading AI models without using copyrighted materials. […]

Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.”

It also claims its use of other people’s work is “fair use,” even while admitting that chatbots sometimes spit out verbatim chunks of recognizable work. This is a highly dubious claim. While fair use doesn’t have a precise legal definition (the doctrine exists as an affirmative defense in court to charges of copyright infringement), one of the factors that has always weighed against fair use is the commercial exploitation of other people’s work…and with a valuation of $157,000,000,000, it’s pretty tough to argue that OpenAI is not commercializing other people’s work. It charges $20/month for full access to ChatGPT.

So at the end of the day, what we have is this: a company founded by people who are neither writers nor artists, producing hundreds of billions of dollars of wealth from the uncompensated, copyrighted work of writers and artists whilst cheerfully admitting that it could not produce any value if it had to pay for its training data.

And it’s not just copyrighted data.

OpenAI Dall-e cheerfully spit this image out when I typed “Scrooge McDuck stealing money from starving artist.”

Here’s the thing:

Scrooge McDuck is trademarked. Trademark law is not the same as copyright law. Trademarks are more like patents than copyrights; in the US, trademarks are administered by the Patent and Trademark Office, not the Copyright Office.

In no way, shape, or form is this “fair use.”

Generative AI recognizes trademarked characters. You can ask it for renderings of Godzilla or Mickey Mouse or Spider-Man or Scrooge McDuck and it’ll cheerfully spit them out. The fact that Dall-e recognizes Scrooge and Spider-Man and Godzilla demonstrates without a shadow of a doubt that it was trained on trademarked properties.

So far, all the lawsuits aimed at AI infringement have been directed at the companies making AI models, but there’s no reason it has to be that way. You “write” a book with AI or you create a cover for your self-published work with AI and it turns out there’s a trademark or copyright violation in it? You can be sued. That hasn’t happened yet, but it will.

(Side note: The books I publish use covers commissioned from actual artists. Morally, ethically, and legally, this is the right thing to do.)

Why do I call OpenAI and its kin a socialist wealth transfer? Because they treat products of value as community property. Karl Marx argued that socialism is the transition between capitalism and communism, a system where nothing is privately owned and everything belongs to the public, and that’s exactly how OpenAI and its kin see creative works: owned by nobody, belonging to the public, free to use. It’s just that “free to use” means “a vehicle for concentrating wealth.”

From creators according to their ability, to OpenAI according to its greed.

It seems to me that what we need as a society is a long, serious conversation about what it means to create value, and who should share in that value. It also seems to me this is exactly the conversation the United States is fundamentally incapable of having.

8 thoughts on “AI: The largest socialist wealth transfer of the past 50 years”

  1. Everyone I’ve shared this with so far has either not cared or vehemently disagreed. I have attempted to convince them to publish rebuttals, but they may just feel like it’s not worth the effort. So here is one response I have gotten in chat to think about:

    the stochastic parrot take is not novel or interesting, but it is true.
    in any case, this article is very silly.
    not because it doesn’t like LLMs, but because it is just really stupid. no, socialism is not when companies use the fruits of people’s labor for free. that’s not what words mean at all
    this article spends ages appealing to how LLMs violate the holy order of intellectual property, and then its explanation of why this is socialism is just a paragraph that says, “this is socialism because it is [thing that is not socialism]”
    I think the agenda to expand the power or scope of intellectual property is very harmful. intellectual property causes countless ills and is already extremely powerful for corporations. the fundamental problem that generative neural networks present, in the context of society, as with any technology, is a labor problem.
    corporations will take any opportunity to put people out of jobs and get the products of labor for free.

    (And in response to a suggestion that the reference to socialism was just a bit of irony and not an actual claim that this is real socialism:)
    well, I think that’s really sloppy. people always try to point out why people are hypocrites, which can be entirely counterproductive if you do not refute or even accept the premises of that which you aim to denounce
    imagine if you accept this argument, which implies that violating disney’s trademarks amounts to socialism. if you believe that socialism is bad, this presents no quandary. clearly, disney deserves to be paid lots and lots of money.
    this would not help artists whatsoever.
    but also to accept your argument as valid, you must accept whatever ridiculous claim you made to make it.

  2. So the media seems to claim that LLMs could somehow become sentient, or that serious efforts are made to create an AGI-type intelligence. That seems rather presumptuous when LLMs have shown that they don’t understand anything. My question is: why aren’t they training the AI on actual knowledge, or rather fine-tuning the algorithm on data that has been verified to be correct, the same way, you know, Google does it?

    If Reddit or 4chan posts have the same value as peer-reviewed articles, then how can AI even be called “intelligent” at all? It just seems lazy. When you search for anything on Google, or any other search engine, it often finds peer-reviewed articles or sources that can be considered authoritative (such as from a major university or government/official sources). In fact, it just seems like asking Google questions is more likely to yield accurate answers than asking any LLM-based AI any question.

    Or do you think LLM is made deliberately dumb for some nefarious purpose?

  3. “Training ChatGPT 4 required 7.2 gigawatt-hours of electricity, which is about the same amount that 6,307,200 homes use in an entire year. … Training ChatGPT 4 required enough power to charge a Tesla 144,000 times.”

    Did you get these numbers right? It’s the same amount of power to charge a tesla 144k times as it is to power 6.3mil homes for a year? I didn’t realize the teslas were so thirsty.

      • Y’all are right, something is off.

        A Tesla battery holds about 50 to 100 kWh of electricity, depending on the exact model. So 7.2 GWh could charge a Tesla about 72,000 to 144,000 times. OK, so his 144,000 Teslas is about right.

        BUT… an average US home uses about 10,800 kWh of electricity per year. So the correct number for that would be… the amount that would power about 667 homes for a year.
