Generative AI Knows Nowt: Large Language Models Are Cribbing Content, Not Nicking Knowledge

The Handwringing Over ‘Knowledge Theft’

There’s been a lot of handwringing of late about large language models like ChatGPT supposedly ‘nicking’ knowledge and putting human expertise out of business. But that framing completely misunderstands what these models are and how they work. The reality is that generative AI doesn’t actually ‘know’ anything in a human sense – it is simply very good at stitching together and regurgitating content that already exists.

Just Really Good at Patterns, Not Comprehension

Large language models like Claude are trained on a massive corpus of digital text scraped from the internet and books – everything from websites and Wikipedia entries to social media posts and product reviews. Through training (essentially large-scale next-word prediction), the model learns statistical patterns and associations in how that text is written and structured.
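To make the pattern-learning idea concrete, here is a deliberately tiny sketch – a toy bigram model, nothing like a real training pipeline, with an illustrative made-up corpus. ‘Training’ here amounts to nothing more than counting which word follows which:

```python
from collections import Counter, defaultdict

# Toy stand-in for a web-scale training corpus (purely illustrative).
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# "Learning" is just tallying statistical associations:
# for each word, count the words observed to follow it.
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

# The model now "knows" that "the" is followed by cat/dog/mat/rug --
# as frequencies, not as concepts.
print(follows["the"].most_common())
```

Real models replace the counting table with billions of learned neural-network weights, but the character of what is stored is the same: associations between sequences of text, not understanding.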

When you query a language model, it doesn’t rely on ‘knowledge’ that it understands like a human would. Nor does it look anything up: at inference time it draws on the statistical associations baked into its weights during training to predict, one token at a time, a plausible continuation of your prompt, stitching together an output that mirrors its training data. It is essentially a sophisticated stochastic parrot operating at scale.
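The ‘stochastic parrot’ at inference time can be sketched the same way. This is a toy illustration (the corpus and the `parrot` function are hypothetical, and real LLMs sample from a neural network rather than a lookup table), but the loop is the essential idea: repeatedly sample a statistically likely next word.

```python
import random
from collections import Counter, defaultdict

# Toy corpus standing in for the training data.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Tally which word tends to follow which (the "training" step).
follows = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    follows[cur][nxt] += 1

def parrot(prompt: str, length: int = 6, seed: int = 0) -> str:
    """Continue a prompt by sampling likely next words -- pure pattern
    continuation, with no understanding of cats, dogs, or mats."""
    rng = random.Random(seed)
    out = prompt.split()
    for _ in range(length):
        counts = follows[out[-1]]
        if not counts:  # never seen this word: the parrot falls silent
            break
        # Weighted choice: frequent continuations are more probable.
        out.append(rng.choices(list(counts), weights=counts.values())[0])
    return " ".join(out)

print(parrot("the cat"))
```

Everything the parrot emits is recombined from its corpus; it can produce sentences it never saw, but never a word it never saw.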

Cribbing Content, Not Pilfering Knowledge

So in reality, these models aren’t pilfering ‘knowledge’ – they are cribbing content. The training data itself represents the accumulation of human knowledge generation and intellectual labour over decades and centuries. But the models themselves have no conceptual understanding of that knowledge. They are simply very good at mimicking and recombining the written output of human-generated content in new ways, through predictive modelling.

This is why large language models can often give impressively fluent and coherent responses that seem knowledgeable on the surface. But dig a little deeper and the lack of true comprehension shows through: factual errors (often called ‘hallucinations’), contradictions, incoherent reasoning, and an inability to apply real-world context. Generative AI is a sophisticated regurgitation engine, not an oracle of true knowledge.

Powerful Utilities, Not Human Replacements

Of course, this doesn’t mean large language models are useless or unimportant. As tools for automating content generation, analysis, and certain task workflows, they are incredibly powerful. But we should keep a clear-eyed view of what they are and what their existence represents. Generative AI may be very impressive technologically, but it is not rendering human knowledge obsolete. Rather, it is highlighting the immense value of the intellectual labour that humans have generated and curated over centuries in creating the very training data that gives language models their capabilities in the first place.
