Infinity is not everything

By Aurelie Herbelot

Generative AI is apparently on its way to dethrone human artists and scientists. But is it truly creative? In this post, we will take a deep dive into the linguistic and logical processes behind creativity, and ask which of those processes neural networks actually implement.

Apparently, generative AI is going to revolutionise the way we create. According to the news, AI artists and AI novelists are already there. Not to mention AI scientists. And if you don’t believe it, go and admire the intricate images produced by DALL-E or Stable Diffusion. See how the authors of that scientific-looking paper credited Chat-GPT for writing some of the material included in their article. It is happening.

I will grant you this: generative AI, as its name implies, generates a lot of new stuff. But what kind of new?

There are two meanings of the word new. The first one refers to a new instance of an existing type. A production chain that churns out glass bottles on a conveyor belt does produce something new every few seconds, i.e. a glass bottle. But that glass bottle is identical to the previous one and the one before that.

The second sense of new refers to a new type of thing. The 19th century saw the invention of the modern combustion engine, and by extension of the automobile. A new type of locomotion was born, unlike anything we had seen before. Similarly, the emergence of the Internet a century later heralded a new type of communication.

It should be obvious that it is this second meaning of new that makes our species particularly innovative. Some bird species build a new nest every year, but that nest is not conceptually new. In contrast, human builders and architects have created many different types of habitats over the years, driven by environmental, social and aesthetic pressures. And so it is that human-created buildings include caves, yurts, wood cabins, stilt houses, skyscrapers, and much much more. Arguably, the only reason why we can speak of scientific, artistic or social progress is the ability of humans to build the conceptually new.

It may seem that the two senses of new are on a continuum. The production chain is an extreme example of the first type of new: it generates more instances of identical objects. But what about a watchmaker, who manually makes different versions of a given type of watch, perhaps in different designs? What about the watchmaker who thought of attaching a clock-like object to a bracelet, thereby inventing the wristwatch? Some innovations may be newer than others, but at the end of the day, aren’t we talking about degrees rather than a real difference? Generative models can produce an infinity of new content: each new image generated by Stable Diffusion is truly different from its predecessors. So out of that infinity, some content will just be newer than the rest, in the second sense of new.

Right?

Not quite. Arguably, how creative you are depends on the processes involved in the act of generation. And as it happens, there is not a single trick to generation. In the next section, we will start by looking at the most obvious way to create something new: by composition. And we will see that it is far from accounting for everything that the human mind can conjure up.

Linguistic collages

Generative models such as Chat-GPT or DALL-E are derived from so-called ‘Large Language Models’ or LLMs. Since LLMs are supposedly about language, let’s take a little linguistic detour and ask what it means to generate a sentence.

Welcome to the history of syntax.

Decades before you even heard of Generative AI, a branch of linguistics was busy investigating what is known as ‘Generative Grammar’. The term stems from the work of various linguists, including Zellig Harris and Noam Chomsky, on syntactic structures. Generative Grammar approaches seek to explain how natural languages can generate an infinity of different sentences, using a finite vocabulary. This question is usually referred to as the problem of ‘productivity’, and in the theoretical framework advocated by the Chomskian school, it has an answer in a specific type of Generative Grammar. Chomsky’s grammar ensures that an infinity of new utterances can be generated by a speaker, via the process of composition, that is, the act of combining and recombining linguistic constructs. It also ensures that composition respects some syntactic constraints such as number or gender agreement. Generative grammar lets you write the sentences ‘the dog chases the cat’, and ‘the dog chases the cat that chases the mouse’, and ‘the dog chases the cat that chases the mouse that chases the elephant’, and so on and so on, ad infinitum. But it won’t let you write ‘the dogs chases cat’. You would be violating the constraints of English by doing so.
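
To make the idea of productivity-through-recursion concrete, here is a minimal sketch of a toy generative grammar in Python. The rules, the vocabulary and the generate function are my own illustration rather than any linguist’s formalism: the recursive noun-phrase rule is what allows a finite vocabulary to yield an unbounded number of ever-longer sentences.

```python
import random

# A toy context-free grammar in the spirit of the recursive example above.
# Grammar and vocabulary are invented for illustration.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "that", "VP"]],  # the recursion lives here
    "VP": [["chases", "NP"]],
    "N":  [["dog"], ["cat"], ["mouse"], ["elephant"]],
}

def generate(symbol="S", depth=0, max_depth=4):
    """Recursively rewrite a symbol until only words are left."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal word: emit as-is
    # Past max_depth, always pick the first (non-recursive) expansion
    # so the sentence eventually ends.
    rules = GRAMMAR[symbol]
    rule = rules[0] if depth >= max_depth else random.choice(rules)
    return [word for part in rule for word in generate(part, depth + 1, max_depth)]

print(" ".join(generate()))
# e.g. "the dog chases the cat that chases the mouse"
```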

In 1963, following Chomsky’s work on syntax, researchers Katz and Fodor proposed that meanings could also be composed following a set of rules that included so-called ‘selectional constraints’. The point of selectional constraints was to ensure that a generated sentence would not only be syntactically correct, but also make sense to a speaker. Katz and Fodor argued that sentences such as ‘paint is silent’ are nonsensical and should be disallowed by a generative model, in the same way that ‘the dogs eats’ is syntactically disallowed for violating number agreement. In a nutshell, the proposal was to extend grammar from syntax to semantics, in order to account for acceptability in sentence meanings.

Ten years later, in a 1973 paper, computational linguist Yorick Wilks argued that there was something slightly off about the Chomskian theories of meaning. They were too constraining to account for human creativity, not only in utterance production but also in sentence understanding. In actual fact, Wilks argued, it was well-known that ‘speakers have the ability to embed odd-looking utterances in stories so as to make them meaningful in context’.1 That is, no matter how weird a phrase or sentence may sound, humans are extremely proficient at extracting meaning from it. From already conventionalized oddities such as banana chairs all the way to parliamentary potatoes and sharp glue,2 we can usually make up a narrative where the linguistic construction makes sense. My personal version of parliamentary potato oscillates between a potato that a parliamentarian threw at a member of another party during a debate, and a jaded member of parliament who sits in debates as they would in front of the TV (in reference to ‘couch potato’). You will have your own interpretations.

Wilks made a strong claim for using the term ‘selectional preference’ in place of ‘selectional constraint’. The idea is to acknowledge that ‘parliamentary debate’ is a much more probable string than ‘parliamentary potato’, but also that the latter is not completely impossible. Linguists sometimes talk of ‘semantic deviance’ to refer to odd-looking utterances that do not conform to habitual patterns. Arguably, conceptual deviance is an integral ingredient of progress. If you could not think and talk about something that does not exist, you could not make it happen. No car, no Internet, no human artifact would ever have seen the light of day.

The consequence of all this is that there is a fundamental difference between the ability to generate infinity (i.e. to be ‘productive’ in the Chomskian sense) and the ability to generate something that breaks with an existing paradigm (i.e. to be ‘creative’ or ‘semantically deviant’). The former requires combination and recombination of parts. The latter requires breaking them. So the first and the second senses of new are crucially different.

Nature knows all this perfectly well. When members of a species reproduce, they do not simply pass on or combine their existing genetic material. A crucial aspect of the process is mutation. Think about this: without mutation, evolution could not take place. If life had merely been generative, we would not exist.

Language, like biological species, mutates all the time. The 7000 languages spoken on our planet have evolved over millennia. The English of today is not Shakespeare’s English, and the English of tomorrow will not be the one we speak today. Mutation is the difference between life and stagnation. Generative Grammar allows you to produce an infinity of sentences, but it has as much syntactic creativity as a school teacher who tells you that double negatives are ‘bad English’. Generative Grammar can produce an infinity of new syntactic constructions, but none of them are new in the second sense of new. In a word, Generative Grammar is not Creative Grammar.

So if composition is akin to the mere recombination of genetic material, where should we find the stuff of mutation?

Thinking outside of the box

We humans are relatively good at ‘thinking outside of the box’. We do it all the time without really noticing. Think of a common task, like having a weekly meeting with a particular colleague at work. We learn the normal pattern of such an interaction over time: prepare for the meeting, walk to a specific office, engage in chit-chat, go through the agenda, chit-chat again, etc. We even learn deviations from the standard script. For instance, that colleague never fails to send a message if they are late, or sick, or prevented in any way from actually attending the meeting.

But now, imagine a situation that goes off-script: our colleague has not turned up for the meeting and did not send any notice of their absence. What happens then?

One typical reaction to the unseen is to engage in what logicians call abduction, that is, the process of recruiting some fact out of thin air to make sense of the situation. Standard abduction is supposed to reach out for the simplest explanation: our colleague is perhaps stuck in traffic and their phone has run out of battery. But note that we are also very adept at making up more extravagant explanations. Perhaps our colleague realised this morning that they were in possession of a winning lottery ticket and are already on their way to the Bahamas. Anything goes, as long as the resulting story is consistent with observed facts. Philosophers sometimes contrast selective and creative abduction, separating the recourse to the simplest explanation from the creation of a new hypothesis. But for the sake of simplicity, I will stick to using abduction in the following.
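
If we wanted to caricature abduction in code, it might look like the little sketch below. Everything in it is invented for the meeting example: a handful of hand-written hypotheses, each paired with the observations it would explain. The crucial limitation of the sketch is also the point of this section: it can only select among hypotheses that are already listed, whereas creative abduction invents new ones.

```python
# A toy sketch of abduction for the missed-meeting example. The hypotheses
# and their predicted observations are entirely made up for illustration.
HYPOTHESES = {
    "stuck in traffic, phone out of battery": {"colleague absent", "no message received"},
    "won the lottery, already off to the Bahamas": {"colleague absent", "no message received"},
    "colleague is sick": {"colleague absent", "message received"},
}

def abduce(observations):
    """Keep every hypothesis whose predictions cover what we observed."""
    return [h for h, predicted in HYPOTHESES.items() if observations <= predicted]

print(abduce({"colleague absent", "no message received"}))
# ['stuck in traffic, phone out of battery', 'won the lottery, already off to the Bahamas']
```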

The interesting thing about abduction is that it often recruits properties that do not belong to the set of features that would normally be relevant for the task. Here, the task is having a meeting. Our brain has learned the patterns involved in the task and encoded them in an appropriate mental space. That space may involve complex properties themselves associated with their own patterns, like reading a report or talking about the weather. It most likely excludes irrelevant properties and patterns such as baking a cake or taking the dog for a walk. As we will soon find out, if we were artificial neural networks, we would have encoded the task in a fixed space and be stuck in our mental box. But fortunately, our biological brain can do better. It is able to dynamically expand our task space into additional dimensions, for instance those encoding properties such as ‘being stuck in traffic’ or ‘winning the lottery’. If the new dimensions provide a satisfactory explanation, we have successfully performed creative abduction. We have very literally thought ‘outside of the box’.

Let us look at extreme examples of abduction in science and the arts. In 1964, Peter Higgs and colleagues were grappling with a tricky problem in quantum physics, related to the acquisition of mass by certain particles. Their solution to the problem was to invent a new particle out of thin air: what came to be known as the Higgs Boson. The existence of the particle was only confirmed nearly fifty years later at the Large Hadron Collider at CERN. Another example comes from the study of quasi-crystals: there is a naturally-occurring mineral called ‘icosahedrite’ which shows an arrangement of atoms that does not conform to the conventional patterns we see in crystals. In spite of its apparent irregularity, if we think of looking at it not in three but in six dimensions, its structure becomes symmetric again. This is mind-boggling because, as suggested by some physicists, it opens up the idea of a physical universe in higher dimensions.

The arts have made similar discoveries over the years. One instance in the development of Western music was the introduction of a new notation in the Late Middle Ages, which freed composers from the constraints of so-called rhythmic modes, prevalent in the previous centuries. Among other things, this revolution fostered experiments with the binary rhythm which is now an inherent part of our musical landscape. For better or for worse, the famous Darth Vader theme composed by John Williams would not exist without this 13th-century innovation. A similar story can be told about the visual arts and the development of perspective. Most Western paintings prior to the 15th century show a ‘flat’ composition, ignoring the effect of distance on object sizes. Filippo Brunelleschi is usually credited with the idea of simulating depth in drawings.

What is fascinating about these discoveries is the way they required their creators to rework existing patterns inside a completely different mental space, introducing entirely new concepts to standard practice. These examples are of course particularly striking, and resulted in real paradigm changes in their respective fields. But at a fundamental level, they simply make use of abduction to create something truly new.

So what kind of generation process do neural models implement? Can they think outside of the box, or are they stuck inside it? In order to find out, we must consider how they are built. The following section is slightly more technical, but I have tried to make it as fun as possible. So stick around!

Neural net, teacher’s pet

Very roughly speaking, a neural network is a collection of interconnected artificial neurons, sometimes also referred to as ‘nodes’ or ‘units’. Modern neural networks are huge and contain millions of neurons, but to make things easy, let’s consider just two units. We will call them Kim and Sandy.

Kim and Sandy are connected, meaning that Kim can send information to Sandy. What kind of information? Anything. Neuron Kim might encode a color property, or the display of an emotion, or something structurally complex, e.g. what a typical visit to the restaurant is like. Particular aspects of the property are mathematically represented as numbers. Supposing that Kim encodes the color blue, different values might correspond to different shades of blue, or perhaps even to the property of not being blue.3 Sandy of course also encodes a property, which may be the appearance of the sky on a sunny day. That property is similarly expressed as a number.

The connection between Kim and Sandy is associated with a certain strength, which we will call a ‘weight’. The weight of a connection is crucially important to the good functioning of the network. When you hear about ‘training’ a neural net, what is meant is the mathematical process of learning the right weights for all connections in the network, with the aim of achieving the best performance in a certain task. Weights can be positive (and therefore excitatory) or negative (inhibitory). If the weight between Kim and Sandy is positive, then Kim contributes to making Sandy’s property value higher. If it is negative, then Kim reduces Sandy’s property value. Weights converge towards their own individual values during training (e.g. 0.5, or -0.2, or 0.0003). If Kim represents the color blue and Sandy represents clear skies, then the relation between particular shades of blue and the presence or absence of a clear sky should end up being encoded in the weight between them.
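
As a very rough sketch, here is what a single Kim-to-Sandy connection could look like in code. The numbers, the learning rate and the use of plain gradient descent on a squared error are all assumptions made for illustration; real training recipes are far more elaborate.

```python
# A minimal sketch of one connection, Kim -> Sandy, and of what 'learning the
# weight' means. All values below are invented for illustration.

kim = 0.9           # Kim's value: say, a deep shade of blue
target = 0.8        # the 'clear sky' value Sandy should ideally produce here
weight = 0.1        # initial strength of the Kim -> Sandy connection
learning_rate = 0.5

for step in range(20):
    sandy = weight * kim                    # Sandy's value, driven by Kim alone
    error = sandy - target                  # how far off we are
    weight -= learning_rate * error * kim   # nudge the weight to reduce the error

print(round(weight, 3), round(weight * kim, 3))
# the weight settles so that a deep blue from Kim yields a high clear-sky value
```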

The interesting thing is that groups of neurons, and connections between them, can be represented spatially. Remember how humans think outside of the box? Let us now consider the box itself. In a neural network, the box is a multidimensional space made of lots of Kims and Sandys. Think of a cube. Associate names with three edges of the cube: Kim, Sandy, Chidi. Imagine the edges can be divided like a ruler to represent numbers between -1 and 1. Whenever Kim wants to encode the value 0.6, it simply puts a tick at position 0.6 on its edge. Same for Chidi. Two ticks put together make a set of coordinates, like the ones we sometimes use when talking of latitude and longitude. That is, the information jointly held by Kim and Chidi has a position in space, perhaps on the ‘floor’ of our three-dimensional cube.

Next, let’s assume that Chidi encodes the appearance of clouds. Together, Kim and Chidi seem to hold valuable information for Sandy: a particular shade of deep blue, associated with an absence of clouds, may indicate the presence of a beautiful clear sky. Whenever Kim and Chidi pass their information to Sandy, their weights – together with an additional bit of maths called an activation function – will project the coordinates on the floor of the cube to a particular position on the vertical edge associated with Sandy. Here again, the hope is that Kim and Chidi’s weights will be appropriate to tick the right position on Sandy’s edge and give it a positive clear sky value whenever appropriate.
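
In code, the projection just described could be sketched as below. The particular weights, the zero bias and the choice of a sigmoid activation are assumptions for the sake of the example; in a real network these values are learned during training.

```python
import math

# A sketch of the Kim + Chidi -> Sandy projection. Weights, bias and the
# sigmoid activation are invented for illustration.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w_kim, w_chidi, bias = 2.0, -2.0, 0.0   # blue pushes 'clear sky' up, clouds push it down

def sandy(kim_blue, chidi_clouds):
    """Project a point on the (Kim, Chidi) 'floor' onto Sandy's vertical edge."""
    return sigmoid(w_kim * kim_blue + w_chidi * chidi_clouds + bias)

print(round(sandy(0.9, 0.1), 2))   # deep blue, few clouds -> high clear-sky value (0.83)
print(round(sandy(0.2, 0.8), 2))   # pale sky, many clouds -> low clear-sky value (0.23)
```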

Now, it should be obvious that data looks different depending on the space (i.e. the properties) used to encode it. Take a picture of your house from the front, and then from the sky. The same object looks very different. More crucially, note that it may not be possible to infer what your house looks like from the front by looking at the aerial picture. The aerial view encodes the building from the point of view of longitude and latitude, but it misses out altitude. Whether the side view or the bird’s eye view is ‘better’ depends on the task you want to perform. We usually encode street maps from the sky. But we prefer to show architectural details from the side of a building. Cities, of course, live in three dimensions. But for the sake of economy, we happily chop away one of those dimensions when drawing maps or architectural sketches.

In a similar way, training a neural network results in a specific encoding of properties in the network’s neurons. If a task is best solved by encoding three properties, say, ‘the color blue’, ‘being a plant’ and ‘having needles’, then that is what the system will encode. Never mind the color red, being happy and having fur. Once the network is in the conceptual box of its learned properties, it can’t get out. This doesn’t mean it cannot generate all sorts of output. Go and count the number of points in a 3-dimensional box: there is an infinity. But that infinity is not everything. It is just infinity from a particular perspective.
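
To make the ‘box’ image concrete, here is a last little sketch. The three axis names are taken from the example above, the sampling is my own illustration, and the point is simply that no matter how many points we draw from a fixed three-dimensional space, none of them can say anything about a property that space never encoded.

```python
import random

# The 'box': a learned space with three fixed axes. However many points we
# sample from it, none of them carries information about an axis that was
# never part of the space (fur, happiness, the color red...).

AXES = ("blue", "is_plant", "has_needles")   # what training chose to represent

def sample_point():
    """One of infinitely many possible outputs: a point inside the box."""
    return {axis: random.uniform(-1.0, 1.0) for axis in AXES}

for _ in range(3):
    point = sample_point()
    print(point, "| knows about fur?", "has_fur" in point)   # always False
```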

Of course, state-of-the-art neural nets are huge boxes. They encode very many properties. But they are boxes nevertheless. They are constrained, just as Generative Grammar is constrained. And that is a fundamental difference between them and the human mind.

By virtue of their architecture and training regime, state-of-the-art deep learning models are static. They do not think outside of the box of their acquired patterns. They cannot play with their mental space because they are restricted by their training process. Yes, they can perform well in standard tests. They are top of the class. They did everything the teacher said. But they are not built to be semantically deviant. So no, they are not creative in the full sense of the term. They cannot be.

One may be impressed by the infinite productivity of systems such as Chat-GPT or Stable Diffusion. But infinity is not everything. When it comes to being creative, there is more to the task than mere composition. The other day, I saw on a social platform an automatically generated picture of ‘a stormtrooper hoovering a beach’. As fun as the image may have been, the creativity there had come from the human prompt. As far as the model is concerned, a stormtrooper hoovering a beach is just that: stormtrooper + hoover + beach.

Human artists and scientists are by nature deviant and abductive. It may be possible for computational models to one day acquire those abilities. But they will not emerge from the type of systems that are currently so prominent in the news.

Remember those mutations? In the grand scheme of evolution, LLMs lack some fundamental survival skills.

  1. Yorick Wilks. 1973. Preference Semantics. Technical report, Stanford University. 

  2. I am borrowing these examples from this nice paper by Eva Maria Vecchi et al. 

  3. Note that we normally don’t know what neurons actually represent, and this is why neural nets are often referred to as ‘black boxes’. But occasionally, using specific techniques, it is possible to make hypotheses about what a single neuron encodes, and to verify such hypotheses.