In The Art of Science and Engineering Richard Hamming writes a thought provoking essay on the topic of creativity. While reading the chapter I noticed that several of the points he made had an underlying thread that could be followed using the language of search and computation. This is something I’ve thought about in the past in a fragmented way and Hamming’s observations serve as a useful vehicle to explore these ideas further.

Search is both a very basic idea and a deceptively complex one. In its informal framing, search problems are about using algorithms that perform computations in some well defined space to retrieve information or arrive at a solution. This definition includes optimization which immediately makes the subject ubiquitous: seen everywhere from shortest paths to gradient descent.

The more time you spend thinking about search, this initially myopic description grows in size until you seem to see it everywhere. At that point you begin to wonder if the analogy is totally vacuous or extremely powerful, and whether anything is gained from seeing the world this way. I think the right place to begin to think about this question is to pull apart the structure of a search problem.

The components of a search problem are the search algorithm, the search space and how its represented (representation), and the precise definition of the constraints/goals of the problem. In other words it’s delineated by how you search, where you search, and how you know if you’ve found what you’re looking for.

A more formal characterization of search problems would rigorously define what it means to have a starting state, goal state and how to move from one state to the next. It would also be more precise about the model of computation. For the purpose of this discussion I think the higher vantage point will suffice.

It’s not so easy to define creativity or tease apart its components.

Definitions of creativity often include desiderata of novelty and originality. Though these two concepts are related to creativity, Hamming correctly points out that there are important distinctions to be made.

When asked by a colleague for advice on how to do something truly original, Hamming reflects that you could simply multiply two very large numbers and rest easy knowing you were likely the first to do so. As a less contrived example, Hamming talks about modern artists producing apparently novel works of art that may not have value to the average member of the public. He also notes, self deprecatingly, that anything he painted would also have a claim to novelty but would likely not be received as creative.

“Value” is hard to define but it seems essential to the disentangling of these related ideas.

Evidently we want the word “creative” to include the concept of value – but value to whom? (324)

A search space is populated with elements, like Hamming’s large numbers, or paintings of questionable merit. From the perspective of the search problem, many of these elements may have low “value” even though they are feasible. Say the search space is images, and our representation is 1024 X 1024 X 3 channel arrays. In this space most valid elements are not what we would commonly considered a “natural” image like an object or scene. If our goal involves finding natural images, the space is primarily filled with low “value” noise.

In a less rigorously defined problem, or in the presence of uncertainty, matters of taste might make a solution more or less useful. A parsimonious solution may have a better chance of being correct or generalize to related problems. Seen as as an argument to the mathematical community, rather than an objectively verifiable formal object, a proof is an example of this. Among the potentially correct proofs, arguments which appeal to qualitative judgments of value are still preferred.

We can now try and stretch this analogy to domains where judgments of value are even more subjective and the precise structure of the search space more elusive. Imagine every great novel, painting and film as an element of a search space mostly populated by junk.

Sometimes novelty or originality isn’t evident but value is. Hamming points out that creativity isn’t always about generating new ideas and that combining existing ideas in a useful way is often viewed as a creative act.

Creativity seems, among other things, to be “usefully” putting together things which were not perceived to be related before, and it may be the initial psychological distance between the things which counts most. (325)

The idea of relative distances takes us to the role of the search space and how we choose to represent it. The representation is of crucial importance since a search problem might be very difficult in one but trivial in another. A widely repeated example which illustrates this idea is that any two random points in very high dimensional spaces are likely to be orthogonal to each other with high probability. Using distances like cosine or euclidean would then prove frustratingly fruitless, since everything is “far away” from everything else. Projecting these high dimensional points onto a lower dimensional space in a way that preserves important relationships would address this pathology. In the lower dimensional space the original metrics may now yield intuitive results.

In the language of creativity, the utility of representations comes from our ability to make analogies. To be able to see the path from one idea to another and navigate between them.

Probably the most important tool in creativity is the use of an analogy. Something seems like something else which we knew in the past. (328)

To be able to see these analogies, Hamming believes you need to evaluate problems from first principles and understand them deeply. He advocates approaching the same topic repeatedly from different angles and focusing on the fundamentals. By modeling the problems correctly and representing hard earned knowledge appropriately we can learn useful representations to exploit downstream.

We find the analogies when something reminds us of something else – is it only a matter of the “hooks” we have in our minds? (329)

Creating hooks could be viewed as learning representations that reduce the distance between things that have some essential similarity. This gives us the ability to quickly retrieve similar ideas when we encounter something new when otherwise they would not appear to have anything to do with each other.

Hamming believes that luck in this arena favors the prepared mind, but there seems to be a limit to the exploitation of process and habit. Regardless of how well we can utilize analogies or any other tool of creativity, most creative acts still seem to involve periods of hard work, trial and error, inspiration, and dejection.

Then comes the moment of “insight”, creativity or whatever you want to call it – you see the solution. Of course, it often happens that you are wrong; a closer examination of the problem shows the solution is faulty, but might be saved by some suitable revision. (327)

The natural next question is whether the difficulties of being creative have any connection to the difficulties of search problems.

The difficulty of computation in search problems is studied with computational complexity theory. There exists a rich taxonomy of problem classes with rigorous inclusion criteria and well studied properties. For the purposes of this discussion a simpler division into three rough classes of problems will do.

The first class, P, contains those problems that can be solved efficiently, in polynomial time for all inputs. The second, EXP, contains problems that cannot be solved efficiently and require exponential time in general.

There is a third class called NP, which stands for non-deterministic polynomial time. Problems in NP have the interesting property that correct solutions can be verified efficiently even though no general efficient algorithms exist to arrive at these solutions. Whether polynomial time algorithms for these problems exist at all is a famous open question (P=NP?), but it’s suspected that there aren’t.

The false starts and false solutions often sharpen the next approach you try. You now know how not to do it! You have a smaller number of approaches left to explore. You have a better idea of what will not work and possibly why it will not work. (327)

Often there is no way to arrive at a good solution without trial and error, moments of insight, randomness and luck, even if in hindsight you can see the clear and simple path. This reminds me of NP, in that it’s easier to appreciate the realized path retroactively than it is to find the correct path prospectively.

In our definition of $P$, one thing that was glossed over is what “efficient” means. Algorithms that run in polynomial time can still have high leading exponents or constant factors. Consider $X$ vs $100 * X$ or $X^{100}$, the second scales much worse than the first, and the third isn’t “efficient” in any practical sense. These factors often make all the difference in practice even if they don’t affect an algorithm’s membership in the complexity zoo.

In the same way, the difference between an impossible creative act and a tractable one may be the ability to reduce the “constant factor” via the representation of the search space, good heuristics and amount of “compute”.

There are other times where even in hindsight the creative act appears miraculous and just as difficult to appreciate. Here, maybe luck or inspiration play a bigger role than anything else. It’s about stumbling upon the winning path to the solution for a particular instance of a generally difficult class of problems.

Since I’m currently immersed in machine learning and AI, I want to include some brief connections to those topics. This might be a case of “when you have a hammer everything looks like a nail”, but I think the links are suggestive enough to mention.

Since much of machine learning involves optimization: representations and search are of central importance. The choice of representation, model class and learning algorithm contributes to the structure which makes learning possible: called the inductive bias. As an example consider a convolutional neural net classifier that places images into their proper object category. Convolutions are a particular algorithmic choice well suited to exploiting structure in natural images. The internal representations induced by the use of convolutions, when paired with the right learning algorithms and data, make it feasible to “find” the final learned model when otherwise it may have been a needle in a haystack.

The space of all neural networks is populated primarily by classifiers that cannot categorize images correctly. It is through the choice of inductive bias that the search problem becomes feasible. In this example the distinctions between algorithm, representation and model is a little fuzzy.

To make the distinctions more clear, and more relevant to the discussion on creativity we could consider techniques that learn embeddings. Dimensionality reduction algorithms attempt to learn low dimensional representations of high dimensional data in various ways. With a good learned lower dimensional representation of images, we could classify them into object categories effectively using a very simple decision boundary. Given a new image, we could label it by finding the label of the embedding closest to its own. The problem of image classification is made easier by virtue of the representation.

When discussing analogies, I mentioned Hamming’s point about understanding things deeply, modeling the world, and how this related to useful representations. Generative modeling, specifically latent variable modeling, is a formal approach to modeling data generating processes.

Given access to data from some real world process, latent variable models attempt to describe the process that generates that data. This description includes components that represent “unseen” variables that influence the process. These latent variables, once inferred, are useful as a type of representation. When I was a Master’s student, one project I worked on was using a type of latent variable model called a Variational Autoencoder to model biological sequences such as proteins. One application we had in mind was to use the latent representations inferred through the modeling to find “near by” proteins. We were also interested in generating new proteins by moving around in latent space.

All of the connections I’ve described between creativity and search problems are themselves analogies. Some are stronger than others and it seems prudent to wonder how useful or interesting this line of thinking even is.

As Hamming himself says:

Many a poor analogy has proved useful in the hands of experts. This implies the analogy you use is only partial, and you need to be able to abandon it when it is pressed too far; analogies are seldom so perfect that every detail in one situation exactly matches those of the other. (329)

Talking about creativity in the language of search puts the creative individual forward as explorer: seeking novels, paintings, proofs and more in dark seas of noise.

I think this is a nice image to keep in mind the next time you’re banging your head against a wall trying to figure something out.

References

All page number references are for:

Hamming, Richard. The Art of Doing Science and Engineering: Learning to Learn Hardcover . Stripe Press, 2020.

Creativity and search

References