This is the first in a short series of posts about ‘abstractions’. It comes from reading about a concept in the AI Alignment world known as ‘natural abstractions’. I felt there was something interesting in the idea, and also that there was an alternative framing of the concept that clarified for me why it was interesting. That framing is what I want to share.
In this first post I am going to start by giving my own definition of an ‘abstraction’ and discussing some aspects of it. Although my definition is not necessarily precisely equivalent to existing ones, I think it captures the same essence. In later posts, I will go on to talk about why I think abstractions defined in this way are a helpful way of thinking about some important aspects of AI Alignment.
I decided to try to write these posts so as not to require any existing knowledge about ‘natural abstractions’, or indeed much knowledge of alignment. However, I will occasionally explain how the ideas here fit in with the existing literature. If you are coming to this without any background knowledge, you can safely ignore those parts.
I also initially started writing these posts in a far more formal, mathematical way, but decided instead to pare back the level of formalism considerably. It was obfuscating rather than helping to explain the ideas, as well as limiting who might understand them. Hopefully I have done so in such a way that any mathematicians among you could formalise the ideas should you wish. In particular, I strongly suspect that category theorists could condense these posts to a fraction of their length!
A definition of an abstraction
I will start with my definition of an abstraction and then give some examples.
Let’s say we have a ‘domain’ and a set of questions about that domain.
I then define an abstraction as a map from that domain (and the set of questions) to another domain (and a set of questions about that second domain) that somehow ‘preserves’ the answers to questions.
By that I mean that if we map a question to the second domain and answer it there, we can then map the answer back to the first domain in a way that gives an answer to the original question.
I am not actually requiring that this final answer be a correct answer, but the idea is that we would like it to be. If the answer is in fact always correct, then I will call the abstraction a ‘perfect abstraction’. If the answer is usually correct but not always, then I might refer to the abstraction as a ‘leaky abstraction’.
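To pin down the shape of this definition, here is a minimal sketch in Python. The names (`Abstraction`, `map_question` and so on) are invented purely for illustration; they don't come from any existing formalisation.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

Q1 = TypeVar("Q1")  # questions about the first domain
A1 = TypeVar("A1")  # answers in the first domain
Q2 = TypeVar("Q2")  # questions about the second domain
A2 = TypeVar("A2")  # answers in the second domain

@dataclass
class Abstraction(Generic[Q1, A1, Q2, A2]):
    map_question: Callable[[Q1], Q2]  # push a question into the second domain
    answer: Callable[[Q2], A2]        # answer it in the second domain
    map_answer: Callable[[A2], A1]    # pull the answer back to the first domain

    def answer_via_abstraction(self, question: Q1) -> A1:
        # The round trip: map the question over, answer it there,
        # then map the answer back.
        return self.map_answer(self.answer(self.map_question(question)))
```

In these terms, a ‘perfect abstraction’ is one where `answer_via_abstraction` always agrees with the true answer in the first domain, and a ‘leaky abstraction’ is one where it usually, but not always, does.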
Example 1: Maps
Let’s make our original domain ‘London’ and our second domain a ‘map of London’.
We can ask navigation questions about ‘London’. We can map those to questions about the ‘map of London’ and answer those questions. We can then map our answers back to give us an answer to our original question, so that we can actually find our way somewhere in London and end up where we would like to be.
Example 2: Scientific models
An important class of examples is scientific models.
Here our original domain could be ‘a real-life electric circuit with some observations of that circuit’ and our question might be about the current somewhere in the circuit (within certain error bounds).
We map that to our ‘scientific model of the electric circuit’, solve some equations to answer our corresponding question about the model and get the value of the current in the model, which we can then map back to give us the actual value in real life.
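As a toy sketch of that round trip (assuming, purely for illustration, a single-resistor circuit whose model is Ohm’s law):

```python
# Toy model of a one-resistor circuit: the 'second domain' is Ohm's law, V = I * R.
def model_current(voltage_volts: float, resistance_ohms: float) -> float:
    # Solve the model's equation for the current.
    return voltage_volts / resistance_ohms

# Map real-world observations (measured V and R) into the model, answer the
# question there, and map the number back as a prediction about the real
# circuit (correct only insofar as the model holds).
predicted_amps = model_current(voltage_volts=9.0, resistance_ohms=470.0)
print(f"predicted current: {predicted_amps * 1000:.1f} mA")  # ~19.1 mA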
Example 3: Mathematics
I am bunching several mathematical examples together here because mathematicians are very fond of abstractions.
One way they do this is by coming up with systems of axioms that capture only the most essential parts of a structure. This doesn’t necessarily make the problem easier, but it does mean that they don’t have to keep solving very similar problems again and again. It also usually makes the mathematics easier to understand by paring things back to their essentials.
But just as importantly, mathematicians use abstractions to solve problems. They love to somehow transform a problem in one domain into a problem in another domain that is easier to solve. This precisely fits our definition of an abstraction above.
Here are a couple of examples of such abstractions.
Let’s say our domain is the integers and our question is ‘Is 100003 the sum of two perfect squares?’. It turns out that we can solve this by mapping the problem to the integers modulo 4 (i.e. the remainders we get when dividing integers by 4), answering the question there and then mapping the answer back. Feel free to work out the details yourself before reading the sketch below.
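Here is the sketch. Every perfect square leaves remainder 0 or 1 when divided by 4, so a sum of two squares leaves remainder 0, 1 or 2 - but 100003 leaves remainder 3, so it cannot be a sum of two squares. A few lines of Python confirm the facts we need:

```python
# Squares modulo 4 can only be 0 or 1 ...
print({(n * n) % 4 for n in range(4)})                 # {0, 1}
# ... so sums of two squares modulo 4 can only be 0, 1 or 2.
print({(a + b) % 4 for a in (0, 1) for b in (0, 1)})   # {0, 1, 2}
# But 100003 is 3 modulo 4, so it is not a sum of two squares.
print(100003 % 4)                                      # 3
```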
A beautiful and more advanced mathematical example is Galois Theory. This can, for instance, be used to answer the question ‘Can we solve the quintic with a nice formula a bit like the quadratic formula?’. It does this by phrasing the question in terms of mathematical objects called field extensions (our first domain) and mapping it to a problem about group theory (our second domain). We can answer the question in this second domain and then map the answer back to prove that there is no nice solution to the quintic.
Example 4: Language (and ontologies)
We can also think of language as an abstraction. I map things that I see in the real world to words. So I might map the apple on a tree or in a fruit bowl to the word ‘apple’. In this case, a major type of question it helps answer, although arguably not the only one, is one of communication: ‘what is the other person thinking about?’.
Language isn’t a perfect abstraction. Many words are overloaded: the question ‘Is this a cat?’ might be answered differently about a cuddly toy cat by a biologist and by the parent of a toddler. We are good at figuring out from context which of the overlapping abstractions we mean. There is also a fuzziness to many words - hence the debate over whether a Jaffa Cake is a cake or a biscuit. Nonetheless, language still does a decent job of answering communication questions.
Likewise, if you are familiar with the idea of an ontology, then you can think of an ontology as being an abstraction.
Example 5: Proxies
As a final example, one that I shall return to later in the context of alignment, anything we regard as a ‘proxy’ can be thought of as an abstraction in some sense.
Let’s say the original domain is ‘my happiness’ and our question is ‘how happy am I?’. Then we could map that to the domain of ‘my facial expressions’ and the question ‘How many times did I smile today?’. We can then map back the number of smiles to some measurement of my happiness and regard the number of smiles as a proxy for my happiness.
Of course, how well our original question is answered is debatable, but we do get an ‘answer’ out of the proxy.
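In terms of the earlier sketch, the proxy is just another instance of the same round trip. The calibration below (dividing by 20) is entirely made up for illustration:

```python
# The proxy as a round trip: happiness question -> smile-counting question
# -> smile count -> crude happiness estimate.
def count_smiles_today() -> int:
    return 14  # stand-in for an actual observation

def smiles_to_happiness(smiles: int) -> float:
    # Map the proxy answer back to a (very debatable) happiness score in [0, 1].
    return min(smiles / 20.0, 1.0)

print(smiles_to_happiness(count_smiles_today()))  # 0.7
```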
Some comments on this definition
The other definitions and descriptions of an ‘abstraction’ in the AI Alignment space tend to be information-theoretic in nature, and my definition also has an information-theoretic flavour to it.
There is a sense in particular in which we are collapsing the domain and throwing away information that we don’t need to answer the question. We could even think of the algorithm for answering the question in the second domain as being shorter in some way (something which should ring bells for anybody who has heard of Kolmogorov complexity). However, as illustrated by Galois Theory, the second domain doesn’t necessarily need to be ‘lower dimensional’, even though it often will be.
It is important to appreciate that, as well as the cost of answering the question in the second domain, the map itself may have an algorithmic complexity to it. If I want to predict whether it will snow tomorrow, then I could run a complicated set of meteorological simulations based on a large set of observations. Or I could say to myself ‘it is 30 degrees Celsius and the sky is clear’ and decide that I don’t need to do all that to make a prediction good enough for my purposes.
Maps for abstractions may also be difficult to find. I find it incredible that Galois came up with a map as clever and beautiful as the one that he did.
Abstractions are very question-dependent. Some abstractions may be better for answering some questions about a domain than others. Different types of navigational maps are good for different types of navigation questions. An Ordnance Survey map is great for very different types of navigation than Google Maps. If you wanted to visit my house then I could hand-draw you a map and it would be perfect for that, better even than Google Maps since I know the common mistakes people make finding it. But the map would not be useful for answering any other navigational question.
It is also clear from our examples that an abstraction can often answer more than a single question. As humans, we tend to like abstractions that answer large ranges of questions rather than small ones. I will call an abstraction ‘powerful’ if it answers a large range of questions. This is obviously a rather loose definition as I’m not defining what I mean by a ‘large range’, but it’s still useful terminology: we can easily see that some abstractions are more powerful than others. The theory of electromagnetism answers a larger range of questions than Ohm’s Law does, for example.
Enter uncertainty
Mathematics is clean and precise and gives us perfect abstractions, but in the real world we have uncertainty. In fact most of the questions we might want to answer about the real world could be classed as ‘prediction questions’.
There are two types of uncertainty to consider: epistemic uncertainty, related to how much we believe the abstraction, and aleatory uncertainty, related to the stochastic nature of the real world.
Both of these make discussing abstractions in general more nuanced than just discussing the perfect abstractions we find in mathematics.
Epistemic Uncertainty
How we decide what to believe depends on our exact epistemology.
For most people however, I think it is fair to say that there are two things that make us believe an abstraction:
1. How much adversarial testing it has undergone
2. Whether we have an explanation for it
The first of these is fairly self-explanatory - we try and ‘break’ the abstraction and if we fail to, despite our best ingenuity, then we trust it more.
The idea of an explanation is more subtle. I will define an explanation for an abstraction as a decomposition of the abstraction into other abstractions in which we already have high epistemic confidence. We can think of the subcomponent abstractions as being composed together in the usual way that we compose functions.
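For those who like to see it spelled out, ‘composed in the usual way that we compose functions’ just means this (a minimal sketch with trivial stand-in functions):

```python
from functools import reduce
from typing import Callable

def compose(*fs: Callable) -> Callable:
    # Ordinary function composition: compose(f, g)(x) == f(g(x)).
    return reduce(lambda f, g: (lambda x: f(g(x))), fs)

# If an abstraction's round trip factors into component steps like this,
# an 'explanation' is such a factorisation into steps we already trust.
double = lambda x: 2 * x
increment = lambda x: x + 1
print(compose(double, increment)(3))  # double(increment(3)) == 8
```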
For example, there is a formula for the period of a pendulum in terms of its length.
Adversarial testing could involve constructing a weird variety of pendulums in real life and measuring their length and period.
An explanation on the other hand involves proving the formula based on the laws of Newtonian mechanics. These are abstractions themselves, in which we have high epistemic confidence, at least at certain scales.
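For reference, the formula in question (for small swings) is T = 2π√(L/g), and the explanation decomposes it into Newtonian mechanics plus some calculus. A sketch:

```python
import math

def pendulum_period(length_m: float, g: float = 9.81) -> float:
    # Small-angle formula T = 2 * pi * sqrt(L / g), derivable from
    # Newton's second law - the component abstraction we already trust.
    return 2 * math.pi * math.sqrt(length_m / g)

print(f"{pendulum_period(1.0):.2f} s")  # a 1 m pendulum: about 2.01 s
```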
So there is a slightly recursive nature to this definition - we need to apply the same process to the laws of Newtonian mechanics to have epistemic confidence in them. Interestingly, one way we do adversarial testing is by building new abstractions out of them and then doing adversarial testing on those new abstractions.
So for me a ‘good explanation’ for an abstraction is one that decomposes it into other ‘simpler’ abstractions in which I have strong epistemic confidence via the same two criteria.
But I also think a ‘really good explanation’ is one for which those component abstractions are not only ones in which we have high epistemic confidence but which are also powerful abstractions.
(As an aside, I think this is compatible with Deutsch’s ‘Hard to Vary’ definition of a ‘good explanation’, but I need to think a bit more about exactly how they relate!)
For real life abstractions, as opposed to mathematical abstractions, we can never have complete epistemic confidence in the abstraction. We don’t know if the laws of physics will break down tomorrow in some unexpected way.
We may also have abstractions which have undergone some adversarial testing but not as much as we might like. So we have some confidence in them but not as high as we might for example have for certain laws of physics.
If an explanation for an abstraction is built out of component abstractions in which we have varying degrees of epistemic confidence, then we need to decide how much confidence that gives us in the abstraction itself.
Figuring out our epistemic confidence based on these things is where ideas from Bayesian epistemology can enter the fray for people who are so inclined.
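As one very crude sketch (assuming, unrealistically, that the component abstractions fail independently and that the explanation needs all of them to hold), our confidence in the whole is at most the product of our confidences in the parts:

```python
from math import prod

# Invented confidence numbers for three component abstractions.
component_confidences = [0.99, 0.95, 0.90]

# Under the (strong) independence assumption, confidence multiplies:
print(prod(component_confidences))  # ~0.846
```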
Finally I want to point out that we can still have high epistemic confidence in explanation-less knowledge. However I think it is human nature (or at least my human nature!) to seek explanations and not just knowledge.
Aleatory uncertainty
In the examples I gave earlier, you may have noticed that for many questions our abstractions don’t answer the original question definitively but with some likelihood. There is uncertainty in real life, in part due to our lack of perfect observational information and in part due to the inherent uncertainty of quantum mechanics.
When we use an abstraction to answer a question, the likelihood of a correct answer that we are happy with may vary.
For some questions, we care about getting as definite an answer as we can get. We don’t want our bridge to fall down. On the other hand, if we were playing poker, for instance, we might be happy with any abstraction that gives us an edge, however slight. Sometimes we might want to know what the likelihood of a correct answer is and other times we may not care so much.
Dealing with such uncertainty is the domain of statistics. Statistics helps us find models that fit probabilistic data, often by making assumptions about the form of a model and then telling us something about the parameters of the model based on our observed data.
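As a tiny sketch of this (a made-up noisy version of the Ohm’s-law circuit from earlier, with the resistance as the model parameter to be estimated from data):

```python
import random

# Synthetic 'observations': current I measured at various voltages V,
# generated from I = V / R with a true R of 470 ohms plus measurement noise.
random.seed(0)
data = [(v, v / 470.0 + random.gauss(0, 0.0002)) for v in range(1, 10)]

# Assume the model form I = V / R and estimate R by least squares.
# For a line through the origin, R_hat = sum(V^2) / sum(V * I).
r_hat = sum(v * v for v, _ in data) / sum(v * i for v, i in data)
print(f"estimated resistance: {r_hat:.0f} ohms")  # should land near 470
```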
There’s a sense in which we can think of all prediction questions as statistical questions, and most of the work on abstractions done elsewhere relates to this aleatory uncertainty.
I expect to come back to talk about this type of uncertainty further at later points in the series. For the moment, I just want to raise it as something that we need to constantly bear in mind when we talk about abstractions.
Next up
In the next post in the series, I am going to discuss whether we can ‘order’ abstractions by how ‘good’ they are and different ways we might do so. In particular, I am going to broach the question of whether some abstractions are more ‘natural’ than others.
I am also going to talk about ‘abstraction optimisation processes’ and how they relate to AI. Abstractions can be thought of, in a sense, as a way to get ‘more intelligence for the same compute power’, so how AI systems end up finding them is very interesting.
Thank you to Dalin Williams for some very helpful feedback on an early draft of some of these ideas and to the rest of my cohort on AGISF for the feedback on my presentation of them.
Comments
This is super interesting - thanks for sharing and writing. I was also wondering what the relationship between an abstraction and an explanation is (as you briefly mentioned). I like how you link the idea of a really good abstraction to being hard to vary.
In The Fabric of Reality, Deutsch starts off by describing better explanations as having more "depth" (akin to "hard to vary" in The Beginning of Infinity, but also alluding to "deeper" explanations as being more fundamental) and more "breadth" (i.e. having "reach" across domains).
Linking this to the way you have described abstractions - I wonder whether your suggestion of abstractions being decomposed into more fundamental abstractions is related to Deutsch's "depth" of explanations. And in parallel, I suppose that abstractions that apply across multiple domains (i.e. what you describe as "powerful") might relate to the "breadth" of explanations.
Let me know what you think.
Also, would you be able to give an example/clarify what you mean here? Really intriguing point but I'm struggling to figure it out: "Finally I want to point out that we can still have high epistemic confidence in explanation-less knowledge. However I think it is human nature (or at least my human nature!) to seek explanations and not just knowledge."
Looking forward to the next part!