Are Agents Decomposable?
Is the Orthogonality Thesis related to whether we can decompose agents?
I’ve been wondering recently whether all ‘agents’ can be decomposed into a 'Prediction Component' and a 'Decision Component', where the Decision Component has a set of actions available to select between (which can include actions that make observations). It can ask questions of the Prediction Component but cannot make predictions itself.
The Decision Component could be the policy of a reinforcement learning agent, it could be based on a utility function, or it could be something else entirely. Some agents might be 'high' in prediction-making ability but have a very simplistic decision-making process (e.g. ChatGPT), while others might have relatively low prediction capability but possibly more complicated decision-making (many biological species).
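To make the decomposition a bit more concrete, here is a minimal sketch in Python of one way the interface might look. Everything in it (the class and method names, the `score` function) is my own illustrative assumption rather than an existing framework; the only point it is meant to capture is that the Decision Component's only access to the world is through the actions it selects and the questions it puts to the Prediction Component.

```python
from typing import Any, Callable, Protocol, Sequence


class PredictionComponent(Protocol):
    """Holds all predictive capability: answers 'what do you predict if...?' queries."""

    def predict(self, question: str, context: Any) -> Any:
        """Return a prediction for the given question in the given context."""
        ...


class DecisionComponent:
    """Selects actions. It can query a predictor but cannot make predictions itself."""

    def __init__(self, actions: Sequence[str], score: Callable[[Any], float]) -> None:
        self.actions = actions  # available actions (may include 'make an observation')
        self.score = score      # how predicted outcomes are valued (policy, utility function, etc.)

    def choose_action(self, predictor: PredictionComponent, context: Any) -> str:
        # Ask the Prediction Component one question per candidate action,
        # then pick the action whose predicted outcome scores highest.
        best_action, best_value = self.actions[0], float("-inf")
        for action in self.actions:
            outcome = predictor.predict(f"If I took action {action!r}, what do you predict?", context)
            value = self.score(outcome)
            if value > best_value:
                best_action, best_value = action, value
        return best_action
```

An RL policy, a utility maximiser, or something much cruder could all sit behind `score`, which is the sense in which the decision side can vary independently of prediction ability.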
Such decomposability has, I think, some relationship to the Orthogonality Thesis: that the goals and intelligence of an AGI system are orthogonal. I think of the Prediction Component as being ‘intelligence’ while the Decision Component is more akin to ‘wisdom’, with ‘intelligence’ and ‘wisdom’ being orthogonal.
It is worth pointing out that, as well as questions along the lines of 'if I took this action, what do you predict about my future observations?', the questions could include formalised versions of things like the following (a rough sketch of what that formalisation might look like appears after the list):
If I take this action to modify this aspect of my prediction process or decision making process, what do you predict will happen?
Of my available set of actions, which do you predict will most increase my future set of actions?
If I ask you to make predictions for this set of prediction questions, what do you predict will happen?
Which predictions should I focus attention on to be most likely to attain a particular goal?
Which actions will improve my prediction making abilities in future?
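As a toy illustration of what 'formalised versions' of these questions could look like, one option is to represent each as a typed query that the Decision Component hands to the Prediction Component. The query names below are invented for illustration; this is a sketch of the idea, not a proposal for a specific formalism.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical query types, one per question above. A Prediction Component
# would accept any of these and return a prediction (or a distribution over outcomes).


@dataclass
class SelfModificationQuery:
    """'If I modify this aspect of my prediction or decision process, what happens?'"""
    component: str      # e.g. "prediction" or "decision"
    modification: str   # description of the proposed change


@dataclass
class OptionValueQuery:
    """'Which of my available actions most increases my future set of actions?'"""
    actions: Sequence[str]


@dataclass
class MetaPredictionQuery:
    """'If I ask you to answer these prediction questions, what do you predict will happen?'"""
    questions: Sequence[str]


@dataclass
class AttentionQuery:
    """'Which predictions should I focus on to attain a particular goal?'"""
    goal: str


@dataclass
class CapabilityImprovementQuery:
    """'Which actions will improve my prediction-making abilities in future?'"""
    actions: Sequence[str]
```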
Of course, life is also made more interesting by the fact that actions could also lead to the creation of new agents. If we think of evolution as an agent, then it created humans; humans have created computers, and so on.
It makes me think that we also can’t think about agents without thinking about the other parts of their decision-making loops. ChatGPT may not have much of a Decision Component by itself, but we have to think of its Decision Component as currently being the humans who use it.
I don’t know whether all agents are decomposable in this way - are there any clear counterexamples? What might a non-decomposable agent look like? Being able to see whether the Decision Component of an agent is changing feels like a core thing we would want interpretability to be able to do, and to do that it feels like we need some way of factoring that component out from prediction capabilities.