Origins of Mind : 09

challenge
Explain the emergence of sophisticated human activities including referential communication and mindreading.
The challenge is to explain the emergence, in evolution or development, of sophisticated forms of human activity including referential communication and mindreading.
A number of researchers have suggested that meeting this challenge requires us to invoke some kind of social interaction ...
According to what Moll and Tomasello call the Vygotskian Intelligence Hypothesis,

‘participation in … leads children to construct uniquely powerful forms of cognitive representation.’

(Moll & Tomasello 2007)

‘perception, action, and cognition are grounded in

(Knoblich & Sebanz 2006)

‘human cognitive abilities … [are] built upon

(Sinigaglia and Sparaci 2008)

I'm going to assume that they are right.
If we take these ideas seriously, the first question we need to ask is, What kinds of social interaction matter for the emergence of sophisticated human activities?

What kinds of social interaction?  Joint action!

There seems to be some consensus on the idea that joint action is particularly important.
Explain the emergence of sophisticated human activities including referential communication and mindreading.
So the challenge was to explain the emergence of sophisticated human activities including referential communication and mindreading.
conjecture
Joint action plays a role in explaining how sophisticated human activities emerge.
The conjecture I want to consider, borrowed from a variety of researchers, is that joint action plays a role in explaining how sophisticated human activities emerge. There is a compelling objection to this conjecture. It will take me a while to explain what the objection is. The objection arises when we ask what joint action is.

Minimally, an account of joint action should explain what distinguishes joint action from parallel but merely individual action.

What distinguishes joint action from parallel but merely individual action?

A paradigm case of joint action would be two sisters cycling to school together.
By contrast, two strangers cycling side-by-side are performing parallel but merely individual actions.
Or, to take another paradigm case,
when members of a flash mob in the Central Cafe respond to a pre-arranged cue by noisily opening their newspapers, they perform a joint action.
But when someone not part of the mob just happens to noisily open her newspaper in response to the same cue, her action is not part of any joint action.
shared intention

Lots of philosophers and some psychologists think that all joint actions involve shared intention, and even that characterising joint action is fundamentally a matter of characterising shared intention.

‘I take a collective action to involve a collective [shared] intention.’

(Gilbert 2006, p. 5)

‘The sine qua non of collaborative action is a joint goal [shared intention] and a joint commitment’

(Tomasello 2008, p. 181)

‘the key property of joint action lies in its internal component [...] in the participants’ having a “collective” or “shared” intention.’

(Alonso 2009, pp. 444-5)

‘Shared intentionality is the foundation upon which joint action is built.’

(Carpenter 2009, p. 381)

shared intention

It is helpful to draw a parallel with individual action ...
Suppose we ask, Which events are actions? (This is Davidson’s question.) Here the contrast is with things that merely happen to an agent. To illustrate, we might be struck by the contrast between your arm being caused to go up by forces beyond your control, and the action you perform when you raise your arm. Or we might be struck by the contrast between mere reflexes, such as the eyeblink reflex, and the action of blinking your eyes (perhaps to greet someone).
One quite natural and certainly influential way to answer this question is by appeal to intention. The idea is that events are actions in virtue of being appropriately related to an intention of yours.
Note that I’m not confidently endorsing this answer; in fact I’m not even confident that the question is ultimately the right question to ask. I’m just suggesting this is a reasonably straightforward starting point for us.
This question about ordinary, individual action is parallel to our current working question about joint action, which we might phrase as ‘Which events are *joint* actions’ ...
Now we can see one attraction of appealing to shared intention. It allows us to give a parallel answer to the question about joint action: a joint action is an event which is appropriately related to a shared intention.
So to the extent that we are persuaded by the standard account of which events are actions, it is natural to aim for a structurally parallel account of which events are joint actions. To do this we merely have to characterise shared intentions.
As we shall see, there are long-running, deep conflicts over the nature of shared intention. The range of different approaches can be quite daunting. This parallel between intention and shared intention is important because it is a rare point on which almost everyone will agree. \textbf{Despite the disagreement on details, I think one thing almost everyone agrees about is this: shared intention is to joint action at least approximately what ordinary individual intention is to ordinary, individual action.}
It’s important to acknowledge that we haven’t yet said anything very informative about what shared intention is. The question was, Which events are joint actions? The answer was, those which stand in an appropriate relation to a shared intention. Then we ask what a shared intention is. And the answer is, it’s something in virtue of which events are joint actions. I don’t think the circle makes this completely useless; but I’m mentioning the circularity to stress that we don’t yet have an account of what shared intention is. An account of shared intention should provide deep insight into the nature of shared agency.
‘I will … adopt Bratman’s … influential formulation of joint action … each partner needs to intend to perform the joint action together ‘‘in accordance with and because of meshing subplans’’ (p. 338) and this needs to be common knowledge between the participants.’
Carpenter (2009, p. 381)

What is shared intention?

Functional characterisation:

shared intention serves to (a) coordinate activities, (b) coordinate planning and (c) structure bargaining

Constraint:

Inferential integration... and normative integration (e.g. agglomeration)

In making this idea more precise, Bratman proposes sufficient conditions for us to have a shared intention that we J ...

Substantial account:

We have a shared intention that we J if

‘1. (a) I intend that we J and (b) you intend that we J

‘2. I intend that we J in accordance with and because of 1a, 1b, and meshing subplans of 1a and 1b; you intend [likewise] …

‘3. 1 and 2 are common knowledge between us’

(Bratman 1993: View 4)

... the idea is then that an intentional joint action is an action that is appropriately related to a shared intention.
Explain the emergence of sophisticated human activities including referential communication and mindreading.
Joint action plays a role in explaining how sophisticated human activities emerge.

What is shared intention?

Functional characterisation:

shared intention serves to (a) coordinate activities, (b) coordinate planning and (c) structure bargaining

Constraint:

Inferential integration... and normative integration (e.g. agglomeration)

Substantial account:

We have a shared intention that we J if

‘1. (a) I intend that we J and (b) you intend that we J

‘2. I intend that we J in accordance with and because of 1a, 1b, and meshing subplans of 1a and 1b; you intend [likewise] …

‘3. 1 and 2 are common knowledge between us’

(Bratman 1993: View 4)

Note that the conditions require not just that we intend the joint action, but that we intend it because of each other's intentions, where this is common knowledge.
So we need not just intentions about intentions ...
... also you need to know things about my knowledge of your intentions concerning my intentions.
This indicates that, in general, having shared intentions requires mindreading at close to (or perhaps just beyond) the limits of most adult humans' abilities. Bratman's account of shared intention is an example where reciprocity is modeled as higher-order escalation.
Objection: Meeting the sufficient conditions for joint action given by Bratman’s account could not significantly \textit{explain} the development of an understanding of minds because it already \textit{presupposes} too much sophistication in the use of psychological concepts.
And this is a problem for us ...
Explain the emergence of sophisticated human activities including referential communication and mindreading.
Joint action plays a role in explaining how sophisticated human activities emerge.
This is a problem because our conjecture was that joint action plays a role in explaining how sophisticated human activities emerge.
objection
Joint action presupposes mindreading at the limits of human abilities.
But if joint action presupposes mindreading at close to the limits of human abilities, and if mindreading abilities are a paradigm case of humans' cognitive sophistication, then we must reject the conjecture. For in appealing to joint action we would be presupposing what was supposed to be explained. In what follows I want to defend the conjecture by identifying a way around the objection.
But before I do this, I want to mention a problem with the objection ...

Functional characterisation:

shared intention serves to (a) coordinate activities, (b) coordinate planning and (c) structure bargaining

Substantial account:

We have a shared intention that we J if

‘1. (a) I intend that we J and (b) you intend that we J

‘2. I intend that we J in accordance with and because of 1a, 1b, and meshing subplans of 1a and 1b; you intend [likewise] …

‘3. 1 and 2 are common knowledge between us’

(Bratman 1993: View 4)

Note that these conditions are offered as sufficient but not necessary. (Bratman originally claimed that they were necessary and sufficient, but nothing in the construction rules out alternative realisations of the functional characterisation of shared intention.)
As it stands, then, this objection does not establish much. It concerns conditions imposed by the substantial account of shared intention, and these are sufficient but not necessary conditions. The substantial account is supposed to characterise one way, perhaps one among many, in which the functional role of shared intention can be realised. So the objection serves only to raise a question.
\textbf{Are there in fact alternative sufficient conditions for shared intention, conditions that can be met without already having abilities to use psychological concepts whose development was supposed to be explained by joint action?}
The answer to this question is not entirely straightforward. We must begin with the functional roles of shared intention, for these provide necessary conditions. One of the roles of shared intentions is to coordinate planning. What does coordinating planning involve? Intuitively the idea is that just as individual intentions serve to coordinate an individual’s planning over time, so shared intentions coordinate planning between agents. (I use the terms ‘individual intention’ and ‘individual goal’ to refer to intentions and goals explanatory of individual actions; an ‘individual action’ is an action performed by just one agent such as that described by the sentence ‘Ayesha repaired the puncture all by herself’.) A second role for shared intentions is to structure bargaining concerning plans. To understand these roles it is essential to understand what ‘planning’ means in this context. The term ‘planning’ is sometimes used quite broadly to encompass processes involved in low-level control over the execution of sequences of movements, as is often required for manipulating objects manually \citep[e.g.][]{en_1535}, as well as processes controlling the movements of a limb on a single trajectory \citep[e.g.][]{en_1681}. In Bratman’s account and this paper, the term ‘planning’ is used in a narrower sense. Planning in this narrow sense exists to coordinate an agent’s various activities over relatively long intervals of time; it involves practical reasoning and forming intentions which may themselves require further planning, generating a hierarchy of plans and subplans. Paradigm cases include planning a birthday party or planning to move house.
Given the functional roles of shared intention, when (if ever) must the states which realise shared intentions include intentions about others’ intentions? Coordinating plans with others does not seem always or in principle to require specific intentions about others’ intentions. It is plausible that in everyday life some of our plans are coordinated largely thanks to a background of shared preferences, habits and conventions. Consider, for example, people who often meet in a set place at a fixed time of day to discuss research over lunch. These people can coordinate their lunch plans merely by setting a date and following established routine; provided nothing unexpected happens, they seem not to need intentions about each other’s intentions. Within limits, then, coordinating plans may not always require intentions about intentions. The same may hold for structuring bargaining. But when the background of shared preferences, habits and conventions is not sufficient to ensure that our plans will be coordinated, it is necessary to monitor or manipulate others’ plans. And since intentions are the basic elements of plans (in the special sense of ‘plan’ in terms of which Bratman defined shared intention), this means monitoring or manipulating others’ intentions. The background which makes for effortlessly coordinated planning is absent when our aims are sufficiently novel, when the circumstances are sufficiently unusual (as in many emergencies), and when our co-actors are sufficiently unfamiliar. In all of these cases, coordinating plans and structuring bargaining will involve monitoring or manipulating others’ intentions.
Now this does not necessarily involve forming intentions about their intentions because, in principle, monitoring and manipulating others’ intentions could (within limits) be achieved by representing states which serve as proxies for intentions rather than by representing intentions as such, much as one can (within limits) monitor and manipulate others’ visual perceptions by representing their lines of sight. But possession of general abilities to monitor and manipulate others’ intentions does require being able to form intentions about others’ intentions.
The question was whether there are sufficient conditions for shared intention which do not presuppose abilities to use psychological concepts whose development is supposed to be explained by joint action. As promised, the answer is not straightforward. In a limited range of cases, coordinating plans and perhaps structuring bargaining does not appear to require insights into other minds. But in other cases, particularly cases involving novel aims or agents unfamiliar with each other, intentions about others’ intentions are generally required.
The main question for this section was whether Bratman’s account captures a notion of joint action suitable for explaining the early development of children’s abilities to think about minds. Some of the joint actions which young children engage in involve novel aims, and some involve unfamiliar partners. So if these joint actions did involve coordinating planning and structuring bargaining, they could not rest on a shared background but would require abilities to form intentions about others’ intentions. It follows that joint action would presuppose much of the sophistication in the use of psychological concepts whose development it was supposed to explain. So given the premise that joint action plays a role in explaining early developments in understanding minds, it cannot be the case that the joint actions children engage in as soon as they engage in any joint actions involve shared intentions as characterised by Bratman.
Explain the emergence of sophisticated human activities including referential communication and mindreading.
Joint action plays a role in explaining how sophisticated human activities emerge.
Joint action presupposes mindreading at the limits of human abilities.
How to get around the objection?

1. joint action fosters an understanding of minds;

2. all joint action involves shared intention; and

3. a function of shared intention is to coordinate two or more agents’ plans.

The objection arises because these three claims cannot all be true. They are inconsistent because if the second and third were both true, abilities to engage in joint action would presuppose, and so could not significantly foster, an understanding of minds.
What are our options?
Rejecting the third claim is a bad option: it involves either rejecting claims about intention that amount to saying there is no such thing as intention, or else breaking the parallel between intention and shared intention. But that parallel is pretty much all we have to anchor our understanding of shared intention.
The second claim, that all joint action involves shared intention, is the one I will eventually reject. But first let us examine children’s capacities.

## Development of Joint Action: Planning

Objection: ‘Despite the common impression that joint action needs to be dumbed down for infants due to their ‘‘lack of a robust theory of mind’’ ... all the important social-cognitive building blocks for joint action appear to be in place: 1-year-old infants understand quite a bit about others’ goals and intentions and what knowledge they share with others’

Carpenter conflates goals and intentions, so ignores the key difference between actions and plans.

‘I ... adopt Bratman’s (1992) influential formulation of joint action or shared cooperative activity. Bratman argued that in order for an activity to be considered shared or joint each partner needs to intend to perform the joint action together ‘‘in accordance with and because of meshing subplans’’ (p. 338) and this needs to be common knowledge between the participants’

\citep[p.~381]{carpenter:2009_howjoint}.

Carpenter, 2009

So: the objection I just offered to taking Bratman’s account of shared intention and joint action to characterise the notion of joint action of interest in explaining development was narrowly theoretical.
The objection was that you can’t explain the developmental emergence of mindreading by invoking joint action if your account of joint action implies that abilities to perform joint actions presuppose sophisticated mindreading.
Accepting this theoretical objection would be consistent with accepting Carpenter’s claim. The only consequence is that we would have to reject the conjecture that you can explain the developmental emergence of mindreading by invoking joint action.
Explain the emergence of sophisticated human activities including referential communication and mindreading.
Joint action plays a role in explaining how sophisticated human activities emerge.
Joint action presupposes mindreading at the limits of human abilities.
So I think Carpenter is saying: the objection is correct, and we should reject the conjecture.

‘shared intentional agency [i.e. ‘joint action’] consists, at bottom, in interconnected planning agency of the participants’ \citep[p.~11]{Bratman:2011fk}.

Paulus et al, 2016 figure 1

Task: give the tool to another person, who needs to put the spherical end into the box. (Tip: you need to grasp it by the spherical end and pass it so that the other takes the cube-end; they can then insert it optimally.)

Paulus et al, 2016 figure 2B

‘3- and 5-year-old children do not consider another person’s actions in their own action planning (while showing action planning when acting alone on the apparatus).

Seven-year-old children and adults however, demonstrated evidence for joint action planning. ... While adult participants demonstrated the presence of joint action planning from the very first trials onward, this was not the case for the 7-year-old children who improved their performance across trials.’

Paulus et al, 2016 p. 1059

Warneken et al, 2014 figure 1A

‘One child had to insert the turn-tool on the right of the apparatus and then turn so that the metal rod stretching across moved the panel out of the way of the ball. The other person could then insert the push tool on the left, pushing the silver ball into the hole similar to a billiard cue.’ \citep{warneken:2014_young}

Warneken et al, 2014 figure 2

Unidirectional : child A has to select the tool that B doesn’t have.
Bidirectional : child A can select either tool.
‘(a) Unidirectional: The left box will be opened first. Only the left child has a choice. For success, this child has to choose the push tool (lower left: thick handle, long thin top). The partner child has to retrieve the only available turn tool (upper right: thick handle, short thin top).’ \citep{warneken:2014_young}

Warneken et al, 2014 figure 3

BU - first bidirectional then unidirectional.
The three-year-olds are hopeless in all conditions except the bidirectional condition when they have first had the unidirectional condition. So there is no forward planning, but there is some evidence that three-year-olds can take into account what another has done.
‘by age 3 children are able to learn, under certain circumstances, to take account of what a partner is doing in a collaborative problem-solving context. By age 5 they are already quite skillful at attending to and even anticipating a partner’s actions’ \citep[p.~57]{warneken:2014_young}.

What is shared intention?

Functional characterisation:

shared intention serves to (a) coordinate activities, (b) coordinate planning and (c) structure bargaining

Constraint:

Inferential integration... and normative integration (e.g. agglomeration)

Substantial account:

We have a shared intention that we J if

‘1. (a) I intend that we J and (b) you intend that we J

‘2. I intend that we J in accordance with and because of la, lb, and meshing subplans of la and lb; you intend [likewise] …

‘3. 1 and 2 are common knowledge between us’

(Bratman 1993: View 4)

Note that the conditions require not just that we intend the joint action, but that we intend it because of each other's intentions, where this is common knowledge.

Mismatch:

Bratman’s account of joint action

vs

1- to 3-year-olds’ joint action abilities

All the evidence suggests that there is a mismatch between Bratman’s account of joint action and 1- to 3-year-olds’ joint action abilities.
Let’s consider two more studies on this ...
I started with Carpenter’s objection ...

Objection: ‘Despite the common impression that joint action needs to be dumbed down for infants due to their ‘‘lack of a robust theory of mind’’ ... all the important social-cognitive building blocks for joint action appear to be in place: 1-year-old infants understand quite a bit about others’ goals and intentions and what knowledge they share with others’

Carpenter conflates goals and intentions, so ignores the key difference between actions and plans.

‘I ... adopt Bratman’s (1992) influential formulation of joint action or shared cooperative activity. Bratman argued that in order for an activity to be considered shared or joint each partner needs to intend to perform the joint action together ‘‘in accordance with and because of meshing subplans’’ (p. 338) and this needs to be common knowledge between the participants’

Carpenter, 2009

It turns out, I think, that Carpenter is wrong to this extent. Whatever exactly one-year-olds’ mindreading abilities are, they do not seem to be making much use of information about others’ intentions in performing joint actions. And so it is wrong to think of their abilities in terms of Bratman’s account of joint action.

1. joint action fosters an understanding of minds;

2. all joint action involves shared intention; and

3. a function of shared intention is to coordinate two or more agents’ plans.

Explain the emergence of sophisticated human activities including referential communication and mindreading.
Joint action plays a role in explaining how sophisticated human activities emerge.
Joint action presupposes mindreading at the limits of human abilities.
So with respect to our overall line of enquiry, it may still be possible to hold on to the conjecture if we can overcome the objection.

## Development of Joint Action: Years 1-2

No planning ...

... So which joint actions can one- and two-year-olds perform?

4-6 months

Methods: still face; replay (infants detect whether the caregiver reacts, so are less satisfied with a replay).
\citet[p.~196]{brownell:2011_early} comments: ‘infants become progressively tuned to the timing and structure of dyadic exchange’

6-12 months

\citet[p.~197]{brownell:2011_early} comments: ‘adult-infant dyadic interactions expand to include objects, events, and individuals outside of the dyad (Moore and Dunham 1995)’

~ 12-24 months

infants initiate and re-start joint actions

e.g. ‘peek-a-boo; tickle; rhythmic games; chase’

Brownell, 2011

\citet[p.~197]{brownell:2011_early} comments: ‘Eventually, infants begin themselves to initiate joint action with adults and to respond in unique ways when adults violate their expectations for participation in the joint activity. For example, if a parent becomes distracted during peek-a-boo and fails to take her turn, 12-month olds may try to re-start the game by vocalizing to the adult or by re-enacting a well-rehearsed part of the game such as placing the cloth over their own face and waiting. One-year olds also begin to point to interesting sights and events to share their interest and affect and they expect adults to respond appropriately by looking (Liszkowski, et al 2006).’
‘infants learn about cooperation by participating in joint action structured by skilled and knowledgeable interactive partners before they can represent, understand, or generate it themselves. Cooperative joint action develops in the context of dyadic interaction with adults in which the adult initially takes responsibility for and actively structures the joint activity and the infant progressively comes to master the structure, timing, and communications involved in the joint action with the support and guidance of the adult. ... Eager participants from the beginning, it takes approximately 2 years for infants to become autonomous contributors to sustained, goal-directed joint activity as active, collaborative partners’ \citep[p.~200]{brownell:2011_early}.
‘Without the structure and scaffolding provided by the expert adult partner, 1-year-old children are unable to generate and sustain joint action with each other in the service of an external goal. By age two, however, they can do so readily, even with unfamiliar agemates and on novel, unfamiliar tasks’ \citep[p.~204]{brownell:2011_early}.

Warneken and Tomasello, 2007 figure 2 (part)

Ages: 14, 18 and 24 months.
Elevator task: free an object from a cylinder. Two roles. Role A: position yourself in the right location to retrieve the target object. Role B: push up the cylinder and hold it in place while another retrieves it.

Warneken and Tomasello, 2007 figure 3 (part)

‘The 14-month-olds of this study displayed coordinated behaviors in the elevator task Role A of positioning themselves in the right location and retrieving the target object from the cylinder when the partner pushed it up, but they had major problems performing Role B, pushing the cylinder up and holding it in place until the partner could fetch the object. If they pushed up the cylinder at all, they would repeatedly drop it when the other person was just about to take the object out’ \citep{warneken:2007_helping}.
\citet[p.~200]{brownell:2011_early} comments: ‘Across these non-routine tasks, 18-month olds’ behavior with the adult partner was rated as predominantly “uncoordinated” (vs. “coordinated” or “very coordinated”) and the children exhibited “low” cooperative engagement (vs. “medium” or “high”). On those tasks requiring children to anticipate the partner’s actions and to adjust their behavior accordingly, 18-month olds’ performance did not differ from chance. By age two, children operated at “medium” levels of cooperative engagement and were above chance in anticipating and coordinating their behavior with the adult.’
‘social coordinations show a marked improvement between children at 14 and 18 months of age’ \citep{warneken:2007_helping}.

Joint Action in Years 1-2

In the first and second years of life,

there is joint action

but it does not appear to involve planning agency

or shared intention.

Bratman’s account does not characterise

the sort of joint actions

infants perform in the first and second years of life.

Two-year-olds perform some joint actions but not others.
What distinguishes the joint actions they can perform from those they cannot?

## Collective Goals vs Shared Intentions

Explain the emergence of sophisticated human activities including referential communication and mindreading.
Joint action plays a role in explaining how sophisticated human activities emerge.
Joint action presupposes mindreading at the limits of human abilities.
How to get around the objection? Maybe we have to construct an alternative notion of joint action?

‘all sorts of joint activity is possible without conscious goal representations, complex reasoning, and advanced self-other understanding ... both in other species and in our own joint behavior as adults, some of which occurs outside of reflective awareness ... In studying its development in children the problem is how to characterize and differentiate primitive, lower levels of joint action operationally from more complex and cognitively sophisticated forms’ \citep[p.~195]{brownell:2011_early}.
Ayesha takes a glass and holds it up while Beatrice pours prosecco; unfortunately the prosecco misses the glass and soaks Zach’s trousers.
Here are two sentences, both true:

The tiny drops fell from the bottle.

- distributive

The tiny drops soaked Zach’s trousers.

- collective

The first sentence is naturally read *distributively*; that is, as specifying something that each drop did individually. Perhaps first drop one fell, then another fell.
But the second sentence is naturally read *collectively*. No one drop soaked Zach’s trousers; rather the soaking was something that the drops accomplished together.
If the sentence is true on this reading, the tiny drops' soaking Zach’s trousers is not a matter of each drop soaking Zach’s trousers.
Now consider an example involving actions and their outcomes:

Their thoughtless actions soaked Zach’s trousers. [causal]

- ambiguous

This sentence can be read in two ways, distributively or collectively. We can imagine that we are talking about a sequence of actions done over a period of time, each of which soaked Zach’s trousers. In this case the outcome, soaking Zach’s trousers, is an outcome of each action.
Alternatively we can imagine several actions which have this outcome collectively---as in our illustration where Ayesha holds a glass while Beatrice pours. In this case the outcome, soaking Zach’s trousers, is not necessarily an outcome of any of the individual actions but it is an outcome of all of them taken together. That is, it is a collective outcome.
(Here I'm ignoring complications associated with the possibility that some of the actions collectively soaked Zach’s trousers while others did so distributively.)
Note that there is a genuine ambiguity here. To see this, ask yourself how many times Zach’s trousers were soaked. On the distributive reading they were soaked at least as many times as there are actions. On the collective reading they were not necessarily soaked more than once. (On the distributive reading there are several outcomes of the same type and each action has a different token outcome of this type; on the collective reading there is a single token outcome which is the outcome of two or more actions.)
Conclusion so far: two or more actions involving multiple agents can have outcomes distributively or collectively. This is not just a matter of words; there is a difference in the relation between the actions and the outcome.
Now consider one last sentence:

The goal of their actions was to fill Zach’s glass. [teleological]

Whereas the previous sentence was causal, and so concerned an actual outcome of some actions, this sentence is teleological, and so concerns an outcome to which actions are directed.

- also ambiguous

Like the previous sentence, this sentence has both distributive and collective readings. On the distributive reading, each of their actions was directed to an outcome, namely filling Zach’s glass. So there were as many attempts to fill his glass as there are actions. On the collective reading, by contrast, it is not necessary that any of the actions considered individually was directed to this outcome; rather the actions were collectively directed to this outcome.
Conclusion so far: two or more actions involving multiple agents can be collectively directed to an outcome.
Where two or more actions are collectively directed to an outcome, we will say that this outcome is a *collective goal* of the actions. Note two things. First, this definition involves no assumptions about the intentions or other mental states of the agents. Relatedly, it is the actions rather than the agents which have a collective goal. Second, a collective goal is just an actual or possible outcome of an action.
An outcome is a \emph{collective goal} of two or more actions involving multiple agents if it is an outcome to which those actions are collectively directed \citep{butterfill:2016_minimal}.
We can provide a definition of joint action that incorporates the notion of a collective goal ...

Joint action:

An event involving two or more agents where the agents’ actions have a collective goal.

Is this good enough? I’m not sure it is. But note that it is agnostic about mechanisms ... Our acting on a shared intention is one way for our actions to have a collective goal; but maybe there are others ...

In virtue of what do actions involving multiple agents ever have collective goals?

Recall how Ayesha takes a glass and holds it up while Beatrice pours prosecco; and unfortunately the prosecco misses the glass, soaking Zach’s trousers. Ayesha might say, truthfully, ‘The collective goal of our actions was not to soak Zach's trousers in sparkling wine but only to fill this glass.’ What could make Ayesha’s statement true?
As this illustrates, some actions involving multiple agents are purposive in the sense that, among all their actual and possible consequences, there is an outcome to which they are directed, and the actions are collectively directed to this outcome: it is not just a matter of each individual action being directed to it.
In such cases we can say that the actions have a collective goal.
As what Ayesha and Beatrice are doing---filling a glass together---is a paradigm case of joint action, it might seem natural to answer the question by invoking a notion of shared (or `collective') intention. Suppose Ayesha and Beatrice have a shared intention that they fill the glass. Then, on many accounts of shared intention,
the shared intention involves each of them intending that they, Ayesha and Beatrice, fill the glass; or each of them being in some other state which picks out this outcome.
The shared intention also provides for the coordination of their actions (so that, for example, Beatrice doesn't start pouring until Ayesha is holding the glass under the bottle). And coordination of this type would normally facilitate occurrences of the type of outcome intended. In this way, invoking a notion of shared intention provides one answer to our question about what it is for some actions to be collectively directed to an outcome.
Are there also ways of answering the question which involve psychological structures other than shared intention? In this paper we shall draw on recent discoveries about how multiple agents coordinate their actions to argue that the collective directedness of some actions to an outcome can be explained in terms of a particular interagential structure of motor representations. Our actions having collective goals is not always only a matter of what we intend: sometimes it constitutively involves motor representation.

Shared Goals

Functional role: coordinate actions

Our actions do, or will, have a collective goal, G, because:

(i) We each expect the other(s) to perform an action directed to G.

(ii) We each expect that if G occurs, it will occur as a common effect of all of our actions.

For us to have a \emph{shared goal} $G$ is for $G$ to be a collective goal of our present or future actions in virtue of the facts that: \begin{enumerate} \item We each expect the other(s) to perform an action directed to G. \item We each expect that if G occurs, it will occur as a common effect of all of our actions. \end{enumerate} (Compare \citealp{Butterfill:2011fk,vesper_minimal_2010}.)
Explain the emergence of sophisticated human activities including referential communication and mindreading.
Joint action plays a role in explaining how sophisticated human activities emerge.
Joint action presupposes mindreading at the limits of human abilities.
The objection will seem unanswerable if we assume that all joint actions involve shared intention and that Bratman’s account of shared intention is correct.
But I’ve argued that the evidence suggests that children in the second and third years of life are not in the business of coordinating plans, so Bratman’s account of shared intention does not characterise the way they understand the joint activities that they participate in.
For this reason we should not accept that all joint actions involve shared intention and that Bratman’s account of shared intention is correct.
Instead, I think we can allow that there are joint actions which do not involve shared intentions but instead involve shared goals. It is these joint actions that we will need to appeal to in explaining the emergence of mindreading and referential communication.

How?

Joint action explains the emergence of referential communication.

Our next (and last) question ...
‘the basic skills and motivations for shared intentionality typically emerge at around the first birthday from the interaction of two developmental trajectories, each representing an evolutionary adaptation from some different point in time. The first trajectory is a general primate (or perhaps great ape) line of development for understanding intentional action and perception, which evolved in the context of primates’ crucially important competitive interactions with one another over food, mates, and other resources (Machiavellian intelligence; Byrne & Whiten, 1988). The second trajectory is a uniquely human line of development for sharing psychological states with others, which seems to be present in nascent form from very early in human ontogeny as infants share emotional states with others in turn-taking sequences (Trevarthen, 1979). The interaction of these two lines of development creates, at around 1 year of age, skills and motivations for sharing psychological states with others in fairly local social interactions, and then later skills and motivations for reacting to and even internalizing various kinds of social norms, collective beliefs, and cultural institutions’ \citep[p.~124]{Tomasello:2007gl}.