Suppose that this is how a given casino’s 10-cent slot machine works: it has a random number generator which produces a string of numbers between 1 and 1000, given a seed value. Pulls of the lever are put into correspondence, chronologically, with this randomly-generated string. If a lever pull matches a certain designated number, say, 222, then that lever pull gets a payout of $90. Here’s a proposition about these slot machines:
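For concreteness, here is a minimal sketch of the mechanism just described (the function names, the use of Python’s built-in Mersenne Twister as a stand-in for “the random number generator,” and the 1-based pull indexing are my assumptions, not part of the setup):

```python
import random

PAYOUT_NUMBER = 222  # the designated winning number
PAYOUT = 90.0        # dollars paid out on a match

def pull_sequence(seed, n_pulls):
    """The machine's first n_pulls outcomes, as fixed by the seed.

    Pulls are matched, in chronological order, against a seeded
    stream of numbers drawn uniformly from 1..1000.
    """
    rng = random.Random(seed)
    return [rng.randint(1, 1000) for _ in range(n_pulls)]

def pays_out(seed, pull_index):
    """True iff the machine pays out on the given (1-based) pull."""
    return pull_sequence(seed, pull_index)[-1] == PAYOUT_NUMBER
```

Given the seed, every pull is fixed in advance; across seeds, a payout on any given pull is roughly a 1-in-1000 event.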

A) The objective chance that the slot machine pays out, on any given pull, is 1/1000.

It’s true that, if we were to know the value of the seed and the nature of the random number generator, then we could figure out precisely when the machine will pay out. But, given determinism, precisely the same thing is true of any coin flip or die roll. Were we to know the precise microphysical initial conditions of the coin flip and the laws of nature, we could figure out whether the coin will land heads or tails. This is no obstacle to there being an objective chance associated with an event – it only tells us that a precise specification of the microphysical initial conditions is inadmissible information. Similarly, the seed-value and the method of random number generation is inadmissible information when it comes to the slot machine. But this doesn’t mean that there isn’t an objective chance that the slot machine pays out, on any given pull.

Here’s another proposition:

B) The objective chance that any given roll of a fair, six-faced die lands 1-up is 1/6.

This should be beyond reproach.

Finally, consider this proposition:

C) If there is a robust causal law to the effect that events of type A cause all and only events of type B — so that every A event leads to a B event, and no B event is caused by anything other than an A event — then the objective chance of an A event occurring is equal to the objective chance of a B event occurring.

Besides being intuitively plausible, I take (C) to be one of the central claims underlying the Bayes-Net approach to testing causal hypotheses. When we model causation and objective chance in the way specified by Pearl’s and Spirtes et al.’s causal models, we allow the causal laws codified in the structural equations to induce a probability function over the endogenous variables. If (C) were false, then this would be illegitimate.

The Puzzle is that (A), (B), and (C) are inconsistent, as the following story demonstrates.

Suppose that the casino owners want to know the seed value for their slot machine. They want, that is, inadmissible information that will let them calculate, ahead of time, what their bottom line will look like after a certain number of pulls of the slot machine. However, while protective of their bottom line, they aren’t unscrupulous. They don’t want to plant the seed; they just want to know what it is. So, here’s what they do: they produce 6 randomly-selected seed values, using standard techniques (e.g., clipping the last three digits from a 10-digit decimal expansion of an arbitrarily selected time). Then, they roll a die to determine which of these seed values will go into the slot machine.

Suppose that it’s true that, if the first seed is selected, then the slot machine will pay out on the 1001st pull of the lever. If any of the other seeds are selected, then the slot machine will not pay out on the 1001st pull. Then, there is a robust causal law asserting the following: The slot machine will pay out on the 1001st pull if and only if the die landed 1-up.
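The setup can be mocked up as follows. This is a sketch under assumptions: Python’s seeded RNG stands in for the machine’s generator, and `find_story_seeds` is a hypothetical helper that searches for six seeds fitting the stipulation that exactly the first one pays out on the 1001st pull:

```python
import random

def select_seed(candidate_seeds, die_roll):
    """The casino's procedure: a fair die roll picks one of six seeds."""
    return candidate_seeds[die_roll - 1]

def pays_out_on_pull(seed, pull_index, winning_number=222):
    """True iff pull number `pull_index` (1-based) matches the winner."""
    rng = random.Random(seed)
    draws = [rng.randint(1, 1000) for _ in range(pull_index)]
    return draws[-1] == winning_number

def find_story_seeds():
    """Search for six seeds fitting the story: exactly the first
    one makes the 1001st pull pay out."""
    winners = (s for s in range(10**6) if pays_out_on_pull(s, 1001))
    losers = (s for s in range(10**6) if not pays_out_on_pull(s, 1001))
    return [next(winners)] + [next(losers) for _ in range(5)]
```

With six such seeds in hand, whether the 1001st pull pays out is settled entirely by whether the die lands 1-up.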

If (B) is true, then the objective chance of the die landing 1-up is 1/6. But then, if (C) is true, then the objective chance of the machine paying out on the 1001st pull is 1/6 — since there is a robust causal law saying that the die lands 1-up if and only if the 1001st pull pays out. By (C), the objective chance of the cause must be equal to the objective chance of the effect. So the objective chance of the machine paying out on the 1001st pull must be 1/6. But this contradicts (A), which says that the objective chance of the machine paying out on any given pull is 1/1000, not 1/6.

It’s true, of course, that both the die roll and the causal law involve all sorts of inadmissible information. But inadmissible information is only relevant to the question of what our credence should be. The puzzle, as I’ve formulated it, has absolutely nothing to do with credence. It has to do only with the objective chance function, and the connection between the objective chances of various events which are related by robust causal laws.

To get a contradiction, you need to make assumptions about how many argument places the objective chance function has. I think the right assumptions won’t lead to contradiction.

Here’s one picture: objective chance is a function from world/time/proposition triples to values between 0 and 1. (You don’t say anything about chances being time-relative, but most people make that assumption.) If you set things up this way, I think you really do get trouble: let w be the world of the example, specify a time t before the die is rolled, and we get two incompatible answers to the question: “what is the value of the objective chance function for the argument (w, t, the machine pays out)?”

But there are independent reasons to think the objective chance function shouldn’t be thought of this way. Especially if we agree that non-trivial chances are possible given determinism, it’s plausible to say that propositions only have chances relative to some “grounds.” We might talk about the chance of P at t relative to all the microphysical facts (maybe these chances are all 0 or 1 if determinism is true), relative to just the macrophysical facts, relative to some subset of the macrophysical facts (e.g., facts about the pressure and volume of some particular gas), etc. Once we add this extra argument place to the objective chance function, we can hold that the argument for a contradiction turns on an equivocation. The chance is 1/1000 relative to one way of filling in the extra argument place, 1/6 relative to another. (Incidentally, once we do this it’s actually natural to drop the time argument place, and build it into the grounds.)
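To make the picture concrete, here is a toy, grounds-relative chance function for the example (a bare lookup, not a theory of chance; the labels for the proposition and the two sets of grounds are purely illustrative). The same proposition gets different chances relative to different grounds, with no inconsistency anywhere:

```python
def chance(proposition, grounds):
    """A toy, grounds-relative chance function for the example.

    This is a bare lookup table, not a theory: the proposition and
    grounds labels are purely illustrative stand-ins.
    """
    table = {
        ("machine pays out on pull 1001", "slot-machine macrofacts"): 1/1000,
        ("machine pays out on pull 1001", "die-roll setup macrofacts"): 1/6,
    }
    return table[(proposition, grounds)]
```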

A lot of people hold that chance is relative to more than just worlds, times, and propositions. For one example, see Chris Meacham’s paper “Two Mistakes Regarding the Principal Principle,” available here: http://people.umass.edu/cmeacham/Meacham.Two.Mistakes.pdf

Daniel,

I think your diagnosis is dead-on. But I still think there’s something to worry about. In applied statistical work, it seems to me like the extra argument – the ‘grounds’ – will be characterized by the appropriate reference class for an individual. So, if we want to empirically investigate whether smoking affects the chance that I develop lung cancer by the time I’m 50, the proper reference class to look at is (something like) 20-something white males, and not 20-somethings named Dmitri who live in a house with an odd-numbered street address.

In that guise, here’s the puzzle: what makes us think that, for any arbitrary causal graph, the appropriate reference class for a given variable is going to be the same as the appropriate reference class for its causal descendants?

“Especially if we agree that non-trivial chances are possible given determinism, it’s plausible to say that propositions only have chances relative to some “grounds.””

I think Dan has it right. In particular, I totally agree with the above conditional. But let me take things another step.

The consequent of Dan’s conditional commits us to relative objective probabilities (ROPs). But these don’t seem—to me anyway—like what we mean when we talk about objective probability. To see why, consider the role that objective probability plays in our reasoning. One big part of that role—the fundamental part?—is the way that the notion of objective probability figures in our practical reasoning. As Newcomb’s problem shows, when we reason about what to do, we don’t (shouldn’t) think about the subjective conditional probabilities of certain outcomes given certain actions. What we do (should) think about is the subjective probabilities of the objective conditional probabilities of certain outcomes given certain actions. But if there are ROPs, which ROP (should) figure in our practical reasoning? There’s no obvious answer here… except of course the answer that ROPs are not what we’re interested in when we’re interested in objective probability.

So we should accept Dan’s conditional and then modus tollens: there are no (non-trivial) objective probabilities in a deterministic world.

Dustin,

I should stress that my primary interest in this puzzle is its relation to the role that (C) plays in the kinds of empirical methods that authors like Judea Pearl and Spirtes, Glymour, and Scheines are designing for use by practicing researchers in fields like economics, sociology, and epidemiology.

Now, if you’re right, then the right thing to say is not that, since (C) is o.k., these statistical methods are fit for use by practicing researchers. If you’re right, then the right thing to say is that statistics as a field is doomed from the get-go. After all, there are deterministic formulations of quantum mechanics which save all the phenomena. So it’s at least compatible with all our experience of the world that, on your view, the objective chance of every event is either 0 or 1. But the kinds of sample distributions we use to estimate the objective chances of various events are almost never 0 or 1. Conclusion: sample populations are not a good guide to objective probability. Conclusion: statistics is a dead field. So, on your view, the statistical methods of Pearl and the CMU crowd are not threatened by the falsity of (C), but they’re invalid methods of reasoning nonetheless. Even though there’s a tight connection between causation and objective probability, there’s no connection at all between statistical data and objective probability, so we can’t use statistical data to draw conclusions about the causal structure of the world.

(Also, thanks very much for your comments.)

I’m having a really hard time seeing how the puzzle has anything at all to do with causation. Isn’t the problem here just that (A) and (B) look incompatible? On one characterization, the chance of the machine paying out is 1/1000; on another, it is 1/6. Am I missing something?

Maybe you could make things clearer for me by telling me what the variables are, what the causal structure is, and what the statistical units are.

As a side note, I’m not convinced that (C) is required by Bayes nets accounts of causation. How does the falsity of (C) prevent causal structures from inducing probabilities?

Jonathan,

A structural equation is, I take it, just a kind of causal law to the effect that the value of X will be x if and only if the values of its (endogenous and exogenous) parents, Pa(X), are pa(x) (whatever values get mapped to x by X’s structural equation).

When we use this structural equation and the probability functions of Pa(X) to induce a probability function over X, we’re implicitly assuming something like (C). We’re implicitly assuming that if pa(x) always leads to x, and x is not caused by anything other than pa(x), then the probability of pa(x) must be the probability of x.
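The induction step can be made concrete: push the parents’ distribution through the deterministic structural equation, summing probability over preimages. A sketch (the function names are mine):

```python
from collections import defaultdict

def induce_distribution(parent_dist, f):
    """Push a parent distribution through a deterministic structural
    equation X = f(pa), summing probability over preimages.

    parent_dist: dict mapping parent values pa -> Pr(Pa = pa)
    f: the structural equation, mapping pa -> x
    """
    child_dist = defaultdict(float)
    for pa, p in parent_dist.items():
        child_dist[f(pa)] += p
    return dict(child_dist)

# The die/slot-machine case: D uniform on 1..6, S = 1 iff D = 1.
die = {d: 1/6 for d in range(1, 7)}
slot = induce_distribution(die, lambda d: 1 if d == 1 else 0)
# slot[1] is 1/6 and slot[0] is 5/6, just as (C) requires
```

Notice that the induced probability of x is exactly the probability of its preimage pa(x), which is the content of (C) in this deterministic setting.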

We can’t get a contradiction from (A) and (B) alone because it could just be that, even though the chance of the die landing 1 is 1/6 and the chance of the slot machine paying out is 1/1000, there’s just no interesting connection between the chances of these two events. In particular, their being *causally* related needn’t make it the case that the chances have to be related. I need (C) to get the contradiction because (C) is what gives me the interesting relationship between causation and chance.

Sorry to be a pain here, but I’m just not understanding your story, yet. Again, could you tell me what the statistical units, variables, and causal structure are supposed to be?

With respect to (C), I agree about the following. Suppose that

X is the set of all parents of some variable Y, and suppose that the variables in X take on the values x, denoted X=x. If the relationship between the variables in X and the variable Y is deterministic, so that whenever X=x, Y=y for some unique value y, then Pr{X=x} = Pr{Y=y}.

But I deny that Bayes nets require deterministic relationships like this. It could be the case that Y has some associated ineliminable error term such that whenever X=x, Y=y+e, where e is a value taken at random from some distribution. We still get a probability distribution over values of Y in this case, and that distribution is (partially) induced by the distribution on Y’s parents.

I don’t think this matters to your specific case, but I do think it is worth being clear about what assumptions causal modeling (in the Pearl/SGS tradition) does and does not require.

Jonathan,

That’s why I wrote “(exogenous and endogenous) parents” above. If the error term, or exogenous variable, e is “taken at random from some distribution”, then it must *have* a distribution, and that distribution is part of what goes into inducing a probability distribution over Y. If we don’t specify the distribution for e, then Y=y won’t have a well-defined probability.

Here’s a way of spelling things out: D is the value of the die roll, D \in {1…6}, and S is a binary variable which is 1 if the slot machine pays out on the 1001st pull and 0 otherwise. Then, the following structural equation is true: S = f(D), where f(.) is a function which takes 1 to 1 and every other value to 0. So that would make the causal structure this: D → S. If we model this with a Bayes Net, then we should have that P(S=1) = P(D=1), where P(.) is the objective probability function.

“Here’s a way of spelling things out: D is the value of the die roll, D \in {1…6}, and S is a binary variable which is 1 if the slot machine pays out on the 1001st pull and 0 otherwise. Then, the following structural equation is true: S = f(D), where f(.) is a function which takes 1 to 1 and every other value to 0. So that would make the causal structure this: D → S. If we model this with a Bayes Net, then we should have that P(S=1) = P(D=1), where P(.) is the objective probability function.”

Great! Now, what are the statistical units over which S and D are defined? That is, S and D are random variables, so they are measurable functions from the universe of discourse U, a collection of units, into the real numbers. What are the elements of U in this case?

“If the error term, or exogenous variable, e is “taken at random from some distribution”, then it must *have* a distribution, and that distribution is part of what goes into inducing a probability distribution over Y.”

The way I framed it, there is no variable E that appears in the *causal* Bayes net over the variables X and Y. The variable E in the indeterministic case is a dummy variable, and it is not among the parents (either endogenous or exogenous) of Y in the causal Bayes net.

Jonathan,

I mean this to be a case of singular causation. So the statistical units are just the possible outcomes of that die roll {1-up, 2-up,…, 6-up} and the possible states of that pull of the slot machine’s lever {payout, no payout}. I think that it makes sense to talk about the chances of one-shot events that are never repeated, so I think it makes sense to talk about the chance that *that* die roll lands 1-up, and I think that it makes sense to talk about the chance that *that* pull of the lever results in a payout.

I recognize that if we were to have a large population of casinos, all of which started with the same seeds and all of which picked a seed by rolling a die, then 1/6th of those casinos’ slot machines would pay out on the 1001st pull. (It’s also true that, if I didn’t have every casino start out with the same 6 seeds, then there wouldn’t be a general causal law relating the value of S to the value of D.) But imagine that things aren’t like that, and this slot machine is unique. Presumably, we can still use Bayes nets to model singular causal relations, and we can use statistical data to estimate the chances associated with the variables in our graph. It seems to me interesting that, in this case, were we to do that, we wouldn’t be able to estimate the objective chance that S=1 by just sampling all of the slot machines – if we did that, then we’d be led to the estimate that P(S=1) is about 1/1000, and not 1/6. So we’d get a violation of the Markov condition. In order to get things to work, we’d need to focus on only those machines for which there had been some similar set-up involving the same six seeds and the rolling of the die (if there are any others).
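The estimation point can be illustrated with a toy Monte Carlo sketch (the sample size and the seeding are mine; the two loops just encode the two reference classes described above):

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

N = 200_000

# Reference class 1: all 10-cent slot machines, with arbitrary seeds.
# Across this class, a payout on the 1001st pull is a 1-in-1000 event.
broad = sum(rng.random() < 1/1000 for _ in range(N)) / N

# Reference class 2: machines with the six-seed, die-roll set-up,
# where exactly one of the six seeds makes the 1001st pull pay out.
narrow = sum(rng.randint(1, 6) == 1 for _ in range(N)) / N
```

Sampling the broad class estimates P(S=1) at about 1/1000; only the narrow class recovers the 1/6 that the causal law demands.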

The general lesson I take from the puzzle is just that the particular causal graph we’re looking at can make a difference to which reference class is appropriate. If we were looking at a different causal graph, one which included the very same slot machine, and therefore, the very same variable, S, but lacked the variable D, then the appropriate reference class for S might just be the class of all 10-cent slot machines, and not the class of all 10-cent slot machines with the wonky die-roll seed selection process in their history.

Another way of putting the same point: If we’re to be sure that we’re not looking at the wrong reference class, then we need to be sure that, for every sampled individual, the values of the (corresponding) variables for *that* individual are related by the same structural equations. (Not just that they are related by *some* structural equations, but that they are related by the *very same* structural equations — after all, if we allowed the seeds to vary across die rolls, then there would still be *some* structural equation relating each of the (corresponding) variables S and D for each casino. They just wouldn’t be the *same* structural equations.) And that looks, on its face, like a too stringent requirement.

This is unrelated, and it takes us astray, but I’m interested because I’m writing a paper on indeterminism and the Markov condition: I take it that the dummy variable approach comes from Daniel Steel’s BJPS article. I agree that we can model every case of indeterministic causation that way (I’m not sure that the converse is true, that every model like that corresponds to a case of genuinely indeterministic causation), but I think that that way of modeling things can be misleading (and, indeed, Steel is misled). In particular, I think we have to be careful to state the causal Markov condition properly as the claim that if the *causal* exogenous variables are independent and the graph is acyclic, then a variable’s causal parents will screen it off from all variables in the graph except its descendants – and not present it (as Steel does) as the claim that if *all* exogenous variables (including the dummy ones) are independent and the graph is acyclic, then a variable’s causal parents will screen it off from all variables in the graph except its descendants.

Dmitri,

I think we may be talking past one another (or maybe it’s all my fault) with respect to the main point. I am hoping to write up a blog post of my own on this in the next day or two — mostly so that I can put up some drawings. When I do, I’ll let you know. The short form is that I do not think there is anything paradoxical here, precisely because the reference classes (that is, the universe of discourse, or set of statistical units) have not been specified for the variables in the model. The statistical units are not necessarily the same as the values available for the variables in the model, and in this case, they are not the same. (In general, a random variable is a function from units to values, so we really should write, e.g., X(u)=x, to denote that the variable X takes the value x for the unit u.) On a minimal description, the units are at least ordered pairs <d,s>, where d is an allowed value for your variable D and s is an allowed value for your variable S. But once you set the universe of discourse, the probabilities fall out in a consistent way. The important issue here (for me, at least) is that a causal model is usually understood as intra-unit, not inter-unit. Hence, you only have a properly specified model if all of the variables range over the same set of units.

I have serious reservations about the singularity of the model. Especially this remark: “But imagine that things aren’t like that, and this slot machine is unique. Presumably, we can still use bayes nets to model singular causal relations and we can use statistical data to estimate the chances associated with the variables in our graph.” If the case is really unique — really singular — then there are no statistics available. You can’t do statistics with a single datum. So, for singular cases, even if objective chance makes sense metaphysically or logically, it is not something one could estimate statistically.

As to the side issue. No, I am not talking about Steel. I think the point I’m making is just well-known among causal modelers in the Pearl/SGS tradition. But if you want an authority, I refer you to Glymour (2010) “What’s right with Bayes nets …” BJPS 61, 161-211. The paper is essentially a lengthy, brutal review of Cartwright’s *Hunting Causes*. The relevant portion of the paper is on page 175:

“The fact that the Markov Condition holds necessarily for deterministic systems and for the resulting systems when unit outdegree and zero indegree variables are marginalized out, does *not* mean that the Markov Condition requires an underlying deterministic system. Any probability distribution on categorical variables satisfying the Markov Condition for a DAG can be decomposed so that each variable is a deterministic function of its parents and of a new disturbance variable that bears a probability distribution, but no such interpretation is required. Whether the variables are continuous or categorical or some of each, systems are perfectly possible for which there are no real error terms, but the joint probability distribution satisfies the Markov Condition.” [Emphasis added.]

Anyway, I agree with you that Steel states the Markov condition incorrectly.

Nuts. I forgot about html rules for a moment there …

My comment has a missing ordered pair in it. It should read, “On a minimal description, the units are at least ordered pairs <d,s>, where d is an allowed value for D and s is an allowed value for S.”

Dmitri,

I have a post up here discussing your puzzle. Let me know if you think I’m making a mistake or (especially) if you think I am misrepresenting you.

Hey Dmitri,

Late to the game here and too lazy to do more than glance at earlier comments but (1) it sounds like you’ve set up a deterministic world where you want objective single-case probabilities to take values between 0 and 1, which makes me think you should throw out the “objective” bit and (2) if you don’t already know it, you should look at Railton’s 1978 article “A Deductive Nomological Model of Probabilistic Explanation” in Philosophy of Science, vol 45, no 2, pp 206-226, especially section 5, “Epistemic Relativity and Maximal Specificity Disowned”–you’ll have to completely disagree with him about what objective single-case probabilities are. (A and B both seem false on his view, which I take to be standard if only because it’s the only one I know.)

Best,

j