
Why put distributions on functions/expressions?

You might well ask, in all this declarative vs. generative modeling discussion, why you would want to put distributions on expressions involving data and parameters.

And the answer I'd give is that sometimes it's more natural. For example, with the roundoff error in the laser example (data rounded to the nearest 1/4 wavelength), the alternative is to write a likelihood on a discrete grid of rounded results. If you know that x is rounded to the nearest 0.25, then its underlying value fell somewhere between -0.125 and +0.125 relative to its reported value. In Stan:

increment_log_prob(log(normal_cdf(x+0.125,mu,sigma)-normal_cdf(x-0.125,mu,sigma)));
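For context, a minimal sketch of a complete model built around that statement might look like the following; the data block with N reported values and the implicit flat priors on mu and sigma are my own assumptions, not part of the original example:

data {
  int<lower=1> N;
  vector[N] x;   // reported values, each rounded to the nearest 0.25
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  for (n in 1:N)
    increment_log_prob(log(normal_cdf(x[n] + 0.125, mu, sigma)
                           - normal_cdf(x[n] - 0.125, mu, sigma)));
}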

But what if you knew not just that it was rounded, but, say, also had a second noisy measurement that gives you additional information about the underlying value for some of the measurements? Now, to write the model above, you have to calculate a weighted integral of the normal_pdf instead of the unweighted one (which is what the normal_cdf difference is).
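To spell out the identity behind that claim (my restatement, not from the original post): the cdf difference is just an unweighted integral of the pdf over the roundoff window,

\mathrm{normal\_cdf}(x+0.125,\mu,\sigma)-\mathrm{normal\_cdf}(x-0.125,\mu,\sigma) = \int_{-0.125}^{0.125}\mathrm{normal\_pdf}(x-\mathrm{err},\mu,\sigma)\,d\mathrm{err}

and once you have extra information about the roundoff error, a non-constant weight appears under that integral, which is what destroys the closed form.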

Alternatively, you could put a prior on the roundoff error and express your probability over the result of subtracting the roundoff error. For example, suppose that in this particular case you know the reported value sits above the underlying value by about 0.08 (but by no more than 0.125):

real<lower=0,upper=0.125> err;   // roundoff error: known to be positive and at most 0.125
err ~ exponential(1/0.08);       // informed prior: the error is typically around 0.08
(x - err) ~ normal(mu, sigma);   // the de-rounded value is what the physics describes
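Assembled into a complete model, a minimal sketch might look like the following; the data block, the use of N observations, and the flat priors on mu and sigma are my own illustrative assumptions (in a real problem the informed prior might apply to only some of the measurements):

data {
  int<lower=1> N;
  vector[N] x;                          // reported values, rounded to the nearest 0.25
}
parameters {
  real mu;
  real<lower=0> sigma;
  vector<lower=0,upper=0.125>[N] err;   // one roundoff error per observation
}
model {
  err ~ exponential(1/0.08);            // informed prior on each roundoff error
  (x - err) ~ normal(mu, sigma);        // the de-rounded values follow the physics
}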

If you marginalize out the variable err by calculating

\int \mathrm{normal\_pdf}(x-\mathrm{err},\mu,\sigma)\, p(\mathrm{err})\, d\mathrm{err}

over the support of p(err), then when

p(\mathrm{err}) = \mathrm{uniform}(-0.125,\,0.125)

you get the above model with the normal_cdf, but if you use the informed prior, you get a model which is, marginally, a weighted integral of the normal_pdf over the range [0, 0.125]. Good luck calculating that in the likelihood: it would require doing numerical integration inside the increment_log_prob statement!
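Concretely, with the exponential prior truncated to [0, 0.125], the marginal likelihood for x works out to the weighted integral (my own write-out; the normalizing constant comes from the truncation)

\int_0^{0.125}\mathrm{normal\_pdf}(x-\mathrm{err},\mu,\sigma)\;\frac{(1/0.08)\,e^{-\mathrm{err}/0.08}}{1-e^{-0.125/0.08}}\;d\mathrm{err}

which has no closed form in terms of normal_cdf and would have to be evaluated numerically.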

But why else would you want to do this? Another reason I can think of is that often, at least in the physical sciences, we can decompose the error into a variety of sources. For example, in the laser molecular measurement example we had two sources: molecular vibrations and roundoff. But in a moderately famous neutrino experiment from a few years back, you had what people thought were faster-than-light neutrinos. It turned out, if I remember correctly, that this was due to an error in the model for the errors! Specifically, if you have a signal traveling through the earth at the speed of light, and then hitting a detector, and the detector is connected to an amplifier, and the amplifier is connected to a signal processor, and the signal processor is connected to a computer network, and the leg-bone is connected to the hip-bone... then at each stage there is some kind of systematic error in the signal which you can model via various functions and prior information. But at the end, you're expecting (though not requiring!) that the signal couldn't have gotten there faster than light.

(signal_speed - my_complicated_error_model)/speed_of_light ~ normal(1,0.00001); // a little remaining statistical noise

or suppose that from previous experiments, you know the noise is a little skewed or something

(signal_speed - my_complicated_error_model)/speed_of_light ~ gamma(appropriate,constants); // a bit non-normal noise
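As a rough sketch of what such a decomposition could look like in Stan: the error components below (err_timing, err_geodesy, err_detector), their priors, and every number in them are hypothetical stand-ins of my own, not the actual error budget of any experiment.

data {
  real signal_speed;       // measured apparent speed of the signal
  real speed_of_light;
}
parameters {
  real err_timing;         // apparent-speed bias from timing electronics (hypothetical)
  real err_geodesy;        // apparent-speed bias from the baseline survey (hypothetical)
  real err_detector;       // apparent-speed bias from detector response (hypothetical)
}
transformed parameters {
  real my_complicated_error_model;
  my_complicated_error_model <- err_timing + err_geodesy + err_detector;
}
model {
  // informed priors from calibration work; the numbers are invented for illustration
  err_timing ~ normal(0, 100);
  err_geodesy ~ normal(0, 50);
  err_detector ~ normal(0, 200);
  // after removing the systematics, what remains should be almost exactly c
  (signal_speed - my_complicated_error_model) / speed_of_light ~ normal(1, 0.00001);
}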

The salient fact is that after removing the systematic errors, the remaining result should be just about exactly the speed of light. If, however, you use a highly informed prior for some portion of the error model, you can wind up overwhelming this distribution and getting samples that predict faster-than-light travel. If your error model is correct, then you've discovered something! And if you are less certain about your error model and express that uncertainty appropriately, then the knowledge that the signal's speed is almost exactly the speed of light will help you discover the true value of that component of the error.

Of course, in most cases you could write these models in a generative way. But we've seen in the example above that the generative model might very well require doing weighted numerical integration in the likelihood function. The distribution implied by the statement above on the signal speed could well be extremely strange, especially if my_complicated_error_model has all sorts of informed priors on the multiple parameters involved in it.

Remember, the distribution of a+b is the convolution of the distribution for a and the distribution for b.

p_{a+b}(x) = \int_{-\infty}^{\infty} p_a(q)\,p_b(x-q)\,dq

Suppose your error model has multiple terms, each of which has an informed prior:

a-(b+c+d+e+f+g) ~ gamma(q,p);

This means that the appropriate convolution of all those informed distributions for b, c, d, e, f, g with the distribution for a is a gamma distribution with some constants q, p you've gotten from previous experiments:

convolution_of_a_b_c_d_e_f_g = gamma_pdf(q,p)

What does that mean for the probability of a? Good luck writing that down in a simple likelihood. OK, so perhaps you can figure it out; the point is that if the only reason you aren't willing to put a distribution on the left-hand side is that it's not generative... then you're wasting a lot of time on a requirement that isn't a real requirement! Much better to simply state the fact that you know, and let Stan find the samples for a.
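For the record, under the generative reading, and assuming b through g are a priori independent, the marginal density of a encoded by that single line would be the nested integral

p(a) = \int\cdots\int \mathrm{gamma\_pdf}\big(a-(b+c+d+e+f+g)\,;\,q,p\big)\,p(b)\,p(c)\,p(d)\,p(e)\,p(f)\,p(g)\;db\,dc\,dd\,de\,df\,dg

which is exactly the computation the declarative statement lets Stan's sampler handle for you.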

