Suppose there are physics labs around the world that want to find out the length of a special unobtanium molecule. They have a variety of special laser molecular traps which can read out the length of the molecule to the nearest 1/4 wavelength of the laser light. Unfortunately, they all have different lasers with different wavelengths that aren't known exactly. Fortunately for us, one lab has extra money and has a carefully measured and tuned laser; unfortunately, they can only give us a few samples of the molecular length (otherwise, we could just as well ignore everyone else, at least as far as obtaining the average length goes).
One of the reasons to do this experiment is to find out the length of the unobtanium, but the other reason is to calibrate the other labs' measurement apparatuses for future use. So we'd like to do both!
Furthermore, suppose one of the labs, while initially getting things set up, did a lot of measurements of the molecule with a highly out-of-tune laser (so the actual wavelength is very uncertain) and found that the molecule vibrates in such a way that any given measurement of the length is distributed around the average length with a root-mean-squared error of 0.1294795 when measured as a fraction of the mean length (they have no calibration data on their laser, so they can't give a measurement in meters!). Unfortunately, they didn't think to keep the measurements; they just calculated the mean, mu, and the standard deviation, sigma, and reported only sigma/mu.
What each facility does know is an interval in which its laser's wavelength lies, perhaps from some cheap but imprecise measurement apparatus. The exception is the one lab whose wavelength is known quite precisely; without that lab, there would be no hope of connecting things to the definition of a meter.
See the examp_laser_molecule R/Rstan code (rename it to .R instead of .txt)
Note, we know that the laser measurements are rounded to the nearest quarter wavelength. This means that for every measurement there is some number between -0.125 and 0.125 (in units of that lab's wavelength) which we could subtract from the rounded data to get the "real" data. Since we know nothing about the actual value, we give each of these parameters a maxent prior: uniform on that interval.
Similarly, the labs have reported a range over which they are SURE their laser's wavelength lies, so we specify maxent priors, uniform on that range, for the wavelengths.
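For concreteness, here is roughly what those maxent priors look like in the Stan parameters block. This is a sketch rather than the exact code from the linked file; N1, wl1_low, and wl1_high are placeholder names for quantities that would be declared in the data block. In Stan, an interval-bounded parameter with no explicit sampling statement gets exactly a uniform distribution over its interval:

parameters {
  real<lower=-0.125, upper=0.125> roundl1[N1]; // rounding error of each lab-1 measurement, in wavelengths
  real<lower=wl1_low, upper=wl1_high> wl1;     // lab 1's wavelength, constrained to its reported range
}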
Since we know the relative RMS error in the length (caused by molecular vibration!), we specify that the ratio of any measured length to the true length is distributed around 1 with sigma = sd_scaled = 0.1294795.
However, secretly, when we simulate the process we use a t distribution with 13 degrees of freedom, rescaled to have the appropriate sigma. In other words, the normal distribution we're using is maxent for something which has a given RMS error, but the actual underlying distribution doesn't satisfy the normal assumption exactly.
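For reference, a t distribution with nu degrees of freedom and scale s has standard deviation s*sqrt(nu/(nu-2)), so matching the RMS error means shrinking the scale slightly. As a sketch in Stan's RNG notation (the actual simulation happens in the linked R code), one simulated length-to-true-length ratio would be drawn as:

real ratio_sim = student_t_rng(13, 1, sd_scaled * sqrt(11.0 / 13.0)); // sd works out to sd_scaled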
The part that generated concern (pun intended) among the "generative modeling" crowd was this:
(l1[i]-roundl1[i])*wl1/true_len ~ normal(1,sd_scaled);
And similar lines related to the data from other labs, l2, l3, l4, l5.
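In context, each of those lines sits inside a loop over that lab's measurements, so the model block looks something like this (a sketch; N1, N2, etc. are placeholder names for the number of measurements at each lab, and the real code is in the linked file):

model {
  for (i in 1:N1)
    (l1[i] - roundl1[i]) * wl1 / true_len ~ normal(1, sd_scaled);
  for (i in 1:N2)
    (l2[i] - roundl2[i]) * wl2 / true_len ~ normal(1, sd_scaled);
  // ... and the same pattern for l3, l4, l5
}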
I read this line as: given the rounding error, the wavelength, and the true_len, if I calculate the left-hand side it will look like a sample from a normal(1, sd_scaled) distribution.
To an HMC Monte Carlo sampler like Stan, you could read this as: stay in the region of parameter space where the left-hand sides (all of them) look like samples from the distribution on the right-hand side.
In the mathematics of conditional probability:

p( (l1[i]-roundl1[i])*wl1/true_len | roundl1[i], wl1, true_len, My_Knowledge ) = normal_pdf( (l1[i]-roundl1[i])*wl1/true_len ; 1, sd_scaled )
To get a full joint distribution, we'll need to multiply by p(roundl1[i])*p(wl1)*p(true_len), since those parameters all have independent priors. This distribution implies various weird distributions on the l1 data, the roundl1 parameters, etc., but it DOES express a true fact about the world, and it seems to me that is all that should matter in terms of getting correct facts out of the posterior distribution. (Note: I have no "proof" of that, but from intuition about logic: if Bayesian logic is a generalization of binary logic, then making true statements in the Bayesian logic shouldn't imply false statements out the other end, right?)
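Written out in the same informal notation (showing only the lab-1 factors; the other labs contribute factors of the same form), the joint distribution is proportional to:

prod_i normal_pdf( (l1[i]-roundl1[i])*wl1/true_len ; 1, sd_scaled ) * prod_i p(roundl1[i]) * p(wl1) * p(true_len)

with the uniform densities described above plugged in for p(roundl1[i]) and p(wl1).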
Note, this type of method may not be maximally informative, efficient, unbiased, or whatnot, and if you're not careful with your choice of function on the left, it may actually be very uninformative (for example, if everything you put in comes out looking like a normal deviate no matter what the parameter values are, then it can't tell you much about which parameter values you should concentrate on). If you want to do this, you need to express facts which aren't true when the parameters have the "wrong" values and are true when the parameters have the "right" values. It may be that such things are more easily understood in, say, physics than in the social sciences or some other areas of application.