The rest of this document outlines the overall statistical model and focuses on
a few issues related to the numerical estimation of the posterior.
In order to understand the underlying Bayesian framework, readers
should familiarize themselves with Sherwood, Webb, et al. 2020, and
Marvel and Webb 2025.
Stan requires us to define the parameters (i.e. the primary, unconstrained
parameters that will be Monte Carlo sampled), any transformed parameters
(i.e. parameters deterministically constrained by the primary parameters),
and the model formulation. To understand how to formulate a model
in Stan, let’s take as an example a posterior with two independent
parameters θ1,2 and two independent pieces of observational
evidence y1, y2, which would make the posterior:
Stan uses two possible formulations. The simplest form of the syntax
is based on the second, distributional formulation, where we define
the model by simply writing out the four distributions above.
In the back-end, each distributional statement adds to the log-likelihood.
For example, since θ1 has a Gaussian distribution,
We can also directly add to log-posterior, and we will need to do that
to account for a change of variables if we want to go from uniform
S to uniform λ priors without having to change the model structure.
SW20 uses four lines of evidence (sec. 7 of SW20): process, historical,
cold past climates (LGM) and warm past climates (Pliocene). Under
the independence ansatz (eq. 11), the posterior for climate sensitivity,
S, is:
The SW20 framework is unusual in that it does not provide data points
yj that can be evaluated by plugging them into a parametric likelihood.
Rather, it provides distributions of the likelihoods for different
lines of evidence, p(Ei∣S), that are written in terms
of the distributions of certain variables (e.g. Thist,Fhist,Nhist,Δλ
for the historical evidence). These distributions need to be interpreted
as likelihoods rather than frequentist sample distributions (see also
Marvel and Webb 2025). Let’s take the example of the historical data.
For a Bayesian calculation we need to write a likelihood as a probability
of the evidence conditional on the parameter S we want to estimate:
However, SW20 does not provide data points, but rather distributions
for (Thist,Nhist,Fhist,Δλ). The
likelihood of the historical evidence can be written as the likelihood
of a given combination of variables:
Thus (Thist,Nhist,Fhist,Δλ) are
not data points but should be viewed as parameters — nuisance parameters,
since we will integrate them out to get a marginal likelihood or posterior
for the parameter of interest S.
Ostensibly the likelihood written above is not a function of S
(or λ). However, S links F,T,N,Δλ such that
for a given value of S, the four values are not independent.
Thus, if we introduce S as a parameter, only three of the other
four variables are independent, and one has to become dependent on
the others. For example, if we choose temperature as the dependent
variable, we can write
Thus temperature is now a constrained parameter that is a deterministic
function of other parameters. In practice we will sample the primary
(unconstrained) parameters (Nhist,Fhist,Δλ,F2×,S)
via a Monte Carlo technique, and we’ll compute the constrained parameter
Thist (transformed parameter in Stan’s language) from the primary
parameters, before plugging it into the likelihood distribution for
Thist. Note that we do not necessarily need to make Thist
the constrained parameter, and we could make Nhist, Fhist,
or even Δλ instead.
The coupling equations link S with the nuisance parameters used
in the likelihood formulation. However, it is up to us to choose which
of the parameters becomes the constrained parameter. For example, the
following two formulations for the historical likelihood are equivalent:
However, there are a couple of considerations that can help us decide
which parameter to choose as the constrained parameter:
1. Pick the formulation that doesn’t require solving a non-trivial
inverse. The LGM equation (eq. 22) is quadratic in TLGM
because of the state-dependence term 2α(ΔT)2:
Treating TLGM as the S-coupling would require solving
this for TLGM via the quadratic formula, which has (i)
two roots and a branch-picking problem, (ii) a 1/α singularity
that is hit constantly because α∼N(0.1,0.1) passes
through zero, and (iii) a discriminant that can go negative near the
boundary of physical configurations, producing NaN in autodiff. Going
the other way — sampling TLGM as primary and computing
FLGM forward — is a single evaluation of a smooth
polynomial, with no branches and no singularities. So FLGM is
forced to be the S-coupling for LGM.
2. Don’t put a kinked / discontinuous likelihood on a transformed
parameter.Fhist has Sherwood’s asymmetric Gaussian observation
(different σ above and below μF); the log-density
jumps by log(σlo/σhi)≈0.747
at Fhist=μF. In the Stan Hamiltonian Monte Carlo
(HMC) sampler, if Fhist were the historical S-coupling
(transformed parameter receiving the asymmetric likelihood), that
discontinuity would propagate through the Hamiltonian energy-balance
equation into the joint log-density in primary-parameter space, biasing
HMC. So we do not want to use Fhist. We could choose
either Nhist or Δλ instead.
The primary unconstrained parameters define Stan’s sampling space,
and they are declared in the parameters block. We will use the
following unconstrained parameters:
Stan allows us to define any number of constrained parameters, which
are declared in the transformed parameters block. These will consist
not only of the three parameters defined by the three coupling equations
needed to write the likelihood, but also any other constrained
parameter that might make the model formulation easier, or that we
want to output. For example, we may want to define λ=−F2×/S
or Fplio=log2(CO2,plio/284)⋅F2×.
In this model formulation, I’ve chosen Thist, FLGM, and Tplio.
Any other formulation that does not violate rules 1 and 2 should work.
4.2 Avoiding biases from the historical aerosol forcing formulation¶
tl;dr:Fhist has an asymmetric Gaussian likelihood. Sampling Fhist directly through Stan’s
creates a finite log-density jump at F=μF that biases HMC. We sample an unconstrained z-score F_z ~ N(0,1) and define F_hist = mu_F + F_z * sigma(sign(F_z)) as a transformed parameter; this induces exactly the same asymmetric distribution but is smooth in sampling space.
SW20- encode the 5/50/95 percentiles of Fhist via
two half-Gaussians glued at μF,
and analogously on the other side. Critically, in the sampling
space (z-space) the prior is N(0,1), which is everywhere
smooth. The map F(z) is continuous (both branches give F=μF
at z=0); only its derivative has a kink at z=0, which HMC handles
correctly. The discontinuous jump in log-density that confused leapfrog
in F-space has been moved to a kink in the transformation,
where it is benign.
4.3 Switching between uniform-λ and uniform-S priors¶
SW20 presents two choices of priors: uniform in λ and uniform
in S. If we write the model in S-space, with S as a primary
parameter, declaring S with <lower=A, upper=B> is equivalent
to a prior on S that is uniform on (A,B).
Setting λ=−F2×/S at fixed F2×, the
change of variable gives
i.e. non-uniform in λ at fixed F2×. This
is the “US” prior of Sherwood (sec. 7.2): uniform on S.
To recover the paper’s Uλ prior (uniform on each component
feedback λi, which by convolution is approximately uniform
on λ) we can either rewrite the model to use λ as
a primary parameter and give it a uniform prior, or make the change
of variables by adding
target += log(F_2xCO2) - 2 * log(S);
This multiplies the joint by S2/F2×, giving an implicit
prior on (λ,F2×) proportional to pF2×(F2×)
— i.e. uniform in λ at every F2× slice, with F2×
retaining its Gaussian prior. This matches Sherwood’s Uλ
exactly (uniform-λ× Gaussian-F2×).
Sherwood, S. C., Webb, M. J., Annan, J. D., Armour, K. C., Forster, P. M., Hargreaves, J. C., Hegerl, G., Klein, S. A., Marvel, K. D., Rohling, E. J., Watanabe, M., Andrews, T., Braconnot, P., Bretherton, C. S., Foster, G. L., Hausfather, Z., von der Heydt, A. S., Knutti, R., Mauritsen, T., … Zelinka, M. D. (2020). An Assessment of Earth’s Climate Sensitivity Using Multiple Lines of Evidence. Reviews of Geophysics, 58(4). 10.1029/2019rg000678
Marvel, K., & Webb, M. (2025). Towards robust community assessments of the Earth’s climate sensitivity. Earth System Dynamics, 16(1), 317–332. 10.5194/esd-16-317-2025