Lab 3: A closer look at convergence

Applied Bayesian Modeling (ICPSR Summer Program 2025)
Affiliation: Ursinus College

Published: July 20, 2025

The purpose of this tutorial is to show how to deploy the most common convergence diagnostics after fitting a Bayesian model in R (see the previous lab).

If you find any errors in this document or have suggestions for improvement, please email me.

How to use this tutorial

This tutorial provides a general approach to convergence diagnostics in R, independent of the software used to fit a Bayesian model. For consistency with Lab 2, we begin with a model fit using rstanarm, but the tutorial below also works with models fit with rstan, JAGS, or BUGS.

This tutorial should be read in conjunction with the background readings on convergence.

In addition, the Stan reference manual offers a detailed discussion of convergence-related issues and some practical tips in chapter 16.

Fitting the model

We pick up from Lab 2 and fit a model after generating a “fake” dataset:

Code
n.sim <- 100; set.seed(123)
x1 <- rnorm(n = n.sim, mean = 5, sd = 2)
x2 <- rbinom(n.sim, size = 1, prob = 0.3)
e <- rnorm(n = n.sim, mean = 0, sd = 1)
b1 <- 1.2
b2 <- -3.1
a <- 1.5
y <- a + b1 * x1 + b2 * x2 + e
sim.dat <- data.frame(y, x1, x2)
library("rstanarm")
bayes.mod <- stan_glm(y ~ x1 + x2, 
                     data = sim.dat,
                     family = "gaussian",
                     prior = normal(location = 0, scale = 10))

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 3.7e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.37 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.035 seconds (Warm-up)
Chain 1:                0.036 seconds (Sampling)
Chain 1:                0.071 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 5e-06 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.05 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 0.04 seconds (Warm-up)
Chain 2:                0.035 seconds (Sampling)
Chain 2:                0.075 seconds (Total)
Chain 2: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 8e-06 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 0.036 seconds (Warm-up)
Chain 3:                0.036 seconds (Sampling)
Chain 3:                0.072 seconds (Total)
Chain 3: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 5e-06 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.05 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 0.036 seconds (Warm-up)
Chain 4:                0.034 seconds (Sampling)
Chain 4:                0.07 seconds (Total)
Chain 4: 

A first take at convergence is provided in the standard summary of the model object:

Code
summary(bayes.mod)

Model Info:
 function:     stan_glm
 family:       gaussian [identity]
 formula:      y ~ x1 + x2
 algorithm:    sampling
 sample:       4000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 100
 predictors:   3

Estimates:
              mean   sd   10%   50%   90%
(Intercept)  1.9    0.3  1.5   1.8   2.2 
x1           1.1    0.1  1.1   1.1   1.2 
x2          -3.1    0.2 -3.4  -3.1  -2.8 
sigma        0.9    0.1  0.9   0.9   1.0 

Fit Diagnostics:
           mean   sd   10%   50%   90%
mean_PPD 6.8    0.1  6.7   6.8   7.0  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
              mcse Rhat n_eff
(Intercept)   0.0  1.0  5064 
x1            0.0  1.0  4776 
x2            0.0  1.0  4807 
sigma         0.0  1.0  3950 
mean_PPD      0.0  1.0  4739 
log-posterior 0.0  1.0  1696 

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

Here, the \(\hat{R}\) or Rhat measure (the “potential scale reduction factor”) is the square root of the ratio of the pooled variance across all chains to the average within-chain variance. When chains are mixing well, this ratio should be very close to 1. You can read more on this in Gelman and Shirley (2011), section 6.6 (available in the course folder).
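In a simplified (non-split) form, with \(M\) chains of \(N\) draws each, let \(W\) be the average of the within-chain variances and \(B\) be \(N\) times the variance of the chain means. Then the two MCMC diagnostics printed above can be written as

\[
\mathrm{mcse} \approx \frac{\mathrm{sd}}{\sqrt{n_{\mathrm{eff}}}},
\qquad
\hat{R} = \sqrt{\frac{\widehat{\mathrm{var}}^{+}}{W}},
\qquad
\widehat{\mathrm{var}}^{+} = \frac{N-1}{N}\,W + \frac{1}{N}\,B.
\]

At convergence, both \(\widehat{\mathrm{var}}^{+}\) and \(W\) estimate the posterior variance, so \(\hat{R}\) approaches 1. Note that this is the textbook version of the statistic; Stan's implementation splits each chain in half (and recent versions add rank normalization), so the printed value can differ slightly.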

Preparing the posterior output for further analysis

Regardless of what program is used for estimating the regression, the next step is to convert the posterior distribution into an R object of the class mcmc.list for further analysis. An mcmc.list object is similar to a list of matrices, but it contains some additional information about the number of chains and iterations that is necessary for convergence diagnostics.
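To make this structure concrete, here is a minimal sketch of an mcmc.list built by hand from simulated draws. The parameter names alpha and beta below are invented for illustration, but they mirror the internal names we will see in our model object:

```r
library("coda")

set.seed(123)
# Two "chains" of 1000 draws each for two invented parameters
chain1 <- mcmc(cbind(alpha = rnorm(1000), beta = rnorm(1000)))
chain2 <- mcmc(cbind(alpha = rnorm(1000), beta = rnorm(1000)))
toy.mcmc <- mcmc.list(chain1, chain2)

nchain(toy.mcmc)        # number of chains: 2
niter(toy.mcmc)         # iterations per chain: 1000
varnames(toy.mcmc)      # "alpha" "beta"
effectiveSize(toy.mcmc) # effective sample size, summed over chains
```

Because these toy chains are independent draws, every diagnostic below would pass trivially; the point is only to show the bookkeeping (chains, iterations, parameter names) that an mcmc.list carries.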

To convert a model fit with rstanarm into an mcmc.list, we use the rstan::As.mcmc.list() function. This function takes two (relevant) arguments: an object of class stanfit, and the name(s) of the parameters whose convergence we wish to inspect. A stanfit object is what rstan produces by default. We, however, used rstanarm as a wrapper around rstan, and therefore have a different type of object:

Code
class(bayes.mod)
[1] "stanreg" "glm"     "lm"     

Upon closer inspection, though, we find that this object contains an object of the class stanfit:

Code
names(bayes.mod)
 [1] "coefficients"      "ses"               "fitted.values"     "linear.predictors" "residuals"         "df.residual"       "covmat"            "y"                 "model"             "data"              "family"            "offset"            "weights"           "prior.weights"    
[15] "contrasts"         "na.action"         "formula"           "terms"             "prior.info"        "dropped_cols"      "algorithm"         "stan_summary"      "stanfit"           "rstan_version"     "call"              "stan_function"     "compute_mean_PPD"  "xlevels"          

This makes it easy to supply the required object to rstan::As.mcmc.list(). We restrict the output to the parameters named “alpha” and “beta”, the internal names rstanarm uses for the intercept (“alpha”) and the predictor coefficients (“beta”).

Code
bayes.mcmc <- rstan::As.mcmc.list(bayes.mod$stanfit, pars = c("alpha", "beta"))

Convergence diagnostics using the coda package

We can use the coda package (Plummer et al. 2006) to produce visual and numeric convergence diagnostics. This package should have been installed when you installed rstanarm. If

Code
library("coda")

fails, install the package:

Code
install.packages("coda")

First, we can produce density and trace plots of the MCMC output:

Code
plot(bayes.mcmc)

Next, we can generate an autocorrelation plot for each chain; see the background readings for more explanation on these diagnostics:

Code
autocorr.plot(bayes.mcmc, ask = FALSE)

We can also calculate several other numeric diagnostics, beginning with the Gelman-Rubin shrink factor plot:

Code
gelman.plot(bayes.mcmc)

Code
geweke.diag(bayes.mcmc)
[[1]]

Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5 

alpha[1]  beta[1]  beta[2] 
  -1.197    0.826    1.100 


[[2]]

Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5 

alpha[1]  beta[1]  beta[2] 
   0.136   -0.612    0.301 


[[3]]

Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5 

alpha[1]  beta[1]  beta[2] 
  -1.575    1.758   -0.683 


[[4]]

Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5 

alpha[1]  beta[1]  beta[2] 
  -1.232    0.475    1.436 
Code
raftery.diag(bayes.mcmc)
[[1]]

Quantile (q) = 0.025
Accuracy (r) = +/- 0.005
Probability (s) = 0.95 

You need a sample size of at least 3746 with these values of q, r and s

[[2]]

Quantile (q) = 0.025
Accuracy (r) = +/- 0.005
Probability (s) = 0.95 

You need a sample size of at least 3746 with these values of q, r and s

[[3]]

Quantile (q) = 0.025
Accuracy (r) = +/- 0.005
Probability (s) = 0.95 

You need a sample size of at least 3746 with these values of q, r and s

[[4]]

Quantile (q) = 0.025
Accuracy (r) = +/- 0.005
Probability (s) = 0.95 

You need a sample size of at least 3746 with these values of q, r and s
Code
heidel.diag(bayes.mcmc)
[[1]]
                                       
         Stationarity start     p-value
         test         iteration        
alpha[1] passed       1         0.293  
beta[1]  passed       1         0.311  
beta[2]  passed       1         0.258  
                                  
         Halfwidth Mean  Halfwidth
         test                     
alpha[1] passed     1.85 0.01575  
beta[1]  passed     1.14 0.00275  
beta[2]  passed    -3.10 0.01246  

[[2]]
                                       
         Stationarity start     p-value
         test         iteration        
alpha[1] passed       1         0.434  
beta[1]  passed       1         0.207  
beta[2]  passed       1         0.135  
                                  
         Halfwidth Mean  Halfwidth
         test                     
alpha[1] passed     1.86 0.01736  
beta[1]  passed     1.13 0.00272  
beta[2]  passed    -3.09 0.01248  

[[3]]
                                       
         Stationarity start     p-value
         test         iteration        
alpha[1] passed       1         0.261  
beta[1]  passed       1         0.294  
beta[2]  passed       1         0.398  
                                  
         Halfwidth Mean  Halfwidth
         test                     
alpha[1] passed     1.84 0.0168   
beta[1]  passed     1.14 0.0029   
beta[2]  passed    -3.09 0.0114   

[[4]]
                                       
         Stationarity start     p-value
         test         iteration        
alpha[1] passed       1         0.230  
beta[1]  passed       1         0.398  
beta[2]  passed       1         0.301  
                                  
         Halfwidth Mean  Halfwidth
         test                     
alpha[1] passed     1.86 0.0184   
beta[1]  passed     1.13 0.0036   
beta[2]  passed    -3.10 0.0112   

Quick convergence diagnostics: superdiag

A convenient way to analyze numerical convergence diagnostics in one sweep is the superdiag package (Tsai, Gill, and Rapkin 2012).

First, install the package:

Code
install.packages("superdiag", dependencies = TRUE, repos = "https://cloud.r-project.org")

Then, load it:

Code
library("superdiag")

Next, call the superdiag function and specify how many iterations of the chain you want to discard before analyzing for convergence:

Code
superdiag(bayes.mcmc, burnin = 500)
Number of chains = 4 
Number of iterations = 1000 per chain before discarding the burn-in period
Burn-in period = 500 per chain
Sample size in total = 2004 

****************** The Geweke diagnostic: ******************
Windows:
           chain 1 chain 2 chain 3 chain 4
From start     0.1   0.015   0.222   0.836
From stop      0.5   0.891   0.666   0.096

Z-scores:
         chain 1 chain 2 chain 3 chain 4
alpha[1]  -0.383   0.359    0.96   1.221
beta[1]    0.479   0.552   -1.20  -0.817
beta[2]    0.838  -0.784   -1.76  -0.216

*************** The Gelman-Rubin diagnostic: ***************
Potential scale reduction factors:
         Point est. Upper C.I.
alpha[1]          1          1
beta[1]           1          1
beta[2]           1          1

Multivariate psrf: 1.001

************* The Heidelberger-Welch diagnostic ************
Chain 1:
epsilon=0.1, alpha=0.05                                       
         Stationarity start     p-value
         test         iteration        
alpha[1] passed       1         0.295  
beta[1]  passed       1         0.121  
beta[2]  passed       1         0.342  
                                  
         Halfwidth Mean  Halfwidth
         test                     
alpha[1] passed     1.86 0.02505  
beta[1]  passed     1.13 0.00403  
beta[2]  passed    -3.10 0.01896  

Chain 2:
epsilon=0.117, alpha=0.1                                       
         Stationarity start     p-value
         test         iteration        
alpha[1] passed       1         0.234  
beta[1]  passed       1         0.352  
beta[2]  passed       1         0.576  
                                  
         Halfwidth Mean  Halfwidth
         test                     
alpha[1] passed     1.85 0.02137  
beta[1]  passed     1.14 0.00377  
beta[2]  passed    -3.10 0.01671  

Chain 3:
epsilon=0.023, alpha=0.1                                       
         Stationarity start     p-value
         test         iteration        
alpha[1] passed        1        0.10111
beta[1]  passed        1        0.15516
beta[2]  failed       NA        0.00406
                                 
         Halfwidth Mean Halfwidth
         test                    
alpha[1] passed    1.84 0.02422  
beta[1]  passed    1.14 0.00404  
beta[2]  <NA>        NA      NA  

Chain 4:
epsilon=0.121, alpha=0.05                                       
         Stationarity start     p-value
         test         iteration        
alpha[1] passed       52        0.261  
beta[1]  passed        1        0.229  
beta[2]  passed        1        0.252  
                                  
         Halfwidth Mean  Halfwidth
         test                     
alpha[1] passed     1.86 0.02006  
beta[1]  passed     1.13 0.00367  
beta[2]  passed    -3.09 0.01487  

*************** The Raftery-Lewis diagnostic ***************
Chain 1:
Convergence eps = 0.001
Quantile (q) = 0.025
Accuracy (r) = +/- 0.005
Probability (s) = 0.95 

You need a sample size of at least 3746 with these values of q, r and s

Chain 2:
Convergence eps = 0.005
Quantile (q) = 0.01
Accuracy (r) = +/- 0.001
Probability (s) = 0.95 

You need a sample size of at least 38031 with these values of q, r and s

Chain 3:
Convergence eps = 0.001
Quantile (q) = 0.001
Accuracy (r) = +/- 0.0025
Probability (s) = 0.99 

You need a sample size of at least 1061 with these values of q, r and s

Chain 4:
Convergence eps = 5e-04
Quantile (q) = 0.25
Accuracy (r) = +/- 0.001
Probability (s) = 0.999 

You need a sample size of at least 2030169 with these values of q, r and s

************* The Hellinger distance diagnostic ************
Between chains: 
            Min    Max
alpha[1] 0.0549 0.0957
beta[1]  0.0518 0.0866
beta[2]  0.0403 0.0592

Within chain 1:
           100    200    300    400
alpha[1] 0.141 0.1065 0.1165 0.0926
beta[1]  0.152 0.1208 0.1353 0.0639
beta[2]  0.119 0.0834 0.0995 0.1169

Within chain 2:
            100    200    300   400
alpha[1] 0.1317 0.1564 0.1209 0.120
beta[1]  0.1041 0.2141 0.1115 0.123
beta[2]  0.0696 0.0733 0.0956 0.120

Within chain 3:
           100    200    300   400
alpha[1] 0.104 0.0560 0.0965 0.091
beta[1]  0.101 0.0379 0.1119 0.053
beta[2]  0.159 0.1127 0.1738 0.126

Within chain 4:
            100    200    300   400
alpha[1] 0.0778 0.1332 0.1007 0.071
beta[1]  0.1150 0.0956 0.0902 0.101
beta[2]  0.0748 0.0951 0.0634 0.101

Note that the rstanarm object by default retains 1000 post-warmup iterations per chain, hence the burn-in period you supply to superdiag must be less than 1000.

Quick diagnostic plots: mcmcplots

Similar to shinystan, but a tad more convenient, the mcmcplots package (Curtis 2012) lets you obtain graphical diagnostics and results on one page:

First, install the package:

Code
install.packages("mcmcplots", dependencies = TRUE, repos = "https://cloud.r-project.org")

Then, load it:

Code
library("mcmcplots")

For quick diagnostics, you can produce HTML files with trace, density, and autocorrelation plots all on one page. The files will be displayed in your default web browser.

Code
mcmcplot(bayes.mcmc)

A page will open in your browser, looking similar to the one below:

Page produced by mcmcplot()

Some commands for individual plots:

Code
denplot(bayes.mcmc)

Code
traplot(bayes.mcmc)

If you want to produce a coefficient dot plot with credible intervals, use caterplot:

Code
caterplot(bayes.mcmc, 
  labels = c("alpha", "beta1", "beta2"))

As always, check the help files for options to customize these plots.

More plots: ggmcmc

Yet another convenient option for plotting output is the ggmcmc package (Fernández i Marín 2013):

First, install the package:

Code
install.packages("ggmcmc", dependencies = TRUE, repos = "https://cloud.r-project.org")

Then, load it:

Code
library("ggmcmc")

To use this package, you need to first convert your MCMC object into a ggs object, using the ggs function. This object can then be passed on to the various ggmcmc plotting commands:

Code
bayes.gg <- ggs(bayes.mcmc)
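Under the hood, ggs() returns a long-format data frame with one row per draw, chain, and parameter, which is what makes it compose so naturally with ggplot2. A quick sketch on a toy mcmc.list (the parameter name alpha and the column layout below reflect current ggmcmc versions, so treat this as an assumption to verify on your installation):

```r
library("coda")
library("ggmcmc")

set.seed(123)
# A toy mcmc.list, converted the same way as bayes.mcmc above
toy <- mcmc.list(mcmc(cbind(alpha = rnorm(100))),
                 mcmc(cbind(alpha = rnorm(100))))
toy.gg <- ggs(toy)
head(toy.gg)  # expect columns Iteration, Chain, Parameter, value
```

The same inspection works on bayes.gg; each ggs_* command below is simply a ggplot2 recipe applied to this long-format table.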
Code
ggs_histogram(bayes.gg)

Code
ggs_density(bayes.gg)

Code
ggs_traceplot(bayes.gg)

Code
ggs_running(bayes.gg)

Code
ggs_compare_partial(bayes.gg)

Code
ggs_autocorrelation(bayes.gg)

Code
ggs_geweke(bayes.gg)

Code
ggs_caterpillar(bayes.gg)

Or, you can use the ggmcmc command to create a PDF file containing a variety of diagnostic plots in your working directory:

Code
ggmcmc(bayes.gg, 
       file = "bayes_ggmcmc.pdf")
Plotting histograms
Plotting density plots
Plotting traceplots
Plotting running means
Plotting comparison of partial and full chain
Plotting autocorrelation plots
Plotting crosscorrelation plot
Plotting Potential Scale Reduction Factors
Plotting shrinkage of Potential Scale Reduction Factors
Plotting Number of effective independent draws
Plotting Geweke Diagnostic
Plotting caterpillar plot
Time taken to generate the report: 2.8 seconds.

Using shinystan

As a reminder, the shinystan package (Stan Development Team 2017) offers a convenient way to inspect the convergence of samples in your web browser. Just as you saw in Lab 2, simply call launch_shinystan on the model you fit with rstanarm:

Code
launch_shinystan(bayes.mod)

After a few seconds, your browser should show the shinystan interface:

ShinyStan start-up screen

Click on any of the buttons DIAGNOSE, ESTIMATE, EXPLORE, or HELP to proceed.

References

Cowles, Mary Kathryn, and Bradley P. Carlin. 1996. “Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review.” Journal of the American Statistical Association 91 (434): 883–904. http://www.jstor.org/stable/2291683.
Curtis, S. McKay. 2012. mcmcplots: Create Plots from MCMC Output. http://CRAN.R-project.org/package=mcmcplots.
Fernández i Marín, Xavier. 2013. ggmcmc: Graphical Tools for Analyzing Markov Chain Monte Carlo Simulations from Bayesian Inference. http://xavier-fim.net/packages/ggmcmc.
Gelman, Andrew, and Kenneth Shirley. 2011. “Inference from Simulations and Monitoring Convergence.” In Handbook of Markov Chain Monte Carlo, edited by Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng, 163–74. Chapman; Hall/CRC.
Plummer, Martyn, Nicky Best, Kate Cowles, and Karen Vines. 2006. “CODA: Convergence Diagnosis and Output Analysis for MCMC.” R News 6 (1): 7–11. http://CRAN.R-project.org/doc/Rnews/.
Robert, Christian, and George Casella. 2010. Introducing Monte Carlo Methods with R. New York, NY: Springer.
Stan Development Team. 2017. shinystan: Interactive Visual and Numerical Diagnostics and Posterior Analysis for Bayesian Models. http://mc-stan.org/.
Tsai, Tsung-han, Jeff Gill, and Jonathan Rapkin. 2012. superdiag: R Code for Testing Markov Chain Nonconvergence. http://CRAN.R-project.org/package=superdiag.