At the Columbia workshop last week, Andre Wibisono presented work related to a recent arXival on the exponentially fast convergence of both the unadjusted Langevin and the proximal sampler algorithms under strong [definitely strong] log-concavity assumptions. The idea behind the proximal sampler is to target the demarginalised density
$$g(x,y)\propto f(x)\,\exp\{-\|x-y\|^2/2\eta\},\qquad \eta>0$$
by introducing an auxiliary Gaussian vector Y, which preserves f(x) as the marginal distribution of the first component X. While the auxiliary Y is (obviously) conditionally Gaussian, the conditional of X is at least as challenging to simulate as f itself, unless η is chosen small enough to regularize log g(x,y) into a strongly log-concave function of x, since
$$\log g(x,y)\le \log g(x^*,y)-\frac{\beta}{2}\|x-x^*\|^2$$
when x*=x*(y) is the maximiser of log g(x,y) in x (for a given value y) and β>0 is the appropriate log-concavity constant. This inequality means that an accept-reject algorithm can be implemented to simulate from the conditional of X given Y, but it requires both the constant β and the derivation of x*(y), hence a pretty good understanding and a rather high regularity of the actual target f(x). Besides, the regularization term ||x-y||² means that y stays close to the previous value of the (sub)chain X, hence it creates a restoring force that slows down the exploration of the target.
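To make the accept-reject step concrete, here is a minimal sketch on a toy one-dimensional target where everything is explicit. The target f(x) ∝ exp(-x²/2-x⁴/4) is my own hypothetical choice (not from the arXival), picked because the second derivative of log g(x,y) in x is then at most -(1+1/η), so that β=1+1/η works and log g(x,y) is bounded by log g(x*,y) minus β(x-x*)²/2. The function names (logf, logg, xstar, rcondx, proximal) are illustrative only, and x*(y) is obtained numerically rather than in closed form.

```r
# hypothetical 1-D target, f(x) proportional to exp(-x^2/2 - x^4/4)
# (assumption: chosen so that -log f is 1-strongly convex)
logf = function(x) -x^2/2 - x^4/4
# log of the joint g(x,y), up to an additive constant
logg = function(x, y, eta) logf(x) - (x - y)^2/(2*eta)

# maximiser x*(y) of log g(., y): root of its (decreasing) derivative in x
xstar = function(y, eta)
  uniroot(function(x) -x - x^3 - (x - y)/eta,
          interval = c(-abs(y) - 1, abs(y) + 1), tol = 1e-10)$root

# accept-reject draw from the conditional of X given Y = y
rcondx = function(y, eta) {
  beta = 1 + 1/eta                    # log-concavity constant for this toy target
  xs = xstar(y, eta)
  repeat {
    x = rnorm(1, xs, 1/sqrt(beta))    # Gaussian envelope centred at x*(y)
    # log acceptance ratio, nonpositive by the strong log-concavity bound
    lr = logg(x, y, eta) - logg(xs, y, eta) + beta*(x - xs)^2/2
    if (log(runif(1)) < lr) return(x)
  }
}

# two-stage Gibbs sampler on g(x,y): Y|X is Gaussian, X|Y by accept-reject
proximal = function(niter, eta, x0 = 0) {
  x = numeric(niter)
  cur = x0
  for (t in 1:niter) {
    y = rnorm(1, cur, sqrt(eta))
    cur = rcondx(y, eta)
    x[t] = cur
  }
  x
}
```

Calling, e.g., `proximal(1e4, eta = .5)` returns the X subchain. Even in this friendly case, both β and x*(y) had to be worked out from the target, which is the very requirement pointed out above.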
Since the arXival does not contain numerical comparisons, I attempted one using the (2D) banana-shaped distribution,
# log-density (up to a constant) of the banana-shaped target at x=(x[1],x[2])
target=function(x,sig,B,mu)-x[1]^2/2/sig-(x[2]+B*x[1]^2-mu)^2/2
with μ=σ=B=10, comparing the proximal sampler with a vanilla random walk Metropolis using three potential scales, one being chosen at random at each iteration. Since I did not want to check whether or not the target was log-concave (and derive the corresponding β), I used the Normal distribution centred at x*(y) as the proposal of a Metropolis step, again with several scales. The following is the representation of the samples (sienna for MCMC, navy blue for proximal with β=50, dark green for β=5), with a lesser rate of tail exploration for the proximal samplers. It is thus unclear to me whether the theoretical characterisations of the method translate into practical efficiency beyond the most regular cases.
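For readers wishing to try a comparison of this kind, the following is a rough sketch rather than the code behind the picture. Several choices are assumptions of mine: the three random walk scales, the value of η, the starting point, the use of optim() to approximate x*(y) (which only returns a local maximiser), and the reading of β as the precision of a Normal independence proposal centred at x*(y). The function names rwm and proxim are hypothetical.

```r
# banana-shaped log-target from the post, with mu = sig = B = 10 as defaults
target = function(x, sig = 10, B = 10, mu = 10)
  -x[1]^2/2/sig - (x[2] + B*x[1]^2 - mu)^2/2

# random walk Metropolis, scale drawn at random among `scales` at each iteration
# (assumption: these three scales are illustrative)
rwm = function(niter, x0 = c(0, 10), scales = c(.01, .1, 1)) {
  x = matrix(x0, niter, 2, byrow = TRUE)
  for (t in 2:niter) {
    prop = x[t-1, ] + sample(scales, 1)*rnorm(2)
    if (log(runif(1)) < target(prop) - target(x[t-1, ])) {
      x[t, ] = prop
    } else {
      x[t, ] = x[t-1, ]
    }
  }
  x
}

# proximal sampler with a Metropolis step replacing exact simulation of X|Y
# (assumptions: eta = 1, proposal N(x*(y), I/beta))
proxim = function(niter, eta = 1, beta = 5, x0 = c(0, 10)) {
  logg = function(x, y) target(x) - sum((x - y)^2)/(2*eta)
  x = matrix(x0, niter, 2, byrow = TRUE)
  for (t in 2:niter) {
    y = x[t-1, ] + sqrt(eta)*rnorm(2)             # Y|X is Gaussian
    xs = optim(y, function(z) -logg(z, y))$par    # numerical (local) x*(y)
    prop = xs + rnorm(2)/sqrt(beta)
    # independence Metropolis-Hastings log-ratio for the conditional of X given y
    lr = logg(prop, y) - logg(x[t-1, ], y) +
      sum(dnorm(x[t-1, ], xs, 1/sqrt(beta), log = TRUE)) -
      sum(dnorm(prop, xs, 1/sqrt(beta), log = TRUE))
    if (log(runif(1)) < lr) {
      x[t, ] = prop
    } else {
      x[t, ] = x[t-1, ]
    }
  }
  x
}
```

Overlaying `plot(rwm(1e4), col="sienna")` with `points(proxim(1e4, beta=50), col="navyblue")` and `points(proxim(1e4, beta=5), col="darkgreen")` gives a picture in the same spirit, although the outcome will depend on the assumed scales and on η.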