Марковская цепь Монте-Карло: различия между версиями

Материал из Википедии — свободной энциклопедии
Перейти к навигации Перейти к поиску
[непроверенная версия][непроверенная версия]
Содержимое удалено Содержимое добавлено
Новая страница: «{{В инкубаторе}} {{Инкубатор, пишу}} <!-- Начинайте писать ниже этой строки --> '''Сюд...»
 
Нет описания правки
Строка 4: Строка 4:


<!-- Начинайте писать ниже этой строки -->
<!-- Начинайте писать ниже этой строки -->
{{short description|Class of dependent sampling algorithms}}
<math></math>{{Bayesian statistics}}


In [[statistics]], '''Markov chain Monte Carlo''' ('''MCMC''') methods comprise a class of [[algorithm]]s for sampling from a [[probability distribution]]. By constructing a [[Markov chain]] that has the desired distribution as its [[Markov chain#Steady-state analysis and limiting distributions|equilibrium distribution]], one can obtain a sample of the desired distribution by recording states from the chain. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution. Various algorithms exist for constructing chains, including the [[Metropolis–Hastings algorithm]].
'''Сюда поместите название статьи''' —


== Application domains ==
MCMC methods are primarily used for calculating [[Numerical analysis|numerical approximations]] of [[Multiple integral|multi-dimensional integrals]], for example in [[Bayesian statistics]], [[computational physics]],<ref>{{Cite journal|last=Kasim|first=M.F.|last2=Bott|first2=A.F.A.|last3=Tzeferacos|first3=P.|last4=Lamb|first4=D.Q.|last5=Gregori|first5=G.|last6=Vinko|first6=S.M. | date = September 2019 |title=Retrieving fields from proton radiography without source profiles |journal=Physical Review E|volume=100|doi=10.1103/PhysRevE.100.033208|arxiv=1905.12934}}</ref> [[computational biology]]<ref>{{Cite journal|last=Gupta|first=Ankur|last2=Rawlings|first2=James B. | date = April 2014 |title=Comparison of Parameter Estimation Methods in Stochastic Chemical Kinetic Models: Examples in Systems Biology |journal=AIChE Journal|volume=60|issue=4|pages=1253–1268|doi=10.1002/aic.14409 |pmc=4946376|pmid=27429455}}</ref> and [[computational linguistics]].<ref>See Gill 2008.</ref><ref>See Robert & Casella 2004.</ref>


In Bayesian statistics, the recent development of MCMC methods has made it possible to compute large [[Bayesian network#Hierarchical models|hierarchical models]] that require integrations over hundreds to thousands of unknown parameters.<ref>{{cite book|last1=Banerjee|first1=Sudipto|last2=Carlin|first2=Bradley P.|last3=Gelfand|first3=Alan P.|title=Hierarchical Modeling and Analysis for Spatial Data|publisher=CRC Press|isbn=978-1-4398-1917-3|page=xix|edition=Second|date=2014-09-12}}</ref>

In [[rare event sampling]], they are also used for generating samples that gradually populate the rare failure region.

In [[cryptocurrency]], MCMC has been used for generating forecasts based on historical price data.

== General explanation ==
[[File:Metropolis algorithm convergence example.png|thumbnail|upright=1.25|Convergence of the [[Metropolis–Hastings algorithm]]. Markov chain Monte Carlo attempts to approximate the blue distribution with the orange distribution.]]

Markov chain Monte Carlo methods create samples from a continuous [[random variable]], with [[probability density]] proportional to a known function. These samples can be used to evaluate an integral over that variable, as its [[expected value]] or [[variance]].

Practically, an [[Statistical ensemble|ensemble]] of chains is generally developed, starting from a set of points arbitrarily chosen and sufficiently distant from each other. These chains are [[stochastic processes]] of "walkers" which move around randomly according to an algorithm that looks for places with a reasonably high contribution to the integral to move into next, assigning them higher probabilities.

Random walk Monte Carlo methods are a kind of random [[Computer simulation|simulation]] or [[Monte Carlo method]]. However, whereas the random samples of the integrand used in a conventional [[Monte Carlo integration]] are [[statistically independent]], those used in MCMC are [[autocorrelation|autocorrelated]]. Correlations of samples introduces the need to use the [[Markov chain central limit theorem]] when estimating the error of mean values.

These algorithms create [[Markov chains]] such that they have an [[Markov chain#Steady-state analysis and limiting distributions|equilibrium distribution]] which is proportional to the function given.

==Reducing correlation==
While MCMC methods were created to address multi-dimensional problems better than generic Monte Carlo algorithms, when the number of dimensions rises they too tend to suffer the [[curse of dimensionality]]: regions of higher probability tend to stretch and get lost in an increasing volume of space that contributes little to the integral. One way to address this problem could be shortening the steps of the walker, so that it doesn't continuously try to exit the highest probability region, though this way the process would be highly autocorrelated and expensive (i.e. many steps would be required for an accurate result). More sophisticated methods such as [[Hamiltonian Monte Carlo]] and the [[Wang and Landau algorithm]] use various ways of reducing this autocorrelation, while managing to keep the process in the regions that give a higher contribution to the integral. These algorithms usually rely on a more complicated theory and are harder to implement, but they usually converge faster.

== Examples ==

=== Random walk ===

* [[Metropolis–Hastings algorithm]]: This method generates a Markov chain using a proposal density for new steps and a method for rejecting some of the proposed moves. It is actually a general framework which includes as special cases the very first and simpler MCMC (Metropolis algorithm) and many more recent alternatives listed below.
**[[Gibbs sampling]]: This method requires all the [[conditional distribution]]s of the target distribution to be sampled exactly. When drawing from the full-conditional distributions is not straightforward other samplers-within-Gibbs are used (e.g., see <ref>{{Cite journal|title = Adaptive Rejection Sampling for Gibbs Sampling|journal = Journal of the Royal Statistical Society. Series C (Applied Statistics)|date = 1992-01-01|pages = 337–348|volume = 41|issue = 2|doi = 10.2307/2347565|first = W. R.|last = Gilks|first2 = P.|last2 = Wild|jstor=2347565}}</ref><ref>{{Cite journal|title = Adaptive Rejection Metropolis Sampling within Gibbs Sampling|journal = Journal of the Royal Statistical Society. Series C (Applied Statistics)|date = 1995-01-01|pages = 455–472|volume = 44|issue = 4|doi = 10.2307/2986138|first = W. R.|last = Gilks|first2 = N. G.|last2 = Best|author2-link= Nicky Best |first3 = K. K. C.|last3 = Tan|jstor=2986138}}</ref><ref>{{Cite journal|title = Independent Doubly Adaptive Rejection Metropolis Sampling Within Gibbs Sampling|journal = IEEE Transactions on Signal Processing|date = 2015-06-01|issn = 1053-587X|pages = 3123–3138|volume = 63|issue = 12|doi = 10.1109/TSP.2015.2420537|first = L.|last = Martino|first2 = J.|last2 = Read|first3 = D.|last3 = Luengo|arxiv = 1205.5494|bibcode = 2015ITSP...63.3123M |ref=harv}}</ref>). Gibbs sampling is popular partly because it does not require any 'tuning'.
** [[Metropolis-adjusted Langevin algorithm]] and other methods that rely on the gradient (and possibly second derivative) of the log target density to propose steps that are more likely to be in the direction of higher probability density.<ref>See Stramer 1999.</ref>
**[[Pseudo-Marginal Metropolis-Hastings algorithm|Pseudo-marginal Metropolis–Hastings]]: This method replaces the evaluation of the density of the target distribution with an unbiased estimate and is useful when the target density is not available analytically, e.g. [[latent variable model]]s.
* [[Slice sampling]]: This method depends on the principle that one can sample from a distribution by sampling uniformly from the region under the plot of its density function. It alternates uniform sampling in the vertical direction with uniform sampling from the horizontal 'slice' defined by the current vertical position.
* [[Multiple-try Metropolis]]: This method is a variation of the Metropolis–Hastings algorithm that allows multiple trials at each point. By making it possible to take larger steps at each iteration, it helps address the curse of dimensionality.<ref>{{Cite journal|title = The Multiple-Try Method and Local Optimization in Metropolis Sampling|journal = Journal of the American Statistical Association|date = 2000-03-01|issn = 0162-1459|pages = 121–134|volume = 95|issue = 449|doi = 10.1080/01621459.2000.10473908|first = Jun S.|last = Liu|first2 = Faming|last2 = Liang|first3 = Wing Hung|last3 = Wong}}</ref><ref>{{Cite journal|title = On the flexibility of the design of multiple try Metropolis schemes|journal = Computational Statistics|date = 2013-07-11|issn = 0943-4062|pages = 2797–2823|volume = 28|issue = 6|doi = 10.1007/s00180-013-0429-2|first = Luca|last = Martino|first2 = Jesse|last2 = Read|arxiv = 1201.0646}}</ref>
* [[Reversible-jump]]: This method is a variant of the Metropolis–Hastings algorithm that allows proposals that change the dimensionality of the space.<ref>See Green 1995.</ref> Markov chain Monte Carlo methods that change dimensionality have long been used in [[statistical physics]] applications, where for some problems a distribution that is a [[grand canonical ensemble]] is used (e.g., when the number of molecules in a box is variable). But the reversible-jump variant is useful when doing Markov chain Monte Carlo or Gibbs sampling over [[nonparametric]] Bayesian models such as those involving the [[Dirichlet process]] or [[Chinese restaurant process]], where the number of mixing components/clusters/etc. is automatically inferred from the data.
* [[Hamiltonian Monte Carlo|Hamiltonian (or Hybrid) Monte Carlo]] (HMC): Tries to avoid random walk behaviour by introducing an auxiliary [[momentum]] vector and implementing [[Hamiltonian dynamics]], so the potential energy function is the target density. The momentum samples are discarded after sampling. The end result of Hybrid Monte Carlo is that proposals move across the sample space in larger steps; they are therefore less correlated and converge to the target distribution more rapidly.

===Training-based ===
Another approach is to use the previous steps to generate the next candidate. This training-based algorithm is able to accelerate MCMC by an order of magnitude.<ref>{{cite journal|last1=Tahmasebi|first1=Pejman|last2=Javadpour|first2=Farzam|last3=Sahimi|first3=Muhammad|title=Stochastic shale permeability matching: Three-dimensional characterization and modeling|journal=International Journal of Coal Geology|date=August 2016|volume=165|pages=231–242|doi=10.1016/j.coal.2016.08.024|url=https://www.researchgate.net/publication/307626119}}</ref>

Interacting MCMC methodologies are a class of [[mean field particle methods]] for obtaining [[Pseudo-random number sampling|random samples]] from a sequence of probability distributions with an increasing level of sampling complexity.<ref name="dp13">{{cite book|last = Del Moral|first = Pierre|title = Mean field simulation for Monte Carlo integration|year = 2013|publisher = Chapman & Hall/CRC Press|quote = Monographs on Statistics & Applied Probability|url = http://www.crcpress.com/product/isbn/9781466504059|pages = 626}}</ref> These probabilistic models include path space state models with increasing time horizon, posterior distributions w.r.t. sequence of partial observations, increasing constraint level sets for conditional distributions, decreasing temperature schedules associated with some Boltzmann-Gibbs distributions, and many others. In principle, any Markov chain Monte Carlo sampler can be turned into an interacting Markov chain Monte Carlo sampler. These interacting Markov chain Monte Carlo samplers can be interpreted as a way to run in parallel a sequence of Markov chain Monte Carlo samplers. For instance, interacting [[simulated annealing]] algorithms are based on independent Metropolis-Hastings moves interacting sequentially with a selection-resampling type mechanism. In contrast to traditional Markov chain Monte Carlo methods, the precision parameter of this class of interacting Markov chain Monte Carlo samplers is ''only'' related to the number of interacting Markov chain Monte Carlo samplers. These advanced particle methodologies belong to the class of Feynman-Kac particle models,<ref name="dp04">{{cite book|last = Del Moral|first = Pierre|title = Feynman-Kac formulae. Genealogical and interacting particle approximations|year = 2004|publisher = Springer|quote = Series: Probability and Applications|url = https://www.springer.com/mathematics/probability/book/978-0-387-20268-6|pages = 575}}</ref><ref name="dmm002">{{cite book|last1 = Del Moral|first1 = Pierre|last2 = Miclo|first2 = Laurent|contribution = Branching and Interacting Particle Systems Approximations of Feynman-Kac Formulae with Applications to Non-Linear Filtering|title=Séminaire de Probabilités XXXIV|editors=Jacques Azéma, Michel Ledoux, Michel Émery, Marc Yor|series = Lecture Notes in Mathematics|date = 2000|volume = 1729|pages = 1–145|url = http://archive.numdam.org/ARCHIVE/SPS/SPS_2000__34_/SPS_2000__34__1_0/SPS_2000__34__1_0.pdf|doi = 10.1007/bfb0103798|isbn = 978-3-540-67314-9}}</ref> also called Sequential Monte Carlo or [[particle filter]] methods in [[Bayesian inference]] and [[signal processing]] communities.<ref name=":3">{{Cite journal|title = Sequential Monte Carlo samplers - P. Del Moral - A. Doucet - A. Jasra - 2006 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library| doi=10.1111/j.1467-9868.2006.00553.x|volume=68|issue = 3|year=2006|journal=Journal of the Royal Statistical Society. Series B (Statistical Methodology)|pages=411–436 | last1 = Del Moral | first1 = Pierre|arxiv=cond-mat/0212648}}</ref> Interacting Markov chain Monte Carlo methods can also be interpreted as a mutation-selection [[Genetic algorithm|genetic particle algorithm]] with Markov chain Monte Carlo mutations.

Markov Chain quasi-Monte Carlo (MCQMC).<ref>{{cite journal |last=Chen |first=S. |first2=Josef |last2=Dick |first3=Art B. |last3=Owen |title=Consistency of Markov chain quasi-Monte Carlo on continuous state spaces |journal=[[Annals of Statistics]] |volume=39 |issue=2 |year=2011 |pages=673–701 |doi=10.1214/10-AOS831 |doi-access=free }}</ref><ref>{{cite thesis |last=Tribble |first=Seth D. |title=Markov chain Monte Carlo algorithms using completely uniformly distributed driving sequences |type=Diss. |publisher=Stanford University |year=2007 |id={{ProQuest|304808879}} }}</ref>

The advantage of [[low-discrepancy sequence]]s in lieu of random numbers for simple independent Monte Carlo sampling is well known.<ref>{{cite journal |last=Papageorgiou |first=Anargyros |first2=J. F. |last2=Traub |title=Beating Monte Carlo |journal=Risk |volume=9 |issue=6 |year=1996 |pages=63–65 |doi= }}</ref> This procedure, known as [[Quasi-Monte Carlo method]] (QMC),<ref>{{cite journal | last1 = Sobol | first1 = Ilya M | year = 1998 | title = On quasi-monte carlo integrations | url = | journal = Mathematics and Computers in Simulation | volume = 47 | issue = 2| pages = 103–112 | doi=10.1016/s0378-4754(98)00096-2 }}</ref> yields an integration error that decays at a superior rate to that obtained by IID sampling, by the [[Koksma-Hlawka inequality]]. Empirically it allows the reduction of both estimation error and convergence time by an order of magnitude.{{Citation needed|date=April 2015}} The Array-RQMC method combines randomized quasi-Monte Carlo and Markov chain simulation by simulating <math>n</math> chains simultaneously in a way that the empirical distribution of the <math>n</math> states at any given step is a better approximation of the true distribution of the chain than with ordinary MCMC.<ref>{{cite journal |last=L'Ecuyer |first=P. |first2=C. |last2=Lécot |first3=B. |last3=Tuffin |title=A Randomized Quasi-Monte Carlo Simulation Method for Markov Chains |journal=[[Operations Research (journal)|Operations Research]] |volume=56 |issue=4 |year=2008 |pages=958–975 |doi=10.1287/opre.1080.0556 |url=https://hal.inria.fr/inria-00070462/file/RR-5545.pdf }}</ref> In empirical experiments, the variance of the average of a function of the state sometimes converges at rate <math>O(n^{-2})</math> or even faster, instead of the <math>O(n^{-1})</math> Monte Carlo rate.<ref>{{cite journal |last=L'Ecuyer |first=P. |first2=D. |last2=Munger |first3=C. |last3=Lécot |first4=B. |last4=Tuffin |title=Sorting Methods and Convergence Rates for Array-RQMC: Some Empirical Comparisons |journal=Mathematics and Computers in Simulation |volume=143 |year=2018 |pages=191–201 |doi=10.1016/j.matcom.2016.07.010 }}</ref>

==Convergence==

Usually it is not hard to construct a Markov chain with the desired properties. The more difficult problem is to determine how many steps are needed to converge to the stationary distribution within an acceptable error.<ref name="Gelman and Rubin, 1992">{{cite journal|last1=Gelman|first1=A.|last2=Rubin|first2=D.B.|title=Inference from iterative simulation using multiple sequences (with discussion)|journal=Statistical Science|date=1992|volume=7|issue=4|pages=457–511|doi=10.1214/ss/1177011136|bibcode=1992StaSc...7..457G|url=http://www.stat.duke.edu/~scs/Courses/Stat376/Papers/ConvergeDiagnostics/GelmanRubinStatSci1992.pdf}}</ref> A good chain will have [[Markov chain mixing time|rapid mixing]]: the stationary distribution is reached quickly starting from an arbitrary position. A standard empirical method to assess convergence is to run several independent simulated Markov chains and check that the ratio of inter-chain to intra-chain variances for all the parameters sampled is close to 1.<ref name="Gelman and Rubin, 1992" /><ref name="Colwes and Carlin, 1996">{{cite journal|last1=Cowles|first1=M.K.|last2=Carlin|first2=B.P.|title=Markov chain Monte Carlo convergence diagnostics: a comparative review|journal=Journal of the American Statistical Association|date=1996|volume=91|issue=434|pages=883–904|doi=10.1080/01621459.1996.10476956|citeseerx=10.1.1.53.3445}}</ref>

Typically, Markov chain Monte Carlo sampling can only approximate the target distribution, as there is always some residual effect of the starting position. More sophisticated Markov chain Monte Carlo-based algorithms such as [[coupling from the past]] can produce exact samples, at the cost of additional computation and an unbounded (though finite in expectation) [[running time]].

Many random walk Monte Carlo methods move around the equilibrium distribution in relatively small steps, with no tendency for the steps to proceed in the same direction. These methods are easy to implement and analyze, but unfortunately it can take a long time for the walker to explore all of the space. The walker will often double back and cover ground already covered.

Further consideration of convergence is at [[Markov chain central limit theorem]]. See <ref>{{cite journal |last=Hill |first=S. D. |last2=Spall |first2=J. C. |year=2019 |title=Stationarity and Convergence of the Metropolis-Hastings Algorithm: Insights into Theoretical Aspects |journal=IEEE Control Systems Magazine |volume=39 |issue=1 |pages=56–67 |doi=10.1109/MCS.2018.2876959 }}</ref> for a discussion of the theory related to convergence and stationarity of the Metropolis-Hastings algorithm.

== Software ==
Several software programs provide MCMC sampling capabilities, for example:
*Packages that use dialects of the [[Bayesian inference using Gibbs sampling|BUGS]] model language:
**[[WinBUGS]] / [[OpenBUGS]]/ [https://www.multibugs.org/ MultiBUGS]
**[[Just another Gibbs sampler|JAGS]]
**[https://r-nimble.org/ NIMBLE]
*[https://greta-stats.org/ greta], a Bayesian statistical modeling language / R package which uses TensorFlow behind the scenes,<ref>{{Cite web|url=https://greta-dev.github.io/greta/software.html|title=greta's software dependencies and inspirations|last=|first=|date=|website=greta-stats.org/|archive-url=|archive-date=|access-date=2020-04-13}}</ref> similar to PyMC3's use of Theano as the computational back-end
*[[MCSim]]
*[[PyMC3]]
*[https://github.com/prmiles/pymcmcstat/wiki pymcmcstat]
* [[R (programming language)]] with the packages adaptMCMC, atmcmc, BRugs, mcmc, MCMCpack, ramcmc, rjags, rstan, etc.
*[[Stan (software)|Stan]]
*[https://www.tensorflow.org/probability/ TensorFlow Probability] ([[Probabilistic programming language|probabilistic programming]] library built on [[TensorFlow]])
*[http://micans.org/mcl/ MCL] (a cluster algorithm for graphs)<ref>{{cite journal|last1=Enright|first1=AJ|last2=Van Dongen|first2=S|last3=Ouzounis|first3=CA|title=An efficient algorithm for large-scale detection of protein families.|journal=Nucleic Acids Research|date=1 April 2002|volume=30|issue=7|pages=1575–84|pmid=11917018|pmc=101833|doi=10.1093/nar/30.7.1575}}</ref> and [https://bitbucket.org/azadcse/hipmcl/wiki/Home HipMCL] (a parallelized version)<ref>{{cite journal|last1=Azad|first1=A|last2=Pavlopoulos|first2=GA|last3=Ouzounis|first3=CA|last4=Kyrpides|first4=NC|last5=Buluç|first5=A|title=HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks.|journal=Nucleic Acids Research|date=6 April 2018|volume=46|issue=6|pages=e33|doi=10.1093/nar/gkx1313|pmid=29315405|pmc=5888241}}</ref>
* [http://dfm.io/emcee/current/ emcee] (MIT licensed pure-Python implementation of Goodman & Weare's Affine Invariant Markov chain Monte Carlo Ensemble sampler)
* [https://improbable-research.github.io/keanu/ Keanu] a general-purpose probabilistic programming library built in Java
* [https://zeus-mcmc.readthedocs.io/ Zeus] is a pure-Python implementation of the Ensemble Slice Sampling method
* [https://turing.ml/dev/ Turing.jl], a package for general-purpose probabilistic programming in Julia
* [https://mambajl.readthedocs.io/en/latest/ Mamba.jl], a platform for MCMC method in Julia

== See also ==
*[[Coupling from the past]]
*[[Metropolis-adjusted Langevin algorithm]]
*[[Markov chain central limit theorem]]

== References ==
=== Citations ===
{{Reflist}}

=== Sources ===
{{refbegin|40em}}
* Christophe Andrieu, Nando De Freitas, Arnaud Doucet and Michael I. Jordan [http://www.cs.princeton.edu/courses/archive/spr06/cos598C/papers/AndrieuFreitasDoucetJordan2003.pdf ''An Introduction to MCMC for Machine Learning''], 2003
* {{cite book
| last1 = Asmussen
| first1 = Søren
| last2 = Glynn
| first2 = Peter W.
| title = Stochastic Simulation: Algorithms and Analysis
| publisher = Springer
| series = Stochastic Modelling and Applied Probability
| volume = 57
| year = 2007
}}
*{{cite web
| first = P.
| last = Atzberger
| url = http://www.math.ucsb.edu/~atzberg/spring2006/monteCarloMethod.pdf
| title = An Introduction to Monte-Carlo Methods
}}
*{{cite book
| first = Bernd A.
| last = Berg
| author1link = Bernd A. Berg
| title = Markov Chain Monte Carlo Simulations and Their Statistical Analysis
| publisher = [[World Scientific]]
| year = 2004
}}
*{{cite book
| last = Bolstad
| first = William M.
| year = 2010
| title = Understanding Computational Bayesian Statistics
| publisher = Wiley
| isbn = 978-0-470-04609-8
}}
*{{cite journal
| first1 = George
| last1 = Casella
| first2 = Edward I.
| last2 = George
| title = Explaining the Gibbs sampler
| journal = [[The American Statistician]]
| volume = 46
| issue = 3
| pages = 167–174
| year = 1992
| doi=10.2307/2685208
| jstor = 2685208
| citeseerx = 10.1.1.554.3993
}}
*{{cite journal
| first1 = A.E.
| last1 = Gelfand
| first2 = A.F.M.
| last2 = Smith
| title = Sampling-Based Approaches to Calculating Marginal Densities
| journal = [[Journal of the American Statistical Association]]
| volume = 85
| issue = 410
| pages = 398–409
| year = 1990
| doi=10.1080/01621459.1990.10476213
| citeseerx = 10.1.1.512.2330
}}
*{{cite book
| first1 = Andrew
| last1 = Gelman
| author1link = Andrew Gelman
| first2 = John B.
| last2 = Carlin
| first3 = Hal S.
| last3 = Stern
| first4 = Donald B.
| last4 = Rubin
| author4link = Donald B. Rubin
| title = Bayesian Data Analysis
| publisher = [[Chapman and Hall]]
| edition = 1st
| year = 1995
}} ''(See Chapter 11.)''
*{{cite journal
| first1 = S.
| last1 = Geman
| first2 = D.
| last2 = Geman
| author2link = Donald Geman
| title = Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images
| journal = [[IEEE Transactions on Pattern Analysis and Machine Intelligence]]
| volume = 6
| issue = 6
| pages = 721–741
| year = 1984
| doi = 10.1109/TPAMI.1984.4767596
}}
*{{cite book
| last1 = Gilks
| first1 = W.R.
| last2 = Richardson
| first2 = S.
| first3 = D.J.
| last3 = Spiegelhalter
| author3link = David Spiegelhalter
| title = Markov Chain Monte Carlo in Practice
| publisher = [[Chapman and Hall]]/CRC
| year = 1996
}}
*{{cite book
| first = Jeff
| last = Gill
| title = Bayesian methods: a social and behavioral sciences approach
| edition = 2nd
| year = 2008
| publisher = [[Chapman and Hall]]/CRC
| isbn = 978-1-58488-562-7
}}
*{{cite journal
| first = P.J.
| last = Green
| title = Reversible-jump Markov chain Monte Carlo computation and Bayesian model determination
| journal = [[Biometrika]]
| volume=82
| issue = 4
| pages = 711–732
| year = 1995
| doi = 10.1093/biomet/82.4.711
| citeseerx = 10.1.1.407.8942
}}
*{{cite journal
| first = Radford M.
| last = Neal
| title = Slice Sampling
| journal = [[Annals of Statistics]]
| volume = 31
| issue = 3
| pages = 705–767
| year = 2003
| jstor = 3448413
| doi=10.1214/aos/1056562461
| doi-access = free
}}
*{{cite web
| last = Neal
| first = Radford M.
| url = http://www.cs.utoronto.ca/~radford/review.abstract.html
| title = ''Probabilistic Inference Using Markov Chain Monte Carlo Methods''
| year = 1993
}}
*{{cite book
|first = Christian P.
|last = Robert
|last2 = Casella
|first2 = G.
|title = Monte Carlo Statistical Methods
|url = https://archive.org/details/springer_10.1007-978-1-4757-4145-2
|edition = 2nd
|year = 2004
|publisher = Springer
|isbn = 978-0-387-21239-5
}}
*{{cite book
| first1 = R.Y.
| last1 = Rubinstein
| first2 = D.P.
| last2 = Kroese |authorlink2=Dirk Kroese
| title = Simulation and the Monte Carlo Method
| edition = 2nd
| publisher = [[John Wiley & Sons|Wiley]]
| year = 2007
| isbn = 978-0-470-17794-5
}}
*{{cite journal
| first = R.L.
| last = Smith
| title = Efficient Monte Carlo Procedures for Generating Points Uniformly Distributed Over Bounded Regions
| journal = [[Operations Research: A Journal of the Institute for Operations Research and the Management Sciences|Operations Research]]
| volume = 32
| issue = 6
| pages = 1296–1308
| year = 1984
| doi = 10.1287/opre.32.6.1296
}}
* {{cite journal
| first = J.C.
| last = Spall
| title = Estimation via Markov Chain Monte Carlo
| journal = [[IEEE Control Systems Magazine]]
| volume = 23
| issue = 2
| pages = 34–45
|date=April 2003
| doi=10.1109/mcs.2003.1188770
}}
*{{cite journal
| last1 = Stramer
| first1 = O.
| last2 = Tweedie
| first2 = R.
| year = 1999
| title = Langevin-Type Models II: Self-Targeting Candidates for MCMC Algorithms
| journal = Methodology and Computing in Applied Probability
| volume = 1
| issue = 3
| pages = 307–328
| doi = 10.1023/A:1010090512027
}}
{{refend}}

== Further reading ==

{{refbegin}}
*{{cite journal
| first = Persi
| last = Diaconis
| author1link = Persi Diaconis
| url = http://www.ams.org/bull/2009-46-02/S0273-0979-08-01238-X/S0273-0979-08-01238-X.pdf
| title = The Markov chain Monte Carlo revolution
| journal = [[Bull. Amer. Math. Soc.]]
| volume = 46
| issue = 2
|date=April 2009
| pages = 179–205
| id = S 0273-0979(08)01238-X
| doi=10.1090/s0273-0979-08-01238-x
}}
*{{Citation
| last1 = Press
| first1 = W.H.
| author1link = William H. Press
| last2 = Teukolsky
| first2 = S.A.
| author2link = Saul Teukolsky
| last3 = Vetterling
| first3 = W.T.
| last4 = Flannery
| first4 = B.P.
| year = 2007
| title = Numerical Recipes: The Art of Scientific Computing
| edition = 3rd
| publisher = [[Cambridge University Press]]
| isbn = 978-0-521-88068-8
| chapter = Section 15.8. Markov Chain Monte Carlo
| chapter-url = http://apps.nrbook.com/empanel/index.html#pg=824
}}
*{{cite journal
| last = Richey
| first = Matthew
| url = http://stat.wharton.upenn.edu/~stjensen/stat542/lecture14.mcmchistory.pdf
| title = The Evolution of Markov Chain Monte Carlo Methods
| journal = [[The American Mathematical Monthly]]
| volume = 117
| issue = 5
|date=May 2010
| pages = 383–413
| doi=10.4169/000298910x485923
| citeseerx = 10.1.1.295.4478
}}
{{refend}}

== External links ==
*[https://web.archive.org/web/20110531150413/http://www.bioss.ac.uk/students/alexm/MCMCintroPresentation.pdf MCMC sampling and other methods in a basic overview], by Alexander Mantzaris ([http://www.bioss.ac.uk/students/alexm/MCMCintroPresentation.pdf original link - now broken])
*[https://pymc-devs.github.io/pymc/ PyMC] - Python module implementing Bayesian statistical models and fitting algorithms, including Markov chain Monte Carlo.
*[http://a2rms.sourceforge.net IA2RMS] is a Matlab code of the "Independent Doubly Adaptive Rejection Metropolis Sampling" method, {{harvtxt |Martino |Read |Luengo |2015}}, for drawing from the full-conditional densities within a Gibbs sampler.

{{DEFAULTSORT:Markov Chain Monte Carlo}}
[[Category:Monte Carlo methods]]
[[Category:Markov chain Monte Carlo| ]]
[[Category:Computational statistics]]
[[Category:Markov models]]
[[Category:Bayesian estimation]]


<!-- Так помечается раздел (если он нужен)-->
== ==
== ==



Версия от 18:23, 4 мая 2020

Шаблон:Short description Шаблон:Bayesian statistics

In statistics, Markov chain Monte Carlo (MCMC) methods comprise a class of algorithms for sampling from a probability distribution. By constructing a Markov chain that has the desired distribution as its equilibrium distribution, one can obtain a sample of the desired distribution by recording states from the chain. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution. Various algorithms exist for constructing chains, including the Metropolis–Hastings algorithm.

Application domains

MCMC methods are primarily used for calculating numerical approximations of multi-dimensional integrals, for example in Bayesian statistics, computational physics,[1] computational biology[2] and computational linguistics.[3][4]

In Bayesian statistics, the recent development of MCMC methods has made it possible to compute large hierarchical models that require integrations over hundreds to thousands of unknown parameters.[5]

In rare event sampling, they are also used for generating samples that gradually populate the rare failure region.

In cryptocurrency, MCMC has been used for generating forecasts based on historical price data.

General explanation

Convergence of the Metropolis–Hastings algorithm. Markov chain Monte Carlo attempts to approximate the blue distribution with the orange distribution.

Markov chain Monte Carlo methods create samples from a continuous random variable, with probability density proportional to a known function. These samples can be used to evaluate an integral over that variable, as its expected value or variance.

Practically, an ensemble of chains is generally developed, starting from a set of points arbitrarily chosen and sufficiently distant from each other. These chains are stochastic processes of "walkers" which move around randomly according to an algorithm that looks for places with a reasonably high contribution to the integral to move into next, assigning them higher probabilities.

Random walk Monte Carlo methods are a kind of random simulation or Monte Carlo method. However, whereas the random samples of the integrand used in a conventional Monte Carlo integration are statistically independent, those used in MCMC are autocorrelated. Correlations of samples introduces the need to use the Markov chain central limit theorem when estimating the error of mean values.

These algorithms create Markov chains such that they have an equilibrium distribution which is proportional to the function given.

Reducing correlation

While MCMC methods were created to address multi-dimensional problems better than generic Monte Carlo algorithms, when the number of dimensions rises they too tend to suffer the curse of dimensionality: regions of higher probability tend to stretch and get lost in an increasing volume of space that contributes little to the integral. One way to address this problem could be shortening the steps of the walker, so that it doesn't continuously try to exit the highest probability region, though this way the process would be highly autocorrelated and expensive (i.e. many steps would be required for an accurate result). More sophisticated methods such as Hamiltonian Monte Carlo and the Wang and Landau algorithm use various ways of reducing this autocorrelation, while managing to keep the process in the regions that give a higher contribution to the integral. These algorithms usually rely on a more complicated theory and are harder to implement, but they usually converge faster.

Examples

Random walk

  • Metropolis–Hastings algorithm: This method generates a Markov chain using a proposal density for new steps and a method for rejecting some of the proposed moves. It is actually a general framework which includes as special cases the very first and simpler MCMC (Metropolis algorithm) and many more recent alternatives listed below.
    • Gibbs sampling: This method requires all the conditional distributions of the target distribution to be sampled exactly. When drawing from the full-conditional distributions is not straightforward other samplers-within-Gibbs are used (e.g., see [6][7][8]). Gibbs sampling is popular partly because it does not require any 'tuning'.
    • Metropolis-adjusted Langevin algorithm and other methods that rely on the gradient (and possibly second derivative) of the log target density to propose steps that are more likely to be in the direction of higher probability density.[9]
    • Pseudo-marginal Metropolis–Hastings: This method replaces the evaluation of the density of the target distribution with an unbiased estimate and is useful when the target density is not available analytically, e.g. latent variable models.
  • Slice sampling: This method depends on the principle that one can sample from a distribution by sampling uniformly from the region under the plot of its density function. It alternates uniform sampling in the vertical direction with uniform sampling from the horizontal 'slice' defined by the current vertical position.
  • Multiple-try Metropolis: This method is a variation of the Metropolis–Hastings algorithm that allows multiple trials at each point. By making it possible to take larger steps at each iteration, it helps address the curse of dimensionality.[10][11]
  • Reversible-jump: This method is a variant of the Metropolis–Hastings algorithm that allows proposals that change the dimensionality of the space.[12] Markov chain Monte Carlo methods that change dimensionality have long been used in statistical physics applications, where for some problems a distribution that is a grand canonical ensemble is used (e.g., when the number of molecules in a box is variable). But the reversible-jump variant is useful when doing Markov chain Monte Carlo or Gibbs sampling over nonparametric Bayesian models such as those involving the Dirichlet process or Chinese restaurant process, where the number of mixing components/clusters/etc. is automatically inferred from the data.
  • Hamiltonian (or Hybrid) Monte Carlo (HMC): Tries to avoid random walk behaviour by introducing an auxiliary momentum vector and implementing Hamiltonian dynamics, so the potential energy function is the target density. The momentum samples are discarded after sampling. The end result of Hybrid Monte Carlo is that proposals move across the sample space in larger steps; they are therefore less correlated and converge to the target distribution more rapidly.

Training-based

Another approach is to use the previous steps to generate the next candidate. This training-based algorithm is able to accelerate MCMC by an order of magnitude.[13]

Interacting MCMC methodologies are a class of mean field particle methods for obtaining random samples from a sequence of probability distributions with an increasing level of sampling complexity.[14] These probabilistic models include path space state models with increasing time horizon, posterior distributions w.r.t. sequence of partial observations, increasing constraint level sets for conditional distributions, decreasing temperature schedules associated with some Boltzmann-Gibbs distributions, and many others. In principle, any Markov chain Monte Carlo sampler can be turned into an interacting Markov chain Monte Carlo sampler. These interacting Markov chain Monte Carlo samplers can be interpreted as a way to run in parallel a sequence of Markov chain Monte Carlo samplers. For instance, interacting simulated annealing algorithms are based on independent Metropolis-Hastings moves interacting sequentially with a selection-resampling type mechanism. In contrast to traditional Markov chain Monte Carlo methods, the precision parameter of this class of interacting Markov chain Monte Carlo samplers is only related to the number of interacting Markov chain Monte Carlo samplers. These advanced particle methodologies belong to the class of Feynman-Kac particle models,[15][16] also called Sequential Monte Carlo or particle filter methods in Bayesian inference and signal processing communities.[17] Interacting Markov chain Monte Carlo methods can also be interpreted as a mutation-selection genetic particle algorithm with Markov chain Monte Carlo mutations.

Markov Chain quasi-Monte Carlo (MCQMC).[18][19]

The advantage of low-discrepancy sequences in lieu of random numbers for simple independent Monte Carlo sampling is well known.[20] This procedure, known as Quasi-Monte Carlo method (QMC),[21] yields an integration error that decays at a superior rate to that obtained by IID sampling, by the Koksma-Hlawka inequality. Empirically it allows the reduction of both estimation error and convergence time by an order of magnitude.Ошибка: некорректно задана дата установки (исправьте через подстановку шаблона) The Array-RQMC method combines randomized quasi-Monte Carlo and Markov chain simulation by simulating chains simultaneously in a way that the empirical distribution of the states at any given step is a better approximation of the true distribution of the chain than with ordinary MCMC.[22] In empirical experiments, the variance of the average of a function of the state sometimes converges at rate or even faster, instead of the Monte Carlo rate.[23]

Convergence

Usually it is not hard to construct a Markov chain with the desired properties. The more difficult problem is to determine how many steps are needed to converge to the stationary distribution within an acceptable error.[24] A good chain will have rapid mixing: the stationary distribution is reached quickly starting from an arbitrary position. A standard empirical method to assess convergence is to run several independent simulated Markov chains and check that the ratio of inter-chain to intra-chain variances for all the parameters sampled is close to 1.[24][25]

Typically, Markov chain Monte Carlo sampling can only approximate the target distribution, as there is always some residual effect of the starting position. More sophisticated Markov chain Monte Carlo-based algorithms such as coupling from the past can produce exact samples, at the cost of additional computation and an unbounded (though finite in expectation) running time.

Many random walk Monte Carlo methods move around the equilibrium distribution in relatively small steps, with no tendency for the steps to proceed in the same direction. These methods are easy to implement and analyze, but unfortunately it can take a long time for the walker to explore all of the space. The walker will often double back and cover ground already covered.

Further consideration of convergence is at Markov chain central limit theorem. See [26] for a discussion of the theory related to convergence and stationarity of the Metropolis-Hastings algorithm.

Software

Several software programs provide MCMC sampling capabilities, for example:

See also

References

Citations

  1. Kasim, M.F.; Bott, A.F.A.; Tzeferacos, P.; Lamb, D.Q.; Gregori, G.; Vinko, S.M. (September 2019). "Retrieving fields from proton radiography without source profiles". Physical Review E. 100. arXiv:1905.12934. doi:10.1103/PhysRevE.100.033208.
  2. Gupta, Ankur; Rawlings, James B. (April 2014). "Comparison of Parameter Estimation Methods in Stochastic Chemical Kinetic Models: Examples in Systems Biology". AIChE Journal. 60 (4): 1253—1268. doi:10.1002/aic.14409. PMC 4946376. PMID 27429455.
  3. See Gill 2008.
  4. See Robert & Casella 2004.
  5. Banerjee, Sudipto. Hierarchical Modeling and Analysis for Spatial Data / Sudipto Banerjee, Bradley P. Carlin, Alan P. Gelfand. — Second. — CRC Press, 2014-09-12. — P. xix. — ISBN 978-1-4398-1917-3.
  6. Gilks, W. R.; Wild, P. (1992-01-01). "Adaptive Rejection Sampling for Gibbs Sampling". Journal of the Royal Statistical Society. Series C (Applied Statistics). 41 (2): 337—348. doi:10.2307/2347565. JSTOR 2347565.
  7. Gilks, W. R.; Best, N. G.; Tan, K. K. C. (1995-01-01). "Adaptive Rejection Metropolis Sampling within Gibbs Sampling". Journal of the Royal Statistical Society. Series C (Applied Statistics). 44 (4): 455—472. doi:10.2307/2986138. JSTOR 2986138.
  8. Martino, L.; Read, J.; Luengo, D. (2015-06-01). "Independent Doubly Adaptive Rejection Metropolis Sampling Within Gibbs Sampling". IEEE Transactions on Signal Processing. 63 (12): 3123—3138. arXiv:1205.5494. Bibcode:2015ITSP...63.3123M. doi:10.1109/TSP.2015.2420537. ISSN 1053-587X. {{cite journal}}: Недопустимый |ref=harv (справка)
  9. See Stramer 1999.
  10. Liu, Jun S.; Liang, Faming; Wong, Wing Hung (2000-03-01). "The Multiple-Try Method and Local Optimization in Metropolis Sampling". Journal of the American Statistical Association. 95 (449): 121—134. doi:10.1080/01621459.2000.10473908. ISSN 0162-1459.
  11. Martino, Luca; Read, Jesse (2013-07-11). "On the flexibility of the design of multiple try Metropolis schemes". Computational Statistics. 28 (6): 2797—2823. arXiv:1201.0646. doi:10.1007/s00180-013-0429-2. ISSN 0943-4062.
  12. See Green 1995.
  13. Tahmasebi, Pejman; Javadpour, Farzam; Sahimi, Muhammad (August 2016). "Stochastic shale permeability matching: Three-dimensional characterization and modeling". International Journal of Coal Geology. 165: 231—242. doi:10.1016/j.coal.2016.08.024.
  14. Del Moral, Pierre. Mean field simulation for Monte Carlo integration. — Chapman & Hall/CRC Press, 2013. — P. 626. — «Monographs on Statistics & Applied Probability».
  15. Del Moral, Pierre. Feynman-Kac formulae. Genealogical and interacting particle approximations. — Springer, 2004. — P. 575. — «Series: Probability and Applications».
  16. Del Moral, Pierre. Séminaire de Probabilités XXXIV / Pierre Del Moral, Laurent Miclo. — 2000. — Vol. 1729. — P. 1–145. — ISBN 978-3-540-67314-9. — doi:10.1007/bfb0103798.
  17. Del Moral, Pierre (2006). "Sequential Monte Carlo samplers - P. Del Moral - A. Doucet - A. Jasra - 2006 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library". Journal of the Royal Statistical Society. Series B (Statistical Methodology). 68 (3): 411—436. arXiv:cond-mat/0212648. doi:10.1111/j.1467-9868.2006.00553.x.
  18. Chen, S.; Dick, Josef; Owen, Art B. (2011). "Consistency of Markov chain quasi-Monte Carlo on continuous state spaces". Annals of Statistics. 39 (2): 673—701. doi:10.1214/10-AOS831.
  19. Tribble, Seth D. (2007). Markov chain Monte Carlo algorithms using completely uniformly distributed driving sequences (Diss.). Stanford University. ProQuest 304808879.
  20. Papageorgiou, Anargyros; Traub, J. F. (1996). "Beating Monte Carlo". Risk. 9 (6): 63—65.
  21. Sobol, Ilya M (1998). "On quasi-monte carlo integrations". Mathematics and Computers in Simulation. 47 (2): 103—112. doi:10.1016/s0378-4754(98)00096-2.
  22. L'Ecuyer, P.; Lécot, C.; Tuffin, B. (2008). "A Randomized Quasi-Monte Carlo Simulation Method for Markov Chains" (PDF). Operations Research. 56 (4): 958—975. doi:10.1287/opre.1080.0556.
  23. L'Ecuyer, P.; Munger, D.; Lécot, C.; Tuffin, B. (2018). "Sorting Methods and Convergence Rates for Array-RQMC: Some Empirical Comparisons". Mathematics and Computers in Simulation. 143: 191—201. doi:10.1016/j.matcom.2016.07.010.
  24. 1 2 Gelman, A.; Rubin, D.B. (1992). "Inference from iterative simulation using multiple sequences (with discussion)" (PDF). Statistical Science. 7 (4): 457—511. Bibcode:1992StaSc...7..457G. doi:10.1214/ss/1177011136.
  25. Cowles, M.K.; Carlin, B.P. (1996). "Markov chain Monte Carlo convergence diagnostics: a comparative review". Journal of the American Statistical Association. 91 (434): 883—904. CiteSeerX 10.1.1.53.3445. doi:10.1080/01621459.1996.10476956.
  26. Hill, S. D.; Spall, J. C. (2019). "Stationarity and Convergence of the Metropolis-Hastings Algorithm: Insights into Theoretical Aspects". IEEE Control Systems Magazine. 39 (1): 56—67. doi:10.1109/MCS.2018.2876959.
  27. greta's software dependencies and inspirations. greta-stats.org/. Дата обращения: 13 апреля 2020.
  28. Enright, AJ; Van Dongen, S; Ouzounis, CA (1 April 2002). "An efficient algorithm for large-scale detection of protein families". Nucleic Acids Research. 30 (7): 1575—84. doi:10.1093/nar/30.7.1575. PMC 101833. PMID 11917018.
  29. Azad, A; Pavlopoulos, GA; Ouzounis, CA; Kyrpides, NC; Buluç, A (6 April 2018). "HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks". Nucleic Acids Research. 46 (6): e33. doi:10.1093/nar/gkx1313. PMC 5888241. PMID 29315405.

Sources

Further reading

См. также

Примечания

Литература

Ссылки

[[:Категория:]]