For more details see Bayesian Statistics

Keywords and phrases:

Amount of Information, Decision Theory, Exchangeability, Foundations of Inference, Hypothesis Testing, Interval Estimation, Intrinsic Discrepancy, Maximum Entropy, Point Estimation, Rational Degree of Belief, Reference Analysis, Scientific Reporting.


Mathematical statistics uses two major paradigms, conventional (or frequentist), and Bayesian. Bayesian methods provide a complete paradigm for both statistical inference and decision making under uncertainty.
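As a minimal, self-contained illustration of this paradigm (a sketch of our own, not an example taken from the paper), consider Bayesian updating in a conjugate Beta–Binomial model: a Beta(a, b) prior on a success probability, combined with r successes observed in n Bernoulli trials, yields a Beta(a + r, b + n − r) posterior.

```python
# Illustrative sketch (not from the paper): conjugate Bayesian updating.
# With a Beta(a, b) prior on a success probability and r successes observed
# in n trials, Bayes' theorem gives a Beta(a + r, b + n - r) posterior.

def beta_binomial_update(a: float, b: float, r: int, n: int) -> tuple[float, float]:
    """Posterior Beta parameters after observing r successes in n trials."""
    return a + r, b + (n - r)

def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Start from a uniform prior Beta(1, 1) and observe 7 successes in 10 trials.
a_post, b_post = beta_binomial_update(1.0, 1.0, 7, 10)
print(a_post, b_post)              # posterior is Beta(8.0, 4.0)
print(beta_mean(a_post, b_post))   # posterior mean 8/12 ≈ 0.667
```

The posterior mean 2/3 lies between the prior mean 1/2 and the sample proportion 0.7, showing how prior information and data are combined into a single probability distribution for the unknown quantity.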

Bayesian methods contain as particular cases many of the more often used frequentist procedures, solve many of the difficulties faced by conventional statistical methods, and extend the applicability of statistical methods. In particular, Bayesian methods make it possible to incorporate scientific hypotheses in the analysis (by means of the prior distribution) and may be applied to problems whose structure is too complex for conventional methods to handle. The Bayesian paradigm is based on an interpretation of probability as a rational, conditional measure of uncertainty. Statistical inference about a quantity of interest is described as the modification of the uncertainty about its value in the light of evidence, and Bayes' theorem precisely specifies how this modification should be made.

Scientific experimental or observational results generally consist of (possibly many) sets of data of the general form D = {x1, . . . , xn}, where the xi's are somewhat "homogeneous" (possibly multidimensional) observations. Statistical methods are then typically used to derive conclusions on both the nature of the process which has produced those observations, and on the expected behavior at future instances of the same process. A central element of any statistical analysis is the specification of a probability model which is assumed to describe the mechanism which has generated the observed data D as a function of a (possibly multidimensional) parameter (vector) ω ∈ Ω, sometimes referred to as the state of nature, about whose value only limited information (if any) is available. All derived statistical conclusions are obviously conditional on the assumed probability model.

Unlike most other branches of mathematics, conventional methods of statistical inference suffer from the lack of an axiomatic basis. In marked contrast, the Bayesian approach to statistical inference is firmly based on axiomatic foundations which provide a unifying logical structure and guarantee the mutual consistency of the methods proposed. Bayesian methods constitute a complete paradigm for statistical inference, a scientific revolution in Kuhn's (1962) sense.

Bayesian statistics only require the

mathematics of probability theory and the interpretation of probability which most closely corresponds to the standard use of this word in everyday language: it is no accident that some of the more important seminal books on Bayesian statistics, such as the works of de Laplace (1812), Jeffreys (1939) and de Finetti (1970), are actually entitled "Probability Theory". The main consequence of these foundations is the mathematical need to describe by means of probability distributions all uncertainties present in the problem. In particular,

unknown parameters in probability models must have a joint probability distribution which describes the available information about their values; this is often regarded as the characteristic element of a Bayesian approach. Notice that (in sharp contrast to conventional statistics) parameters are treated as random variables within the Bayesian paradigm. This is not a description of their variability (parameters are typically fixed unknown quantities), but a description of the uncertainty about their true values.

An important particular case arises when

either no relevant prior information is readily available, or that information is subjective and an "objective" analysis is desired, one that is exclusively based on accepted model assumptions and well-documented data. This is addressed by reference analysis, which uses information-theoretic concepts to derive appropriate reference posterior distributions, defined to encapsulate inferential conclusions on the quantities of interest solely based on the assumed model and the observed data.

In this paper it is assumed that probability distributions may be described through their probability density functions, and no distinction is made between a random quantity and the particular values that it may take. Bold italic roman fonts are used for observable random vectors (typically data) and bold italic greek fonts are used for unobservable random vectors (typically parameters); lower case is used for variables and calligraphic upper case for their domain sets. Moreover, the standard mathematical convention of referring to functions, say fx and gx of x ∈ X, respectively by f(x) and g(x), will be used throughout. Thus,

p(θ |C) and p(x |C) respectively represent general probability densities of the random vectors θ ∈ Θ and x ∈ X under conditions C, so that p(θ |C) ≥ 0, ∫Θ p(θ |C) dθ = 1, and p(x |C) ≥ 0, ∫X p(x |C) dx = 1. This admittedly imprecise notation will greatly simplify the exposition. If the random vectors are discrete, these functions naturally become probability mass functions, and integrals over their values become sums.

Bayesian methods make use of the concept of

intrinsic discrepancy, a very general measure of the divergence between two probability distributions. The intrinsic discrepancy δ{p1, p2} between two distributions of the random vector x ∈ X described by their density functions p1(x) and p2(x) is defined as

δ{p1, p2} = min { ∫X p1(x) log[p1(x)/p2(x)] dx, ∫X p2(x) log[p2(x)/p1(x)] dx },

the minimum of the two directed Kullback–Leibler divergences between them. It may be shown that the intrinsic discrepancy is symmetric and non-negative (and it is zero if, and only if, p1(x) = p2(x) almost everywhere); it is invariant under one-to-one transformations of x. Besides, it is additive: if x = {x1, . . . , xn} and pi(x) = ∏_{j=1}^n qi(xj), then δ{p1, p2} = n δ{q1, q2}. Last, but not least, it is defined even if the support of one of the densities is strictly contained in the support of the other.

If p1(x | θ) and p2(x | λ) describe two alternative distributions for data x ∈ X, one of which is assumed to be true, their intrinsic discrepancy δ{p1, p2} is the minimum expected log-likelihood ratio in favour of the true sampling distribution. For example, the intrinsic discrepancy between a Binomial distribution with probability function Bi(r | n, φ) and its Poisson approximation Pn(r | nφ) is

δ(n, φ) = ∑_{r=0}^n Bi(r | n, φ) log[Bi(r | n, φ)/Pn(r | nφ)]

(since the second sum diverges); it is easily verified that δ(10, 0.05) ≈ 0.0007, corresponding to an expected likelihood ratio for the Binomial, when it is true, of only 1.0007; thus, Bi(r | 10, 0.05) is quite well approximated by Pn(r | 0.5).

The intrinsic discrepancy serves to define a useful type of convergence: a sequence of densities {pi(x)}, i = 1, 2, . . . ,

converges intrinsically to a density p(x) if, and only if, lim_{i→∞} δ{pi, p} = 0, i.e., if, and only if, the sequence of the corresponding intrinsic discrepancies converges to zero.

**Foundations**

A central element of the Bayesian paradigm is the use of probability distributions to describe all relevant unknown quantities, interpreting the probability of an event as a conditional measure of uncertainty, on a [0, 1] scale, about the occurrence of the event in some specific conditions. The limiting extreme values 0 and 1, which are typically inaccessible in applications, respectively describe impossibility and certainty of the occurrence of the event. This interpretation of probability includes and extends all other probability interpretations. There are two independent arguments which prove the mathematical inevitability of the use of probability distributions to describe uncertainties; these are summarized later in this section.
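The Binomial–Poisson figure quoted earlier, δ(10, 0.05) ≈ 0.0007, can be checked numerically. The sketch below (our own illustration, using only the Python standard library) computes the convergent Kullback–Leibler direction of the intrinsic discrepancy, i.e., the expected log-likelihood ratio under the Binomial; the other direction diverges because the Poisson has unbounded support while the Binomial does not.

```python
from math import comb, exp, factorial, log

def binomial_pmf(r: int, n: int, phi: float) -> float:
    """Bi(r | n, phi): probability of r successes in n trials."""
    return comb(n, r) * phi**r * (1 - phi)**(n - r)

def poisson_pmf(r: int, mu: float) -> float:
    """Pn(r | mu): Poisson probability of r events with mean mu."""
    return exp(-mu) * mu**r / factorial(r)

def intrinsic_discrepancy_binomial_poisson(n: int, phi: float) -> float:
    """delta(n, phi): expected log-likelihood ratio under the Binomial.

    This is the convergent Kullback-Leibler direction, hence the minimum
    of the two directions and therefore the intrinsic discrepancy."""
    mu = n * phi
    return sum(
        binomial_pmf(r, n, phi) * log(binomial_pmf(r, n, phi) / poisson_pmf(r, mu))
        for r in range(n + 1)
    )

d = intrinsic_discrepancy_binomial_poisson(10, 0.05)
print(round(d, 4))        # ≈ 0.0007, as quoted in the text
print(round(exp(d), 4))   # expected likelihood ratio ≈ 1.0007
```

The small value confirms that Bi(r | 10, 0.05) is very well approximated by Pn(r | 0.5).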