This book introduces the gamma-divergence, a measure of distance between probability distributions proposed by Fujisawa and Eguchi in 2008. The gamma-divergence has been studied extensively as a basis for robust estimation when the power index γ is positive. It can also be defined for negative γ, provided that an integrability condition is satisfied. For this reason, the authors consider the gamma-divergence defined on a set of discrete distributions. With negative γ, the gamma-divergence is closely connected to the arithmetic, geometric, and harmonic means of the ratios between distributions. In particular, the authors call the gamma-divergence with γ = -1 the geometric-mean (GM) divergence. The book begins with an overview of the gamma-divergence and its properties. It then discusses applications of the gamma-divergence in various areas, including machine learning, statistics, and ecology.
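To make the discrete setting concrete, the following is a minimal sketch of the gamma-divergence between two discrete distributions. It assumes the logarithmic form given in Fujisawa and Eguchi's 2008 paper; the book itself may use a different but related normalization. The function name `gamma_divergence` and its arguments are hypothetical. Note that this particular form is undefined at γ = 0 and γ = -1, so it does not cover the GM divergence.

```python
import math


def gamma_divergence(g, f, gamma):
    """Log-form gamma-divergence between discrete distributions g and f.

    Assumption: this is the logarithmic form from Fujisawa and Eguchi (2008),
        D(g, f) = log(sum g^(1+gamma)) / (gamma * (1 + gamma))
                - log(sum g * f^gamma) / gamma
                + log(sum f^(1+gamma)) / (1 + gamma).
    g and f are sequences of probabilities over the same finite support.
    For negative gamma every entry must be strictly positive.
    """
    if gamma in (0.0, -1.0):
        raise ValueError("this form is undefined for gamma = 0 and gamma = -1")
    if len(g) != len(f):
        raise ValueError("g and f must have the same support size")

    sum_gg = sum(gi ** (1 + gamma) for gi in g)
    sum_gf = sum(gi * fi ** gamma for gi, fi in zip(g, f))
    sum_ff = sum(fi ** (1 + gamma) for fi in f)

    return (
        math.log(sum_gg) / (gamma * (1 + gamma))
        - math.log(sum_gf) / gamma
        + math.log(sum_ff) / (1 + gamma)
    )


if __name__ == "__main__":
    g = [0.5, 0.3, 0.2]
    f = [0.2, 0.3, 0.5]
    print(gamma_divergence(g, g, 0.5))  # zero up to rounding, since g == f
    print(gamma_divergence(g, f, 0.5))  # positive for gamma > 0 by Hoelder's inequality
```

For positive γ the value is nonnegative and vanishes when the two distributions coincide, which is what makes it usable as a loss for estimation.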
Bernoulli, categorical, Poisson, negative binomial, and Boltzmann distributions are discussed as typical examples. The minimum gamma-divergence method is then applied to regression models, in the framework of generalized linear models, that explicitly or implicitly assume one of these distributions for the dependent variable. In ensemble learning, AdaBoost is derived from the exponential loss function and combines weak learners by weighted majority vote. The book points out that the exponential loss function is deeply connected to the GM divergence. For the Boltzmann machine, maximum likelihood estimation has to rely on approximation methods such as the mean-field approximation because the partition function is computationally intractable. By considering the GM divergence and the exponential loss, however, the book shows that the partition function need not be computed, so estimation can be carried out without variational inference.
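The link between AdaBoost, the exponential loss, and the weighted majority vote can be illustrated with the standard textbook algorithm. The sketch below is not the book's derivation via the GM divergence; it is plain discrete AdaBoost on a scalar feature, with threshold rules ("stumps") as an assumed weak learner. All function names are hypothetical.

```python
import math


def stump_predict(x, threshold, sign):
    """Assumed weak learner: predict `sign` if x > threshold, else -sign."""
    return sign if x > threshold else -sign


def fit_adaboost(xs, ys, rounds=10):
    """Discrete AdaBoost with labels in {-1, +1}.

    Returns a list of (alpha, threshold, sign) triples.
    """
    n = len(xs)
    weights = [1.0 / n] * n
    values = sorted(set(xs))
    # Candidate thresholds: one below the minimum, plus midpoints between values.
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]

    ensemble = []
    for _ in range(rounds):
        # Pick the stump with the smallest weighted error.
        best = None
        for t in thresholds:
            for s in (1, -1):
                err = sum(
                    w for w, x, y in zip(weights, xs, ys)
                    if stump_predict(x, t, s) != y
                )
                if best is None or err < best[0]:
                    best = (err, t, s)
        err, t, s = best
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)

        # This alpha minimizes the weighted exponential loss for the chosen stump.
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, s))

        # Reweight: misclassified points gain weight, correct ones lose it.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, t, s))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def score(ensemble, x):
    """Weighted vote before thresholding."""
    return sum(alpha * stump_predict(x, t, s) for alpha, t, s in ensemble)


def predict(ensemble, x):
    """Weighted majority vote of the weak learners."""
    return 1 if score(ensemble, x) >= 0 else -1


def exponential_loss(ensemble, xs, ys):
    """Average exponential loss exp(-y * F(x)) over the sample."""
    return sum(math.exp(-y * score(ensemble, x)) for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4, 5]
    ys = [1, 1, -1, -1, 1, 1]  # not separable by any single stump
    model = fit_adaboost(xs, ys, rounds=3)
    print([predict(model, x) for x in xs])
    print(exponential_loss(model, xs, ys))
```

Each round multiplies the average exponential loss by a factor of at most one, which is the sense in which AdaBoost is a stagewise minimizer of that loss.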