Chapter 12 Bayesian machine learning
In this chapter, we focus on Bayesian approaches to supervised machine learning (ML) problems, where the outcome variable is observed and used to guide prediction or inference. In contrast, unsupervised ML refers to settings in which the outcome variable is not observed, such as in clustering or dimensionality reduction.
Machine learning methods are often characterized by high-dimensional parameter spaces, particularly in the context of nonparametric inference. It is important to note that nonparametric inference does not imply the absence of parameters; rather, it refers to models with potentially infinitely many parameters. Such settings give rise to what is often called the wide problem, in which the number of input variables, and consequently of parameters, can exceed the sample size.
Another common challenge in ML is the tall problem, which occurs when the sample size is extremely large, necessitating scalable algorithms.
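To make the wide problem concrete, the sketch below (a minimal illustration, not from the chapter itself; all variable names and the data-generating setup are hypothetical) simulates a regression with more inputs than observations. Ordinary least squares is ill-posed in this regime, while a ridge penalty, which coincides with the posterior mode under a Gaussian prior on the coefficients, restores a unique solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# A "wide" problem: more input variables (p) than observations (n).
n, p = 20, 100
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.0                      # only a few inputs truly matter
y = X @ beta + 0.1 * rng.standard_normal(n)

# Ordinary least squares is ill-posed here: X'X is a p-by-p matrix of
# rank at most n < p, hence singular, so no unique OLS solution exists.
rank = np.linalg.matrix_rank(X.T @ X)
print(rank < p)                     # True

# A ridge penalty (the posterior mode under a Gaussian prior on beta)
# makes the system invertible and yields a unique estimate.
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_ridge.shape)             # (100,)
```

The penalty parameter `lam` plays the role of the prior precision relative to the noise variance, which is one way the regularization techniques introduced below acquire a Bayesian interpretation.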
Specifically, we introduce Bayesian ML tools for regression, including regularization techniques, regression trees, and Gaussian processes. Extensions of these methods for binary classification are explored in some of the exercises.
The chapter begins with a discussion of the relationship between cross-validation and Bayes factors and concludes with Bayesian approaches for addressing large-scale data challenges.