Machine learning: a quick review (part 7)
7- Bayesian Learning
Bayesian statistics provides the tools to update our beliefs (represented as probability distributions) based on new data.
Uncertainty: a lack of knowledge that is intrinsic to the world. Probability distributions represent exactly this uncertainty, and it turns out they are the key to understanding Gaussian processes.
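Concretely, the update is Bayes’ rule: given a prior belief P(θ) over a hypothesis θ and the likelihood P(D | θ) of the observed data D, the posterior belief is

P(θ | D) = P(D | θ) P(θ) / P(D)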
7-1- Gaussian process
A Gaussian process is a probability distribution over possible functions.
We use Bayes’ rule to update our distribution over functions as we observe training data.
Gaussian processes know what they don’t know: their predictions come with uncertainty estimates. Gaussian processes also let you incorporate expert knowledge: when you use a GP to model your problem, you shape your prior belief via the choice of kernel.
GPs are nonparametric. Nonparametric methods need to take the whole training set into account each time they make a prediction, so the computational cost of a GP’s predictions scales cubically with the number of training samples (see the sketch below).
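To make this concrete, here is a minimal sketch of GP regression in NumPy. The RBF kernel, its parameters, the noise level, and the toy data are illustrative assumptions, not values from the text.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel: encodes a prior belief in smooth functions."""
    sqdist = np.sum(X1**2, 1).reshape(-1, 1) + np.sum(X2**2, 1) - 2 * X1 @ X2.T
    return variance * np.exp(-0.5 * sqdist / length_scale**2)

def gp_posterior(X_train, y_train, X_test, noise=1e-2):
    """Condition the GP prior on observed data (Bayes' rule for Gaussians)."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)
    K_ss = rbf_kernel(X_test, X_test)
    K_inv = np.linalg.inv(K)            # inverting the n x n kernel: the cubic cost
    mu = K_s.T @ K_inv @ y_train        # posterior mean at the test points
    cov = K_ss - K_s.T @ K_inv @ K_s    # posterior covariance: the "what it doesn't know"
    return mu, cov

# Toy usage: three observations of a sine function.
X_train = np.array([[-2.0], [0.0], [1.5]])
y_train = np.sin(X_train).ravel()
X_test = np.linspace(-3, 3, 50).reshape(-1, 1)
mu, cov = gp_posterior(X_train, y_train, X_test)
std = np.sqrt(np.diag(cov))             # predictive uncertainty grows away from the data
```

The predictive standard deviation shrinks near the three observed points and grows away from them, which is what “knowing what it doesn’t know” means in practice.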
Parametric approaches, in contrast, distill knowledge about the training data into a fixed set of numbers (the parameters). Examples: linear regression, neural networks.
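For contrast, a minimal parametric sketch using ordinary least squares; the toy data is again an illustrative assumption:

```python
import numpy as np

# Ordinary least squares distills the training set into two numbers.
X = np.array([[1.0, -2.0], [1.0, 0.0], [1.0, 1.5]])  # bias column + one feature
y = np.sin(X[:, 1])
w, *_ = np.linalg.lstsq(X, y, rcond=None)            # w summarizes the training data

# Prediction needs only w, not the training set: O(d) per query, not O(n^3).
x_new = np.array([1.0, 0.5])
y_pred = x_new @ w
```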
7-2- Bayesian Neural network
The weights are treated as probability distributions rather than single point estimates, which amounts to averaging predictions over an infinite ensemble of possible weight settings.
Since integrating over this infinite space of weights is analytically intractable, simulation- or approximation-based alternatives such as Markov chain Monte Carlo (MCMC) and variational inference (VI) are used, as in the sketch below.
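As one concrete route, here is a minimal sketch of MCMC over the weights of a tiny one-hidden-layer network, using random-walk Metropolis. The architecture, prior, proposal scale, and toy data are all illustrative assumptions; a VI approach would instead fit a simple parametric distribution to the same posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = sin(x) + noise.
X = np.linspace(-3, 3, 40)
y = np.sin(X) + 0.1 * rng.standard_normal(40)

def predict(w, x):
    """Tiny 1-hidden-layer net; w packs all 10 weights of a 3-unit tanh network."""
    w1, b1, w2, b2 = w[:3], w[3:6], w[6:9], w[9]
    h = np.tanh(np.outer(x, w1) + b1)            # (n, 3) hidden activations
    return h @ w2 + b2

def log_posterior(w, noise=0.1, prior_std=1.0):
    """Gaussian likelihood times a Gaussian prior on every weight (up to a constant)."""
    resid = y - predict(w, X)
    log_lik = -0.5 * np.sum(resid**2) / noise**2
    log_prior = -0.5 * np.sum(w**2) / prior_std**2
    return log_lik + log_prior

# Random-walk Metropolis: draw samples from the posterior over weights.
w = rng.standard_normal(10)
log_p = log_posterior(w)
samples = []
for step in range(20000):
    w_new = w + 0.05 * rng.standard_normal(10)     # small random proposal
    log_p_new = log_posterior(w_new)
    if np.log(rng.uniform()) < log_p_new - log_p:  # accept/reject
        w, log_p = w_new, log_p_new
    if step > 5000 and step % 10 == 0:             # discard burn-in, thin the chain
        samples.append(w)

# Posterior predictive: average the predictions of many sampled networks.
preds = np.stack([predict(s, X) for s in samples])
mean, std = preds.mean(axis=0), preds.std(axis=0)
```

Averaging predictions over the sampled weight vectors gives both a posterior-mean prediction and an uncertainty band, mirroring what the GP provided in closed form.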