Model-based machine learning
Overall Course Objectives
This course is designed for engineers, systems analysts, statisticians and related professionals who want to perform advanced data analysis in their future research or practice. Model-based machine learning relies on a class of models, called Probabilistic Graphical Models (PGMs), that make it simple to combine domain knowledge with data-driven methods.
While Machine Learning offers plenty of algorithms (e.g. Neural Networks, Gaussian Processes, Support Vector Machines, Decision Trees) that have the benefit of being "push-button" solutions, they are generally hard to adapt beyond their original design. Our task then becomes transforming our problem and data to fit each individual algorithm. In the process we often drop relevant information (e.g. a known relationship between two variables, or different noise distributions in the input variables), and our results may suffer from it.
PGMs allow us to include prior knowledge, parametric and non-parametric (sub-)models, and uncertainty about inputs and parameters. They are well suited to combining different types of data, and in the past few years a growing community has developed tools for PGMs that simplify their design and inference. Together with Deep Learning, PGMs are at the forefront of Machine Learning and Data Mining research, and are essential for processing both Big and Small Data.
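As a minimal illustration of how prior knowledge and data can be combined, the sketch below uses a conjugate Beta-Binomial model written in plain Python (not Pyro or Stan). The scenario (the probability that a bus departure is late), the prior parameters and the observed counts are hypothetical, chosen only to show how the posterior blends the two sources of information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(alpha, beta) distribution over a probability p in [0, 1]."""
    alpha: float
    beta: float

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, successes: int, trials: int) -> "Beta":
        """Conjugate update after observing `successes` out of `trials`
        Bernoulli outcomes: the posterior is again a Beta distribution."""
        if not 0 <= successes <= trials:
            raise ValueError("need 0 <= successes <= trials")
        return Beta(self.alpha + successes, self.beta + (trials - successes))


# Hypothetical prior knowledge: an expert believes roughly 20% of bus
# departures are late, with the weight of about 10 past observations.
prior = Beta(alpha=2.0, beta=8.0)

# Hypothetical data: 9 late departures out of 30 observed.
posterior = prior.update(successes=9, trials=30)

print(f"prior mean:     {prior.mean():.3f}")   # 0.200
print(f"data-only rate: {9 / 30:.3f}")          # 0.300
print(f"posterior mean: {posterior.mean():.3f}")  # 0.275
```

The posterior mean (0.275) lies between the prior belief (0.2) and the observed frequency (0.3); as more data arrive, it moves closer to the observed rate. In a probabilistic programming language such as Pyro or Stan, the same model could be expressed declaratively, with the posterior computed by the library's inference algorithms rather than by this closed-form update.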
While this course is methodological in nature, it is grounded in a sequence of example applications, mostly focused on transport-system problems.
Learning Objectives
- Explain central concepts in model-based machine learning, including probabilistic graphical models (PGMs), Bayesian inference and belief propagation
- Examine use cases for different PGMs and distinguish their underlying assumptions
- Implement PGMs in a probabilistic programming language (e.g. Pyro or Stan)
- Recognise practical data modelling aspects, such as overfitting, system dynamics (e.g. spatio-temporal), conditional independence, imputation and conjugate priors
- Evaluate the quality of different models for a given problem and dataset
- Relate existing problems and data to modelling approaches that can tackle them
- Formulate new models given a problem and data
- Develop and present a project based on a PGM
- Present, and be able to argue for, a project based on a PGM
Course Content
The course combines slide-based lectures with laboratory work using interactive tools (Jupyter notebooks in Python, with a probabilistic programming language such as Pyro or Stan). In every module, students work hands-on, during and after the theoretical class, to assimilate new concepts. The course is designed to be incremental and is strongly supported by practice.
Modules:
– Review of basics – random variables, probability distributions, Bayes' theorem
– Probabilistic Graphical Models foundations – Bayesian networks, factorization, d-separation, conditional independence
– Probabilistic Graphical Models – Generative models, Representing your own problem
– Different models – Regression, Classification, Hierarchical Models, Temporal models, Generative models, Gaussian Processes
– Inference – Exact Inference
– Inference – Markov Chain Monte Carlo
– Inference – Variational Inference
– Advanced topics
Teaching Method
Lectures and practical laboratories with Jupyter (IPython) notebooks