Applied Machine Learning and Big Data
Overall Course Objectives
Systematic acquisition, cleaning, storage and analysis of data, and reporting of the results, gives a company the ability to react, and has the potential to transform the business in many companies.
The course works with structured and unstructured heterogeneous data, with a focus on data visualization, clustering and classification.
It also covers the use of cloud services for big data analysis and the administration of Unix/Linux as the basis for server platforms.
Part of the course is devoted to a project chosen according to the student's interests. The project applies the methods presented in the course. Part of the project is searching for project-relevant information in a scientific database, e.g. DTU Findit.
Learning Objectives
- Understand and apply multidimensional data representations.
- Understand and apply methods for cleaning of datasets.
- Understand and apply machine learning for visualization of datasets.
- Understand and apply machine learning for cluster analysis (unsupervised classification) of datasets.
- Understand and apply machine learning for classification.
- Understand and apply cloud services for big data analysis.
- Understand and apply fundamental Unix/Linux tools for administration of a server platform for big data.
- Understand and apply a scientific database, e.g. DTU Findit, to search for references targeted at a given project area.
Course Content
Data representation in multidimensional heterogeneous datasets.
Cleaning of datasets, including identification and removal of outliers.
Application of machine learning to visualization; cluster analysis,
e.g. k-nearest-neighbor, hierarchical cluster analysis, spectral clustering, naive Bayes;
and classification, e.g. logistic regression, support vector machines, decision trees, random forests, deep neural networks, recurrent neural networks.
Applying cloud services for big data analysis, including tools for administration of a server platform.
Applying a scientific database to search for knowledge.
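As an illustration of the outlier identification and removal listed above, here is a minimal sketch in Python. It is not taken from the course material: it assumes the common interquartile-range (IQR) rule, and the function name `remove_outliers_iqr`, the threshold `k = 1.5` and the sample readings are all hypothetical.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Keep only the values within k * IQR of the first and third quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


# Hypothetical sensor readings with one obviously deviating value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 55.0]
cleaned = remove_outliers_iqr(readings)
print(cleaned)
```

The IQR rule is only one of several possible choices. Which rule is appropriate depends on the dataset and on whether an extreme value is an error or a genuine observation.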
Recommended prerequisites
You must have programming knowledge and a desire to learn more programming. Bring your own PC (Mac, Windows or Linux). Depending on the size of your data, you should expect to allocate funds for storing and processing it.
You must have basic knowledge of linear algebra (such as vector and matrix multiplication, matrix inverses, eigenvalues and eigenvectors) and probability theory (such as probability density functions, probability distributions, the central limit theorem, Bayes' theorem).
Teaching Method
Lectures and project work
Faculty
Remarks
Section of AI, Mathematics and Software
Healthcare Technology (Sundhedsteknologi): elective
IT Electronics (IT-elektronik): elective
Software Technology (Softwareteknologi): elective
ITØ: elective
Limited number of seats
Minimum: 6, Maximum: 15.
Please be aware that this course requires a minimum number of participants in order to be held. If this requirement is not met, the course will not be held. Furthermore, there is a limited number of seats available. If there are too many applicants, a pool will be created for the remainder of the qualified applicants, and they will be selected at random. You will be informed 8 days before the start of the course whether you have been allocated a spot.