Machine Learning Lecture Recordings
I have uploaded most of my “Machine Learning” lecture recordings to YouTube.
The slides are in English, but the audio is in German.
Some very basic content (e.g., a demo of standard k-means clustering) was omitted from this advanced class; instead, only a link to recordings from an earlier class was given, because in this class I wanted to focus on the improved (accelerated) algorithms. Those earlier recordings are not included here (yet). I believe this class covers some content that you will find nowhere else (yet).
The first unit is pretty long (I have not split it further yet); the later units are shorter recordings.
ML F1: Principles in Machine Learning
- Principles in Machine Learning
- Principles in Machine Learning /2
- Occam’s Razor – Principle of Parsimony
- Simple Models …
- Computational Learning Theory
- Probably Approximately Correct Learning (PAC Learning) /1
- Probably Approximately Correct Learning (PAC Learning) /2
- PAC Learnable – Examples
- VC Dimension (Vapnik-Chervonenkis Dimension)
- VC Dimension Example
- Error Bounds and the VC Dimension
- No Free Lunch
- No Free Lunch Theorem
- No Free Lunch Theorem – Explanation
- Bias-Variance Tradeoff
- Bias-Variance Tradeoff
- Bias vs. Variance
- Bias-Variance Decomposition
- Bias-Variance Illustration
- Different Kinds of Bias
- Data Often Has Bias
- AI Can Be Sexist and Racist
- Relationships
ML F2/F3: Correlation does not Imply Causation & Multiple Testing Problem
- Correlation does not Imply Causation
- Correlation does not Imply Causation /2
- Correlation does not Imply Causation /3
- Correlation with Statistics Classes
- Multiple Testing Problem
- Bonferroni’s Principle – Multiple Testing Problem
- Multiple Testing Problem
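To make the Bonferroni correction from this unit concrete, here is a minimal sketch in Python (the p-values and the significance level α = 0.05 are made up for illustration): to keep the family-wise error rate at α across m tests, each individual test uses the stricter threshold α/m.

```python
# Hypothetical p-values from m hypothesis tests.
p_values = [0.001, 0.012, 0.030, 0.040, 0.200]
alpha = 0.05
m = len(p_values)

# Bonferroni: test each hypothesis at level alpha / m, which bounds the
# family-wise error rate (probability of any false rejection) by alpha.
for i, p in enumerate(p_values):
    decision = "reject H0" if p < alpha / m else "keep H0"
    print(f"test {i}: p = {p:.3f} -> {decision}")
```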
ML F4: Overfitting
- Overfitting
- Underfitting and Overfitting
- Overfitting Decision Tree
- Overfitting Due to Noise
- Overfitting Due to Insufficient Examples
ML F5: Curse of Dimensionality
- Curse of Dimensionality
- Combinatorial Explosion
- Concentration of Distances
- Data is in the Margins
- Illustration: “Shrinking” Hyperspheres
- Illustration: “Shrinking” Hyperspheres /2
- Effect on Search in High Dimensionality
- Summary
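A minimal numeric illustration of the “shrinking hyperspheres” effect from this unit: the volume of the unit ball relative to its enclosing cube [-1, 1]^d vanishes as the dimensionality d grows, i.e., almost all of the cube’s volume lies in the margins and corners. (The script is only an illustration, not material from the slides.)

```python
import math

# Volume of the d-dimensional unit ball: pi^(d/2) / Gamma(d/2 + 1).
def unit_ball_volume(d):
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

for d in (1, 2, 3, 5, 10, 20):
    ratio = unit_ball_volume(d) / 2 ** d   # cube [-1, 1]^d has volume 2^d
    print(f"d = {d:2d}  ball/cube volume ratio = {ratio:.6f}")
```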
ML F6: Intrinsic Dimensionality
- Intrinsic Dimensionality
- Estimating Intrinsic Dimensionality
- Angle-Based Intrinsic Dimensionality Intuition
- Angle-Based Intrinsic Dimensionality (ABID) /2
- Consequences & Solutions
ML F7: Distance Functions and Similarity Functions
- Distance Functions
- Distances, Metrics and Similarities
- Distances, Metrics and Similarities /2
- Distance Functions
- Distance Functions /2
- Similarity Functions
- Distances for Binary Data
- Jaccard Coefficient for Sets
- Example Distances for Categorical Data
- Mahalanobis Distance
- Scaling & Normalization
- To Scale, or not to Scale?
- To Scale, or not to Scale? /2
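As a small worked example for this unit, a sketch of the Jaccard coefficient for sets and the corresponding distance (one minus the similarity); the example sets are arbitrary:

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| for two sets."""
    if not a and not b:
        return 1.0  # convention: two empty sets are identical
    return len(a & b) / len(a | b)

def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance, a metric on sets."""
    return 1.0 - jaccard_similarity(a, b)

# Intersection {b, c}, union {a, b, c, d}: similarity 0.5, distance 0.5.
print(jaccard_distance({"a", "b", "c"}, {"b", "c", "d"}))
```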
ML L1: Introduction to Classification
- Classification
- Prediction Problems
- Classification: A Multi-Stage Process
- Classification Problem
- Example
- Process of Constructing a Model
- Process of Applying the Model
ML L2: Evaluation and Selection of Classifiers
- Evaluation and Selection of Classifiers
- Quick Recap: Classification
- Classifier Evaluation: Confusion Matrix
- Classifier Evaluation: Accuracy and Error-Rate
- Precision, Recall, and F-measure
- Classifier Evaluation: Multi-Class Confusion Matrix
- Training Accuracy vs. Accuracy on New Data
- The Need for Validation
- Holdout Validation
- Cross-Validation
- Bootstrap Validation
- Considerations for Selecting a Model
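A minimal sketch of the evaluation measures from this unit, computed from a binary confusion matrix; the counts are hypothetical:

```python
# Entries of a binary confusion matrix (hypothetical counts):
# true positives, false positives, false negatives, true negatives.
tp, fp, fn, tn = 40, 10, 5, 45

accuracy  = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)            # how many predicted positives are real
recall    = tp / (tp + fn)            # how many real positives were found
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} F1={f1:.3f}")
```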
ML L3: Bayes Classifiers
- Bayesian Classification
- Bayes Classification: Motivation
- Bayes’ Theorem: Review
- Optimal Bayes Classifier
- Naïve Bayes Classifier
- Probability Models for a Single Attribute
- Multivariate Gaussian Bayes Classification
- Naïve Bayes Classifier: Example
- Naïve Bayes Classifier: Computational Aspects
- Naïve Bayes Classifier: Comments & Discussion
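A minimal sketch of a Gaussian naïve Bayes classifier along the lines of this unit: per class, fit an independent normal distribution to each attribute, then predict the class maximizing log p(c) + Σᵢ log p(xᵢ | c). The synthetic data and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.normal(loc=0.0, scale=1.0, size=(50, 2))  # class 0 samples
X1 = rng.normal(loc=3.0, scale=1.0, size=(50, 2))  # class 1 samples
X, y = np.vstack([X0, X1]), np.array([0] * 50 + [1] * 50)

def log_gauss(x, mu, var):
    # log density of a normal distribution, evaluated per attribute
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

# Per class: attribute-wise mean and variance, plus the class prior.
stats = {c: (X[y == c].mean(0), X[y == c].var(0), np.mean(y == c))
         for c in np.unique(y)}

def predict(x):
    # naive Bayes decision: argmax_c log p(c) + sum_i log p(x_i | c)
    scores = {c: np.log(prior) + log_gauss(x, mu, var).sum()
              for c, (mu, var, prior) in stats.items()}
    return max(scores, key=scores.get)

print(predict(np.array([0.2, -0.1])), predict(np.array([2.8, 3.1])))  # 0 1
```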
ML L4: Nearest-Neighbor Classification
- Nearest-Neighbor Classification
- Nearest Neighbor Classifier Motivation
- Nearest Neighbor Classifier: Foundations
- Nearest Neighbor Classifier: Example
- Nearest Neighbor Classification: Example
- Nearest Neighbor Decision Rules
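A minimal sketch of k-nearest-neighbor classification with a majority-vote decision rule, as discussed in this unit; the training data is synthetic:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    dist = np.linalg.norm(X_train - x, axis=1)       # Euclidean distances
    nearest = np.argsort(dist)[:k]                   # indices of k nearest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                 # majority vote

X_train = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # -> 0
```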
ML L5: Nearest Neighbors and Kernel Density Estimation
- Nearest-Neighbor as Density Estimation
- Nearest Neighbor Classification and Density Estimation
- Predicting with Kernel Density Estimation for k = 1, 3, 5, 15
- Error Probability of Nearest Neighbors
- Nearest Neighbor Regression
- Nearest-Neighbor Classification: Comments & Discussion
ML L6: Decision Tree Learning
- Decision Tree Learning
- Example (Variant of a Dataset in )
- Decision Tree Example
- Decision Trees as Rule-based Systems
- Basic Notions
- Constructing a Decision Tree /1
- Visual Interpretation of Decision Trees on R²
- Constructing a Decision Tree /2
- Decision Tree Classification: Example
ML L7: Split Criteria for Decision Trees
- Decision Tree Splitting
- Split for Categorical Attributes
- Split for Numeric Attributes
- Best Split – Example
- Quality Measures for Splits
- Measure of Impurity: Gini Index
- Gini-Index: Example
- Information Gain
- Information Gain: Example
- Information Gain: Gain-Ratio
- Gain-Ratio: Example
- Classification Error
- Gini, Entropy and Classification Error
- Comparing Split Selection Measures
- Splits for Numerical Attributes
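A minimal sketch of the three impurity measures compared in this unit (Gini index, entropy as used by information gain, and classification error), computed from class-label counts at a decision-tree node; the counts are illustrative:

```python
import numpy as np

def gini(counts):
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)

def entropy(counts):
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    p = p[p > 0]                       # convention: 0 * log 0 = 0
    return -np.sum(p * np.log2(p))

def classification_error(counts):
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.max(p)

# All three measures are maximal for a balanced node and zero for a pure one.
for counts in ([5, 5], [9, 1], [10, 0]):
    print(counts, round(gini(counts), 3), round(entropy(counts), 3),
          round(classification_error(counts), 3))
```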
ML L8: Ensembles und Meta-Learning: Random Forests und Gradient Boosting
- Ensembles and Meta-Learning
- Ensembles and Meta-Learning
- Error-Rate of Ensembles
- Random Forests
- Boosting
- Random Forest Classification: Example
- Gradient Boosting Classification: Example
ML L9: Support Vector Machines – Motivation
- Support Vector Machine Motivation
- Support Vector Machines
- Support Vector Machines /2
- Finding the Best Separating Hyperplane
- Maximum Margin Hyperplane
ML L10: Affine Hyperplanes and Dot Products – Geometry for SVMs
ML L11: Maximum Margin Hyperplane – the “Widest Possible Street”
- Maximum Margin Hyperplane
- A Naïve Attempt
- Support Vectors – Separable Data
- Computing the Maximum Margin Hyperplane (MMH)
- Computing the Maximum Margin Hyperplane (MMH) /2
- Boundary of the Maximum Margin Hyperplane (MMH)
- Deriving the Primal SVM Optimization Problem
ML L12: Training Support Vector Machines
- Training Support Vector Machines
- Optimization Problem
- Karush-Kuhn-Tucker KKT Conditions
- Switching to the Dual Problem
- Classification with the Dual SVM
- Optimizing the λᵢ
- Optimizing SVMs
- Sequential Minimal Optimization
- Further Improvements
ML L13: Non-linear SVM and the Kernel Trick
- Non-linear SVM and the Kernel Trick
- Nonlinear SVM
- Nonlinear SVM /2
- Kernel Functions
- Soft Margin SVM Classifier
- Soft Margin SVM Classifier /2
- Soft Margin SVM Classifier /3
- Soft Margin SVM Classifier /4
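As a small sketch of the kernel trick from this unit: a kernel function such as the RBF kernel evaluates a dot product in an implicit feature space without ever constructing that space. The bandwidth γ = 1 and the data points are arbitrary:

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # k(x, z) = exp(-gamma * ||x - z||^2), a dot product in an
    # implicit (infinite-dimensional) feature space
    return np.exp(-gamma * np.sum((x - z) ** 2))

X = np.array([[0., 0.], [1., 0.], [0., 1.]])
K = np.array([[rbf_kernel(a, b) for b in X] for a in X])  # Gram matrix
print(K)  # symmetric, ones on the diagonal
```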
ML L14: SVM – Extensions and Conclusions
- SVM – Extensions and Conclusions
- Separation of more than 2 Classes
- Support Vector Regression
- Support Vector Regression Optimization Problem
- Support Vector Regression Dual
- Support Vector Data Description (SVDD)
- SVDD Dual Problem
- Support Vector Clustering
- SVMs: Comments & Discussion
ML L15: Motivation of Neural Networks
ML L16: Threshold Logic Units
- Threshold Logic Units
- Threshold Logic Units (TLUs)
- Threshold Logic Units – Example
- Geometric Interpretation of TLUs
- Exclusive-Or (XOR) Problem
- Exclusive-Or (XOR) Problem /2
- Exclusive-Or (XOR) Problem /3
- Universality of TLUs
- Mark I Perceptron
ML L17: General Artificial Neural Networks
- General Artificial Neural Networks
- Simplifying Threshold Logic Units
- Weight Matrices
- From TLUs to Multilayer Perceptrons
- Some Activation Functions
- Some Activation Functions /2
- Some Activation Functions /3
- Some Activation Functions /4
ML L18: Learning Neural Networks with Backpropagation
- Learning Neural Networks with Backpropagation
- Basic Gradient Descent
- Stochastic Gradient Descent
- Learning Single-Layer Perceptrons
- Backpropagation
- Training with Backpropagation
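A minimal sketch of training with backpropagation and plain (full-batch) gradient descent: a one-hidden-layer sigmoid network learning XOR under a squared-error loss. Network size, learning rate, and iteration count are illustrative choices, not the lecture’s exact setup:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([[0.], [1.], [1.], [0.]])           # XOR targets

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)    # hidden layer (8 units)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)    # output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # backward pass: chain rule, layer by layer (squared-error loss)
    d_out = (y - t) * y * (1 - y)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    # gradient-descent updates
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_hid)
    b1 -= lr * d_hid.sum(axis=0)

print(np.round(y.ravel(), 2))  # typically close to [0, 1, 1, 0]
```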
ML L19: Deep Neural Networks
- Deep Neural Networks
- Universal Approximation Theorem
- Deep vs. Wide Neural Networks
- High vs. Low Dimensionality
- (Early) Problems of Deep Learning
- Autoencoders
- Layer-wise Pre-Training of Deep Neural Networks
- Dropout Regularization
- Batch Normalization
- Choosing Activation Functions
ML L20: Convolutional Neural Networks
ML L21: Recurrent Neural Networks and LSTM
- Recurrent Neural Networks
- Recurrent Neural Networks (RNNs) on Sequences
- Recurrent Neural Networks (RNN)
- Long Short-Term Memory (LSTM)
- Further Developments
ML L22: Conclusion Classification
ML U1: Introduction to Cluster Analysis
- Cluster Analysis Introduction
- What is Clustering?
- What is Clustering? /2
- Applications of Clustering
- Basic Steps for Clustering
ML U2: Hierarchical Clustering
- Hierarchical Agglomerative Clustering
- Distance of Clusters
- AGNES – Agglomerative Nesting
- AGNES – Agglomerative Nesting /2
- Extracting Clusters from a Dendrogram
- Benefits and Limitations of HAC
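A minimal usage sketch of hierarchical agglomerative clustering with SciPy (AGNES corresponds to such a bottom-up merging procedure); method='single' uses the minimum inter-cluster distance, and the data is synthetic:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, .3, (20, 2)),   # two synthetic blobs
               rng.normal(3, .3, (20, 2))])

Z = linkage(X, method='single')              # merge table = the dendrogram
labels = fcluster(Z, t=2, criterion='maxclust')  # cut into 2 flat clusters
print(labels)
```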
ML U3: Accelerating HAC with Anderberg’s Algorithm
- Accelerating Hierarchical Clustering
- Complexity of Hierarchical Clustering
- Anderberg’s Caching
- AGNES vs. Anderberg, NNChain, SLINK
- Example: Hierarchical Clustering with Anderberg
ML U4: k-Means Clustering
- K-means Clustering
- The Sum of Squares Objective
- The Standard Algorithm (Lloyd’s Algorithm)
- Non-determinism & Non-optimality
- Initialization
- Initialization /2
- Complexity of k-Means Clustering
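A minimal sketch of the standard algorithm (Lloyd’s algorithm): alternate between assigning each point to its nearest center and recomputing each center as the mean of its assigned points. The uniform random initialization used here is the naïve choice; better schemes such as k-means++ follow in the next unit:

```python
import numpy as np

def lloyd(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iter):
        # assignment step: nearest center by squared Euclidean distance
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # update step: mean of each cluster (keep old center if empty)
        new = np.array([X[labels == j].mean(0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break  # converged: assignments can no longer change
        centers = new
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, .5, (50, 2)), rng.normal(4, .5, (50, 2))])
centers, labels = lloyd(X, k=2)
print(np.round(centers, 2))
```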
ML U5: Accelerating k-Means Clustering
- Accelerating k-Means Clustering
- k-Means++: Weighted Random Initialization
- Making k-means Faster
- Bounding the Distances – Elkan and Hamerly
- Hamerly’s k-means
- Example: k-Means Clustering with Hamerly’s Algorithm
- Speedup with Hamerly, Elkan, and Exponion
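A minimal sketch of k-means++ weighted random initialization: each further center is sampled with probability proportional to the squared distance to the nearest already-chosen center (D² weighting), which tends to spread the initial centers out; the data is synthetic:

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    rng = rng or np.random.default_rng()
    centers = [X[rng.integers(len(X))]]            # first center uniformly
    for _ in range(k - 1):
        # squared distance of every point to its nearest chosen center
        d2 = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        p = d2 / d2.sum()                          # D^2 weighting
        centers.append(X[rng.choice(len(X), p=p)])
    return np.array(centers)

X = np.random.default_rng(0).normal(size=(100, 2))
print(kmeans_pp_init(X, k=3, rng=np.random.default_rng(1)))
```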
ML U6: Limitations of k-Means Clustering
- Limitations of k-Means Clustering
- Benefits and Drawbacks of k-Means
- Choosing the “Optimum” k for k-Means
- Limitations of k-Means
ML U7: Extensions of k-Means Clustering
- Extensions of k-Means Clustering
- k-Means and Distances
- k-Means Minimizes Sum of Squares, not Euclidean Distance!
- k-Means Variations for Other Distances
- Spherical k-Means for Text Clustering
- Pre-processing and Post-processing
ML U8: Partitioning Around Medoids (k-Medoids)
- Partitioning Around Medoids (k-Medoids)
- k-medoids Clustering
- Partitioning Around Medoids
- Algorithm: Partitioning Around Medoids
- Algorithm: Partitioning Around Medoids /2
- Change in TD
- Finding the Best Swap Faster
- k-Medoids, k-Means style
- Example for the Inferiority of k-Means Style k-Medoids
ML U9: Gaussian Mixture Modeling (EM Clustering)
- Gaussian Mixture Modeling Introduction
- Expectation-Maximization in Clustering
- Fitting Multiple Gaussian Distributions
- Gaussian Mixture Modeling as E-M-Optimization
- Algorithm: EM Clustering
- Numerical Issues in GMM
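A minimal sketch of EM for a one-dimensional two-component Gaussian mixture: the E-step computes responsibilities, the M-step re-estimates weights, means, and variances from them. Data and initialization are illustrative, and the direct density evaluation ignores the numerical issues (underflow, log-sum-exp) discussed in this unit:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 0.5, 100)])

w   = np.array([0.5, 0.5])     # mixture weights
mu  = np.array([-1.0, 1.0])    # component means
var = np.array([1.0, 1.0])     # component variances

def pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for _ in range(100):
    # E-step: responsibility of each component for each point
    r = w * pdf(x[:, None], mu, var)
    r /= r.sum(1, keepdims=True)
    # M-step: weighted maximum-likelihood re-estimates
    n = r.sum(0)
    w = n / len(x)
    mu = (r * x[:, None]).sum(0) / n
    var = (r * (x[:, None] - mu) ** 2).sum(0) / n

print(np.round(w, 2), np.round(mu, 2), np.round(var, 2))
```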
ML U10: Gaussian Mixture Modeling Demo
ML U11: BIRCH and BETULA Clustering
- BIRCH and BETULA
- BIRCH Clustering
- BIRCH Clustering Features
- BIRCH Distances
- BIRCH CF-Tree
- BETULA Cluster Features
- BETULA Distance Computations
- Accelerating k-Means with BIRCH and BETULA
- Accelerating GMM with BETULA
ML U12: Motivation for Density-Based Clustering (DBSCAN)
ML U13: Density-reachable and density-connected (DBSCAN Clustering)
- Density-Based Clustering Fundamentals
- Density-based Clustering: Foundations
- Density-based Clustering: Foundations /2
- Density-based Clustering: Foundations /3
- Density-reachability and Density-connectivity
- Density-reachability
ML U14: DBSCAN Clustering
- DBSCAN
- Clustering Approach
- Abstract DBSCAN Algorithm
- DBSCAN Algorithm
- DBSCAN Algorithm /2
- DBSCAN Algorithm /3
- DBSCAN in Context
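A minimal sketch of the DBSCAN algorithm from this unit: points with at least minPts neighbors within radius ε are core points, and clusters are grown by expanding density-reachable neighborhoods; everything not reached stays noise (label -1). Range queries are done naïvely in O(n²) here; acceleration (e.g., grid-based) is covered in a later unit:

```python
import numpy as np

def dbscan(X, eps=0.5, min_pts=5):
    n = len(X)
    labels = np.full(n, -1)            # -1 = noise / not yet assigned
    neighbors = [np.flatnonzero(np.linalg.norm(X - p, axis=1) <= eps)
                 for p in X]           # naive eps-range queries
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                   # already assigned, or not core
        labels[i] = cluster
        seeds = list(neighbors[i])
        while seeds:                   # expand density-reachable points
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:  # j is core: grow further
                    seeds.extend(neighbors[j])
        cluster += 1
    return labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, .3, (40, 2)), rng.normal(3, .3, (40, 2))])
print(dbscan(X, eps=0.6, min_pts=5))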
ML U15: Parameterization of DBSCAN
- DBSCAN Parameterization
- Choosing DBSCAN parameters
- Choosing DBSCAN parameters /2
- Choosing DBSCAN parameters /3
ML U16: Extensions and Variations of DBSCAN Clustering
- DBSCAN Extensions
- Generalized Density-based Clustering
- Grid-based Accelerated DBSCAN
- Anytime Density-Based Clustering (AnyDBC)
- Hierarchical DBSCAN* (HDBSCAN*)
- Improved DBSCAN Variations
ML U17: OPTICS Clustering
- OPTICS Clustering
- Density-based Hierarchical Clustering
- Density-based Hierarchical Clustering /2
- OPTICS Clustering
- Cluster Order
- OPTICS Algorithm
ML U18: Cluster Extraction from OPTICS Plots
- Cluster Extraction from OPTICS Plots
- OPTICS Reachability Plots
- Extracting Clusters from OPTICS Reachability Plots
- Role of the Parameters ε and minPts
ML U19: Understanding the OPTICS Cluster Order
- Understanding the OPTICS Cluster Order
- Properties of the OPTICS Cluster Order
- Cluster Order as Serialized Spanning Tree
- OPTICS as Density Spanning Trees
- Cluster Order to Dendrograms
ML U20: Spectral Clustering
- Spectral Clustering
- Minimum Cuts
- Graph Laplacian
- From Clustering Graphs to Clustering Data
- Spectral Clustering
- Spectral Clustering is Related to DBSCAN
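A minimal sketch connecting the pieces of this unit: build a similarity graph (RBF affinities), form the unnormalized graph Laplacian L = D − W, and split the data by the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue), which approximates a minimum cut; the data and bandwidth are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, .3, (20, 2)),   # two synthetic blobs
               rng.normal(3, .3, (20, 2))])

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2)                  # RBF affinity matrix (gamma = 1)
np.fill_diagonal(W, 0)           # no self-loops
L = np.diag(W.sum(1)) - W        # unnormalized graph Laplacian

eigval, eigvec = np.linalg.eigh(L)   # eigenvalues in ascending order
fiedler = eigvec[:, 1]               # second-smallest eigenvector
labels = (fiedler > 0).astype(int)   # sign split approximates a min cut
print(labels)
```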
ML U21: Biclustering and Subspace Clustering
- Biclustering and Subspace Clustering
- Biclustering & Subspace Clustering
- Bicluster Patterns
- Density-based Subspace Clustering
- Subspace Clustering with Apriori-Style Search
- Correlation Clustering
- 4C: Computing Correlation Connected Clusters
- Hough Transform
- CASH: Robust Clustering in Arbitrarily Oriented Subspaces