Chapter 2 Fundamentals of Machine Learning

1) Questions:

2.1 Illustrations of various common algorithms
2.2 What are supervised learning, unsupervised learning, semi-supervised learning, and weakly supervised learning?
2.3 What are the steps in supervised learning?
2.4 What is multi-instance learning?
2.5 What is the difference between classification networks and regression networks?
2.6 What is a neural network?
2.7 What are the advantages and disadvantages of common classification algorithms?
2.8 Is accuracy a good metric for evaluating classification algorithms?
2.9 How to evaluate a classification algorithm?
2.10 What kind of classifier is the best?
2.11 The relationship between big data and deep learning
2.12 Understanding local optimization and global optimization
2.13 Understanding logistic regression
2.14 What is the difference between logistic regression and a naive Bayes classifier?
2.15 Why do you need a cost function?
2.16 How the cost function works
2.17 Why is the cost function non-negative?
2.18 What are the common cost functions?
2.19 Why use cross entropy instead of the quadratic cost function?
2.20 What is a loss function?
2.21 Common loss functions
2.22 Why does logistic regression use a logarithmic loss function?
2.23 How does the logarithmic loss function measure loss?
2.24 Why is gradient descent needed in machine learning?
2.25 What are the disadvantages of the gradient descent method?
2.26 How can gradient descent be understood intuitively?
2.27 How is the gradient descent algorithm described?
2.28 How to tune gradient descent?
2.29 What is the difference between stochastic gradient descent and batch gradient descent?
2.30 Performance comparison of various gradient descent methods
2.31 How to compute derivatives on a computational graph?
2.32 Summary of the ideas behind linear discriminant analysis (LDA)
2.33 Illustrating the core idea of LDA
2.34 What are the principles of the two-class LDA algorithm?
2.35 Summary of the LDA algorithm flow
2.36 What is the difference between LDA and PCA?
2.37 Advantages and disadvantages of LDA
2.38 Summary of the ideas behind principal component analysis (PCA)
2.39 Illustrating the core idea of PCA
2.40 Derivation of the PCA algorithm
2.41 Summary of the PCA algorithm flow
2.42 Main advantages and disadvantages of the PCA algorithm
2.43 Necessity and purpose of dimensionality reduction
2.44 What is the difference between KPCA and PCA?
2.45 Model evaluation
2.45.1 What are the common methods for model evaluation?
2.45.2 Empirical error and generalization error
2.45.3 Illustrating underfitting and overfitting
2.45.4 How to address overfitting and underfitting?
2.45.5 What is the main role of cross-validation?
2.45.6 What is k-fold cross-validation?
2.45.7 Confusion matrix
2.45.8 Error rate and accuracy
2.45.9 Precision and recall
2.45.10 ROC and AUC
2.45.11 How to draw an ROC curve?
2.45.12 How to calculate TPR and FPR?
2.45.13 How to calculate AUC?
2.45.14 Why use ROC and AUC to evaluate a classifier?
2.45.15 Intuitive understanding of AUC
2.45.16 Cost-sensitive error rate and cost curve
2.45.17 What comparison test methods are there for models?
2.45.18 Bias and variance
2.45.19 Why use the standard deviation?
2.45.20 The idea of point estimation
2.45.21 Criteria for a good point estimator
2.45.22 What is the connection between point estimation, interval estimation, and the central limit theorem?
2.45.23 What causes class imbalance?
2.45.24 Common solutions to the class imbalance problem
2.46 Decision tree
2.46.1 Basic principles of decision trees
2.46.2 What are the three elements of a decision tree?
2.46.3 Basic decision tree learning algorithm
2.46.4 Advantages and disadvantages of decision tree algorithms
2.46.5 The concept of entropy and how to understand it
2.46.6 Understanding information gain
2.46.7 What are the role and strategies of pruning?
2.47 Support vector machine
2.47.1 What is a support vector machine?
2.47.2 What problems does the support vector machine solve?
2.47.3 What does the kernel function do?
2.47.4 What is the dual problem?
2.47.5 Understanding support vector regression
2.47.6 Understanding SVM (kernel functions)
2.47.7 What are the common kernel functions?
2.47.8 Soft margin and regularization
2.47.9 Main features and disadvantages of SVM
2.48 Bayesian methods
2.48.1 Illustrating maximum likelihood estimation
2.48.2 What is the difference between a naive Bayes classifier and a general Bayesian classifier?
2.48.3 Naive and semi-naive Bayes classifiers
2.48.4 Three typical structures of a Bayesian network
2.48.5 What is the Bayes error rate?
2.48.6 What is the Bayes optimal error rate?
2.49 The EM algorithm: the problems it solves and its implementation process
2.50 Why does the curse of dimensionality occur?
2.51 How to avoid the curse of dimensionality?
2.52 What are the differences and connections between clustering and dimensionality reduction?
2.53 Differences between GBDT and random forests
2.54 Comparison of four clustering methods

2) Answers:
