13.1 What is the difference between a CPU and a GPU?
13.2 What can you do if you don't have enough training samples?
13.3 What kinds of datasets are not suitable for deep learning?
13.4 Is it possible to find a better algorithm than the currently known ones?
13.5 What is collinearity, and how is it related to overfitting?
13.6 How are generalized linear models applied in deep learning?
13.7 What causes vanishing gradients?
13.8 What are common weight initialization methods?
13.9 How can heuristic optimization algorithms avoid local optima and move toward the global optimum?
13.10 How can gradient descent be improved in convex optimization to avoid getting stuck in a local optimum?
13.11 What are some common loss functions?
13.14 How to select features
13.14.1 How to approach feature selection
13.14.2 Classification of feature selection methods
13.14.3 The purpose of feature selection
13.15 Vanishing and exploding gradients: causes and solutions
13.15.1 Why use gradient-based update rules?
13.15.2 What causes gradients to vanish or explode?
13.15.3 Solutions for vanishing and exploding gradients
13.16 Why doesn't deep learning use second-order optimization?
13.17 How to optimize a deep learning system?
13.18 Why use a single numerical evaluation metric?
13.19 Satisficing and optimizing metrics
13.20 How to divide a dataset into training, development, and test sets
13.21 How to set the development/test set size
13.22 When should development/test sets and metrics be changed?
13.23 What is the significance of setting evaluation metrics?
13.24 What is avoidable bias?
13.25 What is the top-5 error rate?
13.26 What is the human-level error rate?
13.27 What is the relationship between avoidable bias and the various error rates?
13.28 How to estimate avoidable bias and the Bayes error rate?
13.29 How to reduce variance?
13.30 Best estimate of the Bayes error rate
13.31 How much data does a machine learning algorithm need before it surpasses human performance?
13.32 How can I improve my model?
13.33 Understanding error analysis
13.34 Why is it worth the time to examine mislabeled data?
13.35 What is the significance of quickly building an initial system?
13.36 Why train and test on different distributions?
13.37 Gradient checking considerations
13.38 What is stochastic gradient descent?
13.39 What is batch gradient descent?
13.40 What is mini-batch gradient descent?
13.41 How to configure mini-batch gradient descent
13.42 The problem of local optima
13.43 Ideas for improving algorithm performance