Multi-learning
{Multi-class, Multi-label, Multi-output}
1) Multi-class classification:
Multiclass or multinomial classification is the problem of classifying instances into one of three or more classes (classifying into one of two classes is called binary classification). Each sample is labeled with exactly one class.
Ex: Classification using features extracted from a set of images of fruit.
Each image may be of an orange, an apple, or a pear.
Each image is one sample and is labeled as one of the 3 possible classes.
=> It makes the assumption that each sample is assigned to one and only one label - one sample cannot, for example, be both a pear and an apple.
General strategies:
Transformation to binary (see the code sketch after this list):
One-vs.-rest.
One-vs.-one.
Extension from binary:
Neural networks.
Extreme learning machines (ELM).
k-nearest neighbours.
Naive Bayes.
Decision trees.
Support vector machines.
Hierarchical classification
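Below is a minimal sketch of the two binary-transformation strategies, assuming scikit-learn; the synthetic three-class dataset stands in for the fruit features above.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Three mutually exclusive classes, e.g. orange / apple / pear.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# One-vs.-rest: one binary classifier per class (3 models here).
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
# One-vs.-one: one binary classifier per pair of classes (3 models here).
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(ovr.predict(X[:5]))  # exactly one class label per sample
print(ovo.predict(X[:5]))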
2) Multi-label classification:
Multi-label classification (closely related to multi-output classification) is a task in which an instance may be associated with multiple labels: each sample is labeled with m labels from n_classes possible classes, where m can range from 0 to n_classes inclusive.
There is no constraint on how many of the classes the instance can be assigned to. This can be thought of as predicting properties of a sample that are not mutually exclusive.
Ex: Prediction of the topics relevant to a text document or video.
The document or video may be about any one of the topics ‘religion’, ‘politics’, ‘finance’ or ‘education’, about several of them, or about all of them.
Problem transformation:
Transformation into binary classification problems (sketched in code after these lists).
Transformation into multi-class classification problem.
Ensemble methods.
Adapted algorithms:
k-nearest neighbors
Decision trees
Kernel methods for vector output
Neural networks
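A minimal sketch of the transformation-into-binary-problems approach (often called binary relevance), assuming scikit-learn; the four synthetic labels play the role of the topic classes above.

from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Y is an (n_samples, 4) binary indicator matrix: a row may contain
# anywhere from zero to four ones, since topics are not mutually exclusive.
X, Y = make_multilabel_classification(n_samples=200, n_features=20,
                                      n_classes=4, random_state=0)

# Binary relevance: one independent binary classifier per label.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
print(clf.predict(X[:3]))  # each row is a 0/1 vector over the 4 topics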
3) Multiclass-multioutput (multi-task) classification:
Multiclass-multioutput classification (also known as multi-task classification) is a classification task that labels each sample with a set of non-binary properties.
It is a generalization both of the multi-label classification task, which considers only binary attributes, and of the multi-class classification task, where only one property is considered.
Multi-task learning is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. It does this by learning tasks in parallel while using a shared representation; what is learned for each task can help other tasks be learned better.[3]
In the classification context, MTL aims to improve the performance of multiple classification tasks by learning them jointly. A single estimator thus handles several joint classification tasks.
Multi-task learning works because the regularization induced by requiring an algorithm to perform well on a related task can be superior to regularization that prevents overfitting by penalizing all complexity uniformly. MTL may be particularly helpful when the tasks share significant commonalities and are each slightly under-sampled.
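A hedged sketch of the shared-representation idea, assuming PyTorch and anticipating the fruit/colour example below: one jointly learned trunk, one head per task, with both task losses back-propagated through the shared layers. The layer sizes and class counts are illustrative assumptions.

import torch
import torch.nn as nn

class SharedTrunkMTL(nn.Module):
    """Two classification tasks sharing one learned representation."""
    def __init__(self, n_features, n_fruit=3, n_colour=4):
        super().__init__()
        # Shared trunk: the representation both tasks must agree on.
        self.trunk = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU())
        # One task-specific head per property.
        self.fruit_head = nn.Linear(32, n_fruit)
        self.colour_head = nn.Linear(32, n_colour)

    def forward(self, x):
        h = self.trunk(x)
        return self.fruit_head(h), self.colour_head(h)

model = SharedTrunkMTL(n_features=10)
x = torch.randn(8, 10)                   # a toy batch of image features
y_fruit = torch.randint(0, 3, (8,))
y_colour = torch.randint(0, 4, (8,))
logits_fruit, logits_colour = model(x)
loss = (nn.functional.cross_entropy(logits_fruit, y_fruit)
        + nn.functional.cross_entropy(logits_colour, y_colour))
loss.backward()  # gradients from both tasks update the shared trunk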
Ex: Classification of the properties “type of fruit” and “colour” for a set of images of fruit.
The property “type of fruit” has the possible classes: “apple”, “pear” and “orange”.
The property “colour” has the possible classes: “green”, “red”, “yellow” and “orange”.
Each sample is an image of a fruit; a label is output for each property, and each label is one of the possible classes of the corresponding property.
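A minimal sketch of this example, assuming scikit-learn, using MultiOutputClassifier to fit one classifier per output property; the random features and labels below are placeholders for real image features.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))            # stand-in image features
y_fruit = rng.integers(0, 3, size=100)   # apple / pear / orange
y_colour = rng.integers(0, 4, size=100)  # green / red / yellow / orange
Y = np.column_stack([y_fruit, y_colour]) # shape (100, 2): two non-binary targets

clf = MultiOutputClassifier(RandomForestClassifier(random_state=0)).fit(X, Y)
print(clf.predict(X[:3]))  # one predicted class per property, per sample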
Methods:
Task grouping and overlap
Exploiting unrelated tasks
Transfer of knowledge
Group online adaptive learning
References:
Multi-class Classification:
Survey on multiclass classification methods
Pattern Recognition and Machine Learning
A novel progressive learning technique for multi-class classification
Progressive Learning Technique
Multi-label Classification:
Deep Learning for Multi-label Classification (2014)
Classifier Chains for Multi-label Classification
Multilabel classification for exploiting cross-resistance information in HIV-1 drug resistance prediction
Exploiting HIV-1 protease and reverse transcriptase cross-resistance information for improved drug resistance prediction by means of multi-label classification
DRABAL: novel method to mine large high-throughput screening assays using Bayesian active learning
A Comparison of Multi-label Feature Selection Methods using the Problem Transformation Approach
Discrimination Threshold — yellowbrick 0.9 documentation
Random k-labelsets: An ensemble method for multilabel classification
ML-KNN: A lazy learning approach to multi-label learning.
An extensive experimental comparison of methods for multi-label learning
Constructing a multi-valued and multi-labeled decision tree
MMDT: a multi-valued and multi-labeled decision tree classifier for data mining
Combine multi-valued attribute decomposition with multi-label learning
Multi-label neural networks with applications to functional genomics and text categorization
Data Streams. Advances in Database Systems.
Online Bagging and Boosting
Multi-label Classification Using Ensembles of Pruned Sets
Multi-label classification via multi-target regression on data streams
Multi-label classification from high-speed data streams with adaptive model rules and random rules
Scalable and efficient multi-label classification for evolving data streams
Learning from Time-Changing Data with Adaptive Windowing
A Novel Online Stacked Ensemble for Multi-Label Stream Classification.
Dealing with concept drift and class imbalance in multi-label stream classification.
Discriminative methods for multi-labeled classification
On the stratification of multi-label data
Multilabel Classification with R Package mlr.
Multi-task Classification:
A model of inductive bias learning
Is learning the n-th thing any easier than learning the first?
Multi-task learning
Learning from hints in neural networks
Convex Learning of Multiple Tasks and their Structure
Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data
Learning Task Grouping and Overlap in Multi-Task Learning.
A Convex Feature Learning Formulation for Latent Task Structure Discovery.
Hierarchical Regularization Cascade for Joint Learning.
Going deeper with convolutions
Deep Learning Overview
Group online adaptive learning
Learning output kernels with block coordinate descent
Clustered multi-task learning: A convex formulation
Collaborative Email-Spam Filtering with the Hashing-Trick
Multi-Task Learning for Boosting with Application to Web Search Ranking.
MALSAR: Multi-tAsk Learning via StructurAl Regularization.
Regularized multi–task learning.
Learning multiple tasks with kernel methods
Convex multi-task feature learning
Integrating low-rank and group-sparse structures for robust multi-task learning.
An accelerated gradient method for trace norm minimization.
A framework for learning predictive structures from multiple tasks and unlabeled data
A convex formulation for learning shared structures from multiple tasks.
Learning incoherent sparse and low-rank patterns from multiple tasks.
Clustered multi-task learning: A convex formulation.
Clustered multi-task learning via alternating structure optimization.
Citations:
Multi-class
Aly, Mohamed (2005). "Survey on multiclass classification methods" (PDF). Technical Report, Caltech.
Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer.
Venkatesan, Rajasekar; Meng Joo, Er (2016). "A novel progressive learning technique for multi-class classification". Neurocomputing. 207: 310–321. arXiv:1609.00085. doi:10.1016/j.neucom.2016.05.006.
Venkatesan, Rajasekar. "Progressive Learning Technique"
Multi-label
Jesse Read, Bernhard Pfahringer, Geoff Holmes, Eibe Frank. Classifier Chains for Multi-label Classification. Machine Learning Journal. Springer. Vol. 85(3), (2011).
Heider, D; Senge, R; Cheng, W; Hüllermeier, E (2013). "Multilabel classification for exploiting cross-resistance information in HIV-1 drug resistance prediction". Bioinformatics. 29 (16): 1946–52. doi:10.1093/bioinformatics/btt331. PMID 23793752.
Riemenschneider, M; Senge, R; Neumann, U; Hüllermeier, E; Heider, D (2016). "Exploiting HIV-1 protease and reverse transcriptase cross-resistance information for improved drug resistance prediction by means of multi-label classification". BioData Mining. 9: 10. doi:10.1186/s13040-016-0089-1. PMC 4772363. PMID 26933450.
Soufan, Othman; Ba-Alawi, Wail; Afeef, Moataz; Essack, Magbubah; Kalnis, Panos; Bajic, Vladimir B. (2016-11-10). "DRABAL: novel method to mine large high-throughput screening assays using Bayesian active learning". Journal of Cheminformatics. 8: 64. doi:10.1186/s13321-016-0177-8. ISSN 1758-2946. PMC 5105261. PMID 27895719.
Spolaôr, Newton; Cherman, Everton Alvares; Monard, Maria Carolina; Lee, Huei Diana (March 2013). "A Comparison of Multi-label Feature Selection Methods using the Problem Transformation Approach". Electronic Notes in Theoretical Computer Science. 292: 135–151. doi:10.1016/j.entcs.2013.02.010. ISSN 1571-0661.
"Discrimination Threshold — yellowbrick 0.9 documentation". www.scikit-yb.org. Retrieved 2018-11-29.
Tsoumakas, Grigorios; Vlahavas, Ioannis (2007). Random k-labelsets: An ensemble method for multilabel classification (PDF). ECML. Archived from the original (PDF) on 2014-07-29. Retrieved 2014-07-26.
Zhang, M.L.; Zhou, Z.H. (2007). "ML-KNN: A lazy learning approach to multi-label learning". Pattern Recognition. 40 (7): 2038–2048. CiteSeerX 10.1.1.538.9597. doi:10.1016/j.patcog.2006.12.019.
Madjarov, Gjorgji; Kocev, Dragi; Gjorgjevikj, Dejan; Džeroski, Sašo (2012). "An extensive experimental comparison of methods for multi-label learning". Pattern Recognition. 45 (9): 3084–3104. doi:10.1016/j.patcog.2012.03.004.
Chen, Yen-Liang; Hsu, Chang-Ling; Chou, Shih-chieh (2003). "Constructing a multi-valued and multi-labeled decision tree". Expert Systems with Applications. 25 (2): 199–209. doi:10.1016/S0957-4174(03)00047-2.
Chou, Shihchieh; Hsu, Chang-Ling (2005-05-01). "MMDT: a multi-valued and multi-labeled decision tree classifier for data mining". Expert Systems with Applications. 28 (4): 799–812. doi:10.1016/j.eswa.2004.12.035.
Li, Hong; Guo, Yue-jian; Wu, Min; Li, Ping; Xiang, Yao (2010-12-01). "Combine multi-valued attribute decomposition with multi-label learning". Expert Systems with Applications. 37 (12): 8721–8728. doi:10.1016/j.eswa.2010.06.044.
Zhang, M.L.; Zhou, Z.H. (2006). Multi-label neural networks with applications to functional genomics and text categorization (PDF). IEEE Transactions on Knowledge and Data Engineering. 18. pp. 1338–1351.
Aggarwal, Charu C., ed. (2007). Data Streams. Advances in Database Systems. 31. doi:10.1007/978-0-387-47534-9. ISBN 978-0-387-28759-1.
Oza, Nikunj (2005). "Online Bagging and Boosting". IEEE International Conference on Systems, Man and Cybernetics. hdl:2060/20050239012.
Read, Jesse; Pfahringer, Bernhard; Holmes, Geoff (2008-12-15). Multi-label Classification Using Ensembles of Pruned Sets. IEEE Computer Society. pp. 995–1000. doi:10.1109/ICDM.2008.74. hdl:10289/8077. ISBN 9780769535029. S2CID 16059274.
Osojnik, Aljaž; Panov, Panče; Džeroski, Sašo (2017-06-01). "Multi-label classification via multi-target regression on data streams". Machine Learning. 106 (6): 745–770. doi:10.1007/s10994-016-5613-5. ISSN 0885-6125.
Sousa, Ricardo; Gama, João (2018-01-24). "Multi-label classification from high-speed data streams with adaptive model rules and random rules". Progress in Artificial Intelligence. 7 (3): 177–187. doi:10.1007/s13748-018-0142-z. ISSN 2192-6352. S2CID 32376722.
Read, Jesse; Bifet, Albert; Holmes, Geoff; Pfahringer, Bernhard (2012-02-21). "Scalable and efficient multi-label classification for evolving data streams". Machine Learning. 88 (1–2): 243–272. doi:10.1007/s10994-012-5279-6. ISSN 0885-6125.
Bifet, Albert; Gavaldà, Ricard (2007-04-26), "Learning from Time-Changing Data with Adaptive Windowing", Proceedings of the 2007 SIAM International Conference on Data Mining, Society for Industrial and Applied Mathematics, pp. 443–448, CiteSeerX 10.1.1.215.8387, doi:10.1137/1.9781611972771.42, ISBN 9780898716306
Büyükçakir, Alican; Bonab, Hamed; Can, Fazli (2018-10-17). A Novel Online Stacked Ensemble for Multi-Label Stream Classification. ACM. pp. 1063–1072. arXiv:1809.09994. doi:10.1145/3269206.3271774. ISBN 9781450360142. S2CID 52843253.
Xioufis, Eleftherios Spyromitros; Spiliopoulou, Myra; Tsoumakas, Grigorios; Vlahavas, Ioannis (2011-07-16). Dealing with concept drift and class imbalance in multi-label stream classification. AAAI Press. pp. 1583–1588. doi:10.5591/978-1-57735-516-8/IJCAI11-266. ISBN 9781577355144.
Godbole, Shantanu; Sarawagi, Sunita (2004). Discriminative methods for multi-labeled classification (PDF). Advances in Knowledge Discovery and Data Mining. pp. 22–30.
Sechidis, Konstantinos; Tsoumakas, Grigorios; Vlahavas, Ioannis (2011). On the stratification of multi-label data (PDF). ECML PKDD. pp. 145–158.
Philipp Probst, Quay Au, Giuseppe Casalicchio, Clemens Stachl, Bernd Bischl. Multilabel Classification with R Package mlr. The R Journal (2017) 9:1, pages 352-369.
Multi-task:
Baxter, J. (2000). "A model of inductive bias learning". Journal of Artificial Intelligence Research. 12: 149–198.
Thrun, S. (1996). "Is learning the n-th thing any easier than learning the first?". In Advances in Neural Information Processing Systems 8, pp. 640–646. MIT Press.
Caruana, R. (1997). "Multi-task learning" (PDF). Machine Learning. 28: 41–75. doi:10.1023/A:1007379606734.
Suddarth, S.; Kergosien, Y. (1990). "Rule-injection hints as a means of improving network performance and learning time". EURASIP Workshop on Neural Networks, pp. 120–129. Lecture Notes in Computer Science. Springer.
Abu-Mostafa, Y. S. (1990). "Learning from hints in neural networks". Journal of Complexity. 6 (2): 192–198. doi:10.1016/0885-064x(90)90006-y.
Weinberger, Kilian. "Multi-task Learning".
Ciliberto, C. (2015). "Convex Learning of Multiple Tasks and their Structure". arXiv:1504.03101 [cs.LG].
Hajiramezanali, E. & Dadaneh, S. Z. & Karbalayghareh, A. & Zhou, Z. & Qian, X. Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data. 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, Canada. arXiv:1810.09433
Romera-Paredes, B., Argyriou, A., Bianchi-Berthouze, N., & Pontil, M., (2012) Exploiting Unrelated Tasks in Multi-Task Learning. http://jmlr.csail.mit.edu/proceedings/papers/v22/romera12/romera12.pdf
Kumar, A., & Daume III, H., (2012) Learning Task Grouping and Overlap in Multi-Task Learning. http://icml.cc/2012/papers/690.pdf
Jawanpuria, P., & Saketha Nath, J., (2012) A Convex Feature Learning Formulation for Latent Task Structure Discovery. http://icml.cc/2012/papers/90.pdf
Zweig, A. & Weinshall, D. Hierarchical Regularization Cascade for Joint Learning. Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta GA, June 2013. http://www.cs.huji.ac.il/~daphna/papers/Zweig_ICML2013.pdf
Szegedy, Christian; Liu, Wei; Jia, Yangqing; Sermanet, Pierre; Reed, Scott; Anguelov, Dragomir; Erhan, Dumitru; Vanhoucke, Vincent; Rabinovich, Andrew (2015). "Going deeper with convolutions". 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 1–9. arXiv:1409.4842. doi:10.1109/CVPR.2015.7298594. ISBN 978-1-4673-6964-0.
Roig, Gemma. "Deep Learning Overview" (PDF).
Zweig, A. & Chechik, G. Group online adaptive learning. Machine Learning, doi:10.1007/s10994-017-5661-5, August 2017. http://rdcu.be/uFSv
Dinuzzo, Francesco (2011). "Learning output kernels with block coordinate descent" (PDF). Proceedings of the 28th International Conference on Machine Learning (ICML-11). Archived from the original (PDF) on 2017-08-08.
Jacob, Laurent (2009). "Clustered multi-task learning: A convex formulation". Advances in Neural Information Processing Systems. arXiv:0809.2085. Bibcode:2008arXiv0809.2085J.
Attenberg, J., Weinberger, K., & Dasgupta, A. Collaborative Email-Spam Filtering with the Hashing-Trick. http://www.cse.wustl.edu/~kilian/papers/ceas2009-paper-11.pdf
Chappelle, O., Shivaswamy, P., & Vadrevu, S. Multi-Task Learning for Boosting with Application to Web Search Ranking. http://www.cse.wustl.edu/~kilian/papers/multiboost2010.pdf
Zhou, J., Chen, J. and Ye, J. MALSAR: Multi-tAsk Learning via StructurAl Regularization. Arizona State University, 2012. http://www.public.asu.edu/~jye02/Software/MALSAR
Evgeniou, T., & Pontil, M. (2004). Regularized multi–task learning. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 109–117).
Evgeniou, T.; Micchelli, C.; Pontil, M. (2005). "Learning multiple tasks with kernel methods" (PDF). Journal of Machine Learning Research. 6: 615.
Argyriou, A.; Evgeniou, T.; Pontil, M. (2008a). "Convex multi-task feature learning". Machine Learning. 73 (3): 243–272. doi:10.1007/s10994-007-5040-8.
Chen, J., Zhou, J., & Ye, J. (2011). Integrating low-rank and group-sparse structures for robust multi-task learning. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.
Ji, S., & Ye, J. (2009). An accelerated gradient method for trace norm minimization. Proceedings of the 26th Annual International Conference on Machine Learning (pp. 457–464).
Ando, R.; Zhang, T. (2005). "A framework for learning predictive structures from multiple tasks and unlabeled data" (PDF). The Journal of Machine Learning Research. 6: 1817–1853.
Chen, J., Tang, L., Liu, J., & Ye, J. (2009). A convex formulation for learning shared structures from multiple tasks. Proceedings of the 26th Annual International Conference on Machine Learning (pp. 137–144).
Chen, J., Liu, J., & Ye, J. (2010). Learning incoherent sparse and low-rank patterns from multiple tasks. Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 1179–1188).
Jacob, L., Bach, F., & Vert, J. (2008). Clustered multi-task learning: A convex formulation. Advances in Neural Information Processing Systems, 2008
Zhou, J., Chen, J., & Ye, J. (2011). Clustered multi-task learning via alternating structure optimization. Advances in Neural Information Processing Systems.