Anomaly detection:
One-class SVM
PCA-based anomaly detection
Clustering:
K-means
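Predicting values:
Data in rank-ordered categories: Ordinal regression
Predicting event counts: Poisson regression
Predicting a distribution: Fast forest quantile regression
Fast training, linear model: Linear regression
Linear model, small data sets: Bayesian linear regression
Accurate, but long training times: Neural network regression
Accurate, fast training: Decision forest regression
Accurate, fast training, large memory footprint: Boosted decision tree regression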
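Predicting two categories (binary classification):
More than 100 features, linear model: Two-class SVM
Fast training, linear model: Two-class averaged perceptron
Fast training, linear model: Two-class logistic regression
Fast training, linear model: Two-class Bayes point machine
Accurate, fast training: Two-class decision forest
Accurate, fast training, large memory footprint: Two-class boosted decision tree
Accurate, small memory footprint: Two-class decision jungle
More than 100 features: Two-class locally deep SVM
Accurate, long training times: Two-class neural network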
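Predicting multiple categories (multiclass classification):
Fast training, linear model: Multiclass logistic regression
Accurate, long training times: Multiclass neural network
Accurate, fast training: Multiclass decision forest
Accurate, small memory footprint: Multiclass decision jungle
Depends on the underlying two-class classifier: One-v-all multiclass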
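The last entry, One-v-all, builds a multiclass classifier out of two-class classifiers, so its behavior depends on whichever binary learner it wraps. A minimal sketch of the idea, assuming a plain perceptron as the binary learner (this is an illustrative stand-in, not the Azure module; the function names and the toy data are hypothetical):

```python
def train_perceptron(points, labels, max_epochs=1000):
    """Plain perceptron; labels are +1 or -1. Returns (weights, bias)."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for point, label in zip(points, labels):
            score = sum(w * x for w, x in zip(weights, point)) + bias
            if label * score <= 0:  # wrong side (or on the boundary): update
                weights = [w + label * x for w, x in zip(weights, point)]
                bias += label
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return weights, bias


def train_one_vs_all(points, classes):
    """Train one binary classifier per class: that class (+1) vs. the rest (-1)."""
    models = {}
    for target in sorted(set(classes)):
        binary_labels = [1 if c == target else -1 for c in classes]
        models[target] = train_perceptron(points, binary_labels)
    return models


def predict(models, point):
    """Pick the class whose binary classifier gives the highest score."""
    def score(model):
        weights, bias = model
        return sum(w * x for w, x in zip(weights, point)) + bias
    return max(models, key=lambda name: score(models[name]))


# Toy data (hypothetical): three well-separated groups of 2-D points.
points = [(0, 0), (1, 0), (0, 1),
          (5, 0), (6, 0), (5, 1),
          (0, 5), (1, 5), (0, 6)]
classes = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]

models = train_one_vs_all(points, classes)
# (5.5, 0.5) lies on the edge of the triangle spanned by the "B" points.
print(predict(models, (5.5, 0.5)))
```

Swapping `train_perceptron` for any other two-class learner that returns a score changes the accuracy and training cost of the whole multiclass model, which is the dependency the entry refers to.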
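In practice, first identify the problem category (value prediction, classification, or clustering), then try each algorithm in that category in turn and compare the results before settling on one.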
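The "try each candidate, then decide" workflow can be sketched as a small comparison loop. The example below is only an illustration under stated assumptions: the two candidate models (a mean baseline and a one-feature least-squares line), the synthetic data, and all function names are hypothetical stand-ins rather than the Azure modules listed above.

```python
import random
import statistics


def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_squared_error(model, xs, ys):
    return statistics.fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


def compare(candidates, train, test):
    """Fit every candidate on train, score it on test (lower is better)."""
    (train_x, train_y), (test_x, test_y) = train, test
    scores = {}
    for name, fit in candidates.items():
        model = fit(train_x, train_y)
        scores[name] = mean_squared_error(model, test_x, test_y)
    return scores


# Synthetic data (assumption): y = 3x + 2 plus a little Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [3 * x + 2 + rng.gauss(0, 0.5) for x in xs]

# Hold out every fifth point for evaluation; train on the rest.
train = ([x for i, x in enumerate(xs) if i % 5 != 0],
         [y for i, y in enumerate(ys) if i % 5 != 0])
test = ([x for i, x in enumerate(xs) if i % 5 == 0],
        [y for i, y in enumerate(ys) if i % 5 == 0])

candidates = {"mean baseline": fit_mean, "linear regression": fit_line}
scores = compare(candidates, train, test)
best = min(scores, key=scores.get)
print(scores, "->", best)
```

Scoring on held-out data rather than on the training set keeps the comparison fair to simpler models; with real data, the same loop would take the actual candidates from the relevant category and a metric suited to the problem.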
Reference: https://azure.microsoft.com/en-us/documentation/articles/machine-learning-algorithm-cheat-sheet/