Which machine learning algorithm should I use? - Subconscious Musings

It depends on what one wants to do. Look for patterns? That is unsupervised learning. Predict some outputs from some inputs? That is supervised learning.

Dimension reduction can be a good first step. If most of the variation lies along a few directions, one can project the data onto those directions and get a smaller dataset to feed to other methods.
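To illustrate the projection idea, here is a minimal sketch of principal component analysis in Python with NumPy. The article names no language or library, so both are my assumptions. It finds the directions of largest variance with a singular value decomposition and keeps only the top k. The helper name `project_onto_top_directions` and the synthetic data are made up for illustration.

```python
import numpy as np

def project_onto_top_directions(X, k):
    """Project the rows of X onto the k directions of largest variance (PCA via SVD)."""
    X_centered = X - X.mean(axis=0)  # PCA works on mean-centered data
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # fraction of total variance along each direction
    return X_centered @ Vt[:k].T, explained

# Synthetic data: 200 points in 5 dimensions, with nearly all variation along one direction
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = t @ np.array([[1.0, 2.0, -1.0, 0.5, 3.0]]) + 0.01 * rng.normal(size=(200, 5))

Z, explained = project_onto_top_directions(X, k=1)
print(Z.shape)       # (200, 1)
print(explained[0])  # close to 1: one direction carries almost all the variance
```

Here the 5-column dataset shrinks to a single column while losing very little of its variation. That is the "less bulky dataset" that other methods can then work on.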

Data can be either numeric or categorical. A categorical value is a member of some category. For example, a pet can be a dog, a cat, a rodent, a bird, a reptile, a fish, a tarantula, ...
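Many numeric methods cannot use a category such as "cat" directly. One common workaround, which the article does not discuss, is one-hot encoding: each category gets its own 0/1 column. Here is a minimal pure-Python sketch. The `one_hot` helper and the `PET_KINDS` list are hypothetical, built from the pet example above.

```python
# Hypothetical category list, taken from the pet example
PET_KINDS = ["dog", "cat", "rodent", "bird", "reptile", "fish", "tarantula"]

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector containing a single 1."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]

print(one_hot("cat", PET_KINDS))  # [0, 1, 0, 0, 0, 0, 0]
```

Methods such as k-modes, which appears in the clustering chart, work with categories directly and do not need this step.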

I'll write out that article's flowchart as pseudocode.

Dimension Reduction?

Yes:

. Unsupervised Learning: Dimension Reduction

No:

. (do nothing)

End

Have Responses?

Yes:

. Predicting Numeric?

. Yes:

. . Supervised Learning: Regression

. No:

. . Supervised Learning: Classification

. End

No:

. Unsupervised Learning: Clustering

End
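The same top-level flowchart can be written as a small Python function. This is a sketch that only returns the labels of the branches taken. The function and parameter names are hypothetical; they are mine, not the article's.

```python
def top_level_choice(want_dimension_reduction: bool, have_responses: bool,
                     predicting_numeric: bool = False) -> list[str]:
    """Walk the top-level flowchart and return the labels of the branches taken."""
    steps = []
    # Optional first step: reduce dimensions before anything else
    if want_dimension_reduction:
        steps.append("Unsupervised Learning: Dimension Reduction")
    # Responses (known outputs) mean supervised learning; without them, cluster
    if have_responses:
        if predicting_numeric:
            steps.append("Supervised Learning: Regression")
        else:
            steps.append("Supervised Learning: Classification")
    else:
        steps.append("Unsupervised Learning: Clustering")
    return steps

print(top_level_choice(want_dimension_reduction=False, have_responses=True,
                       predicting_numeric=True))  # ['Supervised Learning: Regression']
```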

Here are the subcategories.

Unsupervised Learning: Dimension Reduction:

Topic Modeling?

Yes:

. Probabilistic?

. Yes:

. . Latent Dirichlet Allocation

. No:

. . Singular Value Decomposition

. End

No:

. Principal Component Analysis

End
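In the same hypothetical style, with names of my own choosing, the dimension-reduction subchart becomes:

```python
def dimension_reduction_choice(topic_modeling: bool, probabilistic: bool = False) -> str:
    """Walk the dimension-reduction subchart and return the suggested method."""
    if topic_modeling:
        # Topic modeling: probabilistic model or plain matrix factorization
        return "Latent Dirichlet Allocation" if probabilistic else "Singular Value Decomposition"
    return "Principal Component Analysis"

print(dimension_reduction_choice(topic_modeling=False))  # Principal Component Analysis
```

For scikit-learn users, these three methods correspond to `LatentDirichletAllocation`, `TruncatedSVD`, and `PCA` in `sklearn.decomposition`.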

Unsupervised Learning: Clustering:

Hierarchical?

Yes:

. Hierarchical

No:

. Need to Specify k?

. Yes:

. . Categorical Variables?

. . Yes:

. . . k-modes

. . No:

. . . Prefer Probability?

. . . Yes:

. . . . Gaussian Mixture Model

. . . No:

. . . . k-means

. . End

. No:

. . DBSCAN

. End

End
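Here is the clustering subchart as a hypothetical Python function. The parameter names are my own. "Need to specify k" is read as "willing to choose the number of clusters in advance".

```python
def clustering_choice(hierarchical: bool, need_to_specify_k: bool = True,
                      categorical_variables: bool = False,
                      prefer_probability: bool = False) -> str:
    """Walk the clustering subchart and return the suggested method."""
    if hierarchical:
        return "Hierarchical"
    if not need_to_specify_k:
        return "DBSCAN"  # density-based, so no cluster count is given up front
    if categorical_variables:
        return "k-modes"
    return "Gaussian Mixture Model" if prefer_probability else "k-means"

print(clustering_choice(hierarchical=False, need_to_specify_k=False))  # DBSCAN
```

In scikit-learn, the numeric methods here are `AgglomerativeClustering`, `KMeans`, and `DBSCAN` in `sklearn.cluster` and `GaussianMixture` in `sklearn.mixture`. k-modes is not part of scikit-learn.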

Supervised Learning: Regression:

Speed or Accuracy?

Speed:

. Decision Tree, Linear Regression

Accuracy:

. Random Forest, Neural Network, Gradient Boosting Tree

End
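One reason linear regression sits on the "Speed" branch is that an ordinary least-squares fit takes a single linear-algebra solve. Here is a minimal sketch with NumPy. The library choice and the helper name `fit_linear_regression` are my own assumptions.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least-squares fit of y ~ b0 + X @ b; returns (b0, b)."""
    X = np.asarray(X, dtype=float)
    # A leading column of ones lets the solver estimate the intercept b0
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return beta[0], beta[1:]

# Toy data lying exactly on y = 1 + 2x, so the fit should recover b0 = 1, b = [2]
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 1.0 + 2.0 * X[:, 0]
b0, b = fit_linear_regression(X, y)
print(b0, b)
```

The "Accuracy" methods (random forests, neural networks, gradient boosting) instead fit many parameters iteratively. This is why they are slower but can capture nonlinear relationships.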

Supervised Learning: Classification:

Speed or Accuracy?

Speed:

. Explainable?

. Yes:

. . Decision Tree, Logistic Regression

. No:

. . Data Is Too Large?

. . Yes:

. . . Naive Bayes

. . No:

. . . Linear SVM, Naive Bayes

. End

Accuracy:

. Kernel SVM, Random Forest, Neural Network, Gradient Boosting Tree

End
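The classification subchart is the most deeply nested, so it may be easiest to read as code. As before, this is a hypothetical sketch with names of my own choosing. `priority` is assumed to be the string "speed" or "accuracy".

```python
def classification_choice(priority: str, explainable: bool = False,
                          data_too_large: bool = False) -> list[str]:
    """Walk the classification subchart and return the suggested methods."""
    if priority == "accuracy":
        return ["Kernel SVM", "Random Forest", "Neural Network", "Gradient Boosting Tree"]
    if priority != "speed":
        raise ValueError("priority must be 'speed' or 'accuracy'")
    if explainable:
        return ["Decision Tree", "Logistic Regression"]
    # Naive Bayes is the option that remains when the data is too large
    if data_too_large:
        return ["Naive Bayes"]
    return ["Linear SVM", "Naive Bayes"]

print(classification_choice("speed", explainable=True))  # ['Decision Tree', 'Logistic Regression']
```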

As one can see, this is a well-developed field, though it is being extended with "deep learning" algorithms for huge datasets, such as large collections of image files.