Cross-Validation on the Iris Dataset with Various Models: Gaussian Classifiers and SVM

1 Cross-Validation Data Design

The Iris dataset is composed of 150 samples of 4-dimensional vectors, each with one integer label. There are 3 distinct labels, with exactly 50 samples per label. We first want to perform 5-fold cross-validation as follows:

  1. For class 1, split the data into 5 folds: sample numbers 1-10, 11-20, 21-30, 31-40, and 41-50, named f11, f12, f13, f14, and f15, respectively.

  2. For class 2, its folds are f21, f22, f23, f24, and f25.

  3. For class 3, its folds are f31, f32, f33, f34, and f35.

  4. Create a training set R1 = {f11, . . ., f14, f21, . . ., f24, f31, . . ., f34} and a test set T1 = {f15, f25, f35} (a fold-construction sketch follows this list).

  5. Use R1 to train the 6 Gaussian classifiers of Section 2, and calculate the accuracy on T1.

  6. Repeat the above with R2-R5 and T2-T5 to obtain 5 accuracies.

  7. Find the average accuracies, and determine the best Gaussian classifier for the Iris dataset.
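As a point of reference, here is a minimal sketch of the per-class fold construction, assuming scikit-learn's standard Iris loader (whose 0/1/2 labels play the role of classes 1-3); the names `folds` and `split` are illustrative, not prescribed by the assignment.

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, labels 0, 1, 2

# Split each class's 50 samples into 5 consecutive folds of 10 samples each.
folds = {c: np.array_split(np.where(y == c)[0], 5) for c in np.unique(y)}

def split(k):
    """Fold k (0..4) of every class forms the test set; the rest is for training."""
    test = np.concatenate([folds[c][k] for c in folds])
    train = np.setdiff1d(np.arange(len(y)), test)
    return train, test

train_idx, test_idx = split(4)  # k = 4 gives R1/T1: f15, f25, f35 are held out
```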

2 Gaussian Classifier Design

There are 6 different types of unimodal Gaussian classifiers, distinguished by the covariance structure Σc of class c (a shared decision-rule sketch follows the list):

  1. Σc = σ²I (shared spherical covariance)

  2. Σc = Σ = diag(σ1², . . ., σm²) (shared diagonal covariance)

  3. Σc = Σ (shared full covariance)

  4. Σc = σc²I (class-dependent spherical covariance)

  5. Σc1 ≠ Σc2 (general case: class-dependent full covariance)

  6. Σc1 ≠ Σc2, Σc = diag(σc,1², . . ., σc,m²) (class-dependent diagonal covariance)
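Whatever the covariance structure, the resulting classifiers share one decision rule: assign a sample to the class whose Gaussian log-likelihood is largest (the class priors are equal here, 50 samples each, so they can be dropped). A minimal sketch, assuming the per-class parameters are stored in dictionaries `means` and `covs` (illustrative names):

```python
import numpy as np
from scipy.stats import multivariate_normal

def predict(X, means, covs):
    """Assign each row of X to the class with the highest Gaussian log-density."""
    classes = sorted(means)
    log_liks = np.column_stack([
        multivariate_normal.logpdf(X, mean=means[c], cov=covs[c])
        for c in classes
    ])
    return np.asarray(classes)[np.argmax(log_liks, axis=1)]
```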

For data given as a matrix of shape (dimension) × (number of samples) or (number of samples) × (dimension), depending on your implementation, write 6 methods (functions) that estimate the mean and covariance matrix for each of the above 6 techniques.
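One possible shape for those 6 methods is a single fitting routine switched by case. A minimal sketch, assuming a (number of samples) × (dimension) layout, maximum-likelihood (biased) covariance estimates, and the spherical variance taken as the average per-dimension variance; the function name `fit_gaussians` is illustrative.

```python
import numpy as np

def fit_gaussians(X, y, case):
    """Estimate per-class means and covariances for one of the 6 cases above."""
    classes = np.unique(y)
    n, m = X.shape
    means = {c: X[y == c].mean(axis=0) for c in classes}
    # Pooled (shared) covariance: sample-weighted average of class scatters.
    pooled = sum((y == c).sum() * np.cov(X[y == c], rowvar=False, bias=True)
                 for c in classes) / n
    covs = {}
    for c in classes:
        Sc = np.cov(X[y == c], rowvar=False, bias=True)  # class-c covariance
        covs[c] = {1: np.trace(pooled) / m * np.eye(m),  # shared sigma^2 I
                   2: np.diag(np.diag(pooled)),          # shared diagonal
                   3: pooled,                            # shared full
                   4: np.trace(Sc) / m * np.eye(m),      # class sigma_c^2 I
                   5: Sc,                                # class full (general)
                   6: np.diag(np.diag(Sc))}[case]        # class diagonal
    return means, covs
```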

Then:

  1. Perform 5-fold cross-validation experiments for all 6 methods.

  2. Evaluate the average performance.

  3. Determine which of the 6 methods is the best.

You may use Python with NumPy, scikit-learn, etc.
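For cross-checking your own implementation, scikit-learn ships classifiers that arguably match three of the cases: LinearDiscriminantAnalysis (case 3, shared full covariance), QuadraticDiscriminantAnalysis (case 5, general case), and GaussianNB (case 6, class-dependent diagonal); this correspondence is an assumption to verify, not part of the assignment. A self-contained sketch running the 5-fold scheme of Section 1 (load_iris returns the samples grouped by class, so the folds can be built by index arithmetic):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)  # samples 0-49, 50-99, 100-149 by class
models = {"case 3 (shared full, LDA)": LinearDiscriminantAnalysis(),
          "case 5 (general, QDA)": QuadraticDiscriminantAnalysis(),
          "case 6 (class diagonal, NB)": GaussianNB()}

for name, model in models.items():
    accs = []
    for k in range(5):  # hold out one fold of every class per round
        test = np.concatenate([np.arange(50 * c + 10 * k, 50 * c + 10 * (k + 1))
                               for c in range(3)])
        train = np.setdiff1d(np.arange(150), test)
        model.fit(X[train], y[train])
        accs.append((model.predict(X[test]) == y[test]).mean())
    print(f"{name}: mean accuracy {np.mean(accs):.3f}")
```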

3 Support Vector Machines

The hyperparameters of an SVM are C (the non-separability penalty) and the kernel-specific parameters. Use 5-fold cross-validation to:

  1. Determine the best C and degree (order) of the polynomial kernel function.

  2. Determine the best C and standard deviation of the Gaussian kernel function (also called the parameter of the RBF kernel).

This hyperparameter selection process is called grid search because the combinations of discrete parameter choices constitute a grid in a multidimensional space, and we examine every grid point to find the optimal hyperparameter set.

The critical design issue is how to discretize the continuous parameter space: for example, a usual choice is C ∈ {1, 10, 100, . . .} (see the sketch below).
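A minimal grid-search sketch using scikit-learn's GridSearchCV with its built-in 5-fold CV (note that its stratified folds are not exactly the per-class fold scheme of Section 1); the value grids below are illustrative assumptions, not prescribed by the assignment. Also note that scikit-learn parameterizes the RBF width as gamma = 1/(2σ²) rather than the standard deviation σ itself.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = [
    {"kernel": ["poly"], "C": [1, 10, 100], "degree": [2, 3, 4]},
    # gamma plays the role of the RBF width: gamma = 1 / (2 * sigma^2)
    {"kernel": ["rbf"], "C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]},
]
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```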