The Iris dataset was used in Fisher’s classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems. It includes three iris species with 50 samples each, as well as some properties of each flower. In this blog, I will use the caret package from R to predict the species class of various Iris flowers. Let’s jump into the code.

Calling all the necessary packages in R

#Calling libraries
# Type 'citation("pROC")' for a citation.
# The following objects are masked from 'package:stats':

The Iris dataframe is already included in R, and we can load it using the data() command. A quick summary() and head() also give us a nice introduction to the dataset. We can also set a seed value to get reproducible results.

# Sepal.Length Sepal.Width Petal.Length Petal.Width
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species

The following plot command from the caret package shows how the Species values are distributed in a pairwise plot. An observation from this plot is that Versicolor and Virginica have similar patterns, while Setosa is quite distinct. The intuition is that we can predict Setosa easily and might have some trouble separating Versicolor from Virginica.

Now we split the dataset into a train and a test partition. We use 80% of the dataset for training and the remaining 20% for testing.

We also define a 10-fold cross-validation method, repeated 5 times. With this setup the model is trained and evaluated several times on different subsets of the training data. This gives a more reliable estimate of how the model will perform on an unknown or new dataset and helps guard against over-fitting to the training set.

# 3 classes: 'setosa', 'versicolor', 'virginica'
# Pre-processing: centered (4), scaled (4)
# Resampling: Cross-Validated (10 fold, repeated 5 times)
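The split, resampling and pre-processing steps described above can be sketched roughly as follows. This is a minimal illustration and not the exact code behind the output shown: the seed value, the variable names (train_idx, train_set, test_set, ctrl, fit) and the choice of k-nearest neighbours as the model are all assumptions, since none of them are stated in the post.

```r
# Minimal sketch of the workflow described above, using caret.
library(caret)

data(iris)
set.seed(42)  # assumed seed; any fixed value makes the run reproducible

# Stratified 80/20 split on the outcome variable
train_idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

# 10-fold cross-validation, repeated 5 times
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)

# Assumption: k-nearest neighbours stands in for the unnamed model.
# The four numeric predictors are centered and scaled before fitting.
fit <- train(Species ~ ., data = train_set, method = "knn",
             preProcess = c("center", "scale"), trControl = ctrl)

# Evaluate on the held-out 20%
preds <- predict(fit, newdata = test_set)
test_accuracy <- mean(preds == test_set$Species)
```

Printing fit should show the same kind of summary as the output above (number of classes, pre-processing and resampling scheme), and test_accuracy gives the share of correctly classified flowers in the test partition.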