People often quantify how much exercise they do, but few studies examine how well they do it. This dataset was collected to study how body parts move during dumbbell lifts performed both correctly and incorrectly. Further information about the data can be found in the “Weight Lifting Exercises Dataset” section of the data source page below.
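# Drop the first column (a row index). stringsAsFactors = TRUE keeps classe a factor
# under R >= 4.0, where the read.csv default changed to FALSE.
training <- read.csv("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv", stringsAsFactors = TRUE)[,-1]
testing <- read.csv("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv", stringsAsFactors = TRUE)[,-1]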
str(training)
'data.frame': 19622 obs. of 159 variables:
$ user_name : Factor w/ 6 levels "adelmo","carlitos",..: 2 2 2 2 2 2 2 2 2 2 ...
$ raw_timestamp_part_1 : int 1323084231 1323084231 1323084231 1323084232 1323084232 1323084232 1323084232 1323084232 1323084232 1323084232 ...
$ raw_timestamp_part_2 : int 788290 808298 820366 120339 196328 304277 368296 440390 484323 484434 ...
$ cvtd_timestamp : Factor w/ 20 levels "02/12/2011 13:32",..: 9 9 9 9 9 9 9 9 9 9 ...
$ new_window : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
$ num_window : int 11 11 11 12 12 12 12 12 12 12 ...
$ roll_belt : num 1.41 1.41 1.42 1.48 1.48 1.45 1.42 1.42 1.43 1.45 ...
$ pitch_belt : num 8.07 8.07 8.07 8.05 8.07 8.06 8.09 8.13 8.16 8.17 ...
$ yaw_belt : num -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 ...
$ total_accel_belt : int 3 3 3 3 3 3 3 3 3 3 ...
$ kurtosis_roll_belt : Factor w/ 397 levels "","-0.016850",..: 1 1 1 1 1 1 1 1 1 1 ...
$ kurtosis_picth_belt : Factor w/ 317 levels "","-0.021887",..: 1 1 1 1 1 1 1 1 1 1 ...
$ kurtosis_yaw_belt : Factor w/ 2 levels "","#DIV/0!": 1 1 1 1 1 1 1 1 1 1 ...
$ skewness_roll_belt : Factor w/ 395 levels "","-0.003095",..: 1 1 1 1 1 1 1 1 1 1 ...
$ skewness_roll_belt.1 : Factor w/ 338 levels "","-0.005928",..: 1 1 1 1 1 1 1 1 1 1 ...
$ skewness_yaw_belt : Factor w/ 2 levels "","#DIV/0!": 1 1 1 1 1 1 1 1 1 1 ...
$ max_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
$ max_picth_belt : int NA NA NA NA NA NA NA NA NA NA ...
$ max_yaw_belt : Factor w/ 68 levels "","-0.1","-0.2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ min_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
$ min_pitch_belt : int NA NA NA NA NA NA NA NA NA NA ...
$ min_yaw_belt : Factor w/ 68 levels "","-0.1","-0.2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ amplitude_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
$ amplitude_pitch_belt : int NA NA NA NA NA NA NA NA NA NA ...
$ amplitude_yaw_belt : Factor w/ 4 levels "","#DIV/0!","0.00",..: 1 1 1 1 1 1 1 1 1 1 ...
$ var_total_accel_belt : num NA NA NA NA NA NA NA NA NA NA ...
$ avg_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
$ stddev_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
$ var_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
$ avg_pitch_belt : num NA NA NA NA NA NA NA NA NA NA ...
$ stddev_pitch_belt : num NA NA NA NA NA NA NA NA NA NA ...
$ var_pitch_belt : num NA NA NA NA NA NA NA NA NA NA ...
$ avg_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
$ stddev_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
$ var_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
$ gyros_belt_x : num 0 0.02 0 0.02 0.02 0.02 0.02 0.02 0.02 0.03 ...
$ gyros_belt_y : num 0 0 0 0 0.02 0 0 0 0 0 ...
$ gyros_belt_z : num -0.02 -0.02 -0.02 -0.03 -0.02 -0.02 -0.02 -0.02 -0.02 0 ...
$ accel_belt_x : int -21 -22 -20 -22 -21 -21 -22 -22 -20 -21 ...
$ accel_belt_y : int 4 4 5 3 2 4 3 4 2 4 ...
$ accel_belt_z : int 22 22 23 21 24 21 21 21 24 22 ...
$ magnet_belt_x : int -3 -7 -2 -6 -6 0 -4 -2 1 -3 ...
$ magnet_belt_y : int 599 608 600 604 600 603 599 603 602 609 ...
$ magnet_belt_z : int -313 -311 -305 -310 -302 -312 -311 -313 -312 -308 ...
$ roll_arm : num -128 -128 -128 -128 -128 -128 -128 -128 -128 -128 ...
$ pitch_arm : num 22.5 22.5 22.5 22.1 22.1 22 21.9 21.8 21.7 21.6 ...
$ yaw_arm : num -161 -161 -161 -161 -161 -161 -161 -161 -161 -161 ...
$ total_accel_arm : int 34 34 34 34 34 34 34 34 34 34 ...
$ var_accel_arm : num NA NA NA NA NA NA NA NA NA NA ...
$ avg_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
$ stddev_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
$ var_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
$ avg_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
$ stddev_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
$ var_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
$ avg_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
$ stddev_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
$ var_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
$ gyros_arm_x : num 0 0.02 0.02 0.02 0 0.02 0 0.02 0.02 0.02 ...
$ gyros_arm_y : num 0 -0.02 -0.02 -0.03 -0.03 -0.03 -0.03 -0.02 -0.03 -0.03 ...
$ gyros_arm_z : num -0.02 -0.02 -0.02 0.02 0 0 0 0 -0.02 -0.02 ...
$ accel_arm_x : int -288 -290 -289 -289 -289 -289 -289 -289 -288 -288 ...
$ accel_arm_y : int 109 110 110 111 111 111 111 111 109 110 ...
$ accel_arm_z : int -123 -125 -126 -123 -123 -122 -125 -124 -122 -124 ...
$ magnet_arm_x : int -368 -369 -368 -372 -374 -369 -373 -372 -369 -376 ...
$ magnet_arm_y : int 337 337 344 344 337 342 336 338 341 334 ...
$ magnet_arm_z : int 516 513 513 512 506 513 509 510 518 516 ...
$ kurtosis_roll_arm : Factor w/ 330 levels "","-0.02438",..: 1 1 1 1 1 1 1 1 1 1 ...
$ kurtosis_picth_arm : Factor w/ 328 levels "","-0.00484",..: 1 1 1 1 1 1 1 1 1 1 ...
$ kurtosis_yaw_arm : Factor w/ 395 levels "","-0.01548",..: 1 1 1 1 1 1 1 1 1 1 ...
$ skewness_roll_arm : Factor w/ 331 levels "","-0.00051",..: 1 1 1 1 1 1 1 1 1 1 ...
$ skewness_pitch_arm : Factor w/ 328 levels "","-0.00184",..: 1 1 1 1 1 1 1 1 1 1 ...
$ skewness_yaw_arm : Factor w/ 395 levels "","-0.00311",..: 1 1 1 1 1 1 1 1 1 1 ...
$ max_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
$ max_picth_arm : num NA NA NA NA NA NA NA NA NA NA ...
$ max_yaw_arm : int NA NA NA NA NA NA NA NA NA NA ...
$ min_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
$ min_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
$ min_yaw_arm : int NA NA NA NA NA NA NA NA NA NA ...
$ amplitude_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
$ amplitude_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
$ amplitude_yaw_arm : int NA NA NA NA NA NA NA NA NA NA ...
$ roll_dumbbell : num 13.1 13.1 12.9 13.4 13.4 ...
$ pitch_dumbbell : num -70.5 -70.6 -70.3 -70.4 -70.4 ...
$ yaw_dumbbell : num -84.9 -84.7 -85.1 -84.9 -84.9 ...
$ kurtosis_roll_dumbbell : Factor w/ 398 levels "","-0.0035","-0.0073",..: 1 1 1 1 1 1 1 1 1 1 ...
$ kurtosis_picth_dumbbell : Factor w/ 401 levels "","-0.0163","-0.0233",..: 1 1 1 1 1 1 1 1 1 1 ...
$ kurtosis_yaw_dumbbell : Factor w/ 2 levels "","#DIV/0!": 1 1 1 1 1 1 1 1 1 1 ...
$ skewness_roll_dumbbell : Factor w/ 401 levels "","-0.0082","-0.0096",..: 1 1 1 1 1 1 1 1 1 1 ...
$ skewness_pitch_dumbbell : Factor w/ 402 levels "","-0.0053","-0.0084",..: 1 1 1 1 1 1 1 1 1 1 ...
$ skewness_yaw_dumbbell : Factor w/ 2 levels "","#DIV/0!": 1 1 1 1 1 1 1 1 1 1 ...
$ max_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
$ max_picth_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
$ max_yaw_dumbbell : Factor w/ 73 levels "","-0.1","-0.2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ min_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
$ min_pitch_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
$ min_yaw_dumbbell : Factor w/ 73 levels "","-0.1","-0.2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ amplitude_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
$ amplitude_pitch_dumbbell: num NA NA NA NA NA NA NA NA NA NA ...
[list output truncated]
str(testing)
'data.frame': 20 obs. of 159 variables:
$ user_name : Factor w/ 6 levels "adelmo","carlitos",..: 6 5 5 1 4 5 5 5 2 3 ...
$ raw_timestamp_part_1 : int 1323095002 1322673067 1322673075 1322832789 1322489635 1322673149 1322673128 1322673076 1323084240 1322837822 ...
$ raw_timestamp_part_2 : int 868349 778725 342967 560311 814776 510661 766645 54671 916313 384285 ...
$ cvtd_timestamp : Factor w/ 11 levels "02/12/2011 13:33",..: 5 10 10 1 6 11 11 10 3 2 ...
$ new_window : Factor w/ 1 level "no": 1 1 1 1 1 1 1 1 1 1 ...
$ num_window : int 74 431 439 194 235 504 485 440 323 664 ...
$ roll_belt : num 123 1.02 0.87 125 1.35 -5.92 1.2 0.43 0.93 114 ...
$ pitch_belt : num 27 4.87 1.82 -41.6 3.33 1.59 4.44 4.15 6.72 22.4 ...
$ yaw_belt : num -4.75 -88.9 -88.5 162 -88.6 -87.7 -87.3 -88.5 -93.7 -13.1 ...
$ total_accel_belt : int 20 4 5 17 3 4 4 4 4 18 ...
$ kurtosis_roll_belt : logi NA NA NA NA NA NA ...
$ kurtosis_picth_belt : logi NA NA NA NA NA NA ...
$ kurtosis_yaw_belt : logi NA NA NA NA NA NA ...
$ skewness_roll_belt : logi NA NA NA NA NA NA ...
$ skewness_roll_belt.1 : logi NA NA NA NA NA NA ...
$ skewness_yaw_belt : logi NA NA NA NA NA NA ...
$ max_roll_belt : logi NA NA NA NA NA NA ...
$ max_picth_belt : logi NA NA NA NA NA NA ...
$ max_yaw_belt : logi NA NA NA NA NA NA ...
$ min_roll_belt : logi NA NA NA NA NA NA ...
$ min_pitch_belt : logi NA NA NA NA NA NA ...
$ min_yaw_belt : logi NA NA NA NA NA NA ...
$ amplitude_roll_belt : logi NA NA NA NA NA NA ...
$ amplitude_pitch_belt : logi NA NA NA NA NA NA ...
$ amplitude_yaw_belt : logi NA NA NA NA NA NA ...
$ var_total_accel_belt : logi NA NA NA NA NA NA ...
$ avg_roll_belt : logi NA NA NA NA NA NA ...
$ stddev_roll_belt : logi NA NA NA NA NA NA ...
$ var_roll_belt : logi NA NA NA NA NA NA ...
$ avg_pitch_belt : logi NA NA NA NA NA NA ...
$ stddev_pitch_belt : logi NA NA NA NA NA NA ...
$ var_pitch_belt : logi NA NA NA NA NA NA ...
$ avg_yaw_belt : logi NA NA NA NA NA NA ...
$ stddev_yaw_belt : logi NA NA NA NA NA NA ...
$ var_yaw_belt : logi NA NA NA NA NA NA ...
$ gyros_belt_x : num -0.5 -0.06 0.05 0.11 0.03 0.1 -0.06 -0.18 0.1 0.14 ...
$ gyros_belt_y : num -0.02 -0.02 0.02 0.11 0.02 0.05 0 -0.02 0 0.11 ...
$ gyros_belt_z : num -0.46 -0.07 0.03 -0.16 0 -0.13 0 -0.03 -0.02 -0.16 ...
$ accel_belt_x : int -38 -13 1 46 -8 -11 -14 -10 -15 -25 ...
$ accel_belt_y : int 69 11 -1 45 4 -16 2 -2 1 63 ...
$ accel_belt_z : int -179 39 49 -156 27 38 35 42 32 -158 ...
$ magnet_belt_x : int -13 43 29 169 33 31 50 39 -6 10 ...
$ magnet_belt_y : int 581 636 631 608 566 638 622 635 600 601 ...
$ magnet_belt_z : int -382 -309 -312 -304 -418 -291 -315 -305 -302 -330 ...
$ roll_arm : num 40.7 0 0 -109 76.1 0 0 0 -137 -82.4 ...
$ pitch_arm : num -27.8 0 0 55 2.76 0 0 0 11.2 -63.8 ...
$ yaw_arm : num 178 0 0 -142 102 0 0 0 -167 -75.3 ...
$ total_accel_arm : int 10 38 44 25 29 14 15 22 34 32 ...
$ var_accel_arm : logi NA NA NA NA NA NA ...
$ avg_roll_arm : logi NA NA NA NA NA NA ...
$ stddev_roll_arm : logi NA NA NA NA NA NA ...
$ var_roll_arm : logi NA NA NA NA NA NA ...
$ avg_pitch_arm : logi NA NA NA NA NA NA ...
$ stddev_pitch_arm : logi NA NA NA NA NA NA ...
$ var_pitch_arm : logi NA NA NA NA NA NA ...
$ avg_yaw_arm : logi NA NA NA NA NA NA ...
$ stddev_yaw_arm : logi NA NA NA NA NA NA ...
$ var_yaw_arm : logi NA NA NA NA NA NA ...
$ gyros_arm_x : num -1.65 -1.17 2.1 0.22 -1.96 0.02 2.36 -3.71 0.03 0.26 ...
$ gyros_arm_y : num 0.48 0.85 -1.36 -0.51 0.79 0.05 -1.01 1.85 -0.02 -0.5 ...
$ gyros_arm_z : num -0.18 -0.43 1.13 0.92 -0.54 -0.07 0.89 -0.69 -0.02 0.79 ...
$ accel_arm_x : int 16 -290 -341 -238 -197 -26 99 -98 -287 -301 ...
$ accel_arm_y : int 38 215 245 -57 200 130 79 175 111 -42 ...
$ accel_arm_z : int 93 -90 -87 6 -30 -19 -67 -78 -122 -80 ...
$ magnet_arm_x : int -326 -325 -264 -173 -170 396 702 535 -367 -420 ...
$ magnet_arm_y : int 385 447 474 257 275 176 15 215 335 294 ...
$ magnet_arm_z : int 481 434 413 633 617 516 217 385 520 493 ...
$ kurtosis_roll_arm : logi NA NA NA NA NA NA ...
$ kurtosis_picth_arm : logi NA NA NA NA NA NA ...
$ kurtosis_yaw_arm : logi NA NA NA NA NA NA ...
$ skewness_roll_arm : logi NA NA NA NA NA NA ...
$ skewness_pitch_arm : logi NA NA NA NA NA NA ...
$ skewness_yaw_arm : logi NA NA NA NA NA NA ...
$ max_roll_arm : logi NA NA NA NA NA NA ...
$ max_picth_arm : logi NA NA NA NA NA NA ...
$ max_yaw_arm : logi NA NA NA NA NA NA ...
$ min_roll_arm : logi NA NA NA NA NA NA ...
$ min_pitch_arm : logi NA NA NA NA NA NA ...
$ min_yaw_arm : logi NA NA NA NA NA NA ...
$ amplitude_roll_arm : logi NA NA NA NA NA NA ...
$ amplitude_pitch_arm : logi NA NA NA NA NA NA ...
$ amplitude_yaw_arm : logi NA NA NA NA NA NA ...
$ roll_dumbbell : num -17.7 54.5 57.1 43.1 -101.4 ...
$ pitch_dumbbell : num 25 -53.7 -51.4 -30 -53.4 ...
$ yaw_dumbbell : num 126.2 -75.5 -75.2 -103.3 -14.2 ...
$ kurtosis_roll_dumbbell : logi NA NA NA NA NA NA ...
$ kurtosis_picth_dumbbell : logi NA NA NA NA NA NA ...
$ kurtosis_yaw_dumbbell : logi NA NA NA NA NA NA ...
$ skewness_roll_dumbbell : logi NA NA NA NA NA NA ...
$ skewness_pitch_dumbbell : logi NA NA NA NA NA NA ...
$ skewness_yaw_dumbbell : logi NA NA NA NA NA NA ...
$ max_roll_dumbbell : logi NA NA NA NA NA NA ...
$ max_picth_dumbbell : logi NA NA NA NA NA NA ...
$ max_yaw_dumbbell : logi NA NA NA NA NA NA ...
$ min_roll_dumbbell : logi NA NA NA NA NA NA ...
$ min_pitch_dumbbell : logi NA NA NA NA NA NA ...
$ min_yaw_dumbbell : logi NA NA NA NA NA NA ...
$ amplitude_roll_dumbbell : logi NA NA NA NA NA NA ...
$ amplitude_pitch_dumbbell: logi NA NA NA NA NA NA ...
[list output truncated]
The “classe” variable in the training set labels five ways of performing dumbbell lifts: class A is the correct execution, and the other four classes (B to E) are common mistakes.
I selected the pitch, roll, and yaw angles of the arm, forearm, dumbbell, and belt sensors as predictors of the five classes, because I expect the way the body parts move to determine whether the exercise is done correctly.
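The twelve predictors follow a regular naming pattern, so the model formula can be built programmatically instead of being typed out each time. The following is a minimal sketch, assuming the column names are exactly `<angle>_<sensor>`; the names `angles`, `sensors`, `features`, and `model_formula` are introduced here for illustration only.

```r
# Angles and sensor locations used as predictors (illustrative names)
angles  <- c("pitch", "roll", "yaw")
sensors <- c("arm", "forearm", "dumbbell", "belt")

# All angle/sensor combinations, e.g. "pitch_arm", "roll_arm", "yaw_arm", ...
features <- as.vector(outer(angles, sensors, paste, sep = "_"))

# Build classe ~ pitch_arm + roll_arm + ... from the character vector
model_formula <- reformulate(features, response = "classe")

print(features)
print(model_formula)
```

A formula built this way could be passed to svm(), rpart(), or randomForest() in place of the hand-written one, and `features` could be used to subset the data frames.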
library(e1071)
# Create SVM
# Method: C-classification with a radial kernel
# Parameters (default values): cost = 1 and gamma = 1 / (data dimension)
svm_model <- svm(classe ~ pitch_arm + pitch_forearm + pitch_dumbbell + pitch_belt +
roll_arm + roll_forearm + roll_dumbbell + roll_belt +
yaw_arm + yaw_forearm + yaw_dumbbell + yaw_belt, data = training,
scale = FALSE, cross = 10)
summary(svm_model)
Call:
svm(formula = classe ~ pitch_arm + pitch_forearm + pitch_dumbbell +
pitch_belt + roll_arm + roll_forearm + roll_dumbbell + roll_belt +
yaw_arm + yaw_forearm + yaw_dumbbell + yaw_belt, data = training,
cross = 10, scale = FALSE)
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 1
gamma: 0.08333333
Number of Support Vectors: 18403
( 4713 3772 3327 3066 3525 )
Number of Classes: 5
Levels:
A B C D E
10-fold cross-validation on training data:
Total Accuracy: 44.83233
Single Accuracies:
44.44444 44.69929 44.54638 43.42508 45.38971 47.19674 44.85219 44.64832 43.52701 45.59348
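# predict(svm_model, testing) returns factor(0), most likely because the default
# na.action (na.omit) drops every test row: each row has NAs in the unused summary columns.
# Passing only the 12 predictor columns avoids this.
svm_features <- c("pitch_arm", "pitch_forearm", "pitch_dumbbell", "pitch_belt",
                  "roll_arm", "roll_forearm", "roll_dumbbell", "roll_belt",
                  "yaw_arm", "yaw_forearm", "yaw_dumbbell", "yaw_belt")
predict(svm_model, testing[, svm_features])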
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
A A B A A A D B A A A C A A A A A A A A
Levels: A B C D E
library(rpart)
library(rattle)
# Create a decision tree
rpart_model <- rpart(classe ~
pitch_arm + pitch_forearm + pitch_dumbbell + pitch_belt +
roll_arm + roll_forearm + roll_dumbbell + roll_belt +
yaw_arm + yaw_forearm + yaw_dumbbell + yaw_belt,
data = training)
fancyRpartPlot(rpart_model, sub = NULL)
predict(rpart_model, testing)
A B C D E
1 0.005221932 0.310704961 0.537859008 0.11749347 0.02872063
2 0.823299453 0.168100078 0.008600469 0.00000000 0.00000000
3 0.006128703 0.183861083 0.543411645 0.08478039 0.18181818
4 0.977477477 0.022522523 0.000000000 0.00000000 0.00000000
5 0.115303983 0.243186583 0.549266247 0.04612159 0.04612159
6 0.000000000 0.016453382 0.000000000 0.26325411 0.72029250
7 0.061690315 0.137569402 0.142504627 0.49722394 0.16101172
8 0.115303983 0.243186583 0.549266247 0.04612159 0.04612159
9 0.993662864 0.006337136 0.000000000 0.00000000 0.00000000
10 0.823299453 0.168100078 0.008600469 0.00000000 0.00000000
11 0.044905009 0.687392055 0.105354059 0.04663212 0.11571675
12 0.006128703 0.183861083 0.543411645 0.08478039 0.18181818
13 0.187797147 0.152931854 0.577654517 0.03565769 0.04595880
14 0.993662864 0.006337136 0.000000000 0.00000000 0.00000000
15 0.061690315 0.137569402 0.142504627 0.49722394 0.16101172
16 0.091054313 0.210862620 0.017571885 0.15814696 0.52236422
17 0.867346939 0.065051020 0.005102041 0.02933673 0.03316327
18 0.104786546 0.160413972 0.078913325 0.54333765 0.11254851
19 0.104786546 0.160413972 0.078913325 0.54333765 0.11254851
20 0.044905009 0.687392055 0.105354059 0.04663212 0.11571675
library(randomForest)
# Create random forests
# Parameters (default values): mtry = sqrt(p)
rf_model <- randomForest(classe ~
pitch_arm + pitch_forearm + pitch_dumbbell + pitch_belt +
roll_arm + roll_forearm + roll_dumbbell + roll_belt +
yaw_arm + yaw_forearm + yaw_dumbbell + yaw_belt,
data = training)
rf_model
Call:
randomForest(formula = classe ~ pitch_arm + pitch_forearm + pitch_dumbbell + pitch_belt + roll_arm + roll_forearm + roll_dumbbell + roll_belt + yaw_arm + yaw_forearm + yaw_dumbbell + yaw_belt, data = training)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 3
OOB estimate of error rate: 0.84%
Confusion matrix:
A B C D E class.error
A 5568 8 0 3 1 0.002150538
B 17 3732 44 3 1 0.017118778
C 0 19 3378 23 2 0.012857978
D 2 4 13 3195 2 0.006529851
E 0 5 12 5 3585 0.006099251
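As a sanity check, the OOB error rate and the per-class errors can be recomputed from the confusion matrix printed above. This is a small sketch with the counts typed in by hand; `conf`, `oob_error`, and `class_error` are illustrative names, not objects created by randomForest.

```r
# Confusion matrix copied from the rf_model printout (rows: observed, columns: predicted)
conf <- matrix(c(5568,    8,    0,    3,    1,
                   17, 3732,   44,    3,    1,
                    0,   19, 3378,   23,    2,
                    2,    4,   13, 3195,    2,
                    0,    5,   12,    5, 3585),
               nrow = 5, byrow = TRUE,
               dimnames = list(observed = LETTERS[1:5], predicted = LETTERS[1:5]))

# Overall OOB error: share of observations off the diagonal
oob_error <- 1 - sum(diag(conf)) / sum(conf)

# Per-class error: share of each observed class that was misclassified
class_error <- 1 - diag(conf) / rowSums(conf)

print(round(100 * oob_error, 2))
print(round(class_error, 4))
```

Only 164 of the 19622 training observations fall off the diagonal, which corresponds to the reported 0.84%.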
predict(rf_model, testing)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
B A B A A E D B A A B C B A E E A B B B
Levels: A B C D E
# Cross-validation for feature selection
rf_features <- c("pitch_arm", "pitch_forearm", "pitch_dumbbell", "pitch_belt",
                 "roll_arm", "roll_forearm", "roll_dumbbell", "roll_belt",
                 "yaw_arm", "yaw_forearm", "yaw_dumbbell", "yaw_belt")
rf_cv <- rfcv(trainx = training[, rf_features], trainy = training$classe, cv.fold = 10)
rf_cv$error.cv
12 6 3 1
0.008867598 0.026246050 0.110029559 0.440424014
The svm function in the e1071 package has a cross argument for k-fold cross-validation of an SVM model. With 10-fold cross-validation the accuracy is about 45%, which is clearly better than guessing uniformly at random among five classes (1/5 = 20%). Even so, the expected error rate is about 55%, so more than half of the predictions would be incorrect.
In principle, the best SVM parameters can be found with tune. However, the following code was too expensive for my laptop: the search had not finished after ten hours. I expect the accuracy of the SVM would improve with well-chosen “gamma” and “cost” values.
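To put the 45% figure in context, it helps to compare it with two naive baselines: guessing uniformly among the five classes, and always predicting the most frequent class. The sketch below uses class counts recovered from the row sums of the random forest confusion matrix printed earlier; the variable names are illustrative.

```r
# Class counts in the training set (row sums of the confusion matrix)
class_counts <- c(A = 5580, B = 3797, C = 3422, D = 3216, E = 3607)

uniform_guess  <- 1 / length(class_counts)               # pick a class at random
majority_guess <- max(class_counts) / sum(class_counts)  # always predict class A
svm_cv_acc     <- 44.83233 / 100                         # from summary(svm_model)

print(round(c(uniform = uniform_guess,
              majority = majority_guess,
              svm = svm_cv_acc), 3))
```

The majority-class baseline is about 28%, so the SVM still beats both naive strategies, though by a smaller margin than the 20% comparison suggests.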
svm_tune <- tune(svm,
classe ~ pitch_arm + pitch_forearm + pitch_dumbbell + pitch_belt +
roll_arm + roll_forearm + roll_dumbbell + roll_belt +
yaw_arm + yaw_forearm + yaw_dumbbell + yaw_belt,
scale = FALSE,
ranges = list(gamma = seq(0, 1, 0.05), cost = 2^(-1:5)),
data = training[, match(c("classe", "pitch_arm", "pitch_forearm", "pitch_dumbbell", "pitch_belt", "roll_arm", "roll_forearm", "roll_dumbbell", "roll_belt", "yaw_arm", "yaw_forearm", "yaw_dumbbell", "yaw_belt"), names(training))])
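One way to see why the search above is so slow is to count the model fits it requests. The sketch below rebuilds the same grid in base R and multiplies by the 10 folds that tune.control() uses by default; the coarser grid at the end is a hypothetical alternative, not something that was run for this report.

```r
# Parameter grid from the tune() call above
gamma_grid <- seq(0, 1, 0.05)
cost_grid  <- 2^(-1:5)
full_grid  <- expand.grid(gamma = gamma_grid, cost = cost_grid)

n_combos <- nrow(full_grid)     # number of (gamma, cost) pairs
n_folds  <- 10                  # tune.control() default: 10-fold cross-validation
n_fits   <- n_combos * n_folds  # SVM fits needed for the full search

print(c(combinations = n_combos, fits = n_fits))

# Hypothetical coarser grid (an assumption for illustration only)
coarse_grid <- expand.grid(gamma = 10^(-3:0), cost = 2^c(-1, 2, 5))
print(nrow(coarse_grid) * n_folds)
```

Each of those fits trains on roughly 90% of the 19622 rows. Note also that the grid includes gamma = 0, for which the radial kernel is constant and carries no information, so that row of the grid could be dropped. A coarser grid, fewer folds, or tuning on a random subsample would all reduce the cost, at the price of a less thorough search.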
The decision tree's predictions look poor. For several test cases the highest class probability is only about 0.5, which suggests that the tree's leaves do not separate the training data cleanly: the same class appears on both sides of some splits.
If one tree is not enough, how about a forest? I used rfcv to cross-validate the random forest with progressively fewer predictors. With all 12 selected variables, the cross-validated error is about 0.009, which is less than 1%. The error also drops sharply as the number of variables increases from 1 to 3, 6, and 12.
I selected the random forest model. With it, I got 20/20 correct predictions on the testing data.
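The rpart output above is a matrix of class probabilities rather than class labels. To compare it with the other models, each row can be reduced to its most probable class. The sketch below does this in base R for three rows copied from the output (test cases 1, 2, and 6); `probs` and `pred_class` are illustrative names.

```r
# Three rows copied from predict(rpart_model, testing)
probs <- rbind(
  "1" = c(A = 0.005221932, B = 0.310704961, C = 0.537859008, D = 0.11749347, E = 0.02872063),
  "2" = c(A = 0.823299453, B = 0.168100078, C = 0.008600469, D = 0.00000000, E = 0.00000000),
  "6" = c(A = 0.000000000, B = 0.016453382, C = 0.000000000, D = 0.26325411, E = 0.72029250)
)

# Column with the largest probability in each row, mapped back to its class label
pred_class <- colnames(probs)[max.col(probs, ties.method = "first")]
names(pred_class) <- rownames(probs)

print(pred_class)
```

With the fitted model available, predict(rpart_model, testing, type = "class") returns the labels directly. For test case 1 the tree's most probable class is C, whereas the random forest predicts B, which illustrates the disagreement between the two models.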
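The rfcv errors are easier to read as a small table. The sketch below retypes the values of rf_cv$error.cv shown earlier and expresses each error relative to the 12-variable model; `error_cv`, `n_vars`, and `cv_table` are illustrative names.

```r
# Cross-validated error by number of predictors, copied from rf_cv$error.cv
error_cv <- c("12" = 0.008867598, "6" = 0.026246050,
              "3"  = 0.110029559, "1" = 0.440424014)
n_vars <- as.integer(names(error_cv))

cv_table <- data.frame(
  n_vars       = n_vars,
  error        = round(error_cv, 4),
  accuracy     = round(1 - error_cv, 4),
  # How many times larger the error is than with all 12 predictors
  ratio_to_12  = round(error_cv / error_cv[["12"]], 1),
  row.names    = NULL
)

print(cv_table)
```

Because rfcv removes the least important predictors first, the steady rise in error as variables are dropped suggests that all twelve angles contribute, although this comparison alone does not show which ones matter most.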
sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Traditional)_Taiwan.950
[2] LC_CTYPE=Chinese (Traditional)_Taiwan.950
[3] LC_MONETARY=Chinese (Traditional)_Taiwan.950
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Traditional)_Taiwan.950
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] randomForest_4.6-12 rattle_4.1.0 rpart_4.1-11
[4] e1071_1.6-8
loaded via a namespace (and not attached):
[1] rpart.plot_2.1.2 Rcpp_0.12.10 RGtk2_2.20.33
[4] class_7.3-14 digest_0.6.12 rprojroot_1.2
[7] backports_1.0.5 magrittr_1.5 evaluate_0.10
[10] stringi_1.1.5 rmarkdown_1.5 RColorBrewer_1.1-2
[13] tools_3.4.0 stringr_1.2.0 yaml_2.1.14
[16] compiler_3.4.0 htmltools_0.3.6 knitr_1.16
The data are generously provided by Groupware@LES.