Objectives:
The objective of this document is to give a brief introduction to dimensionality reduction and discretization. By the end of this tutorial you will have learned how to perform feature subset selection (with wrappers and filters), feature extraction (PCA, SVD, factor analysis, MDS and Isomap), mRMR feature selection, and discretization in R.
Dimensionality reduction is performed in two different ways: feature subset selection, and feature reduction (also called feature extraction). This tutorial briefly covers both.
Let’s load our main data to use:
data <- read.csv(url("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"),
header = T, sep=";")
Feature subset selection aims to select an optimal subset of the existing features for modelling purposes. There are two ways to perform feature subset selection: using wrappers or using filters. Both approaches are covered below.
Wrappers are methods that evaluate predictor (feature) performance by adding features to or removing them from models and measuring the resulting model performance.
For wrapper methods, we will use the caret package.
# if ("caret" %in% rownames(installed.packages()) == FALSE){
# install.packages("caret")
# }
require(caret)
When using recursive feature elimination, all variables are initially included in the model. Variables are then removed iteratively, model performance is recomputed, and the optimal set of features is determined.
In the caret package, recursive feature elimination can be used with several models, such as linear regression (lmFuncs), random forests (rfFuncs), naive Bayes (nbFuncs) and bagged trees (treebagFuncs). You can also use other functions that work with caret's train function. For further information, check caret's package documentation.
#subset the data
data.train.index <- createDataPartition(data[,12], p=.8, list = F, times = 1)
data.train <- data[data.train.index,]
#Set the control variables for feature selection
#We use a linear regression model (lmFuncs) and 10-fold cross-validation (cv) for model evaluation
control.rfe <- rfeControl(functions=lmFuncs, method="cv", number=10)
#x defines predictors, while y defines the output
results.rfe <- rfe(x = data.train[,-12], y = data.train[,12], sizes = c(1:11),
rfeControl = control.rfe)
print(results.rfe) #Print the results
##
## Recursive feature selection
##
## Outer resampling method: Cross-Validated (10 fold)
##
## Resampling performance over subset size:
##
## Variables RMSE Rsquared MAE RMSESD RsquaredSD MAESD Selected
## 1 0.8548 0.09364 0.6581 0.03536 0.03710 0.02310
## 2 0.8379 0.12799 0.6451 0.03490 0.04469 0.02172
## 3 0.8355 0.13264 0.6442 0.03471 0.04668 0.02297
## 4 0.8327 0.13761 0.6421 0.03386 0.04212 0.02218
## 5 0.8237 0.15382 0.6361 0.03721 0.05350 0.02451
## 6 0.7812 0.23802 0.6049 0.03808 0.06265 0.02412
## 7 0.7672 0.26684 0.5932 0.03092 0.04395 0.01861
## 8 0.7622 0.27411 0.5910 0.02742 0.03546 0.01886
## 9 0.7620 0.27458 0.5908 0.02775 0.03557 0.01900
## 10 0.7598 0.27887 0.5882 0.02850 0.03456 0.01970 *
## 11 0.7600 0.27858 0.5881 0.02870 0.03492 0.01980
##
## The top 5 variables (out of 10):
## density, volatile.acidity, pH, sulphates, chlorides
predictors(results.rfe) #Print the names of selected variables
## [1] "density" "volatile.acidity" "pH"
## [4] "sulphates" "chlorides" "alcohol"
## [7] "residual.sugar" "fixed.acidity" "citric.acid"
## [10] "free.sulfur.dioxide"
trellis.par.set(caretTheme())#Set the theme for the RMSE plot
plot(results.rfe, type = c("g","o"))#Plot RMSE
In the table, RMSE refers to "root mean squared error". In general, we want this to be as low as possible (while also paying attention to possible overfitting problems).
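For reference, RMSE can also be computed by hand; the helper below is a minimal sketch, and the baseline prediction in the example (the mean of the outcome) is used only for illustration.
#RMSE is the square root of the mean squared difference
#between observed and predicted values
rmse <- function(observed, predicted){
  sqrt(mean((observed - predicted)^2))
}
#Example: RMSE of a naive model that always predicts the mean quality
rmse(data.train[,12], rep(mean(data.train[,12]), nrow(data.train)))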
Filters are methods that evaluate predictor (feature) performance outside of any model. They usually score the variables based on different criteria.
For filter methods, we will use the FSelector package.
#install.packages("FSelector")
require(FSelector)
These methods score features using entropy-based criteria.
Information Gain
weights.ig <- information.gain(quality~.,
data = data.train) #Compute the weights of variables
print(weights.ig) #Print the weights
## attr_importance
## fixed.acidity 0.040864113
## volatile.acidity 0.028661446
## citric.acid 0.046100975
## residual.sugar 0.015631862
## chlorides 0.055416377
## free.sulfur.dioxide 0.025967672
## total.sulfur.dioxide 0.054063089
## density 0.094802184
## pH 0.009277158
## sulphates 0.012191765
## alcohol 0.152440118
subset.ig <- cutoff.k(weights.ig, 5) #Get the most influential 5 variables
f.ig <- as.simple.formula(subset.ig, "quality") #Express the relationship as a formula
print(f.ig) #Print formula
## quality ~ alcohol + density + chlorides + total.sulfur.dioxide +
## citric.acid
## <environment: 0x0000000041f65a50>
Gain Ratio
weights.gr <- gain.ratio(quality~.,
data = data.train) #Compute the weights of variables
print(weights.gr) #Print the weights
## attr_importance
## fixed.acidity 0.03838954
## volatile.acidity 0.03297964
## citric.acid 0.04851085
## residual.sugar 0.01452914
## chlorides 0.05090511
## free.sulfur.dioxide 0.03249979
## total.sulfur.dioxide 0.03943687
## density 0.08035167
## pH 0.01569906
## sulphates 0.01173380
## alcohol 0.09679671
subset.gr <- cutoff.k(weights.gr, 5) #Get the most influential 5 variables
f.gr <- as.simple.formula(subset.gr, "quality") #Express the relationship as a formula
print(f.gr) #Print formula
## quality ~ alcohol + density + chlorides + citric.acid + total.sulfur.dioxide
## <environment: 0x0000000041dba0d0>
Chi-Squared
Computes the chi-squared statistic between each variable and the outcome to measure their dependence.
weights.chi <- chi.squared(quality~.,
data = data.train) #Compute the weights of variables
print(weights.chi) #Print the weights
## attr_importance
## fixed.acidity 0.1639365
## volatile.acidity 0.1734001
## citric.acid 0.1745144
## residual.sugar 0.1256604
## chlorides 0.2348272
## free.sulfur.dioxide 0.1680390
## total.sulfur.dioxide 0.1615530
## density 0.2453800
## pH 0.1368993
## sulphates 0.1106995
## alcohol 0.2685477
subset.chi <- cutoff.k(weights.chi, 5) #Get the most influential 5 variables
f.chi <- as.simple.formula(subset.chi, "quality") #Express the relationship as a formula
print(f.chi) #Print formula
## quality ~ alcohol + density + chlorides + citric.acid + volatile.acidity
## <environment: 0x0000000041499370>
Linear Correlation
Computes correlations using the Pearson coefficient to check whether there is a linear relationship.
weights.lc <- linear.correlation(quality~.,
data = data.train) #Compute the weights of variables
print(weights.lc) #Print the weights
## attr_importance
## fixed.acidity 0.119125708
## volatile.acidity 0.201295979
## citric.acid 0.005699735
## residual.sugar 0.081020161
## chlorides 0.214043309
## free.sulfur.dioxide 0.025059470
## total.sulfur.dioxide 0.162478872
## density 0.293223538
## pH 0.110598017
## sulphates 0.061679814
## alcohol 0.428421927
subset.lc <- cutoff.k(weights.lc, 5) #Get the most influential 5 variables
f.lc <- as.simple.formula(subset.lc, "quality") #Express the relationship as a formula
print(f.lc) #Print formula
## quality ~ alcohol + density + chlorides + volatile.acidity +
## total.sulfur.dioxide
## <environment: 0x0000000040375ab8>
Rank Correlation
Computes correlations using the Spearman coefficient to check whether there is a monotonic relationship. This can be more useful in the case of non-linear but monotonic relationships. Let's illustrate the difference:
x <- (1:100) #Generate numbers from 1 to 100
y <- exp(x) #Exponentiate them
cor(x, y, method = "pearson") #Equals to roughly 0.25
## [1] 0.252032
cor(x, y, method = "spearman") #Equals to 1
## [1] 1
Now let’s filter using rank correlation:
weights.rc <- rank.correlation(quality~.,
data = data.train) #Compute the weights of variables
print(weights.rc) #Print the weights
## attr_importance
## fixed.acidity 0.08897530
## volatile.acidity 0.19474660
## citric.acid 0.02048422
## residual.sugar 0.06591007
## chlorides 0.31356636
## free.sulfur.dioxide 0.03195105
## total.sulfur.dioxide 0.18688722
## density 0.33710912
## pH 0.11790922
## sulphates 0.03781927
## alcohol 0.43276843
subset.rc <- cutoff.k(weights.rc, 5) #Get the most influential 5 variables
f.rc <- as.simple.formula(subset.rc, "quality") #Express the relationship as a formula
print(f.rc) #Print formula
## quality ~ alcohol + density + chlorides + volatile.acidity +
## total.sulfur.dioxide
## <environment: 0x000000004202bd58>
The filter methods return largely overlapping subsets: alcohol, density and chlorides are selected by every method, while the remaining two variables differ slightly between the entropy-based and correlation-based criteria.
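To check the overlap programmatically, you can intersect the subsets computed above; a quick sketch:
#Variables selected by every filter method
Reduce(intersect, list(subset.ig, subset.gr, subset.chi, subset.lc, subset.rc))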
Feature reduction (or feature extraction) techniques aim to generate new, more informative features from the existing set of features, condensing the information carried by the original features into a smaller number of newly generated ones.
Linear feature extraction techniques aim to generate new features by using a linear combination of existing features.
Principal component analysis (PCA) projects the original n features onto k uncorrelated components (k <= n). Instead of using the original variables, you can use the computed principal components for modelling.
fit.pca <- prcomp(data.train[,-12])
summary(fit.pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 43.662 12.77092 4.62419 1.03755 0.84008 0.13619
## Proportion of Variance 0.911 0.07794 0.01022 0.00051 0.00034 0.00001
## Cumulative Proportion 0.911 0.98890 0.99912 0.99964 0.99997 0.99998
## PC7 PC8 PC9 PC10 PC11
## Standard deviation 0.11989 0.10721 0.0934 0.02016 0.0005903
## Proportion of Variance 0.00001 0.00001 0.0000 0.00000 0.0000000
## Cumulative Proportion 0.99999 1.00000 1.0000 1.00000 1.0000000
Based on the PCA summary, the first two principal components account for nearly 99% of the variance in the data, so we can use these two components instead of the whole set of variables.
new.data.pca <- fit.pca$x[,1:2]
summary(new.data.pca)
## PC1 PC2
## Min. :-221.044 Min. :-71.1260
## 1st Qu.: -30.015 1st Qu.: -7.5495
## Median : 4.425 Median : 0.1139
## Mean : 0.000 Mean : 0.0000
## 3rd Qu.: 31.343 3rd Qu.: 8.0233
## Max. : 132.712 Max. : 57.8401
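Note that prcomp centers the data but does not scale it by default. Since the wine variables are measured on very different scales, the components above are dominated by the variables with the largest variances. If that is not desired, you can standardize the variables first; a minimal sketch:
#Rerun PCA with all variables scaled to unit variance
fit.pca.scaled <- prcomp(data.train[,-12], center = TRUE, scale. = TRUE)
summary(fit.pca.scaled) #The variance is now spread across more components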
Singular value decomposition (SVD) is similar to PCA; it also projects the data onto lower dimensions. SVD is a numerical method, while PCA is an analysis approach. In R, the prcomp function uses SVD to calculate the principal components, which is numerically more stable than computing the eigendecomposition of the covariance matrix. The output of prcomp is also easier to interpret than the output of svd; for example, summary does not produce a meaningful report for the output of svd.
fit.svd <- svd(data.train[,-12])
To see how much of the total each component accounts for (measured here by the cumulative proportion of the singular values), use the following code:
cumsum(fit.svd$d)/sum(fit.svd$d)
## [1] 0.8656029 0.9394550 0.9667310 0.9909514 0.9962393 0.9978389 0.9985331
## [8] 0.9991597 0.9997037 0.9998859 1.0000000
According to the results, the first two components account for about 94% of the total, so we can use those two.
new.data.svd <- fit.svd$u[,1:2]
summary(new.data.svd)
## V1 V2
## Min. :-0.038853 Min. :-0.0899225
## 1st Qu.:-0.018458 1st Qu.:-0.0093666
## Median :-0.014816 Median : 0.0004081
## Mean :-0.015286 Mean : 0.0001455
## 3rd Qu.:-0.011957 3rd Qu.: 0.0102804
## Max. :-0.001188 Max. : 0.0698730
As mentioned previously, using prcomp is more intuitive and easier than using svd.
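The two approaches are closely related: applying svd to the centered data matrix reproduces prcomp's results, and proportions of variance correspond to the squared singular values. A small sketch illustrating the correspondence:
#SVD of the centered data matrix
data.centered <- scale(data.train[,-12], center = TRUE, scale = FALSE)
fit.svd.centered <- svd(data.centered)
#Standard deviations of the principal components, computed two ways
head(fit.svd.centered$d / sqrt(nrow(data.centered) - 1))
head(fit.pca$sdev)
#Proportion of variance explained uses the squared singular values
cumsum(fit.svd.centered$d^2) / sum(fit.svd.centered$d^2)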
Factor analysis is a general term for methods that use linear projections (PCA is one example). Instead of using a specific method, we can use general factor analysis to reduce the dimensionality.
First, we need to determine the number of factors we want to obtain.
# if ("nFactors" %in% rownames(installed.packages()) == FALSE){
# install.packages("nFactors")
# }
library(nFactors)
ev <- eigen(cor(data.train[,-12])) # get eigenvalues
ap <- parallel(subject=nrow(data.train[,-12]),var=ncol(data.train[,-12]),
rep=100,cent=.05)
nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)
plotnScree(nS)
In the scree plot we obtained, the optimal coordinates criterion suggests three factors. This means the optimal number of factors that explain the variability in the data is three.
So we can now obtain our factors:
fit.fa <- factanal(data.train[,-12], 3, rotation="varimax")
print(fit.fa, digits=2, cutoff=.1, sort=TRUE)
##
## Call:
## factanal(x = data.train[, -12], factors = 3, rotation = "varimax")
##
## Uniquenesses:
## fixed.acidity volatile.acidity citric.acid
## 0.00 0.99 0.91
## residual.sugar chlorides free.sulfur.dioxide
## 0.00 0.87 0.89
## total.sulfur.dioxide density pH
## 0.71 0.00 0.71
## sulphates alcohol
## 0.97 0.22
##
## Loadings:
## Factor1 Factor2 Factor3
## residual.sugar 0.96 0.18 0.21
## density 0.69 0.65 0.31
## alcohol -0.29 -0.83
## fixed.acidity -0.13 0.99
## volatile.acidity
## citric.acid 0.29
## chlorides 0.36
## free.sulfur.dioxide 0.29 0.16
## total.sulfur.dioxide 0.32 0.42 0.11
## pH -0.13 0.21 -0.48
## sulphates 0.16
##
## Factor1 Factor2 Factor3
## SS loadings 1.71 1.56 1.45
## Proportion Var 0.16 0.14 0.13
## Cumulative Var 0.16 0.30 0.43
##
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 4254.76 on 25 degrees of freedom.
## The p-value is 0
As mentioned previously, factors are linear combinations of variables. fit.fa$loadings holds the information about which variables contribute to which factor and their coefficients in the linear combination.
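If you want to use the factors themselves as a reduced feature set, factanal can also return per-observation factor scores; a minimal sketch using the regression scoring method:
#Refit the model, this time requesting factor scores
fit.fa.scores <- factanal(data.train[,-12], 3, rotation = "varimax",
                          scores = "regression")
new.data.fa <- fit.fa.scores$scores #One column per factor
head(new.data.fa)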
Non-linear feature extraction techniques aim to generate new features by using a non-linear combination of existing features.
Multidimensional scaling uses similarity (distance) measures to reduce the dimension of the data. First we compute a distance matrix and then, based on that matrix, we reduce the dimension.
d <- dist(data.train[,-12])
fit.mds <- cmdscale(d,eig=TRUE, k=2) #Reduce data to two variables
new.data.mds <- fit.mds$points
You can use new.data.mds instead of all the variables in the dataset.
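Because we called cmdscale with eig=TRUE, the fit also reports a goodness-of-fit measure indicating how well two dimensions reproduce the original distances; a quick check:
fit.mds$GOF #Values close to 1 indicate a faithful two-dimensional representation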
Isomap is similar to multidimensional scaling: it extends metric MDS by incorporating the geodesic distances imposed by a weighted neighbourhood graph.
# if ("vegan" %in% rownames(installed.packages()) == FALSE){
# install.packages("vegan")
# }
require(vegan)
d <- vegdist(data.train[,-12])
fit.iso <- isomap(d, ndim=2, k = 3, fragmentedOK = T)
#Reduce data to two variables (ndim) and retain 3 (k) distances per data point.
#Data might be fragmented, we tell the function that it's ok.
new.data.iso <- fit.iso$points
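vegan also provides a plot method for isomap objects, so you can visually inspect the embedding; a quick sketch:
plot(fit.iso) #Plot the two retained dimensions and the neighbourhood network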
Feature selection means choosing the most appropriate subset of attributes with respect to the target attribute. Here, we will cover Minimum Redundancy Maximum Relevance (mRMR) feature selection. Note that there are other methods available for feature selection, such as purely mutual-information-based criteria.
mRMR tries to maximize relevance with respect to the target variable, measured by mutual information, while choosing variables whose mutual information with the already selected variables is minimal (minimizing redundancy).
In order to perform mRMR feature selection, we will use the mRMRe package.
# if ("mRMRe" %in% rownames(installed.packages()) == FALSE){
# install.packages("mRMRe")
# }
require(mRMRe)
data(cgps) #load the data
We will use the mRMR.classic function to select features. This function requires a data frame whose columns are of the following types: "numeric", "ordered_factor" and "Surv". We should also convert the data set to the mRMRe.Data type. Then we call mRMR.classic with the parameters data, target_indices (the index of the target feature (column) in the data matrix), and feature_count (the number of features to be selected by mRMR feature selection). In this example, we will select 3 features for each of 3 different target variables.
# Convert the data type
feature_data <- mRMR.data(data = data.frame(cgps.ge))
# Create an mRMR filter and obtain the indices of selected features
filter3 <- mRMR.classic(data = feature_data, target_indices = 3:5,
feature_count = 3)
solutions(filter3)
## $`3`
## [,1]
## [1,] 592
## [2,] 805
## [3,] 673
##
## $`4`
## [,1]
## [1,] 578
## [2,] 339
## [3,] 251
##
## $`5`
## [,1]
## [1,] 149
## [2,] 586
## [3,] 160
In the output, we can see the three features selected for each target index. For example, the mRMR function selected the 592nd, 805th and 673rd variables for the 3rd target variable; the 578th, 339th and 251st variables for the 4th target variable; and the 149th, 586th and 160th variables for the 5th target variable.
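The indices refer to the columns of the data used to build the mRMRe.Data object, so you can map them back to feature names; a minimal sketch for the 3rd target variable:
#Look up the names of the features selected for target index 3
selected.idx <- solutions(filter3)[["3"]]
colnames(data.frame(cgps.ge))[selected.idx]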
Discretization methods aim to discretize continuous variables. For discretization we will use two packages, namely discretization and arules.
Assume that we want to perform binning (either equal width or equal frequency); the following code allows us to do that. Initially, we have a data frame of 11 predictor variables and one outcome variable, and we want to discretize the numeric predictor variables. The discretize function only works on vectors, so instead of discretizing the predictor variables one by one inside a for loop, we can use the lapply function, which takes a function and applies it to all of the variables in a data frame. The discretize function takes three main inputs: the data vector to be discretized, the method, and the number of categories. Inside the lapply call, we also need to specify the method and the number of categories. lapply splits the data frame into vectors, passes them to the function as input, and returns a list as output.
The discretize function returns a vector of factors representing the interval that each particular data point falls into. So if you want a numeric discretization, you need to apply the as.numeric function to all variables; again we use lapply for this. Finally, we convert the list back to a data frame. For association mining, you may want to keep the factor form of the discretization.
#Install and load packages
# if ("arules" %in% rownames(installed.packages()) == FALSE){
# install.packages("arules")
# }
require(arules)
#Loop through all variables in dataset to discretize.
data.eqw <- NULL
for (i in 1:11){
d <- discretize(data.train[,i], method = "interval", categories = 3)
data.eqw <- cbind(data.eqw, d)
}
colnames(data.eqw) <- names(data.train[,-12]) #cbind produces a matrix, so use colnames
#Or equivalently we can use lapply syntax
#Apply equal width binning discretization to all variables in dataset.
data.eqw <- data.frame(lapply(data.train[,-12],
FUN = discretize, method = "interval", categories = 3))
#Take a look at first few data points in the data.eqw
head(data.eqw)
## fixed.acidity volatile.acidity citric.acid residual.sugar
## 1 [ 3.90, 7.33) [0.08,0.42) [0.000,0.553) [ 0.6,22.3)
## 2 [ 3.90, 7.33) [0.08,0.42) [0.000,0.553) [ 0.6,22.3)
## 3 [ 3.90, 7.33) [0.08,0.42) [0.000,0.553) [ 0.6,22.3)
## 4 [ 7.33,10.77) [0.08,0.42) [0.000,0.553) [ 0.6,22.3)
## 5 [ 3.90, 7.33) [0.08,0.42) [0.000,0.553) [ 0.6,22.3)
## 6 [ 3.90, 7.33) [0.08,0.42) [0.000,0.553) [ 0.6,22.3)
## chlorides free.sulfur.dioxide total.sulfur.dioxide density
## 1 [0.009,0.121) [ 2.0, 50.2) [128,247) [0.987,1.004)
## 2 [0.009,0.121) [ 2.0, 50.2) [128,247) [0.987,1.004)
## 3 [0.009,0.121) [ 2.0, 50.2) [128,247) [0.987,1.004)
## 4 [0.009,0.121) [ 2.0, 50.2) [ 9,128) [0.987,1.004)
## 5 [0.009,0.121) [ 2.0, 50.2) [128,247) [0.987,1.004)
## 6 [0.009,0.121) [ 2.0, 50.2) [128,247) [0.987,1.004)
## pH sulphates alcohol
## 1 [2.72,3.09) [0.22,0.50) [ 8.0,10.1)
## 2 [3.09,3.45) [0.22,0.50) [ 8.0,10.1)
## 3 [3.09,3.45) [0.22,0.50) [ 8.0,10.1)
## 4 [3.09,3.45) [0.22,0.50) [10.1,12.1)
## 5 [3.09,3.45) [0.22,0.50) [ 8.0,10.1)
## 6 [2.72,3.09) [0.22,0.50) [ 8.0,10.1)
#Turn it into a numeric data frame
data.eqw <- data.frame(lapply(data.eqw, as.numeric))
#Take a second look at first few data points in the data.eqw
head(data.eqw)
## fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1 1 1 1 1 1
## 2 1 1 1 1 1
## 3 1 1 1 1 1
## 4 2 1 1 1 1
## 5 1 1 1 1 1
## 6 1 1 1 1 1
## free.sulfur.dioxide total.sulfur.dioxide density pH sulphates alcohol
## 1 1 2 1 1 1 1
## 2 1 2 1 2 1 1
## 3 1 2 1 2 1 1
## 4 1 1 1 2 1 2
## 5 1 2 1 2 1 1
## 6 1 2 1 1 1 1
#For equal frequency
data.eqf <- lapply(data.train[,-12],
FUN = discretize, method = "frequency", categories = 3)
data.eqf <- data.frame(lapply(data.eqf, as.numeric))
#For k-means clustering discretization
data.eqc <- lapply(data.train[,-12],
FUN = discretize, method = "cluster", categories = 3)
data.eqc <- data.frame(lapply(data.eqc, as.numeric))
#You can also use user-specified intervals
#Let's assume that we want to discretize by the quantiles of each variable
cats <- data.frame(lapply(data.train[,-12], FUN = quantile))
#We need to add -Inf and Inf as lower and upper boundaries.
#The rep function replicates the given value the given number of times.
cats <- rbind(rep(-Inf, 11), cats, rep(Inf, 11))
#In this case we need to use mapply instead of lapply because
#we have multiple different inputs to the function for each variable
data.us <- data.frame(mapply(data.train[,-12],
FUN = discretize, method = "fixed", categories = cats))
data.us <- data.frame(lapply(data.us, as.numeric))
If you want to use different boundaries for each variable, just bind them into a data.frame and pass it to mapply as we did above. Don't forget to add -Inf and Inf.
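For a single variable, the same idea looks like the sketch below; the cut points chosen for alcohol are made up purely for illustration:
#Discretize alcohol with hand-picked (hypothetical) boundaries
alcohol.fixed <- discretize(data.train$alcohol, method = "fixed",
                            categories = c(-Inf, 9.5, 11, Inf))
table(alcohol.fixed) #Counts per user-specified interval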
We can also use the Minimum Description Length Principle (mdlp) to discretize the data points; it uses an entropy criterion to determine the optimal discretization. It returns two lists: one holds the cut points (boundaries) of the discretization, and the other holds the discretized data. Keep in mind that mdlp is a supervised discretization method, so you need to provide the outcome variable along with the dataset. By default, mdlp assumes the last column in your data frame is the outcome variable, which is true in our case as our outcome variable is "quality".
#Install and load packages
# if ("discretization" %in% rownames(installed.packages()) == FALSE){
# install.packages("discretization")
# }
require(discretization)
#Loop through all variables in dataset to discretize.
data.mdlp <- mdlp(data.train)
summary(data.mdlp$Disc.data)
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.00
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:4.000 1st Qu.:1.00
## Median :1.000 Median :2.000 Median :4.000 Median :1.00
## Mean :1.057 Mean :1.626 Mean :3.757 Mean :1.49
## 3rd Qu.:1.000 3rd Qu.:2.000 3rd Qu.:4.000 3rd Qu.:2.00
## Max. :2.000 Max. :3.000 Max. :5.000 Max. :3.00
## chlorides free.sulfur.dioxide total.sulfur.dioxide density
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
## Median :2.000 Median :2.000 Median :2.000 Median :3.000
## Mean :2.016 Mean :2.097 Mean :2.273 Mean :2.263
## 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :3.000 Max. :3.000 Max. :3.000 Max. :3.000
## pH sulphates alcohol quality
## Min. :1.00 Min. :1 Min. :1.000 Min. :3.000
## 1st Qu.:1.00 1st Qu.:1 1st Qu.:2.000 1st Qu.:5.000
## Median :1.00 Median :1 Median :3.000 Median :6.000
## Mean :1.32 Mean :1 Mean :2.962 Mean :5.874
## 3rd Qu.:2.00 3rd Qu.:1 3rd Qu.:4.000 3rd Qu.:6.000
## Max. :2.00 Max. :1 Max. :5.000 Max. :9.000
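The cut points that mdlp found for each variable are stored in the cutp component of the result; a quick look:
data.mdlp$cutp[1:3] #Cut points for the first three predictors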
Useful Links: