The dataset ToyotaCorolla.csv contains data on used cars that were on sale during the late summer of 2004 in the Netherlands. It has 1,436 records with details on 38 attributes, including Price, Age, Kilometers, HP, and other specifications. The dataset has two categorical attributes, Fuel Type and Color.
1. Load the CSV file into R and get familiar with the data
a. Use the code template provided to load the data into an R data frame. You will need to modify the template with appropriate names: the data frame name should reflect the data you are loading, and the variable names in the template will not match those in the Toyota Corolla file.
b. Run the R code to determine the following for the data in the data frame:
The number of observations
The names of the variables
The means for all non-character variables
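As a rough illustration of step 1b, the sketch below computes the three requested summaries in base R. It is not the course template: the data frame name `car.df`, the column names, and the four stand-in rows are assumptions made so the example runs without the file, and "non-character" is interpreted here as numeric.

```r
# Stand-in data so the sketch runs without the file (values are made up).
# With the real file, something like this would replace it:
#   car.df <- read.csv("ToyotaCorolla.csv")
car.df <- data.frame(
  Price     = c(13500, 13750, 13950, 14950),
  Age       = c(23, 23, 24, 26),
  Fuel_Type = c("Diesel", "Petrol", "CNG", "Petrol"),
  Color     = c("Blue", "Silver", "Blue", "Black"),
  stringsAsFactors = FALSE
)

n.obs     <- nrow(car.df)   # number of observations
var.names <- names(car.df)  # names of the variables

# Means for the numeric columns only (character columns are skipped)
is.num    <- sapply(car.df, is.numeric)
col.means <- colMeans(car.df[, is.num, drop = FALSE], na.rm = TRUE)

print(n.obs)
print(var.names)
print(col.means)
```

`summary(car.df)` and `str(car.df)` are other common ways to get a first look at the same information.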
2. Prepare the data. We plan to analyze the data using various data mining techniques described in future chapters. Prepare the data for use as follows:
a. Transform the two nominal variables in the dataset, Fuel Type and Color, into dummy variables.
Describe how you would convert these to binary variables (i.e., describe conceptually how a nominal variable is transformed into a dummy variable, not the R code to create a dummy variable).
Confirm this by using R code to transform the variables into dummies and then listing the variable names in the data frame.
b. Prepare the transformed dataset for supervised learning techniques by creating partitions in R.
Select all the variables and use the default value for the random seed. Partition the data into training (50%), validation (30%), and test (20%) sets.
Describe the roles that these partitions will play in modeling.
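The two preparation steps in part 2 can be sketched in base R as follows. This is one possible approach under stated assumptions, not the provided template: the data frame `car.df`, the column names `Fuel_Type` and `Color`, the ten stand-in rows, and the seed value of 1 are all placeholders. `model.matrix()` with a `~ 0 +` formula produces one 0/1 column per category level.

```r
# Stand-in data (made-up values); replace with the data frame loaded from the CSV.
car.df <- data.frame(
  Price     = c(13500, 13750, 13950, 14950, 13750,
                12950, 16900, 18600, 21500, 12950),
  Fuel_Type = c("Diesel", "Petrol", "CNG", "Petrol", "Diesel",
                "Petrol", "Petrol", "Diesel", "Petrol", "CNG"),
  Color     = c("Blue", "Silver", "Blue", "Black", "Black",
                "White", "Grey", "Grey", "Red", "Blue"),
  stringsAsFactors = FALSE
)

# One indicator column per level; "~ 0 +" drops the intercept so no level is omitted
fuel.dummies  <- model.matrix(~ 0 + Fuel_Type, data = car.df)
color.dummies <- model.matrix(~ 0 + Color, data = car.df)

# Replace the two nominal columns with their dummy columns
keep <- !(names(car.df) %in% c("Fuel_Type", "Color"))
car.dummies.df <- cbind(car.df[, keep, drop = FALSE], fuel.dummies, color.dummies)
print(names(car.dummies.df))

# Partition rows 50% / 30% / 20% after a random shuffle
set.seed(1)  # placeholder seed; use the value your template specifies
n        <- nrow(car.dummies.df)
shuffled <- sample(n)
n.train  <- round(0.5 * n)
n.valid  <- round(0.3 * n)

train.rows <- shuffled[1:n.train]
valid.rows <- shuffled[(n.train + 1):(n.train + n.valid)]
test.rows  <- shuffled[(n.train + n.valid + 1):n]

train.df <- car.dummies.df[train.rows, ]
valid.df <- car.dummies.df[valid.rows, ]
test.df  <- car.dummies.df[test.rows, ]
```

Shuffling the row indices once and slicing them guarantees the three partitions do not overlap and together cover every record. Many modeling methods need only k-1 dummies for a variable with k levels, so one column per variable may be dropped later depending on the technique.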
You will use an R add-in called knitr to submit your work. knitr allows you to create a document in an easy-to-read format that contains your R code, your output, and any analysis that you write.
You will submit the file you create using knitr (either Word or PDF) for this assignment.
Your document will contain your R code and output along with your answers to each part of this assignment. Please have a header in your document for each part above that indicates which question and sub-question is being addressed. The second attachment below provides an example of how you may format your knitr output.