Please write steps 1-8 below as an R script. Only R scripts are allowed. Thanks!
1- Connect the File Reader node (for comma-separated values, CSV) to the Linear Correlation node and to the Interactive Histogram node for basic exploratory data analysis (EDA).
2- Connect the File Reader node to the Rule Engine node to turn the 11-point quality scale into a dichotomous variable (good wine vs. the rest). The following rules go into the Rule Engine:
- $quality$ > 6.5 => "good"
- TRUE => "bad"
3- Connect the Rule Engine node output to the input of the Column Filter node to filter out the original 11-point quality feature. This prevents target leakage, since the new class label is derived directly from that feature.
4- Connect the Column Filter node output to the input of the Partitioning node (a standard train/test split, for example 75%/25%, using either 'random' or 'stratified' sampling).
5- Connect the Partitioning node's training split output to the input of the Decision Tree Learner node.
6- Connect the Partitioning node's test split output to the data input of the Decision Tree Predictor node.
7- Connect the Decision Tree Learner node's model output to the model input of the Decision Tree Predictor node.
8- Connect the Decision Tree Predictor node output to the input of the ROC node. This makes it possible to evaluate the whole model by computing the area under the ROC curve (AUC). These steps can be run separately for the red and the white wine data to answer the third research question/hypothesis: how do the top 3 most important physicochemical properties for red wine compare to those for white wine?
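As a starting point, steps 1-8 could be sketched in R roughly as follows. This is a minimal sketch, not a definitive translation of the workflow: the file name `winequality-red.csv`, the three predictor column names, and the synthetic stand-in data are assumptions made only so the script runs on its own. It also assumes the `rpart` package is installed (it is bundled with most R distributions), and it computes the AUC with the rank-based (Mann-Whitney) formula instead of a dedicated ROC package.

```r
# Sketch only: file name, column names, and the synthetic data are assumptions.
library(rpart)  # decision tree learner

set.seed(42)

# Step 1 (File Reader): a real run would read the CSV, for example:
#   wine <- read.csv("winequality-red.csv")   # hypothetical file name
# A small synthetic stand-in keeps this sketch runnable without the file.
n <- 400
wine <- data.frame(
  alcohol          = rnorm(n, mean = 10.5, sd = 1),
  volatile.acidity = rnorm(n, mean = 0.5, sd = 0.15),
  sulphates        = rnorm(n, mean = 0.65, sd = 0.15)
)
score <- 5.6 + 0.8 * (wine$alcohol - 10.5) -
  3 * (wine$volatile.acidity - 0.5) + rnorm(n, sd = 0.6)
wine$quality <- pmin(10, pmax(0, round(score)))  # 0..10, i.e. 11 points

# Step 1 (EDA): linear correlation matrix and histogram of quality.
cor_matrix <- cor(wine)
print(round(cor_matrix, 2))
# Drop `plot = FALSE` to draw the histogram instead of only computing it.
quality_hist <- hist(wine$quality, plot = FALSE)

# Step 2 (Rule Engine): $quality$ > 6.5 => "good", TRUE => "bad".
wine$quality_class <- factor(
  ifelse(wine$quality > 6.5, "good", "bad"),
  levels = c("bad", "good")
)

# Step 3 (Column Filter): remove the original quality score to avoid leakage.
model_data <- wine[, setdiff(names(wine), "quality")]

# Step 4 (Partitioning): stratified 75%/25% split by class.
train_idx <- unlist(lapply(
  split(seq_len(nrow(model_data)), model_data$quality_class),
  function(idx) idx[sample.int(length(idx), floor(0.75 * length(idx)))]
), use.names = FALSE)
train <- model_data[train_idx, ]
test  <- model_data[-train_idx, ]

# Step 5 (Decision Tree Learner): fit a classification tree on the train split.
tree <- rpart(quality_class ~ ., data = train, method = "class")

# Steps 6-7 (Decision Tree Predictor): class probabilities on the test split.
prob_good <- predict(tree, newdata = test, type = "prob")[, "good"]

# Step 8 (ROC): AUC via the rank-based (Mann-Whitney) formula.
is_good <- test$quality_class == "good"
ranks   <- rank(prob_good)  # average ranks handle tied probabilities
n_pos   <- sum(is_good)
n_neg   <- sum(!is_good)
auc <- (sum(ranks[is_good]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
cat("AUC:", round(auc, 3), "\n")

# Top 3 predictors by rpart's variable importance (NULL if the tree has no splits).
top3 <- head(names(tree$variable.importance), 3)
print(top3)
```

To address the red vs. white comparison, the same script could be run once per data file and the two `top3` vectors compared side by side. Switching from stratified to purely random partitioning would only require replacing the `train_idx` computation with a single `sample()` call over all row indices.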