Student Name:
Student Number:
Course Name/Code:
School/Department:
Course Instructor:
Question 1: Drawing graphs and linear regression analysis
1.
2. Regression lines
Question 2:
1.
Means
X1    X2    X3    X4
9     9     9     9
2.
Means
Y1          Y2          Y3          Y4
7.500909    7.500909    7.5         7.500909
3.
Variance
X1    X2    X3    X4
11    11    11    11
4.
Sample variance
Y1          Y2          Y3          Y4
4.127269    4.127629    4.12262     4.123249
5.
Model              Intercept
Model 1 (y1~x1)    3.001
Model 2 (y2~x2)    3.001
Model 3 (y3~x3)    3.0025
Model 4 (y4~x4)    3.0017
6.
Model              Slope
Model 1 (y1~x1)    0.5001
Model 2 (y2~x2)    0.500
Model 3 (y3~x3)    0.4997
Model 4 (y4~x4)    0.4999
7.
#Pearson correlation
> #correlation 1
> cor(x1,y1)
[1] 0.8164205
> #correlation 2
> cor(x2,y2)
[1] 0.8162365
> #correlation 3
> cor(x3,y3)
[1] 0.8162867
> #correlation 4
> cor(x4,y4)
[1] 0.8165214
8.
> #coefficient of determination
> #coeff 1
> (cor(x1,y1))^2
[1] 0.6665425
> #coeff 2
> (cor(x2,y2))^2
[1] 0.666242
> #coeff 3
> (cor(x3,y3))^2
[1] 0.666324
> #coeff 4
> (cor(x4,y4))^2
[1] 0.6667073
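For a simple linear regression with one predictor, the squared Pearson correlation equals the R-squared reported by the fitted model. The sketch below checks this for the first pair. It assumes that R's built-in `anscombe` data frame can stand in for the assignment file, which is not available here; that substitution is an assumption made for illustration only.

```r
# Assumption: the built-in `anscombe` data frame stands in for assignment.csv
data(anscombe)

# Squared Pearson correlation for the first pair
r2_from_cor <- cor(anscombe$x1, anscombe$y1)^2

# R-squared reported by the fitted simple linear model
r2_from_lm <- summary(lm(y1 ~ x1, data = anscombe))$r.squared

# The two values agree up to floating-point error
print(all.equal(r2_from_cor, r2_from_lm))
```

The same check can be repeated for the other three pairs by changing the column names.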
9.
Model              P-value
Model 1 (y1~x1)    0.00217
Model 2 (y2~x2)    0.002179
Model 3 (y3~x3)    0.002176
Model 4 (y4~x4)    0.002165
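Slope p-values like those tabulated above can also be extracted directly from the coefficient table of `summary()` instead of being read off the printed output. This is a minimal sketch: `slope_p_value` is a hypothetical helper introduced here, and the built-in `anscombe` data frame is assumed to stand in for the assignment data.

```r
# Assumption: the built-in `anscombe` data frame stands in for assignment.csv
data(anscombe)

# Hypothetical helper: fit y ~ x and return the p-value of the slope term
slope_p_value <- function(x, y) {
  fit <- lm(y ~ x)
  # The coefficient table has rows "(Intercept)" and "x"
  summary(fit)$coefficients["x", "Pr(>|t|)"]
}

# Collect the slope p-value for each of the four models
p_values <- c(
  model1 = slope_p_value(anscombe$x1, anscombe$y1),
  model2 = slope_p_value(anscombe$x2, anscombe$y2),
  model3 = slope_p_value(anscombe$x3, anscombe$y3),
  model4 = slope_p_value(anscombe$x4, anscombe$y4)
)
print(round(p_values, 6))
```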
R CODE
#Parameter calculation
#importing and reading data into R
data<-read.csv("C:/Users/MYLES/OneDrive/Desktop/assignment.csv",header=TRUE)
#Printing the data
data
#Extracting the variables (columns) into vectors
x1<-data$x1
x2<-data$x2
x3<-data$x3
x4<-data$x4
y1<-data$y1
y2<-data$y2
y3<-data$y3
y4<-data$y4
#calculating the means
mean(x1)
mean(x2)
mean(x3)
mean(x4)
mean(y1)
mean(y2)
mean(y3)
mean(y4)
#calculating the sample variances
var(x1)
var(x2)
var(x3)
var(x4)
var(y1)
var(y2)
var(y3)
var(y4)
#model 1
lmodel1<-lm(y1 ~ x1)
lmodel1
#model 2
lmodel2<-lm(y2 ~ x2)
lmodel2
#model 3
lmodel3<-lm(y3 ~ x3)
lmodel3
#model 4
lmodel4<-lm(y4 ~ x4)
lmodel4
#Pearson correlation
#correlation 1
cor(x1,y1)
#correlation 2
cor(x2,y2)
#correlation 3
cor(x3,y3)
#correlation 4
cor(x4,y4)
#coefficient of determination
#coeff 1
(cor(x1,y1))^2
#coeff 2
(cor(x2,y2))^2
#coeff 3
(cor(x3,y3))^2
#coeff 4
(cor(x4,y4))^2
#p-values
#model 1
summary(lmodel1)
#model 2
summary(lmodel2)
#model 3
summary(lmodel3)
#model 4
summary(lmodel4)
#R Code for scatter plot
#importing and reading data into R
data<-read.csv("C:/Users/MYLES/OneDrive/Desktop/assignment.csv",header=TRUE)
#Printing the data
data
#Extracting the variables (columns) into vectors
x1<-data$x1
x2<-data$x2
x3<-data$x3
x4<-data$x4
y1<-data$y1
y2<-data$y2
y3<-data$y3
y4<-data$y4
#drawing scatterplots
#Scatterplot 1
plot(x1, y1, main = "Scatterplot for x1 and y1", xlab = "x1", ylab = "y1", pch = 19, frame = FALSE)
#Scatterplot 2
plot(x2, y2, main = "Scatterplot for x2 and y2", xlab = "x2", ylab = "y2", pch = 19, frame = FALSE)
#Scatterplot 3
plot(x3, y3, main = "Scatterplot for x3 and y3", xlab = "x3", ylab = "y3", pch = 19, frame = FALSE)
#Scatterplot 4
plot(x4, y4, main = "Scatterplot for x4 and y4", xlab = "x4", ylab = "y4", pch = 19, frame = FALSE)
#plotting line 4
plot(x4, y4, main = "regression line for x4 and y4", xlab = "x4", ylab = "y4", pch = 19, frame = FALSE)
#drawing a regression line
abline(lm(y4 ~ x4, data = data), col = "blue")
#plotting line 3
plot(x3, y3, main = "regression line for x3 and y3", xlab = "x3", ylab = "y3", pch = 19, frame = FALSE)
#drawing a regression line
abline(lm(y3 ~ x3, data = data), col = "blue")
#plotting line 2
plot(x2, y2, main = "regression line for x2 and y2", xlab = "x2", ylab = "y2", pch = 19, frame = FALSE)
#drawing a regression line
abline(lm(y2 ~ x2, data = data), col = "blue")
#plotting line 1
plot(x1, y1, main = "regression line for x1 and y1", xlab = "x1", ylab = "y1", pch = 19, frame = FALSE)
#drawing a regression line
abline(lm(y1 ~ x1, data = data), col = "blue")
Discussion
The scatterplot of y1 against x1 and the scatterplot of y3 against x3 each show a
linear relationship between the two variables. However, the scatterplot of y2 against x2 shows a
nonlinear pattern: the plotted points form a curve rather than a straight line. The points in the
scatterplot of y4 against x4 form a vertical line, with one outlier to the far right. In
general, linear regression modelling is suitable for modelling the relationship
between x1 and y1, x3 and y3, and x4 and y4.
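One way to check a visual impression of curvature is to plot the residuals of the fitted line against the predictor: a systematic pattern in the residuals suggests that a straight line does not describe the data adequately. The sketch below does this for the second pair, assuming the built-in `anscombe` data frame as a stand-in for the assignment data.

```r
# Assumption: the built-in `anscombe` data frame stands in for assignment.csv
data(anscombe)

# Fit the straight-line model for the second pair
fit2 <- lm(y2 ~ x2, data = anscombe)

# Residuals against the predictor; a curved pattern points to nonlinearity
plot(anscombe$x2, resid(fit2), main = "Residuals for y2 ~ x2",
     xlab = "x2", ylab = "residual", pch = 19)

# Reference line at zero
abline(h = 0, lty = 2)
```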
From the plotted regression lines, the degree of association is highest for the regression
model between x3 and y3, since most of the plotted points fall along the fitted regression line. In
contrast, the degree of association is lowest for x4 and y4, because only two plotted points fall
along the fitted regression line.
The mean of each of x1, x2, x3, and x4 is 9, with a sample variance of 11. The
sample means of y1, y2, y3, and y4 are approximately 7.5, with sample variances of approximately 4.1. The
four fitted regression models each have an intercept of approximately 3
and a slope of approximately 0.5. Therefore, a one-unit increase in x1, x2, x3, or
x4 increases the predicted value of the corresponding y1, y2, y3, or y4 by approximately 0.5.
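The link between variance and standard deviation, and the reading of the slope as the change in the predicted y per unit of x, can both be checked numerically. This is a minimal sketch for the first pair only, again assuming the built-in `anscombe` data frame as a stand-in for the assignment data.

```r
# Assumption: the built-in `anscombe` data frame stands in for assignment.csv
data(anscombe)

# The standard deviation is the square root of the sample variance
v <- var(anscombe$y1)
s <- sd(anscombe$y1)
print(c(variance = v, std_dev = s))

# Fit the first model and predict y1 at x1 = 9 and x1 = 10
fit <- lm(y1 ~ x1, data = anscombe)
pred <- predict(fit, newdata = data.frame(x1 = c(9, 10)))

# The difference between the two predictions equals the fitted slope
slope_from_pred <- unname(diff(pred))
slope_from_fit <- unname(coef(fit)["x1"])
print(c(from_predictions = slope_from_pred, from_coefficients = slope_from_fit))
```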