Name: _______________________________________________
SAS® Forecasting Project for Critical Thinking
This project utilizes the “Real Estate – Base” database. The purpose is twofold:
-
Build critical thinking skills needed to structure data analysis appropriately for effective decision
making.
Analyze available data practically and skillfully in order to build an explanatory regression model.
The Real Estate - Base database includes the following variables for 101 homes (* NOTE: These variables
are shown as qualitative variables within the database):
a.
b.
c.
d.
e.
f.
g.
h.
i.
j.
k.
l.
*Unit#
*Type
*Location
*U/S/R
Price
Sq. Ft.
Lot (Acres)
Garage
BRs
Baths
*Pool
Age
(An assigned database key)
(H = House, C = Condo/Apartment)
(1 through 10 – voting district where located)
(Urban vs. Suburban vs. Rural location)
(The price the house ended up selling for in 2017)
(Heated/Cooled & Attached square footage)
(Acreage of property)
(Number of attached covered and/or enclosed parking positions)
(Number of qualified bedrooms)
(Number of bathrooms – no tub or shower indicated as .5)
(No=No Access; HA=Shared Pool; AG=Above Ground; IG=In Ground)
(Age of home in rounded year at end of 2017)
At a high level, here are the steps you are going to perform:
1. Download the Excel spreadsheet with the Real Estate Data in it and create the requested
Scatterplots. NOTE: It is important that the Dependent Variable (Price) is on the Y-axis and the
Independent Variable is on the X-axis. The order of the two columns will dictate that.
2. Perform Regression Analysis within Excel to determine how well the prescribed Independent
Variables explain changes in the Dependent Variable.
3. Upload the Real Estate dataset into SAS Studio.
4. Perform a series of Regression Analyses in SAS Studio to find a better set of explanatory
variables.
5. Answer a critical thinking exercise regarding forecasting and the data set we have.
Here are the steps in detail:
1. Create the following charts in Excel using the charting tools and the indicated variables in “Real
Estate - Base.xlsx” (Remember, Price is your Dependent Variable)
a. Create a new tab in the spreadsheet called “Scatterplots”. After creating each
Scatterplot on the original tab, move it to the Scatterplot tab you created.
b. Create a Scatterplot using the variables Price and Sq. Ft.
c. Create a Scatterplot using the variables Price and Lot (Acres).
d. Create a Scatterplot using the variables Price and Garage.
e. Create a Scatterplot using the variables Price and BRs.
f. Create a Scatterplot using the variables Price and Baths.
g. Create a Scatterplot using the variables Price and Age.
2. What sort of relationship do you see between these variables based on the scatterplots?
a. Between Price and Sq. Ft. (Circle)?
No relationship
Weak
Moderate
Strong
Moderate
Strong
Moderate
Strong
Moderate
Strong
Moderate
Strong
Moderate
Strong
b. Between Price and Lot (Circle)?
No relationship
Weak
c. Between Price and Garage (Circle)?
No relationship
Weak
d. Between Price and BRs (Circle)?
No relationship
Weak
e. Between Price and Baths (Circle)?
No relationship
f.
Weak
Between Price and Age (Circle)?
No relationship
Weak
3. In the Excel spreadsheet provided, using the Data Analysis Add-in, run a regression analysis with
Price as the Dependent Variable and Lot, Garage and BRs as the Independent Variables and
select to have Excel create a new tab called “Regression Model”. It is recommended that you
run individual regressions with each variable alone to see how strong each R2 is.
4. Provide the following from the “Excel Model”:
a. Coefficient of Determination (R-squared)
___________________
b. Y-Intercept for the Regression Model
___________________
c. Slope value for X1 (Lot)
___________________
d. Slope value for X2 (Garage)
___________________
e. Slope value for X3 (BRs)
___________________
5. Do you think we need all three current Independent variables in our Regression model to
predict changes in Price (Circle)?
Yes
No
Explain: _________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
6. Which variable(s) would you remove (Circle)?
Lot Size
Garage
BRs
7. Of the following variables in the spreadsheet, which variable would you select next to add to the
model (i.e., you think it would create a stronger prediction of Price)?
Type
Location
U/S/R
Sq. Ft.
Baths
Pool
Age
8. Run a SAS Regression Model on the Real Estate – Base database using Price as the Dependent
Variable (Y) and include the original Independent Variables (minus any you removed in step 6)
and adding the variable you chose in step 7. Print your model output and turn it in with the
assignment. (NOTE: You may have to repeat this exercise until you find a combination of
variables that gives you a higher R2).
9. Provide the following from the SAS Model:
a. Coefficient of Determination (R-squared ).
________________________
b. Y-Intercept for the Regression Model
________________________
c. Slope value for each of your Independent Variables.
i. Var_______________________
________________________
ii. Var_______________________
________________________
iii. Var_______________________
________________________
iv. Var_______________________
________________________
v. Var_______________________
________________________
10. Did your SAS model provide a stronger Coefficient of Determination (Circle)?
Yes
No
Critical Thinking Question:
11. A large real estate company is trying to use similar data plus their own sales data to forecast
total sales for the coming year for each of their agents and they have pulled data from their
Finance records. They are trying to assemble the best data to build a Regression model.
a. Would it make sense to use the same data as we used above in the SAS model? Why or
why not?
__________________________________________________________________________________
__________________________________________________________________________________
b. Recommend two data elements you think they probably have available to help them
predict sales for each of their sales people.
1. ______________________________________________
2. ______________________________________________
GRADING RUBRIC
Overall Score Possible = 100
Problem Area
Did the student create the Excel tab for Scatterplots?
Possible Points
2
Did the student create the correct scatterplots and move
them to the new tab?
3
Did the student make a selection for each type of
relationship?
5
Did the student run Data Analysis on the Excel spreadsheet
creating a new tab for the model output?
10
Did the student provide the correct model output values from
the spreadsheet in the problem document?
10
Did the student answer Critical Thinking questions 5, 6 and 7?
20
Did the student run a regression model in SAS and provide a
print out of the model output?
20
Did the student provide the correct model output values from
SAS in the problem document and answer the decision
problem (#10)?
10
Did the student complete all parts of the Critical Thinking
problem #11?
20
Total Critical Thinking Points
100
Points Awarded
Unit #
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
Type
H
H
H
H
H
H
H
H
H
C
H
H
H
H
H
H
H
H
H
C
H
H
H
H
H
H
H
H
H
C
H
H
H
H
H
H
H
H
H
C
H
H
H
H
H
H
Location
10
2
5
6
9
1
3
7
4
8
10
2
5
6
9
1
3
7
4
8
10
2
5
6
9
1
3
7
4
8
10
2
5
6
9
1
3
7
4
8
10
2
5
6
9
1
U/S/R
R
U
S
S
S
U
S
R
S
U
R
U
S
S
S
U
S
R
S
U
R
U
S
S
S
U
S
R
S
U
R
U
S
S
S
U
S
R
S
U
R
U
S
S
S
U
Price
$ 54.000
$ 98.000
$ 125.700
$ 250.000
$ 411.500
$ 56.500
$ 289.500
$ 420.000
$ 199.800
$ 249.900
$ 77.000
$ 78.600
$ 199.800
$ 279.500
$ 842.000
$ 66.720
$ 311.450
$ 311.520
$ 187.500
$ 311.750
$ 98.000
$ 112.000
$ 146.850
$ 301.500
$ 690.000
$ 71.200
$ 275.000
$ 598.230
$ 176.500
$ 405.200
$ 68.521
$ 101.500
$ 117.650
$ 266.000
$ 601.500
$ 39.800
$ 401.500
$ 782.000
$ 201.500
$ 199.650
$ 119.500
$ 88.420
$ 188.500
$ 231.100
$ 485.200
$ 48.999
Sq. Ft.
1100
1875
1350
2612
2190
1800
1605
2199
2120
900
1950
1420
2090
2770
3650
1600
2288
2000
1880
980
3011
2980
1850
3520
3300
1905
2850
3250
1900
1150
2015
2190
1750
2190
3450
1064
2540
4200
1980
850
1865
1750
1700
2045
2700
1550
Lot (Acres)
2
0,25
0,25
0,5
0,5
0,25
0,25
12
0,4
0
1
0,5
0,75
0,5
1
0,25
0,5
1,5
0,25
0
3
0,4
0,25
0,5
0,75
0,5
0,25
10
0,4
0
1,5
0,66
0,66
1
0,75
0,5
0,75
5
0,66
0
14
0,75
0,5
0,5
0,5
0,75
Garage
0
1
0
2
1
0
2
2
2
0
1
0
2
2
3
1
2
2
1
1
1
2
0
3
3
1
2
3
2
1
1
1
1
2
2
0
2
3
2
0
1
1
2
1
2
1
BRs
2
3
2
3
3
3
3
3
3
2
2
2
3
3
5
3
3
3
3
2
4
3
3
4
4
3
3
4
3
3
3
3
3
3
3
2
4
5
3
2
3
3
3
3
3
3
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
H
H
H
C
H
H
H
H
H
H
H
H
H
C
H
H
H
H
H
H
H
H
H
C
H
H
H
H
H
H
H
H
H
C
H
H
H
H
H
H
H
H
H
C
H
H
H
3
7
4
8
10
2
5
6
9
1
3
7
4
8
10
2
5
6
9
1
3
7
4
8
10
2
5
6
9
1
3
7
4
8
10
2
5
6
9
1
3
7
4
8
10
2
5
S
R
S
U
R
U
S
S
S
U
S
R
S
U
R
U
S
S
S
U
S
R
S
U
R
U
S
S
S
U
S
R
S
U
R
U
S
S
S
U
S
R
S
U
R
U
S
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
366.500
356.420
157.650
288.500
49.874
91.640
179.500
189.500
532.800
52.100
399.500
388.600
175.800
301.500
95.400
96.888
171.630
207.500
577.900
49.875
247.800
497.500
205.000
469.800
77.000
91.400
152.800
401.500
505.000
58.700
285.235
675.500
188.760
302.900
171.680
84.600
166.900
366.900
411.960
68.900
297.600
524.700
181.500
312.800
88.520
79.450
151.960
2390
2050
1830
1014
1450
1800
2015
1950
2888
2012
2450
3450
2200
1050
2220
1995
2100
2750
3120
1011
2120
3890
2100
1250
1090
1900
1040
3850
2950
1000
2850
2740
1850
900
2950
1640
1800
3200
2400
2200
3300
4350
1800
940
1750
1490
1500
0,5
9
0,25
0
0,5
0,5
0,75
0,5
0,5
0,4
0,5
4
0,4
0
8
0,5
1
0,75
0,75
0,25
0,5
22
0,66
0
2,5
0,4
0,25
1
0,5
0,4
0,5
75
0,25
0
11
0,75
0,8
0,75
0,5
0,5
0,5
11
0,4
0
4
0,5
0,5
2
2
1
1
0
2
1
2
2
1
2
2
2
1
2
1
2
2
2
0
2
3
2
2
0
1
1
2
2
1
1
2
1
1
2
1
2
3
2
2
2
2
2
1
1
0
1
4
3
2
2
2
3
3
3
4
3
3
3
3
2
3
2
3
3
4
2
2
4
4
3
3
2
2
4
3
2
3
3
2
2
3
2
3
4
3
3
4
4
3
2
2
3
3
94
95
96
97
98
99
100
101
H
H
H
H
H
H
C
H
6
9
1
3
7
4
8
10
S
S
U
S
R
S
U
R
$
$
$
$
$
$
$
$
302.900
489.650
64.995
400.500
711.000
172.450
345.900
81.400
2175
2550
850
2752
4540
1590
980
1275
1
0,5
0,25
1
18
0,5
0
2
2
2
0
2
2
1
1
1
3
3
2
3
5
2
3
2
Baths
1
2
1,5
2
2
1
2
2,5
2
2
2
2
2
2,5
5
1,5
2
2
2
1,5
2
2
2
2,5
3,5
1,5
2
2
2
2,5
1,5
2,5
1,5
2,5
3
1,5
2,5
2,5
2
2
2
2
2
2
2,5
2
Pool
No
No
AG
No
No
No
HA
No
No
HA
No
No
IG
HA
HA
No
No
IG
IG
HA
AG
No
No
IG
HA
AG
No
No
No
HA
No
IG
No
HA
IG
No
No
No
HA
HA
No
AG
No
No
No
No
Age
27
26
82
11
17
21
6
72
15
4
12
16
22
9
4
28
11
21
9
5
35
4
11
3
9
37
5
2
3
0
38
16
22
8
6
31
9
4
8
6
17
21
15
8
15
29
2
2
2
2
1
2,5
2
2,5
2,5
2
2,5
2,5
2,5
2
2
1,5
2
2
3
2
2
3,5
2,5
2,5
1
2
2
3
3
2
2,5
2
2
2
2
2
2
2,5
2
2
2,5
3
2
2
2
1,5
2
No
No
No
HA
No
No
No
No
No
No
No
No
No
HA
IG
No
No
IG
HA
No
HA
IG
HA
HA
AG
No
IG
HA
IG
AG
IG
IG
No
HA
AG
No
No
HA
No
No
HA
No
No
HA
No
No
No
13
17
8
2
36
9
12
4
4
16
7
37
2
1
21
15
36
7
2
14
6
3
4
0
35
4
3
7
1
25
2
15
4
1
5
7
2
7
9
17
8
3
12
7
37
32
17
2
2,5
1,5
2,5
3
2
2
1,5
No
No
No
HA
No
IG
HA
No
11
11
12
6
14
9
2
24
Name: _______________________________________________
SAS® Forecasting Project for Critical Thinking
This project utilizes the “Real Estate – Base” database. The purpose is twofold:
-
Build critical thinking skills needed to structure data analysis appropriately for effective decision
making.
Analyze available data practically and skillfully in order to build an explanatory regression model.
The Real Estate - Base database includes the following variables for 101 homes (* NOTE: These variables
are shown as qualitative variables within the database):
a.
b.
c.
d.
e.
f.
g.
h.
i.
j.
k.
l.
*Unit#
*Type
*Location
*U/S/R
Price
Sq. Ft.
Lot (Acres)
Garage
BRs
Baths
*Pool
Age
(An assigned database key)
(H = House, C = Condo/Apartment)
(1 through 10 – voting district where located)
(Urban vs. Suburban vs. Rural location)
(The price the house ended up selling for in 2017)
(Heated/Cooled & Attached square footage)
(Acreage of property)
(Number of attached covered and/or enclosed parking positions)
(Number of qualified bedrooms)
(Number of bathrooms – no tub or shower indicated as .5)
(No=No Access; HA=Shared Pool; AG=Above Ground; IG=In Ground)
(Age of home in rounded year at end of 2017)
At a high level, here are the steps you are going to perform:
1. Download the Excel spreadsheet with the Real Estate Data in it and create the requested
Scatterplots. NOTE: It is important that the Dependent Variable (Price) is on the Y-axis and the
Independent Variable is on the X-axis. The order of the two columns will dictate that.
2. Perform Regression Analysis within Excel to determine how well the prescribed Independent
Variables explain changes in the Dependent Variable.
3. Upload the Real Estate dataset into SAS Studio.
4. Perform a series of Regression Analyses in SAS Studio to find a better set of explanatory
variables.
5. Answer a critical thinking exercise regarding forecasting and the data set we have.
Here are the steps in detail:
1. Create the following charts in Excel using the charting tools and the indicated variables in “Real
Estate - Base.xlsx” (Remember, Price is your Dependent Variable)
a. Create a new tab in the spreadsheet called “Scatterplots”. After creating each
Scatterplot on the original tab, move it to the Scatterplot tab you created.
b. Create a Scatterplot using the variables Price and Sq. Ft.
c. Create a Scatterplot using the variables Price and Lot (Acres).
d. Create a Scatterplot using the variables Price and Garage.
e. Create a Scatterplot using the variables Price and BRs.
f. Create a Scatterplot using the variables Price and Baths.
g. Create a Scatterplot using the variables Price and Age.
2. What sort of relationship do you see between these variables based on the scatterplots?
a. Between Price and Sq. Ft. (Circle)?
No relationship
Weak
Moderate
Strong
Moderate
Strong
Moderate
Strong
Moderate
Strong
Moderate
Strong
Moderate
Strong
b. Between Price and Lot (Circle)?
No relationship
Weak
c. Between Price and Garage (Circle)?
No relationship
Weak
d. Between Price and BRs (Circle)?
No relationship
Weak
e. Between Price and Baths (Circle)?
No relationship
f.
Weak
Between Price and Age (Circle)?
No relationship
Weak
3. In the Excel spreadsheet provided, using the Data Analysis Add-in, run a regression analysis with
Price as the Dependent Variable and Lot, Garage and BRs as the Independent Variables and
select to have Excel create a new tab called “Regression Model”. It is recommended that you
run individual regressions with each variable alone to see how strong each R2 is.
4. Provide the following from the “Excel Model”:
a. Coefficient of Determination (R-squared)
___________________
b. Y-Intercept for the Regression Model
___________________
c. Slope value for X1 (Lot)
___________________
d. Slope value for X2 (Garage)
___________________
e. Slope value for X3 (BRs)
___________________
5. Do you think we need all three current Independent variables in our Regression model to
predict changes in Price (Circle)?
Yes
No
Explain: _________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
6. Which variable(s) would you remove (Circle)?
Lot Size
Garage
BRs
7. Of the following variables in the spreadsheet, which variable would you select next to add to the
model (i.e., you think it would create a stronger prediction of Price)?
Type
Location
U/S/R
Sq. Ft.
Baths
Pool
Age
8. Run a SAS Regression Model on the Real Estate – Base database using Price as the Dependent
Variable (Y) and include the original Independent Variables (minus any you removed in step 6)
and adding the variable you chose in step 7. Print your model output and turn it in with the
assignment. (NOTE: You may have to repeat this exercise until you find a combination of
variables that gives you a higher R2).
9. Provide the following from the SAS Model:
a. Coefficient of Determination (R-squared ).
________________________
b. Y-Intercept for the Regression Model
________________________
c. Slope value for each of your Independent Variables.
i. Var_______________________
________________________
ii. Var_______________________
________________________
iii. Var_______________________
________________________
iv. Var_______________________
________________________
v. Var_______________________
________________________
10. Did your SAS model provide a stronger Coefficient of Determination (Circle)?
Yes
No
Critical Thinking Question:
11. A large real estate company is trying to use similar data plus their own sales data to forecast
total sales for the coming year for each of their agents and they have pulled data from their
Finance records. They are trying to assemble the best data to build a Regression model.
a. Would it make sense to use the same data as we used above in the SAS model? Why or
why not?
__________________________________________________________________________________
__________________________________________________________________________________
b. Recommend two data elements you think they probably have available to help them
predict sales for each of their sales people.
1. ______________________________________________
2. ______________________________________________
GRADING RUBRIC
Overall Score Possible = 100
Problem Area
Did the student create the Excel tab for Scatterplots?
Possible Points
2
Did the student create the correct scatterplots and move
them to the new tab?
3
Did the student make a selection for each type of
relationship?
5
Did the student run Data Analysis on the Excel spreadsheet
creating a new tab for the model output?
10
Did the student provide the correct model output values from
the spreadsheet in the problem document?
10
Did the student answer Critical Thinking questions 5, 6 and 7?
20
Did the student run a regression model in SAS and provide a
print out of the model output?
20
Did the student provide the correct model output values from
SAS in the problem document and answer the decision
problem (#10)?
10
Did the student complete all parts of the Critical Thinking
problem #11?
20
Total Critical Thinking Points
100
Points Awarded
Purchase answer to see full
attachment