Writing Project: Checklist for the results section-written portion
_____ The results section clearly states the methodology being used.
_____ Dummy variables are used correctly to incorporate zip code in the model.
_____ Multicollinearity is discussed and addressed; correlation coefficients are calculated to identify
potential multicollinearity.
_____ Outliers and observations with unusually high leverage are identified and addressed.
_____ The statistical significance of coefficients is discussed and addressed; the possibility of eliminating
variables with low statistical significance is considered.
_____ A partial F-test is conducted for the joint statistical significance of the dummy variables.
_____ There is a clear discussion of each model specification presented and why it is preferred or not
preferred to other possible specifications.
_____ Both the magnitude and the statistical significance of the coefficients in the preferred
specification are discussed.
_____ The discussion of the regression coefficients uses the appropriate units and relevant magnitudes.
Writing Project: Checklist for the results section-table of regression results
_____ The table of regression results has a title.
_____ The table of regression results is laid out in the conventional format and self-explanatory.
_____ The table of regression results contains the coefficients, t-stats or p-values for each, the
R-squared and adjusted R-squared for each model, and the sample size (if it changes from model
to model).
_____ Coefficients and p-values in the regression results table are rounded consistently.
_____ The text and table(s) are well integrated; the text refers to the table of results where appropriate.
Comments/questions:
Vanessa
Home     Price ($)  Bedrooms  Bathrooms  House Ft2  Lot Ft2  Year Built  Zip Code  Address
Home 1     710,000         3          2      1,780    1,306        2015     98122  913 29th Ave
Home 2     497,500         2          3        880    1,379        2006     98122  711 26th Ave
Home 3     625,000         2          2      1,130    2,347        2007     98122  822 17th Ave
Home 4     881,875         4          4      2,808    4,704        1906     98122  1715 Madrona Dr
Home 5     820,000         3        2.5      2,370    7,620        1994     98122  963 22nd Ave
Home 6     525,000         3          1      1,220    2,592        1902     98122  1221 E Jefferson St
Home 7     878,500         3        2.5      2,230    2,700        1987     98122  308 35th Ave
Home 8     856,000         3          2      1,600    4,000        1925     98122  356 27th Ave
Home 9     650,000         3          2      1,630    1,012        2016     98122  1415 E Fir St
Home 10    387,500         3          2      1,050    2,578        1903     98122  713 23rd Ave
Home 11    970,000         3          1      1,840    3,500        1904     98122  2515 E Yesler Way
Home 12    370,000         3          2      1,090    3,049        1901     98122  517 23rd Ave
Home 13    639,000         3          3      1,600    1,742        2016     98122  1523 19th Ave
Home 14    918,000         3        2.5      1,930    1,753        2016     98122  918 15th Ave
Home 15    855,000         3          2      1,260    3,841        1903     98122  723 19th Ave
House  Price $         Bedrooms  Bathrooms  House Ft2  Lot Ft2  Year built  Zipcode  Price (1000) $  Age
1      $1,325,000.00          4        3.5      3,150    5,640        1920    98109       $1,325.00  98 years
2        $625,000.00          3        1.5      1,440    3,400        1910    98109         $625.00  108 years
3        $705,000.00          3       2.75      2,260    4,000        1956    98109         $705.00  62 years
4        $546,000.00          3        1.5      1,850    9,240        1954    98109         $546.00  64 years
5      $1,165,000.00          2        1.5      2,130    6,534        1924    98109       $1,165.00  94 years
6        $751,000.00          4        4.5      1,910    2,090        2007    98109         $751.00  11 years
7      $1,166,000.00          4          3      2,760    4,200        1910    98109       $1,166.00  108 years
8      $1,100,000.00          3        3.5      2,160    3,458        1917    98109       $1,100.00  101 years
9      $1,025,000.00          4          2      2,220    4,000        1924    98109       $1,025.00  94 years
10     $1,820,000.00          5        3.5      3,200    4,000        1929    98109       $1,820.00  89 years
11     $1,675,000.00          3          2      3,200    7,840        1961    98109       $1,675.00  57 years
12       $865,000.00          3        1.5      2,640    3,092        1925    98109         $865.00  93 years
13     $1,285,000.00          5          4      3,600    6,373        1904    98109       $1,285.00  114 years
14       $740,000.00          4          3      1,812    1,429        2007    98109         $740.00  11 years
15     $2,300,000.00          4       3.25      3,500    4,046        2016    98109       $2,300.00  2 years
House  Price(1000)  Bed  Bath  House sqft  Lot sqft  Yr built  Zip    Add.
1              476    2     1        1510      2550      1928  98103  3645 Carr
2             1032    3   2.5        2370      3066      1929  98103  3728 Woodlawn
3              960    3  1.75        3400      8712      1921  98103  4212 Francis
4              575    2     1        1480      2680      1946  98103  1102 N 41
5              650    4     2        2290      4791      1922  98103  4019 Woodland
6             1706    3     5        4101      5702      1990  98103  4315 Bagley
7             1121    4     2        2930      3484      1922  98103  5800 Greenwood
8              378    3   1.5        1370      6840      1946  98103  3721 Eastern
9              860    3     2        2260      3191      1926  98103  1316 N39
10             789    2     1        1130      3301      1900  98103  4223 Dayton
11             672    3     3        1770      1714      2012  98103  3810 Linden
12             678    3     3        2210      1306      1998  98103  3652 Whitman
13             608    2     1        1830      3920      1909  98103  4028 Midvale
14             947    4     2        2250      4356      1912  98103  208 N 54
15            1565    4     3        3048      5227      2007  98103  711 N61
Address               Price ($1000s)  Bedrooms  Bathrooms  House FT2  Lot FT2  YR Built  Zip Code  Age
4018 31ST Ave W                  990         3          2       1810     4800      1955     98199   63
2519 34TH Ave W                  816         3          1       1870     5998      1938     98199   80
2560 31ST Ave W                  650         3          1       1270     5500      1928     98199   90
3051 21ST Ave W                  735         3          2       1660     2700      1928     98199   90
2409 Montavista Pl W            1938         4          4       4070     4791      1978     98199   40
4428 36TH Ave W                 1425         4          4       3180     4791      2005     98199   13
2915 W Garfield St              1010         3          2       2040     5000      1938     98199   80
2631 34TH Ave W                  851         3          2       1521     6534      1941     98199   77
4209 26TH Ave W                  705         3          2       1640     5776      1950     98199   68
4516 35TH Ave W                  742         2          1       1140     4791      1947     98199   71
4037 Williams Ave W              998         3          3       1880     5500      1947     98199   71
3637 W Commodore Way            1975         5          5       3850     8276      2017     98199    1
5436 40TH Ave W                 1186         2          2       1776     4791      1958     98199   60
2223 W Barrett St                755         3          2       2380     6098      1905     98199  113
4315 32ND Ave W                  675         3        1.5       1250     4800      1943     98122   75
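The checklist asks for zip code to enter the model through dummy variables, and two of the tables above add an Age column. The sketch below shows one way the four zip-code samples might be prepared for a regression. The helper names `zip_dummies` and `house_age`, the choice of 98122 as the omitted reference category, and the sale year of 2018 (inferred from the Age columns, e.g. a house built in 1920 listed as 98 years old) are assumptions for illustration, not part of the original data sheets.

```python
def zip_dummies(zip_codes, reference):
    """Return (column_names, rows) of 0/1 dummies, omitting the reference zip.

    Dropping one category avoids perfect multicollinearity with the intercept.
    """
    levels = sorted(set(zip_codes) - {reference})
    names = [f"zip_{z}" for z in levels]
    rows = [[1 if z == level else 0 for level in levels] for z in zip_codes]
    return names, rows


def house_age(year_built, sale_year=2018):
    # sale_year=2018 is an assumption inferred from the Age columns above.
    return sale_year - year_built


# One row from each table above: (price in $1000s, house sq ft, year built, zip)
homes = [
    (710.0, 1780, 2015, "98122"),
    (1325.0, 3150, 1920, "98109"),
    (476.0, 1510, 1928, "98103"),
    (990.0, 1810, 1955, "98199"),
]

names, dummies = zip_dummies([h[3] for h in homes], reference="98122")
ages = [house_age(h[2]) for h in homes]

print(names)
print(dummies)
print(ages)
```

With four zip codes, only three dummy columns are created; the coefficient on each dummy is then read as the price difference relative to the omitted zip code, holding the other regressors constant.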
Empirical Results:
For this writing project concerning unemployment in Oregon for 2009, I estimated
regression models for two separate dependent variables: percentage of male
unemployment and percentage of female unemployment. For each dependent variable, I
estimated three different specifications. The independent variables used in these regression
models are (1) population of 18 to 24 year olds as a percentage, (2) population of naturalized
foreign-born citizens as a percentage, (3) population of foreign-born non-citizens as a
percentage, (4) black population as a percentage, (5) Asian population as a percentage, (6)
Hispanic population as a percentage, (7) percentage of people who have attained a college or
graduate degree, (8) percentage of people who work for the government, (9) percentage of
people who work in construction, and (10) percentage of people who work white collar jobs.
Male Unemployment:
Table 2 (on Page 6) shows the regression results for male unemployment. In the first
specification, I regress male unemployment on all ten of the independent variables. Because the
R-Squared Adjusted is relatively low, at about 0.35, I looked to eliminate variables with low
t-ratios and high p-values in order to increase this value. Variables with low t-ratios and high
p-values are unlikely to be statistically significant at a 1% significance level. In this
specification, the variables with the lowest t-ratios (in absolute value) were both independent
variables concerning citizenship: population of naturalized foreign-born citizens as a percentage
and population of foreign-born non-citizens as a percentage. With t-ratios of 0.02 and 0.23,
respectively, these variables appear to be statistically insignificant.
In the second specification for male unemployment, I eliminate both independent
variables concerning citizenship. Because the R-Squared Adjusted increases to around 0.39, it
seems that these two variables add little to the model. To confirm this, I performed a
partial F-test of their joint significance. With a null hypothesis that the coefficients on both
variables are zero, an alternative hypothesis that at least one coefficient is nonzero, and an
F statistic of 0.03, I fail to reject the null hypothesis: these two variables are not jointly
statistically significant at a 1% significance level. Thus, it is acceptable to eliminate these
two variables from the model. Nonetheless, I still looked to eliminate variables with low
t-ratios and high p-values in order to increase the R-Squared Adjusted. In this specification,
the variables with the lowest t-ratios (in absolute value), apart from age, were all independent
variables concerning race and ethnicity: black population as a percentage, Asian population as a
percentage, and Hispanic population as a percentage. With t-ratios of 0.67, -0.43, and -0.40,
respectively, these variables appear to be statistically insignificant. Although the variable for
age (population of 18 to 24 year olds as a percentage) actually has the lowest t-ratio in this
specification (0.36), this variable will not be eliminated because it is a key part of answering
the research question of this project.
In the final specification for male unemployment, I eliminate not only both independent
variables concerning citizenship (population of naturalized foreign-born citizens as a percentage
and population of foreign-born non-citizens as a percentage) but also all independent variables
concerning race and ethnicity (black population as a percentage, Asian population as a
percentage, and Hispanic population as a percentage). With an increased R-Squared Adjusted of
around 0.44, it seems as if all of the race and ethnicity variables are indeed insignificant to the
model. To confirm this, I performed a partial F-test to test these variables’ significance to the
Klutho 5
overall model. With a null hypothesis that the coefficients on all three variables are zero, an
alternative hypothesis that at least one coefficient is nonzero, and an F statistic of about 0.24,
I fail to reject the null hypothesis: the race and ethnicity variables are not jointly
statistically significant at a 1% significance level. Thus, it is acceptable to eliminate these
variables from the overall model.
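The two partial F-tests described for male unemployment can be approximated from the R-Squared values reported in Table 2. The sketch below uses the standard R-squared form of the partial F statistic; the function name `partial_f` and the regressor counts (10 in the full model, 8 in the reduced model) are read off the table rather than stated by the author, and the inputs are the rounded R-Squared values, so the results only approximate the reported statistics of 0.03 and 0.24.

```python
def partial_f(r2_full, r2_reduced, n_obs, k_full, q):
    """Partial F statistic for dropping q regressors from a model with k_full regressors.

    F = [(R2_full - R2_reduced) / q] / [(1 - R2_full) / (n - k_full - 1)]
    """
    numerator = (r2_full - r2_reduced) / q
    denominator = (1.0 - r2_full) / (n_obs - k_full - 1)
    return numerator / denominator


N_OBS = 36  # number of observations reported under Table 2

# Specification (1) vs (2): drop the two citizenship variables.
f_citizenship = partial_f(0.533, 0.531, N_OBS, k_full=10, q=2)

# Specification (2) vs (3): drop the three race and ethnicity variables.
f_race = partial_f(0.531, 0.519, N_OBS, k_full=8, q=3)

print(f"citizenship variables: F = {f_citizenship:.2f}")
print(f"race/ethnicity variables: F = {f_race:.2f}")
```

Both statistics come out well below 1, in line with the reported values once rounding of the R-Squared inputs is allowed for. An F statistic below 1 also means the adjusted R-squared rises when the tested variables are dropped, which matches the increases from 0.346 to 0.393 and then to 0.439.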
Therefore, the final model for male unemployment includes only five independent
variables: (1) population of 18 to 24 year olds as a percentage, (2) percentage of people who
have attained a college or graduate degree, (3) percentage of people who work for the
government, (4) percentage of people who work in construction, and (5) percentage of people who
work white collar jobs. Although only one variable in this specification (college or graduate
degree) is significant according to t-tests, eliminating one or a combination of these variables
only served to decrease the R-Squared Adjusted.
Looking at the coefficients of these five variables, education has the largest estimated
association with male unemployment in Oregon. Holding all other variables constant, a
one-percentage-point increase in the population with college or graduate degrees is associated
with a 0.51 percentage point decrease in predicted male unemployment. White-collar employment
and construction employment also have relatively large coefficients. Holding all other variables
constant, a one-percentage-point increase in the share of employment in white-collar jobs is
associated with a 0.41 percentage point increase in predicted male unemployment, while a
one-percentage-point increase in the share of employment in construction is associated with a
0.36 percentage point decrease. Government employment also has a positive relationship with
male unemployment; a one-percentage-point increase in the share of people who work for the
government is associated with a 0.18 percentage point increase in predicted male unemployment.
Age has the smallest coefficient; holding all other variables constant, a one-percentage-point
increase in the population of 18 to 24 year olds is associated with a 0.07 percentage point
increase in predicted male unemployment. With a t-ratio of only 0.30, the effect of age on
predicted male unemployment is not statistically significant.
Klutho 6
Table 2: Regression Results for Male Unemployment in Oregon at County Level
Dependent Variable: Male Unemployment (in %)

Variable                                       (1) Full Model    (2) Reduced Model  (3) Final Model
Intercept                                      -0.03 (-0.21)     -0.02 (-0.12)      -0.03 (-0.23)
Pop, 18 to 24 Years, 2009 (in %)                0.13 (0.42)       0.10 (0.36)        0.07 (0.30)
Pop, Foreign Born Naturalized, 2009 (in %)      0.03 (0.02)
Pop, Foreign Born Not a Citizen, 2009 (in %)    0.09 (0.23)
Black Pop, 2009 (in %)                          0.39 (0.60)       0.42 (0.67)
Asian Pop, 2009 (in %)                         -0.26 (-0.41)     -0.23 (-0.43)
Hispanic Pop, 2009 (in %)                      -0.06 (-0.40)     -0.02 (-0.40)
College or Graduate Degree, 2009 (in %)        -0.52 (-2.02)     -0.49 (-2.41)*     -0.51 (-2.75)**
Employment, Government (in %)                   0.18 (1.04)       0.17 (1.12)        0.18 (1.63)
Employment, Construction, 2009 (in %)          -0.37 (-0.75)     -0.40 (-0.86)      -0.36 (-0.94)
Employment, White Collar, 2009 (in %)           0.42 (1.54)       0.39 (1.69)        0.41 (1.93)
R-Squared                                       0.533             0.531              0.519
R-Squared Adjusted                              0.346             0.393              0.439

Number of Observations is 36
T-Ratios are in parentheses
**significant at 1%, *significant at 5%
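The R-Squared Adjusted row of Table 2 can be checked against the R-Squared row using the usual degrees-of-freedom correction. This is a minimal sketch: the function name `adjusted_r2` is hypothetical, and the regressor counts (10, 8, and 5) are read off the table columns. Because the inputs are rounded to three decimals, the results may differ from the table in the last digit.

```python
def adjusted_r2(r2, n_obs, k):
    """Adjusted R-squared for a model with k regressors plus an intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - k - 1)


N_OBS = 36

# specification name -> (R-Squared from Table 2, number of regressors)
male_specs = {
    "full": (0.533, 10),
    "reduced": (0.531, 8),
    "final": (0.519, 5),
}

male_adjusted = {
    name: adjusted_r2(r2, N_OBS, k) for name, (r2, k) in male_specs.items()
}

for name, value in male_adjusted.items():
    print(f"{name}: adjusted R-squared = {value:.3f}")
```

The calculation shows why the adjusted value rises across specifications even though the unadjusted R-Squared falls: each dropped variable costs very little fit but returns a degree of freedom.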
Klutho 7
Female Unemployment:
Table 3 (on Page 9) shows the regression results for female unemployment. In the first
specification, I regress female unemployment on all ten of the independent variables. Although
its R-Squared Adjusted is not as low as the male R-Squared Adjusted, this specification had a
relatively low R-Squared Adjusted at about 0.45. To increase this value, I looked to eliminate
variables with low t-ratios, which are unlikely to be statistically significant at a 1%
significance level. In this specification, the variables with the lowest t-ratios (in absolute
value) were population of foreign-born non-citizens as a percentage and percentage of people who
work for the government. With t-ratios of 0.15 and -0.15, respectively, these variables appear
to be statistically insignificant. Although the variable for age (population of 18 to 24 year
olds as a percentage) actually has one of the lowest t-ratios in this specification (-0.17), this
variable will not be eliminated because it is a key part of answering the research question of
this project.
In the second specification for female unemployment, I eliminate the independent
variables for population of foreign-born non-citizens as a percentage and percentage of people
who work for the government. Because the R-Squared Adjusted increases to around 0.49, it seems
that these two variables add little to the model. To confirm this, I performed a partial F-test
of their joint significance. With a null hypothesis that the coefficients on both variables are
zero, an alternative hypothesis that at least one coefficient is nonzero, and an F statistic of
about 0.03, I fail to reject the null hypothesis: these two variables are not jointly
statistically significant at a 1% significance level. Thus, it is acceptable to eliminate these
two variables from the model. Nonetheless, I still looked to eliminate variables with low
t-ratios and high p-values in order to increase the R-Squared Adjusted. In this specification,
the variables with the lowest t-ratios (in absolute value), apart from age, were two independent
variables concerning race and ethnicity: black population as a percentage and Asian population as
a percentage. With t-ratios of 0.63 and -0.35, respectively, these variables appear to be
statistically insignificant. As in the previous specification, the variable for age (population
of 18 to 24 year olds as a percentage) actually has the lowest t-ratio in this specification
(-0.22); once again, this variable will not be eliminated because it is a key part of answering
the research question of this project.
In the final specification for female unemployment, I eliminate not only the independent
variables for population of foreign-born non-citizens as a percentage and percentage of people
who work for the government but also the independent variables of black population as a
percentage and Asian population as a percentage. With an increased R-Squared Adjusted of
around 0.52, it seems as if these two race and ethnicity variables are indeed insignificant to the
model. To confirm this, I performed a partial F-test to test these variables’ significance to the
overall model. With a null hypothesis that the coefficients on both variables are zero, an
alternative hypothesis that at least one coefficient is nonzero, and an F statistic of about 0.21,
I fail to reject the null hypothesis: the variables concerning black and Asian populations are
not jointly statistically significant at a 1% significance level. Thus, it is acceptable to
eliminate these variables from the model.
Therefore, the final model for female unemployment includes only six independent
variables: (1) population of 18 to 24 year olds as a percentage, (2) population of naturalized
foreign-born citizens as a percentage, (3) Hispanic population as a percentage, (4) percentage of
people who have attained a college or graduate degree, (5) percentage of people who work in
construction, and (6) percentage of people who work white collar jobs. Although only two
variables in this specification (naturalized foreign-born citizens and Hispanic population) are
significant according to t-tests, eliminating one or a combination of these variables only served
to decrease the R-Squared Adjusted.
Looking at the coefficients of these six variables, naturalized citizenship has the largest
estimated association with female unemployment in Oregon. Holding all other variables constant,
a one-percentage-point increase in the population of naturalized foreign-born citizens is
associated with a 1.20 percentage point decrease in predicted female unemployment. Compared to
all other coefficients in both the male unemployment and female unemployment models, this
coefficient has the largest magnitude. Although their coefficients are not as large as that of
naturalized foreign-born citizens, white-collar employment and construction employment also have
relatively large coefficients. Holding all other variables constant, a one-percentage-point
increase in the share of employment in white-collar jobs is associated with a 0.30 percentage
point increase in predicted female unemployment, while a one-percentage-point increase in the
share of employment in construction is associated with a 0.38 percentage point decrease.
Hispanic race/ethnicity also has a positive relationship with female unemployment; a
one-percentage-point increase in Hispanic population is associated with a 0.22 percentage point
increase in predicted female unemployment. Education, however, has a negative relationship with
female unemployment; a one-percentage-point increase in the population with college or graduate
degrees is associated with a 0.23 percentage point decrease in predicted female unemployment.
Compared to the role of education in the male unemployment model, it is surprising that this
value is smaller in magnitude; nonetheless, education appears to play a role in determining
unemployment for females. Just as in the male unemployment model, age has the smallest
coefficient; holding all other variables constant, a one-percentage-point increase in the
population of 18-24 year olds is associated with a 0.03 percentage point decrease in predicted
female unemployment. With a t-ratio of only -0.20, the effect of age on predicted female
unemployment is not statistically significant.
Klutho 9
Table 3: Regression Results for Female Unemployment in Oregon at County Level
Dependent Variable: Female Unemployment (in %)

Variable                                       (1) Full Model    (2) Reduced Model  (3) Final Model
Intercept                                      -0.01 (-0.07)     -0.01 (-0.11)      -0.002 (-0.03)
Pop, 18 to 24 Years, 2009 (in %)               -0.03 (-0.17)     -0.04 (-0.22)      -0.03 (-0.20)
Pop, Foreign Born Naturalized, 2009 (in %)     -1.31 (-1.67)     -1.21 (-1.93)      -1.20 (-2.97)**
Pop, Foreign Born Not a Citizen, 2009 (in %)    0.04 (0.15)
Black Pop, 2009 (in %)                          0.26 (0.59)       0.27 (0.63)
Asian Pop, 2009 (in %)                         -0.15 (-0.35)     -0.14 (-0.35)
Hispanic Pop, 2009 (in %)                       0.21 (2.24)*      0.22 (4.01)**      0.22 (4.29)**
College or Graduate Degree, 2009 (in %)        -0.23 (-1.30)     -0.22 (-1.68)      -0.23 (-1.85)
Employment, Government (in %)                  -0.02 (-0.15)
Employment, Construction, 2009 (in %)          -0.41 (-1.22)     -0.37 (-1.70)      -0.38 (-1.80)
Employment, White Collar, 2009 (in %)           0.32 (1.73)       0.31 (1.96)        0.30 (2.00)
R-Squared                                       0.610             0.609              0.603
R-Squared Adjusted                              0.454             0.493              0.521

Number of Observations is 36
T-Ratios are in parentheses
**significant at 1%, *significant at 5%
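Treating the parenthesized values in Table 3 as t-ratios, as the text does, an approximate standard error for each coefficient can be backed out as the absolute value of the coefficient divided by its t-ratio. The sketch below does this for four coefficients of the final female model; the function name `implied_se` and the dictionary keys are hypothetical, and because both inputs are rounded to two decimals the results are rough.

```python
def implied_se(coef, t_ratio):
    """Approximate standard error implied by a coefficient and its t-ratio."""
    return abs(coef / t_ratio)


# variable -> (coefficient, t-ratio) from column (3) of Table 3
final_female = {
    "naturalized": (-1.20, -2.97),
    "hispanic": (0.22, 4.29),
    "college": (-0.23, -1.85),
    "white_collar": (0.30, 2.00),
}

female_se = {
    name: implied_se(coef, t) for name, (coef, t) in final_female.items()
}

for name, se in female_se.items():
    print(f"{name}: implied standard error ~ {se:.3f}")
```

The comparison illustrates why magnitude and significance should be discussed separately: the Hispanic population coefficient is small but very precisely estimated, while the much larger naturalized-citizen coefficient carries a standard error roughly eight times as large.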