As a scholar-practitioner, it is important for you to understand that a hypothesis test indicating a relationship exists between an intervention and an outcome, that groups differ, or that two constructs are correlated does not by itself measure the importance of that finding. A statistically significant result can still reflect a minute relationship, a very small difference, or a weak correlation. In the end, we need to ask whether the relationships or differences observed are large enough that we should make some practical change in policy or practice.

For this Discussion, you will explore statistical significance and meaningfulness.

To prepare for this Discussion:
- Review the Learning Resources related to hypothesis testing, meaningfulness, and statistical significance.
- Review Magnusson’s web blog found in the Learning Resources to further your visualization and understanding of statistical power and significance testing.
- Review the American Statistical Association’s press release and consider the misconceptions and misuse of p-values.

Consider the scenario: A research paper claims a meaningful contribution to the literature based on finding statistically significant relationships between predictor and response variables. In the footnotes, you see the following statement: “given this research was exploratory in nature, traditional levels of significance to reject the null hypotheses were relaxed to the .10 level.”

By Day 3

Post your response to the scenario in which you critically evaluate this footnote. As a reader/reviewer, what response would you provide to the authors about this footnote?

Scenarios are listed as follows:
1. The p-value was slightly above the conventional threshold but was described as
“rapidly approaching significance” (i.e., p = .06).
An independent samples t test was used to determine whether student satisfaction
levels in a quantitative reasoning course differed between the traditional classroom
and online environments. The samples consisted of students in four face-to-face
classes at a traditional state university (n = 65) and four online classes offered at
the same university (n = 69). Students reported their level of satisfaction on a five-point
scale, with higher values indicating higher levels of satisfaction. Since the
study was exploratory in nature, levels of significance were relaxed to the .10 level.
The test was significant, t(132) = 1.8, p = .074, wherein students in the face-to-face
classes reported lower levels of satisfaction (M = 3.39, SD = 1.8) than did those in the
online sections (M = 3.89, SD = 1.4). We therefore conclude that on average,
students in online quantitative reasoning classes have higher levels of satisfaction.
The results of this study are significant because they provide educators with
evidence of what medium works better in producing quantitatively knowledgeable
practitioners.
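As a quick check on scenario 1 (not part of the original write-up), the reported test statistic can be reproduced from the summary statistics and supplemented with the effect size the report omits. A minimal Python sketch, assuming equal variances and using SciPy:

```python
# Sketch: re-derive scenario 1's t test from its summary statistics and
# add the effect size the write-up omits. Equal variances (pooled) assumed.
from math import sqrt
from scipy import stats

m_f2f, sd_f2f, n_f2f = 3.39, 1.8, 65   # face-to-face classes
m_onl, sd_onl, n_onl = 3.89, 1.4, 69   # online classes

t, p = stats.ttest_ind_from_stats(m_onl, sd_onl, n_onl,
                                  m_f2f, sd_f2f, n_f2f,
                                  equal_var=True)

# Pooled standard deviation and Cohen's d
sp = sqrt(((n_onl - 1) * sd_onl**2 + (n_f2f - 1) * sd_f2f**2)
          / (n_onl + n_f2f - 2))
d = (m_onl - m_f2f) / sp

print(f"t({n_onl + n_f2f - 2}) = {t:.2f}, p = {p:.3f}, d = {d:.2f}")
# -> t(132) = 1.80, p = 0.074, d = 0.31
```

An effect of d ≈ 0.31 is small-to-medium, which is a more useful basis for judging meaningfulness than the relaxed .10 alpha.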
2. A results report that finds no effect and also has a small sample size
(possibly no effect was detected due to lack of power).
A one-way analysis of variance was used to test whether a relationship exists
between educational attainment and race. The dependent variable of education
was measured as number of years of education completed. The race factor had
three attributes of European American (n = 36), African American (n = 23) and
Hispanic (n = 18). Descriptive statistics indicate that on average, European
Americans have higher levels of education (M = 16.4, SD = 4.6), with African
Americans slightly trailing (M = 15.5, SD = 6.8) and Hispanics having on average
lower levels of educational attainment (M = 13.3, SD = 6.1). The ANOVA was not
significant, F(2, 74) = 1.789, p = .175, indicating there are no differences in
educational attainment across these three races in the population. The results of
this study are significant because they shed light on the current social conversation
about inequality.
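Scenario 2 hints at a power problem. One way to probe it (an illustration, not part of the original report) is to back-compute Cohen's f from the reported F statistic and ask what power the study had at a conventional .05 alpha; the statsmodels call below assumes that alpha:

```python
# Sketch: a power check for scenario 2's ANOVA. Cohen's f is back-computed
# from the reported F(2, 74) = 1.789; the .05 alpha is assumed for
# illustration only.
from math import sqrt
from statsmodels.stats.power import FTestAnovaPower

F, df_between, df_within = 1.789, 2, 74
n_total, k_groups = 36 + 23 + 18, 3

eta_sq = (F * df_between) / (F * df_between + df_within)   # ~ .046
cohens_f = sqrt(eta_sq / (1 - eta_sq))                     # ~ .22

power = FTestAnovaPower().power(effect_size=cohens_f,
                                nobs=n_total, alpha=.05,
                                k_groups=k_groups)
print(f"f = {cohens_f:.2f}, achieved power = {power:.2f}")
# well below the conventional .80 target
```

Post hoc power should be read cautiously, but a value far below .80 supports the concern that the null result may reflect a small sample rather than no effect.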
3. Statistical significance is found in a study, but the effect in reality is very small (i.e.,
there was a very minor difference in attitude between men and women). Were the
results meaningful?
An independent samples t test was conducted to determine whether differences
exist between men and women on cultural competency scores. The samples
consisted of 663 women and 650 men taken from a convenience sample of public,
private, and non-profit organizations. Each participant was administered an
instrument that measured his or her current levels of cultural competency. The
cultural competency score ranges from 0 to 10, with higher scores indicating higher
levels of cultural competency. The descriptive statistics indicate women have
higher levels of cultural competency (M = 9.2, SD = 3.2) than men (M = 8.9, SD =
2.1). The results were significant, t(1311) = 2.0, p < .05, indicating that women are
more culturally competent than are men. These results tell us that gender-specific
interventions targeted toward men may assist in bolstering cultural competency.
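For scenario 3, the missing quantity is again an effect size. A minimal sketch (not part of the original report) computing Cohen's d from the reported summary statistics, with equal-variance pooling assumed:

```python
# Sketch: scenario 3's effect size. With n > 1,300 even a trivial mean
# difference clears p < .05, so the real question is how big the effect is.
from math import sqrt

m_w, sd_w, n_w = 9.2, 3.2, 663   # women
m_m, sd_m, n_m = 8.9, 2.1, 650   # men

sp = sqrt(((n_w - 1) * sd_w**2 + (n_m - 1) * sd_m**2) / (n_w + n_m - 2))
d = (m_w - m_m) / sp
print(f"Cohen's d = {d:.2f}")    # ~ 0.11: a very small effect
```

A d of roughly 0.11 is very small, so the statistically significant result says little about practical importance.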
4. A study has results that seem fine, but there is no clear association with social
change. What is missing?
A correlation test was conducted to determine whether a relationship exists
between level of income and job satisfaction. The sample consisted of 432
employees equally represented across public, private, and non-profit sectors. The
results of the test demonstrate a strong positive correlation between the two
variables, r = .87, p < .01, showing that as level of income increases, job satisfaction increases as well.
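Here the statistics themselves are strong; for illustration (not in the original report), r-squared and a Fisher-z confidence interval make the magnitude concrete:

```python
# Sketch: scenario 4's correlation is strong; r-squared and a Fisher-z
# confidence interval (an addition, not in the original report) express
# the magnitude in plain terms.
from math import atanh, tanh, sqrt

r, n = .87, 432
z = atanh(r)                     # Fisher z-transform of r
se = 1 / sqrt(n - 3)
lo, hi = tanh(z - 1.96 * se), tanh(z + 1.96 * se)

print(f"r^2 = {r**2:.2f}")                    # ~ .76 shared variance
print(f"95% CI for r: [{lo:.2f}, {hi:.2f}]")  # ~ [.85, .89]
```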
Press release as follows:

AMERICAN STATISTICAL ASSOCIATION RELEASES STATEMENT ON
STATISTICAL SIGNIFICANCE AND P-VALUES
Provides Principles to Improve the Conduct and Interpretation of Quantitative
Science
March 7, 2016
The American Statistical Association (ASA) has released a “Statement on Statistical Significance
and P-Values” with six principles underlying the proper use and interpretation of the p-value
[http://amstat.tandfonline.com/doi/abs/10.1080/00031305.2016.1154108#.Vt2XIOaE2MN]. The ASA
releases this guidance on p-values to improve the conduct and interpretation of quantitative
science and inform the growing emphasis on reproducibility of science research. The statement
also notes that the increased quantification of scientific research and a proliferation of large,
complex data sets has expanded the scope for statistics and the importance of appropriately
chosen techniques, properly conducted analyses, and correct interpretation.
Good statistical practice is an essential component of good scientific practice, the statement
observes, and such practice “emphasizes principles of good study design and conduct, a variety
of numerical and graphical summaries of data, understanding of the phenomenon under study,
interpretation of results in context, complete reporting and proper logical and quantitative
understanding of what data summaries mean.”
“The p-value was never intended to be a substitute for scientific reasoning,” said Ron
Wasserstein, the ASA’s executive director. “Well-reasoned statistical arguments contain much
more than the value of a single number and whether that number exceeds an arbitrary
threshold. The ASA statement is intended to steer research into a ‘post p<0.05 era.’”
“Over time it appears the p-value has become a gatekeeper for whether work is publishable, at
least in some fields,” said Jessica Utts, ASA president. “This apparent editorial bias leads to the
‘file-drawer effect,’ in which research with statistically significant outcomes are much more
likely to get published, while other work that might well be just as important scientifically is
never seen in print. It also leads to practices called by such names as ‘p-hacking’ and ‘data
dredging’ that emphasize the search for small p-values over other statistical and scientific
reasoning.”
The statement’s six principles, many of which address misconceptions and misuse of the p-value,
are the following:
1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the
probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on
whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the
importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or
hypothesis.
The statement has short paragraphs elaborating on each principle.
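To make principle 5 concrete, the short simulation below (an illustration with arbitrary sample sizes and a fixed true effect of 0.1 SD, not taken from the statement) shows p-values shrinking as n grows even though the effect never changes:

```python
# Sketch illustrating principle 5: the same small effect (a 0.1 SD shift)
# yields very different p-values as the sample size grows, so p cannot
# measure effect size or importance. Sample sizes are arbitrary choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
for n in (25, 250, 2500):
    a = rng.normal(0.0, 1.0, n)   # control group
    b = rng.normal(0.1, 1.0, n)   # fixed small shift of 0.1 SD
    t, p = stats.ttest_ind(a, b)
    print(f"n per group = {n:5d}  p = {p:.4f}")
# p typically shrinks with n even though the underlying effect is constant
```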
In light of misuses of and misconceptions concerning p-values, the statement notes that
statisticians often supplement or even replace p-values with other approaches. These include
methods “that emphasize estimation over testing such as confidence, credibility, or prediction
intervals; Bayesian methods; alternative measures of evidence such as likelihood ratios or
Bayes factors; and other approaches such as decision-theoretic modeling and false discovery
rates.”
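As one illustration of the estimation-over-testing idea, the sketch below builds a 95% confidence interval for scenario 1's mean difference from its reported summary statistics, with equal variances assumed:

```python
# Sketch: a 95% confidence interval for scenario 1's mean difference,
# built from the reported summary statistics (equal variances assumed).
from math import sqrt
from scipy import stats

m1, s1, n1 = 3.89, 1.4, 69   # online
m2, s2, n2 = 3.39, 1.8, 65   # face-to-face

df = n1 + n2 - 2
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
se = sp * sqrt(1 / n1 + 1 / n2)
tcrit = stats.t.ppf(0.975, df)

diff = m1 - m2
print(f"diff = {diff:.2f}, "
      f"95% CI [{diff - tcrit * se:.2f}, {diff + tcrit * se:.2f}]")
# -> diff = 0.50, 95% CI [-0.05, 1.05]
```

The interval runs from about -0.05 to 1.05 satisfaction points, conveying both the direction of the effect and how imprecisely it is estimated, information a bare p-value hides.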
“The contents of the ASA statement and the reasoning behind it are not new—statisticians and
other scientists have been writing on the topic for decades,” Utts said. “But this is the first time
that the community of statisticians, as represented by the ASA Board of Directors, has issued a
statement to address these issues.”
“The issues involved in statistical inference are difficult because inference itself is challenging,”
Wasserstein said. He noted that more than a dozen discussion papers are being published in
the ASA journal The American Statistician with the statement to provide more perspective on
this broad and complex topic. “What we hope will follow is a broad discussion across the
scientific community that leads to a more nuanced approach to interpreting, communicating,
and using the results of statistical methods in research.”