Unit 5:
DATA ANALYSIS
We put a great deal of faith in scientific methods of data collection. Yet, the
process can be riddled with errors. Before moving to how we should analyze data
collected during an intervention it would be helpful to consider some common
research errors.
These include:
• mistakes in recording and categorizing data
• mistakes in sampling (perhaps drawing a small sample or one that is not
representative of the larger workforce — for example, having the poorest
10% in terms of performance and designing a training program around the
information they provide)
• subject misrepresentation — or the possibility that people may provide
inaccurate information
• investigator bias that limits what one will find — only looking for and
finding what we believe to be the problem
• faulty instrumentation or the use of data collection techniques that are
neither valid nor reliable
Validity refers to measuring what you actually intend to measure, whereas
reliability concerns measuring with consistency and accuracy.
We can also experience errors when interpreting data. Common mistakes involve:
• making too much out of limited data (drawing conclusions that may not be
warranted)
• making too many decisions based on limited data
• ignoring important findings because of the commitment to established
systems
Effective diagnosis, then, is a function of two factors: remaining objective and
providing helpful feedback.
Objectivity involves:
• being aware of any biases in the data collected
• questioning and confirming findings before drawing conclusions
• looking beyond symptoms, to recognize actual problems
• recognizing patterns that emerge
• considering the uniqueness of each intervention
• understanding how one’s presence can influence the data collection process
Providing Feedback involves:
• converting feedback into usable information for the client
• clarifying findings and offering assistance in the interpretation of the data
• translating findings into action plans (solutions) for how to address the
identified problems
To develop your skills in providing useful feedback and action plans, complete the
Providing Useful Feedback Activity in this unit.
There are many ways to analyze data collected depending on the type of data
collection techniques used.
Data can be numerical (quantitative) or textual (qualitative).
Numerical/quantitative data derive from questionnaires or interviews and require
statistical analysis. A review of basic statistical techniques is provided in the
associated PowerPoint presentations. You can find these under the supplemental
materials for this unit.
Descriptive Statistics
Advanced Statistics
Textual/qualitative data come from many sources, any of which ask people to
report in their own words. These may include open-ended responses on
questionnaires, interviews, focus groups, or organizational documents. Regardless
of the source, textual/qualitative data require the use of textual analysis. More
information about textual analysis is available in the associated PowerPoint
presentation, which can be found in the supplemental materials for this unit.
Textual Analysis
Analyzing Results
When reporting the results of a needs assessment it is important to include
information about the data analysis, but not so much information that the findings
are overshadowed by the analysis.
Thus, it is important to streamline the information presented. Only provide the
information that is essential. Just because you collected data does not necessarily
mean you need to report it. Data that tell us little will only complicate the
report and can be left out of the final version, or they can be placed in a more
detailed report that accompanies a shorter, concise one.
Final written reports should include:
• a general overview
• objectives and scope
• methods of data collection
• methods for data analysis
• findings and conclusions
• recommendations based on findings
• expected benefits
• implementation guidelines
Advanced Statistics (Unit 5)
There are several related topics in this unit:
• Types of Variables in Analysis
• Univariate and Multivariate Statistics Overview
• Univariate Statistics
• Multivariate Statistics
Types of Variables in Analysis
Statistics
Independent Variables (IV)
This is the variable thought to influence or cause a change in the value of another variable.
For example, if you do not get enough sleep you will experience fatigue and drowsiness during work. Lack
of sleep, then, is the independent variable thought to affect fatigue and drowsiness.
Dependent Variables (DV)
This is the variable that is thought to be changed or affected by another (independent) variable. Said another
way, the value of the dependent variable is responsive to or determined by changes in the independent
variable.
In the example above fatigue and drowsiness are the variables affected. We will experience more fatigue
and drowsiness if we have less sleep.
Confounding Variables
This is a variable that confounds, or confuses, the relationship between the independent and dependent
variables. Or we can think of it this way…something other than the independent variable is accounting for
changes in the dependent variable.
For example, how engaging and interesting a meeting is (vs. boring) will affect whether or not you feel
fatigue and drowsiness during the meeting. Thus, lack of sleep alone may not account for fatigue or drowsiness.
Rather, the nature of the meeting, or a combination of lack of sleep and the nature of the meeting, may be causing
fatigue and drowsiness.
Univariate and Multivariate Statistics Overview
We differentiate statistics as univariate or multivariate depending on the
number of dependent variables involved in the statistical analysis.
When there is a single dependent variable we use a univariate statistic.
When there is more than one dependent variable we use a multivariate
statistic.
We also need to consider how both the dependent and independent variables
were measured in order to determine what statistic is appropriate. Remember
that we can measure numerically (interval and ratio level of measurement) or
we can measure simply by differentiating between types (nominal level of
measurement).
Univariate Statistics
There are two groups of univariate statistics we commonly use
when we have a single numerical dependent variable.
The first set is appropriate when we have a nominal/categorical
independent variable. These statistics compare categories or
groups such as men/women, highly satisfied/dissatisfied
employees, or youth/seniors.
These include…
t-test
ANOVA
ANCOVA
and Factorial Analysis of Variance
We use the following statistics when we have a single numerical dependent
variable and we want to make…
• t-test: a simple comparison between two groups
• ANOVA (a one-way analysis of variance): a comparison between three or more groups
• ANCOVA: a comparison between three or more groups while controlling for a confounding variable
In all these cases we have only a single independent variable, which may be
made up of two, three, or more groups. However, when we have more than
one independent variable we need to use a factorial analysis of variance.
Factorial Analysis of Variance
A factorial analysis of variance involves a comparison of scores
on a single, numerical dependent variable — the value of which
is determined by several nominal or categorical independent
variables.
Factorial analyses of variance are prefaced with a numerical
string or statement that indicates:
• the number of independent variables (designated by the total number of numbers in the string, not the values of the numbers)
• the number of levels of each independent variable (designated by the actual values of each number in the string)
So for example, a 3x2x3 factorial analysis of variance has…
3 independent variables,
the first with 3 levels,
the second with 2 levels,
and the third with 3 levels.
Similarly, a 4x2 factorial analysis of variance has…
two independent variables,
the first with four levels
and the second with two.
This could be a comparison that examines
student achievement (A, B, C, and D students)
and sex (male, female).
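The notation described above can be read mechanically. The following is a minimal sketch, not part of the original slides: the function name `describe_factorial_design` is hypothetical, and the "cells" count (the number of groups formed by crossing every level) is an added illustration.

```python
from math import prod


def describe_factorial_design(notation: str) -> dict:
    """Parse a design string such as '3x2x3' into its parts."""
    # Each number in the string stands for one independent variable.
    levels = [int(part) for part in notation.lower().split("x")]
    return {
        "independent_variables": len(levels),  # how many numbers appear in the string
        "levels": levels,                      # the value of each number
        "cells": prod(levels),                 # groups formed by crossing all levels (added illustration)
    }


for design in ("3x2x3", "4x2"):
    print(design, describe_factorial_design(design))
```

Under these assumptions, the 4x2 example of student achievement by sex would involve eight groups of students.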
Univariate Statistics
When we attempt to determine if variables are related and both the
independent and dependent variables have been measured numerically we use
one of the following univariate statistics…
• Correlation: simply assessing the relationship between independent and dependent variables
• Regression: assessing the ability of the independent variable to predict the value of the dependent variable
• Multiple Regression: assessing the predictive ability of several independent variables on a single dependent variable
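To make the difference between relating and predicting concrete, here is a small sketch using the standard Pearson correlation and least-squares formulas. The data are hypothetical (hours of sleep and a made-up fatigue score, echoing the earlier sleep example), and the function names are illustrative only.

```python
from math import sqrt


def pearson_r(x, y):
    """Correlation: how strongly two numerical variables are related (-1 to +1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simple_regression(x, y):
    """Regression: slope and intercept for predicting y (the DV) from x (the IV)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: independent variable = hours of sleep, dependent variable = fatigue score.
hours_of_sleep = [4, 5, 6, 7, 8]
fatigue_score = [9, 8, 6, 5, 2]

r = pearson_r(hours_of_sleep, fatigue_score)
slope, intercept = simple_regression(hours_of_sleep, fatigue_score)
print(f"r = {r:.2f}")                                   # strong negative relationship
print(f"predicted fatigue = {intercept:.1f} + ({slope:.1f} x hours of sleep)")
```

Correlation yields a single number describing the relationship, whereas regression yields an equation that can be used to predict a new fatigue score from a given number of hours of sleep.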
The chart below helps to clarify how the common univariate statistical procedures relate
and differ from one another. Being univariate all the statistics below have a single dependent
variable that is numerical (measured at the interval or ratio level of measurement).
Nominal/categorical IV (comparing groups)               Numerical IV (relating variables)
t-test (2 groups)                                       Correlation (relating)
ANOVA (3+ groups)                                       Regression (predicting)
ANCOVA (while controlling)                              Multiple Regression (with more than 1 IV)
Factorial Analysis of Variance (with more than 1 IV)
The family of statistics in the left-hand column have nominal/categorical independent variables
(abbreviated in the chart as IV) and therefore involve comparisons between groups.
The family of statistics in the right-hand column have numerical independent variables and thus
involve assessing relationships between variables (versus groups).
Multivariate Statistics
Multivariate statistics are appropriate when we have more than one
dependent variable. It is helpful to think of them as an extension of the two
previous groups discussed.
When we compare groups and we have more than one dependent variable we
move from an ANOVA to a…
MANOVA: compares groups in terms of more than one dependent variable
Or from an ANCOVA to a…
MANCOVA: compares groups in terms of more than one dependent variable while controlling for a confounding variable
Similarly, we can move from a multiple regression (which
considers how several numerical independent variables predict a
single numerical dependent variable) to a…
Canonical Correlation: examines the relationship between multiple independent and multiple dependent variables, all of which are numerical. Said another way, it examines the relationship between a group of numerical independent variables and a group of numerical dependent variables.
The chart below serves to clarify how the common multivariate statistical procedures
relate and differ from one another. As multivariate statistics all of those listed below
have multiple dependent variables (abbreviated as DV in the chart) that are numerical
in nature.
Nominal/categorical IV (comparing groups)               Numerical IV (relating variables)
MANOVA (more than 1 DV)                                 Canonical Correlation (comparing two sets of variables)
MANCOVA (while controlling)
As with the univariate families of statistics, the family of statistics in the left-hand
column have nominal/categorical independent variables and therefore involve
comparisons between groups.
The family of statistics in the right-hand column have numerical independent variables
and thus involve assessing relationships between variables (versus groups).
Uni- and Multivariate Statistics
Finally, the chart below puts both the univariate and multivariate statistics together.
You can see then how the univariate statistics link to the multivariate statistics.
Univariate Statistics (Single Dependent Variable)
t-test (2 groups)                                       Correlation (relating)
ANOVA (3+ groups)                                       Regression (predicting)
ANCOVA (while controlling)                              Multiple Regression (with more than 1 IV)
Factorial Analysis of Variance (with more than 1 IV)
Multivariate Statistics (More Than One Dependent Variable)
MANOVA (more than 1 DV)                                 Canonical Correlation (comparing two sets of variables)
MANCOVA (while controlling)
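The selection logic summarized in the chart can be sketched as a small lookup function. This is only an illustration: the function name, its parameters, and the order in which the conditions are checked are assumptions, and it deliberately ignores cases the chart does not cover.

```python
def choose_statistic(num_dvs, iv_type, num_ivs=1, num_groups=2, control_confound=False):
    """Return the procedure the chart suggests (hypothetical helper).

    num_dvs: number of numerical dependent variables
    iv_type: "categorical" (nominal groups) or "numerical"
    """
    if iv_type not in ("categorical", "numerical"):
        raise ValueError("iv_type must be 'categorical' or 'numerical'")

    # More than one dependent variable: multivariate statistics.
    if num_dvs > 1:
        if iv_type == "numerical":
            return "Canonical Correlation"
        return "MANCOVA" if control_confound else "MANOVA"

    # Single dependent variable with numerical IVs: relating variables.
    if iv_type == "numerical":
        return "Multiple Regression" if num_ivs > 1 else "Correlation or Regression"

    # Single dependent variable with categorical IVs: comparing groups.
    if num_ivs > 1:
        return "Factorial Analysis of Variance"
    if control_confound:
        return "ANCOVA"
    return "t-test" if num_groups == 2 else "ANOVA"


print(choose_statistic(1, "categorical", num_groups=2))   # t-test
print(choose_statistic(1, "categorical", num_groups=4))   # ANOVA
print(choose_statistic(2, "categorical"))                 # MANOVA
print(choose_statistic(1, "numerical", num_ivs=3))        # Multiple Regression
```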
Descriptive Statistics (Unit 5)
There are several related topics in this unit:
• Descriptive Statistics Overview
• Measures of Central Tendency
• Measures of Dispersion
Descriptive Statistics Overview
Descriptive Statistics tell us about specific trends in our data and describe specific
features of our sample.
For example, a researcher will use descriptive statistics to tell readers about the
proportion of men and women who participated in a study. The researcher may write
something like:
“In this study 40% of the sample were men, whereas 60% were women.”
Or the researcher may inform readers about participants’ average scores on a particular
variable in the study. In this case the researcher may say:
“The mean score on the communication competence measure was 14.55.”
The primary descriptive statistics fall into one of two “families”:
measures of central tendency
measures of dispersion.
Measures of Central Tendency
Measures of central tendency, as the name implies, tell us
about a central characteristic of the data. Measures of
central tendency include…
the Mode
the Median
and the Mean
Mode
The mode is the simplest measure of central tendency.
It indicates which score or value in a distribution occurs most frequently.
The mode is appropriate when we have nominal or categorical data.
In these instances we are interested in how often each category was used or
appeared. In essence we count observations that appear in each category and
then report which category had the most observations.
Thus, the mode is the descriptive statistic that tells us which category has the
most observations or which category appears most often in the data.
Say, for example, that we are interested in people’s perceptions of what constitutes
sexual harassment. To determine this we could provide people with a list of behaviors
and ask them to respond by simply checking “yes” or “no” (nominal categories) to indicate
whether they believe each behavior reflects sexual harassment:
1. sexual comments                                      ____ yes   ____ no
2. inappropriate gaze                                   ____ yes   ____ no
3. sexual jokes                                         ____ yes   ____ no
4. display of pornographic materials in the office      ____ yes   ____ no
5. “pick-up” or “come on” lines                         ____ yes   ____ no
The mode will tell us which of these behaviors people perceived to be more sexually
harassing compared to the others as it would reflect the category that had the most
“yes” responses. Or if we were interested in the least sexually harassing behavior, we
could count up the “no” responses and report the mode for “no” responses.
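Counting observations per category and reporting the most frequent one can be sketched as follows. The responses are hypothetical; each list entry stands for one "yes" check mark next to a behavior.

```python
from collections import Counter

# Hypothetical data: every entry is a behavior that some respondent marked "yes".
yes_responses = [
    "sexual comments",
    "sexual jokes",
    "display of pornographic materials",
    "sexual comments",
    "display of pornographic materials",
    "inappropriate gaze",
    "display of pornographic materials",
    "pick-up lines",
    "sexual comments",
    "display of pornographic materials",
]

# Count how many "yes" observations fall in each category.
counts = Counter(yes_responses)

# The mode is the category with the most observations.
modal_behavior, modal_count = counts.most_common(1)[0]
print(modal_behavior, modal_count)
```

With these made-up responses, the display of pornographic materials is the modal category because it received the most "yes" responses.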
Median
The median divides a distribution of quantitative data exactly in
half. It is the score above which and below which half the
observations fall.
The median is most appropriate for describing the center point of
a set of ordinal data because it tells us the point at which half of
the cases rank higher and half rank lower.
For example, in a horse race the horse that finished fifth out of
nine represents the median, as four horses finished before or
above it and four horses finished behind or below it.
The median also can be used with interval/ratio data, but can be problematic
because it is not sensitive to extreme scores.
That is, two distributions may have the same median or middle point, but one
could include much higher and/or lower values than the other.
Simply seeing the median would lead us to believe that the two distributions of
scores are more similar than they actually are.
For example:
First distribution    Second distribution
4                     10
10                    11
12                    13
15                    15
28                    17
36                    21
47                    25
The median for both distributions is 15, but the first distribution includes much
lower and higher values (from 4 to 47) than the second one (from 10 to 25).
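The point can be checked quickly in code, taking the first distribution as 4, 10, 12, 15, 28, 36, 47 and the second as 10, 11, 13, 15, 17, 21, 25 (a reading of the example's values as two seven-score columns). This is a small sketch using Python's standard `statistics` module.

```python
from statistics import median

first = [4, 10, 12, 15, 28, 36, 47]
second = [10, 11, 13, 15, 17, 21, 25]

for name, scores in (("first", first), ("second", second)):
    # The median is identical, but the lowest and highest scores are not.
    print(name, "median:", median(scores), "lowest:", min(scores), "highest:", max(scores))
```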
Beyond simply describing the middle point of a distribution,
researchers may use the median to create groups to compare. This
is called a median split. When researchers have ordinal or ratio
data but want to create groups or categories they can do so by
using a median split to create two groups.
Accordingly, the researcher determines what the median is for the
variable of interest and then creates a “high” group with scores
above the median and a “low” group with scores below the
median.
For example, a researcher interested in comparing people high and low in
verbal aggressiveness can find the median of the verbal aggressiveness scores
for all participants. She can then take all of the cases above that median to
create the “high” group and all the scores below that median to create the
“low” group.
Then the researcher can compare people high and low in verbal aggressiveness
on some other variable of interest.
Do people high and low in verbal aggressiveness differ with regard to their
marital satisfaction? Their communication competence?
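A median split can be sketched in a few lines. The participant IDs and verbal aggressiveness scores below are hypothetical, and leaving out any score that equals the median exactly is one possible convention, not something the text above prescribes.

```python
from statistics import median


def median_split(scores):
    """Split participants into "low" and "high" groups around the median.

    scores: dict mapping a participant ID to a numerical score.
    Scores exactly equal to the median are left out (an assumed convention).
    """
    cut = median(scores.values())
    low = [person for person, score in scores.items() if score < cut]
    high = [person for person, score in scores.items() if score > cut]
    return cut, low, high


# Hypothetical verbal aggressiveness scores.
verbal_aggressiveness = {"P1": 12, "P2": 31, "P3": 25, "P4": 18, "P5": 40, "P6": 22}

cut, low_group, high_group = median_split(verbal_aggressiveness)
print("median:", cut)
print("low group:", low_group)
print("high group:", high_group)
```

The two groups can then be compared on another variable, such as marital satisfaction, using one of the group-comparison statistics.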
Mean
The mean is the arithmetic average. It is computed by adding all
the scores in a distribution and dividing by the total number of
scores.
It helps to clarify what the average score on a variable of interest
is. For example, we may see any of the following reported.
“The mean…
“…for communication apprehension was 14.56.”
“…for hours of television watched per week was 8.56.”
“…for age of respondents in this study was 43.69 years.”
The mean is appropriate for interval/ratio data because of
“assumed equivalence,” or the idea that all points on the scale are
assumed to be of equal distance from one another (i.e., 1 is the
same distance from 2 as 2 is from 3, and so on).
Unlike the other types of central tendency descriptive statistics
the mean is sensitive to all scores, including extreme scores, in
the distribution.
That is why it is thought to be the most sophisticated measure of
central tendency.
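The mean's sensitivity to extreme scores can be illustrated with a short sketch. The scores are hypothetical, and the helper name `arithmetic_mean` is illustrative only.

```python
from statistics import median


def arithmetic_mean(scores):
    """Add all the scores and divide by the total number of scores."""
    return sum(scores) / len(scores)


scores = [10, 12, 13, 15, 100]          # hypothetical; 100 is an extreme score
without_extreme = [10, 12, 13, 15]

print("mean with extreme score:   ", arithmetic_mean(scores))
print("median with extreme score: ", median(scores))
print("mean without extreme score:", arithmetic_mean(without_extreme))
```

Adding the single extreme score pulls the mean from 12.5 up to 30, while the median stays at 13, which lies among the typical scores.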
Measures of Dispersion
Measures of dispersion show us how data spread out in a distribution.
Think about, for example, dropping a glass of water and a can of motor oil on
the floor. Both will spill and disperse (i.e., spread out), but they will do so very
differently.
Thus, measures of dispersion tell us about how data spread out across a
distribution. They include…
the Range
Variance
and Standard Deviation
Range
The range is the simplest measure of dispersion. It reports the distance
between the highest and lowest scores in the distribution.
The range, therefore, is calculated by subtracting the lowest number from the
highest number in the distribution.
The range gives a general sense of how much the data spread out across the
distribution, which can be helpful for understanding whether a study included
a lot of variability or whether it drew from a narrow spectrum.
For example, if a researcher intends to study a communication variable across
a wide range of age groups, a sample of people aged 18-21 (a range of 3) is not
very diverse. Yet a sample of people aged 18-70 (a range of 52) is.
One concern with the range is sensitivity to extreme scores. Because the range depends
only on the highest and lowest scores in the distribution, it can be misleading when “outlier” scores
exist in a distribution. Outliers are scores that are far removed from the rest of the
distribution.
In the example of age just used you could have a distribution that ranges from 18-70,
yet there is only one person aged 70 and the next closest score is actually 24. The age
of 70 makes the distribution look much more spread out than it actually is. If we exclude
the outlier, which researchers often do when
necessary, the range is actually 6 as the scores spread from 18-24, not 52 as is the case
when the outlier is included and the scores spread from 18-70.
To avoid problems with outliers researchers may report the interquartile range. This
is the range of scores representing the middle 50% of scores (or the middle two quarters
of the distribution). The upper and lower 25% of scores (the outer quarters where
outliers will be) are excluded. An interquartile range provides a more conservative
representation of the range.
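The effect of an outlier on the range, and the more conservative interquartile range, can be sketched as follows. The ages are hypothetical. Note that textbooks and software use slightly different rules for locating quartiles; this sketch relies on the default ("exclusive") method of Python's `statistics.quantiles`, so other tools may report a slightly different value.

```python
from statistics import quantiles


def value_range(scores):
    """Range: the highest score minus the lowest score."""
    return max(scores) - min(scores)


ages = [18, 19, 20, 21, 22, 24, 70]     # hypothetical sample; 70 is the outlier

full_range = value_range(ages)                          # 70 - 18
trimmed_range = value_range([a for a in ages if a != 70])  # 24 - 18 once the outlier is excluded

# Quartile cut points; the interquartile range spans the middle 50% of scores.
q1, _, q3 = quantiles(ages, n=4)
interquartile_range = q3 - q1

print("range with outlier:   ", full_range)
print("range without outlier:", trimmed_range)
print("interquartile range:  ", interquartile_range)
```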
Variance
Variance is the average distance of scores from the mean, in squared units.
We can compute variance when we have interval or ratio data.
Why squared units? Well, that has to do with how we compute variance. To
compute variance we do the following:
1. Subtract the mean score of the distribution from each score in the
distribution
2. Square each of these values
3. Sum all of the squared values
4. Divide the sum of squared values by the total number of scores
When computing variance we need to square the values in step two so that
they do not cancel one another out in step 3.
For example, say that we have values that are +2, +3, and +4 points above the
mean and values that are -2, -3, and -4 points below the mean. When we go to
add these up without squaring them they will cancel each other out and we will
end up with a value of zero.
To ensure this doesn’t happen we square all of the values. Thus, 2, 3, and 4
become 4, 9, and 16 regardless of whether they were positive or
negative values previously. This is so, as you may recall, because squaring a
negative number always produces a positive number.
Thus, all of the values are positive and can be summed for a total. This sum in
turn (known as the sum of squares) can be divided by the total number
of observations.
So, in our example above we would add
4 + 9 + 16 + 4 + 9 + 16 = 58
Then we would divide 58 by 6 (the number of observations) to obtain the
variance. In this case the variance is 9.67.
You can see that this last part of the process essentially involves the
computation of a mean.
Thus, it is helpful to think of variance as the mean or average of how scores
disperse or spread out from the mean score.
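The four steps can be traced in code. The scores below are hypothetical, chosen so that they fall 2, 3, and 4 points above and below a mean of 10, matching the example. Because the steps divide by the total number of scores, the result corresponds to what Python's `statistics` module calls the population variance (`pvariance`); some software divides by one less than the number of scores instead and will report a larger value.

```python
from statistics import pvariance

scores = [6, 7, 8, 12, 13, 14]                    # hypothetical scores with a mean of 10
mean_score = sum(scores) / len(scores)

deviations = [s - mean_score for s in scores]     # step 1: distance of each score from the mean
squared = [d ** 2 for d in deviations]            # step 2: square each value
sum_of_squares = sum(squared)                     # step 3: sum the squared values
variance = sum_of_squares / len(scores)           # step 4: divide by the number of scores

print("sum of raw deviations:", sum(deviations))  # cancels out to zero without squaring
print("sum of squares:", sum_of_squares)
print("variance:", round(variance, 2))
print("library check:", round(pvariance(scores), 2))
```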
Variance is a helpful measure of dispersion. However, it is of limited use
on its own because it is no longer in the original units of measure. Rather, because of the
computation necessary, it ends up in squared units.
Standard Deviation
How then can we change variance into something usable and
meaningful? That is, how do we return to the original units of
measure?
Well, we need to get rid of the squared scores.
You may recall that we use the square root when we want to get
rid of squared scores. The same is true here.
We can take the square root of variance to calculate or compute a
measure of dispersion that is in the original units of measure.
This produces the standard deviation.
Standard deviation, like variance, is a measure of
dispersion that explains how much scores in a set of
interval/ratio data vary from the mean. However, unlike
variance, it is expressed in the original units of
measurement.
So, say the variance is 9.67 as was the case in our
earlier example…
…the square root of 9.67 is 3.11.
…the standard deviation therefore is 3.11.
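The same arithmetic in code, as a brief sketch. The scores are hypothetical values whose squared deviations from the mean sum to 58, as in the variance example; `pstdev` is the standard library's population standard deviation, which divides by the number of scores.

```python
from math import sqrt
from statistics import pstdev

variance = 58 / 6                       # sum of squares divided by number of observations
standard_deviation = sqrt(variance)     # back to the original units of measure

print("variance:", round(variance, 2))
print("standard deviation:", round(standard_deviation, 2))

# Cross-check against hypothetical scores that produce the same variance.
scores = [6, 7, 8, 12, 13, 14]
print("library check:", round(pstdev(scores), 2))
```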
Standard deviation helps us understand how a distribution spreads out. It is
often reported alongside the mean score of a distribution.
So, for example, we may see reports that list any of the following means and
standard deviations:
M = 12.56, SD = 2.45
M = 10.21, SD = 5.64
M = 28.45, SD = 8.45
An italicized M is the statistical notation for the mean and an italicized SD is
the statistical notation for the standard deviation.
From the reports for each distribution above we would know both the
average score (M) and the average distance of all other scores in the
distribution from that average score (SD).
When we see the M and SD reported, we can draw some conclusions about the
distribution.
What if we saw the following descriptive statistics reported for three different
distributions of data?
M = 14.56, SD = 2.45
M = 14.56, SD = 5.64
M = 14.56, SD = 8.45
The examples above all include the same mean to make a point about the standard
deviation. In the first distribution the scores do not disperse widely, in the
second they disperse moderately, and in the third they disperse considerably.
Thus, the first distribution would appear as a tall and narrow curve, the second
as a bell-shaped curve, and the third as a broad and comparatively flat curve.
Providing Useful Feedback Activity
Below are several possible findings that could be generated through a needs
analysis. They need to be converted into usable feedback before they can be
applied. Read each of the findings and convert it into usable feedback (i.e.,
clarify findings and offer action plans related to the particular findings). Consult
the example below carefully before completing the remainder of the activity. Use
as much space as necessary to provide your responses.
EXAMPLE
Qualitative data revealed that numerous employees reported incidents
in which they felt they were disciplined excessively when they made
mistakes on the job.
Clarification:
Employees reported feeling like they were excessively reprimanded.
Action Plan:
Train mid-level managers in providing constructive rather than
punitive employee feedback.
ACTIVITY
1. Questionnaire data revealed that production workers’ scores on an index of
organizational loyalty were lower than their counterparts in sales and engineering.
Clarification:
Action Plan:
2. Interview data collected from 15 customer service representatives revealed
discrepancies about how employees felt the company expected them to deal with
customer complaints.
Clarification:
Action Plan:
3. Questionnaire data revealed that sales agents’ scores on an index of
organizational conflict indicated that they generally avoided conflict with their
supervisors.
Clarification:
Action Plan:
4. Interview data collected from 25 engineers revealed that they believed it would
damage their careers if they reported product defects to management.
Clarification:
Action Plan:
5. Observations of janitorial staff indicated that their friendly demeanor with the
office personnel was not reciprocated.
Clarification:
Action Plan:
Instruction:
1st Level Activities rely directly on your applying course content from the
unit. They do not require you to do any additional work beyond applying the
course content from the unit to the activity.
PAPERS
Note: The format for papers can be APA, MLA, or another standard
format. Citations can appear in the text, in notes, or in a bibliography.
Double spacing is preferred. Assume that approximately 250 words equals one
page.
You must write at least two double-spaced pages on the activity worksheet.
Unit 5:
DATA ANALYSIS
We put a great deal of faith in scientific methods of data collection. Yet, the
process can be riddled with errors. Before moving to how we should analyze data
collected during an intervention it would be helpful to consider some common
research errors.
These include:
• mistakes in recording and categorizing data
• mistakes in sampling (perhaps drawing a small sample or one that is not
representative of the larger workforce — for example, having the poorest
10% in terms of performance and designing a training program around the
information they provide)
• subject misrepresentation — or the possibility that people may provide
inaccurate information
• investigator bias that limits what one will find — only looking for and
finding what we believe to be the problem
• faulty instrumentation or the use of data collection techniques that are
neither valid nor reliable
Validity refers to measuring what you actually intend to measure, whereas
reliability concerns measuring with consistency and accuracy.
We can also experience errors when interpreting data. Common mistakes involve:
• making too much out of limited data (drawing conclusions that may not be
warranted)
• making too many decisions based on limited data
• ignoring important findings because of the commitment to established
systems
Effective diagnosis, then, is a function of two factors. Remaining objective and
providing helpful feedback.
Objectivity involves:
• being aware of any biases in the data collected
• questioning and confirming findings before drawing conclusions
• looking beyond symptoms, to recognize actual problems
• recognizing patterns that emerge
• considering the uniqueness of each intervention
• understanding how one’s presence can influence the data collection process
Providing Feedback involves:
• converting feedback into usable information for the client
• clarifying findings and offering assistance in the interpretation of the data
• translating findings into an action plans (solutions) for how to address the
identified problems.
To develop your skills in providing useful feedback and action plans, complete the
Providing Useful Feedback Activity in this unit.
There are many ways to analyze data collected depending on the type of data
collection techniques used.
Data can be numerical (quantitative) or textual (qualitative).
Numerical/quantitative data derive from questionnaires or interviews and require
statistical analysis. A review of basic statistical techniques is provided in the
associated power points. You can find these under the supplemental materials for
this unit.
Descriptive
Statistics
Advanced
Statistics
Unit 5
Unit 5
Descriptive Statistics
Advanced Statistics
Textual/qualitative data comes from many sources, any which ask for people to
report in their own words. This may include open-ended responses on
questionnaires, interviews, focus groups, or organizational documents. Regardless
of the source, textual/qualitative data require the use of textual analysis. More
information about textual analysis is available in the associated power point, which
can be found in the supplemental materials for this unit.
Textual
Analysis
Unit 5
Textual Analysis
Analyzing Results
When reporting the results of a needs assessment it is important to include
information about the data analysis, but not so much information that the findings
are overshadowed by the analysis.
Thus, it is important to streamline the information presented. Only provide the
information that is essential. Just because you collected data does not mean you
need to report about it necessarily. If it tells us little it will only complicate the
process and can be left out of the final report. Or it can be part of a more detailed
report when a shorter concise report is also made available.
Final written reports should include:
• a general overview
• objectives and scope
• methods of data collection
• methods for data analysis
• findings and conclusions
• recommendations based on findings
• expected benefits
• implementation guidelines
Advanced Statistics (Unit 5)
There are several related topics in this unit:
• Types of Variables in Analysis
• Univariate and Multivariate Statistics Overview
• Univariate Statistics
• Multivariate Statistics
Types of Variables in Analysis
Statistics
Independent Variables (IV)
This is the variable thought to influence or cause a change in the value of another variable.
For example, if you do not get enough sleep you will experience fatigue and drowsiness during work. Lack
of sleep, then, is the independent variable thought to affect fatigue and drowsiness.
Dependent Variables (DV)
This is the variable that is thought to be changed or affected by another (independent) variable. Said another
way, the value of the dependent variable is responsive to or determined by changes in the independent
variable.
In the example above fatigue and drowsiness are the variables affected. We will experience more fatigue
and drowsiness if we have less sleep.
Confounding Variables
This is a variable that confounds, or confuses, the relationship between the independent and dependent
variables. Put another way, something other than the independent variable is accounting for
changes in the dependent variable.
For example, how engaging and interesting a meeting is (vs. boring) will affect whether or not you feel
fatigue and drowsiness during the meeting. Thus, lack of sleep alone may not account for fatigue or drowsiness.
Rather, the nature of the meeting, or a combination of lack of sleep and the nature of the meeting, may be causing
fatigue and drowsiness.
Univariate and Multivariate Statistics Overview
We differentiate statistics as univariate or multivariate depending on the
number of dependent variables involved in the statistical analysis.
When there is a single dependent variable we use a univariate statistic.
When there is more than one dependent variable we use a multivariate
statistic.
We also need to consider how both the dependent and independent variables
were measured in order to determine what statistic is appropriate. Remember
that we can measure numerically (interval and ratio level of measurement) or
we can measure simply by differentiating between types (nominal level of
measurement).
Univariate Statistics
There are two groups of univariate statistics we commonly use
when we have a single numerical dependent variable.
The first group is appropriate when we have a nominal/categorical
independent variable. It includes statistics that compare
categories or groups such as men/women, highly
satisfied/dissatisfied employees, or youth/seniors.
These include:
• t-test
• ANOVA
• ANCOVA
• Factorial Analysis of Variance
The following statistics apply when we have a single numerical dependent
variable. Each makes a different kind of comparison:
• t-test: a simple comparison between two groups
• ANOVA (a one-way analysis of variance): a comparison between three or more groups
• ANCOVA: a comparison between three or more groups while controlling for a confounding variable
In all these cases we have only a single independent variable, which may be
composed of two, three, or more groups. However, when we have more than
one independent variable we need to use a factorial analysis of variance.
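As a rough illustration of the simplest comparison in this family, the sketch below computes a pooled-variance t statistic for two independent groups. The function name, the group labels, and the scores are hypothetical and chosen only for illustration; in practice a statistics package would also report the associated probability value.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Pooled-variance t statistic for two independent groups (illustrative helper)."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance divides by n - 1 (the sample variance)
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical scores on one numerical dependent variable for two groups
trained = [14, 15, 15, 16, 17, 18]
untrained = [10, 12, 12, 13, 14, 14]

t = independent_t(trained, untrained)
df = len(trained) + len(untrained) - 2
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The larger the t value relative to its degrees of freedom, the less plausible it is that the two group means differ only by chance.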
Factorial Analysis of Variance
A factorial analysis of variance involves a comparison of scores
on a single, numerical dependent variable, the value of which
is determined by several nominal or categorical independent
variables.
Factorial analyses of variance are prefaced with a numerical
string or statement that indicates:
• the number of independent variables (designated by how many numbers appear in the string, not by the values of the numbers)
• the number of levels of each independent variable (designated by the actual value of each number in the string)
So, for example, a 3x2x3 factorial analysis of variance has:
• 3 independent variables,
• the first with 3 levels,
• the second with 2 levels,
• and the third with 3 levels.
Similarly, a 4x2 factorial analysis of variance has two independent variables,
the first with four levels and the second with two.
This could be a comparison that examines
student achievement (A, B, C, and D students)
and sex (male, female).
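The notation can be read mechanically. The short sketch below is one way to do it, assuming the design is written as numbers separated by "x". The function name is hypothetical, and the "cells" entry (the number of distinct group combinations) is an added convenience rather than something the notation itself states.

```python
from math import prod

def describe_factorial_design(design):
    """Read a design string such as '3x2x3' into its parts (illustrative helper)."""
    levels = [int(part) for part in design.lower().split("x")]
    return {
        "independent_variables": len(levels),  # how many numbers are in the string
        "levels": levels,                      # the value of each number
        "cells": prod(levels),                 # distinct combinations of levels
    }

print(describe_factorial_design("3x2x3"))
print(describe_factorial_design("4x2"))
```

For the 4x2 example, the eight cells correspond to each grade group (A, B, C, D) crossed with each sex.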
When we attempt to determine if variables are related and both the
independent and dependent variables have been measured numerically we use
one of the following univariate statistics:
• Correlation: simply assessing the relationship between independent and dependent variables
• Regression: assessing the ability of the independent variable to predict the value of the dependent variable
• Multiple Regression: assessing the predictive ability of several independent variables on a single dependent variable
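To make the difference between relating and predicting concrete, here is a minimal sketch that reuses the sleep and fatigue example. The data are invented, the function names are hypothetical, and the formulas are the standard Pearson correlation and least-squares line rather than anything specific to this unit.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two numerical variables."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def simple_regression(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

# Hypothetical data: hours of sleep (independent) and a fatigue score (dependent)
hours_of_sleep = [4, 5, 6, 7, 8]
fatigue_score = [9, 8, 6, 5, 2]

r = pearson_r(hours_of_sleep, fatigue_score)                         # relating
slope, intercept = simple_regression(hours_of_sleep, fatigue_score)  # predicting
predicted = intercept + slope * 6.5   # predicted fatigue after 6.5 hours of sleep
print(f"r = {r:.2f}, predicted fatigue = {predicted:.2f}")
```

The correlation only describes how strongly the two variables move together, whereas the regression line can be used to estimate a fatigue score for a given amount of sleep.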
The chart below helps to clarify how the common univariate statistical procedures relate
to and differ from one another. Being univariate, all the statistics below have a single dependent
variable that is numerical (measured at the interval or ratio level of measurement).

Nominal/categorical IV (comparing groups)               Numerical IV (relating variables)
t-test (2 groups)                                       Correlation (relating)
ANOVA (3+ groups)                                       Regression (predicting)
ANCOVA (while controlling)                              Multiple Regression (with more than 1 IV)
Factorial Analysis of Variance (with more than 1 IV)

The family of statistics in the left-hand column have nominal/categorical independent variables
(abbreviated in the chart as IV) and therefore involve comparisons between groups.
The family of statistics in the right-hand column have numerical independent variables and thus
involve assessing relationships between variables (versus groups).
Multivariate Statistics
Multivariate statistics are appropriate when we have more than one
dependent variable. It is helpful to think of them as an extension of the two
previous groups discussed.
When we compare groups and we have more than one dependent variable we
move from an ANOVA to a MANOVA, which compares groups in terms of more than one
dependent variable.
Or we move from an ANCOVA to a MANCOVA, which compares groups in terms of more than one
dependent variable while controlling for a confounding variable.
Similarly, we can move from a multiple regression (which
considers how several numerical independent variables predict a
single numerical dependent variable) to a canonical correlation, which
examines the relationship between multiple independent and multiple dependent
variables, all of which are numerical. Said another way, it examines the
relationship between a group of numerical independent variables and a group of
numerical dependent variables.
The chart below serves to clarify how the common multivariate statistical procedures
relate to and differ from one another. As multivariate statistics, all of those listed below
have multiple dependent variables (abbreviated as DV in the chart) that are numerical
in nature.

Nominal/categorical IV (comparing groups)               Numerical IV (relating variables)
MANOVA (more than 1 DV)                                 Canonical Correlation (comparing two sets of variables)
MANCOVA (while controlling)

As with the univariate families of statistics, the family of statistics in the left-hand
column have nominal/categorical independent variables and therefore involve
comparisons between groups.
The family of statistics in the right-hand column have numerical independent variables
and thus involve assessing relationships between variables (versus groups).
Uni- and Multivariate Statistics
Finally, the chart below puts both the univariate and multivariate statistics together.
You can see then how the univariate statistics link to the multivariate statistics.

Univariate Statistics (Single Dependent Variable)
t-test (2 groups)                                       Correlation (relating)
ANOVA (3+ groups)                                       Regression (predicting)
ANCOVA (while controlling)                              Multiple Regression (with more than 1 IV)
Factorial Analysis of Variance (with more than 1 IV)

Multivariate Statistics (More Than One Dependent Variable)
MANOVA (more than 1 DV)                                 Canonical Correlation (comparing two sets of variables)
MANCOVA (while controlling)
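The chart can also be read as a small decision procedure. The sketch below is one possible encoding of it; the function name and parameter names are hypothetical, and the order in which the questions are asked is an assumption made for illustration, not a rule stated in this unit.

```python
def choose_statistic(dependent_vars, iv_type, independent_vars=1, groups=2, control=False):
    """Look up a statistic from the chart (illustrative helper).

    dependent_vars: number of numerical dependent variables
    iv_type: "categorical" (comparing groups) or "numerical" (relating variables)
    independent_vars: number of independent variables
    groups: number of groups when there is one categorical independent variable
    control: True when controlling for a confounding variable
    """
    if iv_type not in ("categorical", "numerical"):
        raise ValueError("iv_type must be 'categorical' or 'numerical'")
    if dependent_vars > 1:                      # multivariate family
        if iv_type == "numerical":
            return "canonical correlation"
        return "MANCOVA" if control else "MANOVA"
    if iv_type == "numerical":                  # univariate, relating variables
        return "multiple regression" if independent_vars > 1 else "correlation or regression"
    if independent_vars > 1:                    # univariate, comparing groups
        return "factorial analysis of variance"
    if control:
        return "ANCOVA"
    return "t-test" if groups == 2 else "ANOVA"

print(choose_statistic(1, "categorical", groups=2))   # t-test
print(choose_statistic(1, "categorical", groups=4))   # ANOVA
print(choose_statistic(3, "numerical"))               # canonical correlation
```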
Descriptive Statistics (Unit 5)
There are several related topics in this unit:
• Descriptive Statistics Overview
• Measures of Central Tendency
• Measures of Dispersion
Descriptive Statistics Overview
Descriptive statistics tell us about specific trends in our data and describe specific
features of our sample.
For example, a researcher will use descriptive statistics to tell readers about the
proportion of men and women who participated in a study. The researcher may write
something like:
“In this study 40% of the sample were men, whereas 60% were women.”
Or the researcher may inform readers about participants’ average scores on a particular
variable in the study. In this case the researcher may say:
“The mean score on the communication competence measure was 14.55.”
The primary descriptive statistics fall into one of two “families”:
• measures of central tendency
• measures of dispersion
Measures of Central Tendency
Measures of central tendency, as the name implies, tell us
about a central characteristic of the data. Measures of
central tendency include:
• the Mode
• the Median
• the Mean
Mode
The mode is the simplest measure of central tendency.
It indicates which score or value in a distribution occurs most frequently.
The mode is appropriate when we have nominal or categorical data.
In these instances we are interested in how often each category was used or
appeared. In essence we count observations that appear in each category and
then report which category had the most observations.
Thus, the mode is the descriptive statistic that tells us which category has the
most observations or which category appears most often in the data.
Say, for example, that we are interested in people’s perceptions of what constitutes
sexual harassment. To determine this we could provide people with a list of behaviors
and ask them to respond by simply checking “yes” or “no” (nominal categories) to indicate
whether they believe each behavior reflects sexual harassment:
1. sexual comments                                      ____ yes   ____ no
2. inappropriate gaze                                   ____ yes   ____ no
3. sexual jokes                                         ____ yes   ____ no
4. display of pornographic materials in the office      ____ yes   ____ no
5. “pick-up” or “come on” lines                         ____ yes   ____ no
The mode will tell us which of these behaviors people perceived to be most sexually
harassing compared to the others, as it reflects the behavior that received the most
“yes” responses. Or if we were interested in the least sexually harassing behavior, we
could count up the “no” responses and report the mode for “no” responses.
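Counting responses and reporting the most frequent category can be sketched in a few lines. The responses below are invented for illustration; each entry stands for one behavior that a respondent marked “yes”.

```python
from collections import Counter

# Hypothetical data: one entry per "yes" response on the checklist
yes_responses = [
    "sexual comments", "sexual jokes", "sexual comments",
    "display of pornographic materials", "sexual comments",
    "inappropriate gaze", "display of pornographic materials",
]

counts = Counter(yes_responses)                      # observations per category
mode_category, mode_count = counts.most_common(1)[0] # category with the most observations
print(mode_category, mode_count)
```

If two categories tie for the highest count, `most_common(1)` reports only one of them, so a tie should be checked for before a single mode is reported.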
Median
The median divides a distribution of quantitative data exactly in
half. It is the score above which and below which half the
observations fall.
The median is most appropriate for describing the center point of
a set of ordinal data because it tells us the point at which half of
the cases rank higher and half rank lower.
For example, in a horse race the horse that finished fifth out of
nine represents the median, as four horses finished before or
above it and four horses finished behind or below it.
The median also can be used with interval/ratio data, but can be problematic
because it is not sensitive to extreme scores.
That is, two distributions may have the same median or middle point, but one
could include much higher and/or lower values than the other.
Simply seeing the median would lead us to believe that the two distributions of
scores are more similar than they actually are.
For example:
Distribution 1:   4, 10, 12, 15, 28, 36, 47
Distribution 2:  10, 11, 13, 15, 17, 21, 25
The median for both distributions is 15, but the first distribution includes much
lower and higher values (from 4 to 47) than the second one (from 10 to 25).
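The same comparison can be checked quickly in code. This sketch reads the example as two distributions of seven scores each and uses Python's standard `statistics.median`; the variable names are arbitrary.

```python
from statistics import median

first = [4, 10, 12, 15, 28, 36, 47]
second = [10, 11, 13, 15, 17, 21, 25]

for name, scores in (("first", first), ("second", second)):
    # Same middle point, very different lowest and highest scores
    print(name, "median:", median(scores), "lowest:", min(scores), "highest:", max(scores))
```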
Beyond simply describing the middle point of a distribution,
researchers may use the median to create groups to compare. This
is called a median split. When researchers have ordinal or interval/ratio
data but want to create groups or categories they can do so by
using a median split to create two groups.
Accordingly, the researcher determines what the median is for the
variable of interest and then creates a “high” group with scores
above the median and a “low” group with scores below the
median.
For example, a researcher interested in comparing people high and low in
verbal aggressiveness can find the median of the verbal aggressiveness scores
for all participants. She can then take all of the cases above that median to
create the “high” group and all the scores below that median to create the
“low” group.
Then the researcher can compare people high and low in verbal aggressiveness
on some other variable of interest.
Do people high and low in verbal aggressiveness differ with regard to their
marital satisfaction? Their communication competence?
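A median split might be sketched as follows. The participant IDs and scores are invented, and how to treat scores that fall exactly on the median is a decision the researcher has to make; here they are simply listed separately as an assumption of the example.

```python
from statistics import median

# Hypothetical verbal aggressiveness scores keyed by participant ID
scores = {"p1": 12, "p2": 30, "p3": 18, "p4": 25, "p5": 9, "p6": 21, "p7": 18}

cut = median(scores.values())
high = sorted(pid for pid, s in scores.items() if s > cut)        # "high" group
low = sorted(pid for pid, s in scores.items() if s < cut)         # "low" group
at_median = sorted(pid for pid, s in scores.items() if s == cut)  # left unassigned here

print("median:", cut, "high:", high, "low:", low, "at median:", at_median)
```

The two groups can then be compared on another variable, for example with a t-test.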
Mean
The mean is the arithmetic average. It is computed by adding all
the scores in a distribution and dividing by the total number of
scores.
It helps to clarify what the average score on a variable of interest
is. For example, we may see any of the following reported.
“The mean…
“…for communication apprehension was 14.56.”
“…for hours of television watched per week was 8.56.”
“…for age of respondents in this study was 43.69 years.”
The mean is appropriate for interval/ratio data because of
“assumed equivalence,” or the idea that all points on the scale are
assumed to be of equal distance from one another (i.e., 1 is the
same distance from 2 as 2 is from 3, and so on).
Unlike the other measures of central tendency,
the mean is sensitive to all scores in the distribution, including
extreme scores.
That is why it is thought to be the most sophisticated measure of
central tendency.
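A small sketch can show this sensitivity. The scores are hypothetical; the second list differs from the first only in that one extreme score replaces the highest value.

```python
from statistics import mean, median

scores = [10, 11, 13, 15, 17, 21, 25]
with_extreme = [10, 11, 13, 15, 17, 21, 95]   # one extreme score replaces 25

print(mean(scores), median(scores))               # 16 15
print(mean(with_extreme), median(with_extreme))   # 26 15
```

The single extreme score moves the mean by ten points while the median does not change at all.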
Measures of Dispersion
Measures of dispersion show us how data spread out in a distribution.
Think about, for example, dropping a glass of water and a can of motor oil on
the floor. Both will spill and disperse (i.e., spread out), but they will do so very
differently.
Measures of dispersion include:
• the Range
• Variance
• Standard Deviation
Range
The range is the simplest measure of dispersion. It reports the distance
between the highest and lowest scores in the distribution.
The range, therefore, is calculated by subtracting the lowest number from the
highest number in the distribution.
The range gives a general sense of how much the data spread out across the
distribution, which can be helpful for understanding whether a study included
a lot of variability or whether it drew from a narrow spectrum.
For example, if a researcher intends to study a communication variable across
a wide range of age groups, a sample of people aged 18-21 (a range of 3) is not
very diverse. Yet a sample of people aged 18-70 (a range of 52) is.
One concern with the range is sensitivity to extreme scores. Because the range depends
only on the highest and lowest scores, it can be misleading when “outlier” scores
exist in a distribution. Outliers are scores that are far removed from the rest of the
distribution.
In the example of age just used you could have a distribution that ranges from 18-70,
yet there is only one person aged 70 and the next closest score is actually 24. The age
of 70 makes the distribution look much more spread out than it actually is. If we
exclude the outlier, which researchers often do when
necessary, the range is actually 6 as the scores spread from 18-24, not 52 as is the case
when the outlier is included and the scores spread from 18-70.
To avoid problems with outliers researchers may report the interquartile range. This
is the range of scores representing the middle 50% of scores (or the middle two quarters
of the distribution). The upper and lower 25% of scores (the outer quarters where
outliers will be) are excluded. An interquartile range provides a more conservative
representation of the range.
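The effect of a single outlier on the range can be sketched with a hypothetical sample of ages. The quartile cut points come from Python's `statistics.quantiles`; statistical packages use slightly different quartile methods, so the exact interquartile value should be treated as approximate.

```python
from statistics import quantiles

ages = [18, 19, 19, 20, 21, 22, 22, 23, 24, 70]   # hypothetical sample with one outlier

full_range = max(ages) - min(ages)                 # 52: driven by the single 70-year-old
trimmed = [age for age in ages if age != 70]       # drop the outlier
trimmed_range = max(trimmed) - min(trimmed)        # 6: scores spread from 18 to 24

q1, _, q3 = quantiles(ages, n=4)                   # quartile cut points
interquartile_range = q3 - q1                      # spread of the middle 50% of scores

print(full_range, trimmed_range, interquartile_range)
```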
Variance
Variance is the average squared distance of scores from the mean.
We can compute variance when we have interval or ratio data.
Why squared units? Well, that has to do with how we compute variance. To
compute variance we do the following:
1. Subtract the mean score of the distribution from each score in the
distribution
2. Square each of these values
3. Sum all of the squared values
4. Divide the sum of squared values by the total number of scores
When computing variance we need to square the values in step 2 so that
they do not cancel one another out in step 3.
For example, say that we have values that are 2, 3, and 4 points above the
mean (+2, +3, +4) and values that are 2, 3, and 4 points below the mean
(-2, -3, -4). When we go to
add these up without squaring them they will cancel each other out and we will
end up with a value of zero.
To ensure this doesn’t happen we square all of the values. Thus, 2, 3, and 4
become 4, 9, and 16 regardless of whether they were positive or
negative previously, because squaring a negative number always yields a
positive value.
Thus, all of the values are positive and can be summed for a total. This sum
(known as the sum of squares) can in turn be divided by the total number
of observations.
So, in our example above we would add
4 + 9 + 16 + 4 + 9 + 16 = 58
Then we would divide 58 by 6 (the number of observations) to obtain the
variance. In this case the variance is 58 / 6 = 9.67.
You can see that this last part of the process essentially involves the
computation of a mean.
Thus, it is helpful to think of variance as the mean or average of how far scores
disperse or spread out from the mean score, in squared units.
Variance is a helpful measure of dispersion; however, it is hard to interpret
because it is no longer in the original units of measure. Rather, because of the
computation necessary it ends up in squared units.
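The four steps can be traced in code using the same deviations as the example. The raw scores are hypothetical: they were chosen so that their mean is 10 and their deviations are +2, +3, +4, -2, -3, and -4. Note that `statistics.pvariance` divides by the number of scores, as described here, whereas `statistics.variance` divides by one less than that number.

```python
from statistics import mean, pvariance

scores = [12, 13, 14, 8, 7, 6]                     # hypothetical scores, mean = 10
deviations = [s - mean(scores) for s in scores]    # step 1: score minus the mean
print(sum(deviations))                             # 0: unsquared deviations cancel out

squared = [d ** 2 for d in deviations]             # step 2: 4, 9, 16, 4, 9, 16
sum_of_squares = sum(squared)                      # step 3: 58
variance = sum_of_squares / len(scores)            # step 4: 58 / 6

print(round(variance, 2))                          # 9.67
print(round(pvariance(scores), 2))                 # 9.67, same result from the library
```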
Standard Deviation
How then can we change variance into something usable and
meaningful? That is, how do we return to the original units of
measure?
Well, we need to get rid of the squared scores.
You may recall that we use the square root when we want to get
rid of squared scores. The same is true here.
We can take the square root of variance to calculate or compute a
measure of dispersion that is in the original units of measure.
This produces the standard deviation.
Standard deviation, like variance, is a measure of
dispersion that explains how much scores in a set of
interval/ratio data vary from the mean. However, unlike
variance, it is expressed in the original units of
measurement.
So, say the variance is 9.67 as was the case in our
earlier example…
…the square root of 9.67 is 3.11.
…the standard deviation therefore is 3.11.
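As a quick check of the arithmetic, the sketch below takes the square root of the variance from the example. The raw scores are hypothetical and were chosen so that their variance is 58 / 6; `statistics.pstdev` is the library function that divides by the number of scores.

```python
from math import sqrt
from statistics import pstdev

variance = 58 / 6                  # about 9.67, in squared units
sd = sqrt(variance)                # back in the original units of measure
print(round(sd, 2))                # 3.11

scores = [12, 13, 14, 8, 7, 6]     # hypothetical scores with that variance
print(round(pstdev(scores), 2))    # 3.11
```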
Standard Deviation
Standard deviation helps us understand how a distribution spreads out. It is
often reported alongside the mean score of a distribution.
So, for example, we may see reports that list any of the following means and
standard deviations:
M = 12.56, SD = 2.45
M = 10.21, SD = 5.64
M = 28.45, SD = 8.45
An italicized M is the statistical notation for the mean and an italicized SD is
the statistical notation for the standard deviation.
From the reports for each distribution above we would know both the
average score (M) and the average distance of all other scores in the
distribution from that average score (SD).
When we see the M and SD reported, we can draw some conclusions about the
distribution.
What if we saw the following descriptive statistics reported for three different
distributions of data?
M = 14.56, SD = 2.45
M = 14.56, SD = 5.64
M = 14.56, SD = 8.45
The examples above all include the same mean to make a point about the standard
deviation. In the first distribution the scores do not disperse widely, in the
second they disperse moderately, and in the third they disperse considerably.
Thus, the first distribution would appear as a tall and narrow curve, the second
as a bell-shaped curve, and the third as a broad and comparatively flat curve.
Textual Analysis (Unit 5)
There are several related topics in this unit:
• What is Textual Analysis?
• Where do we find texts to examine?
• How do we do textual analysis?
• What are some concerns associated with conducting textual analysis?
What is Textual Analysis?
Textual Analysis
The fundamental premise of textual analysis is that we
can learn about communication by examining
communication artifacts.
Textual analysis is the methodology communication
professionals use to analyze and interpret organizational
artifacts.
Where do we find texts to examine?
Communication artifacts or texts derive from one of two sources.
An Existing Universe of Texts
Organizational texts exist naturally in many forms including, but not
limited to:
• reports
• training manuals
• emails
• training videos
• corporate newsletters
• mission statements
• memos
• web pages
• advertisements
Or the Creation of Texts through Another Methodology
Texts can be created by asking people in an interview or a
questionnaire to report about a communication phenomenon.
Conducting a Textual Analysis
Textual analysis is a multi-step process that involves:
• Selecting Texts
• Determining the Unit of Analysis
• Determining Categories
• Coding
• Analyzing Data
Selecting Texts
We begin textual analysis by first selecting a sample of texts from
the existing universe of texts.
A researcher must ensure that the texts selected are:
• Representative: the texts sampled are representative of all possible types of texts that exist within the universe of texts
• Sufficient: there are enough texts in the sample to draw meaningful conclusions
Determining the Unit of Analysis
Once texts have been selected the researcher must determine what the unit of analysis will be.
The unit of analysis can be a particular statement within the text, the entire text, or some specific
feature of the text.
For example, a researcher may be examining billboards on state highways as a set of texts. She
is interested in how people see and understand the first line of text in billboards. Thus, she decides
that she wants the first line of text on the billboards to be her unit of analysis.
A different researcher may be interested in the overall message of the billboard and therefore he
decides that the unit of analysis will be all of the text on the billboard, not just the first line of
text.
A third researcher may be interested in the graphic images that appear on billboards. She decides
to make these the unit of analysis.
Thus we can see how the same set of texts (in this case billboards on state highways) can be
analyzed differently depending on the unit of analysis designated by the researcher.
Once the unit of analysis has been determined
the researcher must begin “unitizing” the texts.
Unitizing is the process of identifying the units
of analysis within the texts to be examined.
For the researchers studying billboards unitizing would involve the following:
• identifying where the first line of text begins and ends
• identifying what constitutes the entire text of the billboard
• identifying what graphic images would be singled out for study
In each case the researcher sets the parameters for determining what
constitutes a unit of analysis in the given set of texts. Then the researcher
decides how many units of analysis there actually are in the data set.
In the first two cases the number of units of analysis would match the number
of billboards included in the sample of texts. However, in the case of graphic
images on billboards there could in fact be many more units of analysis than
there are billboards because individual billboards may have several graphics.
Determining Categories
Next the researcher must determine what the categories used in
the textual analysis will be. Categories can be drawn from one of
two sources:
Theory and/or Previous Literature
Previous research and theory will suggest to us what
categories are relevant for a given phenomenon.
For example, when analyzing diaries cataloguing employees’
emotional exchanges at work we could rely upon the standard
6 category list of emotional prototypes, which includes
happiness, sadness, joy, affection, surprise, and anger.
Or the researcher can look to the actual data for categories.
The Current Data (Grounded Theory)
Via a type of analysis known as grounded theory we can
examine the data with the intention of allowing the categories
to emerge naturally from the data.
For instance we could identify several themes that appear in
fans’ web blogs posted on their favorite athletes’ websites. By
reviewing these texts we could find some emergent themes to
provide a sense of what types of messages fans post and why
they post such messages.
Categories derived from previous literature and/or theory must
be:
• Mutually exclusive: the categories are independent and separate from one another. It is a fancy way of saying that they should not overlap.
• Exhaustive: the categories exhaust all of the possible categories that should be used.
• Equivalent: the categories are measuring or getting at the same idea.
Coding
Once we have selected texts, unitized them, and decided on
categories we can begin coding.
Coding is the process by which we place the units of analysis
into the categories we’ve decided to use. We do this by reviewing
each unit of analysis and placing it in one of the predetermined
categories.
We cannot be certain that all people would code the same data the
same way. Therefore we use multiple coders to do the coding.
The coding process involves:
• Training Coders
• Assessing Intercoder Reliability
• Achieving Consensus
Training Coders
To train coders we need to first introduce them to the category
scheme to be used, reviewing carefully what does and does not
belong in each category.
Then we should have coders practice coding with a set of texts
that are not part of the data to be analyzed.
If we recognize any problems with the coding scheme at this
point or how the coders are using it we need to make the
necessary corrections before beginning our analysis of the actual
data.
Assessing Intercoder Reliability
As we noted earlier, we should and usually do use more than one coder. To
determine the degree to which multiple coders have categorized the data
similarly we need to compute what is called intercoder reliability.
We determine intercoder reliability to assess the degree to which coders used
the category scheme similarly and placed units of analysis into categories
accordingly.
There are several ways to compute intercoder reliability including:
• Percentage Agreement
• Cohen’s Kappa
• Scott’s π
Percentage agreement simply involves computing the number of times
coders agreed out of the number of total times they could have agreed.
The problem with percentage agreement is that it includes something called
chance agreement (or the possibility that coders agreed by chance and not
intentionally). Therefore percentage agreement is thought of as an “inflated”
measure of intercoder reliability.
In contrast Cohen’s Kappa and Scott’s π use a mathematical computation that
factors out chance agreement and they are therefore considered more
“conservative” estimates of intercoder reliability.
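The difference between the inflated and the more conservative estimates can be seen in a small sketch. The two coders' decisions below are invented, the function names are hypothetical, and the kappa computation follows the standard two-coder formula: observed agreement minus chance agreement, divided by one minus chance agreement.

```python
from collections import Counter

def percentage_agreement(coder_a, coder_b):
    """Share of units that both coders placed in the same category."""
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return agreements / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Agreement between two coders with chance agreement factored out."""
    n = len(coder_a)
    observed = percentage_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: for each category, multiply the two coders' own
    # proportions for that category, then sum across categories
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(freq_a) | set(freq_b))
    return (observed - expected) / (1 - expected)

# Hypothetical coding of ten units of analysis into three categories
coder_a = ["anger", "joy", "joy", "anger", "sadness", "joy", "anger", "sadness", "joy", "joy"]
coder_b = ["anger", "joy", "anger", "anger", "sadness", "joy", "joy", "sadness", "joy", "joy"]

agreement = percentage_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
print(f"percentage agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

With these data the coders agree on 8 of 10 units, yet kappa is noticeably lower because some of that agreement would be expected by chance alone.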
Achieving Consensus
Because coders will not agree all of the time there will be unresolved cases
where disagreement has occurred. That is, cases in which coders believe the
same observation belongs in two different categories.
We do not want to exclude these cases. Rather we need to train the coders to
reach consensus.
Coders can achieve consensus by talking about the cases in which originally
there was disagreement until they can come to some consensus about where
the observation belongs.
In research reports we usually see the practice described in a statement like
“Coders discussed cases in which disagreement occurred until consensus was
reached and all observations were categorized.”
Analyzing Data
Once the coding procedure has been completed we are ready to
analyze the results. Results can be analyzed either qualitatively or
quantitatively.
Qualitative analysis simply involves examining the texts for
themes and describing the themes accordingly.
Quantitative analysis involves counting the number of
observations in each category and then comparing those
counts using the χ² (chi-square) statistic. This statistic tells
us whether the differences in counts are large enough that they are
unlikely to have occurred by chance.
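A minimal sketch of the quantitative route follows. The category counts are hypothetical, and the comparison assumes that, by chance alone, observations would be spread evenly across the categories. The critical value of 5.991 is the standard chi-square cutoff for 2 degrees of freedom at the .05 level.

```python
# Hypothetical counts of coded observations per category
observed = {"joy": 30, "anger": 18, "sadness": 12}

total = sum(observed.values())
expected = total / len(observed)    # equal counts if category made no difference
chi_square = sum((count - expected) ** 2 / expected for count in observed.values())
degrees_of_freedom = len(observed) - 1

CRITICAL_VALUE = 5.991              # chi-square cutoff for df = 2 at the .05 level
print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}")
print("beyond chance" if chi_square > CRITICAL_VALUE else "could be chance")
```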
Concerns with Textual Analysis
When conducting textual analysis we have to be attentive to any limitations in
the universe of texts. There are two particular limitations to which we should
attend:
Selective deposit refers to the idea that not all texts would have been
retained or archived. For example, there are many more gospels than just
those that were included in the New Testament.
Selective survival refers to the idea that only some texts have survived
from a larger universe of texts. For example, how many of the speeches of
this country’s founding fathers have survived? Certainly nowhere near as
many as they gave.