▶ Data Collection Methods
The term methods refers to the techniques used to collect data, whereas design is the overall plan or strategy for when and from whom data are collected. Methods generally fall into one of
two categories: those for collecting primary data (i.e., the generation of new data) and
those for collecting secondary data (i.e., the use of existing data). The evaluation
method needs to be consistent with the purpose of the evaluation and the specific
evaluation question. The following discussion of methods and data sources focuses on
collection of both primary and secondary quantitative data. Methods to collect qualitative
data are sufficiently different and are not covered in this chapter. The most common way to collect primary data is through surveys and questionnaires. For
each health and well-being domain, various sources of data can be used to generate
information (TABLE 13-3).
TABLE 13-3 Example of Data Sources for Each Health and Well-Being Domain
Health Domain
Examples of Data Sources
Physical health
Survey data: self-report
Secondary data: medical records for medical diagnoses
Physical data: scale for weight, laboratory tests
Observation: response to physical activity
Knowledge
Survey data: self-report, standardized tests
Secondary data: school records
Physical data: not applicable
Observation: performance of task
Lifestyle behavior
Survey data: self-report
Secondary data: police records
Physical data: laboratory tests related to behaviors, such as nicotine or cocaine blood levels
Observation: behaviors in natural settings
Cognitive processes
Survey data: self-report, standardized tests of cognitive development and problem solving
Secondary data: school records
Physical data: imaging of brain activity
Observation: problem-solving tasks, narrative
Mental health
Survey data: self-reported motivation, values, attitudes
Secondary data: medical records diagnostic category
Physical data: self-inflicted wounds, lab values
Observation: emotional bonding
Social health
Survey data: self-report, social network questionnaires, report of others
Secondary data: attendance records of recreational activities
Physical data: not applicable
Observation: interpersonal interactions
Community-level well-being
Survey data: self-report of safety and community cohesion
Secondary data: neighborhood crime rates, number of primary care clinics
Physical data: number of vacant lots, number of grocery stores
Observation: children and adults using parks, pedestrian traffic, amount of fresh produce
Resources
Survey data: self-report
Secondary data: employer records, county marriage records, school records
Physical data: address
Observation: possessions
Surveys and Questionnaires
A survey is a method that specifies how and from whom data are collected, whereas
a questionnaire is a tool for data collection. Typically surveys use questionnaires to collect data.
For example, the U.S. Census is a survey that uses a questionnaire to collect data on all persons
who reside in the United States. In most cases, residents complete a pen-and-paper questionnaire.
However, in some instances, a census taker completes the questionnaire while talking with the
individual. Although the distinction between a survey and a questionnaire is important for the
sake of clear thinking, generally the word survey implies the use of a questionnaire.
Questionnaire Construction Considerations
Much has been written about ways to construct health questionnaires (e.g., Aday & Cornelius,
2006), ways to write individual questions on the questionnaire (Aday & Cornelius,
2006; Krosnick, 1999), and techniques to have sets of questions form a valid and reliable
scale (DeVellis, 2003; Fowler, 1995; Pedhazur & Schmelkin, 1991). Several key points
can be drawn from these resources that are paramount to developing a good health program
evaluation.
To the extent possible, evaluators should use existing questionnaire items and valid and reliable
scales so that they can avoid spending precious resources “reinventing the wheel.” An example
of existing items is the U.S. Census Bureau race/ethnicity categories. These race/ethnicity items
can be used rather than creating new race/ethnicity categories for a particular evaluation. An
advantage of using existing items is that it provides some assurance that the items are
understandable. Existing scales also can be used, which makes it possible to compare evaluation
participants with those with whom the scale was previously used. However, if the intended
audience has a unique characteristic—for example, a specific medical diagnosis—that is relevant
to understanding the effect of the program, then comparison to existing scales may not be the
optimal choice.
Instruments also need to be appropriate for diverse ethnicities and possibly multiple languages.
In terms of instruments, cultural sensitivity has two dimensions: the surface structure, which
consists of the superficial characteristics, and the deep structure, which consists of core values or
meanings. This second dimension is sometimes called cultural relevance. Attention to both
careful translation and cultural relevance is especially needed for questionnaires that are being
used for the first time with a different cultural group, such as was done by Yu, Wu, and Mood
(2005). Epstein, Osborne, Elsworth, Beaton, and Guillemin (2015), in studying translation for
one health education questionnaire, identified five different translation errors: question style,
frequency or time frame, breadth of what is meant, strength of emphasis, and actual meaning.
One type of scale that is likely to be discussed with regard to evaluating health programs is the
use of client goals. MacKay, Somerville, and Lundie (1996) reported that this evaluation
technique has been used since 1968. Program staff members may be inclined to count the
number of client goals attained as an indicator of program success; indeed, the temptation is to
consider this quantity as outcome data that are very readily available. Unfortunately, this crude
measure of client outcome is highly problematic from an evaluation perspective. The main
problem is that, unless the goals are highly standardized for specific health problems, there can
be great variability in the goals set. Similarly, unless strict criteria have been established for
determining whether a client goal was reached, biases among program staff members may
influence client assessments. The use of goal attainment scaling, in which a Likert-type scale
specific to each goal is used, still poses serious problems (MacKay et al., 1996) and therefore
its use ought to be severely curtailed.
To assess the readability, ease of completion, and overall appeal of the questionnaire, a pretest or
pilot test is advised. Involvement of stakeholders in this activity is encouraged for two reasons: it helps the evaluators produce a questionnaire better suited to the intended audience, and it helps stakeholders anticipate what the evaluation data will include. Key considerations are to keep the
language simple, use an easy-to-follow format and layout, and break down complex concepts
into more easily understood ideas. Even if evaluation participants are expected to be well
educated, people are more likely to complete questionnaires that are easy and quick to read.
Verify that what is in the questionnaire corresponds to the program outcome objectives.
Evaluators are often tempted to add “just a few more questions” because the opportunity exists
or because the information might be interesting to know. A shorter questionnaire is both better
and more likely to be completed. A good rationale for going beyond the program objectives in
what is collected is if those data will be used for subsequent program planning or to refine the
current program.
Regardless of the care taken to construct a questionnaire, whatever can be misinterpreted or done incorrectly inevitably will be. For example, questionnaires that are copied double-sided and
are stapled together are likely to have skipped pages. Unless the questionnaire is administered by
an interviewer who is well trained, do not use skip patterns that direct respondents to skip
questions based on a previous response. These complicated patterns quickly become confusing,
and the well-intending respondent may answer all questions, including those items that ought to
have been skipped. Use of skip patterns is really appropriate only for questionnaires conducted in
person or over the phone.
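Where skip patterns cannot be avoided, a simple data-cleaning check can flag respondents who answered items they should have skipped. The following is a minimal sketch, not drawn from the text, using hypothetical item names and skip rules.

def skip_violations(record, skip_rules):
    # skip_rules maps a (screening_item, screening_answer) pair to the
    # follow-up items that should be blank when that answer was given.
    violations = []
    for (screen_item, screen_answer), followups in skip_rules.items():
        if record.get(screen_item) == screen_answer:
            violations.extend(f for f in followups if record.get(f) is not None)
    return violations

rules = {("smokes", "no"): ["cigarettes_per_day", "years_smoking"]}
record = {"smokes": "no", "cigarettes_per_day": 10, "years_smoking": None}
print(skip_violations(record, rules))  # ['cigarettes_per_day'] -> review this response

Flagged records can then be reviewed before analysis rather than silently included.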
Survey Considerations
Any survey, whether done in person or via mail or e-mail, needs careful planning. The process
by which the questionnaire is distributed and returned to the evaluator must be planned
thoroughly to minimize nonresponse and nonparticipation rates. That process is as critical as the
quality of the questionnaire to the success of the evaluation. One technique for developing a
well-crafted survey plan is to imagine and role-play each step in the process, and to follow the
paper from hand to hand.
Questionnaire data are increasingly collected electronically, whether via handheld devices used in the field, Internet-based surveys, or client-accessed computers. The
same advice about the need to have a carefully crafted data collection plan applies to the use of
electronic data collection: Follow the answers from the asker to the responder through data entry
to computer output. Each of these steps is needed to be able to collect the data accurately and
feasibly, and constitutes the survey design.
Response Biases
A threat to the quality of questionnaire data, and especially to self-report data from individuals,
comes from the various types of response bias, the intentional or unconscious systematic way in
which individuals select responses. One of the most common types of response bias, known
as social desirability, is answering questions in a manner intended to make a favorable
impression (Krumpal, 2013). Social desirability is a powerful motivator and is widely encountered in program evaluations in which participants want to please the evaluators or believe there is a socially correct answer they are supposed to give. Response bias can also occur when the respondent falls into a pattern
of just giving the same response, regardless of the question or his or her true opinion or feeling.
Response bias can be difficult to anticipate. Nonetheless, evaluators would be wise to consider
that both response bias and errors inherent in the way the variables are measured can
interactively produce questionable or even totally undesirable data (TABLE 13-4).
TABLE 13-4 Interaction of Response Bias and Variable Error
Low bias, low variable error: Ideal, a high range of honest responses on a good measure
Low bias, high variable error: Questionable data on a poor measure
High bias, low variable error: Questionable but acceptable data from skewed responses (i.e., toward socially desirable responses) on a good measure
High bias, high variable error: Unusable data
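As one illustration of screening for the response pattern described above, in which a respondent gives the same answer to every item, the following minimal sketch uses hypothetical item names and a hypothetical minimum-item threshold; it is not a method prescribed by the text.

def flag_straight_lining(responses, min_items=5):
    # Return True if a respondent gave the identical answer to every nonmissing item.
    items = [v for v in responses.values() if v is not None]
    return len(items) >= min_items and len(set(items)) == 1

respondent = {"q1": 4, "q2": 4, "q3": 4, "q4": 4, "q5": 4}
print(flag_straight_lining(respondent))  # True -> inspect this record before analysis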
Secondary Data
Secondary data are data that have already been collected and are now being used for a purpose
that is secondary to their original purpose. Some sources of existing data are appropriately used
to assess the effect of health programs; others are not. Each source of secondary data must be
carefully considered with regard to its quality. Evaluators must decide whether the data are
actually needed to answer the evaluation question.
Vital records—namely, birth certificates, death certificates, and disease registries—are a major
source of secondary data for health program evaluators. Birth records contain a wealth of
information on prenatal variables, delivery complications, and infant characteristics. These
records are usually not available for up to a year past the date of the birth of the infant, so
evaluations of prenatal programs that are designed to affect birth outcomes will not be able to
include data from birth records immediately following the program. If the evaluation is
longitudinal and focuses on trends, then birth record data may be useful. However, pinpointing
the time of the programmatic intervention may be challenging. In addition, for community-based
interventions, sampling comparable communities for comparison of birth data will need to take
into account how to select the two communities using the address information on the birth
certificates. These same caveats to using birth data apply to data from death certificates or
disease registries.
Medical records, case files, or insurance claims may also contain information desired for the
evaluation. Several issues must be considered before embarking on data abstraction. First is the
quality of the data as recorded and available for abstraction. Because the data in such records are
collected for clinical purposes rather than evaluation purposes, the information can be
inconsistent and may vary by the practitioner recording the data. If the evaluator has reason to
believe that data in the record are recorded reliably, the evaluator must then devise a reliable way
to abstract the data. This effort involves training individual data abstractors. If any interpretation
of the record information is required, guidelines for what will be recorded and decision rules for
interpretation must be understood and applied consistently by all of the data abstractors.
Typically, the goal is at least 80% agreement between any two abstractors on the coding of data
from a single data source.
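As an illustration of checking that benchmark, the following minimal sketch computes simple percent agreement between two abstractors; the diagnosis codes shown are hypothetical.

def percent_agreement(coder_a, coder_b):
    # Proportion of records on which two abstractors assigned the same code.
    assert len(coder_a) == len(coder_b) and coder_a, "need paired, nonempty codings"
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

abstractor_1 = ["diabetes", "asthma", "asthma", "copd", "diabetes"]
abstractor_2 = ["diabetes", "asthma", "copd", "copd", "diabetes"]
print(f"{percent_agreement(abstractor_1, abstractor_2):.0%}")  # 80% -> meets the benchmark
# Chance-corrected statistics such as Cohen's kappa are stricter alternatives.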
Another source of secondary data is national surveys, such as the National Health and Nutrition
Examination Survey (NHANES) or the National Survey of Family Growth (NSFG). These and
several other surveys are conducted periodically by various federal agencies with a health focus,
including the Occupational Safety and Health Administration. These data sets have often been
used for community assessment. Data from these surveys are accessible to the public through the
Internet; they can be used for evaluation of population-level programs. Some data sets have
restrictions or stipulations on their use that must be addressed before they can be used. A
drawback to using these broad surveys is that the most recent data can be as much as two years
old. As secondary data sets, they may be of limited value in determining the effect of small-scale
programs. By contrast, they may be highly useful if the effect evaluation focuses on a
population-level health program, such as a state program, and the timing is such that immediate
information is not critical.
The use of large secondary data sets for the evaluation of programs faces the challenge of
overcoming conceptual issues, such as associating the variables available in the data set to the
program theory and determining the reliability and validity of the data. Other pragmatic
considerations arise as well, such as selection of subsamples and the need to recode data. In
addition, data from some national surveys may not generate results applicable to rural
populations (Borders, Rohrer, & Vaughn, 2000). Overall, the evaluator needs to be cautious
and have a specific rationale for using large secondary data sets for an effect evaluation.
Big Data
Large amounts of complex, diverse data from surveillance, program client records, electronic
health records, and billings are increasingly available for public health evaluation. In addition,
federal and public efforts to promote healthcare transparency have resulted in making decades of
data available for research (Groves, Kayyali, Knott, & Kuiken, 2013). These have commonly
been referred to as big data and have been defined as “large volumes of high velocity, complex,
and variable data that require advanced techniques and technologies to enable the capture,
storage, distribution, management, and analysis of the information” (Cottle et al., 2013).
Big data is characterized by the four V’s: volume, velocity, variety, and veracity (IBM, 2011).
The continuous rapid generation and accumulation of health data from various sources creates a
tremendous volume of data in structured, unstructured, and semistructured forms, all of which
need credibility verification (Raghupathi & Raghupathi, 2014). Storage and management of
digital data, such as those from billing and electronic health records, can overwhelm traditional
data management methods (Raghupathi & Raghupathi, 2014), requiring customized
applications and analytical tools (Groves et al., 2013). Examples of big data platforms and
tools include Hadoop/MapReduce (open-source platforms on the cloud), Pig, Hive, Jaql,
Zookeeper, HBase, and Oozie (Raghupathi & Raghupathi, 2014), as well as expandable infrastructures (e.g., data lakes, cloud data storage) (Roski, Bo-Linn, & Andrews, 2014).
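To make the map/reduce idea concrete, the following is a minimal, framework-free sketch of the pattern that platforms such as Hadoop parallelize across many machines; the record fields are hypothetical illustrations, and no Hadoop-specific API is shown.

from collections import defaultdict

records = [
    {"clinic": "A", "diagnosis": "asthma"},
    {"clinic": "B", "diagnosis": "diabetes"},
    {"clinic": "A", "diagnosis": "asthma"},
]

# Map step: emit (key, value) pairs from each raw record.
mapped = [(r["diagnosis"], 1) for r in records]

# Shuffle step: group values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce step: aggregate each group independently (on a cluster, in parallel).
counts = {key: sum(values) for key, values in groups.items()}
print(counts)  # {'asthma': 2, 'diabetes': 1}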
Data analytics have numerous applications for health program monitoring. For instance,
predictive analytics links numerous data sources (e.g., clinical, genetic, outcomes, claims, social
data, and more) to predict outcomes within and/or outside established health systems (Groves et
al., 2013; Schneeweiss, 2016). Monitoring can also occur through wearables such as the
Fitbit and Apple Watch, as well as through smart clothing, which collect physiological data and integrate with Internet of Things (IoT) technology for daily care, rehabilitation,
training, and chronic disease management (Chen, Ma, Song, Lai, & Hu, 2016). Public health
informatics can be used to detect trends including risky sexual behavior, drug use, disease
transmission and other public health threats, and events such as hospital readmissions suggesting
some aspect of public health system failure (Lee et al., 2016; Luo, Wu, Gopukumar, & Zhao,
2016).
Issues that public health planners need to consider in relation to big data include availability,
scalability, privacy, quality assurance, and ease of use (Raghupathi & Raghupathi, 2014).
Big data are increasingly key to informing public health as well as other sectors, and
they are likely to affect the nature of evaluation in upcoming decades.
Physical Data
Biological samples, anthropometric measures, and environmental samples are examples of
physical data that may be needed to evaluate a health program. Biological samples include things
such as blood, urine, or hair; anthropometric measures are typically height, weight, and body
mass index; and environmental samples range from ozone levels to bacteria counts in water
supplies, to lead levels in fish. The decision regarding inclusion of physical data in the evaluation
should be based on the health program goal and objectives, as well as the determination of
whether the intervention and causal theories underlying the health program are substantiated
sufficiently to justify the cost and effort needed to collect physical data, especially if laboratory
tests are necessary.
As with the collection of other types of data, physical data need to be collected consistently.
Evaluators may not have control over laboratory processing, so they need some assurance that
any laboratory results are reliable. Evaluators need to be familiar with the laboratory standards
for processing the samples and take steps to minimize factors that would lead to erroneous
variation in results. Another consideration with regard to physical data, and specifically
biological data, is the cost involved in collecting, storing, and processing the data. Generally, use
of biological data in an evaluation can be quite an expensive proposition, and evaluators need to
be proactive in budgeting for these expenses.
SECTION V
Outcome and Impact Evaluation of Health Programs
CHAPTER 11
Planning the Intervention Effect Evaluations
In the daily work of implementing a program, evaluation of intervention effects can seem like a
luxury. The reality is that conducting an evaluation whose purpose is to identify whether the
intervention had an effect requires considerable forethought regarding a broad range of issues,
each of which has the potential to detract seriously from the credibility of the evaluation.
The intervention effect evaluation deserves the same degree of attention during program
planning as does development of the program interventions; ideally, it should be designed
concurrently with the program. All too often, attention is focused on developing the evaluation
only after the goals and objectives are finalized and the program is up and running. Well-articulated program outcome goals and outcome objectives facilitate development of the
evaluation, but insights about the program process can be gained from developing an evaluation
plan. As highlighted in the planning and evaluation cycle (FIGURE 11-1), the planning and
decisions about the effect evaluation should occur as the program is being developed.
FIGURE 11-1 Planning and Evaluation Cycle, with Effect Evaluation Highlights
The contents of this chapter address the broad areas of data collection and evaluation rigor within
the context of the program theory and feasibility considerations. The information presented on
designs and sampling is not intended to duplicate the extensive treatment of research methods
and statistics provided in research textbooks. Instead, basic research content is presented as the
background for the problems commonly encountered in conducting a health program evaluation,
and practical suggestions are provided for minimizing those problems. Because the focus here is
on practical solutions to real problems, the suggestions offered in this chapter may differ from
those usually found in research and statistics textbooks. Nonetheless, good research methods and
statistics textbooks are invaluable resources and references that should be on the bookshelf of
every program evaluator.
Planning the evaluation begins with selecting the evaluation questions and then developing the
details of the evaluation implementation plan, similar to the details of the program organization
plan. Aspects of the evaluation plan related to data collection—namely, levels of measurement
and levels of analysis, as well as techniques to collect data—are discussed next. These elements
of evaluations are closely aligned with research methodology, and achieving scientific rigor is
the first yardstick used when planning the intervention effect evaluation.
▶ Developing the Evaluation Questions
The first step in planning the evaluation is deciding which questions the effect
evaluation must be able to answer. Development of the evaluation questions actually
began with development of the logic model, the effect theory, and the outcome TREW
(Timeframe, what portion of Recipients experience what Extent of Which type of
change) objectives. Those planning tools also form the basis for decisions about the
focus and purpose of the intervention evaluation. The effect theory draws attention to
the specific aspects of the health problem that are being addressed by the program.
Many aspects of the health problem and possible health outcomes of the program could
potentially be addressed by an evaluation, and novice evaluators and enthusiastic
program supporters will be tempted to include as much as possible in the evaluation.
Succumbing to this temptation will lead to higher evaluation costs, produce an
overwhelming amount of data to analyze and interpret, and distract from the essence of
the program. Thus, staying focused on the outcome objectives minimizes the chance
that the evaluation will become a fishing expedition. In other words, designing an
evaluation can quickly lead to the development of a creative “wish list” of statements
such as, “if only we knew X about the recipients and Y about their health.” The TREW
objectives serve as a sounding board against which to determine whether the “if only we
knew” statement has relevance and importance to understanding whether the program
was effective.
The obvious reason for doing an evaluation is to determine the effect of the program on
recipients. Patton (2008) has argued that the usefulness of evaluation information
should be a major reason for doing the evaluation. Nevertheless, evaluations may be
conducted for other reasons, such as meeting the requirements of funding agencies.
Responding to funding agency requirements can be an opportunity to engage program
stakeholders and elicit their interests with regard to why an evaluation should be
performed. Information that stakeholders want from an evaluation, once made explicit,
then can be incorporated into the evaluation.
A key aspect of the question about doing an evaluation in the first place is determining
who cares whether the evaluation is done and what it might find. There might also be a
desire to prove that the program was the source of some beneficial change. Evaluations
that attempt to answer causal questions are the most difficult type to perform, but
causality testing provides the richest information for understanding the validity of the
effect theory.
Characteristics of the Right Question
Evaluations should be useful, feasible, ethical, and accurate (Patton, 2008). To be useful, the
data collected must be relevant. Relevant data may not be easily accessible, and their successful
collection may require negotiation and reliance on the effect theory, as the following true story
reveals. A community agency wanted to know if its program was having an effect on the
substance abuse rates among school-age children. The stakeholders did not believe that data
collected several years prior to the program’s implementation from across the state were relevant
to their community. Yet data could not be collected from children in grades 6 through 8
regarding their use of illegal drugs (an intermediate program outcome) because the school board
refused to allow the evaluators into the schools. Thus, the evaluation question of whether the
substance abuse prevention intervention changed the behavior of these children could not be
answered. Eventually, the program staff members restated the question to focus on whether
children who had received the substance abuse prevention program had learned about the
negative health effects of illegal substances (a direct program outcome). The school board was
willing to allow data on this question to be collected from children.
Another characteristic of the right evaluation question is that more than one answer is possible.
While this characteristic may seem counterintuitive, allowing for the possibility of multiple
answers shows less bias on the part of the evaluator for arriving at the desired answer.
Evaluations that are flexible and inclusive may yield multiple answers that may reveal subtle
differences among participants or program components that were not anticipated. Compare the
evaluation question of, “Did the program make a difference to participants?” with the evaluation
question of, “Which types of changes did participants experience?” The second question makes it
possible to identify not only changes that were anticipated based on the effect theory but also
other changes that may not have been anticipated.
A third characteristic is that the right evaluation question produces information wanted and/or
needed by decision makers and stakeholders. Ultimately, the right question produces information
that decision makers can use, regardless of whether it actually is used in decision making. The
test of usefulness of the information generated by the evaluation helps avoid the fishing
expedition problem and, more important, could be a point at which developing the evaluation
provides feedback relevant to the intervention design.
As these three characteristics suggest, having a clear purpose for the evaluation and knowing
what is needed as an end product of the evaluation are the critical first steps in developing the
evaluation question. The nature of the effect evaluation is influenced by the skill and
sophistication of those persons doing the evaluation as well as the purpose of evaluation. The key
factor in stating the intervention effect evaluation question is the degree to which the evaluation
must document or explain health changes in program participants.
Outcome Documentation, Outcome Assessment, and Outcome Evaluation
Evaluation of the effect of the intervention can range from the simple to the highly complex. At
minimum, it should document the effect of the program in terms of reaching the stated outcome
and impact objectives. An outcome documentation evaluation asks the question, “To what extent
were the outcome objectives met?” To answer this question, an outcome documentation
evaluation uses data collection methods that are very closely related to the objectives. In this
way, the TREW objectives that flowed from the effect theory become the cornerstone of an
outcome documentation evaluation.
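As an illustration, the following minimal sketch (with hypothetical numbers and a hypothetical TREW benchmark) shows what outcome documentation amounts to computationally: comparing the observed proportion of recipients who achieved the targeted change against the benchmark stated in the objective.

# Hypothetical objective: by 12 months, 60% of recipients will lower systolic
# blood pressure by at least 5 mm Hg.
baseline = [150, 142, 160, 138, 155, 147, 165, 152]   # mm Hg at program start
followup = [141, 140, 150, 137, 146, 139, 158, 143]   # mm Hg at 12 months
target_drop = 5
benchmark = 0.60

achieved = sum((b - f) >= target_drop for b, f in zip(baseline, followup))
proportion = achieved / len(baseline)
print(f"{proportion:.0%} of recipients met the target change "
      f"({'objective met' if proportion >= benchmark else 'objective not met'})")
# Output: 75% of recipients met the target change (objective met)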
The next level of complexity is an outcome assessment evaluation, which seeks to answer the
question, “To what extent is any noticeable change or difference in participants related to having
received the program interventions?” An outcome assessment goes beyond merely documenting
that the objectives were met by quantifying the extent to which the interventions seem related to
changes observed or measured among program recipients. With this type of effect evaluation, the
data collection may need to be more complex and better able to detect smaller and more specific
changes in program participants. Note that the outcome assessment addresses the existence of a
relationship between those persons who received the program and the presence of a change, but
it does not attempt to determine whether the change was caused by the program. This subtle
linguistic difference, which is often not recognized by stakeholders, is actually an enormous
difference from the point of view of the evaluation design.
The most complex and difficult question to answer is, “Were the changes or differences due to
participants having received the program and nothing else?” To answer this question, an outcome
evaluation is needed. Because this type of effect evaluation seeks to attribute changes in program
participants to the interventions and nothing else, the data collection and sample selection must
be able to detect changes due to the program and other potentially influential factors that are not
part of the program. This highly rigorous requirement makes an outcome evaluation the most
like basic research (especially clinical trials) into the causes of health problems and the efficacy
of interventions.
Thinking of the three levels of program effects evaluation (TABLE 11-1) as outcome
documentation, outcome assessment, and outcome evaluation helps delineate the level of
complexity needed in data collection, the degree of scientific rigor required, and the design of the
evaluation.
TABLE 11-1 Three Levels of Intervention Effect Evaluations
Purpose. Outcome documentation: show that outcome and impact TREW objectives were met. Outcome assessment: determine whether participants in the program experienced any change/benefit.
Relationship to program effect theory. Outcome documentation: confirms reaching benchmarks set in the objectives that were based on program effect theory. Outcome assessment: supports program effect theory.
Level of rigor required. Outcome documentation: minimal. Outcome assessment: moderate.
Data collection. Outcome documentation: data type and collection timing based on TREW objectives being measured. Outcome assessment: data type based on program effect theory; timing based on feasibility.
Evaluation and Research
The distinction between evaluation and research can be ambiguous and is often blurred in the
minds of stakeholders. Nonetheless, fundamental differences do exist (TABLE 11-2),
particularly with regard to purpose and audiences for the final report. Much less distinction is
made with regard to methods and designs—both draw heavily from methodologies used in
behavioral and health sciences.
TABLE 11-2 Differences Between Evaluation and Research
Goal or purpose. Research: generation of new knowledge for prediction. Evaluation: social accountability and program or p…
Questions addressed. Research: scientist's own questions. Evaluation: questions derived from program goals…
Nature of problem addressed. Research: areas where knowledge is lacking. Evaluation: outcomes and impacts related to program…
Guiding theory. Research: theory used as basis for hypothesis testing. Evaluation: theory underlying the program intervention…
Appropriate techniques. Research: sampling, statistics, hypothesis testing, and so on. Evaluation: whichever research techniques fit with…
Setting. Research: anywhere that is appropriate to the question. Evaluation: any setting where evaluators can access…
Dissemination. Research: scientific journals. Evaluation: internal and externally viewed program…
Allegiance. Research: scientific community. Evaluation: funding source, policy preference, scientific…
The differences between research and evaluation are important to appreciate for two reasons.
First, communicating the differences to stakeholders and program staff members helps establish
realistic expectations about implementing the evaluation and about the findings of the evaluation.
As a consequence, it will be easier to gain their cooperation and feedback on the feasibility of the
evaluation. Second, understanding the differences can allay anxieties about spending undue
amounts of time doing research, which takes time away from providing the program, which in
turn is the primary concern of program staff members.
Research, in a pure sense, is done for the purpose of generating knowledge, whereas
program evaluation is done for the purpose of understanding the extent to which the intervention
was effective. These need not be mutually exclusive purposes. That is, a good program
evaluation can advance knowledge, just as knowledge from research can be used in program
development. Evaluation research is performed for the purpose of generating knowledge about
the effectiveness of a program and, as such, represents the blending of research and evaluation.
While these three terms are often used interchangeably or ambiguously, it is easiest to think of
evaluation research as research done by professional evaluators, following standards for
evaluation and using research methods and designs. Evaluation research is most often an
outcome assessment or an outcome evaluation. In this regard, it tends to be more complex, to be
costly, and to require more evaluation skill than most program staff members have. This
discussion is not meant to imply that simpler outcome documentation is not valuable. The value
always lies in the evaluation addressing the right question for the program.
Rigor in Evaluation
Rigor is important in evaluation, as in research, because there is a need to be confident that the
findings and results are as true a representation as possible of what happened. The deck is often stacked against finding any difference from a program because of programmatic
reasons, such as having a weak or ineffective intervention, and because of evaluation research
methods reasons, such as having measures with low validity or reliability. Rigor results from
minimizing the natural flaws associated with doing evaluation, which might otherwise diminish
the evaluators’ ability to identify the amount of effect of the program. The net effects are those
that are attributable only to the program, whereas the total change includes effects from the
intervention as well as effects that are artifacts of the evaluation design (FIGURE 11-2), such as
history, maturation of the participants, or societal changes. The purpose of the effect evaluation
is to identify the net effects, so rigor is used to minimize the inclusion of nonintervention effects
and design effects.
FIGURE 11-2 Diagram of Net Effects to Which Measures Need to Be Sensitive
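As one common way to approximate the net effect, the change among program participants can be compared with the change among a similar comparison group, so that history, maturation, and other artifacts shared by both groups are subtracted out (a difference-in-differences logic). The following minimal sketch uses hypothetical numbers and assumes the two groups are comparable; it is an illustration, not the book's prescribed design.

program_pre, program_post = 62.0, 74.0        # mean outcome score, participants
comparison_pre, comparison_post = 61.0, 66.0  # mean outcome score, comparison group

total_change = program_post - program_pre              # 12.0: includes design artifacts
background_change = comparison_post - comparison_pre   # 5.0: artifacts alone
net_effect = total_change - background_change          # 7.0: change attributable to the
                                                       # program, if the groups are comparable
print(f"Total change {total_change}, net effect {net_effect}")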
UNIT VI STUDY GUIDE
Program Assessments
Course Learning Outcomes for Unit VI
Upon completion of this unit, students should be able to:
2. Analyze the theories and models that are important for planning health education and promotion
interventions.
2.1 Explain the difference between formative and process evaluations used in program
implementation models.
2.2 Explain the difference between impact and outcome evaluation used in program
implementation models.
2.3 Explain key sampling designs used with program assessment tools in program implementation
models.
Course/Unit Learning Outcomes and Learning Activities
2.1: Unit Lesson; Chapter 10, p. 259; Unit VI Assessment
2.2: Unit Lesson; Chapter 11; Unit VI Assessment
2.3: Unit Lesson; Chapter 13; Unit VI Assessment
Reading Assignment
Chapter 10: Program Quality and Fidelity: Managerial and Contextual Considerations, p. 259
Chapter 11: Planning the Intervention Effect Evaluations
Chapter 13: Sampling Designs and Data Sources for Effect Evaluations
In order to access the following resource, click the link below:
For the following video, you are only required to watch the first eight minutes.
Rideout, C. (2015, December 21). FNH 473 Video 4: Implementing community-based programs [Video File].
Retrieved from https://www.youtube.com/watch?v=9VrQ7H2Cvwo
Unit Lesson
So far, we have reviewed most of the key aspects in program planning and implementation, but we still have
a few more concepts to get through. Before we get started, let's acknowledge two vocabulary words:
assessment and evaluation. These words are often used interchangeably. In some cases, they actually can
be, but there is a difference between them. Program evaluation is a systematic method for collecting,
analyzing, and using information to answer questions about projects, policies and programs, particularly about
their effectiveness and efficiency (Newcomer, Hatry, & Wholey, 2015). Program assessment tools are used to
evaluate program effectiveness and efficiency. An easy way to remember to keep the two terms separate is
by understanding that evaluation is a method while assessment is a tool used in accomplishing the method.
Most types of program evaluation will fall into one of four categories: formative evaluation, process evaluation, outcome evaluation, and impact evaluation.
Formative Evaluation
Formative evaluations consider whether the program intervention that has been developed is feasible (Poland, Frohlich, & Cargo, 2008). Formative evaluations assess a small population before implementing on a larger scale. It is common to see formative evaluations conducted if the intervention or activity being developed is new, the activity is being used in a different setting, the intervention is being used with a new population, or an original intervention is being adapted in any way. Formative evaluations produce information that program planners can use to help them achieve the program intervention goals and improve the implementation process along the way or for the next time the intervention is implemented. An example of a formative evaluation is offering a community nutrition education class that is designed for communitywide delivery next summer to a small cohort of residents in the fall to determine whether the curriculum will produce the desired results.
Process Evaluation
Process evaluations determine whether the program intervention is implemented in the way it was intended to be (Poland et al., 2008). They also allow the health program planner to examine developed processes before implementation to see if they actually work in the way they were intended. Process evaluations produce information that program planners can use to help them control the type, quantity, and quality of the program intervention. An example is giving a survey to the host locations of a nutrition class. The survey could ask about the scheduling process, class offerings, classroom cleanliness, and similar questions where the answers shed light on possible course improvements and thereby help start the process of implementing those improvements.
Outcome Evaluation
Outcome evaluations examine how effective the program intervention is at making a difference in the target
population (Poland et al., 2008). This evaluation assesses whether the goals of the intervention were met.
Outcome evaluations produce measures of change in behavior, quality of life, and health conditions of the
target population, for example. An example of an outcome evaluation of a community nutrition education class
would be tracking how many community residents have taken the class. If the program intervention had a
goal to educate 350 residents on nutrition in one year, after one year, the number of residents who took the
class would be the outcome of the program. It is also important to note that the majority of the time there are
multiple outcomes of a program intervention since there are normally multiple objectives set for the target
population.
Impact Evaluation
Impact evaluations highlight the overall effectiveness of the program intervention at achieving its overall goal
(Poland et al., 2008). Each program intervention that is developed has main goals, along with objectives set to help program planners reach those goals, and the impact evaluation measures whether the goals were met. Impact evaluations produce information that is similar to outcome information but captures the bigger picture, such as changes in skills, knowledge, and/or attitudes. An example of an impact evaluation of a community nutrition education class would be an increase in residents’ knowledge of how to shop for and cook healthy
foods.
When we look at conducting outcome and impact evaluation, it is important to gather baseline data that
assesses the current state of a skill, knowledge, behavior, or attitude. Baseline data is important in outcome
evaluation because it allows the program planner to assess changes that can be seen in the target
population. Baseline data is equally important in impact evaluations as it allows the program planner to
assess overall effectiveness. The same tool can be used to collect multiple types of evaluation. Pre-testing
and post-testing are often used in interventions to collect change data. One tool that is used often is surveys.
Using survey assessment tools that have already been developed and tested for reliability and validity, and that will work for your health program intervention, is better than starting from scratch. If you must start from scratch, there are a few things that you should know.
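As a small illustration of turning matched pre-test and post-test survey scores into the change data described above, the following minimal sketch uses hypothetical respondent IDs and knowledge scores.

pre = {"r01": 4, "r02": 6, "r03": 5, "r04": 7}    # baseline knowledge scores
post = {"r01": 7, "r02": 6, "r03": 8, "r04": 9}   # scores after the intervention

# Change score for each respondent with both a pre-test and a post-test.
changes = {rid: post[rid] - pre[rid] for rid in pre if rid in post}
mean_change = sum(changes.values()) / len(changes)
print(changes)                             # {'r01': 3, 'r02': 0, 'r03': 3, 'r04': 2}
print(f"mean change = {mean_change:.1f}")  # 2.0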
Survey Design and Distribution
Before working on the questions of a survey, a program planner must first determine the goal of the
information that is gathered (Fink, 2012). The program planner must also determine how the information that
is being gathered will be used. The health program planner should determine the population that will use the
survey. Once the population is determined, the health program planner should identify what it is that he or she
is looking to compare from the pre-test and post-test.
After the goal and purpose of the survey are identified, the program planner must identify the target
population that will be completing the survey (Fink, 2012). Program planners should know their audience to
ensure the survey meets the average reading level of the target population. A program planner will not gather
the information that he or she is seeking if the questions are being asked on a level that the target population
cannot understand. For instance, a survey should not ask elementary students about their body mass index
(BMI) numbers when they may not know what that means. In addition, a population with a low educational
level also may not understand BMI.
When designing the survey, it is good to know the difference between the targeted population and the non-targeted population, which allows the program planner to be mindful of the goals while working on
questions that fit that population (Fink, 2012). Depending on the size of the target population, it may be easier
for the program planner to reach the entire population or to determine a process to reach a sampling of the
targeted population. Chapter 13 goes into detail about sampling. The size of the target population matters
because it may not be feasible to reach them all or to gather follow-up data.
Another key aspect of survey design and distribution is developing a timeline (Fink, 2012). Having a timeline
in mind for when the surveys will be distributed is important. The program planner needs to determine other
outside factors that may play a part in survey data collection. For instance, if the survey is distributed at a
school, the program planner must factor in any school breaks that will hinder data collection. On the other
hand, if the survey is distributed door-to-door, the program planner must take weather conditions into account.
The timeline not only considers the distribution and collection process of the survey, but it also includes how
much time the respondents will need to complete the survey. If the survey is being developed from scratch,
time is needed to review the survey and test it for validity and reliability. It is rare for a project planner to only
work on one project at a time. Since there is more public health work to be done than there are public health
professionals, the program planner should also allot time for other workload demands or surprises that may affect the
survey process.
Surveys are useful tools that can be used in formative, process, outcome, and impact evaluations. Surveys
can be used in a paper, telephonic, or electronic format; it is up to the program planner to determine the best
tool to use based on the population, timeline, and goals of the health program. For your assignment this week
you will be the program planner, so use the tips and information from this lesson to help you determine what
will be best in the program that you are developing.
References
Fink, A. (2012). How to conduct surveys: A step-by-step guide (5th ed.). Thousand Oaks, CA: Sage.
Newcomer, K. E., Hatry, H. P., & Wholey, J. S. (2015). Handbook of practical program evaluation (4th ed.).
Hoboken, NJ: Wiley.
Poland, B., Frohlich, K. L., & Cargo, M. (2008). Context as a fundamental dimension of health promotion
program evaluation. In L. Potvin & D. V. McQueen (Eds.), Health promotion evaluation practices in
the Americas (pp. 299-317). New York, NY: Springer.
Suggested Reading
You have already watched the first eight minutes of this video. You can now watch the remainder of the video,
which covers words of wisdom, advice, and timing challenges in terms of implementing community-based
programs.
Rideout, C. (2015, December 21). FNH 473 Video 4: Implementing community-based programs [Video File].
Retrieved from https://www.youtube.com/watch?v=9VrQ7H2Cvwo