User's Guide: Which Comes First - The
Dataset or the Research Question?
Researchers often face a decision: whether to first develop a research question and then find the
best dataset to answer it, or to first select a high-value dataset and then develop the research
question. For most researchers, the best approach is often a hybrid of these two options.
The core of high-quality research is a cogent and important research question. For this reason, it
is usually a mistake to simply choose a dataset and then flip through the codebooks until one
finds an interesting variable. On the other hand, datasets rarely have all of the data that one
wants, so trying to fit a pre-conceived notion into existing data can be an exercise in frustration.
Moreover, this approach often results in subpar research when the investigator doesn’t respect
the limitations and strengths of the data with which she is working.
Thus, the best approach is often to develop a broadly-defined area of inquiry, and then identify a
handful of datasets that are well-suited to that focus. Then, one can carefully evaluate the
structure and content of the data to look for unique ways to translate that area of inquiry into a
specific question that is both important and well-suited for that dataset. For example, a researcher
might find that a dataset contains a unique series of questions that provide a novel framework for
studying her area of interest. Or, the dataset may have a unique structure – such as the collection
of longitudinal data, or national representativeness, or linking of patient survey data with
biomarkers – that offers a fresh and exciting way to evaluate a research topic.
Finally, the accessibility, ease of use, and local experience working with a dataset are of critical
importance. For example, Medicare claims data is a tremendous resource for research, but is very
challenging to use. If one’s mentor has used this database and one has access to local data
analysts with extensive experience using Medicare data, that’s great. If not, proceed with caution
unless you have an abundance of time, money, and patience. For this reason, the best datasets for
a junior investigator are often those where (1) there is local experience using the dataset that can
be put to use, and/or (2) the dataset is relatively easy to access, learn, and use.
CHAPTER 4

RESEARCH DESIGN, VALIDITY, AND BEST AVAILABLE EVIDENCE

The best available evidence on public health programs and policies
comes from high-quality studies. A study’s quality is dependent upon
the strength of its methodology, including its research design, outcome measures, settings, participants, interventions, data collection strategies, and statistical techniques.
This chapter explores commonly used research designs and discusses
how they affect a study’s internal validity, external validity, and quality.
Subsequent chapters discuss other components of study methodology.
CHAPTER OBJECTIVES
After reading this chapter, you will be able to
•• Describe the characteristics of commonly used research designs in
   studies to define and meet public health program needs, including
   - Randomized controlled trials with concurrent, parallel, or wait-list
     control groups and factorial designs
   - Quasi-experimental designs with concurrent or parallel control
     groups
108– ●–EVIDENCE-BASED PUBLIC HEALTH PRACTICE
   - Time-series designs, including pretest-posttest and interrupted
     time-series designs
   - Observational designs, including cohorts, case controls, and
     cross-sectional surveys
•• Describe the methods that statisticians, epidemiologists, and other
   health researchers use to ensure that experimental and control groups
   are equivalent before they participate in research; these include
   blinding, random allocation, matching, propensity score analysis, and
   analysis of covariance
•• Describe the threats to internal and external validity that can result
   from a study’s research design
•• Read a research article that evaluates program effectiveness, describe
   the main objective, explain how participants were assessed for
   inclusion in and exclusion from the study, and describe the research
   design
•• When given a table of data, write up the results comparing the
   experimental and control groups
•• When given an excerpt from a study report or article, list the
   variables or covariates that the researchers controlled for
   statistically in order to prevent them from confounding the results
•• Describe how the choice and implementation of a study’s research
   design affects its quality
RESEARCH METHODS AND RESEARCH DESIGN
A study’s methods include its research design; outcome measures; criteria
for including settings, participants, and interventions; sampling strategies
and techniques for reducing bias between participants and interventions;
and data collection and statistical strategies.
Most researchers agree that the “best” way of demonstrating program
effectiveness is through a well-designed and implemented randomized
controlled trial (RCT).
The Randomized Controlled Trial: Going for the Gold
An RCT is an experimental study in which eligible individuals or groups
of individuals (e.g., schools, communities) are assigned at random to receive
one of several programs or interventions. The group in an experiment that
receives the specified program is the experimental group. The control
group is another group assigned to the experiment, but not for the purpose
of being exposed to the program. The performance of the control group usually serves as a standard against which to measure the effect of the program
on the experimental group. The control program may be typical practice
(usual care), an alternative practice, or a placebo (a treatment or program
believed to be inert or innocuous). Random assignment, or random allocation, means that people end up in the experimental or control group by
chance rather than by choice.
Randomized controlled trials are sometimes called true experiments
because, at their best, they can demonstrate causality. That means that, in
theory at least, the researcher can assume that if participants in an RCT
achieve desirable outcomes, the program caused them.
True experiments are often contrasted with quasi-experiments and
observational studies. A quasi-experimental design is one in which the
control group is predetermined (without random assignment) to be comparable to the program group in critical ways, such as being in the same school
or eligible for the same services. In observational designs, the researcher
does not intervene. He or she studies the effects of already existing programs
on individuals and groups (e.g., a retrospective design, historical analysis, or
summative evaluation of programs such as Head Start or the Welfare to Work
Program of the U.S. government). Observational designs are sometimes
called descriptive.
True and quasi-experimental designs aim to link programs to outcomes,
while observational studies are used to illuminate the need for programs,
learn about their implementation, and clarify the findings of current evaluations by applying lessons learned from previous research.
The RCT is considered the gold standard of research designs because it
is the only one that can be counted on to rule out inherent participant characteristics that may affect the program’s outcomes. Put another way, if participants are assigned to experimental and control programs randomly, then
the two groups will probably be alike in all important ways before they participate. If they are different afterward, the difference can reasonably be
linked to the program.
Suppose the evaluators of a health literacy program in the workplace
hope to improve writing skills. They recruit volunteers to participate in a
6-week writing program and compare their writing skills to those of other
workers who are, on average, the same age and have similar educational
backgrounds and writing skills. Suppose also that, after the volunteers complete the 6-week program, the evaluators compare the two groups’ writing
and find that the experimental group performed much better. Can the evaluators claim that the literacy program is effective? Possibly. But the nature of
the design is such that you cannot really tell if some other factors that the
evaluators did not measure are responsible for the apparent program success. The volunteers may have done better because they were more motivated to achieve (that is why they volunteered), had more home-based social
support, and so on.
A better way to evaluate the workplace literacy program is to (a) randomly assign all eligible workers (e.g., those who score below a certain level
on a writing test) either to the experimental program or to a comparable (but not
new) control program and then (b) compare changes in writing skills over
time. With random assignment, all the important factors (e.g., motivation,
home support) are likely to be equally distributed between the two groups.
Then, if the scores are significantly different in favor of the experimental
group, the evaluators will be on firmer ground in concluding that the program is effective (see Table 4.1).
Table 4.1  An Effective Literacy Program: Hypothetical Example

Assignment                        Before the Program    After the Program
--------------------------------  --------------------  -------------------------
Randomly assigned to the          Relatively weak       Significantly improved
experimental literacy program     writing skills        writing skills

Randomly assigned to a standard   Relatively weak       No change: writing skills
and comparable literacy program   writing skills        are still relatively weak

CONCLUSION: The experimental program effectively improved writing skills
when compared to a standard and comparable program.
In sum, RCTs are quantitative, comparative, controlled experiments in
which investigators study two or more programs, interventions, or practices
in a series of eligible individuals who receive them in random order.
Here are two commonly used randomized controlled designs:
1. Concurrent controls in which two (or more) groups are randomly
constituted, and they are studied at the same time (concurrently).
Concurrent controls are sometimes called parallel controls.
2. Wait-list controls in which one group receives the program first and
others are put on a waiting list; if the program appears to be effective,
participants on the waiting list receive it. Participants are randomly
assigned to the experimental and wait-list groups.
Concurrent or Parallel Controls. Here is how evaluation researchers
design randomized controlled trials with concurrent groups (see Figure 4.1):
1. First the researcher appraises the eligibility of the potential participants.
•• Some people are excluded because they did not meet the inclusion criteria or they did meet the exclusion criteria.
•• Some eligible people decide not to participate. They change their
mind, become ill, or are too busy.
2. The remaining potential participants are enrolled in the evaluation study.
3. These participants are randomly assigned to the experiment or to an
alternative (the control).
4. Participants in the experimental and control groups are pretested,
that is, compared at baseline (before program participation) when
possible. They are always compared (posttested) after participation.
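As a sketch, the first three steps might look like the following in code. The participant records, criteria functions, and consent field are hypothetical illustrations, not drawn from any study in this chapter:

```python
import random

def run_concurrent_rct(candidates, meets_inclusion, meets_exclusion):
    """Sketch of steps 1-3: appraise eligibility, enroll, randomize."""
    # Step 1: exclude those who fail inclusion or meet exclusion criteria
    eligible = [p for p in candidates
                if meets_inclusion(p) and not meets_exclusion(p)]
    # Step 2: enroll the eligible people who agree to participate
    enrolled = [p for p in eligible if p.get("consents", True)]
    # Step 3: random assignment to experimental or control group
    random.shuffle(enrolled)
    midpoint = len(enrolled) // 2
    experimental, control = enrolled[:midpoint], enrolled[midpoint:]
    # Step 4 would pretest both groups at baseline (when possible)
    # and always posttest them after program participation.
    return experimental, control
```

The key point the code makes concrete is the ordering: eligibility screening and enrollment happen before randomization, so refusals and exclusions cannot bias which group a participant lands in.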
Figure 4.1  Randomized Controlled Trial With Concurrent Controls

[Flowchart: potential participants are assessed for eligibility; those who
do not meet the inclusion criteria, meet the exclusion criteria, or refuse
to participate are excluded; eligible participants are enrolled and then
randomized to the experimental group or the control group.]
Example 4.1 illustrates three randomized controlled trials with concurrent
controls.
Example 4.1
Three Randomized Controlled Trials With Concurrent Controls
1. Evaluating Home Visitation by Nurses to Prevent Child Maltreatment in Families Referred to
Child Protection Agencies (MacMillan et al., 2005)
Objective. Recurrence of child maltreatment is a major problem, yet little is known about
approaches to reduce this risk in families referred to child protection agencies. Since home
visitation by nurses for disadvantaged first-time mothers has proven effective in the prevention of
child abuse and neglect, the researchers investigated whether this approach might reduce the
recurrence of maltreatment.
Assessment for Eligibility. Families were eligible if they met the following criteria: (1) the
index child was younger than 13 years, (2) the reported episode of physical abuse or neglect
occurred within the previous 3 months, (3) the child identified as physically abused or
neglected was still living with his or her family or was to be returned home within 30 days of
the incident, and (4) families were able to speak English. Families in which the abuse was
committed by a foster parent, or in [which] the reported incident included sexual abuse, were
not eligible.
Evaluation Research Design. The evaluators randomly assigned 163 families to control or
intervention groups. Control families received standard services arranged by the agency. These
included routine follow-up by caseworkers whose focus was on assessment of risk of recidivism,
provision of education about parenting, and arrangement of referrals to community-based parent
education programs and other services. The intervention group of families received the same
standard care plus home visitation by a public-health nurse every week for 6 months, then every
2 weeks for 6 months, then monthly for 12 months.
Findings. At 3-year follow-up, recurrence of child physical abuse did not differ between groups.
However, hospital records showed significantly higher recurrence of either physical abuse or
neglect in the intervention group than in the control group.
2. Evaluating Therapy for Depressed Elderly People: Comparing a Holistic Approach to Medication
Alone (Nickel et al., 2005)
Objective. To find out whether recovering the ability to function socially takes a different course
with integrative, holistic treatment than it does with medication alone.
Assessment for Eligibility. To be included, participants had to be female; aged 65–75; living at
home; and disturbed by symptoms such as sadness, lack of drive, and reclusion. Grounds for
exclusion were the need for personal assistance in any of four key activities of daily living:
bathing, dressing, walking inside the house, and transferring from a chair; significant cognitive
impairment with no available proxy; diagnosis of a terminal illness, psychosis, or bipolar
disorder; the current use of antidepressants or psychotherapy; and plans to change
residence within the next four months.
Findings. Both forms of therapy afforded a relatively rapid reduction of depressive
symptoms. However, the integrative treatment not only led to a quicker reduction in
depression but was also the only one that led to a significant improvement in the
ability to function socially.
3. Evaluating a Health Care Program to Get Adolescents to Exercise (Patrick et al., 2006)
Objective. Many adolescents do not meet national guidelines for participation in regular,
moderate, or vigorous physical activity; for limitations on sedentary behaviors; or for dietary
intake of fruits and vegetables, fiber, or total dietary fat. This study evaluated a health care–based
intervention to improve these behaviors.
Assessment for Eligibility. Adolescents between the ages of 11 and 15 years were recruited
through their primary care providers. A total of 45 primary care providers from 6 private
clinic sites in San Diego County, California, agreed to participate in the study. A
representative group of healthy adolescents seeing primary care providers was sought by
contacting parents of adolescents who were already scheduled for a well-child visit and by
outreach to families with adolescents. Adolescents were excluded if they had health
conditions that would limit their ability to comply with physical activity or diet
recommendations.
Evaluation Research Design. After baseline measures but before seeing the provider, participants
were randomized to either the Patient-Centered Assessment and Counseling for Exercise +
Nutrition (PACE+) program or to a sun protection control condition.
Findings. Compared with adolescents in the sun protection control group, girls and boys in the
diet and physical activity program significantly reduced sedentary behaviors. Boys reported more
active days per week. No program effects were seen with percentage of calories from fat
consumed or minutes of physical activities per week. The percentage of adolescents meeting
recommended health guidelines was significantly improved for girls for consumption of saturated
fat and for boys’ participation in days per week of physical activity. No between-group
differences were seen in body mass index.
Wait-List Control: Do It Sequentially. With a wait-list control design, both
groups are assessed for eligibility, but one is randomly assigned to be given
the program now (experimental group) and the other is put on a waiting list
(control group). After the experimental group completes the program, both
groups are assessed a second time. Then the control group receives the program and both groups are assessed again (see Figure 4.2).
Figure 4.2  Randomized Controlled Trial Using a Wait-List Control

[Flowchart: potential participants are assessed for eligibility; those who
do not meet the inclusion criteria, meet the exclusion criteria, or refuse
to participate are excluded; eligible participants are enrolled and then
randomized either to the experimental group, which gets the program now,
or to the control group, which is put on a waiting list.]
Here is how this design is used:
1. Compare Group 1 (experimental group) and Group 2 (control
group) at baseline (the pretest). If random assignment has worked,
the two groups should not differ from one another.
2. Give Group 1 the program.
3. Assess the outcomes for Groups 1 and 2 at the end of the program. If
the program is working, expect to see a difference in outcomes favoring the experimental group.
4. Give the program to Group 2.
5. Assess the outcomes a second time. If the program is working, Group 2
should catch up to Group 1, and both should have improved in their
outcomes (see Figure 4.3).
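The expected pattern in the five steps can be illustrated with hypothetical outcome scores. The numbers below are invented for illustration and do not come from any study in this chapter:

```python
# Hypothetical outcome scores for a wait-list control design.
# Group 1 (experimental) receives the program between baseline and Time 1;
# Group 2 (the wait-list control) receives it between Time 1 and Time 2.
scores = {
    "experimental": {"baseline": 5, "time1": 20, "time2": 21},
    "wait_list":    {"baseline": 5, "time1": 6,  "time2": 20},
}

def gain(group, start, end):
    """Change in outcome score for a group between two assessments."""
    return scores[group][end] - scores[group][start]

# At Time 1, the difference favors the experimental group (step 3);
# by Time 2, the wait-list group has caught up (step 5).
```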
Figure 4.3  Evaluating Effectiveness With a Wait-List Control:
The Wait-List Group (Control Group) Catches Up

[Line chart: outcome scores at baseline, Time 1, and Time 2 for the
experimental and control groups. Both groups start with similar baseline
scores; the experimental group's scores rise by Time 1, and the wait-list
(control) group catches up by Time 2.]
Example 4.2 has three illustrative wait-list control evaluation designs.
Example 4.2 Three RCTs With Wait-List Controls
1. Evaluating a Methadone Maintenance Treatment in an Australian Prison System (Dolan et al., 2003)
Objective. To determine whether methadone maintenance treatment reduced heroin use, syringe
sharing, and HIV or hepatitis C incidence among prisoners.
Assessment for Eligibility. Male inmates were eligible to participate if they (1) were assessed as
suitable for methadone maintenance by a detailed interview with medical staff who confirmed
they had a heroin problem; (2) were serving prison sentences longer than four months at the time
of interview; and (3) were able to provide signed informed consent.
Evaluation Research Design. All eligible prisoners seeking drug treatment were randomized to
methadone or a wait-list control group and followed up after four months.
Findings. Heroin use was significantly lower among treated than control subjects at follow-up.
Treated subjects reported lower levels of drug injection and syringe sharing at follow-up. There
was no difference in HIV or hepatitis C incidence.
2. Evaluating Two Brief Treatments for Sleep Problems in Young Learning Disabled Children:
A Randomized Controlled Trial (Montgomery, Stores, & Wiggs, 2004)
Objective. To investigate the efficacy of a media-based, brief behavioral treatment of sleep
problems in children with learning disabilities.
Assessment for Eligibility. The study included children aged 2–8 years with any form of severe
learning disability, confirmed by a general practitioner. Severe sleep problems were defined
according to standardized criteria as follows: (1) night waking occurring three or more times a week
for more than a few minutes and the child disturbing the parents or going into their room or bed
and/or (2) settling problems occurring three or more times a week with the child taking more than
one hour to settle and disturbing the parents during this time. These problems needed to have been
present for at least three months and not be explicable in terms of a physical problem such as pain.
Evaluation Research Design. The parents of severely learning disabled children took part in a
randomized controlled trial with a wait-list control group. Face-to-face delivered treatment was
compared to usual care, and a booklet-delivered treatment was compared to usual care.
Findings. Both forms of treatment (face-to-face and booklet) were almost equally effective
compared with the controls. Two thirds of children who were taking over 30 minutes to settle five
or more times per week and waking at night for over 30 minutes four or more times per week
improved on average to having such settling or night waking problems for only a few minutes or
only once or twice per week. These improvements were maintained after six months.
3. Evaluating a Mental Health Intervention for Schoolchildren Exposed to Violence: A Randomized
Controlled Trial (Stein et al., 2003)
Objective. To evaluate the effectiveness of a collaboratively designed school-based intervention
for reducing children’s symptoms of posttraumatic stress disorder (PTSD) and depression that has
resulted from exposure to violence.
Assessment for Eligibility. Sixth-grade students at two large middle schools in Los Angeles
who reported exposure to violence and had clinical levels of symptoms of PTSD using
standard measures.
Evaluation Research Design. Students were randomly assigned to a ten-session standardized
cognitive-behavioral therapy (the Cognitive-Behavioral Intervention for Trauma in Schools) early
intervention group or to a wait-list delayed intervention comparison group conducted by trained
school mental health clinicians.
Findings. Compared with the wait-list delayed intervention group (no intervention), after
three months of intervention, students who were randomly assigned to the early intervention
group had significantly lower scores on symptoms of PTSD, depression, and psychosocial
dysfunction. At six months, after both groups had received the intervention,
the two groups no longer differed significantly on symptoms of PTSD and
depression.
A wait-list control design (sometimes called switching replications or
delayed treatment design) has the advantage of allowing the evaluator to
compare experimental and control group performance on the same
program. It is sometimes difficult to find or implement an alternative control program that is equal to or better than the new program. Also, the
new program may be designed to fill a gap in the availability of programs,
and no comparable program may actually be available at all. Finally,
the nature of the design means that everyone receives the program, and,
in some circumstances, this may be an incentive for everyone to participate fully.
Wait-list control designs are particularly practical when programs are
repeated at regular intervals, as they are in schools with a semester system.
For example, students can be randomly assigned to Group 1 or Group 2, with
Group 1 participating in the first semester. Group 2 can then participate in
the second semester. The design is especially efficient in settings that can
wait for results.
Wait-list control designs also rely on the experimental group's improvement
leveling off by the time it completes the program. If improvement in the
experimental group continues while the control group is receiving the
program, then the effects of the program on the control group may appear
less spectacular than they actually were. To avoid this confusion, some
investigators advocate waiting for improvement in the experimental group
to level off (a "wash out" period) and timing the implementation of the
control program accordingly. However, the amount of time needed for the
effect to wash out is usually unknown in advance.
Factorial Designs
Factorial designs enable researchers to evaluate the effects of varying the
features of an intervention or practice to see which combination works best.
In Example 4.3 the investigators are concerned with finding out if the
response rate to Web-based surveys can be improved by notifying prospective
responders in advance by e-mail and/or pleading with them to respond. The
investigators design a study to solve the response-rate problem using a
two-by-two (2 × 2) factorial design in which participants are either
notified about the survey in advance by e-mail or not prenotified, and
either pleaded with to respond or not pleaded with. The factors (they are
also independent variables) are pleading (Factor 1) and notifying (Factor 2).
Each factor has two levels: plead versus don't plead and notify in advance
versus don't notify in advance.

                                          Factor 1: Pleading Status
                                          Plead           Don't Plead
Factor 2:       Notify in Advance
Notification
Status          Don't Notify in Advance
In a 2 × 2 design, there are four study groups: (1) prenotification e-mail
and pleading invitation e-mail, (2) prenotification e-mail and nonpleading
invitation, (3) no prenotification e-mail and pleading invitation, (4) no prenotification and nonpleading invitation. In the diagram above, the empty cells
are placeholders for the number of people in each category (e.g., the number
in the groups plead × notify in advance compared to the number in plead
× don’t notify in advance).
With this design, the researchers can study main effects (plead versus
don’t plead) or interactive effects (prenotification and pleading). The outcome in this study is always the response rate. If research participants are
assigned to groups randomly, the study is a randomized controlled trial.
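A sketch of how the four cells arise from crossing the two factors; the variable names are illustrative, not from the study:

```python
from itertools import product

# The two factors and their levels, as in the text's 2 x 2 example
factors = {
    "pleading": ["plead", "don't plead"],
    "notification": ["notify in advance", "don't notify in advance"],
}

# Crossing every level of one factor with every level of the other
# yields the study groups: four cells for a 2 x 2 design
groups = list(product(factors["pleading"], factors["notification"]))
```

Adding a level to either factor multiplies the number of cells; crossing two levels with three, for instance, yields the six groups of a 2 × 3 design.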
Example 4.3 Factorial Design (Felix, Burchett, & Edwards, 2011)
Improving Response Rate to Web Surveys
Objectives. To evaluate the effectiveness of pre-notification and pleading invitations in Web surveys
by embedding a randomized controlled trial (RCT) in a Web-based survey.
Study Design and Setting. E-mail addresses of 569 authors of published maternal health research were
randomized in a 2×2 factorial trial of a pre-notification vs. no pre-notification e-mail and a pleading
vs. a non-pleading invitation e-mail. The primary outcome was completed response rate, and the
secondary outcome was submitted response rate (which included complete and partial responses).
Results. Pleading invitations resulted in 5.0% more completed questionnaires, although this
difference did not reach statistical significance [odds ratio (OR) 1.23; 95% confidence interval (CI):
0.86, 1.74; P = 0.25]. Pre-notification did not increase the completion rate (OR 1.04; 95% CI 0.73,
1.48; P = 0.83). Response was higher among authors who had published in 2006 or later (OR 2.07;
95% CI: 1.43, 2.98; P = 0.001). There was some evidence that pre-notification was more effective in
increasing submissions from authors with recent publications (P = 0.04).
Conclusion. The use of a “pleading” tone to e-mail invitations may increase response to a Web-based
survey. Authors of recently published research are more likely to respond to a Web-based survey.
Factorial designs may include many factors and many levels. It is the
number of levels that determines the name of the design. For instance, in a
study of psychotherapy versus behavior modification in outpatient, inpatient,
and day treatment settings, there are two factors (treatment and setting),
with one factor having two levels (psychotherapy versus behavior modification) and one having three levels (inpatient, day treatment, and outpatient).
This design is a 2 × 3 factorial design.
Doing It Randomly
Randomization is considered to be the primary method of ensuring
that participating study groups are probably alike at baseline, that is, before
they participate in a program. The idea behind randomization is that if
chance—which is what random means—dictates the allocation of programs,
all important factors will be equally distributed between and among experimental and control groups. No single factor will dominate any of the groups,
possibly influencing program outcomes. That is, each group will be as smart,
as motivated, as knowledgeable, as self-efficacious, and so on as the other to
begin with. As a result, any differences between or among groups that are
observed later, after program participation, can reasonably be assigned to the
program rather than to the differences that were there at the beginning. In
researchers’ terms, randomized controlled trials result in unbiased estimates
of a program’s or treatment’s effects.
How does random assignment work? Table 4.2 describes a commonly
used method and some considerations.
Table 4.2 Random Assignment
1. An algorithm or set of rules is applied to a list of random numbers, which is usually generated
by computer (although printed tables of random numbers are sometimes used in small studies). For
instance, if the research design includes an experimental group and a control group, and an
equal probability of being assigned to each, then the algorithm could specify using the random
number 1 for assignment to the experimental group and 2 for assignment to the control group—
or vice versa. (Other numbers are ignored.)
2. As each eligible person enters the study, he or she is assigned one of the numbers (1 or 2).
3. The random assignment procedure should be designed so that members of the research team
who have contact with study participants cannot influence the allocation process. For instance,
random assignments to experimental or control groups can be placed in advance in a set of
sealed envelopes by someone who will not be involved in opening them. Each envelope
should be numbered (so that all can be accounted for by the end of the study). As a participant
comes through the system, his or her name is recorded, the envelope is opened, and the
assignment (1 or 2) is recorded next to the person’s name.
4. It is crucial that researchers prevent interference with randomization. Who would tamper with
assignment? Sometimes members of the research team may feel pressure to ensure that the most
“needy” people receive the experimental program. One method of avoiding this is to ensure that
tamper-proof procedures are in place. If the research team uses envelopes, they should ensure
the envelopes are opaque (so no one can see through them) and sealed. In large studies,
randomization is done off site.
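A computer-generated allocation list like the one described in step 1 might be sketched as follows. The function name and seed handling are illustrative; in practice the seed and the list would be generated and held off-site, concealed from recruiters:

```python
import random

def make_allocation_list(n_participants, seed):
    """Pre-generate a concealed allocation sequence: 1 = experimental
    group, 2 = control group, each with equal probability."""
    rng = random.Random(seed)  # seed held by someone outside recruitment
    return [rng.choice([1, 2]) for _ in range(n_participants)]

# Like numbered, sealed, opaque envelopes: as each eligible person
# enters the study, record the name and reveal the next assignment
# in sequence, so staff in contact with participants cannot influence it.
allocations = make_allocation_list(200, seed=42)
```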
Variations on how to conduct the random allocation of participants and
programs certainly exist. As described in the checklist below, look for adherence to certain principles regardless of the specifics of the method reported
in a particular evaluation study.
What Evidence-Based Public Health Practice Should Watch For: A Checklist

✓ Study team members who have contact with participants were not
  part of the allocation process. Randomization can be done off-site.
✓ Assignment was not readily available to evaluation team members.
✓ A table of random numbers or a computer-generated list of random
  numbers was used.
Random Clusters
In some situations, it may be preferable for researchers to randomly
assign clusters of individuals (e.g., families, communities) rather than
Chapter 4. Research Design, Validity, and Best Available Evidence– ●–121
individuals to the experimental or control groups. In fact, randomization by
cluster may be the only feasible method of conducting an evaluation in many
settings. Research that uses clusters to randomize is variously known as field
trials, community-based trials, or cluster randomized trials.
Compared with individually randomized trials, cluster randomized trials
are more complex to design, require more participants to obtain equivalent statistical power, and demand more complex analysis. This is because observations on individuals in the same cluster (e.g., children in a classroom) tend
to be interrelated by potentially confounding (confusing) variables. For
example, students in a classroom are about the same age, may have the same
ability, and will have similar experiences. Consequently, the actual sample size
is less (one classroom) than the total number of individual participants
(25 students). The whole is less than the sum of its parts!
Example 4.4 contains an example of random assignment by cluster. In
this example, the cluster comprises colleges. Please note that data on the
outcome (cessation of smokeless tobacco use in the previous 30 days) were
collected from individual students, but randomization was done by college—
not by student. Is this OK? The answer depends on how the study deals with
the potential problems caused by randomizing with one unit (colleges) and
analyzing data from another (students).
Example 4.4 A College-Based Smokeless Tobacco Program for College Athletes (Walsh
et al., 1999)
Objective. The purpose of this study was to determine the effectiveness of a college-based smokeless
tobacco cessation intervention that targeted college athletes. Effectiveness was defined as reported
cessation of smokeless tobacco use in the previous 30 days.
Assessment for Eligibility. Current users of smokeless tobacco (use more than once per month
and within the past month) were eligible for the study. A total of 16 colleges with an average of
23 smokeless tobacco users in each were selected from lists of all publicly supported California
universities and community colleges. Half the colleges were selected to be urban and half to be
rural; all had varsity football and baseball teams. One-year prevalence of cessation among smokeless
tobacco users was determined by self-report of abstinence for the previous 30 days.
Evaluation Research Design. The occurrence of smokeless tobacco use was calculated for each
athlete using information from a questionnaire given to them at baseline. Colleges were then
matched by pairs so that the level of smoking was approximately the same in each of the individual
colleges paired. One college from each pair was randomized to receive the program, while the other
college in the pair received no program.
Findings. In both groups, 314 students provided complete data on cessation. Cessation frequencies
were 35% in the program colleges and 16% in the control colleges. The program effect increased
with level of smokeless tobacco use.
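The matched-pair cluster randomization in Example 4.4 can be sketched as follows (Python; the function and college names are hypothetical). Clusters are first matched into pairs on a baseline variable; a coin flip then decides which member of each pair receives the program:

```python
import random

def randomize_matched_pairs(pairs, seed=None):
    """Given pairs of clusters matched on a baseline variable,
    randomly assign one member of each pair to the program
    and the other to control."""
    rng = random.Random(seed)
    allocation = {}
    for first, second in pairs:
        if rng.random() < 0.5:
            program, control = first, second
        else:
            program, control = second, first
        allocation[program] = "program"
        allocation[control] = "control"
    return allocation

# Hypothetical colleges already matched on baseline smokeless tobacco use
pairs = [("college_A", "college_B"), ("college_C", "college_D")]
allocation = randomize_matched_pairs(pairs, seed=7)
```

Outcome data would still be collected from individual athletes, so the analysis must account for the clustering, as the chapter discusses.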
122– ●–EVIDENCE-BASED PUBLIC HEALTH PRACTICE
ENSURING BASELINE EQUIVALENCE: WHAT EVIDENCE-BASED PUBLIC HEALTH PRACTICE SHOULD WATCH FOR
When reviewing articles, be certain that researchers provide information as
to whether baseline characteristics are balanced among clusters and individuals. The evaluators in Example 4.4 sought to achieve balance (i.e., equivalence) among universities (the clusters) by including only public universities
and junior colleges in California. They aimed for equivalence in smoking
levels among students within each university by pairing up universities in
terms of their students’ smoking levels and randomly assigning pair members to the experimental or the control group.
In addition to descriptive information on methods used to ensure equivalence, look for proof that the process worked and that, after it was over, the
groups were indeed equivalent:
Of 273 children with asthma in this cohort, 42.1% were female, 41.7%
were African-American, and the average age was 8.2 years. The baseline
characteristics for Program and non-Program groups were quite similar
in terms of demographics, enrollment, and asthma comorbidity.
Compared with the Program group, the non-Program group had a significantly higher percentage of females and “other race” children, but
significantly less Managed Care Organization enrollment and less allergy
comorbidity.
Despite all efforts, chance may dictate that the two groups differ on
important variables at baseline. Bad luck! Statistical methods may be used
to “correct” for these differences, but it is usually better to anticipate the
problem.
Improving on Chance
Small to moderate-sized RCTs can gain power to detect a difference
between experimental and control programs (assuming one is actually present) if special randomization procedures are used to balance the numbers of
participants in each (blocked randomization) and in the distribution of
baseline variables that might influence the outcomes (stratified blocked
randomization).
Why are special procedures necessary if random assignment is supposed to take care of the number of people in each group or the proportion of people in each with certain characteristics? The answer is that, by
chance, one group may end up being larger than the other or differing in
age, gender, and so on. Good news: This happens less frequently in large
studies. Bad news: The problem of unequal distribution of variables
becomes even more complicated when groups or clusters of people (e.g.,
schools, families) rather than individuals are assigned. In this case, the
evaluator has little control over the individuals within each cluster, and the
number of clusters (over which he or she does have control) is usually
relatively small (e.g., five schools, 10 clinics). Some form of constraint such
as stratification is almost always recommended in RCTs in which allocation
is done by cluster.
Two commonly used methods for ensuring equal group sizes and balanced variables are blocked randomization and stratified blocked randomization as described in Table 4.3.
Table 4.3 Enhancing Chance: Blocked and Stratified Blocked Randomization

Blocking, or Balancing the Number of Participants in Each Group. Randomization is done in blocks of predetermined size. For example, if the block's size is 6, randomization proceeds normally within each block until three people have been randomized to one group, after which participants are automatically assigned to the other group until the block of 6 is completed. This means that in a study of 30 participants, 15 will be assigned to each group, and in a study of 33, the disproportion can be no greater than 18:15.

Stratifying, or Balancing Important Predictor (Independent) Variables. Stratification means dividing participants into segments. For example, participants can be divided into differing age groups (the strata), genders, or educational levels. In a study of a program to improve knowledge of how to prevent infection from HIV/AIDS, having access to reliable transportation to attend education classes is a strong predictor of outcome. It is probably a good idea to have similar numbers of people who have transportation (determined at baseline) assigned to each group. This can be done by dividing the study sample at baseline into participants with or without transportation (stratification by access to transportation) and then carrying out a blocked randomization procedure within each of these two strata.
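The two procedures in Table 4.3 can be sketched in code. The fragment below (Python; the function names are hypothetical, and the block size of 6 follows the table's example) shuffles each block so that exactly half of its slots go to each group, then applies the same procedure separately within each stratum:

```python
import random

def blocked_randomization(n, block_size=6, seed=None):
    """Assign n participants in shuffled blocks containing equal
    numbers of 'experimental' and 'control' slots."""
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n:
        block = ["experimental", "control"] * (block_size // 2)
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n]

def stratified_blocked_randomization(strata_sizes, block_size=6, seed=None):
    """Run blocked randomization separately within each stratum
    (e.g., participants with and without reliable transportation)."""
    rng = random.Random(seed)
    return {
        stratum: blocked_randomization(size, block_size, rng.random())
        for stratum, size in strata_sizes.items()
    }

groups = blocked_randomization(30, block_size=6, seed=1)   # 15 per group
strata = stratified_blocked_randomization(
    {"has_transport": 12, "no_transport": 12}, seed=2)
```

With 33 participants and blocks of 6, only the final, incomplete block can be unbalanced, so the split can be no worse than 18:15, as the table states.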
Example 4.5 illustrates how these techniques have been applied in program evaluations.
Example 4.5 Enhancing Chance by Using Special Randomization Procedures
1. We randomly allocated families to control or intervention groups using a computer program
sequence generated by our statistician, blocked after every eight allocations. We aimed to do
secondary analyses within the intervention group, albeit with modest power, on the basis of the
number of nurse visits. Therefore, to increase the numbers in the intervention group, toward the
end of recruitment, we randomly allocated families using a 5-to-3 ratio (5 intervention families to
3 controls). Randomization was stratified by the age of the index child—i.e., younger than 4 years
and 4 to 12 years—since evidence exists indicating that preschool children are at increased risk
for recurrence of physical abuse and neglect. Group assignment was placed in numbered
sequential sealed envelopes.
2. Group allocation was based on block randomization. A sequential list of case numbers was
matched to group allocations in blocks of ten by randomly drawing five cards labeled “control”
and five cards labeled “treatment” from an envelope. This procedure was repeated for each block
of ten sequential case numbers. The list of case numbers and group allocation was held by a
researcher not involved in recruiting or interviewing inmates. The trial nurses responsible for
assessing, recruiting, and interviewing inmates had no access to these lists. Once an inmate had
been recruited and interviewed, the study nurse contacted the Central Randomization System via
a mobile telephone to ascertain the inmate’s group allocation.
3. After the baseline period, patients meeting the inclusion criteria were randomly stratified by
center (block size 12 not known to trial centers) in a 2:1:1 ratio (experimental program, an
alternative program, waiting list) using a centralized telephone randomization procedure (random
list generated with the Sample Software, version 8.0, Bronx, New York).
Blinding
In some randomized studies, the participants and investigators do not
know which participants are in the experimental and control groups: This is
the double-blind experiment. When participants do not know, but investigators do, this is a single-blind (blinded) trial. Participants, people responsible for
program implementation or assessing program outcomes, and statistical
analysts are all candidates for being blinded.
Experts in clinical trials maintain that blinding is as important as randomization in ensuring valid study results. Randomization, they say, eliminates
confounding variables, or confounders, before the program is implemented—at baseline—but it cannot do away with confounding variables that
occur as the study progresses. A confounding variable is an extraneous variable in a statistical or research model that affects the dependent variables but
has either not been considered or not been controlled for. For example, age,
educational level, and motivation may be confounders in a study that involves
adherence to a complicated intervention.
Confounders can lead to a false conclusion that the dependent variables
are in a causal relationship with the independent or predictor variables. For
instance, suppose research shows that drinking coffee (independent or predictor variable) is associated with heart attacks (dependent variable). One
possibility is that drinking coffee causes heart attacks. Another is that having
heart attacks causes people to drink more coffee. A third explanation is that
some other confounding factor, such as smoking, is responsible for heart
attacks and is also associated with drinking coffee.
Confounding during the course of a study can occur if participants get
extra attention or the control group catches on to the experiment. The extra
attention or changes in the control group’s perceptions may alter the outcomes of a study. One method of mitigating and understanding the biases
that may arise in unblinded studies is to standardize all program activities and
to monitor the extent to which the program has been implemented as planned.
Pay special attention to the biases that may have occurred in randomized
controlled studies without blinding. Expect the evaluator to report on how the
program’s implementation was monitored and the extent to which any deviations from standard program procedures may have affected the outcomes.
Example 4.6 contains examples of blinding used in RCTs.
Example 4.6 Blinding in RCTs
1. The Researcher Is Blinded
Seventy-five opaque envelopes were produced for the initial randomization and lodged with an
independent staff member. Each contained a slip of paper with the word conventional, booklet, or
control (25 each). The randomization was performed by this staff member selecting an envelope for
each participant immediately after the initial assessment meeting with parents. For the re-randomization
of the control crossover group, this process was repeated with a second batch of 26 envelopes, half
each with the word conventional or booklet. The researcher conducting the study was therefore blind to
the nature of the treatment allocated until after the posttreatment assessment. Following that point, both
participant and researcher were aware of the treatment group to which they had been randomized.
2. Patients and Researchers Are Blinded
This study was a randomized, multicenter trial comparing acupuncture, sham acupuncture, and a
no-acupuncture waiting-list condition. The additional no-acupuncture waiting list control was
included because sham acupuncture cannot be substituted for a physiologically inert placebo.
Patients in the acupuncture groups were blinded to which treatment they received. Analysis of
headache diaries was performed by two blinded evaluators. The study duration per patient was
28 weeks: 4 weeks before randomization, the baseline; 8 weeks of treatment; and 16 weeks of
follow-up. Patients allocated to the waiting list received true acupuncture after 12 weeks and were
also followed up for 24 weeks after randomization (to investigate whether changes were similar to
those in patients receiving immediate acupuncture).
RCTs are generally expensive, time-consuming, and tend to address very
specific research questions. They should probably be saved for relatively
mature programs and practices, that is, those that previous research suggests
are likely to be effective. Previous research includes large pilot studies and
other randomized trials.
Despite an important advantage over other types of research designs
(specifically, their ability to establish that Program A is likely to have caused
Outcome A), RCTs’ requirement for control sometimes gives them a bad
name. Some researchers and practitioners express concern over the fairness
of excluding certain groups and individuals from participation or from receiving the experimental program. Others question the idea of regarding humans
as fit subjects for an experiment and of obtaining good information from
quantitative or statistically oriented research. To some extent, these are personal or ethical concerns and are not inherent weaknesses of the RCT design
itself. Nevertheless, it is certainly reasonable to expect researchers to explain
their choice of design in ethical as well as methodological terms. In Example 4.7,
the evaluators of a cohort study of a program that provides respite care for
homeless patients defend their choice of design.
Example 4.7 Statement by the Evaluators of a Study of Respite Care for Homeless
Patients Regarding the Ethics of Their RCT (Buchanan, Doblin, Sai, &
Garcia, 2006)
Finally, a randomized control trial is needed. Although the available demographic data, clinical
variables, and baseline utilization data were similar in our respite care and usual care study groups,
it is possible that unmeasured variables, including differential rates of substance use or psychiatric
illness, may have confounded our results. Some might argue that a randomized trial would be
unethical, given the obvious humanitarian virtues of respite care. But a randomized trial would be no
less ethical than the current status quo in the United States, where respite care is available only to
some, not all, homeless people. Now is the time for such a trial, given the results of the present study,
the financial distress of many U.S. hospitals, and the unmet needs of our country’s homeless people.
The evaluators understand that some people believe that if you have an
intervention or program that is perceived to be humanitarian (e.g., respite
care), you should not conduct an experiment in which some people are necessarily denied services. The investigators counter by arguing that some homeless people do not have access to respite care anyway. The evaluators also point
out that evaluation research that takes the form of a randomized trial can help
clarify the results of their study by providing information on unmeasured factors such as differential rates of substance use or psychiatric illness.
Quasi-Experimental Research Designs
Quasi-experimental research is characterized by nonrandomized assignment to groups or by conducting a series of measures over time on one or
more groups.
Nonrandomized Controlled Trials: Concurrent Controls
Nonrandomized controlled trials are a type of quasi-experimental
design. In fact, quasi-experiment is often synonymous with nonrandomized controlled trial and is defined as a design in which one group receives the
program and one does not, the assignment of participants to groups is not
controlled by the researcher, and assignment is not random.
Quasi-experimental, nonrandomized controlled trials rely on participants who volunteer to join the study, are geographically close to the study
site, or conveniently turn up (e.g., at a clinic or a school) while the study is
being conducted. As a result, people or groups in a quasi-experiment may
self-select, and the evaluation findings may not be unbiased because they are
dependent upon participant choice rather than chance.
Quasi-experimental researchers use a variety of methods to ensure that
the participating groups are as similar to one another as possible (equivalent)
at baseline or before “treatment.” Among the strategies used to ensure
equivalence is matching. Example 4.4 showed the matching approach
applied to the randomized trial of a smokeless tobacco program for college
athletes.
Matching requires selecting pairs of participants or clusters of individuals
who are comparable to one another on important confounding variables. For
example, suppose a researcher was interested in comparing the acuity of
vision among smokers and nonsmokers. One method of helping to ensure
that the two groups are balanced on important confounders requires that, for
every smoker, there is a nonsmoker of the same age, sex, and medical history.
Matching can effectively prevent confounding by important factors such
as age and sex for individuals. The strategy’s implementation can be relatively
expensive, however, because finding a match for each study participant is
sometimes difficult and often time-consuming.
Another technique for allocating participants to study groups in quasi-experiments is to assign each potential participant a number and use
an alternating sequence in which every other individual (1, 3, 5, etc.) is
assigned to the experimental group and the alternate participants (2, 4, 6,
etc.) are assigned to the control. A different option is to assign groups in
order of appearance; for example, patients who attend the clinic on Monday,
Wednesday, and Friday are in the experimental group, and those attending on
Tuesday, Thursday, and Saturday are assigned to the control. To prevent certain types of patients (e.g., those who can only come on a certain day) from
automatically being in one or the other of the groups, the procedure for
assignment can be reversed after some number of days or weeks.
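A minimal sketch of this alternating procedure, including the periodic reversal just described (Python; the function name and reversal interval are hypothetical):

```python
def alternating_assignment(n, reverse_after=None):
    """Systematic (nonrandom) allocation: odd-numbered entrants go to the
    experimental group and even-numbered entrants to the control group,
    with the rule optionally reversed after every `reverse_after` entrants."""
    groups = []
    for i in range(1, n + 1):
        odd_is_experimental = True
        if reverse_after is not None and ((i - 1) // reverse_after) % 2 == 1:
            odd_is_experimental = False
        is_odd = (i % 2 == 1)
        groups.append("experimental" if is_odd == odd_is_experimental
                      else "control")
    return groups
```

Because entrants can anticipate the sequence, this scheme is not equivalent to randomization; it merely illustrates the allocation rules the text describes.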
Illustrations of nonrandomized, quasi-experimental designs with concurrent groups are given in Example 4.8.
Example 4.8 Quasi-Experimental Design: Concurrent Groups
1. Reducing Injuries Among Teen Agricultural Workers (Reed & Kidd, 2004)
Objective. To test an agricultural safety curriculum [Agricultural Disability Awareness and Risk
Education (AgDARE)] for use in high school agriculture classes.
Assessment for Eligibility. A total of 21 schools (1,138 agriculture students) from Kentucky, Iowa,
and Mississippi participated in the program.
Research Design. Schools in each state were grouped geographically to improve homogeneity in
agricultural commodities and production techniques and then assigned randomly to either one of
two intervention groups (A or B) or the control group. Fourteen schools were assigned to the
intervention arms, and seven schools were assigned to the control group.
Findings. Students who participated in AgDARE scored significantly higher in farm safety attitude
and intent to change work behavior than the control group. School and public health nurses,
working together with agriculture teachers, may make an effective team in reducing injuries
among teen agricultural workers.
2. Contraceptive Practices Among Rural Vietnamese Men (Ha, Jayasuriya, & Owen, 2005)
Objective. To test a social-cognitive intervention to influence contraceptive practices among men
living in rural communes in Vietnam.
Assessment for Eligibility. There were 651 married men from 12 villages in two rural communes
(An Hong and Quoc Tuan) in the An Hai district of Hai Phong province in Vietnam. Interviewers
visited each household in the selected villages and sought all married men aged 19–45 years
who had lived with their wives in the same house during the three months prior to the study. The
inclusion criteria were as follows: the wife was currently not pregnant, the couple did not plan to
have a child in the next six months, they currently did not use condoms consistently for family
planning, and the wives currently did not use the pill consistently for family planning.
Evaluation Research Design. Villages were chosen as the primary unit for intervention. From each
of the two communes, three villages were chosen for intervention and three as controls. The
intervention villages were separated from control villages by a distance of 2–3 km. Participants in
both study groups were assessed, using interviewer-based questionnaires, prior to (baseline) and
following the intervention (posttest).
Findings. There were 651 eligible married men in the 12 villages chosen. A significant positive
movement in men’s stage of readiness for IUD use by their wife occurred in the intervention
group. There were no significant changes in the control group. Compared to the control
group, the intervention group showed higher pros, lower cons, and higher self-efficacy for
IUD use by their wife as a contraceptive method. Interventions based on social-cognitive
theory can increase men’s involvement in IUD use in rural Vietnam and should assist in
reducing future rates of unwanted pregnancy.
Strong quasi-experimental designs have many desirable features. They
can provide information about programs when it is inappropriate or too
late to randomize participants. Another desirable characteristic of quasi-experiments is that, when compared to RCTs, their settings and participants
may more accurately reflect the messiness of the real world. An RCT requires
strict control over the environment, and to get that control the evaluator has
to be extremely stringent with respect to the research question being posed
and who is included and excluded from study participation. As a result, RCT
findings may apply to a relatively small population in constrained settings.
Nonrandomized designs are sometimes chosen over randomized ones in
the mistaken belief that they are more ethical than randomized trials. The
idea behind the ethical challenge is that, if the evaluation researcher suspects
that Program A is better than Program B, then how (in ethical terms) can he
or she allocate Program B to innocent participants? In fact, evaluations are
only ethical if they are designed well enough to have a strong likelihood of
producing an accurate answer about program effectiveness. There are cases
in which programs that were presumed effective turned out not to be so after
all. We have to assume that the evaluator has no evidence that Program A is
better than Program B to start with because, if he or she had proof, then the
evaluation would be unnecessary.
Some researchers and practitioners also think that quasi-experiments are
less costly than RCTs, but this has never been proven. Poor studies, whether
RCTs or quasi-experiments, are costly when they result in misleading or
incorrect information, which may delay or even prevent participants from
getting needed services or education.
Good quasi-experiments are difficult to plan and implement and require
the highest level of research expertise. Many borrow techniques from RCTs,
including blinding. Many others use sophisticated statistical methods to
enhance confidence in the findings.
The most serious potential flaw in quasi-experimental designs without
random assignment is that the groups in the experimental and control
groups may differ from one another at baseline so that the program cannot
have a fair trial. Therefore, in evaluating quasi-experiments, it is absolutely
crucial to find confirmation (usually done statistically) that either no difference in groups existed to begin with or the appropriate statistical methods
were used to control for the differences.
Time-Series Designs
Time-series designs are longitudinal studies that enable the
researcher to monitor change from one time to the next. They are sometimes
called repeated measures analyses. Debate exists over whether time-series
designs are research or analytic designs.
In a simple self-controlled design (also called pretest-posttest
design), each participant is measured on some important program variable
and serves as his or her own control. Participants are usually measured twice
(at baseline and after program participation), but they may be measured
multiple times afterward as well (see Example 4.9).
Example 4.9 Pilot Test of a Cognitive-Behavioral Program for Women With Multiple
Sclerosis (Sinclair & Scroggie, 2005)
Objective. The purpose of this quasi-experimental study was to evaluate the effectiveness of a
cognitive-behavioral intervention for women with multiple sclerosis (MS).
Assessment for Eligibility. Thirty-seven adult women with MS participated in a group-based program
titled “Beyond MS,” which was led by master’s-prepared psychiatric nurses.
Research Design. Perceived health competence, coping behaviors, psychological well-being, quality
of life, and fatigue were measured at four time periods: 5 weeks before the beginning of the
intervention, immediately before the program, at the end of the 5-week program, and at a 6-month
follow-up.
Findings. There were significant improvements in the participants’ perceived health competence,
indices of adaptive and maladaptive coping, and most measures of psychological well-being from
pretest to posttest. The positive changes brought about by this relatively brief program were
maintained during the 6-month follow-up period.
Pretest-posttest designs have many disadvantages from an evidence-based practice perspective. Participants may become excited about taking
part in an experiment, and this excitement may help motivate performance;
without a comparison group, you cannot control for the excitement. Also,
between the pretest and the posttest, participants may mature physically,
emotionally, and intellectually, affecting the program’s outcomes. Finally, self-controlled evaluations may be affected by historical events, including changes
in program administration and policy.
Because of their limitations, self-controlled time-series designs are
not considered experimental designs (some researchers call them pre-experimental rather than quasi-experimental), and they are only appropriate for pilot studies or preliminary feasibility studies. Pretest-posttest designs
are not useful for evidence of effectiveness, and they are not meant to be.
Historical Controls
Some researchers make up for the lack of a readily available control
group by using a historical control. With traditional historical controls,
investigators compare outcomes among participants who receive a new program with outcomes among a previous group of participants who received
the standard program. An illustration of the use of historical controls is given
in Example 4.10.
Time-series designs can also be improved by adding more measurements
for a single group of participants before and after the program (in a single
time-series design) and adding a control (in a multiple time-series design).
Example 4.10 Historical Controls: Use and Impact of an eHealth System by
Low-Income Women With Breast Cancer (Gustafson et al., 2005)
Objective. To examine the feasibility of reaching underserved women with breast cancer and
determine how they use the system and what impact it had on them.
Assessment for Eligibility. Participants included women recently diagnosed with breast cancer whose
income was at or below 250% of the poverty level and were living in rural Wisconsin (n = 144; all
Caucasian) or Detroit (n = 85; all African American).
Evaluation Research Design. Historical Control: A comparison group of patients (n = 51) with similar
demographics was drawn from a separate recently completed randomized clinical trial.
Findings. When all low-income women from this study are combined and compared with a low-income control group from another study, the Comprehensive Health Enhancement Support System
(CHESS) group was superior to that control group in 4 of 8 outcome variables at both statistically
and practically significant levels (social support, negative emotions, participation in health care, and
information competence). We conclude that an eHealth system like CHESS will have a positive
impact on low-income women with breast cancer.
Interrupted or Single Time-Series Designs
The interrupted or single time-series design without a control group
(hence, the “single”) involves repeated measurement of a variable (e.g.,
reported crime) over time, encompassing periods both before and after
implementation of a program. The goal is to evaluate whether the program
has interrupted or changed a pattern established before the program’s
implementation. For instance, an evaluation using an interrupted time-series design may collect quarterly arrest rates for drug-related offenses in a
given community for 2 years before and 2 years following the implementation of a drug enforcement task force. The data analysis would focus on
changes in patterns before and after the introduction of the program. In a
multiple time-series design, multiple interrupted observations are collected
before and after a program is launched. The “multiple” means that the observations are collected in two or more groups.
Time-series designs are complex research designs requiring many observations of outcomes and, in the case of multiple time-series designs, the
participation of many individuals and even communities. Their complex
analysis has led some researchers to take the position that they are really data
analytic strategies.
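The "changes in patterns" such an analysis looks for are commonly estimated with segmented regression, which fits a level and trend before and after the interruption. The sketch below (Python with NumPy; the quarterly series and coefficient values are invented for illustration) shows the standard design matrix for the drug-enforcement example above:

```python
import numpy as np

# 8 quarters before and 8 after a hypothetical program launch
t = np.arange(16, dtype=float)
after = (t >= 8).astype(float)            # 1 once the program starts
t_after = np.where(t >= 8, t - 8, 0.0)    # time elapsed since the start

# Invented arrest-rate series with a drop in level and a change in trend
y = 10.0 + 0.5 * t - 3.0 * after - 0.2 * t_after

# Columns: baseline level, pre-program trend, level change, trend change
X = np.column_stack([np.ones_like(t), t, after, t_after])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# coef recovers [10.0, 0.5, -3.0, -0.2] for this noise-free series
```

A real analysis would also need to address autocorrelation between successive quarters, which is part of why some researchers regard these designs as data analytic strategies.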
Observational Designs
In observational designs, researchers conduct studies with existing
groups of people or use existing databases. They do not intervene, which is
to say, they do not introduce programs. Among the observational designs
that are used in evaluation research are cohorts, case controls, and cross-sectional surveys.
Cohort Designs
A cohort is a group of people who have something in common and who
remain part of a study group over an extended period of time. In public
health research, cohort studies are used to describe and predict the risk factors for a disease and the disease’s cause, incidence, natural history, and
prognosis. They tend to be extremely large studies.
Cohort studies may be prospective or retrospective. With a prospective design, the direction of inquiry is forward in time; with a retrospective
design, the direction is backward in time.
Chapter 4. Research Design, Validity, and Best Available Evidence– ●–133
Example 4.11 contains abstracts of two cohort studies. The first is an
abstract of the National Treatment Improvement Evaluation Survey, a longitudinal study (a prospective study that takes place over several years) of a
national sample of substance abuse treatment programs that had received
federal treatment improvement demonstration grants in 1990–1991 (the
cohort). Treatment programs and their clients across 16 states completed
highly structured lay-administered interviews between July 1993 and
November 1995. Administrative interviews elicited information from senior
program administrators that focused on program finances and staff configuration, including the primary measure of interest: whether the program had staff designated as case managers.
The second abstract in Example 4.11 is of a study to examine detection
rates of depression in primary care. The investigators used data collected
from a prospective cohort study of 1,293 consecutive general practice attendees in the United Kingdom.
Example 4.11 Two Cohort Studies
1. Prospective Cohort Design: Case Managers as Facilitators of Medical and Psychosocial Service
Delivery in Addiction Treatment Programs (Friedmann, Hendrickson, Gerstein, & Zhang, 2004)
Objective. To examine whether having designated case management staff facilitates delivery of
comprehensive medical and psychosocial services in substance abuse treatment programs.
Assessment for Eligibility. Clients from long-term residential, outpatient, and methadone
treatment modalities.
Research Design. A prospective cohort study of 2,829 clients admitted to selected substance
abuse treatment programs.
Findings. Availability of designated case managers increased client-level receipt of only two of
nine services, and exerted no effect on service comprehensiveness compared to programs that
did not have designated case managers. These findings do not support the common practice of
designating case management staff as a means to facilitate comprehensive services delivery in
addiction treatment programs.
2. A Prospective Cohort Design: Recognition of Depression in Primary Care: Does it Affect Outcome?
The PREDICT-NL Study (Kamphuis et al., 2011)
Background. Detection rates of depression in primary care are < 50%. Studies showed similar
outcome after 12 months for recognized and unrecognized depression. Outcome beyond
12 months is less well studied.
Objective. We investigated recognition of depression in primary care and its relation to outcome
after 6, 12 and 39 months.
Methods. Data were used from a prospective cohort study of 1,293 consecutive general practice
attendees (PREDICT-NL), who were followed up after 6 (n = 1236), 12 (n = 1179) and 39
(n = 752) months. We measured the presence and severity of major depressive disorder (MDD)
according to DSM-IV criteria and Patient Health Questionnaire 9 (PHQ-9) and mental function
with Short Form 12 (SF-12). Recognition of depression was assessed using international
classification of primary care codes (P03 and P76) and Anatomical Therapeutic Chemical
(N06A) codes from the GP records (6 months before/after baseline).
Results. At baseline, 170 (13%) of the participants had MDD, of whom 36% were recognized by
their GP. The relative risk of being depressed after 39 months was 1.35 [95% confidence interval
(CI) 0.7–2.7] for participants with recognized depression compared to unrecognized depression. At
baseline, participants with recognized depression had more depressive symptoms (mean difference
PHQ-9 2.7, 95% CI 1.6–3.9) and worse mental function (mean difference mental component
summary −3.8, 95% CI −7.8 to 0.2) than unrecognized depressed participants. After 12 and
39 months, mean scores for both groups did not differ but were worse than those without depression.
Conclusions. A minority of patients with MDD is recognized in primary care. Those who were
unrecognized had comparable outcome after 12 and 39 months as participants with recognized
depression.
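Results such as the relative risk of 1.35 (95% CI 0.7–2.7) in the second abstract are computed from group event rates. The sketch below shows the standard large-sample calculation; the counts are hypothetical (not the study's actual data), chosen only to produce a similar estimate.

```python
import math

# A minimal sketch of a relative risk with a 95% confidence interval.
# a_events/a_total: exposed group (e.g., recognized depression);
# b_events/b_total: comparison group (e.g., unrecognized depression).
def relative_risk(a_events, a_total, b_events, b_total):
    rr = (a_events / a_total) / (b_events / b_total)
    # Standard large-sample standard error of log(RR)
    se = math.sqrt(1 / a_events - 1 / a_total + 1 / b_events - 1 / b_total)
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    return rr, lo, hi

# Hypothetical counts chosen to give RR = 1.35
rr, lo, hi = relative_risk(12, 40, 20, 90)
print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

When the interval includes 1.0, as it does here and in the PREDICT-NL result, the data are compatible with no difference in risk between the groups.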
High-quality prospective or longitudinal studies are expensive to conduct, especially if the researcher is concerned with outcomes that are relatively
rare or hard to predict. Studying rare and unpredictable outcomes requires
large samples and numerous measures. Also, researchers who do prospective
cohort studies have to be on guard against loss of subjects over time, or attrition (also called loss to follow-up). For instance, longitudinal studies of
children are often beset by attrition because, over time, children lose interest,
move far away, change their names, or are otherwise unavailable. If a large
number of people drop out of a study, the sample that remains may be very
different from the one that was originally enrolled. The remaining sample may
be more motivated or less mobile than those who left, for example, and these
factors may be related in unpredictable ways to any observed outcomes.
When reviewing prospective cohort studies, make sure that the researchers address how they handled loss to follow-up or attrition. Ask these questions: How large a problem was attrition? Were losses to follow-up handled in
the analysis? Were the study’s findings affected by the losses?
Because of the difficulties and expense of implementing prospective
cohort designs, many cohort designs reported in the literature tend to be
retrospective. Retrospective cohort designs use existing databases to identify
cohorts; they may do an analysis of the data that already exist in the database
or collect new data. A sample retrospective cohort design that identifies the
cohort and collects new data is illustrated in Example 4.12.
Example 4.12 Retrospective Cohort Design: Tall Stature in Adolescence and Depression
in Later Life (Bruinsma et al., 2006)
Objective. To examine the long-term psychosocial outcomes for women assessed or treated during
adolescence for tall stature.
Assessment for Eligibility. Women assessed or treated for tall stature identified from the records of
Australian pediatricians were eligible to participate.
Research Design. Retrospective cohort study in which women treated for tall stature were traced
using electoral rolls and telephone listings. Once found, the women were contacted by mail and
invited to complete a postal questionnaire and computer assisted telephone interview. Psychosocial
outcomes were measured using the depression, mania, and eating disorders modules of the
Composite International Diagnostic Interview (CIDI), the SF-36, and an index of social support.
Findings. There was no significant difference between treated and untreated women in the prevalence
of 12 month or lifetime major depression, eating disorders, or scores on the SF-36 mental health
summary scale or the index of social support. However, compared with the findings of population-based studies, the prevalence of major depression in both treated and untreated tall girls was high.
Retrospective cohort designs have the same strengths as prospective
designs. They can establish that a predictor variable (e.g., being in a treatment program) precedes an outcome (e.g., depression). Also, because data
are collected before the outcomes being assessed are known with certainty,
the measurement of variables that might predict the outcome (e.g., being in
a program) cannot be biased by prior knowledge of which people are likely
to develop a problem (e.g., depression).
Case-Control Designs
Case-control designs are generally retrospective. They are used to
explain why a phenomenon currently exists by comparing the histories of
two different groups, one of which is involved in the phenomenon. For
example, a case-control design might be used to help understand the social,
demographic, and attitudinal variables that distinguish people who, at the
present time, have been identified with frequent headaches from those who
do not currently have frequent headaches. The researchers in a case-control
study like this want to know which factors (e.g., dietary habits, social
arrangements, education, income, quality of life) distinguish one group
from the other.
The cases in case-control designs are individuals who have been chosen
on the basis of some characteristic or outcome (e.g., frequent headaches).
The controls are individuals without the characteristic or outcome. The histories of cases and controls are analyzed and compared in an attempt to
uncover one or more characteristics that are present in the cases and not in
the controls.
How can researchers avoid having one group decidedly different from
the other (e.g., healthier, smarter)? Some methods include randomly selecting the controls, using several controls, and carefully matching controls and
cases on important variables.
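Matching controls to cases on important variables, as described above, can be sketched as follows. The records are hypothetical, and the greedy nearest-age match shown here is only one simple strategy among many.

```python
# A minimal sketch of individual 1:1 matching in a case-control design.
# Each case is paired with the unused control of the same sex whose age
# is closest (a simple greedy match on hypothetical records).

cases = [("F", 34), ("M", 51), ("F", 47)]
controls = [("F", 30), ("M", 52), ("F", 46), ("M", 40), ("F", 35)]

used = set()
pairs = []
for sex, age in cases:
    # Candidates: controls of the same sex that have not yet been matched
    candidates = [i for i, (s, a) in enumerate(controls) if s == sex and i not in used]
    best = min(candidates, key=lambda i: abs(controls[i][1] - age))
    used.add(best)
    pairs.append(((sex, age), controls[best]))

for case, control in pairs:
    print(case, "matched to", control)
```

Matching on sex and age, as here, removes those two variables as explanations for case-control differences, but any unmatched characteristic can still confound the comparison.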
Example 4.13 uses a sophisticated sampling strategy to examine the role of alcohol use in boating deaths.
Example 4.13 Alcohol Use and Risk of Dying While Boating (Smith et al., 2001)
Objective. To determine the association of alcohol use with passengers’ and operators’ estimated
relative risk of dying while boating.
Assessment for Eligibility. A study of recreational boating deaths among persons aged 18 years or
older from 1990–1998 in Maryland and North Carolina (n = 221) provided the cases, which were
compared with control interviews obtained from a multistage probability sample of boaters in each
state from 1997–1999 (n = 3,943).
In this study, a complex random sampling scheme was employed to
minimize bias among control subjects and maximize their comparability with
cases (e.g., deaths took place in the same location).
Epidemiologists often use case-control designs to provide insight into
the causes and consequences of disease and other health problems.
Reviewers of these studies should be on the lookout for certain methodological problems, however. First, cases and controls are often chosen from
two separate populations. Because of this, systematic differences (e.g., motivation, cultural beliefs) may exist between or among the groups that are
difficult to anticipate, measure, or control, and these differences may influence the study’s results.
Another potential problem with case-control designs is that the data
often come from people’s recall of events, such as asking women to discuss
the history of their physical activity or asking boaters about their drinking
habits. Memory is often unreliable, so a study that depends on it may yield misleading information.
Cross-Sectional Designs
Cross-sectional designs result in a portrait of one or many groups at
one period of time. They are sometimes called descriptive or pre-experimental
designs. Following are three illustrative uses of cross-sectional designs.
The most common use of cross-sectional designs is to describe the study
sample. The tabular description of results is sometimes called Table 1
because it is often the first table in a study report or article. Example 4.14
shows an example Table 1.
Example 4.14 Sociodemographic Characteristics, Substance Abuse History, and History
of Violence: Low-Income Women Seeking Emergency Care in the Bronx,
NY, 2001–2003 (El-Bassel, Gilbert, Vinocur, Chang, & Wu, 2011)
                                            Total      Participants Not     Participants
                                            (N = 241)  Meeting PTSD         Meeting PTSD
                                                       Criteria (n = 169)   Criteria (n = 72)
Sociodemographic characteristics
  Age, y, mean (SD)                         33 (10)    33 (10)              33 (10)
  Race/ethnicity, no. (%)
    Latina                                  119 (49)   81 (48)              38 (53)
    African American                        105 (44)   75 (44)              30 (42)
    Other                                   17 (7)     13 (8)               4 (6)
  High school diploma, no. (%)              127 (53)   93 (55)              34 (47)
  Employed in past 6 mo., no. (%)           111 (46)   86 (51)              25* (35)
  Homeless in past 6 mo., no. (%)           38 (16)    23 (14)              15 (21)
Substance abuse in past 6 mo., no. (%)
  Heavy episode drinking                    57 (24)    30 (18)              27** (38)
  Illicit drug use                          104 (43)   61 (36)              43** (60)
History of violence, no. (%)
  Childhood sexual abuse (before age 16 y)  99 (41)    50 (30)              49** (68)
  Lifetime sexual IPV                       167 (69)   107 (63)             60** (83)
  Lifetime physical or injurious IPV        165 (68)   103 (61)             62** (86)

NOTE: IPV = intimate partner violence; PTSD = posttraumatic stress disorder.
*P < 0.05; **P < 0.01
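Each cell in a Table 1 like Example 4.14 is simply a count and a percentage computed within a column's group. A minimal sketch, using a handful of hypothetical participant records:

```python
# A minimal sketch of how a "Table 1" cell like "119 (49)" is computed,
# using hypothetical participant records.

records = [
    {"group": "PTSD", "latina": True},
    {"group": "no_PTSD", "latina": True},
    {"group": "no_PTSD", "latina": False},
    {"group": "PTSD", "latina": False},
    {"group": "no_PTSD", "latina": True},
]

def cell(rows, predicate):
    """Format a table cell as 'count (percent)' within the given rows."""
    n = sum(1 for r in rows if predicate(r))
    return f"{n} ({round(100 * n / len(rows))})"

total = cell(records, lambda r: r["latina"])
by_group = {
    g: cell([r for r in records if r["group"] == g], lambda r: r["latina"])
    for g in ("no_PTSD", "PTSD")
}
print("Latina, no. (%):", total, by_group)
```

Note that the percentages are taken within each column's denominator (N = 241, n = 169, n = 72 in Example 4.14), not over the whole sample.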
The major limitation of cross-sectional studies is that, on their own and
without follow-up, they provide no information on causality; they only provide
information on events at a single, fixed point in time. For example, suppose a
researcher finds that girls have less knowledge of current events than do boys.
The researcher cannot conclude that being female somehow causes less
knowledge of current events. The researcher can only be sure that, in this survey undertaken at this particular time, girls had less knowledge than boys did.
To illustrate this point further, suppose you are doing a literature review
on community-based exercise programs. You are specifically interested in
learning about the relationship between age and exercise. Does exercise
decrease with age? In your search of the literature, you find the report presented in Example 4.15.
Example 4.15 A Report of a Cross-Sectional Survey of Exercise Habits
In March of this year, Researcher A surveyed a sample of 1,500 people between the ages of 30 and
70 to find out about their exercise habits. One of the questions he asked participants was, “How
much do you exercise on a typical day?” Researcher A divided his sample into two groups: People
45 years of age and younger and people 46 years and older. Researcher A’s data analysis revealed
that the amount of daily exercise reported by the two groups differed with the younger group
reporting 15 minutes more exercise on a typical day.
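Researcher A's analysis amounts to comparing the mean of self-reported exercise in two age strata. A minimal sketch with hypothetical survey rows:

```python
# A minimal sketch of Researcher A's comparison, using hypothetical
# (age, minutes of exercise on a typical day) survey rows.

survey = [(32, 40), (44, 35), (45, 45), (50, 20), (61, 25), (68, 15)]

younger = [m for age, m in survey if age <= 45]
older   = [m for age, m in survey if age >= 46]

diff = sum(younger) / len(younger) - sum(older) / len(older)
print(f"younger group reports {diff:.0f} more minutes per day")
# A difference like this is a snapshot of this sample at one time; on its
# own it cannot show that exercise declines with age.
```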
Based on this summary, does amount of exercise decline with age? The
answer is that you cannot get the answer from Researcher A’s report. The
decline seen in a cross-sectional study like this one can actually represent a
decline in exercise with increasing age, or it may reflect the oddities of this
particular sample. The younger people in this study may be especially sports
minded, while the older people may be particularly averse to exercise. As a
reviewer, you need to figure out which of the two explanations is better. One
way you can do this is to search the literature to find out which conclusions
are supported by other studies. Does the literature generally sustain the idea
that amount of exercise always declines with age? After all, in some communities the amount of exercise done by older people may actually increase
because, with retirement or part-time work, older adults may have more time
to exercise than do younger people.
Observational Designs and Controlled Trials:
Compare and Contrast
Observational data can be useful adjuncts to randomized controlled trials
and quasi-experiments. They can assist the researcher in determining
whether effectiveness under controlled conditions translates into effective
treatment in routine settings. Also, some problems simply do not lend themselves to a randomized controlled trial. For instance, when researchers studied the effects of cigarette smoking on health, it was impossible to randomly assign some people to smoke while assigning others to abstain.
The only possible design was an observational one, albeit one that involved
decades of observing hundreds of thousands of people all over the world.
The case for observational studies over RCTs is suggested in a study
reported in the British Medical Journal:
The investigators in the study aimed to determine whether “parachutes
are effective in preventing major trauma related to gravitational challenge.” To find out, they reviewed all the randomized controlled trials
they could find in Medline [PubMed], Web of Science, EMBASE, and the
Cochrane Library databases. They also reviewed appropriate Internet
sites and citation lists. To be included, a study had to discuss the effects
of using a parachute during free fall. The effects were defined as death
or major trauma, defined as an injury severity score > 15.
Despite their diligence and scientific approach to the review, the
investigators were not able to find any randomized controlled trials of
the effectiveness of parachute intervention. They concluded that as with
many interventions intended to prevent ill health, the effectiveness of
parachutes has not been subjected to rigorous evaluation by using randomized controlled trials. The investigators point out that this is a serious problem for hard-line advocates of evidence-based medicine who
are adamantly opposed to the adoption of interventions evaluated by
using only observational data. To resolve the problem, the investigators
recommend that the most radical protagonists of evidence-based medicine organize and participate in a double blind, randomized, placebo
controlled, crossover trial of the parachute. They further conclude that
individuals who insist that all interventions need to be validated by a
randomized controlled trial need to come down to earth with a bump.
THE BOTTOM LINE: INTERNAL AND EXTERNAL VALIDITY
Internal validity refers specifically to whether an experimental program
makes a difference and whether there is sufficient evidence to support the
claim. A study has internal validity when you can confidently say that Program
A causes Outcome A. A study has external validity if it is generalizable
because its results are applicable to other programs, populations, and settings.
Internal Validity Is Threatened
Just as the best-laid plans of mice and men (and women) often go awry,
evaluation research, no matter how well planned, loses something in the
execution. Randomization may not produce equivalent study groups, for
example, or people in one study group may drop out more often than will
people in the other. Factors such as less-than-perfect randomization and
attrition can threaten or compromise an evaluation’s validity. There are at
least eight common threats to internal validity.
1. Selection of participants. This threat occurs when biases result
from the selection or creation of groups that are not equivalent.
Either the random assignment did not work or attempts to match
groups or control for baseline confounders were ineffective. As a result, the groups may differ in important ways: one may be more affected by a given policy, more mature, or more affected by differences in the administration and content of the baseline and postprogram measures. Selection can interact with history, maturation, and instrumentation.
2. History. Unanticipated events occur while the evaluation is in progress, and this history jeopardizes internal validity. A change in policy
or a historical event may affect participants’ behavior while they are
in the program. For instance, the effects of a school-based program
to encourage healthier eating may be affected by a healthy eating
campaign on a popular children’s television show.
3. Maturation. Processes (e.g., physical and emotional growth) inevitably occur within participants as a function of time, threatening validity. Children in a 3-year school-based physical education program
mature physically, for example.
4. Testing. This threat can occur because taking one test has an effect
on the scores of a subsequent test. For instance, after a 3-week program, participants are given a test. They recall their answers on the
pretest, and this influences their responses to the second test. The
influence may be positive (they learn from the test) or negative (they
recall incorrect answers).
5. Instrumentation. Changes in a measuring instrument or changes in
observers or scorers cause an effect that can diminish validity. For
example, Researcher A makes slight changes between the questions
asked at baseline and those asked after the conclusion of the program. Or Researcher B administers the baseline measures, but
Researcher A administers the posttest measures.
6. Statistical regression. This effect operates when participants are
selected on the basis of extreme scores and regress or go back toward
the mean (e.g., average score) of that variable. Only people at great
risk are included in the program, for example. Some of them inevitably regress to the mean or average score. Regression to the mean is a
statistical artifact (i.e., due to some factor or factors outside of the
study).
7. Attrition (dropout) or loss to follow-up. This threat to internal
validity is the differential loss of participants from one or more groups
on a nonrandom basis. For instance, participants in one group drop
out more frequently than do participants in the others or are lost to
follow-up. The resulting two groups, which had similar characteristics
at baseline, no longer do.
8. Expectancy. A bias is caused by the expectations of the evaluator,
the participants, or both. Participants in the experimental group
expect special treatment, for example, while the evaluator expects to
give it to them (and sometimes does). Blinding is one method of
dealing with expectancy. A second is to ensure that a standardized
process is used in delivering the program.
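The statistical regression threat (threat 6 above) can be demonstrated with a short simulation. Each hypothetical participant has a stable true level plus independent measurement noise; selecting the top scorers on the first test guarantees that their retest average falls back toward the overall mean even though nothing was done to them.

```python
import random

# A minimal sketch of regression to the mean, assuming each person's score
# is a stable true level plus independent noise at each measurement.

random.seed(1)
true_levels = [random.gauss(50, 10) for _ in range(10_000)]
test1 = [t + random.gauss(0, 10) for t in true_levels]
test2 = [t + random.gauss(0, 10) for t in true_levels]

# Select the "extreme" group: the top 10% on the first test only.
cutoff = sorted(test1)[-1000]
selected = [i for i, s in enumerate(test1) if s >= cutoff]

m1 = sum(test1[i] for i in selected) / len(selected)
m2 = sum(test2[i] for i in selected) / len(selected)
print(f"selected group: first test {m1:.1f}, retest {m2:.1f}")
# The retest mean falls back toward the population mean of 50 with no
# intervention at all, which is why programs enrolling only extreme
# scorers can appear to "work" spuriously.
```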
External Validity Is Threatened
Threats to external validity are most often the consequence of the
way in which participants or respondents are selected and assigned. For
example, respondents in an experimental situation may answer questions
atypically because they know they are in a special experiment; this is called
the Hawthorne effect. External validity is also threatened whenever
respondents are tested, surveyed, or observed. They may become alert to the
kinds of behaviors that are expected or favored. There are at least four relatively common sources of external invalidity.
1. Interaction effects of selection biases and the experimental
treatment. This threat to external validity occurs when an intervention or program and the participants are a unique mixture, one that
may not be found elsewhere. The threat is most apparent when
groups are not randomly constituted. Suppose a large company volunteers to participate in an experimental program to improve the
quality of employees’ leisure time activities. The characteristics of the
company (some of which, like leadership and priorities, are related to
the fact that it volunteered for the experiment) may interact with the
program so that the two together are unique; the particular blend of
company and program can limit the applicability of the findings.
2. Reactive effects of testing. These biases occur when a baseline
measure interacts with the program, resulting in an effect that will not
generalize. For example, two groups of students participate in an ethics program evaluation. Group 1 is given a test before watching a film,
but Group 2 just watches the film. Group 1 performs better on a posttest because the pretest sensitizes them to the program’s content,
and they pay more attention to the film’s content.
3. Reactive effects of experimental arrangements or the
Hawthorne effect. This threat to external validity can occur because
participants know that they are participating in an experiment. This
threat is caused when people behave uncharacteristically because
they are aware that their circumstances are different. (They are being
observed by cameras in the classroom, for instance, or they have
been chosen for an experiment.)
4. Multiple program interference. This threat results when participants are in other complementary activities or programs that interact.
For example, participants in an experimental mathematics program
are also taking physics class. Both teach differential calculus.
External validity is dependent upon internal validity. Research findings
cannot be generalized to other populations and settings unless we first know
whether these findings are due to the program or to other factors.
Randomized controlled trials with double blinding have the greatest
chance of being internally valid—assuming that their data collection and
analysis are also valid. As soon as the researcher begins to deviate from the
strict rules of an RCT, threats to internal validity begin to appear. Example 4.16
illustrates a sample of the threats to internal and external validity found in
evaluation reports.
Example 4.16 Threats to Internal and External Validity: Reducing Confidence in the
Evidence of the Effectiveness of Four Programs
1. Evaluating a Health Care Program to Get Adolescents to Exercise (Patrick et al., 2006)
An additional concern in interpreting results is the potential impact on our findings of
measurement reactivity in which self-reported behavior is influenced by the measurement process
itself. Repeated assessments of the target behaviors as well as extensive surveys on thoughts and
actions used to change behaviors (not described in this article) could have motivated and even
instructed adolescents in both conditions to change behaviors, and control participants reported
improvements in several diet and physical activity behaviors [reactive effects of testing].
Measurement effects have been demonstrated in studies promoting physical activity through
primary care settings, and this also may occur with diet assessment.
2. Evaluating a Mental Health Intervention for Schoolchildren Exposed to Violence: A Randomized
Controlled Trial (Stein et al., 2003)
The CBITS [Cognitive Behavioral Intervention for Trauma in Schools] intervention was not compared
with a control condition such as general supportive therapy, but rather with a wait-list delayed
intervention. As a consequence, none of the informants (students, parents, or teachers) were
blinded to the treatment condition. It is possible that the lack of blinding [expectancy] may have
contaminated either the intervention or assessments. School staff and parents may have provided
more attention and support to students who were eligible for the program while they were on a
waiting list; alternatively, respondents may have been more likely to report improvement in
symptoms for those students whom they knew had received the intervention.
3. HIV-Risk-Reduction Intervention Among Low-Income Latina Women (Peragallo et al., 2005)
Individuals lost to follow-up (n = 112) differed from those who received at least one session of
the intervention (n = 292) [attrition, generalizability] with respect to age (younger), ethnicity
(Puerto Rican), years in the United States (slightly more years in United States), education
(completed 1 more year), marital status (less likely to be married), insurance source (more likely
to have insurance), and acculturation (more non-Hispanic acculturation) [selection].
4. Tall Stature in Adolescence and Depression in Later Life (Bruinsma et al., 2006)
Another possibility is that the assessment or treatment procedures predisposed women to
depression either because it medicalized the issue of their height or because of the intrusiveness
of the assessment and treatment [reactive effects of testing and of experimental arrangements]
and its effect on adolescent girls. In this study, there was evidence that women who reported a
negative experience of assessment or treatment procedures were significantly more likely to have
a history of depression than women who did not, which is consistent with other studies.
A high-quality research article will always describe threats to its validity,
sometimes called limitations, in the discussion or conclusions section.
The following checklist consists of questions to ask when evaluating a
study’s internal and external validity.
What Evidence-Based Public Health Practice Should Watch For: A
Checklist for Evaluating a Study’s Internal and External Validity
✓ If the research has two or more groups, is information given on the number of people in each group who were eligible to participate?
✓ If the research has two or more groups, is information given on the number in each group who agreed to participate?
✓ If the research has two or more groups, is information given on the number in each group who were assigned to groups?
✓ If the research has two or more groups, is information given on the number in each group who completed all of the program’s activities?
✓ Were reasons given for refusal to participate among participants (including personnel)?
✓ Were reasons given for not completing all program or data collection activities?
✓ Did any historical or political event occur during the course of the study that may have affected its findings?
✓ In long-term studies, was information given on the potential effects on outcomes of physical, intellectual, and emotional changes among participants?
✓ Was information provided on concurrently running programs that might have influenced the outcomes?
✓ Was there reason to believe that taking a preprogram measurement affected participants’ performance on a postprogram measurement? This problem might arise in evaluations of programs that take a few weeks or require only a few sessions.
✓ Was there reason to believe that changes in measures or observers may have affected the outcomes?
✓ Did the researchers provide information on whether observers or people administering the measures (e.g., tests, surveys) were trained and monitored for quality?
✓ If participants were chosen because of special needs, did the researchers discuss how they dealt with regression toward the mean?
✓ Did the researchers provide information on how staff ensured that the program was delivered in a standardized manner?
✓ Were participants or researchers blinded to the intervention? If not, did the researchers provide information on how the outcomes were affected?
THE PROBLEM OF INCOMPARABLE PARTICIPANTS:
STATISTICAL METHODS TO THE RESCUE
Randomization is designed to reduce disparities between experimental and
control groups by balancing them with respect to all characteristics (e.g.,
participants’ age, sex, or motivation) that might affect a study’s outcome.
With effective randomization, the only difference between study groups is
whether or not they are assigned to receive an experimental program. The
idea is that, if discrepancies in outcomes are subsequently found by statistical
comparisons (e.g., the experimental group improves significantly), they can
be attributed to the fact that some people received the experiment while
others did not.
In observational and nonrandomized studies, the researcher cannot
assume that the groups are balanced before they receive (or do not receive)
a program or intervention. In observational studies, for example, measured
participant characteristics are obtained before, during, and after program
participation, and it is often difficult to determine exactly which characteristics are baseline variables. Also, there frequently are characteristics that are unmeasured, inadequately measured, or simply unknown. But if the
participants differ, how can an evaluator who finds a difference
between experimental and control outcomes separate the effects of the
intervention from the effects of differences among study participants? One answer is to adjust for potential confounders during the data analysis phase
using statistical methods such as analysis of covariance and propensity score
methods.
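One common propensity score method is inverse-probability weighting: estimate each participant's probability of receiving the program from baseline characteristics, then weight each group so both resemble the full sample. The sketch below is not from the chapter; it uses simulated, hypothetical data in which older people are more likely to enroll and age also raises the outcome, so the naive comparison is confounded.

```python
import numpy as np

# Simulated observational data (all numbers hypothetical).
# True program effect = 5.0; age confounds the naive comparison.
rng = np.random.default_rng(1)
n = 2000
age = rng.uniform(14, 22, n)
enroll_prob = 1 / (1 + np.exp(-(age - 18)))   # older -> likelier to enroll
group = rng.binomial(1, enroll_prob)
outcome = 2.0 * age + 5.0 * group + rng.normal(0, 1, n)

# Naive comparison: biased upward by the age difference between groups.
naive = outcome[group == 1].mean() - outcome[group == 0].mean()

# Step 1: estimate propensity scores (probability of enrollment given age)
# with a logistic regression fit by Newton-Raphson.
X = np.column_stack([np.ones(n), age])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    gradient = X.T @ (group - p)
    hessian = X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hessian, gradient)
ps = 1 / (1 + np.exp(-X @ beta))

# Step 2: inverse-probability weighting removes the age imbalance.
w = np.where(group == 1, 1 / ps, 1 / (1 - ps))
treated = np.sum(w * group * outcome) / np.sum(w * group)
untreated = np.sum(w * (1 - group) * outcome) / np.sum(w * (1 - group))
adjusted = treated - untreated   # close to the true effect of 5.0
```

Here the naive difference in means overstates the program effect, while the weighted estimate recovers it; real applications, of course, require that all important confounders be measured and modeled.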
146– ●–EVIDENCE-BASED PUBLIC HEALTH PRACTICE
Analysis of Covariance
Analysis of covariance (ANCOVA) is a statistical procedure that results in estimates of intervention or program effects adjusted for participants’ background (and potentially confounding) characteristics, or covariates (e.g., age, gender, educational background, severity of illness, type of illness, motivation). The covariates are included explicitly in a statistical model.
Analysis of covariance adjusts for a confounder, say age, by assuming (statistically) that all participants are affected by it in the same way. That is, ANCOVA can answer this question: If you balance the ages of the participants in the experimental and control groups so that age has no influence on one group versus the other, how do the experimental and control groups compare? The ANCOVA removes age as a possible confounder at baseline.
The choice of covariates to include in the analysis comes from the literature, preliminary analysis of study data, and expert opinion on which characteristics of participants might influence their willingness to participate in and
benefit from study inclusion.
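In practice, ANCOVA amounts to fitting a linear model of the outcome on a program indicator plus the covariates, and reading off the program coefficient. This sketch is not from the chapter; it uses simulated, hypothetical data with age as the lone covariate and a true program effect of 5.0.

```python
import numpy as np

# Simulated data (all numbers hypothetical): age raises the outcome and
# is imbalanced between groups, so the unadjusted comparison is confounded.
rng = np.random.default_rng(0)
n = 300
age = rng.uniform(14, 22, n)
# Nonrandom assignment: older people are more likely to join the program.
program = (age + rng.normal(0, 2, n) > 18).astype(float)
outcome = 2.0 * age + 5.0 * program + rng.normal(0, 1, n)  # true effect 5.0

# Unadjusted difference in group means mixes the program effect with age.
unadjusted = outcome[program == 1].mean() - outcome[program == 0].mean()

# ANCOVA: include age as a covariate in the linear model
# outcome ~ intercept + program + age, then read off the program
# coefficient, i.e., the effect of the program with age held constant.
X = np.column_stack([np.ones(n), program, age])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
adjusted = coef[1]
```

With additional covariates, each one simply becomes another column of the design matrix; the adjusted program effect is always the coefficient on the program indicator.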
Example 4.17 illustrates the use of ANCOVA in a study protocol or plan
to improve work task performance in young adults with Down syndrome.
Example 4.17 Excerpt From Study Protocol of a Randomised Controlled Trial to
Investigate if a Community-Based Strength Training Programme Improves
Work Task Performance in Young Adults With Down Syndrome
Aim. The aim of this study is to investigate if a student-led community-based progressive resistance
training programme can improve these outcomes in adolescents and young adults with Down
syndrome.
Methods. A randomised controlled trial will compare progressive resistance training with a control
group undertaking a social programme. Seventy adolescents and young adults with Down syndrome
aged 14–22 years and mild to moderate intellectual disability will be randomly allocated to the
intervention or control group using a concealed...