HLSS500 Validity and Reliability Quantitative Research Methods Discussion


Description

Think about how you might design a quantitative research project. What methods would you use to collect your data? What would you need to do to demonstrate that your study had a high degree of validity and reliability?

Instructions: Use the materials that have been provided to you to support your response. Your initial post should be at least 500 words.

Unformatted Attachment Preview

Introduction and Overview. In: Confidence Intervals, by Michael Smithson. Thousand Oaks: SAGE Publications, 2011. DOI: https://dx.doi.org/10.4135/9781412983761

Introduction and Overview

This monograph surveys methods for constructing confidence intervals, which estimate and represent statistical uncertainty or imprecision associated with estimates of population parameters from sample data. A typical example of a confidence interval statement is a pollster's claim that she or he is 95% confident that the true percentage vote for a political candidate lies somewhere between 38% and 44%, on the basis of a sample survey from the voting population. Pollsters often refer to the gap between 38% and 44% as the "margin of error." In statistical terms, the interval from 38% to 44% is a 95% confidence interval, and 95% is the confidence level. The pollster's claim actually means that she or he has a procedure for constructing an interval that, under repeated random sampling in identical conditions, would contain the true percentage of the vote 95% of the time. We will examine this more technical meaning in the next chapter.

This interval conveys a lot of information concisely. Not only does it tell us approximately how large the vote is, but it also enables anyone so disposed to evaluate the plausibility of various hypothetical percentages. If the previous election yielded a 39% vote for this candidate, for instance, then it is not beyond the bounds of plausibility (at the 95% confidence level) that the candidate's popularity has remained the same. This is because 39% is contained in the interval from 38% to 44% and therefore is a plausible value for the true percentage vote. That said, we also cannot rule out the possibilities of an increase by as much as 5% or a decline by as much as 1%.

The confidence interval also enables us to assess the capacity of the poll to resolve competing predictions or hypotheses about the candidate's popularity. We can rule out, for instance, a hypothesis that the true percentage is 50%, but we cannot rule out hypothetical values of the percentage vote that fall within the 38%-44% interval. If, for example, the candidate needs to gain a clear majority vote to take office, then this poll is able to rule that out as implausible if the election were held on the same day as the poll (assuming that a 95% confidence level is acceptable to all concerned). If, on the other hand, the candidate needs only a 4% increase to take office, then the confidence interval indicates that this is a plausible possibility. In fact, as we will see in Chapter 2, a confidence interval contains all the hypothetical values that cannot be ruled out (or rejected). Viewed in that sense, it is much more informative than the usual significance test.
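To make the pollster's interval concrete, here is a minimal sketch of the usual normal-approximation computation for a proportion; the sample size and sample proportion are invented for illustration.

```python
import numpy as np
from scipy.stats import norm

n = 1_000        # invented sample size
p_hat = 0.41     # invented sample proportion supporting the candidate

se = np.sqrt(p_hat * (1 - p_hat) / n)   # standard error of the sample proportion
z = norm.ppf(0.975)                     # ~1.96 for a 95% confidence level
lo, hi = p_hat - z * se, p_hat + z * se

print(f"95% CI: {lo:.1%} to {hi:.1%}")  # roughly 38% to 44%, as in the example
```

Under repeated random sampling, intervals constructed this way would contain the true proportion about 95% of the time, which is exactly the technical meaning described above.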
This monograph refers to a fairly wide variety of statistical techniques, but many of these should be familiar to readers who have completed an undergraduate introductory statistics unit for social science students. Where less familiar techniques are covered, readers may skip those parts without sacrificing their understanding of the fundamental concepts. In fact, Chapters 2–4 and 7 cover most of the fundamentals.

Chapter 2 introduces the basis of the confidence interval framework, beginning with the concepts of a sampling distribution and a limiting distribution. Criteria for "best" confidence intervals are discussed, along with the trade-off between confidence and precision (or decisiveness). The strengths and weaknesses of confidence intervals are presented, particularly in comparison with significance tests. Chapter 3 covers "central" confidence intervals, for which the same standardized distribution may be used regardless of the hypothetical value of the population parameter. Many of these will be familiar to some readers because they are based on the t, normal, chi-square, and F distributions. This chapter also introduces the transformation principle, whereby a confidence interval for a parameter may be used to construct an interval for any monotonic transformation of that parameter (illustrated in the short sketch below). Finally, there is a brief discussion of the effect that sampling design has on variability and therefore on confidence intervals. Chapter 4 introduces "noncentral" confidence intervals, based on distributions whose shape changes with the value of the parameter being estimated. Widely applicable examples are the noncentral t, F, and chi-square distributions. Confidence intervals for the noncentrality parameters associated with these distributions may be converted into confidence intervals for several popular effect-size measures such as multiple R² and Cohen's d.

Chapters 5 and 6 provide extended examples of the applications of confidence intervals. Chapter 5 covers a range of applications in ANOVA and linear regression, with examples from research in several disciplines. Chapter 6 deals with topics in categorical data analysis, starting with univariate and bivariate techniques and proceeding to multi-way frequency analysis and logistic regression. Chapter 7 elucidates the relationship between the confidence interval and significance testing frameworks, particularly regarding power. The use of confidence intervals in designing studies is discussed, including the distinctions arising between considerations of confidence interval width and power. Chapter 8 provides some concluding remarks and brief mentions of several topics related to confidence intervals but not dealt with in this monograph, namely measurement error, complex sample designs, and meta-analysis.

I have received useful advice from many colleagues and students on drafts of this monograph. I am especially indebted to John Beale, Geoff Cumming, Chris Dracup, John Maindonald, Craig McGarty, Jeff Ward, and the students in the ACSPRI Summer School 2001 Confidence Interval Workshop for detailed and valuable ideas, data, criticism, and error detection. Of course, I am solely responsible for any remaining errors or flaws in this work.

http://dx.doi.org/10.4135/9781412983761.n1
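As a concrete instance of the transformation principle mentioned in the Chapter 3 summary, the following sketch builds a 95% interval for a correlation coefficient on Fisher's z scale and maps it back through the inverse (monotonic) transformation; the sample correlation and size are invented.

```python
import numpy as np
from scipy.stats import norm

r, n = 0.45, 120                 # invented sample correlation and sample size

z = np.arctanh(r)                # Fisher's z transformation of r
se = 1 / np.sqrt(n - 3)          # approximate standard error on the z scale
z_lo, z_hi = z + norm.ppf([0.025, 0.975]) * se

# tanh is monotonic, so transforming the endpoints back gives a valid CI for r.
r_lo, r_hi = np.tanh([z_lo, z_hi])
print(f"95% CI for the correlation: [{r_lo:.3f}, {r_hi:.3f}]")
```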
Statistical Research Designs for Causal Inference. In: Designing Research in the Social Sciences, by Martino Maggetti, Fabrizio Gilardi & Claudio M. Radaelli. London: SAGE Publications Ltd, 2013. DOI: https://dx.doi.org/10.4135/9781473957664

Statistical Research Designs for Causal Inference

Introduction

In Chapter 3 we discussed the different ways in which the social sciences conceptualize causation and argued that there is no single way in which causal relationships can be defined and analysed empirically. In this chapter, we focus on a specific set of approaches to constructing research designs for causal analysis, namely, one based on the potential-outcomes framework developed in statistics. As discussed in Chapter 3, this perspective is both probabilistic and counterfactual. It is probabilistic because it does not assume that the presence of a given cause leads invariably to a given effect, and it is counterfactual because it involves the comparison of actual configurations with hypothetical alternatives that are not observed in reality. In essence, this approach underscores the necessity to rely on comparable groups in order to achieve valid causal inferences. An important implication is that the design of a study is of paramount importance. The way in which the data are produced is the critical step of the research, while the actual data analysis, while obviously important, plays a secondary role. However, a convincing design requires research questions to be broken down into manageable pieces. Thus, the big trade-off in this perspective is between reliable inferences (that is, conclusions based on empirical evidence) on very specific causal relationships on the one hand, and their broader context and complexity (and, possibly, theoretical relevance) on the other hand.

The chapter first distinguishes between two general perspectives on causality, namely, one that places the causes of effects in the foreground, and another that is more interested in the effects of causes. We then introduce the potential-outcomes framework before discussing several research designs for causal inference, notably various types of experiments and quasi-experiments. This is followed by a discussion of the implications for research design, and the conclusion summarizes the main points.

Causes of Effects and Effects of Causes

To understand the specificities of statistical research designs for causal inference, it is useful to consider a general difference between quantitative and qualitative approaches to causal analysis. While quantitative approaches typically focus on the 'effects of causes', qualitative approaches usually examine the 'causes of effects' (Mahoney and Goertz, 2006). An equivalent distinction is that between 'forward' and 'reverse' causal inference: forward causal inference asks 'What might happen if we do X?' while reverse causal inference asks 'What causes Y?' (Gelman, 2011). The difference between the two approaches overlaps in part with that characterizing 'variable-oriented research' on the one hand and 'case-oriented research' on the other (Ragin, 1987: 34–68; see also Chapter 3, this volume). Obviously, both are legitimate and fruitful perspectives in the social sciences, each with its own trade-offs.
Moreover, it would be wrong to draw a sharp distinction between qualitative and quantitative research. As we will see throughout this chapter, although statistical research designs for causal inference necessarily rely on quantitative techniques (otherwise they would not be 'statistical'), qualitative information and substantive knowledge are an important precondition for meaningful analyses and are often an integral component of experiments and quasi-experiments.

For instance, consider the case of women's quotas in parliamentary elections. Figure 4.1 compares the percentage of women in parliament in 69 countries with and 84 countries without quotas (Tripp and Kang, 2008). Each dot represents a country, and Finland, Sweden, France and the Netherlands are highlighted. Horizontal lines represent the average percentage of women in parliament in each group. From an effects-of-causes perspective, we would investigate the consequences of quotas on female representation. That is, the starting point is the presumed cause (quotas), and the aim is to measure its causal connection with the presumed effect (for example, the percentage of women in parliament). The fact that, on average, countries with quotas have more women in parliament than those without quotas suggests that quotas might be conducive to better female representation. On the other hand, from a causes-of-effects perspective we would begin with the outcome and trace our way back to the possible causes. For instance, we could ask why two relatively similar countries such as Finland and the Netherlands have similar shares of women in parliament (about 37 per cent), although only the Netherlands has gender quotas. We could also ask why, in Sweden, there are almost four times as many women in parliament as in France (45.3 per cent compared to 12.2 per cent), given that both countries have introduced quotas. The first perspective would be likely to produce a single estimate of the causal effect, while the second would probably give an extensive account of the numerous factors influencing female representation and explain the cases holistically, that is, in all their complexity. However, significant qualitative knowledge is also required in the former, both for constructing an appropriate research design and for interpreting the finding correctly.

[Figure 4.1: Percentage of women in parliament in 69 countries with and 84 countries without quotas. Each dot represents a country; horizontal lines represent the average percentage of women in parliament in each group.]

Statistical research designs embrace the effects-of-causes approach. As Gelman (2011) argues, 'What causes Y?' is often the question that motivates the analysis in the first place. However, attempting to answer the question directly leads inevitably to a proliferation of hypotheses, most of which are actually likely to have some validity. Thus, the risk is that the analysis becomes intractable. This is the problem of overdetermination, or the fact that there are always a myriad of factors contributing in some way to a specific outcome. As we discuss in Chapter 6, there are methods that allow us to address this issue from a case-oriented perspective, that is, within a causes-of-effects approach.
However, statistical designs reframe the question in terms of the effects of causes. They break the question down, identify a particularly interesting factor, and ask what consequences it has on the outcome of interest. An implication of this strategy is that multiple analyses are needed to uncover complex causal paths, because each analysis can examine only one causal path at a time. Or, as Gelman (2011) puts it, in this perspective we are trying to learn about a specific causal path within a more complex causal structure, but not about the causal structure itself. Thus, statistical designs prioritize the reliability of very specific causal estimates at the expense of the broader context in which they operate and possibly even of the connection with the original (theoretical and/or empirical) problem, which must be redefined in order to make it fit within the strict requirements of the analytical design.

The Potential-Outcomes Framework

The potential-outcomes framework, also known as the counterfactual model, presupposes a dichotomous treatment (D_i), such as (to continue our example from the previous section) the presence or absence of women's quotas. If D_i = 1, then country i has quotas for the representation of women in parliament, while if D_i = 0, then it does not. Further, the framework assumes that there are two potential outcomes for each unit i, Y_1i and Y_0i. The outcomes are associated with the two possible values of the treatment. In our example, Y_1i is the percentage of women in parliament in country i in the presence of quotas, while Y_0i is that percentage if the same country i does not have quotas. Formally, we can represent this idea as follows:

$$Y_i = \begin{cases} Y_{1i} & \text{if } D_i = 1 \\ Y_{0i} & \text{if } D_i = 0 \end{cases}$$

Notice that both outcomes refer to the same unit. But, of course, it is impossible that, in our example, the same country both does and does not have quotas. This is why the two outcomes are called 'potential'; only one is realized and can be observed, while the other is its logical counterpart, which exists only in the realm of ideas. However, conceptually, both are necessary for the definition of a causal effect. If we were able to observe, for the same country i, the percentage of women both with and without quotas, then we could compute the causal effect for that country simply as the difference between the two outcomes:

$$\delta_i = Y_{1i} - Y_{0i}$$

On this basis, and always on the assumption that both outcomes can be observed (which, in fact, is not possible), we can define two other quantities. The first is the average treatment effect (ATE), which, as the name indicates, is the average effect of the treatment for all units (for instance, the average effect of quotas in all countries):

$$\text{ATE} = E[Y_{1i} - Y_{0i}]$$

That is, the ATE is defined as the average difference between the two potential outcomes in all countries. The second quantity is the average treatment effect on the treated (ATT), that is, the effect of the treatment averaged only over units that actually receive the treatment (for instance, the average effect of quotas in countries with quotas):

$$\text{ATT} = E[Y_{1i} - Y_{0i} \mid D_i = 1]$$

That is, we make the same computation as for the ATE, but only for the subset of countries with quotas (those for which D_i = 1). Countries without quotas (D_i = 0) are disregarded.
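The definitions above can be made concrete with a small simulation in which, unlike in reality, both potential outcomes are known; all numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000  # hypothetical countries

# Invented potential outcomes: % women in parliament without (y0) and with (y1) quotas.
y0 = rng.normal(13, 4, n)
y1 = y0 + rng.normal(6, 2, n)      # quotas add a heterogeneous effect
d = rng.binomial(1, 0.45, n)       # treatment indicator: quotas (1) or not (0)

ate = np.mean(y1 - y0)             # ATE = E[Y1 - Y0]
att = np.mean((y1 - y0)[d == 1])   # ATT = E[Y1 - Y0 | D = 1]

# The fundamental problem of causal inference: real data only ever reveal
# one of the two potential outcomes per unit.
y_observed = np.where(d == 1, y1, y0)

print(f"ATE = {ate:.2f}, ATT = {att:.2f}")
```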
These definitions rely on a critical assumption, namely, the so-called stable unit treatment value assumption (SUTVA) (Morgan and Winship, 2007: 37–40). This has two components. First, the treatment must be the same for all units. While the effect of the treatment can vary across units (if it did not, we would not need to compute averages for the ATE and ATT), the treatment itself must be equivalent in all units. In our example, this assumption is in fact violated because there are several types of quotas, namely, compulsory or voluntary party quotas, reserved lists, and women-only lists (Tripp and Kang, 2008: 347). By collapsing them into a simple 'quotas versus no quotas' dichotomy, we assume that each of these instruments has the same consequences for female representation, which is unlikely to be the case. However, this assumption is necessary in the potential-outcomes framework. Second, the outcomes in one unit must be independent of the treatment status in other units. In other words, the percentage of women in a given country must be unrelated to whether or not other countries have quotas. This assumption should be met in our example, but it is easy to imagine other situations in which it does not hold, for instance when the treatment has network effects or other types of externalities. The interdependencies discussed in Chapter 7 are good cases in point.

As noted above, these definitions of treatment effects are purely theoretical. In reality, we cannot observe the same unit both with and without the treatment. This is known as the 'fundamental problem of causal inference' (Holland, 1986), and it is what makes causal inference so difficult in practice. The nature of the problem is summarized in Table 4.1. In reality we can observe two outcomes, namely, in our example, the percentage of women in parliament in the presence of quotas given that there are actually quotas, and the percentage in the absence of quotas given that there are actually no quotas. However, to compute the quantities defined above, we would also need the two corresponding counterfactual outcomes, namely, the percentage of women in parliament in the absence of quotas in countries that actually have quotas, and the percentage in the presence of quotas in countries that actually do not have quotas. To illustrate more intuitively, take the case of France. Because this country has women's quotas, we are here in the top-left corner of Table 4.1. To compute the causal effect of quotas in France, we should take the difference between the observed percentage of women in parliament (12.2 per cent) and the value that we would observe if France had no quotas, that is, the value of the top-right corner of Table 4.1. The same logic applies to countries that have no quotas, namely, those in the bottom-right corner, which would need to be compared with their counterfactuals in the bottom-left corner.

Table 4.1 The fundamental problem of causal inference (based on Morgan and Winship, 2007: 35)

                                     Outcome with quotas (Y_1i)   Outcome without quotas (Y_0i)
Countries with quotas (D_i = 1)      Observable                   Counterfactual
Countries without quotas (D_i = 0)   Counterfactual               Observable

What if we compute the difference between the two quantities we can actually observe? As we have seen in Figure 4.1, countries with quotas have, on average, more women in parliament (19.2 per cent) than those without (13.2 per cent). It turns out that this observed difference in averages is equal to the ATT (one of our quantities of interest), plus a selection bias (Angrist and Pischke, 2009: 14).
In our example, the selection bias corresponds to the average difference between the percentage of women in parliament without quotas in countries that actually have quotas (a counterfactual) and the percentage without quotas in countries that actually do not have them (which is observable). The former group includes countries such as France, Germany, and Sweden, while the latter includes countries such as Ghana, Syria, and Vietnam. In fact, Table 4.2 shows that the two groups differ systematically in a number of ways. Countries with quotas tend to be wealthier, more democratic, and more likely to have a proportional system of electoral representation. Although the difference is only borderline significant, women in countries with quotas also tend to be more educated. All these factors are likely to be associated with a higher share of women in parliament even in the absence of quotas. This is what 'selection bias' means in this context. Countries are not assigned quotas randomly; they self-select into this policy. Therefore, countries with and countries without quotas differ in a number of ways and the two groups are not well comparable.

[Table 4.2: Countries with and countries without quotas are quite different (calculations based on Tripp and Kang, 2008).]

In sum, within the potential-outcomes framework, causal effects are clearly defined but cannot be directly computed in practice because the required counterfactuals are unobservable. However, researchers can rely on several methods to estimate them. We turn to these in the next section.

Methods

Regression

Regression analysis is '[a]n extension of correlation analysis, which makes predictions about the value of a dependent variable using data about one or more independent variables. A key parameter estimated in a regression analysis is the magnitude of change in the dependent variable associated with a unit change in an independent variable. This parameter is referred to as the slope or the regression coefficient' (Brady and Collier, 2004: 303).

In most quantitative studies, the default research design applies this technique to observational data, that is, information that was not generated by a process controlled by the researcher. The data set used by Tripp and Kang (2008) is a typical example. By contrast, experimental data are those produced under the supervision of the researcher. To continue with our example, a bivariate regression (that is, including just one explanatory variable) of the share of women in parliament on quotas indicates that countries with quotas have on average about 6 per cent more women in parliament than countries without quotas, and that the difference is statistically highly significant.[1] This difference corresponds exactly to what is shown in Figure 4.1. An obvious problem with this analysis is that it fails to control for the differences that exist across countries beyond the presence of quotas, such as those shown in Table 4.2. In other words, the bivariate regression neglects the selection bias problem.
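Continuing the simulation idea from above, the decomposition of the observed difference into ATT plus selection bias can be demonstrated directly by letting countries with higher baseline representation be more likely to adopt quotas; the data-generating process is invented.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

y0 = rng.normal(13, 4, n)                      # % women without quotas
y1 = y0 + 6                                    # assume quotas add 6 points
p = 1 / (1 + np.exp(-(y0 - 13) / 2))           # self-selection: high-y0 countries
d = rng.binomial(1, p)                         # ... adopt quotas more often

naive = y1[d == 1].mean() - y0[d == 0].mean()  # observed difference in means
att = (y1 - y0)[d == 1].mean()                 # knowable only in a simulation
bias = y0[d == 1].mean() - y0[d == 0].mean()   # E[Y0|D=1] - E[Y0|D=0]

print(f"naive {naive:.2f} = ATT {att:.2f} + selection bias {bias:.2f}")
```

The naive comparison overstates the effect because the treated countries would have had more women in parliament even without quotas.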
A multivariate regression (that is, including several explanatory variables) can mitigate it, to a certain extent. If we include the variables listed in Table 4.2, quotas remain significantly associated with female representation, but the size of the effect is reduced by half in comparison with the bivariate regression. That is, with per capita gross domestic product (GDP), women's education, democracy, and the type of electoral system controlled for, countries with quotas have on average only about 3.2 per cent more women in parliament than countries without quotas.[2] The inclusion of control variables is also known as 'covariate adjustment', which means that the analysis adjusts the estimate of the causal effect for those covariates (that is, variables) that can be taken into account.

Under some conditions, regression can yield unbiased estimates of causal effects (Morgan and Winship, 2007: 136–42). These conditions, however, are quite restrictive and generally unlikely to be met in practice. First, there must be no omitted variables in the analysis. That is, in our example, all factors influencing the percentage of women in parliament besides quotas must be measured and included in the regression. Obviously, no analysis can ever fulfil this requirement perfectly, which means that only rarely can the causal estimates produced by regression analysis be credibly considered unbiased. Second, the functional relationship between the control variables and the outcome must be fully and correctly specified. This means, for instance, that any non-linearities in the relationship between, say, per capita GDP and women's representation (for instance, the correlation is stronger at lower levels of per capita GDP), as well as any interactions (for instance, the correlation between women's education and representation depends on the level of per capita GDP), must be explicitly and correctly modelled. This quickly becomes intractable with even just a handful of variables, a problem that is known as the 'curse of dimensionality'. This requirement stems from the fact that, in most practical situations, the treatment and control groups are quite different; in other words, the covariates are not balanced between them. In fact, this is the case in our example, as shown in Table 4.2. Therefore, the analysis needs to make assumptions in order to extrapolate the comparison between countries with and without quotas for specific combinations of control variables.

The problem can be alleviated by a method called 'matching' (Ho et al., 2007), which attempts to make the treated and control groups more similar by removing 'incomparable' cases. One can, for instance, compute the probability that a unit receives the treatment (the 'propensity score') and then find, for each treated unit, an untreated unit with a very similar propensity score. If this procedure is successful (which depends on the characteristics of the data set), then a better balance between the two groups is achieved (that is, they are more comparable) and the analysis becomes less dependent on the specific assumptions made by the regression model. However, matching improves comparability only with respect to variables that can actually be observed. Thus, the first condition (no omitted variables) remains a big problem.
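The following is a toy version of the propensity-score idea: estimate each unit's probability of treatment from observed covariates, then pair every treated unit with the control whose score is closest. The data and the use of scikit-learn are illustrative assumptions, not the procedure of any particular study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 2_000

x = rng.normal(size=(n, 2))                  # two observed covariates
p_true = 1 / (1 + np.exp(-(0.8 * x[:, 0] + 0.5 * x[:, 1])))
d = rng.binomial(1, p_true)                  # treatment depends on covariates
y = 13 + 6 * d + 3 * x[:, 0] + 2 * x[:, 1] + rng.normal(0, 2, n)

# Step 1: estimate propensity scores from the observed covariates.
score = LogisticRegression().fit(x, d).predict_proba(x)[:, 1]

# Step 2: nearest-neighbour match on the score, one control per treated unit.
treated = np.where(d == 1)[0]
controls = np.where(d == 0)[0]
gaps = np.abs(score[controls][None, :] - score[treated][:, None])
nearest = controls[gaps.argmin(axis=1)]

print(f"naive difference:     {y[d == 1].mean() - y[d == 0].mean():.2f}")
print(f"matched ATT estimate: {(y[treated] - y[nearest]).mean():.2f} (true effect: 6)")
```

As the text notes, this only removes imbalance on observed covariates; an omitted confounder would bias the matched estimate just as it biases regression.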
Experiments

As we have seen, two main practical problems arise when the potential-outcomes approach is implemented empirically. First, selection bias is ubiquitous, which means that the comparability of the treatment and control groups is usually limited. Second, while regression can in principle solve this problem, omitted variables and the 'curse of dimensionality' will in most cases lead to biased estimates of causal effects.

The appeal of the experimental approach is that it is much more effective in ensuring that treated and control units are in fact comparable. This occurs through 'randomization', that is, random assignment of treatment to the units. Specifically, what defines experiments is that randomization is undertaken by researchers themselves. With randomization, systematic differences between the two groups can occur only by chance and, if the number of units is sufficiently large, with a very low probability. Moreover, the procedure works for both observable and unobservable characteristics, such that omitted variables are no longer a problem. Because randomization is so powerful, the data can in principle be analysed with simple techniques, and the difference in means for the outcome between treatment and control groups (or, equivalently, the coefficient of a bivariate regression) can be interpreted as the ATE as well as the ATT. A common problem is that units are not selected randomly from the population, such that it is not possible to generalize the estimates straightforwardly beyond the sample. However, the estimates are still valid for the units that were part of the experiment. It should be emphasized that, of course, randomization is not perfect and there are several ways in which it can go wrong. For instance, it is possible that not all the units that are assigned to the treatment are actually treated or, conversely, that some control units become exposed to it ('non-compliance'); it is also possible that, for one reason or another, outcomes cannot be observed for some units ('attrition') (Gerber and Green, 2008). However, experiments have an unparalleled capacity to uncover causal relationships and are widely considered the 'gold standard' in this respect.
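A short sketch of why randomization is so powerful: assignment is independent of every characteristic, observed or not, so the groups are balanced in expectation and a bare difference in means recovers the effect. All quantities are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

wealth = rng.normal(size=n)                  # an observed confounder
culture = rng.normal(size=n)                 # an UNobserved confounder
y0 = 13 + 3 * wealth + 2 * culture + rng.normal(0, 2, n)
y1 = y0 + 6                                  # true effect of 6

d = rng.binomial(1, 0.5, n)                  # randomization by the researcher
y = np.where(d == 1, y1, y0)

# Balance holds for observed AND unobserved characteristics (up to chance):
print(f"wealth gap:  {wealth[d == 1].mean() - wealth[d == 0].mean():+.3f}")
print(f"culture gap: {culture[d == 1].mean() - culture[d == 0].mean():+.3f}")
# ... so the simple difference in means estimates the ATE (= ATT here):
print(f"difference in means: {y[d == 1].mean() - y[d == 0].mean():.2f}")
```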
In our women's quotas example, an experiment would imply that quotas are attributed to countries randomly. As a consequence, and in contrast to what we have seen in Table 4.2, the groups of countries with and without quotas would be very similar, if not exactly identical, in all characteristics that could potentially affect women's representation, including those that cannot be observed. Therefore, the average difference in the percentages of women in parliament between the two groups could in principle be interpreted as the causal effect of quotas. The example shows the advantages of the experimental approach, but also an obvious drawback in the social sciences. In many, if not most, cases, randomization cannot be implemented for a number of practical and ethical reasons. For instance, imposing a dictatorship on a random subset of democracies (to see the consequences on economic growth, for example) is impossible in practice and, even if it were feasible, would be unethical. Given these problems, it is not surprising that experiments are not the first method that comes to mind when one thinks of social science research. At the same time, in recent years they have been used with increasing frequency and success and have become a mainstream tool for social scientists (Druckman et al., 2006). We can distinguish among three broad types, namely, laboratory, survey and field experiments, which we discuss in the following subsections.

Laboratory Experiments

Laboratory experiments are 'experiments where the subjects are recruited to a common location, the experiment is largely conducted at that location, and the researcher controls almost all aspects in that location, except for subjects' behavior' (Morton and Williams, 2008: 346; emphasis in original). They are what first comes to mind when we hear the word 'experiment', namely, a relatively small group of people, not necessarily representative of the broader population (for example, students), following precise instructions to perform a set of abstract tasks. Despite their stylized nature, laboratory experiments can help to uncover important causal relationships.

For example, Correll (2004) was interested in how cultural beliefs about gender differences in ability affect career choices through the self-assessment of performance. If it is commonly accepted in society that, say, men are better than women at mathematics, then the theory is that, at equal levels of objective skills, men will evaluate their competence more highly than women do. Consequently, men will be more inclined than women to pursue a career in a field where maths is important, thus reproducing existing gender imbalances. To estimate the causal effect of cultural frames, Correll (2004) set up an experiment in which about 80 undergraduate students were asked to perform a test purportedly designed to develop a new examination for graduate school admission. The test had no right or wrong answers (but was perceived as credible) and all subjects were given the same score, that is, the same objective assessment of their skills. By contrast, their cultural expectations (that is, the treatment) were manipulated by assigning subjects randomly to two groups. The treated group was told that males tend to perform better at the task, while the control group was informed that there are usually no gender differences in this context. After completing the test and receiving the (fake) scores, subjects were asked to provide a self-assessment of their performance and to answer questions about how likely they would be to pursue a career requiring high levels of the skills that were purportedly tested. In line with the theoretical expectations, the analysis showed that, under the treatment condition, females' self-assessment was lower than males', and that males' assessment under the treatment was higher than under the control condition. Further, these biased self-assessments were related to potential career plans.

A second example is Dunning and Harrison (2010), which studied how cross-cutting cleavages moderate the political saliency of ethnicity. The theory is that ethnic differences play a more important role in politics if citizens speaking a given language, for instance, belong to a different religion and are poorer than those speaking other languages. If, however, the different cleavages (linguistic, religious, economic) are not superposed in this way, then it is expected that language is less relevant as a determinant of political behaviour. Dunning and Harrison (2010) studied this argument in the case of Mali, a highly ethnically diverse country, by focusing on 'cousinage', which is a form of identity and social bonds connected with groups of patronymics (surnames) but distinct from ethnicity.
The 824 subjects of the experiment, recruited in Mali's capital city, were shown videotaped political speeches by a purported political independent considering being a candidate for deputy in the National Assembly. Subjects were asked to evaluate the candidate on a number of dimensions. The treatment was the politician's last name, which subjects could readily associate with both ethnicity and cousinage ties. This set-up yielded four combinations of subjects' and politician's ethnicity and cousinage, namely, same ethnicity/cousins, same ethnicity/not cousins, different ethnicity/cousins, and different ethnicity/not cousins. Additionally, in the control group the politician's name was not given. In line with theoretical expectations, the candidate was evaluated best by the subjects when they shared both ethnicity and cousinage and worst in the opposite scenario. Additionally, cousinage compensated for ethnicity: the candidate was evaluated similarly when subjects and candidate were from the same ethnic group but without cousinage ties and when they were from a different ethnic group but with cousinage ties.

In order to produce valid results, laboratory experiments must consider an extensive list of potential problems, such as the nature of experimental manipulations, location, artificiality, subjects' selection and motivation, and ethical concerns (for a thorough discussion, see Morton and Williams, 2010). Furthermore, they are vulnerable to the objection that, while their internal validity may be strong (that is, their results are valid within the context of the experiment), their conclusions cannot be generalized to the 'real world'. We return to this point in the conclusion.

Survey Experiments

Survey experiments randomly assign the respondents of a survey to control and treatment conditions through the manipulation of the form or placement of questions (Gaines et al., 2007: 3–4). Because many survey experiments use samples that are representative of the population, they promise to achieve both internal and external validity, the first through randomization, and the second through representativeness (Barabas and Jerit, 2010: 226). These potential qualities, in combination with increasingly easy and cheap access to survey resources, have made survey experiments more popular among social scientists in recent years.

For example, Hainmueller and Hiscox (2010) examined attitudes towards immigration. They asked whether, as predicted by the labour market competition model, people tend to oppose immigrants with a skills level similar to their own, who would be perceived as a more direct threat in the competition for jobs. The experiment was embedded in a survey completed by 1,601 respondents in the United States, who were randomly divided into two groups. Those in the treatment group were asked whether they agreed that the USA should accept more highly skilled immigrants from other countries. The question asked in the control group was identical except that 'highly skilled' was replaced with 'low-skilled'. The authors were able to confirm that randomization worked well because the distributions of respondents' characteristics in the two groups were statistically indistinguishable. The main finding of the analysis is that, contrary to theory, both low-skilled and highly skilled respondents prefer highly skilled immigrants, which suggests that non-economic concerns are very important to explaining attitudes towards immigration.
Another example is Linos (2011), who studied cross-national interdependencies (one of the topics of Chapter 7) in the field of family policy with an experiment in which 1,291 Americans were asked whether they agreed that the United States should increase taxes to finance paid maternity leave. Respondents were assigned randomly either to a control group, in which the question was formulated neutrally, or to one of four treatment groups. In the first and second treatment groups, respondents were informed that the proposed policy was already in place in Canada or in most Western countries, respectively. In the third, respondents learned that the policy was recommended by the United Nations. Finally, in the fourth the policy was endorsed by 'American family policy experts'. The results show that, while in the control group only 20 per cent of respondents supported increasing taxes to pay for maternity leave, the share jumped to about 40 per cent in the treatment groups referring to Canada or other Western countries. Interestingly, the effect of foreign models was comparable to that of American experts, while that of the UN was even slightly higher. Thus, foreign experiences seemed to play a significant role in shaping public opinion on family policy, which could be an important channel through which policies spread cross-nationally.

Researchers employing survey experiments face a distinct set of issues (Gaines et al., 2007; Barabas and Jerit, 2010). The treatment can be problematic in several ways. It is typically administered as a single exposure to an artificially intense stimulus, while in reality people may be exposed to it to varying degrees, at several points in time, and in combination with other factors. Moreover, exposure to the real-world version of the treatment prior to the survey can bias the results. Also, survey experiments usually measure the immediate effects of the treatment, but it would be important to know how long they last. In short, even if the sample is representative, external validity can be compromised if the treatment itself lacks representativeness.
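A survey experiment of the question-wording type reduces, analytically, to comparing response rates across randomly assigned groups. The sketch below runs a two-proportion z-test on invented counts loosely echoing the maternity-leave example; it is not the authors' actual analysis.

```python
import numpy as np
from scipy.stats import norm

agree = np.array([52, 104])    # invented counts of respondents supporting the policy
asked = np.array([260, 258])   # respondents randomly assigned to each wording

p = agree / asked                              # ~20% (control) vs ~40% (treated)
pooled = agree.sum() / asked.sum()             # pooled rate under the null hypothesis
se = np.sqrt(pooled * (1 - pooled) * (1 / asked[0] + 1 / asked[1]))
z = (p[1] - p[0]) / se
p_value = 2 * norm.sf(abs(z))                  # two-sided p-value

print(f"support {p[0]:.0%} vs {p[1]:.0%}: z = {z:.2f}, p = {p_value:.4f}")
```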
Field Experiments

Field experiments 'are experiments where the researcher's intervention takes place in an environment where the researcher has only limited control beyond the intervention conducted' (Morton and Williams, 2008: 346). The central characteristic of experiments (randomized treatment assignment) is preserved but takes place in the 'real world', which complicates its implementation in various ways. Field experiments are well established particularly in the study of political behaviour and the political economy of development, but they have also caught on in other sub-fields. Because of the logistical requirements, which often involve prolonged stays in the area where the experiments take place and contacts with a large number of local actors, researchers gain detailed knowledge of their cases, comparable to that of typical qualitative fieldwork. Thus, the qualitative-quantitative distinction is not very meaningful here.

For instance, Olken (2010) studied a classic question of democratic theory, namely, the comparative advantages of direct democracy and representation. The field experiment randomized the political process through which infrastructure projects were selected in 49 Indonesian villages. About 53 per cent of the villages were randomly assigned to a direct democratic process in which all adults eligible to vote in national elections could express their preference. In the remaining villages the standard process was followed: project selection took place in small meetings that were open to the public but, in fact, attended by a limited number of members of the local elite (such as government officials and representatives of various groups). On average, about 20 times as many people participated in the referenda as in the meetings. The randomization produced treatment and control groups that were statistically indistinguishable with respect to both village characteristics (such as ethnic and religious fragmentation, distance to subdistrict capital, population) and individual characteristics (education, gender, age, occupation). The results of the experiment showed that the same projects were selected under both decision-making processes, which suggests that representation does not lead to outcomes that are biased in favour of the elite's preferences. However, villagers were significantly more satisfied with the decisions when they were taken through referenda. Thus, it seems that the main effect of direct democracy is to increase the legitimacy of decisions, but not necessarily to shift their content closer to the population's preferences.

Another field experiment attempted to uncover the effects of political advertising on voters' preferences by randomizing radio and television advertisements, for a total value of about $2 million, during the 2006 re-election campaign of Texas governor Rick Perry (Gerber et al., 2011). The study randomized both the starting date and the volume of advertisements across 20 media markets in Texas, but not stations or programmes. The outcome, that is, voters' evaluation of the candidate, was measured using large daily polls. Results showed a strong short-term effect of the advertisements. The maximum advertising volume was associated with an increase of almost five percentage points in the candidate's vote share during the week in which the advertisements were aired. However, this effect vanished as soon as a week afterwards. Thus, the results suggest that political advertising does make a difference, but this difference evaporates quite quickly.

In addition to problems common to all experiments (such as external validity), field experiments present some specific challenges (Humphreys and Weinstein, 2009: 373–6). Given that many interesting variables cannot be randomized because of practical constraints, only a relatively small subset of questions can be investigated with this method. A possible solution is to focus on smaller units (for example, municipalities instead of countries), but this will reduce the external validity of the analysis. Because field experiments take place in real time and in real settings, many factors are not under the control of researchers and can therefore contaminate the findings. A common problem is spillovers, or the fact that intervention in one unit may affect outcomes in other units. As discussed above, this violates the SUTVA assumption of the potential-outcomes framework.
The logistics of field experiments also constrain their size and reduce the precision of the estimates, which is a problem especially if the effects are small. Finally, because they operate in real contexts, field experiments also raise certain ethical concerns.

Quasi-Experiments

Quasi-experiments are observational studies (that is, they use data that were not generated by a process controlled by the researcher) in which, thanks to circumstances outside the researcher's control, random treatment assignment is approximated to a certain extent. That is, although the assignment of units to treatment or to control status is not determined by the researchers but by naturally occurring social and political processes, some features of the procedures make it credible to assume that it is 'as if at random'. As Dunning (2008) argues, the plausibility of this assumption is variable and the burden of proof must be on the researcher. Thus, it is useful to situate quasi-experiments on a continuum with standard observational studies at one end and classical randomized experiments at the other. Making the case convincingly usually requires detailed knowledge of the context of the quasi-experiment. Moreover, the data are seldom readily available. Their acquisition often necessitates archival work or other procedures typically associated with qualitative studies. This demonstrates again that the distinction between quantitative and qualitative approaches is not very relevant. Quasi-experiments can take different forms. We discuss three: natural experiments, discontinuity designs, and instrumental variables.

Natural Experiments

In natural experiments, the 'as if at random' component comes from some social, economic, and/or political process that separates two groups cleanly on a theoretically relevant dimension. That is, although the quasi-randomization occurs without the researcher's intervention, it produces well-defined treatment and control groups. For instance, Hyde (2007) studied the effects of international election monitoring on electoral fraud with data from the 2003 presidential election in Armenia, using polling stations as units of analysis. The outcome variable was the share of votes of incumbent President Kocharian, who was widely believed to have orchestrated extensive fraud operations. Polling stations in the treatment group were those visited by international observers, while those in the control group were not inspected by the monitors. To measure the treatment status of polling stations, Hyde (2007) relied on the list of assigned polling stations produced by the organization in charge of monitoring the elections, the Office for Democratic Institutions and Human Rights of the Organization for Security and Co-operation in Europe (OSCE/ODIHR). The validity of the natural experiment rests upon the assumption that international observers were assigned to polling stations in a way that approximates random assignment, and Hyde (2007) discussed in detail why this assumption was plausible in this case. The OSCE/ODIHR staff completed the lists arbitrarily, only on the basis of logistical considerations and with no knowledge of the socio-economic and political characteristics of the polling stations.
The analysis showed that the incumbent president received significantly more votes (between 2 and 4 per cent) in stations that were not monitored than in those that were visited by observers, which suggests that this control mechanism has an impact on the extent of electoral fraud.

In another study, Bhavnani (2009) exploited an actual randomization, albeit one which he did not design, to investigate the long-term effects of quotas on female representation, that is, their consequences after they are withdrawn. A policy initiative in India reserved a certain number of seats for women in local elections; the reserved seats were chosen randomly for one legislature. The goal of this selection procedure was not to allow an evaluation of the policy (though this was a welcome side product), but rather to make it as fair as possible by ensuring that men would be excluded from certain seats only temporarily, and without biases towards specific seats. Reserved and unreserved seats were statistically indistinguishable on many relevant dimensions, which suggests that the randomization is likely to have worked. The analysis of elections in 1997 and 2002 showed that quotas had an effect on female representation not only during the election in which they were enforced, which must be true if the policy is implemented properly, but also in the next election, after they were no longer in force. A comparison of districts that were open both in 1997 and in 2002 with those that were reserved in 1997 but open again in 2002 shows that the percentage of female winners was significantly higher in the latter districts (21.6 per cent compared to 3.7 per cent). This indicates that the effects of quotas extend beyond their duration, possibly by introducing new female candidates into politics and by changing the perceptions of voters and parties.

Natural experiments are appealing because they feature randomization in a real-world setting without the direct involvement of the researcher. However, because researchers have no control over them, and because good natural experiments are rare, they often originate in the availability of a convenient configuration instead of in a previously defined research question. In this sense, they tend to be method-driven rather than problem-driven. Nonetheless, this is not necessarily problematic, and the examples that we have just seen prove that natural experiments can be used to investigate important questions.

Discontinuity Designs

Similar to natural experiments, discontinuity designs exploit sources of quasi-randomization originating in social and political processes. In contrast to natural experiments, they rely on sharp jumps, or 'discontinuities', in a continuous variable. The cut-off point determines whether a unit is exposed to the treatment or not, the idea being that treatment assignment is 'as if at random' for units on either side of it. Elections are a typical example of such discontinuities because it is quite reasonable to assume that, in narrow elections, the outcome is due in large part to chance. While candidates who win by a landslide are likely to be very different from those who receive only a handful of votes, candidates on either side of the election threshold are probably similar in many respects.
Using these ideas, Eggers and Hainmueller (2009) compared the wealth at death of narrow winners and losers in British national elections and found that successful Conservative Party candidates died with about £546,000, compared with about £298,000 for candidates from the same party who were not elected. By contrast, the difference was much smaller for Labour Party candidates, suggesting that the material benefits of serving in Parliament differ across political parties. Gerber and Hopkins (2011) also relied on the random component of elections, but to examine the effects of partisanship on public policy at the local level. The comparison of 134 elections in 59 large American cities revealed that in most policy areas changes in public spending were very similar regardless of whether a Republican or a Democrat narrowly won. The one exception was policing expenditures, which were higher under successful Republican candidates. These findings suggest that partisan effects are small at the local level.

Lalive and Zweimüller (2009) exploited a different type of discontinuity, namely, the date at which a longer period of parental leave entered into force in Austria, to estimate the effects of this policy on mothers' further childbearing and careers. Mothers giving birth after 30 June 1990 were able to benefit from paid leave of 2 years instead of 1 year under the policy in force until that date. Because of this sharp cut-off, the duration of the parental leave can be considered to be randomly assigned to mothers giving birth shortly before or after 30 June. Indeed, the two groups were indistinguishable on many observed socio-economic characteristics such as age, work history, and profile. The comparison of the two groups showed that longer parental leave causes women to have more additional children. It also reduces their employment and earnings, but only in the short term.

Sharp cut-offs, such as those found in elections and other settings, generally offer quite convincing sources of quasi-randomization, even though researchers should carefully check whether actors are aware of the discontinuity and exploit it, as in the case of income tax thresholds (Green et al., 2009: 401). However, it is important to note that the causal effects estimated with this method apply only at the threshold and cannot be extrapolated to all units. Because, usually, only relatively few observations are sufficiently close to the threshold, the results produced by regression discontinuity designs apply to a specific subsample, which limits their external validity. Moreover, there are trade-offs but no clear guidelines regarding the width of the window around the threshold (Green et al., 2009). A larger window (and, thus, more observations) makes estimates more precise but potentially biased by unobserved factors, while a smaller window reduces the bias but also the number of observations and, thus, the precision of the estimates.
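A bare-bones sketch of a regression discontinuity estimate on invented data: units just past the cut-off receive the treatment, and the effect is the gap between linear fits on either side, within a chosen window.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000

margin = rng.uniform(-0.5, 0.5, n)       # running variable, e.g. winning margin
d = (margin >= 0).astype(int)            # treatment: being (narrowly) elected
y = 10 + 4 * d + 8 * margin + rng.normal(0, 2, n)   # true jump of 4 at the cut-off

h = 0.1                                  # window (bandwidth) around the threshold
w = np.abs(margin) <= h

# Fit a line on each side within the window; the effect is the gap at zero.
left = np.polyfit(margin[w & (d == 0)], y[w & (d == 0)], 1)
right = np.polyfit(margin[w & (d == 1)], y[w & (d == 1)], 1)
jump = np.polyval(right, 0.0) - np.polyval(left, 0.0)

print(f"estimated effect at the cut-off: {jump:.2f} (true effect: 4)")
```

Widening h adds observations and precision but risks bias from the trend away from the threshold, which is exactly the trade-off described above.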
Instrumental Variables

Instrumental variables are factors that can be used to replace treatment variables for which the 'as if at random' assumption does not hold (Sovey and Green, 2010). They have to meet three crucial assumptions. The first is relatively innocuous and states that the instrument and the treatment are correlated, after relevant covariates are controlled for. The second and third assumptions are usually much more problematic. The 'exclusion restriction' means that the instrument affects outcomes exclusively through its correlation with the treatment, that is, it has no direct effect on the outcomes, while the 'ignorability assumption' requires that the instrument is 'as if at random'. Thus, good instruments are those produced by some sort of quasi-experiment. Concretely, the estimation proceeds in two stages. In the first, the treatment variable is regressed on the instrument and the results are used to compute expected values for the treatment. In the second stage, these values replace the treatment in the main regression.
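The two-stage procedure can be sketched in a few lines on invented data: the instrument shifts the treatment but touches the outcome only through it, so replacing the treatment with its first-stage fitted values removes the confounding.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000

u = rng.normal(size=n)                         # unobserved confounder
z = rng.normal(size=n)                         # instrument, e.g. election-day rain
t = 0.6 * z + 0.8 * u + rng.normal(size=n)     # treatment (turnout), confounded
y = 2.0 * t - 1.5 * u + rng.normal(size=n)     # outcome (vote share), true effect 2

ones = np.ones(n)
ols = np.linalg.lstsq(np.column_stack([ones, t]), y, rcond=None)[0][1]

# Stage 1: regress the treatment on the instrument, keep the fitted values.
Z = np.column_stack([ones, z])
t_hat = Z @ np.linalg.lstsq(Z, t, rcond=None)[0]
# Stage 2: regress the outcome on the fitted treatment.
tsls = np.linalg.lstsq(np.column_stack([ones, t_hat]), y, rcond=None)[0][1]

print(f"OLS = {ols:.2f} (biased), two-stage = {tsls:.2f} (true effect: 2.0)")
```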
In a famous study, Acemoglu et al. (2001) addressed the effects of institutions on economic development. A simple regression of development on institutions is likely to be inappropriate (even with many control variables), for two reasons. First, the causal relationship can arguably go both ways: better institutions cause higher economic development, but higher economic development can also cause better institutions. Second, similar to the example of women’s quotas discussed above, it is likely that countries with different degrees of economic development differ on many other dimensions as well.

To circumvent these problems, Acemoglu et al. (2001) employed mortality rates of European settlers (proxied by those of soldiers, bishops, and sailors) as an instrument for current institutions. The argument is that European powers set up different types of institutions depending on their ability to settle. If a region was hospitable, then European-style institutions were constructed with an emphasis on property rights and checks against government power; if it was not, colonizers set up ‘extractive states’ for the purpose of transferring as many resources as possible from the colony. The analysis shows a strong association between current institutions, instrumented by settler mortality, and economic development, which corroborates the argument that a causal relationship is at play rather than a mere correlation.

An important caveat is the plausibility of the exclusion restriction, that is, the possibility that the effect of settler mortality on economic development could work through something other than institutions. For instance, the mortality rates of colonizers could be related to current diseases, which may have had an impact on development. In this case, institutions would not be part of the causal chain. However, the authors argue convincingly that the causes of European deaths in the colonies (mainly malaria and yellow fever) were not likely to be connected with economic development because the indigenous populations had developed immunities against these diseases.

In another application, election-day rainfall was used as an instrument for turnout to estimate its effects on electoral outcomes in the United States (Hansford and Gomez, 2010). Indeed, many studies have suggested that higher turnout is beneficial to leftist parties (or Democrats in the United States), but the problem is that many factors are likely to influence both the decision to vote and the vote itself at the same time. By contrast, the weather on election day is likely to affect the choice to go to the polling booth, but not the preference expressed in the vote.[3] Moreover, rainfall on a specific day can probably be considered an ‘as if at random’ event. The analysis was able to confirm that higher turnout does indeed cause a higher vote share for Democratic candidates.

Finally, in a study already discussed in Chapter 3, Kern and Hainmueller (2009) studied the effects of West German television on public support for the East German communist regime, using a survey of East German teenagers. The survey included information for both the dependent (regime support) and treatment (exposure to West German television) variables. Because it is highly likely that people who watched a lot of West German programmes had different predispositions towards the communist regime in the first place, the treatment cannot be considered ‘as if at random’. However, while West German television reception was generally possible in East Germany, it was blocked in some regions (especially near Dresden) because of their topography. As long as living in Dresden per se was not directly related to regime support and that region was generally comparable with the rest of the country, living in Dresden can be used as an instrument for television exposure. The analysis showed that, quite counter-intuitively, West German television caused greater support for the East German regime, possibly because East German citizens consumed it primarily for entertainment and not as a source of information.

Like the other approaches, instrumental variables come with their own set of problems (Sovey and Green, 2010). In fact, the list of potential issues is even longer because, in addition to the need to find a suitable ‘quasi-experiment’, the instrument must fit within the model that is used in the estimation in a very specific way. Also, the results must be interpreted carefully because the estimated causal effects apply to a particular subset of units and are known as ‘local average treatment effects’. In sum, if the right conditions are fulfilled, instrumental variables are a valuable tool, but in practice their application is quite tricky.

Lessons for Research Design

If we take the statistical approach to causal inference seriously, the consequences for research design are wide-ranging. The main lesson is that the design is the most important part of the research, because it is at this stage that the prospects for credibly identifying causal effects are determined. In fact, in the ideal-typical case of a ‘perfect’ research design, that is, an experiment that is designed and implemented flawlessly, the analysis stage becomes almost trivial because it suffices to compare mean outcomes in the treatment and control groups. The sophistication of the methods used in the analysis must increase with imperfections in the research design in order to correct them ex post.

To illustrate, consider again the example of women’s quotas and female representation in parliament (Tripp and Kang, 2008). The research design adopted by the authors, which is typical of cross-national quantitative studies, was simply to collect data on as many countries as possible for the dependent variable (percentage of women in parliament), treatment variable (quotas), and control variables (countries’ background characteristics). Here the design stage ends and the analysis begins, and the analysis, to produce credible causal estimates, needs to fix the basic problem that countries with and countries without quotas are not really comparable.
As discussed above, standard regression tools and newer matching methods can help, but only up to a point. The fundamental problem is that they can adjust for the factors that we do observe, but not for those that we do not, which are virtually always an issue. Thus, ex post fixes are bound to be imperfect.

By contrast, the statistical approach to causal inference aims to fix things ex ante, by constructing or finding suitable treatment and control groups in advance of the analysis. As we have seen, this goal can be achieved with different means. First, we can design our own experiments in the lab or in the field, or base them on surveys. That is, the treatment can be randomized by the researcher in an artificial setting, in the real world, or via the questions asked in a survey. Second, we can try to find constellations in which randomization is approximated without the direct intervention of the researcher. Natural experiments, discontinuity designs, and suitable instrumental variables are three options.

In all these cases, the most traction for causal inference is gained through the way the comparison between treatment and control groups is configured, not through the specific techniques used to analyse the data. The key benefit is that, if randomization is implemented properly or is approximated sufficiently well in a real-world setting, it produces groups that are comparable not only in their observed but also in their unobserved characteristics. This is a major advantage for the validity of causal inferences.
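A toy simulation can make this concrete. The sketch below is ours (all numbers invented): a coin-flip assignment balances even a characteristic the researcher never measures, while self-selection does not:

```python
# Why randomization helps: it balances even unobserved characteristics.
# Purely illustrative simulation; all numbers invented.
import numpy as np

rng = np.random.default_rng(2)
n = 2000
unobserved = rng.normal(50, 10, n)   # a trait the researcher never measures

# Self-selection: units with a high value of the trait tend to opt in.
chose_treatment = unobserved + rng.normal(0, 10, n) > 55
print("self-selected gap:",
      round(unobserved[chose_treatment].mean()
            - unobserved[~chose_treatment].mean(), 1))   # large imbalance

# Randomization: a coin flip decides who is treated.
coin_flip = rng.random(n) < 0.5
print("randomized gap:  ",
      round(unobserved[coin_flip].mean()
            - unobserved[~coin_flip].mean(), 1))         # close to zero
```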
Thus, the quality of the research design is of the essence. The exacting requirement of a plausible ‘as if at random’ assumption implies that downloading prepackaged data sets and letting the computer do the counting is not enough, no matter how sophisticated the techniques. More creative solutions are required, and few will involve broad cross-national comparisons, for the simple reason that broad international comparisons are likely to be, well, incomparable. In fact, none of the examples discussed in this chapter compared countries. Instead, they focused on specific within-country variations and used original data, often assembled with great effort. Unfortunately, there are no clear guidelines for identifying promising comparisons. The criteria that the research design must meet are clear, but discovering the right configuration in practice is an art more than a science.

We emphasize that, in many ways, statistical research designs for causal inference transcend the usual qualitative-quantitative distinctions. Obviously, they have strong quantitative components because they rely on statistical techniques to estimate causal effects. However, they also require significant qualitative work and substantive knowledge to identify the most promising cases, to collect hard-to-access data through archival work or other qualitative procedures, and generally to construct a meaningful study. In some cases, such as field experiments, researchers are actually involved in fieldwork comparable to that of many traditional qualitative studies. Thus, these research designs do not fit well within a simple quantitative-qualitative typology. The limits of such distinctions are a general theme of this book.

As with all approaches, statistical research designs for causal inference face trade-offs. The most important is that between validity and relevance. A common criticism of this approach is that it leads to a focus on small, tractable questions at the expense of big problems that are harder to study. It is undeniable that research in this tradition prioritizes internal over external validity. At the same time, the former is arguably a prerequisite for the latter: it does not make much sense to generalize findings that are not credible. Moreover, as Angrist and Pischke (2010) argue, external validity, or generalization, remains an important goal that can be achieved through the cumulation of well-designed but necessarily narrow studies. Finally, the examples discussed in this chapter studied problems such as the political salience of ethnicity, attitudes towards immigration, the consequences of direct democracy in comparison to representation, and foreign influences on support for autocratic rule. These are all ‘big’ questions and, even though each study individually did not provide definitive answers, they did supply convincing evidence on the causal effects in a specific setting. Other studies should try to replicate them in other contexts. If they are successful, then the external validity and generalizability of the findings will be strengthened.

Conclusion

Figure 4.2 summarizes the main points of this chapter. We can classify statistical research designs for causal inference along two dimensions. First, is the treatment assigned randomly and, if so, how? Second, to what extent are the treated and control units comparable?

[Figure 4.2: A classification of statistical research designs for causal inference. Matching and regression are in parentheses because, strictly speaking, they are estimation techniques and not research designs.]

In the standard regression approach, supplemented or not by matching, there is no randomization and, typically, self-selection into the treatment. For instance, the same variables that explain why countries adopt women’s quotas (the treatment) are likely to influence female representation in parliament (the outcome). The problem is bigger if these variables are not included in the analysis (bivariate regression) than if they are (multivariate regression), and matching can mitigate the problem further. However, there is no way around the fact that the adjustment can be made only for those variables that can be observed, not for those that are unobserved. Therefore, the comparability of the treatment and control groups (countries with and countries without quotas) and, consequently, the validity of causal inferences will be relatively limited.
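A short simulation can illustrate this point. The sketch below is ours: the data are invented and only loosely echo the quota regressions reported in notes 1 and 2 below, but they show how an omitted confounder inflates the bivariate estimate:

```python
# Omitted-variable bias: bivariate versus multivariate regression.
# Simulated 'countries'; numbers invented, loosely echoing notes 1-2 below.
import numpy as np

rng = np.random.default_rng(3)
n = 150
pr_system = (rng.random(n) < 0.5).astype(float)   # proportional representation
# Countries with PR systems are more likely to adopt quotas (self-selection).
quotas = (rng.random(n) < np.where(pr_system == 1, 0.6, 0.2)).astype(float)
pct_women = 13.0 + 3.0 * quotas + 6.0 * pr_system + rng.normal(0, 4, n)

def ols(y, *regressors):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

print("bivariate quota effect:   ", round(ols(pct_women, quotas)[1], 1))
print("multivariate quota effect:", round(ols(pct_women, quotas, pr_system)[1], 1))
# Controlling for the electoral system moves the estimate towards the true
# value of 3.0, but any confounder we cannot observe stays uncorrected.
```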
By contrast, in experiments the treatment is randomized by the researchers themselves and, in principle, the treatment and control units will be highly comparable. Experiments can take place in the lab, in the field, and within surveys. Quasi-experiments can credibly make the assumption that the treatment is assigned ‘as if at random’ because of a particular process occurring in the real world, without the researcher’s intervention. The comparability of the treatment and control groups will in principle be quite high, significantly better than in the standard regression approach, but somewhat worse than in experiments. The validity of the causal inferences will vary accordingly.

In this context, an important trade-off is that between the complexity or realism of the research question and the reliability of the causal estimates. To achieve the latter, statistical research designs narrow down complex theoretical and/or empirical questions to smaller, tractable questions. These research designs can produce valid estimates of causal relationships, but many different analyses are necessary to give the full picture of a complex phenomenon. By contrast, other research designs discussed in this book put the emphasis on a holistic view of causal processes, but at the cost of validity.

To conclude, the statistical approach emphasizes the importance of research design for valid causal inferences. The primary concern is the construction of comparable treatment and control groups. This will be difficult with standard cross-national data sets. Instead, researchers should produce their own experiments or look for configurations in the real world that can approximate them, which requires considerable qualitative knowledge and not just the mastery of quantitative techniques.

Checklist

• The key to causal inference is the construction or identification of appropriate treatment and control groups.
• Random assignment of the treatment to the units (‘randomization’) is the gold standard for causal inference because it is the best way to make sure that the treatment and control groups are comparable.
• We speak of experiments when researchers themselves undertake the randomization. We can distinguish between laboratory, survey, and field experiments.
• We speak of quasi-experiments when randomization is approximated due to circumstances outside the researchers’ control. Natural experiments and discontinuity designs belong to this category.
• A successful experiment or quasi-experiment requires not just the application of quantitative techniques, but also significant qualitative knowledge.

Questions

1. Read closely five articles making causal arguments in your field of study. To what extent do they correspond to a ‘causes-of-effects’ or ‘effects-of-causes’ perspective?
2. For each of the five articles, reframe the causal claims using the potential-outcomes framework and construct the equivalent of Table 4.1.
3. Read five articles making causal arguments using standard regression methods. To what extent can the findings actually be interpreted causally?
4. Think of a specific research question. What would be the ideal experiment to test the causal argument? Now try to develop a research design that can approximate it as much as possible in practice.
5. Read closely five of the articles cited as examples in this chapter (or other articles of your choice) and assess them with respect to the trade-off between the validity of the causal inference and the relevance or importance of the findings.

Notes

1. % women = 13.18 (1.03) + 6.02 (1.53) × quotas. OLS estimates, standard errors in parentheses.
2. % women = −1.67 (5.68) + 3.2 (1.55) × quotas + 6.02 × electoral system + 0.11 (1.16) × democracy + 0.11 (0.14) × women’s education + 1.18 (0.59) × GDP/cap (log). OLS estimates, standard errors in parentheses.
3. But recall the Italian expression ‘Piove, governo ladro’.

Further Reading
Angrist, J.D. and Pischke, J. (2009) Mostly Harmless Econometrics: An Empiricist's Companion. Princeton, NJ: Princeton University Press. A relatively non-technical introductory text written by economists.

Angrist, J.D. and Pischke, J. (2010) The credibility revolution in empirical economics: how better research design is taking the con out of econometrics. Journal of Economic Perspectives, 24 (2): 3–30. A non-technical summary of the book by the same authors. http://dx.doi.org/10.1257/jep.24.2.3

Morgan, S.L. and Winship, C. (2007) Counterfactuals and Causal Inference: Methods and Principles for Social Research. Cambridge: Cambridge University Press. A relatively technical introductory text written by sociologists. http://dx.doi.org/10.1017/CBO9780511804564

Morton, R.B. and Williams, K.C. (2010) Experimental Political Science and the Study of Causality: From Nature to the Lab. Cambridge: Cambridge University Press. A relatively technical introductory text written by political scientists. http://dx.doi.org/10.1017/CBO9780511762888

Organizational research: Determining appropriate sample size in survey research. Barlett, James E.; Kotrlik, Joe W.; Higgins, Chadwick C. Information Technology, Learning, and Performance Journal, Spring 2001, 19(1): 43.

Explanation & Answer

Attached.

Running head: QUANTITATIVE RESEARCH DESIGN

Quantitative Research Design
Student Name
Institutional Affiliation


Quantitative Research Design
The main aim of research is to improve the quality of life by finding solutions to existing
problems. Quantitative research is conducted in order to determine the relationship between one
(dependent variable) thing and another (independent variable) within a population. Quantitative
research designs are either experimental or descriptive. In a descriptive research the study
identifies the association between variables whereas in experimental only the causality is
established (Martinno, Fabrizio & Claudio, 2015). In order to conduct a quantitative research
project, one has to formulate research questions and the purpose of study. In addition, at the end
of the study ...

