Data Analysis

User Generated

Fg07

Humanities

Description

This is just a discussion. Not an essay.


1 Quantitative Problem Solving

Chapter Learning Objectives

After reading this chapter, you should be able to do the following:

1. Distinguish among data of nominal, ordinal, interval, and ratio scale.
2. Associate the measures of central tendency with the scale of the data for which those measures are appropriate descriptive statistics.
3. Calculate the mean, median, and mode for a set of data.
4. Calculate and interpret the variance and the standard deviation for a set of data.
5. Distinguish between the characteristics and the notation used for sample and population data.
6. Describe guidelines for reporting statistical data.

© 2016 Bridgepoint Education, Inc. All rights reserved. Not for resale or redistribution. tan82773_01_ch01_001-028.indd 1 3/3/16 10:02 AM

Introduction

People approach thinking in different ways (Witkin, Moore, Goodenough, & Cox, 1977). Field-dependent learners process information as a whole. These people tend to view complex problems or tasks in their entirety and usually resist taking them apart; they shy away from analytical tasks. People with this type of cognitive style are sometimes referred to as “global thinkers.” In contrast, field-independent learners are inclined to break complex problems into pieces. They understand the whole by reducing it to its elements. People with this type of cognitive style are sometimes called “analytical thinkers.”

What does this theory about cognitive-style preferences have to do with statistical analysis? Google the phrase “statistical analysis,” and the common element among the various definitions will be that it involves gathering quantitative data so that the whole can be understood by analyzing a part. With its concern for measurement, detail, and mathematical precision, statistical analysis sounds like work that favors field-independent people.
However, the field-dependent person’s approach is also an asset, for example, when an instructor needs a sense of whether a group of psychology students understands the distinction between short- and long-term memory.

So where do the demands of statistical analysis leave students for whom an analytical approach is not second nature? For these people, data analysis can seem rather foreign. This book aims to help everyone who studies the behavioral sciences tackle quantitative problems, especially students who do not naturally gravitate to data analysis. If analytical tasks happen to be the reader’s preference, all well and good; the rest of us also need to navigate our way through statistical problems. The author’s intent is to appeal to both analytical and global thinkers.

1.1 How Much Math Will You Need?

For researchers, statistical analysis is not the end goal; it is a tool. The concepts and skills involved allow us to interpret the numerical measures we encounter and to answer the questions people ask about human behavior. We are not primarily interested in statistical theory. Accordingly, the text explains what the formulas mean, and in some cases why they take the form they do, but not how they were derived. With such a practical approach, the math readers learned in secondary school should be adequate. Ordinarily, the text offers nothing more to worry about than order-of-operations questions. Brushing away the cobwebs over whether to multiply or add first when both are required, and reviewing how to deal with parentheses and exponents when they occur in a formula, will be sufficient for the calculations and explanations here.
If these topics are not familiar, review “please excuse my dear Aunt Sally,” or whatever memory aid you used in high school to remind you of the order: first, do what’s in the parentheses; then deal with any exponents (any squaring, for example); then do any multiplication and division, working from left to right; and last, do any addition and subtraction, also from left to right.

In some contemporary statistics classes, students do very little mathematical calculation. Computer analysis is so easy and so readily available that many of those involved in quantitative analysis say the manual calculations that were once mainstays in statistics classes are no longer necessary. Dedicated statistical packages are widely accessible, and spreadsheet programs such as Microsoft Excel can perform all of the basic calculations, as well as many of the simpler statistical tests. With all of these resources, why do anything by hand?

[Photo: Although calculators and software such as Excel make statistical calculations easy, doing some of these calculations by hand helps with understanding the underlying concepts.]

Those who take the time to learn and complete the calculations by hand come to understand the underlying concepts more clearly than those who have the computer do all the work. Computer output provides a result to view, and with proper input, accuracy and precision are usually not a problem, but the tables and statistics often do not communicate the logic involved very well. Also, software usually provides little guidance about what the output means or how and why the results were derived. Hand calculation requires that a researcher consider each step in the solution. Taken incrementally, the process makes the reasoning more coherent.
Each of the 10 chapters in this textbook has been written so that readers can explain what they have done and why. Once the logic and the processes are familiar, we can use Excel to verify our calculations and to save time, particularly with larger data sets.

1.2 What About the Notation Symbols?

Like any discipline, statistical analysis has symbols and language particular to it. As in many math-based subjects, some of these symbols, several of them from the Greek alphabet, indicate procedures or values that are used frequently. Try not to be distracted by them; they often stand for procedures or values the reader has already used many times. For example, the upper-case Greek letter ∑ (sigma) symbolizes summation: it indicates that several values are to be added together. Further, ∑x means that several values, each of them referred to as “x,” are to be summed. If each of four subjects in a group takes a verbal ability test, with scores of 23, 35, 36, and 42, x refers to the individual scores, and $\sum x = 23 + 35 + 36 + 42 = 136$.

Sometimes whether a Greek or a traditional roman letter is used indicates whether a value describes a population or a sample. A population is all members of a defined group. All the members of your family constitute a population, as do all Nevada voters or all psychologists in the country. Any subset of a population is called a sample. The average, or mean, of some population characteristic (the mean age of Nevada voters, for example, or the mean number of years of education among psychologists) is indicated by the lowercase Greek letter µ (pronounced “mew,” like the cat, not “moo,” like the cow).
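Summation notation maps directly onto an ordinary sum. The short Python sketch below (Python is an assumption here; the text itself relies on hand calculation and Excel) reproduces ∑x for the four verbal ability scores. Purely for illustration, it also treats those four subjects as a complete population and divides by the count to obtain µ.

```python
# Verbal ability scores for the four subjects in the example
scores = [23, 35, 36, 42]

# ∑x: add together every value referred to as "x"
sum_x = sum(scores)

# Illustration only: if these four subjects were the entire group of interest
# (a population), their mean would be µ = ∑x / N
N = len(scores)
mu = sum_x / N

print(f"∑x = {sum_x}")  # ∑x = 136
print(f"µ = {mu}")      # µ = 34.0
```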
This book follows the convention of the journals of the American Psychological Association and indicates the mean of a sample with the upper-case roman M.

[Cartoon: Although the idea of an average score is not a new concept, the symbol µ may be. The symbol is just a shorthand way to represent the average mathematically, or what we will call from this point forward the mean.]

If the ages of the members of your family are 14, 21, 45, and 47, and if those ages constitute the entire group, then

$$\mu = \frac{14 + 21 + 45 + 47}{4} = \frac{127}{4} = 31.750$$

People working with statistics find the symbols very helpful. They provide an economical way to communicate procedures and values that are used repeatedly, as many are in statistical analysis. None of the symbols we use represent concepts or operations that are very complicated; they are just a briefer way to refer to common operations or characteristics.

Try It!: #1
If a population is all members of a defined group, and your class has seven students, can its population really be only 7?

The point of this discussion is that statistical analysis need not be the exclusive domain of those with an analytical style. Global thinkers can use it as well. The math in this book will be entirely manageable, and the notation used in the formulas is there for efficiency, not mystery.

1.3 Why Do We Use Statistics?

Many of the problems with which behavioral science professionals grapple are complex and nuanced because the people we study are complex and nuanced. From one point of view, behavioral scientists have a much more difficult job when it comes to research than, say, chemists have. If two hydrogen atoms combine with one oxygen atom, the result is always H₂O, water. Human behavior usually does not manifest such consistency.
In a study of childhood obesity, Hewer (2014) reported that, to understand children’s food choices, researchers included brain-scan data they had gathered, the number of children who selected healthy food, the number of children who ate what they took, and other measures they thought were related to childhood obesity. These different measures were all necessary because of subject-to-subject variation. Statistical analysis must accommodate variability. And since researchers cannot analyze what they cannot describe, they begin with descriptive statistics.

Descriptive statistics are an economical way to summarize many measures. Note that descriptive statistics are generalizations. For example, in the obesity study (Hewer, 2014), researchers found that when they labeled carrots as “x-ray vision carrots,” 66% of the students in five schools consumed what they took. That does not mean that 66% of the students in each school ate their carrots; the percentage is a generalization over the five schools. In some schools the result was likely more than 66%, while in others it was likely less.

Meanwhile, Kiecolt-Glaser and others (2011) studied a group of older adults who were providing care for a family member with dementia. They found that the caregivers had a higher probability of disease late in life than similar adults who were not caring for such a family member. Their generalization was that higher-stress activities result in a greater propensity for disease later in life. Studies such as this one rely on inferential statistics: the researcher infers from a portion, or sample, of the whole what is likely to occur in the entire group, or population, of similar people.
In the earlier study by Hewer (2014), the objective was to understand how most children select food based on the behavior of a sample of such children; researchers were inferring population characteristics (what all children will do) from what a sample of children did.

Sometimes the inference is from a sample to a population not specifically represented in the sample. For many years, it has been common in psychology to study one species and generalize to another, a practice that continues. Recently, Robinson (2014) reported studying rats to understand substance-abuse relapse in humans. Using a sample as a window on a population requires, of course, that the sample be representative of the population, at least in the relevant characteristic(s). Robinson’s (2014) burden is to demonstrate that, at least in the instance of relapse, what happens to laboratory rats can inform what occurs with people, even though two different species are involved. When the appropriate conventions are observed, samples can provide great insight into a population without the need to study the entire population, an important economy since studying the entire population is often impractical or even impossible.

Although inferential statistical analysis drives most psychological research, inferential procedures always involve descriptive statistics, and our initial interest is in these measures. We want to learn to calculate and interpret them. After we have developed some facility with descriptive procedures, the discussion will turn to inferential statistical analysis.

1.4 Describing Data

There are many ways to describe data, including the exclusively verbal descriptions that ethnographers and anthropologists sometimes use. However, quantitative research and the statistical analysis upon which it is based require numeric descriptions. Numerical measures differ in kind; one way to classify them is in terms of scale. Data scale refers to the type and amount of information that each measure provides, and data can be categorized into four scales:

1. nominal
2. ordinal
3. interval
4. ratio

Nominal Data

The word nominal refers to the name assigned to the data. Nominal data are also sometimes called categorical data because the name indicates the category to which an individual belongs; the only analysis that can be performed is to count the number of cases in each category. In an effort to analyze the demographic makeup of those involved in an alcoholics support group, for example, a researcher might gather data on gender, race, ethnicity, marital status, or religious affiliation by assigning numbers to each of the data categories. For ethnic group membership, perhaps 1 indicates African-American, 2 designates Asian, 3 Caucasian, 4 Hispanic, and so on. The numbers simplify summarizing the number of participants, but beyond indicating the category to which an individual belongs, the numbers themselves have no mathematical meaning. It would not make sense to try to compute a “mean” ethnicity, for example, by adding up the 1s, 2s, 3s, and 4s and then dividing by the number of participants. The numbers, in this case, are just labels.

Ordinal Data

Ordinal data allow a researcher to rank individuals relative to the others in the group. Ordinal data provide more information than nominal data, enough in fact that individuals can be ranked according to whatever quality is being measured. Higher values indicate more of the measured characteristic. The limitation of this scale is that ordinal measures do not indicate how much more of whatever is measured one individual possesses than another.
In contrast, when comparing nominal data values, a higher (or lower) number simply means that the individual belongs to a different group. Noting that one individual is faster, more creative, or more successful than another is an example of an ordinal measure. The measure is not very precise, but it does provide enough information for a ranking. A student whose intelligence score places her at the 90th percentile has a higher intelligence score than someone at the 85th percentile, but the size of the difference in intelligence is not clear. First prize at a car show is better than second but does not indicate the margin of victory.

Interval Data

[Photo: Fahrenheit temperature measurements are an example of interval scale data.]

With interval data, the size of the gap, or the interval (thus the name), between individual cases is known because the difference between consecutive data points is constant anywhere along the number line. That rather abstract definition just means that the difference between 4 and 6 on an interval scale is the same amount of difference as that between 17 and 19. Perhaps a psychologist develops a measure of verbal aptitude. If aptitude scores are based on the number of items the respondent answers correctly, and if each item carries the same weight, the increase in aptitude from 12 correct responses to 17 is the same as the increase from 16 to 21.

Interval scale measurement is quite common in the social sciences. Many of the mental measurement instruments psychologists use to gauge anger, depression, aptitude, intelligence, and so on produce interval scale data. However, many others produce ordinal scale data.
Though researchers sometimes argue that ordinal scale data with a large number of categories can be treated as interval scale data, adding ordinal categories does not magically transform ordinal data into actual interval data. It is not possible to measure mental states directly, for example, nor is it possible to demonstrate that the intervals between consecutive ordinal scale values are truly equal. To state that these types of instruments actually produce interval data instead of ordinal data may be wishful thinking.

Ratio Data

Ratio data have all the characteristics of interval data plus two more. First, a 0 in ratio measure indicates the absence of the characteristic being measured. With interval scale data, 0 does not mean that none of the characteristic is present; it is just a point on the scale midway between −1 and +1. A temperature of 0 degrees Fahrenheit does not mean that it cannot become colder; 0 is just a point on the scale. (The Kelvin scale is a different matter, of course; there, 0 is the coldest possible temperature.) In the aptitude example above, if someone missed every item on the measure, it is probably not wise to conclude that the individual has zero aptitude for whatever is gauged. Likewise, we would not argue that someone who misses every item on a spelling test cannot spell at all, even though the individual answered every item incorrectly. In terms of the construct measured (spelling or aptitude), neither test provides ratio scale data.

The other quality that sets ratio data apart has to do with the name. With these data, what are called “ratio comparisons” are possible. A person who is 6 feet tall can accurately be said to be twice as tall as a child who is 3 feet tall, a 20-year-old is half as old as someone who is 40, and someone who makes $90,000 a year has an income three times that of a person who makes $30,000. Comparisons like these make sense only with ratio data.
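The temperature example shows why ratio comparisons need a true zero. In this illustrative Python sketch (the two temperatures and the helper name `fahrenheit_to_kelvin` are hypothetical choices, not from the text), 80°F looks like “twice” 40°F on the Fahrenheit (interval) scale, but converting both readings to the Kelvin (ratio) scale, where 0 means no thermal energy, shows the real ratio is only about 1.08.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins; the Kelvin scale has a true zero."""
    return (f - 32) * 5 / 9 + 273.15


low_f, high_f = 40, 80

# Naive ratio on the interval (Fahrenheit) scale: looks like "twice as hot"
naive_ratio = high_f / low_f
print(naive_ratio)  # 2.0

# The same two temperatures on a ratio (Kelvin) scale
ratio_k = fahrenheit_to_kelvin(high_f) / fahrenheit_to_kelvin(low_f)
print(round(ratio_k, 3))  # roughly 1.08
```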
Except for physical and demographic characteristics such as height, weight, age, and income, ratio scale measurement is very uncommon in psychology and, in fact, in all the social sciences. Mental measurements are rarely, perhaps never, ratio scale.

Identifying the scale of the data is important because, to some extent, the statistics that can be calculated to describe the data depend on their scale. Likewise, some of the statistical tests this text will calculate later are tied to data scale. Tests that require interval data will also accommodate ratio data, so the distinction between those two is not important in terms of which test to use. The differences among nominal, ordinal, and interval data matter a great deal, however.

Try It!: #2
a. Someone comments about a student, “He’s more ambitious than his classmates.” What scale of data is involved in such a statement?
b. What scale of data is in your bank statement?

1.5 Descriptive Statistics

As we suggested earlier, descriptive statistics are measures that summarize larger groups of data. They make it possible to describe a data set without reciting each of the individual measures. There are several descriptive statistics, but the most common are measures of central tendency and measures of variability. The latter are also sometimes called measures of dispersion. First, we will look at measures of central tendency.

Measures of Central Tendency

Three measures of central tendency are commonly used in data description: the mode, the median, and the mean. Although each indicates what is most typical in a data set, each relies on a different definition of “typical.” As a result, these three statistics complement each other.

The Mode

The mode (Mo) is defined as the most frequently occurring value in a group.
It is the only statistic for indicating central tendency that is appropriate when the data are nominal scale. Perhaps someone is interested in the residential background of 20 military personnel being treated for post-traumatic stress disorder (PTSD). If they indicate their predeployment residences on a questionnaire, and if 1 indicates urban, 2 suburban, 3 semi-rural, and 4 rural, the results might be as follows:

1, 2, 2, 1, 1, 3, 4, 1, 4, 3, 2, 1, 1, 2, 1, 3, 2, 1, 1, 2

Remember that the numbers indicate just where they lived prior to deployment. If those in each category are counted, the result is:

1s: 9
2s: 6
3s: 3
4s: 2

The mode, the most commonly occurring value, is 1 (Mo = 1), which indicates that in this group of 20 military personnel being treated for PTSD, more come from urban backgrounds than from any of the alternatives.

The mode is often calculated for ordinal, interval, and ratio data as well, where it also indicates the most frequently occurring value. In such cases, it is usually accompanied by some other measure(s) of central tendency. However, for nominal data, the mode is the only measure of central tendency that makes sense.

In some instances, there can be more than one mode. In the PTSD example above, had three more suburban service personnel been diagnosed with the condition, the results would include two most frequently occurring values (nine 1s and nine 2s). Such data have a bimodal distribution.

The Median

When scores are arranged either from largest to smallest or from smallest to largest, the median (Mdn) is the middlemost score. Another way to describe it is that the median is the point on a scale where an equal number of scores have greater and lesser values. The median requires data of at least ordinal scale.
Since interval and ratio data can also be ranked, the median can be calculated for any data beyond nominal scale. The median is not calculated for nominal data because there the numbers are only category labels, and the “middle” would be an arbitrary value based entirely on the numbers used as labels.

Suppose all freshman students at a university are ranked in order of their academic performance at the end of the year. These are the class rankings for nine students in the department of psychology:

3, 7, 13, 15, 17, 33, 36, 42, 51

The median is the middle ranking. Among nine rankings, the middle ranking is the fifth, which for these class rankings is 17, so the Mdn ranking for psychology freshmen is 17. Note that this data set has no mode: all the rankings occur with the same frequency, 1.

If 10 psychology students were ranked, and the 10th had a class ranking of 52, the results would be:

3, 7, 13, 15, 17, 33, 36, 42, 51, 52

In this case, with an even number of scores, two scores occur in the middle of the distribution, and the median is the mean of those two middle scores. The fifth and sixth scores are 17 and 33:

$$\text{Mdn} = \frac{17 + 33}{2} = \frac{50}{2} = 25$$

With the additional student’s class ranking added, the Mdn ranking for psychology freshmen changes to 25. Note that the change in the median that results from adding the ranking of 52 is substantial, not because of the magnitude of the 52, but because the gap from 17 to 33 is so large. Had the next ranking after 17 been 18, for example, adding the ranking of 52 to the data set would have made Mdn = 17.5. The variability in the data set greatly affected this measure of central tendency.

The Mean

The mean (M) is the average of a set of values; hereafter the text will use the term mean rather than average. To calculate the mean, the data must be interval or ratio scale. Suppose a data analyst is interested in gauging the level of depression among 10 social workers. A psychologist administers a depression scale to the 10 and finds the following results:

3, 4, 4, 5, 5, 6, 6, 7, 7, 8

Calculating a mean might not be new, but the statistical symbols might be, so note Formula 1.1:

Formula 1.1
$$M = \frac{\sum x}{n}$$

where
M = the mean
x = each value in the set (in this case, each depression score)
∑x = the sum of the individual depression scores
n = the number of values or scores

For the depression data, verify that ∑x = 55 and n = 10, so

$$M = \frac{\sum x}{n} = \frac{55}{10} = 5.5$$

M = 5.5 indicates that the mean level of depression is 5.5. Without something to which the mean can be compared, the number by itself is not very revealing. It is not clear whether 5.5 is the mean level of depression for all people, or all college-educated people, or all social workers. All we know is that for this group, 5.5 is the arithmetic mean.

Although there have been variations in the past, most of the major journals in psychology and many other disciplines currently use M to indicate the mean of a sample. (Recall from earlier in the chapter that µ indicates the mean of a population.) This text will follow the convention of using M for the sample mean.

The median can also be calculated for interval data. With an even number of scores, as in the depression-score sample, the median is the midpoint between the two middle scores, 5 and 6. For the depression scores, Mdn = 5.5.

[Photo: A psychologist administering a depression scale can use the median and mode to understand what is typical of several depression scores.]

Calculating the mode for the depression scores shows that four values occur with the same frequency, so there are four modes: Mo = 4, 5, 6, and 7.
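The three measures of central tendency worked through above can be checked with Python’s standard `statistics` module. This is a sketch for verifying hand calculations, not part of the original text; it assumes Python 3.8 or later (for `multimode`), and the variable names are illustrative.

```python
from collections import Counter
from statistics import mean, median, multimode

# Nominal data: PTSD residence codes (1 = urban, 2 = suburban, 3 = semi-rural, 4 = rural).
# The codes are only labels, so counting categories and finding the mode is all that makes sense.
residences = [1, 2, 2, 1, 1, 3, 4, 1, 4, 3, 2, 1, 1, 2, 1, 3, 2, 1, 1, 2]
counts = Counter(residences)
print(sorted(counts.items()))  # [(1, 9), (2, 6), (3, 3), (4, 2)]
print(multimode(residences))   # [1], so Mo = 1

# Ordinal data: class rankings of psychology freshmen
rankings_9 = [3, 7, 13, 15, 17, 33, 36, 42, 51]
rankings_10 = rankings_9 + [52]
rankings_alt = [3, 7, 13, 15, 17, 18, 36, 42, 51, 52]  # next ranking after 17 is 18
print(median(rankings_9))    # 17 (the fifth of nine rankings)
print(median(rankings_10))   # 25.0 (mean of 17 and 33)
print(median(rankings_alt))  # 17.5 (mean of 17 and 18)

# Interval data: depression scores for the 10 social workers
depression = [3, 4, 4, 5, 5, 6, 6, 7, 7, 8]
print(mean(depression))       # 5.5
print(median(depression))     # 5.5
print(multimode(depression))  # [4, 5, 6, 7]: four modes
```

Note that `median` averages the two middle values automatically when the number of scores is even, matching the hand procedure.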
In small data sets, the mode is usually not very revealing and often not calculated at all, but the most frequent score or measurement (the mode) can be very informative with larger groups.

The fact that the mean and median have the same value (M = 5.5, Mdn = 5.5) will be particularly relevant to the discussion of normality in Chapter 2. For now, just note that the mean and the median have the same value and that, for our set of depression scores at least, the mode is not helpful.

Measures of Variability

Measures of central tendency are easier to understand when accompanied by measures of variability, and the two kinds of descriptive measures often go hand in hand. As the name suggests, variability measures indicate how much scores differ from each other. Generally speaking, large values indicate substantial variability. Relatively small variability values indicate that the data are quite similar, or homogeneous. For example, among university students who have all met the entrance requirements of a prestigious school, there will be less variability in scholastic aptitude than among people of the same age selected randomly from the general population.

The Range

When a researcher needs a quick measure of data variability, the range (R) is the easiest to calculate. It is the difference between the highest and lowest values in a data set. For the depression data 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, $R = 8 - 3 = 5$.

It is common to hear people say, “Scores ranged from ___ to ___,” but technically the range is just one value: the difference between the highest and lowest measures. For that reason, R is not very informative when reported in isolation from other statistics. The depression data’s R = 5 does not reveal much.
A research report in which the authors state only that R = 5 does not indicate how many values were involved or what the highest and lowest values were. For example, if two clerical workers from the same state office as the social workers are also measured on the depression scale and have scores of 9 and 14, then the range for those two scores is also R = 5. The clerical workers’ scores are quite different from the social workers’ scores in both number and value, but the range reflects neither difference.

The Variance and the Standard Deviation

While the range indicates how much difference there is between the high and low scores, other measures of variability are based on how much the individual scores in a data set vary from the data mean, M. The variance (s²) and the standard deviation (s) are both measures of this kind, anchored to the mean of the group. For either statistic, larger values (and at this point it is difficult to know what “large” is) indicate that individual values in the group tend to lie farther from the group mean (M). As the s and s² notations suggest, the two statistics’ formulas (Formulas 1.2 and 1.3 below) are very similar. Note that although we will initially distinguish between sample and population standard deviations, this text will use standard deviation as shorthand for sample standard deviation, since it is very rare that anyone has access to population data.

Formula 1.2
The formula for the variance (s²) is expressed as:
$$s^2 = \frac{\sum (x - M)^2}{n - 1}$$

Formula 1.3
The standard deviation (s) is expressed as:
$$s = \sqrt{\frac{\sum (x - M)^2}{n - 1}}$$

In either formula, ∑ = summation, x = each score in the group, M = the mean of the group, and n = the number of scores in the sample.
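Formulas 1.2 and 1.3, together with the range, translate into a few lines of Python. The sketch below is an illustration under assumptions: the function names `value_range`, `sample_variance`, and `sample_sd` are hypothetical helpers, and the cross-check relies on Python’s `statistics.variance` and `statistics.stdev`, which also divide by n − 1 (the sample versions).

```python
import math
import statistics


def value_range(xs):
    """Range R: highest value minus lowest value (hypothetical helper name)."""
    return max(xs) - min(xs)


def sample_variance(xs):
    """Formula 1.2: s² = ∑(x − M)² / (n − 1)."""
    n = len(xs)
    m = sum(xs) / n                     # M, the sample mean
    ss = sum((x - m) ** 2 for x in xs)  # ∑(x − M)², the sum of squared differences
    return ss / (n - 1)


def sample_sd(xs):
    """Formula 1.3: s is the square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


depression = [3, 4, 4, 5, 5, 6, 6, 7, 7, 8]  # the 10 social workers
clerical = [9, 14]                           # the two clerical workers

print(value_range(depression), value_range(clerical))  # 5 5
print(sample_variance(depression))                     # 2.5
print(round(sample_sd(depression), 3))                 # 1.581

# Cross-check: statistics.variance and statistics.stdev also divide by n − 1
print(math.isclose(sample_variance(depression), statistics.variance(depression)))  # True
print(math.isclose(sample_sd(depression), statistics.stdev(depression)))          # True
```

Both groups share R = 5 even though their scores differ, which is exactly the limitation of the range described above.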
Note that if the standard deviation (s) is multiplied by itself the result is the variance (s2). From the other direction, taking the square root of the variance produces the standard deviation. Why include both of these closely related statistics? The answer is that some of the more involved procedures we will use later call for the standard deviation (s), some for the variance (s2). Figure 1.1 depicts how to use the formula to calculate the variance. Figure 1.1: Calculating the variance Determine the group mean ∑x/n Subtract the mean from each score (x−M) Square each x−M difference (x−M)2 Sum all the squared differences ∑ (x−M)2 Divide the sum of the squared differences by the number of scores, minus one ∑ (x−M)2 / (n −1) © 2016 Bridgepoint Education, Inc. All rights reserved. Not for resale or redistribution. tan82773_01_ch01_001-028.indd 12 3/29/16 1:13 PM Section 1.5 Descriptive Statistics To calculate the variance for the depression data (3, 4, 4, 5, 5, 6, 6, 7, 7, 8), we follow this procedure: ∑x 55 1. M 5 n 5 10 5 5.5 2. Subtract M from each x. 3 2 5.5 5 22.5 6 2 5.5 5 0.5 4 2 5.5 5 21.5 7 2 5.5 5 1.5 4 2 5.5 5 21.5 5 2 5.5 5 20.5 5 2 5.5 5 20.5 6 2 5.5 5 0.5 7 2 5.5 5 1.5 8 2 5.5 5 2.5 3. Square each x 2 M difference. Remember that squaring a negative number results in a positive. 22.52 5 6.25 0.52 5 0.25 21.52 5 2.25 1.52 5 2.25 21.52 5 2.25 20.52 5 0.25 20.52 5 0.25 0.52 5 0.25 1.52 5 2.25 2.52 5 6.25 4. Sum the squared differences. 6.25 1 2.25 1 2.25 1 0.25 1 0.25 1 0.25 1 0.25 1 2.25 1 2.25 1 6.25 5 22.50 5. Divide the sum by the number of scores (10) minus 1. 22.50 9 5 2.50 s2 5 2.50 Determining the standard deviation requires one more step. 6. The standard deviation (s) equals the square root of the variance, or  s 5 Î 2.50  s 5 1.581 Note that the way the variance and standard deviation are determined emphasizes their relationship to the mean. 
All statistical packages and spreadsheets will calculate these measures, and almost any other descriptive statistic as well. Even so, there is value in doing a few of these calculations by hand with the steps given. The repeated x − M calculations (step 2) remind us that the central component of both statistics is the difference between individual scores and the mean.

Subtracting the mean from each individual value (x) shows how far each data point is from M. So why not just sum those differences to get a total of the differences between the xs and M? Why square them? The answer is that some of those differences are positive (when x > M) and some are negative (when x < M), as was the case with the depression scores. The positive and negative differences always cancel: the sum of the deviations from the mean is exactly 0 (apart from rounding), which tells us nothing about data variability. When the differences are squared, however, all the negatives become positives. When the squares are then averaged (step 5, dividing the sum of the squares by n − 1; the −1 will be discussed later), the result gauges the typical distance between the scores and the mean.

TIP
When using a calculator to determine a square root, be aware that calculators work differently. Some let you enter the value and then press the square root button. Others require you to press the square root button before entering the value.

The variance and standard deviation have more than purely descriptive functions. When we explore the t test, for example, we will find that when the two sample sizes are equal, we need the standard deviation to complete the test. When the sample sizes are not equal, we use the variance. For now, though, our interest lies only in describing data sets.
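A quick Python check, offered only as an illustration, shows why the deviations are squared. For the depression data, the raw x − M deviations cancel to 0, while the squared deviations sum to the 22.50 used in step 4.

```python
scores = [3, 4, 4, 5, 5, 6, 6, 7, 7, 8]  # depression data from the text
mean = sum(scores) / len(scores)

deviations = [x - mean for x in scores]
raw_total = sum(deviations)                       # positives and negatives cancel
squared_total = sum(d ** 2 for d in deviations)   # squaring removes the signs

print("sum of (x - M):", raw_total)
print("sum of (x - M)^2:", squared_total)
```

Here the raw sum prints as 0.0. With other data sets it may print as a tiny nonzero number because of floating-point rounding. That is the "apart from rounding" caveat, not real variability.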
The depression scores are fairly similar. The 10 scores have a range of only five points. With some practice, readers will also recognize the variance and standard deviation as comparatively small: s² = 2.50 and s = 1.581. The name standard deviation suggests that this statistic indicates the "standard" deviation of all the individual data points from the mean of the group. Strictly speaking, it is not an average or mean deviation, but it is similar. The result indicates that the "standard" deviation of individual data points from the mean of the sample is 1.581 points. As noted above, the variance answers the same question about typical variability among the values in a group, but without the final square root. As a result, the variance is larger than the standard deviation whenever s is greater than 1, as it is here (2.50 versus 1.581). When s is between 0 and 1, the variance is the smaller of the two.

Alternative Formulas

Formulas 1.2 (the variance) and 1.3 (the standard deviation) are conceptual formulas. They were chosen because they make it easier to understand how the resulting value is derived and what it means. Other statistics books present alternative formulas for calculating the standard deviation and variance. Some of these are easier to use, particularly with large data sets, but they communicate less than the ones used here. This text emphasizes clarity over ease of calculation, and your calculator or computer will do the work in any case. For those reasons, it will use Formulas 1.2 and 1.3 and keep the data sets small.

Try It!: #3
In a set of n = 5 where M = 6.0, the lowest value is 3 and the highest is 11. If a value of 12 is added, which will be affected most, R or s?

Apply It!
A Study on Studying

As part of her thesis, Anna, a graduate student in psychology, is studying college failure rates among first-year students. She has just read a nationwide study that found that most students who dropped out in their first year studied less than 11 hours per week. Anna decides to survey first-year students at her college. She chooses a random sample of 100 freshmen and asks each how many hours they study in a typical week. Rather than present all 100 responses, she will summarize them with descriptive statistics.

The difference between studying 6 and 8 hours a week is exactly the same amount of study time as the difference between studying 1 and 3 hours a week, so the data are at least interval scale. A response of zero, however, would mean the student did not study at all, which is a meaningful zero point. The responses are therefore ratio scale data.

Anna calculates several measures of central tendency and obtains these results:
• Mode = 16 hours
• Median = 15.5 hours
• Mean (or average) = 16.5 hours

As you can see, the values for the mode, median, and mean are closely grouped. Measures of central tendency are more informative when accompanied by a measure of variability, so Anna goes on to calculate the range and standard deviation. She finds that the range (R) is 33. The student who studies the most therefore studies 33 hours more than the one who studies the least.

Anna also calculates the sample standard deviation (s) for the responses and finds that s = 5.5. The standard, or typical, deviation from the mean of 16.5 hours of study is 5.5 hours. Consider the implication. Students who are one standard deviation (5.5 hours) or more below the mean of 16.5 hours study 11 hours per week or less. They are the very group most likely to drop out during their first year of college.
At the end of the study, Anna presented her results to the dean of students. He asked her to make a presentation to the freshman class emphasizing the relationship between study time and the tendency to drop out of school.

Apply It! boxes written by Shawn Murphy

The Impact of Different Score Values

In our example, the highest depression score was 8. If it had been 12 instead, how would that affect the variance and the standard deviation? A score of 12 is farther from the mean than 8 is, and both formulas square each x − M difference. The more extreme value therefore increases both statistics. Recalculating with the change, the mean becomes M = 5.9. The variance becomes s² = 6.322, compared with s² = 2.50 in the original data set. The standard deviation becomes s = 2.514, compared with s = 1.581 originally.

TIP
Calculators with a built-in standard deviation function let you enter the values and get the result without the several x − M steps. Directions vary somewhat from one calculator to another. Most have a key for the standard deviation marked something like "σxn−1," "σn−1," or SD. The σ is the lowercase Greek letter sigma, which is the equivalent of s. It is often used as a symbol for the standard deviation, as we will see later in the chapter. A σ² key for the variance is less common because the variance is the less-used of the two statistics. In any case, squaring the standard deviation with the calculator's x² key produces the variance.
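Python's standard statistics module can reproduce these recalculations. Its variance and stdev functions use the n − 1 denominator of Formulas 1.2 and 1.3. The sketch below uses Python as an assumed tool for illustration. It compares the original depression scores with the version in which 8 becomes 12. It also includes the change of one 4 to 0, which the text takes up next. The range is printed too, to show that R moves only when an extreme value itself changes.

```python
from statistics import mean, stdev, variance

original = [3, 4, 4, 5, 5, 6, 6, 7, 7, 8]
high_outlier = [3, 4, 4, 5, 5, 6, 6, 7, 7, 12]   # highest score 8 replaced by 12
low_outlier = [0, 3, 4, 5, 5, 6, 6, 7, 7, 8]     # one 4 replaced by 0

for label, data in [("original", original),
                    ("8 -> 12", high_outlier),
                    ("4 -> 0", low_outlier)]:
    # statistics.variance and statistics.stdev use the n - 1 denominator
    print(f"{label:9} M={mean(data):.2f}  s2={variance(data):.3f}  "
          f"s={stdev(data):.3f}  R={max(data) - min(data)}")
```

The printed rows match the values reported in the text: s² = 6.322 and s = 2.514 after the 8 becomes 12, and s² = 5.433 and s = 2.331 after a 4 becomes 0. Both changes also widen R, because each one moves an extreme value.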
Both statistics are based on the squared differences between individual scores and the mean, and extreme scores produce the largest squared differences. As noted above, extreme values therefore have a disproportionate effect on the size of s² and s. Scores much smaller than the mean also have great impact, because what matters is the difference between x and M, not the size of the score itself. Suppose one of the 4s in the data set is changed to 0, so that the depression scores become 0, 3, 4, 5, 5, 6, 6, 7, 7, 8. The mean drops to M = 5.10. The variance becomes s² = 5.433 instead of the original 2.50, and s becomes 2.331 instead of the original 1.581.

To summarize, scores close to the mean of a distribution tend to shrink the standard deviation and the variance. Scores far from the mean in either direction make those measures of variability larger. Extreme scores, called outliers, can substantially distort values and statistical procedures, particularly when sample sizes are small. How to identify outliers and what to do with them will come up later in the book.

All three are variability statistics, but the variance and standard deviation differ in an important way from the range. The range is based only on the lowest and highest scores in a distribution, so R is unaffected by any values that fall between the extremes. No additional value between the highest and lowest values will change the range. The number of scores within the range can grow, but R will not shrink unless at least one of the extreme values becomes less extreme or is removed.

Populations Versus Samples and a Correction for a Biased Estimator

Earlier, the chapter defined a population as all possible members of a specified group. A sample is any subset of the population.
A population can be "every psychologist in San Diego," "all law enforcement officers in the state," or "everyone in your family." Remove at least one person from any of those populations, and the resulting group is a sample. Samples are easier to access than populations, so most research is based on samples. Gathering data on whole populations, particularly when they number in the thousands or millions, is usually far too time-consuming and expensive to be feasible.

When the important requirements for gathering a sample are met, the sample's characteristics, including its descriptive statistics, will tend to mirror those of the population. No sample can exactly duplicate its parent population, but samples can be similar enough for useful inferential analysis. This is the principle behind political polling. A few thousand people who accurately represent the entire electorate may reveal how an election will likely turn out. A carefully selected group of law enforcement officers may suggest the stress level of all law enforcement officers. A few consumers may indicate the tendency among all consumers.

Sometimes making a sample representative involves more than careful sampling technique, because samples tend to be less variable than populations. In a survey of stress levels among law enforcement officers, for example, the sample's stress scores would probably vary less than stress in the entire population of police officers. Without some adjustment (described below), researchers could easily underestimate stress variability consistently. That consistent underestimation is bias.
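The underestimation described above can be demonstrated with a small simulation. This Python sketch makes several hypothetical choices for illustration: the "population" is the whole numbers 1 through 100, each sample has 5 values, and there are 20,000 repetitions. Samples are drawn with replacement, a simplifying assumption under which dividing by n − 1 is unbiased for the population variance. Each simulated sample's sum of squared deviations is divided once by n and once by n − 1, anticipating the adjustment the text describes next.

```python
import random
from statistics import pvariance

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical "population": the whole numbers 1 through 100
population = list(range(1, 101))
sigma2 = pvariance(population)  # population variance (divides by N)

n = 5            # small samples, where the bias is largest
trials = 20_000  # number of simulated samples

divide_by_n_total = 0.0
divide_by_n_minus_1_total = 0.0
for _ in range(trials):
    sample = random.choices(population, k=n)  # sampling with replacement
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)    # sum of squared deviations
    divide_by_n_total += ss / n
    divide_by_n_minus_1_total += ss / (n - 1)

avg_divide_by_n = divide_by_n_total / trials
avg_divide_by_n_minus_1 = divide_by_n_minus_1_total / trials

print(f"population variance:        {sigma2:.1f}")
print(f"average, dividing by n:     {avg_divide_by_n:.1f}")
print(f"average, dividing by n - 1: {avg_divide_by_n_minus_1:.1f}")
```

Averaged over many samples, dividing by n should come out well below the population variance of 833.25. For samples of 5, theory puts it near 80 percent of that value, that is, (n − 1)/n. Dividing by n − 1 should land close to 833.25. The exact printed numbers depend on the random seed.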
Note that in statistical usage, and in this example, bias means that the variability of stress among law enforcement officers is consistently, and probably unintentionally, underestimated. It does not carry the more common meaning of intentional discrimination.

Careful sampling is not enough to counter this bias. Researchers therefore make a mathematical adjustment to variances and standard deviations calculated from samples, so that these values more accurately reflect the population. This "correction for a biased estimator" is the −1 in the denominators of Formulas 1.2 and 1.3. The adjustment slightly increases the value of the variability measure. The correction is greatest when samples are smallest, which is when the potential for distortion is greatest. If data are available for every member of a group, bias is not a concern and the adjustment is unnecessary. For population data, when they are available, the formulas for variance and standard deviation are as follows:

Formula 1.4

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

Formula 1.5

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

Besides the missing −1 in the denominator, note three substitutions in both formulas: sigma (σ) replaces s, mu (µ) replaces M, and N (the number of scores in the population) replaces n. The σ indicates the population standard deviation, and σ² indicates the population variance. Earlier in the chapter, we noted that µ indicates the population mean.

Try It!: #4
What constitutes bias in statistics?

Differentiating Sample and Population Characteristics

The descriptive characteristics of populations are called parameters. Technically, the word "statistic" refers to a sample characteristic. To summarize:

                       Population Parameter    Sample Statistic
Mean                   µ                       M
Standard Deviation     σ                       s

The Statistic or the Parameter?
Unless we are working with a relatively small population, such as a particular social worker's clients or a family or social group, we generally do not have all the data. There are exceptions, of course. Researchers sometimes work with data from the U.S. Census, which by definition includes the entire population of the country. Testing agencies sometimes provide means for the entire population that took a particular test. But researchers more commonly have access only to sample data, so this book will use Formula 1.3 and calculate the sample standard deviation.

Understanding Degrees of Freedom

The n − 1 also gives us a chance to introduce degrees of freedom (df). Degrees of freedom are one of those odd statistical abstractions that are difficult to explain briefly. Even so, they affect how many procedures are computed and interpreted, including t tests (Chapter 5) and analysis of variance (Chapter 6). Degrees of freedom are the number of scores in a calculation that are free to vary once the final result of the calculation is known.

Consider the following example. Suppose the sum of three integers is 6: ___ + ___ + ___ = 6. The first two integers can be any integers, such as 2 + 2, 3 + 1, or −3 + 10, provided the third integer makes the sum 6. The third value cannot vary. It must be 2 in the first and second examples and −1 in the third. This problem has 2 degrees of freedom. In general, if n integers make up a problem like this three-integer addition problem, the degrees of freedom (df) are n − 1. The standard deviation and the variance also have n − 1 degrees of freedom.
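The three-integer example can be expressed as a tiny Python function; the name last_value is hypothetical. Once the total is fixed, choosing all but one value determines the last one. The same constraint applies to deviations from the mean, which always sum to 0. That constraint is why the variance and standard deviation have n − 1 degrees of freedom.

```python
def last_value(required_total, free_values):
    """Return the one value that is no longer free once the total is fixed."""
    return required_total - sum(free_values)


# The three-integer example: the first two values are free, the third is not.
for first_two in [(2, 2), (3, 1), (-3, 10)]:
    print(first_two, "->", last_value(6, first_two))

# The same idea for the variance: deviations from the mean must sum to 0,
# so once n - 1 of the deviations are chosen, the last one is determined.
scores = [3, 4, 4, 5, 5, 6, 6, 7, 7, 8]
m = sum(scores) / len(scores)
deviations = [x - m for x in scores]
print("last deviation:", deviations[-1],
      "| forced value:", last_value(0, deviations[:-1]))
```

The loop prints 2, 2, and −1 for the three cases, as in the text. For the depression data, the first nine deviations force the tenth to be 2.5, which is exactly 8 − 5.5.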
For the variance and standard deviation, the constraint comes from the mean. The deviations from the mean (x − M) must sum to 0. Once M is known, every deviation except one can take any value, but the last must be whatever makes the deviations sum to 0. Degrees of freedom will come up with other procedures as we move through the book.

TIP
Calculators with a built-in standard deviation function generally have keys for both the sample and the population standard deviation. They usually mark the sample standard deviation with something like n − 1 on the key or in the display. Unfortunately, some manufacturers use s for both the sample and the population standard deviation, so look for the n − 1 to identify the sample statistic.

Reviewing Results

Do not become so busy calculating variances or standard deviations that you set common sense aside. All measures of variability have one thing in common: the calculated value cannot be less than 0. There is no such thing as negative variance. Suppose a school psychologist measures intelligence scores for a variety of students who have learning disabilities and finds that they all have the same score. The scores have no variation, so s², s, and R will all equal 0. If a negative value ever emerges for a variability statistic in the course of all the number-crunching, look for a calculation error.

1.6 Calculating Descriptive Statistics with Excel

Now that you can calculate descriptive statistics by hand and understand what they mean, we can begin to rely on Excel. Excel, part of Microsoft Office, is an electronic spreadsheet program. Like all spreadsheets, it is laid out in the rows and columns of a ledger. Computer spreadsheets were originally intended for businesspeople who tracked and manipulated large amounts of numeric data.
The commands vary from one spreadsheet program to another, but they all produce descriptive statistics. Most programs, including Excel, will also complete some of the basic statistical tests. Excel offers two ways to obtain descriptive statistics. You can enter individual commands, much as we worked step by step with the calculator, or you can use a package in Excel called "Descriptive Statistics." First, we will try the individual commands.

Consider a psychologist who is interested in the cognitive characteristics of teenage boys who get into trouble with the law. The psychologist gathers data on problem-solving ability among teenage boys consigned to juvenile hall. Having secured the appropriate permissions, the psychologist gives the Problem-solving Aptitude Test (PAT) to 12 randomly selected juvenile offenders. These are the resulting PAT scores: 11, 14, 14, 15, 17, 17, 17, 19, 22, 22, 23, 27.

Navigating Excel

The individual boxes in an Excel spreadsheet are called "cells." Each cell is identified by its column and row. Columns are labeled alphabetically from left to right, beginning with column A. Rows are numbered down the left side of the window, beginning with 1. Cell A1 is the top cell in column A, the next cell down is A2, and so on. To identify a cell, name the column letter first and then the row number. When entering cell locations in Excel, do not put a space between the letter and the number.

This text uses the Office 2010 version of the software. Excel's appearance and some of its procedures change from version to version, but the commands remain quite consistent. The steps for entering the PAT data into the spreadsheet are as follows:

1. Place the cursor in the first cell (for example, cell A1).
You can move the cursor with the arrow keys on the keyboard, by clicking the mouse on the cell, or with the touchpad on a laptop.
2. In cell A1, type the number 11 and press Enter. Pressing Enter moves the cursor to the next cell down.
3. Enter each of the other values so that the data are arranged vertically in cells A1 through A12. The spreadsheet should look like Figure 1.2.

Figure 1.2: A data set entered in Excel
Figure 1.2 depicts how Excel displays a data set as a vertical list. Source: Microsoft Excel. Used with permission from Microsoft.

Entering the Command for the Mean

An equal sign (=) entered into a cell tells Excel that a specific command or formula will follow. To calculate the mean for the 12 PAT scores so that it appears in cell A13, perform these steps:
1. Place the cursor in cell A13.
2. Type =average(a1:a12) and press Enter. This averages the data in cells A1 through A12. (Note that the Excel command is average rather than mean.) Alternatively, type =average( and then use the mouse to highlight the cells to include, followed by Enter. The value that appears in cell A13 is the mean, 18.16667.
3. On the Home tab, click the small arrow in the bottom right corner of the Number group, in the middle near the top of the screen.
4. Under Category, click Number, and then use the arrows on the right to set the number of decimal places. Choose 3 decimal places and click OK. The cell now shows M = 18.167.

Using the Descriptive Statistics Option

Entering a specific command works well if that is the only statistic needed. For a more comprehensive description, the Descriptive Statistics option reports the mean along with several other descriptive values.
To produce that package of statistics for the PAT data, complete the following steps:
1. From the Home tab, click the Data tab, which is four tabs to the right at the top of the page.
2. Click Data Analysis at the far right, just below the tabs. A small window opens with a list of options. (If the Data Analysis option does not appear, you will need to add it. It is supplied by Excel's Analysis ToolPak add-in, and the steps for enabling it differ slightly among Excel 2007, 2010, 2013, and 2016.)
3. Click the Descriptive Statistics option, and then click OK.
4. In the box labeled Input Range, type the cells to include (A1:A12), just as you did when entering the formula for the mean. Column letters may be upper- or lowercase. The default setting is "Grouped by" columns; if the data were listed along a row, you would need to change it.
5. Click Output Range and indicate where the results should begin, perhaps cell C1, so that they appear next to the original data rather than on top of them.
6. Finally, click the output you want, which is Summary statistics.
7. Click OK.

Figure 1.3 shows the results of the Descriptive Statistics option, with the values rounded to three decimals.

Figure 1.3: The Excel data analysis, Descriptive Statistics option
Figure 1.3 shows how Excel displays the results of the Descriptive Statistics option, which performs a series of statistical calculations. Source: Microsoft Excel. Used with permission from Microsoft.
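The following Python sketch lets readers cross-check Figure 1.3 outside Excel. It uses the standard statistics module to compute several of the same summary values for the PAT scores. The describe helper and its labels are hypothetical, chosen to resemble Excel's output. Excel reports additional rows, such as the standard error, skewness, and kurtosis, which are not reproduced here. The last two lines preview the what-if questions that follow in the text.

```python
from statistics import mean, median, mode, stdev, variance

pat = [11, 14, 14, 15, 17, 17, 17, 19, 22, 22, 23, 27]  # PAT scores from the text


def describe(data):
    """Hypothetical helper mirroring a few rows of Excel's Summary statistics."""
    return {
        "Mean": round(mean(data), 3),
        "Median": median(data),
        "Mode": mode(data),
        "Standard Deviation": round(stdev(data), 3),  # sample (n - 1) version
        "Sample Variance": round(variance(data), 3),
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Count": len(data),
    }


for name, value in describe(pat).items():
    print(f"{name:20}{value}")

# What-if checks: add a second 27, or add a value near the mean.
print("s with another 27:", round(stdev(pat + [27]), 3))
print("s with an 18 added:", round(stdev(pat + [18]), 3))
```

The sketch reports a mean of 18.167, a standard deviation of 4.589, a mode of 17, and a range of 16. The median prints as 17.0 because Python averages the two middle values. Adding a second 27 raises s to 5.031, while adding an 18 lowers it to 4.394.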
Note the following Excel results: M = 18.167, s = 4.589, Mdn = 17, Mo = 17, R = 16.

The data set runs from 11 to 27. What happens to the range if a second value of 27 is added? Of course, the range is unaffected. It measures the difference between the highest and lowest values, and adding another 27 does not change either of them. Is the same true of the standard deviation? It too is a measure of variability, but unlike the range it indicates how much individual values tend to vary from the mean of the data set. A value of 27 is far from the mean of 18.167, so the standard deviation (and the variance) increases. Recalculating with the additional 27 produces s = 5.031, compared with the original s = 4.589. What if a value near the mean were added instead, say x = 18? The typical variability, and therefore the standard deviation, would shrink.

Besides the mean, standard deviation, and range, the output shows that Excel produces other descriptive statistics. These include the median (Mdn), the mode (Mo), the lowest and highest values, and the variance (s²). Excel also produces some statistics not yet introduced. Chapter 2 discusses two of them, skewness and kurtosis, and Chapter 4 discusses the standard error.

1.7 The Language of Research

A formal plan for gathering and analyzing data is called a research design. A research design allows researchers to gather the relevant data and perform the analyses needed to answer a research question.
The statistical procedures in this chapter and those that follow are often elements of research designs. Sometimes a study is conducted simply to obtain descriptive statistics. Someone may need to know, for example, the mean level of education among people who are unemployed, or how much variation there is in autistic clients' verbal behaviors. More often, however, descriptive statistics are calculated as part of a more involved research project, such as a senior paper or a research report. Researchers calculate descriptive statistics because they provide an economical way to understand a larger body of data. Measures of central tendency indicate what is most typical, and measures of variability indicate how far individual measures tend to stray from what is typical.

Measures of variability are needed precisely because measures differ from one another. If there were no variability, we would have what is called a constant value. Constants hold little interest for researchers. If Alfred Binet, the father of intelligence testing, had not observed differences in intellectual ability, he would have had no reason to develop an intelligence test. Variables are what make measures interesting and worth studying.

Qualitative and Quantitative Variables

Qualitative variables are difficult to measure and reduce to a number. Often they are the nominal scale, or categorical, variables mentioned earlier, which classify people's demographic characteristics, such as religious persuasion or national origin.
Qualitative variables may also refer to emotional characteristics, such as passion or discouragement. They may also refer to traits, such as intelligence or creativity, that can be difficult to quantify. Quantitative variables are variables whose numbers reflect the amount of what is measured, such as age or spelling ability. With these variables, a greater value indicates more of the measured characteristic. Because many demographic variables are qualitative, quantitative research typically involves both kinds of variables. Strictly speaking, however, the term mixed methods research refers to research that uses procedures designed specifically for qualitative variables along with procedures designed for quantitative ones.

Dependent and Independent Variables

Research designs always identify the variables the researcher believes are relevant to an outcome. The outcome itself is the dependent variable, also called the affected variable or the consequence variable. The variable thought to help bring about the effect is the independent variable. It is tempting to say that the independent variable causes the dependent variable, but cause is difficult to demonstrate in social science research. The problem is not that causes do not exist. The problem is that they are difficult to verify, particularly in research with human subjects.

Suppose that, having read the research on community service, a psychologist wonders whether serving other people reduces feelings of discouragement. The psychologist develops a plan, a research design, to see whether serving in a soup kitchen for the poor (the independent variable) is associated with lower feelings of discouragement among volunteers (the dependent variable). While carrying out the design, the psychologist will calculate descriptive statistics for both variables, such as the mean level of discouragement and the standard deviation of the hours served.
In this instance, though, the descriptive statistics serve a broader purpose: determining the relationship between the independent and dependent variables. Even if discouragement declines among those serving, it will be difficult to attribute the change to the service itself. Increased social interaction might be what reduces discouragement, or goal-directed behavior or some other variable might be the cause.

Summary and Resources

Chapter Summary

Part of the transition into any new discipline is learning the terminology needed for a common language. Part of the language of statistical analysis involves labeling data, which led us to the scale of the data. Recall that scale (nominal, ordinal, interval, and ratio) refers to the kind and quantity of information the data provide (Objective 1). Data scale also helps determine which statistics we can calculate (Objective 2).

Descriptive statistics offer a great economy as data sets grow larger. The measures of central tendency (mean, median, and mode) suggest what is most representative of a data set, although each defines "typical" differently. For that reason, they are often reported together (Objective 3).

Measures of variability complement measures of central tendency. When the standard deviation, the variance, or the range is reported with the mean, we see not just what is most typical but also how homogeneous the data are (Objective 4).

The descriptive characteristics of populations are called parameters. Technically, the word statistic refers to a sample characteristic.
Recall that the symbols for the mean and standard deviation are M and s for samples and µ and σ for populations (Objective 5).

To gain a working knowledge of statistics, it is important to study often, daily if possible. Several brief, frequent study sessions tend to be more productive than fewer, more intensive ones. Statistical analysis also poses a problem that many other disciplines do not: compared with English or journalism, for example, statistical reasoning comes up less often in ordinary conversation. Outside of study sessions, it helps to look for ways to apply the concepts, which is one reason frequent study matters. As their sensitivity grows, students begin to notice when people calculate the wrong descriptive statistics or explain their data improperly. Those experiences should prompt the healthy skepticism that all scholarship needs.

Many dedicated software packages are available for statistical analysis; SPSS, SAS, and SYSTAT are three of the more prominent. Microsoft Excel, however, is probably more accessible than any of them, so this course uses it. Besides calculation and analysis, Excel makes it easy to produce some of the graphs and other data displays used in later chapters.

Statistical concepts build incrementally. The topics developed in each chapter become part of more involved topics in later chapters. Virtually no concept is raised, discussed, and then permanently set aside, so frequent review is valuable. The questions in the Try It! boxes are intended to keep students thinking about important concepts. The answers to these questions follow.
Besides the homework that instructors assign, the end-of-chapter review questions will help students judge their level of understanding. If the questions seem difficult, reread the relevant section, and then tackle the problem again. The answers to the odd-numbered items are in the back of the book. The author's hope is that students do not simply push their way through this course but instead become enamored with the concepts. To that end, drive, frequent study, and a willingness to think differently matter more than natural talent, and all of them are variables students can control. Onward!

Key Terms

bias In statistical analysis, a consistent error of the same nature. If a sample is drawn that distorts some characteristic of the population, the nature of the parent population will be misrepresented in all experiments involving the sample. The results are distorted (biased) in a way that cannot be corrected simply by repeating the experiment.

constants Also called constant values; have only one value. For example, the temperature at which water boils at sea level, 212 degrees Fahrenheit, is a constant value.

data scale The kind of information that data values provide. The scales of data include nominal data, which define categories; ordinal data, which allow ranking; interval data, which have consistent increases or decreases between consecutive data points; and ratio data, which have a meaningful 0.

degrees of freedom (df) The number of measures in a procedure that are free to vary, or to take any value, when the result is known. For the standard deviation, for example, df = n − 1. If there are 10 values on which the standard deviation calculation is based, 9 of them may have any value, so long as the 10th produces the correct result.

dependent variable In a research problem, the variable affected by the treatment.

descriptive statistics Provide values that define the characteristics of a data set.
Typical descriptive statistics are measures of what is most typical and measures of how much individual values differ from each other.

independent variable In a research problem, the antecedent variable expected to explain any change in the dependent variable.

inferential statistics Procedures that allow the analyst to draw inferences or conclusions from the data. It is common in statistical analysis, for example, to infer the characteristics of the population from what occurs in a sample.

mean (M) The arithmetic average of a set of values. The mean of a population is represented by the symbol µ.

measures of central tendency Indicate which value is most typical in a data set; they include the mean, median, and mode.

measures of variability Indicate how much variety there is in a set of data values. Also called measures of dispersion.

median (Mdn) The middle value when the data are ordered (the average of the two middle values when the number of values is even).

mixed methods research Research involving both qualitative and quantitative variables, as well as the methods appropriate to each.

mode (Mo) The most frequently occurring value in a set.

parameters Population characteristics; symbolized by Greek letters, such as µ for the population mean and σ for the population standard deviation.

population Includes all members of a defined group.

qualitative variables Defined by the kind of characteristic they represent, such as gender or eye color.

quantitative variables Defined by the amount of the characteristic they represent, such as intelligence.

range (R) The difference between the highest and lowest values in a data set.

research design A formal plan for conducting a study.
It specifies the variables to be studied; indicates who the subjects in the experiment will be, as well as how they will be selected; specifies how the data will be gathered, including how the independent variable will be manipulated; and indicates the type of analysis to be used.

sample Any subset of a population.

standard deviation The sample standard deviation (s) and the population standard deviation (σ) indicate how much individual scores tend to differ from the mean of the respective group.

statistic A characteristic of a sample. Some common statistics are the mean (M), the standard deviation (s), and the range (R).

variables Characteristics that can have changing values.

variance (s²) The square of the standard deviation. The variance is one measure of how much individual scores differ from the mean of the group.

Review Questions

Answers to the odd-numbered questions are provided in Appendix A.

1. A researcher is interested in how people of different political affiliations contrast.
a. Their different political affiliations represent data of what scale?
b. If the contrasted characteristic is income, what is the scale of that variable?
c. Arranging people from most involved to least involved in politics results in data of what scale?
d. What measures of central tendency should be calculated for ordinal scale data?

2. A psychologist tracks the number of times a subject responds to a stimulus.
a. Of what scale are the data indicating the number of responses?
b. What measure or measures of central tendency are appropriate for data of this scale?

3. A group of clients being treated for substance abuse have the following measures on a compulsive behavior scale: 55, 47, 62, 27, 50, 49, 66, 53, 50, 44, 63, 59. Calculate the following:
a. the mean
b. the median
c. the range
d. the standard deviation

4. Subjects in a research project are classified according to whether they have been involved in a relaxation therapy program. After some receive the therapy, they rank themselves from 1 to 10 according to how calm they feel.
a. Of what scale are the data regarding whether they participated?
b. What measure(s) of central tendency is/are appropriate for the attended/did-not-attend data?
c. What is the scale of the level-of-calm data?

5. These scores are measures of compulsive behavior for those involved in a therapy session for people with compulsive disorders: 24, 25, 28, 28, 31, 33, 36, 36, 36, 39, 40, 53, 54.
a. Calculate the mean, the variance, and the range.
b. If a new client with a score of 35 is added,
i. What happens to the mode? Why?
ii. What is the effect on the variance?
iii. What is the effect on the range?
iv. Why are the range and variance affected differently?
v. Would adding a 22 increase or decrease the variance? Why?

6. Why is calculating the mean inappropriate for data of ordinal scale?

7. Why are the standard deviation and variance inappropriate for ordinal scale data?

8. A researcher is trying to predict retirements and has surveyed the number of years a group of social workers have been employed. The data are as follows: 2, 5, 8, 11, 13, 16, 22, 27.
a. What is the scale of the data?
b. Calculate values for the mean and the median.
c. Without doing the calculations, explain the effect on the standard deviation of including a ninth person who has been employed 13 years.
d. What will be the effect of that ninth person's tenure on the value of the range?
e. Which would increase more dramatically with the addition of someone who had been employed 28 years: the range or the standard deviation?
Check your answers to c, d, and e with calculations.

9. Determine the scale in each situation below. Psychologists are:
a. listed according to how many clients they see.
b. classified according to whether they are public employees or private practitioners.
c. ranked according to how much their receptionists like working for them.
For each of the situations in a, b, and c, what is the most sophisticated measure of central tendency that can be calculated? For which situation should one calculate a standard deviation?

10. What is the difference between M and µ?

11. What does σ symbolize?

12. Why is the "correction for a biased estimator" inserted into the standard deviation and variance formulas for samples?

13. A researcher wants to determine the impact that positive reinforcement has on subjects' response rates.
a. What is the independent variable?
b. What is the dependent variable?

14. Using your recently developed "AssessmeNt Gauging Supervisors' Tension" instrument (ANGST, for short), you gather anxiety data for officers in law enforcement with supervisory responsibility. Their scores are as follows: 11, 19, 20, 21, 19, 14, 19, 21. What are the values of the following?
a. n
b. M
c. Mdn
d. s²

Answers to Try It! Questions

1. Absolutely! You must have driven through small rural towns in the middle of nowhere with signs indicating a population of 22. All that is required for a population is that all individuals in the described group be included. Although we tend to think of populations as large by definition, they need not be.

2. a. Anything that references a "more than" or "less than" comparison will be ordinal scale.
b. Your bank statement is ratio scale. If it says you have a zero balance, it means you have no money in the account.

3. The effect on the range is going to be minimal; it is going to increase by just one.
While it is impossible to know the exact effect on the standard deviation without having all of the data, it is likely to be more than one. The difference between the individual score (12 in this case) and the mean (M = 6.0) will be squared (12 − 6 = 6, and 6² = 36) and then summed with the other squared differences.

4. In statistics, bias means that a consistent error is made in the same direction. A test that consistently predicts the problem-solving ability of members of one group better than it predicts that of another has biased scores. In our case, using population formulas to calculate the standard deviation from sample data will consistently underestimate the variability in the parent population. That consistent error constitutes bias. It is why the "correction for a biased estimator" (the −1 in the n − 1 denominator) is inserted into the formula for sample data.

For this discussion, you will organize data sets into meaningful number groupings, calculate basic descriptive values, and communicate written critiques of statistical analyses. This exercise requires the use of a descriptive statistics calculator. You can find this tool in some versions of Excel (as part of the Analysis ToolPak), or you can use one of the many free online descriptive statistics calculators, such as the Descriptive Statistics Calculator by Calculator Soup.

To begin, come up with 20 different data points (that will form a set of data) and enter them into the first column of an Excel spreadsheet. The data points can be any numbers you want as long as there are 20 of them. You will then use the descriptive statistics option in your descriptive statistics program or calculator. This is explained in Chapter 1 of your course text. You should get an output similar to the image in Figure 1.1.
This output must contain the following values: mean, standard error, median, mode, standard deviation, sample variance, kurtosis, skewness, range, minimum, maximum, sum, and count.

Address the following points in your initial post:
• Begin your discussion by reporting your results for each of the values listed above.
• Based on this output, which single value best describes this set of data and why?
• If you could pick three of these values instead of only one, which three would you choose and why?

It is important to note that the answers to these questions may be different for each of you since you are each using unique sets of data.

For this exercise, you may elect to use the "Analysis ToolPak" within Excel. This feature is already part of the 2007 and 2010 Excel program for Windows; however, it must be activated. The following directions were provided by the Help function within Excel. If you experience issues while following these steps, use the Help function within Excel or contact Microsoft's technical support for Excel.

The Analysis ToolPak includes the tools described below. To access these tools, click Data Analysis in the Analysis group on the Data tab. If the Data Analysis command is not available, you need to load the Analysis ToolPak add-in program.

Load the Analysis ToolPak:
1. Click the File tab, click Options, and then click the Add-Ins category.
2. In the Manage box, select Excel Add-ins and then click Go.
3. In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.

Tip: If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it. If you are prompted that the Analysis ToolPak is not currently installed on your computer, click Yes to install it.

https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php

Explanation & Answer

Attached.

Surname: 1
Statistical Data Analysis
Institution
Name
Professor
Due Date
Statistical Data Analysis

Results
Data
4, 3, 2, 5, 4, 6, 7, 3, 6, 8, 9, 6, 4, 5, 4, 3, 6, 7, 8, 3

Data Analysis
Mean                        5.210526316
Standard Error              0.462506187
Median                      5
Mode                        3
Standard Deviation          2.016017729
Sample Variance             4.064327485
Kurtosis                    -0.953768682
Skewness                    0.224440321
Range                       7
Minimum                     2
Maximum                     9
Sum                         99
Count                       19
Confidence Level (95.0%)    0.971689442
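As a cross-check, the reported figures can be reproduced outside Excel. Below is a minimal Python sketch using the standard `statistics` module. It applies the bias-adjusted sample formulas of Excel's SKEW and KURT functions and the usual standard error s/√n. The 95% confidence level is omitted because it needs a t critical value. The output reports Count = 19 and Sum = 99, whereas the twenty listed values sum to 103, so one value of 4 appears to have been left out of the input range. The data list below assumes the first 4 was dropped; dropping any other 4 gives the same results in this sketch. The function name `descriptive_summary` is hypothetical.

```python
import math
import statistics


def descriptive_summary(values):
    """Rough Python counterpart of Excel's Descriptive Statistics output (no confidence level)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, divides by n - 1
    # Standardized deviations, using the sample standard deviation
    z = [(x - mean) / sd for x in values]
    # Adjusted sample skewness (same formula as Excel's SKEW)
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    # Adjusted sample excess kurtosis (same formula as Excel's KURT)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),  # Python 3.8+: first mode encountered on ties
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(values),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


# Assumption: the twenty listed values minus the first 4, which matches the
# reported Count (19) and Sum (99)
data = [3, 2, 5, 4, 6, 7, 3, 6, 8, 9, 6, 4, 5, 4, 3, 6, 7, 8, 3]

summary = descriptive_summary(data)
for name, value in summary.items():
    print(f"{name:<20}{value}")
```

Running the same function on all twenty listed values instead would give Count = 20 and Sum = 103, which is what the assignment's 20 data points call for.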

All the measures of central tendency and the other required statistics are shown in the Data Analysis output above, as obtained from the Excel sheet.

Based on this output, which single value best describes this set of data and why?
The mean is the value that best describes the data.
Reason
The mean is used in this case because it takes every value in the data set into account. It also helps in t...

