Statistics in criminology

Description

Using the uploaded material (PDF book), discuss the three measures of central tendency (mode, median, and mean) in depth, including when each is best used in statistical analysis. Explain your rationale in a minimum of 300 words, and support the discussion with two other scholarly references; remember to use in-text citations. Reference for the PDF: Walker, J. (2009). Statistics in criminology and criminal justice (4th ed.). Sudbury, MA: Jones and Bartlett.

Unformatted Attachment Preview

© Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION.

[Chapter-opening flowchart: What do you want to do? (Describe / Make inferences). How many variables? (Univariate / Bivariate / Multivariate). What level of data? Nominal: central tendency, mode. Ordinal: central tendency, median. Interval/Ratio: central tendency, mean.]

Chapter 4 Measures of Central Tendency

Learning Objectives
■ Understand the mode, median, and mean as measures of central tendency.
■ Identify the proper measure of central tendency to use for each level of measurement.
■ Explain how to calculate the mode, median, and mean.

4-1 Univariate Descriptive Statistics

Using frequency distributions and graphical representation, as in Chapter 3, helps researchers determine how the data is arranged and summarize it. Frequency distributions and graphs, however, cannot always tell the entire story. It is usually necessary to summarize the data further. Instead of summarizing entire distributions, it is more often efficient to compare only certain characteristics of the data. To conduct this comparison, it is helpful to know certain information, such as the form of the distribution, the average of the values, and how spread out they are within the distribution. This is where univariate descriptive statistics come into play.

Univariate descriptive statistics are used to describe and interpret the meaning of a distribution. They are called univariate because they pertain to only one variable at a time and do not attempt to measure relationships between variables. Univariate descriptive statistics make compact characterizations of distributions in terms of three properties of the data. First is the central tendency, which translates to the average, middle point, or most common value of the distribution.
The second property is the dispersion of the data. This relates to how spread out the values are around the central measure. Finally, there is the form of the distribution. The form of a distribution relates to what the distribution would look like if displayed graphically. Included in the form of a distribution is the number of peaks, skewness, and kurtosis. In this chapter we address the first univariate descriptive procedure: measures of central tendency. Measures of dispersion and measures of the form of a distribution are covered in Chapters 5 and 6, respectively.

4-2 Measures of Central Tendency

Measures of central tendency examine where the central value is in a distribution or the distribution’s most typical value. There are three common measures of central tendency, one for each level of measurement (interval and ratio are combined). These are the mode for nominal level data, the median for ordinal level data, and the mean for interval and ratio level data.

Mode

At the lowest level of sophistication is the mode (symbolized by Mo). The mode is used primarily for nominal data to identify the category with the greatest number of cases. The mode is the most frequently occurring value, or case, in a distribution. It is the tallest column on a histogram or the peak on a polygon or line chart. The mode has the advantage of being spotted easily in a distribution, and is often used as a first indicator of the central tendency of a distribution.

The mode is the only measure of central tendency appropriate for nominal variables because it is simply a count of the values. Unlike other measures of central tendency, the mode explains nothing about the ordering of variables or variation within the variables.
In fact, the mode ignores information about ordering and interval size even if it is available. So it is generally not advised to use the mode for ordinal or interval level data (unless it is used in addition to the median or mean) because too much information is lost.

There is no formula or calculation for the mode for either grouped or ungrouped data. The procedure is just to count the scores and determine the most frequently occurring value. Consider the data set in Table 4-1, which is the number of prisoner escapes from 15 prisons over a 10-year period. Here, there are 15 values, one per prison. There are two 7’s, one 6, three 5’s, two 4’s, four 3’s, one 2, and two 1’s. The mode in this data set would be 3 escapes, because there are more 3’s than any other value.

7   5   4   3   2
7   5   4   3   1
6   5   3   3   1

Table 4-1 Ungrouped Data

For grouped data, determining the mode is often even easier because the numbers are already counted. The data from Table 4-1 has been grouped in Table 4-2. What is the mode of this data set? Here you simply determine the category that has the highest value. In this case it would be the 3–4 category because it has a frequency of 6.

X      f
7–8    2
5–6    4
3–4    6
1–2    3

Table 4-2 Modal Value for Grouped Data

If the data were plotted on a bar chart or polygon, the distribution would look like that in Figure 4-1. Here, you can see that the category 3–4 has the highest bar on the bar chart and it forms a hump in the polygon. This highest bar or hump indicates the mode for that variable.

[Figure 4-1 Bar Chart and Polygon of Grouped Data from Table 4-2. Horizontal axis: categories 1–2, 3–4, 5–6, 7–8; vertical axis: frequency.]

How do you do that? Obtaining Univariate Statistics in SPSS

Univariate statistics include measures of central tendency, measures of dispersion, and form. You can obtain all of these statistics in the same procedure in SPSS, and it is just an extension of the same procedure you used in Chapter 3 to obtain a frequency distribution. The steps to follow are:

1. Open a data set.
   a. Start SPSS.
   b. Select File, then Open, then Data.
   c. Select the file you want to open, then select Open.
2. Once the data is visible, select Analyze, then Descriptive Statistics, then Frequencies.
3. Make sure the Display Frequency Tables box is checked.
4. Select the variables you wish to include in your distribution and press the arrow button between the two windows.
5. Select the Statistics button at the bottom of the window.
6. Check the boxes of any of the univariate measures you want to include in your research.
   a. For measures of central tendency (this chapter), check the boxes in the frame Central Tendency, typically the mode, median, and mean.
   b. For measures of dispersion (Chapter 5), check the boxes in the frame Dispersion, typically the standard deviation, variance, and range.
   c. For measures of form (Chapter 6), check the boxes in the frame Distribution, specifically skewness and kurtosis.
7. Select Continue, then OK.
8. An output window should appear containing a distribution similar in format to Table 4-3.

What is your highest level of education?

Value Label              Value   Frequency   Percent   Valid Percent   Cumulative Percent
Less than High School      1         16         4.6          4.8               4.8
GED                        2         59        17.0         17.6              22.3
High School Graduate       3          8         2.3          2.4              24.7
Some College               4        117        33.7         34.8              59.5
College Graduate           5         72        20.7         21.4              81.0
Post Graduate              6         64        18.4         19.0             100.0
Missing                              11         3.2
Total                               347       100.0        100.0

N Valid                     336
N Missing                    11
Mean                       4.08
Median                     4.00
Mode                          4
Std. Deviation            1.460
Variance                  2.131
Skewness                  -.477
Std. Error of Skewness     .133
Kurtosis                  -.705
Std. Error of Kurtosis     .265
Range                         5

Table 4-3 Combination Table for Education from the 1993 Little Rock Community Policing Survey

One caution is in order when discussing the mode. The mode is not the frequency of the number that occurs most often but rather the category (or class) itself. It is easy to want to state that the mode in Table 4-2 is 6 because that is the frequency that is highest. This is not the mode, however; the mode is the category of the value that has the highest frequency: in this case, 3–4.

Data that is in a frequency table also makes calculating the mode easy. What is the mode in the frequency table in Table 4-3? The mode in this case is 4, or some college. Note that in this case, the mode can be written as either 4 or some college. When using nominal or ordinal data where value labels are assigned to the values, the mode can be expressed as either the value (number) or the value label.

The histogram with a polygon overlay for the data in Table 4-3 is shown in Figure 4-2. As shown in the figure, the highest bar on the histogram or the hump in the polygon is at the 4 or some college level. The mode as calculated here is what is obtained from SPSS. In the output in Table 4-3, the mode is identified as some college (4), with a frequency of 117. Notice also that the median, mean, and other measures are included in this table. This is typical univariate output from SPSS. It provides most of the univariate descriptive statistics discussed in this chapter and the two that follow.
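The counting procedure described for the mode can be sketched in a few lines of Python. This is only an illustration, not part of the text or of SPSS: the variable names are made up here, and the class labels are typed with plain hyphens. It uses the Table 4-1 and Table 4-2 values and keeps the caution above in view by reporting the category separately from its frequency.

```python
from collections import Counter

# Table 4-1: prisoner escapes from 15 prisons (ungrouped data).
escapes = [7, 5, 4, 3, 2, 7, 5, 4, 3, 1, 6, 5, 3, 3, 1]

# Count how often each score occurs and take the most frequent one.
counts = Counter(escapes)
mode_value, mode_freq = counts.most_common(1)[0]
print("mode:", mode_value, "frequency:", mode_freq)

# Table 4-2: the same data grouped into classes (class label -> frequency).
grouped = {"7-8": 2, "5-6": 4, "3-4": 6, "1-2": 3}

# The mode is the class with the highest frequency, not the frequency itself.
modal_class = max(grouped, key=grouped.get)
print("modal class:", modal_class, "frequency:", grouped[modal_class])
```

The first result is the score 3 (which occurs 4 times) and the second is the class 3-4 (which has a frequency of 6), matching the values stated in the text.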
Table 4-3 may look somewhat daunting right now, but by the time you finish Chapter 6, a frequency table and univariate output such as this should be shorthand for everything you need to know about a distribution.

[Figure 4-2 Histogram and Polygon of Education Responses. Horizontal axis: Education Type (Less Than High School, GED, High School Graduate, Some College, College Graduate, Post Graduate); vertical axis: Frequency, 0 to 140.]

A distribution is not confined to having only one mode. There are often situations where a distribution will have several categories that have the same or similar frequencies. In these cases, the distribution can be said to be bimodal or even multimodal. It is also possible for a distribution to have no mode if the frequencies are the same for each category. If the data in Table 4-1 is modified, a bimodal, multimodal, and a data set with no mode can be created, as shown in Figures 4-3 to 4-5.

[Figure 4-3 Bimodal Distribution]
[Figure 4-4 Multimodal Distribution]
[Figure 4-5 No-Modal Distribution]

In Figure 4-3, categories 3 and 4 both have the same frequency: 3. In this case, both the 3 and the 4 would be the modes because each has the same (highest) value. In Figure 4-4, the 3, 4, and 5 categories all have frequencies of 3. This means that all three categories would be the mode for this distribution. When almost half of the categories in the distribution represent the mode, its use as a measure of central tendency is reduced. In Figure 4-5, all of the categories have the same frequency.
This does not happen very often, but it is possible, especially in survey research or with other data that have a limited range of categories. The mode as a measure of central tendency in this case is practically useless, although it would be beneficial as a way of stating that all of the values have the same frequency.

Although each of the modes in Figures 4-3 (3 and 4), 4-4 (3, 4, and 5), and 4-5 (2 through 7) has the same frequency, that does not always have to be the rule. There is some debate as to what constitutes a bimodal or multimodal distribution. Some propose that the frequencies have to be the same for a distribution to be multimodal. Others argue that practically any peaks in a distribution can represent modes. For example, in Figure 4-1, some would argue that both the 1–2 category and the 3–4 category represent a mode. These people argue that any peak in a polygon, or any spike in the frequency, may represent a mode. In this text, only the category or categories with the highest frequencies will be designated the mode.

Measures of central tendency are among the oldest of all descriptive statistics. The mean, for example, can be traced back to Pythagoras in the 6th century BC, although its development is surely much earlier. Galton (1883) coined the term median during his work on percentiles, but the procedure was used before this by Fechner for arriving at a value of the “middlemost ordinate.” Finally, Karl Pearson reduced the concept of the “abscissa corresponding to the ordinate of maximum frequency” to the mode in his 1895 work.

As can be seen from these distributions, the mode quickly becomes ineffective when there are multiple modes and is worthless (except for an understanding of the nature of the distribution) when each category has a modal value. This is why the mode is not widely used as a measure of central tendency in statistics except for nominal level data.
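The rule adopted in this text (only the category or categories with the highest frequency count as the mode, and a distribution in which every category ties has no mode) can be sketched as follows. The function name and the three small data sets are hypothetical; they are built to behave like Figures 4-3 to 4-5 and are not the data behind those figures.

```python
from collections import Counter

def modes(values):
    """Return all values tied for the highest frequency.

    Returns an empty list when every category has the same frequency,
    which the text treats as a distribution with no mode.
    """
    counts = Counter(values)
    top = max(counts.values())
    tied = sorted(v for v, c in counts.items() if c == top)
    if len(counts) > 1 and len(tied) == len(counts):
        return []
    return tied

# Hypothetical data sets (assumptions for illustration only).
bimodal = [1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6, 7]
multimodal = [1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7]
no_mode = [2, 2, 3, 3, 4, 4, 5, 5]

for name, data in [("bimodal", bimodal),
                   ("multimodal", multimodal),
                   ("no mode", no_mode)]:
    print(name, modes(data))
```

A looser definition that counts any local peak as a mode, as some authors propose, would need a different function; this sketch follows only the strict highest-frequency rule.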
Median

If the data is at least ordinal level, the median (symbolized by Me) may be a better choice for examining the central tendency of the distribution. The median is the point of the 50th percentile of the distribution. This means that the median is the exact midpoint of a distribution or the value that cuts the distribution into two equal parts. For the simple distribution 1, 2, 3, the median would be 2 because it cuts this distribution in half. Note that 2 is not the most frequently occurring value or the product of some formula but simply the value in the middle. The median will always be the middle value, but sometimes it will be necessary to resort to math to determine the exact middle value.

The median is used with ordinal level data because it does not imply distance between intervals, only direction: above the median or below it. Recall from “Variables and Measurement” (Chapter 2) that the nature of ordinal level data is that you can determine which category is greater than or less than another category, but there are not equal intervals so there is no way to determine how much greater or lesser the category is. The median also works on this principle, determining the midpoint of a distribution such that a category can be said to be less than or greater than the median, but there is no way to tell by how much. For example, take the following two distributions:

1, 2, 3, 3, 4, 4, 5
1, 1, 1, 3, 10, 50, 100

Each has the same number of values, 7, although each has very different numbers. In this case, the modes would be different: 3 and 4 in the first; 1 in the second. Also, the means would be different: 3.14 in the first, 23.71 in the second. The median for both of these distributions, however, would be 3, the middle value in the distribution.
In both distributions there are three values below the median and three values above the median.

The median may be used instead of the mean in a special circumstance where the distribution takes on the quality of being skewed. The mean (discussed in the next section) is often highly influenced by extreme scores. For example, if you were to calculate the mean, or average, age of four people who are 2, 3, 4, and 50 years old, the mean would be 14.75 years old. Obviously, 14.75 is not a good measure of the central value in this distribution, but because of the way the mean is calculated, that is the value that would be obtained. The median of that distribution would be 3.5, which is much more like the central value. Even in the example above, the mean of the second distribution is 23.71, which is not really representative of the distribution. Note, however, that if the variable is interval and the distribution is not skewed, some of the explanatory power of the data is lost using the median as the measure of central tendency.

Median for Ungrouped Data

Calculation of the median for ungrouped data is relatively simple. All that is needed is the N for the distribution. If the N is not given, simply count the number of scores (remember: do not add the scores, count them). The N is then placed in the formula $\frac{N + 1}{2}$. If the data from Table 4-1 were expanded, the median can be calculated as shown in Exhibit 4-1. There are 23 values here. Adding 1 to this number and then dividing by 2 obtains the exact middle of the distribution, in this case the 12th value. Once this value is calculated, if the numbers are not arranged in order, you should do so. This ensures that the middle value of the distribution is actually in the middle and that the numbers are arranged in order.
Then, beginning with the lowest value, simply count up the ungrouped data until the value obtained in the formula is reached (the 12th value in Exhibit 4-1). This is the median. In this case, counting to the 12th value would produce a score of 3. So the median number of escapes in this distribution is 3.

7  6  6  6  6  5  5  5  5  4  4  3  3  3  2  2  2  2  2  2  2  1  1

$\frac{N + 1}{2} = \frac{23 + 1}{2} = \frac{24}{2} = 12$

Exhibit 4-1 Ungrouped Data

There are several issues to note about the median. These are important to understand when interpreting the median. First, be careful when calculating the median because two different numbers must be dealt with. The value that is obtained from the formula is not the median but simply the number of values to count up in the distribution to find the median (or median class for grouped data). The median is the score or class that contains the number from the formula. In the example above, the median is not 12. Twelve is only the number to count up from the beginning of the distribution to find the median, which is 3.

Also, if there is more than one of the same score in the median class (there are three 3’s in Exhibit 4-1), the median is still that score even though it occurs more than once. The key value to look for is the middle value, regardless of how many there are of that particular category. This will be brought up again in terms of calculating the median for grouped data where the class interval is greater than 1.

Finally, unlike the mode, the median does not have to be a value in the distribution. For an odd number of scores (as in Exhibit 4-1) the median will be one of the scores because it is the point that cuts the distribution in half. If there are an even number of scores, however, the median will fall in between two of the scores.
For example, in the distribution 3, 4, 5, 6, 7, 8, 9, 10, the formula $\frac{N + 1}{2}$ would give a value of 4.5. This would put the median between the 6 and 7. When this occurs, the median is a value halfway between the two scores. In this case, the median would be 6.5. This holds true even if the two numbers do not have an interval of 1. For example, in the distribution 5, 6, 8, 10, 11, 12, the number from the formula puts the median between the score of 8 and the score of 10; therefore, the median would be 9.

Median for Grouped Data

For grouped data where the class interval is 1 or where the entire class can be used as the median, the process for finding the median is essentially the same as that for ungrouped data. The first step is to find the value to count up in the distribution using the formula $\frac{N + 1}{2}$. Then, simply count up the frequency of each class to find the median class. If the data from Exhibit 4-1 is grouped into a frequency distribution it would look like Exhibit 4-2.

The first step is to determine the midpoint using the formula. Since the data in Exhibit 4-2 has not changed, there are still 23 values (escapes). Plugging this value into the formula would, as before, result in a value of 12. Since the values are in a frequency distribution, they are probably already ordered. Although it is possible to find the median beginning from either the lowest or highest category, it is best for consistency to begin with the lowest value. In this case, you would begin with the class of 1 and count the frequencies until the 12th value is reached, which is 3. Note that it is possible here to count from 10 to 12 and still be in the 3 class. This is fine, as long as the value we are looking for, 12, is one of the numbers in that class. If the middle value from the calculation had been a 10, 11, or 12, the 3 class would still have been the median class.
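The count-up procedure for ungrouped data, including the even-N case, can be sketched as below. The function name is an assumption made for this illustration. It returns both numbers the text warns about keeping apart: the position obtained from (N + 1)/2 and the median itself.

```python
import statistics

def median_by_position(scores):
    """Return (position, median) using the (N + 1) / 2 count-up rule."""
    ordered = sorted(scores)            # the scores must be in order first
    n = len(ordered)                    # count the scores, do not add them
    position = (n + 1) / 2              # 1-based position to count up to
    lower = ordered[int(position) - 1]  # score at the whole-number position
    if position == int(position):       # odd N: the median is an actual score
        return position, lower
    upper = ordered[int(position)]      # even N: halfway between two scores
    return position, (lower + upper) / 2

# Exhibit 4-1: 23 escape counts.
exhibit_4_1 = [7, 6, 6, 6, 6, 5, 5, 5, 5, 4, 4, 3, 3, 3,
               2, 2, 2, 2, 2, 2, 2, 1, 1]
print(median_by_position(exhibit_4_1))

# The two even-N examples from the text.
print(median_by_position([3, 4, 5, 6, 7, 8, 9, 10]))
print(median_by_position([5, 6, 8, 10, 11, 12]))

# The skewed ages example: the median stays near the bulk of the scores,
# while the mean is pulled toward the extreme value.
ages = [2, 3, 4, 50]
print(median_by_position(ages), statistics.mean(ages))
```

For Exhibit 4-1 the position is 12 and the median is 3; the even-N examples give 6.5 and 9; for the ages the median is 3.5 while the mean is 14.75.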
X    f
7    1
6    4
5    4
4    2
3    3
2    7
1    2
N    23

$\frac{N + 1}{2} = \frac{23 + 1}{2} = \frac{24}{2} = 12$

Exhibit 4-2 Grouped Data with an Interval Class of 1

The same procedure is used if the data is grouped with a class interval greater than 1 but where a median class is sufficient. The process for calculating the median where only the median class is desired is shown in Exhibit 4-3. This frequency distribution has the same N as the previous distributions, so the first step will be the same: calculating the value to count up to. Here again, the value is 12. Beginning with the lowest class and counting up to 12 will put you in the 16–20 class. This class contains between the 10th and 14th cases, but because it contains the value from the calculation, it is the median class.

Looking again at Table 4-3, the data in this distribution could be either nominal or ordinal. It could be argued, for example, that including a GED in the distribution disrupts the ordering of the categories such that the data should properly be called nominal. It could also be argued, however, that the categories are sufficiently ordered to be called ordinal. For that reason, and to ensure some consistency of examples, the same frequency distribution used to discuss the mode is used here to discuss output for the median.

The median in Table 4-3 is the same as the mode, some college (4). This was obtained in the same manner as described above:

$\frac{347 + 1}{2} = 174$

Counting up in the distribution (beginning at the 1 category, which is on the top in this example) puts the median in the 4th category (16 + 59 + 8 + 117). Since this category contains between the 83rd and 200th cases, it contains the 174th value. Also, since the category containing the median is sufficient in this instance, the median is said to be some college, or 4.
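Finding the median class by counting up cumulative frequencies can be sketched as follows. The function name and the list-of-pairs layout are assumptions for this illustration, and the classes must be listed from lowest to highest. The Exhibit 4-3 frequencies used here are the ones the text gives for the classes 1–5 through 31–35 (2, 3, 4, 5, 4, 3, 2).

```python
def median_class(freq_table):
    """Return (position, class label) for the class containing the median.

    freq_table is a list of (label, frequency) pairs ordered from the
    lowest class to the highest class.
    """
    n = sum(f for _, f in freq_table)
    position = (n + 1) / 2              # value to count up to
    cumulative = 0
    for label, f in freq_table:
        cumulative += f                 # running count of cases so far
        if cumulative >= position:      # this class contains the position
            return position, label

# Exhibit 4-2: class interval of 1.
exhibit_4_2 = [(1, 2), (2, 7), (3, 3), (4, 2), (5, 4), (6, 4), (7, 1)]

# Exhibit 4-3: class interval of 5.
exhibit_4_3 = [("1-5", 2), ("6-10", 3), ("11-15", 4), ("16-20", 5),
               ("21-25", 4), ("26-30", 3), ("31-35", 2)]

# Table 4-3, valid cases only (the 11 missing cases are left out here).
table_4_3 = [("Less than High School", 16), ("GED", 59),
             ("High School Graduate", 8), ("Some College", 117),
             ("College Graduate", 72), ("Post Graduate", 64)]

print(median_class(exhibit_4_2))
print(median_class(exhibit_4_3))
print(median_class(table_4_3))
```

Note that the Table 4-3 call uses the 336 valid cases, so its position is 168.5 rather than the 174 obtained in the text from N = 347; both positions fall in the some college category. The sketch also does not treat specially a position ending in .5 that falls exactly between two classes.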
X        f
31–35    2
26–30    3
21–25    4
16–20    5
11–15    4
6–10     3
1–5      2
N        23

Step 1. Find the median interval: $\frac{N + 1}{2} = 12$.
Step 2. Count up in the frequency for that class.
Step 3. That is the median class for this distribution (16–20).

Exhibit 4-3 Calculating Median Class for Grouped Data

Calculating an exact median for grouped data with a class interval greater than 1 is somewhat more complicated. Using the data set in Exhibit 4-3, the procedure to calculate an exact median begins as all others, with the formula $\frac{N + 1}{2}$. This produces the value 12, the same as in previous examples. This means that the 16–20 class is the median class. As stated above, we could count to 14 in this class, beyond the 12 needed to establish a median value. The question then becomes: Where in this class does the median lie? To find out requires interpolation within the class. Assuming the scores are evenly distributed within the class (note 1), the formula for calculating the exact median is

$$Me = L_m + \left(\frac{0.5N - cf_{bm}}{f_m}\right) i$$

where $L_m$ is the lower limit of the median class, $cf_{bm}$ the cumulative frequency of the interval below the median class, $f_m$ the frequency of the median class, and $i$ the width of the interval of the median class. Using this formula with the data from Exhibit 4-3, the median is calculated as follows:

$$\begin{aligned}
Me &= 15.5 + \left(\frac{0.5(23) - 9}{5}\right) 5 \\
   &= 15.5 + \left(\frac{11.5 - 9}{5}\right) 5 \\
   &= 15.5 + \left(\frac{2.5}{5}\right) 5 \\
   &= 15.5 + (0.5)5 \\
   &= 15.5 + 2.5 \\
   &= 18
\end{aligned}$$

N is 23, as in all other examples. The value of 15.5 is the lower limit of the 16–20 class. The cumulative frequency is determined by adding up all the frequencies below the class containing the median. In this case, there are three classes below the median class: 1–5, 6–10, and 11–15. The frequencies of these classes (2, 3, and 4, respectively) equal 9 ($cf_{bm}$).
The frequency of the class containing the median in this case (16–20 class) is 5 ($f_m$). Finally, the interval is calculated by subtracting the lower limit of the median class from the upper limit (20.5 − 15.5 = 5). In this case, the result of the calculations shows that the exact midpoint of the distribution is 18 (note 2).

Calculating the exact median in actual research is less often necessary. Most statistical programs report the exact median from the ungrouped data, or the researchers report only the median class or report the midpoint of the median category as the median. There are times, however, when it is necessary to determine the exact median from information in journal articles. For example, you may wish to know the exact median from Table 4-4. This table shows categories that are not only greater than 1 but are unequal. The procedure would be the same as discussed above, however. Here, the median category would be 51 to 75% [(40 + 1)/2 = 20.5]. Interpolating where in that class the exact median would be involves using the formula given above. Application of that formula for the data in Table 4-4 is shown below.

What portion of your professional research focuses on racial or ethnic issues?

Portion     Number   Percentage
0–10%          6         15
11–25%         1          2
26–50%        12         30
51–75%        14         35
Over 75%       7         18
Total         40

Source: Edwards, White, and Pezzella (1998).

Table 4-4 Research on Racial or Ethnic Issues

$$\begin{aligned}
Me &= L_m + \left(\frac{0.5N - cf_{bm}}{f_m}\right) i \\
   &= 50.5 + \left(\frac{0.5(40) - 19}{14}\right) 25 \\
   &= 50.5 + \left(\frac{20 - 19}{14}\right) 25 \\
   &= 50.5 + \left(\frac{1}{14}\right) 25 \\
   &= 50.5 + 0.07(25) \\
   &= 50.5 + 1.75 \\
   &= 52.25
\end{aligned}$$

As you would expect, the median does not go very far into the median class in this example. This is evident because the frequency below the median class is 19, and the median position, (N + 1)/2, is only 20.5.
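The interpolation formula for the exact median can be checked with a short function. The function and parameter names are assumptions chosen to mirror the symbols in the formula (lower limit, N, cumulative frequency below, frequency of the median class, interval width); nothing here comes from SPSS.

```python
def interpolated_median(lower_limit, n, cf_below, f_median, width):
    """Exact median for grouped data: L_m + ((0.5 * N - cf_bm) / f_m) * i.

    Assumes the scores are evenly distributed within the median class.
    """
    return lower_limit + ((0.5 * n - cf_below) / f_median) * width

# Exhibit 4-3: median class 16-20, so L_m = 15.5, cf_bm = 9, f_m = 5, i = 5.
print(interpolated_median(15.5, 23, 9, 5, 5))

# Table 4-4: median class 51-75%, so L_m = 50.5, cf_bm = 19, f_m = 14, i = 25.
print(round(interpolated_median(50.5, 40, 19, 14, 25), 2))
```

The first call reproduces the median of 18. The second gives about 52.29 rather than 52.25, because the worked example rounds 1/14 to 0.07 before multiplying by 25, while the function carries the unrounded fraction through.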
This process is complicated somewhat when the median class is open-ended. For example, what is the midpoint in a distribution where the upper category for annual income is $30,000 and greater? There are several methods of dealing with this issue. Probably the best is to attempt to determine what a reasonable midpoint might be. This is also shown in the example in Table 4-3, which has two open-ended categories: less than high school and postgraduate. This would make it difficult to determine, for example, where the midpoint of a postgraduate degree would lie (some graduate work, master’s degree, law degree, etc.). This would have to be a judgment call by the researcher based on theory and an understanding of the data.

Mean

A statistician is a person who stands in a bucket of ice water, sticks his head in an oven, and says, “On average, I feel fine.” (Unknown)

The most popular measure of central tendency, both among statisticians and the general population, is the mean. The mean is used primarily for interval and ratio level data. Because it assumes equality of intervals, the mean is generally not used with nominal or ordinal level data. The mean is very important to statistical analysis because it is the basis, along with the variance (see the discussion of measures of dispersion in Chapter 5), of many of the formulas for higher-order statistical procedures. The mean also serves as a check on the integrity of the data. As discussed above, the mean is often heavily influenced by extreme scores. So if a 17 has been mistyped as 177, the mean will be much larger than expected. Mean scores outside what would be expected for the data should be a signal to recheck the data.

There are actually several different versions of the mean.
The mean discussed in this chapter is the arithmetic mean (from here on, called the mean). There are variations of the mean that are less utilized in social science research and are not discussed here. These include the weighted mean, harmonic mean, and geometric mean.

The symbolic notation for the mean is different than symbols that have been used to this point. The mean is symbolized either by $\mu$ or $\bar{X}$, depending on whether the data is a population or sample estimate (this distinction is used most often in Chapter 15 and beyond). It is interesting that descriptive statistics deals with a population, but it has become convention that the symbol most commonly used for the mean in descriptive statistics is actually the symbol for the sample mean ($\bar{X}$). Since most texts use this notation for the mean in descriptive analyses, it will also be used here, even though the more proper notation would be the population mean ($\mu$).

The mean is simply the average of all the values in a distribution. To obtain the mean, add up the scores in a distribution and divide by N (just as in calculating an average). In statistical terms, the mean is calculated as

$$\bar{X} = \frac{\sum fx}{N}$$

where $fx$ is calculated by multiplying X times the frequency for each value. In the example used in Exhibit 4-2, the mean would be calculated as in Exhibit 4-4. Here, each X is multiplied by the frequency for that category (7 × 1, 6 × 4, etc.). That creates an fx column in the table, which is then summed to obtain $\sum fx$ (84). That value is then divided by the N for the distribution (23) to obtain the mean for the distribution. In this case, there were 23 prisons that had a total of 84 escapes, so the mean (average) number of escapes for these 23 prisons was 3.65 escapes.

X    f    fx
7    1     7
6    4    24
5    4    20
4    2     8
3    3     9
2    7    14
1    2     2
N = 23    Σfx = 84

$\bar{X} = \frac{\sum fx}{N} = \frac{84}{23} = 3.65$

Exhibit 4-4 Calculating the Mean
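The frequency-table calculation of the mean in Exhibit 4-4 can be sketched as below. The function name and the list-of-pairs layout are assumptions for this illustration. As a cross-check, the same mean is computed from the 23 raw scores of Exhibit 4-1 with Python's standard `statistics.mean`.

```python
import statistics

def mean_from_frequencies(freq_table):
    """Return (sum of fx, N, mean) for a list of (X, f) pairs."""
    n = sum(f for _, f in freq_table)            # N: total number of cases
    sum_fx = sum(x * f for x, f in freq_table)   # the summed fx column
    return sum_fx, n, sum_fx / n

# Exhibit 4-4: X values with their frequencies.
exhibit_4_4 = [(7, 1), (6, 4), (5, 4), (4, 2), (3, 3), (2, 7), (1, 2)]
sum_fx, n, mean = mean_from_frequencies(exhibit_4_4)
print(sum_fx, n, round(mean, 2))

# Cross-check against the raw, ungrouped scores from Exhibit 4-1.
exhibit_4_1 = [7, 6, 6, 6, 6, 5, 5, 5, 5, 4, 4, 3, 3, 3,
               2, 2, 2, 2, 2, 2, 2, 1, 1]
print(round(statistics.mean(exhibit_4_1), 2))
```

Both routes give 84 escapes over 23 prisons, or a mean of about 3.65.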
The procedure for calculating the mean for grouped and ungrouped data is the same. The only difference is that for grouped data where the class interval is greater than 1, the midpoint of the class is used as X (note 3). For example, in the frequency distribution in Exhibit 4-3, the midpoints of the classes would be 3 (the class width is 5.5 − 0.5 = 5, half of that is 5/2 = 2.5, and 0.5 + 2.5 = 3), 8, 13, and so on. These are the values that would be used for X in the formula for the mean.

The mean can be estimated from the example output that was used for the mode and median (as shown in Table 4-3). Note that this data is not interval or ratio level and is used here only to show the similarities and differences among the mean, median, and mode. Even though this is nominal/ordinal level data, SPSS treats it as interval level and uses the formula above for calculating the mean. In this example, each of the category values (1 through 6) is multiplied by the frequency for that category (1 × 16, 2 × 59, etc.). This fx value is summed to achieve a total of 1370. This is then divided by N minus the 11 missing values (347 − 11 = 336). The result is 4.077, which is what SPSS reported (rounded to 4.08).

In most cases in real research, the mean may not be accompanied by a frequency distribution, or the frequency distribution will be more for presentation than for analysis. In such cases, the mean may be reported alone, or it could be reported as part of a discussion or table of univariate statistics associated with the research.

The mean has several advantages over other measures of central tendency. From a practical standpoint, the mean is preferred because it is standardized. This means it can be compared across distributions. This is very beneficial when comparing similar data from different sources, such as the mean number of prisoners per institution in several states, because the two values can be directly compared. The mean is also important because the sum of the deviations of the scores from the mean is always
The mean is also important because the sum of the deviations of the scores from the mean is always zero. That is, if the mean were subtracted from each value in a distribution, the sum of those differences would be zero. This is discussed in detail in Chapter 5. A final important characteristic of the mean is that the sum of the squared deviations from the mean is the smallest value for summed deviations (smaller than if the same calculations were made for the mode or median). This principle of sum of squares is very important to our discussions in Chapter 5 of the variance and sum of squares as they relate to regression lines.

As discussed above, the greatest problem with the mean is that it is greatly influenced by extreme scores in the distribution. The example in the section on the median, where a mean age of 15 was obtained when all but one of the values was less than 5, shows how much the mean can be influenced by extreme scores. That is why the median is used in cases where the data is skewed.

4-3 Selecting the Most Appropriate Measure of Central Tendency

The goal of many statistical analyses is to be able to develop summary statements, often about a large amount of data. Proper summarization depends on several factors, including the level of data, the nature of the data, the purpose of the summarization, and the interpretation.

The level of data has a substantial influence on which measure of central tendency should be used. As stated earlier, one measure is most appropriate for a particular level of data. The mode is most appropriate for nominal level data, and its use with ordinal and interval level data would result in a loss of power in terms of the information that could be gained from the data. The median is most appropriate with ordinal level data.
Although it can be used with interval level data (especially skewed distributions), it should not be used with nominal level data because the rankings assumed in the median cannot be achieved with nominal level data. Finally, the mean should be used only with interval or ratio level data because it assumes equal intervals of the data that cannot be achieved by nominal and partially ordered ordinal level data. The exception here is that the mean can be used with dichotomized nominal level data because this type of data approximates interval level characteristics.

Selection of the most appropriate measure of central tendency is also sometimes based on the nature of the distribution. As discussed above, if a distribution is highly skewed, or if it can be determined that there are some extreme values (outliers) in the distribution that would make the mean inaccurate as a measure of central tendency, the median should be used rather than the mean.

The second criterion for choosing a measure of central tendency is the purpose of summarization, typically in terms of what you are trying to predict. Imagine that you were asked to state one measure that would best capture the nature of a distribution. How would you go about that? To put it another way, you might bet $100 to guess a number drawn at random from a distribution. Which number would you choose? One way to address these questions would be to find the score that would be at the “heart” of the distribution: the most common score, the one that cut the distribution in half, or the average score. That is the goal and the role of measures of central tendency.

There are several ways to go about this. If you knew all the values in the distribution, you could calculate the mode easily and quickly.
If you are interested in predicting an exact value, you should probably use the mode because it has the highest probability of occurring in any given distribution. Both the median and the mean may produce values that are not in the distribution, so if you must guess and be absolutely right as to the number, use the mode. For example, say you are taking a multiple-choice test and have no idea which answer to a certain question is correct. If you had the distribution of correct answers for that professor for that test, you would want to choose the modal answer rather than the median or mean. This is because you must get the answer correct or it does not count. As another example, consider a prediction based on driving a car around an obstruction placed in front of it. If tests occur over a number of drivers, the distribution would be bimodal: some steering to the left and some to the right. A suggested course of action would not be the median or mean, however, as that would have the vehicle crashing into the obstacle even though it minimized the error in steering.

If, on the other hand, you want to maximize your prediction by getting closest to the number over several tries, thereby minimizing your error, the median might be a better choice. Here, whether you miss high or low is irrelevant; what is important is the size of the error. In a popular game show, contestants are given $7 and required to guess the exact numbers included in the price of a car. For each number they are off, they lose $1. If they have money left over after making all the guesses, they win the car; if they run out of money, they lose. The probability of response plays a big part in the first two or three numbers. You would not want to guess 9 for the first number, for example. If contestants are at the fourth or fifth number, however, and still have money left, they may want to choose the median value (probably a 5) to minimize the error (loss of dollars).
Being high or low does not matter here, only deviation from the number.

Finally, if you have the opportunity to average your misses over several guesses and the signs do matter (high guesses can offset low guesses), the mean is the best choice. The mean is good in that if you do not know a value, it is often best to choose the average. For example, if you had to guess the weight of a woman whom you had never seen, you should probably choose the mean weight for women because this would minimize the error. The mean is also practically the only choice when using the estimates in higher-order analyses because the mathematical properties of both the mode and the median are such that they do not lend themselves to inclusion in other formulas. The mean is less efficient, however, with highly skewed distributions.

The final criterion for selecting a particular measure of central tendency is the interpretation. If you choose the wrong level of measurement and base your measure of central tendency on that choice, your interpretation may very well not make sense. For example, for a nominal level variable such as paint color, the mode makes sense (more people chose red than any other color). The median does not make much sense, however. For example, if you say half or fewer of the respondents chose red, what does that mean? There is no reference point because there is no order. The same holds true for the mean. How could you interpret an average of 1.8 on paint color; that the average color chosen was slightly different from red? It is easier to use lower measures of central tendency with higher levels of measurement, but you lose some of the power of your interpretation. For example, it is technically correct to say the modal age in a class is 20, but it is not as precise as saying the average age is 22.4.
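The three prediction goals discussed in this section (being exactly right, minimizing the size of the error, and letting high and low misses offset one another) can be illustrated with a short Python sketch. It is an illustration rather than a procedure from the text: it reuses the escape counts from Exhibit 4-4, relies on Python's standard statistics module, and the helper names are hypothetical.

```python
from statistics import mean, median, mode

# Escape counts from Exhibit 4-4, expanded from {X: f} into 23 raw scores.
freq = {7: 1, 6: 4, 5: 4, 4: 2, 3: 3, 2: 7, 1: 2}
scores = [x for x, f in freq.items() for _ in range(f)]

guesses = {"mode": mode(scores), "median": median(scores), "mean": mean(scores)}


def exact_hits(guess):
    """Number of scores the guess matches exactly."""
    return sum(1 for x in scores if x == guess)


def absolute_error(guess):
    """Total size of the misses, ignoring whether they are high or low."""
    return sum(abs(x - guess) for x in scores)


def signed_error(guess):
    """Total of the misses when high and low misses may offset."""
    return sum(x - guess for x in scores)


def squared_error(guess):
    """Sum of squared deviations from the guess."""
    return sum((x - guess) ** 2 for x in scores)


for name, g in guesses.items():
    print(f"{name:>6}: guess={g:.2f} hits={exact_hits(g)} "
          f"abs={absolute_error(g):.2f} signed={signed_error(g):.2f} "
          f"squared={squared_error(g):.2f}")
```

On this particular distribution the mode (2) matches 7 of the 23 scores exactly, the median (3) gives the smallest total absolute error (37, against 42 for the mode), and the mean gives signed errors that sum to zero (up to floating-point rounding) along with the smallest sum of squared deviations.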
4-4 Conclusion

In this chapter, we introduced univariate analyses by discussing the first of the univariate descriptive statistics, measures of central tendency. Measures of central tendency are among the most used descriptive statistics and provide the most information. For example, if you were asked about a group of people, you might answer in terms of an average age or average income.

The measures of central tendency provide the information that their name implies: a measure of the central value. Think of a seesaw. For a seesaw to work properly, it must have a balance point in the middle so the weight is distributed generally equally on each side (as in Figure 4-6). Here, the measure of central tendency is at the balance point of the distribution. The picture of a seesaw, however, could easily be replaced with a histogram of a frequency distribution. If only the X axis were retained, the seesaw could look like the bar chart in Figure 4-7. This distribution is actually unique in that, mathematically, the mean equals 4. Since 4 is the most frequently occurring value, it is also the mode; and because 4 is the middlemost point in the distribution, it is also the median. If the values of the distribution were changed some, the balance point would have to shift to keep the balance of the distribution. For example, in Figure 4-8, the mean, median, and mode are at different points. This is because of the spread and alignment of the values in the distribution.

You can see that just knowing the measure of central tendency is not always enough. Sometimes it is also important to know how spread out the values are or how they are arranged in the distribution. This is the reason that more than measures of central tendency are needed for a proper description of data.
In Chapter 5, we address how spread out the values are in the distribution, and in Chapter 6 we discuss the arrangement of the data within the distribution. Together, these three pieces of information make up the complete analysis of a single variable (univariate analysis).

Figure 4-6 Balancing a Distribution on the Measure of Central Tendency

Figure 4-7 Histogram of Balanced Frequency Distribution (X axis: values 1 through 7)

Figure 4-8 Histogram of Unbalanced Distribution (X axis: values 1 through 7; the mode, mean, and median are marked at different points)

4-5 Key Terms

central tendency
dispersion
form
mean
median
mode
sum of squares

4-6 Summary of Equations

Median (Me) for ungrouped data

\[ \frac{N + 1}{2} \]

Median (Me) for grouped data

\[ Me = L_m + \left( \frac{0.5N - cf_{bm}}{f_m} \right) i \]

Mean (X̄)

\[ \bar{X} = \frac{\sum fx}{N} \]

4-7 Exercises

The exercises for this chapter and Chapters 5 and 6 use the same examples. This will allow you to work through problems using all three types of univariate descriptive statistics.

1. For the set of data below, calculate:
   a. The mode
   b. The median
   c. The mean

   10, 12, 14 6, 7, 8, 10, 10, 8

2. For the set of data below, calculate:
   a. The mode
   b. The median
   c. The mean

   7, 4, 2, 3, 4, 5, 8, 1, 9, 4

3. For the set of data below, calculate:
   a. The mode
   b. The median
   c. The mean

   Interval    Midpoint    Frequency
   90–100                  6
   80–89                   8
   70–79                   4
   60–69                   3
   50–59                   2

4. For the set of data below, calculate:
   a. The mode
   b. The median
   c. The mean

   Interval    f
   90–100      5
   80–89       7
   70–79       9
   60–69       4

5. For each of the variables in the frequency tables that follow (from the gang database), describe the level of measurement for each variable and how you determined your answer.

6. Using the frequency tables that follow (from the gang database), discuss the three measures of central tendency.

HOME: What type of house do you live in?

Value Label    Value    Frequency    Percent    Valid Percent    Cumulative Percent
House          1        280          81.6       82.4             82.4
Duplex         2        3            .9         .9               83.2
Trailer        3        34           9.9        10.0             93.2
Apartment      4        21           6.1        6.2              99.4
Other          5        2            .6         .6               100.0
Missing                 3            .9
Total                   343          100.0      100.0

N Valid                   340
N Missing                 3
Mean                      1.41
Std. Error of Mean        .051
Median                    1
Mode                      1
Std. Deviation            .945
Variance                  .892
Skewness                  2.001
Std. Error of Skewness    .132
Kurtosis                  2.613
Std. Error of Kurtosis    .264
Range                     5

ARREST: How many times have you been arrested?

Value      Frequency    Percent    Valid Percent    Cumulative Percent
0          243          70.8       86.2             86.2
1          23           6.7        8.2              94.3
2          10           2.9        3.5              97.9
3          3            .9         1.1              98.9
5          2            .6         .7               99.6
24         1            .3         .4               100.0
Missing    61           17.8
Total      343          100.0      100.0

N Valid                   282
N Missing                 61
Mean                      .30
Std. Error of Mean        .093
Median                    0
Mode                      0
Std. Deviation            1.567
Variance                  2.455
Skewness                  12.692
Std. Error of Skewness    .145
Kurtosis                  187.898
Std. Error of Kurtosis    .289
Range                     24

TENURE: How long have you lived at your current address (months)?
Value      Frequency    Percent    Valid Percent    Cumulative Percent
1          14           4.1        4.3              4.3
2          6            1.7        1.8              6.1
3          4            1.2        1.2              7.3
4          4            1.2        1.2              8.6
5          6            1.7        1.8              10.4
6          6            1.7        1.8              12.2
7          1            .3         .3               12.5
8          3            .9         .9               13.5
9          2            .6         .6               14.1
10         1            .3         .3               14.4
11         1            .3         .3               14.7
12         11           3.2        3.4              18.0
14         1            .3         .3               18.3
18         5            1.5        1.5              19.9
21         1            .3         .3               20.2
24         30           8.7        9.2              29.4
30         1            .3         .3               29.7
31         1            .3         .3               30.0
32         1            .3         .3               30.3
36         22           6.4        6.7              37.0
42         1            .3         .3               37.3
48         12           3.5        3.7              41.0
60         24           7.0        7.3              48.3
72         14           4.1        4.3              52.6
76         1            .3         .3               52.9
84         8            2.3        2.4              55.4
96         18           5.2        5.5              60.9
108        4            1.2        1.2              62.1
120        9            2.6        2.8              64.8
132        11           3.2        3.4              68.2
144        21           6.1        6.4              74.6
156        13           3.8        4.0              78.6
168        11           3.2        3.4              82.0
170        5            1.5        1.5              83.5
180        7            2.0        2.1              85.6
182        2            .6         .6               86.2
186        1            .3         .3               86.5
192        14           4.1        4.3              90.8
198        1            .3         .3               91.1
204        24           7.0        7.3              98.5
216        3            .9         .9               99.4
240        2            .6         .6               100.0
Missing    16           4.7
Total      343          100.0      100.0

N Valid                   327
N Missing                 16
Mean                      88.77
Std. Error of Mean        3.880
Median                    72
Mode                      24
Std. Deviation            70.164
Variance                  4923.055
Skewness                  .365
Std. Error of Skewness    .135
Kurtosis                  −1.284
Std. Error of Kurtosis    .269
Range                     239

SIBS: How many brothers and sisters do you have?
Value      Frequency    Percent    Valid Percent    Cumulative Percent
0          39           11.4       11.5             11.5
1          137          39.9       40.5             52.1
2          79           23.0       23.4             75.4
3          39           11.4       11.5             87.0
4          17           5.0        5.0              92.0
5          13           3.8        3.8              95.9
6          6            1.7        1.8              97.6
7          4            1.2        1.2              98.8
9          1            .3         .3               99.1
10         1            .3         .3               99.4
12         1            .3         .3               99.7
15         1            .3         .3               100.0
Missing    5            1.5
Total      343          100.0      100.0

N Valid                   338
N Missing                 5
Mean                      1.94
Std. Error of Mean        .098
Median                    1
Mode                      1
Std. Deviation            1.801
Variance                  3.245
Skewness                  2.664
Std. Error of Skewness    .133
Kurtosis                  12.027
Std. Error of Kurtosis    .265
Range                     15

4-8 References

Edwards, W. J., White, N., Bennett, I., & Pezzella, F. (1998). Who has come out of the pipeline? African Americans in criminology and criminal justice. Journal of Criminal Justice Education, 9(2), 249–266.

Galton, F. (1883). Inquiries into Human Faculty and Its Development. London, England: Macmillan.

Pearson, K. (1895). Classification of asymmetrical frequency curves in general: Types actually occurring. Philosophical Transactions of the Royal Society of London (Series A, Vol. 186). London, England: Cambridge University Press.

4-9 Notes

1. This may not be a valid assumption, and it is possible, for example, that all the scores could be 14, but it would be impossible to calculate the median without deconstructing the values, so an assumption is made that all values in the median class are equally distributed.

2. For future reference, this formula is the same (except for the 0.5) as the one used for computing percentiles because the median is the 50th percentile of the distribution.

3. This procedure assumes closed intervals for each class. If you have a situation, say, where the oldest category of an age distribution is “6 and above,” it is more difficult to determine the midpoint. It is sometimes necessary to make an estimate of where the central value of the class might be.