POL221 Minnesota State University Mankato Sigma Summation Sign Questions

Minnesota State University Mankato

Question Description

I’m stuck on a Political Science question and need an explanation.

- Please Answer the problem set paper questions and also try to explain each step you did and explain what you did please.

- I'll attach the problem set paper and also I will attach the handouts that would help to understand the concepts.



Unformatted Attachment Preview

Political Science 221
Problem Set #1, Spring 2020

Please answer all of the following problems on a separate sheet of paper. Your answers may be handwritten. Be sure to show all steps in your calculations. (30 points total)

1. (1 point each) Given that k = {11.2, -4, 17, 11, 0.6, -4.5, 0, -8, -31, 13, 15.9, -3.5}, find the following:
   a) k_3
   b) k_2 − k_6
   c) ∑_{u=2}^{11} k_u
   d) ∑_{u=1}^{5} k_u
   e) ∑ k_u
   f) (item not legible in the preview)
   g) (item not legible in the preview)
   h) Mode(k)
   i) MD(k)
   j) An outlier.

2. (2 points each) Given that l = {4, 2, 1, 6, 8, 7}, solve for m in each of the following. Show all of your work at each step:
   a) m = ∑_{i=3}^{6} l_i
   b) (item not legible in the preview)

3. Suppose that v is the data of an entire population and w is a sample taken from v. Given that:
   v = {7, 5, 4, 4, 9, -2, 11, 5, 2, 0, -2, 5, -4, 6, 9}
   w = {4, 5, -2, 2, 6, 5, -4, 9}
   Find each of the following (4 points each):
   a) σ²_v
   b) σ_v
   c) s²_w
   d) s_w

POL 221: Political Analysis
Scott Granberg-Rademacker
Handout #1

Measures of Central Tendency

Measures of central tendency are mathematical operations which supply information about the "typical" observation in a set or variable. There are several measures of central tendency, each with different pros and cons: expected values (sometimes called expectations, means, or averages), medians, and modes. Expected values (usually denoted E(X) or x̄) are most commonly used in practice, but there are applications where medians (denoted x̃) or modes may prove to be a better indicator of what the "typical" observation is like.

Most of the time, the expected value is identical to the simple average, which is nothing more than the arithmetic mean of a set or variable. Simple averages, however, assume that the probability of each observation is equal: P(x_1) = P(x_2) = · · · = P(x_k). If X is a discrete stochastic variable, the simple average can be found as follows:

    E(X) = x̄ = (∑_{i=1}^{n} x_i) / n        (1)

However, that assumption may or may not be true.
If the probabilities associated with each observation are different, then the expected value is a weighted average. Consider the expected value of a variable x where the probability of each possible observation is different. In a case like this, the expected value is simply each observation times its probability:

    E(X) = x̄ = ∑_{i=1}^{n} x_i f(x_i)        (2)

The problem with weighted averages in practice is that we often do not know the exact probabilities that make up f(x) (remember that f(x) is the probability density function of x). When these probabilities are not known, the most common approach is to simply assume that the probabilities are all the same and use the simple average formula.

One of the main problems with using expected values is that the influence of outliers is poorly mitigated. Basically, extreme values which are not "typical" of the other observations may heavily skew the expected value. Consider two variables:

    a = {3, 4, −2, 4, 5, 3}
    b = {3, 4, −2, 4, 5, 3, 170}

The only difference between the two is that b has one more observation than a, but that single observation is clearly much different from the rest. Such abnormal observations are outliers, which can badly skew the expected value:

    ā = (∑ a_i)/n = (3 + 4 + (−2) + 4 + 5 + 3)/6 = 17/6 = 2.83
    b̄ = (∑ b_i)/n = (3 + 4 + (−2) + 4 + 5 + 3 + 170)/7 = 187/7 = 26.71

So, how can one account for extreme outliers while still getting a good idea about the "typical" observation? One possibility is to use the median. The median of a set or variable is the value that has just as many values greater than it as less than it. When the set or variable has an even number of observations, the median is the average of the two middle values. When the set or variable has an odd number of observations, the median is simply the middle value.
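The outlier effect just described is easy to verify with Python's standard `statistics` module (a minimal sketch using the handout's variables a and b):

```python
import statistics

# a and b differ only by the single outlier 170
a = [3, 4, -2, 4, 5, 3]
b = [3, 4, -2, 4, 5, 3, 170]

# The outlier drags the mean far away from the "typical" value...
print(round(statistics.mean(a), 2))   # 2.83
print(round(statistics.mean(b), 2))   # 26.71

# ...while the median barely moves.
print(statistics.median(a))           # 3.5
print(statistics.median(b))           # 4
```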
It is important to note for discrete variables that the median will always satisfy the following condition:

    P(X ≤ x̃) ≥ 0.5 and P(X ≥ x̃) ≥ 0.5        (3)

Finding the median is quite simple. The first step is to arrange the values in the variable(s) from least to greatest. Let us denote the arranged variables as a* and b*:

    a* = {−2, 3, 3, 4, 4, 5}
    b* = {−2, 3, 3, 4, 4, 5, 170}

When the total number of observations is odd, the median can be found using the following formula:

    x̃ = x*_{(n+1)/2}        (4)

and when the total number of observations is even:

    x̃ = (x*_{n/2} + x*_{n/2+1}) / 2        (5)

Since a has six observations (n = 6), we use Equation 5 to find the median of a:

    ã = (a*_{n/2} + a*_{n/2+1})/2 = (a*_3 + a*_4)/2 = (3 + 4)/2 = 7/2 = 3.5

Finding the median of b is simply a matter of using Equation 4, since b has an odd number of observations (n = 7):

    b̃ = b*_{(n+1)/2} = b*_{(7+1)/2} = b*_4 = 4

When we compare the means and medians of a and b, one can see that they are not the same:

    ā = 2.83, ã = 3.5
    b̄ = 26.71, b̃ = 4

However, both the mean and median are fairly "typical" of a, which is to be expected since there is no extreme outlier in a. Note that the mean of b has been heavily skewed by the outlier, but the median of b easily mitigates the outlier's impact. This illustrates one of the nice properties of the median: it tends to be resistant to outliers.

Another measure of central tendency, one that is not used very often, is the mode. The mode of a set or variable is simply the value that occurs most frequently within that set or variable. For any given set or variable, there may be one mode, several modes, or no mode at all. For example, the mode of a is:

    Mode(a) = {3, 4}

Modes are seldom used in practice, for good reason.
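The mode can be computed the same way (a sketch; `statistics.multimode` returns every value tied for the highest frequency, which also exposes the "no mode" case):

```python
import statistics

a = [3, 4, -2, 4, 5, 3]
d = [1, 2, 3, 4, 5, 6, 7]

# Both 3 and 4 occur twice, so a has two modes
print(statistics.multimode(a))   # [3, 4]

# When every value occurs exactly once, every value "ties";
# effectively there is no mode
print(statistics.multimode(d))   # [1, 2, 3, 4, 5, 6, 7]
```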
They are often unreliable and misleading, as illustrated in the following example:

    c = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 902, 902}

where the mode of c is:

    Mode(c) = 902

which is hardly typical of c. Consider another example:

    d = {1, 2, 3, 4, 5, 6, 7}

In this instance, there is no mode of d, because there is only one instance of each value:

    Mode(d) = ∅

where ∅ denotes the empty set.

Measures of Variability

Measures of variability are mathematical operations which measure the amount of dispersion or spread in a given set or variable. While measures of central tendency tell you what the "typical" observation is like, measures of variability tell you how dispersed or spread out the data in a set or variable is. There are several measures of variability available to us, each with advantages and disadvantages.

The most basic measure of variability is the range. The range of a set or variable is simply the largest value minus the smallest value:

    Range(x) = x_max − x_min        (6)

So if we have two variables:

    e = {3, 5, 5, 7}
    f = {4, 4, 6, 6}

finding the ranges is quite simple:

    Range(e) = 7 − 3 = 4
    Range(f) = 6 − 4 = 2

Ranges are nice but are only informative about the extreme values of a variable. This means that they are susceptible to outliers and can ultimately provide a badly skewed picture of the variability of a variable.

A better measure of variability is the mean deviation. The mean deviation is the average distance an observation in a set or variable is away from the mean. This makes for a nice interpretation about the "typical" observation. The mean deviation can be found using the following formula:

    MD(x) = (∑_{i=1}^{n} |x_i − x̄|) / n        (7)

Absolute value bars | | simply mean that after all operations inside the absolute value are finished, negative numbers are turned positive. For example, |5 − 8| = |−3| = 3. The absolute value of a positive number is a positive number: |5| = 5.
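Equation 7 translates almost directly into code (a minimal sketch; Python's standard library has no mean-deviation function, so it is written out here using the handout's variables e and f):

```python
from statistics import mean

def mean_deviation(x):
    """Average absolute distance from the mean (Equation 7)."""
    x_bar = mean(x)
    return sum(abs(xi - x_bar) for xi in x) / len(x)

e = [3, 5, 5, 7]
f = [4, 4, 6, 6]

print(mean_deviation(e))   # 1.0
print(mean_deviation(f))   # 1.0
```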
Despite the nice interpretation, the mean deviation is not used all that often. First, absolute values are problematic (particularly for computers) when doing more complex operations. Second, it is possible for variables with different distributions to have the same mean deviation. Consider e and f once again:

    e = {3, 5, 5, 7}
    f = {4, 4, 6, 6}

Clearly they are distributed differently, but the mean deviation will not reveal this to us. Observe how both mean deviations yield the same result (keep in mind that both ē and f̄ = 5):

    MD(e) = (∑|e_i − ē|)/n = (|3−5| + |5−5| + |5−5| + |7−5|)/4
          = (|−2| + |0| + |0| + |2|)/4 = (2 + 0 + 0 + 2)/4 = 4/4 = 1

    MD(f) = (∑|f_i − f̄|)/n = (|4−5| + |4−5| + |6−5| + |6−5|)/4
          = (|−1| + |−1| + |1| + |1|)/4 = (1 + 1 + 1 + 1)/4 = 4/4 = 1

This is where the variance (commonly denoted σ², pronounced "sigma squared") and the standard deviation (denoted σ) can help out. The formula for the variance is very similar to the mean deviation, but it avoids the problem of taking the absolute value by simply squaring the deviations. Additionally, it provides us with a measure that is more sensitive to variation than the mean deviation. The formula for the variance is simply:

    σ² = (∑_{i=1}^{n} (x_i − µ)²) / n        (8)

The standard deviation is simply the square root of the variance:

    σ = √σ² = √[(∑_{i=1}^{n} (x_i − µ)²) / n]        (9)

All of these benefits do have a downside, however. Since the deviations are being squared, the variance and standard deviation do not have a clean and simple interpretation like the mean deviation does. They do have some nice qualities, which will be illustrated when we talk about distributions and hypothesis testing. So how do the variance and standard deviation fare with e and f?
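Equations 8 and 9 correspond to `pvariance` and `pstdev` in Python's `statistics` module; applying them to e and f shows that the two variables are now distinguishable (a minimal sketch):

```python
import statistics

e = [3, 5, 5, 7]
f = [4, 4, 6, 6]

# Population variance (Equation 8): unlike MD, it tells e and f apart
var_e = statistics.pvariance(e)   # 2
var_f = statistics.pvariance(f)   # 1

# Population standard deviation (Equation 9)
sd_e = statistics.pstdev(e)       # sqrt(2) = 1.41 (rounded)
sd_f = statistics.pstdev(f)       # 1.0

print(var_e, var_f, round(sd_e, 2), sd_f)
```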
Let's find the variances:

    σ²_e = (∑(e_i − µ_e)²)/n = ((3−5)² + (5−5)² + (5−5)² + (7−5)²)/4
         = ((−2)² + 0² + 0² + 2²)/4 = (4 + 0 + 0 + 4)/4 = 8/4 = 2

    σ²_f = (∑(f_i − µ_f)²)/n = ((4−5)² + (4−5)² + (6−5)² + (6−5)²)/4
         = ((−1)² + (−1)² + 1² + 1²)/4 = (1 + 1 + 1 + 1)/4 = 4/4 = 1

And the standard deviations:

    σ_e = √σ²_e = √2 = 1.41
    σ_f = √σ²_f = √1 = 1

Notice that the standard deviations are close (or, in the case of f, identical) to the mean deviations found earlier, but are different from each other, better reflecting the true variability of e and f. In general, the larger the standard deviation, the greater the variability.

All of what we have done so far assumes that we are dealing with populations. Populations are complete sets of all observations of interest. In reality, true populations are often unknown. Most of the time, what we have in social science is sample data. Samples are simply subsets of a population. Because we often deal with sample data, we need to account for the extra uncertainty that comes with a sample.

Think of it like a currency: every observation in a sample is a unit of currency, but whenever an estimate is calculated, one unit of currency is "spent." This "currency" is known as degrees of freedom (referred to as "df" for short), and one degree of freedom is lost each time we "spend" it to calculate an estimate. More technically, degrees of freedom are any of the unrestricted, random variables that constitute a statistic. In practice, this means that we have to make small adjustments to some of our formulas when dealing with samples. The biggest change for us right now is that the formulas for the variance and standard deviation need to be slightly corrected. The sample variance can be found using the following formula:

    s² = (∑_{i=1}^{n} (x_i − x̄)²) / (n − 1)        (10)

And the sample standard deviation is:

    s = √s² = √[(∑_{i=1}^{n} (x_i − x̄)²) / (n − 1)]        (11)

You might ask: what really changed?
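The n − 1 correction in Equations 10 and 11 is exactly the difference between `pvariance` and `variance` in Python's `statistics` module (a sketch on a small made-up sample):

```python
import statistics

# A hypothetical sample of four observations
x = [3, 5, 5, 7]

# Population formula (divide by n = 4): sum of squared deviations 8 / 4
pop_var = statistics.pvariance(x)    # 2
# Sample formula (divide by n - 1 = 3): one df is "spent" estimating x-bar,
# so the estimate is slightly larger: 8 / 3 = 2.67 (rounded)
samp_var = statistics.variance(x)

print(pop_var, round(samp_var, 2))
```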
The most noticeable change is that the Greek letter σ is not used in either formula. Instead, the sample variance is denoted s² and the sample standard deviation is denoted s. These are estimates which approximate the unknown population variance σ² and population standard deviation σ. Since these are sample estimates, we lose one degree of freedom, which comes off of the denominator: instead of dividing by n, we divide by n − 1 when finding s² and s.

Also of note is that the typical notation for the population mean and the sample mean are different. The population mean is usually denoted by the Greek letter µ (pronounced "mu"), and the sample mean is usually denoted with a bar over the variable name, x̄. Once again, in practice the true value of µ is often unknown, and the mean of the observed sample data, x̄, is only an estimate of µ.

EXCEL Commands:
Average: =AVERAGE(number1,number2,...)
Median: =MEDIAN(number1,number2,...)
Mode: =MODE(number1,number2,...)
Range: =MAX(number1,number2,...)-MIN(number1,number2,...)
Mean Deviation: =AVEDEV(number1,number2,...)
Population Variance: =VARP(number1,number2,...)
Population Standard Deviation: =STDEVP(number1,number2,...)
Sample Variance: =VAR(number1,number2,...)
Sample Standard Deviation: =STDEV(number1,number2,...)

POL 221: Political Analysis
Scott Granberg-Rademacker
Handout #2

Normal Distribution

The shape of the normal distribution is the famous bell shape shown below.

[Figure 1: Normal Distribution]

The normal distribution (also sometimes called the Gaussian distribution) first appeared in print, by Abraham de Moivre, in 1733. It is easily the single most important distribution ever. The normal distribution has two parameters: the mean (denoted µ) and the variance (denoted σ²). The pdf of the normal distribution seems intimidating, but fortunately we don't really have to deal with it all that much in this class:

    f(x; µ, σ) = (1 / (σ√(2π))) e^(−[(x−µ)/σ]² / 2)        (1)

for −∞ < x < ∞, where −∞ < µ < ∞ and 0 < σ < ∞.
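Equation 1 can be implemented directly from the standard library (a sketch; `normal_pdf` is a name chosen here for illustration, not part of any library):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density f(x; mu, sigma) from Equation 1 of Handout #2."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Peak of the standard normal N(0, 1) is 1/sqrt(2*pi) = 0.3989 (rounded)
print(round(normal_pdf(0.0), 4))              # 0.3989
# The bell curve is symmetric about the mean
print(normal_pdf(1.0) == normal_pdf(-1.0))    # True
```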
A normally distributed random variable X is denoted X ~ N(µ, σ²).

The importance of the normal distribution lies in how it relates to most other distributions. In fact, the central limit theorem states that if any given distribution (normal or non-normal) has a finite mean µ and variance σ², then the sampling distribution of the mean will approach the normal distribution with mean µ and variance σ²/n as the sample size n increases and approaches infinity (n → ∞).

Tests of Hypotheses

What are hypotheses? Hypotheses are sets of statements (usually two statements) which meet the following criteria:

1. They are mutually exclusive, which means that it is not possible for both statements to be true or false at the same time. If one is true, then the other must necessarily be false, and vice versa.
2. They are collectively exhaustive, which means that all possibilities must be accounted for.
3. There must be adequate data of sufficient quantity and quality by which the statements in the set can be tested for truth or falsity.

Consider an example whereby you might be interested in knowing whether the average age of children at your daycare center is significantly different from the average age at daycare centers nationally. Let the national average age of children at daycare centers be denoted µ, and let the average age of children at your daycare be denoted x̄. The relationship between µ and x̄ can be expressed in six possible ways:

1. µ ≠ x̄
2. µ > x̄
3. µ < x̄
4. µ = x̄
5. µ ≥ x̄
6. µ ≤ x̄

Hypothesis sets are typically denoted as two different statements, H0 and H1. H0 is known as the null hypothesis (pronounced "H-naught") and H1 (pronounced "H-one") is the alternative hypothesis. It is important to remember when constructing a hypothesis set that the equals sign (which could be expressed as =, ≥, or ≤) always goes in H0. H1 will never contain an equals sign.
Instead, H1 will directly reflect your suspicion about the relationship. For example, if you believed that your daycare center had younger children (on average) than daycare centers nationwide, your suspicion would be:

    Age of children at your daycare < Age of children at daycares nationwide

which is the same as stating:

    x̄ < µ

And since this is our suspicion, we denote it as H1:

    H1: x̄ < µ

Now that we have H1, we need to construct H0. We must include all other possibilities, and we must make sure that the equals sign is included in the expression in H0. In H1, we stated our belief that children are on average younger at your daycare than at daycares nationwide. If this statement is false, then one of the following must be the case: children at your daycare are, on average, older than or the same age as children at daycares nationwide. We can express this formally as H0:

    H0: x̄ ≥ µ

If we put H0 and H1 together, we have a hypothesis set that is both mutually exclusive and collectively exhaustive:

    H0: x̄ ≥ µ
    H1: x̄ < µ

DIFFERENCE BETWEEN MEANS OF SAMPLE AND POPULATION WITH LARGE SAMPLES (n > 30)

2-tailed test

Let's say that you are interested in knowing whether or not your sample mean is different from the known mean of your population.¹

Example: Let's say that you are interested in knowing whether the average age of children at your daycare center is different from the average age of children at daycares nationally. Let x be the ages of the children at your daycare, and suppose your daycare has 30 children (n = 30).
    x = {5, 6, 6, 2, 4, 0, 9, 5, 5, 4, 4, 6, 7, 1, 2, 9, 0, 5, 6, 2, 2, 3, 8, 9, 9, 0, 0, 6, 5, 5}

The average child age at your daycare: x̄ = 135/30 = 4.5. The sample variance: s² = 8.05, and the sample standard deviation: s = 2.84.

Census Bureau data on daycares states that the average age of children at daycare is 5.7 years old, the population variance is 5.1, and the population standard deviation is 2.26. So our population figures are:

    µ = 5.7
    σ² = 5.1
    σ = 2.26

We then state our hypotheses (H0 must always contain an equals sign):

    H0: x̄ = µ
    H1: x̄ ≠ µ

Or stated another way:

    H0: x̄ = 5.7
    H1: x̄ ≠ 5.7

Since we have 30 or more observations, we can use the large-sample approximation and assume that our sampling distribution is approximately normal. We then use the following formula to calculate the test statistic:

    z = (x̄ − µ) / (s/√n)

So then we go through the actual calculation:

    z = (4.5 − 5.7) / (2.84/√30) = −1.2/0.519 = −2.31

¹ It is useful to know that most of the time, the true population mean (µ) of a sample is not known; neither is the true population variance (σ²).

Once we have the z-score, we must determine whether our z-score falls inside or outside of the critical region. First, we have to determine what our α-level is going to be. Think of this in terms of: how certain do you want to be in your result? Most commonly, α = .05, though sometimes scientists want a higher standard of proof, so they may choose a smaller α-level. Basically, this means that you are testing your hypothesis against a certain confidence level:

    1 − α = confidence level

So in our example, if we choose α = .05, then our confidence level is 95% (the confidence level is always 1 − α, so if α = .05, as it does in this instance, 1 − .05 = .95, or 95% confidence).

Next we need to look at our z-table. If we are conducting a two-tailed test (which we are in this case), we check whether the following statement is true:

    −z_{α/2} < z < z_{α/2}

where z_{α/2} is found by looking at the z-table. The trick is to find the table value that most closely matches .5 − α/2, which in our case is .475, since .5 − .025 = .475.
The closest match in the table is .4749, which corresponds to 1.96: follow straight across from .4749 to arrive at 1.9, then follow straight up from .4749 to find .06; put them together and z_{α/2} = 1.96.

So in our instance the statement is false, since the expression

    −1.96 < −2.31 < 1.96

is false. When this is false, we REJECT H0, meaning that we can be 95% confident that our mean of x̄ = 4.5 is significantly different from the population mean of µ = 5.7.

1-tailed test

In the previous example, we used the two-tailed test because we didn't know for sure whether our mean was going to be smaller or larger than the population mean. 1-tailed tests are used when you have a good idea which way you want to test. Let's say that the follow ...
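The two-tailed test above can be reproduced end to end (a sketch; following the handout's arithmetic, the sample standard deviation s = 2.84 stands in for the unknown population σ):

```python
import math

# Ages of the 30 children at the daycare (from the handout)
x = [5, 6, 6, 2, 4, 0, 9, 5, 5, 4, 4, 6, 7, 1, 2, 9, 0, 5, 6, 2,
     2, 3, 8, 9, 9, 0, 0, 6, 5, 5]

n = len(x)             # 30
x_bar = sum(x) / n     # 4.5
mu = 5.7               # national average (Census figure from the handout)
s = 2.84               # sample standard deviation from the handout

# Large-sample z test statistic
z = (x_bar - mu) / (s / math.sqrt(n))
print(round(z, 2))     # -2.31

# Two-tailed test at alpha = .05: critical values are +/- 1.96
reject_h0 = not (-1.96 < z < 1.96)
print(reject_h0)       # True -> reject H0
```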

Final Answer

Hi - this is done! I'm attaching the Excel doc that I used too for some of the formulas. Let me know if you have any questions!

Problem 1, parts h)-j): mean deviation, quartiles, and outlier check for k.

    k        |k − k̄|
    11.2     9.725
    -4       5.475
    17       15.525
    11       9.525
    0.6      0.875
    -4.5     5.975
    0        1.475
    -8       9.475
    -31      32.475
    13       11.525
    15.9     14.425
    -3.5     4.975

    Sum of |k − k̄| = 121.45
    MD(k) = 121.45 / 12 = 10.12

    Mean (k̄) = 1.475
    Std Dev = 12.926

    1st Quartile = -4.125
    Median = 0.3
    3rd Quartile = 11.65
    IQR = 15.775

    Median − 1.5·IQR = -23.3625
    Median + 1.5·IQR = 23.9625

Problem 2: l = {4, 2, 1, 6, 8, 7}, with l̄ = 4.7 repeated in each row of the sheet.

    l        l̄
    4        4.7
    2        4.7
    1        4.7
    6        4.7
    8        4.7
    7        4.7

    l − l̄ column: -0.... (truncated in the preview)
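The sheet's computations for k (mean deviation, quartiles, IQR fences, and the outlier) can be reproduced in Python (a sketch; `statistics.quantiles` with `method='inclusive'` matches Excel's QUARTILE, and the fences follow the sheet's median ± 1.5·IQR rule):

```python
import statistics

k = [11.2, -4, 17, 11, 0.6, -4.5, 0, -8, -31, 13, 15.9, -3.5]

k_bar = statistics.mean(k)                        # 1.475
md = sum(abs(ki - k_bar) for ki in k) / len(k)    # 121.45 / 12 = 10.12

# Quartiles, Excel-style (QUARTILE / method='inclusive')
q1, med, q3 = statistics.quantiles(k, n=4, method='inclusive')
iqr = q3 - q1                                     # 11.65 - (-4.125) = 15.775

# The sheet's outlier fences: median +/- 1.5 * IQR
lo, hi = med - 1.5 * iqr, med + 1.5 * iqr         # -23.3625, 23.9625
outliers = [ki for ki in k if ki < lo or ki > hi]

print(round(k_bar, 3), round(md, 2))    # 1.475 10.12
print(round(q1, 3), med, round(q3, 2))  # -4.125 0.3 11.65
print(outliers)                         # [-31]
```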

Sara_777 (1091)
University of Maryland
