SOC 113 Sociology Hypothesis Testing Questions


Description

1. Hypothesis testing: how to form hypotheses (null and alternative); what it means to reject the null or fail to reject the null; how to compare the p-value to the significance level (such as α = 0.05); and what a smaller p-value means.
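The decision rule in item 1 fits in a tiny function. This is a minimal Python sketch, not part of the course materials. It assumes that a p-value exactly equal to α counts as significant (p ≤ α), and the example p-values are made up.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value with the significance level alpha (assumes p <= alpha is significant)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # A smaller p-value means the data are less compatible with H0,
    # i.e., stronger evidence against the null hypothesis.
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# Hypothetical p-values, each tested at alpha = 0.05
for p in (0.003, 0.049, 0.27):
    print(p, decide(p))
```

Lowering α to 0.01 would turn the 0.049 result into "fail to reject H0", which is why a stricter significance level demands stronger evidence.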

2. How to interpret one-sample t-test results: what H0 and Ha are; the standard for determining statistical significance, i.e., the t statistic and p-value; the steps of the one-sample t-test; and what a normal distribution looks like.
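The one-sample t-test steps (state H0: μ = μ0 and Ha: μ ≠ μ0; compute t = (x̄ − μ0)/(s/√n) with n − 1 degrees of freedom; compare with a critical value or p-value; conclude) can be sketched in Python with only the standard library. The sample, the hypothesized mean of 13, and the critical value 2.262 (two-tailed, α = 0.05, df = 9, read from a standard t table) are illustrative assumptions. In practice SPSS or Stata reports t and the p-value directly.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for H0: mu = mu0 versus Ha: mu != mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
    se = s / math.sqrt(n)               # estimated standard error of the mean
    return (xbar - mu0) / se, n - 1


# Hypothetical data: 10 observations, hypothesized population mean 13
sample = [12, 14, 15, 16, 13, 17, 15, 14, 16, 18]
t, df = one_sample_t(sample, mu0=13)
t_critical = 2.262                      # two-tailed, alpha = 0.05, df = 9 (t table)

print(f"t = {t:.3f}, df = {df}")
print("reject H0" if abs(t) > t_critical else "fail to reject H0")
```

Here x̄ = 15 and s ≈ 1.826, so t ≈ 3.464, which is beyond 2.262 and H0 is rejected.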

3. How to interpret one-way ANOVA results: what H0 and Ha are; the standard for determining statistical significance, i.e., the F statistic and p-value; and what an F distribution looks like.
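A one-way ANOVA computes F = MSG/MSE (between-group mean square over within-group mean square) with df1 = k − 1 and df2 = n − k. The sketch below is a minimal standard-library Python version. The three lectures' exam scores are invented, and the critical value 3.89 for F(2, 12) at α = 0.05 is taken from a standard F table rather than computed.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for H0: all group means are equal."""
    all_obs = [x for g in groups for x in g]
    k, n = len(groups), len(all_obs)
    grand_mean = statistics.mean(all_obs)
    # Between-group sum of squares: group size times squared distance of group mean from grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared distance of each observation from its own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    msg = ss_between / df_between
    mse = ss_within / df_within
    return msg / mse, df_between, df_within


# Hypothetical first-exam scores for lectures A, B, and C
scores = [[70, 72, 74, 76, 78], [75, 77, 79, 81, 83], [80, 82, 84, 86, 88]]
F, df1, df2 = one_way_anova(scores)
f_critical = 3.89  # upper-tail critical value for F(2, 12), alpha = 0.05 (F table)
print(f"F({df1}, {df2}) = {F:.2f}")
print("reject H0" if F > f_critical else "fail to reject H0")
```

Because only the upper tail of the F distribution counts as evidence against H0, the comparison is one-sided. Rejecting H0 says at least one lecture mean differs, not which one.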

4. How to interpret simple linear regression results: what H0 and Ha are; the standard for determining statistical significance, i.e., the t statistic and p-value of the slope; what the slope is and what it means; what R-square is (not R; it is R-square!) and what it means; what the independent and dependent variables are and how they relate; how to plot the relationship between a dependent variable and an independent variable; and how to predict the value of the dependent variable from a given value of the independent variable.
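The quantities in item 4 (slope, intercept, R-square, the t statistic for the slope, and a prediction) all come from an ordinary least-squares fit. The following Python sketch uses invented data and the standard library only. The critical value 3.182 (two-tailed, α = 0.05, df = n − 2 = 3) is taken from a t table. The example also shows that a fairly high R-square need not be statistically significant when n is tiny.

```python
import math
import statistics


def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r_squared, t_slope)."""
    n = len(x)
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope: change in y per 1-unit increase in x
    b0 = ybar - b1 * xbar               # intercept: predicted y when x = 0
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sst = sum((yi - ybar) ** 2 for yi in y)                         # total sum of squares
    r_squared = 1 - sse / sst           # share of the variation in y explained by x
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return b0, b1, r_squared, b1 / se_b1  # t statistic for H0: slope = 0


x = [1, 2, 3, 4, 5]                     # hypothetical independent variable
y = [2, 4, 5, 4, 5]                     # hypothetical dependent variable
b0, b1, r2, t_slope = simple_regression(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f}x, R-square = {r2:.2f}, t = {t_slope:.2f}")
print("predicted y at x = 6:", round(b0 + b1 * 6, 2))
print("reject H0" if abs(t_slope) > 3.182 else "fail to reject H0")
```

To plot the relationship, put the independent variable on the x-axis and the dependent variable on the y-axis, then draw the fitted line through the scatter of points.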

5. How to interpret multiple regression results: how to interpret the slope of an independent variable (i.e., the effect of this independent variable, holding the other independent variables constant).
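In multiple regression, each slope is the expected change in the dependent variable for a one-unit increase in that independent variable, holding the other independent variables constant. The Python sketch below fits y = b0 + b1·x1 + b2·x2 by solving the least-squares normal equations with plain Gaussian elimination. That is a textbook method, not necessarily what SPSS or Stata do internally. The toy data are hypothetical and were generated exactly from y = 5 + 2·x1 + 0.5·x2, so the fitted coefficients can be checked.

```python
def ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    X = [[1.0, *map(float, r)] for r in rows]      # prepend a column of 1s for the intercept
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Forward elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    b = [0.0] * p
    for r in range(p - 1, -1, -1):
        b[r] = (A[r][p] - sum(A[r][c] * b[c] for c in range(r + 1, p))) / A[r][r]
    return b


def predict(b, xs):
    """Predicted y for one case with independent-variable values xs."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], xs))


# Hypothetical data: x1 = years of education, x2 = age
rows = [(12, 30), (16, 25), (14, 40), (18, 35), (10, 50), (20, 45)]
y = [44, 49.5, 53, 58.5, 50, 67.5]
b = ols(rows, y)
print("b0, b1, b2 =", [round(v, 3) for v in b])
# Raising x1 by one unit while holding x2 fixed changes the prediction by b1
print(round(predict(b, (15, 30)) - predict(b, (14, 30)), 3))
```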


Soc 113 Review Guide for November 6 Midterm (2019)
Form: 20 questions in total: 10 multiple-choice or fill-in-the-blank questions and 10 short responses related to the statistical tables provided (such as the tables in the HW assignments). Key points are summarized below:
1. Level of measurement: understand what continuous and discrete variables are, with examples of each type (discrete, continuous, and the 4 levels of measurement below).
2. Hypothesis testing.
3. Interpreting one-sample t-test results.
4. Interpreting one-way ANOVA results.
5. Interpreting simple linear regression results.
6. Interpreting multiple regression results.
For items 2-6, see items 1-5 of the Description above.
● Understand how to use SPSS or Stata to produce all of the tables that you have had to handle so far.
  ○ Homework 1:
    ■ Tables used:
  ○ Homework 2:
    ■ Tables used:
  ○ Homework 3:
    ■ Tables used:
● Be familiar with the variables housed in the GSS dataset.
  ○ The GSS is limited because it lacks many of the ideal kinds of variables, but its variables still work.
    ■ Limitations: level of measurement; you will often have to overlook such problems
    ■ HAPMAR (happiness in marriage), RINCOME (respondent's income), PAPRES10 (father's prestige score)
● How are they coded?
  ○ HAPMAR → 1 = very happy, 2 = pretty happy, 3 = not too happy, 8 = don't know, 9 = no answer, 0 = not applicable
  ○ RINCOME → 1 = less than $1,000, 2 = $1,000-$2,999, [...], 12 = $25,000 or more, 13 = refused, 98 = don't know, 99 = no answer, 0 = not applicable
  ○ PAPRES → for the 3 different "papres" variables in the GSS, there are no labels associated with the codes
● Levels of measurement?
  ○ HAPMAR - ordinal (ordered categories)
  ○ RINCOME - ordinal
  ○ PAPRES - interval
● Be able to distinguish among various levels of measurement for variables.
  ○ Nominal: data can neither be ordered nor used in calculations
    ■ Example: Republican, Democrat, Green, Libertarian
    ■ The data are qualitative, so calculations such as means and standard deviations are not meaningful
  ○ Ordinal: data can be ordered, but the differences between values cannot be measured
    ■ Example: small (8 oz), medium (12 oz), large (32 oz)
    ■ Example: cities ranked 1-10; the differences between ranks are not meaningful (we can't know how much better life is in city 1 than in city 2)
    ■ Also shouldn't be used in calculations such as means
  ○ Interval: data with a definite ordering but no true zero; the differences can be measured, but ratios are not meaningful
    ■ Not only classifies and orders the measurements, but also specifies that the distances between intervals are equal from the low end of the scale to the high end
    ■ Can be ordered, and the differences between values make sense
    ■ No true zero: 0 degrees does not mean the absence of temperature
    ■ Think temperature: 10℃ + 10℃ = 20℃, but 20℃ is not twice as hot as 10℃. We can see this when we convert to Fahrenheit: 10℃ = 50℉, but 20℃ = 68℉.
  ○ Ratio: data with a true zero that can be ordered; the differences have meaning and ratios can be calculated
    ■ All the features of interval data plus an absolute zero, treated as the point of origin
    ■ Phrases such as "four times as likely" are actually meaningful
    ■ Quantitative data with the same properties as interval data, plus an equal and definite ratio between values
    ■ Tells us the order and the exact value between units
    ■ Examples: height, weight, duration, the amount of money in your pocket right now
    ■ Both descriptive and inferential statistics can be applied
    ■ The highest, most sophisticated level of measurement
    ■ Zero is the origin of the axis of whatever you are measuring
    ■ Ratio data cannot take negative values
● Understand the difference between continuous and discrete variables.
  ○ Discrete data
    ■ Distinct gaps between values; no values in between the whole numbers
      ● A countable number of values, typically non-negative whole numbers (like the number of people)
  ○ Continuous data
    ■ Values can fall anywhere in between, including fractions
    ■ Can take any value within a given range; not restricted to separate values
      ● Height, weight, age, etc.
● Why is it important to know #4 and #5 in performing statistical procedures?
  ○ Not all variable types can have all statistical procedures performed on them
  ○ The type of data affects which analytical techniques can be used and what conclusions can be drawn
  ○ Knowing that these are two different types of data clarifies the relationships in the data and leads to a better analysis
  ○ Always identify the level of measurement before you start, so that you choose the right method of analysis
● What do we mean by inference?
  ○ Causal inference: concluding that something caused or influenced another thing (A is caused by B)
  ○ Statistical inference is concerned primarily with understanding the quality of parameter estimates
    ■ How sure are we that the sample mean x̄ is near the true population mean µ?
  ○ It also concerns the reliability of statistical relationships, typically on the basis of random sampling
● Would you need to perform any work regarding inference with population data?
  ○ No. Inferential statistics lets you make inferences about a population from sample data; no inferences are needed if you already have the population data.
● What is the purpose of hypothesis testing, and on what kind of data?
  ○ Hypothesis testing is the primary mechanism for making decisions based on observed sample statistics
  ○ We want to know whether there is any relationship, causal or correlational
    ■ It concerns the conclusion we can draw, e.g., comparing pre-scores and post-scores to see whether there is a difference
    ■ It is done on sample data; tests of means require a continuous (interval or ratio) dependent variable
  ○ The alpha level states the probability of being wrong that you are willing to accept
    ■ Work cautiously and understand the limitations
● What are the important components of hypothesis testing? What are the essential elements?
  ○ Read all the elements to understand what the test is about
  ○ Know the sampling statistic, derived from your own data
  ○ Find the critical value from the curve (or table)
  ○ Compare the critical value to the value you derive from your data
  ○ Based on the level of significance, draw a conclusion
    ■ There are many components: you need a dataset, you must construct your own hypotheses, and you must find the mean and variance to construct the analysis
      ● Null & alternative hypotheses
      ● Test statistic
      ● Sampling statistic
      ● Critical value
      ● Probability values and statistical significance
      ● Conclusions of hypothesis testing
● What are the steps in performing a hypothesis test?
  ○ 1. Specify the null hypothesis and alternative hypothesis
  ○ 2. Assumptions / givens
    ■ Random sampling, known parameters, levels of measurement, known statistics
  ○ 3. Set the significance level (alpha value)
  ○ 4. Calculate the test statistic and corresponding p-value
  ○ 5. Draw a conclusion
● Be able to draw a "curve" and label that curve appropriately for a hypothesis test.
  ○ Plot a number line below the curve and be able to do the math
  ○ Make sure the math matches the curve
  ○ If it's a two-tailed test, split the rejection region between the two sides
  ○ The F test is always one-tailed (upper tail)
  ○ A question about "greater than" (or "less than") calls for a one-sided test
● What alternative is there to a "curve"?
  ○ a. You can walk through the equation without drawing a curve
    ■ Ex: calculate the p-value and compare it to the significance level (α), or compare the test statistic to the critical value
  ○ You perform the test and afterwards tell people how to determine whether the result is significant
● How do tests of proportion differ from tests of means?
  ○ A test of proportions seeks a statistically significant difference between the proportions of two groups. A test of means seeks a statistically significant difference between the means of two groups.
● What is a sampling distribution and how is it derived?
  ○ A sampling distribution is the probability distribution of a statistic obtained from a large number of samples drawn from a specific population
    ■ It tells us which outcomes we should expect for a sample statistic (mean, standard deviation, correlation, etc.)
  ○ It represents the distribution of point estimates based on samples of a fixed size from a given population. It is useful to think of a particular point estimate as being drawn from such a distribution. Understanding the sampling distribution is central to understanding statistical inference.
    ■ Example: a sampling distribution that is unimodal, approximately symmetric, and centered exactly at the true population mean µ = 3.90. Sample means should tend to fall around the population mean.
● What are sampling distributions used for?
  ○ Knowledge of the sampling distribution is used to make inferences about the overall population
● What is a significance level? How is it interpreted?
  ○ Significance level = α
  ○ The probability of error we are willing to accept: we do our best to get as close as we can to the truth, restricting the error to 5%, 1%, etc.
  ○ The significance level, denoted alpha (α), is the probability of rejecting the null hypothesis when it is true. For example, a significance level of .05 indicates a 5% risk of concluding that a difference exists when there is no actual difference (corresponding to a 95% confidence level).
    ■ In this example, when the null is true we will make an error whenever the point estimate is at least 1.96 standard errors away from the population parameter (about 5% of the time: 2.5% in each tail)
● Can you set your level of significance anywhere?
  ○ Yes. You choose it at the beginning of your statistical analysis, so you can set it wherever you want
  ○ The lower the alpha (significance level), the more confident you can be in a significant result
    ■ With an alpha of .01, a finding must meet a stricter standard before it is called significant
● What do we mean by a "significant" finding?
  ○ The differences being studied are likely real and unlikely to be due to chance alone
● What are the basic things you need to perform a hypothesis test?
  ○ 1. Parameter & statistic
    ■ Parameter: a summary description of a fixed characteristic or measure of the target population. It denotes the true value that would be obtained if a census rather than a sample were undertaken
      ● Mean (µ), variance (σ²), standard deviation (σ), proportion (p)
    ■ Statistic: a summary description of a characteristic or measure of the sample. The sample statistic is used as an estimate of the population parameter
      ● Sample mean (x̄), sample variance (s²), sample standard deviation (s), sample proportion (p̂)
  ○ 2. Sampling distribution: the probability distribution of a statistic obtained from a large number of samples drawn from a specific population
  ○ 3. Standard error: like the standard deviation, a measure of spread; the higher the number, the more spread out the values are. The standard deviation describes the spread of individual observations, while the standard error describes the spread of a sample statistic (such as the sample mean) across repeated samples. For the mean, SE = s/√n.
    ■ It tells you how far your sample statistic (such as the sample mean) is likely to deviate from the actual population value. The larger your sample size, the smaller the SE and the closer your sample mean tends to be to the actual population mean.
  ○ 4. Null hypothesis: a statement in which no difference or effect is expected
  ○ 5. Alternative hypothesis: a statement that some difference or effect is expected
  ○ Descriptive statistics
    ■ Brief descriptive coefficients that summarize a given data set, which can represent either an entire population or a sample of it
    ■ Broken down into measures of central tendency (mean, median, mode) and measures of variability, or spread (standard deviation, variance, minimum and maximum values, skewness)
● What do you run on the computer at the very start of a hypothesis test? (Varies with type of test)
  ○ Run a frequency distribution to make sure your levels of measurement match the procedures you want to do
● What is a test statistic and how many test statistics have we worked with so far?
  ○ A test statistic measures how close the sample has come to the null hypothesis. Its observed value changes randomly from one random sample to another. It contains the information in the data that is relevant for deciding whether to reject the null hypothesis.
  ○ Test statistics by type of hypothesis test:

      Hypothesis test      Test statistic
      Z-test               z-statistic
      t-test               t-statistic
      ANOVA                F-statistic
      Chi-square tests     chi-square statistic

● What is a frequency distribution and a cross tabulation and how do you interpret them?
  ○ Frequency distribution: shows how common each value of a variable is
    ■ Gives an idea of whether a variable is continuous or categorical, and a snapshot of the data set's characteristics: how scores are distributed across the whole set (evenly spread, skewed, etc.)
    ● SPSS steps: Analyze → Descriptive Statistics → Frequencies
      ○ Move the variable of interest into the right-hand column
      ○ Click the Charts button, select Histograms, then press Continue and OK to generate the distribution table
  ○ Cross tabulation: shows how the values of two variables occur together, read at the intersection of a row and a column
    ■ Summarizes the association between two categorical variables
    ■ A joint frequency distribution of cases based on two or more categorical variables
      ● SPSS steps: Analyze → Descriptive Statistics → Crosstabs
        ○ You will see Rows and Columns boxes. Select one or more variables for each box, depending on what you want to compare, then click OK.
          ■ For percentages: Analyze → Descriptive Statistics → Crosstabs → Cells → under Percentages, select all 3 options
● Can you determine the level of measurement from a frequency distribution?
  ○ Usually, yes: the values and labels listed in a frequency table show how the variable is coded, which indicates its level of measurement (typically categorical)
● What is the purpose of an analysis of variance? Is it relevant for data that comes in proportions?
  ○ ANOVA uses a single hypothesis test to check whether the means across many groups are equal:
    ■ H0: The mean outcome is the same across all groups. In statistical notation, µ1 = µ2 = ... = µk, where µi represents the mean of the outcome for observations in category i.
    ■ HA: At least one mean is different.
  ○ Generally we must check three conditions on the data before performing ANOVA:
    ■ the observations are independent within and across groups,
    ■ the data within each group are nearly normal, and
    ■ the variability across the groups is about equal
● How do you calculate Eta² from ANOVA and how do you interpret it? (from the reading)
  ○ Eta² (η²) is a measure in ANOVA of how much of the variance in the dependent variable is associated with the grouping variable
  ○ It is the proportion of the total variance that is attributed to an effect.
    It is calculated as the ratio of the effect variance ($SS_{effect}$) to the total variance ($SS_{total}$): $\eta^2 = SS_{effect} / SS_{total}$
  ○ On the test, we will be given the values and just need to interpret them
    ■ Example: total SS = 62.29, anxiety SS = 4.08 → 4.08/62.29 ≈ 6.6%
      ● 6.6% of the variance is associated with anxiety
● What kind of data is needed for an analysis of variance?
  ○ The dependent variable must be continuous (interval or ratio level of measurement)
  ○ The independent variable must be categorical (nominal or ordinal)
    ■ Two-way ANOVA has 2 independent variables
      ● Example: females may have higher IQ scores than males, but this difference could be greater or smaller in European countries than in North American countries
  ○ ANOVA assumes: the data are normally distributed; homogeneity of variance (the variances of the groups should be approximately equal); the observations are independent of each other
● How does ANOVA work with both means and variances?
  ○ Inferences about means are made by analyzing variance
● What is the equation for ANOVA?
  ○ $F = \frac{MST}{MSE}$
    ■ where F is the ANOVA test statistic, MST is the mean square due to treatment, and MSE is the mean square due to error
    ■ $MST = \frac{SST}{p - 1}$
    ■ $SST = \sum_{j=1}^{p} n_j (\bar{x}_j - \bar{x})^2$
      ● where SST is the sum of squares due to treatment, $p$ is the number of populations (groups), $n_j$ is the sample size of group $j$, $\bar{x}_j$ is the mean of group $j$, and $\bar{x}$ is the grand mean
    ■ $MSE = \frac{SSE}{N - p}$
    ■ $SSE = \sum_{j=1}^{p} (n_j - 1) s_j^2$
      ● where SSE is the sum of squares due to error, $s_j$ is the standard deviation of group $j$, and $N$ is the total number of observations
  ○ Equivalently, $F = MS_{between} / MS_{within}$
● What kind of conclusion are we looking to draw from an ANOVA procedure? What is ALL that we can report?
  ○ Whether the group means are statistically equal to one another; that is all we can report, since ANOVA alone does not say which group differs
  ○ Report the p-value and Eta²
● What are we able to conclude from linear regression that we have not been able to conclude with other procedures? Based on what?
  ○ The expected change (positive or negative) in the dependent variable for each 1-unit change in the independent variable
  ○ Which group is significantly different from the others (by coding each group as its own binary, or dummy, independent variable)
● What level of variable measurement is ideal for regression? Why?
  ○ Continuous (interval or ratio) variables
  ○ Any time you're working with means, you want continuous data, ideally ratio data with an absolute zero
● Why are certain levels of measurement problematic?
  ○ The TA doesn't think they are problematic, but for some variables the mean doesn't make sense
    ■ If a variable is not continuous, it may not be normally distributed
OTHER NOTES / READING NOTES
● Descriptive statistics: uses the data to describe the population through numerical calculations, graphs, or tables
● Inferential statistics: makes inferences and predictions about a population based on a sample of data taken from that population
● ANOVA
  ○ Analysis of variance uses the test statistic F in a single hypothesis test to check whether the means across many groups are equal
    ■ Null: the mean outcome is the same across all groups; alternative: at least one mean is different
  ○ The dependent variable must be interval or ratio level data
  ○ 3 conditions before performing ANOVA:
    ■ The observations are independent within and across groups
    ■ The data within each group are nearly normal
    ■ The variability across the groups is about equal
  ○ Example: consider a statistics department that runs three lectures of an introductory statistics course. We might like to determine whether there are statistically significant differences in first exam scores among these three classes (A, B, and C). Describe appropriate hypotheses to determine whether there are any differences between the 3 classes.
    ■ H0: the average score is identical in all lectures; any observed difference is due to chance.
    ■ HA: the average score varies by class
  ○ Mean square between groups (MSG): to consider many groups simultaneously, we evaluate whether their sample means differ more than we would expect from natural variation
    ■ On its own, MSG is of little use in a hypothesis test: we need a benchmark for how much variability to expect among the sample means if the null is true. So we compute a pooled variance estimate, the mean square error (MSE), which has degrees of freedom dfE = n − k
    ■ It is helpful to think of MSE as a measure of the variability within the groups. When the null hypothesis is true, any differences among the sample means are due only to chance, and MSG and MSE should be about equal. As the test statistic for ANOVA, we examine the ratio of MSG to MSE:
      ● F = MSG/MSE
      ● MSG measures the between-group variability, and MSE measures the variability within each of the groups
  ○ ANOVA in SPSS
    ■ One-way: Analyze > Compare Means > One-Way ANOVA
      ● Dependent List: the variable whose means will be compared between the samples
      ● Factor: the independent variable, whose categories define which samples will be compared
● F test
  ○ When to use an F test:
  ○ F is a standardized ratio of the variability in the sample means relative to the variability within groups. If the null is true, F follows an F distribution. The upper tail of the F distribution is used to find the p-value.
  ○ We can use the F statistic to evaluate the hypotheses in what is called an F test. A p-value can be computed from the F statistic using an F distribution, which has two associated parameters, df1 and df2
  ○ The larger the observed variability in the sample means (MSG) relative to the within-group observations (MSE), the larger F will be and the stronger the evidence against the null hypothesis.
    Because larger values of F represent stronger evidence against the null hypothesis, we use the upper tail of the distribution to compute the p-value
● P-value: indicates how significant your findings are
  ○ Used to determine statistical significance in a hypothesis test; it evaluates how well the sample data support the devil's advocate argument that the null hypothesis is true. It measures how compatible your data are with the null hypothesis: the probability, assuming the null is true, of getting a test statistic at least as extreme as the one observed
  ○ The result you get from your z or t score after doing the test
  ○ The lower, the better: a lower p-value makes it more likely that you can reject your null
  ○ For F → the upper tail is where the significant values are
● Alpha is what you set beforehand, to see whether the p-value falls below it
  ○ α = 1 − confidence level
● µ = population mean, x̄ = sample mean
● Variance
  ○ Measures how spread out (or how close together) the values in a data set are
● T test (steps and components)
  ○ When to do a t-test:
  ○ In SPSS: Analyze > Compare Means > Independent-Samples T Test
    ■ Test variable(s): the dependent variable(s), i.e., the continuous variable whose means will be compared between the two groups
    ■ Grouping variable: the independent variable; its categories define which samples will be compared in the t test
  ○ Steps using a calculator / walk-through procedure:
● Z tests (steps and components)
  ○ When to do a z-test:
  ○ If you have population data (a known population standard deviation), use a z-test
  ○ The formula for a z-score is $z = (x - \mu)/\sigma$, where μ is the population mean and σ is the population standard deviation (note: if you don't know the population standard deviation or the sample size is below 30, you should use a t-score instead of a z-score)
● Given a frequency table, understand how the variable is coded
  ○ What does the table/number represent?
  ○ What kind of data is it?
    ■ Happiness in marriage: categorical
● One-tailed test (steps and components)
  ○ A statistical hypothesis test in which the critical area of a distribution is one-sided, so that it is either greater than or less than a certain value, but not both.
    If the sample statistic falls into the one-sided critical area, the null hypothesis is rejected in favor of the alternative
● Two-tailed test (steps and components)
  ○ A method in which the critical area of a distribution is two-sided, testing whether a sample is greater than or less than a certain range of values. If the sample statistic falls into either critical area, the null hypothesis is rejected in favor of the alternative.

Review Session/OH notes
- Know how to read the curve and table for a Z, T, and F test
  - Probability, different parameters, different testing, etc.
- You don't need to know all the equations, but you do need to know the really straightforward ones
  - I.e., F = one mean square divided by another mean square (each mean square is a sum of squares divided by its degrees of freedom)
  - Know what all of these mean
- Know how sampling distributions work
- Know something about z, t, and F scores and what they mean
  - Z-tests are statistical calculations that can be used to compare a sample mean to a population mean
  - A z-score is a measure of position that indicates how many standard deviations a data value lies from the mean. It is positive if the value is above the mean and negative if below.
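A z-score places one value relative to the mean, and a z-test for a mean converts the sample mean into z = (x̄ − μ)/(σ/√n). The minimal Python sketch below uses the standard library's `statistics.NormalDist` for areas under the normal curve and shows one- and two-tailed p-values. The sample values (x̄ = $1.62, σ = $0.40, n = 64) are invented to go with the guide's $1.50 example, and σ is assumed known, which is what justifies a z-test rather than a t-test.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


def z_test_mean(xbar, mu0, sigma, n, tail="two"):
    """z statistic and p-value for H0: mu = mu0 with a known population sigma."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))      # divide by the standard error
    if tail == "two":
        p = 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))   # area in both tails
    elif tail == "greater":
        p = 1 - STANDARD_NORMAL.cdf(z)              # upper tail only
    elif tail == "less":
        p = STANDARD_NORMAL.cdf(z)                  # lower tail only
    else:
        raise ValueError("tail must be 'two', 'greater', or 'less'")
    return z, p


print(z_score(130, 100, 15))                        # 2.0 standard deviations above the mean
z, p_two = z_test_mean(1.62, 1.50, 0.40, 64)
_, p_upper = z_test_mean(1.62, 1.50, 0.40, 64, tail="greater")
print(f"z = {z:.2f}, two-tailed p = {p_two:.4f}, one-tailed p = {p_upper:.4f}")
print("two-tailed critical z at alpha = 0.05:", round(STANDARD_NORMAL.inv_cdf(0.975), 2))
```

The two-tailed p-value is exactly twice the one-tailed one here, which is why a result can be significant one-sided but not two-sided when it sits near the cutoff.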
  - T-tests are calculations used to test a hypothesis; they are most useful when we need to determine whether there is a statistically significant difference between two independent sample groups
    - Paired t-tests compare two related samples
    - Assumptions: the population is infinite and normal; the population variance is unknown and estimated from the sample; the hypothesized mean is known; the sample observations are random and independent; the sample size is small; the null may be one-sided or two-sided
  - The F-test is used to test the equality of two population variances (whether two normal data sets have equal variances), and to test whether data conform to a regression model fitted by least squares, i.e., whether any of the independent variables has a linear relationship with the dependent variable
  - How much of the variation is contributed by this effect → $\eta^2 = SS_{effect}/SS_{total}$
    - We might encounter a situation with so many groups that the effect may not be large
- Hypothesis test with slope
  - $y = \beta_0 + \beta_1 x_1$
  - Testing whether the slope ($\beta_1$) is significant
  - Null hypothesis: $\beta_1 = 0$

Sociology 113 Cumulative Final Exam Study Guide
All the knowledge you need is included below.
1. Understand how to use SPSS or Stata to produce all of the tables that you have had to handle so far.
  1. Frequency Distribution, Cross Tabulation, ANOVA Output, Two-Sample T-Test
Frequency Distribution
● Analyze → Descriptive Statistics → Frequencies
● Used to summarize categorical variables
Cross Tabulation
● Analyze → Descriptive Statistics → Crosstabs
● Used to expose relationships between two separate variables
ANOVA Output
● Analyze → Compare Means → One-Way ANOVA
● The independent variable goes under "Factor"
● The dependent variable goes under "Dependent List"
● Post hoc test at significance level 0.05
● If p ≤ 0.05, reject the null. If p > 0.05, fail to reject.
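Eta squared is a single division, η² = SS_effect/SS_total. A minimal Python sketch using the anxiety example from the midterm notes (SS_effect = 4.08, SS_total = 62.29); the range checks are an added safeguard, not part of the course formula.

```python
def eta_squared(ss_effect, ss_total):
    """Proportion of total variance attributed to an effect: SS_effect / SS_total."""
    if ss_total <= 0:
        raise ValueError("total sum of squares must be positive")
    if not 0 <= ss_effect <= ss_total:
        raise ValueError("effect sum of squares must lie between 0 and the total")
    return ss_effect / ss_total


eta2 = eta_squared(4.08, 62.29)
print(f"eta squared = {eta2:.3f} ({eta2:.1%} of the variance is associated with anxiety)")
```

In a one-way ANOVA table, the same division uses the Between Groups and Total rows of the Sum of Squares column.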
Two-Sample T-Test
● Analyze → Compare Means → Independent-Samples T Test
● Input the test variable and the grouping variable
● If Sig. (2-tailed) is below 0.05, reject the null hypothesis
2. Be familiar with the variables housed in the GSS dataset.
  1. Be familiar particularly with the variables used in the homework
    1. Happiness in marriage (HAPMAR), respondent's income (RINCOME), father's prestige score (PAPRES10)
  2. How are they coded?
  3. Levels of measurement?
    HAPMAR - ordinal
    RINCOME - ordinal
    PAPRES - interval
3. Be able to distinguish among various levels of measurement for variables.
  1. Nominal: name only; labels with no numerical significance
    1. Cannot perform arithmetic-based statistical procedures such as means
  2. Ordinal: ordered levels or ranks; the differences between them are unknown
    1. Cannot perform arithmetic-based statistical procedures such as means
  3. Interval: numeric scales on which we know both the order and the exact differences between the values, like temperature
    1. Can perform some statistical analysis, but there is no "true zero" (0 does not mean the absence of the quantity; it is just another point on the scale, as with 0℃ or 0℉, neither of which means an absence of temperature), so we cannot calculate ratios
      1. Think temperature: 10℃ + 10℃ = 20℃, but 20℃ is not twice as hot as 10℃. We can see this when we convert to Fahrenheit: 10℃ = 50℉, but 20℃ = 68℉.
  4. Ratio: tells us the order, the exact value between units, and has an absolute zero, like height, weight, duration
    1. Both descriptive and inferential statistics can be applied
4. Understand the difference between continuous and discrete variables.
  1. Discrete variables take a countable number of values, typically non-negative whole numbers (like the number of people)
    1. Whole values
  2. Continuous variables can take any value within a given range (like height, weight, etc.)
    1. Any values, including fractions
5. Why is it important to know #4 and #5 in performing statistical procedures?
  1. It is important to know the level of measurement because not all variable types can have all statistical procedures performed on them (see #3)
    1. You can run a test of means on ratio data, but not on nominal data
6. What do we mean by inference?
  1. Statistical inference is the theory, methods, and practice of forming judgments about the parameters of a population and the reliability of statistical relationships, typically on the basis of random sampling
  2. Causal inference is finding the causal relationship between variables
7. Would you need to perform any work regarding inference with population data?
  1. No. Inferential statistics lets you make inferences about the population based on sample data; no inferences are needed if you have the population data
8. What is the purpose of hypothesis testing, and on what kind of data?
  1. Hypothesis testing is the primary mechanism for making decisions based on statistical analysis, in order to make inferences about population parameters from observed sample statistics
    1. Is there a causal relationship?
    2. It is done on sample data; tests of means require a continuous dependent variable
9. What are the important components of hypothesis testing? What are the essential elements?
  1. The null and alternative hypotheses
  2. Test statistic
  3. Sampling statistic
  4. Critical value (determined by the significance level, or alpha value)
  5. Probability values and statistical significance
  6. Conclusions of hypothesis testing
    1. Get a data set
    2. Find the variable
    3. Construct the hypotheses
    4. Construct the analysis
10. What are the steps in performing a hypothesis test?
  1. Hypotheses: null and alternative
  2. Assumptions/givens: random sampling, normal population distribution, level of measurement, known parameters/known statistics
  3. Sampling distribution and test statistic: use the appropriate sampling distribution to calculate the value of the test statistic
  4. Level of significance and the critical value/region: the level of significance is expressed as the probability of error, α
  5. Conclusion
11. Be able to draw a "curve" and label that curve appropriately for a hypothesis test.
  1. Normal, unimodal distribution in which the mean, median, and mode are all equal
  2. Be sure to label the t, z, or F curve!!
  3. $z = (\bar{x} - \mu)/SE$, where SE is the standard error
  4. $F = MS_{between} / MS_{within}$
  5. If you have population data (a known population standard deviation), it is a z test
  6. Z test: used when n is 30 or greater
  7. T test: used for smaller samples (n < 30)
  8. Example (https://www.youtube.com/watch?v=dDsKP7wVpzM):
    1. $1.50 (H0: it is = $1.50; Ha: it is not equal)
    2. Translate dollar values into z values with the z formula ($z = (\bar{x} - \mu)/SE$)
    3. Plot $1.50 on the number line
13. How do tests of proportion differ from tests of means? (not mentioned in the lecture)
  1. A test of proportions seeks a statistically significant difference between the proportions of two groups. A test of means seeks a statistically significant difference between the means of two groups.
  2. If you want to know how these two tests work specifically, please read https://mathbitsnotebook.com/Algebra2/Statistics/STcompare.html
14. What is a sampling distribution and how is it derived?
  1. A sampling distribution is the probability distribution of a statistic obtained from a large number of samples drawn from a specific population; it tells us which outcomes we should expect for a sample statistic (mean, standard deviation, correlation, etc.)
    Ex. If we drew infinitely many samples, took the mean of each sample, and graphed those means, we would see the sampling distribution of the mean.
  2. https://www.statisticshowto.datasciencecentral.com/sampling-distribution/
15. What are sampling distributions used for?
  1. Knowledge of the sampling distribution can be used to make inferences about the overall population
16. What is a significance level? How is it interpreted?
  1. A measure of the strength of the evidence that must be present in your sample before you will reject the null hypothesis and conclude that the effect is statistically significant; that is, the probability of rejecting the null hypothesis when it is true
  2. In most cases, the significance level is set at 0.05, and the p-value is compared to it
17. Can you set your level of significance anywhere?
  1. Yes
18. What do we mean by a "significant" finding?
  1. We mean that the result is unlikely to have occurred by chance alone
    1. A 0.05 significance level means we accept a 5% chance of rejecting the null hypothesis when it is actually true (concluding there is a difference when there is none)
    2. When the test statistic falls in the critical region, the null hypothesis is rejected and the outcome is said to be statistically significant
19. What are the basic things you need to perform a hypothesis test?
  1. The descriptive data (frequency tables and crosstabs), variables, model, and a chosen significance level
  2. Check the conditions: random (independent) sampling, normally distributed population
20. What do you run on the computer at the very start of a hypothesis test? (Varies with type of test)
  1. Run a frequency distribution to make sure your levels of measurement match the procedures you want to do
21. What is a test statistic and how many test statistics have we worked with so far?
  1. A test statistic is a value computed from sample data, used in a hypothesis test when deciding whether to reject the null hypothesis. It takes the data from an experiment or survey and compares your results to the results you would expect under the null hypothesis
  2. We have worked with the z-value (z-test), t-value (t-test), F-value (ANOVA), and t-value (linear regression)
22. What is a frequency distribution and a cross tabulation and how do you interpret them?
  1. Frequency distribution: shows how common (frequent) each value of the variable is
    1. We can get an idea of whether a variable is continuous or categorical
  2. Cross tabulation: a table showing how the values of two variables occur together, read at the intersection of a row and a column
23. Can you determine the level of measurement from a frequency distribution?
  1. Usually, yes: the values and labels in the frequency table show how the variable is coded, which indicates its level of measurement (typically categorical)
24. What is the purpose of an analysis of variance? Is it relevant for data that comes in proportions?
  1. https://www.youtube.com/watch?v=UrRYITjDOww
[...] tells us how well the regression line predicts actual values
● If R² = 1 → perfect line → 100% of the variation in y is accounted for by its regression on x
https://www.youtube.com/watch?v=ik9k4OnkkNU
How to read the coefficient table in SPSS regression: https://www.youtube.com/watch?v=Sz5AdyOiSLE
Multiple regression in SPSS: R square, P value, ANOVA F: https://www.youtube.com/watch?v=O435SJtU2c8
Understanding Regression Output: https://www.youtube.com/watch?v=VvlqA-iO2HA
If $\hat{y} = a + b_1 x_1 + b_2 x_2 + b_3 x_3$
Then $\hat{y}$ = -.54.528 + 0.523x1 + 0.003x2 + 2.126x3 (look at the coefficients table for the numbers)
Regression Crash Course: https://www.youtube.com/watch?v=WWqE7YHR4Jc
● Residuals are errors
● SSE close to 0 → good at predicting the outcome
● SSE big → poor at predicting the outcome
Interpreting Output for Multiple Regression in SPSS: https://www.youtube.com/watch?v=WQeAsZxsXdQ
Homoscedasticity and Heteroscedasticity
● Homoscedasticity means "having the same scatter." For it to exist in a set of data, the points must be about the same distance from the regression line. The opposite is heteroscedasticity ("different scatter"), where the points are at widely varying distances from the regression line.
Variance formula:
Be able to recognize equations, and describe the step-by-step function of each equation
● Cheat sheet will be sent out
1. Review your study guide from the Midterm.
2.
Know the purpose of regression analysis and when it is appropriate. In simple words: The purpose of regression analysis is to predict an outcome based on historical data. This historical data is understood using regression analysis and this understanding helps us build a model to predict an outcome based on this regression model. Its helps us predict and that is why it is called predictive analysis model. Regression analysis is used when you want to predict a continuous dependent variable from a number of independent variables. If the dependent variable is dichotomous, then logistic regression should be used. 3. Know why level of measurement matters to what statistical procedure you are working with. Not all levels of measurement can have certain statistical procedures performed on them. 4. Know all that you can determine from regression analyses. Allows you to determine which factors matter most, which factors can be ignored, and how these factors influence each other 5. Be able to assess the quality of a given regression analysis. Know how to know the quality. Residual plot; residuals should be randomly distributed around the line of error=0 with zero mean 2 R . Close to 1= regression model is correct Variance; smaller variance represents data more accurately than a model with larger values 6. Know the procedure for hypothesis testing re: regression analysis. 7. Review the steps for conducting an ANOVA and what can be gleaned from ANOVA. Recall the relationship between ANOVA and simultaneous t-tests.[n] when the population means of only two groups is to be compared, the t-test is used, but when means of more than two groups are to be compared, ANOVA is preferred. 8. Be able to assess the quality of a regression analysis. 9. Look closely at everything that lands in the output for a regression analysis. 10. You will be asked to pull information from SPSS/STATA tables to answer questions. 11. There will be approximately 20-25 questions in all. 
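The $1.50 example in question 11 (state H0: μ = 1.50, convert the sample mean to z, then compare) can be sketched in a few lines of Python. This is a minimal sketch, not the course's procedure: the prices, the population standard deviation of 0.30, and the function name z_test_mean are all hypothetical, and the normal curve comes from statistics.NormalDist (Python 3.8+). Because σ is assumed known, the z test applies even though n is small; with σ unknown and a small sample, a t test is the usual choice.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test_mean(sample, mu0, sigma, alpha=0.05):
    """Two-tailed one-sample z test, assuming the population SD (sigma) is known."""
    standard_error = sigma / sqrt(len(sample))    # SE of the sample mean
    z = (mean(sample) - mu0) / standard_error     # z = (m - mu) / standard error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails of the z curve
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision


# Hypothetical prices in dollars; H0: mu = 1.50, Ha: mu != 1.50
prices = [1.55, 1.62, 1.48, 1.70, 1.58, 1.66, 1.52, 1.61, 1.59, 1.64]
z, p, decision = z_test_mean(prices, mu0=1.50, sigma=0.30)
print(f"z = {z:.3f}, p = {p:.3f}: {decision}")
```

Here the sample mean is 1.595, z is about 1.00 and p is about 0.32, so the result stays well inside the ±1.96 cutoffs and we fail to reject H0.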
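The F formula in question 11 (F = mean square between / mean square within) and the ANOVA-versus-t-test point in item 7 of the final list can both be checked by hand. The sketch below uses hypothetical group scores and hypothetical function names (one_way_anova_f, pooled_t); it computes F from the sums of squares and shows that, with only two groups, one-way ANOVA and the pooled-variance t-test agree because F equals t squared.

```python
from math import sqrt
from statistics import mean, variance


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on lists of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Between groups: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: spread of each value around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)  # MSB / MSW
    return f, df_between, df_within


def pooled_t(a, b):
    """Independent-samples t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical scores for three groups
g1, g2, g3 = [4, 5, 6], [6, 7, 8], [8, 9, 10]
f, df_b, df_w = one_way_anova_f([g1, g2, g3])
print(f"F({df_b}, {df_w}) = {f:.2f}")

# With only two groups, ANOVA and the pooled t-test agree: F equals t squared
f_two, _, _ = one_way_anova_f([g1, g2])
t = pooled_t(g1, g2)
print(f"two groups: F = {f_two:.2f}, t^2 = {t ** 2:.2f}")
```

For the three groups F(2, 6) = 12.00; the tabled 0.05 critical value for df (2, 6) is about 5.14, so H0 (all group means equal) would be rejected. SPSS or Stata would report the exact p-value instead of a table lookup.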
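Question 14 describes the sampling distribution of the mean as what we would get by drawing infinitely many samples and graphing their means. A simulation with a finite number of samples approximates that idea. In this sketch the population (the whole numbers 1 to 100), the sample size of 25, the 5,000 repetitions, and the function name simulate_sample_means are all hypothetical choices; samples are drawn with replacement so the spread should sit near σ/√n.

```python
import random
from statistics import mean, pstdev, stdev


def simulate_sample_means(population, sample_size, num_samples, seed=0):
    """Draw many random samples (with replacement) and return the mean of each one."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [mean(rng.choices(population, k=sample_size)) for _ in range(num_samples)]


population = list(range(1, 101))  # hypothetical population: the whole numbers 1 to 100
sample_means = simulate_sample_means(population, sample_size=25, num_samples=5000)

# The sampling distribution of the mean centers on the population mean, and its
# spread (the standard error) is close to sigma / sqrt(n).
print("population mean:", mean(population))
print("mean of sample means:", round(mean(sample_means), 2))
print("theoretical standard error:", round(pstdev(population) / 25 ** 0.5, 2))
print("observed SD of sample means:", round(stdev(sample_means), 2))
```

A histogram of sample_means would look roughly bell-shaped even though the population itself is flat, which is why the normal curve can be used for inference about the mean.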
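Question 22's frequency distribution and cross tabulation are what SPSS or Stata print at the start of an analysis; the same counts can be reproduced with collections.Counter. The survey responses and the helper names (frequency_table, crosstab, print_crosstab) below are hypothetical and only illustrate how each crosstab cell counts cases sharing a row value and a column value.

```python
from collections import Counter


def frequency_table(values):
    """Map each value to (count, percent), most common first."""
    total = len(values)
    return {value: (count, 100 * count / total)
            for value, count in Counter(values).most_common()}


def crosstab(row_values, col_values):
    """Joint counts: cell (r, c) counts cases with row value r AND column value c."""
    return Counter(zip(row_values, col_values))


def print_crosstab(row_values, col_values):
    table = crosstab(row_values, col_values)
    rows, cols = sorted(set(row_values)), sorted(set(col_values))
    print(" " * 12 + "".join(c.rjust(8) for c in cols))
    for r in rows:
        # Counter returns 0 for combinations that never occur
        print(r.ljust(12) + "".join(str(table[(r, c)]).rjust(8) for c in cols))


# Hypothetical survey responses for six respondents
happiness = ["happy", "not happy", "happy", "happy", "not happy", "happy"]
sex = ["female", "male", "male", "female", "female", "male"]

for value, (count, pct) in frequency_table(happiness).items():
    print(f"{value:<10} {count:>3} {pct:6.1f}%")
print_crosstab(happiness, sex)
```

The frequency table shows only a few labeled categories, a hint that happiness is categorical here; the crosstab then shows how those categories are distributed across the second variable.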
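The regression notes above (slope, prediction, residuals as errors, SSE, and R-square) fit together in one short calculation. This sketch fits a simple least-squares line by hand on hypothetical data (hours studied and quiz score); the function names fit_line and r_squared_and_sse are illustrative, and R-square is computed as 1 - SSE/SST.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx          # slope: change in predicted y per one-unit increase in x
    a = y_bar - b * x_bar  # intercept: predicted y when x = 0
    return a, b


def r_squared_and_sse(x, y, a, b):
    """R-square = 1 - SSE/SST, where SSE is the sum of squared residuals."""
    y_bar = mean(y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # observed minus predicted
    sse = sum(e ** 2 for e in residuals)
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total variation in y
    return 1 - sse / sst, sse, residuals


# Hypothetical data: x = hours studied, y = quiz score
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
a, b = fit_line(hours, score)
r2, sse, residuals = r_squared_and_sse(hours, score, a, b)
print(f"y-hat = {a:.2f} + {b:.2f}x, R-square = {r2:.2f}, SSE = {sse:.2f}")
print("predicted score for 6 hours:", round(a + b * 6, 2))
```

With this data the line is y-hat = 2.2 + 0.6x: each extra hour raises the predicted score by 0.6, the predicted score for 6 hours is 5.8, and R-square = 0.6 means the line accounts for 60% of the variation in scores. The residuals sum to zero, as they should for a least-squares line with an intercept.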
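The equation y = a + b1x1 + b2x2 + b3x3 above is read as follows: a slope such as b1 is the change in predicted y when x1 rises by one unit while the other independent variables are held constant. The sketch below makes that concrete. The intercept and slopes are hypothetical stand-ins for numbers you would read from a coefficients table, and predict is an illustrative helper, not an SPSS or Stata function.

```python
def predict(intercept, slopes, xs):
    """Predicted y = a + b1*x1 + b2*x2 + ... for one case."""
    return intercept + sum(b * x for b, x in zip(slopes, xs))


# Hypothetical coefficients, as they might be read from a coefficients table
intercept = 10.0
slopes = [0.5, 2.0, -1.0]  # b1, b2, b3

case = [12, 3, 1]         # values of x1, x2, x3 for one respondent
x1_plus_one = [13, 3, 1]  # x1 raised by one unit; x2 and x3 held constant

change = predict(intercept, slopes, x1_plus_one) - predict(intercept, slopes, case)
print("predicted y:", predict(intercept, slopes, case))
print("change in predicted y when x1 rises by 1:", change)  # equals b1
```

Because x2 and x3 do not change between the two cases, their terms cancel, leaving exactly b1 = 0.5 as the effect of x1.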
[a] What does this mean by "how are they coded?"
[b] Wouldn't this be ordinal, because there are levels to happiness (the order matters)? So why is it labeled as nominal? Can someone explain?
[c] Agree. Miss Zhong said SPSS labels it nominal automatically.
[d] So is it ordinal or nominal? What would be the correct answer on the test?
[e] Discrete data take particular values, while continuous data are not restricted to separate values. Discrete data are distinct, with no grey area in between, while continuous data can take any value over a continuous range.
[f] Or standard deviation?
[g] If X is a single value, use the standard deviation; if X is the mean of a group, use the standard error.
[h] Are we expected to know how to carry out hypothesis testing for proportions? (Zeke told me no, but she seems to be suddenly talking about it, so...?)
[i] I am guessing she will probably at least have us know the steps.
[j] I would trust what your TA said more than the prof, because the TA will be grading your midterm.
[k] Not mentioned much in the lecture. We'll try to make sure it's not tested.
[l] z-test or z-score?
[m] Has she done both?
[n] In the review session, SHUJIN said to ignore this question and that she will try to make sure it is not tested.

Explanation & Answer


Running Head: SOCIOLOGY QUIZ


Sociology Quiz

Question 1: Hypothesis testing
Rejecting the null hypothesis means the data provide statistically significant evidence against H0, that is, evidence of a difference or association between the test phenomena (in either direction, not necessarily a positive one). Failing to reject the null hypothesis means the test did not find a statistically significant relation between the two phenomena; it does not prove that H0 is true. P-values are used to draw conclusions about significance: the p-value is compared with the significance level α. A p-value lower than α leads to rejection of H0 (the null hypothesis) in favor of Ha (the alternative hypothesis); a p-value at or above α means we fail to reject H0. The smaller the p-value, the stronger the evidence against H0.
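The decision rule in Question 1 (reject H0 when the p-value is below α) can be written as a tiny function. The function name decide and the p-values below are hypothetical; the loop shows why a smaller p-value means stronger evidence: a p-value of 0.04 is significant at α = 0.05 but not at α = 0.01, while 0.003 is significant at both.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with the significance level alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Hypothetical p-values: the smaller the p-value, the stronger the evidence against H0
for p in (0.30, 0.04, 0.003):
    print(f"p = {p}: alpha 0.05 -> {decide(p, 0.05)}; alpha 0.01 -> {decide(p, 0.01)}")
```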
Question 2. How to interpret the one-sample t-test results
H0 is the hypothesis that is actually tested; it is presumed correct unless the data provide strong evidence to the contrary. For a one-sample t-test, H0 states that the population mean equals a specified value (H0: μ = μ0). Ha is the competing hypothesis, which is supported when H0 is rejected (for a two-tailed test, Ha: μ ≠ μ0). Both hypotheses must be stated before any statistical test of significance is carried out.
The one-sample t-test is conducted in 4 steps:
i. Calculating the mean ...
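The answer's list of steps is cut off after the first one, so the sketch below is not the author's procedure; it shows one common way to carry out a one-sample t-test: compute the sample mean, estimate the standard error from the sample standard deviation, form t = (mean - μ0) / SE with n - 1 degrees of freedom, and compare |t| with a critical value. The scores, μ0 = 70, and the function name one_sample_t are hypothetical; 2.262 is the two-tailed α = 0.05 critical value for df = 9 from a standard t table.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t, df) for H0: mu = mu0, using the sample SD (population SD unknown)."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)   # estimated SE of the mean
    t = (mean(sample) - mu0) / standard_error  # how many SEs the sample mean is from mu0
    return t, n - 1


# Hypothetical test scores; H0: mu = 70, Ha: mu != 70
scores = [72, 75, 78, 80, 69, 74, 77, 81, 73, 76]
t, df = one_sample_t(scores, mu0=70)

# Two-tailed critical value for alpha = 0.05 and df = 9, taken from a standard t table
T_CRITICAL = 2.262
decision = "reject H0" if abs(t) > T_CRITICAL else "fail to reject H0"
print(f"t({df}) = {t:.2f}: {decision}")
```

Here the sample mean is 75.5 and t(9) is about 4.71, well beyond 2.262, so H0 is rejected; statistical software would report the corresponding p-value directly.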

