Read all 5 questions carefully and answer them in a Word document


Nqnz11

Mathematics

Description

Here is my assignment; please read the instructions carefully (see the attachment). Please write in your own words. Plagiarism is not acceptable at all. Please produce good writing so that I can keep working with you.

Thank you

Unformatted Attachment Preview

Sampling, Normal Curve, and Hypotheses in Quantitative Research

Types of Sampling
• Simple random sample
• Stratified random sample
• Cluster sample
• Systematic sample
• Convenience sample

Simple Random Sample: Every subset of a specified size n from the population has an equal chance of being selected.

Stratified Random Sample: The population is divided into two or more groups, called strata, according to some criterion such as geographic location, grade level, age, or income, and subsamples are randomly selected from each stratum.

Cluster Sample: The population is divided into subgroups (clusters), such as families. A simple random sample of the clusters is taken, and then all members of each selected cluster are surveyed.

Systematic Sample: Every kth member (for example, every 10th person) is selected from a list of all population members.

Convenience Sample: Whichever individuals are easiest to reach are selected; the sample is drawn at the "convenience" of the researcher.

Now you decide: which type of sampling is each of the following?
• including 5 people from every sports team on a collegiate campus
• including every teacher from 4 elementary schools chosen from a group of 11 elementary schools in a school district with 45 elementary schools total
• including 25 employees whose names were drawn from a hat containing the names of 250 school employees
• including all people who attend parent-teacher conferences
• including every 20th student from a list of 2000 students in a particular high school

Errors in Sampling
Non-observation errors:
◦ Sampling error: occurs naturally
◦ Coverage error: the people sampled do not match the population of interest
◦ Underrepresentation
◦ Non-response: people won't or can't participate

As the researcher, you will never eliminate ALL elements of BIAS…but it is your job to minimize the impact of bias on your research project by carefully planning your research design.
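The four random sampling designs described above can be sketched with Python's standard `random` module. This is an illustrative sketch only: the helper names (`simple_random_sample`, `systematic_sample`, `stratified_sample`, `cluster_sample`) are hypothetical, and the populations are made-up lists of IDs, not data from the course. Convenience sampling is left out because, by definition, it does not follow a random procedure.

```python
import random

def simple_random_sample(population, n, rng):
    # Every subset of size n has the same chance of being chosen.
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Every kth member of the list. Choosing the starting point at random
    # among the first k members is a common convention, assumed here.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    # strata: dict of stratum name -> list of members.
    # A separate simple random sample is drawn inside each stratum.
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # clusters: dict of cluster name -> list of members (e.g. families).
    # Whole clusters are chosen at random; every member of a chosen cluster is kept.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

if __name__ == "__main__":
    rng = random.Random(2024)          # fixed seed so the draw can be repeated
    population = list(range(1, 101))   # hypothetical IDs 1..100
    print(simple_random_sample(population, 5, rng))
    print(systematic_sample(population, 10, rng))  # every 10th person
    grades = {"grade 3": list(range(1, 41)), "grade 4": list(range(41, 101))}  # hypothetical strata
    print(stratified_sample(grades, 4, rng))
    families = {"A": [1, 2, 3], "B": [4, 5], "C": [6, 7, 8, 9], "D": [10]}     # hypothetical clusters
    print(cluster_sample(families, 2, rng))
```

Passing an explicit `random.Random` object, rather than using the module-level functions, keeps each draw reproducible, which is useful when a sampling plan has to be documented.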
The Normal Curve

The Normal Distribution: The normal curve is a mathematical abstraction that conveniently describes ("models") many frequency distributions of scores in real life.
[Figure: length of time before someone looks away in a staring contest]
[Figure: length of pickled gherkins]
[Figure: Height of 14-year-old children, country vs. town; frequency (%) by height (inches), 51 to 70 inches. Source: Francis Galton (1876), 'On the height and weight of boys aged 14, in town and country public schools.' Journal of the Anthropological Institute, 5, 174-180.]
[Figure: Frequency of different wand lengths, an example of a normal distribution (the length of Sooty's magic wand)]

Properties of the Normal Distribution:
1. It is bell-shaped and asymptotic at the extremes.
2. It is symmetrical around the mean.
3. The mean, median, and mode all have the same value.
4. It can be specified completely once the mean and SD are known.
5. The area under the curve is directly proportional to the relative frequency of observations. For example, 50% of scores fall below the mean, as does 50% of the area under the curve; if 85% of scores fall below score X, then 85% of the area under the curve lies below X.

Relationship between the normal curve and the standard deviation: All normal curves share this property: the SD cuts off a constant proportion of the distribution of scores.
[Figure: normal curve with the regions within ±1, ±2, and ±3 SDs of the mean marked 68%, 95%, and 99.7%; x-axis: number of standard deviations either side of the mean]

About 68% of scores fall in the range of the mean ± 1 SD; 95% in the range of the mean ± 2 SDs; and 99.7% in the range of the mean ± 3 SDs.

For example, IQ is normally distributed (mean = 100, SD = 15).
68% of people have IQs between 85 and 115 (100 ± 15).
95% have IQs between 70 and 130 (100 ± 2×15).
99.7% have IQs between 55 and 145 (100 ± 3×15).
[Figure: IQ distribution with the 68% band between 85 (mean − 1 SD) and 115 (mean + 1 SD)]

We can tell a lot about a population just from knowing the mean and SD and that scores are normally distributed. If we encounter someone with a particular score, we can assess how they stand in relation to the rest of their group. For example, someone with an IQ of 145 is quite unusual (3 SDs above the mean): IQs 3 or more SDs above the mean occur in only about 0.15% of the population [(100 − 99.7) / 2].

Population → all possible values
Sample → a portion of the population
Statistical inference → generalizing from a sample to a population with a calculated degree of certainty
Two forms of statistical inference:
◦ Hypothesis testing
◦ Estimation
Parameter → a characteristic of a population, e.g., the population mean μ
Statistic → a value calculated from data in the sample, e.g., the sample mean (x̄)
P-hat → a sample proportion, symbolized by p̂

Distinctions Between Parameters and Statistics (Vocabulary Review)
                 Parameters          Statistics
Source           Population          Sample
Notation         Greek (e.g., μ)     Roman (e.g., x̄)
Vary             No                  Yes
Calculated       No                  Yes

Hypothesis Testing Steps
A. Null and alternative hypotheses
B. Significance level
C. Test statistic
D. P-value and interpretation

General Example: A criminal trial is an example of hypothesis testing without the statistics. In a trial, a jury must decide between two hypotheses. The null hypothesis is
H0: The defendant is innocent.
The alternative hypothesis, or research hypothesis, is
H1: The defendant is guilty.
The jury does not know which hypothesis is true. They must make a decision on the basis of the evidence presented. In the language of statistics, convicting the defendant is called rejecting the null hypothesis in favor of the alternative hypothesis. That is, the jury is saying that there is enough evidence to conclude that the defendant is guilty (i.e., there is enough evidence to support the alternative hypothesis).
We say, "We reject the null." If the jury acquits, it is stating that there is not enough evidence to support the alternative hypothesis. Notice that the jury is not saying that the defendant is innocent, only that there is not enough evidence to support the alternative hypothesis. That is why we never say that we accept the null hypothesis…we say, "We fail to reject the null." (Although non-stats people often get this wrong!)

Specific Example: Crazy guy...but good at explanations!

Another Specific Example: A department store manager determines that a new billing system will be cost-effective only if the mean monthly account is more than $170. What null and alternative hypotheses can we write for this situation?
The system will be cost-effective if the mean account balance for all customers is greater than $170. We express this belief as our research hypothesis, that is:
H1: μ > 170 (this is what we want to determine)
Thus, our null hypothesis becomes:
H0: μ ≤ 170 (we assume this is true until proven otherwise)

Interpretation: The p-value answers the question: What is the probability of the observed test statistic, or one more extreme, when H0 is true? Thus, smaller and smaller p-values provide stronger and stronger evidence against H0.
Small p-value → strong evidence for H1

Interpreting the p-value… The smaller the p-value, the more statistical evidence exists to support the alternative hypothesis.
• If the p-value is less than 1%, there is overwhelming evidence that supports the alternative hypothesis.
• If the p-value is between 1% and 5%, there is strong evidence that supports the alternative hypothesis.
• If the p-value is between 5% and 10%, there is weak evidence that supports the alternative hypothesis.
• If the p-value exceeds 10%, there is no evidence that supports the alternative hypothesis.
We observe a p-value of .0069; hence there is overwhelming evidence to support H1: μ > 170.
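Both the 68/95/99.7 rule from the normal-curve discussion and the verbal p-value scale just listed can be checked with a few lines of Python. This is an illustrative sketch, not course material: it relies on the standard library's `statistics.NormalDist` (Python 3.8+), and `share_within`, `share_above`, and `evidence_label` are hypothetical helper names. Computed exactly, the shares within ±1, ±2, and ±3 SDs are about 68.3%, 95.4%, and 99.7%, and the share of IQs above 145 is about 0.13%; the 0.15% quoted earlier comes from the rounded 99.7% figure.

```python
from statistics import NormalDist

def share_within(k_sd):
    # Proportion of a normal distribution lying within k SDs of its mean.
    # This is the same for every normal curve, so the standard normal is used.
    z = NormalDist()  # mean 0, SD 1
    return z.cdf(k_sd) - z.cdf(-k_sd)

def share_above(score, mean, sd):
    # Proportion of scores above `score` for a normal distribution with this mean and SD.
    return 1 - NormalDist(mean, sd).cdf(score)

def evidence_label(p):
    # Verbal scale from the slides. Values exactly at .01 or .05 are assumed to fall
    # in the weaker category; .10 itself counts as weak, since "no evidence" means p > .10.
    if p < 0.01:
        return "overwhelming"
    if p < 0.05:
        return "strong"
    if p <= 0.10:
        return "weak"
    return "none"

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within mean ± {k} SD: {share_within(k):.2%}")
    # IQ example from the text: mean 100, SD 15, so 145 is 3 SDs above the mean.
    print(f"share of IQs above 145: {share_above(145, 100, 15):.2%}")
    print("p = .0069 ->", evidence_label(0.0069))
```

A p-value is itself an area in the tail of a distribution, which is why the same cumulative-distribution function serves both purposes.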
Interpreting the p-value…
[Figure: p-value scale from 0 to .10: below .01, overwhelming evidence (highly significant); .01 to .05, strong evidence (significant); .05 to .10, weak evidence (not significant); above .10, no evidence (not significant); p = .0068 is marked in the overwhelming region]

Conclusions of a Test of Hypothesis… If we reject the null hypothesis, we conclude that there is enough evidence to infer that the alternative hypothesis is true. If we fail to reject the null hypothesis, we conclude that there is not enough statistical evidence to infer that the alternative hypothesis is true. This does not mean that we have proven that the null hypothesis is true!

Prior to testing, you would decide on a level of significance… Your computed p-value will indicate whether you should reject the null or fail to reject the null. Let's examine some p-values and make decisions:
If p = .45, we would __________________.
If p = .20, we would __________________.
If p = .09, we would __________________.
If p = .01, we would __________________.
If p = .009, we would __________________.

In summary…
* Sampling is critically important to your study.
* Null and alternative hypotheses are the foundation of research investigations.
* Interpreting the p-value tells you whether or not you have "statistically significant" evidence to support your claim.

Correlation & Regression
Correlation: finding the relationship between two quantitative variables, without being able to infer causal relationships. Correlation is a statistical technique used to determine the degree to which two variables are related.

Scattergram (or scatterplot)
• Rectangular coordinates
• Two quantitative variables
• One variable is called the independent or predictor variable (X), and the second is called the dependent or criterion variable (Y)
• Points are not joined
• No frequency table

Example
Wt. (kg):     67   69   85   83   74   81   97   92   114  85
SBP (mmHg):   120  125  140  160  130  180  150  140  200  130
[Figure: Scatter diagram of weight and systolic blood pressure; x-axis wt (kg), 60 to 120; y-axis SBP (mmHg), 100 to 220]

Scatter plots: The pattern of the data indicates the type of relationship between your two variables:
➢ positive relationship
➢ negative relationship
➢ no relationship
[Figure: Positive relationship, height in cm against age in weeks]
[Figure: Negative relationship, reliability against age of car]
[Figure: No relation]

An Example
A familiar statement from parents to children: If you want to get ahead, stay in school. Underlying this nagging parental advice is the following claimed empirical relationship:
LEVEL OF EDUCATION =====(+)=====> LEVEL OF SUCCESS IN LIFE
Suppose we collect data by means of a survey asking respondents (say, a representative sample of the population aged 35 to 55) to report the number of years of formal EDUCATION they completed and also their current INCOME (as an indicator of SUCCESS). We then analyze the association between the two interval variables in this reformulated hypothesis:
LEVEL OF EDUCATION =====(+)=====> LEVEL OF INCOME
(# of years reported)              ($000 per year)
Since these are both continuous variables, we analyze their association by means of a scattergram or scatterplot.
[Figure: Data collected from two different societies: Years of Education versus Yearly Income]

An Example (cont.) Note that the two scattergrams are drawn with the same horizontal and vertical scales to facilitate comparison between the two charts. Both scattergrams show a clear positive association between the two variables, i.e., the plotted points in both form an upward-sloping pattern running from Low–Low to High–High. At the same time, there are obvious differences between the two scattergrams (and thus between the relationships between INCOME and EDUCATION in societies A and B).
Questions For Discussion
In which society, A or B, is the hypothesis most powerfully confirmed?
In which society, A or B, is there a greater incentive for people to stay in school?
Which society, A or B, does the U.S. more closely resemble?
How might we characterize the difference between societies A and B?

An Example (cont.) We can visually compare and contrast the nature of the associations between the two variables in the two scattergrams by drawing a number of vertical strips in each scattergram. Points that lie within each vertical strip represent respondents who have (just about) the same value on the independent (horizontal) variable of EDUCATION. Within each strip, we can estimate (by "eyeball methods") the average magnitude of the dependent (vertical) variable INCOME and put a mark at the appropriate level.
[Figure: Average Income for Selected Levels of Education]
We can connect these marks to form a line of averages that is apparently (close to being) a straight line.

An Example (cont.) Now we can assess two distinct characteristics of the relationships between EDUCATION and INCOME in scattergrams A and B:
• How much does the average level of INCOME change among people with different levels of EDUCATION?
• How much dispersion of INCOME is there among people with the same level of EDUCATION?

An Example (cont.) In both scattergrams, the line of averages is upward-sloping, indicating a clear apparent positive effect of EDUCATION on INCOME. But in the scattergram for society A, the upward slope of the line of averages is fairly shallow: average INCOME increases by only about $1000 for each additional year of EDUCATION. On the other hand, in the scattergram for society B, the upward slope of the line of averages is much steeper. The graph in Figure 1B indicates that average INCOME increases by about $4000 for each additional year of EDUCATION. In this sense, EDUCATION is on average more "rewarding" in society B than in A.
An Example (cont.) There is another difference between the two scattergrams. In scattergram A, there is almost no dispersion within each vertical strip (and almost no dispersion around the line of averages as a whole). In scattergram B, there is a lot of dispersion within each vertical strip (and around the line of averages as a whole). We can put this point in simpler language. In society A, while additional years of EDUCATION produce rewards in terms of INCOME that are modest (as we saw before), these modest rewards are essentially certain. In society B, while additional years of EDUCATION produce on average much more substantial rewards in terms of INCOME (as we saw before), these large expected rewards are highly uncertain and are realized only on average. For example, in scattergram B (but not A), we can find many pairs of cases such that one case has (much) higher EDUCATION but the other case has (much) higher INCOME.

An Example (cont.) This means that in society B, while EDUCATION has a big impact on INCOME, there are evidently other (independent) variables (maybe family wealth, ambition, career choice, athletic or other talent, just plain luck, etc.) that also have major effects on LEVEL OF INCOME. In contrast, in society A it appears that LEVEL OF EDUCATION (almost) wholly determines LEVEL OF INCOME and that essentially nothing else matters. Another difference between the two societies is that, while both societies have similar distributions of EDUCATION, their INCOME distributions are quite different. A is quite egalitarian with respect to INCOME, which ranges only from about $40,000 to about $60,000, while B is considerably less egalitarian with respect to INCOME, which ranges from under $10,000 to at least $100,000 (and possibly higher).
In summary, in society A the INCOME rewards of EDUCATION are modest but essentially certain, while in society B the INCOME rewards of EDUCATION are substantial on average but quite uncertain in individual cases.

Correlation Coefficient: r
➢ It is also called Pearson's correlation or the product-moment correlation coefficient.
➢ It measures the nature and strength of the relationship between two quantitative variables. The sign of r denotes the nature (direction) of the association, while the value of r denotes its strength.
➢ If the sign is +, the relationship is direct (an increase in one variable is associated with an increase in the other variable, and a decrease in one variable is associated with a decrease in the other).
➢ If the sign is −, the relationship is inverse or indirect (an increase in one variable is associated with a decrease in the other).
➢ The value of r ranges between −1 and +1.
➢ The value of r denotes the strength of the association, as illustrated by the following diagram.
[Diagram: a scale of r from −1 to +1: −1 perfect indirect correlation; −1 to −0.75 strong; −0.75 to −0.25 intermediate; −0.25 to 0 weak; 0 no relation; 0 to 0.25 weak; 0.25 to 0.75 intermediate; 0.75 to 1 strong; +1 perfect direct correlation]

If r = 0, there is no association or correlation between the two variables.
If 0 < |r| < 0.25: weak correlation.
If 0.25 ≤ |r| < 0.75: intermediate correlation.
If 0.75 ≤ |r| < 1: strong correlation.
If |r| = 1: perfect correlation.

How to compute the simple correlation coefficient (r):
$$ r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}} $$
This slide used for explanation only…not a required "understanding" slide.

Example: A sample of 6 children was selected, and data about their age in years and weight in kilograms were recorded as shown in the following table. It is required to find the correlation between age and weight.
Serial No.   Age (years)   Weight (kg)
1            7             12
2            6             8
3            8             12
4            5             10
5            6             11
6            9             13
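The computational formula for r can be checked against the age and weight table above with a short Python sketch. The function names `pearson_r` and `strength` are hypothetical helpers, and the strength labels simply apply the slide's 0.25 and 0.75 cut-offs to |r|. (Python 3.10 and later also ship `statistics.correlation`, which computes the same Pearson coefficient.)

```python
import math

def pearson_r(xs, ys):
    # Pearson's r via the computational (sums) formula shown above.
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

def strength(r):
    # Labels follow the slide's cut-offs (0.25 and 0.75), applied to |r|.
    a = abs(r)
    if a == 0:
        return "no correlation"
    if a < 0.25:
        return "weak"
    if a < 0.75:
        return "intermediate"
    if a < 1:
        return "strong"
    return "perfect"

if __name__ == "__main__":
    age = [7, 6, 8, 5, 6, 9]          # years, from the table
    weight = [12, 8, 12, 10, 11, 13]  # kg, from the table
    r = pearson_r(age, weight)
    print(f"age vs weight: r = {r:.3f} ({strength(r)})")
    anxiety = [10, 8, 2, 1, 5, 6]     # anxiety/test-score example in this section
    score = [2, 3, 9, 7, 6, 5]
    r2 = pearson_r(anxiety, score)
    print(f"anxiety vs score: r = {r2:.3f} ({strength(r2)})")
```

The sums-based form mirrors the hand calculation step by step; for large data sets a two-pass version (subtracting the means first) is numerically safer, but for small tables like these the results agree.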
These 2 variables are of the quantitative type. One variable (age) is called the independent variable and denoted X, and the other (weight) is called the dependent variable and denoted Y. To find the relation between age and weight, compute the simple correlation coefficient using the formula above:

Serial No.   Age (years) x   Weight (kg) y   xy          x²          y²
1            7               12              84          49          144
2            6               8               48          36          64
3            8               12              96          64          144
4            5               10              50          25          100
5            6               11              66          36          121
6            9               13              117         81          169
Total        Σx = 41         Σy = 66         Σxy = 461   Σx² = 291   Σy² = 742

$$ r = \frac{461 - \dfrac{41 \times 66}{6}}{\sqrt{\left(291 - \dfrac{41^2}{6}\right)\left(742 - \dfrac{66^2}{6}\right)}} = \frac{10}{\sqrt{10.83 \times 16}} \approx 0.76 $$
r ≈ 0.76: strong direct correlation

EXAMPLE: Relationship between Anxiety and Test Scores
Anxiety (X)   Test score (Y)   X²          Y²          XY
10            2                100         4           20
8             3                64          9           24
2             9                4           81          18
1             7                1           49          7
5             6                25          36          30
6             5                36          25          30
ΣX = 32       ΣY = 32          ΣX² = 230   ΣY² = 204   ΣXY = 129

Calculating the correlation coefficient (an equivalent form of the same formula, with numerator and denominator multiplied by n):
$$ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} = \frac{6(129) - (32)(32)}{\sqrt{\left[6(230) - 32^2\right]\left[6(204) - 32^2\right]}} = \frac{774 - 1024}{\sqrt{(356)(200)}} \approx -0.94 $$
r = −0.94: strong indirect correlation

Exercise: Multiple Correlation Tables (repeated correlations with multiple variables)

t-tests & ANOVAs and their application to the statistical analysis of neuroimaging
Adapted from Carles Falcon & Suz Prejawa

Populations and samples
Population → z-tests
Sample (of a population) → t-tests
NOTE: a sample can be 2 sets of scores, e.g., fMRI data from 2 conditions.

Comparison between Samples: Are these groups different?
[Figure: Comparison between conditions (fMRI): Exp. 1, reading aloud (script) vs. "reading" finger spelling (sign); Exp. 2, reading aloud vs. picture naming; activation with 95% CI in the left and right hemisphere]

t-tests
• Compare the mean between 2 samples/conditions.
• If 2 samples are taken from the same population, then they should have fairly similar means → if 2 means are statistically different, then the samples are likely to be drawn from 2 different populations, i.e., they really are different.

t-test in the word-forming area
• Exp. 1: activation patterns are similar, not significantly different → they are similar tasks and recruit the word-forming area in a similar way.
• Exp. 2: activation patterns are very (and significantly) different → reading aloud recruits the word-forming area significantly more than naming.
[Figure: activation in the word-forming area with 95% CI, left vs. right hemisphere, Exp. 1 and Exp. 2]

Formula: the difference between the means divided by the standard error of the difference between the means:
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}, \qquad s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$
where condition 1 has mean x̄1, SD s1, and size n1, and condition 2 has mean x̄2, SD s2, and size n2.
Example slide only…you DON'T have to know this formula!!

Types of t-tests
• One sample vs. hypothesized mean: 1 sample compared to a predicted value.
• Independent samples: 2 experimental conditions, with different participants assigned to each condition.
• Paired samples (also called the dependent-means test): 2 experimental conditions, with the same participants taking part in both conditions of the experiment.

Research Question Example
• Let's pretend you came up with the following theory… Having a baby increases brain volume (associated with possible structural changes).

Some Problems with a Population-Based Study
• Cost
• Not able to include everyone
• Too time-consuming
• Ethical right to privacy
Realistically, researchers can only do sample-based studies.
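The t formula above, and the paired (dependent-means) version used in the brain-volume example that follows, can be sketched in a few lines of Python. The helpers `independent_t` and `paired_t` are hypothetical names; only `statistics.mean` and `statistics.stdev` come from the standard library. For the cholesterol study reported later in this section, the group sizes are an assumption: t(38) suggests 40 participants, and an even 20/20 split is assumed here (with equal group sizes, this unpooled standard error equals the pooled one). This gives t ≈ 2.43, in line with the reported 2.428. Computed directly from the eight listed brain-volume differences, the paired t is about 6.66 with 7 degrees of freedom, close to, though not identical with, the 6.72 printed in that example.

```python
import math
from statistics import mean, stdev

def independent_t(mean1, sd1, n1, mean2, sd2, n2):
    # t = (mean1 - mean2) / sqrt(sd1^2/n1 + sd2^2/n2), the formula above.
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_diff

def paired_t(before, after):
    # Paired (dependent-means) t: mean of the differences divided by its standard error.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # stdev uses the n - 1 (sample) formula
    return mean(diffs) / se, n - 1   # t statistic and degrees of freedom

if __name__ == "__main__":
    # Cholesterol study: diet 6.15 ± 0.52, exercise 5.80 ± 0.38 mmol/L.
    # Assumption: 20 participants per group (not stated in the source).
    t_chol = independent_t(6.15, 0.52, 20, 5.80, 0.38, 20)
    print(f"cholesterol: t = {t_chol:.2f}")
    # Brain volumes (cm^3) before and 6 weeks after delivery, from the table.
    before = [1437.4, 1089.2, 1201.7, 1371.8, 1207.9, 1150.7, 1221.9, 1208.7]
    after = [1494.5, 1109.7, 1245.4, 1383.6, 1237.7, 1180.1, 1268.8, 1248.3]
    t_brain, df = paired_t(before, after)
    print(f"brain volume: t({df}) = {t_brain:.2f}")
```

Converting t to a p-value needs the t distribution's CDF, which the standard library does not provide; a statistics package or an online calculator such as the one linked in the brain-volume example covers that step.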
Paired Sample t-Tests: Pre and Post
Hypothesize
• H0: There is no difference in brain size before and after giving birth.
• HA: The brain is smaller or larger after giving birth (a difference is detected).

Absolute Brain Volumes (cm³)
Before delivery: 1437.4, 1089.2, 1201.7, 1371.8, 1207.9, 1150.7, 1221.9, 1208.7 (sum 9889.3; mean 1236.1625; SD 113.8544928)
6 weeks after delivery: 1494.5, 1109.7, 1245.4, 1383.6, 1237.7, 1180.1, 1268.8, 1248.3 (sum 10168.1; mean 1271.0125; SD 119.0413426)
Difference (after − before): 57.1, 20.5, 43.7, 11.8, 29.8, 29.4, 46.9, 39.6 (sum 278.8; mean 34.85; standard error used for t: 5.18685)

t = mean difference / standard error of the differences = 34.85 / 5.18685 ≈ 6.72, df = 7
Results: t(7) = 6.72, p < .001
Women have a significantly larger brain after giving birth.
http://www.danielsoper.com/statcalc/calc08.aspx

The concentration of cholesterol (a type of fat) in the blood is associated with the risk of developing heart disease, such that higher concentrations of cholesterol indicate a higher level of risk, and lower concentrations indicate a lower level of risk. If you lower the concentration of cholesterol in the blood, your risk of developing heart disease can be reduced. Being overweight and/or physically inactive increases the concentration of cholesterol in your blood. Both exercise and weight loss can reduce cholesterol concentration. However, it is not known whether exercise or weight loss is best for lowering cholesterol concentration. Therefore, a researcher decided to investigate whether an exercise or weight loss intervention is more effective in lowering cholesterol levels. To this end, the researcher recruited a random sample of inactive males who were classified as overweight. This sample was then randomly split into two groups: Group 1 underwent a calorie-controlled diet, and Group 2 undertook the exercise-training program. In order to determine which treatment program was more effective, the mean cholesterol concentrations were compared between the two groups at the end of the treatment programs.
This table provides useful descriptive statistics for the two groups that you compared, including the mean and standard deviation. This table provides the actual results from the independent t-test.

This study found that overweight, physically inactive male participants had statistically significantly lower cholesterol concentrations (5.80 ± 0.38 mmol/L) at the end of an exercise-training program compared to after a calorie-controlled diet (6.15 ± 0.52 mmol/L), t(38) = 2.428, p = 0.020.

Note the mean for each of the two groups in the "Group Statistics" section. This output shows that the average weight for European cars is 2431 pounds, versus 2221 pounds for Japanese cars. To see the results of the t-test for the difference in the two means, find the p-value for the test. The p-value is labeled "Sig." in the SPSS output ("Sig." stands for significance level). To find the correct "Sig.", look in the section of the "Independent Samples Test" output labeled "t-test for Equality of Means", where you will find a column labeled "Sig. (2-tailed)". This is the correct column, not the column labeled "Sig." in the "Levene's Test for Equality of Variances" section. Finally, read the "Sig." value in the second row, labeled "Equal variances not assumed". We use the second row because we almost never have any reason to think a priori that the amount of variation within each group will be the same (the p-values in the two rows are usually almost the same anyway). In the above example, the p-value is .002, implying that the difference in means is statistically significant at the .05 and .01 levels.

Comparison of more than 2 samples or complicated designs
Each comparison brings its own p-value…too much! Could be p = .15, yielding no results!

ANOVA in the word-forming area
• Is activation in the word-forming area different for a) naming and reading, and b) influenced by age, and if so (a + b), how so?
[Figure: activation in the word-forming area (95% CI) by TASK (naming, reading aloud) and AGE (young, old)]
• H1 & H0: reading difference
• H2 & H0: age difference
• H3 & H0: reading/age difference (interaction)
• Reading causes significantly stronger activation in the word-forming area, but only in the older group; so the word-forming area is more strongly activated during reading, but this seems to be affected by age.

ANOVA
ANalysis Of VAriance (ANOVA): still compares the differences in means between groups, but it uses the variance of the data to "decide" whether the means are different.

There are many different types of ANOVA…just the basics in this class, however. I'm including this only in case you want to explore the detailed types of ANOVAs…
• 2-way ANOVA for independent groups (between-subject design): Task I: Condition I = participant group A, Condition II = participant group B; Task II: Condition I = participant group C, Condition II = participant group D.
• Repeated measures ANOVA (within-subject design): participant group A takes part in every task/condition combination.
• Mixed ANOVA (both): for Task I and for Task II alike, Condition I = participant group A and Condition II = participant group B.
NOTE: You may have more than 2 levels in each condition/task.

A manager wants to raise the productivity at his company by increasing the speed at which his employees can use a particular spreadsheet program. As he does not have the skills in-house, he employs an external agency that provides training in this spreadsheet program. They offer 3 courses: a beginner, an intermediate, and an advanced course. He is unsure which course is needed for the type of work they do at his company, so he sends 10 employees on the beginner course, 10 on the intermediate, and 10 on the advanced course. When they all return from the training, he gives them a problem to solve using the spreadsheet program and times how long it takes them to complete the problem.
He then compares the three courses (beginner, intermediate, advanced) to see if there are any differences in the average time it took to complete the problem.

Descriptive Statistics…
ANOVA Results…

What can be concluded from ANOVA
• There is a significant difference somewhere between the groups.
• NOT where the difference lies.
• Finding exactly where the difference lies requires further statistical analysis = post hoc analysis.

Tukey Post-Hoc tests…
• There was a statistically significant difference between groups as determined by one-way ANOVA (F(2,27) = 4.467, p = .021). A Tukey post hoc test revealed that the time to complete the problem was statistically significantly lower after taking the intermediate (23.6 ± 3.3 min, p = .046) and advanced (23.4 ± 3.2 min, p = .034) courses compared to the beginner course (27.2 ± 3.0 min). There was no statistically significant difference between the intermediate and advanced groups (p = .989).

Conclusions
• t-tests compare 2 samples.
• ANOVAs compare more than 2 groups, or groups in more complicated designs.
Tables used in this presentation are courtesy of: https://statistics.laerd.com/spss-tutorials/

Additional tests…
• More statistical tests are available for more complicated situations:
• ANCOVAs (same idea as ANOVA, but in pre/post-test situations; ANCOVA can be used if your original groups are statistically different).
• Factor Analysis (investigation into separate factors that may explain the correlations among several variables).
• MANOVAs (same principle as ANOVA, but with multiple dependent variables).

Please answer these questions
Final Thoughts:
1. The goal of this course was to give you an overview of educational research methods and statistical tests. Please discuss the degree to which you feel that this goal was accomplished.
2. Explain which activity of this class was most effective at increasing your knowledge of educational research and why it was so effective.
3. Through which activity or activities do you feel you gained immediately useful information to help you improve your understanding of educational research?
4. Which of the statistical tests presented in this course do you feel you understand the most, and why? The least, and why?
5. Did you find that any instructional activities in the course were not beneficial to your understanding of educational research?
Now you decide which type of sampling each example describes:
• including 5 people from every sports team on a collegiate campus
• including every teacher from 4 elementary schools chosen from a group of 11 elementary schools in a school district with 45 elementary schools total
• including 25 employees whose names were drawn from a hat containing the names of 250 school employees
• including all people who attend parent-teacher conferences
• including every 20th student from a list of 2000 students in a particular high school

Errors in Sampling
Non-observation errors:
◦ Sampling error: occurs naturally
◦ Coverage error: the people sampled do not match the population of interest
◦ Underrepresentation
◦ Non-response: people won't or can't participate
As the researcher, you will never eliminate ALL elements of BIAS, but it is your job to minimize the impact of bias on your research project by carefully planning your research design.

The Normal Curve
The Normal Distribution
The normal curve is a mathematical abstraction that conveniently describes ("models") many real-life frequency distributions of scores.
[Figures: example distributions of the length of time before someone looks away in a staring contest, and of the length of pickled gherkins.]
[Figure: Height of 14-year-old children, country vs. town: frequency (%) by height (inches), 51 to 70 inches. Source: Francis Galton (1876), "On the height and weight of boys aged 14, in town and country public schools," Journal of the Anthropological Institute, 5, 174-180.]
[Figure: Frequency of different wand lengths: an example of a normal distribution, the length of Sooty's magic wand.]

Properties of the Normal Distribution:
1. It is bell-shaped and asymptotic at the extremes.
2. It is symmetrical around the mean.
3. The mean, median, and mode all have the same value.
4. It can be specified completely once the mean and SD are known.
5. The area under the curve is directly proportional to the relative frequency of observations. For example, 50% of scores fall below the mean, as does 50% of the area under the curve; likewise, if 85% of scores fall below a score X, then 85% of the area under the curve lies below X.

Relationship between the normal curve and the standard deviation:
All normal curves share this property: the SD cuts off a constant proportion of the distribution of scores.
[Figure: normal curve, frequency against the number of standard deviations either side of the mean (-3 to +3), with the 68%, 95%, and 99.7% regions marked.]
About 68% of scores fall within the mean +/- 1 SD, 95% within the mean +/- 2 SDs, and 99.7% within the mean +/- 3 SDs.
For example, IQ is normally distributed (mean = 100, SD = 15):
• 68% of people have IQs between 85 and 115 (100 +/- 15).
• 95% have IQs between 70 and 130 (100 +/- 2*15).
• 99.7% have IQs between 55 and 145 (100 +/- 3*15).
[Figure: IQ distribution with the central 68% lying between 85 (mean - 1 SD) and 115 (mean + 1 SD).]
We can tell a lot about a population just from knowing the mean and SD and that scores are normally distributed. If we encounter someone with a particular score, we can assess how they stand in relation to the rest of their group. For example, someone with an IQ of 145 is quite unusual (3 SDs above the mean): IQs 3 or more SDs above the mean occur in only about 0.15% of the population, since (100% - 99.7%) / 2 = 0.15%.
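The 68/95/99.7 rule and the IQ example can be checked numerically. The sketch below is an illustration, not part of the course material: it computes the area under a normal curve with Python's `math.erf`, and the helper names `normal_cdf` and `proportion_within` are hypothetical.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    # Proportion of a normal distribution falling below x (area to the left of x).
    z = (x - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def proportion_within(k):
    # Proportion of scores within k SDs of the mean: area between -k and +k.
    return normal_cdf(k) - normal_cdf(-k)

IQ_MEAN, IQ_SD = 100, 15

for k in (1, 2, 3):
    low, high = IQ_MEAN - k * IQ_SD, IQ_MEAN + k * IQ_SD
    print(f"within {k} SD: IQ {low}-{high}, {proportion_within(k):.1%}")

# Proportion of people with an IQ of 145 or more (3 SDs above the mean).
print(f"IQ >= 145: {1 - normal_cdf(145, IQ_MEAN, IQ_SD):.2%}")
```

The exact upper tail beyond 3 SDs is about 0.13%; the 0.15% in the text comes from halving the rounded 0.3% left over by the 99.7% figure.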
Population  all possible values Sample  a portion of the population Statistical inference  generalizing from a sample to a population with calculated degree of certainty Two forms of statistical inference ◦ Hypothesis testing ◦ Estimation Parameter  a characteristic of population, e.g., population mean µ Statistic  calculated from data in the sample, e.g., sample mean ( ) P-hat  a sample proportion, symbolized by Distinctions Between Parameters and Statistics (Vocabulary Review) Parameters Statistics Source Population Sample Notation Greek (e.g., μ) Roman (e.g., xbar) Vary No Yes Calculated No Yes Hypothesis Testing Steps A.Null and alternative hypotheses B.Significance level C.Test statistic D.P-value and interpretation General Example: A criminal trial is an example of hypothesis testing without the statistics. In a trial a jury must decide between two hypotheses. The null hypothesis is H0: The defendant is innocent The alternative hypothesis or research hypothesis is H1: The defendant is guilty The jury does not know which hypothesis is true. They must make a decision on the basis of evidence presented. In the language of statistics convicting the defendant is called rejecting the null hypothesis in favor of the alternative hypothesis. That is, the jury is saying that there is enough evidence to conclude that the defendant is guilty (i.e., there is enough evidence to support the alternative hypothesis). We say, “We reject the null.” If the jury acquits it is stating that there is not enough evidence to support the alternative hypothesis. Notice that the jury is not saying that the defendant is innocent, only that there is not enough evidence to support the alternative hypothesis. That is why we never say that we accept the null hypothesis…we say, “We fail to reject the null.” (Although non-stats people often do this wrong!) Specific Example: Crazy guy...but good at explanations! 
Another Specific Example:
A department store manager determines that a new billing system will be cost-effective only if the mean monthly account is more than $170. What null and alternative hypotheses can we write for this situation?
The system will be cost-effective if the mean account balance for all customers is greater than $170. We express this belief as our research hypothesis:
H1: μ > 170 (this is what we want to determine)
Thus, our null hypothesis becomes:
H0: μ ≤ 170 (we assume this is true until proven otherwise)

Interpretation
P-values answer the question: What is the probability of the observed test statistic … when H0 is true?
Thus, smaller and smaller p-values provide stronger and stronger evidence against H0.
Small p-value → strong evidence for H1

Interpreting the p-value…
The smaller the p-value, the more statistical evidence exists to support the alternative hypothesis.
• If the p-value is less than 1%, there is overwhelming evidence that supports the alternative hypothesis.
• If the p-value is between 1% and 5%, there is strong evidence that supports the alternative hypothesis.
• If the p-value is between 5% and 10%, there is weak evidence that supports the alternative hypothesis.
• If the p-value exceeds 10%, there is no evidence that supports the alternative hypothesis.
We observe a p-value of .0069; hence there is overwhelming evidence to support H1: μ > 170.

Interpreting the p-value…
[Figure: p-value scale from 0 to .10 with the observed p-value marked: below .01, overwhelming evidence (highly significant); .01 to .05, strong evidence (significant); .05 to .10, weak evidence (not significant); above .10, no evidence (not significant).]

Conclusions of a Test of Hypothesis…
If we reject the null hypothesis, we conclude that there is enough evidence to infer that the alternative hypothesis is true.
If we fail to reject the null hypothesis, we conclude that there is not enough statistical evidence to infer that the alternative hypothesis is true. This does not mean that we have proven that the null hypothesis is true!
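The slides do not say which test produced the p-value in the billing example, so the following is only a sketch under an assumption: a right-tailed one-sample z-test with a known population SD, covering steps C (test statistic) and D (p-value) for H0: μ ≤ 170 vs H1: μ > 170. The function name and all inputs are hypothetical and are not the store's actual data.

```python
import math

def normal_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_sided_z_test(sample_mean, mu0, sigma, n):
    """Right-tailed z-test of H0: mu <= mu0 vs H1: mu > mu0 (sigma assumed known)."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error  # C. test statistic
    p_value = 1.0 - normal_cdf(z)             # D. P(Z >= z) if mu were exactly mu0
    return z, p_value

# Hypothetical inputs for illustration only (not the store's actual data).
z, p = one_sided_z_test(sample_mean=175, mu0=170, sigma=60, n=100)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

With these made-up inputs the p-value is roughly .20, which the scale above would call "no evidence"; a sample mean further above 170 (or a larger n) would push z up and the p-value down.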
Prior to testing, you decide on a level of significance (α)… Your computed p-value, compared with that level, indicates whether you should reject the null or fail to reject the null.
Let's examine some p-values and make decisions:
If p = .45, we would __________________.
If p = .20, we would __________________.
If p = .09, we would __________________.
If p = .01, we would __________________.
If p = .009, we would __________________.

In summary…
*Sampling is critically important to your study.
*Null and alternative hypotheses are the foundation of research investigations.
*Interpreting the p-value tells you whether you have "statistically significant" evidence to support your claim.
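The evidence scale and the reject/fail-to-reject rule can be written as two small functions. This is a hedged sketch: the function names are hypothetical, α = .05 is only an assumed default (the slides leave the level to the researcher), and treating a p-value exactly equal to a boundary as falling in the weaker band is a choice the slides do not specify.

```python
def evidence_label(p):
    # Evidence scale from the slides, with boundaries at .01, .05, and .10.
    if p < 0.01:
        return "overwhelming evidence (highly significant)"
    if p < 0.05:
        return "strong evidence (significant)"
    if p < 0.10:
        return "weak evidence (not significant)"
    return "no evidence (not significant)"

def decide(p, alpha=0.05):
    # Compare the p-value with the significance level chosen before testing.
    return "reject the null" if p < alpha else "fail to reject the null"
```

For the billing example, `decide(0.0069)` returns "reject the null" and `evidence_label(0.0069)` returns the "overwhelming evidence" band, matching the conclusion in the text.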

Explanation & Answer

almost through..

Please answer these questions
Final Thoughts:
1.

The goal of this course was to give you an overview of
educational research methods and statistical tests.
Please discuss the degree to which you feel that this
goal was accomplished.
Research is what drives any discipline to unprecedented
heights, be it research in science, the arts, or
elsewhere. After this course, I...
