descriptive statistics, statistics homework help

Content Type: User Generated
User: xuhag87
Subject: Mathematics
Description: Instructions attached

Unformatted Attachment Preview

WEEK 1, HW1 (PART 1): DESCRIPTIVE STATISTICS
We start with DESCRIPTIVE STATISTICS, where we simply want to “see” what our data set looks like. Later in the course we move on to INFERENTIAL STATISTICS, where we try to learn what our sample data can tell us about the actual, entire population from which our sample was taken.
INTRODUCTION (please read carefully and post questions if anything is not clear):
There are 1,001 expressions that relate to statistics in our lives. My favorites are: “Life is a crap shoot”, “Pay your money, and take your chances”, and “What could possibly go wrong?” (The last is the “mantra” of the Darwin Awards.) Of course there are also phrases that show how we ignore data and the statistical analysis of it. This is DENIAL, which we have all likely practiced (as many did in this last election), at least in our own minds: “My mind is made up, don’t confuse me with the facts!!”, “I really want that car; I don’t care about its safety rating or gas mileage!”, “I love fried chicken and pork BBQ; I don’t care about the grease and salt!”, “I don’t need a flu shot!”, “So I’m overweight, smoke, and drink Coke™; who cares?!”, “I only buy ‘organic’ produce, milk and eggs; it’s worth the much higher cost.” What statistical denials have YOU made, or are you making now?
Research suggests that our brain’s pre-frontal cortex does not mature or “kick in” until our early twenties. This cortex is where experiences are tied together and we start to see the possible consequences of our actions. Up to then, we are “immortal”, as in the “Born to Be Free” song. Of course some life events (violence) speed up this process, which you can imagine, and not always in a good way. Then again, some of us never “mature.” Moving on . . .
Most of our life decisions are (or should be) based on statistics: what is the safest car to buy, what picks should I make for my fantasy team, what foods are healthiest, what medicine can best relieve my headache, can I afford this house, what degree offers the highest job/salary potential, which lottery ticket should I buy, which political candidate will best help me (preferably, which will be best for our country), etc. We base many of these decisions on the ads we’ve seen or read. Those ads cite studies conducted on their products or services. Those studies are statistically based (though NOT necessarily on sound statistics). If you watch any TV show you have seen countless ads for drugs that spend more time listing possible hazards than likely benefits. Wonder why? This is a CYA deal. In the testing or actual use of the product, some persons have developed those conditions. We hope they are rare occurrences, but we aren’t given that information (pay your money, take your chances).
In making all these corporate and personal decisions, they and we need DATA. Keep in mind that the goal is to predict what an entire POPULATION (e.g., an age group) will do based on SAMPLES taken from that population. STATISTICS ALLOWS US TO MAKE PREDICTIONS ABOUT A POPULATION BASED ON SAMPLES FROM THAT POPULATION. It gives us the odds (the probability) of success (or failure).
Now, let’s assume you ARE a mature “critical thinker” who seeks out hard data and valid statistical analyses (good luck with that). It’s out there in “peer-reviewed” studies and sound science research, if you look. BUT, it’s much easier to find the more readily available, typically very biased data: the “alternative facts.” BUT, let’s be clear: there are NO alternative facts. Facts are facts. There may be different interpretations of why something is a fact, but not a different fact. So, how do we handle these different interpretations?
We BALANCE THIS BIAS, meaning we look at the extreme views and their supporting data and then form OUR own opinion from these extremes. This can work. This is CRITICAL THINKING and is what education is all about. Unfortunately, as this introduction noted up front, far too many of us simply pick the data sources that match, for whatever reason, our personal biases, and that polarization certainly stops any compromise, meaning progress, that would ultimately benefit us all. Moving on . . . (again):
Let’s talk “DATA”. What is it, how do we collect it, and, most importantly, what makes it good, meaning valid? There are two types of data: qualitative and quantitative.
• QUALITATIVE: Color of cars, taste of beer (hoppy, fruity, molasses), rankings like “unsatisfied, satisfied, very satisfied”, numbers like 1-4, $$$, etc.
• QUANTITATIVE: Heights, weights, income, home prices, IQ, test scores; almost anything that can be measured mathematically (except numerical rankings). There are two types of quantitative data, discrete and continuous:
o DISCRETE: These are WHOLE numbers like number of children, where an average, if not a whole number, might sound ridiculous (e.g., the average U.S. family has 2.6 children).
o CONTINUOUS: Numbers where fractions are realistic, like heights, weights, age. Money can go either way, but let’s go with continuous. Rounding off can create some error.
There are FOUR SCALES or LEVELS of MEASUREMENT used for the above data types, and this is important to remember (likely Final Exam question; Illowsky p-26).
• NOMINAL (scale or level): Qualitative data are measured on this scale. The unique characteristic is that no arithmetic calculation (an average, for example) works on NOMINAL data; the result would be invalid or nonsense. Even putting choices like car colors in a particular order makes no real sense: red, yellow, white, blue or blue, white, red, yellow. So what now?
• ORDINAL (scale or level): Qualitative data can also be measured on this scale.
Here we have our RANKINGS using choices like poor/fair/good or $$$ or even numbers 1-4. Data measured on this scale CAN be put into a meaningful order, in that $$$$ is logically higher than $$. HOWEVER, ORDINAL scale data, as with NOMINAL scale data, can NOT be analyzed arithmetically (no meaningful differences or averages). How much better is a restaurant with 3 “smiley faces” than one with 2 smiley faces? We can’t calculate this and, more importantly, we don’t know what each ranking was based on. People may like the style of food, its presentation, its quantity, or they may not like dirty silverware or unclean restrooms. Who knows?? You may even be asked to rank each of these qualitative areas, but they are still QUALITATIVE, hence these data cannot be analyzed that way. Also, be careful with numerical rankings like “1 - 5”. These are no more appropriate for such calculations than smiley faces.
• INTERVAL (scale or level): We have meaningful numbers. Some QUANTITATIVE data are measured on this scale, BUT this scale has NO TRUE ZERO POINT. Temperatures are a good example. Differences in data DO make sense, BUT ratios do not. You can calculate average summer/winter temperatures for an area, let’s say 80 °F / 20 °F, BUT we can NOT say that it is 4 times hotter in summer than in winter. Why? Because 0 °F and 0 °C are NOT absolute zero. Temperatures measured on the KELVIN scale DO go down to absolute zero (when all molecular motion stops). On this scale we CAN say that 100 K is twice as hot as 50 K, where “hot” refers to the amount of molecular motion. This motion can be “seen” when you boil water and the molecules of water actually have enough energy to “jump out” of the liquid phase and become steam (gas phase). (The other state of matter is solid, like ice.) You can even “freeze” (solidify) the gas CO2 as dry ice.
• RATIO (scale or level): Now we’re talking!! We have a meaningful zero and we can do ALL the statistical calculations that might apply to this data set.
An example would be class grades based on points earned out of 100. This works for most courses with multiple-choice tests, but what about essay questions? Can you statistically compare the grades in a course in which grades are based totally on multiple-choice exams to one (in the same subject) in which the grade is based totally on essay-question exams? NO! So be careful that you are ALWAYS comparing apples to apples. This is where knowing what the data are based on is the FIRST critical consideration in evaluating any statistical analysis.
Data collection or “SAMPLING” is the next topic. What are we sampling? These are samples of a specific characteristic of an entire POPULATION, and it is RARELY possible to sample an entire population. But if we did, and calculated the mean of all those data, that mean would be considered a PARAMETER of the population. HOWEVER, the mean of a sample is referred to as a STATISTIC. REMEMBER THIS (Final Exam likely).
There are FIVE data collection or sampling protocols we will cover. The INTENT of all of them is to get a REPRESENTATIVE SAMPLE of the population (methods: Illowsky p-18):
• SIMPLE RANDOM sampling is the first. “Random” means that EVERY piece of data has an EQUAL chance (probability) of being collected. You have twenty grandchildren that you like equally well, but you can only afford to send $10 holiday presents to five (the rest get $5 each). Drawing five names from a hat gives every grandchild the same chance at the $10 present.
• STRATIFIED: Divide the population into logical groups (or strata, which means layers; a little confusing). You want to determine the average age of students in each of the ten UMUC departments (let’s assume there are only ten). Then take a simple random sample of students from each department.
• CLUSTER: This sampling method starts like Stratified in that all groups in a population are identified. BUT, then we use simple random sampling to decide on only a portion (cluster) of those groups. Next, we use simple random sampling to collect our data from each of the groups in that cluster.
• SYSTEMATIC: A little tricky. Remember that we want EVERY person or item in the population to have an equal chance of being selected. So, this seems to require that we know the size of our population. We also need to decide how many samples we can afford to take. Divide the population size by the sample size and save that number (call it k). We then pick our starting point from a random numbers table or generator and proceed to collect the desired data (information) from every k-th person or item after that (e.g., every k-th item on a conveyor belt for quality assurance).
• CONVENIENCE: It is what it is. Poll the classmates, poll the neighbors, count the cars at a nearby intersection. Some of the results from this sampling methodology will produce valid statistical results, but MANY won’t. In some cases this is deliberate BIAS and assumes that readers will NOT question or look into how the data were collected.
One last issue with DATA SAMPLING is whether sampling is done WITH REPLACEMENT OR NOT. Taking a large number of samples from a phone book might require going through the book multiple times. With simple random sampling, you could possibly hit the same name twice (or even more). Does this matter? MAYBE. If you ignore a repeat (sampling without replacement), you actually improve the odds (probability) for the other names or items. FOR EXAMPLE, if 5 winners are pulled from 20 names in a hat and yours is one of the 20 names, your odds of winning on the first pick are 1/20 (= 0.05 or 5%). If you did NOT win on the first pick and the winner’s name is NOT put back in the hat, your odds improve to 1/19 (about 0.053 or 5.3%) and continue to get better with each losing selection. BUT, if the winners’ names are put back in the hat, your odds stay at 5% with each pull, as do the odds for the prior winners to win again. For samples from a LARGE population, replacement is not that critical.
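The names-in-a-hat arithmetic above can be checked with a short Python sketch. This is only an illustration under the stated setup (20 names, 5 pulls, you hold one name); the function names are made up for this example and are not part of the course material.

```python
from fractions import Fraction

def chance_per_pull(names, pulls, replace):
    """Chance that YOUR name is drawn on each pull, given you have not won yet."""
    chances = []
    for pull in range(pulls):
        # Without replacement the hat shrinks by one name after every pull.
        remaining = names if replace else names - pull
        chances.append(Fraction(1, remaining))
    return chances

def chance_of_winning_at_least_once(names, pulls, replace):
    """1 minus the chance of losing every single pull."""
    lose_all = Fraction(1)
    for chance in chance_per_pull(names, pulls, replace):
        lose_all *= 1 - chance
    return 1 - lose_all

no_replacement = chance_per_pull(20, 5, replace=False)
with_replacement = chance_per_pull(20, 5, replace=True)

print("without replacement:", [str(c) for c in no_replacement])
print("with replacement:   ", [str(c) for c in with_replacement])
print("win at least once, no replacement:", chance_of_winning_at_least_once(20, 5, False))
print("win at least once, replacement:   ", float(chance_of_winning_at_least_once(20, 5, True)))
```

Without replacement the per-pull chance climbs from 1/20 to 1/16 across the five pulls; with replacement it stays at 1/20 every time.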
FINALLY, HERE ARE THIS WEEK’S PART 1 HOMEWORK PROBLEMS:
HW1 (part 1) - HOMEWORK PROBLEMS (SUBMIT TO THE ASSIGNMENT FOLDER BY 11:59 PM EST SUNDAY)
#1. You are the quality assurance person working an assembly line at a TV manufacturing plant. They produce 1000 TVs a day. IF THE TVs ARE ALL THE SAME MODEL, WHAT PERCENTAGE (think about the cost of testing) WOULD YOU TEST (WHY?) AND HOW WOULD YOU SELECT THEM? (Don’t just say “randomly”. How do you do it randomly?) If the inspector were lazy, how would they likely do it as a “convenience” sample? Lastly, if the 1000 TVs were 4 different models, how would you sample then, and what type of sampling would this be?
#2. You are going out to eat. There are three shopping malls nearby and each has up to five restaurants (these restaurants are all different styles: e.g., Italian, Chinese, French). Here are their customer SATISFACTION ratings on a scale of up to five +’s (highest satisfaction). WHAT ASSUMPTIONS ARE YOU MAKING REGARDING WHAT “SATISFACTION” MEANS?
Mall 1 Mall 2 Mall 3
(a) ++++ (a) +++ (a) +++++
(b) ++++ (b) ++ (b) ++++
(c) ++++ (c) +++ (c) +++
(d) ++ (d) +
(e) +++
#3. (a) What type of data and scale are involved here? (b) Which Mall Restaurant did you pick? WHY? (c) What issues could you encounter with your pick once you got there?
#4. What is a CONVENIENCE SAMPLE? Give an example of one and explain when it might actually be useful in giving a picture of the entire population, and what could be misleading about it.
#5. You can find 20 RANDOM NUMBERS in a Table or you can generate them with software like Excel. The Excel functions are “RAND” and “RANDBETWEEN”. With “RANDBETWEEN” you enter the lowest and highest values you want your random numbers to fall between, then copy the formula into as many cells as you want numbers. For example, you may want twenty 2-digit numbers that fall between 00 and 99 (like “34”).
TWO CONSIDERATIONS:
(1) You must systematically use the random numbers in the Table or the ones generated. You don’t “skip around” because that could un-randomize the values.
(2) Let’s say you want 1000 names from a 50-page phone book. You reach the end of the book with your systematic selection and only have 800 names. What do you do? Simple: start over in the book (loop). For example, if you were selecting names from every 15th page and you reached the end of the book only 8 pages after your last selected page, then carry the count over and start again on page 7 of the same book (8 + 7 = 15).
One source of random-looking digits is the number “π” (the Greek letter pi) used in geometry. Its numerical value is 3.141592653589793238462643383. . . ; ignore the decimal point between the 3 and the 1 and you have THIS string of digits: 3141592653589793238462643383. USE THIS STRING (and loop it) to generate twenty 3-digit (e.g., 314) random numbers AND EXPLAIN how you did it. FYI: the number π, as in the AREA of a circle = πr², can serve as a source of random digits in that its digits never settle into a repeating pattern: π = 3.141592653589793238462643383 . . . (If you want a million decimal places, check out: www.piday.org/million )
WEEK 1, HOMEWORK 1 - PART 2: LANE C 1-2; ILLOWSKY C 2 SECTIONS 2.1-2.4
HW1 - Part 1 dealt with what data are and how to collect valid random data. We also talked about how data “samples” are intended to give us an idea of an entire population. Of course sample size affects everything. We now continue with DESCRIPTIVE STATISTICS concepts, such as DISPLAYING our data in the hopes of seeing some pattern as distinctly as possible. So, who cares if there is a pattern? Well, if we see a bell curve shape we likely have a NORMAL distribution and all our statistical analyses will work (give us valid results). BUT, always remember that statistics proves NOTHING by itself. The calculated numbers simply give us support for our hypothesis (our ideas). Remember too that, as with stock prices, past data do NOT predict future performance.
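Returning to the π digit string in problem #5 of Part 1: the “use the string in order and loop it” idea can be sketched in Python as follows. This is one possible reading of the instructions (take the digits three at a time, never skipping, and wrap to the start when the string runs out), not an official solution; the function name is invented for the example.

```python
from itertools import cycle, islice

# The 28 digits given in the handout, with the decimal point ignored.
PI_DIGITS = "3141592653589793238462643383"

def looped_numbers(digits, count, width):
    """Read `count` numbers of `width` digits each, in order, looping the string."""
    stream = cycle(digits)  # endless stream that restarts at the first digit
    # Each islice call consumes the next `width` digits from the same stream,
    # so no digit is skipped or reused out of order.
    return ["".join(islice(stream, width)) for _ in range(count)]

numbers = looped_numbers(PI_DIGITS, count=20, width=3)
print(numbers)
```

The numbers are kept as strings so that a value such as “031” would keep its leading zero. Twenty 3-digit numbers need 60 digits, so the 28-digit string is looped twice; the tenth number (331) is the first one that straddles the wrap-around.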
Of course all statistics mean NOTHING IF the data are not true random samples that reflect the entire population of concern.
OPEN THE EXCEL TABLE PROVIDED WITH THIS HOMEWORK AND ANSWER THE FOLLOWING FIVE (5) QUESTIONS
#6. Draw a vertical (or horizontal) BAR CHART that compares each month’s total income AND total expenses (TWO BARS PER MONTH FOR THE 12 MONTHS).
#7. Draw a PIE CHART showing the total ANNUAL cost split among these three EXPENSE categories: gas, food, electricity (a hand drawing is fine). EXPLAIN HOW EXCEL OR YOU DETERMINED THE ANGLES OF THE 3 WEDGES IN THE PIE CHART (DUST OFF GEOMETRY AND TRIG).
#8. Draw a STEM & LEAF diagram of the MONTHLY INCOME numbers. Explain how you would handle this if the numbers had more significant digits (e.g., $3189 instead of $3200, etc.). (This is a possible shortcoming of displaying data in a stem & leaf diagram.)
#9. HAND DRAW a DOT PLOT for any one data column. (Few seem to get this one right.) Look at the LANE text example. Try putting the $-amounts along the bottom axis and a dot above each amount for every time that value occurs. You can do this sideways too (see Lane).
#10. SUMMARIZE (in your own words AFTER READING THE TEXTS) what each of the data displays above (i.e., pie chart, bar graph, stem & leaf, and dot plot) is BEST suited for and what, if any, are its LIMITATIONS. Which display do you feel gave you the best “picture” of the shape of the data distribution and/or the most information about it?
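For the wedge angles asked about in #7, the usual approach is that each wedge gets the same share of the 360 degrees as its category has of the combined total. Here is a minimal Python sketch, assuming the annual GAS, FOOD, and UTIL totals from the WK1-HW1 table and assuming that UTIL is the “electricity” category the question refers to:

```python
# Annual totals from the WK1-HW1 table (UTIL is assumed to mean electricity).
annual_totals = {"GAS": 2840, "FOOD": 5675, "UTIL": 6750}

def wedge_angles(totals):
    """Angle of each pie wedge in degrees: (category / grand total) * 360."""
    grand_total = sum(totals.values())
    return {name: 360 * amount / grand_total for name, amount in totals.items()}

angles = wedge_angles(annual_totals)
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

The three angles always add up to 360 degrees, which is a quick check when drawing the chart by hand with a protractor.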
WK1-HW1     JAN   FEB   MAR   APR   MAY   JUN   JUL   AUG   SEP   OCT   NOV   DEC
INCOME     3500  3300  3000  3500  3600  3800  3800  4000  4100  4200  4400  4500
GAS         250   225   260   200   200   175   300   250   260   200   250   270
FOOD        500   600   550   450   400   450   500   500   375   350   400   600
UTIL        700   650   600   550   400   500   600   700   500   450   500   600
RENT       1400  1400  1400  1400  1400  1400  1400  1400  1400  1400  1400  1400
C-CARD     1200   900   800   600   450   450   900  1000   850  1000   800  2000
EXP TOT    4050  3775  3610  3200  2850  2975  3700  3850  3385  3400  3350  4870
WK1-HW1  TOTALS     MEAN  MEDIAN  MODE  VARIANCE  STD DEV
INCOME    45700  3808.33    3800  3500    204470      452
GAS        2840      237     250   250      1338       37
FOOD       5675      473     475   500      6984       84
UTIL       6750      563     575   600      9148       96
RENT      16800     1400    1400  1400         0        0
C-CARD    10950      913     875   900    166875      409
EXP TOT   43015     3585    3505  none    288875      537
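The summary columns of the table can be reproduced outside Excel. The sketch below uses Python’s standard `statistics` module on the INCOME row only; it assumes the VARIANCE and STD DEV columns are the sample versions (dividing by n - 1), which is what the INCOME figures match.

```python
import statistics

# Monthly INCOME values, JAN through DEC, from the WK1-HW1 table.
income = [3500, 3300, 3000, 3500, 3600, 3800, 3800, 4000, 4100, 4200, 4400, 4500]

summary = {
    "total": sum(income),
    "mean": statistics.mean(income),
    "median": statistics.median(income),
    "modes": statistics.multimode(income),    # every value tied for most frequent
    "variance": statistics.variance(income),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(income),      # square root of the sample variance
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that 3500 and 3800 each occur twice, so the income data have two modes; the table lists only 3500.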

Tags: interval nominal ordinal descriptive statistics


Explanation & Answer

Hi there! Attached are two Word documents containing the solutions to the questions. Also attached are the Excel documents that contain the work that was required in Excel :) Thanks again! Selenica
