MAT 154 Grand Canyon University Modeling a Problem Statistics Project


Description

Consider one way of modeling a problem. Say you have a device, and in any given year there is about a 1 in 6 chance that the device will fail. A question is, “On average, how long will it be before such a device fails?” This sort of problem is what will be modeled and analyzed here.

Unformatted Attachment Preview

Notes for Statistics Project

Contents
Statistics on data
Statistics on a random variable
    Expected value E(X) (a.k.a. “mean”)
        Linearity of Expectation
        Total Expectation
    Variance and Standard Deviation
Two important sums for your project

Statistics on data.

There are many statistics associated with data; two of the most important are the mean and the standard deviation. Others include the mode, median, variance, kurtosis, and so on. The mean, or average, is one notion of the center of the data. The standard deviation is a measure of spread. Given data $x = x_1, \dots, x_N$, these statistics are defined by

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \operatorname{var}(x) = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2, \qquad \sigma(x) = \sqrt{\operatorname{var}(x)}.$$

The quantity $|x_i - \bar{x}|$ is called the deviation of $x_i$ from the mean, and so the variance is the mean of the squared deviations. In some sense the mean deviation, $\frac{1}{N}\sum_{i=1}^{N} |x_i - \bar{x}|$, would be another, perhaps more obvious, measure of the average distance of a data point from the mean, but for mathematical reasons $\sigma(x)$ plays this role.

Given data as above, one can find the distribution of the data, namely the function

$$p(x_i) = \frac{|\{j \mid x_j = x_i\}|}{N}.$$

Consider the random variable $X$ = select one item at random from among $x_1, \dots, x_N$; then $P(X = x_i) = p(x_i)$. Thus, from data we get a derived probability distribution of the random variable $X$.

Example: Here are the results of throwing two dice and taking the sum, repeated 100 times.

12  10   9   9   8   7   7   6   5   4
11  10   9   8   8   7   6   6   5   4
11  10   9   8   8   7   6   6   5   4
11  10   9   8   8   7   6   6   5   4
11  10   9   8   8   7   6   6   5   3
11  10   9   8   8   7   6   6   5   3
11  10   9   8   8   7   6   6   4   3
11  10   9   8   7   7   6   5   4   3
11  10   9   8   7   7   6   5   4   3
11   9   9   8   7   7   6   5   4   2

So $x_1 = 12, x_2 = 11, \dots, x_{100} = 2$ (reading down the columns); here $p(12) = \frac{1}{100} = 0.01$, $p(11) = \frac{9}{100} = 0.09$, etc.

$x_i$    $P(X = x_i)$    $P(X = x_i)$
 2       1/100           0.01
 3       5/100           0.05
 4       8/100           0.08
 5       9/100           0.09
 6       16/100          0.16
 7       14/100          0.14
 8       16/100          0.16
 9       12/100          0.12
10       9/100           0.09
11       9/100           0.09
12       1/100           0.01

A simple calculation in Excel yields $\bar{x} = 7.25$ and $\sigma(x) = 2.317$.

Statistics on a random variable.

Given a random variable $X$ with distinct possible outcomes $x_1, \dots, x_M$, there is an associated distribution $p(x) = P(X = x)$.

Example: A typical example would be throwing two identical fair dice and taking the sum; this would be our random variable $X$. Here we have $P(X = 7) = \frac{1}{6}$, because a 7 can be achieved as (1,6), (6,1), (2,5), (5,2), (3,4), or (4,3), and there are 36 possible outcomes, so there is a 6/36 = 1/6 chance that a 7 is the result. Similarly, $P(X = 2) = \frac{1}{36}$.

If we were to actually collect data, say roll these dice 100 times and record the results, we would expect “about” 1/6th of the rolls to be 7 and about 1/36th of the rolls to be 2. Thus we would expect the distribution derived from the data to be close to the distribution based on the assumption of throwing two fair dice.

Now we turn to the problem of computing the expected value, variance, and standard deviation of the random variable directly from the distribution, without resorting to collecting data.
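As a check on the 100-throw data example earlier in this section, the sketch below recomputes the derived distribution, mean, and standard deviation in Python. The tallies in `counts` were obtained by counting the sums listed in the example; the variable names are illustrative, and the 1/N form of the variance from the definitions in these notes is assumed.

```python
from math import sqrt

# Tally of the 100 recorded sums (value -> number of occurrences),
# counted from the data listed in the example.
counts = {2: 1, 3: 5, 4: 8, 5: 9, 6: 16, 7: 14,
          8: 16, 9: 12, 10: 9, 11: 9, 12: 1}

n = sum(counts.values())                    # number of throws
p = {x: c / n for x, c in counts.items()}   # derived distribution p(x)

# Mean and variance computed from the tallies, using the 1/N definitions.
mean = sum(x * c for x, c in counts.items()) / n
var = sum(c * (x - mean) ** 2 for x, c in counts.items()) / n
sd = sqrt(var)

print(n, mean, round(sd, 3))
```

Working from tallies rather than the raw list gives the same statistics, because each distinct value contributes once per occurrence.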
Throughout, assume $X$ has the distinct possible outcomes $x_1, \dots, x_M$, all of which are numeric values, and that a distribution function $p(x_i) = P(X = x_i)$ is known. It is possible that $M = \infty$, and this is indeed the case for your project, where the $x_i$ represent a time to failure of some device.

Expected value E(X) (a.k.a. “mean”)

The expected value of $X$ can be calculated as:

$$E(X) = \sum_{i=1}^{M} x_i \cdot P(X = x_i)$$

If you were given data $y_1, \dots, y_N$ and derived the distribution from the data as we did in the first part, and if $X$ was the random variable “select a data point at random”, then

$$\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i = \sum_{i=1}^{M} x_i \cdot \frac{|\{j \mid y_j = x_i\}|}{N} = \sum_{i=1}^{M} x_i \cdot P(X = x_i) = E(X)$$

Here one must be a little careful, as the $y_i$’s might repeat, so we take the unique values $x_1, \dots, x_M$ and then group together all those $y_j$ such that $y_j = x_i$. In our dice example, for instance, there will be many $y_j$’s with the value 7; these correspond to the single $x_i$ value 7.

Example: Again, let $X$ be the outcome of throwing two fair 6-sided dice and adding. The full distribution is as follows:

$x_i$    $P(X = x_i)$    $P(X = x_i)$
 2       1/36            0.028
 3       2/36            0.056
 4       3/36            0.083
 5       4/36            0.111
 6       5/36            0.139
 7       6/36            0.167
 8       5/36            0.139
 9       4/36            0.111
10       3/36            0.083
11       2/36            0.056
12       1/36            0.028

$$E(X) = \sum_{i=2}^{12} i \cdot P(X = i) = 2 \cdot \frac{1}{36} + 3 \cdot \frac{2}{36} + \cdots + 11 \cdot \frac{2}{36} + 12 \cdot \frac{1}{36}$$

$$= \frac{1}{36} \cdot \left(2 \cdot 1 + 3 \cdot 2 + 4 \cdot 3 + 5 \cdot 4 + 6 \cdot 5 + 7 \cdot 6 + 8 \cdot 5 + 9 \cdot 4 + 10 \cdot 3 + 11 \cdot 2 + 12 \cdot 1\right) = \frac{1}{36} \cdot 252 = 7$$

Note that the distribution and the mean differ from what was determined by the 100 throws above. This is to be expected, since every set of 100 throws will turn out slightly differently. If we throw 1000 pairs of dice, then the distribution and mean of the data would be closer to the distribution and expected value of the random variable. This is one reason for choosing large sample sizes or many repetitions when doing experiments.
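The two-dice expected value can also be checked by enumerating all 36 equally likely outcomes. This is a minimal Python sketch using exact fractions; the names `dist` and `expected` are illustrative and not part of the notes.

```python
from fractions import Fraction
from itertools import product

# Build the exact distribution of the sum of two fair six-sided dice
# by giving each of the 36 ordered outcomes probability 1/36.
dist = {}
for a, b in product(range(1, 7), repeat=2):
    dist[a + b] = dist.get(a + b, 0) + Fraction(1, 36)

# E(X) = sum of x * P(X = x) over all distinct outcomes x.
expected = sum(x * prob for x, prob in dist.items())

print(dist[7], expected)
```

Using `Fraction` keeps every probability exact, so the result can be compared directly with the hand calculation of 252/36.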
Linearity of Expectation

Expectation has an important property that follows directly from its definition:

$$E(cX + dY) = cE(X) + dE(Y)$$

where $X$ and $Y$ are any random variables and $c$ and $d$ are numeric constants.

Example: Let $Y$ be the experiment of throwing 2 dice and adding, repeated 10 times, and then adding all the results. Thus $Y = X_1 + X_2 + \cdots + X_{10}$, where each $X_k$ is the result of throwing a pair of dice once and adding, so each $X_k$ has the same distribution as $X$. Then $E(Y) = E(X_1) + \cdots + E(X_{10}) = 10 \cdot 7 = 70$. This is much simpler than calculating a distribution for $Y$ and using the definition.

Total Expectation

Suppose $A_1, \dots, A_k$ are a disjoint collection of events such that exactly one must occur. Then $E(X|A_i)$ is the expected value of $X$ given that event $A_i$ has occurred. The law of total expectation states:

$$E(X) = E(X|A_1) \cdot P(A_1) + \cdots + E(X|A_k) \cdot P(A_k) = \sum_{i=1}^{k} E(X|A_i) \cdot P(A_i)$$

Example: Let $A_i$ be the event that an $i$ is rolled on the second die, so $P(A_i) = \frac{1}{6}$. Clearly,

$$E(X|A_i) = \big((1 + i) + (2 + i) + (3 + i) + (4 + i) + (5 + i) + (6 + i)\big) \cdot \frac{1}{6} = \frac{21}{6} + i$$

$$E(X) = \sum_{i=1}^{6} E(X|A_i) \cdot \frac{1}{6} = \sum_{i=1}^{6} \left(\frac{21}{6} + i\right) \cdot \frac{1}{6} = \frac{1}{6} \cdot (21 + 21) = 7$$

Variance and Standard Deviation

Now the variance of $X$ is the expected value of the squared deviation.
$$\operatorname{var}(X) = E\big((X - E(X))^2\big)$$

Using a little algebra and linearity of expectation, this can be simplified:

$$\operatorname{var}(X) = E\big(X^2 - 2X \cdot E(X) + E(X)^2\big) = E(X^2) - 2E(X)E(X) + E(X)^2 = E(X^2) - E(X)^2$$

The final form is often used in computation:

$$\operatorname{var}(X) = E(X^2) - E(X)^2$$

Standard deviation is simply the square root of the variance, so we have:

$$\sigma(X) = \big(\operatorname{var}(X)\big)^{1/2}$$

Example: Continuing with our example of throwing two dice and summing, we have

$x_i^2$    $P(X^2 = x_i^2)$    $P(X^2 = x_i^2)$
  4        1/36                0.028
  9        2/36                0.056
 16        3/36                0.083
 25        4/36                0.111
 36        5/36                0.139
 49        6/36                0.167
 64        5/36                0.139
 81        4/36                0.111
100        3/36                0.083
121        2/36                0.056
144        1/36                0.028

So

$$E(X^2) = 4 \cdot \frac{1}{36} + 9 \cdot \frac{2}{36} + \cdots + 144 \cdot \frac{1}{36} = \frac{1974}{36}$$

Thus

$$\operatorname{var}(X) = E(X^2) - E(X)^2 = \frac{1974}{36} - 49 = \frac{210}{36} = \frac{105}{18} = \frac{35}{6}$$

and

$$\sigma(X) = \sqrt{\frac{35}{6}} = 2.415\ldots$$

Two important sums for your project

For your project a hint is provided that allows you to avoid using this section of the notes. You will need to deal with a random variable that has infinitely many possible values, namely 1, 2, 3, … (all positive integers). For the computations you will need the following formulas, where it is assumed that $|x| < 1$:

$$1 + x + x^2 + \cdots = \sum_{n=0}^{\infty} x^n = \frac{1}{1 - x}$$

$$1 + 2 \cdot x + 3 \cdot x^2 + \cdots = \sum_{n=1}^{\infty} n \cdot x^{n-1} = \left(\frac{1}{1 - x}\right)^2$$

The first of these is called a geometric series, and information on these is found in your text. Here you simply need these two formulas.

Statistics Project

Problem: Consider one way of modeling a problem. Say you have a device, and in any given year there is about a 1 in 6 chance that the device will fail. A question is, “On average, how long will it be before such a device fails?” This sort of problem is what will be modeled and analyzed here.

The problem above can be modeled as follows: Roll a normal fair six-sided die. Rolling a 1 will count as “fail”; anything else will count as “not-fail”.
The random variable we will use will be

X = # of rolls required until a 1 is rolled = “time until failure”

This is a random variable, so in each experiment (for each device) this will produce a value. For example, if we roll [dice images], then X = 5 (we count the final roll).

Part 1 (Empirical Analysis): Roll a die, or simulate using Excel, Python, or some other option, e.g., Random.org, 20 repetitions of obtaining a value for X; that is, roll until a 1 is rolled, record the result, and repeat 20 times. For example, if the rolls are [dice images], then record 4 since 4 rolls were required.

• Record your rolls and counts.
• Compute the mean and standard deviation of your values for X.
• Given the modeling problem we started with, interpret the mean and standard deviation you found in terms of how many years are expected before the device fails.

Excel might be used for this part, as it makes the calculations, recording of data, etc., very simple; however, it is not required.

Part 2 (Theoretical Analysis): Compute the expected value, variance, and standard deviation of the random variable X. See the additional notes for discussion on these computations. In brief:

• $P(X = i)$ = the probability of rolling something other than a 1 $(i - 1)$ times, then rolling a 1 on the $i$th throw. Make a table for the first 20 values:

$$\begin{array}{llcl}
P(X = 1) & P(X = 2) & \cdots & P(X = 20) \\
1 \cdot P(X = 1) & 2 \cdot P(X = 2) & \cdots & 20 \cdot P(X = 20) \\
1^2 \cdot P(X = 1) & 2^2 \cdot P(X = 2) & \cdots & 20^2 \cdot P(X = 20)
\end{array}$$

  Use this to compute $\sum_{i=1}^{20} i \cdot P(X = i)$ and $\sum_{i=1}^{20} i^2 \cdot P(X = i)$, and then use these to get a rough approximation of $E[X]$ and $\operatorname{var}[X] = E[X^2] - (E[X])^2$.
• (Expected Value) $E[X] = \sum_{i=1}^{\infty} i \cdot P(X = i)$
• (Variance) $\operatorname{var}[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2$
• (Standard Deviation) $\sigma[X] = \sqrt{\operatorname{var}[X]}$
• Repeat the last item of Part 1: Given the modeling problem we started with, interpret $E[X]$ and $\sigma[X]$ in terms of how many years are expected before the device fails.
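The 20-term approximation asked for in Part 2 can be tabulated with a short Python sketch. It assumes the form $P(X = i) = (5/6)^{i-1} \cdot (1/6)$ that follows from the first bullet of Part 2; the function name `pmf` and the variable names are hypothetical.

```python
P_FAIL = 1 / 6   # chance of rolling a 1 on any single throw

def pmf(i, p=P_FAIL):
    """P(X = i): (i - 1) non-failures followed by one failure."""
    return (1 - p) ** (i - 1) * p

# One row per i: (i, P(X = i), i * P(X = i), i^2 * P(X = i)).
rows = [(i, pmf(i), i * pmf(i), i**2 * pmf(i)) for i in range(1, 21)]

for i, prob, first, second in rows:
    print(f"{i:2d}  {prob:.4f}  {first:.4f}  {second:.4f}")

approx_mean = sum(r[2] for r in rows)      # truncated sum for E[X]
approx_second = sum(r[3] for r in rows)    # truncated sum for E[X^2]
approx_var = approx_second - approx_mean**2
print(approx_mean, approx_var)
```

Both truncated sums leave out the positive terms beyond i = 20, so each falls below the corresponding infinite sum; this is why the notes call the result a rough approximation.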
Hint: For computing these without directly manipulating the infinite summations that appear in the definitions of $E[X]$ and $\operatorname{var}[X]$, let $A$ be the event that a 1 is rolled on the first throw and $A'$ be the complementary event, namely, that a 1 is not rolled on the first throw. It is clear that

$$E[X|A'] = E[1 + X] = 1 + E[X]$$

since what is rolled after the first roll is just like starting over. It is also clear that $E[X|A] = 1$. The Law of Total Expectation (see notes) gives:

$$E[X] = E[X|A] \cdot P(A) + E[X|A'] \cdot P(A') = P(A) + (1 + E[X]) \cdot (1 - P(A))$$

This makes it quite simple to find $E[X]$. A similar “trick” can be used to find $E[X^2]$; here you will use

$$E[X^2|A'] = E[(1 + X)^2] = E[1 + 2X + X^2] = 1 + 2E[X] + E[X^2].$$

Again, the Law of Total Expectation gives:

$$E[X^2] = E[X^2|A] \cdot P(A) + E[X^2|A'] \cdot P(A')$$

and from this it is simple to compute $E[X^2]$.

Extra Instructions: Every item indicated by a • is an item you must address. Part 1 may be done in Excel or by hand. Part 2 may be done by hand and then uploaded. The uploaded file must be a PDF. If you want to use Word, that is fine, but the mathematics must be formatted correctly.
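The two equations in the hint are linear in the unknowns $E[X]$ and $E[X^2]$, so each can be rearranged and solved directly. The Python sketch below does this with exact fractions, assuming $P(A) = 1/6$ and $E[X^2|A] = 1$ (since $X = 1$ when the first throw is a 1); the variable names are illustrative.

```python
from fractions import Fraction

p = Fraction(1, 6)   # P(A): a 1 is rolled on the first throw
q = 1 - p            # P(A'): a 1 is not rolled on the first throw

# E[X] = 1*p + (1 + E[X])*q   =>   E[X]*(1 - q) = p + q
e_x = (p + q) / (1 - q)

# E[X^2] = 1*p + (1 + 2*E[X] + E[X^2])*q
#   =>   E[X^2]*(1 - q) = p + q*(1 + 2*E[X])
e_x2 = (p + q * (1 + 2 * e_x)) / (1 - q)

var_x = e_x2 - e_x**2
sd_x = float(var_x) ** 0.5

print(e_x, e_x2, var_x, sd_x)
```

With these assumptions the sketch gives $E[X] = 6$, $E[X^2] = 66$, and $\operatorname{var}[X] = 30$, so $\sigma[X] = \sqrt{30} \approx 5.48$ years in the device model.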

Explanation & Answer

Hi Dude, Please check the attached files for detailed work, let me know if you have any questions, thank you. Best regards, James,

Part 1
Number of rolls until a 1 is obtained (X)
8
10
5
1
1
4
2
6
2
1
10
12
2
2
10
4
10
2
1
2

Mean of X
4.75

Standard Deviation of X
3.82340438
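The reported statistics can be reproduced from the 20 recorded values with Python's standard `statistics` module. This sketch also prints the population standard deviation, because the course notes define variance with a 1/N factor while the value reported here matches the sample (N − 1) form.

```python
from statistics import mean, stdev, pstdev

# The 20 recorded values of X from the table above.
rolls_until_one = [8, 10, 5, 1, 1, 4, 2, 6, 2, 1,
                   10, 12, 2, 2, 10, 4, 10, 2, 1, 2]

sample_mean = mean(rolls_until_one)
sample_sd = stdev(rolls_until_one)        # divides by N - 1
population_sd = pstdev(rolls_until_one)   # divides by N, as in the notes

print(sample_mean, sample_sd, population_sd)
```

Which form to report depends on the convention the course expects; with only 20 observations the two differ by about 0.1.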

On average, the lifetime of the device is 4.75 years. Given its standard deviation of 3.82 years, for any sample device the lifetime may vary, but not far away from 4.75 ± 3.82 years.
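The Part 1 experiment can also be simulated instead of rolled by hand. The following is a minimal Python sketch, not the method used for the data above; the function name `rolls_until_one` and the seed value are arbitrary choices made for illustration.

```python
import random
from statistics import mean, stdev

def rolls_until_one(rng):
    """Roll a fair die until a 1 appears; return the number of rolls,
    counting the final roll."""
    count = 1
    while rng.randint(1, 6) != 1:
        count += 1
    return count

rng = random.Random(154)   # arbitrary seed, used only for repeatability
sample = [rolls_until_one(rng) for _ in range(20)]

print(sample)
print(mean(sample), stdev(sample))
```

Different seeds give different samples of 20, which illustrates why the empirical mean and standard deviation vary from one experiment to the next.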


$x_i$          1       2       3       4       5   6   7   8   9   10
$P(X = x_i)$   0.1667  0.1389  0.1157  0.0...

