R studio HW

Description

R studio HW. This question was already asked 1 year ago, as it's the same HW. I saw that you answered this question for azoozeta1 already and I am hoping you can do it for me as well. Thanks.

Unformatted Attachment Preview

Homework 1
Due Wednesday, January 24

1.
a. In R define three variables (vectors): Day, WaitingTime and BusLine. All should be numeric variables with the following values. (You should combine all your work in an R script and submit that with a Word document that presents and explains the results.)

Day   WaitingTime   BusLine
1     43            0
2     27            0
3     4             0
4     18            0
5     17            0
6     31            0
7     16            1
8     18            1
9     7             0
10    41            0
11    22            1
12    5             1
13    14            1
14    12            0
15    23            0
16    7             1
17    12            1
18    3             1
19    18            1
20    9             1

b. Combine your three variables in a dataframe called "CrazyDave".

c. For BusLine assign labels 0 to Damen and 1 to Halsted by using the "factor()" function.

d. Print the complete dataframe to check your work. Save your dataset by writing it as a csv file to your default working directory.

e. Provide descriptive statistics (mean, median, mode, variance, standard deviation) as appropriate for these variables. You can use describe() in the psych package to get most of these.

f. According to the CTA timetables both buses run at 10 minute intervals. Given our data, could they possibly be running consistently on time? How do you know?

g. If the CTA maintains the scheduled number of buses, but their arrival times are random, then waiting time would follow an exponential distribution with parameter λ equal to the rate of 1 per 10 minutes. (This is known as a Poisson process.) Look up the exponential distribution (Wikipedia will do), find the formulas for the mean and standard deviation, and calculate these values using the rate (0.1 per minute) from the CTA timetable.

h. An important result in statistics called the Central Limit Theorem says that the sample mean of n random draws from an arbitrary distribution with mean μ and standard deviation σ converges to a normal distribution with mean μ and standard deviation σ/√n. Use this to calculate the expected mean waiting time and the standard deviation of the mean based on the null hypothesis that arrival times follow the distribution in g (random arrivals at a rate of 1 per 10 minutes).

i. Since our null hypothesis that bus arrivals are Poisson with rate λ = 0.1 also determines the variance under the null, we can use a z test to determine whether our sample is consistent with the hypothesis. Construct a z value by taking the difference between the sample mean and the calculated expected mean and dividing by the standard deviation (of the mean) from h. Find a normal (z) lookup table (two sided) and find the corresponding p value. This is the probability of obtaining a sample mean at least this far from the expected mean if the null hypothesis is true. If it is less than 0.05 we reject the null at the 5% significance level (95% confidence) and accept the alternative hypothesis (that the mean is significantly different from μ).

j. Perhaps the waiting times are not independent (maybe buses run late because of problems that affect several in a row). Then our calculations in g are not valid. This is typically the case for sample data, where we do not know the population variance. Instead of the z test we use a t-test, which uses the sample variance to do hypothesis testing. Use the value for the sample standard deviation for waiting time from part e and calculate the standard deviation of the mean (by dividing by √n). Calculate the difference between the sample mean and the null (10 minutes) and divide by the standard deviation of the mean to get a t-statistic. Run the R function t.test() to check your work. The t-test reports the p value just like our z test above. Is the mean significantly different from 10 minutes? Does this mean the Professor is not crazy?

k. Things seem to be getting better for Professor Dave. Calculate the Pearson correlation coefficient for waiting time and day to determine if waiting times are actually improving over time. Use "cor.test()" to get the correlation and a test of significance. What can you conclude from the result? Is it statistically significant?

l. Dave was initially taking the Damen bus, but switched to the Halsted bus in hopes of avoiding his curse. Use describeBy() to calculate the mean and standard deviation for the two routes considered separately. Use t.test() again to compare two independent samples. Use the "var.equal = TRUE" flag to run Student's test and then run again with "var.equal = FALSE" to repeat the test allowing for different variances. To determine which test is appropriate we need to run Levene's test for the equality of variances. You can find a function to do this, "leveneTest()", in the "car" package for R. If the sig.(nificance) value is less than .05 we reject the null (that the variances are equal) and use a modified t-test which accounts for the unequal variances of the two samples. Which should we use here? Does it matter for determining whether the Halsted bus is better?

m. Even if we can't say that the Halsted bus is statistically significantly better, the average times are certainly lower. Is this responsible for all of the improvement over time we found in k? Run a linear regression of WaitingTime on Day and BusLine. How much of the variance in waiting time is explained by the model? How much confidence do we have in the results? Does this impact our interpretation of the correlation in k?
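Parts g through j come down to a few lines of arithmetic, so a worked sketch may help. The following is a minimal base R example, not the tutor's script: the variable names (rate, mu0, sigma0, se_null, se_sample and so on) are illustrative assumptions, and the only inputs are the waiting times from the table in part a and the timetable rate of 0.1 per minute.

```r
# Waiting times (minutes) copied from the table in part a.
WaitingTime <- c(43, 27, 4, 18, 17, 31, 16, 18, 7, 41,
                 22, 5, 14, 12, 23, 7, 12, 3, 18, 9)
n <- length(WaitingTime)

# Part g: an exponential distribution with rate lambda has
# mean 1/lambda and standard deviation 1/lambda.
rate   <- 0.1          # buses per minute, from the CTA timetable
mu0    <- 1 / rate     # expected waiting time under the null
sigma0 <- 1 / rate     # standard deviation under the null

# Part h: by the Central Limit Theorem the sample mean has
# standard deviation sigma / sqrt(n).
se_null <- sigma0 / sqrt(n)

# Part i: z statistic and two-sided p value from the normal distribution.
z   <- (mean(WaitingTime) - mu0) / se_null
p_z <- 2 * pnorm(-abs(z))

# Part j: same idea, but with the sample standard deviation
# and a t distribution with n - 1 degrees of freedom.
se_sample <- sd(WaitingTime) / sqrt(n)
t_stat    <- (mean(WaitingTime) - mu0) / se_sample
p_t       <- 2 * pt(-abs(t_stat), df = n - 1)

# Built-in one-sample t-test to check the hand calculation.
check <- t.test(WaitingTime, mu = mu0)

print(c(z = z, p_z = p_z, t = t_stat, p_t = p_t))
print(check)
```

The hand-computed t statistic and p value should agree with the output of t.test(), which is the check part j asks for.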

Explanation & Answer

Hi, I am sorry I was not available due to some issues, but I am submitting your work within our agreed time. I am submitting an R script file (as a RAR archive) containing the R code organized by question number, along with the explanation in a docs file. If you have trouble extracting the archive (it was created on my Windows PC, and Mac users may not be able to extract it), you can download it from this MediaFire link: http://www.mediafire.com/file/hv4m07j622913pb/R_Sc... Let me know if you still face any issue. Please note that Studypool won't allow us to upload an R script file, which is why I used two different ways. Thank you.

Name
University Name
Course
Homework 1
Q1
a, b, c, d) The variables were defined, combined into a data frame, and labeled
in R.
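The tutor's script is not included in the preview, so the following is only a minimal sketch of how parts a through d could be done in base R. The values come from the table in the assignment; the output file name "CrazyDave.csv" is an assumption, since the assignment only asks for a csv file in the default working directory.

```r
# Part a: three numeric vectors with the values from the assignment table.
Day <- as.numeric(1:20)
WaitingTime <- c(43, 27, 4, 18, 17, 31, 16, 18, 7, 41,
                 22, 5, 14, 12, 23, 7, 12, 3, 18, 9)
BusLine <- c(0, 0, 0, 0, 0, 0, 1, 1, 0, 0,
             1, 1, 1, 0, 0, 1, 1, 1, 1, 1)

# Part b: combine the vectors into a data frame.
CrazyDave <- data.frame(Day, WaitingTime, BusLine)

# Part c: label 0 as Damen and 1 as Halsted.
CrazyDave$BusLine <- factor(CrazyDave$BusLine,
                            levels = c(0, 1),
                            labels = c("Damen", "Halsted"))

# Part d: print to check the work, then save to the working directory.
# The file name is an assumption, not given in the assignment.
print(CrazyDave)
write.csv(CrazyDave, "CrazyDave.csv", row.names = FALSE)
```

Passing levels and labels explicitly to factor() keeps the mapping from 0 and 1 to the route names unambiguous.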
e)

The mean waiting time was found to be 17.35 minutes, with a standard deviation of 11.34
minutes. The variance, which is the square of the standard deviation, is about 128.56. The
median waiting time was found to be 16.5 minutes.
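These figures can be reproduced with base R alone. The sketch below is an illustration rather than the tutor's code: the assignment suggests describe() from the psych package, while this version uses built-in functions so it runs without extra packages. Base R has no function for the statistical mode, so it is computed from a frequency table; the variable names are assumptions.

```r
# Waiting times (minutes) copied from the table in part a.
WaitingTime <- c(43, 27, 4, 18, 17, 31, 16, 18, 7, 41,
                 22, 5, 14, 12, 23, 7, 12, 3, 18, 9)

wt_mean   <- mean(WaitingTime)
wt_median <- median(WaitingTime)
wt_var    <- var(WaitingTime)   # sample variance (divides by n - 1)
wt_sd     <- sd(WaitingTime)    # sample standard deviation

# Mode: the value(s) that occur most often in the sample.
counts  <- table(WaitingTime)
wt_mode <- as.numeric(names(counts)[counts == max(counts)])

print(round(c(mean = wt_mean, median = wt_median,
              variance = wt_var, sd = wt_sd), 2))
print(wt_mode)
```

Taking the variance from var() rather than squaring a rounded standard deviation avoids carrying rounding error into the result.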
f) After finding the avera...

