# Lab Olympic project

Nov 14th, 2015
DreamIt

School of Mathematics and Science

Math 153 Introduction to Statistics, Fall 2015

Curve-fitting Project - Linear Regression Model

A. Summary

For this assignment you will be collecting data, analyzing whether the data exhibits a linear trend, finding the line of best fit, plotting the data and the line, interpreting the slope, and using the linear equation to make a prediction. You will use r² (the coefficient of determination) and the p-value to evaluate the strength of your prediction. Finally, you will write a report discussing your findings.

B. Background

The modern-day Olympics began in 1896. You can read an overview of the history of the Olympics on Wikipedia, but, in short, the Summer and Winter Olympics have been held every 4 years since. There have been periods where war or politics reduced participation, including boycotts of the Olympics due to the Cold War in 1980 and 1984. The web site www.databaseolympics.com has a “comprehensive medal history of every Summer and Winter Olympics since 1896.” The web site includes all of the gold, silver, and bronze winners up to the Olympics of 2008.

Looking at the data on this web site, there is an interesting trend. Over the modern history of the Olympics, the performance of some athletes seems to be getting better and better. Your job will be to analyze the Olympic data for a particular event, see if a trend exists, and use the information to make a prediction about the 2012 Olympic games. The analysis you will be doing is called a “linear regression”.

A linear regression is a technique for examining real-world data to determine if the data follows a linear model. In other words, given some data points, can we reliably use a line to model the points and make predictions? There are tools available which will find the line that best approximates a set of data points. The tools also provide a measure of how well the line fits the data values. If a line exists that is a good fit, then we can use the line to make predictions for values we do not have.
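As a point of reference, "best line" in tools such as Minitab and Excel usually means the least-squares line, the one that minimizes the sum of squared vertical distances between the points and the line. The symbols below ($m$ for slope, $b$ for intercept, $\bar{x}$ and $\bar{y}$ for the sample means) are notation introduced here for illustration and may differ from your textbook's.

$$
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m\,\bar{x},
\qquad
r^2 = \frac{\left[\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2 \,\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

An r² near 1 means the points lie close to the line; an r² near 0 means the line explains little of the variation in y.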

There are a variety of reference materials available to help you complete the project.

• Chapters 10.1 - 10.3 of your textbook have material on least squares fitting of a line to data points.

• The following YouTube video is an introduction to Linear Regression. This is background/motivation rather than how to actually compute a linear regression.

• You can use Minitab as we did in class to determine the coefficient of determination (r²), p-value, and least squares fitting for data points. You can also use Excel. Shown below are some tutorial videos on using Excel to compute a linear regression.


C. Instructions

The tasks required to complete the project are listed below.

1. Select an event from the Summer Olympic games, or an indoor event from the Winter Olympic games. The event you select must be a measured event where the winner is the one with the best time, lifts the most weight, jumps the highest, etc. It cannot be one of the scored events where the winner has the highest or lowest score. For example, ice hockey or figure skating cannot be used as both of these events rely on scoring.

2. Write your purpose. This can be a one or two sentence summary of the goal of your research.

3. Go to the Olympics database and note the year and gold medal data for your event for at least 8 different Olympic years. You must have at least 8 data points for your project. Make a table which summarizes the data that you obtained including labels with units. Make sure that if the website gives you mixed units (such as “minutes:seconds”) that you convert this to be solely in minutes or solely in seconds.

4. Plot the points (x, y) to obtain a scatterplot. Use an appropriate scale on the horizontal and vertical axes and be sure to label the axes carefully, including units.

5. Find and state the value of r² and the p-value. Discuss your findings in a few sentences. Is a line a good model to fit to this data? Why or why not? Is the linear relationship very strong, moderately strong, weak, or nonexistent? Is it likely that your data came from solely random chance? If your p-value or r² is too poor, you may have to select a new event at this point. Use the criteria we have established in class.

6. Find the line of best fit (regression line) and graph it on the scatterplot. You can use Minitab to do this or any other software that does a least-squares fitting. The equation of the line must be included on the graph or in the text.

7. State the slope of the line of best fit. Carefully interpret the meaning of the slope in a sentence or two. This means more than simply writing “the slope is ”. Give an example, for instance, of what your slope means.

8. Make a prediction about the gold medal winner in the 2012 Olympics using the line of best fit that you found above. Show calculation work.

9. Write a brief narrative of a paragraph or two. Summarize your topic and what you did as well as your findings. Be sure to mention any aspect of the linear model project (topic, data, scatterplot, line, r², estimate, etc.) that you found particularly important or interesting. Do not just mimic what I have said in my sample project; thoughtfully describe your own project.
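For the unit conversion asked for in step 3, a small helper can turn a "minutes:seconds" result into seconds before it goes into your table. This is only a sketch: the function name `to_seconds` and the sample value `"3:32.07"` are made up for illustration and are not taken from the Olympics database.

```python
def to_seconds(mark: str) -> float:
    """Convert a time written as 'm:ss.ss' (or 'h:mm:ss.ss') to seconds.

    A value with no colon, such as '43.75', is returned unchanged as a float.
    """
    total = 0.0
    for part in mark.strip().split(":"):
        # Each colon moves one base-60 place to the left.
        total = total * 60 + float(part)
    return total


# Hypothetical example value: 3 minutes 32.07 seconds is about 212.07 seconds.
print(to_seconds("3:32.07"))
```

Whichever unit you choose, use it for every row of the table so that the slope has a single, clear unit (for example, seconds per year).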

Note that while your project must meet our established minimum standard for a least squares fitting, a successful project does not require a perfect fitting. Instead, a successful project is one where students correctly interpret their findings and demonstrate an understanding of what their results mean. Interpret numbers. What does your slope mean? What does your r² mean? What does your p-value mean? What are the implications (good or bad) of these numbers on your prediction for the future?

Items #2-#9 constitute your project report. Be sure to include your name and a meaningful title at the beginning of your report. While mathematics can be hand-written, any descriptions, sentences, or paragraphs must be typed. Be sure your scatterplot has labels for the axes including units, and that the line of best fit is graphed in addition to your data. Your thoughts should be in complete sentences using proper English and punctuation. Projects are graded on the basis of completeness, correctness, and strength of the narrative portions.

D. What to Turn In

Turn in your scatterplot, any work you did by hand, and a printout of your report.


Curve-Fitting Example Project: Men’s 400 Meter Dash

A. Purpose:

To analyze the winning times for the Olympic Men’s 400 Meter Dash using a linear model, and predict the winning time in the 2012 Summer Olympics.

B. Data:

The winning times were retrieved from www.databaseolympics.com. The winning times were gathered for the most recent 16 Summer Olympics, post-WWII. (More data was available, back to 1896.)

| Year | Time (secs) |
|------|-------------|
| 1948 | 46.20 |
| 1952 | 45.90 |
| 1956 | 46.70 |
| 1960 | 44.90 |
| 1964 | 45.10 |
| 1968 | 43.80 |
| 1972 | 44.66 |
| 1976 | 44.26 |
| 1980 | 44.60 |
| 1984 | 44.27 |
| 1988 | 43.87 |
| 1992 | 43.50 |
| 1996 | 43.49 |
| 2000 | 43.84 |
| 2004 | 44.00 |
| 2008 | 43.75 |

C. Scatterplot:

[Scatterplot: Time (secs) versus Year]

D. Coefficient of Determination and P-Value:

r² = 0.6991, p-value = 0.0002

The p-value of 0.0002 is less than 0.05, indicating that it is unlikely that this data came about from random chance. The moderate coefficient of determination (0.6991) means that the line of best fit is a reasonable model for this data. Thus the year can be used to make an approximate prediction of the winning time for the 2012 Olympics using the line of best fit. For a better prediction, the r² value should be closer to 0.85 or 1.0 (perfect). Also note that at some point physical limitations of the runners will make the model inaccurate.
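The coefficient of determination can be checked without Minitab or Excel. The sketch below recomputes r² for the 16 winning times in the data table using only the Python standard library; the function name `r_squared` is an illustrative choice, and the p-value is not recomputed here.

```python
from math import fsum

# Winning times (seconds) for the 16 Summer Olympics from 1948 through 2008.
years = list(range(1948, 2009, 4))
times = [46.20, 45.90, 46.70, 44.90, 45.10, 43.80, 44.66, 44.26,
         44.60, 44.27, 43.87, 43.50, 43.49, 43.84, 44.00, 43.75]


def r_squared(xs, ys):
    """Return the coefficient of determination for a simple linear fit."""
    n = len(xs)
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For one predictor, r^2 is the squared correlation coefficient.
    return sxy ** 2 / (sxx * syy)


print(round(r_squared(years, times), 4))  # 0.6991
```

An r² of about 0.70 says that roughly 70% of the variation in winning times is accounted for by the linear trend in year.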


E. Line of Best Fit (Regression Line)

y = −0.0431x + 129.84 where x = Year and y = Winning Time (in seconds)

The slope is -0.0431 and is negative since the winning times are generally decreasing. The slope indicates that in general, the winning time decreases by 0.0431 second a year, and so the winning time decreases at an average rate of 4(0.0431) = 0.1724 second each 4-year Olympic interval.
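As a cross-check on the software output, the slope and intercept can be recomputed directly from the least-squares formulas. This is a minimal sketch using the data table above; `fit_line` is a hypothetical helper name, not a Minitab or Excel function.

```python
from math import fsum

years = list(range(1948, 2009, 4))
times = [46.20, 45.90, 46.70, 44.90, 45.10, 43.80, 44.66, 44.26,
         44.60, 44.27, 43.87, 43.50, 43.49, 43.84, 44.00, 43.75]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(years, times)
print(f"y = {slope:.4f}x + {intercept:.2f}")
print(f"change per 4-year interval: {4 * slope:.4f} seconds")
```

The first line of output should agree with the regression line stated above once rounded. The per-interval change may differ from 0.1724 in the last decimal place because the code multiplies the unrounded slope.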

F. Prediction:

For the 2012 Summer Olympics, substitute x = 2012 to get y = −0.0431(2012) + 129.84 ≈ 43.1 seconds. The regression line predicts a winning time of 43.1 seconds for the Men’s 400 Meter Dash in the 2012 Summer Olympics in London.
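Written out step by step, using the rounded coefficients from the regression line above, the calculation is:

$$
y = -0.0431(2012) + 129.84 = -86.7172 + 129.84 = 43.1228 \approx 43.1 \text{ seconds}
$$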

G. Narrative:

The data consisted of the winning times for the men’s 400m event in the Summer Olympics, for 1948 through 2008. The data exhibit a moderately strong downward linear trend, looking overall at the 60-year period. The r² and p-value indicate that a line is a reasonable model for this data, giving me confidence in the prediction based on the regression line. The r² value was not near 1.0, however, which means that predictions are not expected to be extremely accurate.

The regression line predicts a winning time of 43.1 seconds for the 2012 Summer Olympics, which would be nearly 0.4 second less than the existing Olympic record of 43.49 seconds, quite a feat! Will the regression line’s prediction be accurate? In the last two decades, there appears to be more of a cyclical (up and down) trend. Could winning times continue to drop at the same average rate? Extensive searches for talented potential athletes and improved full-time training methods can lead to decreased winning times, but ultimately, there will be a physical limit for humans.

Note that there were some unusual data points of 46.70 seconds in 1956 and 43.80 seconds in 1968, which are far above and far below the regression line, respectively. I wondered if these values made the correlation less strong, so I repeated the analysis using only the 10 most recent winning times (1972 through 2008), which excludes both points. For that subset, the coefficient of determination is r² = 0.5351, which is not as strong as when we considered the time period going back to 1948 (the p-value is 0.01, which is still below the 0.05 threshold). The lower coefficient of determination means that a prediction from the shorter period will not be as good. Also, the most recent set of 10 winning times does not visually exhibit as strong a linear trend as the set of 16 winning times dating back to 1948.
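The comparison of the two time periods can be reproduced with a short script. The sketch below assumes that "the most recent set of 10 winning times" means the Olympics from 1972 through 2008; with that assumption the subset r² comes out to the 0.5351 quoted above. The helper name `r_squared` is illustrative.

```python
from math import fsum

years = list(range(1948, 2009, 4))
times = [46.20, 45.90, 46.70, 44.90, 45.10, 43.80, 44.66, 44.26,
         44.60, 44.27, 43.87, 43.50, 43.49, 43.84, 44.00, 43.75]


def r_squared(xs, ys):
    """Squared correlation between xs and ys."""
    n = len(xs)
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


# Assumption: the 10 most recent Games in the table are 1972 through 2008.
recent_years = years[-10:]
recent_times = times[-10:]

full_r2 = r_squared(years, times)
recent_r2 = r_squared(recent_years, recent_times)
print(f"1948-2008 (16 points): r^2 = {full_r2:.4f}")
print(f"{recent_years[0]}-2008 (10 points): r^2 = {recent_r2:.4f}")
```

Dropping the early years removes both unusual points, but it also removes much of the overall downward trend, which is one plausible reason the shorter period fits a line less well.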

H. Conclusion:

I have examined two linear models, using different subsets of the Olympic winning times for the men’s 400 meter dash. The prediction from the model with the stronger coefficient of determination was 43.1 seconds for the 2012 Olympics. I checked on another website (olympic.org) and found that when the race was run in August 2012, the winning time was 43.94 seconds. This means my estimate was 0.84 second faster than the actual time.

Does this mean the trend of the last 50 years is finally coming to an end? It will be interesting to compare these results to the upcoming 2016 results to see what happens!
