CS410P SUMMER 2019
DUE: 6/16/19
ASSIGNMENT 4P
Purpose
The purpose of this assignment is to have you work through a mathematical analysis using lists,
functions, and file input, and to practice splitting your program's source code across multiple files.
Scenario
When the relationship between two related quantities appears linear, a linear regression equation will give the
best fit to that data. The context of this assignment is that you will compute a linear regression equation to fit a
straight line to load deflection data for a mechanical coil spring. The computed equation will satisfy the least
squares criterion. That is, it will minimize the sum of the squares of the deviations of the observed deflections
from those predicted by the equation.
We start with a set of experimental data that relates two quantities: the weight of
a load on a spring and the corresponding compression of the spring. Fitting a curve to known data enables
estimation of spring deflections for other loads not in the data. This curve can be the basis for calibrating
a spring scale. The data consists of a collection of experimentally obtained pairs (xi, yi) with i = 0, 1, 2,
..., n-1. The xi values (independent variable) are the weights; the yi values (dependent variable) are the
corresponding deflections. We want to find a (linear) function f such that f(xi) ≈ yi.
A) Least Squares Method
The method of least squares, a technique used for determining the equation of a straight line for a set of
data pairs, will calculate coefficients m and b in the following linear regression equation:
f(x) = mx + b
When the original xi values are substituted into the equation, estimated f(xi) values are obtained. For each xi,
the residual is defined to be the difference between the observed yi and the estimated f(xi). Thus, in the
least squares method, the sum of the squares of the residuals,
Σ (yi - f(xi))^2 for i = 0, 1, ..., n-1,
will be minimized.
Algorithm
For the n data pairs (xi, yi) with i = 0, 1, ..., n-1, the slope m and the intercept b for the least squares linear
approximation f(x) = mx + b are calculated in the following way:
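One standard closed form of the least squares solution is m = Sxy / Sxx and b = ymean - m * xmean, where Sxy is the sum of (xi - xmean)(yi - ymean) and Sxx is the sum of (xi - xmean)^2 over all n points. The following is a minimal Python sketch under that assumption. The function name compute_slope_intercept is hypothetical (the assignment does not prescribe names), and the return order, y-intercept first and then slope, follows the Requirements section.

```python
def compute_slope_intercept(xvalues, yvalues):
    """Return (intercept, slope) of the least squares line through the points.

    Assumes the closed form m = Sxy / Sxx, b = ymean - m * xmean.
    """
    n = len(xvalues)
    if n != len(yvalues) or n < 2:
        raise ValueError("need two parallel lists with at least 2 points")

    x_mean = sum(xvalues) / n
    y_mean = sum(yvalues) / n

    # Sums of products of deviations from the means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xvalues, yvalues))
    sxx = sum((x - x_mean) ** 2 for x in xvalues)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope
```

For points that lie exactly on a line, such as x = 0, 1, 2, 3 and y = 1, 3, 5, 7, this recovers the slope and intercept of that line.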
After printing the information about the data file and number of data points, your program will compute
the m and b values and print these coefficients, followed by the original xi and yi values, the estimated
f(xi) values, and the residuals.
Next we want to compute and print the sum of squared residuals, the total sum of squares and the
coefficient of determination as determined by the following formula:
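Assuming the usual definitions, the sum of squared residuals is SSR = Σ (yi - f(xi))^2, the total sum of squares is SST = Σ (yi - ymean)^2, and the coefficient of determination is R^2 = 1 - SSR / SST. The sketch below computes these quantities along with the f(x) and residual lists. The function names compute_fit and coefficient_of_determination are hypothetical, and the residual follows the definition given earlier (observed yi minus estimated f(xi)).

```python
def compute_fit(xvalues, yvalues, slope, intercept):
    """Return (fx, residuals, ssr, sst) for the line f(x) = slope * x + intercept."""
    # Estimated value for every x, kept parallel to the input lists.
    fx = [slope * x + intercept for x in xvalues]
    # Residual = observed y minus estimated f(x).
    residuals = [y - f for y, f in zip(yvalues, fx)]

    ssr = sum(r ** 2 for r in residuals)            # sum of squared residuals
    y_mean = sum(yvalues) / len(yvalues)
    sst = sum((y - y_mean) ** 2 for y in yvalues)   # total sum of squares
    return fx, residuals, ssr, sst


def coefficient_of_determination(ssr, sst):
    """Return R^2 = 1 - SSR / SST (assumed standard definition)."""
    if sst == 0:
        raise ValueError("all y values are identical; R^2 is undefined")
    return 1 - ssr / sst
```

Returning all four values from one function matches the function list in the Requirements section; the separate R^2 helper is one possible way to keep printing out of the computation code.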
B) Pearson Method
The Pearson Coefficient is a number between -1 and 1: a value close to 1 indicates strong positive
correlation, a value close to 0 indicates no correlation, and a value close to -1 indicates strong negative
correlation. The Pearson method is not used for line fitting, only to indicate what kind of correlation
exists between the pairs of (x, y) values.
Algorithm
For the n data pairs (xi, yi) with i = 0, 1, ..., n-1, the Pearson Correlation Coefficient r is found by using
the following formula:
For the Pearson method, all we need to do is print the above “r” value, which is the Pearson correlation
coefficient.
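A common form of the Pearson correlation coefficient is r = Sxy / sqrt(Sxx * Syy), where Sxy = Σ (xi - xmean)(yi - ymean), Sxx = Σ (xi - xmean)^2, and Syy = Σ (yi - ymean)^2. The sketch below assumes that form. The function name compute_pearson is hypothetical, and math.sqrt is used only for the square root, not to compute any coefficient.

```python
import math


def compute_pearson(xvalues, yvalues):
    """Return the Pearson correlation coefficient r of two parallel lists."""
    n = len(xvalues)
    if n != len(yvalues) or n < 2:
        raise ValueError("need two parallel lists with at least 2 points")

    x_mean = sum(xvalues) / n
    y_mean = sum(yvalues) / n

    # Sums of products of deviations from the means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xvalues, yvalues))
    sxx = sum((x - x_mean) ** 2 for x in xvalues)
    syy = sum((y - y_mean) ** 2 for y in yvalues)

    denominator = math.sqrt(sxx * syy)
    if denominator == 0:
        raise ValueError("x or y values are all identical; r is undefined")
    return sxy / denominator
```

For a simple linear fit, the square of this r equals the coefficient of determination, which gives a quick consistency check between the two methods.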
Input
The input is assumed to be coming from a file. The input consists of a number of pairs of values, where
each pair contains two floating point values which represent a load on the spring and the resulting
compression distance, respectively. You should read the data into two lists, one for x and one for y. These
are “parallel” lists, meaning that you have exactly the same number of items in both lists, and the
corresponding elements of the two lists make up an (x, y) pair.
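Reading the file into parallel lists can be sketched as follows. The file layout is an assumption, since the handout does not specify it: one (x, y) pair per line, separated by whitespace, with blank lines ignored. The function name read_data is hypothetical.

```python
def read_data(filename):
    """Read (x, y) pairs from a text file into two parallel lists.

    Assumed layout: one pair per line, values separated by whitespace.
    """
    xvalues = []
    yvalues = []
    with open(filename) as infile:
        for line in infile:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != 2:
                raise ValueError(f"expected two values per line, got: {line!r}")
            # Appending to both lists together keeps them parallel.
            xvalues.append(float(fields[0]))
            yvalues.append(float(fields[1]))
    return xvalues, yvalues
```

Because both lists are appended to in the same step, index i in xvalues always pairs with index i in yvalues.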
Output
Output from the Least Squares method is printed first, followed by the Pearson method. Your output
does not have to look exactly like mine, but it should be as complete and as easy to read. For the Pearson
method we are simply interested in seeing the final Pearson Correlation Coefficient.
Requirements
Using the two lists containing the input (x, y) pairs, your program must eventually create two
other lists – one which stores the corresponding f(x) value for each (x, y) pair, and another which
stores the corresponding residual values for each (x, y) pair. In other words you end up having 4
parallel lists in your program.
You cannot use any library (or numpy) functions for computing the regression coefficients for
either method. You will need to implement the equations in your program.
You must use at least the following functions:
o A function that reads the data into the lists (argument: data file name; return values:
the two lists storing the x and y values)
o A function that computes both the y-intercept and slope (arguments: lists containing x
and y values; return values: y-intercept and slope)
o A function that computes two lists, f(x) and residuals (arguments: lists containing x and
y values, plus the slope and y-intercept computed by the function above; return values: list
containing f(x), list containing residuals, a numeric value for the sum of squared
residuals, and a numeric value for the total sum of squares). Optionally, you can create
separate functions for computing and returning the sum of squared residuals and the
total sum of squares.
o A function that computes the Pearson correlation coefficient (arguments: lists containing
x and y values; return value: Pearson coefficient r)
All function definitions must have at least a block comment describing what they do.
Printing (output) must be done only in your main function.
Other than your main function and a function that reads data from your input file into your lists,
all your other functions must be written in a separate Python file.
o Let’s say your file that contains all the functions is called functions.py
o To be able to include all your functions from this functions.py file in your main
program/file you can use the import statement as follows:
    import functions as f
o The “f” at the end lets you abbreviate the name of your functions module when
calling any function from that file. For example, if compute_sxy() is a function
defined in your functions.py file, then you can call this function from your main program
file as follows (assuming this function returns one value):
    result = f.compute_sxy(xvalues, yvalues)
The file containing your main() function is where you control the flow of your program: ask the
user for the input file name, call the function that reads data from the input file, and then
call the appropriate functions to compute the results. All input and output must be done only in
this file.
Work on the program using the modularity of the functions.
As always, test your program carefully and follow good programming style.
Sample Output
Grade Key:
A  4 parallel lists created, one each for x, y, fx, residual: 8 points
B  Minimum of 4 functions created with comments in a separate file (4 points for
   creating a separate file with all the functions related to regression computation)
C  Printing done only in main() function, output format reasonable
D  Accuracy of results: slope, intercept, fx, residual – 5 points each; sum of
   squared residuals, total sum of squares, coefficient of determination – 3 points
   each; Pearson coefficient – 11 points: 40 points
   (-25 if library functions are used to compute any coefficients)
E  Program works correctly for each input file
F  Programming style, modularity (arguments, return values of functions)
G  Late Penalty
Remaining point values (rows B, C, E, F; row assignment not recoverable): 12, 8, 20, 12