Quiz 5 week 5

User Generated

Uhffnz002

Mathematics

UB

Description

The rest is attached

Part 1 of 1 - Multiple Choice
Question 1 of 10
Regression analysis asks:

Question 6 of 10

Unformatted Attachment Preview

PUAD 630: Analytical Techniques in Public Administration
Ferzana Havewala
Regression Analysis: Estimating Relationships

Introduction
• Regression analysis is the study of relationships between variables.
• There are two potential objectives of regression analysis:
  • to understand how the world operates
  • to make predictions.
• Two basic types of data are analyzed:
  • Cross-sectional data are usually data gathered from approximately the same period of time from a population.
  • Time series data involve one or more variables that are observed at several, usually equally spaced, points in time.
  • Time series variables are usually related to their own past values (a property called autocorrelation), which adds complications to the analysis.

Potential Uses of Regression Analysis
Regression analysis can help answer questions similar to:
• How do wages of employees depend on years of experience, years of education, and gender?
• How does the current price of a stock depend on its own past values, as well as the current and past values of a market index?
• How does a company's current sales level depend on its current and past advertising levels, the advertising levels of its competitors, the company's own past sales levels, and the general level of the market?
• How does the total cost of producing a batch of items depend on the total quantity of items that have been produced?
• How does the selling price of a house depend on such factors as the appraised value of the house, the square footage of the house, the number of bedrooms in the house, and perhaps others?

Regression Analysis Terms
• In every regression study, there is a single variable that we are trying to explain or predict, called the dependent variable.
  • It is also called the response variable or the target variable.
• To help explain or predict the dependent variable, we use one or more explanatory variables.
  • They are also called independent or predictor variables.
• If there is a single explanatory variable, the analysis is called simple regression.
• If there are several explanatory variables, it is called multiple regression.
• Regression can be linear (straight-line relationships) or nonlinear (curved relationships).
  • Many nonlinear relationships can be linearized mathematically.

Scatterplots: Graphing Relationships
• Drawing scatterplots is a good way to begin regression analysis.
• A scatterplot is a graphical plot of two variables, an X and a Y.
• If there is any relationship between the two variables, it is usually apparent from the scatterplot.

Example 1(a). Drugstore Sales
Objective: To use a scatterplot to examine the relationship between promotional expenditures and sales at Pharmex.
Data:
• Pharmex has collected data from 50 randomly selected metropolitan regions.
• There are two variables:
  • Pharmex's promotional expenditures as a percentage of those of the leading competitor ("Promote")
  • Pharmex's sales as a percentage of those of the leading competitor ("Sales").

[Figure: Example 1(a). Drugstore Sales. Scatterplots to estimate relationships]

Example 2. Explaining Overhead Costs at Bendrix
Objective: To use scatterplots to examine the relationships among overhead, machine hours, and production runs at Bendrix.
Data:
• Data file contains monthly observations for 3 years:
  • overhead costs
  • machine hours
  • number of production runs.
• Each observation (row) corresponds to a single month.

[Figure: Example 2(a). Overhead Costs. Scatterplots to estimate relationships]

Linear versus Nonlinear Relationships
• Scatterplots are useful for detecting relationships that may not be obvious otherwise.
• The typical relationship you hope to see is a straight-line, or linear, relationship.
  • This doesn't mean that all points lie on a straight line, but that the points tend to cluster around a straight line.
• The scatterplot here illustrates a nonlinear relationship.

Outliers
• Scatterplots are especially useful for identifying outliers: observations that fall outside of the general pattern of the rest of the observations.
  • If an outlier is clearly not a member of the population of interest, then it is probably best to delete it from the analysis.
  • If it isn't clear whether outliers are members of the relevant population, run the regression analysis with them and again without them.
  • If the results are practically the same in both cases, then it is probably best to report the results with the outliers included.
  • Otherwise, you can report both sets of results with a verbal explanation of the outliers.

[Figure: Outliers (label: CEO)]

Unequal Variance
• Occasionally, the variance of the dependent variable depends on the value of the explanatory variable.
  • The figure below illustrates an example of this.
  • There is a clear upward relationship, but the variability of amount spent increases as salary increases, which is evident from the fan shape.
  • This unequal variance violates one of the assumptions of linear regression analysis, but there are ways to deal with it.

No Relationship
• A scatterplot can indicate that there is no relationship between a pair of variables.
  • Shapeless swarm of points

Correlations: Indicators of Linear Relationships
• Correlations are numerical summary measures that indicate the strength of linear relationships between pairs of variables.
  • A correlation between a pair of variables is a single number that summarizes the information in a scatterplot.
  • It measures the strength of linear relationships only.
  • The usual notation for a correlation between variables X and Y is $r_{XY}$.
• Correlation formula: $r_{XY} = \dfrac{\mathrm{Covar}(X, Y)}{s_X \, s_Y}$, where $s_X$ and $s_Y$ are the standard deviations of X and Y.
  • The numerator of the equation is also a measure of association between X and Y, called the covariance between X and Y.
  • The magnitude of a covariance is difficult to interpret because it depends on the units of measurement.
• By looking at the sign of the covariance or correlation (plus or minus), you can tell whether the two variables are positively or negatively related.
• Unlike covariances, correlations are completely unaffected by the units of measurement.
  • A correlation equal to 0 or near 0 indicates practically no linear relationship.
  • A correlation with magnitude close to 1 indicates a strong linear relationship.
  • A correlation equal to -1 (negative correlation) or +1 (positive correlation) occurs only when the linear relationship between the two variables is perfect.
• Be careful when interpreting correlations: they are relevant descriptors only for linear relationships.

Remember . . .
• The greater the strength of the relationship between two variables (the higher the absolute value of the correlation coefficient), the more accurate the predictive relationship.
• Why?
  • The more two variables share in common (shared variance), the more you know about one variable from the other.
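
The point that covariance depends on the units of measurement while correlation does not can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the data values and the helper names (`covariance`, `std_dev`, `correlation`) are made up for illustration and are not the Pharmex data.

```python
import math

def covariance(xs, ys):
    """Sample covariance between two equal-length lists (divides by n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def std_dev(xs):
    """Sample standard deviation, i.e. the square root of Covar(X, X)."""
    return math.sqrt(covariance(xs, xs))

def correlation(xs, ys):
    """Correlation r_XY = Covar(X, Y) / (s_X * s_Y)."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

# Hypothetical index values, invented for illustration only.
promote = [80.0, 95.0, 100.0, 110.0, 120.0]
sales = [85.0, 98.0, 99.0, 112.0, 115.0]

# The same promotional data expressed as a fraction instead of a percentage.
promote_fraction = [p / 100 for p in promote]

print("covariance (percent):  ", covariance(promote, sales))
print("covariance (fraction): ", covariance(promote_fraction, sales))
print("correlation (percent): ", correlation(promote, sales))
print("correlation (fraction):", correlation(promote_fraction, sales))
```

Rescaling one variable changes the covariance by the same factor but leaves the correlation unchanged, which is why the correlation is the easier number to interpret.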

Simple Linear Regression
• Scatterplots and correlations indicate linear relationships and the strengths of these relationships, but they do not quantify them.
• Simple linear regression quantifies the relationship where there is a single explanatory variable.
• A straight line is fitted through the scatterplot of the dependent variable Y versus the explanatory variable X.

Regression Line
• Reflects our best guess as to what score on the Y variable would be predicted by a score on the X variable.
• The line best fits these data because it minimizes the overall (squared) distance between the individual data points and the regression line.
• The distance between each individual data point and the regression line is the error in prediction.
• If the correlation were perfect, all of the data points would fall exactly on a straight line, and the regression line would pass through each point.

The Simple Linear Regression Model
The equation that describes how y is related to x and an error term.
• Simple Linear Regression Model: y = β0 + β1x + ε
• Parameters: β0 and β1
  • The parameter values are usually not known and must be estimated using sample data.
  • Sample statistics (denoted b0 and b1) are computed as estimates of the population parameters β0 and β1.
• Random variable: Error term, ε
  • The error term accounts for the variability in y that cannot be explained by the linear relationship between x and y.
• Estimated Regression Equation: The equation obtained by substituting the values of the sample statistics b0 and b1 for β0 and β1 in the regression equation.
• Estimated simple linear regression equation: ŷ = b0 + b1x
  • ŷ = Point estimator of E(y|x)
  • b0 = Estimated y-intercept
  • b1 = Estimated slope
• The graph of the estimated simple linear regression equation is called the estimated regression line.

[Figure: The Estimation Process in Simple Linear Regression]

Possible Regression Lines in Simple Linear Regression
• The regression line in Panel A shows that the mean value of y is related positively to x, with larger values of E(y|x) associated with larger values of x.
• In Panel B, the mean value of y is related negatively to x, with smaller values of E(y|x) associated with larger values of x.
• In Panel C, the mean value of y is not related to x; that is, E(y|x) is the same for every value of x.

Least Squares Method
• The least squares method is a procedure for using sample data to find the estimated regression equation.
  • Determine the values of b0 and b1.
• Interpretation of b0 and b1:
  • The slope b1 is the estimated change in the mean of the dependent variable y that is associated with a one-unit increase in the independent variable x.
  • The y-intercept b0 is the estimated value of the dependent variable y when the independent variable x is equal to 0.

Least Squares Estimation
• Fundamental Equation for Regression: Observed Value = Fitted Value + Residual
• The residual is the difference between the actual and fitted values of the dependent variable.
• The best-fitting line through the points of a scatterplot is the line with the smallest sum of squared residuals.
  • This is called the least squares line.
  • It is the line quoted in regression outputs.
• The least squares line is specified completely by its slope and intercept.
  • Equation for Slope in Simple Linear Regression: $b_1 = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = r_{XY} \, \dfrac{s_Y}{s_X}$
  • Equation for Intercept in Simple Linear Regression: $b_0 = \bar{Y} - b_1 \bar{X}$
• When fitting a straight line through a scatterplot, choose the line that makes the vertical distance from the points to the line as small as possible.
• A fitted value is the predicted value of the dependent variable.
  • Graphically, it is the height of the line above a given explanatory value.

Example 1(b). Drugstore Sales
Objective: Find the least squares line for sales as a function of promotional expenses at Pharmex.
Solution:
• Select Regression from the StatTools Regression and Classification dropdown.
• Use Sales as the dependent variable and Promote as the explanatory variable.
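
The slides run this regression in StatTools, but the slope and intercept formulas can also be evaluated directly. The sketch below is an illustration in Python under stated assumptions: the data are invented (they are not the Pharmex data set), and the function names are hypothetical. It computes b1 and b0 from the sums of squares, then confirms that the slope agrees with the correlation form r_XY · s_Y / s_X.

```python
import math

def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1

def residuals(xs, ys, b0, b1):
    """Residual = observed value - fitted value, for each observation."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def slope_from_correlation(xs, ys):
    """Equivalent slope formula: r_XY * s_Y / s_X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    r = cov / (s_x * s_y)
    return r * s_y / s_x

# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(xs, ys)
resid = residuals(xs, ys, b0, b1)

print("intercept b0:", b0)
print("slope b1:", b1)
print("slope via correlation:", slope_from_correlation(xs, ys))
print("sum of residuals:", sum(resid))
```

The residuals of a least squares line with an intercept sum to zero (up to rounding), which is a quick sanity check on any hand calculation.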

[Figure: Example 1(b). Drugstore Sales. Simple regression output]

Example 1(b). Drugstore Sales
• The slope = 0.7623
  • The sales index tends to increase by about 0.76 for each one-unit increase in the promotional expenses index.
  • If two regions are compared, where the second region spends one unit more than the first region, the predicted sales index for the second region is 0.76 larger than the sales index for the first region.
• The intercept = 25.1264
  • The predicted sales index for a region that does zero promotions is about 25.13.
  • However, no region in the sample has anywhere near a zero promotional value.
  • Therefore, in a situation like this, where the range of observed values for the explanatory variable does not include zero, it is best to think of the intercept term as simply an "anchor" for the least squares line that enables predictions of Y values for the range of observed X values.

Example 2(b). Overhead Costs
Objective: To regress overhead expenses at Bendrix against the two potential explanatory variables.
Solution:
• The Bendrix manufacturing data set has two potential explanatory variables, Machine Hours and Production Runs.
• First regress Overhead against Machine Hours as the single explanatory variable.
• Then regress Overhead against Production Runs as the single explanatory variable.

[Figure: Example 2(b). Overhead Costs. Simple regression output]

Standard Error of Estimate
• The magnitudes of the residuals provide a good indication of how useful the regression line is for predicting Y values from X values.
• Because there are numerous residuals, it is useful to summarize them with a single numerical measure.
  • This measure is called the standard error of estimate and is denoted $s_e$.
  • It is essentially the standard deviation of the residuals, and is given by this equation: $s_e = \sqrt{\dfrac{\sum e_i^2}{n - 2}}$, where the $e_i$ are the residuals and n is the number of observations.
• The usual empirical rules for standard deviation can be applied to the standard error of estimate.
• In general, the standard error of estimate indicates the level of accuracy of predictions made from the regression equation.
  • The smaller it is, the more accurate predictions tend to be.

How Good Is Our Prediction?
• Error of estimate: How much each data point differs from the predicted data point.
• Standard error of estimate: The measure of how much each data point (on average) differs from the predicted data point, or a standard deviation of all of the error scores.
• The higher the correlation between two variables (and the better the prediction), the lower the error will be.

The Percentage of Variation Explained: R-Square
• R² is an important measure of the goodness of fit of the least squares line.
  • It is the percentage of variation of the dependent variable explained by the regression.
  • It always ranges between 0 and 1.
  • The better the linear fit is, the closer R² is to 1.
• Formula for R²: $R^2 = 1 - \dfrac{\sum e_i^2}{\sum (Y_i - \bar{Y})^2}$
• In simple linear regression, R² is the square of the correlation between the dependent variable and the explanatory variable.

Multiple Regression
• To obtain improved fits in regression, several explanatory variables could be included in the regression equation. This is the realm of multiple regression.
  • Graphically, you are no longer fitting a line to a set of points. If there are two explanatory variables, you are fitting a plane to the data in three-dimensional space.
  • The regression equation is still estimated by the least squares method, but it is not practical to do this by hand.
  • There is a slope term for each explanatory variable in the equation, but the interpretation of these terms is different.
  • The standard error of estimate and R² summary measures are almost exactly as in simple regression.
  • Many types of explanatory variables can be included in the regression equation.

Interpretation of Regression Coefficients
• If Y is the dependent variable, and X1 through Xk are the explanatory variables, then a typical multiple regression equation has the form shown below, where b0 is the Y-intercept, and b1 through bk are the slopes.
• General Multiple Regression Equation: Predicted Y = b0 + b1X1 + b2X2 + … + bkXk
• Collectively, the bs in the equation are called the regression coefficients.
• Each slope coefficient is the expected change in Y when this particular X increases by one unit and the other Xs in the equation remain constant.
  • This means that the estimates of the bs depend on which other Xs are included in the regression equation.

The BIG Rules . . .
When using multiple predictors, keep in mind . . .
• Your independent variables (X1, X2, X3, etc.) should be related to the dependent variable (Y). They should have something in common.
• Independent variables should not be related to each other; they should be uncorrelated so that they provide a unique contribution to the variance in the outcome of interest.

Example 2(c). Overhead Costs
Objective: To estimate the equation for overhead costs at Bendrix as a function of both machine hours and production runs.
Solution:
• Select Regression from the StatTools Regression and Classification dropdown list.
• Then choose the Multiple option and specify the single D (dependent) variable and the two I (independent) variables.

[Figure: Example 2(c). Overhead Costs. Multiple regression output]

Interpretation of Standard Error of Estimate and R-Square
• The multiple regression output is very similar to simple regression output.
• The standard error of estimate is essentially the standard deviation of residuals, but it is now given by the equation below, where n is the number of observations and k is the number of explanatory variables: $s_e = \sqrt{\dfrac{\sum e_i^2}{n - k - 1}}$
• The R² value is again the percentage of variation of the dependent variable explained by the combined set of explanatory variables, but it has a serious drawback: it never decreases when extra explanatory variables are added to an equation.
• Adjusted R² is an alternative measure that adjusts R² for the number of explanatory variables in the equation.
  • It is used primarily to monitor whether extra explanatory variables really belong in the equation.

Modeling Possibilities
• Several types of explanatory variables can be included in regression equations:
  • Dummy variables
  • Interaction variables
  • Nonlinear transformations
• There are many alternative approaches to modeling the relationship between a dependent variable and potential explanatory variables.
• In many applications, these techniques produce much better fits than you could obtain without them.

Dummy Variables
• Some potential explanatory variables are categorical and cannot be measured on a quantitative scale.
  • However, these categorical variables are often related to the dependent variable, so they need to be included in the regression equation.
• The trick is to use dummy variables.
  • A dummy variable is a variable with possible values of 0 and 1.
  • It is also called a 0-1 variable or an indicator variable.
  • It equals 1 if the observation is in a particular category, and 0 if it is not.
• Categorical variables are used in two situations:
  • When there are only two categories (example: gender)
  • When there are more than two categories (example: quarters)
    • In this case, multiple dummy variables must be created.

Example 3. Bank Salaries
Objective: To analyze whether the bank discriminates against females in terms of salary.
Data: The data set includes the following variables for each of the 208 employees:
• Education (categorical)
• Grade (categorical)
• Years1 (numerical: years with this bank)
• Years2 (numerical: years of previous work experience)
• Age (numerical)
• Gender (categorical with two values)
• PCJob (categorical yes/no: whether the employee's current job is primarily computer-related)
• Salary (numerical)

Example 3(a). Bank Salaries
• Create dummy variables for the various categorical variables, using IF functions or the StatTools Dummy procedure.
• Then we can run a regression analysis with Salary as the dependent variable, using any combination of numerical and dummy explanatory variables.
  • Don't use any of the original categories (such as Education) that the dummies are based on.
  • Always use one fewer dummy than the number of categories for any categorical variable.
  • The omitted dummy then corresponds to the reference category.
  • The interpretation of any dummy variable coefficient is relative to this reference category.
  • When there are only two categories (e.g., the gender variable), name the variable with the category (e.g., Female) that corresponds to the 1's. In this case the other category (e.g., Male) automatically becomes the reference category.

[Figure: Example 3(a)&(b). Bank Salaries. Multiple regression with numerical and categorical explanatory variables]

Interaction Variables
• When you include only a dummy variable in a regression equation, you are allowing the intercepts of the two lines to differ, but you are forcing the lines to be parallel.
• To be more realistic, you might want to allow them to have different slopes.
• You can do this by including an interaction variable.
  • An interaction variable is the product of two explanatory variables.
  • Include an interaction variable in a regression equation if you believe the effect of one explanatory variable on Y depends on the value of another explanatory variable.

Example 3(c). Bank Salaries
Objective: To use multiple regression with an interaction variable to see whether the effect of years of experience on salary is different across the two genders.
Solution:
• StatTools will create the interaction variables implicitly.
• Check the Include Derived Variables box at the bottom of the Regression dialog box and then click the resulting Add button.
• Check the Years1 and Gender variables and select the Interaction with Category Variable option.

[Figure: Example 3(c). Bank Salaries. Multiple regression with interaction variable]

Example 3(c). Bank Salaries
Regression Equations:
• Males: Salary = 30430 + (1528) Years1
• Females: Salary = (30430 + 4098) + (1528 - 1248) Years1 = 34528 + (280) Years1

Nonlinear Transformations
• The general linear regression equation has the form: Predicted Y = b0 + b1X1 + b2X2 + … + bkXk
• It is linear in the sense that the right side of the equation is a constant plus a sum of products of constants and variables.
• The variables can be transformations of original variables.
• Nonlinear transformations of variables are often used because of curvature detected in scatterplots.
• You can transform the dependent variable Y or any of the explanatory variables, the Xs. You can also do both.
• Typical nonlinear transformations include: the natural logarithm, the square root, the reciprocal, and the square.

Example 4(a). Electricity
Objective: To see whether the cost of supplying electricity is a nonlinear function of demand, and if it is, what form the nonlinearity takes.
Data: The data set lists the number of units of electricity produced (Units) and the total cost of producing these (Cost) for a 36-month period.
Solution:
• First generate a scatterplot of Cost versus Units.
• Next, run a simple regression of Cost on Units.

[Figure: Example 4(a). Electricity]

Example 4(b). Electricity
• Create a new variable Units² in the data set and then use multiple regression to estimate the equation for Cost with both explanatory variables, Units and Units².
• Use the Trendline option in Excel to superimpose a quadratic curve on the scatterplot.

[Figure: Example 4(b). Electricity. Quadratic transformation]

Example 4(c). Electricity
• Next, try a logarithmic fit by creating a new variable, NaturalLog(Units), and then regressing Cost against this variable.

[Figure: Example 4(c). Electricity. Logarithmic transformation]

Example 4(c). Electricity
• Logarithmic transformations of variables are used widely in regression analysis because they have a meaningful interpretation.
• Suppose that Units increases by 1% (e.g., from 600 to 606). Then the expected Cost will increase by a given amount, namely, the coefficient of Log(Units) multiplied by 0.01, approximately 0.01(16654) = $166.54.
• Every 1% increase in Units is accompanied by an expected $166.54 increase in Cost.
• Note that for larger values of Units, a 1% increase represents a larger absolute increase (e.g., from 700 to 707 instead of from 600 to 606). But each such 1% increase entails the same increase in Cost. This is another way of describing the decreasing marginal cost property.
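
The 1% interpretation of the logarithmic fit can be verified with a few lines of arithmetic. This is a minimal Python sketch rather than anything from the slides: it assumes a fitted equation of the form Cost = b0 + b1 · ln(Units) with b1 = 16654 as quoted above, and the function name `predicted_cost_change` is hypothetical. The intercept is not needed because it cancels when two predictions are subtracted.

```python
import math

# Coefficient of Log(Units) quoted in the text; the intercept is not given
# and is not needed, since it cancels in a difference of predictions.
LOG_UNITS_COEFFICIENT = 16654.0

def predicted_cost_change(units_before, units_after,
                          coefficient=LOG_UNITS_COEFFICIENT):
    """Change in predicted Cost under Cost = b0 + coefficient * ln(Units)."""
    return coefficient * (math.log(units_after) - math.log(units_before))

# Two different 1% increases, as in the text.
change_600 = predicted_cost_change(600, 606)
change_700 = predicted_cost_change(700, 707)

# Rule of thumb used on the slide: coefficient * 0.01.
rule_of_thumb = 0.01 * LOG_UNITS_COEFFICIENT

print(f"600 -> 606: {change_600:.2f}")
print(f"700 -> 707: {change_700:.2f}")
print(f"0.01 * coefficient: {rule_of_thumb:.2f}")
```

Both 1% increases give the same predicted change, as the slide states. The exact change is the coefficient times ln(1.01), which is slightly smaller than the 0.01 rule of thumb, so the $166.54 figure is an approximation.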

Explanation & Answer

Hi hussein, attached is the complete solution to the multiple-choice questions. Pleasure working for you :)

Part 1 of 1 - Multiple Choice

Question 1 of 10
Regression analysis asks:


A. if there are differences between distinct populations
B. if the sample is representative of the population
C. how a single variable depends on other relevant variables
D. how several variables depend on each other

Answer: C

Question 2 of 10
A scatterplot that appears as a shapeless mass of data points indicates:


A. a curved relationship among the variables
...
