Unit 5 Dependent Variable in My Work Environment

Content Type

User Generated

User

nevryenzbanpbfgn

Subject

Other

Description

Think of a dependent variable within your work environment, domain of interest, or everyday life that would be valuable to predict using multiple regression. What are some independent variables that you would include in the analysis when your intuition tells you they may be related to the dependent variable?

I am currently in Human Resources/recruiting

Your journal entry must be at least 200 words in length.

Unformatted Attachment Preview

UNIT V STUDY GUIDE Data Analysis: Correlation and Regression Course Learning Outcomes for Unit V Upon completion of this unit, students should be able to: 6. Differentiate between various research-based tools commonly used in businesses. 6.1 Determine the most appropriate statistical procedure to use from among correlation, simple regression, and multiple regression to test hypotheses. 7. Test data for a business research project. 7.1 Establish whether to accept or reject null and alternative hypotheses by using correlation, simple regression, and multiple regression. Course/Unit Learning Outcomes 6.1 7.1 Learning Activity Unit Lesson Video: How to Find Correlation in Excel with the Data Analysis Toolpak Video: How to Use Excel-The PEARSON Function Video: Excel 2016 Correlation Analysis Video: How to Calculate a Correlation (and p value) in Microsoft Excel Video: Correlation Coefficient in Excel Video: How to Perform a Linear or Multiple Regression (Excel 2013) Video: Multiple Regression Interpretation in Excel Unit V Scholarly Activity Unit Lesson Video: Excel 2016 Correlation Analysis Video: How to Calculate a Correlation (and p value) in Microsoft Excel Video: Correlation Coefficient in Excel Video: Multiple Regression Interpretation in Excel Unit V Scholarly Activity Reading Assignment In order to access the following resources, click the links below: Glen, S. (2013, December 14). How to find correlation in Excel with the Data Analysis Toolpak [Video file]. Retrieved from https://www.youtube.com/watch?v=AjQA78tI39Q Click here for a transcript of the video. TheRMUoHP Biostatistics Resource Channel. (2014, November 6). How to use Excel-The PEARSON Function [Video file]. Retrieved from https://www.youtube.com/watch?v=JO-Gc5bEG70 Click here for a transcript of the video. Porterfield, T. (2017, May 18). Excel 2016 correlation analysis [Video file]. Retrieved from https://www.youtube.com/watch?v=kr64tfZmiGA Click here for a transcript of the video. MBA 5652, Research Methods 1 Quantitative Specialists. (2014, September 15). How to calculate a correlationUNIT (and xp-value) Microsoft STUDYinGUIDE Excel [Video file]. Retrieved from https://www.youtube.com/watch?v=vFcxExzLfZI Title Click here for a transcript of the video. MrSnyder88. (2009, November 8). Correlation coefficient in Excel [Video file]. Retrieved from https://www.youtube.com/watch?v=s2TVkYmmCAs Click here for a transcript of the video. economistician.com. (2015, May 15). How to perform a linear or multiple regression (Excel 2013) [Video file]. Retrieved from https://www.youtube.com/watch?v=wBocR96UdyY Click here for a transcript of the video. TheWoundedDoctor. (2013, May 6). Multiple regression interpretation in Excel [Video file]. Retrieved from https://www.youtube.com/watch?v=tlbdkgYz7FM Click here for a transcript of the video. Unit Lesson Data Analysis: Correlation and Regression Unit IV discussed descriptive statistics and the importance of testing the data to ensure assumptions are met before using parametric statistical procedures. When using descriptive statistics, the data that are collected are described by the researcher both visually and statistically. The visual representation alone can reveal information about whether assumptions are met. Although all statistical tests have different assumptions, normality is universally shared and is relatively easy to observe through the use of histograms. It is preferable to use parametric tests since they are more powerful than non-parametric tests, which have fewer assumptions that must be met. Regardless of the statistical procedure under consideration, the assumptions must be met if the researcher can have confidence in the validity of the results. Units V through VII will focus on inferential statistics, which include the parametric tests of correlation, regression, t test, and ANOVA. Inferential Statistics Unlike descriptive statistics, inferential statistics go beyond simply describing the data to making inferences, or predictions, about a population. The inferences are often based on the characteristics of a sample. Inferences, or predictions, are stated in the form of hypotheses. Results of statistical tests on samples are used to generalize those results to a population (Zikmund, Babin, Carr, & Griffin, 2013). Descriptive statistics and inferential statistics are not mutually exclusive. In fact, performing descriptive statistics should always be a precursor to inferential statistics for assumption testing for statistical procedures being considered. Populations, Samples, and Generalization Statistical procedures are used to answer questions about a population. A population can be people or things, such as a company’s entire consumer base or the total units produced for a new product. A population can be very large or very small. For example, a company may collect productivity data on their 100 employees. They are interested in knowing if there is a relationship between the size of merit increases and job productivity. The 100 employees represent the entire population, which would be considered a census. Since data are collected from all 100 employees, the company can have certainty that the statistical results represent the entire population. In many instances, however, it is impractical and cost prohibitive to collect data from all participants in the population. In these scenarios, data is collected from a sample of the population. The statistical results from the sample are then used to generalize the findings to the population. Using the example above, now assume the company has a population of 200,000 employees. They decide to select a random sample of 100 employees to whom they have provided various merit increases. Like the example MBA 5652, Research Methods 2 above, their interest is to understand if there is a relationship between the sizeUNIT of merit increases and job x STUDY GUIDE productivity. If they determine that there is a statistically significant relationshipTitle between the size of merit increases and productivity, they can generalize those results to the population of 200,000 employees. This can inform their decision-making and planning regarding the size of raises to provide for the next fiscal year and the productivity increase they can forecast. This is the function of inferential statistics. Relationships or Differences Statistical analysis can be simplified as either looking for relationships (or associations) between variables or looking for differences between variables or groups. This unit considers statistical testing that looks for relationships between variables. The statistical procedures highlighted to test for relationships will be correlation, simple regression, and multiple regression. Correlation and regression analyses are parametric tests. Chi-square is a corresponding non-parametric test. Correlation Although many course concepts in research methods may be new and foreign, correlation may feel more familiar and comfortable. The concept of correlation makes intuitive sense to most people since relationships between variables (e.g., years of education and income, safety training hours and lost time hours, and hours of exercise and weight loss) occur frequently in daily life. Relationships naturally occurring between variables can be positive or negative. A positive or negative relationship between variables does not mean positive or negative in the context of making a value judgment of good or bad. A positive or negative relationship, in statistical terms, means the direction of the relationship. An example of a positive relationship between variables is durable goods orders and the S&P 500 index. When durable goods orders decrease, there is a decrease in the S&P 500 index. When durable goods orders increase, there is an increase in the S&P 500 index. This is a positive relationship because both variables move in the same direction. As one variable increases, the other increases. Conversely, when one variable decreases, the other decreases. An example of a negative relationship between variables is outdoor temperature and heating oil expenditures. When the outdoor temperature increases, heating oil expenditures decrease. When the outdoor temperature decreases, heating oil expenditures increase. This is a negative relationship because the variables move in opposite directions. As one variable increases, the other decreases. Conversely, when one variable decreases, the other increases. Another important distinction that must be understood is the difference between correlation and causation. Even if a statistical test (e.g. Pearson’s r) indicates a statistically significant relationship between variables, it must never be said that one variable causes the change in the other variable. For example, there is a positive correlation between ice cream sales and violent crime in New York City (both increase in the warmer months of the year, and both decrease in the cooler months). It would be absurd to say that ice cream causes violent crime—even though the relationship between variables does exist. This extreme example makes the point that correlation does not mean causation. Causation can only be statistically shown via experimental research designs, which have tight controls to manipulate variables. Pearson Correlation Coefficient (r) When conducting correlation analysis, the Pearson correlation coefficient (r) is the most commonly used parametric measure of association between two variables (Norusis, 2008). The Pearson statistic is represented by r, which is the standardized covariance between the variables, and measures the linear relationship between variables (Field, 2005). The Pearson correlation coefficient is sometimes represented by R, but this is normally used in the context of regression analysis. One can easily determine how to calculate r using long-hand by referring to a statistics textbook, but it is much easier and faster to use statistical software to quickly calculate the Pearson correlation coefficient. For the purposes of this course, it is most important to understand what Pearson’s r is, what it measures, and how to interpret it, rather than how to calculate it by long-hand. MBA 5652, Research Methods 3 When using correlation analysis, a hypothesis is tested that there is no statistically UNITsignificant x STUDY relationship GUIDE between variables. The null and alternative hypotheses would be stated like so. Title Ho1: There is no statistically significant relationship between X and Y. Ha1: There is a statistically significant relationship between X and Y. As mentioned above, the r statistic can indicate a positive relationship or a negative relationship between variables. The r statistic can also indicate no relationship at all between variables. An r of +1 indicates a perfect positive correlation, while an r of -1 indicates a perfect negative correlation (Field, 2005). The r statistic will always fall between +1 and -1. An r of 0 indicates no correlation exists between variables. Correlation When reviewing the literature for research articles, it is very common to find r statistics less than .5. Given the fact that an r of 1 indicates a perfect correlation, a statistically significant r of .5 or less hardly seems large enough to get excited about; however, the American Psychological Association would disagree. The American Psychological Association (as cited in Kerr, Garvin, Heaton, & Boyle, 2006) concluded that psychologists studying highly complex human behavior should be satisfied with correlations in the r = 0.10 to 0.20 range, and they should be generally pleased with correlations in the 0.25–0.35 area. The best new variables typically increase predictions, for instance, of job performance between 1% and 4%. A 10% contribution of emotional intelligence would be considered very large (Kerr et al., 2006). Although there are no concrete guidelines for interpreting r and R2, The following chart suggests some general guidelines that are fairly consistent with other rule-of-thumb published guidelines. Adapted from Guideline for Interpreting Correlation Coefficient by I. Phanny, 2014. (https://www.slideshare.net/phannithrupp/guideline-for-interpreting-correlation-coefficient/2). MBA 5652, Research Methods 4 Coefficient of Determination (R2) UNIT x STUDY GUIDE Title The Pearson’s r is useful itself, but the closely related coefficient of determination (R2) is also very informative. Simply squaring r produces R2, which indicates the amount of variability in one variable that is explained by the other variable (Field, 2005). According to the American Psychological Association (as cited in Kerr et al., 2006), a researcher should be generally pleased with a correlation of r = .25, which translates to a coefficient of determination R2 = .0625. This means that the variable x explains 6.25% of the variability in the variable y. Most statistical software programs will calculate both r and R2 for when running correlation analysis, so it is easy to see the strength of the association and the explained variance. Again, it is important not to confuse correlation with causation. Examples of r and R2: r = .10, R2 = .01 explains 1% of the total variance between the variables being tested r = .30, R2 = .09 explains 9% of the total variance between the variables being tested r = .50, R2 = .25 explains 25% of the total variance between the variables being tested Interpreting Correlation Output Results The following correlation analysis looked for a statistically significant relationship between the variables of height and weight. The results show that there is a moderately strong correlation r = .6 (Pearson’s Correlation). It is also necessary to assess whether the correlation is statistically significant using an alpha of .05. The results indicate a p value of .023 < .05. Therefore, the null hypothesis is rejected, and the alternative hypothesis is accepted. Reject Ho1: There is no statistically significant relationship between weight and height. Accept Ha1: There is a statistically significant relationship between weight and height. Although the information obtained through correlation analysis is revealing and useful, it is limited in that correlation analysis cannot be used to make predictions (Field, 2005). To be able to predict the value of a dependent variable (DV) from observations of the independent variable (IV), regression analysis must be used. Regression Analysis Relationships between variables can be useful for making predictions. Regression analysis is a concept that many students have heard of, even if they are not entirely comfortable with it. If the relationship between the variables X and Y are known, predictions can be made about how a change in X will relate to a change in Y. Remember that this is not stating that a change in X causes a change in Y. It is only possible to predict a change based on the relationship between variables. Regression analysis can be powerful, especially when multiple X variables are included (multiple regression) to make a prediction about a change in a single Y variable. When using regression analysis, a hypothesis is tested that there is no statistically significant prediction of the dependent variable (i.e., Y or outcome variable) by one or more independent variables (X). If a single independent variable is used to predict Y, it is termed simple regression. If two or more independent X variables are used to predict Y, it is termed multiple regression. The null and alternative hypotheses would be stated as follows. Ho1: There is no statistically significant relationship to predict Y from X1, X2…and Xn. MBA 5652, Research Methods 5 Ha1: There is a statistically significant relationship to predict Y from X1, X2…and Xn. x STUDY GUIDE UNIT Title Regression analysis uses a linear model to apply a line of best fit to the data. The line of best fit is the most optimal because it results in the smallest amount of difference between the observed data points and the line (Field, 2005). As the linear regression example below shows, a line of best fit is applied to the data for the variables mortality (DV) and cigarette consumption (IV). This is an example of simple linear regression because there is only one IV. If all of the data points fell on a straight line, it would be a perfect linear relationship, which would allow us to make a perfect prediction of the Y axis variable by looking at the X axis variable (Norusis, 2008). A perfect linear relationship is rare, so we develop the regression model as Y = a + b(X). The resulting mathematical model is tested for statistical significance. If statistically significant, at a p value of less than .05, the IV data can be plugged into the model to be multiplied by the calculated coefficient, added to the calculated constant (Yintercept or a0), resulting in the predicted DV. The statistical software will calculate the model and values for a0 and b1, which will appear as the following equation: Y = a0 + b1 (X) Adapted from images in Multiple Linear Regression by J. Neill, 2008 (https://www.slideshare.net/jtneill/multiple-linear-regression). or DV = a0 + b1 (IV1) Simple regression creates the statistical model, shown above, with a single independent variable (IV), sometimes referred to as a predictor variable, and a single DV, sometimes referred to as the outcome or criterion variable. Multiple regression creates a statistical model with a single independent variable and two or more DVs. The multiple regression model is similar to the simple regression equation in that it still contains a Y-intercept, or a, but the multiple regression model contains multiple IVs and multiple corresponding coefficients, or bx, as shown below. MBA 5652, Research Methods 6 Y = a0 + b1X1 + b2X2 +…+ bnXn or UNIT x STUDY GUIDE Title DV = a0 + b1(IV1) + b2(IV2) +…+ bn(IVn) If the multiple regression model is statistically significant, at a p value of less than .05, the IV data can be plugged into the model to be multiplied by the calculated coefficients, added to the calculated constant (Y-intercept or a0), resulting in the predicted DV. Interpreting Regression Output Results Interpreting simple and multiple regression output is similar. There are several key test statistics and p values that are returned in a regression analysis that must be evaluated to a) determine statistical significance and b) assess the strength of the linear regression model. Adapted from images in Multiple Linear Regression by J. Neill, 2008 (https://www.slideshare.net/jtneill/multiple-linear-regression). Multiple R: This is Pearson’s r, as discussed in the correlation section. Regression often uses a capital R instead of r. This is simply the square root of r2. Multiple R describes the strength of the correlation between the model and the dependent variable. In the regression output below, the multiple R figure of 99.2% indicates a very strong positive correlation between the regression model and the dependent (output) variable. R square (r2): This is the coefficient of determination as was discussed in the correlation section. Regression often uses a capital R. The square of R explains the amount of variation in the dependent (output) variable that is explained by the regression model. In the regression output below, the R square (r2) figure indicates that 98.3% of the variation in the dependent variable is explained by the regression model. This is a very high r2. ANOVA: This indicates whether the regression model is statistically significant in its ability to predict the dependent variable. ANOVA uses significance F for probability, and this is synonymous with the p value discussed previously in the course. A significance level of F < .05 indicates statistical significance. In the regression output below, the significance level of F = .000009 < .05 would indicate that the null hypothesis should be rejected, and the alternative accepted that there is a statistically significant relationship between the regression model and the dependent variable. The t Stat: This assesses the statistical significance of the individual predictor variable coefficients. A p value < .05 for any given t stat indicates statistical significance for the corresponding coefficient. In the regression output below, the p values for the coefficients of variables 1, 2, and 3 are all
Purchase answer to see full attachment

Tags: work environment Research Methods unit 5 Dependent Variable Journal Unit 5

User generated content is uploaded by users for the purposes of learning and should be used following Studypool's honor code & terms of service.

Explanation & Answer

Attached.

Running head: DEPENDENT VARIABLE

1

A Dependent Variable in my Work Environment
Student Name:
Institutional Affiliation:
Instructor:
Course Title:
Date:

DEPENDENT VARIABLE

2

A Dependent Variable in my Work Environment
I currently work in the Human Resources Department where we are responsible for
recruitment and ensuring an improvement in the c...