Project management model; Model building and decision analysis; Monte Carlo Simulation

Content Type

User Generated

User

xvzvo

Subject

Mathematics

Description

Everything needs to be done using Excel, and every step needs explanation with formula and screenshot. The answer would be provided in Word document. The original Excel file is not required. Screenshot and graph needs to be labelled. Don't need to do question 1, only 2-5.

Unformatted Attachment Preview

Assignment 4 (1) Read chapter 9. Answer review questions 1.1. 6 1.2. 7 Read chapter 10 (until it starts talking about crystal ball software). Answer review question 1.3. 1 1.4. Answer the following question – Is Risk from PDF or CDF? What question do you ask CDF to estimate risk? (2) Project management model 2.1. Chapter 9 – Solve problem 18. Also include the graphical representation. 2.2. Solve the problem again to calculate critical path assuming task W takes 7 units of time to complete. (3) Model building and decision analysis 3.1. Chapter 9 – Solve problem 15. 3.2. Solve the problem again if KVA can handle 138 patients per week. (4) Monte Carlo Simulation Outsourcing decision model (see slides for model setup and procedure), input parameters: • Cost to manufacture is weibull distributed between 90 and 160. Use data in assignment 2. If you are not confident about your Weibull distribution, assume data is normally distributed between 90 and 160. Mean = (90+160)/2. STD = select a value between 10-15. You will not lose any points for this. • Cost to outsource is normally distributed with mean 172 and STD of 13. • Capital is one of {50010, 52500, 47300, 51600, 48440} • Demand is uniformly distributed between 830 to 999 4.1. Determine probability of decision variable = Outsourcing. Hint: see slides 35 to 37. 4.2. What is the probability that manufacturing cost is less than 90% of its maximum value in your model? 4.3. What is the probability that outsourcing cost is less than 85% of its maximum value in its model? 4.4. Add 95% confidence limits on probability of decision variable = Outsourcing from 4.1 (Hint: see week 2 slides. Confidence interval for fraction) Note that this is from empirical data (monte carlo simulation) and not a theoretical model. (5) PDF and CDF calculation on monte carlo simulation using regression 5.1. Using your knowledge of regression fit the PDF of cost difference in problem above to a mathematical model. Hint: see slides 38 to 43. 5.2. Calculate RMS error and Chi Sq. fit between PDF from empirical data (previous problem PDF) and your regression model (current problem PDF). 5.3. From the PDF calculate CDF. 5.4. Verify answers for problems 4.1 using CDF calculated from regression model. Are they close? Important Note: Save your work. We will revisit this problem. PHStat Notes Using the PHStat Stack Data and Unstack Data Tools One‐ and Two‐Way Tables and Charts Normal Probability Tools Generating Probabilities in PHStat Confidence Intervals for the Mean Confidence Intervals for Proportions Confidence Intervals for the Population Variance Determining Sample Size One‐Sample Test for the Mean, Sigma Unknown One‐Sample Test for Proportions Using Two‐Sample t‐Test Tools Testing for Equality of Variances Chi‐Square Test for Independence Using Regression Tools Stepwise Regression Best-Subsets Regression Creating x‐ and R‐Charts Creating p‐Charts Using the Expected Monetary Value Tool p. p. p. p. p. p. p. p. p. p. p. p. p. p. p. p. p. p. p. 28 63 97 98 136 136 137 137 169 169 169 170 171 209 211 212 267 268 375 p. p. p. p. p. p. p. p. p. p. p. p. p. p. p. p. p. p. p. p. p. p. 29 61 61 62 63 63 134 135 135 171 209 209 211 243 243 244 298 298 298 299 299 375 p. p. p. p. p. p. 338 339 339 341 341 342 Excel Notes Creating Charts in Excel 2010 Creating a Frequency Distribution and Histogram Using the Descriptive Statistics Tool Using the Correlation Tool Creating Box Plots Creating PivotTables Excel‐Based Random Sampling Tools Using the VLOOKUP Function Sampling from Probability Distributions Single‐Factor Analysis of Variance Using the Trendline Option Using Regression Tools Using the Correlation Tool Forecasting with Moving Averages Forecasting with Exponential Smoothing Using CB Predictor Creating Data Tables Data Table Dialog Using the Scenario Manager Using Goal Seek Net Present Value and the NPV Function Using the IRR Function Crystal Ball Notes Customizing Define Assumption Sensitivity Charts Distribution Fitting with Crystal Ball Correlation Matrix Tool Tornado Charts Bootstrap Tool TreePlan Note Constructing Decision Trees in Excel p. 376 This page intentionally left blank Useful Statistical Functions in Excel 2010 Description AVERAGE(data range) BINOM.DIST(number_s, trials, probability_s, cumulative) BINOM.INV(trials, probability_s, alpha) Computes the average value (arithmetic mean) of a set of data. Returns the individual term binomial distribution. Returns the smallest value for which the cumulative binomial distribution is greater than or equal to a criterion value. Returns the left-tailed probability of the chi-square distribution. Returns the right-tailed probability of the chi-square distribution. Returns the test for independence; the value of the chi-square distribution and the appropriate degrees of freedom. Returns the confidence interval for a population mean using a normal distribution. Returns the confidence interval for a population mean using a t-distribution. Computes the correlation coefficient between two data sets. Returns the exponential distribution. Returns the left-tailed F-probability distribution value. Returns the left-tailed F-probability distribution value. Calculates a future value along a linear trend. Calculates predicted exponential growth. Returns an array that describes a straight line that best fits the data. Returns the cumulative lognormal distribution of x, where ln (x) is normally distributed with parameters mean and standard deviation. Computes the median (middle value) of a set of data. Computes the modes (most frequently occurring values) of a set of data. Computes the mode of a set of data. Returns the normal cumulative distribution for the specified mean and standard deviation. Returns the inverse of the cumulative normal distribution. Returns the standard normal cumulative distribution (mean = 0, standard deviation = 1). Returns the inverse of the standard normal distribution. Computes the kth percentile of data in a range, exclusive. Computes the kth percentile of data in a range, inclusive. Returns the Poisson distribution. Computes the quartile of a distribution. Computes the skewness, a measure of the degree to which a distribution is not symmetric around its mean. Returns a normalized value for a distribution characterized by a mean and standard deviation. Computes the standard deviation of a set of data, assumed to be a sample. Computes the standard deviation of a set of data, assumed to be an entire population. Returns values along a linear trend line. Returns the left-tailed t-distribution value. Returns the two-tailed t-distribution value. Returns the right-tailed t-distribution. Returns the left-tailed inverse of the t-distribution. Returns the two-tailed inverse of the t-distribution. Returns the probability associated with a t-test. Computes the variance of a set of data, assumed to be a sample. Computes the variance of a set of data, assumed to be an entire population. Returns the two-tailed p-value of a z-test. CHISQ.DIST(x, deg_freedom, cumulative) CHISQ.DIST.RT(x, deg_freedom, cumulative) CHISQ.TEST(actual_range, expected_range) CONFIDENCE.NORM(alpha, standard_dev, size) CONFIDENCE.T(alpha, standard_dev, size) CORREL(arrayl, array2) EXPON.DIST(x, lambda, cumulative) F.DIST(x. deg_freedom1, deg_freedom2, cumulative) F.DIST.RT(x. deg_freedom1, deg_freedom2, cumulative) FORECAST(x, known_y's, known_x's) GROWTH(known_y's, known_x's, new_x's, constant) LINEST(known_y's, known_x's, new_x's, constant, stats) LOGNORM.DIST(x, mean, standard_deviation) MEDIAN(data range) MODE.MULT(data range) MODE.SNGL(data range) NORM.DIST(x, mean, standard_dev, cumulative) NORM.INV(probability, mean, standard_dev) NORM.S.DIST(z) NORM.S.INV(probability) PERCENTILE.EXC(array, k) PERCENTILE.INC(array, k) POISSON.DIST(x, mean, cumulative) QUARTILE(array, quart) SKEW(data range) STANDARDIZE(x, mean, standard_deviation) STDEV.S(data range) STDEV.P(data range) TREND(known_y's, known_x's, new_x's, constant) T.DIST(x, deg_freedom, cumulative) T.DIST.2T(x, deg_freedom) T.DIST.RT(x, deg_freedom) T.INV(probability, deg_freedom) T.INV.2T(probability, deg_freedom) T.TEST(arrayl, array2, tails, type) VAR.S(data range) VAR.P(data range) Z.TEST(array, x, sigma) This page intentionally left blank Fifth Edition STATISTICS, DATA ANALYSIS, AND DECISION MODELING James R. Evans University of Cincinnati Boston Columbus Indianapolis New York San Francisco Upper Saddle River Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto Delhi Mexico City São Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo Editorial Director: Sally Yagan Editor in Chief: Donna Battista Senior Acquisitions Editor: Chuck Synovec Editorial Project Manager: Mary Kate Murray Editorial Assistant: Ashlee Bradbury Director of Marketing: Maggie Moylan Executive Marketing Manager: Anne Fahlgren Production Project Manager: Renata Butera Operations Specialist: Renata Butera Creative Art Director: Jayne Conte Cover Designer: Suzanne Duda Manager, Rights and Permissions: Hessa Albader Cover Art: pedrosek/Shutterstock Images Media Project Manager: John Cassar Media Editor: Sarah Peterson Full-Service Project Management: Shylaja Gatttupalli Composition: Jouve India Pvt Ltd Printer/Binder: Edwards Brothers Cover Printer: Lehigh-Phoenix Color/Hagerstown Text Font: Palatino Credits and acknowledgments borrowed from other sources and reproduced, with permission, in this textbook appear on the appropriate page within text. Microsoft and/or its respective suppliers make no representations about the suitability of the information contained in the documents and related graphics published as part of the services for any purpose. All such documents and related graphics are provided “as is” without warranty of any kind. Microsoft and/ or its respective suppliers hereby disclaim all warranties and conditions with regard to this information, including all warranties and conditions of merchantability, whether express, implied or statutory, fitness for a particular purpose, title and non-infringement. In no event shall Microsoft and/or its respective suppliers be liable for any special, indirect or consequential damages or any damages whatsoever resulting from loss of use, data or profits, whether in an action of contract, negligence or other tortious action, arising out of or in connection with the use or performance of information available from the services. The documents and related graphics contained herein could include technical inaccuracies or typographical errors. Changes are periodically added to the information herein. Microsoft and/or its respective suppliers may make improvements and/or changes in the product(s) and/or the program(s) described herein at any time. Partial screen shots may be viewed in full within the software version specified. Microsoft® and Windows® are registered trademarks of the Microsoft Corporation in the U.S.A. and other countries. This book is not sponsored or endorsed by or affiliated with the Microsoft Corporation. Copyright © 2013, 2010, 2007, 2003, 2000 by Pearson Education, Inc., publishing as Prentice Hall. All rights reserved. Manufactured in the United States of America. This publication is protected by Copyright, and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission(s) to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street, Upper Saddle River, New Jersey 07458, or you may fax your request to 201-236-3290. Many of the designations by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed in initial caps or all caps. Library of Congress Cataloging-in-Publication Data Evans, James R. (James Robert) Statistics, data analysis, and decision modeling / James R. Evans. —5th ed. p. cm. ISBN-13: 978-0-13-274428-7 ISBN-10: 0-13-274428-7 1. Industrial management—Statistical methods. 2. Statistical decision. I. Title. HD30.215.E93 2012 658.4r033—dc23 2011039310 10 9 8 7 6 5 4 3 2 1 ISBN 10: 0-13-274428-7 ISBN 13: 978-0-13-274428-7 To Beverly, Kristin, and Lauren, the three special women in my life. —James R. Evans This page intentionally left blank BRIEF CONTENTS PART I Statistics and Data Analysis 1 Chapter 1 Chapter 2 Chapter 3 Chapter 4 Chapter 5 Chapter 6 Chapter 7 Chapter 8 Data and Business Decisions 3 Descriptive Statistics and Data Analysis 31 Probability Concepts and Distributions 65 Sampling and Estimation 99 Hypothesis Testing and Statistical Inference 138 Regression Analysis 172 Forecasting 213 Introduction to Statistical Quality Control 248 PART II Decision Modeling and Analysis 269 Chapter 9 Chapter 10 Chapter 11 Chapter 12 Chapter 13 Chapter 14 Building and Using Decision Models 271 Decision Models with Uncertainty and Risk 300 Decisions, Uncertainty, and Risk 343 Queues and Process Simulation Modeling 378 Linear Optimization 411 Integer, Nonlinear, and Advanced Optimization Methods 458 Appendix 509 Index 521 vii This page intentionally left blank CONTENTS Preface xxi Part I STATISTICS AND DATA ANALYSIS 1 Chapter 1 DATA AND BUSINESS DECISIONS 3 Introduction 4 Data in the Business Environment 4 Sources and Types of Data 6 Metrics and Data Classification 7 Statistical Thinking 11 Populations and Samples 12 Using Microsoft Excel 13 Basic Excel Skills 14 Skill‐Builder Exercise 1.1 14 Copying Formulas and Cell References 14 Skill‐Builder Exercise 1.2 15 Functions 16 Skill‐Builder Exercise 1.3 18 Other Useful Excel Tips 18 Excel Add‐Ins 19 Skill‐Builder Exercise 1.4 20 Displaying Data with Excel Charts 21 Column and Bar Charts 21 Skill‐Builder Exercise 1.5 22 Line Charts 23 Skill‐Builder Exercise 1.6 23 Pie Charts 23 Skill‐Builder Exercise 1.7 23 Area Charts 24 Scatter Diagrams 24 Skill‐Builder Exercise 1.8 24 Miscellaneous Excel Charts 25 Ethics and Data Presentation 25 Skill‐Builder Exercise 1.9 26 Basic Concepts Review Questions 27 Problems and Applications 27 Case: A Data Collection and Analysis Project 28 ix x Contents Chapter 2 DESCRIPTIVE STATISTICS AND DATA ANALYSIS 31 Introduction 32 Descriptive Statistics 32 Frequency Distributions, Histograms, and Data Profiles 33 Categorical Data 34 Numerical Data 34 Skill‐Builder Exercise 2.1 38 Skill‐Builder Exercise 2.2 38 Data Profiles 38 Descriptive Statistics for Numerical Data 39 Measures of Location 39 Measures of Dispersion 40 Skill‐Builder Exercise 2.3 42 Measures of Shape 43 Excel Descriptive Statistics Tool 44 Skill‐Builder Exercise 2.4 44 Measures of Association 45 Skill‐Builder Exercise 2.5 47 Descriptive Statistics for Categorical Data 47 Skill‐Builder Exercise 2.6 48 Visual Display of Statistical Measures 49 Box Plots 49 Dot‐Scale Diagrams 49 Skill‐Builder Exercise 2.7 49 Outliers 50 Data Analysis Using PivotTables 50 Skill‐Builder Exercise 2.8 53 Skill‐Builder Exercise 2.9 53 Basic Concepts Review Questions 54 Problems and Applications 54 Case: The Malcolm Baldrige Award 57 Skill‐Builder Exercise 2.10 59 Skill‐Builder Exercise 2.11 60 Chapter 3 PROBABILITY CONCEPTS AND DISTRIBUTIONS 65 Introduction 66 Basic Concepts of Probability 66 Basic Probability Rules and Formulas 67 Conditional Probability 68 Skill‐Builder Exercise 3.1 70 Random Variables and Probability Distributions 70 Discrete Probability Distributions 73 Expected Value and Variance of a Discrete Random Variable 74 Contents Skill‐Builder Exercise 3.2 75 Bernoulli Distribution 75 Binomial Distribution 75 Poisson Distribution 76 Skill‐Builder Exercise 3.3 78 Continuous Probability Distributions 78 Uniform Distribution 80 Normal Distribution 81 Skill‐Builder Exercise 3.4 84 Triangular Distribution 84 Exponential Distribution 85 Probability Distributions in PHStat 86 Other Useful Distributions 86 Joint and Marginal Probability Distributions 89 Basic Concepts Review Questions 90 Problems and Applications 90 Case: Probability Analysis for Quality Measurements 94 Chapter 4 SAMPLING AND ESTIMATION 99 Introduction 100 Statistical Sampling 100 Sample Design 100 Sampling Methods 101 Errors in Sampling 103 Random Sampling From Probability Distributions 103 Sampling From Discrete Probability Distributions 104 Skill‐Builder Exercise 4.1 105 Sampling From Common Probability Distributions 105 A Statistical Sampling Experiment in Finance 106 Skill‐Builder Exercise 4.2 106 Sampling Distributions and Sampling Error 107 Skill‐Builder Exercise 4.3 110 Applying the Sampling Distribution of the Mean 110 Sampling and Estimation 110 Point Estimates 111 Unbiased Estimators 112 Skill‐Builder Exercise 4.4 113 Interval Estimates 113 Confidence Intervals: Concepts and Applications 113 Confidence Interval for the Mean with Known Population Standard Deviation 114 Skill‐Builder Exercise 4.5 116 xi xii Contents Confidence Interval for the Mean with Unknown Population Standard Deviation 116 Confidence Interval for a Proportion 118 Confidence Intervals for the Variance and Standard Deviation 119 Confidence Interval for a Population Total 121 Using Confidence Intervals for Decision Making 122 Confidence Intervals and Sample Size 122 Prediction Intervals 124 Additional Types of Confidence Intervals 125 Differences Between Means, Independent Samples 125 Differences Between Means, Paired Samples 125 Differences Between Proportions 126 Basic Concepts Review Questions 126 Problems and Applications 126 Case: Analyzing a Customer Survey 129 Skill‐Builder Exercise 4.6 131 Skill‐Builder Exercise 4.7 132 Skill‐Builder Exercise 4.8 133 Skill‐Builder Exercise 4.9 133 Chapter 5 HYPOTHESIS TESTING AND STATISTICAL INFERENCE Introduction 139 Basic Concepts of Hypothesis Testing 139 Hypothesis Formulation 140 Significance Level 141 Decision Rules 142 Spreadsheet Support for Hypothesis Testing 145 One‐Sample Hypothesis Tests 145 One‐Sample Tests for Means 145 Using p‐Values 147 One‐Sample Tests for Proportions 148 One Sample Test for the Variance 150 Type II Errors and the Power of A Test 151 Skill‐Builder Exercise 5.1 153 Two‐Sample Hypothesis Tests 153 Two‐Sample Tests for Means 153 Two‐Sample Test for Means with Paired Samples 155 Two‐Sample Tests for Proportions 155 Hypothesis Tests and Confidence Intervals 156 Test for Equality of Variances 157 Skill‐Builder Exercise 5.2 158 Anova: Testing Differences of Several Means 158 Assumptions of ANOVA 160 Tukey–Kramer Multiple Comparison Procedure 160 138 Contents Chi‐Square Test for Independence 162 Skill‐Builder Exercise 5.3 164 Basic Concepts Review Questions 164 Problems and Applications 164 Case: HATCO, Inc. 167 Skill‐Builder Exercise 5.4 169 Chapter 6 REGRESSION ANALYSIS 172 Introduction 173 Simple Linear Regression 174 Skill‐Builder Exercise 6.1 175 Least‐Squares Regression 176 Skill‐Builder Exercise 6.2 178 A Practical Application of Simple Regression to Investment Risk 178 Simple Linear Regression in Excel 179 Skill‐Builder Exercise 6.3 180 Regression Statistics 180 Regression as Analysis of Variance 181 Testing Hypotheses for Regression Coefficients 181 Confidence Intervals for Regression Coefficients 182 Confidence and Prediction Intervals for X‐Values 182 Residual Analysis and Regression Assumptions 182 Standard Residuals 184 Skill‐Builder Exercise 6.4 184 Checking Assumptions 184 Multiple Linear Regression 186 Skill‐Builder Exercise 6.5 186 Interpreting Results from Multiple Linear Regression 188 Correlation and Multicollinearity 188 Building Good Regression Models 190 Stepwise Regression 193 Skill‐Builder Exercise 6.6 193 Best‐Subsets Regression 193 The Art of Model Building in Regression 194 Regression with Categorical Independent Variables 196 Categorical Variables with More Than Two Levels 199 Skill‐Builder Exercise 6.7 201 Regression Models with Nonlinear Terms 201 Skill‐Builder Exercise 6.8 202 Basic Concepts Review Questions 204 Problems and Applications 204 Case: Hatco 207 xiii xiv Contents Chapter 7 FORECASTING 213 Introduction 214 Qualitative and Judgmental Methods 214 Historical Analogy 215 The Delphi Method 215 Indicators and Indexes for Forecasting 215 Statistical Forecasting Models 216 Forecasting Models for Stationary Time Series 218 Moving Average Models 218 Error Metrics and Forecast Accuracy 220 Skill‐Builder Exercise 7.1 222 Exponential Smoothing Models 222 Skill‐Builder Exercise 7.2 224 Forecasting Models for Time Series with a Linear Trend 224 Regression‐Based Forecasting 224 Advanced Forecasting Models 225 Autoregressive Forecasting Models 226 Skill‐Builder Exercise 7.3 228 Forecasting Models with Seasonality 228 Incorporating Seasonality in Regression Models 229 Skill‐Builder Exercise 7.4 231 Forecasting Models with Trend and Seasonality 231 Regression Forecasting with Causal Variables 231 Choosing and Optimizing Forecasting Models Using CB Predictor 233 Skill‐Builder Exercise 7.5 235 The Practice of Forecasting 238 Basic Concepts Review Questions 239 Problems and Applications 240 Case: Energy Forecasting 241 Chapter 8 INTRODUCTION TO STATISTICAL QUALITY CONTROL Introduction 248 The Role of Statistics and Data Analysis in Quality Control 249 Statistical Process Control 250 Control Charts 250 x ‐ and R‐Charts 251 Skill‐Builder Exercise 8.1 256 Analyzing Control Charts 256 Sudden Shift in the Process Average 257 Cycles 257 Trends 257 248 Contents Hugging the Center Line 257 Hugging the Control Limits 258 Skill‐Builder Exercise 8.2 258 Skill‐Builder Exercise 8.3 260 Control Charts for Attributes 260 Variable Sample Size 262 Skill‐Builder Exercise 8.4 264 Process Capability Analysis 264 Skill‐Builder Exercise 8.5 266 Basic Concepts Review Questions 266 Problems and Applications 266 Case: Quality Control Analysis 267 Part II Decision Modeling and Analysis 269 Chapter 9 BUILDING AND USING DECISION MODELS 271 Introduction 271 Decision Models 272 Model Analysis 275 What‐If Analysis 275 Skill‐Builder Exercise 9.1 277 Skill‐Builder Exercise 9.2 278 Skill‐Builder Exercise 9.3 278 Model Optimization 278 Tools for Model Building 280 Logic and Business Principles 280 Skill‐Builder Exercise 9.4 281 Common Mathematical Functions 281 Data Fitting 282 Skill‐Builder Exercise 9.5 284 Spreadsheet Engineering 284 Skill‐Builder Exercise 9.6 285 Spreadsheet Modeling Examples 285 New Product Development 285 Skill‐Builder Exercise 9.7 287 Single Period Purchase Decisions 287 Overbooking Decisions 288 Project Management 289 Model Assumptions, Complexity, and Realism 291 Skill‐Builder Exercise 9.8 293 Basic Concepts Review Questions 293 Problems and Applications 294 Case: An Inventory Management Decision Model 297 xv xvi Contents Chapter 10 DECISION MODELS WITH UNCERTAINTY AND RISK 300 Introduction 301 Spreadsheet Models with Random Variables 301 Monte Carlo Simulation 302 Skill‐Builder Exercise 10.1 303 Monte Carlo Simulation Using Crystal Ball 303 Defining Uncertain Model Inputs 304 Running a Simulation 308 Saving Crystal Ball Runs 310 Analyzing Results 310 Skill‐Builder Exercise 10.2 314 Crystal Ball Charts 315 Crystal Ball Reports and Data Extraction 318 Crystal Ball Functions and Tools 318 Applications of Monte Carlo Simulation and Crystal Ball Features 319 Newsvendor Model: Fitting Input Distributions, Decision Table Tool, and Custom Distribution 319 Skill‐Builder Exercise 10.3 323 Skill‐Builder Exercise 10.4 324 Overbooking Model: Crystal Ball Functions 324 Skill‐Builder Exercise 10.5 325 Cash Budgeting: Correlated Assumptions 325 New Product Introduction: Tornado Chart Tool 328 Skill‐Builder Exercise 10.6 329 Project Management: Alternate Input Parameters and the Bootstrap Tool 329 Skill‐Builder Exercise 10.7 334 Basic Concepts Review Questions 334 Problems and Applications 335 Case: J&G Bank 338 Chapter 11 DECISIONS, UNCERTAINTY, AND RISK 343 Introduction 344 Decision Making Under Certainty 344 Decisions Involving a Single Alternative 345 Skill‐Builder Exercise 11.1 345 Decisions Involving Non–mutually Exclusive Alternatives 345 Decisions Involving Mutually Exclusive Alternatives 346 Decisions Involving Uncertainty and Risk 347 Making Decisions with Uncertain Information 347 Decision Strategies for a Minimize Objective 348 Contents xvii Skill‐Builder Exercise 11.2 350 Decision Strategies for a Maximize Objective 350 Risk and Variability 351 Expected Value Decision Making 353 Analysis of Portfolio Risk 354 Skill‐Builder Exercise 11.3 356 The “Flaw of Averages” 356 Skill‐Builder Exercise 11.4 356 Decision Trees 357 A Pharmaceutical R&D Model 357 Decision Trees and Risk 358 Sensitivity Analysis in Decision Trees 360 Skill‐Builder Exercise 11.5 360 The Value of Information 360 Decisions with Sample Information 362 Conditional Probabilities and Bayes’s Rule 363 Utility and Decision Making 365 Skill‐Builder Exercise 11.6 368 Exponential Utility Functions 369 Skill‐Builder Exercise 11.7 370 Basic Concepts Review Questions 370 Problems and Applications 371 Case: The Sandwich Decision 375 Chapter 12 QUEUES AND PROCESS SIMULATION MODELING Introduction 378 Queues and Queuing Systems 379 Basic Concepts of Queuing Systems 379 Customer Characteristics 380 Service Characteristics 381 Queue Characteristics 381 System Configuration 381 Performance Measures 382 Analytical Queuing Models 382 Single‐Server Model 383 Skill‐Builder Exercise 12.1 384 Little’s Law 384 Process Simulation Concepts 385 Skill‐Builder Exercise 12.2 386 Process Simulation with SimQuick 386 Getting Started with SimQuick 387 A Queuing Simulation Model 388 378 xviii Contents Skill‐Builder Exercise 12.3 392 Queues in Series with Blocking 393 Grocery Store Checkout Model with Resources 394 Manufacturing Inspection Model with Decision Points 397 Pull System Supply Chain with Exit Schedules 400 Other SimQuick Features and Commercial Simulation Software 402 Continuous Simulation Modeling 403 Basic Concepts Review Questions 406 Problems and Applications 407 Case: Production/Inventory Planning 410 Chapter 13 LINEAR OPTIMIZATION 411 Introduction 411 Building Linear Optimization Models 412 Characteristics of Linear Optimization Models 415 Implementing Linear Optimization Models on Spreadsheets 416 Excel Functions to Avoid in Modeling Linear Programs 417 Solving Linear Optimization Models 418 Solving the SSC Model Using Standard Solver 418 Solving the SSC Model Using Premium Solver 420 Solver Outcomes and Solution Messages 422 Interpreting Solver Reports 422 Skill‐Builder Exercise 13.1 426 How Solver Creates Names in Reports 427 Difficulties with Solver 427 Applications of Linear Optimization 427 Process Selection 429 Skill‐Builder Exercise 13.2 430 Blending 430 Skill‐Builder Exercise 13.3 432 Portfolio Investment 432 Skill‐Builder Exercise 13.4 433 Transportation Problem 433 Interpreting Reduced Costs 437 Multiperiod Production Planning 437 Skill‐Builder Exercise 13.5 439 Multiperiod Financial Planning 439 Skill‐Builder Exercise 13.6 440 A Model with Bounded Variables 440 A Production/Marketing Allocation Model 445 How Solver Works 449 Basic Concepts Review Questions 450 Contents Problems and Applications 450 Case: Haller’s Pub & Brewery 457 Chapter 14 INTEGER, NONLINEAR, AND ADVANCED OPTIMIZATION METHODS 458 Introduction 458 Integer Optimization Models 459 A Cutting Stock Problem 459 Solving Integer Optimization Models 460 Skill‐Builder Exercise 14.1 462 Integer Optimization Models with Binary Variables 463 Project Selection 463 Site Location Model 464 Skill‐Builder Exercise 14.2 467 Computer Configuration 467 Skill‐Builder Exercise 14.3 470 A Supply Chain Facility Location Model 470 Mixed Integer Optimization Models 471 Plant Location Model 471 A Model with Fixed Costs 473 Nonlinear Optimization 475 Hotel Pricing 475 Solving Nonlinear Optimization Models 477 Markowitz Portfolio Model 479 Skill‐Builder Exercise 14.4 482 Evolutionary Solver for Nonsmooth Optimization 482 Rectilinear Location Model 484 Skill‐Builder Exercise 14.5 484 Job Sequencing 485 Skill‐Builder Exercise 14.6 488 Risk Analysis and Optimization 488 Combining Optimization and Simulation 491 A Portfolio Allocation Model 491 Using OptQuest 492 Skill‐Builder Exercise 14.7 500 Basic Concepts Review Questions 500 Problems and Applications 500 Case: Tindall Bookstores 506 Appendix 509 Index 521 xix This page intentionally left blank PREFACE INTENDED AUDIENCE Statistics, Data Analysis, and Decision Modeling was written to meet the need for an introductory text that provides the fundamentals of business statistics and decision models/ optimization, focusing on practical applications of data analysis and decision modeling, all presented in a simple and straightforward fashion. The text consists of 14 chapters in two distinct parts. The first eight chapters deal with statistical and data analysis topics, while the remaining chapters deal with decision models and applications. Thus, the text may be used for: • MBA or undergraduate business programs that combine topics in business statistics and management science into a single, brief, quantitative methods • Business programs that teach statistics and management science in short, modular courses • Executive MBA programs • Graduate refresher courses for business statistics and management science NEW TO THIS EDITION The fifth edition of this text has been carefully revised to improve clarity and pedagogical features, and incorporate new and revised topics. Many significant changes have been made, which include the following: 1. Spreadsheet-based tools and applications are compatible with Microsoft Excel 2010, which is used throughout this edition. 2. Every chapter has been carefully revised to improve clarity. Many explanations of critical concepts have been enhanced using new business examples and data sets. The sequencing of several topics have been reorganized to improve their flow within the book. 3. Excel, PHStat, and other software notes have been moved to chapter appendixes so as not to disrupt the flow of the text. 4. “Skill-Builder” exercises, designed to provide experience with applying Excel, have been located in the text to facilitate immediate application of new concepts. 5. Data used in many problems have been changed, and new problems have been added. SUBSTANCE The danger in using quantitative methods does not generally lie in the inability to perform the requisite calculations, but rather in the lack of a fundamental understanding of why to use a procedure, how to use it correctly, and how to properly interpret results. A key focus of this text is conceptual understanding using simple and practical examples rather than a plug-and-chug or point-and-click mentality, as are often done in other texts, supplemented by appropriate theory. On the other hand, the text does not attempt to be an encyclopedia of detailed quantitative procedures, but focuses on useful concepts and tools for today's managers. To support the presentation of topics in business statistics and decision modeling, this text integrates fundamental theory and practical applications in a spreadsheet environment using Microsoft Excel 2010 and various spreadsheet add-ins, specifically: • PHStat, a collection of statistical tools that enhance the capabilities of Excel; published by Pearson Education xxi xxii Preface • Crystal Ball (including CBPredictor for forecasting and OptQuest for optimization), a powerful commercial package for risk analysis • TreePlan, a decision analysis add-in • SimQuick, an Excel-based application for process simulation, published by Pearson Education • Risk Solver Platform for Education, an Excel-based tool for risk analysis, simulation, and optimization These tools have been integrated throughout the text to simplify the presentations and implement tools and calculations so that more focus can be placed on interpretation and understanding the managerial implications of results. TO THE STUDENTS The Companion Website for this text (www.pearsonhighered.com/evans) contains the following: • Data files—download the data and model files used throughout the text in examples, problems, and exercises • PHStat—download of the software from Pearson • TreePlan—link to a free trial version • Risk Solver Platform for Education—link to a free trial version • Crystal Ball—link to a free trial version • SimQuick—link that will direct you to where you may purchase a standalone version of the software from Pearson • Subscription Content—a Companion Website Access Code is located on the back cover of this book. This code gives you access to the following software: • Risk Solver Platform for Education—link that will direct students to an upgrade version • Crystal Ball—link that will direct students to an upgrade version • SimQuick—link that will allow you to download the software from Pearson To redeem the subscription content: • Visit www.pearsonhighered.com/evans. • Click on the Companion Website link. • Click on the Subscription Content link. • First-time users will need to register, while returning users may log-in. • Once you are logged in you will be brought to a page which will inform you how to download the software from the corresponding software company's Web site. TO THE INSTRUCTORS To access instructor solutions files, please visit www.pearsonhighered.com/evans and choose the instructor resources option. A variety of instructor resources are available for instructors who register for our secure environment. The Instructor’s Solutions Manual files and PowerPoint presentation files for each chapter are available for download. As a registered faculty member, you can login directly to download resource files, and receive immediate access and instructions for installing Course Management content to your campus server. Need help? Our dedicated Technical Support team is ready to assist instructors with questions about the media supplements that accompany this text. Visit http://247.pearsoned.com/ for answers to frequently asked questions and toll-free user support phone numbers. Preface xxiii ACKNOWLEDGMENTS I would like to thank the following individuals who have provided reviews and insightful suggestions for this edition: Ardith Baker (Oral Roberts University), Geoffrey Barnes (University of Iowa), David H. Hartmann (University of Central Oklahoma), Anthony Narsing (Macon State College), Tony Zawilski (The George Washington University), and Dr. J. H. Sullivan (Mississippi State University). In addition, I thank the many students who over the years provided numerous suggestions, data sets and problem ideas, and insights into how to better present the material. Finally, appreciation goes to my editor Chuck Synovec; Mary Kate Murray, Editorial Project Manager; Ashlee Bradbury, Editorial Assistant; and the entire production staff at Pearson Education for their dedication in developing and producing this text. If you have any suggestions or corrections, please contact me via email at james. evans@uc.edu. James R. Evans University of Cincinnati This page intentionally left blank PART I Statistics and Data Analysis This page intentionally left blank Chapter 1 Data and Business Decisions ■ INTRODUCTION 4 ■ DATA IN THE BUSINESS ENVIRONMENT 4 ■ SOURCES AND TYPES OF DATA 6 n ■ n ■ ■ Metrics and Data Classification 7 STATISTICAL THINKING 11 Populations and Samples 12 USING MICROSOFT EXCEL 13 n Basic Excel Skills 14 n Copying Formulas and Cell References 14 n Functions 16 n Other Useful Excel Tips 18 n Excel Add‐Ins 19 DISPLAYING DATA WITH EXCEL CHARTS 21 n Column and Bar Charts 21 n Line Charts 23 n Pie Charts n Area Charts 24 n Scatter Diagrams 24 n Miscellaneous Excel Charts 25 n Ethics and Data Presentation 25 23 ■ BASIC CONCEPTS REVIEW QUESTIONS 27 ■ PROBLEMS AND APPLICATIONS 27 ■ CASE: A DATA COLLECTION AND ANALYSIS PROJECT 28 ■ APPENDIX 1.1: EXCEL AND PHSTAT NOTES 28 n A. Using the PHStat Stack Data and Unstack Data Tools 28 n B. Creating Charts in Excel 2010 29 3 4 Part I • Statistics and Data Analysis INTRODUCTION Since the dawn of the electronic age and the Internet, both individuals and organizations have had access to an enormous wealth of data and information. Data are numerical facts and figures that are collected through some type of measurement process. Information comes from analyzing data; that is, extracting meaning from data to support evaluation and decision making. Modern organizations—which include for‐profit businesses such as retailers, manufacturers, hotels, and airlines, as well as nonprofit organizations like hospitals, educational institutions, and government agencies—need good data to evaluate daily performance and to make critical strategic and operational decisions. The purpose of this book is to introduce you to statistical methods for analyzing data; ways of using data effectively to make informed decisions; and approaches for developing, analyzing, and solving models of decision problems. Part I of this book (Chapters 1–8) focuses on key issues of statistics and data analysis, and Part II (Chapters 9–14) introduces you to various types of decision models that rely on good data analysis. In this chapter, we discuss the roles of data analysis in business, discuss how data are used in evaluating business performance, introduce some fundamental issues of statistics and measurement, and introduce spreadsheets as a support tool for data analysis and decision modeling. DATA IN THE BUSINESS ENVIRONMENT Data are used in virtually every major function in business, government, health care, education, and other nonprofit organizations. For example: • Annual reports summarize data about companies’ profitability and market share both in numerical form and in charts and graphs to communicate with shareholders. • Accountants conduct audits and use statistical methods to determine whether figures reported on a firm’s balance sheet fairly represents the actual data by examining samples (that is, subsets) of accounting data, such as accounts receivable. • Financial analysts collect and analyze a variety of data to understand the contribution that a business provides to its shareholders. These typically include profitability, revenue growth, return on investment, asset utilization, operating margins, earnings per share, economic value added (EVA), shareholder value, and other relevant measures. • Marketing researchers collect and analyze data to evaluate consumer perceptions of new products. • Operations managers use data on production performance, manufacturing quality, delivery times, order accuracy, supplier performance, productivity, costs, and environmental compliance to manage their operations. • Human resource managers measure employee satisfaction, track turnover, training costs, employee satisfaction, turnover, market innovation, training effectiveness, and skills development. • Within the federal government, economists analyze unemployment rates, manufacturing capacity and global economic indicators to provide forecasts and trends. • Hospitals track many different clinical outcomes for regulatory compliance reporting and for their own analysis. • Schools analyze test performance and state boards of education use statistical performance data to allocate budgets to school districts. Data support a variety of company purposes, such as planning, reviewing company performance, improving operations, and comparing company performance with competitors’ or “best practices” benchmarks. Data that organizations use should focus on critical success factors that lead to competitive advantage. An example from the Chapter 1 • Data and Business Decisions Boeing Company shows the value of having good business data and analysis capabilities.1 In the early 1990s, Boeing’s assembly lines were morasses of inefficiency. A manual numbering system dating back to World War II bomber days was used to keep track of an airplane’s four million parts and 170 miles of wiring; changing a part on a 737’s landing gear meant renumbering 464 pages of drawings. Factory floors were covered with huge tubs of spare parts worth millions of dollars. In an attempt to grab market share from rival Airbus, the company discounted planes deeply and was buried by an onslaught of orders. The attempt to double production rates, coupled with implementation of a new production control system, resulted in Boeing being forced to shut down its 737 and 747 lines for 27 days in October 1997, leading to a $178 million loss and a shakeup of top management. Much of the blame was focused on Boeing’s financial practices and lack of real‐time financial data. With a new Chief Financial Officer and finance team, the company created a “control panel” of vital measures, such as materials costs, inventory turns, overtime, and defects, using a color‐coded spreadsheet. For the first time, Boeing was able to generate a series of charts showing which of its programs were creating value and which were destroying it. The results were eye‐opening and helped formulate a growth plan. As one manager noted, “The data will set you free.” Data also provide key inputs to decision models. A decision model is a logical or mathematical representation of a problem or business situation that can be developed from theory or observation. Decision models establish relationships between actions that decision makers might take and results that they might expect, thereby allowing the decision makers to predict what might happen based on the model. For instance, the manager of a grocery store might want to know how best to use price promotions, coupons, and advertising to increase sales. In the past, grocers have studied the relationship of sales volume to programs such as these by conducting controlled experiments to identify the relationship between actions and sales volumes.2 That is, they implement different combinations of price promotions, coupons, and advertising (the decision variables), and then observe the sales that result. Using the data from these experiments, we can develop a predictive model of sales as a function of these decision variables. Such a model might look like the following: Sales = a + b * Price + c * Coupons + d * Advertising + e * Price * Advertising where a, b, c, d, and e are constants that are estimated from the data. By setting levels for price, coupons, and advertising, the model estimates a level of sales. The manager can use the model to help identify effective pricing, promotion, and advertising strategies. Because of the ease with which data can be generated and transmitted today, managers, supervisors, and front‐line workers can easily be overwhelmed. Data need to be summarized in a quantitative or visual fashion. One of the most important tools for doing this is statistics, which David Hand, former president of the Royal Statistical Society in the UK, defines as both the science of uncertainty and the technology of extracting information from data.3 Statistics involve collecting, organizing, analyzing, interpreting, and presenting data. A statistic is a summary measure of data. You are undoubtedly familiar with the concept of statistics in daily life as reported in newspapers and the media; baseball batting averages, airline on‐time arrival performance, and economic statistics such as Consumer Price Index are just a few examples. We can easily google statistical information about investments and financial markets, college loans and home mortgage rates, survey results about national political issues, team and individual 1 Jerry Useem, “Boeing versus Boeing,” Fortune, October 2, 2000, 2 “Flanking in a Price War,” Interfaces, Vol. 19, No. 2, 1989, 1–12. 3 148–160. David Hand, “Statistics: An Overview,” in Miodrag Lovric, Ed., International Encyclopedia of Statistical Science, Springer Major Reference; http://www.springer.com/statistics/book/978-3-642-04897-5, p. 1504. 5 6 Part I • Statistics and Data Analysis sports performance, and well, just about anything. To paraphrase Apple, “There’s a stat for that!” Modern spreadsheet technology, such as Microsoft Excel, has made it quite easy to apply statistical tools to organize, analyze, and present data to make them more understandable. Most organizations have traditionally focused on financial and market information, such as profit, sales volume, and market share. Today, however, many organizations use a wide variety of measures that provide a comprehensive view of business performance. For example, the Malcolm Baldrige Award Criteria for Performance Excellence, which many organizations use as a high‐performance management framework, suggest that high‐performing organizations need to measure results in five basic categories: 1. Product and process outcomes, such as reliability, performance, defect levels, service errors, response times, productivity, production flexibility, setup times, time to market, waste stream reductions, innovation, emergency preparedness, strategic plan accomplishment, and supply chain effectiveness. 2. Customer‐focused outcomes, such as customer satisfaction and dissatisfaction, customer retention, complaints and complaint resolution, customer perceived value, and gains and losses of customers. 3. Workforce‐focused outcomes, such as workforce engagement and satisfaction, employee retention, absenteeism, turnover, safety, training effectiveness, and leadership development. 4. Leadership and governance outcomes, such as communication effectiveness, governance and accountability, environmental and regulatory compliance, ethical behavior, and organizational citizenship. 5. Financial and market outcomes. Financial outcomes might include revenue, profit and loss, net assets, cash‐to‐cash cycle time, earnings per share, and financial operations efficiency (collections, billings, receivables). Market outcomes might include market share, business growth, and new products and service introductions. Understanding key relationships among these types of measures can help organizations make better decisions. For example, Sears, Roebuck and Company provided a consulting group with 13 financial measures, hundreds of thousands of employee satisfaction data points, and millions of data points on customer satisfaction. Using advanced statistical tools, the analysts discovered that employee attitudes about the job and the company are key factors that predict their behavior with customers, which, in turn, predicts the likelihood of customer retention and recommendations, which, in turn, predict financial performance. As a result, Sears was able to predict that if a store increases its employee satisfaction score by five units, customer satisfaction scores will go up by two units and revenue growth will beat the stores’ national average by 0.5%.4 Such an analysis can help managers make decisions, for instance, on improved human resource policies. SOURCES AND TYPES OF DATA Data may come from a variety of sources: internal record‐keeping, special studies, and external databases. Internal data are routinely collected by accounting, marketing, and operations functions of a business. These might include production output, material costs, sales, accounts receivable, and customer demographics. Other data must be generated through special efforts. For example, customer satisfaction data are often acquired by mail, 4 “Bringing Sears into the New World,” Fortune, October 13, 1997, 183–184. Chapter 1 • Data and Business Decisions Internet, or telephone surveys; personal interviews; or focus groups. External databases are often used for comparative purposes, marketing projects, and economic analyses. These might include population trends, interest rates, industry performance, consumer spending, and international trade data. Such data can be found in annual reports, Standard & Poor’s Compustat data sets, industry trade associations, or government databases. One example of a comprehensive government database is FedStats (www .fedstats.gov), which has been available to the public since 1997. FedStats provides access to the full range of official statistical information produced by the Federal Government without having to know in advance which Federal agency produces which particular statistic. With convenient searching and linking capabilities to more than 100 agencies— which provide data and trend information on such topics as economic and population trends, crime, education, health care, aviation safety, energy use, farm production, and more—FedStats provides one location for access to the full breadth of Federal statistical information. The use of data for analysis and decision making certainly is not limited to business. Science, engineering, medicine, and sports, to name just a few, are examples of professions that rely heavily on data. Table 1.1 provides a list of data files that are available in the Statistics Data Files folder on the Companion Website accompanying this book. All are saved in Microsoft Excel workbooks. These data files will be used throughout this book to illustrate various issues associated with statistics and data analysis and also for many of the questions and problems at the end of the chapters. They show but a sample of the wide variety of applications for which statistics and data analysis techniques may be used. Metrics and Data Classification A metric is a unit of measurement that provides a way to objectively quantify performance. For example, senior managers might assess overall business performance using such metrics as net profit, return on investment, market share, and customer satisfaction. A supervisor in a manufacturing plant might monitor the quality of a production process for a polished faucet by visually inspecting the products and counting the number of surface defects. A useful metric would be the percentage of faucets that have surface defects. For a web‐based retailer, some useful metrics are the percentage of orders filled accurately and the time taken to fill a customer’s order. Measurement is the act of obtaining data associated with a metric. Measures are numerical values associated with a metric. Metrics can be either discrete or continuous. A discrete metric is one that is derived from counting something. For example, a part dimension is either within tolerance or out of tolerance; an order is complete or incomplete; or an invoice can have one, two, three, or any number of errors. Some discrete metrics associated with these examples would be the proportion of parts whose dimensions are within tolerance, the number of incomplete orders for each day, and the number of errors per invoice. Continuous metrics are based on a continuous scale of measurement. Any metrics involving dollars, length, time, volume, or weight, for example, are continuous. A key performance dimension might be measured using either a continuous or a discrete metric. For example, an airline flight is considered on time if it arrives no later than 15 minutes from the scheduled arrival time. We could evaluate on‐time performance by counting the number of flights that are late, or by measuring the number of minutes that flights are late. Discrete data are usually easier to capture and record, but provide less information than continuous data. However, one generally must collect a larger amount of discrete data to draw appropriate statistical conclusions as compared to continuous data. 7 8 Part I • Statistics and Data Analysis TABLE 1.1 Data Files Available on Companion Website Business and Economics Accounting Professionals Atlanta Airline Data Automobile Quality Baldrige Banking Data Beverage Sales Call Center Data Cell Phone Survey Cereal Data China Trade Data Closing Stock Prices Coal Consumption Coal Production Concert Sales Consumer Price Index Consumer Transportation Survey Credit Approval Decisions Customer Support Survey Customer Survey DJIA December Close EEO Employment Report Employees Salaries Energy Production & Consumption Federal Funds Rate Gas & Electric Gasoline Prices Gasoline Sales Google Stock Prices Hatco Hi‐Definition Televisions Home Market Value House Sales Housing Starts Insurance Survey Internet Usage Microprocessor Data Mortgage Rates New Account Processing New Car Sales Nuclear Power Prime Rate Quality Control Case Data Quality Measurements Refrigerators Residential Electricity Data Restaurant Sales Retail Electricity Prices Retirement Portfolio Room Inspection S&P 500 Salary Data Sales Data Sampling Error Experiment Science and Engineering Jobs State Unemployment Rates Statistical Quality Control Problems Surgery Infections Syringe Samples Treasury Yield Rates Unions and Labor Law Data University Grant Proposals Behavioral and Social Sciences Arizona Population Blood Pressure Burglaries California Census Data Census Education Data Church Contributions Colleges and Universities Death Cause Statistics Demographics Facebook Survey Freshman College Data Graduate School Survey Infant Mortality MBA Student Survey Ohio Education Performance Ohio Prison Population Self‐Esteem Smoking & Cancer Student Grades Vacation Survey Science and Engineering Pile Foundation Seattle Weather Surface Finish Washington, DC, Weather Sports Baseball Attendance Golfing Statistics Major League Baseball NASCAR Track Data National Football League Olympic Track and Field Data Chapter 1 • Data and Business Decisions FIGURE 1.1 Example of Cross‐Sectional, Univariate Data (Portion of Automobile Quality) When we deal with data, it is important to understand the type of data in order to select the appropriate statistical tool or procedure. One classification of data is the following: 1. Types of data • Cross‐sectional—data that are collected over a single period of time • Time series—data collected over time 2. Number of variables • Univariate—data consisting of a single variable • Multivariate—data consisting of two or more (often related) variables Figures 1.1–1.4 show examples of data sets from Table 1.1 representing each combination from this classification. Another classification of data is by the type of measurement scale. Failure to understand the differences in measurement scales can easily result in erroneous or misleading analysis. Data may be classified into four groups: 1. Categorical (nominal) data, which are sorted into categories according to specified characteristics. For example, a firm’s customers might be classified by their geographical region (North America, South America, Europe, and Pacific); employees might be classified as managers, supervisors, and associates. The categories bear no quantitative relationship to one another, but we usually assign an arbitrary number to each category to ease the process of managing the data and computing statistics. Categorical data are usually counted or expressed as proportions or percentages. FIGURE 1.2 Example of Cross‐Sectional, Multivariate Data (Portion of Banking Data) 9 10 Part I • Statistics and Data Analysis FIGURE 1.3 Example of Time‐Series, Univariate Data (Portion of Gasoline Prices) 2. Ordinal data, which are ordered or ranked according to some relationship to one another. A common example in business is data from survey scales; for example, rating a service as poor, average, good, very good, or excellent. Such data are categorical but also have a natural order, and consequently, are ordinal. Other examples include ranking regions according to sales levels each month and NCAA basketball rankings. Ordinal data are more meaningful than categorical data because data can be compared to one another (“excellent” is better than “very good”). However, like categorical data, statistics such as averages are meaningless even if numerical codes are associated with each category (such as your class rank), because ordinal data have no fixed units of measurement. In addition, meaningful numerical statements about differences between categories cannot be made. For example, the difference in strength between basketball teams ranked 1 and 2 is not necessarily the same as the difference between those ranked 2 and 3. 3. Interval data, which are ordered, have a specified measure of the distance between observations but have no natural zero. Common examples are time and temperature. Time is relative to global location, and calendars have arbitrary starting dates. Both the Fahrenheit and Celsius scales represent a specified measure of distance—degrees— but have no natural zero. Thus we cannot take meaningful ratios; for example, we cannot say that 50° is twice as hot as 25°. Another example is SAT or GMAT scores. The scores can be used to rank students, but only differences between scores provide information on how much better one student performed over another; ratios make little sense. In contrast to ordinal data, interval data allow meaningful comparison of ranges, averages, and other statistics. In business, data from survey scales, while technically ordinal, are often treated as interval data when numerical scales are associated with the categories (for instance, FIGURE 1.4 Example of Time‐Series, Multivariate Data (Portion of Treasury Yield Rates) Chapter 1 • Data and Business Decisions 1 = poor, 2 = average, 3 = good, 4 = very good, 5 = excellent). Strictly speaking, this is not correct, as the “distance” between categories may not be perceived as the same (respondents might perceive a larger distance between poor and average than between good and very good, for example). Nevertheless, many users of survey data treat them as interval when analyzing the data, particularly when only a numerical scale is used without descriptive labels. 4. Ratio data, which have a natural zero. For example, dollar has an absolute zero. Ratios of dollar figures are meaningful. Thus, knowing that the Seattle region sold $12 million in March while the Tampa region sold $6 million means that Seattle sold twice as much as Tampa. Most business and economic data fall into this category, and statistical methods are the most widely applicable to them. This classification is hierarchical in that each level includes all of the information content of the one preceding it. For example, ratio information can be converted to any of the other types of data. Interval information can be converted to ordinal or categorical data but cannot be converted to ratio data without the knowledge of the absolute zero point. Thus, a ratio scale is the strongest form of measurement. The managerial implications of this classification are in understanding the choice and validity of the statistical measures used. For example, consider the following statements: • Sales occurred in March (categorical). • Sales were higher in March than in February (ordinal). • Sales increased by $50,000 in March over February (interval). • Sales were 20% higher in March than in February (ratio). A higher level of measurement is more useful to a manager because more definitive information describes the data. Obtaining ratio data can be more expensive than categorical data, especially when surveying customers, but it may be needed for proper analysis. Thus, before data are collected, consideration must be given to the type of data needed. STATISTICAL THINKING The importance of applying statistical concepts to make good business decisions and improve performance cannot be overemphasized. Statistical thinking is a philosophy of learning and action for improvement that is based on the following principles: • All work occurs in a system of interconnected processes. • Variation exists in all processes. • Better performance results from understanding and reducing variation.5 Work gets done in any organization through processes—systematic ways of doing things that achieve desired results. Understanding processes provides the context for determining the effects of variation and the proper type of action to be taken. Any process contains many sources of variation. In manufacturing, for example, different batches of material vary in strength, thickness, or moisture content. Cutting tools have inherent variation in their strength and composition. During manufacturing, tools experience wear, vibrations cause changes in machine settings, and electrical fluctuations cause variations in power. Workers may not position parts on fixtures consistently, and physical and emotional stress may affect workers’ consistency. In addition, measurement gauges and human inspection capabilities are not uniform, resulting in variation in measurements even when there is little variation in the true value. Similar phenomena occur in 5 Galen Britz, Don Emerling, Lynne Hare, Roger Hoerl, and Janice Shade, “How to Teach Others to Apply Statistical Thinking,” Quality Progress, June 1997, 67–79. 11 12 Part I • Statistics and Data Analysis service processes because of variation in employee and customer behavior, application of technology, and so on. While variation exists everywhere, many managers do not often recognize it or consider it in their decisions. For example, if sales in some region fell from the previous year, the regional manager might quickly blame her sales staff for not working hard, even though the drop in sales may simply be the result of uncontrollable variation. How often do managers make decisions based on one or two data points without looking at the pattern of variation, see trends when they do not exist, or try to manipulate financial results they cannot truly control? Unfortunately, the answer is “quite often.” Usually, it is simply a matter of ignorance of how to deal with data and information. A more educated approach would be to formulate a theory, test this theory in some way, either by collecting and analyzing data or developing a model of the situation. Using statistical thinking can provide better insight into the facts and nature of relationships among the many factors that may have contributed to the event and enable managers to make better decisions. In recent years, many organizations have implemented Six Sigma initiatives. Six Sigma can be best described as a business process improvement approach that seeks to find and eliminate causes of defects and errors, reduce cycle times and cost of operations, improve productivity, better meet customer expectations, and achieve higher asset use and returns on investment in manufacturing and service processes. The term six sigma is actually based on a statistical measure that equates to 3.4 or fewer errors or defects per million opportunities. Six Sigma is based on a simple problem‐solving methodology—DMAIC, which stands for Define, Measure, Analyze, Improve, and Control—that incorporates a wide variety of statistical and other types of process improvement tools. Six Sigma has heightened the awareness and application of statistics among business professionals at all levels in organizations, and the material in this book will provide the foundation for more advanced topics commonly found in Six Sigma training courses. Populations and Samples One of the most basic applications of statistics is drawing conclusions about populations from sample data. A population consists of all items of interest for a particular decision or investigation, for example, all married drivers over the age of 25 in the United States, all first‐year MBA students at a college, or all stockholders of Google. It is important to understand that a population can be anything we define it to be, such as all customers who have purchased from Amazon over the past year or individuals who do not own a cell phone. A company like Amazon keeps extensive records on its customers, making it easy to retrieve data about the entire population of customers with prior purchases. However, it would probably be impossible to identify all individuals who do not own cell phones. A population may also be an existing collection of items (for instance, all teams in the National Football League) or the potential, but unknown, output of a process (such as automobile engines produced on an assembly line). A sample is a subset of a population. For example, a list of individuals who purchased a CD from Amazon in the past year would be a sample from the population of all customers who purchased from the company. Whether this sample is representative of the population of customers—which depends on how the sample data are intended to be used—may be debatable; nevertheless, it is a sample. Sampling is desirable when complete information about a population is difficult or impossible to obtain. For example, it may be too expensive to send all previous customers a survey. In other situations, such as measuring the amount of stress needed to destroy an automotive tire, samples are necessary even though the entire population may be sitting in a warehouse. Most of Chapter 1 • Data and Business Decisions the data files in Table 1.1 represent samples, although some, like the major league baseball data, represent populations. We use samples because it is often not possible or cost‐effective to gather population data. We are all familiar with survey samples of voters prior to and during elections. A small subset of potential voters, if properly chosen on a statistical basis, can provide accurate estimates of the behavior of the voting population. Thus, television network anchors can announce the winners of elections based on a small percentage of voters before all votes can be counted. Samples are routinely used for business and public opinion polls—magazines such as Business Week and Fortune often report the results of surveys of executive opinions on the economy and other issues. Many businesses rely heavily on sampling. Producers of consumer products conduct small‐scale market research surveys to evaluate consumer response to new products before full‐scale production, and auditors use sampling as an important part of audit procedures. In 2000, the U.S. Census began using statistical sampling for estimating population characteristics, which resulted in considerable controversy and debate. Statistics are summary measures of population characteristics computed from samples. In business, statistical methods are used to present data in a concise and understandable fashion, to estimate population characteristics, to draw conclusions about populations from sample data, and to develop useful decision models for prediction and forecasting. For example, in the 2010 J.D. Power and Associates’ Initial Quality Study, Porsche led the industry with a reported 83 problems per 100 vehicles. The number 83 is a statistic based on a sample that summarizes the total number of problems reported per 100 vehicles and suggests that the entire population of Porsche owners averaged less than one problem (83/100 or 0.83) in their first 90 days of ownership. However, a particular automobile owner may have experienced zero, one, two, or perhaps more problems. The process of collection, organization, and description of data is commonly called descriptive statistics. Statistical inference refers to the process of drawing conclusions about unknown characteristics of a population based on sample data. Finally, predictive statistics—developing predictions of future values based on historical data—is the third major component of statistical methodology. In subsequent chapters, we will cover each of these types of statistical methodology. USING MICROSOFT EXCEL Spreadsheet software for personal computers has become an indispensable tool for business analysis, particularly for the manipulation of numerical data and the development and analysis of decision models. In this text, we will use Microsoft Excel 2010 for Windows to perform spreadsheet calculations and analyses. Some key differences exist between Excel 2010 and Excel 2007. We will often contrast these differences, but if you use an older version, you should be able to apply Excel easily to problems and exercises. In addition, we note that Mac versions of Excel do not have the full functionality that Windows versions have. Although Excel has some flaws and limitations from a statistical perspective, its widespread availability makes it the software of choice for many business professionals. We do wish to point out, however, that better and more powerful statistical software packages are available, and serious users of statistics should consult a professional statistician for advice on selecting the proper software. We will briefly review some of the fundamental skills needed to use Excel for this book. This is not meant to be a complete tutorial; many good Excel tutorials can be found online, and we also encourage you to use the Excel help capability (by clicking the question mark button at the top right of the screen). 13 14 Part I • Statistics and Data Analysis Basic Excel Skills To be able to apply the procedures and techniques we will study in this book, it is necessary for you to know many of the basic capabilities of Excel. We will assume that you are familiar with the most elementary spreadsheet concepts and procedures: • Opening, saving, and printing files • Moving around a spreadsheet • Selecting ranges • Inserting/deleting rows and columns • Entering and editing text, numerical data, and formulas • Formatting data (number, currency, decimal places, etc.) • Working with text strings • Performing basic arithmetic calculations • Formatting data and text • Modifying the appearance of the spreadsheet • Sorting data Excel has extensive online help, and many good manuals and training guides are available both in print and online, and we urge you to take advantage of these. However, to facilitate your understanding and ability, we will review some of the more important topics in Excel with which you may or may not be familiar. Other tools and procedures in Excel that are useful in statistics, data analysis, or decision modeling will be introduced as we need them. SKILL‐BUILDER EXERCISE 1.1 Sort the data in the Excel file Automobile Quality from lowest to highest number of problems per 100 vehicles using the sort capability in Excel. Menus and commands in Excel 2010 reside in the “ribbon” shown in Figure 1.5. Menus and commands are arranged in logical groups under different tabs (File, Home, Insert, and so on); small triangles pointing downward indicate menus of additional choices. We will often refer to certain commands or options and where they may be found in the ribbon. Copying Formulas and Cell References Excel provides several ways of copying formulas to different cells. This is extremely useful in building decision models, because many models require replication of formulas for different periods of time, similar products, and so on. One way is to select the cell with the formula to be copied, click the Copy button from the Clipboard group under the Home tab (or simply press Ctrl‐C on your keyboard), click on the cell you wish to FIGURE 1.5 Excel 2010 Ribbon Chapter 1 • Data and Business Decisions FIGURE 1.6 Copying Formulas by Dragging copy to, and then click the Paste button (or press Ctrl‐V). You may also enter a formula directly in a range of cells without copying and pasting by selecting the range, typing in the formula, and pressing Ctrl‐Enter. To copy a formula from a single cell or range of cells down a column or across a row, first select the cell or range, then click and hold the mouse on the small square in the lower right‐hand corner of the cell (the “fill handle”), and drag the formula to the “target” cells you wish to copy to. To illustrate this technique, suppose we wish to compute the differences in projected employment for each occupation in the Excel file Science and Engineering Jobs. In Figure 1.6, we have added a column for the difference and entered the formula = C10‐B10 in the first row. Highlight cell D4 and then simply drag the handle down the column. Figure 1.7 shows the results. SKILL‐BUILDER EXERCISE 1.2 Modify the Excel file Science and Engineering Jobs to compute the percent increase in the number of jobs for each occupational category. In any of these procedures, the structure of the formula is the same as in the original cell, but the cell references have been changed to reflect the relative addresses of the formula in the new cells. That is, the new cell references have the same relative relationship to the new formula cell(s) as they did in the original formula cell. Thus, if a formula is copied (or moved) one cell to the right, the relative cell addresses will have their column label increased by one; if we copy or move the formula two cells down, the row FIGURE 1.7 Results of Dragging Formulas 15 16 Part I • Statistics and Data Analysis FIGURE 1.8 Formulas for Science and Engineering Jobs Worksheet number is increased by 2. Figure 1.8 shows the formulas for the Science and Engineering Jobs spreadsheet example. For example, note that the formulas in each row are the same, except for the column reference. Sometimes, however, you do not want to change the relative addressing because you would like all the copied formulas to point to a certain cell. We do this by using a $ before the column and/or row address of the cell. This is called an absolute address. For example, suppose we wish to compute the percent of the total for each occupation for 2010. In cell E4, enter the formula = C4/$C$12. Then, if we copy this formula down column E for other months, the numerator will change to reference each occupation, but the denominator will still point to cell C12 (see Figure 1.9). You should be very careful to use relative and absolute addressing appropriately in your models. Functions Functions are used to perform special calculations in cells. Some of the more common functions that we will use in statistical applications include the following: MIN(range)—finds the smallest value in a range of cells MAX(range)—finds the largest value in a range of cells SUM(range)—finds the sum of values in a range of cells AVERAGE(range)—finds the average of the values in a range of cells COUNT(range)—finds the number of cells in a range that contain numbers COUNTIF(range, criteria)—finds the number of cells within a range that meet specified criteria Other more advanced functions often used in decision models are listed below: AND(condition 1, condition 2…)—a logical function that returns TRUE if all conditions are true, and FALSE if not FIGURE 1.9 Example of Absolute Address Referencing Chapter 1 • Data and Business Decisions OR(condition 1, condition 2…)—a logical function that returns TRUE if any condition is true, and FALSE if not IF(condition, value if true, value if false)—a logical function that returns one value if the condition is true and another if the condition is false VLOOKUP(value, table range, column number)—looks up a value in a table Excel has a wide variety of other functions for statistical, financial, and other applications, many of which we will use throughout the text. The easiest way to locate a particular function is to select a cell and click on the Insert function button [ fx], which can be found under the ribbon next to the formula bar and also in the Function Library group in the Formulas tab. This is particularly useful even if you know what function to use but you are not sure of what arguments to enter. Figure 1.10 shows the dialog box from which you may select the function you wish to use, in this case, the AVERAGE function. Once this is selected, the dialog box in Figure 1.11 appears. When you click in an input cell, a description of the argument is shown. Thus, if you were not sure what to enter for the argument number 1, the explanation in Figure 1.11 will help you. For further information, you could click on the Help on this function link button in the lower left‐hand corner. The IF function, IF(condition, value if true, value if false), allows you to choose one of two values to enter into a cell. If the specified condition is true, value A will be put in the cell. If the condition is false, value B will be entered. For example, if cell C2 contains the function = IF(A8 = 2,7,12), it states that if the value in cell A8 is 2, the number 7 will be assigned to cell C2; if the value in cell A8 is not 2, the number 12 will be assigned to cell C2. “Conditions” may include the following: = 7 6 equal to greater than less than FIGURE 1.10 Insert Function Dialog 17 18 Part I • Statistics and Data Analysis FIGURE 1.11 7= 6= Function Arguments Dialog for Average greater than or equal to less than or equal to not equal to You may “nest” up to seven IF functions by replacing value‐if‐true or value‐if‐false in an IF function with another IF function, for example: = IF(A8 = 2,(IF(B3 = 5, “YES”,””)),15) This says that if cell A8 equals 2, then check the contents of cell B3. If cell B3 is 5, then the value of the function is the text string YES; if not, it is a blank space (a text string that is blank). However, if cell A8 is not 2, then the value of the function is 15 no matter what cell B3 is. You may use AND and OR functions as the condition within an IF function, for example: = IF(AND(B1 = 3,C1 = 5),12,22). Here, if cell B1 = 3 and cell C1 = 5, then the value of the function is 12, otherwise it is 22. SKILL‐BUILDER EXERCISE 1.3 In the Excel file Residential Electricity Data, use Excel functions to find the maximum, minimum, and total for the Number of Consumers and Average Monthly Consumption for all census divisions. Other Useful Excel Tips • Split Screen. You may split the worksheet horizontally and/or vertically to view different parts of the worksheet at the same time. The vertical splitter bar is just to the right of the bottom scroll bar, and the horizontal splitter bar is just above the right‐hand scroll bar. Position your cursor over one of these until it changes shape, click, and drag the splitter bar to the left or down. • Paste Special. When you normally copy (one or more) cells and paste them in a worksheet, Excel places an exact copy of the formulas or data in the cells (except for relative addressing). Often you simply want the result of formulas, so the data will remain constant even if other parameters used in the formulas change. To do this, use the Paste Special option found within the Paste menu in the Clipboard Chapter 1 • Data and Business Decisions group under the Home tab instead of the Paste command. Choosing Paste Values will paste the result of the formulas from which the data were calculated. • Column and Row Widths. Many times a cell contains a number that is too large to display properly because the column width is too small. You may change the column width to fit the largest value or text string anywhere in the column by positioning the cursor to the right of the column label so that it changes to a cross with horizontal arrows, and then double‐click. You may also move the arrow to the left or right to manually change the column width. You may change the row heights in a similar fashion by moving the cursor below the row number label. This can be especially useful if you have a very long formula to display. To break a formula within a cell, position the cursor at the break point in the formula bar and press Alt‐Enter. • Displaying Formulas in Worksheets. Choose Show Formulas in the Formula Auditing group under the Formulas tab. You will probably need to change the column width to display the formulas properly. • Displaying Grid Lines and Row and Column Headers for Printing. Check the Print boxes for gridlines and headings in the Sheet Options group under the Page Layout tab. Note that the Print command can be found by clicking on the Office button. • Filling a Range with a Series of Numbers. Suppose you want to build a worksheet for entering 100 data values. It would be tedious to have to enter the numbers from 1 to 100 one at a time. Simply fill in the first few values in the series and highlight them. Now click and drag the small square (fill handle) in the lower right‐hand corner down (Excel will show a small pop‐up window that tells you the last value in the range) until you have filled in the column to 100; then release the mouse. Excel Add‐Ins Microsoft Excel will provide most of the computational support required for the material in this book. Excel (Windows only) provides an add‐in called the Analysis Toolpak, which contains a variety of tools for statistical computation, and Solver, which is used for optimization. These add‐ins are not included in a standard Excel installation. To install them in Excel 2010, click the File tab and then Options in the left column. Choose Add‐Ins from the left column. At the bottom of the dialog, make sure Excel Add‐ins is selected in the Manage: box and click Go. In the Add‐Ins dialog, if Analysis Toolpak, Analysis Toolpak VBA, and Solver Add‐in are not checked, simply check the boxes and click OK. You will not have to repeat this procedure every time you run Excel in the future. Four other add‐ins available with this book provide additional capabilities and features not found in Excel and will be used in various chapters in this book. Prentice‐Hall’s PHStat2 (which we will simply refer to as PHStat) add‐in provides useful statistical support that extends the capabilities of Excel.6 Refer to the installation procedures on the Companion Website. PHStat will be used in Chapters 1–8 and in Chapter 11. The student version of Crystal Ball provides a comprehensive set of tools for performing risk analysis simulations. Crystal Ball will be used in Chapter 10. TreePlan provides Excel support for decision trees and will be used in Chapter 11. Finally, Frontline Systems’ Risk Solver Platform7 provides a replacement (called Premium Solver) for the default Solver in Excel and will be used in Chapters 13 and 14. The Companion Website also includes an Excel workbook, SimQuick‐v2.xls, which will be used for process simulation in Chapter 12. 6 The latest version of PHStat, PHStat2, is included on the Companion Website. Enhanced versions and updates may be published on the PHStat Web site at www.prenhall.com/phstat. To date, PHStat is not available for Mac. 7 Risk Solver Platform is a full‐featured package that contains many other tools similar to other add‐ins we use in this book. However, we will use only the Premium Solver component. 19 20 Part I • Statistics and Data Analysis FIGURE 1.12 Spreadsheet Note Excel Worksheet Process Capability Throughout this book, we will provide many notes that describe how to use specific features of Microsoft Excel, PHStat, or other add‐ins. These are summarized in chapter appendixes and are noted in the text by a margin icon when they will be useful to supplement examples and discussions of applications. It is important to read these notes and apply the procedures described in them in order to gain a working knowledge of the software features to which they refer. We will illustrate the use of one of the PHStat procedures. In many cases, data on Excel worksheets may not be in the proper form to use a statistical tool. Figure 1.12, for instance, shows the worksheet Process Capability from the Excel file Quality Measurements, which we use for a case problem later in this book. Some tools in the Analysis Toolpak require that the data be organized in a single column in the worksheet. As a user, you have two choices. You can manually move the data within the worksheet, or you can use a utility from the Data Preparation menu in PHStat called Stack Data (see the note Using the Stack Data and Unstack Data Tools in the Appendix to this chapter). The tool creates a new worksheet called “Stacked” in your Excel workbook, a portion of which is shown in Figure 1.13. If the original data columns have group labels (headers), then the column labeled “Group” will show them; otherwise, as in this example, the columns are simply labeled as Group1, Group2, and so on. In this example, Group1 refers to the data in the first column. If you apply the Unstack Data tool to the data in Figure 1.13, you will put the data in its original form. SKILL‐BUILDER EXERCISE 1.4 Use the PHStat Stack tool to stack the sample observations for the first shift in the Excel file Syringe Samples. Then, modify the Excel file Automobile Quality to label each car brand as either Foreign or Domestic, use the PHStat Unstack tool to group them. Chapter 1 • Data and Business Decisions 21 FIGURE 1.13 Portion of Stacked Worksheet DISPLAYING DATA WITH EXCEL CHARTS The Excel file EEO Employment Report provides data on the employment in the state of Alabama for 2006. Figure 1.14 shows a portion of this data set. Raw data such as these are often difficult to understand and interpret. Graphs and charts provide a convenient way to visualize data and provide information and insight for making better decisions. Microsoft Excel provides an easy way to create charts within your spreadsheet (see the section on Creating Charts in Excel in Appendix 1.1). These include vertical and horizontal bar charts, line charts, pie charts, area charts, scatter plots, three‐ dimensional charts, and many other special types of charts. We generally will not guide you through every application but will provide some guidance for new procedures as appropriate. Spreadsheet Note Column and Bar Charts Excel distinguishes between vertical and horizontal bar charts, calling the former column charts and the latter bar charts. A clustered column chart compares values across categories using vertical rectangles; a stacked column chart displays the contribution of each value to the total by stacking the rectangles; and a 100% stacked column chart compares the percentage that each value contributes to a total. An example of a clustered column chart is FIGURE 1.14 Portion of EEO Commission Employment Report 22 Part I • Statistics and Data Analysis shown in Figure 1.15 for the Alabama employment data shown previously; Figure 1.16 shows a stacked column chart for the same data. Bar charts present information in a similar fashion, only horizontally instead of vertically. SKILL‐BUILDER EXERCISE 1.5 Create the column chart shown in Figure 1.15 for the EEO Employment Report data. FIGURE 1.15 Column Chart for Alabama Employment Data FIGURE 1.16 Stacked Column Chart Chapter 1 • Data and Business Decisions FIGURE 1.17 Line Chart for U.S. to China Exports Line Charts Line charts provide a useful means for displaying data over time. For instance, a line chart showing the amount of U.S. exports to China in billions of dollars from the Excel file China Trade Data is shown in Figure 1.17. The chart clearly shows a significant rise in exports starting in the year 2000, which began to level off around 2008. You may plot multiple data series in line charts; however, they can be difficult to interpret if the magnitude of the data values differs greatly. In this case, it would be advisable to create separate charts for each data series. SKILL‐BUILDER EXERCISE 1.6 Create line charts for the closing prices in the Excel file S&P 500. Pie Charts For many types of data, we are interested in understanding the relative proportion of each data source to the total. For example, consider the marital status of individuals in the U.S. population in the Excel file Census Education Data, a portion of which is shown in Figure 1.18. To show the relative proportion in each category, we can use a pie chart, as shown in Figure 1.19. This chart uses a layout option that shows the labels associated with the data, but not the actual values or proportions. A different layout that shows both can also be chosen. SKILL‐BUILDER EXERCISE 1.7 Create a pie chart showing the breakdown of occupations in the Science and Engineering Jobs Excel file. 23 24 Part I • Statistics and Data Analysis FIGURE 1.18 Data FIGURE 1.19 Portion of Census Education Pie Chart for Marital Status Area Charts An area chart combines the features of a pie chart with those of line charts. For example, Figure 1.20 displays total energy consumption (billion Btu) and consumption of fossil fuels from the Excel file Energy Production & Consumption. This chart shows that while total energy consumption has grown since 1949, the relative proportion of fossil fuel consumption has remained generally consistent at about half of the total, indicating that alternative energy sources have not replaced a significant portion of fossil fuel consumption. Area charts present more information than pie or line charts alone but may clutter the observer’s mind with too many details if too many data series are used; thus, they should be used with care. Scatter Diagrams Scatter diagrams show the relationship between two variables. Figure 1.21 shows a scatter diagram of house size (in square feet) versus the home market value from the Excel file Home Market Value. The data show that higher market values are associated with larger homes. In Chapter 2 we shall see how to describe such a relationship numerically. SKILL‐BUILDER EXERCISE 1.8 Create a scatter diagram showing the relationship between Hours online/week and Log‐ins/day in the Facebook Survey data. Chapter 1 • Data and Business Decisions FIGURE 1.20 Area Chart for Energy Consumption FIGURE 1.21 Scatter Diagram of House Size Versus Market Value Miscellaneous Excel Charts Excel provides several additional charts for special applications (see Figure 1.22). A stock chart allows you to plot stock prices, such as the daily high, low, and close. It may also be used for scientific data such as temperature changes. A surface chart shows three‐dimensional data. A doughnut chart is similar to a pie chart but can contain more than one data series. A bubble chart is a type of scatter chart in which the size of the data marker corresponds to the value of a third variable; consequently, it is a way to plot three variables in two dimensions. Finally, a radar chart allows you to plot multiple dimensions of several data series. Ethics and Data Presentation In summary, tables of numbers often hide more than they inform. Graphical displays clearly make it easier to gain insights about the data. Thus, graphs and charts are a means of converting raw data into useful managerial information. However, it can be easy to distort data by manipulating the scale on the chart. For example, Figure 1.23 shows the U.S. exports to China in Figure 1.17 displayed on a different scale. The pattern looks much flatter and suggests that the rate of exports is not increasing as fast as it 25 26 Part I • Statistics and Data Analysis FIGURE 1.22 Other Excel Charts really is. It is not unusual to see distorted graphs in newspapers and magazines that are intended to support the author’s conclusions. Creators of statistical displays have an ethical obligation to report data honestly and without attempts to distort the truth. SKILL‐BUILDER EXERCISE 1.9 Create a bubble chart for the first five colleges in the Excel file Colleges and Universities for which the x‐axis is the Top 10% HS, y‐axis is Acceptance Rate, and bubbles represent the Expenditures per Student. FIGURE 1.23 An Alternate View of U.S. Exports to China Chapter 1 • Data and Business Decisions 27 Basic Concepts Review Questions 1. Explain the importance of statistics in business. 2. Explain the difference between data and information. 3. Describe some ways in which data are used in different business functions. 4. Explain how a company might use internal sources of data, special studies, and external data bases. 5. What is a metric, and how does it differ from a measure? 6. Explain the difference between a discrete and a continuous metric. 7. Explain the differences between categorical, ordinal, interval, and ratio data. 8. Explain the difference between cross‐sectional and time‐series data. 9. What is statistical thinking? Why is it an important managerial skill? 10. What is the difference between a population and a sample? 11. List the different types of charts available in Excel, and explain characteristics of data sets that make each chart most appropriate to use. 12. What types of chart would be best for displaying the data in each of the following data sets on the Companion Website? If several charts are appropriate, state this, but justify your best choice. a. b. c. d. e. f. Mortgage Rates Census Education Data Consumer Transportation Survey MBA Student Survey Vacation Survey Washington, DC, Weather Problems and Applications 1. For the Excel file Credit Approval Decisions, identify each of the variables as categorical, ordinal, interval, and ratio. 2. A survey handed out to individuals at a major shopping mall in a small Florida city in July asked the following: • Gender • Age • Ethnicity • Length of residency • Overall satisfaction with city services (using a scale of 1–5 going from Poor to Excellent) • Quality of schools (using a scale of 1–5 going from Poor to Excellent) a. What is the population that the city would want to survey? b. Would this sample be representative of the population? c. What types of data would each of the survey items represent? 3. Construct a column chart for the data in the Excel file State Unemployment Rates to allow comparison of the June rate with the historical highs and lows. Would any other charts be better to visually convey this information? Why or why not? 4. Data from the 2000 U.S. Census show the following distribution of ages for residents of Ohio: Total Households 4,445,773 Family households (families) With own children under 18 years Married‐couple family With own children under 18 years Female householder, no husband present With own children under 18 years Nonfamily households Householder living alone Householder 65 years and over 2,993,023 1,409,912 2,285,798 996,042 536,878 323,095 1,452,750 1,215,614 446,396 a. Construct a column chart to visually represent these data. b. Construct a stacked bar chart to display the sub categories where relevant. (Note that you will have to compute additional subcategories, for instance, under Family households, the number of families without children under 18, so that the total of the subcategories equals the major category total. The sum of all categories does not equal the total.) c. Construct a pie chart showing the proportion of households in each category. 5. The Excel file Energy Production & Consumption provides various energy data since 1949. a. Construct an area chart showing the fossil fuel production as a proportion of total energy production. b. Construct line charts for each of the variables. 28 Part I • Statistics and Data Analysis c. Construct a line chart showing both the total energy production and consumption during these years. d. Construct a scatter diagram for total energy exports and total energy production. e. Discuss what information the charts convey. 6. The Excel file Internet Usage provides data about users of the Internet. a. Construct appropriate charts that will allow you to compare any differences due to age or educational attainment. b. What conclusions can you draw from these charts? 7. The Excel file Freshman College Data provides data from different colleges and branch campuses within one university over four years. a. Construct appropriate charts that allow you to contrast the differences among the colleges and branch campuses. b. Write a report to the academic vice president explaining the information. 8. Construct whatever charts you deem appropriate to convey comparative information on the two categories of televisions in the Excel file Hi‐Definition Televisions. What conclusions can you draw from these? 9. Construct whatever charts you deem appropriate to convey comparative information on deaths by major causes in the Excel file Death Cause Statistics. What conclusions can you draw from these? 10. Construct an appropriate chart to show the proportion of funds in each investment category in the Excel file Retirement Portfolio. 11. Modify the Excel file Major League Baseball to identify teams that have either a winning or losing record. Use Excel functions to find the minimum and maximum values for each type of data and count the number of teams with winning and losing records. Case A Data Collection and Analysis Project Develop a simple questionnaire to gather data that include a set of both categorical variables and ratio variables. In developing the questionnaire, think about some meaningful questions that you would like to address using the data. The questionnaire should pertain to any subject of interest to you, for example, customer satisfaction with products or school‐related issues, investments, hobbies, leisure activities, and so on—be creative! (Several Web sites provide examples of questionnaires that may help you. You might want to check out www .samplequestionnaire.com or www.examplequestionnaire .com for some ideas.) Aim for a total of 6–10 variables. Obtain a sample of at least 20 responses from fellow students or coworkers. Record the data on an Excel worksheet and construct appropriate charts that visually convey the information you gathered, and draw any conclusions from your data. Then, as you learn new material in Chapters 2–7, apply the statistical tools as appropriate to analyze your data and write a comprehensive report that describes how you drew statistical insights and conclusions, including any relevant Excel output to support your conclusions. (Hint: a good way to embed portions of an Excel worksheet into a Word document is to copy it and then use the Paste Special feature in Word to paste it as a picture. This allows you to size the picture by dragging a corner.) APPENDIX 1.1 Excel and PHStat Notes A. Using the PHStat Stack Data and Unstack Data Tools From the PHStat menu, select Data Preparation then either Stack Data (to create a single column from multiple columns) or Unstack Data (to split a single column into multiple according to a grouping label). Figures 1A.1 and 1A.2 show the dialog boxes that appear. To stack data in columns (with optional column labels), enter the range of the data in the Unstacked Data Cell Range. If the first row of the range contains a label, check the box First cells contain group labels. These labels will appear in the first column of the stacked data to help you identify the data if appropriate. Chapter 1 • Data and Business Decisions FIGURE 1A.1 Stack Data Dialog To unstack data in a single column and group them according to a set of labels in another column, enter the range of the column that contains the labels for the grouping...

Tags: excel expert statistics major