Unformatted Attachment Preview
Assignment 4
(1) Read chapter 9. Answer review questions
1.1. 6
1.2. 7
Read chapter 10 (until it starts talking about crystal ball software). Answer review question
1.3. 1
1.4. Answer the following question – Is Risk from PDF or CDF? What question do you ask CDF to
estimate risk?
(2) Project management model
2.1. Chapter 9 – Solve problem 18. Also include the graphical representation.
2.2. Solve the problem again to calculate critical path assuming task W takes 7 units of time to
complete.
(3) Model building and decision analysis
3.1. Chapter 9 – Solve problem 15.
3.2. Solve the problem again if KVA can handle 138 patients per week.
(4) Monte Carlo Simulation
Outsourcing decision model (see slides for model setup and procedure), input parameters:
•
Cost to manufacture is weibull distributed between 90 and 160. Use data in assignment 2. If
you are not confident about your Weibull distribution, assume data is normally distributed
between 90 and 160. Mean = (90+160)/2. STD = select a value between 10-15. You will not lose
any points for this.
•
Cost to outsource is normally distributed with mean 172 and STD of 13.
•
Capital is one of {50010, 52500, 47300, 51600, 48440}
•
Demand is uniformly distributed between 830 to 999
4.1. Determine probability of decision variable = Outsourcing. Hint: see slides 35 to 37.
4.2. What is the probability that manufacturing cost is less than 90% of its maximum value in your
model?
4.3. What is the probability that outsourcing cost is less than 85% of its maximum value in its model?
4.4. Add 95% confidence limits on probability of decision variable = Outsourcing from 4.1 (Hint: see
week 2 slides. Confidence interval for fraction)
Note that this is from empirical data (monte carlo simulation) and not a theoretical model.
(5) PDF and CDF calculation on monte carlo simulation using regression
5.1. Using your knowledge of regression fit the PDF of cost difference in problem above to a
mathematical model. Hint: see slides 38 to 43.
5.2. Calculate RMS error and Chi Sq. fit between PDF from empirical data (previous problem PDF) and
your regression model (current problem PDF).
5.3. From the PDF calculate CDF.
5.4. Verify answers for problems 4.1 using CDF calculated from regression model. Are they close?
Important Note: Save your work. We will revisit this problem.
PHStat Notes
Using the PHStat Stack Data and Unstack Data Tools
One‐ and Two‐Way Tables and Charts
Normal Probability Tools
Generating Probabilities in PHStat
Confidence Intervals for the Mean
Confidence Intervals for Proportions
Confidence Intervals for the Population Variance
Determining Sample Size
One‐Sample Test for the Mean, Sigma Unknown
One‐Sample Test for Proportions
Using Two‐Sample t‐Test Tools
Testing for Equality of Variances
Chi‐Square Test for Independence
Using Regression Tools
Stepwise Regression
Best-Subsets Regression
Creating x‐ and R‐Charts
Creating p‐Charts
Using the Expected Monetary Value Tool
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
28
63
97
98
136
136
137
137
169
169
169
170
171
209
211
212
267
268
375
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
p.
29
61
61
62
63
63
134
135
135
171
209
209
211
243
243
244
298
298
298
299
299
375
p.
p.
p.
p.
p.
p.
338
339
339
341
341
342
Excel Notes
Creating Charts in Excel 2010
Creating a Frequency Distribution and Histogram
Using the Descriptive Statistics Tool
Using the Correlation Tool
Creating Box Plots
Creating PivotTables
Excel‐Based Random Sampling Tools
Using the VLOOKUP Function
Sampling from Probability Distributions
Single‐Factor Analysis of Variance
Using the Trendline Option
Using Regression Tools
Using the Correlation Tool
Forecasting with Moving Averages
Forecasting with Exponential Smoothing
Using CB Predictor
Creating Data Tables
Data Table Dialog
Using the Scenario Manager
Using Goal Seek
Net Present Value and the NPV Function
Using the IRR Function
Crystal Ball Notes
Customizing Define Assumption
Sensitivity Charts
Distribution Fitting with Crystal Ball
Correlation Matrix Tool
Tornado Charts
Bootstrap Tool
TreePlan Note
Constructing Decision Trees in Excel
p. 376
This page intentionally left blank
Useful Statistical Functions in Excel 2010
Description
AVERAGE(data range)
BINOM.DIST(number_s, trials, probability_s, cumulative)
BINOM.INV(trials, probability_s, alpha)
Computes the average value (arithmetic mean) of a set of data.
Returns the individual term binomial distribution.
Returns the smallest value for which the cumulative binomial
distribution is greater than or equal to a criterion value.
Returns the left-tailed probability of the chi-square distribution.
Returns the right-tailed probability of the chi-square
distribution.
Returns the test for independence; the value of the chi-square
distribution and the appropriate degrees of freedom.
Returns the confidence interval for a population mean using a
normal distribution.
Returns the confidence interval for a population mean using a
t-distribution.
Computes the correlation coefficient between two data sets.
Returns the exponential distribution.
Returns the left-tailed F-probability distribution value.
Returns the left-tailed F-probability distribution value.
Calculates a future value along a linear trend.
Calculates predicted exponential growth.
Returns an array that describes a straight line that best fits the data.
Returns the cumulative lognormal distribution of x, where ln
(x) is normally distributed with parameters mean and
standard deviation.
Computes the median (middle value) of a set of data.
Computes the modes (most frequently occurring values) of a
set of data.
Computes the mode of a set of data.
Returns the normal cumulative distribution for the specified
mean and standard deviation.
Returns the inverse of the cumulative normal distribution.
Returns the standard normal cumulative distribution (mean = 0,
standard deviation = 1).
Returns the inverse of the standard normal distribution.
Computes the kth percentile of data in a range, exclusive.
Computes the kth percentile of data in a range, inclusive.
Returns the Poisson distribution.
Computes the quartile of a distribution.
Computes the skewness, a measure of the degree to which a
distribution is not symmetric around its mean.
Returns a normalized value for a distribution characterized by
a mean and standard deviation.
Computes the standard deviation of a set of data, assumed to
be a sample.
Computes the standard deviation of a set of data, assumed to
be an entire population.
Returns values along a linear trend line.
Returns the left-tailed t-distribution value.
Returns the two-tailed t-distribution value.
Returns the right-tailed t-distribution.
Returns the left-tailed inverse of the t-distribution.
Returns the two-tailed inverse of the t-distribution.
Returns the probability associated with a t-test.
Computes the variance of a set of data, assumed to be a sample.
Computes the variance of a set of data, assumed to be an entire
population.
Returns the two-tailed p-value of a z-test.
CHISQ.DIST(x, deg_freedom, cumulative)
CHISQ.DIST.RT(x, deg_freedom, cumulative)
CHISQ.TEST(actual_range, expected_range)
CONFIDENCE.NORM(alpha, standard_dev, size)
CONFIDENCE.T(alpha, standard_dev, size)
CORREL(arrayl, array2)
EXPON.DIST(x, lambda, cumulative)
F.DIST(x. deg_freedom1, deg_freedom2, cumulative)
F.DIST.RT(x. deg_freedom1, deg_freedom2, cumulative)
FORECAST(x, known_y's, known_x's)
GROWTH(known_y's, known_x's, new_x's, constant)
LINEST(known_y's, known_x's, new_x's, constant, stats)
LOGNORM.DIST(x, mean, standard_deviation)
MEDIAN(data range)
MODE.MULT(data range)
MODE.SNGL(data range)
NORM.DIST(x, mean, standard_dev, cumulative)
NORM.INV(probability, mean, standard_dev)
NORM.S.DIST(z)
NORM.S.INV(probability)
PERCENTILE.EXC(array, k)
PERCENTILE.INC(array, k)
POISSON.DIST(x, mean, cumulative)
QUARTILE(array, quart)
SKEW(data range)
STANDARDIZE(x, mean, standard_deviation)
STDEV.S(data range)
STDEV.P(data range)
TREND(known_y's, known_x's, new_x's, constant)
T.DIST(x, deg_freedom, cumulative)
T.DIST.2T(x, deg_freedom)
T.DIST.RT(x, deg_freedom)
T.INV(probability, deg_freedom)
T.INV.2T(probability, deg_freedom)
T.TEST(arrayl, array2, tails, type)
VAR.S(data range)
VAR.P(data range)
Z.TEST(array, x, sigma)
This page intentionally left blank
Fifth Edition
STATISTICS, DATA ANALYSIS,
AND DECISION MODELING
James R. Evans
University of Cincinnati
Boston Columbus Indianapolis New York San Francisco Upper Saddle River
Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto
Delhi Mexico City São Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo
Editorial Director: Sally Yagan
Editor in Chief: Donna Battista
Senior Acquisitions Editor: Chuck Synovec
Editorial Project Manager: Mary Kate Murray
Editorial Assistant: Ashlee Bradbury
Director of Marketing: Maggie Moylan
Executive Marketing Manager: Anne Fahlgren
Production Project Manager: Renata Butera
Operations Specialist: Renata Butera
Creative Art Director: Jayne Conte
Cover Designer: Suzanne Duda
Manager, Rights and Permissions: Hessa Albader
Cover Art: pedrosek/Shutterstock Images
Media Project Manager: John Cassar
Media Editor: Sarah Peterson
Full-Service Project Management: Shylaja
Gatttupalli
Composition: Jouve India Pvt Ltd
Printer/Binder: Edwards Brothers
Cover Printer: Lehigh-Phoenix Color/Hagerstown
Text Font: Palatino
Credits and acknowledgments borrowed from other sources and reproduced, with permission, in this
textbook appear on the appropriate page within text.
Microsoft and/or its respective suppliers make no representations about the suitability of the information
contained in the documents and related graphics published as part of the services for any purpose. All
such documents and related graphics are provided “as is” without warranty of any kind. Microsoft and/
or its respective suppliers hereby disclaim all warranties and conditions with regard to this information,
including all warranties and conditions of merchantability, whether express, implied or statutory, fitness for
a particular purpose, title and non-infringement. In no event shall Microsoft and/or its respective suppliers
be liable for any special, indirect or consequential damages or any damages whatsoever resulting from loss
of use, data or profits, whether in an action of contract, negligence or other tortious action, arising out of or
in connection with the use or performance of information available from the services.
The documents and related graphics contained herein could include technical inaccuracies or
typographical errors. Changes are periodically added to the information herein. Microsoft and/or its
respective suppliers may make improvements and/or changes in the product(s) and/or the program(s)
described herein at any time. Partial screen shots may be viewed in full within the software version specified.
Microsoft® and Windows® are registered trademarks of the Microsoft Corporation in the U.S.A. and other
countries. This book is not sponsored or endorsed by or affiliated with the Microsoft Corporation.
Copyright © 2013, 2010, 2007, 2003, 2000 by Pearson Education, Inc., publishing as Prentice Hall.
All rights reserved. Manufactured in the United States of America. This publication is protected by
Copyright, and permission should be obtained from the publisher prior to any prohibited reproduction,
storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical,
photocopying, recording, or likewise. To obtain permission(s) to use material from this work, please
submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street,
Upper Saddle River, New Jersey 07458, or you may fax your request to 201-236-3290.
Many of the designations by manufacturers and sellers to distinguish their products are claimed as
trademarks. Where those designations appear in this book, and the publisher was aware of a trademark
claim, the designations have been printed in initial caps or all caps.
Library of Congress Cataloging-in-Publication Data
Evans, James R. (James Robert)
Statistics, data analysis, and decision modeling / James R. Evans. —5th ed.
p. cm.
ISBN-13: 978-0-13-274428-7
ISBN-10: 0-13-274428-7
1. Industrial management—Statistical methods. 2. Statistical decision. I. Title.
HD30.215.E93 2012
658.4r033—dc23
2011039310
10 9 8 7 6 5 4 3 2 1
ISBN 10: 0-13-274428-7
ISBN 13: 978-0-13-274428-7
To Beverly, Kristin, and Lauren, the three special women in my life.
—James R. Evans
This page intentionally left blank
BRIEF CONTENTS
PART I Statistics and Data Analysis 1
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Data and Business Decisions 3
Descriptive Statistics and Data Analysis 31
Probability Concepts and Distributions 65
Sampling and Estimation 99
Hypothesis Testing and Statistical Inference 138
Regression Analysis 172
Forecasting 213
Introduction to Statistical Quality Control 248
PART II Decision Modeling and Analysis 269
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 13
Chapter 14
Building and Using Decision Models 271
Decision Models with Uncertainty and Risk 300
Decisions, Uncertainty, and Risk 343
Queues and Process Simulation Modeling 378
Linear Optimization 411
Integer, Nonlinear, and Advanced Optimization
Methods 458
Appendix 509
Index
521
vii
This page intentionally left blank
CONTENTS
Preface xxi
Part I
STATISTICS AND DATA ANALYSIS 1
Chapter 1 DATA AND BUSINESS DECISIONS
3
Introduction 4
Data in the Business Environment 4
Sources and Types of Data 6
Metrics and Data Classification 7
Statistical Thinking 11
Populations and Samples 12
Using Microsoft Excel 13
Basic Excel Skills 14
Skill‐Builder Exercise 1.1 14
Copying Formulas and Cell References 14
Skill‐Builder Exercise 1.2 15
Functions 16
Skill‐Builder Exercise 1.3 18
Other Useful Excel Tips 18
Excel Add‐Ins 19
Skill‐Builder Exercise 1.4 20
Displaying Data with Excel Charts 21
Column and Bar Charts 21
Skill‐Builder Exercise 1.5 22
Line Charts 23
Skill‐Builder Exercise 1.6 23
Pie Charts 23
Skill‐Builder Exercise 1.7 23
Area Charts 24
Scatter Diagrams 24
Skill‐Builder Exercise 1.8 24
Miscellaneous Excel Charts 25
Ethics and Data Presentation 25
Skill‐Builder Exercise 1.9 26
Basic Concepts Review Questions 27
Problems and Applications 27
Case: A Data Collection and Analysis Project 28
ix
x
Contents
Chapter 2 DESCRIPTIVE STATISTICS AND DATA ANALYSIS 31
Introduction 32
Descriptive Statistics 32
Frequency Distributions, Histograms, and Data Profiles 33
Categorical Data 34
Numerical Data 34
Skill‐Builder Exercise 2.1 38
Skill‐Builder Exercise 2.2 38
Data Profiles 38
Descriptive Statistics for Numerical Data 39
Measures of Location 39
Measures of Dispersion 40
Skill‐Builder Exercise 2.3 42
Measures of Shape 43
Excel Descriptive Statistics Tool 44
Skill‐Builder Exercise 2.4 44
Measures of Association 45
Skill‐Builder Exercise 2.5 47
Descriptive Statistics for Categorical Data 47
Skill‐Builder Exercise 2.6 48
Visual Display of Statistical Measures 49
Box Plots 49
Dot‐Scale Diagrams 49
Skill‐Builder Exercise 2.7 49
Outliers 50
Data Analysis Using PivotTables 50
Skill‐Builder Exercise 2.8 53
Skill‐Builder Exercise 2.9 53
Basic Concepts Review Questions 54
Problems and Applications 54
Case: The Malcolm Baldrige Award 57
Skill‐Builder Exercise 2.10 59
Skill‐Builder Exercise 2.11 60
Chapter 3 PROBABILITY CONCEPTS AND DISTRIBUTIONS 65
Introduction 66
Basic Concepts of Probability 66
Basic Probability Rules and Formulas 67
Conditional Probability 68
Skill‐Builder Exercise 3.1 70
Random Variables and Probability Distributions 70
Discrete Probability Distributions 73
Expected Value and Variance of a Discrete Random Variable 74
Contents
Skill‐Builder Exercise 3.2 75
Bernoulli Distribution 75
Binomial Distribution 75
Poisson Distribution 76
Skill‐Builder Exercise 3.3 78
Continuous Probability Distributions 78
Uniform Distribution 80
Normal Distribution 81
Skill‐Builder Exercise 3.4 84
Triangular Distribution 84
Exponential Distribution 85
Probability Distributions in PHStat 86
Other Useful Distributions 86
Joint and Marginal Probability Distributions 89
Basic Concepts Review Questions 90
Problems and Applications 90
Case: Probability Analysis for Quality Measurements 94
Chapter 4 SAMPLING AND ESTIMATION
99
Introduction 100
Statistical Sampling 100
Sample Design 100
Sampling Methods 101
Errors in Sampling 103
Random Sampling From Probability Distributions 103
Sampling From Discrete Probability Distributions 104
Skill‐Builder Exercise 4.1 105
Sampling From Common Probability Distributions 105
A Statistical Sampling Experiment in Finance 106
Skill‐Builder Exercise 4.2 106
Sampling Distributions and Sampling Error 107
Skill‐Builder Exercise 4.3 110
Applying the Sampling Distribution of the Mean 110
Sampling and Estimation 110
Point Estimates 111
Unbiased Estimators 112
Skill‐Builder Exercise 4.4 113
Interval Estimates 113
Confidence Intervals: Concepts and Applications 113
Confidence Interval for the Mean with Known Population Standard
Deviation 114
Skill‐Builder Exercise 4.5 116
xi
xii
Contents
Confidence Interval for the Mean with Unknown Population Standard
Deviation 116
Confidence Interval for a Proportion 118
Confidence Intervals for the Variance and Standard Deviation 119
Confidence Interval for a Population Total 121
Using Confidence Intervals for Decision Making 122
Confidence Intervals and Sample Size 122
Prediction Intervals 124
Additional Types of Confidence Intervals 125
Differences Between Means, Independent Samples 125
Differences Between Means, Paired Samples 125
Differences Between Proportions 126
Basic Concepts Review Questions 126
Problems and Applications 126
Case: Analyzing a Customer Survey 129
Skill‐Builder Exercise 4.6 131
Skill‐Builder Exercise 4.7 132
Skill‐Builder Exercise 4.8 133
Skill‐Builder Exercise 4.9 133
Chapter 5 HYPOTHESIS TESTING AND STATISTICAL INFERENCE
Introduction 139
Basic Concepts of Hypothesis Testing 139
Hypothesis Formulation 140
Significance Level 141
Decision Rules 142
Spreadsheet Support for Hypothesis Testing 145
One‐Sample Hypothesis Tests 145
One‐Sample Tests for Means 145
Using p‐Values 147
One‐Sample Tests for Proportions 148
One Sample Test for the Variance 150
Type II Errors and the Power of A Test 151
Skill‐Builder Exercise 5.1 153
Two‐Sample Hypothesis Tests 153
Two‐Sample Tests for Means 153
Two‐Sample Test for Means with Paired Samples 155
Two‐Sample Tests for Proportions 155
Hypothesis Tests and Confidence Intervals 156
Test for Equality of Variances 157
Skill‐Builder Exercise 5.2 158
Anova: Testing Differences of Several Means 158
Assumptions of ANOVA 160
Tukey–Kramer Multiple Comparison Procedure 160
138
Contents
Chi‐Square Test for Independence 162
Skill‐Builder Exercise 5.3 164
Basic Concepts Review Questions 164
Problems and Applications 164
Case: HATCO, Inc. 167
Skill‐Builder Exercise 5.4 169
Chapter 6 REGRESSION ANALYSIS
172
Introduction 173
Simple Linear Regression 174
Skill‐Builder Exercise 6.1 175
Least‐Squares Regression 176
Skill‐Builder Exercise 6.2 178
A Practical Application of Simple Regression to Investment
Risk 178
Simple Linear Regression in Excel 179
Skill‐Builder Exercise 6.3 180
Regression Statistics 180
Regression as Analysis of Variance 181
Testing Hypotheses for Regression Coefficients 181
Confidence Intervals for Regression Coefficients 182
Confidence and Prediction Intervals for X‐Values 182
Residual Analysis and Regression Assumptions 182
Standard Residuals 184
Skill‐Builder Exercise 6.4 184
Checking Assumptions 184
Multiple Linear Regression 186
Skill‐Builder Exercise 6.5 186
Interpreting Results from Multiple Linear Regression 188
Correlation and Multicollinearity 188
Building Good Regression Models 190
Stepwise Regression 193
Skill‐Builder Exercise 6.6 193
Best‐Subsets Regression 193
The Art of Model Building in Regression 194
Regression with Categorical Independent Variables 196
Categorical Variables with More Than Two Levels 199
Skill‐Builder Exercise 6.7 201
Regression Models with Nonlinear Terms 201
Skill‐Builder Exercise 6.8 202
Basic Concepts Review Questions 204
Problems and Applications 204
Case: Hatco 207
xiii
xiv
Contents
Chapter 7 FORECASTING
213
Introduction 214
Qualitative and Judgmental Methods 214
Historical Analogy 215
The Delphi Method 215
Indicators and Indexes for Forecasting 215
Statistical Forecasting Models 216
Forecasting Models for Stationary Time Series 218
Moving Average Models 218
Error Metrics and Forecast Accuracy 220
Skill‐Builder Exercise 7.1 222
Exponential Smoothing Models 222
Skill‐Builder Exercise 7.2 224
Forecasting Models for Time Series with a Linear Trend 224
Regression‐Based Forecasting 224
Advanced Forecasting Models 225
Autoregressive Forecasting Models 226
Skill‐Builder Exercise 7.3 228
Forecasting Models with Seasonality 228
Incorporating Seasonality in Regression Models 229
Skill‐Builder Exercise 7.4 231
Forecasting Models with Trend and Seasonality 231
Regression Forecasting with Causal Variables 231
Choosing and Optimizing Forecasting Models Using
CB Predictor 233
Skill‐Builder Exercise 7.5 235
The Practice of Forecasting 238
Basic Concepts Review Questions 239
Problems and Applications 240
Case: Energy Forecasting 241
Chapter 8 INTRODUCTION TO STATISTICAL QUALITY CONTROL
Introduction 248
The Role of Statistics and Data Analysis in Quality
Control 249
Statistical Process Control 250
Control Charts 250
x ‐ and R‐Charts 251
Skill‐Builder Exercise 8.1 256
Analyzing Control Charts 256
Sudden Shift in the Process Average 257
Cycles 257
Trends 257
248
Contents
Hugging the Center Line 257
Hugging the Control Limits 258
Skill‐Builder Exercise 8.2 258
Skill‐Builder Exercise 8.3 260
Control Charts for Attributes 260
Variable Sample Size 262
Skill‐Builder Exercise 8.4 264
Process Capability Analysis 264
Skill‐Builder Exercise 8.5 266
Basic Concepts Review Questions 266
Problems and Applications 266
Case: Quality Control Analysis 267
Part II Decision Modeling and Analysis 269
Chapter 9 BUILDING AND USING DECISION MODELS 271
Introduction 271
Decision Models 272
Model Analysis 275
What‐If Analysis 275
Skill‐Builder Exercise 9.1 277
Skill‐Builder Exercise 9.2 278
Skill‐Builder Exercise 9.3 278
Model Optimization 278
Tools for Model Building 280
Logic and Business Principles 280
Skill‐Builder Exercise 9.4 281
Common Mathematical Functions 281
Data Fitting 282
Skill‐Builder Exercise 9.5 284
Spreadsheet Engineering 284
Skill‐Builder Exercise 9.6 285
Spreadsheet Modeling Examples 285
New Product Development 285
Skill‐Builder Exercise 9.7 287
Single Period Purchase Decisions 287
Overbooking Decisions 288
Project Management 289
Model Assumptions, Complexity, and Realism 291
Skill‐Builder Exercise 9.8 293
Basic Concepts Review Questions 293
Problems and Applications 294
Case: An Inventory Management Decision Model 297
xv
xvi
Contents
Chapter 10 DECISION MODELS WITH UNCERTAINTY AND RISK
300
Introduction 301
Spreadsheet Models with Random Variables 301
Monte Carlo Simulation 302
Skill‐Builder Exercise 10.1 303
Monte Carlo Simulation Using Crystal Ball
303
Defining Uncertain Model Inputs 304
Running a Simulation 308
Saving Crystal Ball Runs 310
Analyzing Results 310
Skill‐Builder Exercise 10.2 314
Crystal Ball Charts 315
Crystal Ball Reports and Data Extraction 318
Crystal Ball Functions and Tools 318
Applications of Monte Carlo Simulation and Crystal Ball
Features 319
Newsvendor Model: Fitting Input Distributions, Decision Table Tool,
and Custom Distribution 319
Skill‐Builder Exercise 10.3 323
Skill‐Builder Exercise 10.4 324
Overbooking Model: Crystal Ball Functions 324
Skill‐Builder Exercise 10.5 325
Cash Budgeting: Correlated Assumptions 325
New Product Introduction: Tornado Chart Tool 328
Skill‐Builder Exercise 10.6 329
Project Management: Alternate Input Parameters and the
Bootstrap Tool 329
Skill‐Builder Exercise 10.7 334
Basic Concepts Review Questions 334
Problems and Applications 335
Case: J&G Bank 338
Chapter 11 DECISIONS, UNCERTAINTY, AND RISK
343
Introduction 344
Decision Making Under Certainty 344
Decisions Involving a Single Alternative 345
Skill‐Builder Exercise 11.1 345
Decisions Involving Non–mutually Exclusive Alternatives 345
Decisions Involving Mutually Exclusive Alternatives 346
Decisions Involving Uncertainty and Risk 347
Making Decisions with Uncertain Information 347
Decision Strategies for a Minimize Objective 348
Contents xvii
Skill‐Builder Exercise 11.2 350
Decision Strategies for a Maximize Objective 350
Risk and Variability 351
Expected Value Decision Making 353
Analysis of Portfolio Risk 354
Skill‐Builder Exercise 11.3 356
The “Flaw of Averages” 356
Skill‐Builder Exercise 11.4 356
Decision Trees 357
A Pharmaceutical R&D Model 357
Decision Trees and Risk 358
Sensitivity Analysis in Decision Trees 360
Skill‐Builder Exercise 11.5 360
The Value of Information 360
Decisions with Sample Information 362
Conditional Probabilities and Bayes’s Rule 363
Utility and Decision Making 365
Skill‐Builder Exercise 11.6 368
Exponential Utility Functions 369
Skill‐Builder Exercise 11.7 370
Basic Concepts Review Questions 370
Problems and Applications 371
Case: The Sandwich Decision 375
Chapter 12 QUEUES AND PROCESS SIMULATION MODELING
Introduction 378
Queues and Queuing Systems 379
Basic Concepts of Queuing Systems 379
Customer Characteristics 380
Service Characteristics 381
Queue Characteristics 381
System Configuration 381
Performance Measures 382
Analytical Queuing Models 382
Single‐Server Model 383
Skill‐Builder Exercise 12.1 384
Little’s Law 384
Process Simulation Concepts 385
Skill‐Builder Exercise 12.2 386
Process Simulation with SimQuick 386
Getting Started with SimQuick 387
A Queuing Simulation Model 388
378
xviii Contents
Skill‐Builder Exercise 12.3 392
Queues in Series with Blocking 393
Grocery Store Checkout Model with Resources 394
Manufacturing Inspection Model with Decision Points 397
Pull System Supply Chain with Exit Schedules 400
Other SimQuick Features and Commercial Simulation Software 402
Continuous Simulation Modeling 403
Basic Concepts Review Questions 406
Problems and Applications 407
Case: Production/Inventory Planning 410
Chapter 13 LINEAR OPTIMIZATION
411
Introduction 411
Building Linear Optimization Models 412
Characteristics of Linear Optimization Models 415
Implementing Linear Optimization Models on Spreadsheets 416
Excel Functions to Avoid in Modeling Linear Programs 417
Solving Linear Optimization Models 418
Solving the SSC Model Using Standard Solver 418
Solving the SSC Model Using Premium Solver 420
Solver Outcomes and Solution Messages 422
Interpreting Solver Reports 422
Skill‐Builder Exercise 13.1 426
How Solver Creates Names in Reports 427
Difficulties with Solver 427
Applications of Linear Optimization 427
Process Selection 429
Skill‐Builder Exercise 13.2 430
Blending 430
Skill‐Builder Exercise 13.3 432
Portfolio Investment 432
Skill‐Builder Exercise 13.4 433
Transportation Problem 433
Interpreting Reduced Costs 437
Multiperiod Production Planning 437
Skill‐Builder Exercise 13.5 439
Multiperiod Financial Planning 439
Skill‐Builder Exercise 13.6 440
A Model with Bounded Variables 440
A Production/Marketing Allocation Model 445
How Solver Works 449
Basic Concepts Review Questions 450
Contents
Problems and Applications 450
Case: Haller’s Pub & Brewery 457
Chapter 14 INTEGER, NONLINEAR, AND ADVANCED OPTIMIZATION
METHODS 458
Introduction 458
Integer Optimization Models 459
A Cutting Stock Problem 459
Solving Integer Optimization Models 460
Skill‐Builder Exercise 14.1 462
Integer Optimization Models with Binary Variables 463
Project Selection 463
Site Location Model 464
Skill‐Builder Exercise 14.2 467
Computer Configuration 467
Skill‐Builder Exercise 14.3 470
A Supply Chain Facility Location Model 470
Mixed Integer Optimization Models 471
Plant Location Model 471
A Model with Fixed Costs 473
Nonlinear Optimization 475
Hotel Pricing 475
Solving Nonlinear Optimization Models 477
Markowitz Portfolio Model 479
Skill‐Builder Exercise 14.4 482
Evolutionary Solver for Nonsmooth Optimization 482
Rectilinear Location Model 484
Skill‐Builder Exercise 14.5 484
Job Sequencing 485
Skill‐Builder Exercise 14.6 488
Risk Analysis and Optimization 488
Combining Optimization and Simulation 491
A Portfolio Allocation Model 491
Using OptQuest 492
Skill‐Builder Exercise 14.7 500
Basic Concepts Review Questions 500
Problems and Applications 500
Case: Tindall Bookstores 506
Appendix 509
Index 521
xix
This page intentionally left blank
PREFACE
INTENDED AUDIENCE
Statistics, Data Analysis, and Decision Modeling was written to meet the need for an introductory text that provides the fundamentals of business statistics and decision models/
optimization, focusing on practical applications of data analysis and decision modeling,
all presented in a simple and straightforward fashion.
The text consists of 14 chapters in two distinct parts. The first eight chapters deal
with statistical and data analysis topics, while the remaining chapters deal with decision
models and applications. Thus, the text may be used for:
• MBA or undergraduate business programs that combine topics in business statistics and management science into a single, brief, quantitative methods
• Business programs that teach statistics and management science in short, modular
courses
• Executive MBA programs
• Graduate refresher courses for business statistics and management science
NEW TO THIS EDITION
The fifth edition of this text has been carefully revised to improve clarity and pedagogical features, and incorporate new and revised topics. Many significant changes have
been made, which include the following:
1. Spreadsheet-based tools and applications are compatible with Microsoft Excel 2010,
which is used throughout this edition.
2. Every chapter has been carefully revised to improve clarity. Many explanations
of critical concepts have been enhanced using new business examples and data
sets. The sequencing of several topics have been reorganized to improve their flow
within the book.
3. Excel, PHStat, and other software notes have been moved to chapter appendixes
so as not to disrupt the flow of the text.
4. “Skill-Builder” exercises, designed to provide experience with applying Excel,
have been located in the text to facilitate immediate application of new concepts.
5. Data used in many problems have been changed, and new problems have been added.
SUBSTANCE
The danger in using quantitative methods does not generally lie in the inability to perform the requisite calculations, but rather in the lack of a fundamental understanding of
why to use a procedure, how to use it correctly, and how to properly interpret results.
A key focus of this text is conceptual understanding using simple and practical examples
rather than a plug-and-chug or point-and-click mentality, as are often done in other
texts, supplemented by appropriate theory. On the other hand, the text does not attempt
to be an encyclopedia of detailed quantitative procedures, but focuses on useful concepts and tools for today's managers.
To support the presentation of topics in business statistics and decision modeling, this text integrates fundamental theory and practical applications in a spreadsheet
environment using Microsoft Excel 2010 and various spreadsheet add-ins, specifically:
• PHStat, a collection of statistical tools that enhance the capabilities of Excel; published by Pearson Education
xxi
xxii Preface
• Crystal Ball (including CBPredictor for forecasting and OptQuest for optimization),
a powerful commercial package for risk analysis
• TreePlan, a decision analysis add-in
• SimQuick, an Excel-based application for process simulation, published by Pearson
Education
• Risk Solver Platform for Education, an Excel-based tool for risk analysis, simulation,
and optimization
These tools have been integrated throughout the text to simplify the presentations
and implement tools and calculations so that more focus can be placed on interpretation
and understanding the managerial implications of results.
TO THE STUDENTS
The Companion Website for this text (www.pearsonhighered.com/evans) contains the
following:
• Data files—download the data and model files used throughout the text in examples, problems, and exercises
• PHStat—download of the software from Pearson
• TreePlan—link to a free trial version
• Risk Solver Platform for Education—link to a free trial version
• Crystal Ball—link to a free trial version
• SimQuick—link that will direct you to where you may purchase a standalone version of the software from Pearson
• Subscription Content—a Companion Website Access Code is located on the back
cover of this book. This code gives you access to the following software:
• Risk Solver Platform for Education—link that will direct students to an
upgrade version
• Crystal Ball—link that will direct students to an upgrade version
• SimQuick—link that will allow you to download the software from Pearson
To redeem the subscription content:
• Visit www.pearsonhighered.com/evans.
• Click on the Companion Website link.
• Click on the Subscription Content link.
• First-time users will need to register, while returning users may log-in.
• Once you are logged in you will be brought to a page which will inform you how
to download the software from the corresponding software company's Web site.
TO THE INSTRUCTORS
To access instructor solutions files, please visit www.pearsonhighered.com/evans and
choose the instructor resources option. A variety of instructor resources are available for
instructors who register for our secure environment. The Instructor’s Solutions Manual
files and PowerPoint presentation files for each chapter are available for download.
As a registered faculty member, you can login directly to download resource files,
and receive immediate access and instructions for installing Course Management content to your campus server.
Need help? Our dedicated Technical Support team is ready to assist instructors with questions about the media supplements that accompany this text. Visit
http://247.pearsoned.com/ for answers to frequently asked questions and toll-free user
support phone numbers.
Preface xxiii
ACKNOWLEDGMENTS
I would like to thank the following individuals who have provided reviews and insightful suggestions for this edition: Ardith Baker (Oral Roberts University), Geoffrey Barnes
(University of Iowa), David H. Hartmann (University of Central Oklahoma), Anthony
Narsing (Macon State College), Tony Zawilski (The George Washington University), and
Dr. J. H. Sullivan (Mississippi State University).
In addition, I thank the many students who over the years provided numerous
suggestions, data sets and problem ideas, and insights into how to better present the
material. Finally, appreciation goes to my editor Chuck Synovec; Mary Kate Murray,
Editorial Project Manager; Ashlee Bradbury, Editorial Assistant; and the entire production staff at Pearson Education for their dedication in developing and producing this
text. If you have any suggestions or corrections, please contact me via email at james.
evans@uc.edu.
James R. Evans
University of Cincinnati
This page intentionally left blank
PART
I
Statistics and Data Analysis
This page intentionally left blank
Chapter 1
Data and Business Decisions
■
INTRODUCTION 4
■
DATA IN THE BUSINESS ENVIRONMENT 4
■
SOURCES AND TYPES OF DATA 6
n
■
n
■
■
Metrics and Data Classification 7
STATISTICAL THINKING 11
Populations and Samples 12
USING MICROSOFT EXCEL 13
n
Basic Excel Skills 14
n
Copying Formulas and Cell References 14
n
Functions 16
n
Other Useful Excel Tips 18
n
Excel Add‐Ins 19
DISPLAYING DATA WITH EXCEL CHARTS 21
n
Column and Bar Charts 21
n
Line Charts 23
n
Pie Charts
n
Area Charts 24
n
Scatter Diagrams 24
n
Miscellaneous Excel Charts 25
n
Ethics and Data Presentation 25
23
■
BASIC CONCEPTS REVIEW QUESTIONS 27
■
PROBLEMS AND APPLICATIONS 27
■
CASE: A DATA COLLECTION AND ANALYSIS PROJECT 28
■
APPENDIX 1.1: EXCEL AND PHSTAT NOTES
28
n
A. Using the PHStat Stack Data and Unstack Data Tools 28
n
B. Creating Charts in Excel 2010
29
3
4
Part I • Statistics and Data Analysis
INTRODUCTION
Since the dawn of the electronic age and the Internet, both individuals and organizations have
had access to an enormous wealth of data and information. Data are numerical facts and figures
that are collected through some type of measurement process. Information comes from analyzing
data; that is, extracting meaning from data to support evaluation and decision making. Modern
organizations—which include for‐profit businesses such as retailers, manufacturers, hotels, and
airlines, as well as nonprofit organizations like hospitals, educational institutions, and government
agencies—need good data to evaluate daily performance and to make critical strategic and
operational decisions.
The purpose of this book is to introduce you to statistical methods for analyzing data; ways
of using data effectively to make informed decisions; and approaches for developing, analyzing,
and solving models of decision problems. Part I of this book (Chapters 1–8) focuses on key issues
of statistics and data analysis, and Part II (Chapters 9–14) introduces you to various types of decision
models that rely on good data analysis.
In this chapter, we discuss the roles of data analysis in business, discuss how data are used
in evaluating business performance, introduce some fundamental issues of statistics and measurement, and introduce spreadsheets as a support tool for data analysis and decision modeling.
DATA IN THE BUSINESS ENVIRONMENT
Data are used in virtually every major function in business, government, health care,
education, and other nonprofit organizations. For example:
• Annual reports summarize data about companies’ profitability and market
share both in numerical form and in charts and graphs to communicate with
shareholders.
• Accountants conduct audits and use statistical methods to determine whether
figures reported on a firm’s balance sheet fairly represents the actual data
by examining samples (that is, subsets) of accounting data, such as accounts
receivable.
• Financial analysts collect and analyze a variety of data to understand the contribution that a business provides to its shareholders. These typically include profitability, revenue growth, return on investment, asset utilization, operating margins,
earnings per share, economic value added (EVA), shareholder value, and other
relevant measures.
• Marketing researchers collect and analyze data to evaluate consumer perceptions
of new products.
• Operations managers use data on production performance, manufacturing quality, delivery times, order accuracy, supplier performance, productivity, costs, and
environmental compliance to manage their operations.
• Human resource managers measure employee satisfaction, track turnover, training
costs, employee satisfaction, turnover, market innovation, training effectiveness, and
skills development.
• Within the federal government, economists analyze unemployment rates, manufacturing capacity and global economic indicators to provide forecasts and trends.
• Hospitals track many different clinical outcomes for regulatory compliance reporting
and for their own analysis.
• Schools analyze test performance and state boards of education use statistical
performance data to allocate budgets to school districts.
Data support a variety of company purposes, such as planning, reviewing company performance, improving operations, and comparing company performance with
competitors’ or “best practices” benchmarks. Data that organizations use should focus
on critical success factors that lead to competitive advantage. An example from the
Chapter 1 • Data and Business Decisions
Boeing Company shows the value of having good business data and analysis capabilities.1 In the early 1990s, Boeing’s assembly lines were morasses of inefficiency. A manual
numbering system dating back to World War II bomber days was used to keep track
of an airplane’s four million parts and 170 miles of wiring; changing a part on a 737’s
landing gear meant renumbering 464 pages of drawings. Factory floors were covered
with huge tubs of spare parts worth millions of dollars. In an attempt to grab market
share from rival Airbus, the company discounted planes deeply and was buried by an
onslaught of orders. The attempt to double production rates, coupled with implementation of a new production control system, resulted in Boeing being forced to shut down
its 737 and 747 lines for 27 days in October 1997, leading to a $178 million loss and
a shakeup of top management. Much of the blame was focused on Boeing’s financial
practices and lack of real‐time financial data. With a new Chief Financial Officer and
finance team, the company created a “control panel” of vital measures, such as materials
costs, inventory turns, overtime, and defects, using a color‐coded spreadsheet. For the
first time, Boeing was able to generate a series of charts showing which of its programs
were creating value and which were destroying it. The results were eye‐opening and
helped formulate a growth plan. As one manager noted, “The data will set you free.”
Data also provide key inputs to decision models. A decision model is a logical or
mathematical representation of a problem or business situation that can be developed
from theory or observation. Decision models establish relationships between actions
that decision makers might take and results that they might expect, thereby allowing
the decision makers to predict what might happen based on the model. For instance,
the manager of a grocery store might want to know how best to use price promotions,
coupons, and advertising to increase sales. In the past, grocers have studied the relationship of sales volume to programs such as these by conducting controlled experiments to identify the relationship between actions and sales volumes.2 That is, they
implement different combinations of price promotions, coupons, and advertising (the
decision variables), and then observe the sales that result. Using the data from these
experiments, we can develop a predictive model of sales as a function of these decision
variables. Such a model might look like the following:
Sales = a + b * Price + c * Coupons + d * Advertising + e * Price * Advertising
where a, b, c, d, and e are constants that are estimated from the data. By setting levels for
price, coupons, and advertising, the model estimates a level of sales. The manager can
use the model to help identify effective pricing, promotion, and advertising strategies.
Because of the ease with which data can be generated and transmitted today, managers, supervisors, and front‐line workers can easily be overwhelmed. Data need to be
summarized in a quantitative or visual fashion. One of the most important tools for
doing this is statistics, which David Hand, former president of the Royal Statistical
Society in the UK, defines as both the science of uncertainty and the technology of extracting
information from data.3 Statistics involve collecting, organizing, analyzing, interpreting,
and presenting data. A statistic is a summary measure of data. You are undoubtedly
familiar with the concept of statistics in daily life as reported in newspapers and the
media; baseball batting averages, airline on‐time arrival performance, and economic
statistics such as Consumer Price Index are just a few examples. We can easily google
statistical information about investments and financial markets, college loans and home
mortgage rates, survey results about national political issues, team and individual
1
Jerry Useem, “Boeing versus Boeing,” Fortune, October 2, 2000,
2
“Flanking in a Price War,” Interfaces, Vol. 19, No. 2, 1989, 1–12.
3
148–160.
David Hand, “Statistics: An Overview,” in Miodrag Lovric, Ed., International Encyclopedia of Statistical Science,
Springer Major Reference; http://www.springer.com/statistics/book/978-3-642-04897-5, p. 1504.
5
6
Part I • Statistics and Data Analysis
sports performance, and well, just about anything. To paraphrase Apple, “There’s a stat
for that!” Modern spreadsheet technology, such as Microsoft Excel, has made it quite
easy to apply statistical tools to organize, analyze, and present data to make them more
understandable.
Most organizations have traditionally focused on financial and market information, such as profit, sales volume, and market share. Today, however, many organizations use a wide variety of measures that provide a comprehensive view of business
performance. For example, the Malcolm Baldrige Award Criteria for Performance
Excellence, which many organizations use as a high‐performance management framework, suggest that high‐performing organizations need to measure results in five basic
categories:
1. Product and process outcomes, such as reliability, performance, defect levels, service
errors, response times, productivity, production flexibility, setup times, time to
market, waste stream reductions, innovation, emergency preparedness, strategic
plan accomplishment, and supply chain effectiveness.
2. Customer‐focused outcomes, such as customer satisfaction and dissatisfaction, customer retention, complaints and complaint resolution, customer perceived value,
and gains and losses of customers.
3. Workforce‐focused outcomes, such as workforce engagement and satisfaction,
employee retention, absenteeism, turnover, safety, training effectiveness, and leadership development.
4. Leadership and governance outcomes, such as communication effectiveness, governance and accountability, environmental and regulatory compliance, ethical
behavior, and organizational citizenship.
5. Financial and market outcomes. Financial outcomes might include revenue, profit
and loss, net assets, cash‐to‐cash cycle time, earnings per share, and financial operations efficiency (collections, billings, receivables). Market outcomes
might include market share, business growth, and new products and service
introductions.
Understanding key relationships among these types of measures can help organizations make better decisions. For example, Sears, Roebuck and Company provided
a consulting group with 13 financial measures, hundreds of thousands of employee
satisfaction data points, and millions of data points on customer satisfaction. Using
advanced statistical tools, the analysts discovered that employee attitudes about the
job and the company are key factors that predict their behavior with customers, which,
in turn, predicts the likelihood of customer retention and recommendations, which, in
turn, predict financial performance. As a result, Sears was able to predict that if a store
increases its employee satisfaction score by five units, customer satisfaction scores will
go up by two units and revenue growth will beat the stores’ national average by 0.5%.4
Such an analysis can help managers make decisions, for instance, on improved human
resource policies.
SOURCES AND TYPES OF DATA
Data may come from a variety of sources: internal record‐keeping, special studies, and
external databases. Internal data are routinely collected by accounting, marketing, and
operations functions of a business. These might include production output, material costs,
sales, accounts receivable, and customer demographics. Other data must be generated
through special efforts. For example, customer satisfaction data are often acquired by mail,
4
“Bringing Sears into the New World,” Fortune, October 13, 1997, 183–184.
Chapter 1 • Data and Business Decisions
Internet, or telephone surveys; personal interviews; or focus groups. External databases are
often used for comparative purposes, marketing projects, and economic analyses. These
might include population trends, interest rates, industry performance, consumer spending, and international trade data. Such data can be found in annual reports, Standard &
Poor’s Compustat data sets, industry trade associations, or government databases.
One example of a comprehensive government database is FedStats (www
.fedstats.gov), which has been available to the public since 1997. FedStats provides access
to the full range of official statistical information produced by the Federal Government
without having to know in advance which Federal agency produces which particular
statistic. With convenient searching and linking capabilities to more than 100 agencies—
which provide data and trend information on such topics as economic and population
trends, crime, education, health care, aviation safety, energy use, farm production, and
more—FedStats provides one location for access to the full breadth of Federal statistical
information.
The use of data for analysis and decision making certainly is not limited to business. Science, engineering, medicine, and sports, to name just a few, are examples of professions that rely heavily on data. Table 1.1 provides a list of data files that are available
in the Statistics Data Files folder on the Companion Website accompanying this book. All
are saved in Microsoft Excel workbooks. These data files will be used throughout this
book to illustrate various issues associated with statistics and data analysis and also for
many of the questions and problems at the end of the chapters. They show but a sample
of the wide variety of applications for which statistics and data analysis techniques may
be used.
Metrics and Data Classification
A metric is a unit of measurement that provides a way to objectively quantify performance. For example, senior managers might assess overall business performance
using such metrics as net profit, return on investment, market share, and customer
satisfaction. A supervisor in a manufacturing plant might monitor the quality of a production process for a polished faucet by visually inspecting the products and counting
the number of surface defects. A useful metric would be the percentage of faucets that
have surface defects. For a web‐based retailer, some useful metrics are the percentage
of orders filled accurately and the time taken to fill a customer’s order. Measurement
is the act of obtaining data associated with a metric. Measures are numerical values
associated with a metric.
Metrics can be either discrete or continuous. A discrete metric is one that is derived
from counting something. For example, a part dimension is either within tolerance or
out of tolerance; an order is complete or incomplete; or an invoice can have one, two,
three, or any number of errors. Some discrete metrics associated with these examples
would be the proportion of parts whose dimensions are within tolerance, the number
of incomplete orders for each day, and the number of errors per invoice. Continuous
metrics are based on a continuous scale of measurement. Any metrics involving dollars,
length, time, volume, or weight, for example, are continuous.
A key performance dimension might be measured using either a continuous or a
discrete metric. For example, an airline flight is considered on time if it arrives no later
than 15 minutes from the scheduled arrival time. We could evaluate on‐time performance
by counting the number of flights that are late, or by measuring the number of minutes
that flights are late. Discrete data are usually easier to capture and record, but provide less
information than continuous data. However, one generally must collect a larger amount of
discrete data to draw appropriate statistical conclusions as compared to continuous data.
7
8
Part I • Statistics and Data Analysis
TABLE 1.1
Data Files Available on Companion Website
Business and Economics
Accounting Professionals
Atlanta Airline Data
Automobile Quality
Baldrige
Banking Data
Beverage Sales
Call Center Data
Cell Phone Survey
Cereal Data
China Trade Data
Closing Stock Prices
Coal Consumption
Coal Production
Concert Sales
Consumer Price Index
Consumer Transportation Survey
Credit Approval Decisions
Customer Support Survey
Customer Survey
DJIA December Close
EEO Employment Report
Employees Salaries
Energy Production & Consumption
Federal Funds Rate
Gas & Electric
Gasoline Prices
Gasoline Sales
Google Stock Prices
Hatco
Hi‐Definition Televisions
Home Market Value
House Sales
Housing Starts
Insurance Survey
Internet Usage
Microprocessor Data
Mortgage Rates
New Account Processing
New Car Sales
Nuclear Power
Prime Rate
Quality Control Case Data
Quality Measurements
Refrigerators
Residential Electricity Data
Restaurant Sales
Retail Electricity Prices
Retirement Portfolio
Room Inspection
S&P 500
Salary Data
Sales Data
Sampling Error Experiment
Science and Engineering Jobs
State Unemployment Rates
Statistical Quality Control Problems
Surgery Infections
Syringe Samples
Treasury Yield Rates
Unions and Labor Law Data
University Grant Proposals
Behavioral and Social Sciences
Arizona Population
Blood Pressure
Burglaries
California Census Data
Census Education Data
Church Contributions
Colleges and Universities
Death Cause Statistics
Demographics
Facebook Survey
Freshman College Data
Graduate School Survey
Infant Mortality
MBA Student Survey
Ohio Education Performance
Ohio Prison Population
Self‐Esteem
Smoking & Cancer
Student Grades
Vacation Survey
Science and Engineering
Pile Foundation
Seattle Weather
Surface Finish
Washington, DC, Weather
Sports
Baseball Attendance
Golfing Statistics
Major League Baseball
NASCAR Track Data
National Football League
Olympic Track and Field Data
Chapter 1 • Data and Business Decisions
FIGURE 1.1 Example of Cross‐Sectional, Univariate Data
(Portion of Automobile Quality)
When we deal with data, it is important to understand the type of data in order
to select the appropriate statistical tool or procedure. One classification of data is the
following:
1. Types of data
• Cross‐sectional—data that are collected over a single period of time
• Time series—data collected over time
2. Number of variables
• Univariate—data consisting of a single variable
• Multivariate—data consisting of two or more (often related) variables
Figures 1.1–1.4 show examples of data sets from Table 1.1 representing each combination from this classification.
Another classification of data is by the type of measurement scale. Failure to
understand the differences in measurement scales can easily result in erroneous or misleading analysis. Data may be classified into four groups:
1. Categorical (nominal) data, which are sorted into categories according to
specified characteristics. For example, a firm’s customers might be classified by
their geographical region (North America, South America, Europe, and Pacific);
employees might be classified as managers, supervisors, and associates. The categories bear no quantitative relationship to one another, but we usually assign an
arbitrary number to each category to ease the process of managing the data and
computing statistics. Categorical data are usually counted or expressed as proportions or percentages.
FIGURE 1.2 Example of Cross‐Sectional, Multivariate Data
(Portion of Banking Data)
9
10
Part I • Statistics and Data Analysis
FIGURE 1.3 Example of Time‐Series, Univariate Data
(Portion of Gasoline Prices)
2. Ordinal data, which are ordered or ranked according to some relationship to
one another. A common example in business is data from survey scales; for example,
rating a service as poor, average, good, very good, or excellent. Such data are categorical but also have a natural order, and consequently, are ordinal. Other examples
include ranking regions according to sales levels each month and NCAA basketball
rankings. Ordinal data are more meaningful than categorical data because data can
be compared to one another (“excellent” is better than “very good”). However, like
categorical data, statistics such as averages are meaningless even if numerical codes
are associated with each category (such as your class rank), because ordinal data have
no fixed units of measurement. In addition, meaningful numerical statements about
differences between categories cannot be made. For example, the difference in strength
between basketball teams ranked 1 and 2 is not necessarily the same as the difference
between those ranked 2 and 3.
3. Interval data, which are ordered, have a specified measure of the distance
between observations but have no natural zero. Common examples are time and temperature. Time is relative to global location, and calendars have arbitrary starting dates. Both
the Fahrenheit and Celsius scales represent a specified measure of distance—degrees—
but have no natural zero. Thus we cannot take meaningful ratios; for example, we cannot
say that 50° is twice as hot as 25°. Another example is SAT or GMAT scores. The scores
can be used to rank students, but only differences between scores provide information
on how much better one student performed over another; ratios make little sense. In
contrast to ordinal data, interval data allow meaningful comparison of ranges, averages,
and other statistics.
In business, data from survey scales, while technically ordinal, are often treated
as interval data when numerical scales are associated with the categories (for instance,
FIGURE 1.4
Example of Time‐Series, Multivariate Data (Portion of Treasury Yield Rates)
Chapter 1 • Data and Business Decisions
1 = poor, 2 = average, 3 = good, 4 = very good, 5 = excellent). Strictly speaking, this
is not correct, as the “distance” between categories may not be perceived as the same
(respondents might perceive a larger distance between poor and average than between
good and very good, for example). Nevertheless, many users of survey data treat them
as interval when analyzing the data, particularly when only a numerical scale is used
without descriptive labels.
4. Ratio data, which have a natural zero. For example, dollar has an absolute zero.
Ratios of dollar figures are meaningful. Thus, knowing that the Seattle region sold $12
million in March while the Tampa region sold $6 million means that Seattle sold twice as
much as Tampa. Most business and economic data fall into this category, and statistical
methods are the most widely applicable to them.
This classification is hierarchical in that each level includes all of the information
content of the one preceding it. For example, ratio information can be converted to any
of the other types of data. Interval information can be converted to ordinal or categorical
data but cannot be converted to ratio data without the knowledge of the absolute zero
point. Thus, a ratio scale is the strongest form of measurement.
The managerial implications of this classification are in understanding the choice and
validity of the statistical measures used. For example, consider the following statements:
• Sales occurred in March (categorical).
• Sales were higher in March than in February (ordinal).
• Sales increased by $50,000 in March over February (interval).
• Sales were 20% higher in March than in February (ratio).
A higher level of measurement is more useful to a manager because more definitive
information describes the data. Obtaining ratio data can be more expensive than categorical data, especially when surveying customers, but it may be needed for proper
analysis. Thus, before data are collected, consideration must be given to the type of data
needed.
STATISTICAL THINKING
The importance of applying statistical concepts to make good business decisions and
improve performance cannot be overemphasized. Statistical thinking is a philosophy
of learning and action for improvement that is based on the following principles:
• All work occurs in a system of interconnected processes.
• Variation exists in all processes.
• Better performance results from understanding and reducing variation.5
Work gets done in any organization through processes—systematic ways of doing
things that achieve desired results. Understanding processes provides the context for
determining the effects of variation and the proper type of action to be taken. Any process contains many sources of variation. In manufacturing, for example, different batches
of material vary in strength, thickness, or moisture content. Cutting tools have inherent variation in their strength and composition. During manufacturing, tools experience
wear, vibrations cause changes in machine settings, and electrical fluctuations cause variations in power. Workers may not position parts on fixtures consistently, and physical
and emotional stress may affect workers’ consistency. In addition, measurement gauges
and human inspection capabilities are not uniform, resulting in variation in measurements even when there is little variation in the true value. Similar phenomena occur in
5
Galen Britz, Don Emerling, Lynne Hare, Roger Hoerl, and Janice Shade, “How to Teach Others to Apply
Statistical Thinking,” Quality Progress, June 1997, 67–79.
11
12
Part I • Statistics and Data Analysis
service processes because of variation in employee and customer behavior, application
of technology, and so on.
While variation exists everywhere, many managers do not often recognize it or
consider it in their decisions. For example, if sales in some region fell from the previous
year, the regional manager might quickly blame her sales staff for not working hard,
even though the drop in sales may simply be the result of uncontrollable variation. How
often do managers make decisions based on one or two data points without looking at
the pattern of variation, see trends when they do not exist, or try to manipulate financial
results they cannot truly control? Unfortunately, the answer is “quite often.” Usually,
it is simply a matter of ignorance of how to deal with data and information. A more
educated approach would be to formulate a theory, test this theory in some way, either
by collecting and analyzing data or developing a model of the situation. Using statistical
thinking can provide better insight into the facts and nature of relationships among
the many factors that may have contributed to the event and enable managers to make
better decisions.
In recent years, many organizations have implemented Six Sigma initiatives. Six
Sigma can be best described as a business process improvement approach that seeks to
find and eliminate causes of defects and errors, reduce cycle times and cost of operations, improve productivity, better meet customer expectations, and achieve higher
asset use and returns on investment in manufacturing and service processes. The term
six sigma is actually based on a statistical measure that equates to 3.4 or fewer errors
or defects per million opportunities. Six Sigma is based on a simple problem‐solving
methodology—DMAIC, which stands for Define, Measure, Analyze, Improve, and
Control—that incorporates a wide variety of statistical and other types of process
improvement tools. Six Sigma has heightened the awareness and application of statistics among business professionals at all levels in organizations, and the material in this
book will provide the foundation for more advanced topics commonly found in Six
Sigma training courses.
Populations and Samples
One of the most basic applications of statistics is drawing conclusions about populations from sample data. A population consists of all items of interest for a particular
decision or investigation, for example, all married drivers over the age of 25 in the
United States, all first‐year MBA students at a college, or all stockholders of Google. It
is important to understand that a population can be anything we define it to be, such
as all customers who have purchased from Amazon over the past year or individuals who do not own a cell phone. A company like Amazon keeps extensive records
on its customers, making it easy to retrieve data about the entire population of customers with prior purchases. However, it would probably be impossible to identify
all individuals who do not own cell phones. A population may also be an existing
collection of items (for instance, all teams in the National Football League) or the
potential, but unknown, output of a process (such as automobile engines produced
on an assembly line).
A sample is a subset of a population. For example, a list of individuals who purchased a CD from Amazon in the past year would be a sample from the population of
all customers who purchased from the company. Whether this sample is representative
of the population of customers—which depends on how the sample data are intended
to be used—may be debatable; nevertheless, it is a sample. Sampling is desirable when
complete information about a population is difficult or impossible to obtain. For example, it may be too expensive to send all previous customers a survey. In other situations,
such as measuring the amount of stress needed to destroy an automotive tire, samples
are necessary even though the entire population may be sitting in a warehouse. Most of
Chapter 1 • Data and Business Decisions
the data files in Table 1.1 represent samples, although some, like the major league baseball data, represent populations.
We use samples because it is often not possible or cost‐effective to gather population
data. We are all familiar with survey samples of voters prior to and during elections. A
small subset of potential voters, if properly chosen on a statistical basis, can provide
accurate estimates of the behavior of the voting population. Thus, television network
anchors can announce the winners of elections based on a small percentage of voters
before all votes can be counted. Samples are routinely used for business and public
opinion polls—magazines such as Business Week and Fortune often report the results
of surveys of executive opinions on the economy and other issues. Many businesses
rely heavily on sampling. Producers of consumer products conduct small‐scale market
research surveys to evaluate consumer response to new products before full‐scale production, and auditors use sampling as an important part of audit procedures. In 2000,
the U.S. Census began using statistical sampling for estimating population characteristics, which resulted in considerable controversy and debate.
Statistics are summary measures of population characteristics computed from
samples. In business, statistical methods are used to present data in a concise and
understandable fashion, to estimate population characteristics, to draw conclusions
about populations from sample data, and to develop useful decision models for prediction and forecasting. For example, in the 2010 J.D. Power and Associates’ Initial Quality
Study, Porsche led the industry with a reported 83 problems per 100 vehicles. The number 83 is a statistic based on a sample that summarizes the total number of problems
reported per 100 vehicles and suggests that the entire population of Porsche owners
averaged less than one problem (83/100 or 0.83) in their first 90 days of ownership.
However, a particular automobile owner may have experienced zero, one, two, or perhaps more problems.
The process of collection, organization, and description of data is commonly called
descriptive statistics. Statistical inference refers to the process of drawing conclusions
about unknown characteristics of a population based on sample data. Finally, predictive
statistics—developing predictions of future values based on historical data—is the third
major component of statistical methodology. In subsequent chapters, we will cover each
of these types of statistical methodology.
USING MICROSOFT EXCEL
Spreadsheet software for personal computers has become an indispensable tool for
business analysis, particularly for the manipulation of numerical data and the development and analysis of decision models. In this text, we will use Microsoft Excel 2010 for
Windows to perform spreadsheet calculations and analyses. Some key differences exist
between Excel 2010 and Excel 2007. We will often contrast these differences, but if you
use an older version, you should be able to apply Excel easily to problems and exercises.
In addition, we note that Mac versions of Excel do not have the full functionality that
Windows versions have.
Although Excel has some flaws and limitations from a statistical perspective, its
widespread availability makes it the software of choice for many business professionals.
We do wish to point out, however, that better and more powerful statistical software
packages are available, and serious users of statistics should consult a professional statistician for advice on selecting the proper software.
We will briefly review some of the fundamental skills needed to use Excel for
this book. This is not meant to be a complete tutorial; many good Excel tutorials can be
found online, and we also encourage you to use the Excel help capability (by clicking
the question mark button at the top right of the screen).
13
14
Part I • Statistics and Data Analysis
Basic Excel Skills
To be able to apply the procedures and techniques we will study in this book, it is necessary for you to know many of the basic capabilities of Excel. We will assume that you are
familiar with the most elementary spreadsheet concepts and procedures:
• Opening, saving, and printing files
• Moving around a spreadsheet
• Selecting ranges
• Inserting/deleting rows and columns
• Entering and editing text, numerical data, and formulas
• Formatting data (number, currency, decimal places, etc.)
• Working with text strings
• Performing basic arithmetic calculations
• Formatting data and text
• Modifying the appearance of the spreadsheet
• Sorting data
Excel has extensive online help, and many good manuals and training guides
are available both in print and online, and we urge you to take advantage of these.
However, to facilitate your understanding and ability, we will review some of the more
important topics in Excel with which you may or may not be familiar. Other tools and
procedures in Excel that are useful in statistics, data analysis, or decision modeling will
be introduced as we need them.
SKILL‐BUILDER EXERCISE 1.1
Sort the data in the Excel file Automobile Quality from lowest to highest number of problems per
100 vehicles using the sort capability in Excel.
Menus and commands in Excel 2010 reside in the “ribbon” shown in Figure 1.5.
Menus and commands are arranged in logical groups under different tabs (File, Home, Insert,
and so on); small triangles pointing downward indicate menus of additional choices. We
will often refer to certain commands or options and where they may be found in the ribbon.
Copying Formulas and Cell References
Excel provides several ways of copying formulas to different cells. This is extremely
useful in building decision models, because many models require replication of formulas for different periods of time, similar products, and so on. One way is to select the
cell with the formula to be copied, click the Copy button from the Clipboard group under
the Home tab (or simply press Ctrl‐C on your keyboard), click on the cell you wish to
FIGURE 1.5
Excel 2010 Ribbon
Chapter 1 • Data and Business Decisions
FIGURE 1.6
Copying Formulas by Dragging
copy to, and then click the Paste button (or press Ctrl‐V). You may also enter a formula
directly in a range of cells without copying and pasting by selecting the range, typing in
the formula, and pressing Ctrl‐Enter.
To copy a formula from a single cell or range of cells down a column or across
a row, first select the cell or range, then click and hold the mouse on the small square
in the lower right‐hand corner of the cell (the “fill handle”), and drag the formula to
the “target” cells you wish to copy to. To illustrate this technique, suppose we wish to
compute the differences in projected employment for each occupation in the Excel file
Science and Engineering Jobs. In Figure 1.6, we have added a column for the difference
and entered the formula = C10‐B10 in the first row. Highlight cell D4 and then simply
drag the handle down the column. Figure 1.7 shows the results.
SKILL‐BUILDER EXERCISE 1.2
Modify the Excel file Science and Engineering Jobs to compute the percent increase in the number
of jobs for each occupational category.
In any of these procedures, the structure of the formula is the same as in the original
cell, but the cell references have been changed to reflect the relative addresses of the formula in the new cells. That is, the new cell references have the same relative relationship
to the new formula cell(s) as they did in the original formula cell. Thus, if a formula is
copied (or moved) one cell to the right, the relative cell addresses will have their column label increased by one; if we copy or move the formula two cells down, the row
FIGURE 1.7
Results of Dragging Formulas
15
16
Part I • Statistics and Data Analysis
FIGURE 1.8
Formulas for Science and Engineering Jobs Worksheet
number is increased by 2. Figure 1.8 shows the formulas for the Science and Engineering
Jobs spreadsheet example. For example, note that the formulas in each row are the same,
except for the column reference.
Sometimes, however, you do not want to change the relative addressing because
you would like all the copied formulas to point to a certain cell. We do this by using a $
before the column and/or row address of the cell. This is called an absolute address. For
example, suppose we wish to compute the percent of the total for each occupation for
2010. In cell E4, enter the formula = C4/$C$12. Then, if we copy this formula down
column E for other months, the numerator will change to reference each occupation, but
the denominator will still point to cell C12 (see Figure 1.9). You should be very careful to
use relative and absolute addressing appropriately in your models.
Functions
Functions are used to perform special calculations in cells. Some of the more common
functions that we will use in statistical applications include the following:
MIN(range)—finds the smallest value in a range of cells
MAX(range)—finds the largest value in a range of cells
SUM(range)—finds the sum of values in a range of cells
AVERAGE(range)—finds the average of the values in a range of cells
COUNT(range)—finds the number of cells in a range that contain numbers
COUNTIF(range, criteria)—finds the number of cells within a range that meet
specified criteria
Other more advanced functions often used in decision models are listed below:
AND(condition 1, condition 2…)—a logical function that returns TRUE if all conditions are true, and FALSE if not
FIGURE 1.9
Example of Absolute Address Referencing
Chapter 1 • Data and Business Decisions
OR(condition 1, condition 2…)—a logical function that returns TRUE if any condition
is true, and FALSE if not
IF(condition, value if true, value if false)—a logical function that returns one value if
the condition is true and another if the condition is false
VLOOKUP(value, table range, column number)—looks up a value in a table
Excel has a wide variety of other functions for statistical, financial, and other applications, many of which we will use throughout the text. The easiest way to locate a
particular function is to select a cell and click on the Insert function button [ fx], which
can be found under the ribbon next to the formula bar and also in the Function Library
group in the Formulas tab. This is particularly useful even if you know what function to
use but you are not sure of what arguments to enter. Figure 1.10 shows the dialog box
from which you may select the function you wish to use, in this case, the AVERAGE
function. Once this is selected, the dialog box in Figure 1.11 appears. When you click in
an input cell, a description of the argument is shown. Thus, if you were not sure what
to enter for the argument number 1, the explanation in Figure 1.11 will help you. For
further information, you could click on the Help on this function link button in the lower
left‐hand corner.
The IF function, IF(condition, value if true, value if false), allows you to choose one
of two values to enter into a cell. If the specified condition is true, value A will be put in
the cell. If the condition is false, value B will be entered. For example, if cell C2 contains
the function = IF(A8 = 2,7,12), it states that if the value in cell A8 is 2, the number 7 will
be assigned to cell C2; if the value in cell A8 is not 2, the number 12 will be assigned to
cell C2. “Conditions” may include the following:
=
7
6
equal to
greater than
less than
FIGURE 1.10
Insert Function Dialog
17
18
Part I • Statistics and Data Analysis
FIGURE 1.11
7=
6=
Function Arguments Dialog for Average
greater than or equal to
less than or equal to
not equal to
You may “nest” up to seven IF functions by replacing value‐if‐true or value‐if‐false
in an IF function with another IF function, for example:
= IF(A8 = 2,(IF(B3 = 5, “YES”,””)),15)
This says that if cell A8 equals 2, then check the contents of cell B3. If cell B3 is 5, then the
value of the function is the text string YES; if not, it is a blank space (a text string that is
blank). However, if cell A8 is not 2, then the value of the function is 15 no matter what
cell B3 is. You may use AND and OR functions as the condition within an IF function, for
example: = IF(AND(B1 = 3,C1 = 5),12,22). Here, if cell B1 = 3 and cell C1 = 5, then the
value of the function is 12, otherwise it is 22.
SKILL‐BUILDER EXERCISE 1.3
In the Excel file Residential Electricity Data, use Excel functions to find the maximum, minimum, and
total for the Number of Consumers and Average Monthly Consumption for all census divisions.
Other Useful Excel Tips
• Split Screen. You may split the worksheet horizontally and/or vertically to view
different parts of the worksheet at the same time. The vertical splitter bar is just to
the right of the bottom scroll bar, and the horizontal splitter bar is just above the
right‐hand scroll bar. Position your cursor over one of these until it changes shape,
click, and drag the splitter bar to the left or down.
• Paste Special. When you normally copy (one or more) cells and paste them in a
worksheet, Excel places an exact copy of the formulas or data in the cells (except
for relative addressing). Often you simply want the result of formulas, so the data
will remain constant even if other parameters used in the formulas change. To
do this, use the Paste Special option found within the Paste menu in the Clipboard
Chapter 1 • Data and Business Decisions
group under the Home tab instead of the Paste command. Choosing Paste Values
will paste the result of the formulas from which the data were calculated.
• Column and Row Widths. Many times a cell contains a number that is too large
to display properly because the column width is too small. You may change the
column width to fit the largest value or text string anywhere in the column by
positioning the cursor to the right of the column label so that it changes to a cross
with horizontal arrows, and then double‐click. You may also move the arrow to
the left or right to manually change the column width. You may change the row
heights in a similar fashion by moving the cursor below the row number label.
This can be especially useful if you have a very long formula to display. To break a
formula within a cell, position the cursor at the break point in the formula bar and
press Alt‐Enter.
• Displaying Formulas in Worksheets. Choose Show Formulas in the Formula Auditing
group under the Formulas tab. You will probably need to change the column width
to display the formulas properly.
• Displaying Grid Lines and Row and Column Headers for Printing. Check the Print
boxes for gridlines and headings in the Sheet Options group under the Page Layout
tab. Note that the Print command can be found by clicking on the Office button.
• Filling a Range with a Series of Numbers. Suppose you want to build a worksheet
for entering 100 data values. It would be tedious to have to enter the numbers from
1 to 100 one at a time. Simply fill in the first few values in the series and highlight
them. Now click and drag the small square (fill handle) in the lower right‐hand
corner down (Excel will show a small pop‐up window that tells you the last value
in the range) until you have filled in the column to 100; then release the mouse.
Excel Add‐Ins
Microsoft Excel will provide most of the computational support required for the
material in this book. Excel (Windows only) provides an add‐in called the Analysis
Toolpak, which contains a variety of tools for statistical computation, and Solver,
which is used for optimization. These add‐ins are not included in a standard Excel
installation. To install them in Excel 2010, click the File tab and then Options in the
left column. Choose Add‐Ins from the left column. At the bottom of the dialog, make
sure Excel Add‐ins is selected in the Manage: box and click Go. In the Add‐Ins dialog,
if Analysis Toolpak, Analysis Toolpak VBA, and Solver Add‐in are not checked, simply
check the boxes and click OK. You will not have to repeat this procedure every time
you run Excel in the future.
Four other add‐ins available with this book provide additional capabilities and features not found in Excel and will be used in various chapters in this book. Prentice‐Hall’s
PHStat2 (which we will simply refer to as PHStat) add‐in provides useful statistical support that extends the capabilities of Excel.6 Refer to the installation procedures on the
Companion Website. PHStat will be used in Chapters 1–8 and in Chapter 11. The student
version of Crystal Ball provides a comprehensive set of tools for performing risk analysis simulations. Crystal Ball will be used in Chapter 10. TreePlan provides Excel support
for decision trees and will be used in Chapter 11. Finally, Frontline Systems’ Risk Solver
Platform7 provides a replacement (called Premium Solver) for the default Solver in Excel
and will be used in Chapters 13 and 14. The Companion Website also includes an Excel
workbook, SimQuick‐v2.xls, which will be used for process simulation in Chapter 12.
6
The latest version of PHStat, PHStat2, is included on the Companion Website. Enhanced versions and
updates may be published on the PHStat Web site at www.prenhall.com/phstat. To date, PHStat is not available for Mac.
7
Risk Solver Platform is a full‐featured package that contains many other tools similar to other add‐ins we use
in this book. However, we will use only the Premium Solver component.
19
20
Part I • Statistics and Data Analysis
FIGURE 1.12
Spreadsheet Note
Excel Worksheet Process Capability
Throughout this book, we will provide many notes that describe how to use specific features of Microsoft Excel, PHStat, or other add‐ins. These are summarized in
chapter appendixes and are noted in the text by a margin icon when they will be useful
to supplement examples and discussions of applications. It is important to read these
notes and apply the procedures described in them in order to gain a working knowledge of the software features to which they refer.
We will illustrate the use of one of the PHStat procedures. In many cases, data on
Excel worksheets may not be in the proper form to use a statistical tool. Figure 1.12, for
instance, shows the worksheet Process Capability from the Excel file Quality Measurements,
which we use for a case problem later in this book. Some tools in the Analysis Toolpak
require that the data be organized in a single column in the worksheet. As a user, you
have two choices. You can manually move the data within the worksheet, or you can use
a utility from the Data Preparation menu in PHStat called Stack Data (see the note Using
the Stack Data and Unstack Data Tools in the Appendix to this chapter).
The tool creates a new worksheet called “Stacked” in your Excel workbook, a portion of which is shown in Figure 1.13. If the original data columns have group labels
(headers), then the column labeled “Group” will show them; otherwise, as in this example, the columns are simply labeled as Group1, Group2, and so on. In this example,
Group1 refers to the data in the first column. If you apply the Unstack Data tool to the
data in Figure 1.13, you will put the data in its original form.
SKILL‐BUILDER EXERCISE 1.4
Use the PHStat Stack tool to stack the sample observations for the first shift in the Excel file Syringe
Samples. Then, modify the Excel file Automobile Quality to label each car brand as either Foreign
or Domestic, use the PHStat Unstack tool to group them.
Chapter 1 • Data and Business Decisions
21
FIGURE 1.13 Portion of Stacked
Worksheet
DISPLAYING DATA WITH EXCEL CHARTS
The Excel file EEO Employment Report provides data on the employment in the state
of Alabama for 2006. Figure 1.14 shows a portion of this data set. Raw data such as
these are often difficult to understand and interpret. Graphs and charts provide a
convenient way to visualize data and provide information and insight for making
better decisions.
Microsoft Excel provides an easy way to create charts within your spreadsheet
(see the section on Creating Charts in Excel in Appendix 1.1). These include vertical and horizontal bar charts, line charts, pie charts, area charts, scatter plots, three‐
dimensional charts, and many other special types of charts. We generally will not
guide you through every application but will provide some guidance for new procedures as appropriate.
Spreadsheet Note
Column and Bar Charts
Excel distinguishes between vertical and horizontal bar charts, calling the former column
charts and the latter bar charts. A clustered column chart compares values across categories
using vertical rectangles; a stacked column chart displays the contribution of each value
to the total by stacking the rectangles; and a 100% stacked column chart compares the percentage that each value contributes to a total. An example of a clustered column chart is
FIGURE 1.14 Portion
of EEO Commission
Employment Report
22
Part I • Statistics and Data Analysis
shown in Figure 1.15 for the Alabama employment data shown previously; Figure 1.16
shows a stacked column chart for the same data. Bar charts present information in a
similar fashion, only horizontally instead of vertically.
SKILL‐BUILDER EXERCISE 1.5
Create the column chart shown in Figure 1.15 for the EEO Employment Report data.
FIGURE 1.15 Column
Chart for Alabama
Employment Data
FIGURE 1.16 Stacked
Column Chart
Chapter 1 • Data and Business Decisions
FIGURE 1.17
Line Chart for U.S. to China Exports
Line Charts
Line charts provide a useful means for displaying data over time. For instance, a line
chart showing the amount of U.S. exports to China in billions of dollars from the Excel
file China Trade Data is shown in Figure 1.17. The chart clearly shows a significant rise
in exports starting in the year 2000, which began to level off around 2008. You may
plot multiple data series in line charts; however, they can be difficult to interpret if the
magnitude of the data values differs greatly. In this case, it would be advisable to create
separate charts for each data series.
SKILL‐BUILDER EXERCISE 1.6
Create line charts for the closing prices in the Excel file S&P 500.
Pie Charts
For many types of data, we are interested in understanding the relative proportion of
each data source to the total. For example, consider the marital status of individuals in
the U.S. population in the Excel file Census Education Data, a portion of which is shown
in Figure 1.18. To show the relative proportion in each category, we can use a pie chart,
as shown in Figure 1.19. This chart uses a layout option that shows the labels associated
with the data, but not the actual values or proportions. A different layout that shows
both can also be chosen.
SKILL‐BUILDER EXERCISE 1.7
Create a pie chart showing the breakdown of occupations in the Science and Engineering Jobs
Excel file.
23
24
Part I • Statistics and Data Analysis
FIGURE 1.18
Data
FIGURE 1.19
Portion of Census Education
Pie Chart for Marital Status
Area Charts
An area chart combines the features of a pie chart with those of line charts. For example,
Figure 1.20 displays total energy consumption (billion Btu) and consumption of fossil fuels
from the Excel file Energy Production & Consumption. This chart shows that while total
energy consumption has grown since 1949, the relative proportion of fossil fuel consumption has remained generally consistent at about half of the total, indicating that alternative
energy sources have not replaced a significant portion of fossil fuel consumption. Area
charts present more information than pie or line charts alone but may clutter the observer’s
mind with too many details if too many data series are used; thus, they should be used
with care.
Scatter Diagrams
Scatter diagrams show the relationship between two variables. Figure 1.21 shows a scatter diagram of house size (in square feet) versus the home market value from the Excel
file Home Market Value. The data show that higher market values are associated with
larger homes. In Chapter 2 we shall see how to describe such a relationship numerically.
SKILL‐BUILDER EXERCISE 1.8
Create a scatter diagram showing the relationship between Hours online/week and Log‐ins/day in the
Facebook Survey data.
Chapter 1 • Data and Business Decisions
FIGURE 1.20
Area Chart for Energy Consumption
FIGURE 1.21
Scatter Diagram of House Size Versus Market Value
Miscellaneous Excel Charts
Excel provides several additional charts for special applications (see Figure 1.22). A
stock chart allows you to plot stock prices, such as the daily high, low, and close. It may
also be used for scientific data such as temperature changes. A surface chart shows
three‐dimensional data. A doughnut chart is similar to a pie chart but can contain more
than one data series. A bubble chart is a type of scatter chart in which the size of the
data marker corresponds to the value of a third variable; consequently, it is a way to
plot three variables in two dimensions. Finally, a radar chart allows you to plot multiple
dimensions of several data series.
Ethics and Data Presentation
In summary, tables of numbers often hide more than they inform. Graphical displays
clearly make it easier to gain insights about the data. Thus, graphs and charts are a
means of converting raw data into useful managerial information. However, it can be
easy to distort data by manipulating the scale on the chart. For example, Figure 1.23
shows the U.S. exports to China in Figure 1.17 displayed on a different scale. The pattern
looks much flatter and suggests that the rate of exports is not increasing as fast as it
25
26
Part I • Statistics and Data Analysis
FIGURE 1.22 Other Excel Charts
really is. It is not unusual to see distorted graphs in newspapers and magazines that
are intended to support the author’s conclusions. Creators of statistical displays have
an ethical obligation to report data honestly and without attempts to distort the truth.
SKILL‐BUILDER EXERCISE 1.9
Create a bubble chart for the first five colleges in the Excel file Colleges and Universities for which
the x‐axis is the Top 10% HS, y‐axis is Acceptance Rate, and bubbles represent the Expenditures
per Student.
FIGURE 1.23
An Alternate View of U.S. Exports to China
Chapter 1 • Data and Business Decisions
27
Basic Concepts Review Questions
1. Explain the importance of statistics in business.
2. Explain the difference between data and information.
3. Describe some ways in which data are used in different
business functions.
4. Explain how a company might use internal sources of
data, special studies, and external data bases.
5. What is a metric, and how does it differ from a measure?
6. Explain the difference between a discrete and a continuous metric.
7. Explain the differences between categorical, ordinal,
interval, and ratio data.
8. Explain the difference between cross‐sectional and
time‐series data.
9. What is statistical thinking? Why is it an important
managerial skill?
10. What is the difference between a population and a sample?
11. List the different types of charts available in Excel, and
explain characteristics of data sets that make each chart
most appropriate to use.
12. What types of chart would be best for displaying the
data in each of the following data sets on the Companion
Website? If several charts are appropriate, state this, but
justify your best choice.
a.
b.
c.
d.
e.
f.
Mortgage Rates
Census Education Data
Consumer Transportation Survey
MBA Student Survey
Vacation Survey
Washington, DC, Weather
Problems and Applications
1. For the Excel file Credit Approval Decisions, identify each
of the variables as categorical, ordinal, interval, and
ratio.
2. A survey handed out to individuals at a major shopping mall in a small Florida city in July asked the
following:
• Gender
• Age
• Ethnicity
• Length of residency
• Overall satisfaction with city services (using a
scale of 1–5 going from Poor to Excellent)
• Quality of schools (using a scale of 1–5 going
from Poor to Excellent)
a. What is the population that the city would want to
survey?
b. Would this sample be representative of the
population?
c. What types of data would each of the survey items
represent?
3. Construct a column chart for the data in the Excel file
State Unemployment Rates to allow comparison of the
June rate with the historical highs and lows. Would any
other charts be better to visually convey this information? Why or why not?
4. Data from the 2000 U.S. Census show the following
distribution of ages for residents of Ohio:
Total Households
4,445,773
Family households (families)
With own children under 18 years
Married‐couple family
With own children under 18 years
Female householder, no husband present
With own children under 18 years
Nonfamily households
Householder living alone
Householder 65 years and over
2,993,023
1,409,912
2,285,798
996,042
536,878
323,095
1,452,750
1,215,614
446,396
a. Construct a column chart to visually represent these
data.
b. Construct a stacked bar chart to display the sub
categories where relevant. (Note that you will have
to compute additional subcategories, for instance,
under Family households, the number of families
without children under 18, so that the total of the
subcategories equals the major category total. The
sum of all categories does not equal the total.)
c. Construct a pie chart showing the proportion of
households in each category.
5. The Excel file Energy Production & Consumption provides
various energy data since 1949.
a. Construct an area chart showing the fossil fuel production as a proportion of total energy production.
b. Construct line charts for each of the variables.
28
Part I • Statistics and Data Analysis
c. Construct a line chart showing both the total energy
production and consumption during these years.
d. Construct a scatter diagram for total energy exports
and total energy production.
e. Discuss what information the charts convey.
6. The Excel file Internet Usage provides data about users
of the Internet.
a. Construct appropriate charts that will allow you to
compare any differences due to age or educational
attainment.
b. What conclusions can you draw from these charts?
7. The Excel file Freshman College Data provides data from
different colleges and branch campuses within one university over four years.
a. Construct appropriate charts that allow you to contrast the differences among the colleges and branch
campuses.
b. Write a report to the academic vice president explaining the information.
8. Construct whatever charts you deem appropriate to
convey comparative information on the two categories
of televisions in the Excel file Hi‐Definition Televisions.
What conclusions can you draw from these?
9. Construct whatever charts you deem appropriate to
convey comparative information on deaths by major
causes in the Excel file Death Cause Statistics. What conclusions can you draw from these?
10. Construct an appropriate chart to show the proportion
of funds in each investment category in the Excel file
Retirement Portfolio.
11. Modify the Excel file Major League Baseball to identify
teams that have either a winning or losing record. Use
Excel functions to find the minimum and maximum
values for each type of data and count the number of
teams with winning and losing records.
Case
A Data Collection and Analysis Project
Develop a simple questionnaire to gather data that
include a set of both categorical variables and ratio variables. In developing the questionnaire, think about some
meaningful questions that you would like to address
using the data. The questionnaire should pertain to any
subject of interest to you, for example, customer satisfaction with products or school‐related issues, investments, hobbies, leisure activities, and so on—be creative!
(Several Web sites provide examples of questionnaires
that may help you. You might want to check out www
.samplequestionnaire.com or www.examplequestionnaire
.com for some ideas.) Aim for a total of 6–10 variables.
Obtain a sample of at least 20 responses from fellow
students or coworkers. Record the data on an Excel
worksheet and construct appropriate charts that visually
convey the information you gathered, and draw any conclusions from your data. Then, as you learn new material
in Chapters 2–7, apply the statistical tools as appropriate
to analyze your data and write a comprehensive report
that describes how you drew statistical insights and conclusions, including any relevant Excel output to support
your conclusions. (Hint: a good way to embed portions
of an Excel worksheet into a Word document is to copy it
and then use the Paste Special feature in Word to paste it
as a picture. This allows you to size the picture by dragging a corner.)
APPENDIX 1.1
Excel and PHStat Notes
A. Using the PHStat Stack Data and Unstack
Data Tools
From the PHStat menu, select Data Preparation then either
Stack Data (to create a single column from multiple columns)
or Unstack Data (to split a single column into multiple
according to a grouping label). Figures 1A.1 and 1A.2 show
the dialog boxes that appear. To stack data in columns (with
optional column labels), enter the range of the data in the
Unstacked Data Cell Range. If the first row of the range contains a label, check the box First cells contain group labels.
These labels will appear in the first column of the stacked
data to help you identify the data if appropriate.
Chapter 1 • Data and Business Decisions
FIGURE 1A.1
Stack Data Dialog
To unstack data in a single column and group them
according to a set of labels in another column, enter the
range of the column that contains the labels for the grouping...