Emory University Kinaxis Chooses Sales Reps with Personality Case Study

Write around 85 words for each question.

After completing this week's assigned reading, read the HR in Small Business case titled "Kinaxis Chooses Sales Reps with Personality," which can be found at the end of Ch. 6 (pp. 271-272). Then, answer the questions at the end of the case, specifically:

1) What selection methods did Bob Dolan use for hiring salespeople? Did he go about using these methods in the best order? What, if anything, would you change about the order of the methods used?

2) What were the advantages to Kinaxis of using personality tests to help select sales representatives? What were the disadvantages?

3) Given the information gathered from the selection methods, what process did Dolan use to make his selection decision? What improvements can you recommend to this process for decisions to hire sales reps in the future?


CHAPTER 6 Selection and Placement

LEARNING OBJECTIVES After reading this chapter, you should be able to: LO 6-1 Establish the basic scientific properties of personnel selection methods, including reliability, validity, and generalizability. LO 6-2 Discuss how the particular characteristics of a job, an organization, or an applicant affect the utility of any test. LO 6-3 Describe the government's role in personnel selection decisions, particularly in the areas of constitutional law, federal laws, executive orders, and judicial precedent. LO 6-4 List the common methods used in selecting human resources. LO 6-5 Describe the degree to which each of the common methods used in selecting human resources meets the demands of reliability, validity, generalizability, utility, and legality.

ENTER THE WORLD OF BUSINESS: When Strangers Meet in a World with No Background Checks

Most people growing up were probably told by their parents to "never accept a ride from a stranger." In today's age of ride-sharing apps, that advice probably seems quaint. Still, although quaint, it may also be tragically true. Ride-sharing companies exploded onto the scene only very recently and quickly gained a major competitive advantage over traditional taxi and limousine services for a number of reasons. One of the biggest was their hiring model: unlike traditional driver services, whose employees had to undergo background checks (including fingerprint records validated by state authorities), ride-sharing companies were allowed to skip this step. These checks are expensive and time-consuming, sometimes taking over two weeks. By escaping this part of the regulatory process, companies like Uber were able to hire drivers more cheaply, more quickly, and more flexibly than the competition. As one industry analyst noted, "Taxis and limos are still required to abide by the old, more stringent rules, but with Uber, it's a free-for-all.
It's become the Wild West." There are now questions as to whether this source of competitive advantage is going to be sustainable, however. Recent evidence suggests that Uber was routinely hiring convicted criminals and exposing its customers to risks that they could not possibly imagine. For example, a simple search on the Internet—a technology we can assume Uber is aware of—would have revealed to the company that Talal Chammout should never have been hired. He had been convicted of shooting a person, hitting his wife with a crowbar, hiring a hitman, and even attempting to smuggle in a rocket launcher as part of a terrorist plot. The young woman who stepped into his car one night was blithely unaware of this and was totally unprepared when he followed her into her apartment and sexually assaulted her. Chammout won't be driving for Uber for the next 25 years while he serves his sentence in a federal prison, but regrettably, he is not an isolated case. Investigations into Uber drivers have revealed that the company hired thousands of convicted felons. There have been over 100 cases in just the last four years where Uber drivers were arrested for murder, sexual assault, or first-degree assault. Although the company has policies that bar drivers from carrying firearms in their vehicles, it is impossible for it to enforce that policy. In order to protect themselves from the total strangers they pick up, many Uber drivers are packing heat—in many cases with unregistered firearms. In fact, the only aspect of Uber's selection practices that might be worse than its hiring standards and enforcement is that the company actively lobbied local governments to protect this source of competitive advantage. In many cases, state and city lawmakers would pass bills that were written with language provided by lobbyists for the ride-sharing industry.
As Saika Chen, an attorney who specializes in ride-sharing laws, notes, "Lobbying is nothing new, but this is lobbying on steroids." The sustainability of this form of competitive advantage is now being challenged, however, with lawsuits filed by both harmed passengers and state law enforcement agencies. For example, Uber was recently fined over $25 million by the district attorneys of San Francisco and Los Angeles. Colorado's Public Utility Service fined Uber over $4 million for a "failure to protect public safety." Many other local jurisdictions, like sharks smelling blood in the water, are considering similar actions against what they see as a vulnerable and deep-pocketed potential defendant. Time will tell if Uber can withstand all of this pressure and survive, but the verdict is already in on its CEO, Travis Kalanick. He was forced to leave the company he founded due to the hostile culture he tolerated when it came to employees. Well, at least with respect to his own employees, perhaps he knew who he was dealing with.

SOURCES: C. Devine, N. Black, D. Griffin, and C. Roberts, "Thousands of Criminals Were Cleared to Be Uber Drivers. Here's How the Rideshare Companies Fought Harder Checks," CNN Online, June 1, 2018; R. Ellis and S. Jones, "Uber Driver Held after Fatal Shooting of Passenger in Denver," CNN Online, June 2, 2018; "Uber Embraces Major Reforms as Travis Kalanick, the CEO, Steps Away," The New York Times Online, June 13, 2017.

Introduction

Any organization that intends to compete through people must take the utmost care with how it chooses organizational members. These decisions have a critical impact on the organization's ability to compete, as well as on each and every job applicant's life. Organizations have to strive to make sure that the decisions they make with respect to who gets accepted or rejected for jobs promote the best interests of the company and are fair to all parties involved.
Poorly informed decisions like the ones we saw at the beginning of this chapter harm everyone who comes into contact with such organizations. Although the vignette that opened this chapter focused on Uber, similar concerns apply to job applicants who go on to commit other crimes. For example, the terrorist who killed 49 people in an Orlando nightclub, Omar Mateen, had been hired by the global security firm G4S, which issued him one of the weapons used in the fatal attack. G4S blamed this hiring decision on "a clerical error." The question then becomes whether one would trust one's security needs to a company that would make such an egregious mistake with an otherwise simple background check.1 Many organizations seem to have forgotten how important it is to maintain hiring standards, especially when confronted with labor shortages like those we have witnessed in the last few years. For example, in 2018, many companies reacted to labor shortages by hiring people sight unseen. Applicants would call a number and be interviewed on the phone, believing that if the phone call went well, they would get an onsite interview. Instead, most were hired right on the spot and told when and where to show up for work. This was so unprecedented that it raised concerns even among job applicants. In fact, when Jamari Powell was hired at Macy's after just a 20-minute phone interview, she noted, "It was a little weird. It kind of feels like a scam almost."2 Selecting the best talent is critical to the competitiveness of organizations and nations. Innovation and economic growth are fueled by people, and the firms or countries that bring in the best people will be the ones that compete most successfully. For example, the United States has always tried to be a magnet for talent from other countries, and this country grew economically powerful due to the contributions of people who have emigrated from other countries.
In contrast, recent evidence suggests that Russia is losing its edge when it comes to keeping young talent in the country. The number of people emigrating from Russia and the former Soviet states jumped from 14,000 to 56,000 in just the last four years. Recent public demonstrations in Moscow and other Russian cities in protest of corruption have been the largest rallies ever seen there, and most of the protesters were college students who represent the country's future.3 The purpose of this chapter is to familiarize you with ways to minimize errors in employee selection and placement and, in doing so, improve your company's competitive position when it comes to hiring winners. We focus first on five standards that any selection method should meet. Then we evaluate several common selection methods against those standards and discuss how they may be used to prevent companies from hiring low-performing, dubious characters who may harm the firm's reputation.

Selection Method Standards

Personnel selection is the process by which companies decide who will or will not be allowed into organizations. Several generic standards should be met in any selection process. We focus on five: (1) reliability, (2) validity, (3) generalizability, (4) utility, and (5) legality. The first four build off each other in the sense that the preceding standard is often necessary but not sufficient for the one that follows. This is less the case with legal standards. However, a thorough understanding of the first four standards helps us understand the rationale underlying many legal standards.

RELIABILITY

Much of the work in personnel selection involves measuring characteristics of people to determine who will be accepted for job openings.
For example, we might be interested in applicants' physical characteristics (like strength or endurance), their cognitive abilities (such as spatial memory or verbal reasoning), or aspects of their personality (like their decisiveness or integrity). Many people have inaccurate stereotypes about how these kinds of characteristics may be related to factors such as race, sex, age, or ethnic background; therefore, we need to get past these stereotypes and measure the actual attributes directly.4 For example, with respect to jobs in the field of public safety, research employing fake résumés sent to employers found that white applicants with a criminal background were more likely to be hired than African American applicants with no criminal record who were identical on all other attributes.5 One key standard for any measuring device is its reliability. We define reliability as the degree to which a measure is free from random error. If a measure of some supposedly stable characteristic such as intelligence is reliable, then the score a person receives based on that measure will be consistent over time and in different contexts.

Estimating the Reliability of Measurement

Most measurement in personnel selection deals with complex characteristics like intelligence, integrity, and leadership ability. However, to appreciate some of the complexities in measuring people, we will consider something concrete in discussing these concepts: the measurement of height. For example, if we were measuring an applicant's height, we might start by using a 12-inch ruler. Let's say the first person we measure turns out to be 6 feet, 1 1/4 inches tall. It would not be surprising to find out that someone else measuring the same person a second time, perhaps an hour later, found this applicant's height to be 6 feet, 3/4 inch. The same applicant, measured a third time, maybe the next day, might be measured at 6 feet, 1 1/2 inches tall.
As this example makes clear, even though the person's height is a stable characteristic, we get slightly different results each time he is assessed. This means that each time the person is assessed, we must be making slight errors. If we used a measure of height that was not as reliable as a ruler—for example, guessing someone's height after seeing her walk across the room—we might see an even greater amount of unreliability in the measure. Thus, reliability refers to the measuring instrument (a ruler versus a visual guess) rather than to the characteristic itself. We can estimate reliability in several different ways, and because most of these rely on computing a correlation coefficient, we will briefly describe and illustrate this statistic. The correlation coefficient is a measure of the degree to which two sets of numbers are related. The correlation coefficient expresses the strength of the relationship in numerical form. A perfect positive relationship (as one set of numbers goes up, so does the other) equals +1.0; a perfect negative relationship (as one goes up, the other goes down) equals -1.0. When there is no relationship between the sets of numbers, the correlation equals 0.00.

Key term: Reliability. The consistency of a performance measure; the degree to which a performance measure is free from random error.

Although the actual calculation of this statistic goes beyond the scope of this book, it will be useful for us to conceptually examine the nature of the correlation coefficient and what this means in personnel selection contexts. When assessing the reliability of a measure, for example, we might be interested in knowing how scores on the measure at one time relate to scores on the same measure at another time. Obviously, if the characteristic we are measuring is supposedly stable (like intelligence or integrity) and the time period is short, this relationship should be strong.
If it were weak, then the measure would be inconsistent—hence, unreliable. This is called assessing test–retest reliability. Note that the time period between measurements is important when it comes to interpreting test–retest reliability. The assumption is that the characteristic being measured is not changing; hence, any change from Time 1 to Time 2 is treated as an error. When the time period becomes too long, this increases the chance that the characteristic itself is changing. For example, if one is measuring personality traits, the evidence suggests that people become more conscientious, more introverted, and more emotionally stable as they get older. These are not age stereotypes but rather scientifically documented facts about the instability of certain personality traits over extended periods of time.6 Plotting the two sets of numbers on a two-dimensional graph often helps us to appreciate the meaning of various levels of the correlation coefficient. Figure 6.1, for example, examines the relationship between student scholastic aptitude in one’s junior and senior years in high school, where aptitude for college is measured in three ways: (1) via scores on the SAT (formerly known as the Scholastic Aptitude Test), (2) via ratings from a high school counselor on a 1-to-100 scale, and (3) via tossing dice. In this plot, each number on the graphs represents a person whose scholastic aptitude is assessed twice (in the junior and senior years), so in Figure 6.1a, 1 represents a person who scored 1580 on the SAT in the junior year and 1500 in the senior year; 20 represents a person who scored 480 in the junior year and 620 in the senior year. Figure 6.1a shows a very strong relationship between SAT scores across the two years. 
This relationship is not perfect in that the scores changed from one year to the next, but not by a great deal.

[Figure 6.1a: Measurements of a Student's Aptitude — scatter plot of junior-year versus senior-year SAT scores for 20 students]

Turning to Figure 6.1b, we see that the relationship between the high school counselors' ratings across the two years, while still positive, is not as strong. That is, the counselors' ratings of individual students' aptitudes for college are less consistent over the two years than are the students' test scores. This might be attributable to the fact that the counselor's rating during the junior year was based on a smaller number of observations than the ratings made during the senior year.

[Figure 6.1b: scatter plot of junior-year versus senior-year counselor's ratings on a 1-to-100 scale]

Finally, Figure 6.1c shows a worst-case scenario, where the students' aptitudes are assessed by tossing two six-sided dice. As you would expect, the random nature of the dice means that there is virtually no relationship between scores taken in one year and scores taken the next.

[Figure 6.1c: scatter plot of junior-year versus senior-year dice tosses]

Although no one would seriously consider tossing dice to be a measure of aptitude, research shows that the correlation of overall ratings of job applicants' suitability for jobs based on unstructured interviews is very close to 0.00. Thus, one cannot assume a measure is reliable without checking its reliability directly. Novices in measurement are often surprised at exactly how unreliable many human judgments turn out to be. Thus, much of the science that deals with selection tries to go beyond subjective human judgments.
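Because every reliability estimate discussed here boils down to correlating two sets of scores, a short sketch may help make the statistic concrete. The code below is a minimal pure-Python illustration; the score lists are invented for illustration and are not the actual data points from Figure 6.1. It computes a Pearson correlation once for a consistent measure (SAT-like scores across two years) and once for a purely random "measure" (two dice tosses per student).

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# A stable characteristic measured reliably: scores barely move between years.
junior_sat = [1580, 1490, 1350, 1300, 1260, 1180, 1050, 980, 900, 480]
senior_sat = [1500, 1520, 1390, 1280, 1300, 1150, 1100, 950, 940, 620]

# An unreliable "measure": two independent dice tosses per student.
random.seed(42)
junior_dice = [random.randint(1, 6) + random.randint(1, 6) for _ in range(10)]
senior_dice = [random.randint(1, 6) + random.randint(1, 6) for _ in range(10)]

print(round(pearson_r(junior_sat, senior_sat), 2))   # strong positive, close to +1
print(round(pearson_r(junior_dice, senior_dice), 2)) # whatever chance produces, typically near 0
```

Like the figures in the chapter, this shows that test-retest reliability is a property of the measuring procedure: the same statistic applied to dice tosses hovers around zero because nothing stable is being measured.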
So, for example, if one wants to really know how extroverted a person is, a sociometric badge that records the number, length, and nature of this person's communication patterns across time is likely to provide more reliable test-retest data relative to the subjective perceptions of a former supervisor or interviewer who met the person just once.7

Standards for Reliability

Regardless of what characteristic we are measuring, we want highly reliable measures. Thus, in the previous example, when it comes to measuring students' aptitudes for college, the SAT is more reliable than counselors' ratings, which in turn are more reliable than tossing dice. But in an absolute sense, how high is high enough—0.50, 0.70, 0.90? This is a difficult question to answer specifically because the required reliability depends in part on the nature of the decision being made about the people being measured. For example, let's assume some college admissions officer was considering several students depicted in Figures 6.1a and 6.1b. Turning first to Figure 6.1b, assume the admissions officer was deciding between Student 1 and Student 20. For this decision, the 0.50 reliability of the ratings is high enough because the difference between the two students' counselors' ratings is so large that one would make the same admissions decision regardless of the year in which the rating was taken. That is, Student 1 (with scores of 100 and 80 in the junior and senior years, respectively) is always admitted and Student 20 (with scores of 12 and 42 for junior and senior years, respectively) is always rejected. Thus, although the ratings in this case are not all that reliable in an absolute sense, their reliability is high enough for this decision. By contrast, let's assume the same college admissions officer was deciding between Student 1 and Student 2.
Looking at Figure 6.1a, it is clear that even with the highly reliable SAT scores, the difference between these students is so small that one would make a different admissions decision depending on the year the score was obtained. Student 1 would be selected over Student 2 if the junior-year score was used, but Student 2 would be chosen over Student 1 if the senior-year score was used. Thus, even though the reliability of the SAT exam is high in an absolute sense, it is not high enough for this decision. Under these conditions, the admissions officer needs to find some other basis for making the decision regarding these two students (like high school GPA or rank in graduating class). Although these two scenarios clearly show that no specific value of reliability is always acceptable, they also demonstrate why, all else being equal, the more reliable a measure is, the better. For example, turning again to Figures 6.1a and 6.1b, consider Student 9 and Student 14. One would not be able to make a decision between these two students based on scholastic aptitude scores if assessed via counselors' ratings, because the unreliability in the ratings is so large that scores across the two years conflict. However, one would be able to base the decision on scholastic aptitude scores if assessed via the SAT, because the reliability of the SAT scores is so high that scores across the two years point to the same conclusion.

VALIDITY

We define validity as the extent to which performance on the measure is related to performance on the job. A measure must be reliable if it is to have any validity. By contrast, we can reliably measure many characteristics (like height) that may have no relationship to whether someone can perform a job. For this reason, reliability is a necessary but insufficient condition for validity.
Criterion-Related Validation

One way of establishing the validity of a selection method is to show that there is an empirical association between scores on the selection measure and scores for job performance. If there is a substantial correlation between test scores and job-performance scores, criterion-related validity has been established. For example, Figure 6.2 shows the relationship between 2014 scores on the SAT and 2015 freshman grade point average (GPA). In this example, there is roughly a 0.50 correlation between the SAT and GPA. This 0.50 is referred to as a validity coefficient.

[Figure 6.2: Relationship between 2014 SAT Scores and 2015 Freshman GPA — scatter plot of SAT scores (2014) versus freshman GPA (2015) for 20 students]

Key term: Validity. The extent to which a performance measure assesses all the relevant—and only the relevant—aspects of job performance.
Key term: Criterion-related validity. A method of establishing the validity of a personnel selection method by showing a substantial correlation between test scores and job-performance scores.
Key term: Predictive validation. A criterion-related validity study that seeks to establish an empirical relationship between applicants' test scores and their eventual performance on the job.
Key term: Concurrent validation. A criterion-related validity study in which a test is administered to all the people currently in a job and then incumbents' scores are correlated with existing measures of their performance on the job.

Note that we have used the correlation coefficient to assess both reliability and validity, which may seem somewhat confusing. The key distinction is that the correlation reflects a reliability estimate when we are attempting to assess the same characteristic twice (such as SAT scores in the junior and senior years), but it reflects a validity coefficient when we are attempting to relate one characteristic (SAT) to performance on some task (GPA). Criterion-related validity studies come in two varieties. Predictive validation seeks to establish an empirical relationship between test scores taken prior to being hired and eventual performance on the job. Because of the time and effort required to conduct a predictive validation study, many employers are tempted to use a different design. Concurrent validation assesses the validity of a test by administering it to people already on the job and then correlating test scores with existing measures of each person's performance. For example, the testing company Infor measures 39 behavioral, cognitive, and cultural traits among job applicants and then compares their scores on those dimensions with those of the top performers in the company. The assumption is that if high performers in the company score high on any trait, then the company should use scores on this trait to screen new hires.8 Figure 6.3 compares the two types of validation study. Despite the extra effort and time needed for predictive validation, it is superior to concurrent validation for a number of reasons. First, job applicants (because they are seeking work) are typically more motivated to perform well on the tests than are current employees (who already have jobs). Thus, job applicants are more tempted to fake responses in order to look good relative to current jobholders.
Second, current employees have learned many things on the job that job applicants have not yet learned. Therefore, the correlation between test scores and job performance for current employees may not be the same as the correlation between test scores and job performance for less knowledgeable job applicants. Thus, although concurrent studies can sometimes help one to anticipate the results of predictive studies, they do not serve as substitutes.

[Figure 6.3: Graphic Depiction of Concurrent and Predictive Validation Designs. Concurrent validation: measure all current job incumbents on the attribute; measure all current job incumbents' performance; obtain the correlation between these two sets of numbers. Predictive validation: measure all job applicants on the attribute; hire some applicants and reject others; wait for some time period; measure all newly hired job incumbents' performance; obtain the correlation between these two sets of numbers.]

Obviously, we would like our measures to be high in validity; but as with the reliability standard, we must also ask, how high is high enough? When trying to determine how much validity is enough, one typically has to turn to tests of statistical significance. A test of statistical significance answers the question, "Assuming that there is no true relationship between the predictor and the criterion, what are the odds of seeing a relationship this strong by chance alone?" If these odds are very low, then one might infer that the results from the test were in fact predicting future job performance. Table 6.1 shows how big a correlation between a selection measure and a measure of job performance needs to be to achieve statistical significance at a level of 0.05 (that is, there is only a 5 out of 100 chance that one could get a correlation this big by chance alone).

Table 6.1 Required Level of Correlation to Reach Statistical Significance as a Function of Sample Size

SAMPLE SIZE    REQUIRED CORRELATION
5              0.75
10             0.58
20             0.42
40             0.30
80             0.21
100            0.19
Although it is generally true that bigger correlations are better, the size of the sample on which the correlation is based plays a large role as well. Because many of the selection methods we examine in the second half of this chapter generate correlations in the 0.20s and 0.30s, we often need samples of 80 to 90 people. A validation study with a small sample (such as 20 people) is almost doomed to failure from the start. Advances in the ability to process big data via cloud-based analytics are greatly expanding the ability to find valid predictors of future job performance. For example, in the past, when it came to staffing its call centers, Xerox Corporation always looked for applicants who had done the job before. This seemed like a reasonable approach to take until the company assessed the empirical relationship between experience, on the one hand, and performance and turnover, on the other, and learned that experience did not matter at all. Instead, what really separated winners and losers in this occupation was their personality. People who were creative tended to perform well and stay on the job for a long time, whereas those who were inquisitive tended to struggle with the job and leave well before the company ever recouped its $5,000 investment in training. Xerox now leaves all hiring for its nearly 500,000 call center jobs to a computer software algorithm that tirelessly looks for links between responses to personality items and a highly specific set of job outcomes. The program was developed by Evolv Inc., and rather than relying on interviewer judgments that might be subject to personal biases, the Evolv program puts applicants through a battery of tests and personality items, then tracks their outcomes at the company over time.
The algorithm is continually adjusting itself with the accumulation of ever more data, all in an effort to develop a statistical model that describes the ideal call center employee.9 Evolv is just one player in an expanding industry that seeks to use big data to help companies find and retain the best employees. Globally, spending on this sort of talent management software rose 15% in just one year to an estimated value of $3.8 billion, and the competition for this business is intense. Indeed, as the Competing through Technology box shows, the competition in this business involves not just hiring the right person, but assembling a number of "right persons" into a team. This is important because organizations are increasingly structured in teams, and many organizations are often disappointed when a set of individuals who all look great on paper and when working alone fall apart or become problems when working interdependently with others.10 Indeed, some have suggested that organizations should recruit only intact teams, rather than individuals who are then assembled into arbitrary teams.11

Content Validation

Key term: Content validation. A test-validation strategy performed by demonstrating that the items, questions, or problems posed by a test are a representative sample of the kinds of situations or problems that occur on the job.

When sample sizes are small, an alternative test-validation strategy, content validation, can be used. Although criterion-related validity is established by empirical means, content validity is achieved primarily through a process of expert judgment. Content validation is performed by demonstrating that the questions or problems posed by the test are a representative sample of the kinds of situations or problems that occur on the job.
A test that is content valid exposes the job applicant to situations that are likely to occur on the job, and then tests whether the applicant currently has sufficient knowledge, skills, or abilities to handle such situations. Many of the new simulations that organizations are using are essentially computer-based role-playing games, where applicants play the role of the job incumbent, confronting the exact types of people and problems real-life job incumbents would face. The simulations are just like traditional role-playing games (e.g., The Sims), and the applicant's reactions and behaviors are scored to see how well they match with what one would expect from the ideal employee. For example, if one is considering applicants for a wait staff job at a restaurant, the game Wasabi Waiter, designed by Knack.it, allows the employer to watch how the applicant responds to finicky customers, uppity receptionists, emotionally unstable chefs, and other predictably challenging situations that are likely to take place in a busy establishment.12 Because the content of these tests so closely parallels the content of the job, one can safely make inferences from one to the other. For example, in the field of computer programming, employers see the skills needed to win international software code problem-solving competitions as highly related to the skills necessary to perform well on the job. For those who are unaware of the fast-growing sport of computer programming, an important warning: these contests do not make for riveting television. In most of the contests, roughly two dozen competitors, who worked their way to the finals by topping thousands of others in preliminary events online, rarely move from their workstations as they work through five standardized puzzles that have to be solved quickly with code that is as efficient as possible.
Still, many employers study the results from these events looking to hire both winners and runners-up because they view this as a highly valid work sample test. As Vladimir Novakovski, vice president for engineering at Addepar, a software provider in the investment industry, notes, "Every time I hire someone who is good in these contests, they have crushed the job. They tend to be fast, accurate, and into getting things done."13 If there is any problem with this source of recruits, it is that some of the competitors are so good and make so much money in the contests that they have no interest in applying for a full-time job.14

COMPETING THROUGH TECHNOLOGY
One Part Personality plus One Part AI: The Formula for Team Chemistry

The 2004 U.S. Men's Basketball Team was composed of some of the greatest players of all time, including LeBron James, Dwyane Wade, Tim Duncan, Allen Iverson, Carmelo Anthony, and Stephon Marbury. The coaching staff was also renowned for its past success and included Larry Brown, Gregg Popovich, and Roy Williams. Rarely has this much individual talent ever been assembled on one team, and never has a team with this much individual talent so underachieved. The 2004 version of the Dream Team turned out to be a nightmare that lost three games, coming away with nothing more than a bronze medal in a sport that was invented in their country. Clearly, although personnel selection can never ignore talent at the individual level, organizations increasingly employ team-based structures, and thus there is an urgent need to go beyond the individual level and consider the team as a separate object in and of itself. That is, HR staffing specialists need to learn when and where a collection of individuals will come together to be greater than—or less than—the sum of their parts. Some companies are turning to artificial intelligence (AI) solutions to solve this problem.
For example, Nexus AI is a Chicago-based firm that composes teams for companies as part of a two-stage process. The first stage matches individuals' skills and abilities with the job requirements associated with the functional role that a person will play. This is very standard HR, and there are many tech companies that can provide a similar service. However, Nexus does not stop there: after recommending a large slate of potential people to fill potential roles based on abilities, it goes on to a second stage that determines the right mix of individuals based upon their personalities. The AI solution begins at Time 1 with a set of general principles based upon past research. Then, after every project, the team is evaluated by peers and supervisors, and the AI tracks these responses. Over time the AI begins to learn what mix of personality traits is best for different types of team projects, thus going beyond past research. Nexus also tracks workforce utilization parameters to make sure that the AI algorithm is not learning and incorporating the biases inherent in the human judgments. For example, an earlier foray into AI and personnel selection at Amazon learned the hard way that performance evaluations were biased against female applicants in some job categories. The AI learned the exact same prejudice, like a precocious child, and then incorporated it into decisions that had adverse impacts on women. In another case, the data revealed that workers from two zip code areas tended to have lower performance evaluations relative to others. The AI quickly picked up this fact and used it to discriminate against people from those zip codes—who turned out to be primarily African American. Thus, in HR contexts, preventing artificial discrimination is just as important as leveraging artificial intelligence.

DISCUSSION QUESTIONS

1.
How does the evolution to team-based structures change the equation when it comes to personnel selection and placement?

2. In what ways are AI analytic solutions similar to—and different from—traditional criterion-related validation approaches?

SOURCES: A. Chowdhry, "How Nexus A.I. Is Helping Companies Discover Untapped Talent," Forbes Online, November 13, 2017; J. Davis, "Can AI Really Build Effective Teams?" HR Daily Advisor Online, April 17, 2018; J. McGregor, "Why Robots Aren't Going to Make the Call on Hiring Anytime Soon," The Washington Post Online, October 11, 2018.

The ability to use content validation in small-sample settings makes it generally more applicable than criterion-related validation. However, content validation has two limitations. First, content validation assumes that the person who is to be hired must already have the knowledge, skills, or abilities at the time he or she is hired, so it is not well suited to jobs in which those capabilities will be developed through training after selection. Second, because subjective judgment plays such a large role in content validation, it is critical to minimize the amount of inference involved on the part of judges. Thus, the judges' ratings need to be made with respect to relatively concrete and observable behaviors.

GENERALIZABILITY

It was once believed that validity coefficients were situationally specific—that is, that the level of correlation between test and performance varied as one went from one organization to another, even though the jobs studied seemed to be identical. Subsequent research has indicated that this is largely false. Rather, tests tend to show similar levels of correlation even across jobs that are only somewhat similar (at least for tests of intelligence and cognitive ability). Correlations with these kinds of tests do change as one goes across widely different kinds of jobs, however. Specifically, the more complex the job, the higher the validity of many tests.
It was also believed that tests showed differential subgroup validity, which meant that the validity coefficient for any test–job performance pair was different for people of different races or genders. This belief was also refuted by subsequent research, and, in general, one finds very similar levels of correlation across different groups of people.15 Because the evidence suggests that test validity often extends across situations and subgroups, validity generalization stands as an alternative for validating selection methods for companies that cannot employ criterion-related or content validation. Validity generalization is a three-step process. First, the company provides evidence from previous criterion-related validity studies conducted in other situations that shows that a specific test (such as a test of emotional stability) is a valid predictor for a specific job (like nurse at a large hospital). Second, the company provides evidence from a job analysis to document that the job it is trying to fill (nurse at a small hospital) is similar to the job already validated elsewhere (nurse at a large hospital). Finally, if the company can show that it uses a test that is the same as or similar to that used in the validated setting, then one can "generalize" the validity from the first context (large hospital) to the new context (small hospital).

UTILITY

LO 6-2 Discuss how the particular characteristics of a job, an organization, or an applicant affect the utility of any test.

Utility: The degree to which the information provided by selection methods enhances the effectiveness of selecting personnel in real organizations.

Utility is the degree to which the information provided by selection methods enhances the bottom-line effectiveness of the organization. In general, the more reliable, valid, and generalizable the selection method is, the more utility it will have.
However, many characteristics of particular selection contexts enhance or detract from the usefulness of given selection methods, even when reliability, validity, and generalizability are held constant. Figures 6.4a and 6.4b, for example, show two different scenarios where the correlation between a measure of extroversion and the amount of sales revenue generated by a sample of sales representatives is the same for two different companies: Company A and Company B. Although the correlation between the measure of extroversion and sales is the same, Company B derives much more utility or practical benefit from the measure. That is, as indicated by the arrows proceeding out of the boxes (which indicate the people selected), the average sales revenue of the three people selected by Company B (Figure 6.4b) is $850,000, compared to $780,000 from the three people selected by Company A (Figure 6.4a). The major difference between these two companies is that Company B generated twice as many applicants as Company A. This means that the selection ratio (the percentage of people selected relative to the total number of people tested) is quite low for Company B (3/20) relative to Company A (3/10). Thus, the people selected by Company B have higher levels of extroversion than those selected by Company A; therefore, Company B takes better advantage of the relationship between extroversion and sales.

[Figure 6.4a: Utility of Selecting on Extroversion Scores When Selection Ratio Is High. Company A; scatterplot of Extroversion Score (2012) versus Sales Revenue (2013) for 10 sales representatives.]

[Figure 6.4b: Utility of Selecting on Extroversion Scores When Selection Ratio Is Low. Company B; scatterplot of Extroversion Score (2012) versus Sales Revenue (2013) for 20 sales representatives.]
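The selection-ratio effect illustrated in Figures 6.4a and 6.4b can be reproduced with a toy Monte Carlo simulation. All numbers below are illustrative assumptions (standardized performance scores and an assumed validity of .30, not the book's dollar figures): hiring the top 3 of 20 applicants yields higher average performance than hiring the top 3 of 10, even though the test's validity is identical in both cases.

```python
import math
import random
import statistics

random.seed(7)
VALIDITY = 0.30  # assumed correlation between test score and performance

def mean_perf_of_hired(n_applicants: int, n_hired: int) -> float:
    """Simulate one applicant pool, hire the top test scorers,
    and return the hired group's mean (standardized) performance."""
    pool = []
    for _ in range(n_applicants):
        score = random.gauss(0, 1)
        perf = VALIDITY * score + math.sqrt(1 - VALIDITY**2) * random.gauss(0, 1)
        pool.append((score, perf))
    hired = sorted(pool, reverse=True)[:n_hired]  # highest test scores win
    return statistics.mean(p for _, p in hired)

trials = 3000
high_ratio = statistics.mean(mean_perf_of_hired(10, 3) for _ in range(trials))  # 3/10
low_ratio = statistics.mean(mean_perf_of_hired(20, 3) for _ in range(trials))   # 3/20
print(f"selection ratio 3/10: mean hired performance {high_ratio:.2f}")
print(f"selection ratio 3/20: mean hired performance {low_ratio:.2f}")
```

Averaged over many pools, the lower selection ratio (3/20) produces a visibly higher hired-group mean, and the gap widens as the assumed validity increases.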
Thus, the utility of any test generally increases as the selection ratio decreases, as long as the additional costs of recruiting and testing are not excessive.

The utility of a test also depends upon the distribution of the trait or the performance metric. Most individual differences take on the form of a normal distribution. In other words, most people are in the middle, followed by a smaller group of people who are a little bit above or below the mean, followed by an even smaller group of outliers far above and below the mean. This belief in the normal distribution has traditionally been extended to people's beliefs about job performance, even though little evidence has been collected to test this belief. However, a study examining over 600,000 entertainers, politicians, amateur athletes, professional athletes, and scientists has challenged this idea and instead suggests that job performance follows a power law distribution. Figure 6.5 shows how a distribution that follows a power law differs dramatically from a normal distribution, in the sense that there are few high performers and a large group of potentially poor performers.16

[Figure 6.5: Comparing a Normal Distribution (Red Curve) to a Power Law (Blue Shading); horizontal axis: Performance Level from Low to High; vertical axis: Number of Workers.]

The implication of these findings for utility analysis is important because it implies that the difference in dollar value between a "highly productive worker" (e.g., someone who is one standard deviation above the mean, perhaps selected based upon a validated test) and an "average worker" (e.g., at the mean, perhaps selected at random) is much greater than one would expect if the distribution were normal. As an example, a scientist with the "average" publication rate is much, much closer to the bottom of the performance distribution than he or she is to the top.
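A quick simulation makes the normal-versus-power-law contrast concrete. The two distributions below are arbitrary stand-ins of my own choosing (a normal with mean 100 and a Pareto with shape 2.5), not parameters from the study cited above, but they show why star performers matter so much more under a power law: the top 1% of workers accounts for a several-fold larger share of total output.

```python
import random

random.seed(1)
N = 100_000

# Hypothetical "performance" scores under the two distributional assumptions
normal_perf = [max(0.0, random.gauss(100, 15)) for _ in range(N)]
power_perf = [random.paretovariate(2.5) * 40 for _ in range(N)]  # heavy right tail

def top_share(values, frac=0.01):
    """Fraction of total output produced by the top `frac` of workers."""
    ranked = sorted(values, reverse=True)
    k = int(len(ranked) * frac)
    return sum(ranked[:k]) / sum(ranked)

print(f"normal:    top 1% produce {top_share(normal_perf):.1%} of total output")
print(f"power law: top 1% produce {top_share(power_perf):.1%} of total output")
```

Under the normal assumption the top 1% produce only slightly more than 1% of output, while under the power-law assumption their share is several times larger, which is exactly why a valid test that finds them has so much more utility.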
These findings also suggest that the use of dichotomous, success-versus-failure criteria (e.g., above or below the median or some arbitrary cut-off score) for evaluating performance may far underestimate the huge differences among people, all of whom might be above the mean. Thus, any type of minimum competency cut-off used to score success vastly underestimates the utility of a valid predictor. Overall, a test has much more utility when it predicts performance that is distributed as a power law.17

LEGALITY

LO 6-3 Describe the government's role in personnel selection decisions, particularly in the areas of constitutional law, federal laws, executive orders, and judicial precedent.

The final standard to which any selection method should adhere is legality. All selection methods should conform to existing laws and existing legal precedents. For example, Kentucky Fried Chicken requires its workers to wear slacks and was charged with discrimination when it refused to allow Sheila Silver, a Pentecostal Christian, to wear a long dress at work, which was what her religion required. In a similar case, with a different religion, the New York City Police Department was charged with violating the religious rights of a Muslim officer whose religion-required beard violated the department's appearance code.18 These are hardly isolated incidents, in the sense that cases based on religious discrimination have skyrocketed recently. According to the Equal Employment Opportunity Commission (EEOC), in 2013 alone, over 3,700 religious discrimination claims were brought against employers.19 In both of these cases, the court upheld the religious beliefs of the job applicant against requirements posed by the employer.
Employers who are taken to court for illegal discrimination face high costs associated with litigation, settlements, and awards, and also suffer potential damage to their reputations as good employers, making recruitment and growth more difficult. This is exactly what happened to Chick-fil-A. Even though the firm had never been charged with any form of employment discrimination, when the president of the company made disparaging comments regarding gay marriage in 2012, there was an immediate negative backlash against "hate chicken" that harmed sales. Even worse, it threatened the company's expansion plans and strategy to move into northern and urban areas. The mayor of Boston went so far as to send a letter to the company urging it to back down from plans to locate in Boston, and he was quoted in the Boston Herald saying that he would make it "very difficult" for the restaurant to come to town. Chicago mayor Rahm Emanuel chimed in and stated that "Chick-fil-A's values are not Chicago's values," and protest movements in New York City and San Francisco were organized to oppose expansion into those areas. All of this happened even though no one ever presented any evidence or even charged the company with discriminating against gay customers or job applicants.20

Federal Legislation

Three federal laws form the basis for a majority of the suits filed by job applicants: the Civil Rights Act of 1991, the Age Discrimination in Employment Act of 1967, and the Americans with Disabilities Act of 1990 (all discussed in Chapter 3).

Civil Rights Act of 1991. An extension of the Civil Rights Act of 1964, the Civil Rights Act of 1991 protects individuals from discrimination based on race, color, sex, religion, and national origin with respect to hiring as well as compensation and working conditions.
This act defines employers' explicit obligation to establish the business necessity of any neutral-appearing selection method that has had adverse impact on groups specified by the law. This is typically done by showing that the test has significant criterion-related or content validity. If the employer cannot demonstrate such validity, which the research suggests will be difficult, then the practice may be ruled illegal. Ironically, for example, the Consumer Financial Protection Bureau (CFPB), which was created as part of the Dodd-Frank Act and regulates banks and financial institutions specifically to prevent discrimination in loan practices, discovered that its own promotion policies created adverse impact. An investigation into the CFPB's promotion policies found that 21% of the agency's white employees received the highest performance rating compared with just 10% of the African American employees and 9% of Hispanic employees. Since this rating was used to make promotion decisions, it became a neutral-appearing employment practice that created adverse impact and thus had to be justified.21 Many other employers, if challenged, could find themselves with similar problems because the statistics at the CFPB actually mirror the statistics for employers as a whole. Investigators probing employers believed to be unfairly discriminating against African American candidates will often send résumés from fictitious applicants whose credentials are exactly the same except for race. Studies show that white applicants in these studies are 33% more likely to be hired than identically qualified black candidates, which is strong evidence of bias.22 Similar data were assembled in the banking industry, and a class action suit was filed against Goldman Sachs, accusing the firm of discriminating against women.
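Statistics like the CFPB's map directly onto the EEOC's "four-fifths rule," a common screening heuristic for adverse impact (the rule itself is an addition here for illustration, not part of the case as described): if a protected group's rate of a favorable outcome is less than 80% of the reference group's rate, adverse impact is presumed and the practice must be justified.

```python
def impact_ratio(rate_protected: float, rate_reference: float) -> float:
    """Ratio used in the EEOC four-fifths heuristic; < 0.80 flags adverse impact."""
    return rate_protected / rate_reference

WHITE_RATE = 0.21  # share receiving the top performance rating, per the text
for group, rate in [("African American", 0.10), ("Hispanic", 0.09)]:
    ratio = impact_ratio(rate, WHITE_RATE)
    flag = "adverse impact flagged" if ratio < 0.80 else "no flag"
    print(f"{group}: ratio = {ratio:.2f} -> {flag}")
```

Both ratios fall near 0.4–0.5, well under the 0.80 threshold, which is why the rating practice became a neutral-appearing policy that "had to be justified" as a business necessity.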
This is a tough industry for female employees; no woman has ever run a major New York bank, and less than 20% of executive and senior managers at Citigroup, JPMorgan Chase, and Goldman Sachs are women.23 In the case against Goldman, Christina Chen-Oster and her legal team were able to document how the percentage of women at each transition level (e.g., from regional director to vice president, and then vice president to managing director) got smaller and smaller.24 This is the kind of evidence that "shifts the burden of proof" to the employer (i.e., Goldman Sachs) to prove that these promotion decisions were based on a business necessity. The Civil Rights Act of 1991 allows the individual filing the complaint to have a jury decide whether he or she may recover punitive damages (in addition to lost wages and benefits) for emotional injuries caused by the discrimination. This can generate large financial settlements as well as poor public relations that can hinder the organization's ability to compete. Finally, the 1991 act explicitly prohibits the granting of preferential treatment to minority groups. Preferential treatment is often attractive because many of the most valid methods for screening people, especially cognitive ability tests and work sample tests, are high in adverse impact.25 For example, although software coding competitions help organizations uncover talented programmers, almost all of the tournament champions tend to be white males. Thus, there is somewhat of a trade-off between selecting the highest scorers on validated tests, on the one hand, and creating diversity in the workforce, on the other hand.26 One potential way to "have your cake and eat it too" is to simply rank the scores of different race or gender groups within their own groups, and then take perhaps the top 10% of scorers from each group, instead of the top 10% that would be obtained if one ignored race or gender.
Many observers feel that this practice is justified because it levels the playing field in a context where bias works against African Americans. However, the 1991 act specifically outlaws this practice (sometimes referred to as race norming). Some believe that race norming is just reverse discrimination and gives preferential treatment—rather than equal treatment—to minorities, and thus this practice has been challenged in court. Two specific Supreme Court cases show that policies that may be construed as promoting preferential treatment will not stand up in court. In the first case, voters in the state of Michigan backed an initiative that made it illegal to engage in affirmative action for minorities when it came to admissions to Michigan colleges. Because the majority of voters in this state were white, this initiative was challenged because of legal precedents that protect minorities from being targeted for unfair treatment through the political process. That is, taken to an extreme, if a majority of the members of a state were white, it would not be permissible for them to support a ballot initiative that would prevent minorities from attending college at all, since doing so would be patently unfair. The challenge to the Michigan initiative claimed that its effect was close to this extreme, but the challenge was rejected by the Supreme Court, which decided that the electorate was acting within its rights.27 The Court did not necessarily say that affirmative action was illegal in this case, but rather that it was fair for the general electorate to impose its will this way, which leaves colleges that are trying to promote diversity scrambling for other alternatives, one of which was adopted at the University of Texas.28 The Supreme Court case that involved the University of Texas illustrates how difficult it can be to achieve diversity goals while still upholding merit-based selection and avoiding perceptions of reverse discrimination.
Specifically, in order to increase the percentage of African American and Hispanic students in the UT system, the school made it a policy to accept the top 10% of the graduating class of every high school. Because many high schools in Texas tend to be segregated by race and ethnicity, this policy worked somewhat like race norming in ensuring that members of every group found their way into college, but it was not explicitly race norming.29 To push the diversity gains even further, though, the admissions officers at UT noted that many African American students in affluent suburban schools often were rejected for admission, even though they had higher test scores than African American students from urban schools. When the school tried to reach out and accept those students, however, this policy was challenged. In 2016, a divided Supreme Court upheld the legality of the UT affirmative action program. Writing for the 4–3 majority, Justice Anthony Kennedy stated that "the university is entitled to considerable deference in defining the type of institution it wished to be, including intangible characteristics, such as student body diversity that might be central to the university's identity and educational mission."30 Whereas the issue at the heart of the UT case dealt with the underrepresentation of African American students, a separate issue deals with what to do when neutral-appearing selection methods create a situation where some minority group might exceed its representation in the general population. For example, as the Competing through Environmental, Social, and Governance Practices box shows, admitting a percentage of Asian American students that reflects their percentage in the population actually holds them back. Rather than employing race norming, employers can partially achieve both goals of maximizing predicted future performance and diversity in several ways.
First, aggressive recruiting of members of protected groups allows an employer to generate a larger pool of protected group members, and, by being highly selective within this larger group, the scores of admitted applicants will more closely match those of all the other groups.31 Second, as we will see later in this chapter, different selection methods have different degrees of adverse impact, and multistage selection batteries that use different methods at different stages can also help.32 Finally, one common approach that does not seem to work is to abandon the kinds of compliance-driven, evidence-based workforce utilization reviews discussed in Chapter 5 in favor of softer "inclusion" initiatives that express the generic value of diversity but fail to document goals and timetables statistically. Some organizations treat diversity more like a marketing campaign than an HR initiative, and it is not uncommon to see companies that won awards for their "inclusion programs," such as Texaco and Bank of America, later found liable for illegal discrimination. Some have noted that there is an almost complete overlap between the lists of the top 50 companies for inclusion and the top 50 companies for advertising expenditures, and the need to complement style with substance cannot be overlooked in this critical area.33 The simple truth is that the best predictors of whether a firm becomes truly diverse and avoids litigation are whether (1) there is a specific person (e.g., a diversity compliance officer) whose sole job is to monitor hiring statistics, (2) this person has the power to change hiring practices, and (3) this person is held strictly accountable in his or her own performance appraisal for achieving quantifiable results.34

Age Discrimination in Employment Act of 1967.
Court interpretations of the Age Discrimination in Employment Act mirror those of the Civil Rights Act, in that if any neutral-appearing practice happens to have adverse impact on those over age 40, then the burden of proof shifts to the employer, who must show business necessity to avoid a guilty verdict. This act outlaws almost all "mandatory retirement" programs (company policies that dictate that everyone who reaches a set age must retire).

COMPETING THROUGH ENVIRONMENTAL, SOCIAL, AND GOVERNANCE PRACTICES
According to Harvard: "Asian-Americans Have Bad Personalities"

Although the Asian American population in the United States has more than doubled in the last 30 years, the percentage of Asian Americans admitted to Harvard in 2017 was the same as it was in 1980. This is despite the fact that when it comes to almost all the published factors that the institution claims to use in selection (standardized test scores, grade point average, extracurricular activities, ratings from high school teachers, and personal essays), as a group, Asian Americans outperform Caucasians, Hispanics, and African Americans. In fact, based upon the objective evidence, if one just made selection decisions based on these factors, the Asian American acceptance rate would be 43% versus the 19% that it was in 2017. There is only one factor on which Asian Americans perform poorly, and this factor alone costs them 24% of the positions they might have otherwise earned—the dreaded "personal score." The "personal score" rating used at Harvard purportedly captures the student's "likability, helpfulness, courage, kindness, positive personality, and respectability." This subjective judgment, often rendered by an admissions officer who has never met the student, has a devastating impact on Asian American students—especially those who would otherwise score highly on the rest of the selection battery.
Asian American students who would be in the top 10% of applicants on everything other than the personal score get a "2" on a 5-point scale more than 20% of the time. One can find "2" ratings for other groups of students, but these low ratings are predominantly found in the 60th to 70th percentile for Caucasians, the 70th to 80th percentile for Hispanics, and the 80th to 90th percentile for African Americans. Thus, the personal score becomes a knock-out factor for Asian Americans who would have been selected otherwise, but has no effect on other groups because the poorly rated students would not have been admitted anyway. Does Harvard really believe that Asian Americans have bad personalities? After all, the Dean of Admissions at the Massachusetts Institute of Technology did call one Korean American student who was denied admission to his institution "just another texture-less math grind." Still, although the potential for stereotypes does seem to play a part in this process, many others see this case as part of a much larger battle against affirmative action. Specifically, current law does allow schools and employers to use race as a "plus factor" in a "holistic process" that includes all the other factors that go into a selection battery. However, the law precludes strict quotas in favor of one group or caps that work against one group. Critics of Harvard's admission process argue that the personal score is just a flexible, seemingly innocuous, and—most importantly—legal mechanism that can be used to cap Asian American admissions far below 43%. Thus, some believe that Harvard is regulating admission rates in such a way that its population is representative of the larger diversity in the United States. Indeed, Harvard does seem to be achieving this goal, but some question whether this will be sustainable. Currently, the admission rate for Asian Americans mirrors their percentage of the U.S.
population, and the same is true for Caucasians, Hispanics, and African Americans. Although this may seem fair, for the roughly 20% of Asian Americans who would have otherwise been admitted, it may seem unfair, and especially harsh, when it is falsely attributed to their bad personalities. When the principal of Stuyvesant High School in New York, a school for gifted children that is over 70% Asian American, was informed of these statistics while on the witness stand during the federal lawsuit in which this is all playing out, she broke down and started crying. When asked why she was crying, she stated, "because these numbers make it seem like there's discrimination and I love these kids, and I know how hard they work. So these all just look like numbers to you guys, but I see their faces." Apparently, when she looks into the faces of her students, she does not see the same "texture-less math grinds" that others see.

DISCUSSION QUESTIONS

1. Do you believe there is merit in organizations being representative of the larger society in which they are embedded, or should every selection decision be based totally on individual merit—regardless of the negative impact this might have on representativeness?

2. How might one's own demographic profile affect how one balances the merits of representativeness, on the one hand, with demographically blind selection methods, on the other hand?

SOURCES: K. Benner, "Asian-American Students Suing Harvard over Affirmative Action Win Justice Department Support," The New York Times Online, August 30, 2018; N. Corn and N. Hong, "Justice Department Says Harvard Hurts Asian-Americans' Admission Prospects with 'Personal Rating,'" The Wall Street Journal Online, August 30, 2018; K. Reilly, "With Harvard on Trial, So Is Affirmative Action," Bloomberg Businessweek, October 29, 2018; W. Yang, "Harvard Is Wrong That Asians Have Terrible Personalities," The New York Times Online, June 25, 2018.
For example, the Texas Roadhouse restaurant company was sued for discrimination based on this law. Whereas 20% of servers nationally are over the age of 40, this was true for less than 2% of Texas Roadhouse employees. The suit was brought by a 40-year-old woman who applied for a job at a Texas Roadhouse restaurant in Palm Bay, Florida, and was told there were no openings. A few days later, she learned that one of her daughter's friends interviewed after she did and got a job offer. Despite this incident and the larger data on underutilization, the chain defended its actions by stating that it needed younger workers to reflect its brand image and attract more customers.35 This appeal to brand image and customer preference has a long history as a "business necessity" defense, but it rarely prevails in court.36

Americans with Disabilities Act (ADA) of 1990. The ADA protects individuals with physical and mental disabilities (or with a history of the same) and requires that employers make "reasonable accommodation" for disabled individuals whose handicaps may prevent them from performing essential functions of the job as currently designed. "Reasonable accommodation" could include restructuring jobs, modifying work schedules, making facilities accessible, providing readers, or modifying equipment. The ADA does not require an organization to hire someone whose disability prevents him or her from performing either critical or routine aspects of the job, nor does it require accommodations that would cause "undue hardship." There is some degree of political pressure to increase the hiring of disabled workers, and in 2014, the Department of Labor issued new rules decreeing that government contractors should set a goal of having 7% of their workforce composed of disabled employees. Thus, if you are applying for a job with a government contractor, you need to check a box that asks whether or not you are disabled.
The ruling was controversial because many disabled workers, especially those with nonobvious physical impairments or mental impairments, are unlikely to check that box. This means that some employers may be meeting the goal but are not able to show it because of applicants’ reluctance to check the box.37 One source of disabled workers that employers are increasingly tapping into is the pool of Gulf War–era veterans. This pool of potential workers was once highly underutilized by employers and experienced an unemployment rate well over 30%. That proportion has since dropped to less than 10%. Although part of the drop is attributable to a general improvement in the labor market, some of it is also due to the heroic efforts of programs like the Wounded Warrior Project, which seeks to help disabled veterans find jobs in the private sector. This group helps veterans translate how skills and job categories in the military match civilian jobs.38 For example, the skills of a “medic” in the military vary from case to case, and effectively entering the civilian medical industry at the right spot requires help communicating the similarities and differences between the two domains. This group also led efforts to reclassify the idiosyncratic codes for technical skills used in the military to the codes used by the Department of Labor’s O*NET system (see Chapter 4 for a description of O*NET). This made it much easier for veterans to translate their skills into terms more broadly used in the private sector.39

Types of Selection Methods

In the first half of this chapter, we laid out the five standards by which to judge selection measures.
In this half of the chapter, we examine the common selection methods used in various organizations and discuss their advantages and disadvantages in terms of these standards.

INTERVIEWS

A selection interview is a dialogue initiated by one or more persons to gather information and evaluate the qualifications of an applicant for employment. The selection interview is the most widespread selection method employed in organizations, and there have been literally hundreds of studies examining its effectiveness.40 Unfortunately, the long history of research on the employment interview suggests that, without proper care, it can be unreliable, low in validity, and biased against a number of different groups. Moreover, interviews are relatively costly because they require at least one person to interview another person, and these people are often in different locations. Finally, in terms of legality, the subjectivity embodied in the process, as well as the opportunity for unconscious bias effects, often makes applicants upset, particularly if they fail to get a job after being asked apparently irrelevant questions. In the end, subjective selection methods like the interview must be validated by traditional criterion-related or content-validation procedures if they show any degree of adverse impact.

“It’s really about understanding whether or not they’ll fit into an organization. Things like behavioral interviews are really important.” —Jim Duffy, Executive Vice President and Chief Human Resources Officer, CIT Group, Inc. (Source: Video produced for the Center for Executive Succession in the Darla Moore School of Business at the University of South Carolina by Coal Powered Filmworks)

Fortunately, more recent research has pointed to a number of concrete steps that one can employ to increase the utility of the personnel selection interview. First, HR staff should keep the interview structured, standardized, and focused on accomplishing a small number of goals. That is, they should plan to come out of each interview with quantitative ratings on a small number of dimensions that are observable (like interpersonal style or ability to express oneself) and avoid ratings of abilities that may be better measured by tests (like intelligence). In addition to coming out of the interview with quantitative ratings, interviewers should also have a structured note-taking system that will aid recall when it comes to justifying the ratings. Finally, overall judgments of applicants should be left until the very end of the process, because implicit, first-impression biases often cloud initial interpersonal reactions.41 Selection interviews should be focused totally on rating and ranking applicants, and even though it may be tempting to accomplish other goals like recruiting the candidate, this temptation needs to be resisted.42 As we saw in Chapter 5, recruitment interviews should be kept separate from selection interviews because these types of dual-purpose interviews tend to fail on both scores. Then, after a sufficient amount of time to obtain performance evaluation data, interviewers should get normative feedback on which of the employees that they selected performed well versus poorly so that they can learn from past experience.43

When it comes to content, interviewers should ask questions dealing with specific situations that are likely to arise on the job, and use the responses to determine what the person is likely to do in those situations. These types of situational interview items have been shown to have high predictive validity.44 Situational judgment items come in two varieties, as shown in Table 6.2.
Some items are “experience-based” and require the applicant to reveal an experience he or she had in the past when confronting the situation. So for example, both Amazon and Google were recruiting thousands of experienced software engineers for their new headquarters, but the experience they were looking for differed. In interviews, Amazon was looking for software engineers who had experience in coding languages like C++ and Java, whereas Google needed people with experience in Linux and Python.45 In contrast, some items are “future-oriented.” That is, although the idea of asking one about his or her past experience would seem obvious, unlike Amazon and Google, companies like Intel and GitHub care more about the applicant’s potential future rather than their past. These companies are willing to hire self-taught programmers or programmers who attended coding boot camps, even if they have never practiced those skills on a real job.46

Situational interview: An interview procedure where applicants are confronted with specific issues, questions, or problems that are likely to arise on the job.

Table 6.2 Examples of Experience-Based and Future-Oriented Situational Interview Items

Experience-based
- Motivating employees: “Think about an instance when you had to motivate an employee to perform a task that he or she disliked but that you needed to have done. How did you handle that situation?”
- Resolving conflict: “What was the biggest difference of opinion you ever had with a coworker? How did you resolve that situation?”
- Overcoming resistance to change: “What was the hardest change you ever had to bring about in a past job, and what did you do to get the people around you to change their thoughts or behaviors?”

Future-oriented
- Motivating employees: “Suppose you were working with an employee who you knew greatly disliked performing a particular task. You needed to get this task completed, however, and this person was the only one available to do it. What would you do to motivate that person?”
- Resolving conflict: “Imagine that you and a co-worker disagree about the best way to handle an absenteeism problem with another member of your team. How would you resolve that situation?”
- Overcoming resistance to change: “Suppose you had an idea for a change in work procedures that would enhance quality, but some members of your work group were hesitant to make the change. What would you do in that situation?”

Organizations that employ future-oriented items tend to emphasize on-the-job training specifically focused on their own needs rather than years of past experience meeting some other employers’ needs. These examples show the competitive dynamics associated with HR activities, in the sense that one can compete by emphasizing experience, paying higher wages, and having reduced training needs, or, on the other hand, compete by de-emphasizing experience, paying lower wages, but increasing training budgets and socialization expenses. This is critical because, as we noted earlier, due to a recent labor shortage, more companies are moving to a “no experience necessary” model. In fact, between 2012 and 2017, the percentage of employers who required three years’ experience dropped from 30% to 20%, a move that opens up employment opportunities for 1.2 million people. Companies with a “no experience necessary” policy need to have interviewers who are skilled at recognizing an applicant’s potential for growth and fit with the company’s culture. As Greg Pryor, the vice president of HR at Workday Inc.
notes, “this puts a huge responsibility on the company because the burden of proof moves from the candidate to the interviewer.”47 Indeed, perhaps for this reason, research suggests that although both types of items can show validity, experience-based items often outperform future-oriented ones.48

It is also important to use multiple interviewers who are trained to avoid many of the subjective errors that can result when one human being is asked to rate another. For example, at Google, there were definite concerns with demographic similarity bias in interviews, because their own analysis of local data was suggesting that managers were hiring people who seemed just like them. To eliminate this problem, Google now compiles elaborate files for each candidate, and then has all interviews conducted by groups rather than individuals. Laszlo Bock, then vice president for Google’s People Operations, noted that “we do everything to minimize the authority and power of the lone manager in making hiring decisions that are going to affect the entire company.”49 Indeed, many have suggested that one of the major causes of the large number of sexual harassment claims registered in the field of security brokerage is that the broker, who is usually male, makes hiring and compensation decisions regarding female administrative assistants by himself with no input from the firm’s HR staff. These individual brokers, however, are not sole proprietors, but rather employees themselves, so this practice is being curtailed at many of the largest companies.50

Many companies find that a good way to get “multiple eyes” on an applicant is to conduct digitally taped interviews, and then send the digitized files (rather than the applicants) around from place to place. Some employers find that the lack of true interaction that can take place in videos limits their value; hence, the use of face-to-face interactive technology like Skype to conduct virtual interviews over long distances.51

[Photo: Digital Vision/Getty Images. When more than one person is able to interview a candidate for a position, there is significant advantage in removing any errors or biases that a single individual might make in choosing the correct person for the job. In today’s technological world, it is becoming easier for multiple people to give their input in an interview by watching a video recording or listening via conference call if they cannot be there in person.]

REFERENCES, APPLICATION BLANKS, AND BACKGROUND CHECKS

Except in extreme cases, nearly all employers use some method for getting background information on applicants before an interview. This information can be solicited from the people who know the candidate through reference checks. The evidence on the reliability and validity of reference checks suggests that these are, at best, weak predictors of future success on the job. The main reason for this low validity is that the evaluations supplied in most reference letters are so positive that it is hard to differentiate applicants. This problem with reference letters has two causes. First, the applicant usually gets to choose who writes the letter and can thus choose only those writers who think the highest of her abilities. Second, because letter writers can never be sure who will read the letters, they may justifiably fear that supplying damaging information about someone could come back to haunt them. Thus, it is clearly not in the past employers’ interest to reveal too much information beyond job title and years of service. Another problem with reference checks is that applicants do not always tell the truth when it comes to listing their references. In fact, 30% of the companies that check references find false or misleading references on applications.
Michael Erwin, a career advisor at CareerBuilder, notes, “For some reason, people think companies aren’t going to check their references and therefore they think they can get away with all sorts of fabrications.” In reality, 80% of companies do in fact check references prior to offering someone an interview or prior to making an offer.52 In addition to outside references, employers can also collect background information from the applicants themselves. The low cost of obtaining such information significantly enhances its utility, especially when the information is used in conjunction with a well-designed follow-up interview that complements, rather than duplicates, the biographical information bank. One of the most important elements of biographical information deals with educational background. Indeed, providing background information on one’s education is probably one of the few things that a written résumé is still good for in this day and age. In some cases, employers are looking for specialized educational backgrounds reflected in functional degrees such as business or nursing or engineering, but in other cases, employers are just looking for critical-thinking and problem-solving skills that might be associated with any college degree.53 This focus on education is attributed to the nature of the economy, which increasingly demands people with high levels of education. Indeed, it is ironic that despite relatively high levels of employment, many employers find it impossible to find people with the skills they need.54 The term education gap has been coined to capture the difference between the average years of education required in a job listing in a given area and the average years of education in that same area. For the nation as a whole, the education gap runs at about 5%, but in some cities, like Las Vegas, the gap exceeds 10%.
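The education gap measure just described is simple arithmetic: the shortfall between the schooling local job listings demand and the schooling local workers actually have, expressed as a fraction of what is demanded. A minimal sketch follows; the metro-area numbers are invented for illustration (only the national ~5% and Las Vegas >10% figures come from the text).

```python
# Hypothetical sketch of the "education gap" measure: the difference between
# the average years of education required by local job listings and the
# average years attained by the local workforce, as a share of the
# requirement. All input numbers below are made-up assumptions.

def education_gap(avg_years_required, avg_years_attained):
    """Return the gap as a fraction of the required years."""
    return (avg_years_required - avg_years_attained) / avg_years_required

# Invented metro area: listings ask for 14.0 years of schooling on average,
# while residents average 12.6 years.
gap = education_gap(14.0, 12.6)
print(f"Education gap: {gap:.0%}")  # → Education gap: 10%
```

By this measure, an area where attainment matches the requirement has a gap of 0%, and larger positive gaps mean employers demand more schooling than the local workforce supplies.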
Areas that have larger education gaps experience much higher rates of unemployment and are usually the last to show signs of job recovery during an economic expansion.55 Again, as with the interview, the biggest concern with the use of biographical data is that applicants who supply the information may be motivated to misrepresent themselves. Some research suggests that over 80% of job applications contain some misleading or false information, so again, hiring sight unseen is a very risky proposition.56 For example, investigators found that Timothy Loehmann, the police officer who shot Tamir Rice, an innocent 12-year-old boy in Cleveland, had falsified his application, hiding several past terminations for overly aggressive behavior. This resulted in a $6 million wrongful death lawsuit that might have been prevented with a more thorough background check.57

To prevent embarrassing episodes, many employers hire outside companies to do background checks on employees. For example, when Steve Masiello applied for a position coaching basketball at the University of South Florida, a routine background check revealed that he had lied on his application when he stated that he had earned a degree in communications from the University of Kentucky in 2000.
This came as an embarrassment to Masiello’s current employer, Manhattan College, which also required a college degree for any top coaching position but apparently never checked on this when it hired Masiello.58 A similar failure to conduct a routine background check was partially to blame for the 2015 jailbreak at the Clinton Correctional Facility in New York, where an employee helped two convicted murderers escape.59 An investigation in the wake of this incident revealed widespread lapses and failures to conduct adequate background checks at the prison.60

PHYSICAL ABILITY TESTS

Although automation and other advances in technology have eliminated or modified many physically demanding occupational tasks, many jobs still require certain physical abilities or psychomotor abilities. In these cases, tests of physical abilities may be relevant not only to predicting performance but also to predicting occupational injuries and disabilities.61 There are seven classes of tests in this area: ones that evaluate (1) muscular tension, (2) muscular power, (3) muscular endurance, (4) cardiovascular endurance, (5) flexibility, (6) balance, and (7) coordination.62 The criterion-related validities for these kinds of tests for certain jobs, such as firefighting, are quite strong.63 Unfortunately, these tests, particularly the strength tests, are likely to have an adverse impact on some applicants with disabilities and many female applicants. For example, roughly two-thirds of all males score higher than the highest-scoring female on muscular tension tests.64 This difference between the sexes in physical strength was once used to legally bar women from certain jobs in the military; however, this is no longer the case, and all jobs within the U.S. military were opened up to women in 2015.65 There are two key questions to ask in deciding whether to use these kinds of tests.
First, is the physical ability essential to performing the job, and is it mentioned prominently enough in the job description? Neither the Civil Rights Act nor the ADA requires employers to hire individuals who cannot perform essential job functions, and both accept a written job description as evidence of the essential functions of the job. Second, is there a probability that failure to adequately perform the job would result in some risk to the safety or health of the applicant, co-workers, or clients? The “direct threat” clause of the ADA makes it clear that adverse impact against those with disabilities is warranted under such conditions. Invoking this clause can sometimes cause controversy, as in 2014, when United Parcel Service (UPS) cited this clause to support the decision to fire a pregnant worker because she could not lift packages weighing more than 20 pounds. UPS was sued, because it routinely made accommodations for injured employees, and the same woman who was fired for being pregnant would have been accommodated had she hurt her back at work. The company tried to argue that pregnant women were being treated the same way as men who were injured outside of work.66 UPS eventually settled out of court and eliminated this policy, probably as much for public relations reasons as for any other factor related to business necessity.67 An important lesson to learn from litigation that involves terminating pregnant women (or placing them on unpaid leave) is that letters from medical personnel to companies need to be very specific when it comes to laying out what duties can and cannot be performed at different stages of the pregnancy.
Form letters or letters that are too general in nature are often interpreted in the least favorable light for the employee by employers looking to save money or reduce their exposure to risk.68

COGNITIVE ABILITY TESTS

Cognitive ability tests differentiate individuals based on their mental rather than physical capacities. Cognitive ability has many different facets, although we will focus only on three dominant ones. Verbal comprehension refers to a person’s capacity to understand and use written and spoken language. Quantitative ability concerns the speed and accuracy with which one can solve arithmetic problems of all kinds. Reasoning ability, a broader concept, refers to a person’s capacity to invent solutions to many diverse problems. Some jobs require only one or two of these facets of cognitive ability. Under these conditions, maintaining the separation among the facets is appropriate. However, many jobs that are high in complexity require most, if not all, of the facets; hence, one general test is often as good as many tests of separate facets.
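One common psychometric convention for building such a general score from separate facet measures (an illustration, not a method the chapter prescribes) is to standardize each facet against the applicant pool so the facets contribute equally despite different raw scales, then average them. A minimal sketch with invented scores:

```python
# Hypothetical sketch: combine verbal, quantitative, and reasoning facet
# scores into one general composite by z-scoring each facet across the
# applicant pool and averaging. All raw scores are invented.
from statistics import mean, pstdev

def z_scores(values):
    """Standardize raw scores against the applicant pool."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def general_composite(facet_scores):
    """facet_scores maps facet name -> list of raw scores (one per applicant).
    Returns one composite per applicant: the mean of that applicant's
    standardized facet scores."""
    standardized = [z_scores(scores) for scores in facet_scores.values()]
    return [mean(per_applicant) for per_applicant in zip(*standardized)]

pool = {
    "verbal":       [24, 30, 18],  # raw scores for three applicants
    "quantitative": [40, 35, 45],
    "reasoning":    [12, 15, 9],
}
print([round(c, 2) for c in general_composite(pool)])  # → [0.0, 0.41, -0.41]
```

Because every facet is on the same standardized scale, a single cut score on the composite behaves much like one general test, which is the point the paragraph above makes.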
Highly reliable commercial tests measuring these kinds of abilities are widely available, and they are generally valid predictors of job performance in many different kinds of contexts, including widely different countries.69 For example, in sales, based on the results of studies he ran on hundreds of salespeople and hundreds of applicants for sales positions, Wharton Professor Adam Grant concluded that “cognitive ability was more than five times more powerful than emotional intelligence when it came to performance.” The results from his research suggested that an employee with high cognitive ability generated annual revenue of over $195,000, compared with $159,000 for those with moderate cognitive ability and $109,000 for those with low cognitive ability.70 The validity of these kinds of tests is slightly related to the complexity of the job, however, in that one sees higher criterion-related validity for complex jobs than for simple jobs. One of the major drawbacks to these tests is that they typically have adverse impact on some minority groups. Indeed, the size of the differences is so large that some observers have advocated abandoning these types of tests for making decisions regarding who will be accepted for certain schools or jobs. This is somewhat ironic in that these standardized tests were originally designed to be anti-elitist and to help identify talented individuals who may not be high in socioeconomic status but were still very bright by objective standards. However, over time, the tests have become a major hurdle to many disadvantaged groups by restricting their college opportunities and thus are now perceived as elitist due to their adverse impact on minorities.71 The notion of race norming, mentioned earlier, was born of the desire to use these high-utility tests in a manner that avoided adverse impact.
Although race norming was made illegal by amendments to the Civil Rights Act, some people have advocated the use of banding both to achieve the benefits of testing and to minimize its adverse impact. The concept of banding suggests that similar groups of people whose scores differ by only a small amount all be treated as having the same score. Then, within any band, preferential treatment is given to minorities. Most observers feel preferential treatment of minorities is acceptable when scores are tied, and banding simply broadens the definition of what constitutes a tied score. Like race norming, banding is controversial, especially if the bands are set too wide.72

As with all the selection measures we have seen so far, a concern is that applicants may be tempted to cheat in order to score well on whatever instrument is used to make selection decisions. Cheating on tests is hardly a new phenomenon, however. What is new is the degree to which the use of computerized testing and social networking has changed the nature and scope of cheating.
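The banding idea described above can be sketched in code: working down from the top score, applicants whose scores fall within a chosen band width of the current band's top are treated as tied. The band width of 3 points and the applicant names and scores below are invented for illustration; in practice, band widths are often derived from the test's standard error of measurement rather than picked arbitrarily.

```python
# Hypothetical sketch of score "banding": scores within `band_width` of the
# current band's top score are treated as tied. Names, scores, and the band
# width are illustrative assumptions, not values from the text.

def band_scores(scores, band_width):
    """Group applicants into bands of effectively tied scores,
    working from the highest score downward."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    bands, current = [], []
    for name, score in ranked:
        # Start a new band once the gap from the band's top score is too big.
        if current and current[0][1] - score > band_width:
            bands.append(current)
            current = []
        current.append((name, score))
    if current:
        bands.append(current)
    return bands

applicants = {"Avery": 94, "Blake": 92, "Casey": 85, "Drew": 84, "Ellis": 70}
for i, band in enumerate(band_scores(applicants, band_width=3), start=1):
    print(f"Band {i}: {[name for name, _ in band]}")
# → Band 1: ['Avery', 'Blake']
# → Band 2: ['Casey', 'Drew']
# → Band 3: ['Ellis']
```

Within each band, all applicants count as having the same score, so tie-breaking criteria (including the preferential treatment the text discusses) can then be applied; widening `band_width` broadens what counts as a tie, which is exactly why overly wide bands are controversial.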
The term question harvesting has been coined to capture the process whereby test takers use advanced technology to download questions or capture images of questions with digital cameras or other devices while taking a test, and then transmit the content of the test wirelessly to people outside the testing facility, who then post the questions for future test takers.73 Cheating scandals such as these become particularly controversial when allegations are based on nationality. For example, the evidence of wrongdoing with respect to test scores reported from China and South Korea grew so large in 2014 that the Educational Testing Service withheld the scores for applicants from these countries until all the allegations could be sorted out.74

PERSONALITY INVENTORIES

Whereas ability tests attempt to categorize individuals relative to what they can do, personality measures tend to categorize individuals by what they are like. The number of firms employing personality tests as screens has ballooned over the years, from just 26% in 2000 to just under 60% in 2014.75 Research suggests that there are five major dimensions of personality, known as the “Big Five”: (1) extroversion, (2) adjustment, (3) agreeableness, (4) conscientiousness, and (5) openness to experience. Table 6.3 lists each of these with a corresponding list of adjectives that fit each dimension. Although it is possible to find reliable, commercially available measures of each of these traits, the evidence for their validity and generalizability is mixed at best.76 For example, conscientiousness, which captures the concepts of self-regulation and self-motivation, is one of the few factors that displays any validity across a number of different job categories, and many real-world managers rate this as one of the most important characteristics they look for in employees.
People who are high in conscientiousness tend to show very good self-control when pursuing work goals and are especially adept at overcoming challenges and obstacles, relative to people low in this trait.77 In contrast, lack of conscientiousness among employees creates a number of difficulties for employers, some of which are shown in the Competing through Globalization box.

Table 6.3 The Five Major Dimensions of Personality Inventories
1. Extroversion: Sociable, gregarious, assertive, talkative, expressive
2. Adjustment: Emotionally stable, nondepressed, secure, content
3. Agreeableness: Courteous, trusting, good-natured, tolerant, cooperative, forgiving
4. Conscientiousness: Dependable, organized, persevering, thorough, achievement-oriented
5. Openness to experience: Curious, imaginative, artistically sensitive, broad-minded, playful

COMPETING THROUGH GLOBALIZATION
Phantom Hires Haunt Saudi Change Efforts

The business model for the “old Saudi Arabia”—and hence employers working in the region—was relatively simple. The Kingdom generated huge sums of money from oil, a non-labor-intensive industry, and staffed both low-skilled and high-skilled jobs with foreign workers. Saudi citizens were basically offered a deal where they were provided free health care, education, and utilities, as well as public-sector jobs, in return for not challenging the royal family’s absolute authority. The rulers of the country basically created a “nanny state” where citizens were sheltered from cradle to grave when it came to working in more competitive and demanding private-sector jobs. Although it would be unfair to suggest that all Saudis lack conscientiousness, this was a culture where lack of this trait was not always totally debilitating. However, all of that is now changing, and the country’s new Crown Prince Mohammed bin Salman, often referred to as “MBS,” launched a program called “Saudization” that aims to move his citizens from the public payroll to private payrolls.

There were several forces driving this change effort. First, lower worldwide demand for Saudi oil was putting a dent in the country’s treasury, which made it harder for the royal family to hold up their part of the deal. Second, most of the money that the government paid to Saudi citizens was spent outside the country. The state needed people to spend more of that money at home and, hence, expand local businesses. Finally, due to the strict Islamic culture within Saudi Arabia, where beheadings are not unusual and women are not allowed to work, it was always difficult to recruit skilled workers from more liberal Western cultures. This problem was only intensified in 2018 after the brazen dismemberment and murder of Washington Post journalist Jamal Khashoggi sent thousands of Western employees heading for the exits. In the wake of all these developments, Saudization was the only answer. As one analyst noted, “the old Saudi Arabia business model is not viable—the country was going to collapse.”

In order to help his citizens make the transition, the Crown Prince authorized a set of hiring quotas. Most of these quotas were targeted at low-skill jobs because, although MBS eventually wants to make inroads into solar power, high-tech, and entertainment industries, few Saudis have the skills that are required for jobs in these industries. Thus, the quotas were aimed at bakeries, electronic shops, furniture stores, and jewelry stores and demanded that all workers in such establishments be Saudis. The only problem with this approach, however, was that most Saudi citizens were either not qualified for this work or, if they were qualified, were unwilling to do it. As the owner of Osool jewelry (a 25-store chain) complained, “this is gold, it cannot just be handled and sold by anyone. There are not enough trained and qualified young Saudis. Or they quit due to long hours.”

When confronted with terrorizing raids on their businesses, however, the employers needed to do something, and so many turned to “phantom worker programs.” Employers would hire and register Saudi citizens, put them on the payroll, but agree that the person hired would never actually show up for work. As the owner of Osool states, “We have their names, they are registered, but they don’t work.” Phantom workers add to the employer’s cost, but clearly provide no value other than to prevent one from being fined or imprisoned. The combination of increased expenses and fears of raids prompted many employers to leave the country. This, ironically, places even more urgency on Saudization and the process of becoming more self-sufficient when it comes to hiring Saudi workers.

DISCUSSION QUESTIONS
1. How does the unique nature of Saudi culture make it difficult to change the rules of engagement for workers like the ones seen here?
2. What aspects of Saudi politics make it easier to change the rules of engagement for these workers?
3. In the end, which of these two forces is likely to win out and why?

SOURCES: M. Rashad and S. Kalin, “Saudi Arabia Needs 1.2 Million Jobs by 2022 to Hit Unemployment Target,” Reuters Online, April 25, 2018; M. Stancati and D. Abdulaziz, “Saudi Arabia’s Economic Revamp Means More Jobs for Saudis—If Only They Wanted Them,” The Wall Street Journal Online, June 19, 2018; J. Northam, “Saudi Arabian Businesses Struggle with Rule to Replace Foreign Workers with Locals,” NPR Online, May 28, 2018; “Mobily Penalized in Saudi Arabia for Not Hiring Enough Nationals,” Bloomberg Businessweek Online, October 8, 2018.

Instead of showing strong direct and positive correlations with future performance across all jobs, the validity coefficients associated with personality measures tend to be job specific. For example, extroverts tend to excel in jobs like sales or politics because these jobs demand gregariousness and assertiveness, two of the central features shared by all extroverts. In contrast, introverts are better at studying and working in isolation, and hence they are best at jobs like accountant or research scientist because these jobs demand patience and vigilance. Extroverts tend to enjoy working in team-oriented environments more than introverts, but this does not always spill over into performance differences for engaging in teamwork.78 Both extroverts and introverts can become effective leaders, although they achieve effectiveness in different ways. Extroverts tend to be top-down, autocratic and charismatic leaders who motivate followers by getting them emotionally engaged. In contrast, effective introverted leaders tend to be more bottom-up, participative leaders who listen to empowered employees and then engineer reward structures so that people are working toward their own self-interests.79

One important element of staffing in team-based structures, however, relates to how the selection of one team member influences the requirements associated with other team members.80 In some cases, organizations might try to select people who have similar values and personality traits in order to create a strong team culture. When there is a strong team culture, everyone shares the same views and traits, promoting harmony and cohesiveness.81 In other cases, people putting together a team go out of their way to make sure that the people on the team have different values and personalities. The hope here is that a diversity of opinion promotes internal debate and creativity.
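The two team-staffing strategies just described, hiring for similarity to build a strong shared culture versus hiring for difference to add diversity of opinion, can be sketched as a simple distance calculation against the current team's average trait profile. All names and Big Five scores (on a hypothetical 1-5 scale) below are invented for illustration; they are not from the text.

```python
# Hypothetical sketch: choose the candidate whose Big Five profile is
# closest to (strategy "similar") or farthest from (strategy "different")
# the current team's mean profile. All profiles are invented.

def profile_distance(a, b):
    """Euclidean distance between two trait profiles (dicts with same keys)."""
    return sum((a[t] - b[t]) ** 2 for t in a) ** 0.5

def team_mean(team):
    """Average each trait across current team members."""
    traits = team[0].keys()
    return {t: sum(member[t] for member in team) / len(team) for t in traits}

def pick(team, candidates, strategy="similar"):
    mean_profile = team_mean(team)
    key = lambda name: profile_distance(candidates[name], mean_profile)
    return min(candidates, key=key) if strategy == "similar" else max(candidates, key=key)

team = [
    {"extroversion": 4, "adjustment": 4, "agreeableness": 3, "conscientiousness": 5, "openness": 2},
    {"extroversion": 5, "adjustment": 3, "agreeableness": 4, "conscientiousness": 4, "openness": 2},
]
candidates = {
    "Jordan": {"extroversion": 4, "adjustment": 4, "agreeableness": 4, "conscientiousness": 4, "openness": 2},
    "Riley":  {"extroversion": 1, "adjustment": 3, "agreeableness": 2, "conscientiousness": 3, "openness": 5},
}
print(pick(team, candidates, "similar"))    # closest fit to the team profile
print(pick(team, candidates, "different"))  # adds the most new perspective
```

Here the "similar" strategy selects Jordan, whose profile nearly matches the team's mean, while the "different" strategy selects Riley, the introverted, high-openness outlier; which strategy is right depends on whether the organization wants cohesion or debate, as the paragraph above notes.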
For example, at Pinterest, programming teams are put together one person at a time, and each new person added to the team is evaluated and selected based on some unique trait or perspective they might bring to the team that is not already there.82 The concept of “emotional intelligence” is also important in team contexts and has been used to describe people who are especially effective in fluid and socially intensive contexts. Emotional intelligence is traditionally conceived of as having five aspects: (1) self-awareness (knowledge of one’s strengths and weaknesses), (2) self-regulation (the ability to keep disruptive emotions in check), (3) self-motivation (the ability to motivate oneself and persevere in the face of obstacles), (4) empathy (the ability to sense and read emotions in others), and (5) social skills (the ability to manage the emotions of other people).83 Relative to standard measures of ability and personality, there has not been a great deal of scientific research on emotional intelligence, and critics have raised both theoretical and empirical questions about the construct itself. Theoretically, som...

Explanation & Answer



Case Study: Kinaxis Chooses Sales Reps with Personality
Bob Dolan's Selection Methods
Dolan used interviews, biographical details (such as the degrees candidates had earned and their résumés), and personality inventories as screening tools when recruiting sales representatives. The sequence in which Dolan used these techniques worked for him, but he should have chosen alternative ...
