PsyOps: Personality Assessment Through Gaming Behavior

Description

Read one of the papers discussed in Chapter 6, "Do Games Learn From You When You Play Them?":

  • Player Modeling using Self-Organization in Tomb Raider: Underworld
  • Predicting Player Behavior in Tomb Raider: Underworld
  • Give Me a Reason to Dig: Minecraft and Psychology of Motivation
  • Introverted Elves & Conscientious Gnomes: The Expression of Personality in World of Warcraft
  • PsyOps: Personality Assessment Through Gaming Behavior

For the paper you chose:

1. Write a short, informal summary of what the paper shows and what you think the key takeaways might be. You don't have to follow this writing style, of course, but as an example, a log of short summaries of some papers I've read can be seen on this website.

2. Identify one interesting data-driven finding about players that the paper shows. How do the authors establish the finding that you chose? How solid or certain is the finding from a scientific perspective?

3. Identify one question or uncertainty about the study: Does it fall short somewhere, or leave questions unanswered?

Format: 12-point Times New Roman.

Unformatted Attachment Preview

Introverted Elves & Conscientious Gnomes: The Expression of Personality in World of Warcraft

Nick Yee1, Nicolas Ducheneaut1, Les Nelson1, Peter Likarish2

1 Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA, [nyee, nicolas, lnelson]@parc.com
2 University of Iowa, Iowa City, IA, peter-likarish@uiowa.edu

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CHI 2011, May 7–12, 2011, Vancouver, BC, Canada. Copyright 2011 ACM 978-1-4503-0267-8/11/05....$10.00.

ABSTRACT

Personality inference can be used for dynamic personalization of content or system customization. In this study, we examined whether and how personality is expressed in Virtual Worlds (VWs). Survey data from 1,040 World of Warcraft players containing demographic and personality variables was paired with their VW behavioral metrics over a four-month period. Many behavioral cues in VWs were found to be related to personality. For example, Extraverts prefer group activities over solo activities. We also found that these behavioral indicators can be used to infer a player’s personality.

Author Keywords

Virtual worlds, online games, personality, Big 5, inference.

ACM Classification Keywords

H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

General Terms

Human Factors

INTRODUCTION

Games can be character revealing. One of the authors’ fathers once noted that he enjoys playing golf with his business partners because it lets him see which of them cheats on the golf course. The underlying implication, of course, is that how someone behaves on a golf course says something about how they may behave during a business transaction. And online gamers who have developed romantic relationships in virtual worlds [34] often say something similar:

“The game WAS the reason we fell in love. Going through all the adventures and quests together really built our relationship. We found out how the other person is when they are mad, tired, sad, happy, excited, annoyed, etc.” [City of Villains, Female, Age 25]

The unique affordances of virtual worlds offer an unparalleled platform for examining the intersections between personality and behaviors in virtual environments. On the other hand, unlike personality expression in physical settings, online games allow, or even encourage, users to behave in a manner inconsistent with their everyday identities. Thus, in this study, we ask:

  • Is it true that a person’s personality can be inferred from how they behave in a virtual world?
  • And if so, what specific virtual cues are highly indicative of a person’s introversion or conscientiousness (for example)?

Being able to infer a user’s personality from online cues has direct relevance to HCI research, given the field’s longstanding interest in interface personalization and system customization [e.g., 16, 24]. Indeed, knowing more about a user’s personality could help design systems more responsive to users’ needs in areas as diverse as ecommerce, social software, and recommender systems, to name a few. In this paper, we use data from the widely popular massively multiplayer online game (MMOG), World of Warcraft (WoW), to answer the two questions above.
We then use our results to discuss how personality data could be used in the design of future online systems, being mindful of some important limitations and potential pitfalls also suggested by our research.

The Expression of Personality

Studies in personality psychology have repeatedly shown that judgments of personality at zero acquaintance (i.e., by strangers) are moderately accurate. More importantly, the specific cues used to infer different personality traits are consensually shared. In other words, personality is readily expressed in specific cues in everyday life. This has been shown to be true for brief face-to-face encounters [9, 15]. For example, in an earlier study involving video-taped face-to-face conversations [9], Extraverted individuals spoke louder, with more enthusiasm and energy, and were more expressive with gestures. Other studies have researched personality inference by examining an individual’s bedroom or office [12], or their music collection [23]. For example, in the study of personal spaces, Conscientious individuals had well-lit, neat, and well-organized bedrooms. And individuals who scored high on Openness to Experience had more varied books and magazines.

This line of research has also extended to computer-mediated communication (CMC). In particular, studies have shown that moderately accurate personality impressions can be formed based on an individual’s personal website [18, 28], Facebook profile [3], email content [10], blog content [32], and even an individual's email address, the thinnest slice of CMC possible [2]. For example, in terms of linguistic output on blogs, Agreeable individuals were more likely to use the first person singular, words related to family, and words related to positive emotions (e.g., happy, joy). Conscientious individuals were more likely to use words related to achievement. These studies illustrate that we leave behind personality traces in both the physical and digital spaces that we inhabit.
Given that the average online gamer spends over 20 hours a week in a virtual world [31, 33], it is not difficult to imagine that some amount of personality traces could be gleaned from logs of their virtual interactions as well.

Limits to Personality Expression?

On the other hand, there are also reasons to believe that personality may not be readily expressed in virtual worlds. First of all, previous studies have largely focused on personality expression in everyday settings or linguistic output online. It is unclear how or whether personality is expressed via non-human bodies doing non-human things in a fantasy world (e.g., gnomish priests resurrecting the dead with magical light rays).

Related to this point, some scholars, like Turkle, have suggested that VWs allow us to constantly reinvent ourselves [27]. If the strongest interpretation of this notion were true, it would imply that there might be a clean break between personality and how a person behaves in a VW. In other words, people could express or reinvent themselves idiosyncratically in VWs, such that shared cues of personality would not exist.

And finally, there is also evidence that users do alter their behaviors in online games. For example, studies have shown that role-players tend to be more imaginative and thus willing to experiment with their online personas [5, 26]. And studies of online dating [13] and online gaming [4] have shown that users in both settings tend to idealize their online personas to some degree. In particular, some studies have revealed that tendencies to idealize self-representation online are moderated by poor self-esteem [4, 7]. Thus, identity experimentation and individual variations in that experimentation may suppress stable personality expression cues in virtual worlds.

The Collection of Personality Data

Previous studies of personality expression have tended to rely on linguistic output or behavioral traces. These traces are often artifacts of behaviors over time.
For example, a person who is low on Conscientiousness may often forget to water their plant. A withered plant in a bedroom is an example of a behavioral trace. Of course, as some researchers have suggested [20], we should also study actual behaviors as they occur. These researchers argue that observations of individuals in their natural settings and "humdrum lives" (pg. 862) may yield a better understanding of the link between personality and behavior. The problem is that the recording of behaviors in natural settings and the subsequent coding are daunting tasks using traditional tools. Shadowing and video recording individuals is a laborious method that significantly constrains sample size.

Recent technology has begun to offset the daunting nature of behavioral data collection, however. For example, in a study of how personality is related to everyday linguistic output [20], researchers used an electronically activated recorder which was programmed to record a participant’s acoustic space for 30 seconds every 12 minutes. A dictionary-based software tool was then used to generate quantitative linguistic metrics of these recordings.

Behavioral Data Collection in Virtual Worlds

Virtual Worlds (VWs) offer unique affordances for studying the link between personality and behavior. For the purposes of this paper, we define VWs as graphical environments that enable geographically-distant individuals to interact via avatars (i.e., digital representations of users). It is also important to note that VWs are no longer academic prototypes or niche cultures, but have become mainstream interaction platforms. For example, WoW has over 11 million active monthly subscribers [30], and the Facebook game FarmVille has over 80 million active monthly users.

VWs offer three unique features in terms of collection of natural behavioral data. First, unlike the physical world where it would be unfeasible to follow everyone around with video cameras, VWs come inherently instrumented.
The computer systems running the VWs already track the movement and behavior of every avatar to make interactions possible (e.g., orienting avatars so that they can look at each other). Second, these high-precision sensors operate at all times. Thus, it is possible to generate not only snapshot data, but longitudinal behavior profiles for every user in a particular VW (e.g., see [8]). And finally, all these observations can be performed unobtrusively, thereby significantly reducing the observer effect [29]: participants cannot react to the camera if the camera is invisible.

Indeed, a recent study has illustrated that there are connections between personality and virtual behaviors in the VW Second Life [35]. In that study, 76 students were asked to participate in Second Life for six weeks while “wearing” a scripted virtual tracking device that captured some of their behaviors and linguistic output. The findings revealed some interesting correlations. For example, high Conscientiousness was positively correlated with geographical movement.

There were several weaknesses in that study, however. First and foremost, it is difficult to capture natural behavior by assigning users to participate in a VW not of their own choosing. Being able to observe actual users would likely yield more reliable data. Second, only data from one VW was collected. Given that much of SL resembles suburban America [1], it would be helpful to gather data from additional VWs (and in particular fantasy-based online games) to see if the results generalize. Third, participants in that study were only asked to spend six hours each week in Second Life. On the other hand, we know that players of other VWs spend on average 20 hours a week (without being asked to) in games like WoW [33]. In other words, the participant sample may not be representative of VW users in either demographics or usage patterns.
And finally, many of the correlations found in that study did not align with trait definitions of the personality variables used; e.g., virtual behaviors that correlated with Agreeableness were not related in obvious ways to Agreeableness. Thus, a replication in a different VW with existing users may help clarify whether the results are an artifact of the nature of Second Life or how people behave in VWs in general.

Research Questions

We focus on two research questions in this paper. While previous studies have examined personality expression in everyday settings, we were interested in examining whether and how personality is expressed in online games. To clarify and expand upon previous findings of personality correlates in VWs, we focus on the online game World of Warcraft in this paper with a sample of active players. Our first research question is thus:

RQ1. What are the behavioral correlates of personality in an online game?

If indeed personality is expressed in consistent cues in VWs, a pertinent question is whether these cues can be used specifically for personality inference. Thus, our second research question is:

RQ2. How well can we infer someone’s personality from only observing their virtual behaviors?

METHOD

Given our focus on the online game WoW, we will begin by first briefly describing the game context to lay the foundation for understanding the variables we use as behavioral indicators.

World of Warcraft

WoW is currently one of the more popular online games available commercially [30]. Unlike Second Life (SL) where users create most of the in-world content (including buildings, clothing, hair styles, and avatar bodies), content in WoW is almost entirely created and designed by the company running the game. And unlike the open sandbox nature of SL, WoW uses a typical “leveling up” formula seen in computer role-playing games.
Specifically, players start at level 1 and kill monsters to become higher level and acquire better weapons and armor in order to kill bigger monsters. Along the way, the game encourages players in different ways to collaborate with other players. Users can also create characters with different skill sets that complement each other. For example, heavily-armored tank classes shield the group from enemy attacks while lightly-armored damage dealing DPS (damage per second) classes deal damage to enemies and healing classes restore health lost in combat. In short, WoW is a collaborative virtual environment [22].

WoW draws from an established lore from the Warcraft franchise. Briefly, players must choose to belong to one of two primary factions: the Alliance or the Horde. Each faction has five distinct races, e.g., Night Elves or Trolls. A variety of rules dictate where and when players may attack and kill each other. Thus, a distinction is made between PvP (player-vs-player) activities and PvE (player-vs-environment) activities. PvP activities can range from one-to-one duels to large 40 vs. 40 battlegrounds (BGs). And in general, it is a player’s choice as to how much PvP activity they want to engage in.

Players in WoW communicate via typed chat and might also use VoIP tools to communicate via speech. The game also provides a modest set of emotes (e.g., /hug). Players are also able to specialize in crafting professions and convert collected raw ingredients into finished goods, such as in tailoring or cooking. There is also a system of Achievements that keeps track of a wide variety of combat and non-combat based objectives. There are Achievements for zones explored, for dungeons completed, for number of hugs given, and for cooking proficiency. These Achievement scores provide a good sense of how a player chooses to spend their time in WoW.

Thus, overall, WoW offers a wide and varied set of rich behavioral cues to draw from.
From class choice to amount of PvP activity, from number of emotes used to amount of world exploration, the game context offers a range of measurable behaviors. This is also a point of differentiation from SL. Due to the open nature of SL, most higher-level conceptual behaviors are not defined in the environment and it is up to individual users to define their creations. Thus, there is no overarching set of metrics beyond fairly low level behaviors, whereas in WoW, the game keeps track of many behaviors and activities using a standardized lexicon.

The World of Warcraft Armory

Indeed, the standardized lexicon and data format inherently lend themselves to automated data collection. Blizzard, the developer of WoW, is unique in that they have provided public access to much of their internally-collected data at a website known as the Armory. In short, by searching for a character’s name, anyone can view details about their past activities, including how many hugs they have given, the quality of their equipment, the class they prefer to play, etc. More importantly, these metrics have been tracked since the character was first created. With a few clicks, we can gather a character profile that has cumulative data over many months of game play. It bears emphasizing the tremendous social science research opportunities that are made possible by this publicly-available database of longitudinal behavioral metrics. It is from the Armory that we gathered the behavioral metrics for this study.

Participants

1,040 WoW players were included in the study. We recruited participants from forums dedicated to WoW, publicity on popular gaming sites (e.g., WoW.com), word-of-mouth on social media like Twitter, and mailing lists from previous studies of online gamers. We note that due to human subjects regulations, minors were excluded from participating in the study. Nevertheless, we were still able to gather data from a very wide age range (18-65).
The average age of our sample was 27.03 (SD = 8.21). 26% of participants were women.

Procedure

Participants began by completing a web-based survey that gathered their demographic and personality information. Participants were also asked to list up to 6 WoW characters they were actively playing. Once these characters were in our database, an automated data collection system was activated. The system launches a web scraper that gathers character profiles (large XML files) from the WoW Armory. The Armory updates itself once per day (in the early morning) if a character has been active the previous day. Thus, our script follows this schedule with a daily interval and collects any updated profiles. For the results, we analyzed data from a contiguous 4-month period in the spring and summer of 2010.

Personality Measures

In personality psychology, the Big-5 model is the gold standard. The model measures five traits: Extraversion, Agreeableness, Conscientiousness, Emotional Stability, and Openness to Experience. For comparability, we also used an inventory that measured these 5 factors. A 20-item scale measuring the Big-Five Factor structure was drawn from the International Personality Item Pool [11]. Participants rated themselves on the inventory items using a scale that ranged from 1 (Very Inaccurate) to 5 (Very Accurate).

Behavioral Measures in WoW

There are two main complexities we encountered when dealing with the Armory data. First, Armory profiles consist of hundreds of variables, oftentimes in a hierarchy. For example, there is a system of Achievements in WoW that tracks progress in a variety of defined goals, such as Exploration Achievements and Dungeon Achievements. Under Exploration Achievements, there is a category for each continent. Under each continent, there is a listing for each zone.
To avoid being inundated by low-level variables or including overlapping variables, we adopted an analytic strategy of looking at or generating high level variables where possible. This in turn produces more stable variables that map to psychologically meaningful concepts. For example, a notion of geographical exploration would seem to be better tracked by the overall count of zones explored rather than looking at any one particular zone.

A second complexity is that most players have multiple active characters at the same time and it is not at first clear how to combine metrics across characters to derive participant-level aggregates. For example, a level 80 character can do much more damage than a level 60 character (and the function is non-linear). Thus, there is no way to easily combine damage done across characters. While these metrics needed to be normalized, there wasn’t one single variable they could all be normalized against. We therefore adopted the following normalization and variable generation strategies:

1) Static character attributes were normalized against total number of characters. E.g., ratio of male characters = male characters / total characters.

2) Variable character attributes were normalized against overall time played. E.g., for combat roles, we calculated how often each character was a tank/healer/DPS, and then calculated a participant-level ratio for each of those roles. A 0.24 tank ratio meant that across all of a participant’s characters, they spent 0.24 of their total playing time as a tank.

3) Metrics that could be normalized against another variable were normalized accordingly. E.g., the score of Exploration Achievements could be divided by the score of All Achievements to generate an Exploration ratio. This thus filtered out the raw difference between someone with many and someone with few achievements, and focused instead on how they focus their game-play.
4) For metrics that could not be normalized and were highly dependent on character level, we extracted the maximum. E.g., it is very different having one character that has 80 vanity pets compared with having 4 characters with 20 vanity pets each. In these cases, we found the maximum number of vanity pets across a participant’s characters.

5) For metrics that could not be normalized and were not dependent on character level, we calculated the sum. E.g., any level character can emote /hug as often as they’d like. In these cases, we summed up the count of hugs across all of their characters.

It is important to mention that we are not claiming to have extracted all possible variables for analysis in this paper, but rather, that we have extracted a meaningful and manageable subset of higher-level variables that covers a wide range of behaviors in WoW. A description of each derived variable, along with its mean and standard deviation, is presented in Table 1 below. Note that we excluded outliers that were more than 2 standard deviations away from the mean when deriving these metrics. For brevity we will only describe high-level trends in the text, but for ease of reference, we will include the table row index in round brackets after each mentioned correlate.
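The five normalization strategies and the outlier rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Character` fields, the function names, and the use of hours as the time unit are hypothetical and do not come from the Armory data format, and the text does not say whether the 2-standard-deviation cutoff used a sample or a population standard deviation (the population form is assumed here).

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Character:
    """One character's metrics. All field names are hypothetical."""
    is_male: bool
    hours_played: float
    hours_as_tank: float
    exploration_score: float
    total_achievement_score: float
    vanity_pets: int
    hugs: int


def participant_metrics(chars):
    """Combine per-character metrics into participant-level variables."""
    total_hours = sum(c.hours_played for c in chars)
    total_ach = sum(c.total_achievement_score for c in chars)
    return {
        # Strategy 1: static attribute divided by number of characters.
        "male_ratio": sum(c.is_male for c in chars) / len(chars),
        # Strategy 2: variable attribute divided by overall time played.
        "tank_ratio": (sum(c.hours_as_tank for c in chars) / total_hours
                       if total_hours else 0.0),
        # Strategy 3: metric divided by a related metric (both as sums).
        "exploration_ratio": (sum(c.exploration_score for c in chars) / total_ach
                              if total_ach else 0.0),
        # Strategy 4: level-dependent, not normalizable -> take the maximum.
        "max_vanity_pets": max(c.vanity_pets for c in chars),
        # Strategy 5: level-independent, not normalizable -> take the sum.
        "sum_hugs": sum(c.hugs for c in chars),
    }


def drop_outliers(values, k=2.0):
    """Keep values within k standard deviations of the mean.

    Assumption: population standard deviation; the paper does not specify.
    """
    m, sd = mean(values), pstdev(values)
    return [v for v in values if abs(v - m) <= k * sd]
```

For instance, a participant with one character who tanked for 24 of 60 hours and a second character who never tanked in 40 hours would get a tank ratio of 0.24, matching the interpretation given in strategy 2.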
# | Variable | Description | M (SD) | E | A | C | ES | O
1 | Ratio of Alliance Characters | = Alliance Chars / Total Chars | 0.53 (0.47) | 0.00 | 0.05 | 0.07 | 0.04 | 0.02
2 | Ratio of Opposite Gender Characters | = Opposite Gender Chars / Total Chars | 0.27 (0.36) | -0.07 | -0.14 | -0.03 | 0.07 | 0.00
3 | Total Character Count | Count of all active characters reported by participant | 2.79 (1.51) | -0.12 | 0.03 | 0.07 | 0.02 | 0.10
4 | Number of Days Played Since Start of Study | Count of unique active days since start of study | 65.47 (34.89) | -0.04 | 0.00 | 0.01 | -0.03 | -0.01
5 | Total Realm Count | Count of realms participant has active characters on | 1.11 (0.31) | -0.05 | 0.06 | 0.01 | -0.03 | 0.09
6 | Max of Guild Changes | Highest number of guild change events | .78 (1.05) | 0.07 | -0.01 | 0.00 | -0.05 | 0.03
7 | Sum of Kills | Includes both kills against computer monsters and other players | 162353.84 (108633.20) | -0.03 | -0.03 | 0.00 | -0.00 | -0.07
8 | Sum of Kills in BGs | Number of kills in battlegrounds | 2705.70 (3589.28) | -0.01 | -0.06 | 0.00 | 0.04 | 0.00
9 | Sum of PvP Kills | Number of all PvP-related kills | 10437.22 (12026.80) | -0.04 | -0.08 | 0.05 | 0.09 | -0.05
10 | Sum of Deaths | Total number of deaths from any cause | 1849.12 (1440.63) | 0.05 | -0.07 | 0.00 | 0.05 | -0.04
11 | Sum of Deaths in Raid Dungeons | Number of deaths in dungeons | 1018.84 (899.94) | 0.06 | -0.07 | 0.01 | 0.02 | -0.08
12 | Sum of Deaths from Falling | Number of deaths from falling from high places | 32.64 (69.10) | 0.02 | -0.05 | -0.07 | -0.02 | 0.00
13 | Sum of Hugs | Number of /hug emote | 38.57 (69.10) | -0.02 | 0.11 | 0.10 | -0.03 | 0.09
14 | Sum of LOLs | Number of /lol emote | 63.73 (147.57) | 0.01 | 0.01 | -0.03 | -0.02 | 0.05
15 | Sum of Cheers | Number of /cheer emote | 47.05 (90.40) | -0.09 | 0.13 | 0.07 | 0.04 | 0.13
16 | Sum of Waves | Number of /wave emote | 79.77 (140.21) | -0.06 | 0.10 | 0.09 | 0.08 | 0.14
17 | Max Number of Mounts | Mounts increase travel speed and are both functional and collectible | 32.08 (29.62) | -0.05 | 0.03 | 0.05 | -0.02 | 0.01
18 | Max Number of Vanity Pets | Vanity pets are small nonfunctional and largely decorative companions | 39.45 (31.80) | -0.07 | 0.07 | 0.08 | -0.05 | 0.07
19 | Ratio of Need Rolls | = Need Rolls / Total Rolls | 0.17 (0.11) | 0.10 | -0.14 | -0.08 | -0.06 | -0.09
20 | Max Equipment Score | Sum of all equipment item levels | 3867.90 (813.20) | 0.02 | -0.10 | -0.04 | 0.01 | -0.06
21 | Sum of Count of Respecs | Number of times player has changed skill specializations | 27.02 (28.06) | 0.03 | -0.09 | -0.02 | 0.03 | -0.05
22 | Max of Achievement Score | Total Achievement score | 413.06 (195.44) | -0.01 | -0.04 | 0.02 | 0.00 | -0.03
23 | Ratio of Quest Achievements | = Quest Achs / Total Achs (based on Sums) | .07 (.02) | -0.10 | 0.07 | 0.02 | 0.01 | -0.01
24 | Ratio of Exploration Achievements | = Exploration Achs / Total Achs (based on Sums) | .10 (.05) | -0.04 | 0.09 | 0.06 | 0.02 | 0.13
25 | Ratio of PvP Achievements | = PvP Achs / Total Achs (based on Sums) | .10 (.05) | 0.00 | -0.12 | -0.03 | 0.07 | -0.01
26 | Ratio of Dungeon Achievements | = Dungeons Achs / Total Achs (based on Sums) | .36 (.12) | 0.12 | -0.17 | -0.12 | 0.01 | -0.17
27 | Ratio of Profession Achievements | = Profession Achs / Total Achs (based on Sums) | .10 (.06) | -0.04 | 0.13 | 0.07 | -0.02 | 0.12
28 | Ratio of Reputation Achievements | = Reputation Achs / Total Achs (based on Sums) | .03 (.01) | -0.03 | -0.02 | -0.03 | -0.01 | -0.12
29 | Ratio of World Event Achievements | = World Achs / Total Achs (based on Sums) | .13 (.07) | -0.08 | 0.16 | 0.10 | -0.04 | 0.11
30 | Max of Cooking Achievements | Highest cooking score | 6.33 (4.85) | -0.07 | 0.07 | 0.07 | -0.01 | 0.05
31 | Max of Fishing Achievements | Highest fishing score | 7.26 (6.02) | -0.06 | 0.07 | 0.07 | 0.01 | 0.05
32 | Sum of End Game 10-man Raids Done | Total 10-man end-game raids completed | 16.78 (17.83) | 0.06 | -0.11 | -0.05 | 0.00 | -0.13
33 | Sum of End Game 25-man Raids Done | Total 25-man end-game raids completed | 18.13 (22.99) | 0.08 | -0.09 | -0.05 | 0.00 | -0.12
34 | Ratio of Healing Done | = Healing Done / Damage Done (based on Sums) | .32 (.46) | 0.00 | 0.00 | -0.03 | -0.02 | 0.01
35 | Sum of Arenas Played | Number of Arenas entered | 55.57 (155.31) | -0.01 | -0.09 | 0.01 | 0.06 | 0.01
36 | Sum of BGs Played | Number of BGs entered | 98.36 (147.11) | -0.07 | -0.07 | 0.02 | 0.05 | 0.04
37 | Sum of Duels Played | Number of Duels entered | 52.80 (94.73) | 0.11 | -0.07 | -0.04 | -0.05 | -0.03
38 | Ratio of Arena Wins | = Arena Wins / Arenas Entered | .33 (.18) | -0.10 | -0.12 | 0.03 | 0.08 | -0.01
39 | Ratio of BG Wins | = BG Wins / BGs Entered | .48 (.18) | -0.02 | -0.06 | -0.01 | -0.01 | 0.01
40 | Ratio of Duel Wins | = Duel Wins / Duels Entered | .46 (.21) | -0.06 | 0.02 | 0.07 | 0.04 | -0.01
41 | Sum of Flight Paths Taken | Flight paths are used to fly from one fixed location to another | 1424.42 (1117.06) | -0.08 | 0.07 | 0.05 | -0.02 | -0.01
42 | Sum of Hearths | Hearthstones allow a character to teleport to a pre-determined location | 454.08 (310.49) | 0.00 | 0.02 | -0.01 | -0.03 | -0.03
43 | Ratio of Melee DPS Role | Ratio of time spent in hand-to-hand DPS role (e.g., fury warriors, rogues) | .30 (.30) | -0.08 | -0.01 | 0.03 | 0.02 | 0.05
44 | Ratio of Ranged DPS Role | Ratio of time spent in ranged DPS role (e.g., hunters, mages) | .38 (.32) | 0.06 | 0.05 | 0.01 | -0.08 | 0.04
45 | Ratio of Healing Role | Ratio of time spent in healing role (e.g., holy priests, restoration druids) | .20 (.24) | 0.04 | -0.05 | -0.05 | -0.01 | -0.05
46 | Ratio of Tank Role | Ratio of time spent in tanking role (e.g., protection warrior, protection paladin) | .13 (.20) | -0.01 | -0.01 | -0.01 | 0.11 | -0.07

Table 1. Means, standard deviations, and correlation coefficients of VW behavioral measures. Correlation coefficients in bold are p < .05.

RESULTS

To analyze how personality is expressed in VWs, we examined the correlations between the virtual behaviors and the personality factors. Given the increased risk of experiment-wise error in large correlation tables with 46 variables against the five personality factors, we used an analytic method developed by Sherman and Funder [25] to address this specific issue. The method employs a Monte Carlo simulation of repeatedly randomized data within each participant. Thus, the method preserves the statistical properties of the data gathered. The method creates 1,000 instances of these randomized data sets and tabulates the number of observed significant correlations (at alpha of .05).
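The general idea of such a randomization check can be sketched as follows. This is a hedged illustration, not the Sherman and Funder procedure itself, whose details are not given in the text: the sketch simply shuffles which participant each set of trait scores belongs to, and it decides significance with a large-sample normal approximation to the critical correlation rather than an exact t-test. All function names and the column-list data layout are assumptions made for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def critical_r(n, alpha=0.05):
    """Approximate two-tailed critical |r| for n cases (normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / sqrt(n - 2 + z * z)


def count_significant(behaviors, traits, alpha=0.05):
    """Count behavior-trait pairs whose |r| exceeds the critical value.

    behaviors and traits are lists of columns, one value per participant.
    """
    cutoff = critical_r(len(traits[0]), alpha)
    return sum(abs(pearson_r(b, t)) > cutoff for b in behaviors for t in traits)


def randomization_test(behaviors, traits, n_iter=1000, alpha=0.05, seed=0):
    """Compare the observed count of significant correlations with chance.

    Assumption: each iteration applies one shared permutation to all trait
    columns, which breaks the behavior-trait link but keeps the correlations
    among traits and among behaviors intact.
    """
    rng = random.Random(seed)
    n = len(traits[0])
    observed = count_significant(behaviors, traits, alpha)
    null_counts = []
    for _ in range(n_iter):
        perm = list(range(n))
        rng.shuffle(perm)
        shuffled = [[t[i] for i in perm] for t in traits]
        null_counts.append(count_significant(behaviors, shuffled, alpha))
    expected = fmean(null_counts)
    # Share of randomized data sets with at least as many significant results.
    p_value = (1 + sum(c >= observed for c in null_counts)) / (n_iter + 1)
    return observed, expected, p_value
```

With 46 behavioral variables and five traits there are 230 correlations, so at an alpha of .05 roughly 230 × .05 = 11.5 would be expected to reach significance by chance, which gives a useful sanity check on the simulated chance level.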
The probability of the actual number of significant correlations is then calculated based on where it lies on the distribution of these 1,000 randomizations. In other words, this technique answers whether we found a significantly higher number of significant correlations in our data set than would be expected by chance alone. In our case, using an alpha of .05, we had 83 observed significant correlations where only 11.50 would be expected by chance based on the simulations. According to this Monte Carlo method, the probability of this number of observed correlations is p < .001. This provides assurance that the observed correlations, as a whole, are non-random.

We will now describe each of the Big 5 personality factors and the virtual behaviors they were correlated with. We will not discuss every significant correlation, but instead try to find clusters of correlations that trace out the bigger picture.

Extraversion

According to the trait definition, individuals who score high on Extraversion tend to be outgoing, gregarious, and energetic, while those who score low on Extraversion tend to be reserved, shy, and quiet. In terms of behavioral indicators in VWs, individuals who score high on Extraversion tend to prefer group activities. They have a higher ratio of Dungeon Achievements (26), which requires collaboration with other players. They have also completed a higher number of end-game 25-man raid dungeons (33). Their higher number of guild changes also implies social promiscuity (6). On the other hand, players who score low on Extraversion prefer solo activities, such as questing (23), cooking (30), and fishing (31). They also are more likely to have more vanity pets (18), which are silent pet-like companions. We also see that players who score low on Extraversion have a preference and higher win ratios for some PvP activities (36, 37, 38, & 40), but it is less obvious what the connection is.
The same is true for the higher ratio of opposite gender characters (2) among those who score low on Extraversion.

Agreeableness

According to the trait definition, individuals who score high on Agreeableness tend to be friendly, caring, and cooperative, while those who score low on Agreeableness tend to be suspicious, antagonistic, and competitive. In terms of behavioral indicators in VWs, individuals who score high on Agreeableness give out more positive emotes (13, 15, 16), i.e., hugs, cheers, and waves, and prefer noncombat activities such as exploration (24), crafting (27), world events (29), cooking (30), and fishing (31). On the other hand, players who score low on Agreeableness prefer the more competitive and antagonistic aspects of game-play. They enjoy killing other players (8 & 9). They also have more deaths (10), focus more on getting better equipment (20), and have engaged in more PvP activities (25), including BGs (36), Arenas (35), and duels (37). Their competitive edge also translates to a higher winning ratio in Arenas (38) and BGs (39).

The negative correlation with ratio of need rolls (19) is also telling. Valuable equipment drops from monsters are given to players according to dice rolls. Players select to roll based on “Need” or “Greed”, of which the former is given higher priority. We found that players who are low on Agreeableness often insist on being given higher priority over others by rolling “Need”. While this is tolerated in some cases, abusing Need rolls is often seen as anti-social (there is even a specific epithet used by the community to describe these players: ninja looters).

Conscientiousness

According to the trait definition, individuals who score high on Conscientiousness are organized, self-disciplined, and dutiful, while those who score low on Conscientiousness are careless, spontaneous, and easy-going.
In terms of behavioral indicators in VWs, individuals who score high on Conscientiousness seem to enjoy disciplined collections in non-combat settings. This is reflected in having a large number of vanity pets (18) which must be collected one at a time, and having high cooking (30) and fishing scores (31) which reflect self-discipline in collecting unique recipes and visiting unique fishing locations (as well as patiently staying put for significant amounts of time in these locations, since fishing in the game is surprisingly close to its real-world equivalent: catches can be few and far between). The same is true for world event achievements (29) which often require disciplined collections of items and visiting a set of locations around the world. On the other hand, individuals who score low on Conscientiousness seem to be more careless and are more likely to die from falling from high places (12). Emotional Stability According to the trait definition, individuals who score high on Emotional Stability are calm, secure, and confident, while those who score low on Emotional Stability are nervous, sensitive, and vulnerable. While there were significant correlations between behavioral metrics and this personality trait, these correlations were more difficult to interpret as a whole. Individuals who score low on Emotional Stability prefer PvP related activities, including having a higher PvP achievement score (25) and higher wins in the Arena (38). Individuals who score higher on Emotional Stability are more likely to have characters of the opposite gender (2). It is worth noting that previous studies have also had difficulty identifying meaningful behavioral correlates for Emotional Stability [12, 17], so our findings here may reflect an overall weaker behavioral expression of this trait. 
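The trait-by-trait correlations in this section were screened as a whole with the Monte Carlo randomization check described at the start of the results. The following is a minimal sketch of that kind of check, not the authors' code. It assumes that a randomization shuffles which player's trait scores are paired with which behavioral metrics, and it approximates the significance cutoff with the Fisher z-transform; both choices, along with every function and variable name, are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def critical_r(n, alpha=0.05):
    """Approximate two-sided critical |r| (Fisher z-transform; an assumption)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z / math.sqrt(n - 3))


def count_significant(traits, metrics, alpha=0.05):
    """Count trait/metric pairs whose |r| exceeds the critical value."""
    n = len(next(iter(traits.values())))
    cutoff = critical_r(n, alpha)
    return sum(
        abs(pearson_r(t, m)) > cutoff
        for t in traits.values()
        for m in metrics.values()
    )


def randomization_test(traits, metrics, n_sims=1000, alpha=0.05, seed=0):
    """Compare the observed count with counts obtained from shuffled pairings."""
    rng = random.Random(seed)
    n = len(next(iter(traits.values())))
    observed = count_significant(traits, metrics, alpha)
    simulated = []
    for _ in range(n_sims):
        order = list(range(n))
        rng.shuffle(order)  # break the link between players and their traits
        shuffled = {k: [v[i] for i in order] for k, v in traits.items()}
        simulated.append(count_significant(shuffled, metrics, alpha))
    at_least = sum(count >= observed for count in simulated)
    return {
        "observed": observed,
        "expected_by_chance": sum(simulated) / n_sims,
        "p_value": (at_least + 1) / (n_sims + 1),
    }
```

Here `traits` and `metrics` are hypothetical dictionaries mapping a name to a list of per-player values in the same player order. The returned `observed` and `expected_by_chance` fields play the roles of the 83 observed and 11.50 expected correlations reported in the text, although this sketch makes no attempt to reproduce those numbers.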
Openness to Experience

According to the trait definition, individuals who score high on Openness to Experience are abstract thinkers, imaginative, and intellectually curious, while those who score low on Openness to Experience are down-to-earth, conventional, and traditional. In terms of behavioral indicators in VWs, we see a cluster of correlates that reflect exploration and curiosity. For example, individuals who score higher on Openness have more characters (3). They also have characters on more realms (5), i.e., game servers or parallel worlds that each character resides on. And they spend more of their playtime exploring the world (reflected by the higher exploration achievement ratio, 24). They also spend more time participating in non-combat activities, such as crafting professions (27) and world events (29). On the other hand, individuals who score low on Openness prefer the more traditional, combat-oriented aspects of game-play, spending more time in dungeons and raids (26, 32, & 33).

Personality Inference from Behavioral Metrics

To examine how well personality can be inferred from virtual behavioral metrics alone, we conducted a series of multiple regressions on each of the personality factors using the respective ten highest behavioral correlates. We note that this method is imperfect and creates a “double-dipping” concern, but provides a rough sense of how well personality can be inferred. The results are shown in Table 2. All of the multiple regressions were significant at p < .05; four were significant at p < .001. This suggests that virtual behavioral metrics can be used to provide statistically significant models of a player’s personality. According to Cohen [6], an R of .30 is a medium effect size, while an R of .10 is a small effect size. Thus, many of our regression models had around medium effect sizes.

Variable             R      R2     Adj. R2   STE    F      p
Extraversion         0.30   0.09   0.07      0.93   4.73   < .001
Agreeableness        0.30   0.09   0.07      0.67   4.67   < .001
Conscientiousness    0.20   0.04   0.03      0.79   4.86   < .001
Emotional Stability  0.21   0.04   0.02      0.79   2.13   0.03
Openness             0.26   0.07   0.06      0.75   4.93   < .001

Table 2. Multiple regressions on each of the personality factors.

DISCUSSION

The availability of fine-grained virtual behavioral metrics in the WoW Armory allowed us to gather longitudinal profiles of actual VW users. While studies in the past have examined links between personality and linguistic output online (in emails or blogs), our study is the first to examine the links between personality and virtual behavior in an online game. Our findings reveal that our personalities are expressed in VWs via consistent cues, and that most of these cues reflect trait definitions of standard personality factors. For example, players who score high on Extraversion prefer group-oriented activities. And players who score high on Agreeableness use more positive emotes and prefer non-combat activities. More importantly, our multiple regressions reveal that behavioral cues in VWs can be used to infer an individual’s personality. These findings suggest that while some degree of identity experimentation is occurring in virtual worlds, basic personality is still being readily expressed. While an earlier study of personality expression in VWs [35] had trouble finding trait-aligned behavioral correlates in Second Life, we were able to find much more coherent behavioral clusters that were consistent with personality trait definitions in our study. Findings in the earlier study may have been impacted by participants with no prior experience with the VW. Also, it bears pointing out that the WoW Armory allowed us to gather a set of more conceptually meaningful variables. Due to constraints in the scripting language and sandbox nature of Second Life, there is no standardized set of high-level behavioral variables that are shared.
Thus the earlier study relied on lower-level variables such as distance walked or ratio of time sitting down, which may be less powerful in capturing personality expression, as opposed to behaviors such as hugging someone. Knowing the specific behavioral correlates for personality expression in virtual worlds is also important for several reasons. First, it helps researchers triage the large number of behavioral variables gathered in future studies, and helps prioritize where to start looking. Second, it helps psychologists understand whether certain personality traits are more easily predicted via behavioral indicators. And finally, comparing the findings across these studies will help us understand whether these behavioral correlates are consistent or idiosyncratic among different virtual worlds. Implications for CHI Personalized interfaces and system customization have long been of interest to the HCI community [16, 24]. It is reasonable to assume that information needs vary based on a user’s personality – for instance, extroverts using an online shopping website might be more interested in other customers’ reviews, while introverts might prefer seeing mostly technical data about the product instead. Our paper points at the possibility of inferring users’ personalities based on their activity traces (which need not come from online games) and customizing their experience based on the results. Another possibility directly applicable to online games but also other forms of social software would be to use inferred personality information to assist in the formation of groups, perhaps by recommending compatible partners based on the task to be accomplished. For instance, groups requiring a diversity of opinions might benefit from the inclusion of a wide range of personality types [14]. In other contexts, a more homogeneous mix could be beneficial. 
And it is worth pointing out that we are not suggesting an automated system that would kick some players out of groups because they are low on Agreeableness. After all, the competitive nature of these players can be an asset in PvP settings, and an assertive nature can also be an asset for raid leaders. In a related fashion, personality data could also be used in recommender systems: recommendations from other users with similar personality profiles could be given more or less weight, depending on the user’s desire for more homogeneity or diversity in the options they are presented with [19]. Limitations and Future Research Directions There were several limitations to our study. First, we only collected data from one VW. It is unclear whether the behavioral cues we identified generalize to other similar online games. Moreover, it is difficult to say how our indicators translate to VWs that do not employ the dragonslaying role-playing paradigm. Nevertheless, our findings hint at potential metrics to collect and analyze in future studies. For example, emotes (for Agreeableness) or geographical movement (for Openness) have analogous metrics across many types of VWs. A related limitation to generalizability is that WoW users are highly-engaged users who spend on average 20 weeks producing behaviorally-rich metrics. This usage profile is likely atypical of normal website or mobile app usage. Whether the more typical casual engagement with websites and mobile apps would allow personality inference is certainly an avenue for future research. Third, while the correlation coefficients appear to be quite low (ranging from .06-.17), a similar large-scale study (i.e., >500 participants) of linguistic output among bloggers yielded similar effect sizes [32]. Given the larger variances in demographics (with an age range of 16-85 in our sample) and unavoidable noise among natural setting samples, these smaller effect sizes are probably not surprising in hindsight. 
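To put the reported range of correlation coefficients in perspective, the short sketch below converts the two endpoints (.06 and .17) into shares of variance explained and labels them against the Cohen benchmarks cited earlier (.10 small, .30 medium). The function names and the wording of the labels are illustrative assumptions, not part of the study.

```python
def variance_explained(r):
    """Share of variance in one variable accounted for by the other (r squared)."""
    return r * r


def cohen_label(r):
    """Rough label based on the two benchmarks cited in the text (.10 and .30)."""
    size = abs(r)
    if size >= 0.30:
        return "medium or larger"
    if size >= 0.10:
        return "small"
    return "below small"


# Endpoints of the range of coefficients reported in the limitations.
for r in (0.06, 0.17):
    print(f"r = {r:.2f}: r^2 = {variance_explained(r):.4f}, {cohen_label(r)}")
```

Squaring the endpoints shows that a single behavioral metric accounts for roughly 0.4% to 2.9% of the variance in a trait score, which is why the individual cues are weak on their own even though the set of correlations as a whole is non-random.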
And finally, we relied on the set of variables that Blizzard shares publicly via the Armory. It is possible that other unshared variables, such as logged chat, may be even more predictive of personality. Given the existing work on linguistic predictors of personality [21, 32], it would be interesting to be able to directly compare the predictive power of linguistic and behavioral cues. Overall, it is important to continue exploring how personality is expressed across a range of VWs (using a variety of metrics) to understand how generalizable these findings are. Ending Thoughts VWs provide a novel research platform with unique affordances and challenges. The automated longitudinal data collection across a wide range of behaviors is impossible to mirror using traditional data collection techniques, and similar techniques could also be used to study other social phenomena, such as the emergence of group norms or leadership. On the other hand, VWs come with unique challenges as well. Above all, the ability to create tracking systems that essentially shadow a user wherever they go in a VW raises privacy concerns. In our study, the consent process spelled out the data collection scripts to participants, but given that VWs like WoW are a kind of pseudonymous public space, data collection studies (without a survey component like ours) largely fall into the exempt category for human subjects Institutional Review Board (IRB) review. The gray area arises due to the fact that the public space of WoW is unlike any physical public space we know--with microphones and video cameras that could follow every user unobtrusively. This becomes even more complicated when the game developer makes public what would otherwise be private data. Such is the case with the WoW Armory. After all, before the WoW Armory, players could make the case that they had a reasonable expectation of privacy in WoW (with regard to IRB review). 
This expectation is no longer reasonable with the release of the Armory. In short, VWs create new research platforms, but at the same time, force us to address our role as researchers in the face of such powerful data collection tools. It is easy to imagine that VWs allow us to become whatever we want to be, but our findings show that our personalities remain even when we don virtual bodies. These findings of personality expression in VWs suggest that our first lives still play an important role even when we are in Second Life. And our personalities are readily expressed even when we are Elves and Gnomes.

ACKNOWLEDGMENTS

This research is sponsored by the Air Force Research Laboratory.

REFERENCES

1. Au, W. Linden Suburban Home Owners More Likely To Treat Their Place As Extension of the Real Life Self, Academic Suggests. New World Notes. http://nwn.blogs.com/nwn/2010/04/linden-homesstudy.html
2. Back, M., Schmukle, S. and Egloff, B. How extraverted is honey.bunny77@hotmail.de? Inferring personality from e-mail addresses. Journal of Research in Personality, 42 (2008), 1116-1122.
3. Back, M., Stopfer, J., Vazire, S., Gaddis, S., Schmukle, S., Egloff, B. and Gosling, S. Facebook profiles reflect actual personality not self-idealization. Psychological Science, 21 (2010), 372-374.
4. Bessiere, K., Seay, A. and Kiesler, S. The Ideal Elf: Identity Exploration in World of Warcraft. CyberPsychology and Behavior, 10 (2007), 530-535.
5. Carroll, J. and Carolin, P. Relationship between game playing and personality. Psychological Reports, 64 (1989), 705-706.
6. Cohen, J. Statistical Power Analysis. Lawrence Erlbaum Associates, 1988.
7. Ducheneaut, N., Wen, M., Yee, N. and Wadley, G. Body and mind: a study of avatar personalization in three virtual worlds. Proceedings of CHI, 1 (2009), 1151-1160.
8. Ducheneaut, N., Yee, N., Nickell, E. and Moore, R. The life and death of online gaming communities: a look at guilds in World of Warcraft. CHI 2007 Proceedings (2007), 839-848.
9. Funder, D. and Sneed, C. Behavioral Manifestations of Personality: An Ecological Approach to Judgmental Accuracy. Journal of Personality and Social Psychology, 64 (1993), 479-490.
10. Gill, A., Oberlander, J. and Austin, E. Rating E-mail Personality at Zero Acquaintance. Personality and Individual Differences, 40 (2006), 497-507.
11. Goldberg, L. A broad-bandwidth, public domain, personality inventory measuring the lower-level facets of several five-factor models. In Mervielde, I., Deary, I., De Fruyt, F. and Ostendorf, F. eds. Personality Psychology in Europe, Tilburg University Press, Tilburg, The Netherlands, 1999, 7-28.
12. Gosling, S., Ko, S., Mannarelli, T. and Morris, M. A Room with a cue: Judgments of personality based on offices and bedrooms. Journal of Personality and Social Psychology, 82 (2002), 379-398.
13. Hancock, J., Toma, C. and Ellison, N. The truth about lying in online dating profiles. Proceedings of CHI 2007, 1 (2007), 449-452.
14. Harper, F., Frankowski, D., Drenner, S., Ren, Y.Q., Kiesler, S., Terveen, L., Kraut, R. and Riedl, J. Talk Amongst Yourselves: Inviting Users to Participate in Online Conversations. IUI 2007 (2007), 62-71.
15. Kenny, D., Horner, C., Kashy, D. and Chu, L. Consensus at zero acquaintance: Replication, behavioral cues, and stability. Journal of Personality and Social Psychology, 62 (1992), 88-97.
16. Mackay, W. Triggers and Barriers to Customizing Software. Proceedings of SIGCHI 1991, 1 (1991), 153-160.
17. Mairesse, F. and Walker, M. Automatic Recognition of Personality in Conversation. Proceedings of the Human Language Technology Conference, 1 (2006), 85-88.
18. Marcus, B., Machilek, F. and Schutz, A. Personality in Cyberspace: Personal Web Sites as Media for Personality Expressions and Impressions. Journal of Personality and Social Psychology, 90 (2006), 1014-1031.
19. McNee, S., Riedl, J. and Konstan, J. Making Recommendations Better: An Analytic Model for Human-Recommender Interaction. CHI 2006 (2006), 1103-1108.
20. Mehl, M., Gosling, S. and Pennebaker, J. Personality in Its Natural Habitat: Manifestations and Implicit Folk Theories of Personality in Daily Life. Journal of Personality and Social Psychology, 90 (2006), 862-877.
21. Mehl, M. and Pennebaker, J. The Sounds of Social Life: A Psychometric Analysis of Students' Daily Social Environment and Natural Conversations. Journal of Personality and Social Psychology, 84 (2003), 857-870.
22. Nardi, B. and Harris, J. Strangers and Friends: Collaborative Play in World of Warcraft. CSCW 2006 (2006), 149-158.
23. Rentfrow, P. and Gosling, S. Message in a Ballad: The Role of Music Preferences in Interpersonal Perception. Psychological Science, 17 (2006), 236-242.
24. Riecken, D. Personalized Views of Personalization. Communications of the ACM, 43 (2000), 26-28.
25. Sherman, R. and Funder, D. Evaluating correlations in studies of personality and behavior: Beyond the number of significant findings to be expected by chance. Journal of Research in Personality, 43 (2009), 1053-1063.
26. Simon, A. Emotional stability pertaining to the game of Dungeons and Dragons. Psychology in the Schools, 24 (1987), 329-332.
27. Turkle, S. Life on the Screen: Identity in the Age of the Internet. New York: Simon and Schuster, 1995.
28. Vazire, S. and Gosling, S. e-Perceptions: Personality Impressions Based on Personal Websites. Journal of Personality and Social Psychology, 87 (2004), 123-132.
29. Webb, E., Campbell, D., Schwartz, R. and Sechrest, L. Unobtrusive measures: non-reactive research in the social sciences. Rand McNally, Chicago, 1966.
30. White, P. MMOGData: Charts. 2009 (2008).
31. Williams, D., Yee, N. and Caplan, S. Who plays, how much, and why? Debunking the stereotypical gamer profile. Journal of Computer-Mediated Communication, 13 (2008), 993-1018.
32. Yarkoni, T. Personality in 100,000 Words: A large scale analysis of personality and word use among bloggers. Journal of Research in Personality (in press).
33. Yee, N. The demographics, motivations, and derived experiences of users of massively multi-user online graphical environments. Presence: Teleoperators and Virtual Environments, 15 (2006), 309-329.
34. Yee, N. The "Impossible" Romance. The Daedalus Project. http://www.nickyee.com/daedalus/archives/001534.php
35. Yee, N., Harris, H., Jabon, M. and Bailenson, J. The Expression of Personality in Virtual Worlds. Social Psychological & Personality Science (in press).

Predicting Player Behavior in Tomb Raider: Underworld

Tobias Mahlmann, Anders Drachen, Julian Togelius, Alessandro Canossa and Georgios N. Yannakakis

Abstract: This paper presents the results of an explorative study on predicting aspects of playing behavior for the major commercial title Tomb Raider: Underworld (TRU). Various supervised learning algorithms are trained on a large-scale set of in-game player behavior data, to predict when a player will stop playing the TRU game and, if the player completes the game, how long it will take to do so. Results reveal that linear regression models and other non-linear classification techniques perform well on the tasks and that decision tree learning induces small yet well-performing and informative trees. Moderate performance is achieved from the prediction models, which indicates the complexity of predicting player behavior based on a constrained set of gameplay metrics and the noise existent in the dataset examined, a generic problem in large-scale data collection from millions of remote clients.

Keywords: Player modeling, supervised learning, classification, Tomb Raider: Underworld

I. INTRODUCTION

User-oriented testing is a crucial phase of modern game development with the scope of iteratively enhancing the final game product that will be published [1], [2], [3].
Usually a carefully selected set of subjects, representative of the target audience, as well as professional testers are involved in a labor-intensive procedure testing the games and evaluating the quality of the gaming experience [1], [3]. One of the key components of user-oriented testing, both during production and after game launch, is to evaluate whether people play the game as intended and to investigate how gameplay and game design impact the playing experience [1], [2]. The growing focus on increasing player affordability (freedom, choice) in digital games [1] emphasizes the need for the development of reliable and effective user-testing procedures [2]. Being able to predict certain aspects of gameplay and playing experiences defines a vital component of the user testing procedure within game development [1], [3]. Prediction of playing patterns may rely on both qualitative and quantitative approaches to user testing [2], [4]. This paper examines the latter. Within the last five years, instrumentation data derived from player-game interaction (or gameplay metrics, as they are referred to in game development) has gained increasing attention in the game industry as a source of detailed information about in-game player behavior [2], comprising detailed numerical data extracted from the interaction of the player with the game [5]. The application of machine learning and data mining on such data, with datasets often in the terabyte scale, and the inference of playing patterns from the data [6] can provide a quantitative alternative to, and a supplement for, traditional qualitative approaches to user and playability testing [3].

Authors are with the Center for Computer Games Research, IT University of Copenhagen, Rued Langgaards Vej 7, DK-2300 Copenhagen S, Denmark (email: {tmah, drachen, juto, alec, yannakakis}@itu.dk).
Notably, the application of gameplay metrics permits much larger sample sizes to be used, and the data can potentially be collected outside of the laboratory environment. Furthermore, game metrics are highly detailed, permitting tracking and logging of the second-by-second behavior of players. Understanding patterns of game-playing behavior, and more specifically gameplay aspects such as where players encounter problems with progressing through a game, permits re-engineering of the game design and ensures the enhancement of playing experience. In this paper, we explore the possibility of predicting particular aspects of playing behavior in the commercial game title Tomb Raider: Underworld1 (TRU) via supervised learning. In particular we attempt to predict when a player will stop playing and, if alternatively the player completes the game, how long will it take the player to do so. The generated predictors are trained on player metrical data of the first two levels of the TRU game. One of the perennial challenges of game design is to ensure inclusiveness — i.e. that as many different types or classes of players are facilitated in the design. Being able to predict when specific classes of players will stop playing a game is of interest in game development because it assists with locating problematic aspects of game design, i.e. features that hinder different classes or types of players from progressing through specific segments of the game, and ultimately complete the game. The ability to predict completion time for the players who do complete the game is of similar interest. For example, if a particular type of player completes a game very fast, there is a risk of disappointment with the game product. Identifying the different types of completion strategies and accounting for them in the game design is an important element ensuring customer satisfaction. 
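To make the prediction task concrete, here is a minimal sketch of a one-level decision tree (a decision stump) that predicts the last level a player completes from a single level-1 feature. It is a toy illustration under stated assumptions, not the learners used in the study: the feature name `deaths_l1`, the sample values, and the helper functions are all hypothetical.

```python
from collections import Counter


def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that maximizes training accuracy.

    Returns (threshold, label_if_at_or_below, label_if_above, accuracy),
    or None when the feature takes a single value and cannot be split.
    """
    values = sorted({row[feature] for row in rows})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label, right_label = majority(left), majority(right)
        correct = left.count(left_label) + right.count(right_label)
        accuracy = correct / len(labels)
        if best is None or accuracy > best[3]:
            best = (threshold, left_label, right_label, accuracy)
    return best


def predict(stump, row, feature):
    """Apply a fitted stump to one player's feature record."""
    threshold, left_label, right_label, _ = stump
    return left_label if row[feature] <= threshold else right_label


# Hypothetical players: deaths during level 1 and the last level completed (1-7).
players = [{"deaths_l1": d} for d in (1, 2, 3, 4, 10, 12, 15, 20)]
last_level = [7, 7, 7, 7, 2, 2, 1, 2]

stump = fit_stump(players, last_level, "deaths_l1")
print(stump)
print(predict(stump, {"deaths_l1": 18}, "deaths_l1"))
```

A stump is the smallest possible tree, which keeps the sketch readable; a real analysis would use many features, a proper tree learner, and held-out data rather than training accuracy.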
1 http://www.tombraider.com
2 http://www.xboxlive.com

Earlier work on TRU metrics data has focused on the investigation of dissimilar playing patterns via self-organization in a moderate data set of 1365 players [6]. The experiments presented here are based on a large data set derived from 10000 players. Data was collected via the Square Enix Europe (SQE) Metrics Suite. The data collection process is completely unobtrusive since data was gathered directly via the Xbox Live!2 service, with subjects playing TRU in their natural habitat. Several features that correspond to various key aspects of playing behavior are extracted from the data, e.g. information about causes of player deaths. The specific features are selected so that they incorporate knowledge about player performance. A carefully selected set of various classification algorithms is employed to 1) predict the number of levels completed (i.e. the level number class) based on those features of play and 2) predict the game playing time of players that completed the TRU game. Our algorithms are tested in two settings: 1) learning to predict from playing features of level 1 of the game and 2) learning to predict from playing features of levels 1 and 2. Results showcase the effectiveness of linear regression techniques as well as nonlinear classification approaches. It also appears that decision tree learning achieves moderate performance but provides a full degree of model expressiveness. Moreover, decision trees showcase that a very small number of playing features is adequate for achieving a moderate classification accuracy. The findings directly address the industrial need of automated processes that could assist towards identifying dissimilar playing patterns and predicting forthcoming player actions and events.
The main arguments supporting the commercial applicability of results include the large-scale training dataset consisting of 10000 players, the major commercial game used, and the available industrial system for logging the data.

II. GAME METRICS MINING

Viewing the mining of game data as a process towards player modeling [7], [8] we can identify few studies in the literature. Quantitative models of players have been built to assist the learning of basic non-player character (NPC) behaviors (e.g. moving, shooting) in Quake II [9], [10], [11]. In those studies self-organizing maps [10], Bayesian networks [11] and neural gas [9] approaches are employed for clustering game-playing samples. Similarly, self-organizing maps have been used for clustering the trails (player waypoints) of users playing a simple level exploration game [12]. Missura and Gärtner [13] investigate the use of k-means for clustering player data and support vector machines for predicting dynamic difficulty adjustment in a simple shooter game; data is derived from a small sample of 17 players. The vast majority of the aforementioned approaches concentrate on a few specific scenarios (e.g. imitate human movement in a particular level of a game) while the game environments investigated are simple test-bed games or simplified versions of commercial games. Moreover, the studies focus on constructing models or predictors of playing behavior based on small-scale player-data collection experiments held in laboratories. Doing so questions the scalability of the obtained performance and leads to the simplification of the learning task, which in turn acts in favor of the learning approach. Game data mining should consider large-scale data sets (ideally live player data sets) if the study wishes to ensure that the findings are representative and scalable.
The existence of large-scale data, in turn, addresses the need for efficient and robust algorithms able to classify (or cluster) data successfully. Thawonmas et al. [14] used game metrics from the Massively Multiplayer Online Game (MMOG) Cabal Online to establish patterns of behavior among the player base, trying to identify aberrant patterns indicative of computer-controlled agents, i.e. game bots. The approach followed in that paper is based on simple frequency analysis. A similar approach was used to visualize the behavior of online game players in [15]. Ducheneaut and Moore [16] investigated interaction patterns between players in the Star Wars Galaxies MMOG utilizing action frequencies to group player behaviors. Conversely, Chen et al. [17] utilized the spatial behavior of avatars to establish models of bot and player behavior. None of the aforementioned studies moves beyond relatively simple statistical methods. To the best of our knowledge, the study most closely related to this research is the work of Weber and Mateas, mining game metrical data for the prediction of player strategy in the real-time strategy game Starcraft [18]. Replays from over 5000 expert players were compared using various classification algorithms for recognizing the player’s strategy, and regression algorithms for the task of predicting when specific unit or building types will be produced. In [19] non-negative matrix factorization is applied to mine 1.6 million images on World of Warcraft guilds. That study, however, considers online player appearances rather than live data of playing behavior. Our earlier study utilized self-organization for the identification of playing behavior clusters of 1365 TRU players [6].

III. TOMB RAIDER: UNDERWORLD

The Tomb Raider franchise is one of the most established in the digital games industry.
The Tomb Raider games, a combination of adventure games and 3D platformers, have been published in different versions on all hardware platforms, including mobile devices, and the current game in the series, Tomb Raider: Underworld is the eighth to be published. The main protagonist of the games, whom the player controls and interacts with the game world through, is Lara Croft. She is designed as a combination between an action heroine and Indiana Jones, who travels to exotic locations and enters forgotten tombs and lairs, solving puzzles and finding ancient treasures at the same time. The Tomb Raider game environments have been 3D from the beginning, and Tomb Raider: Underworld (TRU) is no exception. Tomb Raider: Underworld is a 3D platform game and is played in third-person perspective. The players are tasked with solving various navigational puzzles and apply strategic thinking in their navigational behavior (see Fig. 1). The player faces different types of danger from the game environment and computer-controlled agents operating within it. Falling is an almost continuous risk in the game, and the player also encounters different types of mobile NPC enemies. The environment is also a danger, as it is filled with traps, hazardous substances, fire, etc., which can kill the player. The game consists of seven game levels plus a prologue. Each game level is set to a specific theme, for example Thailand or the Arctic Sea, subdivided into 71 map units (MU) of varying size. Fig. 1. A screenshot from Tomb Raider: Underworld level “Thailand”. Image is copyright of Crystal Dynamics/Square Enix Europe (2009). Because TRU was the first game that the Metrics Suite collected data from, there were a number of data cleaning issues such as the recording of negative values, missing timestamps, etc., which made the data cleaning process extensive. The 10000 player sample was also cleaned to remove e.g. 
instances where players had completed the game and then started playing the game again (approximately 1600 players did this). Additionally, instances where the Metrics Suite had missing data reported for a player from e.g. a specific game level or map unit or similar missing intermediate location times (those that were reported as not having spent any time in one or more locations that are part of a level they have completed), where removed. Missing data is discussed further in the last section of this paper. B. Extracted features IV. DATA COLLECTION The gameplay metrics data were obtained from the Square Enix Europe Metrics (SQE; the former EIDOS) Suite, which contains data from a range of SQE-produced games. The SQE Suite is an instrumentation/telemetry system developed to capture and store game metrics. Gameplay metrics are normally logged as event-based data, and each metric is associated with a range of descriptives (contextual information) such as time stamps, user IDs, IP addresses, etc. An important aspect of the system is that it delivers live data, i.e. data from people playing these games in their natural habitats. The data collection is completely unobtrusive, providing detailed, quantitative information about how users play games free from any effects or bias imposed by experimental approaches to research [6], [20]. A. Data Preprocessing The SQE Suite holds data from more than 1.5 million players of TRU. A sample was drawn covering all data collected from a two month period (1st Dec 2008 - 31st Jan 2009), providing records from approximately 203000 players (around 100 GB). The game was launched in November 2008, so the data represent a time period where the game was recently released to the public. The data was imported to dual Microsoft SQL Server databses. Such large data amounts require substantial computing power to analyze, and it was therefore chosen to extract a subsample of 10000 players for an initial study. 
The 10000 players provide a sample large enough to form the basis for developing analysis methods, while at the same time being manageable in terms of analysis runtime. The only criterion applied to the selection was that players in the sample must have completed the first level of TRU. In terms of preprocessing, the main challenge was to transpose the data obtained from the Metrics Suite into a format we could use to analyze the data. To identify distinct players it was necessary to collect several messages to reconstruct their progress. The data in the sample were extracted in a series of tables, cleaned and transposed to a single table. Based on previous experience with a smaller sample of data from TRU [6], it was chosen to focus on game metrics that relate to the primary game mechanics and play features, as these are the most descriptive of the way TRU is played and how players can interact with the game system. TRU is a 3D platformer, with navigation being a major part of the gameplay, as is solving puzzles and fighting enemies. The features used for the current analysis relate to the core mechanics of the game. Eight categories of features were extracted, at two scales of resolution: Map Unit or Game Level, giving a total of 674 variables per player. Which resolution scale to use for each feature was chosen depending on the frequency of the specific variable, the distribution of use among the sampled players, the relation to the core game mechanics and its suitability for machine learning. Given the above-mentioned rationale the following features were extracted:
• Playing time: The time that each player spent playing the game, T. A total of 8.06 years of playtime were included in the dataset (including the game prologue), with an average playing time of 7.06 hours; different levels/MUs of TRU take different amounts of time to complete due to their varying size and/or puzzle difficulty. The total playing time per player varies between 21 minutes and 58.64 hours. The average time taken to complete the entire game was 10.23 hours.
• Total number of deaths: The total number of deaths for each player, D. There are 961403 instances of death registered, across all levels/MUs and death causes (96.14 average per player, varying from 0 to 1343 death events; σ{D} = 83). The death count depends on e.g. how much of TRU a player has played and the skill of the player.
• Help-on-Demand: The number of times help was requested, H. A key feature of TRU is the focus on navigational puzzle solving. A typical puzzle could be a door which requires specific switches to be pressed in order to open. Players need to solve the numerous puzzles in order to progress through the game. In order to avoid player frustration with the puzzles, a native Help-on-Demand (HoD) system was added to TRU, from which a hint or solution can be requested in relation to puzzles. The sampled data indicate that players generally either request both hints and answers or no help at all for specific puzzles. Both hint and answer requests were therefore aggregated into the H value. A total of 329907 HoD requests are recorded (32.99 average); this value is also highly dependent on how much of the game a player has played, and on the player's skill and playstyle.
• Causes of death: TRU features a variety of ways in which players can die. The causes of death can be grouped into three categories: death via enemies (which can be subdivided into ranged- and melee-oriented enemies), from falling or from environmental hazards. Death events caused by game bugs, for example players dying during cinematic encounters, were not included.
– Enemies (melee), Dm: the number of deaths caused by melee enemies. Those enemies, e.g. tigers and panthers, attack Lara Croft in close combat. Dying from melee enemies comprises 3.03% of the total number of deaths recorded.
– Enemies (ranged), Dr: the number of deaths caused by NPC enemies who attack using ranged weapons, e.g. mercenary snipers. Dying from ranged enemies comprises 4.14% of the total number of deaths recorded.
– Environment, De: the number of deaths caused by environment-related causes of death such as the player drowning, being consumed by fire, or being killed in a trap, comprising 29.9% of the total number of deaths across all players.
– Falling, Df: the number of deaths caused by falling. This cause of death comprises 62.92% of all death events, making it the dominant way to die in TRU, as would be expected from the game design.
These numbers vary from those reported in [6], reflecting the different properties of the underlying samples: in that study a sample of 1365 players who completed the game was used, whereas the current sample comprises 10000 randomly selected players among those who completed the first level. The effect of sampling is seen in e.g. death from opponents only comprising 8.13% in the current dataset, but 28.9% in the dataset of [6]. Enemies have a high impact in levels 5 and 6, levels that not all players in the current sample will have reached. Death by environmental causes comprised 13.7% in the earlier study and 29.9% in the current one, which is likely again due to the different properties of the two samples. Death by falling is similar, however: 57.2% reported in [6] vs. 62.92% in the current sample. Fig. 2 depicts the causes of death in TRU.
Fig. 2. Percentages of the four causes of death in Tomb Raider: Underworld across all seven game levels. Values are averages of all players (out of the 10000 players) that completed the corresponding level.
• Adrenalin: The number of times the adrenalin feature was used, A. This is an advanced gameplay feature of TRU that permits the player to temporarily slow down time while performing special attacks against enemies. When activated, a cursor has to be moved to the head area of the target, which will trigger a headshot event. The players in the sample used the adrenalin feature 72593 times, i.e. 7.26 per player. The use of adrenalin varies widely between players: between 0 and 304 uses.
• Rewards: The number of rewards collected, R. The levels of Tomb Raider: Underworld are rife with ancient artifacts, shards and similar relics, which players have the opportunity to collect while playing the game. A total of 1120708 artefacts/shards were located by the players in the game (112.08 average; σ{R} = 86.9).
• Treasure: The number of treasures found, T. Most levels in TRU contain one or a few major treasures, which take particular exploration to locate. Thus, a high treasure count is indicative of explorative behavior in players. A total of 24927 treasures are located in the dataset (2.49 average; σ{T} = 5.1).
• Setting changes: Players can change various parameters of the TRU game. Among these, four directly impact gameplay, and are therefore of interest to the current analysis:
– Ammo adjustment, Sa: The number of times the player adjusts how much ammunition Lara Croft is able to carry. Changing this setting comprises 29.6% of the total amount of settings changes.
– Enemy hit points, Se: The number of times the player changes the amount of hit points that computer-controlled enemies have, either positively or negatively. Changing this setting comprises 31.5% of the total amount of settings changes.
– Player hit points, Sp: The number of times the player adjusts how many hit points Lara Croft has, effectively making her harder or easier to kill. Changing this setting comprises 19.5% of the total amount of settings changes.
– Saving grab adjustment, Ss: The number of times the player lowers the recovery time when performing platform jumps, increasing the time available to gain a handhold. Changing this setting comprises 19.4% of the total amount of settings changes.
There were 15317 settings changes made (max 104, 1.53 average); however, only 1740 players changed settings (8.8 average). Settings changes were vastly more common in the first two levels (comprising 34.71% and 37.82% of the changes, respectively) than in the later levels (8.02% for level 3, 10.89% for level 4, 4.89% for level 5, 1.21% for level 6, 2.47% for level 7). This pattern possibly reflects players adjusting the difficulty parameters of the game early on, until they are satisfied, and then using the adjusted parameters throughout the rest of the game.
V. METHODOLOGY
After cleaning the 10000 player sample as described above, 6430 players remained. For these players, 30 features were collected relating to the performance of the player on level 1. These were the amount of time, T, spent in 19 different locations of the level (e.g. in the ship engine room and on the surface of the sea), and 11 other features relating to this level only: the number of deaths, the total reward, the number of help requests, the adrenalin used, the number of treasures found, and the number of deaths from the four different causes (melee, ranged weapons, environment, falling, and unknown). From this set, a second, smaller set consisting of 3517 players who also completed level 2 was selected. For this set, 25 additional features were computed related to gameplay performance on level 2 following the principles of designing the level 1 dataset: the time spent on 14 locations of level 2 plus the 11 gameplay features used in dataset 1. All features are normalized to be in [0, 1] via a uniform distribution. The target output for both data sets is a number indicating the last level completed by the player. We thereby assume that there is an unknown underlying function between features of gameplay behavior on the first two levels and the last TRU level that was completed, which a classification algorithm will be able to predict.
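The text above states only that features are normalized to [0, 1] "via a uniform distribution" and does not spell out the procedure. As one plausible reading, and purely as an assumption, the following minimal sketch shows min-max scaling of a single feature column. The function name `min_max_normalize` is hypothetical and does not come from the paper or from WEKA.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale one feature column linearly so its values lie in [0, 1].

    Assumption: the paper's normalization is a per-feature linear rescaling;
    the source does not confirm this.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative values only, not data from the study.
scaled = min_max_normalize([2.0, 4.0, 6.0])
print(scaled)  # [0.0, 0.5, 1.0]
```

Scaling each feature separately keeps location times (large numbers) from dominating counts such as deaths or help requests in distance-based or gradient-based learners.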
A third data set was created from the second data set, containing only the 1732 players that finished the whole game, and including the same features as the second data set. This data set was used for trying to predict the time taken to play through the game, assuming that there is some function between early playing behaviour and speed of completion. To test the possibility of predicting both the TRU level the player completed last, and the time taken to complete the game, we apply various classification and prediction algorithms using the WEKA machine learning software (version 3.6.2) from the University of Waikato [21]. WEKA is a comprehensive software package that includes versions of all the main prediction and classification algorithms from machine learning, as well as standard algorithms for preprocessing and unsupervised learning and regression techniques from statistics. This version of WEKA contains 76 algorithms, from 8 algorithm families, applicable to classifying a nominal attribute (the final level played) from a vector of real-valued numeric attributes (the normalized location times, deaths etc. mentioned above). Somewhat fewer (34 algorithms) can predict a real value (time taken to finish the game) from a real-valued vector. This abundance of tools points to the maturity of the machine learning field, but means that all algorithms and all parameters cannot reasonably be tried on any particular problem. Given the experimental aim, our approach was to try at least one algorithm from each of the families of algorithms on each dataset, and to spend extra effort on those classification algorithms that were included in the recent list of the most important algorithms in data mining: decision tree induction, backpropagation/multilayer perceptrons and simple regression [22]. Variants of those algorithms were explored and the space of parameters was searched manually.
They were also used as components for ensemble classifiers and as subset evaluators for feature subset evaluation algorithms, in order to achieve maximum classification performance. In the following section, we only report the best and most interesting results we have obtained from this experimentation. For all tested algorithms, the reported classification/prediction accuracy was achieved through 10-fold cross validation.
VI. EXPERIMENTS
The first two sets of experiments aim to predict the last level finished for each player, based only on features from level 1 and based on features from level 1 and 2 combined. The second set of experiments aims to predict the total time the player took to finish the game, based either on only level 1 features or on both level 1 and 2 features.
A. Last level completed
Before trying to predict which will be the last level a player finishes, we need to establish the baseline accuracy: what would an optimal predictor predict in the absence of any attribute data? This number is equivalent to the number of samples in the most common class (i.e. level completed) divided by the total number of samples. As can be seen from Table I, for the dataset containing all 6430 players that finished level 1, the best guess, in the absence of further information, is that the player only finishes level 1, leading to a baseline prediction accuracy of 39.8% (2561 of 6430 players). For the 3571 players that also finished level 2 (see footnote 1), the best guess is that a player finishes all the levels (last level finished is level 7), yielding a baseline prediction accuracy of 50%.
Footnote 1: This number is lower than would be expected by subtracting the players that only finished level 1 from the first dataset (6430 − 2561 = 3869) due to extra cleaning that was performed to remove players with missing level 2 features.

TABLE I
NUMBER OF PLAYERS (OUT OF THE 6430 FINISHING THE FIRST LEVEL) THAT STOPPED PLAYING THE GAME ON EACH LEVEL.
Level    No. of Players
1        2561
2        376
3        1045
4        393
5        56
6        267
7        1732

TABLE II
BEST ACCURACY (%) OF SEVERAL CLASSIFICATION ALGORITHMS ON PREDICTING FINAL LEVEL BASED ON FEATURES FROM ONLY LEVEL 1 OR FROM LEVEL 1 AND 2, USING DEFAULT OR LIGHTLY MANUALLY TUNED PARAMETERS. HIGHER VALUES ARE BETTER. NOTE THAT THIS IS JUST A SUBSET OF ALL ALGORITHMS THAT WERE TESTED.
Algorithm                             Level 1    Levels 1 and 2
Logistic regression                   48.3       77.3
MLP/Backpropagation                   47.7       70.2
J48 (C4.5) decision tree (pruned)     48.7       77.4
REPTree decision tree (pruned)        48.5       77.2
Multinomial naive bayes               43.9       50.2
Bayes network                         46.7       65.1
SMO Support vector machine            45.9       70.0
Baseline                              39.8       45.3

As described above, a number of classification algorithms were brought to bear on the problem of predicting last finished level based on attributes from level 1 or from level 1 and 2. It was found to be easy to do substantially better than baseline accuracy. The best accuracy on predicting final level based on attributes from level 1 was 47.7% (baseline 39.8%), and from attributes from both level 1 and 2 it is 76.9% (baseline 50%). The best results were found using logistic regression; several algorithms were able to achieve similar accuracy, but none could surpass this simple algorithm. The performance of a selected few algorithms can be seen in Table II. Most of the tested algorithms had similar levels of performance (with the exception of a few algorithms, especially the Bayesian ones, which underperformed), and were able to predict substantially better than the baseline. In particular, when also using features from level 2, we were able to predict the last level with a much better accuracy than the baseline guess, suggesting that such predictors could be meaningfully used both for analyzing game mechanics and for adapting the game online so as to keep the player playing.
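The majority-class baseline for the level 1 dataset can be checked directly from the counts in Table I. The sketch below is illustrative only; the function name `majority_baseline` and the dictionary layout are assumptions made for this example, not part of the study's tooling.

```python
from typing import Dict, Tuple

# Players per last completed level, copied from Table I.
players_by_last_level: Dict[int, int] = {
    1: 2561, 2: 376, 3: 1045, 4: 393, 5: 56, 6: 267, 7: 1732,
}


def majority_baseline(counts: Dict[int, int]) -> Tuple[int, float]:
    """Return the most common class and the accuracy of always guessing it."""
    total = sum(counts.values())
    best_class, best_count = max(counts.items(), key=lambda item: item[1])
    return best_class, best_count / total


level, accuracy = majority_baseline(players_by_last_level)
print(level, round(100 * accuracy, 1))  # 1 39.8
```

The result, 2561 / 6430 ≈ 39.8%, agrees with the level 1 baseline row of Table II.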
The difference in the predictive strength of using level 1 and 2 data as compared to only level 1 data is partly due to increased amount of features used in the second case, and of course to the fact that players who stopped playing before finishing level 2 are not part of the second data set. But it is also important to note that level 1 of TRU is designed as a form of “training level”, with less varied hazards to the player. The main hazard is falling, which is also evident from the recorded causes of death for level 1 (see Fig. 2). Levels 2-7, while showing substantial variation in theme and design, are more homogenous in that they are varied in their navigational challenges and the challenges the players encounter. Apart from accuracy, another important advantage of some machine learning algorithms is the transparency and the expressiveness of the acquired model. The models are more useful to a human game designer if they can be expressed in a form which is easy to visualize and comprehend, so that the consequences of changing particular design elements can be easily grasped. Multi-layer perceptrons are particularly limited from this perspective, and linear models with many free variables not so powerful either. However, decision trees of the form constructed by the ID3 algorithm and its many derivatives are excellent from this perspective, especially if pruned to a small size. 
The following extremely small decision tree is produced by the REPTree algorithm constrained to tree depth 2, and has a classification accuracy of 47.3% when trained on data from level 1 only:

L1-Seatop-T < 10835.5
  → L1-R < 25.5 : 1
  → L1-R ≥ 25.5 : 7
L1-Seatop-T ≥ 10835.5 : 7

On the set of players who completed both levels 1 and 2, the following tree has a classification accuracy of 76.7%:

L2-R < 18.5
  → L2-Flushtunnel-T < 9.858 : 2
  → L2-Flushtunnel-T ≥ 9.858 : 3
L2-R ≥ 18.5 : 7

The right arrow (→) symbol in the above trees indicates a branch under the tree node directly above the symbol. The number to the right of the colon represents the predicted game level. The accuracy of these predictors is quite impressive given how extremely simple they are. The idea that it would be possible to guess which level a player will finish on much better than baseline, based simply on how much time the player spends on the surface of the sea (L1-Seatop-T; in seconds) in the first level and her total reward (L1-R) during the first level, would seem rather outrageous if it was not supported by empirical evidence. The same goes for the idea that we could predict final level with quite high accuracy based only on the amount of time spent in the Flush Tunnel room (L2-Flushtunnel-T) and the total rewards collected for level 2 (L2-R). What these two decision trees indicate is that the amount of time players spend within a given area early in the game, and how well they perform, are important for determining whether they continue playing the game. Time spent can be indicative of problems with progressing through the game, which can lead to frustration. According to these trees, the computer-controlled enemies of TRU do not appear to help in predicting when players will stop playing the game.
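To make the two depth-2 trees above concrete, they can be transcribed as plain conditionals. This is only a hand-written restatement of the reported thresholds, not output from WEKA; the function and parameter names are hypothetical.

```python
def predict_last_level_from_level1(seatop_time: float, reward: float) -> int:
    """Tree trained on level 1 features (L1-Seatop-T, L1-R)."""
    if seatop_time < 10835.5:
        # Short time on the sea surface: the reward count decides.
        return 1 if reward < 25.5 else 7
    return 7


def predict_last_level_from_level2(reward: float, flushtunnel_time: float) -> int:
    """Tree trained on level 1 and 2 features (L2-R, L2-Flushtunnel-T)."""
    if reward < 18.5:
        # Few rewards on level 2: time in the Flush Tunnel room decides.
        return 2 if flushtunnel_time < 9.858 else 3
    return 7


# Illustrative inputs only, not players from the dataset.
print(predict_last_level_from_level1(seatop_time=5000.0, reward=10.0))     # 1
print(predict_last_level_from_level2(reward=30.0, flushtunnel_time=12.0))  # 7
```

Written this way, it is easy to see that each tree can only ever output three of the seven possible levels, which is part of why their accuracy relative to the full models is notable.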
The fact that only very little performance can be gained from using all 30 (or 55) features rather than just 2 or 3, especially when those 2 or 3 features do not appear to be much more important than other features, suggests that there is a very high degree of inter-correlation among those features. We therefore used the CFS feature subset evaluator [23], which rates a set of features depending on their correlation with the target class and the degree of redundancy between the features, together with a greedy search method (i.e. sequential forward feature selection), which starts by adding the most significant feature and then adds one feature at a time until the feature subset cannot be improved. From all 55 features, this method selected only four (L1-Seatop-T, L2-Norsehall-T, L2-R and L2-H), confirming our assumption that the vast majority of features are highly inter-correlated.
B. Completion time
The next set of experiments aims at predicting the time taken to finish the game, based on the same features as above, either from level 1 only or from both levels 1 and 2. As in the previous set of experiments, we tried standard linear regression methods for the prediction of completion time. The feature (of all features from level 1 and 2) that correlates most with completion time is L1-Seafloor-T (positive correlation 0.35), and employing univariate linear regression from this feature to completion time yields a relative absolute error (RAE) of 92%. The RAE statistic is computed as the total absolute difference between the predicted and the target values divided by the total absolute difference between the mean target value and the target values. Multivariate linear regression manages to reduce this error to 88.2% when only using features from level 1, and to 84.5% when using features from both levels 1 and 2 (see Table III). These linear methods were contrasted with a large number of nonlinear methods for numeric prediction from machine learning; selected results are shown in Table III.
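As a small illustration of the RAE statistic used here, the following sketch computes it under the usual definition (sum of absolute errors divided by the sum of absolute deviations of the targets from their mean, expressed as a percentage). It assumes the mean is taken over the same targets being evaluated; the function name and the example values are hypothetical.

```python
from typing import Sequence


def relative_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """RAE in percent: 100 * sum|p - a| / sum|mean(a) - a|."""
    mean_actual = sum(actual) / len(actual)
    total_error = sum(abs(p - a) for p, a in zip(predicted, actual))
    total_deviation = sum(abs(mean_actual - a) for a in actual)
    return 100.0 * total_error / total_deviation


# Made-up completion times in hours, not data from the study.
actual_hours = [8.0, 10.0, 12.0, 14.0]
print(relative_absolute_error([9.0, 10.0, 12.0, 13.0], actual_hours))  # 25.0
print(relative_absolute_error([11.0] * 4, actual_hours))               # 100.0
```

Always predicting the mean gives exactly 100%, which is why 100.0 serves as the baseline: values below it beat the mean guess, and values above it do worse.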
As can be seen, some of the methods (SMO and REPTree combined with bagging) outperformed the linear methods by a notable margin. Attribute selection and ensemble classification were tried, as well as moderate parameter tuning, and the results reported in the table reflect the best configuration found for each algorithm (as above, results are only reported for selected algorithms). As in the classification task, surprisingly poor (sub-baseline) performance was noted from an otherwise reliable algorithm, the MLP using backpropagation. This serves to underscore that experiments like these, which do not perform systematic search in parameter space, can only show that a particular algorithm can work for some type of problem, not that it cannot work. The features that best predict the time taken to complete the game are unsurprisingly the times taken to complete various units of the first two levels. This can be seen both from which features best correlate with the game completion time, and which features best split the data set into binned classes in the REPTree classifier. To summarize, we can predict the completion time substantially better than just random guessing, and using features from level 2 as well as from level 1 increases the accuracy of our predictions; the best predictor found is the support vector machine, achieving an RAE of 82.4%. Given the results obtained, the underlying function appears to be nonlinear.

TABLE III
RELATIVE ABSOLUTE ERROR (%) OF SEVERAL ALGORITHMS ON PREDICTING GAME COMPLETION TIME BASED ON FEATURES FROM ONLY LEVEL 1 OR FROM LEVEL 1 AND 2, USING DEFAULT OR LIGHTLY MANUALLY TUNED PARAMETERS. LOWER VALUES ARE BETTER.
Algorithm                           Level 1    Levels 1 and 2
Simple linear regression            92.0       92.0
Multivariate linear regression      89.4       84.5
SMO Support vector machine          88.2       82.4
MLP/Backpropagation                 107.2      111.5
REPTree decision tree (pruned)      92.5       91.8
Bagging REPTree (pruned)            85.2       83.5
M5Rules decision list               93.7       88.6
Gaussian processes                  88.8       84.3
Baseline                            100.0      100.0

The question that remains to be answered is exactly how useful these predictions are. Our best predictions still have errors about 4/5 as high as just guessing the average value, meaning that it is unlikely this information would really help in e.g. guiding real-time game adaptation, but it does provide useful feedback to guide game design. It might be possible to predict outliers (extremely high or low completion times) with higher accuracy, something we have not tried. But our main conclusion regarding completion time is that prediction algorithms are in need of more detailed gameplay metrics and more extracted statistical features.
VII. DISCUSSION
Despite the strong indication that prediction of player behavior based on quantitative measures of their early play performance is possible, and the indication that it may be a few features of the behavior of the players that are the most important predictors, the predictive power of the models presented above is moderate. We believe that one of the reasons for the moderate performance achieved in this paper is the existence of data noise, both in terms of unreasonable outliers in unit times (which could be generated due to different patches of the game interlocking with the Metrics Suite) and in terms of missing information of players for some game levels. Even though we put substantial effort into removing noise from these large-scale datasets, we cannot be entirely certain of the degree of noise that still exists within those datasets.
An additional issue is the limited number of variables available in the TRU-dataset, which do correlate with the core of the gameplay, but lack for example player movement paths. Improving on these points are likely to improve on the predictive strength of the algorithms used here. In the future it would be our desire to have access to less noisy data via improved logging systems that use even more efficient server-client network communications. The data obtained from Tomb Raider: Underworld was among the first using the — by then — newly developed SQE Metrics Suite, which has since then been further developed. Data from the newer games, which contain more variables compared to TRU, will form the focus of future research in our attempt to test the generality of the approach followed in this paper. Future work will also focus on being able to predict when a player stops the game at a finer granularity. Thus, we would like to know not only at what level, but also at which map unit/specific situation the player stops playing. On that basis, supervised learning techniques could potentially perform better if we instead attempt to predict the type of situation in which the player is when she stops playing. Future research will also investigate association mining, combining clusters of player behavior with gameplay metrics data to investigate if particular play-styles have an impact on game completion and the underlying reasons for why players stop playing before a game is completed. Finally, the 10000 player dataset used in the current study is only a fraction of the main dataset containing data from 203000 players, which in turn is a subset of the main SQE Metrics Suite database which contains data from over 1.5 million players. Future research will focus on testing clustering and classification methodologies on those massive-scale datasets. 
The causes preventing players from completing a game are possibly game specific and maybe relate to particular playing styles [6], [4], although it is possible that there are principles that apply across specific subsets of games or digital games in general, for example a high difficulty (steep learning curve) early in a game. Ideas about how to keep players engaged are prevalent in the game industry, and increasingly backed by behavioral and cognitive psychology as user research is gaining importance in commercial game development; however, there is very limited publicly available empirical evidence, due to the general proprietary nature of such data. Studies such as the one presented here form a first step towards addressing this problem. ACKNOWLEDGMENTS This work would not be possible without the game development companies who are involved. The authors would like to thank their colleagues at Crystal Dynamics and IO Interactive (IOI) for continued assistance with access to the Square Enix Europe Metrics Suite and discussion of approaches, methods and results, including but certainly not limited to: Thomas Hagen and the rest of the Square Enix Europe Online Development Team, Janus Rau Sørensen and the rest of the IOI User-Research Team, Tim Ward, Kim Krogh, Noah Hughes, Jim Blackhurst, Markus Friedl, Thomas Howalt, Anders Nielsen as well as the management of both companies. R EFERENCES [1] K. Isbister and N. Schaffer, Game Usability: Advancing the Player Experience. Morgan Kaufman, 2008. [2] J. H. Kim, D. V. Gunn, E. Schuh, B. C. Phillips, R. J. Pagulayan, and D. Wixon, “Tracking real-time user experience (true): A comprehensive instrumentation solution for complex systems,” in Proceedings of CHI, Florence, Italy, 2008, pp. 443–451. [3] R. J. Pagulayan, K. Keeker, D. Wixon, R. L. Romero, and T. Fuller, The HCI handbook. Lawrence Erlbaum Associates, 2003, ch. Usercentered design in games, pp. 883–906. [4] A. Drachen and A. 
Canossa, “Towards Gameplay Analysis via Gameplay Metrics,” in Proceedings of the 13th MindTrek 2009. Tampere, Finland: ACM-SIGCHI Publishers, September 2009. [5] A. Tychsen and A. Canossa, “Defining personas in games using metrics,” in Proceedings of Future Play 2008. Toronto, Canada: ACM publishers, 2008, pp. 73–80. [6] A. Drachen, A. Canossa, and G. N. Yannakakis, “Player Modeling using Self-Organization in Tomb Raider: Underworld,” in Proceedings of the IEEE Symposium on Computational Intelligence and Games. Milan, Italy: IEEE, September 2009, pp. 1–8. [7] R. Houlette, Player Modeling for Adaptive Games. AI Game Programming Wisdom II. Charles River Media, Inc, 2004, pp. 557–566. [8] D. Charles and M. Black, “Dynamic player modelling: A framework for player-centric digital games,” in Proceedings of the International Conference on Computer Games: Artificial Intelligence, Design and Education, 2004, pp. 29–35. [9] C. Thurau, C. Bauckhage, and G. Sagerer, “Learning human-like Movement Behavior for Computer Games,” in From Animals to Animats 8: Proceedings of the 8th International Conference on Simulation of Adaptive Behavior (SAB-04), S. Schaal, A. Ijspeert, A. Billard, S. Vijayakumar, J. Hallam, and J.-A. Meyer, Eds. Santa Monica, LA, CA: The MIT Press, July 2004, pp. 315–323. [10] ——, “Combining self organizing maps and multilayer perceptrons to learn bot-behaviour for a commercial game,” in GAME-ON, 2003, pp. 119–123. [11] C. Thurau, T. Paczian, and C. Bauckhage, “Is bayesian imitation learning the route to believable gamebots?” International Journal of Intelligent Systems Technologies and Applications, vol. 2, no. 2/3, pp. 284–295, 2007. [12] R. Thawonmas, M. Kurashige, K. Iizuka, and M. Kantardzic, “Clustering of Online Game Users Based on Their Trails Using Self-organizing Map,” in Proceedings of Entertainment Computing - ICEC 2006, 2006, pp. 366–369. [13] O. Missura and T. 
Gärtner, “Player modeling for intelligent difficulty adjustment,” in Proceedings of the ECML–09 Workshop From Local Patterns to Global Models (LeGo–09), J. F. Arno Knobbe, Ed., Bled, Slovenia, September 2009. [14] R. Thawonmas, Y. Kashifuji, and K.-T. Chen, “Detection of MMORPG Bots Based on Behavior Analysis,” in Proceedings of the 2008 International Conference on Advances in Computer Entertainment Technology (ACE). Yokohama, Japan: ACM, 2008, pp. 91–94. [15] R. Thawonmas and K. Iizuka, “Visualization of online-game players based on their action behaviors,” International Journal of Computer Games Technology. [16] N. Ducheneaut and R. J. Moore, “The Social Side of Gaming: A study of interaction patterns in a Massively Multiplayer Online Game,” in Proceedings of the 2004 ACM conference on Computer supported cooperative work. Chicaco, Illinois: ACM, 2004, pp. 360–369. [17] H.-K. K. P. H.-H. C. Kuan-Ta Chen, Andrew Liao, “Game Bot Detection Based on Avatar Trajectory,” in Proceedings of the 7th International Conference on Entertainment Computing (ACE). ACM, 2008, pp. 94–105. [18] B. Weber and M. Mateas, “A Data Mining Approach to Strategy Prediction,” in IEEE Symposium on Computational Intelligence in Games (CIG 2009), Milan, Italy, September 2009, pp. 140–147. [19] C. Thurau, K. Kersting, and C. Bauckhage, “Convex non–negative matrix factorization in the wild,” in Proceedings of the 9th IEEE International Conference on Data Mining (ICDM–09), W. W. H. Kargupta, Ed., Miami, FL, USA, Dec. 6–9 2009. [20] R. Rosenthal, “Covert communication in laboratories, classrooms, and the truly real world,” Current Directions in Psychological Science, vol. 12, no. 5, pp. 151–154, 2003. [21] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA Data Mining Software: An Update,” SIGKDD Explorations, vol. 11, no. 1, 2009. [22] X. Wu, V. Kumar, J. Ross Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, P. S. Yu, Z.-H. Zhou, M. 
Steinbach, D. J. Hand, and D. Steinberg, “Top 10 algorithms in data mining,” Knowl. Inf. Syst., vol. 14, no. 1, pp. 1–37, 2007. [23] M. A. Hall and L. A. Smith, “Practical feature subset selection for machine learning,” in Australian Computer Science Conference. Springer, 1998, pp. 181–191. Player Modeling using Self-Organization in Tomb Raider: Underworld Anders Drachen, Alessandro Canossa and Georgios N. Yannakakis Abstract—We present a study focused on constructing models of players for the major commercial title Tomb Raider: Underworld (TRU). Emergent self-organizing maps are trained on high-level playing behavior data obtained from 1365 players that completed the TRU game. The unsupervised learning approach utilized reveals four types of players which are analyzed within the context of the game. The proposed approach automates, in part, the traditional user and play testing procedures followed in the game industry since it can inform game developers, in detail, if the players play the game as intended by the game design. Subsequently, player models can assist the tailoring of game mechanics in real-time for the needs of the player type identified. Keywords: Player modeling, unsupervised learning, emergent self-organizing maps, Tomb Raider: Underworld I. I NTRODUCTION Being able to evaluate how people play a game is a crucial component of the user-oriented testing process in the game development industry. During the development phases, games are iteratively improved and modified towards the final gold master version, which is published. Representatives of the target audience as well as internal professional testers spend hundreds of hours testing the games and evaluating the quality of the gaming experience [1]. 
Moreover, one of the key components of user-oriented testing, both during production and after game launch, is to evaluate if people play the game as intended and, if not, to find out why there is a difference between the intended and actual playing behavior, and whether this has an impact on their playing experience [1], [2]. Given that nonlinear game design (i.e. game design in which the player has multiple choices about how to progress in the game) becomes increasingly popular, with massively multiplayer online games being a good example of the increased popularity of nonlinear sandbox-type games, the need for more reliable and detailed user testing is growing. Within the last five years, instrumentation data (or game metrics, as they are referred to in game development) has gained increasing attention in the game industry as a source of detailed information about player behavior in computer games [2]. Gameplay metrics are detailed numerical data extracted from the interaction of the player with the game using specialized monitoring software [3]. The application of machine learning on such data and the inference of
AD and GNY are with the Center for Computer Games Research, IT Univer...

Explanation & Answer

Attached.

Running head: PERSONALITY ASSESSMENT THROUGH GAMING BEHAVIOUR

PsyOps: Personality Assessment Through Gaming Behavior
Institution Affiliation
Date

Summary
Tekofsky, Van Den Herik, Spronck, and Plaat (2013) aim to find the correlation between
video game play style and personality. To test this, a survey was
conducted among Battlefield 3 players. Through a promotional campaign dubbed "PsyOps," 13,376 participants were surveyed; each was asked to fill out the 100-item IPIP
(International Personality Item Pool) Big Five personality questionnaire and to draw their
game statistics from a public source. The data collected was stored via the PsyOps website.
The participants were to visit the website when submitting their data which was categor...

