A Place For Big Data Communication Literature Review Help

Description

1. Who is the author? Who are they writing for and/or against?
2. Identify and quote a main claim from the reading that you agree or disagree with. Explain your position. (Include the page number, so that you can refer back to this quote later.)
3. Offer an example of the kind of evidence the author uses to support this claim. Is it convincing?

Original Research Article

A place for Big Data: Close and distant readings of accessions data from the Arnold Arboretum

Yanni Alexander Loukissas

Big Data & Society, July–December 2016: 1–20
© The Author(s) 2016
DOI: 10.1177/2053951716661365
bds.sagepub.com

Abstract

Place is a key concept in environmental studies and criticism. However, it is often overlooked as a dimension of situatedness in social studies of information. Rather, situatedness has been defined primarily as embodiment or social context. This paper explores place attachments in Big Data by adapting close and distant approaches for reading texts to examine the accessions data of the Arnold Arboretum, a living collection of trees, vines and shrubs established by Harvard University in 1872 (The original interactive data visualizations can be found online: http://www.lifeanddeathofdata.org). Although it is an early and unconventional example of the phenomenon, there are several reasons that the Arboretum is a useful site for investigating the relationship between Big Data and place. First, the category of place is embedded in a range of data fields used in the Arboretum’s records. Second, the Arboretum has long sought to be a place in which scientists and citizens alike can encounter large collections of data firsthand. Third, the place has shaped fluctuations in the daily production of data over the course of the Arboretum’s 144 year history. Furthermore, Arboretum data can help us see place in ways not necessarily tied to geolocation. Each of these place attachments suggests a different way in which data can be environmental: by being about, in, from, or generative of place. Taken together, these attachments offer a model for examining other data in relation to their environments. Moreover, the paper contends that rather than being detached from place, as prevailing discourses suggest, Big Data bring together more and further reaching place attachments than data sets of smaller sizes.
Keywords: Environmental data, place, situated knowledge, close reading, distant reading

School of Literature, Media and Communication, Georgia Institute of Technology, Atlanta, GA, USA

Corresponding author: Yanni Alexander Loukissas, Program in Digital Media, School of Literature, Media and Communication, Georgia Institute of Technology, TSRB 85 5th Street NW, Room 318A, Atlanta, GA 30308, USA. Email: yanni.loukissas@lmc.gatech.edu

Creative Commons CC-BY: This article is distributed under the terms of the Creative Commons Attribution 3.0 License (http://www.creativecommons.org/licenses/by/3.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Introduction

A key concept in environmental criticism, ‘place’ is often overlooked as a dimension of situatedness in social studies of information. In this paper, I reflect on the place of Big Data through an analysis of accessions records from the Arnold Arboretum. Established in 1872, and located on 281 acres within the Boston neighbourhood of Jamaica Plain, the Arboretum is a long-lived collection of trees, vines, and shrubs managed by Harvard University. Equal parts urban laboratory and ‘zoo for plants’, it is one of the most comprehensive and well-documented collections of its kind in the world [1] (Figure 1). Although seemingly modest in size – hosting around 15,000 living plants today and about 70,000 over the course of its history – the Arboretum is an apt site for investigating Big Data’s attachments to place for several reasons. First, place itself is an important kind of data for the Arboretum. Indeed, its collections are assembled from sites of scientific and cultural significance around the world. Second, the Arboretum has long sought to be a place in which scientists and citizens alike can encounter large collections of data first hand, simply by walking the landscape and discovering the variety of carefully tagged plants. Third, when understood as a set of conditions for production, the place has shaped fluctuations in botanical data over the course of the Arboretum’s long history. Finally, when looked at abstractly, the Arboretum’s data can help us see place in new ways, which are not limited to aspects of geolocation.

Figure 1. Map of the Arnold Arboretum. Courtesy of the Arnold Arboretum Archives. © President and Fellows of Harvard College.

As I will show, each of these attachments to place is a different way in which data are subject to environmental criticism. Moreover, the dimensions of place attachment identified in this paper – and the means of identifying them – suggest a place-based approach that might influence other studies of Big Data. If we are to illuminate what is distinctive about Big Data as a cultural form, we must attend to the relationships between data and place that they manifest. Because of their size and scope, Big Data have more and further reaching place attachments than data at other scales. Having said this, the accessions data of the Arnold Arboretum do not conform to present-day definitions of Big Data as high magnitude in a variety of dimensions: volume (terabytes or petabytes), velocity, variety, scope, resolution, flexibility, and relations with other data sets (Kitchin and Lauriault, 2014). However, this litany of attributes accounts for only the most ambitious of contemporary practices with Big Data (Kitchin and McArdle, 2016).
My use of the term is more in line with the work of boyd and Crawford, who characterise Big Data as a phenomenon with not only technological but also cultural and scholarly dimensions (boyd and Crawford, 2012). I approach Big Data as an epistemological and performative shift in ways of doing research, with a long history involving data sets that were previously unmanageable. Seen in this way, we might say that the Arnold Arboretum has been making Big Data for over a century. In the 19th and early 20th centuries, arboreta – as well as libraries, museums, and zoos – held the Big Data of their day. Institutions like the Arnold Arboretum prefigured Big Data by drawing together representative specimens from far and wide. The most ambitious of these institutions sought to establish themselves as comprehensive models of the world (Battles, 2004). As with contemporary holders of Big Data, these institutions continually outstripped strategies for managing all the records necessary to organise, preserve, and study their contents. The Arboretum’s historical data illustrates, better than most, a variety of environmental issues in Big Data. At the Arboretum, data are about place, in place, from place, and even generative of place. Learning about these long-standing forms of place attachment can prompt us to challenge settled conceptions about the relationship between data and place in contemporary life.

Data and place

There is a long history of scholarship on the place of information within discourses on cyberspace (Kalay and Marx, 2001), cities (Mitchell, 1995), networking (Graham, 1998), interaction (Dourish, 2006), and development (Irani et al., 2010). However, discussions of Big Data often downplay the significance of place. Meanwhile, popular media depict Big Data as increasingly commonplace: a ubiquitous tool for governments (Morozov, 2014), science (Anderson, 2008), and business management (Lohr, 2012).
In scholarship, Big Data and place are sometimes treated as incompatible concepts. An influential article by Dalton and Thatcher argues that Big Data distracts from attention to place. ‘Relying solely on “Big Data” methods’, they write, ‘can obscure concepts of place and place-making because places are necessarily situated and partial’ (Dalton and Thatcher, 2014: 6). Rather, I understand Big Data as situated and partial because of their attachments to distributed places. Although place has been an important topic of interest in the social sciences (Gieryn, 2000), my readings of data in this paper are influenced by literary and cultural studies. Buell, a leading voice for ecocriticism, draws together many conceptions of place in his book, The Future of Environmental Criticism (Buell, 2009). Perhaps the most succinct of these is offered by Agnew, who writes of places as ‘discrete if “elastic” areas in which settings for the constitution of social relations are located and with which people can identify’ (Agnew, 2013: 263). Buell also expounds on the multiple dimensions of place attachment in texts, including temporal and imagined conceptions of place. My development of the notion of place attachment for social studies of information builds on these important precedents but is grounded in readings of Big Data manifest at the Arnold Arboretum. In this article, I define place as a framework with both social and spatial dimensions, in which data are created, displayed, and/or managed, and which, reciprocally, is shaped by those practices. Indeed, data are not simply site-specific tools; they have the power to reconfigure place. My reflections on the relationship between data and place at the Arboretum only serve to refine existing scholarship on the grounding of data within social studies of information and science, technology, and society (STS).
Scholars of information have examined how the meaning and significance of the term ‘data’ has evolved over the past few centuries (Day, 2014; Drucker, 2011; Gitelman, 2013, 2014) as well as how it differs in use across academic and professional domains (Borgman, 2015; Star and Griesemer, 1989). Borgman traces data to its earliest use in theology in 1646, when it was applied as a plural of the term datum. It was not until the late 18th century, writes Borgman, that data was used to describe the results of empirical observations of the kind associated with scientific practice at the Arboretum. Meanwhile, scholars in STS have developed empirical accounts of how data are situated in specific scientific contexts (Bowker and Star, 1999; Latour, 1987). This scholarship has largely sought to complicate a widely held, but simplistic perspective: that data are universal, invariable, and altogether immaterial. Latour deftly captures this purified conception of data in the concept of ‘inscription’. In a frequently referenced paper entitled ‘Visualisation and Cognition: Drawing Things Together’, Latour (1990) explains inscriptions as things created for the production of scientific arguments. As he writes, ‘you have to invent objects, which have the properties of being mobile but also immutable, presentable, readable and combinable with one another’ (Latour, 1990: 7). Many scholars have challenged this instrumentalised definition by exposing ways in which data practices, and data themselves, vary from one context to the next. Research on the diversity of data has been conducted in studies of laboratories (Cetina, 1999; Keller, 2003; Latour and Woolgar, 1979), museums (Star and Griesemer, 1989), healthcare (Bowker and Star, 1999), climate debates (Edwards, 2010), and space exploration (Vertesi and Dourish, 2011). Today, in the varied work practices at the Arnold Arboretum, data are used as scientific evidence but also simply as a tool for landscape management. 
I rely on a grounded approach to data, with the aim of studying the term as it is used in multiple ways in practice. Aligned with this thinking, Borgman suggests that understanding data means asking, ‘when are data?’ (Borgman, 2015). She writes, ‘entities become data only when someone uses them as evidence of a phenomenon, and the same entities can be evidence of multiple phenomena’ (Borgman, 2015: 28). In other words, data must be performed. Moreover, data are more than merely representational. In common parlance, the term data can be used to mean secondary, digital representations of objects that hold scientific and cultural import. However, my findings support the view that data are part of an ontological ‘looping effect’ whereby they help to shape the practices and institutions that create them (Hacking, 1991; Kitchin and Lauriault, 2014). Finally, I have found that prior scholarship in information studies and STS scrutinises data primarily through case studies of discrete technological moments or controversies; these studies provide an event-based reading of data. In contrast, this paper contributes to the development of an emergent place-based perspective (Galison and Thompson, 1999; Kirsch, 2011; Livingstone, 2003). Though the concept of place has been important to environmental criticism, it has been largely overlooked in discourses on situatedness (Buell, 2009). Rather, situatedness is defined primarily as embodiment or social context (Haraway, 1988; Suchman, 2007). I contend that all data can be studied through a local lens, in terms of their place attachments. Even Big Data are connected to ‘local knowledge’, grounded in and inseparable from their social, material, and spatial conditions (Geertz, 1985). Although data are reliably transferred across global communication networks, everywhere they remain marked by local artefacts: traces of the conditions and values that are particular to their origins.
Accepting this claim necessitates a significant shift in our expectations of digital data, given that the digital was invented to be independent of any substrate (Hayles, 2008). Indeed, all data – not just those created at arboreta and other sites for documenting nature – can be read through their attachments to environments. However, data do not speak for themselves. Reading is a means of enacting data, which is also locally situated. In this paper, I use close and distant readings not only to discover but also to produce salient connections between Big Data and place.

Close and distant readings

Examining place attachments in data requires adopting appropriate methods. In this paper, I make use of a combination of techniques, which I will refer to as close readings and distant readings. These are complementary ways of interpreting accessions records from the Arboretum: one up-close, the other from a distance. This hybrid model of analysis owes much to developments in ‘close’ and ‘distant’ reading as methods of interrogating texts in literary and cultural studies (Jänicke et al., 2015). When used as a method of analysis for literary texts, Culler explains that close readings attend to ‘how meaning is produced or conveyed’ (2010: 22). Meanwhile, distant reading aims, paradoxically, not to read. Instead, the latter technique, pioneered in literature by Moretti, aims to ‘generate an abstract view by shifting from observing textual content to visualizing global features of a single or of multiple text(s)’ (Jaenicke and Franzini, 2015: 2). Moretti uses traditional methods of graphical display, such as maps, graphs, and trees to illuminate large-scale narrative and geographic patterns in texts. Both close and distant readings reveal not just what is in a data set, but how that data might be enacted. Through close and distant readings, I treat data as texts: cultural expressions subject to interpretive and speculative examination.
However, accessions data resemble indices more than prose. As such, they require a great deal more context to decipher. Additionally, both techniques suggest their own relationship to place. The terms close and distant seem to describe a spatial relationship between the analyst and the data. However, my distance from the Arboretum data is not so simply summarised. All my readings of the Arboretum rely on the interpretations of Arboretum staff members, who use their own local knowledge to identify place attachments in the data that are not immediately apparent. Rather, the difference between close and distant reading techniques, applied to data, hinges on the pervasiveness of the features being investigated. Close readings focus on isolated features in a data set; distant readings illuminate features common throughout. Creating both kinds of readings for this paper relied on a prolonged ethnographic engagement with the Arnold Arboretum. During the period of 2012 to 2014, I lived and worked in close proximity to the Arboretum. I conducted nine semi-structured interviews with researchers, administrators, and technologists at the institution and did archival work at their library. But more importantly, I was a participant observer in both formal and informal engagements, including: a course on landscape architecture, a series of outings to map a ‘wild’ portion of the Arboretum, and an intensive two-day workshop that brought together Arboretum staff with STS scholars (http://stsdesignworkshop.tumblr.com). Over the course of the final year of this engagement, I worked with Arboretum staff to develop close and distant reading techniques appropriate for looking at their data. Beyond the findings about place attachments in Big Data, this approach furthers the development of interpretive digital methods and their adaptation from traditional humanities subjects to the study of other forms of data.
Reading accessions data as texts

Seeing data as texts accessible to traditions of hermeneutic inquiry means reading them within an interpretive context. I argue that it would be difficult to understand these records without considering the way they are culturally and materially situated in place. Indeed, accessions records have a long history of development and use at the Arboretum. For one thing, they were not always recognisable as data. The Arboretum has weathered many successive regimes of documentation. Thus, each organism has germinated within a social and technological setting, its care and curation managed through the instruments and information structures deployed during its lifetime. These place-based practices, and the documents they produce, register what is valued about individual organisms at the Arboretum and, in turn, how those values change over time (Figure 2). Today, plants collected from around the world and across time are held together at the Arboretum by a custom digital record system called BG-Base, a database system developed specifically for this collection. Each entry in the Arboretum’s data set includes an accession number, an extensive list of scientific, common, and abbreviated names, redundant ways of identifying the time of accession, the form and mechanism of reception, individuals associated with the plant, various descriptions of the place the accession hails from, its condition in the wild, and an additional catch-all category.
A list of fields used by the Arboretum includes the following:

ACC_NUM, HABIT, HABIT_FULL, NAME_NUM, NAME, ABBREV_NAME, COMMON_NAME_PRIMARY, GENUS, FAMILY, FAMILY_COMMON_NAME_PRIMARY, APG_ORDER, LIN_NUM, ACC_DT, ACC_YR, RECD_HOW, RECD_NOTES, PROV_TYPE, PROV_TYPE_FULL, PSOURCE_LABEL_ONE_LINE, COLLECTOR, COLL_ID, COLLECTED_WITH, COUNTRY_FULL, SUB_CNT1, SUB_CNT2, SUB_CNT3, LOCALITY, LAT_DEGREE, LAT_MINUTE, LAT_SECOND, LAT_DIR, LONG_DEGREE, LONG_MINUTE, LONG_SECOND, LONG_DIR, ALTITUDE, ALTITUDE_UNIT, DESCRIPTION, COLLECTION_MISC

If found within a library, museum, or archive, many of these fields would be incorporated into metadata: the information necessary to catalogue a book or other object, such as details of their contents, context, quality, structure, and accessibility. At the Arboretum, this locally defined selection of fields is known simply as ‘accessions data’. However, accessions data are shaped by many of the same local forces that affect metadata (Edwards et al., 2011; Mayernik et al., 2011). Furthermore, as with metadata, each accession record exists as part of a local constellation of information, including the details of the associated plant’s phenology, genetic characteristics, transpiration rate, and growth habit. Even the specimen itself is a kind of data (Gnoli, 2012). This entire ‘data assemblage’ is necessary to make plants real and present in the contemporary ecological, scientific, and public life of the Arboretum (Kitchin and Lauriault, 2014) (Figure 3).

As mentioned above, documentation practices at the Arboretum long predate contemporary notions of data. Today, records are available in multiple formats simultaneously: on maps, in ledgers, on index cards, and only recently, in digital forms. It was not until the summer of 1985 that the Arboretum started converting its accessions data from index cards crowded in a vertical file to digital data stored in BG-Base (Figure 4).
These digitised data afford new opportunities for access and analysis. Even so, some staff members continue to use older formats exclusively, for they do not yet trust the process of digitisation. Regardless of their format, what counts as data at the Arboretum is a matter of context. Del Tredici explains, ‘the data, in and of itself, is only valuable [for] somebody who understands its significance’.[2] To further his point, Del Tredici likens the ‘raw data’ to seeds. When a seed will not germinate, there are innumerable possible reasons. ‘Unless you know how to interpret the behaviour of the seed, it is just non-data’.[3] In the sections that follow, I examine the role of place as a form of context necessary for interpreting Arboretum data by looking at three significant place attachments: when place is a form of data, when data are encountered in place, and when place shapes data.

Close readings of place in accessions data

Understanding data in their environmental context means being attentive to the category of place. Big data, exemplified here by the accessions records of the Arnold Arboretum, exhibits place attachments that are more complex and distributed than might be expected. The scale and diversity of a collection has significant implications for its ties to a variety of environmental conditions and conceptions. In the three examples that follow, close readings of data related to individual specimens reveal diverse place attachments.

Figure 2. Early map of the Arboretum. Image by the author.

Place as data: The case of Prunus Sargentii

PROV_TYPE, PROV_TYPE_FULL, PSOURCE_LABEL_ONE_LINE, COUNTRY_FULL, SUB_CNT1, SUB_CNT2, SUB_CNT3, LOCALITY, LAT_DEGREE, LAT_MINUTE, LAT_SECOND, LAT_DIR, LONG_DEGREE, LONG_MINUTE, LONG_SECOND, LONG_DIR, ALTITUDE, ALTITUDE_UNIT.

The subset of fields listed above all contribute to the characterisation of place in Arboretum accessions data.
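To make concrete how several of these fields must be combined before they say anything about location, the following is a minimal sketch in Python. It assumes, without confirmation from the Arboretum's records, that the LAT_* and LONG_* fields hold numeric degrees, minutes, and seconds plus a one-letter direction (N, S, E, or W). The function names and the sample record are hypothetical and invented purely for illustration; they do not describe BG-Base itself.

```python
from typing import Optional, Tuple


def dms_to_decimal(degree: float, minute: float, second: float, direction: str) -> float:
    """Combine degree/minute/second/direction parts into signed decimal degrees."""
    direction = direction.strip().upper()
    if direction not in {"N", "S", "E", "W"}:
        raise ValueError(f"unexpected direction: {direction!r}")
    magnitude = degree + minute / 60 + second / 3600
    # South and west are conventionally negative in decimal notation.
    return -magnitude if direction in {"S", "W"} else magnitude


def record_coordinates(record: dict) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for a record, or None if any part is missing."""
    lat_keys = ("LAT_DEGREE", "LAT_MINUTE", "LAT_SECOND", "LAT_DIR")
    lon_keys = ("LONG_DEGREE", "LONG_MINUTE", "LONG_SECOND", "LONG_DIR")
    if any(record.get(key) in (None, "") for key in lat_keys + lon_keys):
        return None
    lat = dms_to_decimal(*(record[key] for key in lat_keys))
    lon = dms_to_decimal(*(record[key] for key in lon_keys))
    return lat, lon


# Hypothetical record: these values are invented for illustration only.
example = {
    "LAT_DEGREE": 35, "LAT_MINUTE": 30, "LAT_SECOND": 0, "LAT_DIR": "N",
    "LONG_DEGREE": 139, "LONG_MINUTE": 45, "LONG_SECOND": 0, "LONG_DIR": "E",
}
print(record_coordinates(example))  # prints (35.5, 139.75)
```

A record with no coordinates simply yields None in this sketch. Even a complete coordinate pair recovers only geolocation; fields such as LOCALITY, COUNTRY_FULL, and PROV_TYPE still have to be read alongside it.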
In order to understand the origin of a single specimen using these data, it is necessary to take account of multiple fields and how they might interact. A special cherry tree (Prunus sargentii) accessioned to the Arboretum on a leap day in 1940 provides an example of this process. The history of the cherry tree is catalogued in BG-Base under the specimen number 130–140. The provenance of the plant (PSOURCE_LABEL_ONE_LINE) is attributed to the institution’s founding director, Charles Sprague Sargent, at the address of the Arboretum itself: ‘125 The Arborway, Jamaica Plain, MA’. Part of the tree’s Latin name, sargentii, honours this parentage.[4] Meanwhile, the tree’s country of origin (COUNTRY_FULL) is listed as ‘Japan’. Sargent might have acquired the plant during an expedition to Asia. However, this would seem to be in conflict with other known conditions. Sargent died in 1927, 13 years before the listed accession date. Moreover, for reasons that will be explained later in the paper, wild plants from abroad had not been taken in at the Arboretum since the mid-1920s. Sargent could not have transported the plant from Japan to the Arboretum on the date of accession. This inconsistency is an artefact of the way that origins are documented today at the long-lived institution. Staff at the Arboretum know that a few select fields – date, place of origin, provenance – do not tell the whole story. One has to look to another field, the provenance type (PROV_TYPE) of the cherry tree, to learn that it is a ‘cultivated plant of known (indirect) wild origin’ or ‘z’ for short. In other words, specimen number 130–140 grew from a cutting taken off a specimen collected in the wild. Provenance type is a classification of disputed value, for it is a social distinction, rather than a biological one. Wild plants and their cuttings are genetically identical, and in this case the ‘z’ helps to clarify that the cherry tree in question was grown from a cutting of one of Sargent’s original specimens – probably number 16760, unearthed from its native Japanese soil in 1892. This example illustrates some of the complexities of place as presented within the Arboretum’s data. Data about place is not simply contained in a field. This form of place attachment must be understood through a matrix of values and coordinated through local knowledge about the history of data collection practices at the Arboretum.

Figure 3. Early ledger containing accessions data. Image by the author.

Place of data: The case of Torreya Grandis

As the previous example illustrates, the Arboretum is an aggregated landscape stitched together from plants once residing in other places. Most of these specimens hail from an ecological zone defined by close proximity to the latitude of Boston, stretching across England, Greece, South Korea, China, and Japan. When encountered at the Arboretum, each of these plants stands with its data. A thin plastic card embossed with a subset of accession details usually hangs from its trunk or branches. The cards contain fields that are relevant for Arboretum staff, researchers, and visitors: scientific name, accession number, plant family, accession date, propagation material (e.g., seed ‘SD’ or scion ‘SC’), location, common name, source/collection data.

Figure 4. Card catalogue containing accessions data. Image by the author.

Together the plants and their tags make the Arboretum into a full-scale scientific map, organised using the Bentham and Hooker taxonomy, a system that dates to the late 19th century. The Arboretum landscape is itself a place for encounters with data (Figure 5).

Figure 5. Arboretum tag diagram. Courtesy of the Arnold Arboretum Archives. © President and Fellows of Harvard College.

In order to understand this second form of place attachment, let us revisit a tour of the Arboretum grounds that occurred in late June of 2013. During a workshop that I co-organised, a group of visitors were guided by Del Tredici through the Explorer’s Garden, an area of the Arboretum nestled in a microclimate beneath the summit of Bussey Hill. Del Tredici stopped to comment on his relationship to the living collections. ‘I’ve got a lot of direct connection to a lot of these plants. That little plant, Torreya grandis, I collected in China in 1989. So a lot of these are like my offspring’.[5] Del Tredici explains that he found the seeds of the Torreya grandis at a market in China. Fleshy and green, they struck him as unusual examples of edible seeds produced by a conifer. But beyond what is interesting about the plant itself, this quote provides a compelling starting point for understanding what is and is not included in the data landscape of the Arboretum (Figure 6). The acquisition date of the Torreya grandis and Del Tredici’s association with it are duly noted on the Torreya’s tag. Also pressed into the tag’s smooth surface, ‘pinales’ registers the plant’s bemusing status as a conifer. However, there is no hint of the oddness of this ordering. Moreover, several features of the plant’s local significance are not included on the tag, which serves only to position the Torreya grandis within a scientific landscape. Tags do not explain how plants like the Torreya are literally and figuratively torn up by the roots and relocated to a new ecological and cultural context. Let’s explore a few of these absences. Del Tredici is identified on the tag as a ‘collector’, not as a ‘progenitor’, or ‘breeder’, as his statement would suggest – this, despite the fact that he is responsible for the reproduction of the plant in the Boston region. The term ‘collector’ speaks of the scientist-and-specimen relationship between Del Tredici and the plant, rather than the more nurturing association between Del Tredici the horticulturalist and the organism he has cultivated.
The latter is more in line with his own intimate way of identifying with the Torreya grandis as an ‘offspring’. Furthermore, there are few traces of the fruitful intersections between the living collections and the local communities in Boston that surround the site. Do not look to data for connections between dandelions (Taraxacum officinale) and the elderly Greek women who collect them in the early summer to make horta vrasta (boiled greens), or associations between the ‘tree of heaven’ (Ailanthus altissima) and the devout Dominicans who discover starlit sites for their Santeria rituals in the groves of the Arboretum’s Bussey Brook Meadow. Such details, though important to the local meaning of the Arboretum’s plants, are not part of the way data as tags interact with the place. I introduce the example of the Torreya grandis to call attention to the placement of data, but also their limits as tools for understanding the places they reside in. Data do not capture the full lives of Arboretum plants. This understanding reinforces prior studies of data that show institutionalised categories to be connected to specific social groups (Star and Griesemer, 1989). While useful for establishing shared references among the Arboretum’s staff and its visitors, data categories sit beside, but do not account for, all the varied place-based meanings that Arboretum plants embody.

Figure 6. Peter Del Tredici in the Arboretum. Image from a workshop hosted by the author.

Data of place: The case of Tsuga Caroliniana

So far, my close readings have revealed how place appears in data and how data appears in place. It is also important to understand how a place affects data’s production. For this last point, let us consider the hemlock, which is a local tree that has been in rapid decline all over the eastern United States due to the non-native insect, the hemlock woolly adelgid.
In the late 1990s, a large, unaccessioned stand of hemlocks in the Arboretum fell victim to the pest. A note in the accession record for one Carolina Hemlock reads ‘plants producing very heavy seed crop, heavily infested with woolly adelgid’.[6] Although these trees had been residents on the institution’s grounds for decades, they were only accessioned into the collection in order for the infestation to be tracked and treated with imidacloprid, a powerful insecticide. Originally intended as a backdrop for species of scientific significance, the hemlocks were never intended to be an official part of the collection. The blighted hemlock accessions made 1998 a peak year of expansion for the Arboretum, but only from the perspective of data (Figure 7). This example illustrates that even seemingly straightforward fields like ‘date’ can have a complex relationship to place. For each entry in BG-Base, what the accession date means is dependent on context. It might mean when a seed was planted, when a seedling arrived on site, or simply – as in the case of these hemlocks – when an existing plant was annexed to the collection. But beyond the curious and local significance of their accession dates, the hemlocks are interesting because they raise deeper questions about the role that data perform. Controversy still surrounds the decision to make the feral stand of hemlocks part of the collection. Del Tredici sees the trees as invaluable for studying the infestation process. ‘It was only by accessioning the plants that we could track their decline over time or the insecticidal treatment of those plants we decided to treat’.[7] Meanwhile, the current Arboretum director William Friedman looks on the hemlocks of questionable provenance as inherently undesirable, for they lack essential data about their origins that would make them reliable subjects of scientific study. Why not replace them with trees of actual research significance?
Such disagreements highlight the tensions between competing realities at the Arboretum; it is a living place, but also a repository for data. Hence, data may be looked upon as ‘just good-enough’ tools (see the article on data as ‘just good-enough’ in this issue) to support direct work with the collection: organising plants, notes, and relationships among them in a convenient manner. But without reliable data, the emergent form of the collection can disappear altogether, its contents scattered in an ontological wild.

Figure 7. A hemlock tree at the Arboretum. Image by the author.

Coexisting concerns about the necessity of data and their inherent instability over time reinforce a lesson from STS that holds across shifts in technology: data must be part of a knowledge ecology (Edwards, 2010). The metaphor to environmental processes is apt. Arboretum scientists, specimens, and data infrastructures are all necessary to generate, verify, and sustain what the place knows. It is the place – of which data are only a part – that holds knowledge about the Arboretum hemlocks, their deadly infestation and its implications for similar trees across the Northeast. But at the Arboretum, the knowledge ecology is more than a metaphor. Data are necessary components of the functioning biological system created and maintained at the Arboretum. They transcend their roles as representations by directly supporting the reality they describe.

Distant readings of place in accessions data

Through close readings of the Arnold Arboretum’s accessions records, the previous section demonstrates that there are numerous ways in which data can be entangled with place: when place is a kind of data, when place is the site of encounters with data, and when place is the site of data’s production. Each of these place attachments can be exposed through close readings of accessions data for individual plants: Prunus Sargentii, Torreya grandis, and Tsuga caroliniana. 
However, looking at the accessions to the Arboretum all together, through distant reading techniques, can reveal alternative conceptions of place. I use the term distant reading to describe the experience of looking at the whole Arboretum through the data of its parts. Rather than being a god’s eye view, characterised by Haraway as one that seems to come ‘from nowhere, from simplicity’ (Haraway, 1988: 589), a distant reading is a situated but wide-ranging perspective. It offers views of data, rather than views through data. Creating distant readings requires a critical sensibility towards data, including attention to what might be occluded as well as what other vantage points are possible. This approach complements prior work in geography on the critical studies of landscape representation (Barnes and Duncan, 2013; Cosgrove, 2008), as well as the development of critical practices in mapping (Crampton, 2011; Kitchin et al., 2013). A distant reading is more like a panorama than a map. Although it is not uncommon to hear the term panorama used today to describe graphical displays of data, few acknowledge that, unlike maps, panoramas are situated ways of seeing places. As far back as the 18th century, the term was used to describe pictorial representations of landscapes as seen by an observer positioned at a single strategic point. Moreover, panoramas have long been understood as mediated. Like distant readings, they are enacted through technological means. The historian, Schivelbusch (1986), uses the term panoramic to evoke the once unfamiliar view across an expansive landscape afforded by the speed of the passenger train.8 Just as the rapid pace of the passenger locomotive offered new vistas across broad stretches of space, the distant readings included here reveal perspectives at previously incomprehensible scales. But distant readings are not narrowly defined technical tools (Hall, 2008). 
Rather, they generate alternative experiences of data and the places they depict. In the distant reading presented below (Figure 8; best seen in colour), the Arboretum is portrayed as an agglomeration, a pattern, and a system in flux. Here, data are enlisted to construct a new sense of place. Because of their scale and heterogeneity, large data sets offer opportunities for new experiences of place that – like Schivelbusch’s locomotive panorama – are different from anything seen before. In the portion of the paper that follows, I will introduce a series of distant readings. None of these readings are neutral or inevitable. Rather, they help us reimagine the Arboretum as a place with origins, structures, and dynamics that are not constrained by their geography. Instead, they depict landscapes of temporality.

Place as history

Figure 8 portrays the Arboretum as an aggregate place developed over time. Beginning in 1872 and ending in 2012 (when this set of records was made available for use), the distant reading portrays a temporal graph of plant specimens. The image is a kind of timeline: structured by yearly accessions, much like trees record environmental patterns in their annual growth rings. Months and days index accumulated specimens, each denoted by a dot. This two-dimensional view can be enhanced by a series of section cuts through daily accessions (see Figure 9). In the original interactive version of this visualisation (accessible here: http://www.lifeanddeathofdata.org), the section, which portrays the number of accessions on each day of the selected year, can be produced for any year along the timeline. Such distant readings can be used to call attention to variations in the data by linking them to colour, size, and other visual cues. For instance, Figure 8 displays changes in provenance type (a category mentioned earlier) across the history of the collection. 
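The timeline layout and section cut described above can be sketched in a few lines of Python. This is only an illustrative reconstruction, not the author's JavaScript: it assumes each accession is plotted by year on one axis and day of the year on the other, and the sample records, the provenance labels, and the function names `timeline_point` and `section_cut` are all hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical accession records: (accession date, provenance type).
# These values are invented for illustration, not taken from BG-Base.
records = [
    (date(1998, 5, 4), "wild"),
    (date(1998, 5, 4), "cultivated"),
    (date(1998, 7, 1), "unknown"),
    (date(1942, 8, 15), "cultivated"),
]


def timeline_point(d):
    """Map a date to (year, day-of-year): one dot per accession."""
    return d.year, d.timetuple().tm_yday


def section_cut(records, year):
    """Count accessions on each day of the selected year."""
    return Counter(
        timeline_point(d)[1] for d, _ in records if d.year == year
    )


points = [timeline_point(d) for d, _ in records]
cut_1998 = section_cut(records, 1998)
```

Each entry in `points` is the position of one dot on the timeline, and `cut_1998` maps a day of the year to the number of accessions recorded on that day, which is the quantity a section cut would display.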
In this view, a green dot represents a plant collected in the wild, a yellow dot signifies a cutting from a wild plant, a black dot indicates a cultivated plant, and a grey dot stands in for a plant from an unknown origin (far more common than one might expect). Fluctuations across these provenance-related colours illustrate shifts in the makeup of the Arboretum, between collections of scientific importance (mostly collected from the wild), and selections in the service of horticulture (mostly from other cultivated collections). The distribution of green, yellow, black, and grey dots faintly demarcates three eras of collecting at the Arboretum identified by curator of living collections Michael Dosmann.9 In the late 19th and early 20th centuries, Sargent engaged in a global project of scientific fieldwork to collect distantly related species from around the world as evidence to support Darwin’s theory of evolution. However, in the 1920s, the United States Department of Agriculture discovered that the Arboretum was inadvertently collecting invasive bugs along with its imported plants. Sargent lost a legal battle with the government and wild collecting decreased substantially thereafter.

Figure 8. Linear timeline of Arboretum accessions. Image generated from JavaScript code by the author and Krystelle Denis.

Figure 9. Section cut through linear timeline of Arboretum accessions. Image generated from JavaScript code by the author and Krystelle Denis.

The ensuing middle years of the 20th century are sometimes known at the Arboretum as the Wyman era, after a prominent horticulturist. During this time, the Arboretum halted its foreign expeditions and relocated its scientific research to Harvard’s Cambridge campus. The ensuing research centred on the herbarium, a much larger collection made up entirely of dried plants (Figure 10). 
Dosmann explains that the expansive grounds in Jamaica Plain became a ‘showcase garden’, a place to display the horticultural trends of the day. During this period, says Dosmann, ‘if you did want to go and collect anything, you went to a nursery’.10 It was not until the early 1970s, during a re-evaluation of the mission of the collection associated with its centennial, that the Arboretum re-initiated its expedition work abroad. The renewal of overseas fieldwork built on expanded relationships with institutions in Asia and a focus on emergent and imperative questions around global climate change. The distant reading registers some aspects of these long-term temporal shifts, highlighting in particular the relationship between the two defining arms of the Arboretum, scholarship and horticulture, and the ways in which their relationship changed over time. This visually oriented reading of the Arboretum as data is unlike a photograph or geographic map of the place. It illuminates a landscape shaped over time by otherwise invisible ecological, organisational, and even political forces. However, this particular use of the method is but one way of looking. The data support many alternative portrayals of place.

Alternative histories

A radial version of the same timeline pushes the metaphor to embedded arboreal processes – the forming of rings in a tree (see Figure 11). Moreover, new patterns are illuminated by the density gradient from centre to periphery. The three eras of collecting become more prominent as the sparse accessions in early years are compressed into a smaller space. Moreover, subtle lines of accessions running through significant dates of the year are accentuated. They appear as concentrated rays within the circular geometry. Finally, the radial organisation suggests an entirely different kind of temporality: one that has an origin at some fixed point and then expands indefinitely into the future. 
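One way to picture the radial arrangement is as a polar mapping in which each year becomes a ring and each day of the year becomes an angle. The sketch below is an assumption about how such a layout could be computed, not a description of the code behind Figure 11; `START_YEAR` comes from the 1872 start date given in the text, while `radial_point` and `ring_spacing` are hypothetical names.

```python
import calendar
import math
from datetime import date

START_YEAR = 1872  # first year of accessions, as stated in the text


def radial_point(d, ring_spacing=1.0):
    """Place an accession on the ring for its year, at the angle for its day.

    Earlier years sit on smaller rings, so their dots are squeezed into
    less space than dots from later years.
    """
    days_in_year = 366 if calendar.isleap(d.year) else 365
    day_index = d.timetuple().tm_yday - 1  # 0 for 1 January
    angle = 2 * math.pi * day_index / days_in_year
    radius = (d.year - START_YEAR + 1) * ring_spacing
    return radius * math.cos(angle), radius * math.sin(angle)


first_day = radial_point(date(1872, 1, 1))
christmas_2012 = radial_point(date(2012, 12, 25))
```

Because every accession sharing a calendar date falls at the same angle, dates with many accessions line up as rays running outward from the centre, and dates with none leave an empty wedge.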
This radial image can be read against the linear one, which presents time as being infinite in two directions. The accessions depicted in linear form seem sparse in comparison. In the linear version, one can more clearly see increased collection occurring over the years, albeit with a narrowing in the 1940s. Moreover, practices seem to change dramatically across seasons in the second half of the 20th century, transitioning from winter accessioning to accessioning year-round. However, other patterns are not present in the linear timeline. The dispersion of accessions in the early years makes it more difficult to note the intensity of wild collecting during the first period of exploration, and its symmetry with the period after the 1970s, when the Arboretum began to collect externally again. At a more detailed level, a substantial gap in collecting on Christmas day appears clearly in the radial version but disappears into the fringe of the linear image. This gap could be made more prominent by simply reordering the arrangement of months, but what other patterns would be shifted out of view?

Figure 10. Herbarium specimen. Image courtesy of the Herbarium of the Arnold Arboretum of Harvard University. Cambridge, Massachusetts, USA.

Figure 11. Radial timeline of the Arnold Arboretum. Image generated from Java code by the author.

Both the radial and linear versions obscure the exact number of accessions that have come in per day. A 3D approach as demonstrated in Figure 12 can help to make the number of accessions more evident. Rather than being arranged solely by date, the 3D image highlights every accessioned plant at the Arboretum and exposes the rate of accumulation along a new z-axis. The resulting form is a cone. Moments of rapid growth in the collection appear as narrow segments of the cone, whereas periods of slower development flatten it out. 
While evocative in its shape, the 3D visualisation is more difficult to read overall; in fact, most of the patterns exposed by the previous images are compromised when portrayed in 3D. Graphics overlap from opposite sides of the cone, the visible circumference of the yearly rings is narrowed, and daily accessions are difficult to align with month and year markers. Such examples of distant reading are both interpretive and speculative. They present the Arboretum as multiple. Each version of the place offers its own experience of the substantial collections brought together over a long history.

Figure 12. 3D Timeline of Arboretum accessions. Image generated from Java code by the author.

Histories out of place

The distant readings depicted above suggest different ways of making sense of the Arboretum as a whole. But distant readings can contain telling details as well. Indeed, we can learn more about the kind of place the Arboretum is by inspecting components of the distant readings close up. In particular, it is useful to pay attention to apparent anomalies or glitches in the images. I have previously called these ‘data artefacts’ (Battles and Loukissas, 2013). In most work with data visualisation, such irregularities are cleaned up. Like various kinds of data dirt, they appear to be simply out of place (Douglas, 1978; Mody, 2001). But data artefacts speak to the human history of their accumulation. Consider, for example, the rays of clustered accessions so prominent in the radial version of the timeline. A literal reading of these rays suggests that accessions arrived en masse on certain days, particularly on the 15th of every month, on the first of the year, and on the first of July. But Del Tredici suggests that the rays are most likely technological artefacts. 
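Clusters of this kind can also be surfaced numerically, by tallying accessions per day of the month and flagging days that are far busier than average. The following is a minimal sketch under stated assumptions: the sample dates are invented, the threshold `factor` is arbitrary, and the helper names `day_of_month_counts` and `suspicious_days` are hypothetical rather than part of any Arboretum tool.

```python
from collections import Counter
from datetime import date


def day_of_month_counts(dates):
    """Tally how many accessions fall on each day of the month (1..31)."""
    return Counter(d.day for d in dates)


def suspicious_days(dates, factor=3.0):
    """Return days of the month whose tally exceeds `factor` times the
    mean tally across all 31 possible days."""
    counts = day_of_month_counts(dates)
    mean = sum(counts.values()) / 31
    return sorted(day for day, n in counts.items() if n > factor * mean)


# Invented sample: ten accessions dated the 15th, three scattered elsewhere.
sample = [date(1940 + i, 8, 15) for i in range(10)] + [
    date(1950, 3, 3),
    date(1951, 6, 9),
    date(1952, 11, 22),
]

flagged = suspicious_days(sample)
```

A flagged day is only a prompt for closer reading. Whether a spike reflects real arrivals or a recording convention has to be settled with people who know the collection.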
‘If something came in (during) August of 1942, I think BG-Base would output that [by] default as August 15th’.11 Without a precisely recorded day of accession, BG-Base places accessions squarely in the middle of the month. The pattern is similar at the scale of the year. Accessions appear unusually heavy on 1 July, the beginning of the Arboretum’s fiscal year. Deriving from various processes, such artefacts are often entangled with the contingencies of a place. Those mentioned above are technological artefacts, resulting from the material conditions of data creation. However, disciplinary artefacts might betray specialised ordering systems, and vernacular artefacts might be the result of dialects or local language uses. These various kinds of artefacts can be extraordinarily subtle and difficult to tease out, but distant reading is an adept tool for bringing such artefacts to the surface. Data artefacts register not only local changes in technology, personnel, and organisation but also broader cultural rhythms and events. Look closely and you can spot the Second World War as well as Christmas (mentioned previously), as gaps between denser periods of accessions. The first is manifest as a bald swath in the middle of the 1940s. The second is particularly noticeable in the radial timeline as a wedge of space radiating down 25th December. Accessions from particular regions are also affected by international relations. Del Tredici recounts, ‘when Nixon went to China, I started to get small little exchanges of seed packets and things like that’.12 Through data artefacts, we can see more than a collection of plants. Kyle Port, the Arboretum’s plant records manager, notes that artefacts betray the ‘personalities’ behind the data.13 Together, these personalities contribute to the sense of place generated through distant readings.

Conclusions

Social studies of information have much to learn from methods currently emerging in environmental studies and criticism. 
We should learn to see data as cultural forms that are situated socially and materially, but also in place. Moreover, related techniques from the digital humanities can frame these data as texts, to be read up close or from a distance. Reading the place attachments in data can help us learn what is distinctive about Big Data. I contend that because of their heterogeneous nature, Big Data – exemplified by the historical example of the Arnold Arboretum – bring together more and further reaching attachments to place than data sets of smaller sizes. Although the Arboretum does not conform to contemporary expectations of Big Data as petabyte-scale, it belongs to the long historical arc of the present-day Big Data phenomenon. The Arboretum has long contended with data sets verging on the unmanageable and aspiring to the scale at which n (the number of elements in the data set) = all. Each of the place attachments explored in this paper suggests a different way in which data can be environmental: by being about, in, from, or even generative of place. Taken together, these four ways of probing data offer a model for how to read Big Data from an environmental perspective. However, the approach demonstrated in this paper should not be mistaken as a formula for engaging with data anywhere. The methods used here were developed in situ, with the particular place attachments of the Arboretum at hand. Instead, this paper is meant as an example of environmental data studies. Such studies can unsettle conceptions of Big Data, by calling attention to their origins as well as the experiences they create. Following on this paper, scholars of environmental data would do well to seek out other places and place attachments. Understanding the possible relationships between data and place can help us challenge the wisdom of Big Data’s centralised models of management. 
As this paper has shown, thinking about data as mobile, immutable, and generally detached from place can obscure important ways in which data rely on local knowledge and experience for meaningful interpretation and responsible use. When taken out of place, data can come to be seen by unfamiliar audiences as either the view from nowhere or nothing more than data dirt.

Acknowledgements

This work was made possible by William (Ned) Friedman, director of the Arnold Arboretum, and by the generous participation of many Arboretum staff members. In addition, the paper has benefitted from feedback offered by the editors and reviewers of the journal as well as numerous colleagues: Matthew Battles, Kyle Perry, Katherine Diedrick, Lauren Klein, Greg Zinman, and collaborators at metaLAB(at) Harvard and in the digitalSTS community. Some of the visualisations used in this paper were designed and implemented in collaboration with Krystelle Denis.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project was partially supported by The Lasky-Barajas Dean’s Innovation Fund for Digital Arts and Humanities at Harvard University.

Notes

1. From an interview by the author with Michael Dosmann, 2014.
2. From an interview by the author with Peter Del Tredici, 2014.
3. Ibid.
4. Note that the provenance is not a place of origin, but rather the name and address of a collector.
5. From an interview by the author with Peter Del Tredici, 2014.
6. From a record in BG-Base, the Arnold Arboretum’s database of plant accessions.
7. From an interview by the author with Peter Del Tredici, 2014.
8. STS scholar Laura Forlano first connected the use of this term to Schivelbusch’s work in a phone conversation with the author.
9. From an interview by the author with Michael Dosmann, 2014.
10. From an interview by the author with Michael Dosmann, 2014.
11. From an interview by the author with Peter Del Tredici, 2014.
12. From an interview by the author with Peter Del Tredici, 2014.
13. From an interview by the author with Kyle Port, 2014.

References

Agnew J (2013) Representing space: Space, scale and culture in social science. In: Duncan JS and Ley D (eds) Place/Culture/Representation. London: Routledge, pp. 251–271.
Anderson C (2008) The end of theory: The data deluge makes the scientific method obsolete. WIRED. Available at: http://www.wired.com/2008/06/pb-theory/ (accessed 18 April 2016).
Barnes TJ and Duncan JS (2013) Writing Worlds: Discourse, Text and Metaphor in the Representation of Landscape. London: Routledge.
Battles M (2004) Library: An Unquiet History. New York: W. W. Norton & Company.
Battles M and Loukissas Y (2013) Data Artifacts: Visualizing Orders of Knowledge in Mega-Meta Collections. In: Proceedings of the UDC Seminar in Classification and Visualization, pp. 243–258. Würzburg: Ergon Verlag.
Borgman CL (2015) Big Data, Little Data, No Data: Scholarship in the Networked World. Cambridge: MIT Press.
Bowker G and Star SL (1999) Sorting Things Out: Classification and its Consequences. Cambridge: MIT Press.
boyd d and Crawford K (2012) Critical questions for Big Data. Information, Communication & Society 15(5): 662–679.
Buell L (2009) The Future of Environmental Criticism: Environmental Crisis and Literary Imagination. Hoboken, NJ: John Wiley & Sons.
Cetina KK (1999) Epistemic Cultures: How the Sciences Make Knowledge. Cambridge: Harvard University Press.
Cosgrove D (2008) Geography and Vision: Seeing, Imagining and Representing the World. New York, NY: I.B.Tauris.
Crampton JW (2011) Mapping: A Critical Introduction to Cartography and GIS. West Sussex: John Wiley & Sons.
Culler J (2010) The closeness of close reading. ADE Bulletin 149: 20–25.
Dalton C and Thatcher J (2014) What does a critical data studies look like, and why do we care? Society and Space – Environment and Planning D.
Day RE (2014) Indexing It All: The Subject in the Age of Documentation, Information, and Data. Cambridge: MIT Press.
Douglas M (1978) Purity and Danger: An Analysis of the Concepts of Pollution and Taboo. New York: Routledge & Kegan Paul PLC.
Dourish P (2006) Re-space-ing place: ‘Place’ and ‘Space’ ten years on. In: Proceedings of the 2006 20th anniversary conference on Computer Supported Cooperative Work, CSCW ’06, pp. 299–308. New York: ACM.
Drucker J (2011) Humanities approaches to graphical display. Digital Humanities 5(1).
Edwards PN (2010) A Vast Machine: Computer Models, Climate Data, and the Politics of Global Warming. Cambridge: MIT Press.
Edwards PN, Mayernik MS, Batcheller AL, et al. (2011) Science friction: Data, metadata, and collaboration. Social Studies of Science 41(5): 667–690.
Galison P and Thompson E (eds) (1999) The Architecture of Science. Cambridge: The MIT Press.
Geertz C (1985) Local Knowledge: Further Essays in Interpretive Anthropology. New York: Basic Books.
Gieryn TF (2000) A space for place in sociology. Annual Review of Sociology 26(1): 463–496.
Gitelman L (ed.) (2013) ‘Raw Data’ Is an Oxymoron. Cambridge: MIT Press.
Gitelman L (2014) Paper Knowledge: Toward a Media History of Documents. Durham: Duke University Press.
Gnoli C (2012) Metadata about what? Distinguishing between ontic, epistemic, and documental dimensions in knowledge organizations. Knowledge Organization 39(4): 268–275.
Graham S (1998) The end of geography or the explosion of place? Conceptualizing space, place and information technology. Progress in Human Geography 22(2): 165–185.
Hacking I (1991) A tradition of natural kinds. Philosophical Studies: An International Journal for Philosophy in the Analytic Tradition 61(1/2): 109–126.
Hall P (2008) Critical visualization. In: Antonelli P (ed.) Design and the Elastic Mind. New York, NY: The Museum of Modern Art, pp. 120–131.
Haraway D (1988) Situated knowledges: The science question in feminism and the privilege of partial perspective. Feminist Studies 14(3): 575–599.
Hayles NK (2008) How We Became Posthuman: Virtual Bodies in Cybernetics, Literature, and Informatics. Chicago, IL: University of Chicago Press.
Irani L, Vertesi J, Dourish P, et al. (2010) Postcolonial computing: A lens on design and development. In: Proceedings of the SIGCHI conference on human factors in computing systems, CHI ’10, pp. 1311–1320. New York, NY: ACM.
Jänicke S, Franzini G, Faisal C, et al. (2015) On Close and Distant Reading in Digital Humanities: A Survey and Future Challenges. A State-of-the-Art (STAR) Report. In: (Proceedings) EuroVis 2015: The EG/VGTC Conference on Visualization. Cagliari. Available at: http://dx.doi.org/10.2312/eurovisstar.20151113.
Kalay YE and Marx J (2001) The role of place in cyberspace. In: Proceedings of the seventh international conference on Virtual Systems and Multimedia, VSMM ’01, pp. 770–779. Washington: IEEE Computer Society.
Keller EF (2003) Making Sense of Life: Explaining Biological Development with Models, Metaphors, and Machines. Cambridge: Harvard University Press.
Kirsch S (2011) Laboratory/observatory. In: Agnew J and Livingstone D (eds) The Sage Handbook of Geographical Knowledge. Thousand Oaks, CA: Sage Publications, pp. 76–87.
Kitchin R and Lauriault TP (2014) Towards Critical Data Studies: Charting and Unpacking Data Assemblages and Their Work. SSRN Scholarly Paper. Rochester, NY: Social Science Research Network.
Kitchin R and McArdle G (2016) What makes Big Data, Big Data? Exploring the ontological characteristics of 26 datasets. Big Data & Society 3(1).
Kitchin R, Gleeson J and Dodge M (2013) Unfolding mapping practices: A new epistemology for cartography. Transactions of the Institute of British Geographers 38(3): 480–496.
Latour B (1987) Science in Action: How to Follow Scientists and Engineers through Society. Cambridge: Harvard University Press.
Latour B (1990) Drawing things together. In: Lynch M and Woolgar S (eds) Representation in Scientific Practice. Cambridge: MIT Press, pp. 19–68.
Latour B and Woolgar S (1979) Laboratory Life: The Social Construction of Scientific Facts. Beverly Hills, CA: Sage Publications.
Livingstone D (2003) Putting Science in Its Place. Chicago, IL: University of Chicago Press.
Lohr S (2012) The age of Big Data. New York Times. 11th February.
Mayernik MS, Batcheller AL and Borgman CL (2011) How institutional factors influence the creation of scientific metadata. In: Proceedings of the 2011 iConference. New York, NY: ACM, pp. 417–425.
Mitchell WJ (1995) City of Bits: Space, Place, and the Infobahn. Cambridge: The MIT Press.
Mody CCM (2001) A little dirt never hurt anyone: Knowledge-making and contamination in materials science. Social Studies of Science 31(1): 7–36.
Morozov E (2014) The rise of data and the death of politics. The Guardian, 19th July.
Schivelbusch W (1986) The Railway Journey: The Industrialization of Time and Space in the 19th century. Berkeley: University of California Press.
Star SL and Griesemer JR (1989) Institutional ecology, ‘Translations’ and boundary objects: Amateurs and professionals in Berkeley’s Museum of Vertebrate Zoology, 1907-39. Social Studies of Science 19(3): 387–420.
Suchman L (2007) Human-Machine Reconfigurations: Plans and Situated Actions. New York, NY: Cambridge University Press.
Vertesi J and Dourish P (2011) The value of data: Considering the context of production in data economies. In: Proceedings of the ACM 2011 conference on Computer Supported Cooperative Work, CSCW ’11. New York: ACM, pp. 533–542.

This article is a part of Special theme on Practicing, Materializing and Contesting Environmental Data. To see a full list of all articles in this special theme, please click here: http://bds.sagepub.com/content/practicingmaterializing-and-contesting-environmental-data.

Explanation & Answer

Attached.

Outline
Introduction
Body
Conclusion
Reference


Course title
Student name
Institution affiliation


A place for Big Data
The author of this article is Yanni Alexander Loukissas, who examines the accessions data of
the Arnold Arboretum of Harvard University. The article is written for institutions that have
long established themselves as comprehensive models of the world; because these institutions
hold Big Data, they have sought ways to continually manage all the records necessary for organizing an...
