Identify a news article that discusses a current event or social issue that relates to your field of study.


Description

Task: Identify a news article that discusses a current event or social issue that relates to your field of study. Follow the instructions below to explain the issue, provide an additional example of the issue, connect the issue to your field of study, and pose questions about the issue. Sources: Two readings are attached to the assignment; choose either one, then answer the questions below.

Formatting your assignment

Incorporate these elements of APA style:

  • Use one-inch margins.
  • Double space.
  • Use an easy-to-read font between 10-point and 12-point.

Note: Title page is not required, but make sure you include your name and a title that reflects your topic at the top of the first page.

Number your answers and/or include the assignment questions so your instructor can see that you addressed each part of the assignment.

Respond to each of the following.

1. Provide an APA-style reference for the news article you selected. The format for the reference is as follows:

Author, A. A. (Year, Month Day). Title of article in sentence case. Title of Newspaper in Title Case and Italics. Retrieved from http://www.newspaperhomepage.com


2. What is the current event or issue that is being discussed in the news article? Explain the event or issue, assuming your reader has little or no prior knowledge of it. Answer in 1-2 paragraphs.

3. Describe a specific real-life situation (other than one discussed in your news article) where the issue at hand has been observed. This could be something that happened to you or someone you know, or it could be a related event in the news. Explain the connection to the event or issue in your news article. Answer in 1-2 paragraphs.

4. What is your field of study and how do you see it relating to the event or issue in your news article? Answer in one paragraph. (Note: I don't have a declared field of study yet; I am pursuing an Associate's degree in General Studies.)

5. What do you want to know about the issue or event in your news article? List two or more questions that you could pursue as part of your research.

Unformatted Attachment Preview

Money and Politics in Ibsen, Shaw, and Brecht by Bernard F. Dukore (review)
James Coakley
Comparative Drama, Volume 15, Number 3, Fall 1981, pp. 281-283 (Review)
Published by Western Michigan University
DOI: https://doi.org/10.1353/cdr.1981.0039
For additional information about this article: https://muse.jhu.edu/article/661907/summary

… men, and Don Quixote; the figure he seeks exists in A Midsummer Night's Dream, but the main current of the Shakespearean canon suggests that "earthlier happy is the rose distilled." Of the four essays not yet discussed, one uses deconstruction, one uses metadrama, and two use political approaches. Madelon Gohlke's far-reaching essay offers standard overall interpretations, good insights on metaphor, and a meditation on patriarchal versus feminine discourse. Marianne Novy's account of the heroines as actors and audience is perceptive, but shows the limits of the approach; the gentleman's account of Cordelia reacting to Kent's letter reveals her as more than audience/listener: she is like "a better way." Paul Berggren's "The Woman's Part: Female Sexuality as Power in Shakespeare's Plays" and Lorie Leininger's "The Miranda Trap" show the virtues and the dangers of revisionist approaches respectively. The Berggren essay covers too much for analysis here, but every generalization provokes thought.

Both these essays are weak on The Tempest, probably because they share the same political bias. Disliking Prospero, why should both critics believe him in thinking Caliban unteachable? Of all Shakespeare's plays, The Tempest is the most likely to present the characters as extensions of the major figure; self-government is certainly a major theme, for without such government one is a slave. Ferdinand is in bondage; Antonio, Sebastian, Alonso, and Gonzalo are spell-stopped; Ariel is a servant; Trinculo and Stephano are stuck in mud. Whatever political application King James' daughter might have made, the play is intractable when treated from a literalistic political viewpoint. As in Measure for Measure, everyone is freed; no real government's punishments extend "not a frown further." As for Caliban, he hears music, dreams, knows that clothes are just show, is acknowledged by Prospero, and is left, free, on the island to "seek for grace hereafter." The Woman's Part is a very useful and provocative collection, highly recommended.

CECILE WILLIAMSON CARY
Wright State University

Bernard F. Dukore. Money and Politics in Ibsen, Shaw, and Brecht. Columbia, Missouri, and London: University of Missouri Press, 1980. Pp. xxii + 172. $15.95.

This book is a doctoral dissertation grown up. Its title, with the quasi-scientific ardor germane to such activities, neatly and clearly marks out its concerns. There are to be no mysteries confronted, discoveries made, or journeys unplanned in this undertaking; the itinerary is set forth at the very start of the exposition with the mathematical rigidity (not to say rigor) of some algebraic formulae. This study, we must never forget, is thematic criticism at its most relentless and formidable: interests pursued, one must also say, with an admirable, though ultimately stultifying, scholastic vigor.

Professor Dukore, concerned with the subject of money and other social and/or political issues in selected plays of these eminent dramatists, examines eighteen texts as exemplars of this theme, thesis, position, or attitude.
There is nothing particularly wrong with his discussion (six chapters, an introduction, and a conclusion), which neatly wraps up his argument rather like an evening spent with the debate squad. Everything Professor Dukore says about the plays is right, correct, true, demonstrable, and scholarly. Ibsen, Shaw, and Brecht are, were, and continue to be (since in a very real, profound, and wondrously reassuring sense none of them is dead) radical critics of society, writers who treated "similar issues or themes," to quote the hortatory pronouncements of the blurb writer. That these men are, were, or continue to be also great poets of the stage is a profound and difficult issue the book ignores at its own peril, a notion Professor Dukore quickly and hurriedly mentions (p. xxii) as he moves on to his easier societal concerns.

But the latter issue (how ideas are thrust onto the stage and amalgamated into that unique stage poetry we call modern drama) is another matter of greater importance, I think, raising questions beyond the intentions and scope of this current undertaking. This is, of course, to ask Professor Dukore to write a book other than the one under review. But the disturbing problem will not go away; it is, in fact, provoked by Dukore's essays. Indeed, the books, in my judgment, are yet to be written about these men as artists rather than social critics, as playwrights who saw beyond ideas or the so-called reasons which propelled them to the playhouse. This suggestion or complaint is surely not new, but merely another reminder of what yet remains to be done. We must always look past the plethora of social issues or ideas (how banal they become and so soon!) to see and write about the enduring vitality of these dramatists; to write, in short, of their genius. It is a curious, difficult, and prolix problem of what a great playwright does to and with ideas, that curious admixture which irrigates the idea from dialectic to art, from social issue to passionate concern in which the lives of human beings are at stake.

Ibsen's polemics, well handled by Professor Dukore, dissipate as one considers the great Scandinavian poet's attempts to restore tragedy to the modern stage. Shaw's answers to everything (his Mr. Fix-it attitude toward life) disappear in the face of his salutary vision of life's possibilities. And Brecht's Marxism, the militantly truculent commitment of a difficult personality, is a silly joke, the political mutterings of a great poet who casts aside his ideas as he creates great theatre, tapping at the door of tragedy, recalling the wonderful ability of the human spirit to survive and prevail. Great dramatists, like great poets, are always something else; they do not need history; they have their gifts, and they require ideas only peripherally. The political situation in Norway so important to Dukore's treatment of the play is incidental to the greatness of Rosmersholm; the characterizations of Eliza Doolittle and Anna Fierling, alias Mother Courage, are beyond the radical positions these wondrous women take regarding a world they never made.

Nonetheless, the above remarks do not vitiate, in any way, the quality of Dukore's book. The play selection is apt; the texts are easy prey, good targets for the discussion, and the nomenclature of hunting precisely describes what Professor Dukore has accomplished. He has caught in the snare of his critical net a group of plays and in a sense vanquished them.
That he sometimes uses minor works of these men to make his points is an unkind and probably unfair cavil, better perhaps thought than said. The discussions are full and penetrating; at times, alas, they also resemble pedestrian glossings of the play under consideration and no more. They do not breathe with excitement and wonder, and that is probably this book's problem. It is too worked over, too thought out, and its energy often wanes and dissipates in the face of perfervid critical activity. To use R. P. Blackmur's winsomely snobbish phrase, this book, as criticism, is not "the formal discourse of an amateur." It is the informed, intelligent, often wrongheaded effort of someone who has no doubt taught, lived with, and thought about these plays for a long time. That the book, finally, does not succeed is due, I think, to the enormous pressure of the ratiocination used as the manuscript took shape. Professor Dukore has smothered his enthusiasms in "ideas"; he has explained his plays to the point of banality; he has wrenched every last gram of "meaning" out of this hunk of dramatic literature, and, in the process, has forgotten the profound importance and necessity of feelings in the theatre.

JAMES COAKLEY
Northwestern University

Shakespeare's "More Than Words Can Witness": Essays on Visual and Nonverbal Enactment in the Plays. Edited by Sidney Homan. Lewisburg: Bucknell University Press; London: Associated University Presses, 1980. Pp. 238. $19.50.

I came to this particular book with considerable interest and enthusiasm, but I left disappointed. Its title and general concept promise much more than it delivers. Written by different people, the twelve essays that comprise the collection do not cohere well, nor do they always correspond to the presumed topic of the book. Further, eight of the twelve have already been published or presented as papers at sundry gatherings, some dating back to 1960. Given the work that has been done on this topic of late, one wonders why the editor could not have put together new essays. According to Homan's extensive "Preface" (a summary or mini-review of the book), there are five groupings for the essays: Maurice Charney's essay on Hamlet stands alone; three pieces on language; four on the "look" of plays; two that survey "the visual and nonverbal side of Shakespeare's canon"; and two on film. One can admire Charney's "Hamlet without Words," as I do, while being uncertain as to why it should stand alone in Homan's organization of the book. …

Social Media, Money, and Politics: Campaign Finance in the 2016 US Congressional Cycle

Lily McElwee (1) and Taha Yasseri (1,2,*)
1 Oxford Internet Institute, University of Oxford, Oxford, UK
2 Alan Turing Institute, London, UK
* Correspondence: Taha Yasseri, taha.yasseri@oii.ox.ac.uk

Abstract

With social media penetration deepening among both citizens and political figures, there is a pressing need to understand whether and how political use of major platforms is electorally influential. In particular, the literature focused on campaign usage is thin: existing studies typically describe the engagement strategies of politicians or attempt to quantify the impact of social media engagement on political learning, participation, or voting. Few have considered implications for campaign fundraising despite its recognized importance in American politics. This paper is the first to quantify a financial payoff for social media campaigning.
Drawing on candidate-level data from Facebook and Twitter, Google Trends, Wikipedia page views, and Federal Election Commission (FEC) donation records, we analyze the relationship between the topic and volume of social media content and the campaign funds received by all 108 candidates in the 2016 US Senate general elections. By applying an unsupervised learning approach to identify themes in candidate content across the platforms, we find that more frequent posting overall, and more frequent posting of issue-related content, are associated with higher donation income when controlling for incumbency, state population, and information-seeking about a candidate, though campaigning-related content has a stronger effect than issue-related content when the number rather than the value of donations is considered.

Keywords: Social Media, Federal Election Commission, Congress, Election, Donation, Campaign Finance.

Introduction

Scholars have sought to understand the relationship between technology and democracy since the 1990s (Barber, 1998). With rapidly rising adoption of social media by citizens, US politicians are increasingly aware of the power of major platforms to communicate and organize for political purposes (Gainous and Wagner, 2014; Margetts, John, Hale, & Yasseri, 2015). The growth in social media campaigning specifically has been mirrored by growth in literature analyzing usage itself and its implications for a range of electorally related outcomes (Boulianne, 2015). As social media penetration continues to deepen among the American electorate, there is a pressing need to determine whether and how political candidates' use of these platforms has electoral significance.

Social media adoption is widespread by any measure, with 70% of US adults on Facebook and 20% on Twitter as of 2016. Empirical work has demonstrated a rise in citizen and candidate interaction on the main platforms. Roughly 40% of Americans had posted and 80% had seen political content on social networking sites (SNSs) as of April 2014. Most importantly, followership of political figures on the main SNSs is on the rise: while 14% of 18-29 year olds and 6% of 30-49 year olds followed elected officials, political parties, or candidates for office in 2010, rising to 24% and 21% respectively by 2014 (Anderson, 2015), 35% of the American online population now does so (Kalogeropoulos, 2017). The practice of followership is bipartisan, with supporters of both parties equally likely to follow political figures on social media (Anderson, 2015).

Studies seeking to explain increased campaign usage distill the unique offerings of the major social media platforms. Most, primarily focused on Twitter, argue such sites facilitate the "most inexpensive, unmediated, and closely focused forms of communication in campaign history" (Gainous and Wagner, 2014, 54); further, these platforms are ideally suited to the types of messaging in which office-seekers want to engage, as they enable candidates to create succinct themes and highlight victories rather than explain the minutiae of complex legislation. It has been suggested that although the major social media platforms were not originally created for political purposes, the fact that they are low cost, allow direct communication with the public, and provide access to a wide audience represents an advantage over traditional phone, mail, and website-based campaigning (Auter and Fine, 2017).
Because social media has been shown to be fundamentally different from 'campaigning as usual' (Bode et al., 2016), the implications of the rising use of social media in campaigning are worth investigating. Cognizant that social media are increasingly commonplace, a plethora of studies have begun to describe usage and test impact across a range of electoral areas, such as vote gains (Yasseri & Bright, 2016; Bright et al., 2017), political participation (Bode, 2012), and political learning (Dimitrova et al., 2014). These studies offer a wealth of information regarding the ways in which online campaigning is playing a role in electoral processes, but they leave addressable gaps. Those examining implications often focus on the volume rather than the content of social media activity as the explanatory variable, while those classifying political social media content into categories fail to examine the broader electoral implications of these technologies.

The past literature on social media electoral campaigning examines determinants of use (Jackson and Lilleker, 2011), genres of content (Bode, 2016; Gainous and Wagner, 2014), and implications for electorally related outcomes (Bright et al., 2017). Quantifying the effect of social media on electoral outcomes is a work in progress, but initial analyses suggest significance, contradicting skepticism from media and academic accounts (Bright et al., 2017). The field will likely continue to grow in sync with political enthusiasm for social media platforms. On the other hand, the wide body of studies classifying social media posts by type or topic, or assessing the determinants behind specific styles, fails to examine the influence of such classifications on various elements of electoral success.

Bode et al. (2016) analyze all 10,303 tweets by campaigns in the 2010 senatorial elections, classifying each according to seven topics (economic, social issues, foreign policy, social welfare, law and order, environment/energy, other); while they assess the relationship between social media activity and campaign resources and competitiveness, they focus on tweeting volume rather than topics, and do so in a bid to predict the former rather than the latter. Evans, Cordova, and Sipole (2014) examine how candidates for the House of Representatives used Twitter during the 2012 cycle, classifying tweets according to a six-part scheme. While they find a clear distinction between the tweeting styles of incumbents, Democrats, women, and those in competitive races versus challengers, Republicans, minor party candidates, men, and those in safe districts, they do not examine the electoral implications of these stylistic differences. Jones, Noorbaloochi, et al. (2017) similarly consider the demographic breakdown of specific types of content, finding that Republican and conservative legislators stress values of tradition, conformity, and national security, whereas Democratic and liberal legislators stress values of benevolence, hedonism, universalism, and social/economic security. However, they leave the investigation at the content level rather than evaluate its relevance to future electoral chances.

Gainous and Wagner (2014) offer perhaps the most comprehensive and theoretically grounded analysis of candidate activity on Twitter, creating a four-part typology of political Twitter content based on research into the patterns and activities of modern campaigning.
They research the determinants of social media adoption and use, analyzing both total tweet volume and types of Twitter use for each of the four types of online campaigning in light of a variety of political and demographic factors. Despite identifying differences across partisanship, incumbency status, congressional office (House versus Senate), and gender through bivariate, multivariate, and qualitative examination, they do not test the association between these typological breakdowns and electoral outcomes of any sort. In one of the first large-scale empirical studies linking vote outcomes to Twitter use, Bright et al. (2017) comprehensively account for various elements of candidate social media activity in the 2015 UK general election, such as volume of posting, followership, and dialogue with followers, but do not incorporate specific topics of content in assessing effects. Others lump candidate posts together with candidate-related posts (Murphy, 2015) or rely on mentions of political candidates and political parties (Tumasjan et al., 2010).

Money is an important ingredient in US congressional elections. General election candidates in the 2016 Senate races alone raised $667,697,881, with Democrats outstripping Republicans at $363,396,637 to $302,100,403 (statistics retrieved from the Center for Responsive Politics, https://www.opensecrets.org/overview/index.php). In contrast to many other liberal democracies, private funding in the United States is often a primary campaign resource. Since 1971, such funding has been governed by the Federal Election Campaign Act, amendments to which cap individual donations at $2,700 per election and mandate disclosure for all contributions received above $200. Registered political committees, such as candidate campaigns, file reports with the Federal Election Commission (FEC), which are made publicly available within 48 hours of submission but updated continuously afterwards as the numbers become more accurate. These requirements make the role of money in US elections both influential and researchable.

There has been surprisingly little attention to the impact of either the volume or the type of social media usage on campaign fundraising success. To the best of our knowledge, there are only a few studies analyzing social media effects on political donations (Hong, 2013; Petrova, Sen, & Yildirim, 2016), but both of these works use politicians rather than candidates as the unit of analysis and focus solely on adoption, rather than content type, as the explanatory variable.

Campaign contributions rely on decisions about whether and how much to give. Studies in political behavior suggest patterns in such decisions, finding that 1) individuals use social information, or information about what others are doing or have done, to decide whether and how much to contribute (Margetts et al., 2011; Croson & Shang, 2008; Traag, 2016); 2) individuals require information about which candidates represent their political beliefs in order to make contributions (Grant and Rudolph, 2002); and 3) selective targeting has the potential to temper the impact of income on contribution decisions, such that campaigns can maximize donations by constructing a messaging and solicitation strategy that aligns with the background information of supporters and their associates.

In this work, we analyze the relationship between online campaigning and the donations received by drawing on publicly available data from Facebook and Twitter timelines, Google Trends, Wikipedia page views, and FEC donation records for all 108 candidates in the 34 US Senate general elections of 2016.
Data and Methods

We collect data on the number and sum of donations received and on the social media activity conducted by general election senatorial candidates during the six-week period prior to Election Day in the 2016 cycle. For details, see the Supplementary Information.

Donation records. The FEC provides searchable donation records based on reports from registered campaign committees. We use MapLight (https://maplight.org/) to retrieve the total sum and count of donations received by the candidates with campaign accounts. The customized dataset includes 83 of 108 candidates; twenty-five candidates are missing donation information altogether across the five periods (see the Supplementary Information for a full list). Candidates are not required to file with the FEC if they receive or spend under $5,000, so the limited totals of these candidates are the most likely explanation for the absent records, especially since data collection falls sufficiently after reporting deadlines.

Social media. We use raw Facebook posts and tweets as the basis for the social media variables. We manually search within each social media platform for candidate handles and check for additional accounts via Google. Many candidates have more than one Twitter or Facebook account; for each candidate, we include all accounts found on a given platform to capture the entirety of his or her Twitter or Facebook presence.

Google Trends. We collect daily data on Google search trends over the past year for each candidate's full name via the gtrendsR package in R (https://cran.r-project.org/web/packages/gtrendsR/gtrendsR.pdf).

Wikipedia. We manually search for candidates' Wikipedia entries; of 108 candidates, 71 have dedicated Wikipedia articles. The list provides a basis for collecting daily data on Wikipedia page views for each available article, which we do through the WikipediaR package (Bar-Hen et al., 2016; https://cran.r-project.org/web/packages/WikipediaR/WikipediaR.pdf).

Incumbency. We collect information on whether a candidate is an incumbent from Ballotpedia (https://ballotpedia.org/United_States_Senate_elections,_2016).

Population. For each candidate, we collect data on the population of their state from 2016 census figures. This control variable has been reported to play a role in campaign finance (Lin, Kennedy & Lazer, 2017).

Type of post. To classify posts by topic, we use latent Dirichlet allocation (LDA) topic modeling, which postulates a latent structure in a corpus and represents each document as a distribution over topics (Steyvers and Griffiths, 2007). Adopting topic modeling identifies topics of content, rather than categories of content, thereby distinguishing our output from the existing categorizations of political content online discussed above (e.g., Gainous and Wagner, 2014; Evans et al., 2014). For details on corpus cleaning and optimization of the topic modelling algorithm, see the Supplementary Information.

Classifying Documents. The topic modelling process results in ten collections of words (Figure 1). Consistent with precedent (Messing et al., 2014), topic labels are selected according to the words appearing most frequently in the posts classified under a specific topic (see the Supplementary Information).

Figure 1. Topic Modeling Output: Qualitative Description. Brief description of the ten topics discovered through modeling; see the Supplementary Information for more details.
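To make the classification step concrete, the following is a minimal sketch of LDA topic modeling in Python with scikit-learn. It is illustrative only: the paper's pipeline was implemented in R over the full cleaned corpus with ten topics, while the toy corpus, the two-topic setting, and all variable names below are invented for the example.

```python
# Illustrative only: a scaled-down LDA run in scikit-learn (the paper's
# own pipeline was built in R). `posts` stands in for the cleaned corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = [
    "rally tonight volunt knock door",   # hypothetical cleaned posts,
    "join rally volunt meet stop",       # already stemmed as in the paper
    "healthcar cost famili plan",
    "famili healthcar job plan econom",
]
vectorizer = CountVectorizer()
doc_term = vectorizer.fit_transform(posts)

# The paper fits 10 topics; 2 suffice for this toy corpus.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(doc_term)    # per-document topic distributions

# Label each topic by its highest-weight words (cf. Figure SI.4).
terms = vectorizer.get_feature_names_out()  # requires scikit-learn >= 1.0
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {k + 1}: {' '.join(top)}")
```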
Grouping Topics. Analysing similarity between topics provides a basis for grouping individual topics and deriving a meaningful hypothesis about the relationship between fundraising and specific types of content. We use cosine similarity to measure the similarity of the automatically generated topics in the LDA model, an approach with precedent in thematic topic aggregation (Messing et al., 2014). Setting the weights for all pairs with similarity smaller than 0.23 to zero, we visualize the topic similarity for the pairs above the cutoff in the graphing platform Gephi (https://gephi.org/). Analyzing modularity via the Louvain community detection algorithm (Blondel, Guillaume, Lambiotte, and Lefebvre, 2008), we detect communities of topics that are more similar to each other than to the others (Figure 2).

Figure 2. Topic Network Graph. Based on cosine similarity weightings, the Louvain community detection method discovers two communities of topics, which qualitative examination shows to group thematically into campaigning-related (green) and issue-related (pink) posts.

It becomes evident that there are two main clusters of topics: topics 1, 3, 4, 8, and 9 form one community, while topics 2, 5, 6, 7, and 10 form another. The former group all have campaigning-related elements, while the latter topics relate to policy issues. In the following analysis, we consider the social media posts in each group separately.
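The grouping step can be sketched similarly: compute cosine similarity between topic-word vectors, keep edges above the 0.23 cutoff, and detect communities. The paper performed the visualization and detection in Gephi; the networkx version below (requiring networkx 2.8 or later) is a stand-in, and the random matrix is a placeholder for the fitted topic-word weights.

```python
# Illustrative stand-in for the Gephi workflow described above.
import numpy as np
import networkx as nx
from sklearn.metrics.pairwise import cosine_similarity

topic_word = np.random.rand(10, 500)   # placeholder for LDA topic-word weights
sim = cosine_similarity(topic_word)    # 10x10 topic similarity matrix

G = nx.Graph()
G.add_nodes_from(range(10))
for i in range(10):
    for j in range(i + 1, 10):
        if sim[i, j] >= 0.23:          # the paper's similarity cutoff
            G.add_edge(i, j, weight=sim[i, j])

# Louvain community detection, as in Blondel et al. (2008).
communities = nx.community.louvain_communities(G, weight="weight", seed=0)
print(communities)  # in the paper: {1,3,4,8,9} campaigning vs {2,5,6,7,10} issues
```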
Results

Overall Description

The total number of donations per candidate per week ranges from 1 to 8,131 (mean 698, median 108). The total sum received per candidate per week ranges from $19 to $1,600,000 (mean $178,200, median $62,580). The distributions of both specifications are fat-tailed and are shown in the Supplementary Information.

Figure 3 shows the relation between weekly donation counts and sums received. While the relationship does not appear linear across the count values, mean donation size falls as the number of donations grows. This shows that some senatorial candidates receive a high number of smaller donations in certain periods, similar to Obama in the 2008 and 2012 campaigns (Bimber, 2014). The maximum of sum/count appears at roughly nine donations per week.

Figure 3. Donation Count versus Size: Log/Log. Log transformation reveals a nonlinear, nonmonotonic relationship between the number and size of donations.

Senatorial candidates tweeted on average much more frequently than they posted on Facebook. Weekly posts per candidate on Twitter range from 1 to 264 (mean 37, median 24); weekly posts per candidate on Facebook range from 1 to 37 (mean 8, median 5). The distributions are shown in the Supplementary Information.

As Figure 4 shows, post volume for all topics detected by topic modelling rises as Election Day nears. In all periods, topic 1 (campaigning talk, including references to meetings, calls, and stops) is highly popular, while topics 7 (issue-related, with reference to community) and 9 (campaign topics such as solicitation of volunteers) are relatively unpopular.

Figure 4. Posting Volume Per Topic and Type, By Week. Posting across all topics rises as the Election nears.

Weekly Google search volume for 2016 senatorial candidates runs from 0 to 1,360,791 (mean 25,870, median 4,466), and Wikipedia weekly page views for those candidates with an article run from 0 to 1,360,791 (mean 25,870, median 4,466). The distributions are shown in the Supplementary Information.

Regression Analysis

Simple scatterplots and residual plots reveal that the relationships between donations and both the volume and the type of posts are not linear (see Supplementary Information). Considering this, and the fact that the distributions of each variable are extremely fat-tailed, as demonstrated in the previous section, we transform all variables logarithmically. Also, to support causal inference, we associate the values of donations (count and sum) in each week with the social media activity during the week before.

The linear regression models (Figure 5) show a positive and significant relationship between post volume and both donation totals and counts. This relationship is robust when controlling for state population and incumbency status, as well as for general information-seeking about the candidate, as proxied by Google Search Trends and Wikipedia page views. In the baseline model for donation sums, a 10% increase in post volume is associated with a 7% rise in donation income. When state population, incumbency, and general awareness are controlled for, the relationship weakens, with a 10% rise in post volume associated with a 1% boost in donation income. In terms of counts, for every 10% increase in post volume, the number of donations in the following week rises by 12%; with the same controls in place, this falls to a 3% boost.

Next we examine the effect of post type on donations, with total posts as an additional control to account for the candidate's overall level of posting (Figure 6). Similar to overall post volume, campaigning-related posts have a positive and significant effect on both donation sums and counts with all controls in place, and this positive relationship remains significant when information-seeking is added as a control. A 10% rise in campaign-related posts gives rise to a 4% rise in donation income in the multi-control model, and a 10% rise in such posts produces an 8% boost in donation counts in the following period.

Turning to the issue-related posts (Figure 7), in a full-control model a 10% rise in issue-related posts is associated with a 5% rise in both donation income and donation counts. It is notable that while the campaign-related posts are more effective in increasing the count of donations than the sum, the issue-related posts have similar effects on both.

Figure 5. Volume Model: Post Volume and Political Donations (Sum and Count).

Figure 6. Post Type Models: Campaign-Related Posts and Political Donations (Sum and Count).

Figure 7. Post Type Models: Issue-Related Posts and Political Donations (Sum and Count).
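A minimal sketch of this specification follows, assuming a hypothetical candidate-week table; the column names and input file are invented for illustration, since the paper reports the model structure but keeps its code in the figshare repository cited below.

```python
# A sketch of the log-log, one-week-lag specification; column names are
# hypothetical, and `candidate_weeks.csv` is an invented stand-in file.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("candidate_weeks.csv")          # one row per candidate-week
df = df.sort_values(["candidate", "week"])
df["posts_lag"] = df.groupby("candidate")["posts"].shift(1)  # prior week's posting

model = smf.ols(
    "np.log1p(donation_sum) ~ np.log1p(posts_lag) + incumbent"
    " + np.log(population) + np.log1p(trends) + np.log1p(wiki_views)",
    data=df.dropna(subset=["posts_lag"]),
).fit()
print(model.summary())
```

In a log-log specification like this, coefficients read roughly as elasticities, which is how the percentage effects above are stated.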
Conclusion

Social media campaigning has normalized as a way for US political candidates of all backgrounds to promote their messages. 2017 marks near-complete penetration of the major social media platforms Twitter and Facebook by the 'political class'. Spurred by low barriers to entry, the march toward adoption by congressional candidates represents a uniformity unique among campaigning tools, simultaneously precluding any immediate conclusions about electoral implications and making these effects pressing to study.

This study extends existing approaches by examining the effects of different content categories on an important component of elections. Methodologically, it builds on both of these literatures by focusing on a novel output variable (donations), adopting a computational approach that avoids preconceived notions of post material, using candidates rather than politicians as the unit of analysis, and examining both Facebook and Twitter in assessing a candidate's social media presence. Through these extensions, we offer both evidence of thematic patterns in social media electoral campaigning and a further step toward quantification of the electoral payoffs of such campaigning.

Topic modeling of general election senatorial candidates' posts in the 2016 cycle reveals that posts discuss campaigning-related items or issue-related items, confirming that past classification methods based on more manual approaches indeed capture the spectrum of topics touched on by campaigning political figures online, and hinting that this typology of candidate content is applicable up and down the US ballot.

Regarding the effects of candidates' social media behavior, our conclusions hold practical implications in the realm of modern campaigning. First, they point to a clear utility of investing resources in social media as a means to acquire additional campaign resources: posting more frequently on social media appears to facilitate fundraising. Second, the findings hint that some types of content may be associated with a higher return on posting than others. While both campaign- and issue-related posts are positively associated with donations, their effects are roughly equivalent on donation sums when incumbency, population, and information-seeking behavior are controlled for, and campaign-related posts have a stronger effect on the total number of donation receipts.

Variation in the relative effects of specific types of content under alternative specifications of donation receipts suggests that future work should continue to distinguish sums from counts when assessing electorally related payoffs of social and other digital media; on a more practical level, it indicates that campaigns may find different types of content of greater utility depending on the sought-after payout. Through these insights, based on methodological innovation in an area with a shortage of empirical work, this work paves the way for continued investigation into how an Internet-mediated trend in political campaigning is shaping, and can be used to shape, a longstanding ingredient of electoral success.

One should note, however, that Twitter and Facebook users are a minority of society (Blank, Graham, & Calvino, 2017) and often not of the demographic that actively engages in electoral behavior. While this would not negate the findings, and the specific mechanisms behind the discovered relationship are not examined empirically herein, it raises the possibility that something other than user reception to candidate content on social media explains the relationship. However, multiple empirical looks have found that Twitter users are in fact the 'ideal subpopulation' with whom elites might desire to communicate, given that they are very likely to turn out to the polls, interested in politics, and wealthy enough to contribute to campaigns (Bode & Dalrymple, 2016).

Acknowledgements

We thank Laura Curlin for her assistance in gathering and cleaning the donation dataset and Jonathan Bright for insightful discussions.

Funding Statement

TY was partially supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1.

Declaration of Conflicting Interest

The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data and Replication Information

Meta-Document. The code used to access and analyze all data involved in this research may be found here: https://figshare.com/s/ff85b0a22fe470de5e80.

Supplementary Information. See below.

References

Auter, Z., & Fine, J. (2017). Social media campaigning: Mobilization and fundraising on Facebook. Social Science Quarterly. Advance online publication. doi:10.1111/ssqu.12391
Barber, B. R. (1998). Three scenarios for the future of technology and strong democracy. Political Science Quarterly, 113(4), 573-589.
Bar-Hen, A., Baschet, L., Jollois, F.-X., & Riou, J. (2016). Package 'WikipediaR'. Retrieved from https://cran.r-project.org/web/packages/WikipediaR/WikipediaR.pdf
Bimber, B. (2014). Digital media in the Obama campaigns of 2008 and 2012: Adaptation to the personalized political communication environment. Journal of Information Technology & Politics, 11(2), 130-150.
Blank, G., Graham, M., & Calvino, C. (2017). Local geographies of digital inequality. Social Science Computer Review. Advance online publication. doi:10.1177/0894439317693332
Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.
Bode, L. (2012). Facebooking it to the polls: A study in online social networking and political behavior. Journal of Information Technology & Politics, 9(4), 352-369.
Bode, L., Vraga, E. K., Borah, P., & Shah, D. V. (2014). A new space for political behavior: Political social networking and its democratic consequences. Journal of Computer-Mediated Communication, 19(3), 414-429.
Bode, L., & Dalrymple, K. E. (2016). Politics in 140 characters or less: Campaign communication, network interaction, and political participation on Twitter. Journal of Political Marketing, 15(4), 311-332.
Bode, L. (2016). Political news in the news feed: Learning politics from social media. Mass Communication and Society, 19(1), 24-48.
Bode, L., Lassen, D. S., Kim, Y. M., & Ridout, T. (2016). Coherent campaigns? Campaign broadcast and social messaging. Online Information Review, 40(5), 580-594.
Boulianne, S. (2015). Social media use and participation: A meta-analysis of current research. Information, Communication & Society, 18(5), 524-538.
Bright, J., Hale, S. A., Ganesh, B., Bulovsky, A., Margetts, H., & Howard, P. (2017). Does campaigning on social media make a difference? Evidence from candidate use of Twitter during the 2015 and 2017 UK elections. arXiv preprint arXiv:1710.07087.
Croson, R., & Shang, J. Y. (2008). The impact of downward social information on contribution decisions. Experimental Economics, 11(3), 221-233.
Evans, H. K., Cordova, V., & Sipole, S. (2014). Twitter style: An analysis of how House candidates used Twitter in their 2012 campaigns. PS: Political Science & Politics, 47(2), 454-462.
Gainous, J., & Wagner, K. M. (2014). Tweeting to power: The social media revolution in American politics. Oxford University Press.
Grant, J. T., & Rudolph, T. J. (2002). To give or not to give: Modeling individuals' contribution decisions. Political Behavior, 24(1), 31-54.
Hong, S. (2013). Who benefits from Twitter? Social media and political competition in the US House of Representatives. Government Information Quarterly, 30(4), 464-472.
Jackson, N., & Lilleker, D. (2011). Microblogging, constituency service and impression management: UK MPs and the use of Twitter. The Journal of Legislative Studies, 17(1), 86-105.
Jones, K., Noorbaloochi, S., Jost, J. T., Bonneau, R., Nagler, J., & Tucker, J. A. (2017). Liberal and conservative values: What we can learn from congressional tweets. Political Psychology. Advance online publication. doi:10.1111/pops.12415
Kalogeropoulos, A. (2017). Following politicians in social media. Retrieved from http://www.digitalnewsreport.org/survey/2017/following-politicians-social-media-2017/
Lin, Y. R., Kennedy, R., & Lazer, D. (2017). The geography of money and politics: Population density, social networks and political contributions. Research and Politics, 1-8.
Margetts, H., John, P., Hale, S. A., & Yasseri, T. (2015). Political turbulence: How social media shape collective action. Princeton University Press.
Margetts, H. (2017). Political behaviour and the acoustics of social media. Nature Human Behaviour, 1.
Margetts, H., John, P., Escher, T., & Reissfelder, S. (2011). Social information and political participation on the internet: An experiment. European Political Science Review, 3(3), 321-344.
Petrova, M., Sen, A., & Yildirim, P. (2016). Social media and political donations: New technology and incumbency advantage in the United States. Unpublished manuscript, Graduate School of Business, Stanford University, Stanford, CA.
Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. Handbook of Latent Semantic Analysis, 427(7), 424-440.
Traag, V. A. (2016). Complex contagion of campaign donations. PLoS ONE, 11(4), e0153539.
Tumasjan, A., Sprenger, T. O., Sandner, P. G., & Welpe, I. M. (2010). Predicting elections with Twitter: What 140 characters reveal about political sentiment. In Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, 10(1), 178-185.
Yasseri, T., & Bright, J. (2016). Wikipedia traffic data and electoral prediction: Towards theoretically informed models. EPJ Data Science, 5(1), 1-15.

Supplementary Information for Social Media, Money, and Politics: Campaign Finance in the 2016 US Congressional Cycle

Lily McElwee (1) and Taha Yasseri (1,2)
1 Oxford Internet Institute, University of Oxford, Oxford, UK
2 Alan Turing Institute, London, UK

Table of Contents

DATA: Collection (Donations; Social Media; Controls); Filtering (Donations; Social Media; Controls); Missing Data (Donations; Social Media); Candidate Information & Posts; Regression
TOPIC MODELING: Corpus Cleaning Operations; Topic Modeling Output - Words in Topics; Topic Modeling Description
DESCRIPTIVE STATISTICS: Overall Distributions; Per-Period (Week) Distributions; Scatterplots; Regression Model Diagnostics (Residual versus Fitted Plots; Normal Q-Q Plots); Correlation between Itemized and Unitemized Sums
CODE
REFERENCES
DATA

Collection

We collect data on the number and sum of donations received and on the social media activity conducted by general election senatorial candidates during the six-week period prior to Election Day in the 2016 cycle.
We pair social media data in a given period with donation data one period later to account for the temporal element in reactive donations and to get at causation rather than simple correlation. Hence, the ultimate dataset for regression analysis consists of five pairs of periods, with the social media data directly before the election not considered. Because the studied period is so recent, there is no central repository of such data, so accessing and classifying it form a major component of this study.

Donations. The Federal Election Commission (FEC) provides searchable donation records based on reports from registered campaign committees. While there are application programming interfaces (APIs) to access FEC data for a specific set of candidate campaign accounts, we collaborate with MapLight (https://maplight.org/) to retrieve the total sum and count of donations received by the candidates with campaign accounts in the five-week period prior to Election Day. The customized dataset includes 83 of 108 candidates; twenty-five candidates are missing donation information altogether across the five periods. Candidates are not required to file with the FEC if they receive or spend under $5,000, so the limited totals of these candidates are the most likely explanation for the absent records, especially since data collection falls sufficiently after reporting deadlines.

Social Media. We use raw Facebook posts and tweets as the basis for the social media variables. We manually search within each social media platform for candidate handles and check for additional accounts via Google. Many candidates have more than one Twitter or Facebook account; as an example, Hawaii Senator Brian Schatz possesses personal ('brianschatz'), office ('SenBrianSchatz'), and campaign ('SchatzforHawaii') accounts. For each candidate, we include all accounts found on a given platform to capture the entirety of his or her Twitter or Facebook presence. 96 of 108 candidates have a Twitter or Facebook presence. Among these, 86 have at least one Twitter profile and 78 have at least one Facebook profile; 35 have at least two Twitter profiles, and 28 have at least two Facebook profiles. With this set of handles, we use the Facebook Graph API (https://developers.facebook.com/docs/graph-api) and the Tweepy API package (http://www.tweepy.org/) in Python to collect the id, time and date, and text of each post for a given screen name. Combining the entire base of accessible posts results in 2,248 Facebook entries and 18,243 Twitter entries, totalling 20,491 posts.
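As an illustration of this collection step, here is a minimal Tweepy sketch for pulling one candidate's timelines. The credentials are placeholders, the handles are the Brian Schatz accounts mentioned above, and the authors' actual collection code (in the figshare repository) may differ.

```python
# Placeholder credentials; the handles are the Brian Schatz accounts
# noted above. Twitter's API caps user_timeline at roughly 3,200 tweets.
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

handles = ["brianschatz", "SenBrianSchatz", "SchatzforHawaii"]
posts = []
for handle in handles:
    # Collect id, timestamp, and text, mirroring the fields described above.
    for tweet in tweepy.Cursor(api.user_timeline, screen_name=handle,
                               tweet_mode="extended").items():
        posts.append({"id": tweet.id,
                      "created_at": tweet.created_at,
                      "text": tweet.full_text})
```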
Controls

Google Trends. We collect daily data on Google search trends over the past year for each candidate's full name via the gtrendsR package in R (https://cran.r-project.org/web/packages/gtrendsR/gtrendsR.pdf). The raw data range from 0 to 100, since search volume obtained through Google Trends represents the traffic for a specific keyword relative to all queries submitted in Google, normalized on a 0-100 scale (where 100 represents the peak of relative search volume obtained per keyword during a given period).

Wikipedia. We manually search for candidates' Wikipedia entries; of 108 candidates, 71 have dedicated Wikipedia articles. The list provides a basis for collecting daily data on Wikipedia page views for each available article, which we do in R through the WikipediaR package (Bar-Hen, Baschet, Jollois, & Riou, 2016; https://cran.r-project.org/web/packages/WikipediaR/WikipediaR.pdf).

Incumbency. We collect information on whether a candidate is an incumbent from Ballotpedia (https://ballotpedia.org/United_States_Senate_elections,_2016).

Population. For each candidate, we collect data on the population of their state from 2016 census figures.

Filtering

Donations. To address the fact that some candidates are missing donation records due to total sums below $5,000, we put donation data for all candidates on a $5,000 scale. In other words, if a candidate raised $10,000 in period three, the donation sum variable would be 2. This allows us to set the sum per period for the candidates without FEC records to zero, since their totals were below $5,000 in each period. This approach preserves as data points those individuals without an FEC record.

Social Media Volume. Raw tweets and Facebook posts serve as the basis for our first independent variable: social media volume per candidate-period. We aggregate the individual posts to get a count of how many times each candidate posted across platforms for each of the six periods prior to Election Day. For those without any social media presence, we set post volume to zero in order to preserve the observations in the dataset. This is preferable to eliminating the observations altogether, and any model should retain predictive power for very low levels of social media activity as well as for higher ones. Both filtering rules are sketched below.
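In pandas terms, the two filtering rules just described amount to something like the following sketch; the table and column names are hypothetical.

```python
# Hypothetical candidate-period table illustrating the filtering rules above.
import pandas as pd

df = pd.DataFrame({
    "candidate": ["A", "B", "C"],
    "donation_sum": [10000.0, None, 62580.0],   # None: no FEC record (< $5,000)
    "posts": [24.0, None, 37.0],                # None: no social media presence
})
df["donation_scaled"] = (df["donation_sum"] / 5000).fillna(0)  # $10,000 -> 2.0
df["posts"] = df["posts"].fillna(0)             # zero-fill keeps the observation
```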
Content. The scraped posts also serve as the basis for the second genre of independent variable: type of post. To classify posts by topic, we use latent Dirichlet allocation (LDA) topic modeling, which postulates a latent structure in a corpus and represents each document as a distribution over topics (Steyvers and Griffiths, 2007). The advantages of this unsupervised method over alternative classification approaches for political content are numerous. Because it does not require prior labeling or annotation of text, topic modeling detects topics rather than beginning with certain ones in mind, thereby avoiding the oversight of topics that may be present in the corpus. The automatic nature of topic modeling lends it scale: the approach is better able to deal with higher volumes of content than hand coding, and such volume increasingly characterizes studies of social media (Blei, 2012). On a practical level, classifying content via topic modeling involves cleaning the corpus of social media messages, determining the optimal number of topics, running the model, setting a threshold by which to classify a document as a certain topic, and developing a classification scheme by which to group topics thematically.

Corpus cleaning. Cleaning the data in preparation for topic modeling is an extensive process. After converting all words to lowercase, removing both punctuation and numbers, and stripping whitespace, we conduct steps that typically come prior to processing tokenized texts: stemming and stopword removal. The former trims all terms down to their morphological roots (changing 'november' to 'novemb', for instance), and the latter eliminates meaningless but frequent terms such as 'and', 'or', and 'but'. We then adopt a tailored, iterative process involving text checks; this requires removing domain-related terms, hex terms, nonwords, custom stop words, name and state hashtags, candidate names, state names, candidate handles, the top five most frequent terms (due to a cutoff at this value in term frequency), and terms that appear just once in the corpus. This process is summarized in Figure SI.1.

Figure SI.1. Corpus Cleaning Operations. Total character count in the Twitter and Facebook posts following each step in the cleaning process.

Controls

Google Search Trends. To retrieve absolute rather than relative numbers, we scale Google search trends using average monthly search volume data from Adwords (accessed by setting up an Adwords account and searching for each candidate's name at https://adwords.google.com/um/GetStarted/Home?__u=1110451079&__c=6445738689&authuser=0#oc). We create a ratio of average monthly search volume via Adwords to that via Google Search Trends by dividing the former value by the calculated monthly average Trends value based on a year's worth of data. For instance, Arn Menconi shows an average monthly Trends number of 56 over the past year and an average search volume of 1,600 via Adwords, so the relevant ratio would be 29. We then scale each daily view in the relevant week-long periods by this candidate-specific value. One caveat is that the scale is based on international views while we are specifically interested in US views, but the approach is reasonable given that most searches for US senatorial candidates are likely to come from the US.

Wikipedia Page Views. We aggregate the daily data by the relevant periods in order to have Wikipedia page view data for each of the candidate-period observations for which it is available. For those without a Wikipedia article, we set the number of views to zero for each period.

Incumbency and State Population. The datasets for these controls come pre-formatted and are annual, so they do not require any subsetting. We assume that population remains roughly similar across the weeks leading up to Election Day.
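The Adwords-based rescaling described above reduces to simple arithmetic; the sketch below uses the Arn Menconi figures from the text, while the daily series itself is invented.

```python
# Rescaling relative Google Trends values to approximate absolute volumes,
# using the Arn Menconi figures quoted above; the daily series is invented.
import pandas as pd

adwords_monthly = 1600      # absolute average monthly searches (Adwords)
trends_monthly = 56         # candidate's average monthly Trends value (0-100)
ratio = adwords_monthly / trends_monthly        # ~29, as in the text

daily_trends = pd.Series([40, 55, 70, 62])      # relative 0-100 daily values
daily_absolute = daily_trends * ratio           # approximate absolute volumes
```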
A full list of eliminated handles is provided here.

Profiles for which no Facebook data is accessible:
● Profiles (rather than pages):
○ ‘rscrumpton’
○ ‘ray.metcalfe.16’
○ ‘arnmenconi’
○ ‘robinlaverne’
○ ‘michael.k.workman’
○ ‘eric.navickas’
○ ‘edward.clifford.56’
○ ‘jerry.trudell.7’
● Archived/Not Available:
○ ‘wiesnersenate’
○ ‘heck4nevada’
○ ‘CallahanForOregon’
○ ‘senatortoomey’
○ ‘stonefonua’
○ ‘2016crisericson’
● Business Page:
○ ‘Mike-Crapo-For-US-Senate-286049384763373’
○ ‘Tom-Jones-for-US-Senate-1752874281645766’

Profiles for which no Twitter data is available:
● No tweets:
○ tonyg4senate
○ Callahan4Oregon
● No posts in time frame:
○ ArnMenconi
○ DickBlumenthal
○ SenEvanBayh
○ ShelbyForSenate

Candidate Information & Posts

The full dataset is listed as “2016 Senatorial Candidates: Candidate Information & Posts” in the following repository: https://figshare.com/s/4183e5df4e1b959701a5. Figure SI.2 provides a summary of included variables.

Figure SI.2. Variables in the Candidate Information & Posts Dataset.

Regression

The full dataset is listed as “2016 Senatorial Candidates: Social Media Volume/Type & Campaign Donations” in the following repository: https://figshare.com/s/e271230e5a52e7c1d57e. Figure SI.3 provides a summary of included variables.

Figure SI.3. Variables in the Regression Dataset.

TOPIC MODELING

Corpus Cleaning Operations

Here we describe in greater detail each cleaning step taken to prepare the corpus for topic modeling.

Standard Operations
● Lowercase. Convert all terms to lowercase.
● Punctuation. Remove standard punctuation terms.
● Numbers. Remove numbers.
● Stopwords. Remove standard English stop words. (The full list of stopwords used in the R operation is provided here: http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-stop-list/english.stop.)
● Whitespace. Convert multiple whitespace characters into a single blank (Feinerer, 2017).
● Stemming. Apply Porter’s stemming algorithm to stem words (Porter, 1980; documentation for the R text mining package to which the stemming operation belongs is available at https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf).

Custom Operations
● Domain words. Many of the posts include links. Because these are largely shortened Twitter links (e.g., ‘m2umndgs86’), they are not informative regarding the purpose of the post and therefore demand removal. We remove them by first identifying all the end components of the Twitter links, which involves extracting all links from each document using regex and removing all non-alphanumeric characters and numbers. Once we create a vector of all the domain ends, removing the full links is simple: the operation that removes special characters splits each full link into several components, so it suffices to include http://, https://, “t”, and “co” as nonword terms, such that they are removed when that command is run.
● Hex terms. These are used to indicate a backslash or other special characters but show up as xef or xbn. We remove them because they are meaningless.
● Nonwords. Any words observed in manual examination that do not add meaningful content, such as ‘schriok’ (a name), are removed. As noted above, this includes collections of letters created by the removal of numbers, links, and punctuation in the initial cleaning (such as “https”, “www”, and “ly”).
● Custom stop words. These are words that do not carry meaning but are also not proper terms such as names. They include terms such as “also”, “other”, “another”, “ill”, “im”, and “ive”. Some of them are created by the removal of punctuation in the first cleaning step.
● Candidate names (first and last). We remove the candidate’s first and last names, because they do not add meaningful information in the quest to identify patterns.
● Name/State Hashtags. Many candidates use hashtags to identify the race in which they are running, such as “#AZSen” by Democratic senatorial challenger Ann Kirkpatrick in Arizona and “laelex” by Republican challenger John Neely Kennedy in Louisiana. These terms do not offer meaningful information, since they vary by state/candidate.
● State names. We remove the names of specific states, such as “California”, because these do not help in the identification of patterns.
● Candidate handles. We remove the Twitter and Facebook handles of the candidates. We keep all other handles, such as realdonaldtrump, hillaryclinton, and drjillstein, since the appearance of these terms offers meaningful information – for instance, whether the candidates are responding to, supporting, or discussing federal candidates.
● Letters. We remove all individual letters of the alphabet.
● Most frequent terms. We remove the five most common terms in the corpus, based on a cutoff in frequency when all terms are examined. These are: “vote”, “senat”, “thank”, “today”, and “work”. Because these terms appear over and over again in the text, they do not improve the chances of distinguishing messages based on differential content.
● Least frequent terms. The least frequent terms do not aid the identification of patterns in the documents, so we remove all terms that appear just once in the corpus. This eliminates 10,183 terms, or 53.19% of the total terms.

On a technical level, removal of the 306 entries that are empty following cleaning means that the output of the topic modeling tool does not align directly, in terms of document numbers, with the posts listed in the original file. While we could simply remove from the original file the posts that display no text or become empty in the cleaning process, this would lose valuable information about the volume of activity by a candidate. Instead, after cleaning the data, we record each document’s original position in the full corpus. Alongside information about where the documents sit in the non-empty corpus, retaining this information allows us to align the output of the topic modeling tool with the correct row positions in the candidate information file. Hence, the candidate file with classification information retains 20,491 entries, but there are 306 cases in which no classification information whatsoever is provided, since these entries are empty either before or after cleaning.
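A compact illustration of this bookkeeping in R follows; the vectors are toy stand-ins, not project data.

```r
# Sketch: retain each document's original row position so that topic-model
# output, which sees only non-empty documents, aligns with the full post file.
cleaned <- c("great ralli", "", "healthcar plan", "")  # toy cleaned corpus
keep <- which(nchar(cleaned) > 0)       # positions of non-empty documents

model_topics <- c(3, 7)                 # hypothetical topic labels for kept docs
labels <- rep("none", length(cleaned))  # empty documents default to 'none'
labels[keep] <- model_topics
labels  # "3" "none" "7" "none" -- aligned with the original row order
```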
Topic Modeling Output – Words in Topics

Figure SI.4. Top 10 Words Associated with Each Topic.

Topic Modeling Description

Among the most important steps in identifying the content of a document is determining which topics are addressed by that document (Griffiths and Steyvers, 2004). A given document, or message, can deal with multiple topics, and the words appearing in it reflect the particular set of topics it addresses. Latent Dirichlet allocation, the approach used in this paper, is an unsupervised statistical method that views topics as probability distributions over words and treats documents as probabilistic combinations of these topics, in order to model the contributions of different topics to a given document (Griffiths and Steyvers, 2004). This version of topic modeling involves the user entering a collection of documents and specifying a number of topics; hence, a major part of the process is cleaning the data – notably, data preparation has been reported to occupy 50-80% of a typical data scientist’s work (Lohr, 2014) – and then determining the optimum number of topics. As mentioned earlier, topic modeling is not a common approach in online political content analysis, perhaps due to the greater accessibility of approaches such as keyword search and hand coding.

We remove original and cleaned posts that have no text. Null values arise either because some original posts contain no text (perhaps because they consist of just a picture) or because the cleaning process drastically shortens them. Empty posts amount to 306 of 20,491 documents. While these are removed for topic modeling, since they do not add valuable qualitative information, we include them as part of the overall post volume for a given candidate-period and record them as ‘none’ type because they do not belong to a specific topic – a practice that minimizes extra categories for analysis and has precedent in the topic modeling literature.

Altogether, the cleaning operations reduce the total number of words from 507,222 in the uncleaned message content to 185,010 in the cleaned corpus. Across platforms, the original messages range from 3 to 6,603 characters, with a mean of 144 and a median of 139. The cleaned messages range from zero characters (inclusive of the empty posts) to 3,396, with a mean of 61 and a median of 54. The average Facebook message is shortened from 258 characters to 134, while the average Twitter message is shortened from 130 characters to 52. With a cleaned corpus in hand, we identify topics and classify each post accordingly.

Topic Number Optimization. To identify an optimum number of topics, we use a function that loops over different topic numbers, calculates the log-likelihood of the model at each number of topics, and plots the result. We chose to assess the log-likelihood between 2 and 20 topics in order to keep the number manageable for analysis, although there is no best practice with regard to the proportion of topics to words. The function used to create Figure SI.5 estimates an LDA model using 2000 iterations of Gibbs sampling on the document term matrix (all non-empty documents in the corpus), with an ‘nstart’ value specifying the number of random restarts used during optimization (five here) and a seed vector containing as many entries as there are random restarts (Forte et al., 2017).

Figure SI.5. Log-likelihood of data per topic number (2-20). Optimal number of topics is 10, based on the ‘elbow’ detection method.

We use the ‘elbow detection’ method to identify the optimum number of topics, a technique with precedent in Twitter topic modeling (Steinskog, Therkelsen, & Gambäck, 2017). As the marginal improvements in log-likelihood appear to decrease around 10 topics, we use this value as a cutoff. A sketch of the optimization loop is given below.
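The following sketch assumes the topicmodels package in R and a DocumentTermMatrix named dtm over the full non-empty corpus (the toy matrix from the cleaning sketch above is too small for a meaningful run); the exact control settings of the original run may differ.

```r
# Sketch: fit an LDA model via Gibbs sampling at each candidate topic number
# and record the log-likelihood (topicmodels package assumed).
library(topicmodels)

ks <- 2:20
loglik <- sapply(ks, function(k) {
  fit <- LDA(dtm, k = k, method = "Gibbs",
             control = list(iter = 2000, nstart = 5,
                            seed = list(1, 2, 3, 4, 5)))  # one seed per restart
  as.numeric(logLik(fit))
})
plot(ks, loglik, type = "b",
     xlab = "Number of topics", ylab = "Log-likelihood")  # look for the elbow
```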
Having established 10 as the optimum number of topics, we specify this as a parameter in the graphical user interface tool and run the model (the topic modeling tool utilized may be found here: https://code.google.com/archive/p/topic-modeling-tool/).

Classifying Documents. The process results in ten collections of words. Consistent with precedent (Messing, Franco, Wilkins, Cable, and Warshauer, 2014), topic labels are selected according to the words appearing most frequently in the posts classified under a specific topic. Figure SI.6 offers a summary.

Figure SI.6. Topic Modeling Output: Qualitative Description. Brief description of the ten topics discovered through modeling.

The algorithm represents each document as a single topic or as shares of several topics. The topic used to classify each document is the one capturing the highest share for that document. Prior to classifying documents into topics, we define a threshold above which to consider classifications by plotting the distribution of all shares and of highest shares. The distribution of highest shares (Figure SI.7, left) shows a first peak and natural cutoff at roughly 0.25, below which any shares can be inferred as noise. This is corroborated by examination of all shares (Figure SI.7, right), in which there is a peak at 0.24-0.25. Hence, we select 0.25 as the threshold for classifying documents according to the topic capturing the highest share, and we set all values below 0.25 to zero. While there are 106,800 nonzero topic shares in the original output of the topic modeling algorithm, this falls to 52,747 when the threshold of 0.25 is applied. The threshold leaves 1,766 documents whose maximum share falls below 0.25 and which therefore cannot be classified according to any of the topics; similar to posts with no text after the cleaning process, these are classified in the ‘none’ category.

Figure SI.7. Distribution of Shares Captured by Maximum Topic (Left) and Each Topic (Right). Each document is represented as a probabilistic combination of topics. All shares are provided on the right, showing a natural cutoff at 0.24-0.25. Plotting the highest share characterizing each document (left) confirms a natural cutoff at 0.25.

Grouping Topics. Analyzing similarity between topics provides a basis for grouping individual topics and deriving a meaningful hypothesis about the relationship between fundraising and specific types of content. We use cosine similarity to measure the similarity of the automatically generated topics in the LDA model, an approach with precedent in thematic topic aggregation (Messing et al., 2014). The metric runs between -1 and 1: smaller angles (more similar topics) produce values closer to 1, larger angles (dissimilar topics) produce values nearer to 0, and opposite topics are represented by a value of -1. Hence, a higher cosine similarity indicates a closer connection between topics. Figure SI.8 shows the distribution of cosine similarities across topics. Ignoring comparisons of a topic with itself, the range is 0.18 to 0.28 with a mean and median of 0.24.

Figure SI.8. Distribution of Cosine Similarity. Dramatic cutoff at 0.23, below which the cosine similarity values (values measuring similarity between two topics) can be inferred as noise.

The histogram shows a rather dramatic cutoff at 0.23, below which the cosine similarity values can be inferred as noise.
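For reference, a sketch of how such pairwise similarities can be computed in R from a fitted model's topic-word distributions; the refit at k = 10 stands in for the GUI tool's output and is an assumption, not the exact computation used here.

```r
# Sketch: cosine similarity between the rows of the topic-word matrix.
library(topicmodels)
fit <- LDA(dtm, k = 10, method = "Gibbs")  # stand-in for the GUI tool's model
topic_word <- posterior(fit)$terms         # topics x terms probability matrix

cosine_sim <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

n <- nrow(topic_word)
sim <- diag(n)  # similarity of a topic with itself is 1
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    sim[i, j] <- sim[j, i] <- cosine_sim(topic_word[i, ], topic_word[j, ])
  }
}
```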
We convert all weightings below this threshold to zero in order to identify meaningful variation at the higher end of the spectrum. This leaves 30 unique pairs (of the original 45) for which the cosine similarity is above 0.23. Setting the weights for all other pairs to zero, we visualize the topic similarity for the pairs above the cutoff in the graph visualization platform Gephi (documentation: https://gephi.org/). The full network is composed of 10 nodes and 30 edges. Analyzing modularity via the Louvain community detection algorithm (Blondel, Guillaume, Lambiotte, and Lefebvre, 2008), we are able to detect communities of topics that are more similar to each other than to the rest. Modularity is maximized at a resolution of 1.0, where it equals 0.07; at a resolution of 0.5 it is 0.06 (with three communities detected), and anywhere upwards of 1.5 it is 0 (with one community detected). At resolution 1.0, two communities are detected, as indicated by the color distinction in Figure SI.9.

Figure SI.9. Topic Network Graph. Based on cosine similarity weightings, the Louvain community detection method discovers two communities of topics, which qualitative examination shows to group thematically into campaigning-related (green) and issue-related (pink) posts.

Visually, it is evident that there are two main clusters of topics. Topics 1, 3, 4, 8, and 9 form one community, while topics 2, 5, 6, 7, and 10 form another. The former group all have campaigning-related elements, while the latter topics relate to policy issues, so the distinction enables us to test the broader hypothesis about the effect certain types of social media content have on fundraising success. In order to form a variable for regression analysis, we calculate the number of a given candidate’s posts per period that are classifiable as campaign- and issue-related. For each candidate-period, this involves summing the posts in topics 1, 3, 4, 8, and 9 for campaign-related posts and the posts in topics 2, 5, 6, 7, and 10 for issue-related posts, as in the sketch below.
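A base-R sketch of this aggregation step; the posts data frame and its column names are hypothetical stand-ins for the project's post-level classification file.

```r
# Sketch: count campaign- and issue-related posts per candidate-period.
posts <- data.frame(candidate = c("A", "A", "A", "B"),
                    period    = c(1, 1, 2, 1),
                    topic     = c(3, 5, NA, 9))  # NA marks the 'none' type

campaign_topics <- c(1, 3, 4, 8, 9)
issue_topics    <- c(2, 5, 6, 7, 10)

posts$campaign <- as.integer(posts$topic %in% campaign_topics)  # NA -> 0
posts$issue    <- as.integer(posts$topic %in% issue_topics)

counts <- aggregate(cbind(campaign, issue) ~ candidate + period,
                    data = posts, FUN = sum)
counts
```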
DESCRIPTIVE STATISTICS

Overall Distributions

Distributions: Donation Sums and Counts

Figures SI.10. Per-Period Donation Sums to US 2016 Senatorial Candidates ($5000s): Linear (Left) and Log (Right). The distribution of donation sums per candidate-period is right-skewed and log-normal.

Figures SI.11. Per-Period Donation Counts to US 2016 Senatorial Candidates (#): Linear (Left) and Log (Right). The distribution of donation counts per candidate-period is right-skewed and log-normal.

Distributions: Twitter and Facebook Posts

Figures SI.12. Weekly Posts (Twitter + Facebook) by US 2016 Senate Candidates: Linear (Left) and Log-Transformed (Right). The distribution of posts across all platforms per candidate-period is right-skewed and log-normal.

Figures SI.13. Weekly Posts (Twitter) by US 2016 Senate Candidates: Linear (Left) and Log-Transformed (Right). The distribution of tweets per candidate-period is right-skewed and log-normal.

Figures SI.14. Weekly Posts (Facebook) by US 2016 Senate Candidates: Linear (Left) and Log-Transformed (Right). The distribution of Facebook posts per candidate-period is right-skewed and log-normal.

Distributions: Social Media Posts per Type

Figures SI.15. Average Campaign-Related Posts Per Period: Linear (Left) and Log-Transformed (Right). The average number of campaign-related posts per candidate across the six periods is log-normal.

Figures SI.16. Average Issue-Related Posts Per Period: Linear (Left) and Log-Transformed (Right). The average number of issue-related posts per candidate across the six periods is log-normal.

Distributions: Google Search

Figures SI.17. Weekly Google Search Trends Per Candidate-Period: Linear and Log-Transformed. The distribution of Google search trends per candidate-period is right-skewed and log-normal.

Distributions: Wikipedia Page Views

Figures SI.18. Wikipedia Page Views, Per Candidate-Period: Linear and Log-Transformed. The distribution of Wikipedia page views per candidate-period is right-skewed and log-normal.

Per-Period (Week) Distributions: Google Search Trends and Wikipedia Page Views

Figure SI.19. Log-Transformed Distributions of Weekly Google Search Trends Across All Candidates, by Period.

Figure SI.20. Log-Transformed Distributions of Weekly Wikipedia Page Views Across All Candidates, by Period.

Scatterplots

Scatterplots reveal non-linear relationships when each independent variable is plotted against each dependent variable. Specifically, “residuals with a bow shape and increasing variability indicate that a log transformation...is required” (Wonnacott and Wonnacott, 1972, p. 463). The graphs below show the benefit of log transformation.

Figure SI.21. Scatterplot of Post Volume versus Donations, Linear (Left) and Log-Transformed (Right). The relationship is not linear in lin-lin form, suggesting the need for log transformation.

Figure SI.22. Scatterplot of Campaign-Related Posts versus Donations, Linear (Left) and Log-Transformed (Right). The relationship is not linear in lin-lin form, suggesting the need for log transformation.

Figure SI.23. Scatterplot of Issue-Related Posts versus Donations, Linear (Left) and Log-Transformed (Right). The relationship is not linear in lin-lin form, suggesting the need for log transformation.

Regression Model Diagnostics

Residual versus Fitted Plots

Residual versus fitted plots reveal whether residuals exhibit non-linear patterns. Residuals for models based on log-transformed data appear to be relatively evenly spread with no clear patterns, which indicates a linear relationship.

Figure SI.24. Residuals versus Fitted, with IV=Post Volume.

Figure SI.25. Residuals versus Fitted, with IV=Campaign-Related Posts.

Figure SI.26. Residuals versus Fitted, with IV=Issue-Related Posts.

Normal Q-Q Plots

Normal Q-Q plots based on log-transformed data show that the residuals are normally distributed, as the points lie roughly on the straight diagonal lines.

Figure SI.27. QQ-Normal Plots, with IV=Post Volume.

Figure SI.28. QQ-Normal Plots, with IV=Campaign-Related Posts (Models 7-12) and IV=Issue-Related Posts.
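Diagnostics of this kind can be reproduced with base R's plot method for fitted linear models. The sketch below is illustrative only: the variable names, toy values, and the log(x + 1) transformation (used to accommodate zero values) are assumptions, not necessarily the exact specification behind Figures SI.24-SI.28.

```r
# Sketch: fit a log-log model and draw the two diagnostic plots shown above.
reg <- data.frame(donation_sum = c(0, 2, 14, 55, 130),   # toy values ($5000s)
                  post_volume  = c(0, 5, 40, 160, 400))  # toy post counts

model <- lm(log1p(donation_sum) ~ log1p(post_volume), data = reg)

plot(model, which = 1)  # residuals versus fitted
plot(model, which = 2)  # normal Q-Q
```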
Correlation between Itemized and Unitemized Sums

To determine whether the unitemized sums can be taken as a proxy for the full value received by candidates, we examine the relationship between unitemized and itemized contributions over the 2016 cycle for each candidate with a campaign account. This involves collecting data on the itemized and unitemized receipts manually through the FEC website: https://www.fec.gov/data/. There are a total of 78 candidates with both sums listed for the 2016 cycle. The two variables are highly positively correlated, with a Pearson’s correlation coefficient of 0.71.

CODE

In the interest of replicability, the code used to access and analyze all data involved in this research may be found here: https://figshare.com/s/ff85b0a22fe470de5e80.

REFERENCES

Bar-Hen, A., Baschet, L., Francois-Xavier, J., & Riou, J. (2016). Package ‘WikipediaR’. Retrieved from https://cran.r-project.org/web/packages/WikipediaR/WikipediaR.pdf

Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77-84.

Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.

Feinerer, I. (2017). Introduction to the ‘tm’ package. Retrieved from https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf

Forte, R., Mayor, E., & Fischetti, T. (2017). R: Predictive analysis (1st ed.). Packt Publishing.

Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl 1), 5228-5235.

Lohr, S. (2014). For big-data scientists, ‘janitor work’ is key hurdle to insights. The New York Times. Retrieved from https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html

Messing, S., Franco, A., Wilkins, A., Cable, D., & Warshauer, M. (2014). Campaign rhetoric and style on Facebook in the 2014 US midterms [Blog post]. Retrieved from https://www.facebook.com/notes/facebook-data-science/campaign-rhetoric-and-style-on-facebook-in-the-2014-us-midterms/10152581594083859/

Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3), 130-137.

Steinskog, A. O., Therkelsen, J. F., & Gambäck, B. (2017, May). Twitter topic modeling by tweet aggregation. In Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden (No. 131, pp. 77-86). Linköping University Electronic Press.

Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. In Handbook of latent semantic analysis (pp. 424-440).

Wonnacott, T. H., & Wonnacott, R. J. (1972). Introductory statistics. New York: Wiley.