R Question

Programming

California University of Management and Sciences

Question Description

  1. Return to the visualization for Presidential Elections: Popular and Electoral College margins, subset by party, and use that to add color to your points.
  2. Recreate figure 5.28 using functions from the dplyr library.
  3. Using gss_sm data, calculate the mean and median number of children by degree.
  4. Using gapminder data, create a boxplot of life expectancy over time.
  5. Using gapminder data, create a violin plot of population over time.
  6. Using the gapminder data, create a plot comparing log(gdpPercap) with lifeExp, and show three different smoothers in three different colors, with a legend showing each smoother type.
  7. In a paragraph, compare and contrast the smoother types: LOESS, cubic spline, and OLS.
  8. Look at the gapminder data with str().
  9. Create a linear model of the gapminder data with life expectancy as the target of a multifactor model built from gdpPercap, pop, and continent. Store it in a variable called 'out'.
  10. Print a summary of out.
  11. Notice that printing a summary of gapminder produces different results, because summary() knows that out is the output of a linear model.
  12. Use min() and max() to get the minimum and maximum values of per capita GDP, and create a vector of 100 evenly spaced elements between them, while holding population constant at its median and showing the values by continent using a vector.
  13. Use predict() to calculate the fitted values for every row in the data frame and show the upper and lower bounds of a 95% confidence interval. Store the result in a variable called predi_out.
  14. Use cbind() to bind the two data frames together by column.
  15. Look at the top six rows of the result with head().
  16. Make an OLS plot of the combined data frames after subsetting continent to Africa and Europe, using geom_ribbon to show the prediction intervals.
  17. What does the alpha aesthetic do?
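
Items 4 through 6 can be sketched roughly as follows. This is one possible approach rather than a model answer: the object names (p_box, p_violin, p_smooth, smoother_colors), the hex colors, and the log scale on the population axis are assumptions made for illustration, and it assumes the ggplot2 and gapminder packages are installed.

```r
library(ggplot2)
library(gapminder)

# Item 4: life expectancy by survey year. year is numeric, so it is
# converted to a factor to get one box per year.
p_box <- ggplot(gapminder, aes(x = factor(year), y = lifeExp)) +
  geom_boxplot() +
  labs(x = "Year", y = "Life expectancy (years)")

# Item 5: population by survey year. The log10 scale is an assumption
# of this sketch; populations span several orders of magnitude.
p_violin <- ggplot(gapminder, aes(x = factor(year), y = pop)) +
  geom_violin() +
  scale_y_log10() +
  labs(x = "Year", y = "Population (log scale)")

# Item 6: three smoothers on one plot. Mapping color and fill to a
# constant string inside aes() is what makes each smoother show up
# in the legend. The colors themselves are arbitrary choices.
smoother_colors <- c("OLS" = "#E41A1C",
                     "Cubic Spline" = "#377EB8",
                     "LOESS" = "#4DAF4A")

p_smooth <- ggplot(gapminder, aes(x = log(gdpPercap), y = lifeExp)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", formula = y ~ x,
              aes(color = "OLS", fill = "OLS")) +
  geom_smooth(method = "lm", formula = y ~ splines::bs(x, df = 3),
              aes(color = "Cubic Spline", fill = "Cubic Spline")) +
  geom_smooth(method = "loess", formula = y ~ x,
              aes(color = "LOESS", fill = "LOESS")) +
  scale_color_manual(name = "Smoother", values = smoother_colors) +
  scale_fill_manual(name = "Smoother", values = smoother_colors) +
  theme(legend.position = "top")

print(p_box)
print(p_violin)
print(p_smooth)
```

For the comparison in item 7, it may help to keep in mind what each layer fits: OLS draws a single straight line through all the data, the cubic spline joins piecewise cubic polynomials smoothly, and LOESS fits many local regressions and so follows local bends most closely.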
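
Items 8 through 16 form a single chain, so a minimal sketch of the whole workflow may be easier to follow than separate fragments. The names out and predi_out come from the question; pred_df, min_gdp, max_gdp, med_pop, keep, and p_pred are hypothetical names chosen here. The question mentions both a confidence interval (item 13) and prediction intervals (item 16); this sketch assumes interval = "prediction", and swapping in "confidence" would give a narrower band around the fitted line.

```r
library(ggplot2)
library(gapminder)

# Item 8: inspect the structure of the data
str(gapminder)

# Items 9-10: fit the model and print its summary
out <- lm(lifeExp ~ gdpPercap + pop + continent, data = gapminder)
summary(out)

# Item 11: summary() dispatches on class, so a data frame gets a
# different kind of summary than an lm object
summary(gapminder)

# Item 12: a grid of 100 GDP values per continent, with population
# held at its median
min_gdp <- min(gapminder$gdpPercap)
max_gdp <- max(gapminder$gdpPercap)
med_pop <- median(gapminder$pop)

pred_df <- expand.grid(
  gdpPercap = seq(from = min_gdp, to = max_gdp, length.out = 100),
  pop = med_pop,
  continent = levels(gapminder$continent)
)

# Item 13: fitted values with lower and upper bounds (level = 0.95 is
# the default). interval = "prediction" is an assumption; see above.
predi_out <- predict(out, newdata = pred_df, interval = "prediction")

# Items 14-15: bind by column and look at the first six rows
pred_df <- cbind(pred_df, predi_out)
head(pred_df)

# Item 16: fitted lines with ribbons for Africa and Europe only
keep <- c("Africa", "Europe")
p_pred <- ggplot(subset(pred_df, continent %in% keep),
                 aes(x = gdpPercap, y = fit, ymin = lwr, ymax = upr,
                     color = continent, fill = continent,
                     group = continent)) +
  geom_point(data = subset(gapminder, continent %in% keep),
             aes(x = gdpPercap, y = lifeExp, color = continent),
             alpha = 0.5, inherit.aes = FALSE) +
  geom_line() +
  geom_ribbon(alpha = 0.2, color = NA) +
  scale_x_log10()

print(p_pred)
```

On item 17: alpha controls opacity, from 0 (fully transparent) to 1 (fully opaque). Here alpha = 0.2 on the ribbon lets the points and lines underneath stay visible, and alpha = 0.5 on the points makes overplotted regions easier to read.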

Unformatted Attachment Preview

Data Visualization: A Practical Introduction

Kieran Healy

Princeton University Press, Princeton and Oxford

© 2019 Princeton University Press. Published by Princeton University Press, 41 William Street, Princeton, New Jersey 08540; 6 Oxford Street, Woodstock, Oxfordshire OX20 1TR. press.princeton.edu. All Rights Reserved. Library of Congress Control Number: 2018935810. ISBN 978-0-691-18161-5. ISBN (pbk.) 978-0-691-18162-2. British Library Cataloging-in-Publication Data is available. This book has been composed with open-source tools in Minion Pro, Myriad Pro, and Iosevka Type. Printed on acid-free paper. Printed in the United States of America.

For the Llamanteriat, who saw it first.

Contents

Preface: What You Will Learn; The Right Frame of Mind; How to Use This Book; Conventions; Before You Begin
1 Look at Data: 1.1 Why Look at Data?; 1.2 What Makes Bad Figures Bad?; 1.3 Perception and Data Visualization; 1.4 Visual Tasks and Decoding Graphs; 1.5 Channels for Representing Data; 1.6 Problems of Honesty and Good Judgment; 1.7 Think Clearly about Graphs; 1.8 Where to Go Next
2 Get Started: 2.1 Work in Plain Text, Using RMarkdown; 2.2 Use R with RStudio; 2.3 Things to Know about R; 2.4 Be Patient with R, and with Yourself; 2.5 Get Data into R; 2.6 Make Your First Figure; 2.7 Where to Go Next
3 Make a Plot: 3.1 How Ggplot Works; 3.2 Tidy Data; 3.3 Mappings Link Data to Things You See; 3.4 Build Your Plots Layer by Layer; 3.5 Mapping Aesthetics vs Setting Them; 3.6 Aesthetics Can Be Mapped per Geom; 3.7 Save Your Work; 3.8 Where to Go Next
4 Show the Right Numbers: 4.1 Colorless Green Data Sleeps Furiously; 4.2 Grouped Data and the “Group” Aesthetic; 4.3 Facet to Make Small Multiples; 4.4 Geoms Can Transform Data; 4.5 Frequency Plots the Slightly Awkward Way; 4.6 Histograms and Density Plots; 4.7 Avoid Transformations When Necessary; 4.8 Where to Go Next
5 Graph Tables, Add Labels, Make Notes: 5.1 Use Pipes to Summarize Data; 5.2 Continuous Variables by Group or Category; 5.3 Plot Text Directly; 5.4 Label Outliers; 5.5 Write and Draw in the Plot Area; 5.6 Understanding Scales, Guides, and Themes; 5.7 Where to Go Next
6 Work with Models: 6.1 Show Several Fits at Once, with a Legend; 6.2 Look Inside Model Objects; 6.3 Get Model-Based Graphics Right; 6.4 Generate Predictions to Graph; 6.5 Tidy Model Objects with Broom; 6.6 Grouped Analysis and List Columns; 6.7 Plot Marginal Effects; 6.8 Plots from Complex Surveys; 6.9 Where to Go Next
7 Draw Maps: 7.1 Map U.S. State-Level Data; 7.2 America’s Ur-choropleths; 7.3 Statebins; 7.4 Small-Multiple Maps; 7.5 Is Your Data Really Spatial?; 7.6 Where to Go Next
8 Refine Your Plots: 8.1 Use Color to Your Advantage; 8.2 Layer Color and Text Together; 8.3 Change the Appearance of Plots with Themes; 8.4 Use Theme Elements in a Substantive Way; 8.5 Case Studies; 8.6 Where to Go Next
Acknowledgments
Appendix: 1 A Little More about R; 2 Common Problems Reading in Data; 3 Managing Projects and Files; 4 Some Features of This Book
References
Index

Preface

You should look at your data. Graphs and charts let you explore and learn about the structure of the information you collect. Good data visualizations also make it easier to communicate your ideas and findings to other people. Beyond that, producing effective plots from your own data is the best way to develop a good eye for reading and understanding graphs—good and bad—made by others, whether presented in research articles, business slide decks, public policy advocacy, or media reports. This book teaches you how to do it. My main goal is to introduce you to both the ideas and the methods of data visualization in a sensible, comprehensible, reproducible way.
Some classic works on visualizing data, such as The Visual Display of Quantitative Information (Tufte 1983), present numerous examples of good and bad work together with some general taste-based rules of thumb for constructing and assessing graphs. In what has now become a large and thriving field of research, more recent work provides excellent discussions of the cognitive underpinnings of successful and unsuccessful graphics, again providing many compelling and illuminating examples (Ware 2008). Other books provide good advice about how to graph data under different circumstances (Cairo 2013; Few 2009; Munzer 2014) but choose not to teach the reader about the tools used to produce the graphics they show. This may be because the software used is some (proprietary, costly) point-and-click application that requires a fully visual introduction of its own, such as Tableau, Microsoft Excel, or SPSS. Or perhaps the necessary software is freely available, but showing how to use it is not what the book is about (Cleveland 1994). Conversely, there are excellent cookbooks that provide code “recipes” for many kinds of plot (Chang 2013). But for that reason they do not take the time to introduce the beginner to the principles behind the output they produce. Finally, we also have thorough introductions to particular software tools and libraries, including the ones we will use in this book (Wickham 2016). These can sometimes be hard for beginners to digest, as they may presuppose a background that the reader does not have.

Each of the books I have just cited is well worth your time. When teaching people how to make graphics with data, however, I have repeatedly found the need for an introduction that motivates and explains why you are doing something but that does not skip the necessary details of how to produce the images you see on the page. And so this book has two main aims.
First, I want you to get to the point where you can reproduce almost every figure in the text for yourself. Second, I want you to understand why the code is written the way it is, such that when you look at data of your own you can feel confident about your ability to get from a rough picture in your head to a high-quality graphic on your screen or page.

What You Will Learn

This book is a hands-on introduction to the principles and practice of looking at and presenting data using R and ggplot. R is a powerful, widely used, and freely available programming language for data analysis. You may be interested in exploring ggplot after having used R before or be entirely new to both R and ggplot and just want to graph your data. I do not assume you have any prior knowledge of R.

After installing the software we need, we begin with an overview of some basic principles of visualization. We focus not just on the aesthetic aspects of good plots but on how their effectiveness is rooted in the way we perceive properties like length, absolute and relative size, orientation, shape, and color. We then learn how to produce and refine plots using ggplot2, a powerful, versatile, and widely used visualization package for R (Wickham 2016). The ggplot2 library implements a “grammar of graphics” (Wilkinson 2005). This approach gives us a coherent way to produce visualizations by expressing relationships between the attributes of data and their graphical representation.

Through a series of worked examples, you will learn how to build plots piece by piece, beginning with scatterplots and summaries of single variables, then moving on to more complex graphics.
Topics covered include plotting continuous and categorical variables; layering information on graphics; faceting grouped data to produce effective “small multiple” plots; transforming data to easily produce visual summaries on the graph such as trend lines, linear fits, error ranges, and boxplots; creating maps; and some alternatives to maps worth considering when presenting country- or state-level data. We will also cover cases where we are not working directly with a dataset but rather with estimates from a statistical model. From there, we will explore the process of refining plots to accomplish common tasks such as highlighting key features of the data, labeling particular items of interest, annotating plots, and changing their overall appearance. Finally we will examine some strategies for presenting graphical results in different formats and to different sorts of audiences.

If you follow the text and examples in this book, then by the end you will

  • understand the basic principles behind effective data visualization;
  • have a practical sense for why some graphs and figures work well, while others may fail to inform or actively mislead;
  • know how to create a wide range of plots in R using ggplot2; and
  • know how to refine plots for effective presentation.

Learning how to visualize data effectively is more than just knowing how to write code that produces figures from data. This book will teach you how to do that. But it will also teach you how to think about the information you want to show, and how to consider the audience you are showing it to—including the most common case, when the audience is yourself.

This book is not a comprehensive guide to R, or even a comprehensive survey of everything ggplot can do. Nor is it a cookbook containing just examples of specific things people commonly want to do with ggplot. (Both these sorts of books already exist: see the references in the appendix.)
Neither is it a rigid set of rules, or a sequence of beautiful finished examples that you can admire but not reproduce. My goal is to get you quickly up and running in R, making plots in a well-informed way, with a solid grasp of the core sequence of steps—taking your data, specifying the relationship between variables and visible elements, and building up images layer by layer—that is at the heart of what ggplot does.

Learning ggplot does mean getting used to how R works, and also understanding how ggplot connects to other tools in the R language. As you work your way through the book, you will gradually learn more about some very useful idioms, functions, and techniques for manipulating data in R. In particular you will learn about some of the tools provided by the tidyverse library that ggplot belongs to. Similarly, although this is not a cookbook, once you get past chapter 1 you will be able to see and understand the code used to produce almost every figure in the book. In most cases you will also see these figures built up piece by piece, a step at a time. If you use the book as it is designed, by the end you will have the makings of a version of the book itself, containing code you have written out and annotated yourself. And though we do not go into great depth on the topic of rules or principles of visualization, the discussion in chapter 1 and its application throughout the book gives you more to think about than just a list of graph types. By the end of the book you should be able to look at a figure and be able to see it in terms of ggplot’s grammar, understanding how the various layers, shapes, and data are pieced together to make a finished plot.

The Right Frame of Mind

It can be a little disorienting to learn a programming language like R, mostly because at the beginning there seem to be so many pieces to fit together in order for things to work properly. It can seem like you have to learn everything before you can do anything.
The language has some possibly unfamiliar concepts that define how it works, like “object,” “function,” or “class.” The syntactic rules for writing code are annoyingly picky. Error messages seem obscure; help pages are terse; other people seem to have had not quite the same issue as you. Beyond that, you sense that doing one thing often involves learning a bit about some other part of the language. To make a plot you need a table of data, but maybe you need to filter out some rows, recalculate some columns, or just get the computer to see it is there in the first place. And there is also a wider environment of supporting applications and tools that are good to know about but involve new concepts of their own—editors that highlight what you write; applications that help you organize your code and its output; ways of writing your code that let you keep track of what you have done. It can all seem a bit confusing.

Don’t panic. You have to start somewhere. Starting with graphics is more rewarding than some of the other places you might begin, because you will be able to see the results of your efforts very quickly. As you build your confidence and ability in this area, you will gradually see the other tools as things that help you sort out some issue or solve a problem that’s stopping you from making the picture you want. That makes them easier to learn. As you acquire them piecemeal—perhaps initially using them without completely understanding what is happening—you will begin to see how they fit together and be more confident of your own ability to do what you need to do.

Even better, in the past decade or so the world of data analysis and programming generally has opened up in a way that has made help much easier to come by. Free tools for coding have been around for a long time, but in recent years what we might call the “ecology of assistance” has gotten better.
There are more resources available for learning the various pieces, and more of them are oriented to the way writing code actually happens most of the time—which is to say, iteratively, in an error-prone fashion, and taking account of problems other people have run into and solved before.

How to Use This Book

This book can be used in any one of several ways. At a minimum, you can sit down and read it for a general overview of good practices in data visualization, together with many worked examples of graphics from their beginnings to a properly finished state. Even if you do not work through the code, you will get a good sense of how to think about visualization and a better understanding of the process through which good graphics are produced.

More useful, if you set things up as described in chapter 2 and then work through the examples, you will end up with a data visualization book of your own. If you approach the book this way, then by the end you will be comfortable using ggplot in particular and also be ready to learn more about the R language in general.

[Margin note: You can also bring your own data to explore instead of or alongside the examples, as described in chapter 2.]

This book can also be used to teach with, either as the main focus of a course on data visualization or as a supplement to undergraduate or graduate courses in statistics or data analysis. My aim has been to make the “hidden tasks” of coding and polishing graphs more accessible and explicit. I want to make sure you are not left with the “How to Draw an Owl in Three Steps” problem common to many tutorials. You know the one. The first two steps are shown clearly enough. Sketch a few bird-shaped ovals. Make a line for a branch. But the final step, an owl such as John James Audubon might have drawn, is presented as a simple extension for readers to figure out for themselves.
If you have never used R or ggplot, you should start at the beginning of the book and work your way through to the end. If you know about R already and only want to learn the core of ggplot, then after installing the software described below, focus on chapters 3 through 5. Chapter 6 (on models) necessarily incorporates some material on statistical modeling that the book cannot develop fully. This is not a statistics text. So, for example, I show generally how to fit and work with various kinds of model in chapter 6, but I do not go through the important details of fitting, selecting, and fully understanding different approaches. I provide references in the text to other books that have this material as their main focus.

Each chapter ends with a section suggesting where to go next (apart from continuing to read the book). Sometimes I suggest other books or websites to explore. I also ask questions or pose some challenges that extend the material covered in the chapter, encouraging you to use the concepts and skills you have learned.

Conventions

In this book we alternate between regular text (like this), samples of code that you can type and run yourself, and the output of that code. In the main text, references to objects or other things that exist in the R language or in your R project—tables of data, variables, functions, and so on—will also appear in a monospaced or “typewriter” typeface. Code you can type directly into R at the console will be in gray boxes and also monospaced, like this:

    my_numbers <- c(1, 1, 4, 1, 1, 4, 1)

[Margin note: Additional notes and information will sometimes appear in the margin, like this.]

If you type that line of code into R’s console, it will create a thing called my_numbers. Doing this doesn’t produce any output, however. When we write code that also produces output at the console, we will first see the code (in a gray box) and then the output in a monospaced font against a white background.
Here we add two numbers and see the result:

    4 + 1
    ## [1] 5

Two further notes about how to read this. First, by default in this book, anything that comes back to us at the console as the result of typing a command will be shown prefaced by two hash characters (##) at the beginning of each line of output. This is to help distinguish it from commands we type into the console. You will not see the hash characters at the console when you use R.

Second, both in the book and at the console, if the output of what you did results in a series of elements (numbers, observations from a variable, and so on), you will often see output that includes some number in square brackets at the beginning of the line. It looks like this: [1]. This is not part of the output itself but just a counter or index keeping track of how many items have been printed out so far. In the case of adding 4 + 1 we got just one, or [1], thing back—the number five. If there are more elements returned as the result of some instruction or command, the counter will keep track of that on each line. In this next bit of code we will tell R to show us the lowercase letters of the alphabet:

    letters
    ##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
    ## [11] "k" "l" "m" "n" "o" "p" "q" "r" "s" "t"
    ## [21] "u" "v" "w" "x" "y" "z"

You can see the counter incrementing on each line as it keeps count of how many letters have been printed.

Before You Begin

The book is designed for you to follow along in an active way, writing out the examples and experimenting with the code as you go. You will be able to reproduce almost all the plots in the text. You need to install some software first. Here is what to do:

[Margin links: cloud.r-project.org, rstudio.com, tidyverse.org]

[Margin note: I strongly recommend typing all the code examples right from the beginning, instead of copying and pasting.]

  1. Get the most recent version of R. It is free and available for Windows, Mac, and Linux operating systems. Download the version of R compatible with your operating system. If you are running Windows or MacOS, choose one of the precompiled binary distributions (i.e., ready-to-run applications) linked at the top of the R Project’s web page.
  2. Once R is installed, download and install R Studio, which is an “Integrated Development Environment,” or IDE. This means it is a front-end for R that makes it much easier to work with. R Studio is also ...

This question has not been answered.
