Discovering Evidence of Dirty Data Through Data Visualization Using R (APA Format)

User Generated

nelnaa

Computer Science

Trine University

Description

After listening to the lecture recording that I have provided and reading chapter 3 in your textbook, specifically section (3.2), you have seen many examples about how data visualizations using R can help you in assessing data cleanliness and finding dirty data in your dataset during data exploration and analysis.

In this discussion forum, I would like you to provide a complete example, one that is not presented in the book, that shows how dirty data can manifest itself in a visualization produced with R. Please provide the graph(s)/plot(s) along with a discussion that explains how the visualization helps in detecting the existence of dirty data.

Unformatted Attachment Preview

Dr. Awny Alnusair, University of the Cumberlands

Putting the Data Analytics Lifecycle into Practice
• The Data Analytics Lifecycle consists of the following six phases:
  1. Discovery
  2. Data Preparation
  3. Model Planning
  4. Model Building
  5. Communicate Results
  6. Operationalize
• To begin analyzing the data, you will need a tool that allows you to look closely at the data
  – That is “R”

KDnuggets Poll - 2019
• The KDnuggets Poll is a survey of data science and machine learning software. It asks programmers what languages they use on a regular basis in their work

The R Project for Statistical Computing
• First of all, you need to get R installed on your computer
  – https://www.r-project.org/

Up and Running with R
• Once R is installed, you can test the installation by opening the R Console

RStudio – an IDE for R
• https://www.rstudio.com/
• https://rstudio.cloud/

Entering Data into R

R Packages
• https://cloud.r-project.org/web/packages/index.html

Things you’re expected to know about R
• R data types and structures
• Basic descriptive statistics, dirty data ..
• Data visualization and relationships between multiple variables
• Generic functions
• Dealing with sample datasets that are available for you
• Statistical methods for model building and evaluation
  – Hypothesis testing: Welch’s t-test, confidence intervals, Wilcoxon rank-sum test, type I and II errors, and ANOVA

Explanation & Answer

Sorry, use this.

Running head: EVIDENCE OF DIRTY DATA VIA DATA VISUALIZATION USING R

Evidence of Dirty Data via Data Visualization Using R
Name
Instructor
Course
Date


Evidence of Dirty Data via Data Visualization Using R
Data visualization is essential because it lets a user see how the data are distributed and
how they behave. In a histogram, the values of a numeric variable are grouped into bins. Each
bin collects the observations that fall within a particular range, and the height of its bar
indicates how many such observations there are. By showing the distribution of a variable in
this way, a visualization can draw the user's attention to dirty data. Similarly, it becomes
possible to understand the spre...
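
The histogram idea described above can be illustrated with a minimal base R sketch. Everything in it is an assumption made for illustration and is not taken from the original answer: the simulated customer ages, the variable names, the use of 0 and 999 as stand-ins for missing or mistyped values, and the 18 to 90 plausibility range.

```r
# Simulated (not real) ages of 500 hypothetical customers.
set.seed(42)
age <- round(rnorm(500, mean = 40, sd = 12))
age <- pmin(pmax(age, 18), 90)   # keep the clean values inside 18..90

# Inject two common kinds of dirty data (illustrative assumptions):
age[1:30]  <- 0     # missing ages stored as 0
age[31:35] <- 999   # placeholder / data-entry code

# When run as a script, draw to a null device so no plot file is written;
# in an interactive session the plots appear as usual.
if (!interactive()) pdf(NULL)

# Histogram of the raw data with 10-year bins from 0 to 1000.
# The dirty values show up as a spike in the first bin and an
# isolated bar at the far right, far from the main distribution.
h_raw <- hist(age, breaks = seq(0, 1000, by = 10),
              main = "Raw ages", xlab = "Age")

# Flag values outside the assumed plausible range and re-plot the rest.
is_dirty <- age < 18 | age > 90
h_clean <- hist(age[!is_dirty], breaks = seq(15, 90, by = 5),
                main = "Ages after removing implausible values", xlab = "Age")

# How many records were flagged as dirty.
sum(is_dirty)
```

In the first plot, the stretched x-axis and the bars at 0 and near 1000 are the visual evidence of dirty data; the second plot shows the distribution once those records are set aside. In real data the plausibility range would have to come from domain knowledge rather than from the simulation.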

