Overview: For all milestones in this course, you will use the same data sets, but in different versions. For this milestone, you will use both the Milestone Two Data Firefighter and the Milestone Two Data Police data sets. Your pathway through the milestones is designed so that you apply the increasingly complex concepts you have learned in RStudio.
Milestone Two will help you prepare for the Final Project. In Milestone One, you performed a preliminary data assessment. In this milestone, you will begin your analysis of the data in earnest, using RStudio to validate the data assessment you performed in Milestone One and to discover new information.
Prompt: Based on the findings from your preliminary data assessment (Milestone One), you will perform the following steps and commands to continue your analysis of the firefighter and police data. This time, the data has been split into two files: DAT 500 Milestone Two Data - Firefighter and DAT 500 Milestone Two Data - Police. In the Codio module Milestone Two, these files are located in the directory ~workspace/SNHU/DAT500/Milestones.
To complete Milestone Two, you must:
Move Files: Use the command line interface to copy the following files to the ~workspace/Analysis directory:
o dat500_milestone_two_data_firefighter.csv
o dat500_milestone_two_data_police.csv
Modify Files: Use the command line interface to rename the copied files to:
o Firefighter.csv
o Police.csv
Code Comments: Add comments that explain your code and make it more readable. You will submit your Rscript when it is complete.
Import Data to RStudio: In an Rscript, write the commands needed to read in each data file so that it can be analyzed further. Take a screenshot of the RStudio data viewer to confirm that you imported the data.
Summary: Run the summary function to get the descriptive statistics for each file. Identify any noteworthy findings from the summary output and state whether or not they change your plan from Milestone One, explaining your rationale in either case. You must also submit either a screenshot or an exported log of the command's execution and its results.
Variables: Use the same two data fields (i.e., columns) that you selected to compare in Milestone One. For example, you could use the Total Salary and Total Compensation fields in each file to compare Firefighter and Police. For each of the Firefighter and Police data sets, create three separate variables for each of the two selected fields: the minimum, maximum, and average value. This gives a total of 12 variables (2 data sets × 2 fields × 3 statistics).
Data Validation: Discuss your findings. You have now calculated the same information (the min, max, and average) in three different places: Milestone One, the summary function output, and the variables you created. Do the calculations and commands you performed above confirm what you found in Milestone One? Why or why not? This cross-check is considered data validation. Did you find new avenues to pursue?
Data Discovery: Now, assuming you have validated your data, compare the Firefighter and Police variables. How do the numbers compare, and what might the differences mean? Consider explaining in a few sentences how you would proceed to dig deeper into the data to reach an informed conclusion.
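The import, summary, and variable steps can be sketched as one base-R script. This is a minimal sketch, not an official solution: it assumes the copy and rename steps are already done, that the renamed files sit in ~/workspace/Analysis (assumed to be the Analysis directory named above), and that the two compared fields are Total Salary and Total Compensation, which read.csv() turns into the column names Total.Salary and Total.Compensation. The helpers to_number() and field_stats() are hypothetical names, and stripping "$" and "," is only a precaution in case a salary column was saved as formatted text.

```r
# Sketch of the Milestone Two Rscript: import both files, summarize them,
# and build the 12 min/max/average variables.
# ASSUMPTIONS (adjust to match your own files):
#   - the renamed CSVs are in analysis_dir below
#   - the compared fields are Total Salary and Total Compensation, which
#     read.csv() turns into Total.Salary and Total.Compensation
analysis_dir <- path.expand("~/workspace/Analysis")

# Hypothetical helper: return a numeric vector, stripping "$" and ","
# in case a salary column was stored as formatted text.
to_number <- function(x) {
  if (is.numeric(x)) return(x)
  as.numeric(gsub("[$,]", "", x))
}

# Hypothetical helper: min, max, and mean of one column, ignoring NAs.
field_stats <- function(df, field) {
  values <- to_number(df[[field]])
  c(min = min(values, na.rm = TRUE),
    max = max(values, na.rm = TRUE),
    mean = mean(values, na.rm = TRUE))
}

ff_path <- file.path(analysis_dir, "Firefighter.csv")
pd_path <- file.path(analysis_dir, "Police.csv")

# Run the analysis only when both renamed files are present.
if (file.exists(ff_path) && file.exists(pd_path)) {
  # Import Data: read each CSV into its own data frame
  firefighter <- read.csv(ff_path, stringsAsFactors = FALSE)
  police <- read.csv(pd_path, stringsAsFactors = FALSE)

  # Open the RStudio data viewer for the screenshot (interactive sessions only)
  if (interactive()) {
    View(firefighter)
    View(police)
  }

  # Summary: descriptive statistics for every column of each file
  print(summary(firefighter))
  print(summary(police))

  # Variables: compute the statistics once per field...
  ff_salary <- field_stats(firefighter, "Total.Salary")
  ff_comp <- field_stats(firefighter, "Total.Compensation")
  pd_salary <- field_stats(police, "Total.Salary")
  pd_comp <- field_stats(police, "Total.Compensation")

  # ...then store them as the 12 separate variables
  ff_salary_min <- ff_salary[["min"]]
  ff_salary_max <- ff_salary[["max"]]
  ff_salary_avg <- ff_salary[["mean"]]
  ff_comp_min <- ff_comp[["min"]]
  ff_comp_max <- ff_comp[["max"]]
  ff_comp_avg <- ff_comp[["mean"]]
  pd_salary_min <- pd_salary[["min"]]
  pd_salary_max <- pd_salary[["max"]]
  pd_salary_avg <- pd_salary[["mean"]]
  pd_comp_min <- pd_comp[["min"]]
  pd_comp_max <- pd_comp[["max"]]
  pd_comp_avg <- pd_comp[["mean"]]

  # Side-by-side table for the validation and discovery discussion
  print(rbind(firefighter_salary = ff_salary, police_salary = pd_salary,
              firefighter_comp = ff_comp, police_comp = pd_comp))
}
```

Comparing each of the 12 variables with the Min., Mean, and Max. values that summary() prints, and with your Milestone One figures, is the data-validation cross-check. If summary() instead reports a salary column as Length/Class/Mode, the column was read as text, which is itself a noteworthy finding for the Summary step.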
Tip: Having trouble figuring out what we’re trying to accomplish in this milestone? Consider taking another look at the Module Four Journal: Data Validation and Discovery Walkthrough assignment.
Rubric Guidelines for Submission: Your deliverable for this assignment should be a 1- to 2-page Word document. You may list your answers to the above questions in bullets or in paragraph format. You must also submit your Rscript and the screenshots of the logs of all calculations and commands listed above, as requested.
Critical Elements, with rating levels Proficient (100%), Needs Improvement (70%), and Not Evident (0%):
Code Comments (Value: 10)
o Proficient (100%): Submits Rscript with code that includes comments that add clarity to the code
o Needs Improvement (70%): Submits Rscript with code that includes comments, but the comments do not add clarity to the code
o Not Evident (0%): Does not submit Rscript or submits Rscript without code comments
Import Data to RStudio (Value: 10)
o Proficient (100%): Submits a screenshot of the data view to confirm that the data sets have been imported to RStudio
o Not Evident (0%): Does not submit a screenshot to confirm that the data sets have been imported to RStudio
Summary (Value: 20)
o Proficient (100%): Performs summary function and discusses results
o Needs Improvement (70%): Performs summary function, but discussion of results is unclear or brief
o Not Evident (0%): Does not perform summary function and does not discuss results
Variables (Value: 20)
o Proficient (100%): Creates min, max, and average variables for two data fields in both data sets
o Needs Improvement (70%): Creates min, max, and average variables for two data fields in both data sets, but calculations are inaccurate or only one data field has been chosen
o Not Evident (0%): Does not create min, max, and average variables for data fields in both data sets
Data Validation (Value: 20)
o Proficient (100%): Discusses findings from data validation
o Needs Improvement (70%): Discusses findings from data validation, but discussion is brief or unclear
o Not Evident (0%): Does not discuss findings from data validation
Data Discovery (Value: 20)
o Proficient (100%): Discusses a direction for continued analysis
o Needs Improvement (70%): Discusses a direction for continued analysis, but discussion is cursory
o Not Evident (0%): Does not discuss a direction for continued analysis
Total: 100%