Exploratory Data Analysis (EDA) and Create Data Visualizations and Dashboards

User Generated

tevzerncre313

Business Finance

Description

Assignment 3: EDA and Data Visualization

This assignment is quite a bit different than the first two assignments. Now that we’ve built a solid foundation of spreadsheet modeling/engineering skills and done some pretty structured assignments, you are going to get a chance to be more creative. In this assignment, you are going to do some exploratory data analysis (EDA) and create data visualizations and dashboards. We spent two weekly modules exploring these topics and our Moodle site contains numerous resources for you to draw upon. In particular, our BA textbook does a nice job of reviewing some Excel EDA specifics in Chapters 2, 3, and 17.2. You should also read through the series of whitepapers I posted in the first module on this topic:

Cycle Share data analysis

Many cities have started bike sharing programs. These programs often capture very detailed data related to bike usage and such data can be quite helpful in managing these systems. In this assignment, we’ll be using data from Seattle’s Cycle Share program as well as weather related data from the National Climatic Data Center. You can find the three datasets in the \data folder within the zip file. Here’s a description of the files:

Context

The Pronto Cycle Share system consists of 500 bikes and 54 stations located in Seattle. Pronto provides open data on individual trips, stations, and daily weather.

Content

There are 3 datasets that provide data on the stations, trips, and weather from 2014-2016.

  • Station dataset
    • station_id: station ID number
    • name: name of station
    • lat: station latitude
    • long: station longitude
    • install_date: date that station was placed in service
    • install_dockcount: number of docks at each station on the installation date
    • modification_date: date that station was modified, resulting in a change in location or dock count
    • current_dockcount: number of docks at each station on 8/31/2016
    • decommission_date: date that station was placed out of service
  • Trip dataset
    • trip_id: numeric ID of bike trip taken
    • starttime: day and time trip started, in PST
    • stoptime: day and time trip ended, in PST
    • bikeid: ID attached to each bike
    • tripduration: time of trip in seconds
    • from_station_name: name of station where trip originated
    • to_station_name: name of station where trip terminated
    • from_station_id: ID of station where trip originated
    • to_station_id: ID of station where trip terminated
    • usertype: "Short-Term Pass Holder" is a rider who purchased a 24-Hour or 3-Day Pass; "Member" is a rider who purchased a Monthly or an Annual Membership
    • gender: gender of rider
    • birthyear: birth year of rider
  • Weather dataset contains daily weather information in the service area

I want you to take the perspective of an analyst who has been asked to provide a thorough analysis of the usage of this bike share program based on the data provided. Pretend a new operations manager of the Cycle Share (CS) has just started and wants you to provide her with summary statistics and data visualizations to help her understand how various location, weather, demographic and time related variables affect bike share use. Some examples of basic questions she has include things like:

How many people use CS every month? Is ridership increasing, decreasing, staying the same? Do these trends differ by different rider types, age, gender or other variables?

What are popular stations to rent from? Popular destinations? Popular trips?

How long are trips? How is ride duration impacted by rider, station, or weather characteristics?

How does ridership vary by day of week and time of day?

How does weather affect ridership?

Of course, these basic questions lend themselves to more nuanced questions such as whether rain or temperature has a greater effect on ridership and how do weather and temporal factors interact to impact ridership? She doesn’t even know all the questions she has and is counting on you to enlighten her. Be creative, be analytical, wow her with your insightful analysis.

So, your job is to create a series of tables, graphs, dashboards, infographics, supporting text, or whatever else you think is appropriate to give your new manager the information and insight she needs to chart a good future course for the CB program. Your visualizations and analysis should be woven together in a coherent way as to tell a story to the new manager.

You could do the whole thing in Excel but are welcome to also use other tools like Access, Tableau or PowerPivot. You can definitely use PowerPoint or Word or similar tools as a “container” to help structure your story. You could also use the Story feature in Tableau to structure your presentation.For example, you might use Excel to do your data prep and analysis and then paste visualizations into a Powerpoint presentation which includes your analytical commentary and summarizations. Other people might choose to use some combination of Excel and Tableau for the analysis and use the Tableau Story feature to put it all together (or just copy and paste Tableau visualizations into a Powerpoint presentation. Use your creativity and imagination to combine tools as you see fit.

You CANNOT use any programmatic analysis tools like R or Python.

Make sure you use principles of graphical excellence, solid Excel graph and table designs, and, most importantly, that you tell a coherent and compelling story based on the bike share data. Don’t just create a hodge podge of unrelated graphs and tables. Weave your visualizations together in some coherent and logical way. I’ve given you numerous resources in our course website to help you.

In addition to the actual analysis deliverables, you’ll also be turning in a short supporting technical document (use Word) that describes the various steps you took in creating the deliverables. For example, this supporting document will describe what you did in the data prep phase and which tools you used, how you went about creating the analytical outputs (e.g. did you use Pivot Tables or Tableau or whatever). So, as you are doing the assignment, take some notes in a Word doc so that you can then turn that into this supporting technical document.

Getting Started and Data Prep

You might want to start by importing the CSV files into Excel and just browse through the three tables and the descriptions above to become familiar with the data.

Then, think about creating any new computed columns you might want to have which will facilitate your analysis. For example, there is a datetime field called starttime in the trip table. I’d recommend creating an additional field called something like tripdate and computing the date (with no time) based on the starttime field. We learned all about Excel dates and times and this should be easy for you. Why might you want to do this? Well, if you look in the weather table you’ll see that it contains one row per date. So, if you want to try to create one big master table containing data from the trip and weather tables, you can use the date to look up values from tables as needed.Feels like a VLOOKUP, eh?

STRONG SUGGESTION: If you do a bunch of VLOOKUPs to create a master data table, when you are all done, go ahead and do a Copy – Paste Special – Values into a new workbook sheet. This will make subsequent analysis much quicker since Excel won’t have to recompute a bunch of formulas.

Of course, if you use Tableau or PowerPivot, you might be able to simply join the tables and avoid creating a big master table. Either or both approaches are fine. In fact, here’s a link to a short article by the Tableau folks about this very issue:

http://onlinehelp.tableau.com/current/pro/desktop/en-us/help.htm#multiple_connections.html.

The primary goal for the data preparation step is to make it easy to do the types of analysis you want to do. You may end up creating a few different data tables for analysis. That’s up to you. Just document what you do.

Analysis Suggestions

When doing a comprehensive analysis for an upper level manager, it’s almost always a good idea to start with high level, important, overall statistics and visualizations and then dive into the details to explore further. For example, do NOT start by showing some detailed analysis of how windy Tuesdays impact ridership during the spring. For example, you could start with overall volume trends and rider demographics.

Get Ideas and Inspiration

A few years ago, the state of Colorado held a data visualization contest focused on the state of Colorado’s public school system. All the winning entries used to be posted – however, now just the following one is still available. Again, this is just to provide some ideas and inspiration.

http://infogr.am/Colorado-school-equality

The Tableau Public site has some nice examples of specific visualizations done in Tableau.

Remember, the goal isn’t to be “fancy”, it’s to concisely convey to the new manager what is going on with the CS program in Seattle.

Deliverables

Obviously, the nature of your deliverables is affected by the tools you use. There are two main parts to your deliverables:

  • The analysis products themselves – i.e. what you’d deliver to your manager. For example this might be a Tableau Story, or a Powerpoint presentation or Word document with the graphs/tables created in either Excel or Tableau. It could even just be a nicely structured Excel file with nagivational aids, graphs (possibly interactive) and summary text in text boxes or something similar. Use your imagination and ingenuity.

    IMPORTANT: If you end up copying and pasting graphs/tables from Excel into Powerpoint or Word, I also want the Excel files containing the actual data and graphs/tables so that I can see how you created these things.
  • A technical background document in MS Word that describes how you did what you did. This doesn’t need to be super detailed but should make it clear to me how:
    • You did your data prep
    • You constructed the various visualizations
    • and how you constructed overall story.

IMPORTANT: You MUST put all of your deliverables into a folder, zip the folder and then upload the zip file. Give your zip file a good filename.

User generated content is uploaded by users for the purposes of learning and should be used following Studypool's honor code & terms of service.

Explanation & Answer

The Excel file was too large and I am sharing with you through google drive. Here is the link: https://drive.google.com/file/d/1LvfgF2Tl5AcFoygvP...

Business Analytics

Fall 2019

EDA and data visualization
Student’s Name
Institution affiliation

1

Business Analytics

Fall 2019

The Pronto Cycle Share system is used by many individuals of which 77% are male, 21%
are female and those who did not disclose the gender are make 2% of the users.

Other
2%

Percentage of trip Users
Female
21%

Female
Male
Other
(blank)

Male
77%

The percentage of members who use the Cycle Share program is 62% and the users who
are short-term pass holders account for 38%.

Short-Term Pass
Holder, 38%

Members type

Member
Short-Term Pass Holder

Member, 62%

The Number of users of the Program from 2014 Quarter 4 to 2016 Quarter 3 does not
have a significant trend. However, the highest number of users was experienced in the 2nd
and the 3rd Quarter of 2015.
2

Business Analytics

Fall 2019

Number of Users

Number of Users per Quarter

Qtr1

2014

2015
24,678

2016
17,921

Qtr2

44,445

31,064

48,988

26,535

Qtr3
Qtr4

20,239

22,195

The average temperature for each quarter is represented by the chart below. The average
Highest temperature was witnessed in the 3rd Quarter of 2015 and this is the same period
the number of users was the highest.

Average Temperature by Quarter
68.75
70.00
60.00

61.16
49.85

57.36

50.18

49.96

Axis Title

50.00
40.00
30.00
20.00
10.00
0.00

Total

Qtr4
2014
49.85

Qtr1
50.18

Qtr2

Qtr3

Qtr4

2016

61.16

2015
68.75

49.96

57.36

The distribution of the trip duration is right-skewed. The majority of the trips seem to
take very short durations.

3

Business Analytics

Fall 2019

Histogram of Trip duration
18000
16000
14000
12000
10000
8000
6000
4000
2000

60
1,008
1,956
2,904
3,852
4,800
5,748
6,696
7,644
8,591
9,539
10,487
11,435
12,383
13,331
14,279
15,227
16,175
17,123
18,071
19,019
19,967
20,915
21,863
22,811
23,758
24,706
25,654
26,602
27,550
28,498

0

62% of the users responded their age while 38% did not respond to their age. Of those
who responded to their age a majority of the program users have 32 ye...


Anonymous
Goes above and beyond expectations!

Studypool
4.7
Trustpilot
4.5
Sitejabber
4.4

Similar Content

Related Tags