IDS 567 Project Deliverable 1
Group 5
Elaheh Daj, edaj2@uic.edu, 662063707
Jivisha, fjivis2@uic.edu, 663685107
Ching-Ting Fang, cfang20@uic.edu, 658638997
Oluwaseun Ojo, oojo5@uic.edu, 653867334
Avinash Voruganti, avorug2@uic.edu, 652857771
Vishakh Narkar, vnarka2@uic.edu, 665194704
Introduction
Visualization is one of the most effective ways to capture people's attention. People absorb information
more easily from visuals and images than from text and numbers, so visualized business data is much
easier to understand and process. Many airlines around the world compete against each other, which makes
it crucial for each airline to evaluate its flight performance against its competitors. We will therefore
look for detailed data about a specific airline's flight performance, such as scheduled flight times, actual
flight times, and the length of departure and arrival delays. We will examine the spatial and geographical
impact on the airline's on-time flight performance, and we will compare various aspects of its performance
with those of its potential competitors.
Data Preparation Part-1: Collection and Concatenation of Data
In this section we focus on the first part of data preparation: data staging and modelling.
Data will be collected from the Bureau of Transportation Statistics (BTS). We will use one year of data: the
12 monthly files from August 2017 through July 2018, the range required by the project description.
We will carry out the data staging with software such as R or Python. The data will be collected in CSV
format as 12 monthly files of around 50,000 rows each. We will concatenate the files, perform basic data
cleaning in R and Excel, and then export the result to Tableau.
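The following is a minimal sketch of the concatenation step in Python. It uses only the standard library rather than R or Excel. It assumes that the 12 monthly BTS files sit in one folder and share the same header row. The folder layout and the `concat_monthly_files` helper are illustrative and are not part of any BTS tooling.

```python
import csv
import glob
import os


def concat_monthly_files(folder, out_path):
    """Append every monthly CSV in `folder` into one file at `out_path`.

    Assumes (hypothetically) that all monthly files share one header row.
    Returns the number of data rows written.
    """
    # The file list is collected before the output file is created
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    rows_written = 0
    header = None
    writer = None
    with open(out_path, "w", newline="") as out:
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.DictReader(f)
                if header is None:
                    # Take the column order from the first monthly file
                    header = reader.fieldnames
                    writer = csv.DictWriter(out, fieldnames=header)
                    writer.writeheader()
                elif reader.fieldnames != header:
                    raise ValueError(f"{path} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Write the combined file outside the input folder. Otherwise the `*.csv` pattern will pick it up on a later run.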
For data modelling, we will build a fact table of the airline's on-time performance records and the
dimension tables required for the reporting schema. The fact table will include all the metrics needed to
report on the airline's on-time performance. We will also add dimensions such as Date, Airport and
Country, built with the help of the BTS lookup tables, to give the metrics more meaning and detail.
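The fact/dimension design above can be sketched as a join of each flight record against dimension lookups. This Python sketch is illustrative and is not the group's final schema. The following are all assumptions:

- the column names `FL_DATE` and `ORIGIN_AIRPORT_ID`;
- the `Code`/`Description` header of the lookup table;
- the helper names.

The lookup is loaded from CSV text, so the example runs without any downloaded files.

```python
import csv
import io
from datetime import date


def load_lookup(csv_text):
    """Turn a two-column lookup table (assumed header: Code, Description) into a dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Code"]: row["Description"] for row in reader}


def date_attributes(iso_date):
    """Derive Date-dimension attributes from a YYYY-MM-DD flight date."""
    d = date.fromisoformat(iso_date)
    return {
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "day_of_week": d.isoweekday(),  # 1 = Monday ... 7 = Sunday
    }


def enrich_fact_rows(fact_rows, airport_lookup):
    """Join each fact row (assumed columns FL_DATE, ORIGIN_AIRPORT_ID) to Airport and Date."""
    enriched = []
    for row in fact_rows:
        out = dict(row)
        # Unmatched airport IDs are kept and labelled rather than dropped
        out["ORIGIN_NAME"] = airport_lookup.get(row["ORIGIN_AIRPORT_ID"], "Unknown")
        out.update(date_attributes(row["FL_DATE"]))
        enriched.append(out)
    return enriched
```

Unmatched IDs are labelled "Unknown" instead of being dropped. This avoids silently losing flights when a lookup table is incomplete.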
The aim of this stage is to optimize the data for analysis using features provided by Tableau, such as
joins, data blending and unions.
Data Preparation Part-2: Conditioning of the BTS datasets
Because the BTS datasets are so large, it is important to define up front all the attributes that will be
useful. First, we will perform basic data cleansing: removing null values, identifying outliers, and
converting variables to the appropriate types. We will also normalize and scale some variables so that
their relationships can be compared.
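The cleansing steps above might look like the following Python sketch. It works in four stages:

- drops rows with a missing delay value;
- converts the delay to a number;
- computes Tukey's IQR fences (Q1 - 1.5*IQR and Q3 + 1.5*IQR), one common way to flag outliers;
- applies min-max scaling as one possible normalization.

The field name `DEP_DELAY` is an assumption. So is the choice of these two techniques; neither is fixed by the plan.

```python
import statistics


def clean_delays(rows, field="DEP_DELAY"):
    """Drop rows with a missing value in `field` (assumed name) and convert it to float."""
    cleaned = []
    for row in rows:
        value = (row.get(field) or "").strip()
        if value == "":
            continue  # e.g. a cancelled flight has no departure delay
        out = dict(row)
        out[field] = float(value)
        cleaned.append(out)
    return cleaned


def iqr_outlier_bounds(values, k=1.5):
    """Return Tukey fences (Q1 - k*IQR, Q3 + k*IQR); values outside are outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant column carries no spread
    return [(v - lo) / (hi - lo) for v in values]
```

Outliers are flagged rather than deleted here. Long delays are real events, so whether to drop them is a separate analytical decision.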
We plan to analyze one specific airline and focus on its on-time performance. This will give us a simple
and interpretable dataset, from which we will compare the assigned airline with the rest of the airlines.
1- We will filter the data on the airline's BTS airline code.
2- We will use the entire dataset to draw comparisons.
3- We will apply aggregation functions to all flights of each airline to make meaningful comparisons
between our assigned airline and the other airlines. For example, we can compute the average departure
delay of each airline on a specific route.
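Steps 1 and 3 can be sketched in Python as a filter on the carrier code plus a per-route average. The field names `OP_CARRIER`, `ORIGIN`, `DEST` and `DEP_DELAY` are assumptions, since the exact column names depend on which BTS download is used. `DEP_DELAY` is also assumed to be already converted to a number. `"AA"` is American Airlines' carrier code.

```python
from collections import defaultdict


def filter_carrier(rows, carrier_code):
    """Step 1: keep only the flights operated by one airline (e.g. "AA")."""
    return [r for r in rows if r["OP_CARRIER"] == carrier_code]


def mean_delay_by_carrier_route(rows):
    """Step 3: average departure delay per (carrier, origin, destination).

    Assumed columns: OP_CARRIER, ORIGIN, DEST, DEP_DELAY (numeric).
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of delays, flight count]
    for row in rows:
        key = (row["OP_CARRIER"], row["ORIGIN"], row["DEST"])
        totals[key][0] += row["DEP_DELAY"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

Running the aggregation on the whole dataset (step 2), rather than on the filtered rows, produces the other airlines' averages on the same routes for comparison.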
Data Preparation Part-3: Exploring the airline’s data for meaningful patterns and trends
We will compare the length of our airline’s departure delays with those of its competitors across different
time periods (daily, weekly, monthly and quarterly) and different airports, using bar charts and pie charts,
to learn the patterns in the data. In addition, for a given time period we will group delays by length, for
example into 15-minute intervals. We will do the same for arrival delays. For a given time and airport
location, we will also examine how our airline differs from its competitors in cancellations and delay
causes. By encoding these measures with color and size on a regional map, we will be able to see trends
and the dominant patterns.
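Below is a Python sketch of the grouping described above. Each delay is assigned to a 15-minute interval, and flights are counted per daily, weekly, monthly or quarterly period. The following are all assumptions:

- the field names `FL_DATE` and `DEP_DELAY`;
- the bin labels;
- treating zero or negative delays as "on time/early".

Passing the arrival-delay column as `field` gives the same breakdown for arrival delays.

```python
from collections import Counter
from datetime import date


def delay_bin(minutes, width=15):
    """Label the interval a delay falls into, e.g. 20 -> "15-29"."""
    if minutes <= 0:
        return "on time/early"  # assumed label for non-positive delays
    start = int(minutes // width) * width
    return f"{start}-{start + width - 1}"


def period_key(iso_date, period):
    """Map a YYYY-MM-DD flight date to a daily, weekly, monthly or quarterly bucket."""
    d = date.fromisoformat(iso_date)
    if period == "daily":
        return d.isoformat()
    if period == "weekly":
        year, week, _ = d.isocalendar()  # ISO year and week number
        return f"{year}-W{week:02d}"
    if period == "monthly":
        return f"{d.year}-{d.month:02d}"
    if period == "quarterly":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    raise ValueError(f"unknown period: {period}")


def bin_counts_by_period(rows, period, field="DEP_DELAY"):
    """Count flights per (period, delay interval) pair; assumed date column FL_DATE."""
    return Counter(
        (period_key(r["FL_DATE"], period), delay_bin(r[field])) for r in rows
    )
```

The resulting counts map directly onto a stacked bar chart, with periods on one axis and intervals as color.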
We will also collect the data relevant to the airline’s on-time performance and apply statistical methods
to find correlations between its variables. Tableau, preferably with line charts, can also be used to
identify trends and patterns in the data.
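Pearson correlation is one simple choice for the statistical method mentioned above. The sketch below computes it for every pair of numeric fields. The field names in the usage are hypothetical. A correlation only signals a linear association, not a cause.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sx * sy)


def correlation_table(rows, fields):
    """Pearson r for every pair of the given numeric fields (hypothetical names)."""
    columns = {f: [float(r[f]) for r in rows] for f in fields}
    return {(a, b): pearson(columns[a], columns[b]) for a, b in combinations(fields, 2)}
```

For example, `correlation_table(rows, ["DEP_DELAY", "ARR_DELAY", "DISTANCE"])` returns one coefficient per pair. Strongly correlated pairs are then candidates for a line chart in Tableau.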
Incorporating time and space into your visualizations
Weather phenomena such as hurricanes and storms, as well as space-weather conditions such as solar radiation
storms, can affect the day-to-day functioning of aviation operations. Our visualizations will therefore need
to account for these factors when analyzing on-time performance and departure delays. Data on departure and
arrival locations, together with their weather conditions, will be used to identify the factors contributing
to delays. Fields such as WeatherDelay, AirTime, FirstDepTime, DestAirportID and CancellationCode are among
the columns to consider. The visualizations will be designed to show clearly the relationship between
geographical and climatic conditions and flight delays.
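The space-and-time breakdown above can be prepared as a summary keyed by destination airport and month. A map view in Tableau, filtered or paged by month, can then display it. This Python sketch reuses the column names mentioned above (`WeatherDelay`, `DestAirportID`, `CancellationCode`). It also assumes a `FlightDate` column in YYYY-MM-DD form. Treating cancellation code `"B"` as weather follows the BTS cancellation lookup table, but it is worth verifying against the downloaded lookup.

```python
from collections import defaultdict

# Weather code per the BTS cancellation lookup table (verify against the download)
WEATHER_CANCELLATION = "B"


def weather_by_airport_month(rows):
    """Weather-delay minutes and weather cancellations per (DestAirportID, "YYYY-MM").

    Assumes a FlightDate column formatted YYYY-MM-DD.
    """
    summary = defaultdict(lambda: {"weather_delay_min": 0.0, "weather_cancellations": 0})
    for r in rows:
        key = (r["DestAirportID"], r["FlightDate"][:7])
        cell = summary[key]
        if r.get("CancellationCode") == WEATHER_CANCELLATION:
            cell["weather_cancellations"] += 1
        elif r.get("WeatherDelay") not in (None, ""):
            cell["weather_delay_min"] += float(r["WeatherDelay"])
    return dict(summary)
```

Keeping cancellations separate from delay minutes matters because a cancelled flight has no delay value. Summing them together would understate the weather impact at an airport.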
Summary
To sum up, we will carry out the steps above one by one in the upcoming deliverables. We will use the
cleaned data to visualize American Airlines’ on-time performance. We will analyze trends in measures such
as actual departure time, actual arrival time, scheduled departure time, and late flights across time and
space. This will help us compare American Airlines’ metrics against those of its competitors. As a result,
we will be able not only to learn, understand and extract trends and patterns but also to explore the data
at a granular level.
IDS 567: Business Data Visualization
Fall(B) 2018
Group Project Description and Requirements
The Bureau of Transportation Statistics (BTS) publishes monthly information about airline on-time flight
performance. Included in the monthly datasets is information about every commercial domestic flight
under the jurisdiction of the Department of Transportation. The detailed flight data includes information
about scheduled flight times, actual flight times, length of departure delays and arrival delays, delay
types, delay causes, and cancellations. Details about every flight in the monthly BTS datasets are available to
every airline – i.e. any airline can evaluate its own on-time performance against every other
commercial airline operating in the US.
Your group project will make use of the BTS dataset to implement the visualization practices we discuss
in this course. Specifically, each group will be assigned a specific airline to research against all other
airlines. Each group will prepare a final deliverable using Tableau that provides a visual overview of the
group’s airline’s performance as well as recommendations for their assigned airline to improve
performance based on visual evidence.
Requirements
Data: Procure the data from the BTS website. You are to utilize 12 monthly files, from August 2017 through July 2018.
Do not use data from any month outside of that range. Along with the 12 monthly files there are various
tables available at the BTS site that contain important lookup information for the BTS data. You will
need to utilize those lookup tables where project requirements dictate.
Software: You may use software of your choice to stage and manage the flight performance data. Your
presentation, however, should be prepared in Tableau using the story feature. In addition, your final
presentation should be prepared as a packaged workbook in Tableau – i.e. the presentation file should
be self-contained and not require any live data connections during your presentation in class.
Project Scope
Data Visualizations: You are to prepare visualizations from the BTS data set that describe the on-time
performance of your assigned airline. Your visual evidence will consist of three (3) parts. First, you will
prepare visual descriptions of metrics that best describe the on-time performance of your assigned
airline. You should also prepare visual depictions of any patterns or associations between metrics
that help explain on-time performance.
Second, you will prepare visualizations of your assigned airline’s on-time performance contextualized by
information about the on-time performance of other airlines in the BTS data set. In other words, you will
use appropriate data visualization practices to compare your assigned airline against the industry as a
whole.
Finally, you will prepare visual evidence that identifies the airline that is the most similar competitor of
your assigned airline. You will include visualizations that explain how you selected the closest
competitor. You will also include visualizations that compare your assigned airline directly against the
closest competitor that you select.
Guidelines
Dimensions: Your data visualizations should incorporate appropriate visual depictions of time and
geography to better illustrate your airline’s on-time performance. Your rendering of geography should
favor visualization of airline performance at the airport level rather than city or state level. Where
possible, prioritize flight route over origin or destination alone.
Metrics: You should select the measures that best illuminate on-time performance in your visualizations.
For consistency across all groups you should, however, focus on Departure Delays rather than Arrival
Delays. (Feel free to explore visually why I want you to prioritize Departure on-time performance over
Arrival on-time performance.)
Evaluation: Project deliverables will be evaluated for the use of higher order visual devices and for the
clear depiction of high-dimensional information.
Presentation: Presentations will be made with the Tableau story feature. The story should be self-contained
(i.e., every visual should be sufficiently annotated to be self-explanatory). Your final
presentation should depict your information by minimizing unnecessary text in favor of visual depictions
of data that minimize the noise-to-ink ratio.
Schedule of Deliverables:
Friday, November 2: Interim Deliverable 1
Friday, November 9: Interim Deliverable 2
Friday, November 23: Interim Deliverable 3
Tuesday, December 3: Final Deliverable
Friday, November 2: Submit a brief project plan that describes the following:
1. How you will collect and concatenate the monthly BTS datasets.
2. How you will condition the BTS datasets for use in the group project.
3. How you will explore your airline’s data for meaningful patterns and trends.
4. How you will incorporate time and space into your visualizations.
Friday, November 9: Submit a visual description of the key flight delay metrics for your airline from the
monthly BTS data sets – especially the Departure Delay Minutes and Arrival Delay Minutes variables.
Focus on the distributions of these variables for your airline only.
Friday, November 23: Submit a visual comparison of the key flight delay metrics for your airline as
compared against the overall industry. Focus on those metrics that you identified in the prior interim
deliverable.
Tuesday, December 3: Submit your final presentation draft. You will present your project submission in
class from the presentation you submit on this date. In addition to the visual description of your assigned
airline, and the comparison of its performance against the other airlines in the industry, your final
presentation should also identify one other airline that can be described as your airline’s most similar
competitor. (You may identify the competitor airline by any of several criteria: e.g., it may be one that
has the most flights to the same airports as your own; it may be one that flies a similar number of miles;
etc.) The third component of your presentation will consist of a visual comparison of your airline with
your primary competitor that shows how your on-time performance compares with that of your closest
competitor.
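The first criterion suggested above (most flights to the same airports) can be sketched as follows. The definition used here is one possible reading of that criterion: it counts another airline's flights whose destination is any airport the assigned airline serves. The field names `OP_CARRIER`, `ORIGIN` and `DEST` are assumptions.

```python
from collections import Counter


def closest_competitor(rows, carrier):
    """Rank other airlines by flights into airports the assigned carrier also serves.

    Assumed columns: OP_CARRIER, ORIGIN, DEST.
    """
    # Every airport the assigned carrier flies from or to
    our_airports = set()
    for r in rows:
        if r["OP_CARRIER"] == carrier:
            our_airports.update((r["ORIGIN"], r["DEST"]))
    counts = Counter(
        r["OP_CARRIER"]
        for r in rows
        if r["OP_CARRIER"] != carrier and r["DEST"] in our_airports
    )
    return counts.most_common()  # [(carrier, shared-airport flights), ...] descending
```

Other criteria, such as a similar number of miles flown, would replace the counting rule but keep the same ranking structure.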
Wednesday, December 5/12: Final group presentations. All final project deliverables will be due at the
specified submission deadline prior to the class. (See the submission requirements above.) All groups
will present on Jun 13. The order of the presentations will not be known until you arrive in class that
day.