All dates and times are in Mountain Time (MT).
Task                                                          Due date
Assignment: JSON and MongoDB (Assignment 2)                   Oct 31 by 10pm
Peer Review: JSON & MongoDB                                   Nov 5 by 10pm
Assignment: JSON and MongoDB (Final)                          Nov 7 by 10pm
Exercise 12A: Overflow Errors (Auto Graded)                   Nov 13 by 10pm
Exercise 12B: Input Validation                                Nov 13 by 10pm
Assignment: Data Modeling & Secure Coding (6020 Initial)      Nov 15 by 10pm
Peer Review: Data Modeling & Secure Coding                    Nov 19 by 10pm
Exercise 13: Pandas                                           Nov 20 by 10pm
Assignment: Data Modeling & Secure Coding (6020 Final)        Nov 29 by 10pm
Exercise 14: Numpy & Matplotlib                               Dec 4 by 10pm
Assignment: Data Analysis and Visualization (6020 Initial)    Dec 6 by 10pm
Peer Review: Data Analysis & Visualization                    Dec 10 by 10pm
Assignment: Data Analysis and Visualization (6020 Final)      Dec 13 by 10pm
Final Exam ISMG 6020 (Remotely)                               Dec 18 by 10pm
Assignment 1 (due Nov 31)
Assignment: Object Oriented Programming
I have already submitted this assignment, so you do not have to worry about it. I am leaving it
here because we may need it for the rest of the assignments. I will share my solution for this
assignment, and the correct solution from the professor as well, so you can understand it.
• This is the first assignment in a series of assignments that utilizes bike station data from
https://data.cityofchicago.org/Transportation/Divvy-Bicycle-Stations-Historical/eq45-8inv.
• You will use object oriented programming techniques to create a BikeStation object
with a constructor and multiple methods.
• Run a data modeling process to extract the necessary bike station utilization
information from the JSON data and group the data by station.
• You will then run an analysis to calculate some basic statistics about the data.
• You will create different types of visualizations of the bike station data you have
retrieved and processed and save some plots.
• The final program will consist of multiple modules including: BikeStation.py,
downloader.py, model_data.py, stats.py and plots.py.
• In this assignment you will be creating BikeStation.py, which will read some bike station
data from a file and create 5 bike_station objects based on some of the data in the file, as
described in the Assignment Instructions and in the video below:
Submitting Your Work
Please Upload Your Initial Submission on the assignment due date:
• A screen shot of you running the BikeStation.py application.
• A screen shot of your BikeStation.py code, showing all code.
• Neither screen shot should show your name.
• You can use https://carbon.now.sh/ to capture a screenshot
of your code even if it is longer than the length of your screen.
Review assignments for two classmates within 4 days of the original assignment
submission date:
• Submitting the reviews within this assignment will be the submission for the peer
review assignment.
• See Code Reviewing for instructions on conducting a Code Review.
Revise your assignment and resubmit to: Assignment: Object Oriented Programming
(6020 Final)
• Your actual BikeStation.py code file as a TEXT file (NOT a Jupyter Notebook file)
- PLEASE add your name to the text file.
• A screen shot of you running the BikeStation.py application.
• If you use ANY CODE you found in another student's assignment, PLEASE:
o Add a comment to that piece of code indicating the source of the code, e.g.
"Lines 12-16 were adapted from code I reviewed"
o Make sure not to use more than 10% of the code from another student.
• Please add a comment in the Canvas comments describing how you updated your
code from the first draft (or if you did no updates).
• Please submit the entire assignment EVEN IF YOU MADE NO CHANGES
FOLLOWING CODE REVIEW.
Overall Project Structure
Assignment Instructions
• This assignment will utilize Bike Station data from the City of Chicago (original data
source: https://data.cityofchicago.org/Transportation/Divvy-Bicycle-Stations-Historical/eq45-8inv).
This assignment will use a file (bdata.txt; you will find it attached in a different file named
bdata) containing one record for each bike station (669 stations), but the complete
data set has 176 million records.
• Each bike station record contains multiple pieces of data about the time, the bike station,
the number of docks at the station, the number of bikes at the station, the status of the
station and where the station is located.
• The data is stored as name-value pairs as shown below (NOTE: the actual file has all
of the data for one bike station on a single line).
{
"id": "515",
"timestamp": "2020-11-16T11:55:55.000",
"station_name": "Paulina St & Howard St",
"total_docks": "19",
"docks_in_service": "19",
"available_docks": "9",
"available_bikes": "10",
"percent_full": "53",
"status": "In Service",
"latitude": "42.019159",
"longitude": "-87.673573",
"location": {
"type": "Point",
"coordinates": [-87.673573, 42.019159]
},
"record": "51520201116115555"
}
Step 1: Select a set of Bike Stations to take data for
• Go to Assignment: Choose Set of Bike Stations to Collect Data For and follow the
instructions on the page to choose the five Bike Stations you will collect data for.
Step 2: Write a BikeStation class that demonstrates the following aspects of OOP in Python:
1. Create a basic class that can hold all relevant Bike data for an individual bike station
(see above).
o As with a database, you don't normally want to have calculated values stored
in your objects because it could cause data integrity issues if, for example, you
update available_bikes when a bike is returned but forget to update
percent_full.
o As such, your BikeStation class should only include the data values needed to uniquely
specify the current state (number of bikes available, number of slots in the station).
2. Appropriately create a constructor to set all data values.
3. Use the @property decorator to make at least one property in your BikeStation class
private.
4. Use the @???.setter method to validate the private property in some way (e.g. check
if it is numeric, change its data type, change its length) before setting its value.
5. Create one or more regular class methods to CALCULATE other relevant bike station
values from the data attributes you are storing as a part of the class.
6. Override the __str__ method to print a string representation of a BikeStation that
looks like the string below.
o Note that the format of the date string is different than it is in the data file and
requires you to use String methods to remove the T between the date & time
as well as the seconds and milliseconds from the time:
o Paulina St & Howard St had 10 bikes on 2020-11-16 11:55
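A minimal sketch of items 3, 4 and 6, offered as an illustration rather than the required solution. The reduced attribute list, the validation rule (a non-negative integer) and the helper name format_timestamp are assumptions made for this example; the assignment leaves those choices open.

```python
class BikeStation:
    """Illustrative only: a reduced BikeStation with one validated property."""

    def __init__(self, station_name, timestamp, available_bikes):
        self.station_name = station_name
        self.timestamp = timestamp
        self.available_bikes = available_bikes  # routed through the setter below

    @property
    def available_bikes(self):
        return self._available_bikes

    @available_bikes.setter
    def available_bikes(self, value):
        # The data file stores numbers as strings, so convert before storing.
        number = int(value)  # raises ValueError for non-numeric input
        if number < 0:
            raise ValueError("available_bikes cannot be negative")
        self._available_bikes = number

    def format_timestamp(self):
        # "2020-11-16T11:55:55.000" -> "2020-11-16 11:55"
        date_part, time_part = self.timestamp.split("T")
        hours_minutes = ":".join(time_part.split(":")[:2])
        return f"{date_part} {hours_minutes}"

    def __str__(self):
        return (f"{self.station_name} had {self.available_bikes} "
                f"bikes on {self.format_timestamp()}")


station = BikeStation("Paulina St & Howard St", "2020-11-16T11:55:55.000", "10")
print(station)
```

Assigning through the property inside __init__ means the same validation runs at construction time and on every later update.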
Step 3: Write a second class called Point or Position to store two or more pieces of
BikeStation data.
• The class could store latitude & longitude.
• You should use this class in the BikeStation class instead of storing the values as
primitive data types.
You may use the Author and Article classes presented in lecture as a starting point for these
classes.
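One possible shape for this composition is sketched below. The class name Station is a hypothetical stand-in for BikeStation, reduced to the single idea of holding a Point instead of two separate primitive values; converting the coordinates to float is also an assumption.

```python
class Point:
    """Holds a latitude/longitude pair as floats."""

    def __init__(self, latitude, longitude):
        # The data file stores coordinates as strings, so convert them here.
        self.latitude = float(latitude)
        self.longitude = float(longitude)

    def __str__(self):
        return f"({self.latitude}, {self.longitude})"


class Station:
    """Hypothetical stand-in for BikeStation, showing only the composition."""

    def __init__(self, station_name, location):
        self.station_name = station_name
        self.location = location  # a Point object, not two loose numbers


station = Station("Paulina St & Howard St", Point("42.019159", "-87.673573"))
print(station.station_name, station.location)
```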
Step 4: Do one thing I did not ask you to do explicitly in the assignment.
• There are many things about Object Oriented Programming that may not have been
covered in the lectures. An important part of becoming a programmer is finding
resources and better ways of writing your code.
• Find one other object oriented concept not explicitly required for the assignment in
the lectures, book(s) or using Google and add it to your code.
• The change should improve the modularity of the program or the "reusability" of your
code.
• Document what you did in the code comments as well as in the submission
comments in Canvas.
Step 5: Test your classes using data from the data file bdata.txt
1. Prompt for a file name (bdata.txt; you will find it attached
in a different file named bdata).
2. Open that file and read through the file.
3. Display a custom error message if the file does not exist.
4. Use an appropriate String method to find lines for the BikeStations you were assigned, e.g. lines that contain:
"id": "###" (where ### is the number for one of the Bike Stations you were
assigned).
5. Once you have found the lines, you can pull the BikeStation data out from each line
by splitting the line on a ", " and then splitting the string a second time using a colon.
6. Create a new BikeStation object and add it to a list.
7. Print the BikeStation object.
8. Your code should be efficient (you should not have to open the file multiple times
NOR should you have to loop through the data more than once).
9. Print out the total number of Bikes available for each of the five Bike Stations you
are gathering data for after the file has been completely read.
10. Print out the total number of empty bike docks available for each of the five Bike
Stations you are gathering data for after the file has been completely read.
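Items 4, 5 and 8 can be sketched as follows. This is an assumption-laden illustration: the function names are invented for the example, the sample lines (including station 999) are made up, and the exact spacing of the real bdata.txt lines may differ. Splitting on ", " also does not cleanly handle the nested "location" value, so a real solution would need to treat that field separately.

```python
def parse_fields(line):
    """Split one record line into a dict of field name -> string value."""
    fields = {}
    for part in line.strip().strip("{},").split(", "):
        if ":" not in part:
            continue  # e.g. the second half of a coordinate pair
        # maxsplit=1 keeps the colons inside the timestamp value intact
        key, value = part.split(":", 1)
        fields[key.strip().strip('"')] = value.strip().strip('"')
    return fields


def select_records(lines, wanted_ids):
    """Single pass over the lines, keeping only the wanted station ids."""
    selected = []
    for line in lines:
        if any(f'"id": "{station_id}"' in line for station_id in wanted_ids):
            selected.append(parse_fields(line))
    return selected


# Made-up sample lines in the same shape as the record shown earlier.
sample = [
    '{"id": "515", "timestamp": "2020-11-16T11:55:55.000", '
    '"station_name": "Paulina St & Howard St", "available_bikes": "10"}',
    '{"id": "999", "timestamp": "2020-11-16T11:55:55.000", '
    '"station_name": "Elsewhere", "available_bikes": "4"}',
]
records = select_records(sample, [515, 517])
print(records[0]["station_name"], records[0]["timestamp"])
```

Because select_records accepts any iterable of lines, an open file object can be passed directly, which keeps the work to one pass over the file.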
Your output should look something like this:
Paulina St & Howard St had 10 bikes on 2020-11-16 11:55
Clark St & Jarvis Ave had 1 bikes on 2020-11-16 11:55
Greenview Ave & Jarvis Ave had 3 bikes on 2020-11-16 11:55
Bosworth Ave & Howard St had 1 bikes on 2020-11-16 11:55
Eastlake Ter & Rogers Ave had 2 bikes on 2020-11-16 11:55
Stations: [515, 517, 520, 522, 523] Bikes Available 17 Docks Available 58
OOP Assignment Rubric
BikeStation class with at least 6 attributes defined in an __init__ function that correctly
sets all BikeStation properties.
BikeStation class includes a regular method that returns a calculated value.
BikeStation class overrides the __str__ method and returns a string containing at least
2 pieces of BikeStation data.
BikeStation class uses the @property decorator to make at least one property private.
Both the @property and ???.setter methods for that property are created correctly.
A second class called Point or Position was created and used in the BikeStation class.
Must include an __init__ method.
BikeStation data is read from a file and appropriate methods are used to open, read from
and close the file.
A custom error message is displayed if the file cannot be opened.
The lines containing the BikeStation ids that you are extracting data for are correctly
identified.
Appropriate String methods are used to extract the relevant pieces of BikeStation data
from the lines from the data file.
Relevant BikeStation data is used to create a list of BikeStation objects.
The total number of bikes available and the total number of docks available across the
five BikeStations is calculated and displayed.
Code includes one "Extra" feature
Code includes one object oriented feature not explicitly required in the assignment
description. Feature is documented in the comments.
Code runs correctly
A screenshot file is included demonstrating the program runs and displays data correctly
Code is clear, well commented and has good overall design quality.
Watch out for:
Bad variable or method names.
Convoluted control flow (if and while statements) that are repetitive or that could be
simplified.
Packing too much into one line of code, or too much into one method.
Failing to comment obscure code.
Having too many trivial comments that are simply redundant with the code.
Variables used for more than one purpose.
My solution for assignment 1:
import json


class Bikestation:
    """One reading for a single bike station."""

    def __init__(self, station_id, timestamp, station_name, total_docks,
                 available_docks, available_bikes):
        self.station_id = station_id  # goes through the validating setter
        self.timestamp = timestamp
        self.station_name = station_name
        self.total_docks = total_docks
        self.available_docks = available_docks
        self.available_bikes = available_bikes

    @property
    def station_id(self):
        # Return the backing attribute; returning self.station_id would recurse forever.
        return self._station_id

    @station_id.setter
    def station_id(self, x):
        # The data stores ids as strings, so accept numeric strings and convert.
        if not str(x).isdigit():
            raise ValueError(f"station id must be numeric, got {x!r}")
        self._station_id = int(x)

    def available_docks_percentage(self):
        # Values arrive as strings from the JSON data, so convert before dividing.
        return 100 * (int(self.available_docks) / int(self.total_docks))

    def __str__(self):
        # "2020-11-16T11:55:55.000" -> "2020-11-16 11:55"
        f_date = self.timestamp.replace("T", " ").split(".")[0]
        f_date = f_date[:-3]
        return f"{self.station_name} had {self.available_bikes} bikes on {f_date}"

    def __repr__(self):
        return self.__str__()


def main():
    f_name = input("Enter data source file name: ")
    try:
        # Assumes the whole file parses as one JSON array of station records.
        with open(f_name, 'r') as fp:
            all_stations = json.load(fp)
    except OSError:
        print(f"Unable to open {f_name}")
        return

    chosen_stations = [149, 150, 263, 335, 406]
    chosen_ids = {str(station_id) for station_id in chosen_stations}
    bike_stations = []
    # Single pass over the data: keep only the chosen stations.
    for station in all_stations:
        if station["id"] in chosen_ids:
            bike_stations.append(Bikestation(
                station['id'], station['timestamp'], station['station_name'],
                station['total_docks'], station['available_docks'],
                station['available_bikes']))

    all_bikes = 0
    all_docks = 0
    for bike_station in bike_stations:
        all_bikes += int(bike_station.available_bikes)
        all_docks += int(bike_station.available_docks)
        print(bike_station)
    print(f"Stations: {chosen_stations} Bikes Available {all_bikes} "
          f"Docks Available {all_docks}")


if __name__ == "__main__":
    main()
The professor solution:
Assignment 2
Assignment: JSON and MongoDB
For each assignment we have to do three things:
1- Solve the assignment.
2- Make some corrections or notes (Peer Review) on two students' solutions for
the same assignment.
3- Correct our solution after seeing the students' notes on our solution.
• This is the second assignment in a series of assignments that utilizes bike station data
from https://data.cityofchicago.org
o You will use object oriented programming techniques to create a BikeStation
object with a constructor and multiple methods.
o Run a data modeling process to extract the necessary bike station utilization
information from the JSON data and group the data by station.
o You will then run an analysis to calculate some basic statistics about the data.
o You will create at least three different types of visualizations of the bike
station data you have retrieved and processed and save some plots.
• This assignment is the second part of a project where you will download some of the
bike station data from https://data.cityofchicago.org/resource/eq45-8inv.json and
store it in a MongoDB database.
• The final program will consist of multiple modules including: bike_station.py,
downloader.py, model_data.py, and analysis.py.
• In this assignment you will be creating downloader.py, which downloads live bike
station data from the City of Chicago and saves it to a MongoDB database running
in the cloud, as described in the Assignment Instructions below.
Submitting Your Work
Please Upload Your Initial Submission on the assignment due date:
• A screen shot of you running the downloader.py application to produce the bike
stations database/collection.
• A screen shot of your downloader.py code, showing all code.
• Neither screen shot should show your name.
• You can use https://carbon.now.sh/ to capture a screenshot
of your code even if it is longer than the length of your screen.
Review assignments for two classmates within 4 days of the original assignment
submission date:
• Submitting the reviews within this assignment will be the submission for the peer
review assignment.
• See Code Reviewing for instructions on conducting a Code Review.
Revise your assignment and resubmit to: Assignment: JSON and MongoDB (6020
Final)
• Your actual downloader.py code file as a TEXT file (NOT a Jupyter Notebook file)
- PLEASE add your name to the text file.
• A screen shot of you running the downloader.py application to produce
the bike_stations database/collection.
• A screen shot of the bike_stations collection page in Atlas, showing the number
of bike_station records downloaded.
• If you use ANY CODE you found in another student's assignment, PLEASE:
o Add a comment to that piece of code indicating the source of the code, e.g.
"Lines 12-16 were adapted from code I reviewed"
o Make sure not to use more than 10% of the code from another student.
• Please add a comment in the Canvas comments describing how you updated your
code from the first draft (or if you did no updates).
• Please submit the entire assignment EVEN IF YOU MADE NO CHANGES
FOLLOWING CODE REVIEW.
Overall Project Structure
Assignment Instructions
downloader.py
• The Python code in the following lectures is a good starting point for coding this
assignment.
o Example: Reading JSON Via HTTP
o Example: MongoDB and Atlas
Step 1: Select a set of Bike Stations to take data for
Go to Assignment: Choose Set of Bike Stations to Collect Data For and follow the
instructions on the page to choose the Bike Stations you collect data for. We already did this
and our Bike stations are:
149, 150, 263, 335, 406
No need to choose a new set of Bike Stations if you already did this in the previous
assignment.
Step 2: Download your first Bike Data
• You will need to edit the program to:
o Download data for Chicago Bike Stations using the Bike Station URL instead
of the Colorado Business data URL:
https://data.cityofchicago.org/resource/eq45-8inv.json (don't forget to add the
? on the end so you can add parameters)
o The API for the data can be found here:
https://dev.socrata.com/foundry/data.cityofchicago.org/eq45-8inv
o Instead of printing out data for the individual bike stations extract the entire
array from the JSON data returned - this is the array of bike station readings.
o Use an appropriate python function to get the number of bike station readings
in the list of bike station readings downloaded, to confirm you downloaded the
number of readings you were expecting.
o Ideally you should sort the Bike Station data so you get the most current bike
station readings as eventually you will want to download all of the bike station
data for each of your bike stations for a period of time (a week, or a month) so
you can analyze usage patterns.
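The download step might look like the sketch below. It assumes the endpoint still serves this dataset, uses the standard library's urllib (the lecture example may use something else), and relies on the Socrata query parameters $limit and $order plus a simple equality filter on the id field. The function names are invented for the example.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://data.cityofchicago.org/resource/eq45-8inv.json"


def build_url(station_id, limit=1000):
    """Build the request URL for one station, newest readings first."""
    params = {
        "id": str(station_id),        # equality filter on the id field
        "$order": "timestamp DESC",   # most current readings first
        "$limit": str(limit),         # how many readings to request
    }
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def download_readings(station_id, limit=1000):
    """Fetch and decode the JSON array of readings for one station."""
    with urllib.request.urlopen(build_url(station_id, limit)) as response:
        readings = json.loads(response.read().decode("utf-8"))
    return readings  # a list of dicts, one per bike station reading


# Only the URL is built here; calling download_readings needs network access.
print(build_url(515, limit=5))
```

After a real download, len(readings) gives the number of readings returned, which can be compared with the limit that was requested.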
Step 3: Write Data to MongoDB
• You are now ready to write your data to a MongoDB database!
o If you haven't already done so, create your own MongoDB cluster on Atlas
(see MongoDB Project Atlas) (I will give you my account, and you will find
the MongoDB Project Atlas document attached).
o To connect to your free tier cluster, first, you'll need to import the
MongoClient class from PyMongo.
o Next, you need to actually connect to your free tier cluster. You do that by
instantiating a MongoClient object and specifying the URI for your cluster (copy
the URI connection string for connecting your application from Atlas).
o Remember that you will need to update the password in the connection string.
o Once that is done, you should run the code to test whether or not you
were successful (it works if there are no errors).
o Before the start of your loop you will want to create a database to hold your
bike data (e.g. bikesdb or stationdb), then create a collection to hold your bike
station readings (e.g. bikedata).
o Inside the loop, you will want to ADD the array you get from the JSON data
to the MongoDB bikedata collection. Remember the JSON data contains
multiple bike station readings so you will want to use the command that
allows you to add multiple items to a collection.
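A hedged sketch of the write step using PyMongo's MongoClient and insert_many. The helper names are invented for this example, the database and collection names are the ones suggested above, and the URI in the comment is a placeholder to be replaced with the connection string copied from Atlas.

```python
def get_collection(uri, db_name="bikesdb", collection_name="bikedata"):
    """Connect to the cluster and return the collection for the readings."""
    # Imported here so the sketch can be loaded without PyMongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    # MongoDB creates the database and collection lazily on first insert.
    return client[db_name][collection_name]


def save_readings(collection, readings):
    """Insert a list of reading dicts and return how many were written."""
    if not readings:
        return 0  # insert_many rejects an empty list, so skip the call
    result = collection.insert_many(readings)
    return len(result.inserted_ids)


# Example use (placeholder URI, requires network access and a real cluster):
# collection = get_collection("mongodb+srv://<user>:<password>@<cluster-host>/")
# written = save_readings(collection, readings)
```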
Step 4: Multiple Downloads
• You will be assigned at least 5 bike stations to download the most recent data
for. This means you will need to download data at least 5 times - once for each bike
station you are assigned.
• Check the total number of bike station readings downloaded for each bike station and
generate an error if it is fewer than the target number (e.g. 1000).
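The per-station loop and the count check could be sketched like this. The name download_all and the choice of ValueError are assumptions for the example; fetch stands for whatever function performs the actual download for one station.

```python
def download_all(station_ids, fetch, target=1000):
    """Download readings for each station and fail if any falls short.

    fetch is any callable that takes a station id and returns a list of readings.
    """
    all_readings = {}
    for station_id in station_ids:
        readings = fetch(station_id)
        if len(readings) < target:
            raise ValueError(
                f"station {station_id}: expected at least {target} readings, "
                f"got {len(readings)}"
            )
        all_readings[station_id] = readings
    return all_readings


# Stand-in fetch function that returns three dummy readings per station.
def fake_fetch(station_id):
    return [{"id": str(station_id)}] * 3


print(sorted(download_all([149, 150], fake_fetch, target=3)))
```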
Step 5: Expand the Assignment
• There are many things about Networked programs, JSON, Dates and MongoDB that
were not covered in the lectures.
• Find one other feature or check you can do related to dates in the book(s) or using
Google and add it to your code.
• Ideally the change should improve the automation, accuracy, efficiency or
completeness of your code.
• Document what you did in the code comments as well as in the submission
comments in Canvas.
Step 6: Downloading Data
• Once your code is working, you will want to delete your bike stations collection and
then start your download for real. I suggest deleting the collection at this point
because it is likely that you got duplicate data in your collection while you were
debugging your code.
• Change your bike station ids to the values you were assigned.
• Change the limit to 5000 so you are downloading 5000 bike station readings at a time
for the stations you were assigned.
• Rerun downloader.py until you get at least 25,000 bike station readings - 5000 for
each station you were assigned.
Step 7: Validate Download
• After you download your data you will want to validate that the data has been written
to MongoDB correctly.
• Verify that you have at least 5000 bike station readings in the dataset for each station
ID using a MongoDB query with a filter.
• Verify that you have at least one month of data in the data set for each station ID
using a MongoDB query to extract the bike station record with the earliest time.
o You can sort the data to get the first document for a particular station by
putting a sort statement in the parentheses of the find_one method then extract
the "timestamp" value from the data and display it.
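Both checks can be sketched with PyMongo's count_documents and find_one. The sketch assumes the readings were stored as downloaded, so "id" is a string and "timestamp" is an ISO-style string whose alphabetical order matches chronological order; the function names are invented for the example.

```python
def count_readings(collection, station_id):
    """Number of stored readings for one station, using a filter."""
    return collection.count_documents({"id": str(station_id)})


def earliest_timestamp(collection, station_id):
    """Timestamp of the oldest stored reading for one station, or None."""
    doc = collection.find_one(
        {"id": str(station_id)},
        sort=[("timestamp", 1)],  # 1 means ascending, so the earliest comes first
    )
    return doc["timestamp"] if doc else None


# Example use with a real PyMongo collection object:
# for station_id in [149, 150, 263, 335, 406]:
#     print(station_id, count_readings(collection, station_id),
#           earliest_timestamp(collection, station_id))
```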
Data Collection Options:
• All students must collect 4,000 to 6,000 bike station readings for 5 different bike
stations.
• Students will select different bike station groupings here: Assignment: Choose Set of
Bike Stations to Collect Data For
• The groups are for nearby bike stations so you can estimate the number of bikes
available at any given time within a small area of Chicago.
The professor solution.. we MUST change it because if he find out that we have his solution
he will accuse me of cheating (we have to cahge the names some of the way to solve the
assignment it is fine to made some small mistakes on purpose so he can not know that we
have the answers)
Peer Reviews:
Peer Reviews Requirements:
In this course, you will read your classmates’ code and give them comments about
it. Although you can make comments about anything you think is relevant, the primary goal
of this class is to learn how to write code that is safe from bugs, easy to understand, and ready
for change. Read the code with those principles in mind.
For each program you review, you should:
• Compare the code to the rubric to determine whether the submission meets all of
the requirements set forth in the assignment and rubric (be sure to complete the
rubric!). That is - did the student implement a program that provides all of the
information it needs to incorporate?
• Make a comment by clicking on a line of code or selecting a range of lines, and
typing your comment or by adding a text comment in the Canvas comment window.
Things you might comment on include (see Code Reviewing guidelines for details):
o Bugs or potential bugs: Repetitive code, Inconsistent indentation, Spelling or
capitalization mistakes
o Format: Format in Python is essential to program function. Format includes
use of proper indenting, the use of whitespace and comments ...
o Unclear, messy code: Bad variable or method names, poor comments...
o Design Quality: The design chosen should be clear and concise. Is the
solution chosen excellent, better than average, average or worse than other
ways of approaching the given problem? Design quality problems might
include: Repetitive code, Global variables, Convoluted control flow that could
be simplified, etc.
Follow the Guidelines outlined here:
Code Reviewing
In this course, you will read your classmates’ code and give them comments
about it. This document describes the whys and hows of the code reviewing
process.
You can’t learn how to write without learning how to read – and programming is no
exception. Code reviewing is widely used in software development, both in industry and in
open source projects. Some companies like Google have instituted review-before-commit as a
required policy. You can’t get a line of code into the Google source code repository unless
another Google engineer has read it, given feedback about it, and signed off on it.
The benefits of code review in practice are several. Reviewing helps find bugs, in a way
that’s complementary to other techniques (like static checking, testing, assertions, and
reasoning). Reviewing uncovers code that is confusing, poorly documented, unsafe, or
otherwise not ready for maintenance or future change. Reviewing also spreads knowledge
through an organization, allowing developers to learn from each other by explicit feedback
and by example. So code reviewing is not only a practically important skill that you will need
in the real world, but also a learning opportunity.
What to Look For
Although you can make comments about anything you think is relevant, the primary goal of
this class is to learn how to write code that is safe from bugs, easy to understand, and ready
for change. Read the code with those principles in mind. Here are some concrete examples of
problems to look for:
Bugs or potential bugs.
• Repetitive code (remember DRY, Don’t Repeat Yourself).
• Disagreement between code and specification (your code does not do everything
required).
• Relying on the assignment operator instead of the equality operator: (=) instead
of (==)
• Off-by-one errors: Remember that a loop doesn’t count the last number you specify in
a range. So, if you specify the range [1:11], you actually get output for values between
1 and 10.
• Inconsistent indentation. Many Python features rely on indentation and getting it
wrong can create a bug.
• Placing function calls in the wrong order when creating complex statements: Python
always executes chained method calls from left to right. So the statement
MyString.strip().center(21, "*") produces a different result than MyString.center(21,
"*").strip().
• Misplacing punctuation: You can put punctuation in the wrong place and create an
entirely different result. Remember that you must include a colon at the end of each
structural statement. In addition, the placement of parentheses is critical. For example,
(1 + 2) * (3 + 4), 1 + ((2 * 3) + 4), and 1 + (2 * (3 + 4)) all produce different results.
• Using the wrong capitalization: Python is case sensitive, so MyVar is different from
myvar and MYVAR. Always check capitalization when you find that you can’t access
a value you expected to access.
• Making a spelling mistake: Even seasoned developers suffer from spelling errors at
times. Ensuring that you use a common approach to naming variables, classes, and
functions does help. However, even a consistent naming scheme won’t always
prevent you from typing MyVer when you meant to type MyVar.
• Optimistic, insecure programming.
• etc.
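Three of the pitfalls above can be reproduced in a few lines. The sample string and variable names are made up for the illustration.

```python
# Off-by-one: the end of a range is excluded, so this covers 1 through 10.
values = list(range(1, 11))
print(values[0], values[-1], len(values))

# Call order: chained methods run left to right, so the results differ.
text = "  hello  "
stripped_then_centered = text.strip().center(21, "*")
centered_then_stripped = text.center(21, "*").strip()
print(stripped_then_centered)
print(centered_then_stripped)

# Parentheses: the same numbers grouped three ways give three results.
print((1 + 2) * (3 + 4), 1 + ((2 * 3) + 4), 1 + (2 * (3 + 4)))
```

In the second case the final strip() has nothing to remove, because the outer characters are asterisks by the time it runs, so the inner spaces survive.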
Unclear, messy code.
• Bad variable or method names.
• Convoluted control flow (if and while statements) that could be simplified.
• Packing too much into one line of code, or too much into one method.
• Failing to comment obscure code.
• Having too many trivial comments that are simply redundant with the code.
• Variables used for more than one purpose.
• etc.
Positive comments are also a good thing. Don’t be afraid to make comments about things you
really like, for example:
• Unusually elegant code.
• Creative solutions.
• Great design.
Process
We will be using the Canvas Peer Reviewing system to allow you to review two student's
code and provide helpful feedback.
Here’s how the process will typically go. Only 80 point assignment code will be reviewed.
• Initial due date: Submit a screen shot of the assignment code and a screen shot of the code
running to Canvas as separate files.
• 1-2 days after initial due date: you should visit Canvas, read the code that you were
assigned, and make comments about it. For each piece of code you review, you
should:
o Compare the code to the rubric to determine whether the submission meets
all of the requirements set forth in the assignment and rubric (be sure to
complete the rubric!). That is - did the student implement a program that
provides all of the information it needs to incorporate?
o Make a comment by clicking on a line of code or selecting a range of lines,
and typing your comment or by adding a text comment in the Canvas
comment window. Things you might comment on include:
- Format: Format in Python is essential to program function. Format
includes use of proper indenting, the use of whitespace and comments.
- Modularity in Design: Avoid accomplishing too many tasks in one
function.
- Design Quality: The design chosen should be clear and concise. Is the
solution chosen excellent, better than average, average or worse than
other ways of approaching the given problem? Design quality
problems might include: Repetitive code, Global variables,
Convoluted control flow that could be simplified, Variables used for
more than one purpose.
o 2 days after initial due date: code reviews are due.
• Day after code review date: Look at your own code, start revising it for
resubmission, and re-submit your work by the resubmission deadline (the assignment
link will disappear one week after the initial assignment was due). This approach to
re-grades is sometimes referred to as the “mastery approach”. You will get 50% credit
for any improvements you make to your code after the initial submission. So, if your
grade on the initial assignment would have been an 80% and you get 100% on the
revision, your final grade will be 90%. If you got 50% on the initial submission and
got 90% on the regrade, your final grade will be 70%. The code reviews can give you
an idea of what your initial grade is likely to be BUT all assignments will be reviewed
by a grader to determine both the initial and final grades on the assignment.
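The regrade arithmetic above can be written as a small function. The function name is invented for this illustration, and treating a lower revised score as "no improvement" is an assumption, since the text only describes improvements.

```python
def final_grade(initial, revised):
    """Initial grade plus half of any improvement made in the revision."""
    # Assumption: a revision that scores lower does not reduce the grade.
    improvement = max(revised - initial, 0)
    return initial + 0.5 * improvement


print(final_grade(80, 100))  # the first example from the text
print(final_grade(50, 90))   # the second example from the text
```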
Privacy & Visibility
As a code author, you are anonymous. The system does not display your name with any of
the code you wrote. Please don’t put your full name, username, email address, or other
identifying information in your source code.
As a reviewer, you are anonymous. The system does not display your name BUT the
instructor will be able to see your name when viewing the reviews.
Respect
Be polite. Sarcasm, insults, and belittling words have no place in a code review. It doesn’t
matter whether you’re talking about a person (a fellow reviewer or a code author) or about
code. Don’t call code “stupid,” because that transfers all too easily to the author of the code,
whether you meant it that way or not.
Be constructive. Don’t just criticize, but be helpful: point the way toward solutions.
“Hopeless mess” is not a constructive comment; “name the variables more descriptively, e.g.
tmp1 is not a great name” is much more constructive.
As a code author, you should read your feedback carefully, and keep an open mind. Don’t get
defensive. If your reviewers – who are fellow students, TAs and your instructor – find your
code confusing, then you should consider what this says about its clarity and maintainability
in the real world. If you disagree with a comment, you can indicate what you disagree with in
a Canvas comment when you submit the final version of your code.
FAQ for Reviewers
Does reviewing affect my grade? Yes, it contributes to your Exercise/Peer review grade.
The two reviews you do for a given assignment are worth 5 points (for the two together), just like a
homework exercise.
Does my reviewing affect other people’s grades? Not directly. The instructor will grade the
initial and final assignment submission independent of your review. But your reviewing will
hopefully help other students become better programmers and acquire a better understanding
of the course, and indirectly improve their grades. You may also learn something about
programming and the assignment expectations by completing a review.
How many programs do I need to review, and how much do I need to do on each one?
You will be assigned two reviews for the four 80 point assignments in the class. At a
minimum, you must do at least two things on each program assigned to you – complete the
rubric and make a comment.
I don’t feel like I know anything. How can I possibly review other people’s code? You
know more than you think you do. You can read for clarity, comment on places where the
code is confusing, look for problems we read about or talked about in class, etc.
What if I can’t find anything wrong? You can write a specific positive comment about
something good.
How much of the whole program do I have to try to understand? You do not have to
understand the entire program. You can look for the key things asked for in the programming
rubric and click "Full Marks" if you find them and "No Marks" if you do not. This is useful
because if you cannot find something and it is there, it may mean that the code you
are reviewing is not clear. If you do see something you think is wrong, please comment to
help the other student out.
How can I find out who wrote this code? Code authors are anonymous. If there is a serious
situation requiring the identity of the code author – e.g., a potential case of plagiarism – then
bring it to the attention of the teaching staff.
FAQ for Code Authors
Can somebody use my code to cheat in the class/can I copy the code I review? It is
possible that a student could use ideas from the code they review in their final assignment
submission. However, copying the work of current or past ISMG 4400/6020 students is still
considered academic dishonesty - so never cut and paste another student's code into your
program. If you figure out how to correct an error in your code by seeing it in another
student's code, please do the following:
1. Edit your code to add the new piece of logic you got from another student
2. Add a comment before the code you added indicating "Lines 12-16 were adapted
from code I reviewed"
3. Make sure not to use more than 10% of the code from another student.
4. Do not type the other student's code directly into your code.
5. Remember, partial credit on an assignment is always better than a 0.
Can I look at the reviews of my code while reviewing is still in progress? Yes, you can
see your reviews as soon as they are posted.
You will receive points for completing both reviews, including meaningful comments, and for
correctly identifying the presence (or absence) of required program features using the
assignment rubric.
Assignment 3
Assignment: Data Modeling & Secure Coding
• This is the third assignment in a series of assignments that utilizes bike station data
from https://data.cityofchicago.org
o You will use object oriented programming techniques to create a BikeStation
object with a constructor and multiple methods.
o Run a data modeling process to extract the necessary bike station utilization
information from the JSON data and group the data by station.
o You will then run an analysis to calculate some basic statistics about the data.
o You will create at least three different types of visualizations of the bike
station data you have retrieved and processed and save some plots.
• The final program will consist of multiple modules including: bike_station.py,
downloader.py, model_data.py, and analysis.py.
• In this part of the project you will be creating model_data.py, which will extract
relevant data from the BikeStations stored in MongoDB and write them to a SQLite
database. This assignment will also include an error checking process to validate that
the downloaded data has the correct format.
Submitting Your Work
Please Upload Your Initial Submission on the assignment due date:
• A screen shot of you running the model_data.py application to produce the
index.sqlite database/collection - with some print statements to show updates are
made every 100 or so records.
• A screen shot of your model_data.py code - showing all code.
• Neither screen shot should show your name.
• You can use https://carbon.now.sh/ (Links to an external site.) to capture a screenshot
of your code even if it is longer than the length of your screen.
Review assignments for two classmates within 4 days of the original assignment
submission date:
• Submitting the reviews within this assignment will be the submission for the peer
review assignment.
• See Code Reviewing for instructions on conducting a Code Review.
Revise your assignment and resubmit to: Data Modeling & Secure Coding (6020 Final)
• Your actual model_data.py code file as a TEXT file (NOT A Jupyter Notebook file)
- PLEASE add your name to the text file.
• A screen shot of you running the model_data.py application to produce the
index.sqlite database/collection - with some print statements to show updates are
made every 100 or so records.
• A screen shot of the index.sqlite database.
• If you use ANY CODE you found in another student's assignment PLEASE
o Add a comment to that piece of code indicating the source of the code e.g.
"Lines 12-16 were adapted from code I reviewed"
o Make sure not to use more than 10% of the code from another student.
• Please add a comment in the Canvas comments describing how you updated your
code from the first draft (or if you did no updates).
• Please submit the entire assignment EVEN IF YOU MADE NO CHANGES
FOLLOWING CODE REVIEW.
Project Structure
model_data.py
• model_data.py reads the rough/raw data from the MongoDB downloads and produces
a cleaned-up and well-modeled version of the data in the file index.sqlite.
• The file index.sqlite will be much smaller (often 10X smaller for data sets with more
data attributes) than the raw data because it only contains the data you will want to
use in your data analytics tasks - and should not include data that can be calculated
from other data in the file.
• Each time model_data.py runs, it should completely wipe out and rebuild the
index.sqlite database, allowing you to adjust its parameters and edit the mapping
tables in index.sqlite to tweak the data modeling process.
• It should store the data necessary to create a BikeStation object (Assignment 1) as
downloaded from the MongoDB data (Assignment 2).
• It should do some validation on the data.
• Running model_data.py can take quite a bit of time because it loops through every
record and extracts relevant fields from the data. It will run faster if you do not write
every item to the screen and only commit the data to the database every 50 to 100
records.
• The Python code in the following lectures is a good starting point for coding this
assignment.
o Example: Python and Databases (I will attach the lecture)
o Example: MongoDB and Atlas (I will attach the lecture)
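The "wipe out and rebuild" and "commit every 50 to 100 records" points above can be sketched roughly as follows. This is only an illustration under stated assumptions: the table name bike_stations, the column names, the batch size of 100 and the sample rows are all hypothetical, and an in-memory database stands in for index.sqlite so the sketch runs on its own.

```python
import sqlite3

COMMIT_EVERY = 100  # assumed batch size; the handout suggests 50 to 100


def rebuild_table(conn):
    """Drop and re-create the table so every run starts from a clean slate."""
    conn.execute("DROP TABLE IF EXISTS bike_stations")
    conn.execute(
        """CREATE TABLE bike_stations (
               station_id INTEGER,
               station_name TEXT,
               available_bikes INTEGER,
               docks_in_service INTEGER,
               reading_time TEXT
           )"""
    )


def load_rows(conn, rows):
    """Insert rows, committing and printing progress once per batch."""
    count = 0
    for row in rows:
        conn.execute("INSERT INTO bike_stations VALUES (?, ?, ?, ?, ?)", row)
        count += 1
        if count % COMMIT_EVERY == 0:
            conn.commit()
            print(f"{count} records written")
    conn.commit()  # flush the final partial batch
    return count


# Hypothetical rows; ":memory:" keeps the demo self-contained, whereas
# the assignment itself would connect to "index.sqlite".
conn = sqlite3.connect(":memory:")
rebuild_table(conn)
sample_rows = [(i, f"Station {i}", 5, 15, "2019-10-31T10:00:00") for i in range(250)]
written = load_rows(conn, sample_rows)
```

Committing once per batch rather than once per row is what keeps the run time down; the final commit after the loop is needed so that a last, partial batch is not lost.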
Reading and Writing Data
• You will need to update the SQL queries to create and insert bike station data into a
database table called bike_stations - or some other reasonable name.
• It should store the data necessary to create a BikeStation object (Assignment 1).
• Instead of reading raw JSON data from the web as done in the example, your
program will connect to your MongoDB database and collection created in
Assignment 2.
• Then you can loop through all of the documents in your bike_station collection,
extract the relevant data from the dictionary object for each bike station and insert it
into the database table.
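As a rough sketch of the loop described above, the snippet below pulls a few fields out of each document dictionary and inserts them into a table. The field names (id, station_name, available_bikes, docks_in_service, timestamp) and the sample documents are assumptions for illustration, and a plain list stands in for the MongoDB collection; with pymongo the same loop would typically iterate over collection.find().

```python
import sqlite3


def station_row(doc):
    """Pick out the fields assumed necessary to rebuild a BikeStation."""
    return (
        doc["id"],
        doc["station_name"],
        doc["available_bikes"],
        doc["docks_in_service"],
        doc["timestamp"],
    )


# Hypothetical stand-in for the documents in the bike_station collection.
documents = [
    {"_id": "a1", "id": "2", "station_name": "Buckingham Fountain",
     "available_bikes": "11", "docks_in_service": "39",
     "timestamp": "2019-10-31T10:00:00", "status": "In Service"},
    {"_id": "a2", "id": "3", "station_name": "Shedd Aquarium",
     "available_bikes": "20", "docks_in_service": "55",
     "timestamp": "2019-10-31T10:00:00", "status": "In Service"},
]

conn = sqlite3.connect(":memory:")  # the assignment would use "index.sqlite"
conn.execute(
    "CREATE TABLE bike_stations (station_id, station_name, "
    "available_bikes, docks_in_service, reading_time)"
)
rows = [station_row(doc) for doc in documents]
conn.executemany("INSERT INTO bike_stations VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()
```

Fields that are not needed for the analysis (here _id and status) are simply left out, which is how the SQLite file ends up smaller than the raw data.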
Secure Coding
• It should extract the data from the MongoDB data and store it in individual variables
for each piece of data you are writing to the database.
• It should cast numeric variables as int() or float() and break out of the data processing
loop if the data is not of the correct format.
• It should convert the ISO formatted date to a datetime to validate it as well BUT,
since SQLite has no native date/time type, you should store the data in the database
as either an ISO formatted date or as a numeric time stamp.
• It should use a parameterized query to minimize the opportunity for SQL injection.
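One possible way to combine these four checks is sketched below; it is not the lecture code. The field names, the sample documents and the helper name validate are hypothetical. Casting with int() and float() and parsing with datetime.fromisoformat() raise an exception on malformed input, which the loop uses as its signal to stop.

```python
import sqlite3
from datetime import datetime


def validate(doc):
    """Return a clean row tuple; raises if any field is missing or malformed."""
    station_id = int(doc["id"])
    name = str(doc["station_name"])
    bikes = int(doc["available_bikes"])
    docks = int(doc["docks_in_service"])
    latitude = float(doc["latitude"])
    # fromisoformat raises ValueError when the text is not an ISO date
    reading_time = datetime.fromisoformat(doc["timestamp"])
    # Store the date as ISO text because SQLite has no date/time type
    return (station_id, name, bikes, docks, latitude, reading_time.isoformat())


# Hypothetical documents: the second has a hostile name, the third is malformed.
documents = [
    {"id": "2", "station_name": "Buckingham Fountain", "available_bikes": "11",
     "docks_in_service": "39", "latitude": "41.876", "timestamp": "2019-10-31T10:00:00"},
    {"id": "3", "station_name": "x'); DROP TABLE bike_stations;--", "available_bikes": "4",
     "docks_in_service": "15", "latitude": "41.867", "timestamp": "2019-10-31T10:00:00"},
    {"id": "4", "station_name": "Burnham Harbor", "available_bikes": "eleven",
     "docks_in_service": "23", "latitude": "41.856", "timestamp": "2019-10-31T10:00:00"},
]

conn = sqlite3.connect(":memory:")  # the assignment would use "index.sqlite"
conn.execute(
    "CREATE TABLE bike_stations (station_id INTEGER, station_name TEXT, "
    "available_bikes INTEGER, docks_in_service INTEGER, latitude REAL, reading_time TEXT)"
)

stored = 0
for doc in documents:
    try:
        row = validate(doc)
    except (KeyError, TypeError, ValueError) as err:
        print(f"Bad record, stopping: {err!r}")
        break  # leave the processing loop on the first malformed record
    # The ? placeholders pass values as data, never as SQL text
    conn.execute("INSERT INTO bike_stations VALUES (?, ?, ?, ?, ?, ?)", row)
    stored += 1
conn.commit()
```

Because the values travel through placeholders, the hostile station name in the second document is stored as an ordinary string and the table survives.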
Expanding the Assignment.
• As with prior assignments, look for ways you can improve your code that may not
have been covered in the lectures.
• Find one other feature or check you can do based on information discussed in class, in
the book(s) or using Google and add it to your code.
• Ideally the change should improve the automation, accuracy, efficiency or
completeness of your code.
• Document what you did in the in-code comments as well as in the submission
comments in Canvas.
The Final Product
• When you are done, you will have a nicely indexed version of the bike station data in
index.sqlite.
• This is the file to use to do data analysis (Assignment 4).
The professor solution.. we MUST change it because if he find out that we have his solution
he will accuse me of cheating (we have to cahge the names some of the way to solve the
assignment it is fine to made some small mistakes on purpose so he can not know that we
have the answers)
Assignment 4
Assignment: Data Analysis and Visualization (6020 Initial)
• This is the fourth and final assignment in a series of assignments that utilizes bike
station data from https://data.cityofchicago.org
o You will use object oriented programming techniques to create a BikeStation
object with a constructor and multiple methods.
o Run a data modeling process to extract the necessary bike station utilization
information from the JSON data and group the data by station.
o You will then run an analysis to calculate some basic statistics about the data.
o You will create at least three different types of visualizations of the bike
station data you have retrieved and processed and save some plots.
• The final program will consist of multiple modules
including: bike_station.py, downloader.py, model_data.py, and analysis.py.
• In this part of the project you will be creating analysis.py, which will extract relevant
data from the BikeStations stored in an SQLite database and calculate some basic
statistics on the data using either NumPy or Pandas.
• You will then create at least three different types of visualizations of the data you
have retrieved and processed and save some plots. For example:
o Either a bar chart or a stacked bar chart to visualize the average number of
bikes (or docks) available by bike station or by hour of the day or day of the
week.
o A histogram of the number of bikes available in the data set.
o A scatter plot to show how the number of bikes available is changing over
time.
Submitting Your Work
Please Upload Your Initial Submission on the assignment due date:
• A screen shot of you running the analysis.py application to compute basic
histogram data and statistics on the data you have retrieved.
• A .png file for each of the plots you made with analysis.py.
• A screen shot of your analysis.py code - showing all code.
• None of the screen shots should show your name.
• You can use https://carbon.now.sh/ (Links to an external site.) to capture a screenshot
of your code even if it is longer than the length of your screen.
Review assignments for two classmates within 4 days of the original assignment
submission date:
• Submitting the reviews within this assignment will be the submission for the peer
review assignment.
• See Code Reviewing for instructions on conducting a Code Review.
Revise your assignment and resubmit to: Assignment: Data Analysis and Visualization
(6020 Final)
• Your actual analysis.py code file as a TEXT file (NOT A Jupyter Notebook file)
- PLEASE add your name to the text file.
• A screen shot of you running the analysis.py application to compute basic
histogram data and statistics on the data you have retrieved.
• A .png file for each of the plots you made with analysis.py.
• If you use ANY CODE you found in another student's assignment PLEASE
o Add a comment to that piece of code indicating the source of the code e.g.
"Lines 12-16 were adapted from code I reviewed"
o Make sure not to use more than 10% of the code from another student.
• Please add a comment in the Canvas comments describing how you updated your
code from the first draft (or if you did no updates).
• Please submit the entire assignment EVEN IF YOU MADE NO CHANGES
FOLLOWING CODE REVIEW.
Project Structure
analysis.py
The Python code in the following lectures is a good starting point for coding this
assignment.
• Example: Analysis and Visualization (I will attach the lecture)
• Exercise: Pandas Solution (I will attach the lecture)
Step 1:
• Add the appropriate import statements to allow you to use the SQLite database,
statistics and plotting modules.
• Make the database connection and run a select query.
• Get data from the database and add it to dictionaries and/or lists so it can be used in
the analysis.
• You can either put data values into individual lists for each column of data in the
database OR create a list of BikeStation objects using the BikeStation class
created in the OOP assignment.
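A minimal sketch of the "individual lists" option in Step 1 is shown below. The table layout, column names and sample readings are assumptions, and the sketch builds a small in-memory database first so that it can run without an existing index.sqlite file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the assignment would open "index.sqlite"
conn.execute(
    "CREATE TABLE bike_stations (station_name TEXT, available_bikes INTEGER, "
    "docks_in_service INTEGER, reading_time TEXT)"
)
# Hypothetical readings so the select query has something to return
conn.executemany(
    "INSERT INTO bike_stations VALUES (?, ?, ?, ?)",
    [
        ("Shedd Aquarium", 20, 55, "2019-11-04T08:00:00"),
        ("Buckingham Fountain", 11, 39, "2019-11-04T08:00:00"),
        ("Buckingham Fountain", 7, 39, "2019-11-04T08:10:00"),
    ],
)

# One list per column, filled in the order the query returns the rows
names, bikes, docks, times = [], [], [], []
query = (
    "SELECT station_name, available_bikes, docks_in_service, reading_time "
    "FROM bike_stations ORDER BY reading_time, station_name"
)
for name, n_bikes, n_docks, when in conn.execute(query):
    names.append(name)
    bikes.append(n_bikes)
    docks.append(n_docks)
    times.append(when)
conn.close()
```

Keeping the lists in the same row order means that position i in every list refers to the same reading, which the later filtering and plotting steps rely on.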
Step 2: Statistical Analysis
Statistical analysis can be done either using NumPy or Pandas
• If you created individual lists/dictionaries containing data you should:
o Convert lists to NumPy arrays
o Calculate basic statistics around average bikes available and docks in service
by station using a NumPy filter to filter the NumPy arrays by station
o Print the statistics to the screen
• If you created a list of BikeStation objects you will need to convert the list of
BikeStation objects to a Pandas DataFrame.
o See https://stackoverflow.com/questions/47623014/converting-a-list-of-objects-to-a-pandas-dataframe
for an example of how to use a custom method to turn your object into a
dictionary AND use list comprehension to create a list of BikeStation
dictionaries to create your DataFrame.
o Create a Pandas pivot table to calculate means for average bikes available
and docks in service by station and display the pivot table to the screen.
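Both routes in Step 2 can be sketched side by side on the same small data set. Everything named here is an assumption for illustration: the sample values, the column names, and the StationReading class with its to_dict method, which is only a stand-in for the BikeStation class from the OOP assignment.

```python
import numpy as np
import pandas as pd

# Route 1: NumPy arrays filtered with a boolean mask per station
names = np.array(["Shedd", "Fountain", "Shedd", "Fountain"])
bikes = np.array([4, 10, 6, 12])
docks = np.array([15, 20, 15, 20])

numpy_means = {}
for station in np.unique(names):
    mask = names == station  # True where the reading belongs to this station
    numpy_means[station] = (bikes[mask].mean(), docks[mask].mean())
    print(station, numpy_means[station])


# Route 2: list of objects -> list of dictionaries -> DataFrame -> pivot table
class StationReading:
    """Hypothetical stand-in for the BikeStation class."""

    def __init__(self, station_name, available_bikes, docks_in_service):
        self.station_name = station_name
        self.available_bikes = available_bikes
        self.docks_in_service = docks_in_service

    def to_dict(self):
        return {
            "station_name": self.station_name,
            "available_bikes": self.available_bikes,
            "docks_in_service": self.docks_in_service,
        }


readings = [StationReading(n, int(b), int(d)) for n, b, d in zip(names, bikes, docks)]
df = pd.DataFrame([r.to_dict() for r in readings])
pivot = df.pivot_table(
    index="station_name",
    values=["available_bikes", "docks_in_service"],
    aggfunc="mean",
)
print(pivot)
```

The two routes should agree: the boolean mask and the pivot table both group the readings by station before averaging.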
Step 3: Generating plots
• You can use the lists of data created for your statistical analysis to generate the plots.
• If you are using a Pandas DataFrame, you can get a Pandas Series from the DataFrame
for each column in the data frame, and those series can be used just like a list in
matplotlib.pyplot.
• Create at least three different types of plots using the ideas below:
o Create a bar chart showing the average bikes and/or docks available by bike
station (can also be made directly from the Pandas DataFrame).
o Create a bar chart showing the average bikes and/or docks available by day of
the week (Monday, Tuesday ...) or hour of the day. This will require you to
convert the timestamp to a datetime so you can create a new list of data for the
hour/day associated with each BikeStation reading.
o Create a histogram of the number of bikes available (or docks
available) for each 10 minute measurement in the data set.
o Create a histogram of the number of bikes available (or docks
available) for each 10 minute measurement in the data set for an individual
BikeStation.
o Create a scatter plot of the number of bikes available or docks in service for
each BikeStation over time.
o Calculate the average bikes available by day for each BikeStation and create a
line plot showing the change over time.
• Create a title, xlabel and ylabel for each of your plots.
• Make sure the scale used is appropriate for the plot.
• Call savefig for each of your plots and save as a png.
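The day-of-week bar chart idea, together with the title, label, scale and savefig requirements, might look roughly like the sketch below. The sample timestamps and counts are made up, and the plot is written to a temporary directory only to keep the sketch self-contained; analysis.py would more likely save it next to the script.

```python
import os
import tempfile
from collections import defaultdict
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt

# Hypothetical readings: ISO timestamps and bikes available at each one
times = ["2019-11-04T08:00:00", "2019-11-04T17:00:00", "2019-11-05T08:00:00"]
bikes = [4, 8, 9]

# Convert each timestamp to a datetime, then group the counts by day name
by_day = defaultdict(list)
for when, count in zip(times, bikes):
    day = datetime.fromisoformat(when).strftime("%A")
    by_day[day].append(count)
averages = {day: sum(values) / len(values) for day, values in by_day.items()}

fig, ax = plt.subplots()
ax.bar(list(averages.keys()), list(averages.values()))
ax.set_title("Average bikes available by day of week")
ax.set_xlabel("Day of week")
ax.set_ylabel("Average bikes available")
ax.set_ylim(0, max(averages.values()) + 1)  # start the scale at zero

out_path = os.path.join(tempfile.mkdtemp(), "avg_bikes_by_weekday.png")
fig.savefig(out_path)
plt.close(fig)
```

The same grouping step works for hour of the day by swapping the strftime call for the datetime's hour attribute.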
Step 4: Expanding the Assignment.
• As with prior assignments, look for ways you can improve your code that may not
have been covered in the lectures.
• Find one other analysis or visualization you can do based on information discussed in
class, in the book(s) or using Google and add it to your code.
• Ideally the change should improve the analysis you did or the ability of the user to
understand the data.
• Document what you did in the in-code comments as well as in the submission
comments in Canvas.
The professor solution.. we MUST change it because if he find out that we have his solution
he will accuse me of cheating (we have to cahge the names some of the way to solve the
assignment it is fine to made some small mistakes on purpose so he can not know that we
have the answ