Powerpoint of collected data, writing homework help

User Generated

mlq5846316

Business Finance

Description

A 5-slide PowerPoint. Integrate the data analysis I already have (done in RStudio and Excel) into PowerPoint slides. We analyze free-throw percentage in the NBA with a few models to determine correlation and fit other models. Please put everything we have into a few slides based on the instructions I will attach.

If you know how to use RStudio, this should be about 10 minutes of work for you.

setwd("C:/Users/kaziz/Desktop")
library(readr)
Basketball <- read_csv("C:/Users/kaziz/Desktop/MKTG 480N Basketball analytics.csv")


# to help generate correlation plots
install.packages("PerformanceAnalytics", repos = "http://cran.us.r-project.org")

library(PerformanceAnalytics)

# to help visualize correlation in color
install.packages("corrplot", repos = "http://cran.us.r-project.org")

library(corrplot)

#See some Descriptive statistics about our Basketball dataset
summary(Basketball)

attach(Basketball)
plot(Basketball$FTP,Basketball$AST)


```{r, results='hide'}
# Using the function chart.Correlation from "PerformanceAnalytics" package,
# we can create a correlation matrix easily, much easier than built in functions

# However, before that, we need to pick out the numerical variables
# because we cannot run correlation matrix with categorical data or missing data
Basketball.num = sapply(Basketball, is.numeric) # label TRUE FALSE for numerical variables
num = Basketball[,Basketball.num] # selecting only numerical variables
```

chart.Correlation(num)


correlation = cor(num, use = "complete.obs")
corrplot(correlation, type="upper")

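# Encode playing position as a number: PG = 1, SG = 2, SF = 3, PF = 4, anything else (C) = 5
Basketball$Post = ifelse(Basketball$Pos == "PG", 1,
                  ifelse(Basketball$Pos == "SG", 2,
                  ifelse(Basketball$Pos == "SF", 3,
                  ifelse(Basketball$Pos == "PF", 4, 5))))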


```{r}
# first load package "caTools"
library(caTools)

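# Split the rows at random: 70% training data / 30% test data.
# We create a logical vector called "indicator", where indicator = TRUE marks about 70% of the rows.
# sample.split expects a vector with one entry per row (here the outcome FTP), not the whole data frame.
indicator = sample.split(Basketball$FTP, SplitRatio = 0.7)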

# Extract out the data based on whether indicator variable is TRUE or FALSE
testing = Basketball[!indicator,] # getting 30% of the data as testing
training = Basketball[indicator,] # getting 70% of the data as training

# To build a linear regression model, give this model a name "linear".
# Passing data = training makes the model use the training rows explicitly (no attach() needed):
linear = lm(FTP ~ Post + FGP + `3PP` + AST + TRB + TOV + BLK + `PS/G` + MP, data = training)


# To see the result of model:
summary(linear)

plot(linear)

# To predict FTP for the testing dataset using the linear model we built
testing$linear_prediction = predict(linear, newdata = testing)

# To see the accuracy of prediction:
accuracy = testing$linear_prediction - testing$FTP  # prediction error per player
percent = accuracy/testing$FTP                      # error relative to the actual FTP
mean(abs(percent), na.rm = TRUE) # average fraction by which predictions miss the actual value
```


Explanation & Answer


1.1 Number of Games per Season
In [3]:
games = df.drop_duplicates("game_id") \
.groupby(["season", "playoffs"]).size() \
.unstack()
games.head(3)

Out[3]:

playoffs     playoffs  regular
season
2006 - 2007        79     1218
2007 - 2008        84     1227
2008 - 2009        85     1231

In [4]:
fig, ax = plt.subplots(1,2, figsize=(15,5))
plt.suptitle("Number of Games per Season", y=1.03, fontsize=20)
games.regular.plot(marker="o", rot=90, title="Regular Season", color="#41ae76", ax=ax[0])
games.playoffs.plot(marker="o", rot=90, title="Playoffs", ax=ax[1])

Out[4]:


A full regular season has 1230 games (30 teams playing 82 games each, with every game shared by two teams); counting the All-Star game gives 1231. The exception is the 2011-2012 season, which was shortened by a lockout, hence the big drop in the diagram. The number of games is also not exactly 1231 for every season because for some games no data was available during the scraping process.
The playoffs are played in a best-of-seven format, which is why the number of games varies.
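
The figure of 1231 can be checked with a few lines of arithmetic. This is only a sketch of the reasoning in the text; the constant names are made up for illustration and are not part of the notebook.

```python
# Regular-season schedule arithmetic, using the figures quoted in the text.
TEAMS = 30           # teams in the league
GAMES_PER_TEAM = 82  # regular-season games per team
ALL_STAR_GAMES = 1   # the text counts the All-Star game with the regular season

# Every game involves two teams, so adding up the per-team schedules
# counts each game twice; halve the product to get distinct games.
regular_games = TEAMS * GAMES_PER_TEAM // 2
total_games = regular_games + ALL_STAR_GAMES

print(regular_games)  # 1230
print(total_games)    # 1231
```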

1.2 Average Number of Free Throws per Game by Season
In [5]:
ft_total = df.groupby(["season", "playoffs"]).size() \
.unstack()
ft_total.head(3)

Out[5]:

playoffs     playoffs  regular
season
2006 - 2007      4116    63496
2007 - 2008      4384    61116
2008 - 2009      4455    60900

In [6]:
ft_per_game = ft_total / games
ft_per_game.head(2)

Out[6]:

playoffs      playoffs    regular
season
2006 - 2007  52.101266  52.131363
2007 - 2008  52.190476  49.809291

In [7]:
ft_per_game.plot(marker="o", rot=90, figsize=(12,5))
plt.title("Average Number of Free Throws per Game", fontsize=20)
plt.arrow(5.3, 51, -0.5, -1.2, width=0.01, color="k", head_starts_at_zero=False)
plt.text(4.8, 51.2, "Change of Rules")

Out[7]:


As expected, the number of free throws per game is generally higher for playoff games than for regular season games (in the first season the two are nearly identical, 52.10 vs. 52.13, and the gap is also small in the last season of this data set). Overall, free throws per game decline over the seasons.
There is an especially deep drop from season 2010-2011 to 2011-2012, and it moves almost in parallel for regular season and playoff games. So there must have been some kind of change in the rules about what constitutes a foul. And sure enough, I found this article, which confirmed my suspicion: http://www.espn.com/nba/story/_/id/7329584/nba-alters-emphasis-shooting-fouls2011-12
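
The per-game averages discussed above can be reproduced by hand from the season totals printed earlier in the notebook. The sketch below uses plain dictionaries instead of pandas so that it runs on its own; the numbers are copied from the game counts and free-throw counts shown for the first three seasons, and the variable layout is an assumption made for illustration.

```python
# Games and free throws per season, copied from the notebook's printed tables.
games = {
    "2006 - 2007": {"playoffs": 79, "regular": 1218},
    "2007 - 2008": {"playoffs": 84, "regular": 1227},
    "2008 - 2009": {"playoffs": 85, "regular": 1231},
}
ft_total = {
    "2006 - 2007": {"playoffs": 4116, "regular": 63496},
    "2007 - 2008": {"playoffs": 4384, "regular": 61116},
    "2008 - 2009": {"playoffs": 4455, "regular": 60900},
}

# Element-wise division: free throws per game for each season and game type.
ft_per_game = {
    season: {kind: ft_total[season][kind] / games[season][kind]
             for kind in games[season]}
    for season in games
}

for season, row in ft_per_game.items():
    print(season, {kind: round(value, 2) for kind, value in row.items()})
```

In the notebook itself the same result comes from dividing the two DataFrames, which pandas aligns by season and column automatically.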

1.3 Number of Free Throws per Period
In [8]:
periods = df.groupby(["game_id", "playoffs", "period"]).size() \
.unstack(["playoffs", "period"]) \
.describe()[:2] \
.stack().unstack(0) \
.swaplevel(0, 1, axis=1).sort_index(axis=1)  # sort_index replaces the deprecated sortlevel

periods
/opt/conda/lib/python3.5/site-packages/numpy/lib/function_base.py:3834: RuntimeWarning: Invalid value encountered in percentile
  RuntimeWarning)

Out[8]:

            count               mean
playoffs playoffs  regular  playoffs    regular
period
1.0         834.0  11966.0   9.328537   9.124352
2.0         836.0  12014.0  12.562201  11.962377
3.0         835.0  11995.0  12.600000  11.970571
4.0         832.0  12014.0  15.493990  14.432246
5.0          51.0    713.0   7.392157   7.099579
6.0           7.0    114.0   4.571429   7.070175
7.0           3.0     20.0   7.666667   6.350000
8.0           NaN      2.0        NaN  12.000000

There were only 7 playoff games that went into the 6th period, so I am not going to include them (or higher periods) in the following graph.
In [9]:

periods["mean"][:5].plot(marker="o", xticks=(1,2,3,4,5), xlim=(0.8, 5.2), figsize=(8,5))
plt.title("Average Number of Free Throws", fontsize=20)

Out[9]:


Here again, playoff games have a higher average than regular season games (across all periods). And as expected, the number of free throws increases as the game nears its end, with the highest average in the fourth quarter.
There is a huge drop in the fifth period because overtime periods are only 5 minutes long. In order to compare them with the first 4 periods (which are 12 minutes long), I am going to calculate the average number of free throws per minute per period.
In [10]:
periods["minutes"] = [12,12,12,12,5,5,5,5]
periods["playoffs"] = periods["mean"].playoffs / periods.minutes
periods["regular"] = periods["mean"].regular / periods.minutes
periods

Out[10]:

            count               mean             minutes  playoffs   regular
playoffs playoffs  regular  playoffs    regular
period
1.0         834.0  11966.0   9.328537   9.124352      12  0.777378  0.760363
2.0         836.0  12014.0  12.562201  11.962377      12  1.046850  0.996865
3.0         835.0  11995.0  12.600000  11.970571      12  1.050000  0.997548
4.0         832.0  12014.0  15.493990  14.432246      12  1.291166  1.202687
5.0          51.0    713.0   7.392157   7.099579       5  1.478431  1.419916
6.0           7.0    114.0   4.571429   7.070175       5  0.914286  1.414035
7.0           3.0     20.0   7.666667   6.350000       5  1.533333  1.270000
8.0           NaN      2.0        NaN  12.000000       5       NaN  2.400000

In [11]:
per_minute = periods[["playoffs", "regular"]][:5]
per_minute.columns = per_minute.columns.droplevel(1)
per_minute.plot(marker="o", xticks=(1,2,3,4,5), xlim=(0.8, 5.2), figsize=(8,5))
plt.title("Average Number of Free Throws per Minute", fontsize=20)

Out[11]:


Now the pattern is clearer: the closer the game gets to the end, the higher the number of free throws per minute. Let's see if that also applies to the actual playing time left.
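
The per-minute normalization can be verified by hand from the period means reported in the notebook output. This is a minimal sketch in plain Python rather than pandas; the helper name `period_minutes` and the dictionary layout are assumptions made for illustration, while the means are copied from the table for periods 1 to 5.

```python
# Mean free throws per period as (playoffs, regular), copied from the notebook output.
mean_ft = {
    1: (9.328537, 9.124352),
    2: (12.562201, 11.962377),
    3: (12.600000, 11.970571),
    4: (15.493990, 14.432246),
    5: (7.392157, 7.099579),  # first overtime
}

def period_minutes(period):
    """Length of a period in minutes: 12 for regulation quarters, 5 for overtime."""
    return 12 if period <= 4 else 5

# Divide each mean by the period length to get free throws per minute.
per_minute = {
    period: tuple(mean / period_minutes(period) for mean in means)
    for period, means in mean_ft.items()
}

for period, (playoffs, regular) in per_minute.items():
    print(period, round(playoffs, 6), round(regular, 6))
```

Once the shorter overtime length is accounted for, the first overtime shows a higher per-minute rate than the fourth quarter, which is what the plot illustrates.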

1.4 Number of Free Throws: Seconds left
In [12]:
# excluding free throws that were made during overtime
df_seconds_left = df[df.period <= 4]
[...] >= 100]
shooting.head(3)

Out[16]:

              ft_count  percentage
player
A.J. Price         282    0.748227
Aaron Brooks      1109    0.836790
Aaron Gordon       254    0.681102

In [17]:
shooting.percentage.hist(bins=50, figsize=(8,5))
plt.title("Distribution of Shooting Percentages", font...

