The Prediction of the La Liga Champion
La Liga, the Spanish soccer league, is one of the five major soccer leagues in the world and is
widely regarded as one of the strongest because of the high quality of its players. The main goal
of this project is to examine the historical data provided and predict the champion of the
upcoming season. The question may appear impossible to answer; however, it is specific, and the
answer will be supported by evidence from the data. The answer to this question will also serve
as a measure of how accurate our code is.
The data set we want to analyze is the European Soccer Dataset: La Liga from 1970 to 2016. It
provides detailed information on the games played in La Liga over the last five decades, such as
the number of matches won and lost, total matches played, goals scored, and many other
attributes that will help us make a well-supported prediction.
The dataset is provided as a CSV file, and we believe it does not need to be converted into
another format since we plan to analyze it with the Python programming language. In addition,
SAS may be used to perform regression analysis on two or more attributes to determine whether a
relationship exists between them and how strong that relationship is.
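As a minimal sketch of why no format conversion is needed, the CSV file can be read directly with Python's standard csv module. The column names and the sample rows below are assumptions made for illustration only; the real dataset's headers and values may differ.

```python
import csv
import io

# Hypothetical column names; the real dataset's headers may differ.
NUMERIC_COLUMNS = ("season", "matches", "wins", "losses",
                   "goals_scored", "goals_conceded")


def load_seasons(fileobj):
    """Read one row per team per season and convert numeric fields to int."""
    rows = []
    for record in csv.DictReader(fileobj):
        for column in NUMERIC_COLUMNS:
            record[column] = int(record[column])
        rows.append(record)
    return rows


# Tiny made-up sample standing in for the real file (illustration only).
SAMPLE_CSV = """season,team,matches,wins,losses,goals_scored,goals_conceded
1,Team A,38,28,4,100,30
1,Team B,38,25,6,90,35
"""

rows = load_seasons(io.StringIO(SAMPLE_CSV))
print(rows[0]["team"], rows[0]["wins"])
```

With the actual file, the same function could be called on an opened file object, for example open("laliga.csv", newline=""), where the file name is again a placeholder.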
The data mining problem will be formulated as both a classification problem and a regression
problem on the same dataset. The main question we want to answer is which team will be champion
of the upcoming La Liga season, here the 2017 season, which is the classification task. Other
important questions, such as the relationship between the number of wins and goals scored, will
be answered using regression analysis, which tells us whether a relationship exists between two
attributes and how strong it is.
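The regression question could be sketched as a simple least-squares fit between goals scored and wins, with Pearson's correlation coefficient r as the measure of how strong the relationship is. This is only one possible approach, not necessarily the analysis the project will use, and the numbers below are made up for illustration rather than taken from the dataset.

```python
import math


def linear_fit(xs, ys):
    """Return slope, intercept and Pearson's r for the line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Made-up values for illustration only, not taken from the dataset.
goals_scored = [40, 55, 70, 85, 100]
wins = [10, 15, 19, 24, 29]

slope, intercept, r = linear_fit(goals_scored, wins)
print(f"wins = {slope:.3f} * goals + {intercept:.3f}, r = {r:.3f}")
```

A value of r close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one.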
The type of project we want to do is an application-based project, since we aim to explore the
data in depth and discover important facts using the many tools available in Python. For
example, we can plot histograms in Python to identify the teams that have scored or conceded the
most goals in the history of La Liga.
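A rough sketch of that kind of exploration, using only the standard library, is to total the goals per team and print a text bar chart. The rows below are made up for illustration, and the tuple layout is an assumption rather than the dataset's actual structure. A plotting library such as matplotlib could draw the same totals graphically.

```python
from collections import Counter

# Made-up (team, goals_scored, goals_conceded) rows, one per team per season.
season_rows = [
    ("Team A", 100, 30),
    ("Team B", 90, 35),
    ("Team A", 95, 28),
    ("Team C", 45, 70),
    ("Team B", 80, 40),
]


def total_goals(rows):
    """Sum goals scored and conceded per team across all seasons."""
    scored, conceded = Counter(), Counter()
    for team, goals_for, goals_against in rows:
        scored[team] += goals_for
        conceded[team] += goals_against
    return scored, conceded


scored, conceded = total_goals(season_rows)
top_scorer, top_scored = scored.most_common(1)[0]
top_conceder, top_conceded = conceded.most_common(1)[0]

# Text bar chart of total goals scored, one '#' per 10 goals.
for team, goals in scored.most_common():
    print(f"{team:<8} {'#' * (goals // 10)} {goals}")
```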