Description
Database Design and Analysis
The following database project will create an educational attainment “demand” forecast for the state of California for years greater than 2010. The demand forecast is the expected number of people who have obtained a certain level of education. The population is divided into age groups, and educational attainment is divided into different levels. The population of each group is estimated for each year up to 2050. Implement the following steps to obtain an educational demand forecast for the state of California. The files can be downloaded below.
- Create a ca_pop schema in your MySQL database.
- Using your ca_pop schema, create an educational_attainment table whose columns match the columns in the Excel spreadsheet ca_pop_educational_attainment.csv.
- Using your ca_pop schema, create a pop_proj table whose columns match the columns in the Excel spreadsheet pop_proj_1970_2050.csv.
- Using the data loading technique for a csv file you learned in Module 1, load the data in ca_pop_educational_attainment.csv into the table educational_attainment.
- Using the data loading technique for a csv file you learned in Module 1, load the data in pop_proj_1970_2050.csv into the table pop_proj.
- Write a query to select the total population in each age group.
- Use the query from Step 6 as a subquery to find each type of education attained by the population in that age group and the fraction of that age group's population that has that educational attainment. Label the fraction column output as coefficient. For instance, the fraction of the population in age group 00 - 17 whose educational attainment is Bachelor's degree or higher is 0.0015; that value is the coefficient.
- Create a demographics table from the SQL query from Step 7.
- Create a query on the pop_proj table which shows the population count by date_year and age.
- Use the query from Step 9 as a subquery and join it to the demographics table using the following case statement:
demographics.age = case when temp_pop.age < 18 then '00 to 17' when temp_pop.age > 64 then '65 to 80+' else '18 to 64' end
“temp_pop” is an alias for the subquery. Use the following calculation for the demand output:
round(sum(temp_pop.total_pop * demographics.coefficient)) as demand
Output the demand grouped by year and education level.
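The join and demand calculation in the last step can be sketched end to end on toy data. This is a minimal illustration, not the assignment's solution: it runs the SQL through Python's built-in sqlite3 module as a stand-in for MySQL, every number is made up, and the column names `educational_attainment` (in demographics) and `population` (in pop_proj) are assumptions, since the real CSV headers are not shown here. The `date_year > 2010` filter follows the description's "years greater than 2010".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column names; adjust them to the real CSV headers.
conn.execute("CREATE TABLE demographics (age TEXT, educational_attainment TEXT, coefficient REAL)")
conn.execute("CREATE TABLE pop_proj (date_year INTEGER, age INTEGER, population INTEGER)")

# Made-up coefficients: within each age group they sum to 1.
conn.executemany("INSERT INTO demographics VALUES (?, ?, ?)", [
    ("00 to 17", "Less than high school", 1.0),
    ("18 to 64", "High school", 0.75),
    ("18 to 64", "Bachelor's degree or higher", 0.25),
    ("65 to 80+", "High school", 0.5),
    ("65 to 80+", "Bachelor's degree or higher", 0.5),
])
# Made-up projections; one (year, age) pair may span several rows.
conn.executemany("INSERT INTO pop_proj VALUES (?, ?, ?)", [
    (2005, 10, 999),                  # excluded by the date_year > 2010 filter
    (2020, 10, 100), (2020, 10, 50),  # two rows for the same year and age
    (2020, 30, 400), (2020, 70, 200),
    (2021, 10, 160), (2021, 30, 800), (2021, 70, 100),
])

demand_sql = """
SELECT temp_pop.date_year,
       demographics.educational_attainment,
       ROUND(SUM(temp_pop.total_pop * demographics.coefficient)) AS demand
FROM (
    -- Step 9: population count by date_year and age
    SELECT date_year, age, SUM(population) AS total_pop
    FROM pop_proj
    WHERE date_year > 2010
    GROUP BY date_year, age
) AS temp_pop
JOIN demographics
  ON demographics.age = CASE WHEN temp_pop.age < 18 THEN '00 to 17'
                             WHEN temp_pop.age > 64 THEN '65 to 80+'
                             ELSE '18 to 64' END
GROUP BY temp_pop.date_year, demographics.educational_attainment
ORDER BY temp_pop.date_year, demographics.educational_attainment
"""

demand_rows = conn.execute(demand_sql).fetchall()
for row in demand_rows:
    print(row)
```

The CASE expression maps each single-year age in pop_proj onto one of the three age-group labels stored in demographics, so every projected population row is multiplied by the coefficients of its own age group before being summed per year and education level.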
Write each query you used in Steps 1 – 10 in a text file (and submit screenshots of the results). If a query produced a result set, then list the first ten rows of each result set after the query.
Explanation & Answer
Attached.
1. Create a ca_pop schema in your MySQL database.
Create schema ca_pop;
2. Create the educational_attainment table.
3. Import ca_pop_educational_attainment.csv into educational_attainment.
4. Create the pop_proj table.
5. Import pop_proj_1970_2050.csv into pop_proj.
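Steps 2 to 5 give no statements, so here is a rough sketch of the create-then-load pattern. It is not the Module 1 technique, which is not shown here: it uses Python's built-in csv and sqlite3 modules on a small in-memory sample, and the header names (`age`, `educational_attainment`, `pop_count`) and values are assumptions standing in for the real file. In MySQL itself, a `LOAD DATA INFILE` statement or a client import wizard would typically fill this role.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for ca_pop_educational_attainment.csv;
# the real file's header names and values may differ.
sample_csv = io.StringIO(
    "age,educational_attainment,pop_count\n"
    "00 to 17,No high school diploma,900\n"
    "18 to 64,High school or equivalent,500\n"
)

conn = sqlite3.connect(":memory:")
# The table's columns mirror the CSV header, as Step 2 asks.
conn.execute(
    "CREATE TABLE educational_attainment "
    "(age TEXT, educational_attainment TEXT, pop_count INTEGER)"
)

# DictReader yields one dict per data row, keyed by the header names,
# which lines up with the named placeholders below.
reader = csv.DictReader(sample_csv)
conn.executemany(
    "INSERT INTO educational_attainment "
    "VALUES (:age, :educational_attainment, :pop_count)",
    list(reader),
)

loaded = conn.execute(
    "SELECT COUNT(*), SUM(pop_count) FROM educational_attainment"
).fetchone()
print(loaded)
```

Checking the row count and a column total right after the load is a cheap way to confirm that the header row was not imported as data and that numeric columns were parsed as numbers.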
6. Write a query to select the total population in each age group.
-- One scalar subquery per age group; the result is a single row with three columns.
SELECT
  (SELECT DISTINCT COALESCE(SUM(pop_count), 0) FROM educational_attainment WHERE age = '00 to 17') AS '00-17',
  (SELECT DISTINCT COALESCE(SUM(pop_count), 0) FROM educational_attainment WHERE age = '18 to 64') AS '18-64',
  (SELECT DISTINCT COALESCE(SUM(pop_count), 0) FROM educational_attainment WHERE age = '65 to 80+') AS '65-80+'
FROM `educational_attainment`
GROUP BY '00-17';
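An alternative shape for the Step 6 result is one row per age group, produced by a plain GROUP BY. This is a sketch and not the query used in the answer above. It assumes only the `age` and `pop_count` columns that the answer's query already references, plus a hypothetical `educational_attainment` column; the data is made up and the SQL is run through Python's sqlite3 module in place of MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE educational_attainment "
    "(age TEXT, educational_attainment TEXT, pop_count INTEGER)"
)
# Made-up counts, several rows per age group.
conn.executemany("INSERT INTO educational_attainment VALUES (?, ?, ?)", [
    ("00 to 17", "No high school diploma", 900),
    ("00 to 17", "High school or equivalent", 100),
    ("18 to 64", "High school or equivalent", 500),
    ("18 to 64", "Bachelor's degree or higher", 1500),
    ("65 to 80+", "High school or equivalent", 300),
])

# One output row per age group instead of one column per age group.
totals_sql = """
SELECT age, SUM(pop_count) AS total_pop
FROM educational_attainment
GROUP BY age
ORDER BY age
"""

totals = conn.execute(totals_sql).fetchall()
for age, total_pop in totals:
    print(age, total_pop)
```

The row-per-group form does not hard-code the age labels, and it can be joined back to educational_attainment on `age`, which is convenient when the totals are reused as a subquery.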
7. Use the query from Step 6 as a subquery to find each type of education attained by the
population in that age group and the fraction of the population of that age group that has
that educational attainment. Label the fraction column output as coefficient. For instance,
the fraction of the population in age group 00 - 17 who has an education attainment of
Bachelor's degree or higher is 0.0015, which is the coefficient....
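One way the coefficient described in Step 7 could be computed is to join each (age, education level) subtotal to the per-age total from Step 6 and divide. The sketch below is an illustration on made-up data, not the original answer's query: it runs through Python's sqlite3 module rather than MySQL, the `educational_attainment` column name is an assumption, and the final `CREATE TABLE ... AS SELECT` shows how the result could be stored as the demographics table of Step 8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE educational_attainment "
    "(age TEXT, educational_attainment TEXT, pop_count INTEGER)"
)
# Made-up counts; one education level may span several rows.
conn.executemany("INSERT INTO educational_attainment VALUES (?, ?, ?)", [
    ("00 to 17", "No high school diploma", 600),
    ("00 to 17", "No high school diploma", 150),
    ("00 to 17", "High school", 250),
    ("18 to 64", "High school", 500),
    ("18 to 64", "Bachelor's degree or higher", 1500),
])

coefficient_sql = """
SELECT ea.age AS age,
       ea.educational_attainment AS educational_attainment,
       -- "* 1.0" forces real division in SQLite; MySQL's "/" already
       -- returns a fractional result.
       SUM(ea.pop_count) * 1.0 / totals.total_pop AS coefficient
FROM educational_attainment AS ea
JOIN (
    -- Step 6 as a subquery: total population per age group
    SELECT age, SUM(pop_count) AS total_pop
    FROM educational_attainment
    GROUP BY age
) AS totals ON totals.age = ea.age
GROUP BY ea.age, ea.educational_attainment, totals.total_pop
"""

# Step 8: materialize the result as the demographics table.
conn.execute("CREATE TABLE demographics AS " + coefficient_sql)

coefficients = conn.execute(
    "SELECT age, educational_attainment, coefficient "
    "FROM demographics ORDER BY age, educational_attainment"
).fetchall()
for row in coefficients:
    print(row)
```

Within each age group the coefficients add up to 1, which makes a quick sanity check on the real data as well.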