Install and configure an SQL server, attach the sample AdventureWorks database to the SQL server, prepare and extract data from the AdventureWorks database, and then report on your findings and propose a plan to analyze the data.
There are several preliminary steps you must go through before beginning the assignment itself:
- First, download and install the Microsoft SQL Server 2012 Express software. (See installation instructions here. This document includes instructions for both PCs and Macs.) This is the SQL Database Server.
- Second, download and install the Microsoft SQL Server Management Studio 2012 Expresssoftware. (See installation instructions here. This is the same document as above and includes instructions for both PCs and Macs.) This is the tool you will use to manage your server and access the database.
- Third, download and attach the AdventureWorks database file consisting of pre-existing data and access that database using the SQL Server Management Studio. This database contains the data that you will use for this assignment. If you are using a Mac, then you will need to use theEmployees_DB_Full database.
Next, evaluate the database contents based on the business problem presented below:
You have been hired by XYZ Corporation, a multinational company. XYZ Corporation has recently acquired a competitor (AdventureWorks) within the United States. XYZ Corporation management is interested in learning more about AdventureWork’s employee base. Specifically, they are looking to identify any trends or patterns within the dates of hire, demographics, performance statistics, as well as any interesting or meaningful correlations between any of the data, etc. AdventureWorks has provided XYZ Corporation management with a database. This database, while containing a plethora of information, needs your analysis. Focus on the tables and views that describe the employees of the organization.
In reviewing the data, consider the following questions:
- Does the data lend itself to having various attributes mapped to the two or three axes of a plot for cluster analysis or perhaps classification approaches?
- What opportunities of potential connections or relationships between the data can you identify?
- Is the data more suited for one or more of the association analysis techniques?
Depending on what your findings are, clean the data and apply any necessary pre-processing techniques (e.g., aggregation, sampling, subset selection, binarization, discretization, or transformation) that you feel are needed in order to better analyze the data later. You may also want to reorganize the data into new tables or views for easy export.
(Note: You will not be conducting analysis on the data in this module, so please do not download and install the SPSS trial software until Week 8.)
Once the data has been processed and prepared for analysis, you will need to review your results and propose the analysis approach or approaches that you feel will yield the best results in order to answer the business problem. These analysis models were discussed in Modules 4 and 5. Your Critical Thinking submission must include the following elements:
- Data quality considerations with regard to the database contents. Did you find any issues/missing data? If so, how do you plan on dealing with these issues?
- Description of the data. Discuss what is contained within the database and how the database and the tables of interest are structured.
- Methods you used to prepare your data for processing, e.g., pre-processing and organization. Include any SQL queries and views that you used along with an explanation of why you took these steps.
- A restatement of the business problem along with brief proposal and explanation of the chosen analysis approach that you will be using in your Portfolio Project in Week 8.
Your well-written findings should be 2-4 pages in length and formatted. Include at least two credible outside references: 1) the pre-processing methods used, 2) the analysis supporting the pre-processing methods, and 3) the proposed analysis approach. The outside references may include credible sources in print or from the internet.
The data file and log file for this question can be found here.