Assume for the moment that a star schema's fact table contains a million sales transactions. To gain a better understanding of this large dataset, the analyst may calculate a set of sub-totals. For example, sub-totals of sales could be calculated across different dimension combinations, and at different levels within each dimension hierarchy (e.g., store / city / county / region/ country / world). One specific sub-total could be: the sales revenue for Executive PCs (a specific product), sold during Feb-01 (a specific time interval), in Tokyo (for a specific location). In this context, product, time, and location are examples of dimensions - i.e., the context of each transaction. To help one think about all the possible combinations of sub- and grand-totals (i.e., act as a conceptual model), it is helpful to use a cube metaphor (like shown in the below image). In this respect, each smaller cube represents one possible sub-total. As you can see, there are many possible sub-totals that could be calculated.
As good as the cube (hypercube) metaphor is, it is however limited to highlighting 3 dimensions, which is usually less than many implementations of star schemas (for example, a SAP InfoCube - SAP's implementation of the MM concept - can support up to 13 dimensions).
Typically, all the sub-totals are pre-calculated (e.g., over-night) as dong so requires significant computer resources. But with HANA, the vision is to do these calculations on the fly, thereby making the ETL process a little easier to manage. There are several well-known strategies for how cubes are updated - MOLAP (SAP BW does it this way), ROLAP, and HOLAP. I suggest, if interested, you read more about these approaches, but HANA may change everything!
The calculated sub-totals/totals can then be made available to managers usually through some type of OLAP client software (e.g., MS PivotTable, MS PowerPivot, SAP Analysis, SAP Web Intelligence (Webi), SAP Explorer), which allows the end-user to quickly change the sub-set of sub-totals they are viewing (also called slicing & dicing, drill-down & roll-up, etc.).
Need a refresher on MS Access?
All students should have completed at least an introductory course in MS Access. However, I appreciate some may be rusty. If this applies to you, it won't take too much effort to get up with the basics, and below are three options that may help you.
Option 1: MS Access access has a very good help facility - press [F1] while in the product. For example, the search results on 'creating tables' included this short and very clear video (use IE, as needs Silverlight). There are many other short videos, as shown in the right-hand pane of the video viewer (you'll see them when viewing this video).
Option 2: Microsoft has produced some quality training videos - here. My advice is to review the 'Beginner' material, such as the videos on: Design and build tables for a database.
The below PowerPoint slideshow (which can be downloaded and printed off) features a number of screenshots of the MS Access relationship screenof several star schemas created by the instructor. (Why do you think they are called star schemas?)
Task: Within MS Access, reproduce 2 schemas: the mandatory schema (Frozen Food), and choose one other to reproduce from the remaining optional schemas.
You will need to create a separate database file (.accdb or later) for each schema. Ensure you create the appropriate relationships between the tables (& relationship cardinality), and referential integrity has been enforced. Also ensure you have identified all keys.
Several of the example schemas have a lot of detail. To restrict the amount of work you need to do, please follow these guidelines: • There must be at least 4 dimensions (in addition to the ‘Fact’ table). Each dimension must feature at least 4 fields (attributes), of which one field will be the primary key. • There must be at least 3 key figures (measures) within the ‘Fact’ table (in addition to the foreign keys).
GOOD TO KNOW
Over-view of steps: 1. Create a new blank desktop database 2. Create your first dimension table 2.1 Define the fields for the dimension table, together with the appropriate data types (remove any field called 'id,' or any data type called autonumber, as you must not use these). 2.2 Define the key for the dimension table. 2.3 Close and save this table. 3. Repeat 2. for each dimension table 4. Create the fact table. 4.1 Define the fields for the fact table, together with the appropriate data types (remove any field called 'id,' or any data type called autonumber, as you must not use these) 4.2 Define the key for the dimension table. This key will be a composite key. 4.3 Close and save this table. 5. In the relationship screen, for each dimension table, define the relationship between each dimension table and the fact table. Ensure the relationship type is per the example schemas. Ensure referential integrity is turned on. 6. Close all tables views (not the database itself) and proceed to enter data into each dimension table, while keeping all other objects closed. 6.1 As you complete data entry for each table, close that table. 7. Enter data into the fact table. 7.1 Close this table. 8. Create the required query. 8.1 Run the query. 9. Export the data generated from running the query to Excel. 10. In Excel, define the requested PivotTable.
Never enter data into any database table until you have first created all the tables in the database, all relationships, and have flipped referential integrity on for all relationships. Having done so, then go ahead and populate the tables! In other words, lock-in the database design (structure) before populating. To do it in any other order is just inviting complications. Also, do not attempt to populate the fact table before completing the populating of the dimension tables, as this will invoke the wrath of referential integrity.
Stop the frustration!
Populate tables only after defining the schema has been completely finished (structure first!).
Ensure that data types of foreign keys are the same as for the applicable primary keys.
Populate the DIM (dimension) tables before the FACT (transactional) table.
When working on the Relationship Screen, ensure no other tabs are showing (nothing else is open).
Never use AutoNumber as a data type, as a field with such as data type cannot be used as a foreign key (think about it).
For all fields except those holding numbers, select a text data type. Although not the best practice, will reduce the likelihood of certain errors.
When defining your FACT table, ensure that the data types of the 'measure' fields are suitable for storing values - e.g., using the data type of Text to store transaction amounts is a poor choice (e.g., it will be difficult to sum the transaction amounts in the Excel PivotTable if Excel thinks they are text).
General advice: Close all tabs in MS Access unless you are directly working on them (i.e., usually it is best to have only a single tab open at any point in time!). What students do not realize is that when you close things, there are check-n-balances run by Access, which may include unlocking data.
What do I actually mean by asking you to 'reproduce the star schemas' in MS Access? At the end of the day, I will open your .accdbfile up, turn to the relationship screen and I want to see there exactly what I see in the screenshot. I will then go ahead and inspect each table's contents.
Resist the temptation to start putting data into the tables before the database structure has been finalized. Trust me: To put data in before this point is only inviting trouble. Unfortunately, this was may have been a bad habit you picked up in CISM 2530.
When I create a relationship in the relationship screen, it comes up as one-to-one, when I am expecting one-to-many? >> Check the status of the index field property - if the value is 'Yes (No Duplicates)' then this forces a 1:1 situation. This issue may also arise when a student has entered data before finalizing the database design - which is a big no no.
I do not have the infinity sign showing in my relationship screen. >> Check whether you turned on referential integrity for that relationship.
When I create a relationship in the relationship screen, it will not let me and instead tells me something about a relationship must be on the same number of fields with the same data types. >> Check whether you have been consistent with your choice of data types across the key/foreign key combos
How do I make a composite key (aka. multi-attribute key) in Access? >> How to create a composite key in Access - see here.
Again, do not input any data into the database until you are completely sure all the bugs have been worked out with regards database design - all tables must be there, all fields are there, and all relationships are there. Ok, now you can start with data input.
Crows Feet Notation
MS Access does not support crows feet notation: they support a notation not found anywhere else (which is why I highlight the crow's feet thing, which is the dominant notation for drawing ERDs).
An important point here is that all star schemas are identical (apart from their differing domain focus). So, the 6 schemas provided to you are identical in structure. So,
also has the same meaning as the following, in terms of relationship cardinality
being an example of the middle one below (in terms of relationship type)
The mandatory schema for Frozen Food features a notation different to the one featured in MS Access' relationship screen (which is only found in Access). This notation is called 'crows feet,' and is more-or-less an industry standard for data modeling (see here for more info on this notation).
For each ‘dimensional’ table featured within your database, populate it with 3 (and no more than 3) records of ‘representative’ data. However, for the ‘fact’ table, you need to populate it with at least 7 records (failing to do so will result in a substantial penalty).
All fields must be populated for each record. As you may feel that some of the fields in the example star schemas are cryptic, you can use some liberty in their interpretation. Of course, given that you will probably not have access to the relevant data sets, you can make up the data, but the values should not be silly. I’ll repeat: where do I get data from? Answer: Please make up the data!
GOOD TO KNOW
If you have trouble entering the required number of records within the fact table, you may be:
attempting to include within the fact table foreign key values that are not valid (e.g., entering a transaction for a customer that is not listed in the customer table), thereby violating referential integrity rules, or
attempting to enter duplicate key values.
Let's illustrate the specifics of this latter issue: assume a schema with 4 dimension tables + 1 fact table. I've made up abstract values to represent this situation, as shown below (4 dimension tables, each with 2 fields and 2 records; 1 fact table with 2 records, and 6 fields - 4 of which are part of the fact table's composite key):
For each schema, define a single query that features all fields within the star schema. Name this query PivotTable_Query. This query will produce ade-normalized dataset with many fields. This MS Access query will become the data source for your Access PivotTable.
Create PivotTable View
Import into Excel the data generated by the PivotTable_Query. Then in Excel, create a PivotTable (or use PowerPivot), with this data being the PivotTable's data source. Then configure the PivotTable's field list, so that your PivotTable features at least one column and one row dimension from the field list. Then sum the values for one of your key figures.
GOOD TO KNOW
Summing values within the PivotTable
Key figures hold measurement values of business events. Therefore, the data type chosen in MS Access for key figures should be of a numeric nature. However, if one has given key figures another type of data type--e.g., short text--then Excel will recognize the imported data as text, even if it looks like a number. It will therefore refuse to add it up, as one does not normally sum text.
MS wants end-users to start using its newish PowerPivot technology in Excel 2013. Given the increasingly larger sizes of data, and end-users' demands to join multiple tabs of data from different sources in a single pivot, MS is pushing an Excel 2013 (PowerPivot) + Sharepoint + SQL Server (SQL Analysis Services) OLAP solution (see here for overview), rather than encouraging end-users to keep using Access.
Submit 2 Access database, and 2 MS Excel files.
Rules about naming submitted files:
- name the Frozen Foods database: Food.accdb.
- name the Frozen Foods related Excel file: Foods.xlsx
You can choose any name for the remaining files.
Do not zip the submitted files.
Do not by accident upload the database lock files (e.g., food.laccdb), instead of the actual database files.
To avoid this problem, never upload any files while Access is open.
All submitted files will be scored with regards the level of similarity with materials that have been previously submitted for this assignment, as well as submitted files this semester. All submissions above a reasonable threshold will be closely audited for the possibility of academic misconduct. The scoring goes well beyond comparing table contents - e.g., refers to the amount of time between adding individual fields in a table, which is all secretly logged by Access. If you have used another student's work outside your immediate work group, the analysis of your submission will highlight this.
TAKE A MOMENT TO REFLECT
Why are the schemas you created called 'star' schemas? - metaphorically, at first glance they look star-shaped. A variation on this theme is the more complicated snowflake schema.
If you completed the full SAP Project (some under-graduates would not have), you were required to create an InfoCube and then populate it with data. This task was not too dissimilar to what you accomplished by completing the tasks above. However, clearly BW is a much more sophisticated environment. However, from a learning point of view, the concept of a database star schema is not well articulated when dealing directly with BW. Notwithstanding, after completing this assignment you will be more aware of this fundamental BI concept.