normalization process

User Generated

crgrecrgre

Computer Science

Description

The Scenario:

You have been hired by a manufacturing company with multiple manufacturing facilities to perform the logical design of a database to manage their roster of available temporary workers and their subsequent assignments to fill workforce needs. The temporary workers are needed on a seasonal basis to fill a number of job roles such as materials management, van driver, loading/shipping dock worker, assembly line worker, packaging/shipping support, etc. Each manufacturing facility is configured with three different work areas, which are raw materials management, assembly line productions, and packaging/shipping. Here are the requirements that you have identified after interviewing company management:

  1. The company would like to record and maintain complete information on each temporary worker, including full name, home address, mailing address (which may or may not differ from the home address), preferred and secondary telephone numbers, email address, date of birth, gender.
  2. The manufacturing facilities all operate on a 16-hour daily basis, two 8-hour shifts a day,six days a week, Monday through Saturday. They would also like to maintain information on each temporary worker's work shift availability in terms of days of the week (Monday through Saturday) and work shift (first shift, second shift, "bridge" shift (last 4 hours of first shift and second shift) on the specified days). Workers may specify availability for one or more shifts during those days.
  3. In addition, they will need to maintain complete information on each temporary worker's job certifications and information on certification instructors (same information as for temporary workers identified and item 1 above, information on which courses they are qualified to teach, and their teaching histories for the company). A temporary worker must be certified for a job role by completing a required training program for that role and can be certified to perform one or more job roles (see job role examples in the opening paragraph above). For job certifications, the company would like to maintain the identification of the certification, identification of the temporary worker participating in the certification training, the dates of certification training (certification training ranges from 4 to 16 hours depending on the type of certification), the certification trainer's identification, and a pass/fail designation for the temporary worker completing the certification training. The company would also like to maintain complete information on available job role certification training program courses.
  4. Temporary workers are also allowed to specify prioritized work area location preferences for one or more manufacturing locations.
  5. Finally, they have stated the requirement for maintaining a complete record of each temporary worker's assignments, including manufacturing facility location and work area assignment, date worked, shift(s) worked, and job role(s) filled by shift.

The company intends to use the completed database design to support the identification of temporary workers to fill needed assignments and to produce comprehensive reports on the use of temporary workers.

The Deliverable:

In a Microsoft Word document, provide a complete integrated set of normalized (3NF required) relations, using the format displayed in Figure 9.3e on page 316 (note that entities are in all caps, attribute names with multiple words are connected by an underline character joining each word (for example, "Student_ID"), primary keys are underlined and listed first within the attributes) in Third Normal Form (see pages 318-321 for explanatory information on the normalization process).

An Example of Normalization:

Here is a relatively simplistic example of a normalization process:

An "un-normalized" relation:

STUDENT(Last_Name, First_Name, Course_Name, Instructor_Name, Semester, Year)

There are multiple problems here: multiple students could have the same last name, course name might occur many times within the table and could be entered incorrectly or misspelled in some records, instructor name might occur many times within the table and could be entered incorrectly or misspelled in some records, semester name might occur many times within the table and could be entered incorrectly or misspelled in some records.

The "normalized" version in Third Normal Form:

STUDENT(Student_ID, Last_Name, First_Name)

INSTRUCTOR(Instructor_ID, Last_Name, First_Name)

COURSE(Course_ID, Course_Name)

SEMESTER(Semester_ID, Term, Year)

COURSES_TAKEN(Student_ID,Course_ID,Instructor_ID,Semester_ID)

Unformatted Attachment Preview

CHAPTER 9 DESIGNING DATABASES 315 FIGURE 9-3 Simple example of logical data modeling (a) Highest-volume customer query screen (a) HIGHEST-VOLUME CUSTOMER ENTER PRODUCT ID.: M128 START DATE: 11/01/2017 END DATE: 12/31/2017 ––––––––––––––––––––– CUSTOMER ID.: 1256 NAME: Commonwealth Builder VOLUME: 30 This inquiry screen shows the customer with the largest volume of total sales for a specified product during an indicated time period. Relations: CUSTOMER(Customer_ID,Name) ORDER(Order_Number,Customer_ID,Order_Date) --------PRODUCT(Product_ID) LINE ITEM(Order_Number,Product_ID,Order_Quantity) (b) Backlog summary report (b) PAGE 1 BACKLOG SUMMARY REPORT 11/30/2017 BACKLOG QUANTITY 0 0 6 30 … PRODUCT ID B381 B975 B985 E125 2 … M128 This report shows the unit volume of each product that has been ordered less that amount shipped through the specified date. Relations: PRODUCT(Product_ID) LINE ITEM(Product_ID,Order_Number,Order_Quantity) ORDER(Order_Number,Order_Date) SHIPMENT(Product_ID,Invoice_Number,Ship_Quantity) INVOICE(Invoice_Number,Invoice_Date,Order_Number) (c) CUSTOMER(Customer_ID,Name) PRODUCT(Product_ID) INVOICE(Invoice_Number,Invoice_Date,Order_Number) ________ ORDER(Order_Number,Customer_ID,Order_Date) _______ LINE ITEM(Order_Number,Product_ID,Order_Quantity) SHIPMENT(Product_ID,Invoice_Number,Ship_Quantity) (c) Integrated set of relations 316 PART IV DESIGN FIGURE 9-3 (continued) (d) Conceptual data model and transformed relations (d) CUSTOMER Customer_ID Name Address Places ORDER Order_Number Order_Date LINE ITEM Order_Quantity Bills PRODUCT Product_ID Description INVOICE Invoice_Number SHIPMENT Ship_Quantity Relations: CUSTOMER(Customer_ID,Name,Address) PRODUCT(Product_ID,Description) ORDER(Order_Number,Customer_ID,Order_Date) --------LINE ITEM(Order_Number,Product_ID,Order_Quantity) INVOICE(Invoice_Number,Order_Number) ---------SHIPMENT(Invoice_Number,Product_ID,Ship_Quantity) (e) Final set of normalized relations (e) CUSTOMER(Customer_ID,Name,Address) PRODUCT(Product_ID,Description) ORDER(Order_Number,Customer_ID,Order_Date) _______ LINE ITEM(Order_Number,Product_ID,Order_Quantity) INVOICE(Invoice_Number,Order_Number,Invoice_Date) ________ SHIPMENT(Invoice_Number,Product_ID,Ship_Quantity) FIGURE 9-4 Definition of shipment table in Microsoft Access (Source: Microsoft Corporation.) t The Invoice Number field is required because it is part of the primary key for the SHIPMENT table (the value that makes every row of the SHIPMENT table unique is a combination of Invoice Number and Product ID). t An index is defined for the Invoice Number field, but because there may be several rows in the SHIPMENT table for the same invoice (different products on the same invoice), duplicate index values are allowed (so Invoice Number is what we will call a secondary key). CHAPTER 9 DESIGNING DATABASES 317 FIGURE 9-5 EMPLOYEE1 relation with sample data EMPLOYEE1 Emp_ID Name Dept Salary 100 140 110 190 150 Margaret Simpson Allen Beeton Chris Lucero Lorenzo Davis Susan Martin Marketing Accounting Info Systems Finance Marketing 75,000 95,000 90,000 90,000 62,000 Many other physical database design decisions were made for the SHIPMENT table, but they are not apparent on the display in Figure 9-4. Further, this table is only one table in the Pine Valley Furniture Company Order Entry database, and other tables and structures for this database are not illustrated in this figure. The Relational Database Model Many different database models are in use and are the bases for database technologies. Although hierarchical and network models have been popular in the past, these are not used very often today for new information systems. Object-oriented database models are emerging but are still not common. The vast majority of information systems today use the relational database model. The relational database model (Codd, 1970; Date, 2012; Elmasri and Navathe, 2015; Umanath and Scamell, 2014) represents data in the form of related tables, or relations. A relation is a named, twodimensional table of data. Each relation (or table) consists of a set of named columns and an arbitrary number of unnamed rows. Each column in a relation corresponds to an attribute of that relation. Each row of a relation corresponds to a record that contains data values for an entity. Figure 9-5 shows an example of a relation named EMPLOYEE1. This relation contains the following attributes describing employees: Emp_ID, Name, Dept, and Salary. This table has five sample rows, corresponding to five employees. You can express the structure of a relation with a shorthand notation in which the name of the relation is followed (in parentheses) by the names of the attributes in the relation. The identifier attribute (called the primary key of the relation) is underlined. For example, you would express EMPLOYEE1 as follows: Relational database model Data represented as a set of related tables or relations. Relation A named, two-dimensional table of data. Each relation consists of a set of named columns and an arbitrary number of unnamed rows. EMPLOYEE1(Emp_ID,Name,Dept,Salary) Not all tables are relations. Relations have several properties that distinguish them from nonrelational tables: 1. Entries in cells are simple. An entry at the intersection of each row and column has a single value. 2. Entries in a given column are from the same set of values. 3. Each row is unique. Uniqueness is guaranteed because the relation has a nonempty primary key value. 4. The sequence of columns can be interchanged without changing the meaning or use of the relation. 5. The rows may be interchanged or stored in any sequences. Well-Structured Relations What constitutes a well-structured relation (also known as a table)? Intuitively, a wellstructured relation contains a minimum amount of redundancy and allows users to insert, modify, and delete the rows in a table without errors or inconsistencies. Well-structured relation A relation that contains a minimum amount of redundancy and that allows users to insert, modify, and delete the rows without error or inconsistencies; also known as a table. 318 PART IV DESIGN EMPLOYEE2 Emp_ID Name Dept Salary Course 100 100 140 110 110 190 150 150 Margaret Simpson Margaret Simpson Alan Beeton Chris Lucero Chris Lucero Lorenzo Davis Susan Martin Susan Martin Marketing Marketing Accounting Info Systems Info Systems Finance Marketing Marketing 42,000 42,000 39,000 41,500 41,500 38,000 38,500 38,500 SPSS Surveys Tax Acc SPSS C++ Investments SPSS TQM Date_Completed 6/19/2017 10/7/2017 12/8/2017 1/22/2017 4/22/2017 5/7/2017 6/19/2017 8/12/2017 FIGURE 9-6 Relation with redundancy EMP CO URSE FIGURE 9-7 EMP COURSE relation Emp_ID Course Date_ Completed 100 100 140 110 110 190 150 150 SPSS Surveys Tax Acc SPSS C++ Investments SPSS TQM 6/19/2017 10/7/2017 12/8/2017 1/22/2017 4/22/2017 5/7/2017 6/19/2017 8/12/2017 EMPLOYEE1 (Figure 9-5) is such a relation. Each row of the table contains data describing one employee, and any modification to an employee’s data (such as a change in salary) is confined to one row of the table. In contrast, EMPLOYEE2 (Figure 9-6) contains data about employees and the courses they have completed. Each row in this table is unique for the combination of Emp_ID and Course, which becomes the primary key for the table. This is not a well-structured relation, however. If you examine the sample data in the table, you notice a considerable amount of redundancy. For example, the Emp_ID, Name, Dept, and Salary values appear in two separate rows for employees 100, 110, and 150. Consequently, if the salary for employee 100 changes, we must record this fact in two rows (or more, for some employees). The problem with this relation is that it contains data about two entities: EMPLOYEE and COURSE. You will learn to use principles of normalization to divide EMPLOYEE2 into two relations. One of the resulting relations is EMPLOYEE1 (Figure 9-5). The other we will call EMP COURSE, which appears with sample data in Figure 9-7. The primary key of this relation is the combination of Emp_ID and Course (we emphasize this by underlining the column names for these attributes). NORMALIZATION Normalization The process of converting complex data structures into simple, stable data structures. We have presented an intuitive discussion of well-structured relations; however, we need rules and a process for designing them. Normalization is a process for converting complex data structures into simple, stable data structures (Date, 2012). CHAPTER 9 DESIGNING DATABASES 319 For example, we used the principles of normalization to convert the EMPLOYEE2 table with its redundancy to EMPLOYEE1 (Figure 9-5) and EMP COURSE (Figure 9-7). Rules of Normalization Normalization is based on well-accepted principles and rules. There are many normalization rules, more than can be covered in this text (see Hoffer et al. [2011], for a more complete coverage). Besides the five properties of relations outlined previously, there are two other frequently used rules: 1. Second normal form (2NF). Each nonprimary key attribute is identified by the whole key (what we call full functional dependency). For example, in Figure 9-7, both Emp_ID and Course identify a value of Date_Completed because the same Emp_ID can be associated with more than one Date_Completed and the same for Course. 2. Third normal form (3NF). Nonprimary key attributes do not depend on each other (what we call no transitive dependencies). For example, in Figure 9-5, Name, Dept, and Salary cannot be guaranteed to be unique for one another. The result of normalization is that every nonprimary key attribute depends upon the whole primary key and nothing but the primary key. We discuss second and third normal form in more detail next. Functional Dependence and Primary Keys Normalization is based on the analysis of functional dependence. A functional dependency is a particular relationship between two attributes. In a given relation, attribute B is functionally dependent on attribute A if, for every valid value of A, that value of A uniquely determines the value of B (Date, 2012; Hoffer et al., 2016). The functional dependence of B on A is represented by an arrow, as follows: A S B (e.g., Emp_ID S Name in the relation of Figure 9-5). Functional dependence does not imply mathematical dependence—that the value of one attribute may be computed from the value of another attribute; rather, functional dependence of B on A means that there can be only one value of B for each value of A. Thus, a given Emp_ID value can have only one Name value associated with it; the value of Name, however, cannot be derived from the value of Emp_ID. Other examples of functional dependencies from Figure 9-3b are in ORDER, Order_Number, Order_Date, and in INVOICE, Invoice_Number, Invoice_Date, and Order_Number. An attribute may be functionally dependent on two (or more) attributes rather than on a single attribute. For example, consider the relation EMP COURSE (Emp_ID,Course,Date_Completed) shown in Figure 9-7. We represent the functional dependency in this relation as follows: Emp_ID,Course S Date_Completed (this is sometimes shown as Emp_ID + Course S Date_Completed). In this case, Date_Completed cannot be determined by either Emp_ID or Course alone because Date_Completed is a characteristic of an employee taking a course. You should be aware that the instances (or sample data) in a relation do not prove that a functional dependency exists. Only knowledge of the problem domain, obtained from a thorough requirements analysis, is a reliable method for identifying a functional dependency. However, you can use sample data to demonstrate that a functional dependency does not exist between two or more attributes. For example, consider the sample data in the relation EXAMPLE(A,B,C,D), shown in Figure 9-8. The sample data in this relation prove that attribute B is not functionally dependent on attribute A because A does not uniquely determine B (two rows with the same value of A have different values of B). Functional dependency A constraint between two attributes in which the value of one attribute is determined by the value of another attribute. EXAMPLE A B C D X Y Z Y U X Y Z X Z Y W Y X Y Z FIGURE 9-8 EXAMPLE relation 320 PART IV DESIGN Second Normal Form Second normal form (2NF) A relation is in second normal form if every nonprimary key attribute is functionally dependent on the whole primary key. A relation is in second normal form (2NF) if every nonprimary key attribute is functionally dependent on the whole primary key. Thus, no nonprimary key attribute is functionally dependent on part, but not all, of the primary key. Second normal form is satisfied if any one of the following conditions apply: 1. The primary key consists of only one attribute (such as the attribute Emp_ID in relation EMPLOYEE1). 2. No nonprimary key attributes exist in the relation. 3. Every nonprimary key attribute is functionally dependent on the full set of primary key attributes. EMPLOYEE2 (Figure 9-6) is an example of a relation that is not in second normal form. The shorthand notation for this relation is EMPLOYEE2(Emp_ID,Name,Dept,Salary,Course,Date_Completed) The functional dependencies in this relation are the following: Emp_ID S Name,Dept,Salary Emp_ID,Course S Date_Completed The primary key for this relation is the composite key Emp_ID,Course. Therefore, the nonprimary key attributes Name, Dept, and Salary are functionally dependent on only Emp_ID but not on Course. EMPLOYEE2 has redundancy, which results in problems when the table is updated. To convert a relation to second normal form, you decompose the relation into new relations using the attributes, called determinants, that determine other attributes; the determinants are the primary keys of these relations. EMPLOYEE2 is decomposed into the following two relations: 1. EMPLOYEE(Emp_ID,Name,Dept,Salary): This relation satisfies the first second normal form condition (sample data shown in Figure 9-5). 2. EMP COURSE(Emp_ID,Course,Date_Completed): This relation satisfies second normal form condition three (sample data appear in Figure 9-7). Third Normal Form Third normal form (3NF) A relation is in second normal form and has no functional (transitive) dependencies between two (or more) nonprimary key attributes. A relation is in third normal form (3NF) if it is in second normal form and there are no functional dependencies between two (or more) nonprimary key attributes (a functional dependency between nonprimary key attributes is also called a transitive dependency). For example, consider the relation SALES (Customer_ID, Customer_ Name,Salesperson,Region) (sample data shown in Figure 9-9a). The following functional dependencies exist in the SALES relation: 1. Customer_ID S Customer_Name,Salesperson,Region (Customer_ID is the primary key.) 2. Salesperson S Region (Each salesperson is assigned to a unique region.) Notice that SALES is in second normal form because the primary key consists of a single attribute (Customer_ID). However, Region is functionally dependent on Salesperson, and Salesperson is functionally dependent on Customer_ID. As a result, there are data maintenance problems in SALES. 1. A new salesperson (Robinson) assigned to the North region cannot be entered until a customer has been assigned to that salesperson (because a value for Customer_ID must be provided to insert a row in the table). CHAPTER 9 SALES Customer_ID Customer_Name Salesperson Region 8023 9167 7924 6837 8596 7018 Anderson Bancroft Hobbs Tucker Eckersley Arnold Smith Hicks Smith Hernandez Hicks Faulb South West South East West North SALES1 SPERSON Customer_ID Customer_Name Salesperson 8023 9167 7924 6837 8596 7018 Anderson Bancroft Hobbs Tucker Eckersley Arnold Smith Hicks Smith Hernandez Hicks Faulb DESIGNING DATABASES 321 FIGURE 9-9 Removing transitive dependencies (a) Relation with transitive dependency (b) Relation in 3NF Salesperson Region Smith Hicks Hernandez Faulb South West East North 2. If customer number 6837 is deleted from the table, we lose the information that salesperson Hernandez is assigned to the East region. 3. If salesperson Smith is reassigned to the East region, several rows must be changed to reflect that fact (two rows are shown in Figure 9-9a). These problems can be avoided by decomposing SALES into the two relations, based on the two determinants, shown in Figure 9-9b. These relations are the following: SALES1(Customer_ID,Customer_Name,Salesperson) SPERSON(Salesperson,Region) Note that Salesperson is the primary key in SPERSON. Salesperson is also a foreign key in SALES1. A foreign key is an attribute that appears as a nonprimary key attribute in one relation (such as SALES1) and as a primary key attribute (or part of a primary key) in another relation. You designate a foreign key by using a dashed underline. A foreign key must satisfy referential integrity, which specifies that the value of an attribute in one relation depends on the value of the same attribute in another relation. Thus, in Figure 9-9b, the value of Salesperson in each row of table SALES1 is limited only to the current values of Salesperson in the SPERSON table. If some sales do not have to have a salesperson, then it is possible for the value of Salesperson to be null (i.e., have no value). Referential integrity is one of the most important principles of the relational model. TRANSFORMING E-R DIAGRAMS INTO RELATIONS Normalization produces a set of well-structured relations that contains all of the data mentioned in system inputs and outputs developed in human interface design. Because these specific information requirements may not represent all future information needs, the E-R diagram you developed in conceptual data modeling is another source of insight into possible data requirements for a new application system. To compare Foreign key An attribute that appears as a nonprimary key attribute in one relation and as a primary key attribute (or part of a primary key) in another relation. Referential integrity A rule that states that either each foreign key value must match a primary key value in another relation or the foreign key value must be null (i.e., have no value). 322 PART IV DESIGN the conceptual data model and the normalized relations developed so far, your E-R diagram must be transformed into relational notation, normalized, and then merged with the existing normalized relations. Transforming an E-R diagram into normalized relations and then merging all the relations into one final, consolidated set of relations can be accomplished in four steps. These steps are summarized briefly here, and then steps 1, 2, and 4 are discussed in detail in the remainder of this chapter. 1. Represent entities. Each entity type in the E-R diagram becomes a relation. The identifier of the entity type becomes the primary key of the relation, and other attributes of the entity type become nonprimary key attributes of the relation. 2. Represent relationships. Each relationship in an E-R diagram must be represented in the relational database design. How we represent a relationship depends on its nature. For example, in some cases we represent a relationship by making the primary key of one relation a foreign key of another relation. In other cases, we create a separate relation to represent a relationship. 3. Normalize the relations. The relations created in steps 1 and 2 may have unnecessary redundancy. So we need to normalize these relations to make them well structured. 4. Merge the relations. So far in database design we have created various relations from both a bottom-up normalization of user views and from transforming one or more E-R diagrams into sets of relations. Across these different sets of relations, there may be redundant relations (two or more relations that describe the same entity type) that must be merged and renormalized to remove the redundancy. Represent Entities Each regular entity type in an E-R diagram is transformed into a relation. The identifier of the entity type becomes the primary key of the corresponding relation. Each nonkey attribute of the entity type becomes a nonkey attribute of the relation. You should check to make sure that the primary key satisfies the following two properties: 1. The value of the key must uniquely identify every row in the relation. 2. The key should be nonredundant; that is, no attribute in the key can be deleted without destroying its unique identification. Some entities may have keys that include the primary keys of other entities. For example, an EMPLOYEE DEPENDENT may have a Name for each dependent, but to form the primary key for this entity, you must include the Employee_ID attribute from the associated EMPLOYEE entity. Such an entity whose primary key depends upon the primary key of another entity is called a weak entity. Representation of an entity as a relation is straightforward. Figure 9-10a shows the CUSTOMER entity type for PVF. The corresponding CUSTOMER relation is represented as follows: CUSTOMER(Customer_ID,Name,Address,City_State_ZIP,Discount) In this notation, the entity type label is translated into a relation name. The identifier of the entity type is listed first and underlined. All nonkey attributes are listed after the primary key. This relation is shown as a table with sample data in Figure 9-10b. Represent Relationships The procedure for representing relationships depends on both the degree of the relationship—unary, binary, ternary—and the cardinalities of the relationship.
Purchase answer to see full attachment
User generated content is uploaded by users for the purposes of learning and should be used following Studypool's honor code & terms of service.

Explanation & Answer

Attached.

3NF Database Structure for Warehouse

WORKER(Worker_ID, First_Name, Last_Name, Home_Address, Mailing_Address,
Preffered_Number, Secondary_Number, Email_Addre...


Anonymous
Excellent! Definitely coming back for more study materials.

Studypool
4.7
Trustpilot
4.5
Sitejabber
4.4

Similar Content

Related Tags