Write 2 pages on the normalization process (programming homework help)

Description

The Scenario:

You have been hired by a manufacturing company with multiple manufacturing facilities to perform the logical design of a database to manage its roster of available temporary workers and their subsequent assignments to fill workforce needs. The temporary workers are needed on a seasonal basis to fill a number of job roles such as materials management, van driver, loading/shipping dock worker, assembly line worker, packaging/shipping support, etc. Each manufacturing facility is configured with three different work areas: raw materials management, assembly line production, and packaging/shipping. Here are the requirements that you have identified after interviewing company management:

  • The company would like to record and maintain complete information on each temporary worker, including full name, home address, mailing address (which may or may not differ from the home address), preferred and secondary telephone numbers, email address, date of birth, gender.
  • The manufacturing facilities all operate on a 16-hour daily basis, two 8-hour shifts a day, six days a week, Monday through Saturday. The company would also like to maintain information on each temporary worker's work shift availability in terms of days of the week (Monday through Saturday) and work shift on the specified days: first shift, second shift, or "bridge" shift (the last 4 hours of first shift and second shift). Workers may specify availability for one or more shifts during those days.
  • In addition, they will need to maintain complete information on each temporary worker's job certifications and information on certification instructors (same information as for temporary workers identified in item 1 above, information on which courses they are qualified to teach, and their teaching histories for the company). A temporary worker must be certified for a job role by completing a required training program for that role and can be certified to perform one or more job roles (see job role examples in the opening paragraph above). For job certifications, the company would like to maintain the identification of the certification, identification of the temporary worker participating in the certification training, the dates of certification training (certification training ranges from 4 to 16 hours depending on the type of certification), the certification trainer's identification, and a pass/fail designation for the temporary worker completing the certification training. The company would also like to maintain complete information on available job role certification training program courses.
  • Temporary workers are also allowed to specify prioritized work area location preferences for one or more manufacturing locations.
  • Finally, they have stated the requirement for maintaining a complete record of each temporary worker's assignments, including manufacturing facility location and work area assignment, date worked, shift(s) worked, and job role(s) filled by shift.

The company intends to use the completed database design to support the identification of temporary workers to fill needed assignments and to produce comprehensive reports on the use of temporary workers.

The Deliverable:

In a Microsoft Word document, provide a complete, integrated set of normalized relations in Third Normal Form (3NF required), using the format displayed in Figure 9-3e on page 316. Note that entities are in all caps, attribute names with multiple words are joined by an underscore character (for example, "Student_ID"), and primary keys are underlined and listed first within the attributes. See pages 318-321 for explanatory information on the normalization process.

An Example of Normalization:

Here is a relatively simplistic example of a normalization process:

An "un-normalized" relation:

STUDENT(Last_Name, First_Name, Course_Name, Instructor_Name, Semester, Year)

There are multiple problems here. Multiple students could have the same last name, so nothing identifies a student uniquely. Course name, instructor name, and semester name might each occur many times within the table, and any of them could be entered incorrectly or misspelled in some records.

The "normalized" version in Third Normal Form:

STUDENT(Student_ID, Last_Name, First_Name)

INSTRUCTOR(Instructor_ID, Last_Name, First_Name)

COURSE(Course_ID, Course_Name)

SEMESTER(Semester_ID, Term, Year)

COURSES_TAKEN(Student_ID, Course_ID, Instructor_ID, Semester_ID)
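The normalized relations above can be sketched as table definitions to show why the split helps. The following is a minimal illustration using Python's built-in sqlite3 module, not part of the assignment: the column types, the sample rows (students "Lee", instructor "Ortiz", the course names), and the choice of all four columns as the composite key of COURSES_TAKEN are assumptions made for the example.

```python
import sqlite3

# Assumed DDL for the five relations; types and key choices are illustrative.
SCHEMA = """
CREATE TABLE STUDENT    (Student_ID    INTEGER PRIMARY KEY, Last_Name TEXT NOT NULL, First_Name TEXT NOT NULL);
CREATE TABLE INSTRUCTOR (Instructor_ID INTEGER PRIMARY KEY, Last_Name TEXT NOT NULL, First_Name TEXT NOT NULL);
CREATE TABLE COURSE     (Course_ID     INTEGER PRIMARY KEY, Course_Name TEXT NOT NULL);
CREATE TABLE SEMESTER   (Semester_ID   INTEGER PRIMARY KEY, Term TEXT NOT NULL, Year INTEGER NOT NULL);
CREATE TABLE COURSES_TAKEN (
    Student_ID    INTEGER NOT NULL REFERENCES STUDENT(Student_ID),
    Course_ID     INTEGER NOT NULL REFERENCES COURSE(Course_ID),
    Instructor_ID INTEGER NOT NULL REFERENCES INSTRUCTOR(Instructor_ID),
    Semester_ID   INTEGER NOT NULL REFERENCES SEMESTER(Semester_ID),
    PRIMARY KEY (Student_ID, Course_ID, Instructor_ID, Semester_ID)
);
"""

# Rebuilds the columns of the original un-normalized STUDENT relation by joining.
REPORT = """
SELECT s.Last_Name, s.First_Name, c.Course_Name, i.Last_Name, m.Term, m.Year
FROM COURSES_TAKEN AS t
JOIN STUDENT    AS s ON s.Student_ID    = t.Student_ID
JOIN COURSE     AS c ON c.Course_ID     = t.Course_ID
JOIN INSTRUCTOR AS i ON i.Instructor_ID = t.Instructor_ID
JOIN SEMESTER   AS m ON m.Semester_ID   = t.Semester_ID
ORDER BY s.Student_ID
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript(SCHEMA)

with conn:
    # Made-up rows: two students share a last name but have distinct IDs.
    conn.execute("INSERT INTO STUDENT VALUES (1, 'Lee', 'Ana')")
    conn.execute("INSERT INTO STUDENT VALUES (2, 'Lee', 'Ben')")
    conn.execute("INSERT INTO INSTRUCTOR VALUES (10, 'Ortiz', 'Maria')")
    conn.execute("INSERT INTO COURSE VALUES (100, 'Database Design')")
    conn.execute("INSERT INTO SEMESTER VALUES (1000, 'Fall', 2017)")
    conn.execute("INSERT INTO COURSES_TAKEN VALUES (1, 100, 10, 1000)")
    conn.execute("INSERT INTO COURSES_TAKEN VALUES (2, 100, 10, 1000)")

rows_before = conn.execute(REPORT).fetchall()

# The course name is stored once, so a correction is a single-row update.
with conn:
    conn.execute("UPDATE COURSE SET Course_Name = 'Database Systems' WHERE Course_ID = 100")

rows_after = conn.execute(REPORT).fetchall()
```

Because each course name lives in exactly one COURSE row, the rename reaches every enrollment through the join, which is the misspelling problem the un-normalized relation could not avoid.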

Rubric


Criteria, ratings, and points:

Entity Identification (40.0 pts)
  • All necessary entities correctly identified and clearly named: 40.0 pts
  • All necessary entities identified but one or more unnecessary entities included: 36.0 pts
  • One necessary entity missing: 32.0 pts
  • Two necessary entities missing: 28.0 pts
  • Three or more necessary entities missing: 24.0 pts
  • Nothing submitted: 0.0 pts

Primary and foreign key Identification (40.0 pts)
  • All entities have correctly specified primary and foreign keys: 40.0 pts
  • One missing or incorrect primary or foreign key: 36.0 pts
  • Two missing and/or incorrect primary or foreign keys: 32.0 pts
  • Three missing and/or incorrect primary or foreign keys: 28.0 pts
  • Four or more missing and/or incorrect primary or foreign keys: 24.0 pts
  • Nothing submitted: 0.0 pts

Attributes (20.0 pts)
  • Attributes completely identified for all entities: 20.0 pts
  • No more than two missing attributes: 18.0 pts
  • No more than three missing attributes: 16.0 pts
  • No more than four missing attributes: 14.0 pts
  • More than four missing attributes: 12.0 pts
  • Nothing submitted: 0.0 pts

Total Points: 100.0



PS: Please read the textbook first, which I attached below. Thanks.

Unformatted Attachment Preview

Chapter 9 Designing Databases

Learning Objectives

After studying this chapter, you should be able to

9.1 describe the database design process, its outcomes, and the relational database model;
9.2 describe normalization and the rules for second and third normal form;
9.3 transform an entity-relationship (E-R) diagram into an equivalent set of well-structured (normalized) relations;
9.4 merge normalized relations from separate user views into a consolidated set of well-structured relations; and
9.5 describe physical database design concepts including choosing storage formats for fields in database tables, translating well-structured relations into efficient database tables, explaining when to use different types of file organizations to store computer files, and describing the purpose of indexes and the important considerations in selecting attributes to be indexed.

Introduction

In Chapter 8, you learned how to represent an organization's data graphically using an entity-relationship (E-R), or class, diagram. In this chapter, you will learn guidelines for well-structured and efficient database files, and you will learn about logical and physical database design. It is likely that the human interface and database design steps will happen in parallel, as illustrated in the systems development life cycle (SDLC) in Figure 9-1.

Database design has five purposes:

1. Structure the data in stable structures, called normalized tables, that are not likely to change over time and that have minimal redundancy.
2. Develop a logical database design that reflects the actual data requirements that exist in the forms (hard copy and computer displays) and reports of an information system. This is why database design is often done in parallel with the design of the human interface of an information system.
3. Develop a logical database design from which we can do physical database design.
Because most information systems today use relational database management systems, logical database design usually uses a relational database model, which represents data in simple tables with common columns to link related tables.
4. Translate a relational database model into a technical file and database design that balances several performance factors.
5. Choose data storage technologies (such as read/write DVD or optical disc) that will efficiently, accurately, and securely process database activities.

The implementation of a database (i.e., creating and loading data into files and databases) is done during the next phase of the systems development life cycle. Because implementation is technology specific, we address implementation issues only at a general level in Chapter 13.

DATABASE DESIGN

File and database design occurs in two steps. You begin by developing a logical database model, which describes data using a notation that corresponds to a data organization used by a database management system. This is the system software responsible for storing, retrieving, and protecting data (such as Microsoft Access, Oracle, or SQL Server). The most common style for a logical database model is the relational database model. Once you develop a clear and precise logical database model, you are ready to prescribe the technical specifications for computer files and databases in which to store the data. A physical database design provides these specifications.

FIGURE 9-1 Systems development life cycle with design phase highlighted (phases: Planning, Analysis, Design, Implementation, Maintenance; design activities: Databases, Forms and Reports, Dialogues and Interfaces, Finalizing Design Specifications, Distributed and Internet Systems)

You typically do logical and physical database design in parallel with other systems design steps. Thus, you collect the detailed specifications of data necessary for logical database design as you design system inputs and outputs.
Logical database design is driven not only by the previously developed E-R data model for the application or enterprise but also by form and report layouts. You study data elements on these system inputs and outputs and identify interrelationships among the data. As with conceptual data modeling, the work of all systems development team members is coordinated and shared through the project dictionary or repository. The designs for logical databases and system inputs and outputs are then used in physical design activities to specify to computer programmers, database administrators, network managers, and others how to implement the new information system. For this text, we assume that the design of computer programs and distributed information processing and data networks are topics of other courses, so we concentrate on the aspect of physical design most often undertaken by a systems analyst: physical file and database design.

The Process of Database Design

Figure 9-2 shows that database modeling and design activities occur in all phases of the systems development process. In this chapter, we discuss methods that help you finalize logical and physical database designs during the design phase. In logical database design, you use a process called normalization, which is a way to build a data model that has the properties of simplicity, nonredundancy, and minimal maintenance.

FIGURE 9-2 Relationship between data modeling and the SDLC

In most situations, many physical database design decisions are implicit or eliminated when you choose the data management technologies to use with the application. We concentrate on those decisions you will make most frequently and use Oracle to illustrate the range of physical database design parameters you must manage.
The interested reader is referred to Hoffer, Ramesh, and Topi (2016) for a more thorough treatment of techniques for logical and physical database design.

There are four key steps in logical database modeling and design:

1. Develop a logical data model for each known user interface (form and report) for the application using normalization principles.
2. Combine normalized data requirements from all user interfaces into one consolidated logical database model; this step is called view integration.
3. Translate the conceptual E-R data model for the application or enterprise, developed without explicit consideration of specific user interfaces, into normalized data requirements.
4. Compare the consolidated logical database design with the translated E-R model and produce, through view integration, one final logical database model for the application.

During physical database design, you use the results of these four key logical database design steps. You also consider definitions of each attribute; descriptions of where and when data are entered, retrieved, deleted, and updated; expectations for response time and data integrity; and descriptions of the file and database technologies to be used. These inputs allow you to make key physical database design decisions, including the following:

  • Choosing the storage format (called data type) for each attribute from the logical database model; the format is chosen to minimize storage space and to maximize data quality. Data type involves choosing length, coding scheme, number of decimal places, minimum and maximum values, and potentially many other parameters for each attribute.
  • Grouping attributes from the logical database model into physical records (in general, this is called selecting a stored record, or data, structure).
  • Arranging related records in secondary memory (hard disks and magnetic tapes) so that individual records and groups of records can be stored, retrieved, and updated rapidly (called file organization). You should also consider protecting data and recovering data after errors are found.
  • Selecting media and structures for storing data to make access more efficient. The choice of media affects the utility of different file organizations. The primary structure used today to make access to data more rapid is key indexes on unique and nonunique keys.

In this chapter, we show how to do each of these logical database design steps and discuss factors to consider in making each physical file and database design decision.

Deliverables and Outcomes

Primary key: An attribute (or combination of attributes) whose value is unique across all occurrences of a relation.

During logical database design, you must account for every data element on a system input or output (form or report) and on the E-R diagram. Each data element (e.g., customer name, product description, or purchase price) must be a piece of raw data kept in the system's database or, in the case of a data element on a system output, the element can be derived from data in the database. Figure 9-3 illustrates the outcomes from the four-step logical database design process listed earlier. Figures 9-3a and 9-3b (step 1) contain two sample system outputs for a customer order processing system at Pine Valley Furniture (PVF). A description of the associated database requirements, in the form of what we call normalized relations, is listed below each output diagram. Each relation (think of a relation as a table with rows and columns) is named, and its attributes (columns) are listed within parentheses.
The primary key attribute (that attribute whose value is unique across all occurrences of the relation) is indicated by an underline, and an attribute of a relation that is the primary key of another relation is indicated by a dashed underline.

In Figure 9-3a, data about customers, products, and the customer orders and associated line items for products are shown. Each of the attributes of each relation either appears in the display or is needed to link connected relations. For example, because an order is for a customer, an attribute of ORDER is the associated Customer_ID. The data for the display in Figure 9-3b are more complex. A backlogged product on an order occurs when the amount ordered (Order_Quantity) is greater than the amount shipped (Ship_Quantity) for invoices associated with an order. The query refers only to a specified time period, so the Order_Date is needed. The INVOICE Order_Number links invoices with the associated order.

Figure 9-3c (step 2) shows the result of integrating these two separate sets of normalized relations. Figure 9-3d (step 3) shows an E-R diagram for a customer order processing application that might be developed during conceptual data modeling, along with equivalent normalized relations. Finally, Figure 9-3e (step 4) shows a set of normalized relations that would result from reconciling the logical database designs of Figures 9-3c and 9-3d. Normalized relations like those in Figure 9-3e are the primary deliverable from logical database design.

It is important to remember that relations do not correspond to computer files. In physical database design, you translate the relations from logical database design into specifications for computer files. For most information systems, these files will be tables in a relational database. These specifications are sufficient for programmers and database analysts to code the definitions of the database.
The coding, done during systems implementation, is written in special database definition and processing languages, such as Structured Query Language (SQL), or by filling in table definition forms, such as with Microsoft Access. Figure 9-4 shows a possible definition for the SHIPMENT relation from Figure 9-3e using Microsoft Access. This display of the SHIPMENT table definition illustrates choices made for several physical database design decisions.

  • All three attributes from the SHIPMENT relation, and no attributes from other relations, have been grouped together to form the fields of the SHIPMENT table.
  • The Invoice Number field has been given a data type of Text, with a maximum length of 10 characters.

FIGURE 9-3 Simple example of logical data modeling

(a) Highest-volume customer query screen

HIGHEST-VOLUME CUSTOMER
ENTER PRODUCT ID.: M128
START DATE: 11/01/2017
END DATE: 12/31/2017
---------------------
CUSTOMER ID.: 1256
NAME: Commonwealth Builder
VOLUME: 30

This inquiry screen shows the customer with the largest volume of total sales for a specified product during an indicated time period.

Relations:
CUSTOMER(Customer_ID,Name)
ORDER(Order_Number,Customer_ID,Order_Date)
PRODUCT(Product_ID)
LINE ITEM(Order_Number,Product_ID,Order_Quantity)

(b) Backlog summary report

PAGE 1
BACKLOG SUMMARY REPORT
11/30/2017

PRODUCT ID    BACKLOG QUANTITY
B381          0
B975          0
B985          6
E125          30
...           ...
M128          2

This report shows the unit volume of each product that has been ordered, less the amount shipped, through the specified date.
Relations:
PRODUCT(Product_ID)
LINE ITEM(Product_ID,Order_Number,Order_Quantity)
ORDER(Order_Number,Order_Date)
SHIPMENT(Product_ID,Invoice_Number,Ship_Quantity)
INVOICE(Invoice_Number,Invoice_Date,Order_Number)

(c) Integrated set of relations

CUSTOMER(Customer_ID,Name)
PRODUCT(Product_ID)
INVOICE(Invoice_Number,Invoice_Date,Order_Number)
ORDER(Order_Number,Customer_ID,Order_Date)
LINE ITEM(Order_Number,Product_ID,Order_Quantity)
SHIPMENT(Product_ID,Invoice_Number,Ship_Quantity)

FIGURE 9-3 (continued)

(d) Conceptual data model and transformed relations

E-R diagram entities: CUSTOMER (Customer_ID, Name, Address), ORDER (Order_Number, Order_Date), PRODUCT (Product_ID, Description), INVOICE (Invoice_Number). Relationships: Places, Bills, LINE ITEM (Order_Quantity), SHIPMENT (Ship_Quantity).

Relations:
CUSTOMER(Customer_ID,Name,Address)
PRODUCT(Product_ID,Description)
ORDER(Order_Number,Customer_ID,Order_Date)
LINE ITEM(Order_Number,Product_ID,Order_Quantity)
INVOICE(Invoice_Number,Order_Number)
SHIPMENT(Invoice_Number,Product_ID,Ship_Quantity)

(e) Final set of normalized relations

CUSTOMER(Customer_ID,Name,Address)
PRODUCT(Product_ID,Description)
ORDER(Order_Number,Customer_ID,Order_Date)
LINE ITEM(Order_Number,Product_ID,Order_Quantity)
INVOICE(Invoice_Number,Order_Number,Invoice_Date)
SHIPMENT(Invoice_Number,Product_ID,Ship_Quantity)

FIGURE 9-4 Definition of shipment table in Microsoft Access (Source: Microsoft Corporation.)

  • The Invoice Number field is required because it is part of the primary key for the SHIPMENT table (the value that makes every row of the SHIPMENT table unique is a combination of Invoice Number and Product ID).
  • An index is defined for the Invoice Number field, but because there may be several rows in the SHIPMENT table for the same invoice (different products on the same invoice), duplicate index values are allowed (so Invoice Number is what we will call a secondary key).
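The design choices listed for the SHIPMENT table can also be written as SQL. The sketch below uses Python's sqlite3 module as a stand-in for the Access form in Figure 9-4, so it is an approximation rather than the book's definition: the types chosen for Product_ID and Ship_Quantity, the index name, and the invoice numbers in the sample rows are assumptions, and the 10-character limit is expressed as a CHECK constraint because SQLite does not enforce declared text lengths.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SHIPMENT (
    -- Text field, at most 10 characters, required (part of the primary key)
    Invoice_Number TEXT    NOT NULL CHECK (length(Invoice_Number) <= 10),
    Product_ID     TEXT    NOT NULL,   -- type is an assumption
    Ship_Quantity  INTEGER NOT NULL,   -- type is an assumption
    PRIMARY KEY (Invoice_Number, Product_ID)
);
-- Non-unique (secondary key) index: duplicate invoice numbers are allowed.
-- SQLite's composite primary key index already leads with Invoice_Number, so
-- this index is kept only to mirror the Access definition described above.
CREATE INDEX SHIPMENT_Invoice_Number ON SHIPMENT (Invoice_Number);
""")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO SHIPMENT VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


accepted = [
    try_insert(("INV-1001", "M128", 10)),     # first product on the invoice
    try_insert(("INV-1001", "B381", 4)),      # same invoice, different product
    try_insert(("INV-1001", "M128", 5)),      # repeats the composite key
    try_insert(("INV-00001002", "M128", 1)),  # invoice number longer than 10 characters
]
row_count = conn.execute("SELECT COUNT(*) FROM SHIPMENT").fetchone()[0]
```

The first two inserts succeed because only the combination of Invoice_Number and Product_ID must be unique; the last two are rejected by the primary key and the length check respectively.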
Many other physical database design decisions were made for the SHIPMENT table, but they are not apparent on the display in Figure 9-4. Further, this table is only one table in the Pine Valley Furniture Company Order Entry database, and other tables and structures for this database are not illustrated in this figure.

The Relational Database Model

Many different database models are in use and are the bases for database technologies. Although hierarchical and network models have been popular in the past, these are not used very often today for new information systems. Object-oriented database models are emerging but are still not common. The vast majority of information systems today use the relational database model. The relational database model (Codd, 1970; Date, 2012; Elmasri and Navathe, 2015; Umanath and Scamell, 2014) represents data in the form of related tables, or relations. A relation is a named, two-dimensional table of data. Each relation (or table) consists of a set of named columns and an arbitrary number of unnamed rows. Each column in a relation corresponds to an attribute of that relation. Each row of a relation corresponds to a record that contains data values for an entity. Figure 9-5 shows an example of a relation named EMPLOYEE1. This relation contains the following attributes describing employees: Emp_ID, Name, Dept, and Salary. This table has five sample rows, corresponding to five employees.

FIGURE 9-5 EMPLOYEE1 relation with sample data

EMPLOYEE1
Emp_ID   Name               Dept           Salary
100      Margaret Simpson   Marketing      75,000
140      Allen Beeton       Accounting     95,000
110      Chris Lucero       Info Systems   90,000
190      Lorenzo Davis      Finance        90,000
150      Susan Martin       Marketing      62,000

You can express the structure of a relation with a shorthand notation in which the name of the relation is followed (in parentheses) by the names of the attributes in the relation. The identifier attribute (called the primary key of the relation) is underlined.
For example, you would express EMPLOYEE1 as follows:

EMPLOYEE1(Emp_ID,Name,Dept,Salary)

Relational database model: Data represented as a set of related tables or relations.

Relation: A named, two-dimensional table of data. Each relation consists of a set of named columns and an arbitrary number of unnamed rows.

Not all tables are relations. Relations have several properties that distinguish them from nonrelational tables:

1. Entries in cells are simple. An entry at the intersection of each row and column has a single value.
2. Entries in a given column are from the same set of values.
3. Each row is unique. Uniqueness is guaranteed because the relation has a nonempty primary key value.
4. The sequence of columns can be interchanged without changing the meaning or use of the relation.
5. The rows may be interchanged or stored in any sequence.

Well-Structured Relations

Well-structured relation: A relation that contains a minimum amount of redundancy and that allows users to insert, modify, and delete the rows without error or inconsistencies; also known as a table.

What constitutes a well-structured relation (also known as a table)? Intuitively, a well-structured relation contains a minimum amount of redundancy and allows users to insert, modify, and delete the rows in a table without errors or inconsistencies.
EMPLOYEE1 (Figure 9-5) is such a relation. Each row of the table contains data describing one employee, and any modification to an employee's data (such as a change in salary) is confined to one row of the table. In contrast, EMPLOYEE2 (Figure 9-6) contains data about employees and the courses they have completed. Each row in this table is unique for the combination of Emp_ID and Course, which becomes the primary key for the table. This is not a well-structured relation, however. If you examine the sample data in the table, you notice a considerable amount of redundancy. For example, the Emp_ID, Name, Dept, and Salary values appear in two separate rows for employees 100, 110, and 150. Consequently, if the salary for employee 100 changes, we must record this fact in two rows (or more, for some employees).

FIGURE 9-6 Relation with redundancy

EMPLOYEE2
Emp_ID   Name               Dept           Salary   Course        Date_Completed
100      Margaret Simpson   Marketing      42,000   SPSS          6/19/2017
100      Margaret Simpson   Marketing      42,000   Surveys       10/7/2017
140      Alan Beeton        Accounting     39,000   Tax Acc       12/8/2017
110      Chris Lucero       Info Systems   41,500   SPSS          1/22/2017
110      Chris Lucero       Info Systems   41,500   C++           4/22/2017
190      Lorenzo Davis      Finance        38,000   Investments   5/7/2017
150      Susan Martin       Marketing      38,500   SPSS          6/19/2017
150      Susan Martin       Marketing      38,500   TQM           8/12/2017

The problem with this relation is that it contains data about two entities: EMPLOYEE and COURSE. You will learn to use principles of normalization to divide EMPLOYEE2 into two relations. One of the resulting relations is EMPLOYEE1 (Figure 9-5). The other we will call EMP COURSE, which appears with sample data in Figure 9-7.

FIGURE 9-7 EMP COURSE relation

EMP COURSE
Emp_ID   Course        Date_Completed
100      SPSS          6/19/2017
100      Surveys       10/7/2017
140      Tax Acc       12/8/2017
110      SPSS          1/22/2017
110      C++           4/22/2017
190      Investments   5/7/2017
150      SPSS          6/19/2017
150      TQM           8/12/2017
The primary key of this relation is the combination of Emp_ID and Course (we emphasize this by underlining the column names for these attributes).

NORMALIZATION

Normalization: The process of converting complex data structures into simple, stable data structures.

We have presented an intuitive discussion of well-structured relations; however, we need rules and a process for designing them. Normalization is a process for converting complex data structures into simple, stable data structures (Date, 2012). For example, we used the principles of normalization to convert the EMPLOYEE2 table with its redundancy to EMPLOYEE1 (Figure 9-5) and EMP COURSE (Figure 9-7).

Rules of Normalization

Normalization is based on well-accepted principles and rules. There are many normalization rules, more than can be covered in this text (see Hoffer et al. [2011], for a more complete coverage). Besides the five properties of relations outlined previously, there are two other frequently used rules:

1. Second normal form (2NF). Each nonprimary key attribute is identified by the whole key (what we call full functional dependency). For example, in Figure 9-7, both Emp_ID and Course identify a value of Date_Completed because the same Emp_ID can be associated with more than one Date_Completed, and the same is true for Course.
2. Third normal form (3NF). Nonprimary key attributes do not depend on each other (what we call no transitive dependencies). For example, in Figure 9-5, Name, Dept, and Salary cannot be guaranteed to be unique for one another.

The result of normalization is that every nonprimary key attribute depends upon the whole primary key and nothing but the primary key. We discuss second and third normal form in more detail next.

Functional Dependence and Primary Keys

Functional dependency: A constraint between two attributes in which the value of one attribute is determined by the value of another attribute.

Normalization is based on the analysis of functional dependence. A functional dependency is a particular relationship between two attributes. In a given relation, attribute B is functionally dependent on attribute A if, for every valid value of A, that value of A uniquely determines the value of B (Date, 2012; Hoffer et al., 2016). The functional dependence of B on A is represented by an arrow, as follows: A → B (e.g., Emp_ID → Name in the relation of Figure 9-5). Functional dependence does not imply mathematical dependence, that is, that the value of one attribute may be computed from the value of another attribute; rather, functional dependence of B on A means that there can be only one value of B for each value of A. Thus, a given Emp_ID value can have only one Name value associated with it; the value of Name, however, cannot be derived from the value of Emp_ID. Other examples of functional dependencies from Figure 9-3b are, in ORDER, Order_Number → Order_Date, and, in INVOICE, Invoice_Number → Invoice_Date and Order_Number.

An attribute may be functionally dependent on two (or more) attributes rather than on a single attribute. For example, consider the relation EMP COURSE(Emp_ID,Course,Date_Completed) shown in Figure 9-7. We represent the functional dependency in this relation as follows: Emp_ID,Course → Date_Completed (this is sometimes shown as Emp_ID + Course → Date_Completed). In this case, Date_Completed cannot be determined by either Emp_ID or Course alone because Date_Completed is a characteristic of an employee taking a course.

You should be aware that the instances (or sample data) in a relation do not prove that a functional dependency exists. Only knowledge of the problem domain, obtained from a thorough requirements analysis, is a reliable method for identifying a functional dependency. However, you can use sample data to demonstrate that a functional dependency does not exist between two or more attributes. For example, consider the sample data in the relation EXAMPLE(A,B,C,D), shown in Figure 9-8. The sample data in this relation prove that attribute B is not functionally dependent on attribute A because A does not uniquely determine B (two rows with the same value of A have different values of B).

FIGURE 9-8 EXAMPLE relation

EXAMPLE
A   B   C   D
X   U   X   Y
Y   X   Z   X
Z   Y   Y   Y
Y   Z   W   Z

Second Normal Form

Second normal form (2NF): A relation is in second normal form if every nonprimary key attribute is functionally dependent on the whole primary key.

A relation is in second normal form (2NF) if every nonprimary key attribute is functionally dependent on the whole primary key. Thus, no nonprimary key attribute is functionally dependent on part, but not all, of the primary key. Second normal form is satisfied if any one of the following conditions applies:

1. The primary key consists of only one attribute (such as the attribute Emp_ID in relation EMPLOYEE1).
2. No nonprimary key attributes exist in the relation.
3. Every nonprimary key attribute is functionally dependent on the full set of primary key attributes.

EMPLOYEE2 (Figure 9-6) is an example of a relation that is not in second normal form. The shorthand notation for this relation is

EMPLOYEE2(Emp_ID,Name,Dept,Salary,Course,Date_Completed)

The functional dependencies in this relation are the following:

Emp_ID → Name,Dept,Salary
Emp_ID,Course → Date_Completed

The primary key for this relation is the composite key Emp_ID,Course. Therefore, the nonprimary key attributes Name, Dept, and Salary are functionally dependent on only Emp_ID but not on Course. EMPLOYEE2 has redundancy, which results in problems when the table is updated.

To convert a relation to second normal form, you decompose the relation into new relations using the attributes, called determinants, that determine other attributes; the determinants are the primary keys of these relations. EMPLOYEE2 is decomposed into the following two relations:

1. EMPLOYEE1(Emp_ID,Name,Dept,Salary): This relation satisfies second normal form condition one (sample data shown in Figure 9-5).
2. EMP COURSE(Emp_ID,Course,Date_Completed): This relation satisfies second normal form condition three (sample data appear in Figure 9-7).

Third Normal Form

Third normal form (3NF): A relation is in second normal form and has no functional (transitive) dependencies between two (or more) nonprimary key attributes.

A relation is in third normal form (3NF) if it is in second normal form and there are no functional dependencies between two (or more) nonprimary key attributes (a functional dependency between nonprimary key attributes is also called a transitive dependency). For example, consider the relation SALES(Customer_ID,Customer_Name,Salesperson,Region) (sample data shown in Figure 9-9a). The following functional dependencies exist in the SALES relation:

1. Customer_ID → Customer_Name,Salesperson,Region (Customer_ID is the primary key.)
2. Salesperson → Region (Each salesperson is assigned to a unique region.)

Notice that SALES is in second normal form because the primary key consists of a single attribute (Customer_ID). However, Region is functionally dependent on Salesperson, and Salesperson is functionally dependent on Customer_ID. As a result, there are data maintenance problems in SALES.

1. A new salesperson (Robinson) assigned to the North region cannot be entered until a customer has been assigned to that salesperson (because a value for Customer_ID must be provided to insert a row in the table).
CHAPTER 9 SALES Customer_ID Customer_Name Salesperson Region 8023 9167 7924 6837 8596 7018 Anderson Bancroft Hobbs Tucker Eckersley Arnold Smith Hicks Smith Hernandez Hicks Faulb South West South East West North SALES1 SPERSON Customer_ID Customer_Name Salesperson 8023 9167 7924 6837 8596 7018 Anderson Bancroft Hobbs Tucker Eckersley Arnold Smith Hicks Smith Hernandez Hicks Faulb DESIGNING DATABASES 321 FIGURE 9-9 Removing transitive dependencies (a) Relation with transitive dependency (b) Relation in 3NF Salesperson Region Smith Hicks Hernandez Faulb South West East North 2. If customer number 6837 is deleted from the table, we lose the information that salesperson Hernandez is assigned to the East region. 3. If salesperson Smith is reassigned to the East region, several rows must be changed to reflect that fact (two rows are shown in Figure 9-9a). These problems can be avoided by decomposing SALES into the two relations, based on the two determinants, shown in Figure 9-9b. These relations are the following: SALES1(Customer_ID,Customer_Name,Salesperson) SPERSON(Salesperson,Region) Note that Salesperson is the primary key in SPERSON. Salesperson is also a foreign key in SALES1. A foreign key is an attribute that appears as a nonprimary key attribute in one relation (such as SALES1) and as a primary key attribute (or part of a primary key) in another relation. You designate a foreign key by using a dashed underline. A foreign key must satisfy referential integrity, which specifies that the value of an attribute in one relation depends on the value of the same attribute in another relation. Thus, in Figure 9-9b, the value of Salesperson in each row of table SALES1 is limited only to the current values of Salesperson in the SPERSON table. If some sales do not have to have a salesperson, then it is possible for the value of Salesperson to be null (i.e., have no value). Referential integrity is one of the most important principles of the relational model. 
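As a rough illustration of the ideas in this section, the Python sketch below tests sample rows for counterexamples to a functional dependency and then decomposes SALES on its two determinants. The helper names `holds` and `project` are hypothetical, chosen only for this example, and the rows are the sample data of Figures 9-8 and 9-9a. As the text cautions, a passing test only means the sample data do not contradict the dependency; it does not prove that the dependency exists.

```python
def holds(rows, lhs, rhs):
    """Return True if no pair of sample rows contradicts lhs -> rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a determinant must match every later one.
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate rows (order preserved)."""
    result, seen = [], set()
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


# Sample data from Figure 9-8.
EXAMPLE = [dict(zip("ABCD", r)) for r in ("XUXY", "YXZX", "ZYYY", "YZWZ")]

# Sample data from Figure 9-9a.
COLUMNS = ["Customer_ID", "Customer_Name", "Salesperson", "Region"]
SALES = [dict(zip(COLUMNS, r)) for r in [
    (8023, "Anderson", "Smith", "South"),
    (9167, "Bancroft", "Hicks", "West"),
    (7924, "Hobbs", "Smith", "South"),
    (6837, "Tucker", "Hernandez", "East"),
    (8596, "Eckersley", "Hicks", "West"),
    (7018, "Arnold", "Faulb", "North"),
]]

# Decompose SALES on its two determinants: Customer_ID and Salesperson.
sales1 = project(SALES, ["Customer_ID", "Customer_Name", "Salesperson"])
sperson = project(SALES, ["Salesperson", "Region"])

# Joining the two relations on Salesperson rebuilds the original rows.
region_of = {r["Salesperson"]: r["Region"] for r in sperson}
rejoined = [dict(r, Region=region_of[r["Salesperson"]]) for r in sales1]

print(holds(EXAMPLE, ["A"], ["B"]))               # is A -> B contradicted?
print(holds(SALES, ["Salesperson"], ["Region"]))  # is Salesperson -> Region contradicted?
print(rejoined == SALES)                          # is the decomposition lossless here?
```

In this sketch each salesperson's region is stored once in `sperson`, so reassigning Smith would touch a single row rather than every customer row that mentions Smith.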
Referential integrity: A rule that states that either each foreign key value must match a primary key value in another relation or the foreign key value must be null (i.e., have no value).

TRANSFORMING E-R DIAGRAMS INTO RELATIONS

Normalization produces a set of well-structured relations that contains all of the data mentioned in system inputs and outputs developed in human interface design. Because these specific information requirements may not represent all future information needs, the E-R diagram you developed in conceptual data modeling is another source of insight into possible data requirements for a new application system. To compare the conceptual data model and the normalized relations developed so far, your E-R diagram must be transformed into relational notation, normalized, and then merged with the existing normalized relations.

Transforming an E-R diagram into normalized relations and then merging all the relations into one final, consolidated set of relations can be accomplished in four steps. These steps are summarized briefly here, and then steps 1, 2, and 4 are discussed in detail in the remainder of this chapter.

1. Represent entities. Each entity type in the E-R diagram becomes a relation. The identifier of the entity type becomes the primary key of the relation, and other attributes of the entity type become nonprimary key attributes of the relation.
2. Represent relationships. Each relationship in an E-R diagram must be represented in the relational database design. How we represent a relationship depends on its nature. For example, in some cases we represent a relationship by making the primary key of one relation a foreign key of another relation. In other cases, we create a separate relation to represent a relationship.
3. Normalize the relations. The relations created in steps 1 and 2 may have unnecessary redundancy, so we need to normalize these relations to make them well structured.
4. Merge the relations. So far in database design we have created various relations both from a bottom-up normalization of user views and from transforming one or more E-R diagrams into sets of relations. Across these different sets of relations, there may be redundant relations (two or more relations that describe the same entity type) that must be merged and renormalized to remove the redundancy.

Represent Entities

Each regular entity type in an E-R diagram is transformed into a relation. The identifier of the entity type becomes the primary key of the corresponding relation. Each nonkey attribute of the entity type becomes a nonkey attribute of the relation. You should check to make sure that the primary key satisfies the following two properties:

1. The value of the key must uniquely identify every row in the relation.
2. The key should be nonredundant; that is, no attribute in the key can be deleted without destroying its unique identification.

Some entities may have keys that include the primary keys of other entities. For example, an EMPLOYEE DEPENDENT may have a Name for each dependent, but to form the primary key for this entity, you must include the Employee_ID attribute from the associated EMPLOYEE entity. Such an entity, whose primary key depends upon the primary key of another entity, is called a weak entity.

Representation of an entity as a relation is straightforward. Figure 9-10a shows the CUSTOMER entity type for PVF. The corresponding CUSTOMER relation is represented as follows:

CUSTOMER(Customer_ID,Name,Address,City_State_ZIP,Discount)

In this notation, the entity type label is translated into a relation name. The identifier of the entity type is listed first and underlined. All nonkey attributes are listed after the primary key. This relation is shown as a table with sample data in Figure 9-10b.
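The two primary key properties listed above can be spot-checked against sample rows. The following Python sketch is only an illustration: the function names `is_unique` and `is_nonredundant` and the EMPLOYEE DEPENDENT rows are hypothetical and not taken from the chapter. As with functional dependencies, sample data can show that a candidate key fails, but only knowledge of the problem domain can confirm that it is correct.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """Property 1: the key values identify every row uniquely."""
    keys = [tuple(row[a] for a in attrs) for row in rows]
    return len(keys) == len(set(keys))


def is_nonredundant(rows, attrs):
    """Property 2: no proper subset of the key is still unique."""
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


# Hypothetical rows for the weak entity EMPLOYEE DEPENDENT.
DEPENDENTS = [
    {"Employee_ID": 100, "Name": "Ann", "Relationship": "Child"},
    {"Employee_ID": 100, "Name": "Bob", "Relationship": "Child"},
    {"Employee_ID": 200, "Name": "Ann", "Relationship": "Spouse"},
]

# Name alone repeats across employees, so the key must include Employee_ID.
print(is_unique(DEPENDENTS, ["Name"]))
print(is_unique(DEPENDENTS, ["Employee_ID", "Name"]))
print(is_nonredundant(DEPENDENTS, ["Employee_ID", "Name"]))
```

Adding Relationship to the key would keep it unique but make it redundant, because the pair (Employee_ID, Name) already identifies each row in this sample.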
FIGURE 9-10 Transforming an entity type to a relation

(a) E-R diagram: entity type CUSTOMER with attributes Customer_ID, Name, Address, City_State_Zip, and Discount

(b) Relation

CUSTOMER
Customer_ID   Name                   Address          City_State_ZIP          Discount
1273          Contemporary Designs   123 Oak St.      Austin, TX 28384        5%
6390          Casual Corner          18 Hoosier Dr.   Bloomington, IN 45821   3%

Represent Relationships

The procedure for representing relationships depends on both the degree of the relationship (unary, binary, ternary) and the cardinalities of the relationship.

Binary 1:N and 1:1 Relationships

A binary one-to-many (1:N) relationship in an E-R diagram is represented by adding the primary key attribute (or attributes) of the entity on the one side of the relationship as a foreign key in the relation that is on the many side of the relationship.

Figure 9-11a, an example of this rule, shows the Places relationship (1:N) linking CUSTOMER and ORDER at PVF. Two relations, CUSTOMER and ORDER, were formed from the respective entity types (see Figure 9-11b). Customer_ID, which is the primary key of CUSTOMER (on the one side of the relationship), is added as a foreign key to ORDER (on the many side of the relationship).

One special case under this rule was mentioned in the previous section. If the entity on the many side needs the key of the entity on the one side as part of its primary key (this is a so-called weak entity), then this attribute is added not as a nonkey attribute but as part of the primary key.

For a binary or unary one-to-one (1:1) relationship between two entities A and B (for a unary relationship, A and B would be the same entity type), the relationship can be represented by any of the following choices:

1. Adding the primary key of A as a foreign key of B
2. Adding the primary key of B as a foreign key of A
3. Both of the above

FIGURE 9-11 Representing a 1:N relationship

(a) E-R diagram: CUSTOMER (Customer_ID, Name, Address, City_State_Zip, Discount) linked to ORDER (Order_Number, Order_Date, Promised_Date) by the Places relationship

(b) Relations

CUSTOMER
Customer_ID   Name                   Address          City_State_ZIP          Discount
1273          Contemporary Designs   123 Oak St.      Austin, TX 28384        5%
6390          Casual Corner          18 Hoosier Dr.   Bloomington, IN 45821   3%

ORDER
Order_Number   Order_Date   Promised_Date   Customer_ID
57194          3/15/1X      3/28/1X         6390
63725          3/17/1X      4/01/1X         1273
80149          3/14/1X      3/24/1X         6390

Binary and Higher-Degree M:N Relationships

Suppose that there is a binary many-to-many (M:N) relationship (or associative entity) between two entity types A and B. For such a relationship, we create a separate relation C. The primary key of this relation is a composite key consisting of the primary key for each of the two entities in the relationship. Any nonkey attributes associated with the M:N relationship are included with relation C.

Figure 9-12a, an example of this rule, shows the Requests relationship (M:N) between the entity types ORDER and PRODUCT for PVF. Figure 9-12b shows the three relations (ORDER, ORDER LINE, and PRODUCT) that are formed from the entity types and the Requests relationship. A relation (called ORDER LINE in Figure 9-12b) is created for the Requests relationship. The primary key of ORDER LINE is the combination (Order_Number,Product_ID), which combines the respective primary keys of ORDER and PRODUCT. The nonkey attribute Quantity_Ordered also appears in ORDER LINE.

FIGURE 9-12 Representing an M:N relationship

(a) E-R diagram: ORDER (Order_Number, Order_Date, Promised_Date) and PRODUCT (Product_ID, Description, Room, other attributes) linked by the Requests relationship, which carries the quantity ordered

(b) Relations

ORDER
Order_Number   Order_Date   Promised_Date
61384          2/17/2014    3/01/2017
62009          2/13/2014    2/27/2017
62807          2/15/2014    3/01/2017

ORDER LINE
Order_Number   Product_ID   Quantity_Ordered
61384          M128         2
61384          A261         1

PRODUCT
Product_ID   Description   Room     (Other Attributes)
M128         Bookcase      Study    -
A261         Wall unit     Family   -
R149         Cabinet       Study    -
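The 1:N and M:N rules above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions rather than the chapter's own implementation: the table name ORDER_T (used because ORDER is a reserved word in SQL), the helper functions `try_insert` and `products_on_order`, and the two order lines for order 57194 are all hypothetical. The customers, orders, and products come from Figures 9-11b and 9-12b.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE CUSTOMER (
    Customer_ID INTEGER PRIMARY KEY,
    Name        TEXT NOT NULL
);
-- 1:N rule: the key of the 'one' side becomes a foreign key on the 'many' side.
CREATE TABLE ORDER_T (
    Order_Number INTEGER PRIMARY KEY,
    Order_Date   TEXT,
    Customer_ID  INTEGER NOT NULL REFERENCES CUSTOMER(Customer_ID)
);
CREATE TABLE PRODUCT (
    Product_ID  TEXT PRIMARY KEY,
    Description TEXT
);
-- M:N rule: a separate relation whose composite key joins both primary keys.
CREATE TABLE ORDER_LINE (
    Order_Number     INTEGER REFERENCES ORDER_T(Order_Number),
    Product_ID       TEXT REFERENCES PRODUCT(Product_ID),
    Quantity_Ordered INTEGER,
    PRIMARY KEY (Order_Number, Product_ID)
);
""")

conn.executemany("INSERT INTO CUSTOMER VALUES (?, ?)",
                 [(1273, "Contemporary Designs"), (6390, "Casual Corner")])
conn.executemany("INSERT INTO ORDER_T VALUES (?, ?, ?)",
                 [(57194, "3/15/1X", 6390), (63725, "3/17/1X", 1273),
                  (80149, "3/14/1X", 6390)])
conn.executemany("INSERT INTO PRODUCT VALUES (?, ?)",
                 [("M128", "Bookcase"), ("A261", "Wall unit"), ("R149", "Cabinet")])
# Hypothetical order lines, invented for this example.
conn.executemany("INSERT INTO ORDER_LINE VALUES (?, ?, ?)",
                 [(57194, "M128", 2), (57194, "A261", 1)])


def try_insert(sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


def products_on_order(order_number):
    """List the Product_ID values requested on one order."""
    rows = conn.execute(
        "SELECT Product_ID FROM ORDER_LINE WHERE Order_Number = ? ORDER BY Product_ID",
        (order_number,))
    return [row[0] for row in rows]


# An order for a customer that does not exist violates referential integrity.
print(try_insert("INSERT INTO ORDER_T VALUES (?, ?, ?)", (99999, "3/20/1X", 9999)))
# The same product cannot appear twice on one order because of the composite key.
print(try_insert("INSERT INTO ORDER_LINE VALUES (?, ?, ?)", (57194, "M128", 5)))
print(products_on_order(57194))
```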
Occasionally, the relation created from an M:N relationship requires a primary key that includes more than just the primary keys from the two related relations. Consider, for example, a situation in which CUSTOMER (Customer_ID, Name) and VENDOR (Vendor_ID, Address) are related through SHIPMENT, which has the attributes Date and Amount. In this case, Date must be part of the key for the SHIPMENT relation to uniquely distinguish each row of the SHIPMENT table, as follows:

SHIPMENT(Customer_ID,Vendor_ID,Date,Amount)

If each shipment has a separate nonintelligent key, say, a shipment number, then Date becomes a nonkey and Customer_ID and Vendor_ID become foreign keys, as follows:

SHIPMENT(Shipment_Number,Customer_ID,Vendor_ID,Date,Amount)

In some cases, there may be a relationship among three or more entities. In such cases, we create a separate relation that has as a primary key the composite of the primary keys of each of the participating entities (plus any necessary additional key elements). This rule is a simple generalization of the rule for a binary M:N relationship.

Unary Relationships

To review, a unary relationship, also called a recursive relationship, is a relationship between the instances of a single entity type. Figure 9-13 shows two common examples. Figure 9-13a shows a one-to-many relationship named Manages that associates employees with another employee who is their manager. Figure 9-13b shows a many-to-many relationship that associates certain items with their component items. This relationship is called a bill-of-materials structure.

FIGURE 9-13 Two unary relationships

(a) EMPLOYEE (Emp_ID, Name, Birthdate) with Manages relationship (1:N)
(b) Bill-of-materials structure (M:N): ITEM (Item_Number, Name, Cost) with Contains relationship (Quantity)

For a unary 1:N relationship, the entity type (such as EMPLOYEE) is modeled as a relation. The primary key of that relation is the same as for the entity type. Then a foreign key is added to the relation that references the primary key values. A recursive foreign key is a foreign key in a relation that references the primary key values of that same relation. We can represent the relationship in Figure 9-13a as follows:

EMPLOYEE(Emp_ID,Name,Birthdate,Manager_ID)

In this relation, Manager_ID is a recursive foreign key that takes its values from the same set of worker identification numbers as Emp_ID.

For a unary M:N relationship, we model the entity type as one relation. Then we create a separate relation to represent the M:N relationship. The primary key of this new relation is a composite key that consists of two attributes (which need not have the same name) that both take their values from the same primary key. Any attribute associated with the relationship (such as Quantity in Figure 9-13b) is included as a nonkey attribute in this new relation. We can express the result for Figure 9-13b as follows:

ITEM(Item_Number,Name,Cost)
ITEM-BILL(Item_Number,Component_Number,Quantity)

Summary of Transforming E-R Diagrams to Relations

We have now described how to transform E-R diagrams to relations. Table 9-1 lists the rules discussed in this section for transforming E-R diagrams into equivalent relations. After this transformation, you should check the resulting relations to determine whether they are in third normal form and, if necessary, perform normalization as described earlier in this chapter.

TABLE 9-1 E-R Diagrams to Relational Transformation

E-R Structure: Relational Representation

• Regular entity: Create a relation with primary key and nonkey attributes.
• Weak entity: Create a relation with a composite primary key (which includes the primary key of the entity on which this weak entity depends) and nonkey attributes.
• Binary or unary 1:1 relationship: Place the primary key of either entity in the relation for the other entity or do this for both entities.
• Binary 1:N relationship: Place the primary key of the entity on the one side of the relationship as a foreign key in the relation for the entity on the many side.
• Binary or unary M:N relationship or associative entity: Create a relation with a composite primary key using the primary keys of the related entities, plus any nonkey attributes of the relationship or associative entity.
• Binary or unary M:N relationship or associative entity with additional key(s): Create a relation with a composite primary key using the primary keys of the related entities and additional primary key attributes associated with the relationship or associative entity, plus any nonkey attributes of the relationship or associative entity.
• Binary or unary M:N relationship or associative entity with its own key: Create a relation with the primary key associated with the relationship or associative entity, plus any nonkey attributes of the relationship or associative entity and the primary keys of the related entities (as foreign key attributes).
• Supertype/subtype: Create a relation for the superclass, which contains the primary key and all nonkey attributes in common with all subclasses, plus create a separate relation for each subclass with the same primary key (with the same or local name) but with only the nonkey attributes related to that subclass.

MERGING RELATIONS

As part of the logical database design, normalized relations likely have been created from a number of separate E-R diagrams and various user interfaces. Some of the relations may be redundant; that is, they may refer to the same entities. If so, you should merge those relations to remove the redundancy. This section describes merging relations, or view integration, which is the last step in logical database design and comes before physical file and database design.
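The basic mechanics of merging can be sketched in a few lines of Python. The relation layout (a dictionary with `key` and `attrs` entries), the function name `merge_relations`, and the two customer views are assumptions made for this illustration only. The sketch matches attributes purely by name, so it presumes that naming conflicts between the views have already been settled with users.

```python
def merge_relations(name, *relations):
    """Merge relations that share a primary key into one relation.

    Each relation is a dict with a 'key' list and an 'attrs' list.
    Attributes that appear in more than one relation are kept once.
    """
    keys = {tuple(rel["key"]) for rel in relations}
    if len(keys) != 1:
        raise ValueError("relations must share the same primary key")
    merged_attrs = []
    for rel in relations:
        for attr in rel["attrs"]:
            if attr not in merged_attrs:
                merged_attrs.append(attr)
    return {"name": name, "key": list(keys.pop()), "attrs": merged_attrs}


# Two hypothetical user views that describe the same entity.
customer_a = {"key": ["Customer_ID"], "attrs": ["Name", "Address"]}
customer_b = {"key": ["Customer_ID"], "attrs": ["Name", "Discount"]}

merged = merge_relations("CUSTOMER", customer_a, customer_b)
print(merged)
```

A merged relation produced this way still has to be checked for third normal form, because combining nonkey attributes from different views can introduce new dependencies between them.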
An Example of Merging Relations

Suppose that modeling a user interface or transforming an E-R diagram results in the following 3NF relation:

EMPLOYEE1(Emp_ID,Name,Address,Phone)

Modeling a second user interface might result in the following relation:

EMPLOYEE2(Emp_ID,Name,Address,Jobcode,Number_of_Years)

Because these two relations have the same primary key (Emp_ID) and describe the same entity, they should be merged into one relation. The result of merging the relations is the following relation:

EMPLOYEE(Emp_ID,Name,Address,Phone,Jobcode,Number_of_Years)

Notice that an attribute that appears in both relations (such as Name in this example) appears only once in the merged relation.

View Integration Problems

When integrating relations, you must understand the meaning of the data and be prepared to resolve any problems that may arise in the process. In this section, we describe and illustrate four problems that arise in view integration: synonyms, homonyms, dependencies between nonkeys, and class/subclass relationships.

Synonyms

Synonym: Two different names that are used for the same attribute.

In some situations, two or more attributes may have different names but the same meaning, as when they describe the same characteristic of an entity. Such attributes are called synonyms. For example, Emp_ID and Employee_Number may be synonyms. When merging relations that contain synonyms, you should obtain, if possible, agreement from users on a single standardized name for the attribute and eliminate the other synonym. Another alternative is to choose a third name to replace the synonyms. For example, consider the following relations:

STUDENT1(Student_ID,Name)
STUDENT2(Matriculation_Number,Name,Address)

In this case, the analyst recognizes that both the Student_ID and the Matriculation_Number are synonyms for a person's social security number (SSN) and are identical attributes.
One possible resolution would be to standardize on one of the two attribute names, such as Student_ID. Another option is to use a new attribute name, such as SSN, to replace both synonyms. With the latter approach, merging the two relations would produce the following result:

STUDENT(SSN,Name,Address)

Homonyms

Homonym: A single attribute name that is used for two or more different attributes.

In other situations, a single attribute name, called a homonym, may have more than one meaning or describe more than one characteristic. For example, the term account might refer to a bank's checking account, savings account, loan account, or other type of account; therefore, account refers to different data, depending on how it is used.

You should be on the lookout for homonyms when merging relations. Consider the following example:

STUDENT1(Student_ID,Name,Address)
STUDENT2(Student_ID,Name,Phone_Number,Address)

In discussions with users, the systems analyst may discover that the attribute Address in STUDENT1 refers to a student's campus address, whereas in STUDENT2 the same attribute refers to a student's home address. To resolve this conflict, we would probably need to create new attribute names, and the merged relation would become

STUDENT(Student_ID,Name,Phone_Number,Campus_Address,Permanent_Address)

Dependencies between Nonkeys

When two 3NF relations are merged to form a single relation, dependencies between nonkeys may result. For example, consider the following two relations:

STUDENT1(Student_ID,Major)
STUDENT2(Student_ID,Adviser)

Because STUDENT1 and STUDENT2 have the same primary key, the two relations may be merged:

STUDENT(Student_ID,Major,Adviser)

However, suppose that each major has exactly one adviser. In this case, Adviser is functionally dependent on Major:

Major → Adviser

If this dependency exists, then STUDENT is in 2NF but not 3NF because it contains a functional dependency between nonkeys.
The analyst can create 3NF relations by creating two relations with Major as a foreign key in STUDENT:

STUDENT(Student_ID,Major)
MAJOR ADVISER(Major,Adviser)

Class/Subclass

Class/subclass relationships may be hidden in user views or relations. Suppose that we have the following two hospital relations:

PATIENT1(Patient_ID,Name,Address,Date_Treated)
PATIENT2(Patient_ID,Room_Number)

Initially, it appears that these two relations can be merged into a single PATIENT relation. However, suppose that there are two different types of patients: inpatients and outpatients. PATIENT1 actually contains attributes common to all patients. PATIENT2 contains an attribute (Room_Number) that is a characteristic only of inpatients. In this situation, you should create class/subclass relationships for these entities:

PATIENT(Patient_ID,Name,Address)
INPATIENT(Patient_ID,Room_Number)
OUTPATIENT(Patient_ID,Date_Treated)

LOGICAL DATABASE DESIGN FOR HOOSIER BURGER

Figure 9-14 shows an E-R diagram that has been developed for a new inventory control system at Hoosier Burger. The new system was discussed previously in Chapter 7, where a DFD and decision table (respectively) for the system were created. In this section we show how this E-R model is translated into normalized relations, and how to normalize and then merge the relations for a new report with the relations from the E-R model.

FIGURE 9-14 Final E-R diagram for Hoosier Burger's inventory control system. Entities: INVOICE (Invoice_Number, Vendor_ID, Invoice_Date, Paid?), SALE (Receipt_Number, Sale_Date), ITEM SALE (Quantity_Sold), INVOICE ITEM (Quantity_Added), PRODUCT (Product_ID, Product_Description), RECIPE (Quantity_Used), and INVENTORY ITEM (Item_Number, Item_Description, Quantity_in_Stock, Type_of_Item, Minimum_Order_Quantity). Relationships: Sells, Includes, Received on, and Orders.

In this E-R model, four entities exist independently of other entities: SALE, PRODUCT, INVOICE, and INVENTORY ITEM. Given the attributes shown in Figure 9-14, we can represent these entities in the following four relations:

SALE(Receipt_Number,Sale_Date)
PRODUCT(Product_ID,Product_Description)
INVOICE(Vendor_ID,Invoice_Number,Invoice_Date,Paid?)
INVENTORY ITEM(Item_Number,Item_Description,Quantity_in_Stock,Minimum_Order_Quantity,Type_of_Item)

The entities ITEM SALE and INVOICE ITEM as well as the associative entity RECIPE each have a composite primary key taken from the entities to which they relate, so we can represent these three entities in the following three relations:

ITEM SALE(Receipt_Number,Product_ID,Quantity_Sold)
INVOICE ITEM(Vendor_ID,Invoice_Number,Item_Number,Quantity_Added)
RECIPE(Product_ID,Item_Number,Quantity_Used)

Because there are no many-to-many, one-to-one, or unary relationships, we have now represented all the entities and relationships from the E-R model. Also, each of the above relations is in 3NF because all attributes are simple, all nonkeys are fully dependent on the whole key, and there are no dependencies between nonkeys in the INVOICE and INVENTORY ITEM relations.

Now suppose that Bob Mellankamp wanted an additional report that was not previously known by the analyst who designed the inventory control system for Hoosier Burger. A rough sketch of this new report, listing volume of purchases from each vendor by type of item in a given month, appears in Figure 9-15. In this report, the same type of item may appear many times if multiple vendors supply the same type of item.

FIGURE 9-15 Hoosier Burger Monthly Vendor Load Report

Monthly Vendor Load Report for Month: xxxxx                    Page x of n

Vendor ID   Vendor Name   Type of Item   Total Quantity Added
V1          V1name        aaa            nnn1
                          bbb            nnn2
                          ccc            nnn3
V2          V2name        bbb            nnn4
                          mmm            nnn5
x           x             x

This report contains data about several relations already known to the analyst, including the following:

• INVOICE(Vendor_ID,Invoice_Number,Invoice_Date): Primary keys and the date are needed to select invoices in the specified month of the report.
• INVENTORY ITEM(Item_Number,Type_of_Item): Primary key and a nonkey in the report.
• INVOICE ITEM(Vendor_ID,Invoice_Number,Item_Number,Quantity_Added): Primary keys and the raw quantity of items invoiced that are subtotaled by vendor and type of item in the report.

In addition, the report includes a new attribute, Vendor_Name. After some investigation, an analyst determines that Vendor_ID → Vendor_Name. The whole primary key of the INVOICE relation is Vendor_ID and Invoice_Number, so if Vendor_Name were part of the INVOICE relation, this relation would violate the 3NF rule (Vendor_Name would depend on only part of the primary key, which already violates second normal form). Thus, a new VENDOR relation must be created as follows:

VENDOR(Vendor_ID,Vendor_Name)

Now, Vendor_ID not only is part of the primary key of INVOICE but also is a foreign key referencing the VENDOR relation. Hence, there must be a one-to-many relationship from VENDOR to INVOICE. The systems analyst determines that an invoice must come from a vendor, and there is no need to keep data about a vendor unless the vendor invoices Hoosier Burger. An updated E-R diagram, reflecting these enhancements for new data needed in the monthly vendor load report, appears in Figure 9-16. The normalized relations for this database are as follows:

SALE(Receipt_Number,Sale_Date)
PRODUCT(Product_ID,Product_Description)
INVOICE(Vendor_ID,Invoice_Number,Invoice_Date,Paid?)
INVENTORY ITEM(Item_Number,Item_Description,Quantity_in_Stock,Minimum_Order_Quantity,Type_of_Item)
ITEM SALE(Receipt_Number,Product_ID,Quantity_Sold)
INVOICE ITEM(Vendor_ID,Invoice_Number,Item_Number,Quantity_Added)
RECIPE(Product_ID,Item_Number,Quantity_Used)
VENDOR(Vendor_ID,Vendor_Name)

FIGURE 9-16 E-R diagram corresponding to normalized relations of Hoosier Burger's inventory control system. The diagram repeats the entities and relationships of Figure 9-14 and adds the entity VENDOR (Vendor_ID, Vendor_Name).

PHYSICAL FILE AND DATABASE DESIGN

Designing physical files and databases requires certain information that should have been collected and produced during prior SDLC phases. This information includes the following:

• Normalized relations, including volume estimates
• Definitions of each attribute
• Descriptions of where and when data are used: entered, retrieved, deleted, and updated (including frequencies)
• Expectations or requirements for response time and data integrity
• Descriptions of the technologies used for implementing the files and database so that the range of required decisions and choices for each is known

Normalized relations are, of course, the result of logical database design. Statistics on the number of rows in each table as well as the other information listed above may have been collected during requirements determination in systems analysis. If not, these items need to be discovered to proceed with database design.

We take a bottom-up approach to reviewing physical file and database design. Thus, we begin the physical design phase by addressing the design of physical fields for each attribute in a logical data model.

Designing Fields

A field is the smallest unit of named application data recognized by system software, such as a programming language or database management system. An attribute from a logical database model may be represented by several fields. For example, a student name attribute in a normalized student relation might be represented as three fields: last name, first name, and middle initial.
In general, you will represent each attribute from each normalized relation as one or more fields. The basic decisions you must make in specifying each field concern the type of data (or storage type) used to represent the field and data integrity controls for the field.

Choosing Data Types

A data type is a coding scheme recognized by system software for representing organizational data. The bit pattern of the coding scheme is usually immaterial to you, but the space to store data and the speed required to access data are of consequence in the physical file and database design. The specific file or database management software you use with your system will dictate which choices are available to you. For example, Table 9-2 lists the most commonly used data types available in Oracle 10g.

Selecting a data type balances four objectives that will vary in degree of importance depending on the application:

1. Minimize storage space
2. Represent all possible values of the field
3. Improve data integrity for the field
4. Support all data manipulations desired on the field

You want to choose a data type for a field that minimizes space, represents every possible legitimate value for the associated attribute, and allows the data to be manipulated as needed. For example, suppose a quantity sold field can be represented by a Number data type. You would select a length for this field that would handle the maximum value, plus some room for growth of the business. Further, the Number data type will restrict users from entering inappropriate values (text), but it does allow negative numbers (if this is a problem, application code or form design may be required to restrict the values to positive ones).
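The kind of application code mentioned above for a quantity sold field might look like the following Python sketch. The function name `validate_quantity_sold` and the five-digit limit are assumptions for illustration; a real system would take the limit from the field length chosen during design.

```python
def validate_quantity_sold(value, max_digits=5):
    """Accept only whole, non-negative numbers that fit in max_digits digits."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("quantity sold must be a whole number")
    if value < 0:
        raise ValueError("quantity sold cannot be negative")
    if value >= 10 ** max_digits:
        raise ValueError(f"quantity sold must have at most {max_digits} digits")
    return value


for candidate in (120, -1, 100000, "12"):
    try:
        print(candidate, "accepted as", validate_quantity_sold(candidate))
    except (TypeError, ValueError) as error:
        print(candidate, "rejected:", error)
```

The type check plays the role of the Number data type (text is refused), while the sign check adds the restriction that the data type alone does not provide.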
Be careful: the data type must be suitable for the life of the application; otherwise, maintenance will be required. Choose data types for future needs by anticipating growth. Also, be careful that date arithmetic can be done so that dates can be subtracted or time periods can be added to or subtracted from a date.

Several other capabilities of data types may be available with some database technologies. We discuss a few of the most common of these features next: calculated fields and coding and compression techniques.

TABLE 9-2 Commonly Used Data Types in Oracle 10g

Data Type: Description

VARCHAR2: Variable-length character data with a maximum length of 4000 characters; you must enter a maximum field length (e.g., VARCHAR2(30) for a field with a maximum length of 30 characters). A value less than 30 characters will consume only the required space.

CHAR: Fixed-length character data with a maximum length of 2000 characters; default length is 1 character (e.g., CHAR(5) for a field with a fixed length of five characters, capable of holding a value from 0 to 5 characters long).

LONG: Capable of storing up to two gigabytes of one variable-length character data field (e.g., to hold a medical instruction or a customer comment).

NUMBER: Positive and negative numbers in the range 10^-130 to 10^126; can specify the precision (total number of digits to the left and right of the decimal point) and the scale (the number of digits to the right of the decimal point) (e.g., NUMBER(5) specifies an integer field with a maximum of 5 digits and NUMBER(5, 2) specifies a field with no more than five digits and exactly two digits to the right of the decimal point).

DATE: Any date from January 1, 4712 BC to December 31, 9999 AD; date stores the century, year, month, day, hour, minute, and second.

BLOB: Binary large object, capable of storing up to four gigabytes of binary data (e.g., a photograph or sound clip).

Calculated Fields

It is common for an attribute to be mathematically related to other data. For example, an invoice may include a total due field, which represents the sum of the amount due on each item on the invoice. A field that can be derived from other database fields is called a calculated field (or a computed field or a derived field). Recall that a functional dependency between attributes does not imply a calculated field. Some database technologies allow you to explicitly define calculated fields along with other raw data fields. If you specify a field as calculated, you would then usually be prompted to enter the formula for the calculation; the formula can involve other fields from the same record and possibly fields from records in related files. The database technology will either store the calculated value or compute it when requested.

Coding and Compression Techniques

Some attributes have very few values from a large range of possible values. For example, suppose that each product from PVF has a finish attribute, with possible values of Birch, Walnut, Oak, and so forth. To store this attribute as text might require 12, 15, or even 20 bytes to represent the longest finish value. Suppose that even a liberal estimate is that PVF will never have more than 25 finishes. Thus, a single alphabetic or alphanumeric character code would be more than sufficient to represent each finish. We not only reduce storage space but also increase integrity (by restricting input to only a few values), which helps to achieve two of the physical file and database design goals. Codes also have disadvantages. If used in system inputs and outputs, they can be more difficult for users to remember, and programs must be written to decode fields if the codes themselves are not to be displayed.
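The coding technique for the finish attribute might be sketched as follows. This is an illustration under assumptions, not a prescribed design: it uses SQLite through Python's sqlite3 module, and the FINISH code table, the one-letter codes, the column names, and the sample products are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

conn.executescript(
    """
    -- Code table: one character stands for each finish name.
    CREATE TABLE finish (
        Finish_Code TEXT PRIMARY KEY CHECK (length(Finish_Code) = 1),
        Finish_Name TEXT NOT NULL UNIQUE
    );
    -- PRODUCT stores only the one-character code.
    CREATE TABLE product (
        Product_ID  INTEGER PRIMARY KEY,
        Description TEXT NOT NULL,
        Finish_Code TEXT NOT NULL REFERENCES finish (Finish_Code)
    );
    INSERT INTO finish VALUES ('B', 'Birch'), ('W', 'Walnut'), ('O', 'Oak');
    INSERT INTO product VALUES (1, 'End table', 'O'), (2, 'Desk', 'W');
    """
)

# Decoding step: join to the code table whenever the full name must be shown.
decoded = conn.execute(
    """
    SELECT p.Description, f.Finish_Name
    FROM product AS p
    JOIN finish AS f ON f.Finish_Code = p.Finish_Code
    ORDER BY p.Product_ID
    """
).fetchall()
print(decoded)
```

The sketch shows both sides of the trade-off described above: the stored value is a single character and only listed codes are accepted, but every display of the finish needs the extra decoding join.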
Controlling Data Integrity

Accurate data are essential for compliance with new national and international regulations, such as Sarbanes-Oxley (SOX) and Basel II. COBIT (Control Objectives for Information and Related Technologies) and ITIL (IT Infrastructure Library) provide standards, guidelines, and rules for corporate governance, risk assessment, security, and controls of data. These preventive controls are best and consistently applied if designed into the database and enforced by the database management system (DBMS). Data integrity controls can be viewed very positively during audits for compliance with regulations. These controls are only as good as the underlying field data controls.

We have already explained that data typing helps control data integrity by limiting the possible range of values for a field. There are additional physical file and database design options you might use to ensure higher-quality data. Although these controls can be imposed within application programs, it is better to include these as part of the file and database definitions so that the controls are guaranteed to be applied all the time as well as uniformly for all programs. There are four popular data integrity control methods: default value, range control, referential integrity, and null value control.

• Default value. A default value is the value a field will assume unless an explicit value is entered for the field. For example, the city and state of most customers for a particular retail store will likely be the same as the store’s city and state. Assigning a default value to a field can reduce data entry time (the field can simply be skipped during data entry) and data entry errors, such as typing IM instead of IN for Indiana.

• Range control. Both numeric and alphabetic data may have a limited set of permissible values.
For example, a field for the number of product units sold may have a lower bound of zero, and a field that represents the month of a product sale may be limited to the values JAN, FEB, and so forth.

• Referential integrity. As noted earlier in this chapter, the most common example of referential integrity is cross-referencing between relations. For example, consider the pair of relations in Figure 9-17a. In this case, the values for the foreign key Customer_ID field within a customer order must be limited to the set of Customer_ID values from the CUSTOMER relation; we would not want to accept an order for a nonexisting or unknown customer. Referential integrity may be useful in other instances. Consider the employee relation example in Figure 9-17b. In this example, the EMPLOYEE relation has a field of Supervisor_ID. This field refers to the Employee_ID of the employee’s supervisor and should have referential integrity on the Employee_ID field within the same relation. Note in this case that the value of a Supervisor_ID field may be empty because some employees do not have supervisors; therefore, this is a weak referential integrity constraint.

• Null value control. A null value is a special field value, distinct from a zero, blank, or any other value, that indicates that the value for the field is missing or otherwise unknown. It is not uncommon that when it is time to enter data (for example, a new customer) you might not know the customer’s phone number. The question is whether a customer, to be valid, must have a value for this field. The answer for this field is probably no, initially, because most data processing can continue without knowing the customer’s phone number.
Later, a null value may not be allowed when you are ready to ship a product to the customer. On the other hand, you must always know a value for the Customer_ID field. Due to referential integrity, you cannot enter any customer orders for this new customer without knowing an existing Customer_ID value, and customer name is essential for visual verification of correct data entry. Besides using a special null value when a field is missing its value, you can also estimate the value, produce a report indicating rows of tables with critical missing values, or determine whether the missing value matters when computing needed information.

FIGURE 9-17 Examples of referential integrity field controls

(a) Referential integrity between relations
CUSTOMER(Customer_ID,Cust_Name,Cust_Address, . . .)
CUST_ORDER(Order_ID,Customer_ID,Order_Date, . . .)
and Customer_ID may not be null because every order must be for some existing customer

(b) Referential integrity within a relation
EMPLOYEE(Employee_ID,Supervisor_ID,Empl_Name, . . .)
and Supervisor_ID may be null because not all employees have supervisors

Designing Physical Tables

A relational database is a set of related tables (tables are related by foreign keys referencing primary keys). In logical database design, you grouped into a relation those attributes that concern some unifying, normalized business concept, such as a customer, product, or employee. In contrast, a physical table is a named set of rows and columns that specifies the fields in each row of the table. A physical table may or may not correspond to one relation. Whereas normalized relations possess properties of well-structured relations, the design of a physical table has two goals different from those of normalization: efficient use of secondary storage and data processing speed.

The efficient use of secondary storage (disk space) relates to how data are loaded on disks. Disks are physically divided into units (called pages) that can be read or written in one machine operation. Space is used efficiently when the physical length of a table row divides close to evenly into the length of the storage unit. For many information systems, this even division is very difficult to achieve because it depends on factors, such as operating system parameters, outside the control of each database. Consequently, we do not discuss this factor of physical table design in this text.

A second and often more important consideration when selecting a physical table design is efficient data processing. Data are most efficiently processed when they are stored close to one another in secondary memory, thus minimizing the number of input/output (I/O) operations that must be performed. Typically, the data in one physical table (all the rows and fields in those rows) are stored close together on disk. Denormalization is the process of splitting or combining normalized relations into physical tables based on affinity of use of rows and fields. In Figure 9-18a, a normalized product relation is split into separate physical tables, each containing only engineering, accounting, or marketing product data; the primary key must be included in each table. Note that the Description and Color attributes are repeated in both the engineering and marketing tables because these attributes relate to both kinds of data. In Figure 9-18b, a customer relation is denormalized by putting rows from different geographic regions into separate tables. In both cases, the goal is to create tables that contain only the data used together in programs.
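The two kinds of splitting described for Figure 9-18 can be sketched in a few lines of Python. This is only an illustration: the customer rows are those of Figure 9-18b, but the function names and the single product row (all of its values) are invented for the example.

```python
from collections import defaultdict

# Customer rows as listed in Figure 9-18b.
customers = [
    {"Customer_ID": 1256, "Name": "Rogers", "Region": "Atlantic", "Annual_Sales": 10000},
    {"Customer_ID": 1323, "Name": "Temple", "Region": "Pacific", "Annual_Sales": 20000},
    {"Customer_ID": 1455, "Name": "Gates", "Region": "South", "Annual_Sales": 15000},
    {"Customer_ID": 1626, "Name": "Hope", "Region": "Pacific", "Annual_Sales": 22000},
    {"Customer_ID": 2433, "Name": "Bates", "Region": "South", "Annual_Sales": 14000},
    {"Customer_ID": 2566, "Name": "Bailey", "Region": "Atlantic", "Annual_Sales": 12000},
]

# One product row with made-up values, only to exercise the column split.
products = [
    {"Product_ID": 100, "Description": "Desk", "Drawing_Number": "D-100",
     "Weight": 45, "Color": "Oak", "Unit_Cost": 120.0, "Burden_Rate": 0.15,
     "Price": 300.0, "Product_Manager": "Smith"},
]


def split_by_rows(rows, attribute):
    """Denormalize by rows: one table per distinct value of `attribute`."""
    tables = defaultdict(list)
    for row in rows:
        tables[row[attribute]].append(row)
    return dict(tables)


def split_by_columns(rows, key, column_groups):
    """Denormalize by columns: each table keeps the primary key plus its own columns."""
    return {
        name: [{col: row[col] for col in (key, *cols)} for row in rows]
        for name, cols in column_groups.items()
    }


regional = split_by_rows(customers, "Region")
functional = split_by_columns(
    products,
    "Product_ID",
    {
        "E_Product": ["Description", "Drawing_Number", "Weight", "Color"],
        "A_Product": ["Unit_Cost", "Burden_Rate"],
        "M_Product": ["Description", "Color", "Price", "Product_Manager"],
    },
)
print(sorted(regional))
print(functional["A_Product"])
```

Note that the column split repeats Product_ID in every resulting table, and Description and Color in two of them, which is exactly the redundancy the text points out.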
By placing data used together close to one another on disk, the number of disk I/O operations to retrieve all the data needed by a program is minimized.

The capability to split a table into separate sections, often called partitioning, is possible with most relational database products. With Oracle, there are three types of table partitioning:

1. Range partitioning. Partitions are defined by nonoverlapping ranges of values for a specified attribute (so separate tables are formed of the rows whose specified attribute values fall in indicated ranges).
2. Hash partitioning. A table row is assigned to a partition by an algorithm that maps the specified attribute value to a partition.
3. Composite partitioning. Combines range and hash partitioning by first segregating data by ranges on the designated attribute, and then within each of these partitions, it further partitions by hashing on the designated attribute.

Each partition is stored in a separate contiguous section of disk space, which Oracle calls a tablespace.

Denormalization can increase the chance of errors and inconsistencies that normalization avoided. Further, denormalization optimizes certain data processing activities at the expense of others, so if the frequencies of different processing activities change, the benefits of denormalization may no longer exist (Hoffer et al., 2016).

Various forms of denormalization, which involves combining data from several normalized tables, can be done, but there are no hard-and-fast rules for deciding when to denormalize data. Here are three common situations (Microsoft, 2015) in which denormalization across tables often makes accessing related data faster (see Figure 9-19 for illustrations):

1. Two entities with a one-to-one relationship. Figure 9-19a shows student data with optional data from a standard scholarship application that a student may complete.
In this case, one record could be formed with four fields from the STUDENT and SCHOLARSHIP APPLICATION FORM normalized relations. (Note: In this case, fields from the optional entity must have null values allowed.)

2. A many-to-many relationship (associative entity) with nonkey attributes. Figure 9-19b shows price quotes for different items from different vendors. In this case, fields from ITEM and PRICE QUOTE relations might be combined into one physical table to avoid having to combine all three tables together. (Note: This may create considerable duplication of data and excessive updating if duplicated data change; in the example, the ITEM fields, such as Description, would repeat for each price quote.)

3. Reference data. Figure 9-19c shows that several ITEMs have the same STORAGE INSTRUCTIONS and that STORAGE INSTRUCTIONS relate only to ITEMs. In this case, the storage instruction data could be stored in the ITEM table, thus reducing the number of tables to access but also creating redundancy and the potential for extra data maintenance.

FIGURE 9-18 Examples of denormalization

(a) Denormalization by columns

Normalized Product Relation
Product(Product_ID,Description,Drawing_Number,Weight,Color,Unit_Cost,Burden_Rate,Price,Product_Manager)

Denormalized Functional Area Product Relations for Tables
Engineering: E_Product(Product_ID,Description,Drawing_Number,Weight,Color)
Accounting: A_Product(Product_ID,Unit_Cost,Burden_Rate)
Marketing: M_Product(Product_ID,Description,Color,Price,Product_Manager)

(b) Denormalization by rows

Normalized Customer Table

CUSTOMER
Customer_ID   Name     Region     Annual_Sales
1256          Rogers   Atlantic   10,000
1323          Temple   Pacific    20,000
1455          Gates    South      15,000
1626          Hope     Pacific    22,000
2433          Bates    South      14,000
2566          Bailey   Atlantic   12,000

Denormalized Regional Customer Tables

A_CUSTOMER
Customer_ID   Name     Region     Annual_Sales
1256          Rogers   Atlantic   10,000
2566          Bailey   Atlantic   12,000

P_CUSTOMER
Customer_ID   Name     Region     Annual_Sales
1323          Temple   Pacific    20,000
1626          Hope     Pacific    22,000

S_CUSTOMER
Customer_ID   Name     Region     Annual_Sales
1455          Gates    South      15,000
2433          Bates    South      14,000

FIGURE 9-19 Possible denormalization situations

(a) Two entities with a one-to-one relationship
[E-R diagram: STUDENT (Student_ID, Campus_Address) Submits SCHOLARSHIP APPLICATION FORM (Application_ID, Application_Date, Qualifications)]
Normalized relations:
STUDENT(Student_ID,Campus_Address,Application_ID)
APPLICATION(Application_ID,Application_Date,Qualifications,Student_ID)
Denormalized relation:
STUDENT(Student_ID,Campus_Address,Application_Date,Qualifications)
and Application_Date and Qualifications may be null
(Note: We assume Application_ID is not necessary when all fields are stored in one record, but this field can be included if it is required application data.)

(b) A many-to-many relationship with nonkey attributes
[E-R diagram: VENDOR (Vendor_ID, Address, Contact_Name) and ITEM (Item_ID, Description) linked by PRICE QUOTE (Price)]
Normalized relations:
VENDOR(Vendor_ID,Address,Contact_Name)
ITEM(Item_ID,Description)
PRICE QUOTE(Vendor_ID,Item_ID,Price)
Denormalized relations:
VENDOR(Vendor_ID,Address,Contact_Name)
ITEM-QUOTE(Vendor_ID,Item_ID,Description,Price)

(c) Reference data
[E-R diagram: STORAGE INSTRUCTIONS (Instr_ID, Where_Store, Container_Type) Control for ITEM (Item_ID, Description)]
Normalized relations:
STORAGE(Instr_ID,Where_Store,Container_Type)
ITEM(Item_ID,Description,Instr_ID)
Denormalized relation:
ITEM(Item_ID,Description,Where_Store,Container_Type)

Arranging Table Rows

The result of denormalization is the definition of one or more physical files. A computer operating system stores data in a physical file, which is a named set of table rows stored in a contiguous section of secondary memory.
A file contains rows and columns from one or more tables, as produced from denormalization. To the operating system (e.g., Windows, Linux, or UNIX), each table may be one file or the whole database may be in one file, depending on how the database technology and database designer organize data. The way the operating system arranges table rows in a file is called a file organization, that is, a technique for physically arranging the records of a file. With some database technologies, the systems designer can choose from among several organizations for a file. If the database designer has a choice, he or she chooses a file organization for a specific file that will provide the following:

1. Fast data retrieval
2. High throughput for processing transactions
3. Efficient use of storage space
4. Protection from failures or data loss
5. Minimal need for reorganization
6. Accommodation of growth
7. Security from unauthorized use

Often these objectives conflict, and you must select an organization for each file that provides a reasonable balance among the criteria within the resources available.

To achieve these objectives, many file organizations use a pointer. A pointer is a field of data that can be used to locate a related field or row of data. In most cases, a pointer contains the address of the associated data, which has no business meaning. Pointers are used in file organizations when it is not possible to store related data next to each other. Because this is often the case, pointers are common. In most cases, fortunately, pointers are hidden from a programmer. Because a database designer may need to decide if and how to use pointers, however, we introduce the concept here.
Literally hundreds of different file organizations and variations have been created, but we outline the basics of three families of file organizations used in most file management environments: sequential, indexed, and hashed, as illustrated in Figure 9-20. You need to understand the particular variations of each method available in the environment for which you are designing files.

Sequential File Organizations

In a sequential file organization, the rows in the file are stored in sequence according to a primary key value (see Figure 9-20a). To locate a particular row, a program must normally scan the file from the beginning until the desired row is located. A common example of a sequential file is the alphabetic list of persons in the white pages of a phone directory (ignoring any index that may be included with the directory). Sequential files are very fast if you want to process rows sequentially, but they are impractical for random row retrievals. Deleting rows can cause wasted space or the need to compress the file. Adding rows requires rewriting the file, at least from the point of insertion. Updating a row may also require rewriting the file, unless the file organization supports rewriting over the updated row only. Only one sequence can be maintained without duplicating the rows.

Indexed File Organizations

In an indexed file organization, the rows are stored either sequentially or nonsequentially, and an index is created that allows the application software to locate individual rows (see Figure 9-20b). Like a card catalog in a library, an index is a structure that is used to determine the rows in a file that satisfy some condition. Each entry matches a key value with one or more rows. An index can point to unique rows (a primary key index, such as on the Product_ID field of a product table) or to potentially more than one row.
An index that allows each entry to point to more than one record is called a secondary key index. A secondary key is one or a combination of fields for which more than one row may have the same combination of values. Secondary key indexes are important for supporting many reporting requirements and for providing rapid ad hoc data retrieval. An example would be an index on the Finish field of a product table. One of the most powerful capabilities of indexed file organizations is the ability to create multiple indexes, similar to the title, author, and subject indexes in a library. Search results from the multiple indexes can be combined very quickly to find those records with precisely the combination of values sought.

FIGURE 9-20 Comparison of file organizations
(a) Sequential: [diagram: the file is scanned from its start through rows stored in key order: Aces, Boilermakers, Devils, Flyers, Hawkeyes, Hoosiers, ..., Miners, Panthers, ..., Seminoles, ...]
(b) Indexed: [diagram: the key (Hoosiers) is looked up in a top-level index with entries F, P, and Z, which points to lower-level indexes with entries B, D, F; H, L, P; and R, S, Z, which in turn point to the data rows]
(c) Hashed: [diagram: the key (Hoosiers) is passed to a hashing algorithm that yields a relative record number; rows are stored in no key order: Miners, Hawkeyes, Aces, ..., Hoosiers, Seminoles, Devils, Flyers, Panthers, ..., Boilermakers]

The example in Figure 9-20b, typical of many index structures, illustrates that indexes can be built on top of indexes, creating a hierarchical set of indexes, and the data are stored sequentially in many contiguous segments.
For example, to find the record with key “Hoosiers,” the file organization would start at the top index and take the pointer after the entry P, which points to another index for all keys that begin with the letters G through P in the alphabet. Then the software would follow the pointer after the H in this index, which represents all those records with keys that begin with the letters G through H. Eventually, the search through the indexes either locates the desired record or indicates that no such record exists. The reason for storing the data in many contiguous segments is to allow room for some new data to be inserted in sequence without rearranging all the data. The main disadvantages to indexed file organizations are the extra space required to store the indexes and the extra time necessary to access and maintain indexes. Usually these disadvantages are more than offset by the advantages. Because the index is kept in sequential order, both random processing and sequential processing are practical. Also, because the index is separate from the data, you can build multiple index structures on the same data file (just as in the library, where there are multiple indexes on author, title, subject, and so forth). With multiple indexes, software may rapidly find records that have compound conditions, such as records of books by Tom Clancy on espionage. The decision of which indexes to create is probably the most important physical database design task for relational database technology, such as Microsoft Access, Oracle, DB2, and similar systems. Indexes can be created for both primary and secondary keys. When using indexes, there is a trade-off between improved performance for retrievals and degrading performance for inserting, deleting, and updating the rows in a file. Thus, indexes should be used generously for databases intended primarily to support data retrievals, such as for decision support applications. 
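The claim that an index kept in key order supports both random and sequential processing can be illustrated with a small sketch. It is a simplification under stated assumptions: a single flat index searched with Python's bisect module stands in for the hierarchical index of Figure 9-20b, the team names come from that figure, and the function names are hypothetical.

```python
from bisect import bisect_left

# Data rows stored in arrival order, not in key order.
records = ["Miners", "Hawkeyes", "Aces", "Hoosiers", "Seminoles",
           "Devils", "Flyers", "Panthers", "Boilermakers"]

# The index is separate from the data: (key, position) pairs kept in key order.
# Each position plays the role of a pointer into the data file.
index = sorted((name, pos) for pos, name in enumerate(records))
keys = [key for key, _ in index]


def find(key):
    """Random retrieval: binary search of the index, then one data access."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return index[i][1]
    return None  # the search shows that no such record exists


def in_key_order():
    """Sequential processing: walk the index and follow each pointer."""
    return [records[pos] for _, pos in index]


print(find("Hoosiers"))
print(find("Zips"))
print(in_key_order())
```

The sketch also hints at the cost side of the trade-off: every insert or delete in records would require the index to be updated as well.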
Because they impose additional overhead, indexes should be used judiciously for databases that support transaction processing and other applications with heavy updating requirements.

Here are some guidelines for choosing indexes for relational databases (Gibson, Hughes, and Remington, 1989):

1. Specify a unique index for the primary key of each table (file). This selection ensures the uniqueness of primary key values and speeds retrieval based on those values. Random retrieval based on primary key value is common for answering multitable queries and for simple data maintenance tasks.
2. Specify an index for foreign keys. As in the first guideline, this speeds processing of multitable queries.
3. Specify an index for nonkey fields that are referenced in qualification and sorting commands for the purpose of retrieving data.

To illustrate the use of these rules, consider the following relations for PVF:

PRODUCT(Product_Number,Description,Finish,Room,Price)
ORDER(Order_Number,Product_Number,Quantity)

You would normally specify a unique index for each primary key: Product_Number in PRODUCT and Order_Number in ORDER. Other indexes would be assigned based on how the data are used. For example, suppose that there is a system module that requires PRODUCT and ORDER data for products with a price below $500, ordered by Product_Number. To speed up this retrieval, you could consider specifying indexes on the following attributes:

1. Price in PRODUCT because it satisfies rule 3
2. Product_Number in ORDER because it satisfies rule 2

Because users may direct a potentially large number of different queries to the database, and especially for a system with a lot of ad hoc queries, you will probably have to be selective in specifying indexes to support the most common or frequently used queries. See Hoffer et al. (2016) for a more thorough discussion of factors and rules of thumb for selecting indexes.

TABLE 9-3 Comparative Features of Sequential, Indexed, and Hashed File Organizations

Factor: Storage space
  Sequential: No wasted space
  Indexed: No wasted space for data, but extra space for index
  Hashed: Extra space may be needed to allow for addition and deletion of records

Factor: Sequential retrieval on primary key
  Sequential: Very fast
  Indexed: Moderately fast
  Hashed: Impractical

Factor: Random retrieval on primary key
  Sequential: Impractical
  Indexed: Moderately fast
  Hashed: Very fast

Factor: Multiple key retrieval
  Sequential: Possible, but requires scanning whole file
  Indexed: Very fast with multiple indexes
  Hashed: Not possible

Factor: Deleting rows
  Sequential: Can create wasted space or require reorganizing
  Indexed: If space can be dynamically allocated, this is easy, but requires maintenance of indexes
  Hashed: Very easy

Factor: Adding rows
  Sequential: Requires rewriting file
  Indexed: If space can be dynamically allocated, this is easy, but requires maintenance of indexes
  Hashed: Very easy, except multiple keys with same address require extra work

Factor: Updating rows
  Sequential: Usually requires rewriting file
  Indexed: Easy, but requires maintenance of indexes
  Hashed: Very easy

Hashed File Organizations

In a hashed file organization, the location of each row is determined using an algorithm (see Figure 9-20c) that converts a primary key value into a row address. Although there are several variations of hashed files, in most cases the rows are located nonsequentially as dictated by the hashing algorithm. Thus, sequential data processing is impractical. On the other hand, retrieval of random rows is very fast.
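To make the idea of converting a primary key value into a row address concrete, here is a small in-memory sketch. Everything in it is an assumption made for illustration: the HashedFile class, the use of zlib.crc32 as the hashing algorithm, the slot count of 11, and linear probing as the way to place a row whose home address is already taken are not prescribed by the text.

```python
import zlib


class HashedFile:
    """Toy hashed file: a fixed number of slots addressed by relative record number."""

    def __init__(self, n_slots=11):
        self.slots = [None] * n_slots

    def _home(self, key):
        # Hashing algorithm: primary key value -> relative record number.
        return zlib.crc32(key.encode("utf-8")) % len(self.slots)

    def put(self, key, row):
        """Store a row at its home address, or at the next free slot if taken."""
        start = self._home(key)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, row)
                return i
        raise RuntimeError("file is full")

    def get(self, key):
        """Random retrieval: go to the home address and probe forward if needed."""
        start = self._home(key)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)
            if self.slots[i] is None:
                return None  # an empty slot ends the search: key not stored
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None


teams_file = HashedFile()
for name in ["Aces", "Boilermakers", "Devils", "Flyers", "Hawkeyes",
             "Hoosiers", "Miners", "Panthers", "Seminoles"]:
    teams_file.put(name, {"Team": name})

print(teams_file.get("Hoosiers"))
print(teams_file.get("Zips"))
```

Reading the slots from first to last would return the rows in no useful key order, which is why sequential processing on the primary key is impractical with this organization. The spare slots correspond to the extra space listed for hashed files in Table 9-3.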
There are issues in the design of hashing file organizations, such as how to handle two primary keys that translate into the same address, but again, these issues are beyond our scope (see Hoffer et al. [2016] for a thorough discussion).

Summary of File Organizations

The three families of file organizations (sequential, indexed, and hashed) cover most of the file organizations you will have at your disposal as you design physical files and databases. Table 9-3 summarizes the comparative features of these file organizations. You can use this table to help choose a file organization by matching the file characteristics and file processing requirements with the features of the file organization.

Designing Controls for Files

Two of the file organization objectives listed earlier are protection from failures or data loss and security from unauthorized use. These goals are achieved primarily by implementing controls on each file. Data integrity controls, a primary type of control, were mentioned earlier in this chapter. Two other important types of controls address file backup and security.

It is almost inevitable that a file will be damaged or lost, due to either software or human errors. When a file is damaged, it must be restored to an accurate and reasonably current condition. A file and database designer has several techniques for file restoration, including

• periodically making a backup copy of a file,
• storing a copy of each change to a file in a transaction log or audit trail, or
• storing a copy of each row before or after it is changed.

For example, a backup copy of a file and a log of rows after they were changed can be used to reconstruct a file from a previous state (the backup copy) to its current values. This process would be necessary if the current file were so damaged that it could not be used.
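The reconstruction from a backup copy plus a log of after images can be sketched as follows. This is a simplified illustration, not how any particular DBMS implements recovery: the file is modeled as a dictionary keyed by primary key, a deleted row is logged as None, and the function name and sample rows are hypothetical.

```python
import copy


def roll_forward(backup, after_image_log):
    """Rebuild the current file contents from a backup and a log of after images.

    backup: dict mapping primary key -> row, as of the time of the backup.
    after_image_log: list of (key, after_image) pairs in the order the changes
        occurred; after_image is None when the row was deleted.
    """
    restored = copy.deepcopy(backup)  # never modify the backup copy itself
    for key, after_image in after_image_log:
        if after_image is None:
            restored.pop(key, None)
        else:
            restored[key] = copy.deepcopy(after_image)
    return restored


# Hypothetical sample data.
backup = {
    101: {"Cust_Name": "Rogers", "Region": "Atlantic"},
    102: {"Cust_Name": "Temple", "Region": "Pacific"},
}
log = [
    (102, {"Cust_Name": "Temple", "Region": "South"}),  # row 102 was updated
    (103, {"Cust_Name": "Gates", "Region": "South"}),   # row 103 was added
    (101, None),                                        # row 101 was deleted
]

current = roll_forward(backup, log)
print(current)
```

The log must be applied in the order in which the changes occurred; otherwise, an older after image could overwrite a newer one.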
If the current file is operational but inaccurate, then a log of before images of rows can be used in reverse order to restore a file to an accurate but previous condition. Then a log of the transactions can be reapplied to the restored file to bring it up to current values. It is important that the information system designer make provisions for backup, audit trail, and row image files so that data files can be rebuilt when errors and damage occur.

An information system designer can build data security into a file by several means, including the following:

• Coding, or encrypting, the data in the file so that they cannot be read unless the reader knows how to decrypt the stored values.
• Requiring data file users to identify themselves by entering user names and passwords, and then possibly allowing only certain file activities (read, add, delete, change) for selected users to selected data in the file.
• Prohibiting users from directly manipulating any data in the file, and instead forcing programs and users to work with a copy (real or virtual) of the data they need; the copy contains only the data that users or programs are allowed to manipulate, and the original version of the data will change only after changes to the copy are thoroughly checked for validity.

Security procedures such as these all add overhead to an information system, so only necessary controls should be included.

PHYSICAL DATABASE DESIGN FOR HOOSIER BURGER

A set of normalized relations and an associated E-R diagram for Hoosier Burger (Figure 9-16) were presented in the section Logical Database Design for Hoosier Burger earlier in this chapter. The display of a complete design of this database would require more documentation than space permits in this text, so we illustrate in this section only a few key decisions from the complete physical database.
As outlined in this chapter, to translate a logical database design into a physical database design, you need to make the following decisions:

• Create one or more fields for each attribute and determine a data type for each field.
• For each field, decide if it is calculated; needs to be coded or compressed; must have a default value or picture; or must have range, referential integrity, or null value controls.
• For each relation, decide if it should be denormalized to achieve desired processing efficiencies.
• Choose a file organization for each physical file.
• Select suitable controls for each file and the database.

Remember, the specifications for these decisions are made in physical database design, and then the specifications are coded in the implementation phase using the capabilities of the chosen database technology. These database technology capabilities determine what physical database design decisions you need to make. For example, for Oracle, which we assume is the implementation environment for this illustration, the only choice for file organization is indexed, so the file organization decision becomes which primary and secondary key attributes should be used to build indexes.

We illustrate these physical database design decisions only for the INVOICE table. The first decision most likely would be whether to denormalize this table. Based on the suggestions for possible denormalization presented in this chapter, the only possible denormalization of this table would be to combine it with the VENDOR table. Because each invoice must have a vendor, and the only additional data about vendors not in the INVOICE table is the Vendor_Name attribute, this is a good candidate for denormalization. Because Vendor_Name is not very volatile, repeating Vendor_Name in each invoice for the same vendor will not cause excessive update maintenance.
If Vendor_Name is often used with other invoice data when invoice data are displayed, then this would be a good candidate for denormalization. So the denormalized relation to be transformed into a physical table is

INVOICE(Vendor_ID,Invoice_Number,Invoice_Date,Paid?,Vendor_Name)

The next decision can be what indexes to create. The guidelines presented in this chapter suggest creating an index for the primary key, all foreign keys, and secondary keys used for sorting and qualifications in queries. So we create a primary key index on the combined fields Vendor_ID and Invoice_Number. INVOICE has no foreign keys. To determine what fields are used as secondary keys in query sorting and qualification clauses, we would need to know the content of queries. Also, it would be helpful to know query frequency because indexes do not provide much performance efficiency for infrequently run queries. For simplicity, suppose there were only two frequently run queries that reference the INVOICE table, as follows:

1. Display all the data about all unpaid invoices due this week.
2. Display all invoices ordered by vendor, show all unpaid invoices first, then all paid invoices, and order the invoices of each category in reverse sequence by invoice date.

In the first query, both the Paid? and Invoice_Date fields are used for qualification. Paid?, however, may not be a good candidate for an index because there are only two values for this field. The systems analyst would need to discover what percentage of invoices on file are unpaid. If this value is more than 10 percent, then an index on Paid? would not likely be helpful. Invoice_Date is a more discriminating field, so an index on this field would be helpful. In the second query, Vendor_ID, Paid?, and Invoice_Date are used for sorting. Vendor_ID and Invoice_Date are discriminating fields (most values occur in less than 10 percent of the rows), so indexes on these fields will be helpful.
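The 10 percent rule of thumb used above can be checked against actual data before an index is created. The sketch below is only an illustration: the function names are hypothetical, the 'Y' and 'N' flag values are an assumed encoding of Paid?, and the two samples of 100 invoices are invented to show both outcomes.

```python
from collections import Counter


def value_shares(values):
    """Return the fraction of rows that holds each distinct value of a field."""
    if not values:
        return {}
    total = len(values)
    return {value: count / total for value, count in Counter(values).items()}


def worth_indexing_for(values, wanted, threshold=0.10):
    """Apply the rule of thumb: an index is likely to help a query that selects
    `wanted` only if that value occurs in no more than `threshold` of the rows."""
    return value_shares(values).get(wanted, 0.0) <= threshold


# Invented samples of the Paid? field for 100 invoices each.
mostly_paid = ["N"] * 5 + ["Y"] * 95     # 5 percent unpaid
many_unpaid = ["N"] * 30 + ["Y"] * 70    # 30 percent unpaid

print(value_shares(mostly_paid))
print(worth_indexing_for(mostly_paid, "N"))
print(worth_indexing_for(many_unpaid, "N"))
```

With 5 percent of invoices unpaid, an index on Paid? passes the test for the unpaid-invoices query; with 30 percent unpaid, it does not.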
Assuming less than 10 percent of the invoices on file are unpaid, then it would make sense to create the following indexes to make these two queries run as efficiently as possible:

1. Primary key index: Vendor_ID and Invoice_Number
2. Secondary key indexes: Vendor_ID, Invoice_Date, and Paid?

We do not illustrate security and other types of controls because these decisions are very dependent on unique capabilities of the technology and a complex analysis of what data which users have the right to read, modify, add, or delete.

ELECTRONIC COMMERCE APPLICATION: DESIGNING DATABASES

Like many other analysis and design activities, designing the database for an Internet-based electronic commerce application is no different than the process followed when designing the database for other types of applications. In the last chapter, you read how Jim Woo and the PVF development team designed the human interface for the WebStore. In this section, we examine the processes Jim followed when transforming the conceptual data model for the WebStore into a set of normalized relations.

Designing Databases for Pine Valley Furniture’s WebStore

The first step Jim took when designing the database for the WebStore was to review the conceptual data model (the E-R diagram) developed during the analysis phase of the SDLC (see Figure 8-22 for a review). Given that there were no associative entities (many-to-many relationships) in the diagram, he began by identifying four distinct entity types, which he named CUSTOMER, ORDER, INVENTORY, and SHOPPING_CART. Once reacquainted with the conceptual data model, he examined the lists of attributes for each entity. He noted that three types of customers were identified during conceptual data modeling, namely, corporate customers, home office customers, and student customers.
Yet all were referred to simply as a “customer.” Nonetheless, because each type of customer had some unique information (attributes) that other types of customers did not, Jim created three additional entity types, or subtypes, of customers:

CORPORATE
HOME_OFFICE
STUDENT

Table 9-4 lists the common and unique information about each customer type. As Table 9-4 implies, four separate relations are needed to keep track of customer information without having anomalies. The CUSTOMER relation is used to capture common attributes, whereas the additional relations are used to capture information unique to each distinct customer type. To identify the type of customer within the CUSTOMER relation easily, a Customer_Type attribute is added to the CUSTOMER relation. Thus, the CUSTOMER relation consists of

CUSTOMER(Customer_ID,Address,Phone,E-mail,Customer_Type)

To link the CUSTOMER relation to each of the separate customer types (CORPORATE, HOME_OFFICE, and STUDENT), all share the same primary key, Customer_ID, in addition to the attributes unique to each. This results in the following relations:

CORPORATE(Customer_ID,Corporate_Name,Shipping_Method,Buyer_Name,Fax)
HOME_OFFICE(Customer_ID,Customer_Name,Corporate_Name,Fax)
STUDENT(Customer_ID,Customer_Name,School)

TABLE 9-4 Common and Unique Information about Each Customer Type

Common Information About ALL Customer Types
Corporate Customer    Home Office Customer    Student Customer
Customer ID           Customer ID             Customer ID
Address               Address                 Address
Phone Number          Phone Number            Phone Number
E-Mail Address        E-Mail Address          E-Mail Address

Unique Information About EACH Customer Type
Corporate Customer    Home Office Customer    Student Customer
Corporate Name        Customer Name           Customer Name
Shipping Method       Corporate Name          School
Buyer Name            Fax Number
Fax Number

In addition to identifying all the attributes for customers, Jim also identified the attributes for the other entity types. The results of this investigation are summarized in Table 9-5.
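The supertype/subtype customer relations described above can be sketched with Python's built-in `sqlite3` module. This is an illustration under stated assumptions, not the textbook's implementation: the column types, the CHECK constraint on Customer_Type, the column name `Email` (standing in for E-mail, which is not a valid unquoted identifier), and the sample rows are all hypothetical. The text assumes Oracle; SQLite is swapped in here only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Column types and the CHECK constraint are assumptions; the attribute
# lists themselves come from the relations given in the text.
conn.executescript("""
CREATE TABLE CUSTOMER (
    Customer_ID   INTEGER PRIMARY KEY,
    Address       TEXT,
    Phone         TEXT,
    Email         TEXT,
    Customer_Type TEXT NOT NULL
        CHECK (Customer_Type IN ('CORPORATE', 'HOME_OFFICE', 'STUDENT'))
);
CREATE TABLE CORPORATE (
    Customer_ID     INTEGER PRIMARY KEY REFERENCES CUSTOMER(Customer_ID),
    Corporate_Name  TEXT,
    Shipping_Method TEXT,
    Buyer_Name      TEXT,
    Fax             TEXT
);
CREATE TABLE HOME_OFFICE (
    Customer_ID    INTEGER PRIMARY KEY REFERENCES CUSTOMER(Customer_ID),
    Customer_Name  TEXT,
    Corporate_Name TEXT,
    Fax            TEXT
);
CREATE TABLE STUDENT (
    Customer_ID   INTEGER PRIMARY KEY REFERENCES CUSTOMER(Customer_ID),
    Customer_Name TEXT,
    School        TEXT
);
""")

# Hypothetical sample data: one student customer.
conn.execute(
    "INSERT INTO CUSTOMER VALUES (1, '12 Elm St', '555-0100', 'pat@example.com', 'STUDENT')"
)
conn.execute("INSERT INTO STUDENT VALUES (1, 'Pat Lee', 'State University')")

# The shared primary key joins the common and the subtype-specific attributes.
row = conn.execute("""
    SELECT c.Customer_ID, c.Customer_Type, s.Customer_Name, s.School
    FROM CUSTOMER AS c
    JOIN STUDENT AS s ON s.Customer_ID = c.Customer_ID
""").fetchone()
print(row)
```

Because each subtype table uses Customer_ID as both its primary key and a foreign key to CUSTOMER, a subtype row cannot exist without its matching CUSTOMER row.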
As described in Chapter 8, much of the order-related information is captured and tracked within PVF’s Purchasing Fulfillment System. This means that the ORDER relation does not need to track all the details of the order because the Purchasing Fulfillment System produces a detailed invoice that contains all order details such as the list of ordered products, materials used, colors, quantities, and other such information. To access this invoice information, a foreign key, Invoice_ID, is included in the ORDER relation. To identify easily which orders belong to a specific customer, the Customer_ID attribute is also included in ORDER. Two additional attributes, Return_Code and Order_Status, are also included in ORDER. The Return_Code is used to track the return of an order (or of a product within an order) more easily, whereas Order_Status is a code used to represent the state of an order as it moves through the purchasing fulfillment process. This results in the following ORDER relation:

ORDER(Order_ID,Invoice_ID,Customer_ID,Return_Code,Order_Status)

In the INVENTORY entity, two attributes, Materials and Colors, could take on multiple values but were represented as single attributes. For example, Materials represents the range of materials that a particular inventory item could be constructed from. Likewise, Colors is used to represent the range of possible product colors. PVF has a long-established set of codes for representing materials and colors; each of these complex attributes is represented as a single attribute. For example, the value “A” in the Colors field represents walnut, dark oak, light oak, and natural pine, whereas the value “B” represents cherry and walnut. Using this coding scheme, PVF can use a single character code to represent numerous combinations of colors.
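The coding scheme for Colors can be pictured as a simple lookup. The sketch below is only an illustration: the two mappings for “A” and “B” come from the text, while the names `COLOR_CODES` and `colors_for` and the error handling are assumptions.

```python
# Only the "A" and "B" codes are given in the text; the real PVF code
# table is not shown, so no other codes are listed here.
COLOR_CODES = {
    "A": ("walnut", "dark oak", "light oak", "natural pine"),
    "B": ("cherry", "walnut"),
}


def colors_for(code):
    """Expand a single-character Colors code into the colors it stands for."""
    try:
        return COLOR_CODES[code]
    except KeyError:
        raise ValueError(f"unknown Colors code: {code!r}") from None


print(colors_for("A"))
print(colors_for("B"))
```

Storing the one-character code keeps Colors a single-valued attribute in the relation, with the list of actual colors resolved by lookup when needed.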
This results in the following INVENTORY relation:

INVENTORY(Inventory_ID,Name,Description,Size,Weight,Materials,Colors,Price,Lead_Time)

TABLE 9-5 Attributes for Order, Inventory, and Shopping Cart Entities

Order                        Inventory                     Shopping_Cart
Order_ID (primary key)       Inventory_ID (primary key)    Cart_ID (primary key)
Invoice_ID (foreign key)     Name                          Customer_ID (foreign key)
Customer_ID (foreign key)    Description                   Inventory_ID (foreign key)
Return_Code                  Size                          Material
Order_Status                 Weight                        Color
                             Materials                     Quantity
                             Colors
                             Price
                             Lead_Time

Finally, in addition to Cart_ID, each shopping cart contains the Customer_ID and Inventory_ID attributes so that each item in a cart can be linked to a particular inventory item and to a specific customer. In other wor...

Explanation & Answer

The answer document is attached. Please let me know if you have any questions! Thank you 😀

EMPLOYEE(Employee_ID,Name_Last,Name_First,Address_Home,Address_Mailing,Phone_Primary,Phone_Secondary,Email,Dob,Gender,Employee_Availability)
JOB(Job_ID,Job_Title,Employee_Qualification,Cert_ID,Work_Area)
LOCATION(Location_ID,Address)
SHIFT(Shift_ID,Employ...

