BBA3551 Columbia Southern Data Marts and Data Warehouses Comparison Paper

User Generated

XYO171810

Business Finance

BBA3551

Columbia Southern University

Description

Write a one- to two-page (250-500 word) paper that discusses the differences between data warehouses and data marts. Also, discuss how organizations can use data warehouses and data marts to acquire data. You must use the CSU Online Library to locate at least two sources for your paper.

APA rules for formatting, quoting, paraphrasing, citing, and listing of sources are to be followed.

I've attatched the two articles that are required and will cite then upon completion, please cite any additional sources used.

Unformatted Attachment Preview

114 January 2006/Vol. 49, No. 1 COMMUNICATIONS OF THE ACM Seven Key Interventions for DATA WAREHOUSE SUCCESS The success of data warehouses depends on the interaction of technology and social context. We present new insights into the implementation process and interventions that can lead to success. NO DATA WAREHOUSE IMPLEMENTATION CAN succeed on its own. The trick is knowing when and how to intervene. Data warehouses have tremendous potential to present information. They provide the foundation for effective business intelligence solutions for companies seeking competitive advantage. While there have been notable successes [2], there have also been significant failures [4, 7]. What accounts for such conflicting results? In the 1990s, adaptive structuration theory (AST) was developed to understand conflicting results with group decision support systems [5]. This theory analyzes the technological and contextual aspects of the application of a technology, focusing on their interactions. Using AST, we examine the interaction of context and technology, and pinpoint seven key interventions specific to that interaction for data warehouse success. We studied a large organization, in which the data warehouse implementation was successful in some units and unsuccessful in others. From interviews with users in multiple business units, both successful and unsuccessful, we derive several insights that are at odds with previous research. First, users can champion a technology just as successfully as management. Second, a wide range of data can be accessed more successfully than a narrow range (as provided in a data mart). Third, the scope and flexibility of tools offered to users should not always follow the dictum “simpler is better.” While restrictive tools make obtaining information easier for many users, some users will deem the warehouse successful only if they have the more intricate unrestrictive tools required for ad hoc queries and reports. Details of these insights will be provided later. In the organization studied the same data ware- By Tim Chenow e t h, K a r e n Cor r a l , and Hal uk D e mi r ka n COMMUNICATIONS OF THE ACM January 2006/Vol. 49, No. 1 115 house was found to be zational culture can be START both a success and a failure described as very conserby different units within vative and resistant to NO convince intervention the company. (For the change. This conser1 2 YES convince intervention Does top Do users Do users purposes of this study, sucvatism extends to the management support the data support the data YES NO support the data warehouse? warehouse? warehouse? cess was determined only company’s philosophy by whether the warehouse concerning technology. YES was used.) This provides While the company NO an opportunity for underbelieves that technology GO FOR STOP DEVELOPMENT standing data warehouses is important to its corthat is free of some of the porate success, its strong confounding aspects compreference is for mature, Figure 1. Intervention points proven technologies. The company also has a strong monly encountered in during the project’s field studies. Because we initiation phase. belief in the uniqueness of its business model, resultstudied the same technoling in a strong bias for the in-house development of ogy in the same company, as opposed to results from systems. tools or solutions are purchased, they Figure 1. Intervention points during theIfproject initiation phase. different systems and different companies, the must be flexible enough to adapt to the way the observed differences are not due to differences in company conducts business and not dictate business technology or corporate culture. We analyze these dif- processes. ferences using the theoretical framework supplied by In 1995, the company began an in-house developAST: context (social systems); technological innova- ment START effort to design and implement an enterprisetion (in this case, the data warehouse); and the inter- wide data warehouse. They chose to model the NO results indicate action of context and technology. Our warehouse using Inmon’s paradigm [3], which convince intervention 2 YES the interaction of the context with the technology is 1espouses a centralized database, referred to as the convince intervention Do top warehouse,” that is used to integrate and store the key to understanding data warehouse success. “data Do users Do users management the data the data YES inter- support Based on the nature of thesesupport interactions, seven datathe data extractedNOfrom support multiple warehouse? operational systems. warehouse? warehouse? ventions are identified. This data warehouse, in turn, feeds area-specific data marts. Over the years the data warehouse has grown YES in size and complexity as more operational systems COMPANY OVERVIEW NO The company studied is a large, global financial have been integrated into its environment. From the GO effort was very much an institution based in the U.S. TheFORcompany is highly beginning, this development STOP DEVELOPMENT successful and enjoys an excellent reputation both IT-driven initiative. The business justification for the within its industry and among its clients. Its organi- warehouse was never fully developed, and the poten- Chenoweth fig 1 (1/06) If there is no management champion for the warehouse, strongly supportive users of the technology can convince management of the technology’s value. Both management and end users must be convinced of the technology’s value if the project is to go forward. 116 January 2006/Vol. 49, No. 1 COMMUNICATIONS OF THE ACM tial business uses of the warehouse were not carefully proceeded in a top-down fashion, there was a notable enumerated. The prevailing assumption of the devel- success in which the push was bottom-up (intervenFigure 2. Intervention points during the design phase. opment team was that if the warehouse was built and tion point 2 in Figure 1). The users championed the filled with data, then business units would find a use warehouse because they recognized the task fit and value to the organization. They convinced their unit for it. During the fall of 2001, the company sponsored a leader, who initially had been unsupportive of the study to evaluate its enterprise data warehouse, data warehouse, of the warehouse’s value and thereby including how the warehouse was being used. Repre- converted the leader into a supporter. This result is sentatives from the business units affected by the surprising in that it is counter to most prescriptions 3 warehouse were interviewed to determine the nature for technology adoption. It shows that not having a o champion is not necesof those impacts and the want to a sarily a death sentence for degree to which each unit YES ange of YES a technology. If there is had been able to use the a? 3 no management chamwarehouse. We use the Do users want pion for the warehouse, AST framework to anaGO FOR access to a YES DEVELOPMENT broad range of strongly supportive users lyze the results of those data? of the technology can interviews. NO convince management of BUILD A POINTS the technology’s value. INTERVENTION SINGLE Both management and ASTREPOSITORY recognizes that both BUILD BUILD A end users must be conthe features of a technolDATA SINGLE MARTS REPOSITORY vinced of the technoloogy and the context in gy’s value if the project is which it is implemented to go forward. affect the use of that tech4 4 Do Do AST states that the nology. “No matter what users want users want limited data access NO YES ted data access features are YESdesigned into more structured the use and analysis and analysis tools? of a technology is, the a system, users mediate tools? easier it will be to develop technological effects, a consensus among the adapting systems to their PROVIDE PROVIDE UNRESTRICTIVE RESTRICTIVE users about how the needs, resisting them, or TOOLS TOOLS innovation should be refusingPROVIDE to use them at RESTRICTIVE used and the appropriateall” [5]. Because there is a TOOLS Do users ness of that use. Ambiguparticularly high degree understand the task fit? ity about use of the of interaction between technology erodes the the technological dimenDo users users’ comfort with the sions and the contextual understand Figure 2. Intervention points technology, which in turn erodes their respect for it. the task fit? features of a data wareduring the design phase. The third intervention point (see Figure 2) occurs house, AST is an ideal with the design of the warehouse. Data warehouses lens for this study. Important contextual aspects are the “rules and are generally designed around one of two plans. Some resources actors use to generate and sustain this sys- data warehouses are designed with a set of data marts tem” [5]. One of the frequently cited necessary con- that partition the data warehouse into smaller, focused textual conditions for the successful implementation databases tailored to the information needs of a subset Chenoweth fig 2 (1/06) of almost any technology is a champion [1]. It is no of users. Other data warehouses provide a single different for data warehouses. The attitude of a unit’s repository that gives the users a very wide range of leader affects all the factors leading to warehouse data. In the company studied, business units that acceptance. At the company studied, when a leader was strongly supportive (intervention point 1 in Fig- accessed their data through data marts were usually ure 1), users were willing to pursue continuing knowl- more successful than units that accessed it through a edge of the technology even after introductory single repository. That is, most units either requested training. If users do not want to use the data ware- a data mart, or simply did not use the single reposihouse, then it is necessary to provide additional train- tory. Data marts help reduce the inherent complexity and ambiguity associated with data warehouses by ing or motivation to change their attitudes. While acceptance of the data warehouse generally providing a slice of the data that is tailored to meet the COMMUNICATIONS OF THE ACM January 2006/Vol. 49, No. 1 117 PROVIDE UNRESTRICTIVE TOOLS NO PROVIDE RESTRICTIVE TOOLS 5 Do users understand the task fit? PROVIDE TRAINING ON BUSINESS APPLICABILITY YES NO Do users perceive IT as supportive? 6 CREATE COOPERATIVE ENVIRONMENT BETWEEN USERS AND IT YES NO NO Does the unit have one or more power user? NO 7 CREATE POWER USER ROLE YES IMPLEMENTATION AND MAINTENANCE STOP Figure 3. Intervention points during the training and support phase. specific requirements of a business unit. However, one of the most successful uses of this data warehouse was through a single repository. That unit had a strong need for a very broad range of data as well as contextual characteristics that distinguished it from the units that used data weth fig 3 (1/06) marts. In particular, people in that unit were proactive in trying technology. As a result, they were not intimidated by technology and were generally technically knowledgeable. The fourth intervention point (see Figure 2) is in selecting the tools that will be available to the users. Tools range from the highly restrictive, which limit the users’ choices and thereby reduce ambiguity and complexity, to the highly unrestrictive, which require the user to have more expertise. Most units that were successful with data marts preferred restrictive tools 118 January 2006/Vol. 49, No. 1 COMMUNICATIONS OF THE ACM for accessing the data, but the unit that was successful with the single repository rejected restrictive tools. Those users wanted the greater flexibility offered by the unrestrictive tools, and were willing to expend additional effort for more capability. The relevance of the task to the organization, the degree to which a technology supports a task, and the degree to which users understand the task fit can influence the acceptance of the technology. This is the fifth intervention point (see Figure 3). In this study, those business units that successfully used the warehouse clearly understood the relationship between the warehouse and the business issues relevant to the unit. In other words, they saw the applicability of the data warehouse to their tasks. This was especially true of units that received data marts, which not only reduce the ambiguity of the technology, but if properly focused, make the applicability to the unit’s task more obvious. Giving a unit a single repository (instead of the more narrowly focused data mart) can overload the users with information. Beyond the design of the technology, users can be supported by training on the business applicability of the warehouse. This helps them see how the warehouse can assist them in performing their jobs. A lack of knowledge of a technology can lead to difficulties using the technology, or even abandonment of the technology; this leads to the sixth intervention point (see Figure 3). In this study, the business units that successfully used the warehouse typically had two mechanisms to acquire knowledge. First, they had a good working relationship with the warehouse development team. Because of this relationship, members were comfortable going to the warehouse development team for help with problems concerning their data warehouse applications. Conversely, those business units that were experiencing difficulty utilizing the warehouse did not draw on this resource, and generally characterized the development team as unresponsive and “difficult to deal with.” The perceived availability of the data warehouse development team as a support group is an important intervention point in the success of the warehouse. If users lack the perception of support, an intervention will be needed to create a spirit of cooperation. Without that perceived support, users will lack an understanding of the purpose of the warehouse, as well as having difficulty learning how to use it. The second mechanism successful business units had for acquiring knowledge was experts within the unit. These super users understood the data warehouse technology itself and the business issues facing the unit, at least to the degree necessary to extract relevant data from the warehouse. In other words, they The relevance of the task to the organization, the degree to which a technology supports a task, and the degree to which users understand the task fit can influence the acceptance of the technology. understood both the business needs of the unit and the potential of the data warehouse to meet those needs. The super users were also highly skilled at using the tools. These users were recognized as authorities within the unit concerning matters related to the warehouse and played a pivotal role in disseminating knowledge. Providing the resources necessary to create the role of super users is another intervention point for management. We have expressed the seven intervention points as sequential; however, each of these points can be addressed throughout the project. CONCLUSION Organizations are spending millions each year on data warehouse development, but the majority of the efforts fail [6], and little is understood about why these failures occur or how to prevent them. Most previous recommendations for data warehouse applications have suggested one-dimensional approaches. By considering the interaction of the contextual and technical dimensions (the human factors and the specifics of the technology design), the previous results become less contradictory. For example, conventional wisdom holds that having a management champion with a tightly focused (data mart) design and restrictive tools will lead to success. In this case study, we observed that the reverse situation can be just as successful. If the users see the potential of the data warehouse to deliver value to the organization, they can be the champions and convince management to adopt the technology. Similarly, because of its simplicity, the data mart approach is frequently recommended as the preferred approach. Providing what the users want and need is more important. If users understand both the technology and the organization’s business processes, a single data repository may actually be more satisfying for them. In the same way, too little flexibility in tools can be just as harmful as too much. The level of tool flexibility that users require for success varies based on their technical knowledge and their business needs. This article not only explains those paradoxes, but identifies the key points at which interventions may have to occur in order to achieve the level of leadership, focus, and flexibility required for data warehouse success. c References 1. Beath, C.M. Supporting the information technology champion. MIS Q. 15, 3 (Sept. 1991), 354–370. 2. Cooper, B.L., Watson, H.J., Wixom, B.H., and Goodhue, D.L. Data warehousing supports corporate strategy at First American Corporation. MIS Q. 24, 4 (Dec. 2000), 547–567. 3. Inmon, W.H., Imhoff, C., and Sousa, R. Corporate Information Factory. John Wiley, New York, 1997. 4. Kelly, S. Data Warehousing in Action. John Wiley, New York, 1997. 5. Poole, M.S., and DeSanctis, G. Understanding the use of group decision support systems: The theory of adaptive structuration. Reprinted in Organizations and Communications Technology, J. Fulk and C. Steinfield, Eds., Sage Publications, Newbury Park, CA, 1990. 6. Vatanasombut, B., and Gray, P. Factors for success in data warehousing: What the literature tells us. J. Data Warehousing 4, 3 (Fall 1999), 25–33. 7. Watson, H.J., Gerard, J.G., Gonzalez, L.E., Haywood, M.E., and Fenton, D. Data warehousing failures: Case studies and findings. J. Data Warehousing 4, 1 (Spring 1999), 44–55. Tim Chenoweth (timchenoweth@boisestate.edu) is an assistant professor of networks, operations and information systems at Boise State University. Karen Corral (Karen.Corral@asu.edu) is an assistant professor of information systems in the W.P. Carey School of Business at Arizona State University. Haluk Demirkan (Haluk.Demirkan@asu.edu) is an assistant professor of information systems in the W.P. Carey School of Business at Arizona State University. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. © 2006 ACM 0001-0782/06/0100 $5.00 COMMUNICATIONS OF THE ACM January 2006/Vol. 49, No. 1 119 INTERNATIONAL REVIEW OF L AW C OMPUTERS & TECHNOLOGY , VOLUME 11, N UMBER 2, P AGES 251–261, 1997 The Data Mart: A New Approach to Data Warehousing PAM ELA PIPE Introduction Vendors have recently begun to deliver low-cost and integrated data warehouse packages intended for the rapid development of departmental data warehouses, or so-called data marts. The availability of these packages requires organizations to consider the role of a data mart in a data warehousing system, and whether a data mart should be built before, after, or in parallel with a corporate enterprise data warehouse. In some situations a set of distributed data marts may even eliminate the need for an enterprise-le vel data warehouse solution. This paper discusses the role of data marts, reviews the pros and cons of the different approaches to building a data warehousing system involving data marts, and also looks at data mart product requireme nts. Throughout the paper, the SmartM art package from Information Builders Inc. is used to describe the characteristics of an integrated data mart package. Types of Data W arehouse · Data warehouses come in all shapes and sizes, but essentially they can be considered to fall into one of the following categories (see Figure 1): · An enterprise data warehouse (EDW) contains integrated subject-oriented data captured from one or more operational systems or external information providers, and loaded into a separate data warehouse database. An EDW contains summarized as well as detailed data recording business operations over a period of time that may range from a few months to many years. This historical and summarize d data allows detailed analysis of business activity for both tactical and strategic business decision-making. An operational data store (ODS) contains current (or near-current) detailed data for regular day-to-day business querying and reporting. As with an EDW, data in an ODS is captured from operational systems and integrated into a separate subject-oriented data warehouse. Unlike an EDW, an ODS does not contain summarized or historical data. Organizations that deploy an ODS may evolve the ODS to an EDW as they gain experience with data warehousing. This article is based upon a paper prepared for ‘Information Builders’, by Colin W hite, DataBase Associates International, California, USA. Correspondence: Ms Pamela Pipe, Information Builders, Wembley Point, Harrow Road, Wembley, Middx HA9 6DE, UK. , http://www.ib i.comm . 1360-0869/97/020251-1 1 Ó 1997 Carfax Publishing Ltd 252 · FIGURE Pamela Pipe 1. Types of data warehouse. A data mart contains a subset of corporate data that is of value to a speciŽ c department or set of users. This data subset may be captured from one or more operational systems, or from an EDW. Data marts usually contain summ arized historical data for a speciŽ c functional area of a corporation, and have the beneŽ t of being faster and cheaper to build than an EDW. Unlike an EDW , however, they do not provide the capability to analyze corporate data across functional areas of the business. It is important to realize that a data mart is deŽ ned by the functional scope of its users, and not by the size of the databases involved. Most data marts today involve less than 100 GB of data; some are larger, however, and it is expected that as data mart usage increases they will rapidly increase in size. Data marts can be developed prior to, in parallel with, or after the deployment of an EDW for a particular subject area. There is considerabl e debate about the role of data marts with respect to an EDW and how data marts should be deployed— this topic is discussed in the next section. Approaches to Data Warehouse Development · Early proponents of data warehousing recommended building an EDW using a top-down approach driven primarily by the IT department. This approach involves developing a business case, assembling a project team, identifying source data of interest in operational systems, developing a data model, and then building an enterprise data warehouse. With this approach, data marts are seen as a follow-on to the construction of an EDW in a multi-tier topology (see Figure 2). The top-down technique follows the traditional waterfall approach to IT development and has two key beneŽ ts: It provides a rigorous and familiar methodology for modelling and implementing end-user DSS requirements. The Data Mart · FIGURE 2. 253 Multi-tier data warehousing system. It creates a data warehousing system that gives end users the ability to have a corporate-wide perspective of business operations, issues and opportunities for business development. A corporate-wide view of data based on a subject-oriented data model minimizes integration problems between data warehouse projects, regardless of whether they are developed serially or in parallel. · A multi-tier warehousing system involving an EDW and underlying data marts is probably the optimal one for most organizations. With this topology data marts are fed from an EDW, rather than operational systems, and data is located where it can deliver the highest availability and performance, without sacriŽ cing integrity or control over the management of corporate data for business decision-making: · · · · Data marts improve availability by reducing dependency on network access to a remote EDW. Performance is improved by storing the data closer to the user in the data mart. Costs are reduced by the use of low-cost data mart software and hardware. Capacity planning is easier since data marts reduce EDW workload and also in some cases database size. Data control and integrity is maintained, since the EDW is the single source of data for feeding the data marts. · · This vision of a rigorous top-down approach to data warehouse construction is not shared by everyone. Many argue that a top-down approach, where data marts are developed after an EDW is built, has several drawbacks: IT-driven top-down projects in the past have led to long delivery schedules, high capitalization, cost overruns, and poor end-user functionality, even though adequate cost justiŽ cation was done prior to the project. Cost justiŽ cation for a data warehouse may be somewhat elusive, since it is difŽ cult to quantify return on investment (ROI) on a project whose major beneŽ t is the potential for better business decision making. Executive management may Ž nd it difŽ cult to approve such projects given the track record of IT departments. · 254 Pamela Pipe The current economic climate and the global nature of competition demand solutions that enable organizations not only to respond rapidly to changing business conditions, but also to be able to predict and quickly exploit new business opportunities. Data warehouse approaches that have long delivery cycles cannot deliver solutions quickly enough to address these challenges. Top-down advocates argue that an iterative approach to EDW development, where the EDW is built subject area by subject area, reduces development time. Even the most optimistic supporters of this approach, however, acknowledge that EDW developme nt time may be closer to a year (or possibly longer). · An alternative to the top-down development of an EDW and underlying data marts is Ž rst to build the data marts, and then an EDW. Some advocates of this bottom-up approach even argue that an EDW is not necessary, since a corporate view of data can be achieved by combining data marts together in a distributed data warehouse using database hubserver middleware . The middleware provides a seamless interface between end user DSS tools and the data marts by insulating the user from the physical locations of the data marts. A hub server can also provide other services, for example, enhanced security, business views of data, and facilities for monitoring and controlling end-user access to data. People who favour top-down warehouse development can rightly point out several disadvantages of the bottom-up approach: · · An uncontrolle d proliferation of data marts can result in integration problems between data marts and a future EDW. Most of these potential issues would be due to differences in business terminology, data formats and representations, and required attributes not being present in the data mart data models. These problems are the primary, if not the only, objection that top-down advocates have to the implementation of data marts prior to the construction of the EDW. These problems are caused by the lack of a corporate data model for the warehouse. As data marts proliferate, users will want to access data marts belonging to other departments for cross-business function analysis. Seamless access to data marts may be difŽ cult without appropriate hub-server middleware . Needless to say, such an environment is complex to administer and manage, and can also lead to poor performance when end-user queries access and join data from multiple data marts. Data mart technology at present is often unable to scale-up to support the increased data volumes and numbers of users that would exist in a distributed data mart environment. The current state of technology often limits data marts to less than 100GB of data and a maximum of 25 to 50 users. Of course these limits will vary by vendor product. Another important consideration here is the ability of a data mart product to be integrated with an EDW in a multi-tier data warehouse topology at some future date if so desired. It is apparent that both the top-down and bottom-up approaches to data warehouse development have their strengths and weaknesses. What most organizations require is a data warehouse strategy that provides  exibility, low capitalization, and a rapid ROI, without unduly sacriŽ cing control or introducing potential integration problems in the future. An ideal solution, therefore, would be a synergistic marriage of the two approaches that maximize the strengths, and minimize the weaknesses, of each approach. The parallel strategy suggested below uses a combination of the top-down and bottom-up approaches The Data Mart 255 and supports incremental and evolutionary data warehouse development, while at the same time attempting to solve the issues with other approaches that this paper raises: 1 Consider the use of an ODS for tactical day-to-day business decision-m aking when existing operational systems cannot supply integrated and consistent data, or when direct access by end users to operational data would adversely affect the performance of the operational environment. 2 Develop data marts as required for complex data analysis and strategic business decision-m aking by business functional area. This developme nt should be based on a high-level subject data model, which documents the boundaries and data relationships that exist between functional areas. This data model will evolve and will form the basis for a future EDW. In fact, an EDW can be developed in parallel with the data marts using this approach. A data warehouse project team and appropriate checkpoint mechanisms should be put in place to coordinate the efforts of data mart and EDW development. The critical consideration here is the development of the high-level data model—the initial effort should not exceed more than a few months of elapsed time. The developme nt of such a model will reduce, but not necessarily eliminate, future integration problems between projects. 3 Connect data marts together in a distributed warehouse environment using appropriate database hub-server middleware . A distributed data warehouse should be viewed as a tactical solution en route to a multi-tier data warehouse, unless usage, cost and performance considerations do not warrant an EDW. 4 Move toward a multi-tier data warehouse topology where an EDW acts as the single source of end-user data for supplying underlying data marts. Business conditions and priorities will determine whether or not this evolutionary parallel approach is a viable solution for an organization. The features and capabilities of data mart products will also in uence the success of this approach. The criterion for selecting data mart products is the subject of the rest of this paper. Data Mart Product Requireme nts · · · · · · · · A data warehousing system provides a complete end-to-end solution for supplying end users with business information. The main components of such a system consist of (see Figure 3): Design tools to design warehouse databases. Source data acquisition tools to capture data from source Ž les and databases; and clean, enhance, transport and apply it to data warehouse databases. A data manager to manage and access warehouse data. GUI and Web-based data access tools to provide business end-users with the tools they need to access and analyze warehouse data. A delivery manager to distribute warehouse data and other information objects to other data warehouses, desktop applications, and Web servers on a corporate Intranet. Middleware to connect data access tools to warehouse databases, and the delivery manager to target systems. An information directory to provide administrators and business users with information about the contents and meaning of data stored in warehouse databases. Warehouse management tools to administer data warehouse operations. 256 FIGURE Pamela Pipe 3. Data warehousing system architecture. In addition to supplying products that support these components, vendors may also provide consulting services to train IT staff to install and deploy data warehousing products. When building ODS and EDW systems, organizations typically select and integrate, based on application requirements, best-of-breed products for each of the components shown in Figure 3. When building a data mart, however, the cost and time to integrate best-of-breed products from multiple vendors is not acceptable, and instead organizations tend to select a data mart package (or set of integrated products) from a single vendor, which supports the components shown in the Ž gure. In fact it is the move toward low cost and integrated data warehouse product sets that has encouraged the growth of data marts. Potential buyers should, however, consider long-term data warehouse requirements when selecting a data mart package. Of particular importance is the ability of the data mart package to support growth as database size, numbers of users, and query volume increase with data mart usage and experience. Also of importance here with respect to future growth is the ability of the product to support distributed and multi-tier data warehousing topologies. The next section of this paper looks at each of the components of a data warehousing system in more detail. Design tools Design tools are used by data warehouse designers and administrators to design and deŽ ne data warehouse databases. In general, any database design tool can be used for designing a warehouse database. Compared with operational database design, there are some additional factors that need to be taken into account when designing a warehouse database, such as the handling of summary and temporal data, for example. Another factor is the use of star schemas, which are employed by some warehouse databases, particularly those used for multidimensional data analysis. The Data Mart 257 Source data acquisition tools · · · · Source data acquisition tools are used to develop and run data aquisition applications that capture data from source systems for applying to warehouse databases. Data acquisition applications are developed based on rules deŽ ned by the warehouse developer. These rules deŽ ne the data sources from which the warehouse data will be obtained, and the cleanup and enhanceme nt to be done to this data before it is applied to warehouse databases. Data cleanup may involve the restructuring of records or Ž elds, removal of operationalonly data, the supply of missing Ž eld values, and the checking of data integrity and consistency. Data enhancemen t may involve decoding and translating Ž eld values, adding a time attribute (if one is not present in the source data) to re ect the currency of data, data summarization, or the calculation of derived values. Once the source data has been cleaned and enhanced it is mapped to the target warehouse databases, transported to the data warehousing system, and applied to the appropriate warehouse databases. The applying of data to the warehouse databases is done using data manipulation language statements (SQL, for example, in the case of a relational DBMS), or the load utility of the DBMS used to manage the warehouse. The rules deŽ ned to data acquisition tools are often stored as metadata in the warehouse information directory. Some products use this metadata to generate customized 3GL/4GL data acquisition programs. Other products use this metadata dynamically during data acquisition operations to manage the  ow of data from source systems to the target data warehouse. There are four main types of product that support data acquisition: Code generators create tailored 3GL/4GL data acquisition programs based on source data deŽ nitions, and cleanup and enhancemen t rules deŽ ned by the warehouse developer. This approach reduces the need for an organization to write its own data acquisition programs. Most code generator products generate query language statements to capture data from the source systems. Some products also support the capturing of changes to source data by using, for example, the recovery log Ž les of the source system. Code generators are used for building an EDW that acquires data from a large number of different data sources and where there is signiŽ cant data cleanup to be done. Database data replication tools capture changes to a source database on one system and apply the changes to a copy of the source database located on a different system. These replication products often do not support the capture of changes to non-relational Ž les and databases, and often do not provide facilities for signiŽ cant data cleanup and enhanceme nt. These tools are used to build an ODS, EDW or data mart when the number of data sources and the amount of data cleanup required are small. Rule-driven data movers capture data from the source system at user-deŽ ned intervals, clean and enhance the data, and then send and apply the results to the target warehouse database. Data to be captured from the source system is usually deŽ ned using query language statements, and cleanup and enhanceme nt are done based on a script or function logic deŽ ned to the data pump. Depending on the product, the data mover may reside on the source system, the target system, or on a separate server. These products are used to build data marts, rather than an ODS or EDW. Data re-engineering tools are designed to perform data cleanup and enhanceme nt. Some of these focus on structural changes to data, while others are designed to handle the cleanup of data content, such as name and address data, for example. These products are 258 Pamela Pipe · often used in conjunction with other data acquisition tools for building an ODS or an EDW. Generalized data acquisition tools and utilities copy data from a source system to a target system. There are many tools and utilities that do not Ž t into any of the four categories outlined above for moving data from a source system to a target system. These products tend to focus on the fast movement of data, rather than on supporting the data integration, cleanup, and enhanceme nt requirements of a data warehousing system. Data manager The data manager is used by other components in the data warehousing system to create, manage and access warehouse data (and possibly metadata). The data manager employed by a data warehousing system is usually either a relational DBMS (RDBMS) or a multidimensional DBMS (MDBMS). An RDBM S is used for building either an ODS, an EDW or a data mart, while an MDBMS is used for building a data mart for doing multidimensional analysis. The pros and cons of using an RDBMS or MDBMS for building a data mart are discussed in more detail below in the section on data access tools. Warehouse DBMSs have requireme nts over and above those for operational OLTP applications. Key factors to consider here are scalability (database size, query complexity, number of users, number of dimensions, software exploitation of underlying hardware), and performance (utility operations and complex query processing). As query complexity and database size increases, data warehouse designers will need to consider the use of parallel hardware and parallel database software in order to obtain satisfactory performance. Data access tools Data access tools support three main user tasks: querying and reporting of known facts, analysis of known facts, and the discovery of unknown facts. Querying and reporting involves displaying data stored in a data warehouse in a visual form. This visual form may, for example, be a printed report, or information displayed on a desktop computer. The processing may be done on-line or in batch, using ad hoc or pre-deŽ ned queries and reports. Tools supporting this type of task have existed for many years. Modern tools, however, often offer more sophisticated facilities like graphical user interfaces, support for Web browsers, the ability to pass retrieved data to desktop applications using techniques such as Microsoft OLE, and a semantic layer to provide a business view of the data. Query and reporting tools are most often used to provide information for tracking day-to-day business operations, i.e., for making tactical business decisions. The beneŽ t a data warehouse offers here is data that has been cleaned and integrated from multiple operational systems. Data analysis tools provide capabilities like mathematical and statistical functions, multidimensional modeling, and forecasting. These tools are used to analyze and forecast trends, and to measure the efŽ ciency of business operations over a period of time. The results of this work are used for strategic business decision-m aking, and to look for ways of improving the efŽ ciency of business operations and reducing costs. This type of processing is sometimes called On-Line Analytical Processing, or OLAP. Many of the tools providing this type of processing have existed for many years, but like query and reporting tools, OLAP products are now taking advantage of graphical user interfaces, Web technol- The Data Mart 259 ogy, and client/server computing. The beneŽ t of data warehousing for OLAP is not only clean and integrated data, but also the provision of the historical data that is essential for forecasting and trend analysis. OLAP is a hot topic, and there is much debate concerning the use of OLAP tools for doing multi-dimensional data analysis (MDA). These tools allow users to analyze and slice and dice data across multiple dimensions, for example, by time, by market, and/or by product category. Vendors offer two kinds of tool: MDA client tools that access data stored in multi-dimensional database systems (MDBM Ss), and MDA client tools that access data stored in relational DBM Ss. The debate centres around which type of DBMS is best suited to store and maintain multi-dimensional data. Much of what is written about the pros and cons of each approach is superŽ cial and lacks strong technical arguments, often confusing user needs with technical issues. When analyzing MDA products, the discussion should focus on two key, but separate, issues. The Ž rst is the requirements of the user, i.e., the functionality provided by the MDA client tool. The second issue is performance and scalability, and this is where the MDBMS vs. RDBMS issues have to be considered. There is not sufŽ cient space in this paper to go into the arguments being put forward for each of the two MDA approaches. SufŽ ce it to say that both approaches have som e merit, and the type of MDA tool should be selected based on the kind of processing being done by the end-user and the type of data required to support this processing. Data analysis tools typically work with summarized, rather than detailed data. These summaries can be built during analytical processing, but it is far more efŽ cient to pre-build the summaries whenever possible. This reduces processing overheads and makes the tools easier to use. Data marts are ideally suited for the storing of summarized data that has been tailored to suit speciŽ c sets of users and applications. Query, reporting and data analysis tools are used to process or to look for known facts, i.e. users of these tools know what kind of information they want to access and analyze. Starting to appear on the market is a new breed of business intelligence tool that is used to explore data for unknown facts, i.e. information that is not known to the business user. This style of processing allows business users to seek new business opportunities, and to look for previously unknown data patterns—to examine customer buying habits or to detect fraud, for example. Data exploration involves digging through large amounts of historical detailed data that is typically kept in an EDW. Tools that support data exploration are also sometimes called data mining or data discovery tools. One important DSS tool direction by vendors is to add support for their use from a Web browser. The use of a Web browser to access a data mart on an internal corporate Intranet or the public Internet provides business users with a cost-effective , easy-to-use, and platform-independent data access capability. Delivery manager The warehouse delivery manager is used to distribute data collections (a data collection is a set of data of interest to a speciŽ c user or group of users) to other data warehouses and end-user applications such as spreadsheets, local databases, and so forth. The content of the data collection is usually deŽ ned by the warehouse administrator and published to end users in the information directory. Users subscribe to a collection and deŽ ne a delivery schedule using the information assistant facility of the information directory. Delivery of data may be based on time-of-day or on the completion of an external event. 260 Pamela Pipe As experience with data warehouses increases, organizations are likely to employ data delivery approaches to supplement the facilities provided by data access tools. Of key importance here will be the ability to deliver warehouse data collections and information objects, such as reports, to business users over corporate intranets. Warehouse middleware Warehouse middleware provides connectivity to warehouse databases from end-user data access tools. Standard database hub server middleware can be used to perform this task, but vendors are beginning to ship specialised middleware designed for a data warehousing environment. This specialized middleware provides features such as the ability to create a single business view of data stored in multiple data warehouses (multiple, data marts, for exam ple) and facilities to monitor, track and control access to warehouse data. Information directory · · · An information directory is the roadmap to a data warehousing system—it helps users navigate their way around the information stored in a warehouse and understand the meaning of this information from a business perspective. It achieves this by providing a set of tools for integrating, maintaining and viewing warehouse metadata. In the same way as a warehouse database integrates data from multiple data sources, an information directory integrates metadata from multiple metadata sources. The main elements of the information directory are the metadata manager, technical and business metadata, and the information assistant (sometimes called an information navigator). The metadata manager is used to maintain, export and import warehouse metadata. Technical metadata contains information about warehouse data for use by warehouse designers and administrators when carrying out warehouse development and management tasks. Technical metadata documents information about data sources, data targets, data cleanup rules, data enhancemen t rules, and data mapping between data sources and the warehouse databases. Most of the information is created when the warehouse designer deŽ nes the data sources and targets, and the rules to be used when copying data into the warehouse. It may also be imported from an external system, such as a 3GL copybook library, DBMS system catalogue, or a CASE tool. Technical metadata about the amount of data in the warehouse and the date it was created or updated should also be stored in the directory. Ideally this information should be added by the tools employed to capture data from operational systems and apply it to the warehouse databases. Technical metadata about how end users access and use warehouse data should also be trapped, and added to the directory to enable designers and administrators to tune and enhance the data warehouse. Business metadata contains information that gives end users an easy-to-understa nd business perspective of the data in the warehouse. This information includes the mapping of business subject areas to technical metadata, details about predeŽ ned queries and reports, business terms and associated technical nam es, and details about the custodian of the data. The business metadata is usually created by the warehouse administrator— it may also be imported from external systems, such as CASE tools or decision support tools. · The Data Mart 261 The information assistant provides warehouse end users with easy access to the business and technical metadata. It helps users discover what data exists in the warehouse and understand its business meaning. Some products also help users create, document and run queries, reports, or analyses, and order data that cannot be found in the warehouse. · In reality no vendor at present provides a complete information directory as deŽ ned above. Vendors provide instead two main types of product for maintaining warehouse metadata: · Technical metadata repositories for managing metadata associated with the technical tasks of building a data warehousing system— database design, source data acquisition and warehouse management. Business information directories that focus on business metadata and support for business user access to a data warehousing system. Warehouse Management Tools Warehouse management tools provide a set of systems management services for maintaining the data warehousing environment. These services include managing data acquisition operations, archiving warehouse data, the backup and recovery of data, securing and authorizing access to warehouse data, and managing and tuning data access operations. At present, there are few tools designed explicitly for managing data warehousing systems, and most data warehouse administrators employ the facilities of the warehouse DBMS to perform these tasks. Conclusion There is more to data warehousing than just copying operational data into a separate informational database. A data warehousing system should provide a complete solution for managing the  ow of information from existing corporate databases and external sources into end-user decision support systems. It should make it easy for business users to Ž nd out what information exists in the warehouse, and provide tools to access and analyze that information. A data warehouse system may contain an operational data store or an enterprise data warehouse that manages corporate-wide information, and one or more data marts for handling departmental function-level information. The top-down and bottom-up methods of building such a system are not the most cost-effective approaches in most cases. Instead, a hybrid parallel approach to development is suggested. Regardless of which method is used, however, data marts will become a key component of data warehousing systems since they provide a cost-effective way of building data warehouses. Organizations evaluating data mart solutions should look for a single integrated package that supports the key components of a data warehousing system as outlined in this paper. Such a solution should also be capable of being integrated into a distributed data mart topology and/or a multi-tier conŽ guration involving multiple data marts and an enterprise data warehouse. Pamela Pipe Wembley UK
Purchase answer to see full attachment
User generated content is uploaded by users for the purposes of learning and should be used following Studypool's honor code & terms of service.

Explanation & Answer

See attached. ...

Similar Content

Related Tags