You may have seen publications warning that most of your data warehouse development time may go to building the means for the initial and recurring extraction, transformation, and loading of data. What I have not seen, though, is much in-depth discussion of exactly what errors lurk in the dirty data you will spend that time cleaning up. Forewarned is forearmed. If you know that certain errors may exist, you will be more likely to spot them and to plan your project so that you can attack them in a manageable way. Perhaps the material in this paper can help you formulate a checklist of errors to look for. What follows is a list of common errors. If you are a relational database expert, bear with my imprecise use of some terminology. Finally, note that when I refer to a data warehouse, I mean the database that is fed directly with data from the source systems, not the data marts (or whatever you want to call them) that are fed with cleansed data.
Apr 23rd, 2014