What ETL is and the challenges posed by it, and how would you address the challenges?

Explain what ETL is, the challenges it poses, and how you would address these challenges.

May 3rd, 2015

ETL stands for extract, transform and load. It is a data warehousing concept with three steps:

- Extract: pull the data from the data sources.

- Transform: convert the data so that it can be stored in the proper format or structure. This makes the data usable for analysis.

- Load: write the data into the final target, i.e. provide it as input for analysis.
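The three steps can be sketched in a few lines of Python. This is only a minimal illustration, not a production pipeline: the CSV content, the `sales` table and the function names are all made up for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read files, APIs or databases.
SOURCE_CSV = """id,name,amount
1, alice ,10.50
2,BOB,20
3,carol,7.25
"""


def extract(text):
    """Extract: read raw rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise types and formatting so the data suits analysis."""
    return [
        (int(r["id"]), r["name"].strip().title(), float(r["amount"]))
        for r in rows
    ]


def load(records, conn):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn):
    """Run the three steps in order and return what ended up in the target."""
    load(transform(extract(SOURCE_CSV)), conn)
    return conn.execute(
        "SELECT id, name, amount FROM sales ORDER BY id"
    ).fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` runs the whole flow against a throwaway database.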


Most current test automation tools focus on testing transactional applications and are not designed for testing data warehouses. As a result, it is common practice to test ETL by copying a handful of records from the source and target tables into an Excel spreadsheet and comparing the data. This is tedious and error-prone when done manually for a large number of ETL processes. Testing is further complicated because an ETL process can run in incremental mode or full mode, which often makes it necessary to create separate sets of test cases for the two modes. Every incremental change to the source or target system increases the scope for errors and redundancy in the data.
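The spreadsheet comparison described above can be automated. The sketch below is one possible approach, assuming both sides are available as lists of dictionaries that share a key column; the function name `compare_tables` and the `id` key are hypothetical. Because the target holds transformed values, the source rows would first need to go through the expected transformation before being compared.

```python
def compare_tables(source_rows, target_rows, key="id"):
    """Compare expected (source) and actual (target) records by key.

    Returns the keys missing from the target, the keys that appear only
    in the target, and the keys whose records differ.
    """
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}

    missing = sorted(source.keys() - target.keys())
    unexpected = sorted(target.keys() - source.keys())
    mismatched = sorted(
        k for k in source.keys() & target.keys() if source[k] != target[k]
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


# Illustrative data only.
expected = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Carol"},
]
actual = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bobby"},
    {"id": 4, "name": "Dave"},
]
report = compare_tables(expected, actual)
```

The same function can be run on the output of a full load and of an incremental load, so the two modes share one comparison routine even if their test data differs.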

Thus the major challenges are:

      - Incompatible and duplicate data.

      - Loss of data during the ETL process.

      - Very large volume and high complexity of the data.

    Possible solutions:
  • Validation: check the correctness of the data at each level. This may slow the process down, but it increases accuracy.
  • Plan the tests based on the inputs from test estimation and the business requirements.
  • Design test cases and test scenarios from all the available inputs.
  • Once all the test cases are ready and approved, the testing team performs the pre-execution checks and prepares the test data.
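Validation at each level could look roughly like the following sketch, which checks one stage for lost rows and duplicate keys. The function names and the `id` key are assumptions made for the example. It also assumes the stage is not supposed to filter rows; for a stage that drops rows on purpose, the count check would need an expected figure instead.

```python
from collections import Counter


def find_duplicate_keys(rows, key="id"):
    """Return the key values that occur more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def validate_stage(stage_name, rows_in, rows_out, key="id"):
    """Check one ETL stage for lost rows and duplicate keys.

    Returns a list of problem descriptions; an empty list means the
    stage passed both checks.
    """
    problems = []
    if len(rows_out) < len(rows_in):
        lost = len(rows_in) - len(rows_out)
        problems.append(f"{stage_name}: {lost} row(s) lost")
    dupes = find_duplicate_keys(rows_out, key)
    if dupes:
        problems.append(f"{stage_name}: duplicate keys {dupes}")
    return problems
```

Running `validate_stage` after the extract, transform and load steps in turn shows which step introduced a problem, at the cost of an extra pass over the data each time.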

