Showing posts with label datawarehousing interview. Show all posts
Showing posts with label datawarehousing interview. Show all posts

ETL Dataware housing Interview questions

Explain why and where do we exactly use the lookup tranformations.
You can use the Lookup transformation to perform many tasks, including:
 Get a related value. For example, your source includes employee ID, but you want to
include the employee name in your target table to make your summary data easier to read.
  Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).
 Update slowly changing dimension tables. You can use a Lookup transformation to
determine whether rows already exist in the target.

How do you tell aggregator stage that input data is already sorted
By enablign sorted input property in Aggregator Properties

What are push and pull etl strategies?Push and Pull strategies determine how data comes from source system to ETL server.
Push : In this case the Source system pushes data i.e.(sends data) to the ETL server.
Pull : In this case the ETL server pulls data i.e.(gets data) from the source system.

What is the Difference between a ODS and Staging Area
ODS :-Operational Data Store which contains data .
ods comes after the staging area
eg:-
In  our e.g lets consider that we have day level Granularity in the OLTP & Year level Granularity in the Data warehouse.
If  the business(manager) asks for  week level Granularity then we have to go to the oltp and summarize  the day level to the week level which would be pain taking.So wat we do is that we maintain  week level Granularity in the ods for the data,for  abt  30 to 90 days.
Note : Ods information would contain cleansed data only. ie after staging area
Staging Area :-
It comes after the etl has finished.Staging Area consists of
1.Meta Data .
2.The  work area where we apply our complex business rules.
3.Hold the data and do calculations.
In other words we can say that its a temp work area.
  
How you capture changes in data if the source system does not have option of storing date/time field in source table from where you need to extract the data?
The DW database can be Oracle or Teradata. The requirement here is to pull data from source system and ETL need to device a mechanism to identify the changes or new records. The source system can be a legacy system like AS400 application or Mainframe application. List out all such methods of data capture. The ETL can be Informatica, data stage or custom etl code.
If LKP on target table is taken, can we update the rows without update strategy transformation?
yes, by using dynamic lookup

In what scenario ETL coding is preferred than Database level SQL,PL/SQL coding?
Data scrubbing process is difficult. That is, file contains date column like 20070823 but data warehouse requires date as 08/23/2007 in that case it is difficult.

Dataware housing Interview question and answers

What is Data warehousing?
A data warehouse can be considered as a storage area where interest specific or relevant data is stored irrespective of the source. What actually is required to create a data warehouse can be considered as Dataware housing Data warehousing merges data from multiple sources into an easy and complete form.
What are fact tables and dimension tables?
As mentioned, data in a warehouse comes from the transactions. Fact table in a data warehouse consists of facts and/or measures. The nature of data in a fact table is usually numerical.
On the other hand, dimension table in a data warehouse contains fields used to describe the data in fact tables. A dimension table can provide additional and descriptive information (dimension) of the field of a fact table.
e.g. If I want to know the number of resources used for a task, my fact table will store the actual measure (of resources) while my Dimension table will store the task and resource details.
Hence, the relation between a fact and dimension table is one to many.
What is ETL process in data warehousing?
ETL stands for Extraction, transformation and loading. That means extracting data from different sources such as flat files, databases or XML data, transforming this data depending on the application’s need and loads this data into data warehouse.
Explain the difference between data mining and data warehousing?
Data mining is a method for comparing large amounts of data for the purpose of finding patterns.Data mining is normally used for models and forecasting. Data mining is the process of correlations, patterns by shifting through large data repositories using pattern recognition techniques.
Data warehousing is the central repository for the data of several business systems in an enterprise. Data from various resources extracted and organized in the data warehouse selectively for analysis and accessibility.
What is an OLTP system and OLAP system?
OLTP stands for OnLine Transaction Processing.Application that supports and manges transactions which involve high volumes of data are supported by OLTP system. OLTP is based on client-server architecture and supports transactions across networks.
OLAP stands for OnLine Analytical Processing. Business data analysis and complex calculations on low volumes of data are performed by OLAP. An insight of data coming from various resources can be gained by a user with the support of OLAP.
What are cubes?
Multi dimensional data is logically represented by Cubes in data warehousing. The dimension and the data are represented by the edge and the body of the cube respectively. OLAP environments view the data in the form of hierarchical cube. A cube typically includes the aggregations that are needed for business intelligence queries.

Dataware housing interview question and answers

What is Data warehousing?
Answer
A data warehouse can be considered as a storage area where interest specific or relevant data is stored irrespective of the source. What actually is required to create a data warehouse can be considered as Data Warehousing. Data warehousing merges data from multiple sources into an easy and complete form.
Data warehousing - What is Data warehousing? - May 11, 2009 at 13:40 pm by Vidya Sagar
Answer
Data warehousing is a process of repository of electronic data of an organization. For the purpose of reporting and analysis, data warehousing is used. The essence concept of data warehousing is to provide data flow of architectural model from operational system to decision support environments.