
ETL Tool Interview Questions

What is a metadata extension?
Informatica allows end users and partners to extend the metadata stored in the repository by associating information with individual objects in the repository. For example, when you create a mapping, you can store your contact information with the mapping. You associate information with repository metadata using metadata extensions.
Informatica Client applications can contain the following types of metadata extensions:
  • Vendor-defined. Third-party application vendors create vendor-defined metadata extensions. You can view and change the values of vendor-defined metadata extensions, but you cannot create, delete, or redefine them.
  • User-defined. You create user-defined metadata extensions using PowerCenter/PowerMart. You can create, edit, delete, and view user-defined metadata extensions. You can also change the values of user-defined extensions.
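A minimal Python analogy of the idea (not the Informatica API — the class and method names here are invented): repository objects carry a bag of named extensions, where vendor-defined entries allow value changes but not deletion or redefinition, while user-defined entries are fully editable.

```python
class RepositoryObject:
    """Toy stand-in for a repository object such as a mapping."""
    def __init__(self, name):
        self.name = name
        self.extensions = {}  # extension name -> {"value": ..., "vendor": bool}

    def add_extension(self, ext, value, vendor=False):
        self.extensions[ext] = {"value": value, "vendor": vendor}

    def set_value(self, ext, value):
        # Values of both kinds of extension may be changed...
        self.extensions[ext]["value"] = value

    def delete_extension(self, ext):
        # ...but vendor-defined extensions cannot be deleted or redefined.
        if self.extensions[ext]["vendor"]:
            raise PermissionError("vendor-defined extensions cannot be deleted")
        del self.extensions[ext]

m = RepositoryObject("customer_load_mapping")
m.add_extension("contact", "jane@example.com")   # user-defined
m.set_value("contact", "ops@example.com")        # allowed: editing the value
```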
What are the various test procedures used to check whether the data is loaded in the backend, the performance of the mapping, and the quality of the data loaded in Informatica?
The best procedure is to take the help of the Debugger, where we can monitor each and every step of the mapping and watch how the data is loading as the condition breakpoints are hit.

Explain the process of extracting data from source systems, storing it in an ODS, and how data modelling is done.
There are various ways of extracting data from source systems. For example, you can use a DATA step or an import process; it depends on your input data style and on what kind of file or database the data resides in. Storing your data in an ODS can be done through an ODS statement, an export statement, or a FILE statement, which again depends on the file and data format you want your output to be in.

Where do we use connected and unconnected lookups?
If only one return port is needed, we can go for an unconnected lookup; more than one return port is not possible with an unconnected lookup. If more than one return port is needed, go for a connected lookup, as sketched below.
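A minimal Python sketch of the difference (invented data, not Informatica syntax): the unconnected lookup behaves like a function call that hands back exactly one value, while the connected lookup sits in the row pipeline and can pass several ports through.

```python
# Unconnected lookup: invoked like a function, returns exactly one value.
def lkp_employee_name(employee_id):
    table = {101: ("Alice", "Dallas"), 102: ("Bob", "Austin")}
    return table.get(employee_id, (None, None))[0]   # single return "port"

# Connected lookup: part of the pipeline, can return multiple ports per row.
def connected_lookup(row):
    table = {101: ("Alice", "Dallas"), 102: ("Bob", "Austin")}
    name, city = table.get(row["employee_id"], (None, None))
    return {**row, "name": name, "city": city}       # several return "ports"

print(lkp_employee_name(101))                 # Alice
print(connected_lookup({"employee_id": 102})) # id plus name and city
```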

Where do we use semi-additive and non-additive facts?
Additive: a measure that can participate in arithmetic calculations (such as SUM) across all or any of the dimensions.
Ex: sales profit
Semi-additive: a measure that can participate in arithmetic calculations across only some of the dimensions (typically it cannot be summed across time).
Ex: sales amount
Non-additive: a measure that cannot participate in arithmetic calculations across the dimensions.
Ex: temperature
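A minimal Python sketch, with made-up fact rows, of why the distinction matters: summing an additive measure across any dimension gives a meaningful number, while summing a non-additive measure like temperature does not — only aggregates such as average or max make sense for it.

```python
# Toy fact table: each row is (date, store, sales_profit, temperature).
# Names and values are illustrative, not from any real schema.
facts = [
    ("2024-01-01", "A", 100, 20.5),
    ("2024-01-01", "B", 150, 22.0),
    ("2024-01-02", "A", 120, 18.0),
    ("2024-01-02", "B", 130, 21.5),
]

# Additive: sales_profit can be summed across every dimension (date, store).
total_profit = sum(row[2] for row in facts)           # 500 -- meaningful

# Non-additive: summing temperature across rows is meaningless;
# averaging (or min/max) is the only sensible rollup.
avg_temp = sum(row[3] for row in facts) / len(facts)  # 20.5 -- meaningful
bad_temp = sum(row[3] for row in facts)               # 82.0 -- meaningless

print(total_profit, avg_temp, bad_temp)
```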

What is an entity relation? How does it work in data warehousing ETL modelling?
An entity is nothing but an object with characteristics. We call it an entity in terms of the logical view; the same entity is called a table in terms of the physical view.
An entity relationship is nothing but maintaining a primary key / foreign key relation between tables for keeping the data consistent and satisfying the normal forms.
There are 4 types of entity relationships:
1. One-to-One, 2. One-to-Many, 3. Many-to-One, 4. Many-to-Many
In data warehouse modelling, an entity relationship is the relationship between the dimension and fact tables (i.e. the primary/foreign key relations between these tables). The fact table gets data from the dimension tables because it contains the primary keys of the dimension tables as foreign keys, which are used to retrieve summarized data for each record, as sketched below.
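A minimal Python sketch, with invented tables, of how a fact row's foreign keys resolve against the dimension tables' primary keys in a star schema:

```python
# Hypothetical dimension tables, keyed by their primary keys.
dim_product = {1: "Laptop", 2: "Phone"}
dim_store   = {10: "Dallas", 20: "Austin"}

# Fact rows carry the dimension primary keys as foreign keys,
# plus the additive measure (sales amount).
fact_sales = [
    {"product_id": 1, "store_id": 10, "amount": 1200},
    {"product_id": 2, "store_id": 10, "amount": 800},
    {"product_id": 2, "store_id": 20, "amount": 650},
]

# Resolving the foreign keys denormalizes each fact row -- the same
# join the warehouse performs when producing a summary report.
for row in fact_sales:
    print(dim_product[row["product_id"]],
          dim_store[row["store_id"]],
          row["amount"])
```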

ETL Data Warehousing Interview Questions

Explain why and where exactly we use the Lookup transformation.
You can use the Lookup transformation to perform many tasks, including:
  • Get a related value. For example, your source includes the employee ID, but you want to include the employee name in your target table to make your summary data easier to read (sketched after this list).
  • Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).
  • Update slowly changing dimension tables. You can use a Lookup transformation to determine whether rows already exist in the target.
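A minimal Python sketch of the "get a related value" case (invented data, not Informatica syntax): source rows are enriched with a name looked up from a reference table.

```python
# Reference (lookup) table: employee_id -> employee_name.
employee_lookup = {101: "Alice", 102: "Bob"}

# Source rows carry only the employee ID.
source_rows = [
    {"employee_id": 101, "sales": 5000},
    {"employee_id": 102, "sales": 7200},
]

# The lookup adds the related value to each target row, mirroring
# what a connected Lookup transformation does in the pipeline.
target_rows = [
    {**row, "employee_name": employee_lookup.get(row["employee_id"], "UNKNOWN")}
    for row in source_rows
]
print(target_rows)
```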

How do you tell the Aggregator stage that the input data is already sorted?
By enabling the Sorted Input property in the Aggregator properties.
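A minimal Python sketch of why sorted input helps: when rows arrive already grouped on the key, the aggregator can emit each group as soon as the key changes instead of caching every row. Python's itertools.groupby relies on exactly this precondition.

```python
from itertools import groupby

# Input already sorted on the grouping key (dept) -- the precondition
# the Sorted Input property promises to the Aggregator.
rows = [("sales", 100), ("sales", 150), ("hr", 80), ("hr", 90)]

# groupby only works correctly on pre-sorted input; each group is
# aggregated and released immediately, with no full-stream buffering.
for dept, group in groupby(rows, key=lambda r: r[0]):
    print(dept, sum(amount for _, amount in group))
```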

What are push and pull ETL strategies?
Push and pull strategies determine how data comes from the source system to the ETL server.
Push: in this case the source system pushes data, i.e. sends data, to the ETL server.
Pull: in this case the ETL server pulls data, i.e. gets data, from the source system.
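A minimal Python sketch of the two directions of control, with invented function names standing in for the real source and ETL interfaces — the point is which side initiates the transfer:

```python
# Invented stand-ins for a source table and an ETL landing area.
source_table = [{"id": 1}, {"id": 2}]
etl_landing = []

# Pull: the ETL server initiates, e.g. a scheduled job querying the source.
def etl_pull_job():
    etl_landing.extend(source_table)      # ETL reads from the source

# Push: the source initiates, e.g. the source app calls an ETL intake point.
def etl_intake(record):                   # exposed by the ETL server
    etl_landing.append(record)

def source_push_job():
    for record in source_table:
        etl_intake(record)                # source writes to the ETL server

etl_pull_job()
source_push_job()
print(len(etl_landing))                   # 4
```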

What is the difference between an ODS and a staging area?
ODS: an Operational Data Store, which contains data.
The ODS comes after the staging area.
E.g.:
In our example, let's consider that we have day-level granularity in the OLTP and year-level granularity in the data warehouse.
If the business (manager) asks for week-level granularity, then we would have to go back to the OLTP and summarize the day level up to the week level, which would be painstaking. So what we do is maintain week-level granularity in the ODS, for about 30 to 90 days of data (see the rollup sketch below).
Note: the ODS contains cleansed data only, i.e. data that has already passed through the staging area.
Staging area:
It comes after extraction, before the data reaches the ODS. The staging area consists of:
1. Metadata.
2. The work area where we apply our complex business rules.
3. A place to hold the data and do calculations.
In other words, we can say that it is a temporary work area.
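A minimal Python sketch of the day-to-week rollup described above, using ISO week numbers on invented daily rows:

```python
from datetime import date
from collections import defaultdict

# Invented day-level rows as they might sit in the OLTP: (day, amount).
daily = [
    (date(2024, 1, 1), 100),
    (date(2024, 1, 2), 150),
    (date(2024, 1, 8), 120),   # falls in the next ISO week
    (date(2024, 1, 9), 130),
]

# Roll day-level granularity up to week level for the ODS.
weekly = defaultdict(int)
for day, amount in daily:
    iso_year, iso_week, _ = day.isocalendar()
    weekly[(iso_year, iso_week)] += amount

print(dict(weekly))  # {(2024, 1): 250, (2024, 2): 250}
```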
  
How do you capture changes in the data if the source system does not have the option of storing a date/time field in the source table from which you need to extract the data?
The DW database can be Oracle or Teradata. The requirement here is to pull data from the source system, and the ETL needs to devise a mechanism to identify the changed or new records. The source system can be a legacy system like an AS400 application or a mainframe application. List out all such methods of data capture. The ETL can be Informatica, DataStage, or custom ETL code.
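One widely used method when no timestamp exists is full-extract comparison: hash each incoming row and diff the hashes against those saved from the previous run. A minimal Python sketch, with invented rows and placeholder previous-run hashes:

```python
import hashlib

def row_hash(row):
    # Hash the full row contents; any column change alters the hash.
    return hashlib.md5("|".join(map(str, row.values())).encode()).hexdigest()

# Previous extract's hashes, keyed by business key (would be persisted
# between runs; the values here are illustrative placeholders).
previous = {"K1": "abc123", "K2": "def456"}

current_rows = [
    {"key": "K1", "name": "Alice", "city": "Dallas"},   # existing key
    {"key": "K3", "name": "Carol", "city": "Austin"},   # new record
]

for row in current_rows:
    h = row_hash(row)
    old = previous.get(row["key"])
    if old is None:
        print("INSERT", row["key"])   # key never seen before
    elif old != h:
        print("UPDATE", row["key"])   # contents changed since last run
```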
If a lookup on the target table is taken, can we update the rows without an Update Strategy transformation?
Yes, by using a dynamic lookup cache.
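A minimal Python sketch of the dynamic-cache idea (a toy model, not Informatica syntax): the cache is updated as rows pass through, and each row comes out flagged as new, changed, or unchanged, which downstream logic can use to route inserts versus updates. The flag values mimic Informatica's NewLookupRow port.

```python
# The dynamic cache starts from the target table's current contents.
cache = {101: "Alice"}  # key -> name (illustrative)

def process(key, name):
    # Mimics NewLookupRow: 1 = inserted into cache, 2 = updated, 0 = unchanged.
    if key not in cache:
        cache[key] = name
        return 1   # new row -> route to insert
    if cache[key] != name:
        cache[key] = name
        return 2   # changed row -> route to update
    return 0       # unchanged -> drop

print(process(101, "Alicia"))  # 2: update the existing target row
print(process(102, "Bob"))     # 1: insert a new target row
```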

In what scenario is ETL coding preferred over database-level SQL/PL-SQL coding?
When the data scrubbing process is difficult to do in the database. For example, a file contains a date column like 20070823, but the data warehouse requires the date as 08/23/2007; handling that kind of conversion in plain SQL is difficult, so it is done in the ETL code.
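A minimal Python sketch of that scrubbing step, turning YYYYMMDD strings into the MM/DD/YYYY format the warehouse expects:

```python
from datetime import datetime

def scrub_date(raw):
    # Parse the source's compact format, emit the warehouse's format.
    return datetime.strptime(raw, "%Y%m%d").strftime("%m/%d/%Y")

print(scrub_date("20070823"))  # 08/23/2007
```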