Data warehousing concepts (FAQs)
1) What is a data warehouse?
A data warehouse is a relational database used for query, analysis, and reporting. By definition, a data warehouse is subject-oriented, integrated, non-volatile, and time-variant.
Subject-oriented: The data warehouse is organized around a particular subject.
Integrated: Data collected from multiple sources is integrated into a single, consistent, user-readable format.
Non-volatile: It maintains historical data.
Time-variant: Data can be viewed by time period, such as weekly, monthly, or yearly.
2) What is a data mart?
A data mart is a subset of a data warehouse.
3) What is the difference between a data warehouse and a data mart?
A data warehouse maintains the data of the whole organization and can contain multiple data marts, whereas a data mart maintains data for only one particular subject.
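4) What is the difference between OLTP and OLAP?
OLTP is Online Transaction Processing. It maintains current transactional data, so inserts, updates, and deletes must be fast.
OLAP is Online Analytical Processing. It maintains historical data for analysis and reporting, so complex read queries must be fast.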
5) Explain ODS?
An operational data store (ODS) is part of a data warehouse. It maintains only current transactional data. An ODS is subject-oriented, integrated, volatile, and holds current data.
6) What is the difference between PowerCenter and PowerMart?
PowerCenter includes all product functionality, including the ability to register multiple servers, share metadata across repositories, and partition data.
One repository can serve multiple Informatica servers. PowerMart includes all features except registering multiple servers and partitioning data.
7) What is a staging area?
A staging area is a temporary storage area used for data integration and cleansing rather than transaction processing.
Before data is put into the data warehouse, it needs to be cleaned and processed there.
8) Explain additive, semi-additive, and non-additive facts?
Additive fact: An additive fact can be aggregated by simple addition across all dimensions.
Semi-additive fact: A semi-additive fact can be added across some dimensions but not others. For example, an account balance can be summed across accounts but not across time.
Non-additive fact: A non-additive fact cannot be added at all. Ratios and percentages are typical examples.
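The three fact types can be illustrated with a small sketch. The rows, the column meanings (sales amount, balance, margin percentage), and the values below are hypothetical, chosen only to show which aggregations make sense.

```python
# Hypothetical daily snapshot rows:
# (day, account, sales_amount, balance, margin_pct)
rows = [
    ("2024-01-01", "A", 100, 500, 0.20),
    ("2024-01-01", "B", 200, 300, 0.10),
    ("2024-01-02", "A", 50, 550, 0.40),
    ("2024-01-02", "B", 150, 250, 0.20),
]

# Additive: sales_amount can be summed across every dimension.
total_sales = sum(r[2] for r in rows)

# Semi-additive: balance can be summed across accounts for one day,
# but summing it across days would count the same money twice.
balance_on_day2 = sum(r[3] for r in rows if r[0] == "2024-01-02")

# Non-additive: a percentage cannot be summed. Recompute it from
# additive components instead (margin amount / sales amount).
margin_amount = sum(r[2] * r[4] for r in rows)
overall_margin_pct = margin_amount / total_sales
```

Summing the margin_pct column directly would give 0.90, which is not a meaningful margin; the recomputed value is the ratio of the two additive totals.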
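9) What is a factless fact table? Give an example.
A factless fact table is a fact table that has no measures. For example, a table that records student attendance holds only dimension keys (student, class, date) and no numeric measure.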
10) Explain surrogate key?
A surrogate key is a series of sequential numbers assigned to be the primary key for a table.
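As a rough illustration of surrogate key assignment, the sketch below hands out sequential numbers to incoming business keys. The customer codes and the dictionary name are hypothetical, and a real warehouse would normally use a database sequence or a sequence generator instead.

```python
from itertools import count

# Hypothetical natural (business) keys arriving from a source system.
natural_keys = ["CUST-0042", "CUST-0917", "CUST-0042", "CUST-1300"]

next_key = count(start=1)    # sequential number generator
surrogate_by_natural = {}    # natural key -> surrogate key

for nk in natural_keys:
    # Assign a new sequential number only the first time a key is seen.
    if nk not in surrogate_by_natural:
        surrogate_by_natural[nk] = next(next_key)
```

The repeated key CUST-0042 keeps the surrogate key it received the first time, so the dimension ends up with three rows.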
11) How many types of approaches are there in DWH?
Two approaches: top-down (Inmon approach) and bottom-up (Ralph Kimball approach).
12) Explain star schema?
A star schema consists of one or more fact tables and one or more dimension tables that are related through foreign keys.
Dimension tables are denormalized; the fact table is normalized.
Advantages: simpler queries with fewer joins.
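A minimal star schema can be sketched with an in-memory SQLite database. The table and column names (fact_sales, dim_product, dim_date) and the sample rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    -- The fact table holds foreign keys to each dimension plus the measure.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Desk', 'Furniture');
    INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO fact_sales VALUES (1, 20240101, 10.0), (1, 20240201, 5.0), (2, 20240101, 200.0);
""")

# One join per dimension is enough to answer the question.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

Because category is stored directly in the denormalized product dimension, the query needs no extra join to a separate category table.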
13) Explain snowflake schema?
A snowflake schema normalizes the dimensions to eliminate redundancy. The dimension data is grouped into multiple tables instead of one large table. Both dimension and fact tables are normalized.
14) What is a conformed dimension?
If two data marts use the same dimension, it is called a conformed dimension. In other words, a dimension that can be shared by multiple fact tables is a conformed dimension.
15) Explain the DWH architecture?
16) What is a slowly growing dimension?
Slowly growing dimensions are dimensions whose data grows without updates to existing rows. That means new data is appended to the existing dimension.
17) What is a slowly changing dimension?
Slowly changing dimensions are dimensions whose data grows and whose existing rows are also updated over time.
Type 1: Rows containing changes to existing dimensions are updated in the target by overwriting the existing dimension. In the Type 1 Dimension mapping, all rows contain current dimension data.
Use the Type 1 Dimension mapping to update a slowly changing dimension table when you do not need to keep any previous versions of dimensions in the table.
Type 2: The Type 2 Dimension/Version Data mapping inserts both new and changed dimensions into the target. Changes are tracked in the target table by versioning the primary key and creating a version number for each dimension in the table.
Use the Type 2 Dimension/Version Data mapping to update a slowly changing dimension when you want to keep a full history of dimension data in the table. Version numbers and versioned primary keys track the order of changes to each dimension.
Type 3: The Type 3 Dimension mapping filters source rows based on user-defined comparisons and inserts only those found to be new dimensions into the target. Rows containing changes to existing dimensions are updated in the target. When updating an existing dimension, the Informatica server saves the existing data in different columns of the same row and replaces the existing data with the updates.
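The difference between the three types can be sketched in a few lines. This is a simplified illustration, not the logic the Informatica mappings generate: the function names, the city attribute, and the simple integer version column are all assumptions.

```python
def apply_type1(dim, key, new_city):
    """Type 1: overwrite the existing row; no history is kept."""
    dim[key] = {"city": new_city}


def apply_type2(dim_rows, key, new_city):
    """Type 2: insert a new row with the next version number; old rows stay."""
    versions = [r["version"] for r in dim_rows if r["key"] == key]
    next_version = max(versions, default=0) + 1
    dim_rows.append({"key": key, "version": next_version, "city": new_city})


def apply_type3(dim, key, new_city):
    """Type 3: keep the previous value in a separate column of the same row."""
    row = dim[key]
    row["prev_city"] = row["city"]
    row["city"] = new_city


# The same change (customer 101 moves from Pune to Delhi) under each type.
t1 = {101: {"city": "Pune"}}
apply_type1(t1, 101, "Delhi")

t2 = [{"key": 101, "version": 1, "city": "Pune"}]
apply_type2(t2, 101, "Delhi")

t3 = {101: {"city": "Pune", "prev_city": None}}
apply_type3(t3, 101, "Delhi")
```

After the change, Type 1 holds only Delhi, Type 2 holds two rows (versions 1 and 2), and Type 3 holds one row with Delhi as the current value and Pune as the previous value.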
18) When do you use a dynamic cache?
If your target table is also the lookup table, use a dynamic cache. In a dynamic cache, multiple matches return an error, and the lookup condition can use only the = operator.
19) What is a lookup override?
A lookup override replaces the default SQL statement. You can join multiple sources by using a lookup override. By default, the Informatica server adds an ORDER BY clause.
20) Can we pass a null value in a lookup transformation?
Yes. The lookup transformation can return a null value or match on a value equal to null.
21) What is the target load order?
You specify the target load order based on the source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica server loads data into the targets.
22) What is the default join that the source qualifier provides?
Inner equijoin.
23) What are the differences between the joiner transformation and the source qualifier transformation?
You can join heterogeneous data sources in a joiner transformation, which you cannot do in a source qualifier transformation.
You need matching keys to join two relational sources in a source qualifier transformation, whereas you do not need matching keys to join two sources in a joiner transformation.
Two relational sources must come from the same data source in a source qualifier. In a joiner transformation, you can also join relational sources that come from different data sources.
24) What is the update strategy transformation?
When you design a target table, you decide whether it stores historical data or only current transactional data. The update strategy transformation flags rows for insert, update, delete, or reject accordingly.
25) Describe the two levels at which the update strategy is set?
26) What is the default source option for the update strategy transformation?
Data driven.
27) What is data driven?
The Informatica server follows the instructions coded into the update strategy transformations within the session mapping to determine how to flag records for insert, update, delete, or reject. If you do not choose the data driven option, the Informatica server ignores all update strategy transformations in the mapping.
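The effect of the data driven option can be sketched as follows. Everything here is hypothetical: the Flag enum, the row layout, the rule inside flag_row, and the assumption that rows are simply treated as inserts when the option is off.

```python
from enum import Enum


class Flag(Enum):
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"
    REJECT = "reject"


def flag_row(row, existing_keys):
    """Hypothetical update strategy rule: decide how one row is treated."""
    if row["id"] is None:
        return Flag.REJECT      # unusable row
    if row.get("deleted"):
        return Flag.DELETE
    if row["id"] in existing_keys:
        return Flag.UPDATE
    return Flag.INSERT


def run_session(rows, existing_keys, data_driven=True):
    """With data_driven off, the per-row rule is ignored (assumed: all inserts)."""
    if not data_driven:
        return [Flag.INSERT for _ in rows]
    return [flag_row(r, existing_keys) for r in rows]


rows = [{"id": 1}, {"id": 2, "deleted": True}, {"id": None}, {"id": 9}]
flags = run_session(rows, existing_keys={1, 2})
flags_ignored = run_session(rows, existing_keys={1, 2}, data_driven=False)
```

With the option on, each row gets its own flag from the rule; with it off, the rule is never consulted.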
28) What are the options in the target session of the update strategy transformation?
Insert
Delete
Update
Update as update
Update as insert
Update else insert
Truncate table.
29) What is the difference between the source filter and the filter transformation?
A source filter filters data only from relational sources, whereas a filter transformation filters data from any type of source.
30) What is a tracing level?
The amount of information sent to the log file.
-- What are the types of tracing levels?
Normal, Terse, Verbose Data, Verbose Initialization.
-- Explain sequence generator transformation?
-- Can you connect multiple ports from one group to multiple transformations?
Yes.
31) Can you connect more than one group to the same target or transformation?
No.
32) What is a reusable transformation?
A reusable transformation is a single transformation that can be used in multiple mappings. When you need to incorporate it into a mapping, you add an instance of it to the mapping. If you later change the definition of the transformation, all instances of it inherit the changes, because each instance of a reusable transformation is a pointer to that transformation. You can change the transformation in the Transformation Developer, and its instances automatically reflect these changes. This feature can save you a great deal of work.
-- What are the methods for creating a reusable transformation?
Two methods:
1) Design it in the Transformation Developer.
2) Promote a standard transformation from the Mapping Designer. After you add a transformation to the mapping, you can promote it to the status of reusable transformation.
Once you promote a standard transformation to reusable status, you can demote it to a standard transformation at any time.
If you change the properties of a reusable transformation in a mapping, you can revert to the original reusable transformation properties by clicking Revert.
33) What are mapping parameters and mapping variables?
A mapping parameter represents a constant value that you define before running a session. A mapping parameter retains the same value throughout the entire session.
When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet, and then define the value of the parameter in a parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica server saves the value of a mapping variable to the repository at the end of the session run and uses that value the next time you run the session.
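The contrast between a parameter and a variable can be sketched with a toy session. The names are hypothetical: region_param stands in for a mapping parameter, last_loaded_id for a mapping variable, and a plain dictionary for the repository.

```python
def run_session(source_rows, region_param, repository):
    """region_param is fixed for the whole run (parameter).
    The highest loaded id changes during the run and is saved
    for the next run (variable)."""
    start_id = repository.get("last_loaded_id", 0)
    max_id = start_id
    loaded = []
    for row in source_rows:
        if row["region"] == region_param and row["id"] > start_id:
            loaded.append(row["id"])
            max_id = max(max_id, row["id"])
    # Persist the final value at the end of the run.
    repository["last_loaded_id"] = max_id
    return loaded


repository = {}
rows = [
    {"id": 1, "region": "EU"},
    {"id": 2, "region": "US"},
    {"id": 3, "region": "EU"},
]
first_run = run_session(rows, "EU", repository)
second_run = run_session(rows + [{"id": 4, "region": "EU"}], "EU", repository)
```

The second run picks up only the row added since the first run, because the saved variable value carries over while the parameter stays the same.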
34) Can you use the mapping parameters or variables created in one mapping in another mapping?
No. You can use mapping parameters or variables only in transformations of the same mapping or mapplet in which you created them.
35) Can you use the mapping parameters or variables created in one mapping in any other reusable transformation?
Yes, because a reusable transformation is not contained within any mapplet or mapping.
36) How does the Informatica server sort string values in the rank transformation?
When the Informatica server runs in ASCII data movement mode, it sorts session data using a binary sort order. If you configure the session to use a binary sort order, the Informatica server calculates the binary value of each string and returns the specified number of rows with the highest binary values for the string.
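A binary sort order can be approximated in a few lines by comparing the byte values of each string. The function name and the sample strings are made up for this sketch.

```python
def top_n_binary(strings, n):
    """Return the n strings with the highest binary (byte-wise) values."""
    return sorted(strings, key=lambda s: s.encode("ascii"), reverse=True)[:n]


names = ["apple", "Banana", "cherry", "Zebra"]
top_two = top_n_binary(names, 2)
```

In ASCII, uppercase letters have lower byte values than lowercase letters, so Zebra ranks below apple here even though it would come last in a dictionary.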
37) What is the rank index in the rank transformation?
The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica server uses the RANKINDEX port to store the ranking position for each record in a group. For example, if you create a Rank transformation that ranks the top 5 salespeople for each quarter, the rank index numbers the salespeople from 1 to 5.
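The rank index idea can be sketched as a top-N-per-group calculation. The sales rows and the function name are hypothetical, and the example keeps the top 2 per quarter to stay short.

```python
from collections import defaultdict

# Hypothetical (quarter, salesperson, sales) rows.
sales = [
    ("Q1", "Asha", 900), ("Q1", "Ben", 700), ("Q1", "Chen", 950),
    ("Q2", "Asha", 400), ("Q2", "Ben", 800), ("Q2", "Chen", 600),
]


def top_n_with_rank_index(rows, n):
    """Group by quarter, keep the n highest sales, number them from 1."""
    groups = defaultdict(list)
    for quarter, person, amount in rows:
        groups[quarter].append((person, amount))
    result = []
    for quarter in sorted(groups):
        ranked = sorted(groups[quarter], key=lambda pa: pa[1], reverse=True)[:n]
        for rank_index, (person, amount) in enumerate(ranked, start=1):
            result.append((quarter, rank_index, person, amount))
    return result


top2 = top_n_with_rank_index(sales, 2)
```

Each output row carries its position within its own quarter, and the numbering restarts at 1 for every group.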
38) What is a mapplet?
A mapplet is a set of transformations that you build in the Mapplet Designer and can use in multiple mappings.
39) What is the difference between a mapplet and a reusable transformation?
A reusable transformation is a single transformation, whereas a mapplet uses multiple transformations.
40) What is a parameter file?
A parameter file defines the values for parameters and variables.
WORKFLOW MANAGER
41) What is a server?
The PowerCenter server moves data from sources to targets based on workflow and mapping metadata stored in a repository.
42) What is a workflow?
A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data.
-- What is a session?
A session is a set of instructions that describes how to move data from source to target using a mapping.
-- What is the Workflow Monitor?
Use the Workflow Monitor to monitor workflows and stop the PowerCenter server.
43) Explain a workflow process?
The PowerCenter server uses both process memory and system shared memory to perform these tasks.
Load Manager process: stores and locks the workflow tasks and starts the DTM to run the sessions.
Data Transformation Manager (DTM) process: performs session validations, creates threads to initialize the session, reads, writes, and transforms data, and handles pre- and post-session operations.
The default memory allocation is 12,000,000 bytes.
44) What are the types of threads in the DTM?
The main DTM thread is called the master thread.
Mapping thread.
Transformation thread.
Reader thread.
Writer thread.
Pre- and post-session thread.
45) Explain the Workflow Manager tools?
1) Task developer.
2) Workflow designer.
3) Worklet designer.
46) Explain workflow scheduling.
You can schedule a workflow to run continuously, repeat at a given time or interval, or you can start a workflow manually. By default, the workflow runs on demand.
47) Explain stopping or aborting a session task?
If the PowerCenter server is executing a session task when you issue the stop command, it stops reading data. It continues processing, writing, and committing data to targets.
If the PowerCenter server cannot finish processing and committing data, you can issue the abort command.
You can also abort a session by using the Abort() function in the mapping logic.
48) What is a worklet?
A worklet is an object that represents a set of tasks. It can contain any task available in the Workflow Manager. You can run worklets inside a workflow. You can also nest a worklet in another worklet. The Workflow Manager does not provide a parameter file for worklets.
The PowerCenter server writes information about worklet execution in the workflow log.
49) What is a commit interval? Explain the types.
A commit interval is the interval at which the PowerCenter server commits data to targets during a session. The commit interval is the number of rows you want to use as a basis for the commit point.
Target-based commit: The PowerCenter server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.
Source-based commit:---------------------------------------------
User-defined commit:----------------------------------------------
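The basic idea of a commit interval can be sketched with an in-memory SQLite target. This is a deliberate simplification: it commits strictly every N rows and ignores buffer block size and key constraints. The table name, function name, and row values are assumptions for the example.

```python
import sqlite3


def load_with_commit_interval(conn, rows, commit_interval):
    """Insert rows and commit each time commit_interval rows are written.
    Returns the number of commits issued."""
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO target (id, name) VALUES (?, ?)", row)
        pending += 1
        if pending == commit_interval:
            conn.commit()
            commits += 1
            pending = 0
    if pending:  # commit the final partial batch
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
rows = [(i, f"row-{i}") for i in range(1, 26)]
commit_count = load_with_commit_interval(conn, rows, commit_interval=10)
```

With 25 rows and an interval of 10, the sketch commits after rows 10 and 20 and once more for the remaining 5 rows.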
50) Explain bulk loading?
You can use bulk loading to improve the performance of a session that inserts a large amount of data into a DB2, Sybase, Oracle, or MS SQL Server database.
When bulk loading, the PowerCenter server bypasses the database log, which speeds performance.
Without writing to the database log, however, the target database cannot perform rollback. As a result, you may not be able to perform recovery.
51) What is constraint-based loading?