1) Define DataStage? DataStage is basically a tool that is used to design, develop and execute various applications to fill multiple tables in. DataStage best practices, FAQ, tips and tricks and sample solutions with real-world examples. DataStage Interview Questions and Answers. Question 1: Explain DataStage? Question 2: Tell how a source file is populated? Question 3: Write the.
Published (Last): 23 March 2016
DataStage is basically a tool that is used to design, develop and execute various applications to fill multiple tables in a data warehouse or data marts. It is a program for Windows servers that extracts data from databases and changes it into data warehouses.
Merge means to join two or more tables. The two tables are joined on the basis of the primary key columns in both tables.
In DataStage, there is a concept of partitioning and parallelism for node configuration, while Informatica has no concept of partitioning and parallelism for node configuration. Also, Informatica is more scalable than DataStage, while DataStage is more user-friendly than Informatica.
Routines are basically collections of functions that are defined in DS Manager. A routine can be called via the Transformer stage. There are three types of routines: parallel routines, mainframe routines and server routines. Such routines are also created in DS Manager and can be called from a DataStage stage.
Duplicates can be removed by using the Sort stage with its Allow Duplicates option set to false.
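The idea of removing duplicates by sorting can be sketched outside DataStage. The following is a minimal Python illustration, not DataStage code: the function name `remove_duplicates` and the sample rows are assumptions made up for this example. It sorts rows on a key and keeps the first row of each key group.

```python
from itertools import groupby
from operator import itemgetter


def remove_duplicates(rows, key):
    """Sort rows on `key`, then keep the first row of each key group.

    Hypothetical helper for illustration only; it mimics sorting with
    duplicates disallowed, and is not part of any DataStage API.
    """
    # sorted() is stable, so rows with equal keys keep their input order.
    ordered = sorted(rows, key=itemgetter(key))
    # groupby() groups adjacent rows that share the same key value.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


rows = [
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b-dup"},
]
unique_rows = remove_duplicates(rows, "id")
```

Because the sort is stable, the row kept for each key is the one that appeared first in the input.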
In order to improve the performance of DataStage jobs, we first have to establish baselines. Secondly, we should not use only one flow for performance testing. Thirdly, we should work in increments. Then, we should evaluate data skews. Then we should isolate and solve the problems, one by one. After that, we should distribute the file systems to remove bottlenecks, if any.
Last but not least, we should understand and assess the available tuning knobs.
Join, Merge and Lookup differ from each other in the way they use memory, in their input requirements and in how they treat various records. Join and Merge need less memory than the Lookup stage.
QualityStage is also known as Integrity stage. It assists in integrating different types of data from various sources.
This tool is used to execute multiple jobs simultaneously, without using any kind of loop.
In Symmetric Multiprocessing (SMP), the hardware resources are shared by the processors. The processors run one operating system and communicate through shared memory. In Massively Parallel Processing (MPP), each processor accesses its hardware resources exclusively. This type of processing is also known as Shared Nothing, since nothing is shared. It is faster than Symmetric Multiprocessing.
In DataStage, validating a job means executing a job. While validating, the DataStage engine verifies whether all the required properties are provided. While compiling a job, on the other hand, the DataStage engine verifies whether all the given properties are valid.
We can use a date conversion function for this purpose.
All the stages after the exception activity in DataStage are executed if any unknown error occurs while executing the job sequencer.
It is also used to store the node information, disk storage information and scratch information.
There are two types of lookups in DataStage, i.e. Normal lookup and Sparse lookup. In a Normal lookup, the data is first saved in memory and then the lookup is performed. In a Sparse lookup, the data stays in the database and the lookup is performed directly against it. Therefore, a Sparse lookup can be faster than a Normal lookup.
In DataStage, the Repository is another name for a data warehouse. It can be centralized as well as distributed.
IConv is basically used to convert formats for the system to understand, while OConv is used to convert formats for users to understand.
In DataStage, Usage Analysis is performed within a few clicks.
Launch DataStage Manager and right-click the job. Because of this hash key feature, searching in a hash file is faster than in a sequential file.
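The difference between keyed and sequential searching can be illustrated with a small Python analogy. This is only a sketch: a Python `dict` stands in for a hash file and a list of tuples for a sequential file, and the function names and sample records are assumptions invented for the example, not DataStage internals.

```python
def find_sequential(records, wanted_key):
    """Scan records one by one, as when reading a sequential file."""
    for key, value in records:
        if key == wanted_key:
            return value
    return None


def build_hash_index(records):
    """Build a key -> value mapping, analogous to a hash file keyed on `key`."""
    return {key: value for key, value in records}


# Hypothetical sample data: 1000 (key, value) records.
records = [(i, f"row-{i}") for i in range(1000)]
index = build_hash_index(records)

# The sequential search may inspect every record before it finds the last key,
# while the hashed lookup goes straight to the entry for that key.
last_by_scan = find_sequential(records, 999)
last_by_hash = index.get(999)
```

Both calls return the same record; the sequential scan's cost grows with the number of records, whereas the hashed lookup's cost stays roughly constant.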
In DataStage, routines are of two types. We can call a routine from the Transformer stage in DataStage.
We can say that an ODS is a mini data warehouse.
It can be used to incorporate other languages such as French, German and Spanish. These languages use the same script as English.
In order to improve performance in DataStage, it is recommended not to use more than 20 stages in a job. If you need to use more than 20 stages, it is better to use another job for those stages.
I have worked with these tools and have hands-on experience of working with these third-party tools.
Whenever we launch the DataStage client, we are asked to connect to a DataStage project.
There are two types of hash files in DataStage. The static hash file is used when a limited amount of data is to be loaded into the target database.
In DataStage, MetaStage is used to save metadata that is helpful for data lineage and data analysis. This knowledge is useful in DataStage because sometimes one has to write UNIX programs, such as batch programs to invoke batch processing. Transaction size means the number of rows written before committing the records in a table.
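To make the notion of transaction size concrete, here is a minimal sketch using Python's built-in `sqlite3` module rather than a DataStage database stage. The table name `target`, the function `load_rows` and the sample rows are assumptions for illustration only.

```python
import sqlite3


def load_rows(conn, rows, transaction_size):
    """Insert rows, committing after every `transaction_size` rows.

    Returns the number of commits issued. Hypothetical helper for
    illustration; DataStage sets transaction size as a stage property.
    """
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO target (id, name) VALUES (?, ?)", row)
        pending += 1
        if pending == transaction_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        # Commit the final, partially filled batch.
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
conn.commit()

sample_rows = [(i, f"name-{i}") for i in range(10)]
commit_count = load_rows(conn, sample_rows, transaction_size=4)
row_count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
```

With 10 rows and a transaction size of 4, the load commits after rows 4 and 8 and once more for the remaining 2 rows. A larger transaction size means fewer commits, but more uncommitted work is lost if the load fails partway.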
In DataStage, we use a Surrogate Key instead of a unique key. A surrogate key is mostly used for retrieving data faster. It uses an index to perform the retrieval operation.
In DataStage, rejected rows are managed through constraints in the Transformer. We can either place the rejected rows in the properties of a Transformer or create temporary storage for rejected rows with the help of the REJECTED command.
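Routing rows by a constraint can be sketched in plain Python. This is an analogy only, not the Transformer stage's actual mechanism: the function `split_by_constraint`, the `amount_is_valid` rule and the sample rows are all assumptions made up for the example.

```python
def split_by_constraint(rows, constraint):
    """Send rows that satisfy `constraint` to one list and the rest to a reject list."""
    accepted, rejected = [], []
    for row in rows:
        if constraint(row):
            accepted.append(row)
        else:
            rejected.append(row)
    return accepted, rejected


def amount_is_valid(row):
    """Hypothetical constraint: amount must be present and non-negative."""
    return row["amount"] is not None and row["amount"] >= 0


rows = [
    {"id": 1, "amount": 50},
    {"id": 2, "amount": -5},
    {"id": 3, "amount": None},
]
accepted, rejected = split_by_constraint(rows, amount_is_valid)
```

Keeping the rejected rows in their own collection, instead of dropping them, makes it possible to inspect or reprocess them later.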
Data Stage Interview Questions & Answers
In which process are hardware resources shared by the processors?
Which type is not used for lookups in DataStage?
In DataStage, is the Repository another name for a data warehouse?
Which function is used to convert data from one format to another?
How do you find the number of rows in a sequential file?
What does not contain information for more than one year?
Which third-party tool cannot be used with DataStage?
Which hash file is used when a limited amount of data is to be loaded into the target database?
Which is a type of view in DataStage Director?
Which command is used to execute a DataStage job from the command line?
Which type of job is not used in DataStage?