IBM WebSphere IIS DataStage Enterprise Edition v7.5.
QUESTIONS 
A customer wants to migrate only the performance-critical jobs of an existing DataStage 
Server Edition environment to a clustered deployment of DataStage Enterprise Edition. 
Which migration approach will allow the new parallel jobs to run on all nodes of the 
cluster using DataStage Enterprise Edition? 
A. Create a Server Shared Container containing the key stages for each job to be 
migrated, and use this container in a parallel job flow with native parallel source and target stages.
B. Using requirements for the existing Server Edition jobs, create new parallel jobs using 
native parallel stages that implement the required source to target logic. 
C. Configure a DataStage Server Edition environment on each node in the DataStage 
Enterprise Edition cluster, import the existing .dsx export file to each node, compile and 
run the jobs in parallel using a parallel configuration file. 
D. Export the Server Edition jobs to a .dsx file, import them into the new DataStage 
Enterprise Edition environment, and re-compile. 
Answer: B 
 
 
 
 QUESTIONS 
A parts supplier has a single fixed width sequential file. Reading the file has been slow, 
so the supplier would like to try to read it in parallel. 
If the job executes using a configuration file consisting of four nodes, which two 
Sequential File stage settings will cause the DataStage parallel engine to read the file 
using four parallel readers? (Choose two.) 
(Note: Assume the file path and name is /data/parts_input.txt.) 
A. Set the read method to specific file(s), set the file property to '/data/parts_input.txt', 
and set the number of readers per node option to 2. 
B. Set the read method to specific file(s), set the file property to '/data/parts_input.txt', 
and set the read from multiple nodes option to yes. 
C. Set read method to file pattern, and set the file pattern property to 
'/data/(@PART_COUNT)parts_input.txt'. 
D. Set the read method to specific file(s), set the file property to '/data/parts_input.txt', 
and set the number of readers per node option to 4. 
Answer: B,D 
 
 
 
 QUESTIONS 
A developer has separated processing into multiple jobs in order to support restartability. 
The first job writes to a file that the second job reads. The developer is trying to decide 
whether to write to a dataset or to a sequential file. Which factor favors using a dataset?
A. I/O performance 
B. ability to archive 
C. memory usage 
D. disk space usage 
Answer: A 
 
 
 
 QUESTIONS 
In a Lookup operator, a key column in the stream link is VARCHAR and has Unicode set 
in the Extended attribute while the corresponding column in a reference link, also 
VARCHAR, does not. 
What will allow correct comparison of the data? 
A. Convert the column in the stream link to the default code page using the 
UstringToString function in a Transformer operator prior to the Lookup operator and 
remove Unicode from the Extended attribute of the column. 
B. Convert the column in the reference link to the UTF-8 code page using the 
StringToUstring function in a Transformer operator prior to the Lookup operator, and set 
the Extended attribute of the column. 
C. Convert both columns to CHAR, pad with spaces, and remove Unicode from the 
Extended attribute in Transformer operators prior to the Lookup operator. 
D. Remove Unicode from the Extended attribute of the column from the beginning of the 
job to the Lookup operator and then set the Extended attribute of the column in the output 
mapping section of the Lookup operator. 
Answer: B 
 
 
 
 QUESTIONS 
Detail sales transaction data is received on a monthly basis in a single file that includes 
CustID, OrderDate, and Amount fields. For a given month, a single CustID may have 
multiple transactions. 
Which method would remove duplicate CustID rows, selecting the most recent 
transaction for a given CustID? 
A. Use Auto partitioning. Perform a unique Sort on CustID and OrderDate (descending). 
B. Hash partition on CustID. Perform a non-unique Sort on CustID and OrderDate 
(ascending). Use same partitioning, followed by a RemoveDuplicates on CustID (duplicateToRetain=Last). 
C. Hash partition on CustID. Perform a unique Sort on CustID and OrderDate (ascending). 
D. Use Auto partitioning on all links. Perform a non-unique Sort on CustID and 
OrderDate (ascending) followed by a RemoveDuplicates on CustID and OrderDate (duplicateToRetain=Last). 
Answer: B 
 
 
 
 QUESTIONS 
Which two statements about performance tuning a DataStage EE environment are true? 
(Choose two.) 
A. Overall job design has minimal impact on actual real-world performance. 
B. A single, optimized configuration file will yield best performance for all jobs and be 
easier to administer. 
C. Only adjust buffer tuning parameters after examining other performance factors. 
D. Performance tuning is an iterative process - adjust one item at a time and examine the 
results in isolation. 
Answer: C,D 
QUESTIONS 
Which two statements are true about the Lookup stage? (Choose two.) 
A. The Lookup stage supports one and only one lookup table per stage. 
B. A reject link can be specified to capture input rows that do not have matches in the 
lookup tables. 
C. The source primary, input link must be sorted. 
D. The Lookup stage uses more memory than the Merge and Join stages. 
Answer: B,D 
 
 
 
 QUESTIONS 
Which three can be identified from a parallel job score? (Choose three.) 
A. components inserted by DataStage parallel engine at runtime (such as buffers, sorts). 
B. runtime column propagation was enabled to support dynamic job designs. 
C. whether a particular job was run in a clustered or SMP hardware environment. 
D. stages whose logic was combined together to minimize the number of physical 
processes used to execute a given job. 
E. actual partitioning method used at runtime for each link defined with "Auto" partitioning. 
Answer: A,D,E 
 
 
 
 QUESTIONS 
A DataStage EE job is sourcing a flat file which contains a VARCHAR field. This field 
needs to be mapped to a target field that is a date. Which task will accomplish this? 
A. Perform a datatype conversion using DateToString function inside the Transformer stage. 
B. Use a Modify stage to perform the type conversion. 
C. DataStage automatically performs this type conversion by default. 
D. Use a Copy stage to perform the type conversion. 
Answer: B 
 
 
 
 QUESTIONS 
To encourage users to update the short description for a job, how can you make the short 
description visible and updateable on the main canvas? 
A. Click the Show Job Short Description option in the Job Properties. 
B. Right-click on the job canvas and choose Show Job Short Description in the submenu. 
C. Add an Annotation stage to the job canvas and copy and paste in the short description. 
D. Add a Description Annotation field to the job canvas and select the Short Description 
property. 
Answer: D 
 
 
 
 QUESTIONS 
A single sequential file exists on a single node. To read this sequential file in parallel, 
what should be done?
A. Set the Execution mode to "Parallel". 
B. A sequential file cannot be read in parallel using the Sequential File stage. 
C. Select "File Pattern" as the Read Method. 
D. Set the "Number of Readers Per Node" optional property to a value greater than 1. 
Answer: D 
 
 
 
 QUESTIONS 
What is the default data type produced by the Aggregator stage?
A. decimal [38,9] 
B. integer 
C. single precision floating point 
D. double precision floating point 
Answer: D 
QUESTIONS 
Which two statements are true of the Merge stage? (Choose two.) 
A. The Merge stage supports inner, left outer, and right outer joins. 
B. All the inputs to the Merge stage must be sorted by the key column(s). 
C. The Merge stage supports only two input links. 
D. The columns that are used to merge the incoming data must be identically named. 
Answer: B,D 
 
 
 
 QUESTIONS 
Job run details for a specific invocation of a multi-instance job can be viewed by which 
two clients? (Choose two.) 
A. dsjobinfo 
B. DataStage Director 
C. dsjob 
D. DataStage Manager 
Answer: B,C 
 
 
 
 QUESTIONS 
Your job reads from a file using a Sequential File stage running sequentially. You are 
using a Transformer following the Sequential File stage to format the data in some of the 
columns. Which partitioning algorithm would yield optimized performance? 
A. Hash 
B. Random 
C. Round Robin 
D. Entire 
Answer: C 
 
 
 
 QUESTIONS 
Which three accurately describe the differences between a DataStage server root 
installation and a non-root installation? (Choose three.) 
A. A non-root installation enables auto-start on reboot. 
B. A root installation must specify the user "dsadm" as the DataStage administrative user. 
C. A non-root installation inherits the permissions of the user who starts the DataStage services. 
D. A root installation will start DataStage services in impersonation mode. 
E. A root installation enables auto-start on reboot. 
Answer: C,D,E 
 
 
 
 QUESTIONS 
When a sequential file is read using a Sequential File stage, the parallel engine inserts an 
operator to convert the data to the internal format. Which operator is inserted? 
A. import operator 
B. copy operator 
C. tsort operator 
D. export operator 
Answer: A 
 
 
 
 QUESTIONS 
Which two statements regarding the usage of data types in the parallel engine are correct? 
(Choose two.) 
A. The best way to import RDBMS data types is using the ODBC importer. 
B. The parallel engine will use its interpretation of the Oracle meta data (e.g., exact data 
types) based on interrogation of Oracle, overriding what you may have specified in the Columns tab. 
C. The best way to import RDBMS data types is using the Import Orchestrate Schema
Definitions using orchdbutil. 
D. The parallel engine and server engine have exactly the same data types so there is no 
conversion cost overhead from moving data between the engines. 
Answer: B,C 
QUESTIONS 
You must create a job that extracts data from multiple DB2/UDB databases on mainframe 
and AS/400 platforms without any database-specific client software other than 
what is included with DataStage. Which two stages will let you parameterize the options 
required to connect to the database and enable you to use the same job if the source 
metadata matches all source tables? (Choose two.) 
A. DB2/UDB Enterprise stage 
B. ODBC Enterprise stage 
C. DB2/UDB API stage 
D. Dynamic RDBMS stage 
Answer: B,D 
 
 
 
 QUESTIONS 
Which environment variable controls whether performance statistics can be displayed in 
Designer? 
A. APT_NO_JOBMON 
B. APT_PERFORMANCE_DATA 
C. APT_PM_SHOW_PIDS 
D. APT_RECORD_COUNTS 
Answer: A 
 
 
 
 QUESTIONS 
Data volumes have grown significantly in the last month. A parallel job that used to run 
well is now using unacceptable amounts of disk space and running slowly. You have 
reviewed the job and explicitly defined the sorting and partitioning requirements for each 
stage but the behavior of the job has not changed. 
Which two actions improve performance based on this information? (Choose two.) 
A. Change the sort methods on all sorts to "Don't Sort - Previously Grouped". 
B. Enable the environment variable APT_NO_SORT_INSERTION in the job. 
C. Increase the value of environment variable APT_PARTITION_NUMBER to increase 
the level of parallelism for the sorts. 
D. Enable the environment variable APT_SORT_INSERTION_CHECK_ONLY in the job. 
Answer: B,D 
 
 
 
 QUESTIONS 
Which two job design techniques can be used to give unique names to sequential output 
files that are used in multi-instance jobs? (Choose two.) 
A. Use the DataStage DSJobInvocationId macro to prepend/append the Invocation Id to 
the file name. 
B. Use parameters to identify file names. 
C. Use a Transformer Stage variable to generate the name. 
D. Use the Generate Unique Name property of the Sequential File Stage. 
Answer: A,B 
 
 
 
 QUESTIONS 
Jobs that use the Sort stage are running slow due to the amount of data being processed. 
Which Sort stage property or environment variable can be modified to improve 
performance?
A. Sort Stage Max Memory Limit property 
B. Sort Stage Restrict Memory Usage property 
C. APT_SORT_MEMORY_SIZE 
D. APT_AUTO_TRANSPORT_SIZE 
Answer: B 
 
 
 
 QUESTIONS 
A job has two input sources that need to be combined. Each input source exceeds 
available physical memory. The files are in the same format and must be combined using 
a key value. It is guaranteed that there will be at least one match. 
Given the above scenario, which stage would consume the least amount of physical 
memory? 
A. Funnel 
B. Merge 
C. Lookup 
D. Transformer 
Answer: B 
QUESTIONS 
What determines the degree of parallelism with which the Teradata Enterprise operator 
will read from a database? 
A. the value of the sessionsperplayer option found on the Additional Connections Options panel 
B. the number of Teradata AMPs 
C. the value of the Teradata MAXLOADTASKS parameter 
D. the number of nodes specified in the APT_CONFIG_FILE 
Answer: B 
 
 
 
 QUESTIONS 
What would require creating a new parallel Custom stage rather than a new parallel 
BuildOp stage? 
A. A Custom stage can be created with properties. BuildOp stages cannot be created with 
properties. 
B. In a Custom stage, the number of input links does not have to be fixed, but can vary, 
for example from one to two. BuildOp stages require a fixed number of input links. 
C. Creating a Custom stage requires knowledge of C/C++. You do not need knowledge 
of C/C++ to create a BuildOp stage. 
D. Custom stages can be created for parallel execution. BuildOp stages can only be built 
to run sequentially. 
Answer: B 
 
 
 
 QUESTIONS 
Which environment variable, when set to true, causes a report to be produced which 
shows the operators, processes and data sets in the job? 
A. APT_DUMP_SCORE 
B. APT_JOB_REPORT 
C. APT_MONITOR_SIZE 
D. APT_RECORD_COUNTS 
Answer: A 
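
For reference, a minimal sketch of enabling this variable for a single run (it can also be set as a project default in Administrator). This assumes $APT_DUMP_SCORE has been added to the job's parameter list; the project and job names are placeholders, and the score then appears in the Director log: 
    dsjob -run -param '$APT_DUMP_SCORE=True' -jobstatus myproject myjob 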
 
 
 
 QUESTIONS 
The source stream contains customer records. Each record is identified by a CUSTID 
field. It is known that the stream contains duplicate records, that is, multiple records with 
the same CUSTID value. The business requirement is to add a field named NUMDUPS 
to each record that contains the number of duplicates and write the results to a target DB2 
table. 
Which job design would accomplish this? 
A. Send the incoming records to a Transformer stage. Use a Hash partitioning method 
with CUSTID as the key and sort by CUSTID. Use stage variables to keep a running 
count of the number of each new CUSTID. Add this count to a new output field named 
NUMDUPS then load the results into the DB2 table. 
B. Use a Modify stage to add the NUMDUPS field to the input stream then process the 
data via an Aggregator stage Group and CountRows options on CUSTID with the result 
of the sum operation sent to the NUMDUPS column in the Mapping tab for load into the 
DB2 table. 
C. Use a Copy stage to split the incoming records into two streams. One stream goes to 
an Aggregator stage that groups the records by CUSTID and counts the number of 
records in each group and outputs the results to the NUMDUPS field. The output from 
the Aggregator stage is then joined to the other stream using a Join stage on CUSTID and 
the results are then loaded into the DB2 table. 
D. Use an Aggregator stage to group the incoming records by CUSTID and to count the 
number of records in each group then load the results into the DB2 table. 
Answer: C 
 
 
 
 QUESTIONS 
A client requires that a database table load be done using two jobs. The first job writes to a 
dataset. The second job reads the dataset and loads the table. The two jobs are connected 
in a Job Sequence. What are three benefits of this approach? (Choose three.) 
A. The time it takes to load the table is reduced. 
B. The database table can be reloaded after a failure without re-reading the source data. 
C. The dataset can be used by other jobs even if the database load fails. 
D. The dataset can be read if the database is not available. 
E. The data in the dataset can be archived and shared with other external applications. 
Answer: B,C,D 
 
 
 
 QUESTIONS 
Which two tasks will create DataStage projects? (Choose two.) 
A. Export and import a DataStage project from DataStage Manager. 
B. Add new projects from DataStage Administrator. 
C. Install the DataStage engine. 
D. Copy a project in DataStage Administrator. 
Answer: B,C 
 
 QUESTIONS 
Which three stages support the dynamic (runtime) definition of the physical column 
metadata? (Choose three.) 
A. the Sequential stage 
B. the Column Export stage 
C. the CFF stage 
D. the DRS stage 
E. the Column Import stage 
Answer: A,B,E 
 
 
 
 QUESTIONS 
You are reading customer data using a Sequential File stage and sorting it by customer 
ID using the Sort stage. The data is to be written to a sequential file in sorted order. 
Which collection method is more likely to yield optimal performance without violating 
the business requirements? 
A. Sort Merge on customer ID 
B. Auto 
C. Round Robin 
D. Ordered 
Answer: A 
 
 
 
 QUESTIONS 
A new column is required as part of a sorted dataset, to be set to 1 when the value of the sort 
key changes and to 0 when the value of the sort key is the same as in the prior record. Which 
statement is correct? 
A. This can be handled entirely within the Sort stage by setting the Create Key Change 
Column to True. 
B. This can be handled within the Sort stage by including a new column name in the 
output tab of the stage and by placing an "if, then, else" expression in the Derivation field 
in the stage Mapping tab to generate the value for the column. 
C. This can be handled entirely within the Sort stage by setting the Create Cluster Key 
Change Column to True. 
D. This cannot be handled entirely within the Sort stage. 
Answer: A 
 
 
 
 QUESTIONS 
Which two statements describe functionality that is available using the dsjob command?
(Choose two.) 
A. dsjob can be used to get a report containing job, stage, and link information. 
B. dsjob can be used to add a log entry for a specified job. 
C. dsjob can be used to compile a job. 
D. dsjob can be used to export job executables. 
Answer: A,B 
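
For reference, hedged sketches of the two capabilities named in the correct options (project and job names are placeholders; verify exact flag spellings with dsjob -help on your release). The -report command returns job, stage, and link information; -log adds a log entry, reading the message text from standard input: 
    dsjob -report myproject BuildWarehouse DETAIL 
    echo "nightly load started" | dsjob -log -info myproject BuildWarehouse 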
 
 
 
 QUESTIONS 
You have a parallel job that does not scale beyond two nodes. After investigation you 
find that the data has been partitioned on gender and you have an Aggregator stage that is 
accumulating totals based on gender using a sort method. Which technique should you 
use to allow the job to scale? 
A. Change the preserve partitioning option on the stage ahead of the aggregator to clear partitioning. 
B. Change the aggregation method to hash to eliminate the blocking sort operation. 
C. Add an additional column for partitioning to result in additional data partitions. 
D. Change the partitioning key to one that results in more partitions, add a second 
Aggregator stage that re-aggregates based on gender. 
Answer: D 
 
 
 
 QUESTIONS 
You have three output links coming out of a Transformer. Two of them (A and B) have 
constraints you have defined. The third you want to be an Otherwise link that is to 
contain all of the rows that do not satisfy the constraints of A and B. This Otherwise link 
must work correctly even if the A and B constraints are modified. Which two are 
required? (Choose two.) 
A. The Otherwise link must be first in the link ordering. 
B. A constraint must be coded for the Otherwise link. 
C. The Otherwise link must be last in the link ordering. 
D. The Otherwise check box must be checked. 
Answer: C,D 
QUESTIONS 
In which two scenarios should a sparse lookup be used in place of a normal lookup to 
retrieve data from an Oracle database? (Choose two.) 
A. A database function that returns the current value of an Oracle object is required as 
part of the result set. 
B. When the number of input rows is significantly smaller than the number of rows in the 
lookup table. 
C. When the Oracle database is on the same system as the DataStage server. 
D. When the number of input rows is significantly larger than the number of rows in the 
lookup table. 
Answer: A,B 
 
 
 
 QUESTIONS 
Which statement about job parameter usage is true?
A. You can change job parameters while a job is running and the changes will 
immediately be applied mid-job. 
B. You can use environment variables to set parameter values linked to the Job Sequence. 
C. You can change the parameter values in an initialization (.ini) file linked to a Job Sequence. 
D. Changes to the job parameters in the Designer do not require a recompile to be applied to the job. 
Answer: B 
 
 
 
 QUESTIONS 
What are two causes for records being incorrectly identified as an Edit in the Change 
Capture stage? (Choose two.) 
A. The before and after input datasets are not in the same sort order. 
B. At least one field containing unique values for all records is specified as a change value. 
C. The key fields are not unique. 
D. A key field contains null values in some records. 
Answer: A,C 
 
 
 
 QUESTIONS 
Which two statements describe the properties needed by the Oracle Enterprise stage to 
operate in parallel direct load mode? (Choose two.) 
A. The table can have any number of indexes but the job will run slower since the 
indexes will have to be updated as the data is loaded. 
B. The table must not have any indexes, or you must include the rebuild Index Mode 
property, unless the only index on the table is the primary key, in which case you can use 
the Disable Constraints property. 
C. Only index organized tables allow you to use parallel direct load mode and you need 
to set both DIRECT and PARALLEL properties to True, otherwise the load will be 
executed in parallel but will not use direct mode. 
D. Set the Write Method to Load. 
Answer: B,D 
 
 
 
 QUESTIONS 
How does a Join stage process an Inner join? 
A. It transfers all values from the right data set and transfers values from the left data set 
and intermediate data sets only where key columns match. 
B. It transfers records from the input data sets whose key columns contain equal values to 
the output data set. 
C. It transfers all values from the left data set but transfers values from the right data set 
and intermediate data sets only when key columns match. 
D. It transfers records in which the contents of the key columns are equal from the left 
and right input data sets to the output data set. It also transfers records whose key 
columns contain unequal values from both input data sets to the output data set. 
Answer: B 
 
 
 
 QUESTIONS 
Which partitioning method requires specifying a key?
A. Random 
B. DB2 
C. Entire 
D. Modulus 
Answer: D 
QUESTIONS 
Detail sales transaction data is received on a monthly basis in a single file that includes 
CustID, OrderDate, and Amount fields. For a given month, a single CustID may have 
multiple transactions. 
Which method would remove duplicate CustID rows, selecting the largest transaction 
amount for a given CustID? 
A. Hash partition on CustID. Perform a non-stable, unique Sort on CustID and Amount (descending). 
B. Use Auto partitioning on all links. Perform a non-unique Sort on CustID and Amount 
(ascending) followed by a RemoveDuplicates on CustID and Amount (duplicateToRetain=Last). 
C. Use Auto partitioning. Perform a unique Sort on CustID and Amount (ascending).
D. Hash partition on CustID. Perform a non-unique Sort on CustID and Amount 
(descending). Use same partitioning, followed by a RemoveDuplicates on CustID (duplicateToRetain=First). 
Answer: D 
 
 
 
 QUESTIONS 
Your business requirement is to read data from three Oracle tables that store historical 
sales data from three regions for loading into a single Oracle table. The table definitions 
are the same for all three tables; the only difference is that each table contains data for a 
particular region. 
Which two statements describe how this can be done? (Choose two.) 
A. Create a job with a single Oracle Enterprise stage that executes a custom SQL 
statement with a FETCH ALL operator that outputs the data to an Oracle Enterprise stage. 
B. Create a job with a single Oracle Enterprise stage that executes a custom SQL 
statement with a UNION ALL operator that outputs the data to an Oracle Enterprise stage. 
C. Create a job with three Oracle Enterprise stages to read from the tables and output to a 
Collector stage which in turn outputs the data to an Oracle Enterprise stage. 
D. Create a job with three Oracle Enterprise stages to read from the tables and output to a 
Funnel stage which in turn outputs the data to an Oracle Enterprise stage. 
Answer: B,D 
 
 
 
 QUESTIONS 
When importing a COBOL file definition, which two are required? (Choose two.) 
A. The file you are importing is accessible from your client workstation. 
B. The file you are importing contains level 01 items. 
C. The column definitions are in a COBOL copybook file and not, for example, in a 
COBOL source file. 
D. The file does not contain any OCCURS DEPENDING ON clauses. 
Answer: A,B 
 
 
 
 QUESTIONS 
Which two statements are true about the Join stage? (Choose two.) 
A. All the inputs to the Join stage must be sorted by the Join key. 
B. Join stages can have reject links that capture rows without matches. 
C. The Join stage supports inner, left outer, and right outer joins. 
D. The Join stage uses more memory than the Lookup stage. 
Answer: A,C 
 
 
 
 QUESTIONS 
A client requires that any job that aborts in a Job Sequence halt processing. Which three 
activities would provide this capability? (Choose three.) 
A. Nested Condition Activity 
B. Exception Handler 
C. Sequencer Activity 
D. Sendmail Activity 
E. Job trigger 
Answer: A,B,E 
 
 
 
 QUESTIONS 
What is the default data type produced by the Aggregator stage?
A. integer 
B. double precision floating point 
C. decimal [38,9] 
D. single precision floating point 
Answer: B 
QUESTIONS 
You have set the "Preserve Partitioning" flag for a Sort stage to request that the next 
stage preserves whatever partitioning it has implemented. Which statement describes 
what will happen next? 
A. The job will compile but will abort when run. 
B. The job will not compile. 
C. The next stage can ignore this request but a warning is logged when the job is run 
depending on the stage type that ignores the flag. 
D. The next stage disables the partition options that are normally available in the 
Partitioning tab. 
Answer: C 
 
 
 
 QUESTIONS 
Which statement describes how to add functionality to the Transformer stage?
A. Create a new parallel routine in the Routines category that specifies the name, path, 
type, and return type of a function written and compiled in C++. 
B. Create a new parallel routine in the Routines category that specifies the name, path, 
type, and return type of an external program. 
C. Create a new server routine in the Routines category that specifies the name and 
category of a function written in DataStage Basic. 
D. Edit the C++ code generated by the Transformer stage. 
Answer: A 
 
 
 
 QUESTIONS 
Which two statements are true about usage of the APT_DISABLE_COMBINATION 
environment variable? (Choose two.) 
A. Locks the job so that no one can modify it. 
B. Disabling generates more processes requiring more system resources and memory. 
C. Must use the job design canvas to check which stages are no longer being combined. 
D. Globally disables operator combining. 
Answer: B,D 
 
 
 
 QUESTIONS 
A job design consists of an input fileset followed by a Peek stage, followed by a Filter 
stage, followed by an output fileset. The environment variable 
APT_DISABLE_COMBINATION is set to true, and the job executes on an SMP using a 
configuration file with 8 nodes defined. Assume also that the input dataset was created 
with the same 8 node configuration file. 
Approximately how many data processing processes will this job create? 
A. 32 
B. 8 
C. 16 
D. 1 
Answer: A 
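
For reference, the arithmetic behind this answer: with combining disabled, each of the 4 operators (fileset read, Peek, Filter, fileset write) runs one player process per partition, so 4 operators x 8 nodes gives approximately 32 data-processing processes (the conductor and section-leader processes are not counted here). 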
 
 
 
 QUESTIONS 
A credit card company has about 10 million unique accounts. The company needs to 
determine the outstanding balance of each account by aggregating the previous balance 
with current charges. A DataStage EE job with an Aggregator stage is being used to 
perform this calculation. Which Aggregator method should be used for optimal 
performance? 
A. Sort 
B. Group 
C. Auto 
D. Hash 
Answer: A 
 
 
 QUESTIONS 
Your input rows contain customer data from a variety of locations. You want to select 
just those rows from a specified location based on a parameter value. You are trying to 
decide whether to use a Transformer or a Filter stage to accomplish this. Which statement 
is true? 
A. The Transformer stage will yield better performance because the Filter stage Where 
clause is interpreted at runtime. 
B. You cannot use a Filter stage because you cannot use parameters in a Filter stage 
Where clause. 
C. The Filter stage will yield better performance because it has less overhead than a 
Transformer stage. 
D. You cannot use the Transformer stage because you cannot use parameters in a 
Transformer stage constraint. 
Answer: A 
QUESTIONS 
You need to move a DataStage job from a development server on machine A to a 
production server on machine B. What are two valid ways to do this? (Choose two.) 
A. Use the command line export tool to create a .dsx file on machine A, then move the 
.dsx file to machine B and use the command line import tool to load the .dsx file. 
B. Connect the Manager client to the source project on machine A and create a .dsx file 
of the job then connect the Manager client to the target project on machine B and import the .dsx file. 
C. Use the command line export tool to create a .dsx file on machine A then move the 
.dsx file to the client and use the Manager client to import it.
D. Connect to machine A with the Manager client and create a .dsx file of the job, then 
move the .dsx file to machine B and use the command line import tool. 
Answer: B,D 
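
For reference, a sketch of the command-line export/import route (dscmdexport and dscmdimport ship with the DataStage clients; the flag syntax shown here is from memory and should be verified against your installation, and all host, user, password, and path values are placeholders): 
    dscmdexport /H=devhost /U=dsadm /P=secret devproject C:\exports\loadjob.dsx 
    dscmdimport /H=prodhost /U=dsadm /P=secret prodproject C:\exports\loadjob.dsx 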
 
 
 
 QUESTIONS 
In a Lookup operator, a key column in the stream link is VARCHAR and has Unicode set 
in the Extended attribute while the corresponding column in a reference link, also 
VARCHAR, does not. 
What will allow correct comparison of the data? 
A. Convert both columns to CHAR, pad with spaces, and remove Unicode from the 
Extended attribute in Transformer operators prior to the Lookup operator. 
B. Convert the column in the reference link to the UTF-8 code page using the 
StringToUstring function in a Transformer operator prior to the Lookup operator, and set 
the Extended attribute of the column. 
C. Convert the column in the stream link to the default code page using the 
UstringToString function in a Transformer operator prior to the Lookup operator and 
remove Unicode from the Extended attribute of the column. 
D. Remove Unicode from the Extended attribute of the column from the beginning of the 
job to the Lookup operator and then set the Extended attribute of the column in the output 
mapping section of the Lookup operator. 
Answer: B 
 
 
 
 QUESTIONS 
Which two system variables must be used in a parallel Transformer derivation to 
generate a unique sequence of integers across partitions? (Choose two.) 
A. @PARTITIONNUM 
B. @INROWNUM 
C. @DATE 
D. @NUMPARTITIONS 
Answer: A,D 
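
For reference, a commonly cited derivation that combines these two system variables to produce a unique, 1-based sequence across partitions (a sketch; adjust the offset if a different starting value is needed): 
    (@PARTITIONNUM + 1) + ((@INROWNUM - 1) * @NUMPARTITIONS) 
With 4 partitions this yields 1, 5, 9, ... on partition 0 and 2, 6, 10, ... on partition 1, so no two rows share a value. 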
 
 
 
 QUESTIONS 
A job contains a Sort stage that sorts a large volume of data across a cluster of servers. 
The customer has requested that this sorting be done on a subset of servers identified in 
the configuration file to minimize impact on database nodes. 
Which two steps will accomplish this? (Choose two.) 
A. Create a sort scratch disk pool with a subset of nodes in the parallel configuration file. 
B. Set the execution mode of the Sort stage to sequential. 
C. Specify the appropriate node constraint within the Sort stage. 
D. Define a non-default node pool with a subset of nodes in the parallel configuration 
file. 
Answer: C,D 
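
For reference, a minimal configuration-file sketch showing a non-default node pool (host names, pool name, and paths are placeholders). Only node1 belongs to the "sortwork" pool, so a Sort stage constrained to that pool (Stage page, Advanced tab, node pool constraint) stays off the database node: 
    { 
      node "node1" { 
        fastname "etl1" 
        pools "" "sortwork" 
        resource disk "/ds/data" {pools ""} 
        resource scratchdisk "/ds/scratch" {pools ""} 
      } 
      node "node2" { 
        fastname "db1" 
        pools "" 
        resource disk "/ds/data" {pools ""} 
        resource scratchdisk "/ds/scratch" {pools ""} 
      } 
    } 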
 
 
 
 QUESTIONS 
You are reading customer data using a Sequential File stage and transforming it using the 
Transformer stage. The Transformer is used to cleanse the data by trimming spaces from 
character fields in the input. The cleansed data is to be written to a target DB2 table. 
Which partitioning method would yield optimal performance without violating the 
business requirements? 
A. Hash on the customer ID field 
B. Round Robin 
C. Random 
D. Entire 
Answer: B 
 
 
 
 QUESTIONS 
Which three defaults are set in DataStage Administrator? (Choose three.) 
A. default prompting options, such as Autosave job before compile 
B. default SMTP mail server name 
C. project level default for Runtime Column Propagation 
D. project level defaults for environment variables 
E. project level default for Auto-purge of job log entries 
Answer: C,D,E 
QUESTIONS 
A Varchar(10) field named SourceColumn is mapped to a Char(25) field named 
TargetColumn in a Transformer stage. The APT_STRING_PADCHAR environment 
variable is set in Administrator to its default value. Which technique describes how to 
write the derivation so that values in SourceColumn are padded with spaces in 
TargetColumn? 
A. Include APT_STRING_PADCHAR in your job as a job parameter. Specify the C/C++ 
end of string character (0x0) as its value. 
B. Map SourceColumn to TargetColumn. The Transformer stage will automatically pad with spaces. 
C. Include APT_STRING_PADCHAR in your job as a job parameter. Specify a space as its value. 
D. Concatenate a string of 25 spaces to SourceColumn in the derivation for TargetColumn. 
Answer: C 
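
For reference, the default value of APT_STRING_PADCHAR is 0x0 (ASCII null), which is why Char targets appear null-padded unless it is overridden. A sketch of the job-parameter value from the correct option (the 0x20 notation is one accepted way of expressing a space): 
    APT_STRING_PADCHAR=0x20 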
 
 
 
 QUESTIONS 
Which statement is true about Aggregator Sort and Hash methods when the 
APT_NO_SORT_INSERTION environment variable is set to TRUE? 
A. If you select the Hash method, the Aggregator stage requires the data to have the 
partition sorted by the group key. 
B. If you select the Hash method, the Aggregator stage will partition sort the data by the 
group key before building a hash table in memory. 
C. If you select the Sort method, the Aggregator stage will partition sort the data by the 
group key before performing the aggregation. 
D. If you select the Sort method, the Aggregator stage requires the data to have been 
partition sorted by the group key. 
Answer: D 
 
 
 
 QUESTIONS 
Using FTP, a file is transferred from an MVS system to a LINUX system in binary 
transfer mode. Which data conversion must be used to read a packed decimal field in the 
file? 
A. treat the field as EBCDIC 
B. treat the field as a packed decimal 
C. packed decimal fields are not supported 
D. treat the field as ASCII 
Answer: B 
 
 
 
 QUESTIONS 
Which two statements are true about DataStage Parallel BuildOp stages? (Choose two.) 
A. Unlike standard DataStage stages they do not have properties. 
B. They are coded using C/C++. 
C. They are coded using DataStage Basic. 
D. Table Definitions are used to define the input and output interfaces of the BuildOp. 
Answer: B,D 
 
 
 
 QUESTIONS 
On which two does the number of data files created by a fileset depend? (Choose two.) 
A. the size of the partitions of the dataset 
B. the number of CPUs 
C. the schema of the file 
D. the number of processing nodes in the default node pool 
Answer: A,D 
 
 
 
 QUESTIONS 
An XML file is being processed by the XML Input stage. How can repetition elements be 
identified on the stage? 
A. Set the "Nullable" property for the column on the output link to "Yes". 
B. Set the "Key" property for the column on the output link to "Yes". 
C. Check the "Repetition Element Required" box on the output link tab. 
D. No special settings are required. XML Input stage automatically detects the repetition 
element from the XPath expression. 
Answer: B 
QUESTIONS 
A job requires that data be grouped by primary keys and then sorted within each group 
by secondary keys to reproduce the results of a group-by and order-by clause common to 
relational databases. The designer has chosen to implement this requirement with two 
Sort stages in which the first Sort stage sorts the records by the primary keys. 
Which set of properties must be specified in the second Sort stage? 
A. Specify both sets of keys, with a Sort Key Option of "Don't Sort (Previously Sorted)" 
on the primary keys, "Don't Sort (Previously Grouped)" on the secondary keys, and the Stable Sort option set to True. 
B. Specify only the secondary keys, with a Sort Key Option of "Don't Sort ( Previously 
Sorted)" and the Stable Sort option set to True. 
C. Specify both sets of keys, with a Sort Key Option of "Don't Sort (Previously 
Grouped)" on the primary keys, and "Sort" on the secondary keys. 
D. Specify only the secondary keys, with a Sort Key Option of "Sort" and the Stable Sort 
option set to True. 
Answer: C 
 
 
 
 QUESTIONS 
Using a second (reject) output link from a Sequential File read stage, which two methods 
will allow the number of records rejected by the Sequential File stage to be captured and 
logged to a file? (Choose two.) 
(Note: The log file should contain only the number of records rejected, not the records 
themselves. Assume the file is named reject_count.stage_10.) 
A. Send the rejected records to a dataset named rejects_stage_10.ds, and define an After 
Job Subroutine to execute the command 'dsrecords rejects_stage_10.ds > 
reject_count.stage_10'. 
B. Send the rejected records to a Column Generator stage to generate a constant key 
value, use the Aggregator stage with the generated key value to count the records, then 
write the results to the log file using another Sequential File stage. 
C. Send the rejected records to a Peek stage and define an After Job Subroutine to 
execute the command 'dsjob -report peek_rejects > reject_count.stage_10' with the 
appropriate -jobid and -project arguments. 
D. Send the rejected records to a Change Capture stage, use a Sort Funnel stage to sort it 
all out, then write the resulting record count to the log file using another Sequential File 
stage. 
Answer: A,B 
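
For reference, a sketch of the option A mechanism: the After Job Subroutine (e.g. ExecSH) runs the dsrecords utility against the reject dataset and redirects its one-line record count to the log file: 
    dsrecords rejects_stage_10.ds > reject_count.stage_10 
    cat reject_count.stage_10        # typical output: 117 records 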
 
 
 
 QUESTIONS 
Which requirement must be satisfied to read from an Oracle table in parallel using the 
Oracle Enterprise stage? 
A. Set the environment variable $ORA_IGNORE_CONFIG_FILE_PARALLELISM. 
B. Configure the source table to be partitioned within the source Oracle database. 
C. Oracle Enterprise stage always reads in parallel. 
D. Specify the Partition Table option in the Oracle source stage. 
Answer: D 
 
 
 
 QUESTIONS 
When a sequential file is written using a Sequential File stage, the parallel engine inserts 
an operator to convert the data from the internal format to the external format. Which 
operator is inserted? 
A. export operator 
B. copy operator 
C. import operator 
D. tsort operator 
Answer: A 
 
 
 
 QUESTIONS 
Which "Reject Mode" option in the Sequential File stage will write records to a reject 
link? 
A. Output 
B. Fail 
C. Drop 
D. Continue 
Answer: A 
 
 
 
 QUESTIONS 
A file created by a mainframe COBOL program is imported to the DataStage parallel 
engine. The resulting record schema contains a fixed length vector of the three most 
recent telephone numbers called by each customer. The processing requirements state 
that each of the three most recent calls must be processed differently. 
Which stage should be used to restructure the record schema so that the Transformer 
stage can be used to process each of the three telephone numbers found in the vector? 
A. the Make Vector stage followed by a Split Subrecord stage 
B. the Split Subrecord stage followed by a Column Export stage 
C. the Promote Subrecord stage 
D. the Split Vector stage 
Answer: D 
QUESTIONS 
The parallel dataset input into a Transformer stage contains null values. What should you 
do to properly handle these null values? 
A. Convert null values to valid values in a stage variable. 
B. Convert null values to a valid value in the output column derivation. 
C. Null values are automatically converted to blanks and zero, depending on the target data type. 
D. Trap the null values in a link constraint to avoid derivations. 
Answer: A 
 
 
 
 QUESTIONS 
Which two statements are true about XML Meta Data Importer? (Choose two.) 
A. XML Meta Data Importer is capable of reporting syntax and semantic errors from an XML file. 
B. XPATH expressions that are created during XML metadata import cannot be modified. 
C. XML Meta Data Importer can import Table Definitions from only XML documents. 
D. XPATH expressions that are created during XML metadata import are used by XML 
Input stage and XML Output stage. 
Answer: A,D 
 
 
 
 QUESTIONS 
Which type of file is both partitioned and readable by external applications? 
A. fileset 
B. Lookup fileset 
C. dataset 
D. sequential file 
Answer: A 
 
 
 
 QUESTIONS 
Which two describe a DataStage EE installation in a clustered environment? (Choose 
two.) 
A. The C++ compiler must be installed on all cluster nodes. 
B. Transform operators must be copied to all nodes of the cluster. 
C. The DataStage parallel engine must be installed or accessible in the same directory on 
all machines in the cluster. 
D. A remote shell must be configured to support communication between the conductor 
and section leader nodes. 
Answer: C,D 
 
 
 
 QUESTIONS 
What is the purpose of the uv command in a UNIX DataStage server?
A. Cleanup resources from a failed DataStage job. 
B. Start and stop the DataStage engine. 
C. Provide read access to a DataStage EE configuration file. 
D. Report DataStage client connections. 
Answer: B 
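
For reference, a sketch of stopping and restarting the engine with the uv command (run as the DataStage administrative user on the server; $DSHOME is the engine install directory): 
    cd $DSHOME 
    bin/uv -admin -stop 
    bin/uv -admin -start 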
 
 
 
 QUESTIONS 
Establishing a consistent naming standard for link names is useful in which two ways? 
(Choose two.) 
A. using less memory at job runtime 
B. specifying link order without having to specify stage properties to generate correct results 
C. easing use of captured job statistics (e.g., row counts in an XML file) by processes or 
users outside of DataStage 
D. improving developer productivity and quality by distinguishing link names within stage editors 
Answer: C,D 
QUESTIONS 
Which technique would you use to abort a job from within the Transformer stage?
A. Call the DSLogFatal function from a stage variable derivation. 
B. Call the DSStopJob function from a stage or column derivation. 
C. Create a dummy output link with a constraint that tests for the condition to abort on - 
set the "Abort After Rows" property to 1. 
D. Use the SETENV function to set the environmental APT_PM_KILL_NOWAIT. 
Answer: C 
 
 
 
 QUESTIONS 
Your job reads from a file using a Sequential File stage running sequentially. The 
DataStage server is running on a single SMP system. One of the columns contains a 
product ID. In a Lookup stage following the Sequential File stage, you decide to look up 
the product description from a reference table. Which two partition settings would 
correctly find matching product descriptions? (Choose two.) 
A. Hash algorithm, specifying the product ID field as the key, on both the link coming 
from the Sequential File stage and the link coming from the reference table. 
B. Round Robin on both the link coming from the Sequential File stage and the link 
coming from the reference table. 
C. Round Robin on the link coming from the Sequential File stage and Entire on the link 
coming from the reference table. 
D. Entire on the link coming from the Sequential File stage and Hash, specifying the 
product ID field as the key, on the link coming from the reference table. 
Answer: A,C 
 
 
 
 QUESTIONS 
You need to move a DataStage job from a development server on machine A to a 
production server on machine B. 
What are two valid ways to do this? (Choose two.) 
A. Connect the Manager client to the source project on machine A and create a .dsx file 
of the job then connect the Manager client to the target project on machine B and import the .dsx file. 
B. Use the command line export tool to create a .dsx file on machine A then move the 
.dsx file to the client and use the Manager client to import it.
C. Use the command line export tool to create a .dsx file on machine A, then move the 
.dsx file to machine B and use the command line import tool to load the .dsx file. 
D. Connect to machine A with the Manager client and create a .dsx file of the job, then 
move the .dsx file to machine B and use the command line import tool. 
Answer: A,D 
 
 
 
 QUESTIONS 
Which two would cause a stage to sequentially process its incoming data? (Choose two.) 
A. The execution mode of the stage is sequential. 
B. The stage follows a Sequential File stage and its partitioning algorithm is Auto. 
C. The stage follows a Sequential File stage and the Preserve partitioning has been set to Clear. 
D. The stage has a constraint with a node pool containing only one node. 
Answer: A,D 
 
 
 
 QUESTIONS 
A customer is interested in selecting the right RDBMS environment to run DataStage 
Enterprise Edition to solve a multi-file and relational database data merge. The customer 
realizes the value of running in parallel and is interested in knowing which RDBMS stage 
will match the internal data partitioning of a given RDBMS. 
Which RDBMS stage will satisfy the customer's request? 
A. DB2/UDB Enterprise 
B. Oracle Enterprise 
C. ODBC Enterprise 
D. Sybase Enterprise 
Answer: A 
 
 
 
 QUESTIONS 
You have a compiled job and parallel configuration file. Which three methods can be 
used to determine the number of nodes actually used to run the job in parallel? (Choose 
three.) 
A. within DataStage Designer, generate report and retain intermediate XML 
B. within DataStage Designer, show performance statistics 
C. within DataStage Director, examine log entry for parallel configuration file 
D. within DataStage Director, examine log entry for parallel job score 
E. within DataStage Director, open a new DataStage Job Monitor 
Answer: C,D,E 
QUESTIONS 
Which three are valid trigger expressions in a stage in a Job Sequence? (Choose three.) 
A. Equality(Conditional) 
B. Unconditional 
C. ReturnValue(Conditional) 
D. Difference(Conditional) 
E. Custom(Conditional) 
Answer: B,C,E 
 
 
 
 QUESTIONS 
Which three are keyless partitioning methods? (Choose three.) 
A. Entire 
B. Modulus 
C. Round Robin 
D. Random 
E. Hash 
Answer: A,C,D 
 
 
 
 QUESTIONS 
What is the purpose of the Oracle Enterprise Stage Exception Table property? 
A. Enables you to specify a table which is used to capture selected column data that 
meets user-defined criteria for debugging purposes. The table needs to exist before the job is run. 
B. Enables you to specify a table which is used to record the ROWID information on 
rows that violate constraints during upsert/write operations. 
C. Enables you to specify a table which is used to record ROWID information on rows 
that violate constraints when constraints are re-enabled after a load operation. 
D. Enables you to specify that a table should be created (if it does not exist) to capture all 
exceptions when accessing Oracle.
Answer: C 
 
 
 
 QUESTIONS 
Which two statements about shared containers are true? (Choose two.) 
A. You can make a local container a shared container but you cannot make a shared 
container a local container. 
B. Shared containers can be used to make common job components that are available 
throughout the project. 
C. Changes to a shared container are automatically available in all dependent jobs 
without a recompile. 
D. Server shared containers allow for DataStage Server Edition components to be placed 
in a parallel job. 
Answer: B,D 
 
 
 
 QUESTIONS 
Which environment variable controls whether performance statistics can be displayed in 
Designer? 
A. APT_RECORD_COUNTS 
B. APT_PM_SHOW_PIDS 
C. APT_NO_JOBMON 
D. APT_PERFORMANCE_DATA 
Answer: C 
 
 
 
 QUESTIONS 
You are reading data from a Sequential File stage. The column definitions are specified 
by a schema. You are considering whether to follow the Sequential File stage by either a 
Transformer or a Modify stage. Which two criteria require the use of one of these stages 
instead of the other? (Choose two.) 
A. You want to dynamically specify the name of an output column based on a job 
parameter, therefore you select a Modify stage. 
B. You want to replace NULL values by a specified constant value, therefore you select a 
Modify stage. 
C. You want to add additional columns, therefore you select a Transformer stage. 
D. You want to concatenate values from multiple input rows and write this to an output 
link, therefore you select a Transformer stage. 
Answer: A,D 
QUESTIONS 
During a sequential file read, you experience an error with the data. What is a valid 
technique for identifying the column causing the difficulty? 
A. Set the "data format" option to text on the Record Options tab. 
B. Enable tracing in the DataStage Administrator Tracing panel. 
C. Enable the "print field" option at the Record Options tab. 
D. Set the APT_IMPORT_DEBUG environmental variable. 
Answer: C 
 
 
 
 QUESTIONS 
What does setting an environment variable, specified as a job parameter, to PROJDEF do?
A. Populates the environment variable with the value of PROJDEF. 
B. Explicitly unsets the environment variable. 
C. Uses the value for the environment variable as shown in the DataStage Administrator. 
D. Uses the current setting for the environment variable from the operating system. 
Answer: C 
 
 
 
 QUESTIONS 
How are transaction commit operations handled by the DB2/UDB Enterprise stage? 
A. Commit operations can only be defined by the number of rows since the start of a transaction. 
B. Transaction commits can be controlled by defining the number of rows per transaction 
or by a specific time period defined by the number of seconds elapsed between commits. 
C. Commit operations can only be defined by the number of seconds since the start of a transaction. 
D. Commit operations can be defined globally by setting APT_TRANSACTION_ROWS variable. 
Answer: B 
 
 
 
 QUESTIONS 
A customer is interested in selecting the right RDBMS environment to run DataStage 
Enterprise Edition to solve a multi-file and relational database data merge. The customer 
realizes the value of running in parallel and is interested in knowing which RDBMS stage 
will match the internal data partitioning of a given RDBMS. Which RDBMS stage will 
satisfy the customer's request? 
A. Sybase Enterprise 
B. ODBC Enterprise 
C. Oracle Enterprise 
D. DB2/UDB Enterprise 
Answer: D 
 
 
 
 QUESTIONS 
Which two statements are correct when using the Change Capture and Change Apply 
stages together on the same data? (Choose two.) 
A. You must apply a Differences stage to the output of the Change Capture stage before 
passing the data into the Change Apply stage. 
B. A Compare stage must be used following the Change Capture stage to identify 
changes to the change_code column values. 
C. The input to the Change Apply stage must have the same key columns as the input to 
the prior Change Capture stage. 
D. Both inputs of the Change Apply stage are designated as partitioned using the same 
partitioning method. 
Answer: C,D 
 
 
 
 QUESTIONS 
Which two would require the use of a Transformer stage instead of a Copy stage?
(Choose two.) 
A. Drop a column. 
B. Send the input data to multiple output streams. 
C. Trim spaces from a character field. 
D. Select certain output rows based on a condition. 
Answer: C,D 
QUESTIONS 
Which two statements are correct about XML stages and their usage? (Choose two.) 
A. XML Input stage converts XML data to tabular format. 
B. XML Output stage converts tabular data to XML hierarchical structure. 
C. XML Output stage uses XSLT stylesheet for XML to tabular transformations. 
D. XML Transformer stage converts XML data to tabular format. 
Answer: A,B 
 
 
 
 QUESTIONS 
Which three statements about the Enterprise Edition parallel Transformer stage are 
correct? (Choose three.) 
A. The Transformer allows you to copy columns. 
B. The Transformer allows you to do lookups. 
C. The Transformer allows you to apply transforms using routines. 
D. The Transformer stage automatically applies 'NullToValue' function to all 
non-nullable output columns. 
E. The Transformer allows you to do data type conversions. 
Answer: A,C,E 
 
 
 
 QUESTIONS 
Which partitioning method would yield the most even distribution of data without 
duplication? 
A. Entire 
B. Round Robin 
C. Hash 
D. Random 
Answer: B 
 
 
 
 QUESTIONS 
Using FTP, a file is transferred from an MVS system to a LINUX system in binary 
transfer mode. Which data conversion must be used to read a packed decimal field in the 
file? 
A. treat the field as a packed decimal 
B. packed decimal fields are not supported 
C. treat the field as ASCII 
D. treat the field as EBCDIC 
Answer: A 
 
 
 
 QUESTIONS 
To encourage users to update the short description for a job, how can you make the short 
description visible and updateable on the main canvas? 
A. Add a Description Annotation field to the job canvas and select the Short Description property. 
B. Right-click on the job canvas and choose Show Job Short Description in the submenu. 
C. Click the Show Job Short Description option in the Job Properties. 
D. Add an Annotation stage to the job canvas and copy and paste in the short description. 
Answer: A 
 
 
 
 QUESTIONS 
The last two steps of a job are an Aggregator stage using the Hash method and a 
Sequential File stage with a Collector type of Auto that creates a comma delimited output 
file for use by a common spreadsheet program. The job runs a long time because data 
volumes have increased. Which two changes would improve performance? (Choose two.) 
A. Change the Sequential stage to use a Sort Merge collector on the aggregation keys. 
B. Change the Aggregator stage to use the sort method. Hash and sort on the aggregation keys. 
C. Change the Sequential stage to a Data Set stage to allow the write to occur in parallel. 
D. Change the Aggregator stage to a Transformer stage and use stage variables to 
accumulate the aggregations. 
Answer: A,B 
QUESTIONS 
Your business requirement is to read data from three Oracle tables that store historical 
sales data from three regions for loading into a single Oracle table. The table definitions 
are the same for all three tables; the only difference is that each table contains data for a 
particular region. 
Which two statements describe how this can be done? (Choose two.) 
A. Create a job with a single Oracle Enterprise stage that executes a custom SQL 
statement with a UNION ALL operator that outputs the data to an Oracle Enterprise stage. 
B. Create a job with three Oracle Enterprise stages to read from the tables and output to a 
Collector stage which in turn outputs the data to an Oracle Enterprise stage. 
C. Create a job with three Oracle Enterprise stages to read from the tables and output to a 
Funnel stage which in turn outputs the data to an Oracle Enterprise stage. 
D. Create a job with a single Oracle Enterprise stage that executes a custom SQL 
statement with a FETCH ALL operator that outputs the data to an Oracle Enterprise stage. 
Answer: A,C 
 
 
 
 QUESTIONS 
In a Transformer you add a new column to an output link named JobName that is to 
contain the name of the job that is running. What can be used to derive values for this 
column? 
A. a DataStage function 
B. a link variable 
C. a system variable 
D. a DataStage macro 
Answer: D 
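
For reference, the macro in question is DSJobName, which returns the name of the running job; a sketch of the Transformer derivation for the JobName output column is simply: 
    DSJobName 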
 
 
 
 QUESTIONS 
Which statement describes a process for capturing a COBOL copybook from a z/OS 
system? 
A. Select the COBOL copybook using the Browse button and capture the COBOL 
copybook with Manager. 
B. FTP the COBOL copybook to the server platform in text mode and capture the 
metadata through Manager. 
C. FTP the COBOL copybook to the client workstation in binary and capture the 
metadata through Manager. 
D. FTP the COBOL copybook to the client workstation in text mode and capture the 
copybook with Manager. 
Answer: D 
 
 
 
 QUESTIONS 
A dataset needs to be sorted to retain a single occurrence of multiple records that have 
identical sorting key values. Which Sort stage option can be selected to achieve this? 
A. Stable Sort must be set to True. 
B. Sort Group must include the sort key values. 
C. Allow Duplicates must be set to False. 
D. Unique Sort Keys must be set to True. 
Answer: C 
 
 
 
 QUESTIONS 
Which two requirements must be in place when using the DB2 Enterprise stage in LOAD 
mode? (Choose two.) 
A. Tablespace cannot be addressed by anyone else while the load occurs. 
B. User running the job has dbadm privileges. 
C. Tablespace must be placed in a load pending state prior to the job being launched. 
D. Tablespace must be in read only mode prior to the job being launched. 
Answer: A,B 
 
 
 
 QUESTIONS 
Which statement describes a process for capturing a COBOL copybook from a z/OS 
system? 
A. FTP the COBOL copybook to the server platform in text mode and capture the 
metadata through Manager. 
B. Select the COBOL copybook using the Browse button and capture the COBOL 
copybook with Manager. 
C. FTP the COBOL copybook to the client workstation in text mode and capture the 
copybook with Manager. 
D. FTP the COBOL copybook to the client workstation in binary and capture the 
metadata through Manager. 
Answer: C 
 
QUESTIONS 
A job reads from a dataset using a DataSet stage. This data goes to a Transformer stage 
and then is written to a sequential file using a Sequential File stage. The default 
configuration file has 3 nodes. The job creating the dataset and the current job both use 
the default configuration file. How many instances of the Transformer run in parallel? 
A. 3 
B. 1 
C. 7 
D. 9 
Answer: A 
 
 
 
 QUESTIONS 
Which three features of datasets make them suitable for job restart points? (Choose 
three.) 
A. They are indexed for fast data access. 
B. They are partitioned. 
C. They use datatypes that are in the parallel engine internal format. 
D. They are persistent. 
E. They are compressed to minimize storage space. 
Answer: B,C,D 
 
 
 
 QUESTIONS 
An Aggregator stage using a Hash technique processes a very large number of rows 
during month end processing. The job occasionally aborts during these large runs with an 
obscure memory error. When the job is rerun, processing the data in smaller amounts 
corrects the problem. Which change would correct the problem? 
A. Set the Combinability option on the Stage Advanced tab to Combinable allowing the 
Aggregator to use the memory associated with other operators. 
B. Change the partitioning keys to produce more data partitions. 
C. Add a Sort stage prior to the Aggregator and change to a sort technique on the Stage 
Properties tab of the Aggregator stage. 
D. Set the environment variable APT_AGG_MAXMEMORY to a larger value. 
Answer: C 
 
 
 
 QUESTIONS 
Which two must be specified to manage Runtime Column Propagation? (Choose two.) 
A. enabled in DataStage Administrator 
B. attached to a table definition in DataStage Manager 
C. enabled at the stage level 
D. enabled with environmental parameters set at runtime 
Answer: A,C 
 
 
 
 QUESTIONS 
In a Teradata environment, which stage invokes Teradata-supplied utilities?
A. Teradata API 
B. DRS Teradata 
C. Teradata Enterprise 
D. Teradata Multiload 
Answer: D 
 
 
 
 QUESTIONS 
Which statement is true when Runtime Column Propagation (RCP) is enabled?
A. DataStage Manager does not import meta data. 
B. DataStage Director does not supply row counts in the job log. 
C. DataStage Designer does not enforce mapping rules. 
D. DataStage Administrator does not allow default settings for environment variables. 
Answer: C 
 
 
 
 QUESTIONS 
What are two ways to delete a persistent parallel dataset? (Choose two.) 
A. standard UNIX command rm 
B. orchadmin command rm 
C. delete the dataset Table Definition in DataStage Manager 
D. delete the dataset in Data Set Manager 
Answer: B,D 
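As a sketch of option B from the UNIX command line (the dataset path is illustrative): 
    orchadmin rm /data/work/customers.ds
Note that the plain UNIX rm of option A would delete only the dataset descriptor file, 
leaving the underlying data files orphaned on the resource disks.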
 
 
 
 QUESTIONS 
A source stream contains customer records, identified by a customer ID. Duplicate 
records exist in the data. The business requirement is to populate a field on each record 
with the number of duplicates and write the results to a DB2 table. 
Which job design would accomplish this in a single job? 
A. This cannot be accomplished in a single job. 
B. Use a Copy stage to split incoming records into two streams. One stream uses an 
Aggregator stage to count the number of duplicates. Join the Aggregator stage output 
back to the other stream using a Join stage. 
C. Use an Aggregator stage to group incoming records by customer ID and to count the 
number of records in each group. Output the results to the target DB2 table. 
D. Use stage variables in a Transformer stage to keep a running count of records with the 
same customer ID. Add this count to a new output field and write the results to the DB2 
table. 
Answer: B 
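A rough sketch of the fork-join design in option B: 
    source --> Copy --+--> Aggregator (group by customer ID, count) --+--> Join --> DB2 
                      +------------------ (stream) -------------------+
The Join stage matches on customer ID, attaching the group count to every record.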
 
 
 
 QUESTIONS 
Which two field-level data type properties are schema properties that are also standard 
SQL properties? (Choose two.) 
A. the character used to mark the end of a record 
B. values to be generated for a field 
C. whether a field is nullable 
D. the value that is to be written to a sequential file when the field is NULL 
Answer: A,C 
 
 
 
 QUESTIONS 
Which three stages are necessary to build a Job Sequence that picks up data from a file 
that will arrive in a directory overnight, launches a job once the file has arrived, and 
sends an email to the administrator upon successful completion of the flow? (Choose 
three.) 
A. Sequencer 
B. Notification Activity 
C. Wait For File Activity 
D. Job Activity 
E. Terminator Activity 
Answer: B,C,D 
 
 
 
 QUESTIONS 
Which two would determine the location of the raw data files for a parallel dataset? 
(Choose two.) 
A. the orchadmin tool 
B. the Data Set Management tool 
C. the DataStage Administrator 
D. the Dataset stage 
Answer: A,B 
 
 
 
 QUESTIONS 
A bank receives daily credit score updates from a credit agency in the form of a fixed 
width flat file. The monthly_income column is an unsigned nullable integer (int32) 
whose width is specified as 10, and null values are represented as spaces. Which 
Sequential File property will properly import any nulls in the monthly_income column of 
the input file? 
A. Set the record level fill char property to the space character (' '). 
B. Set the null field value property to a single space (' '). 
C. Set the C_format property to '"%d. 10"'. 
D. Set the null field value property to ten spaces ('          '). 
Answer: D 
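In Orchestrate schema terms, the imported column might carry the property as in this 
sketch (the quoted literal contains ten space characters): 
    monthly_income: nullable int32 {width=10, null_field='          '};
 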
QUESTIONS 
Which two can be implemented in a Job Sequence using job parameters? (Choose two.) 
A. All options of the Start Loop Activity stage. 
B. The body of the email notification activity using the user interface. 
C. A command to be executed by a Routine Activity stage. 
D. Name of a job to be executed by a Job Activity stage. 
Answer: A,C 
 
 
 
 QUESTIONS 
In which situation should a BASIC Transformer stage be used in a DataStage EE job?
A. in a job containing complex routines migrated from DataStage Server Edition 
B. in a job requiring lookups to hashed files 
C. in a large-volume job flow 
D. in a job requiring complex, reusable logic 
Answer: A 
 
 
 
 QUESTIONS 
Which command can be used to execute DataStage jobs from a UNIX shell script?
A. dsjob 
B. DSRunJob 
C. osh 
D. DSExecute 
Answer: A 
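A minimal invocation might look like this sketch, where the project, job, and parameter 
names are illustrative: 
    dsjob -run -param SRC_DIR=/data/in -jobstatus MyProject MyParallelJob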
 
 
 
 QUESTIONS 
Which three UNIX kernel parameters have minimum requirements for DataStage 
installations? (Choose three.) 
A. MAXUPROC - maximum number of processes per user 
B. NOFILES - number of open files 
C. MAXPERM - disk cache threshold 
D. NOPROC - no process limit 
E. SHMMAX - maximum shared memory segment size 
Answer: A,B,E 
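Current values can often be checked from the shell before installing, though the exact 
commands vary by platform; for example, on Linux: 
    ulimit -u              # max user processes (MAXUPROC) 
    ulimit -n              # max open files (NOFILES) 
    sysctl kernel.shmmax   # max shared memory segment size (SHMMAX)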
 
 
 
 QUESTIONS 
Which two requirements must be in place when using the DB2 Enterprise stage in LOAD 
mode? (Choose two.) 
A. Tablespace must be in read only mode prior to the job being launched. 
B. User running the job has dbadm privileges. 
C. Tablespace must be placed in a load pending state prior to the job being launched. 
D. Tablespace cannot be addressed by anyone else while the load occurs. 
Answer: B,D 
 
 
 
 QUESTIONS 
Which Oracle Enterprise Stage property can be set using the DB Options group to tune 
the performance of your job with regard to the number of network packets transferred 
during the execution of the job? 
A. memsize 
B. blocksize 
C. arraysize 
D. transactionsize 
Answer: C 
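As an illustrative sketch only, the DB Options group entry might be set along these 
lines (the value is arbitrary and should be tuned for the network): 
    arraysize = 5000
 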
QUESTIONS 
Which two statements are true of the column data types used in Orchestrate schemas? 
(Choose two.) 
A. Orchestrate schema column data types are the same as those used in DataStage stages. 
B. Examples of Orchestrate schema column data types are varchar and integer. 
C. Examples of Orchestrate schema column data types are int32 and string [max=30]. 
D. OSH import operators are needed to convert data read from sequential files into 
schema types. 
Answer: C,D 
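For example, a small Orchestrate schema using the types from option C: 
    record ( 
        part_id:   int32; 
        part_name: string[max=30]; 
    )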
 
 
 
 QUESTIONS 
An XML file is being processed by the XML Input stage. How can repetition elements be 
identified on the stage? 
A. No special settings are required. XML Input stage automatically detects the repetition 
element from the XPath expression. 
B. Set the "Key" property for the column on the output link to "Yes". 
C. Check the "Repetition Element Required" box on the output link tab. 
D. Set the "Nullable" property for the column on the output link to "Yes". 
Answer: B 
 
 
 
 QUESTIONS 
The high-performance ETL server on which DataStage EE is installed is networked with 
several other servers in the IT department via a very high-bandwidth switch. A list of 
seven files (all of which contain records with the same record layout) must be retrieved 
from three of the other servers using FTP. 
Given the high-bandwidth network and high-performance ETL server, which approach 
will retrieve and process all seven files in the minimal amount of time? 
A. In a single job, use seven separate FTP Enterprise stages, the output links of which 
lead to a single Sort Funnel stage, then process the records without landing to disk. 
B. Set up a sequence of seven separate DataStage EE jobs, each of which retrieves a 
single file and appends to a common dataset, then process the resulting dataset in an 
eighth DataStage EE job. 
C. Use three FTP Plug-in stages (one for each machine) to retrieve the seven files and
store them to a single file on the fourth server, then use the FTP Enterprise stage to 
retrieve the single file and process the records without landing to disk. 
D. Use a single FTP Enterprise stage and specify seven URI properties, one for each file, 
then process the records without landing to disk. 
Answer: D 
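A sketch of option D, with illustrative hosts and paths (one URI property per file): 
    ftp://server1/data/file1.txt 
    ftp://server1/data/file2.txt 
    ... 
    ftp://server3/data/file7.txt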
 
 
 
 QUESTIONS 
Which two statements are true for parallel shared containers? (Choose two.) 
A. When logic in a parallel shared container is changed, all jobs that use the parallel 
shared container inherit the new shared logic without recompiling. 
B. Within DataStage Manager, Usage Analysis can be used to build a multi-job compile 
for all jobs used by a given shared container. 
C. All container input and output links must specify every column that will be defined 
when the container is used in a parallel job. 
D. Parallel shared containers facilitate modular development by reusing common stages 
and logic across multiple jobs. 
Answer: B,D 
 
 
 
 QUESTIONS 
Which task is performed by the DataStage JobMon daemon?
A. writes the job's OSH script to the job log 
B. provides a snapshot of a job's performance 
C. graceful shutdown of DataStage engine 
D. automatically sets all environment variables 
Answer: B 
 
 
 
 QUESTIONS 
The last two steps of a job are an Aggregator stage using the Hash method and a 
Sequential File stage with a Collector type of Auto that creates a comma delimited output 
file for use by a common spreadsheet program. The job runs a long time because data 
volumes have increased. Which two changes would improve performance? (Choose two.) 
A. Change the Aggregator stage to a Transformer stage and use stage variables to 
accumulate the aggregations. 
B. Change the Sequential stage to a Data Set stage to allow the write to occur in parallel. 
C. Change the Aggregator stage to use the sort method; hash and sort on the aggregation keys. 
D. Change the Sequential stage to use a Sort Merge collector on the aggregation keys. 
Answer: C,D 
 
 
 
 QUESTIONS 
Which two stages allow field names to be specified using job parameters? (Choose two.) 
A. Transformer stage 
B. Funnel stage 
C. Modify stage 
D. Filter stage 
Answer: C,D 
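For example, a Modify stage specification can reference a job parameter for a field 
name using the usual #param# syntax (the parameter name pObsoleteField is illustrative): 
    DROP #pObsoleteField#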
 
 
 
 QUESTIONS 
Which two statements describe the operation of the Merge stage? (Choose two.) 
A. Duplicate records should always be removed from the master data set and from the 
update data sets if there is more than one update data set. 
B. Merge stages can only have one reject link. 
C. Duplicate records should always be removed from the master data set and from the 
update data set even if there is only one update data set. 
D. Merge stages can have multiple reject links. 
Answer: A,D 
 
 
 QUESTIONS 
Which two statements describe the properties needed by the Oracle Enterprise stage to 
operate in parallel direct load mode? (Choose two.) 
A. Only index organized tables allow you to use parallel direct load mode and you need 
to set both DIRECT and PARALLEL properties to True, otherwise the load will be 
executed in parallel but will not use direct mode. 
B. Set the Write Method to Load. 
C. The table can have any number of indexes but the job will run slower since the 
indexes will have to be updated as the data is loaded. 
D. The table must not have any indexes, or you must include the rebuild Index Mode 
property, unless the only index on the table is the primary key, in which case you can 
use the Disable Constraints property. 
Answer: B,D 
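As a sketch, the relevant Oracle Enterprise stage settings would be along these lines 
(echoing the properties named in the options): 
    Write Method = Load 
    DIRECT = True, PARALLEL = True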
 
 
 
 QUESTIONS 
Which feature in DataStage will allow you to determine all the jobs that are using a Shared 
Container? 
A. Extended Job view 
B. Reporting Assistant 
C. Usage Analysis 
D. Impact Analysis 
Answer: C 
 
 
 
 QUESTIONS 
Data volumes have grown significantly in the last month. A parallel job that used to run 
well is now using unacceptable amounts of disk space and running slowly. You have 
reviewed the job and explicitly defined the sorting and partitioning requirements for each 
stage but the behavior of the job has not changed. Which two actions improve 
performance based on this information? (Choose two.) 
A. Change the sort methods on all sorts to "Don't Sort - Previously Grouped". 
B. Increase the value of environment variable APT_PARTITION_NUMBER to increase 
the level of parallelism for the sorts. 
C. Enable the environment variable APT_SORT_INSERTION_CHECK_ONLY in the job. 
D. Enable the environment variable APT_NO_SORT_INSERTION in the job. 
Answer: C,D 
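For example, these variables can be exported before the run (or added to the job's 
properties) as a quick test: 
    export APT_SORT_INSERTION_CHECK_ONLY=1 
    export APT_NO_SORT_INSERTION=1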
 
 
 
 QUESTIONS 
You have a series of jobs with points where the data is written to disk for restartability. 
One of these restart points occurs just before data is written to a DB2 table and the 
customer has requested that the data be archivable and externally shared. Which storage 
technique would optimize performance and satisfy the customer's request? 
A. DB2 tables 
B. filesets 
C. sequential files 
D. datasets 
Answer: B 
 
 
 
 QUESTIONS 
Which three actions are performed using stage variables in a parallel Transformer stage? 
(Choose three.) 
A. A function can be executed once per record. 
B. A function can be executed once per run. 
C. Identify the first row of an input group. 
D. Identify the last row of an input group. 
E. Look up a value from a reference dataset. 
Answer: A,B,C 
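A common stage-variable pattern for detecting the first row of a key group (names are 
illustrative; assumes the input is sorted on CustID and that stage variables evaluate 
top-down): 
    svIsFirstInGroup:  inlink.CustID <> svPrevKey 
    svPrevKey:         inlink.CustID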
 
 
 
 QUESTIONS 
Which three are valid ways within a Job Sequence to pass parameters to Activity stages? 
(Choose three.) 
A. ExecCommand Activity stage 
B. UserVariables Activity stage 
C. Sequencer Activity stage 
D. Routine Activity stage 
E. Nested Condition Activity stage 
Answer: A,B,D 
 
 
 
 QUESTIONS 
Which three privileges must the user possess when running a parallel job? (Choose 
three.) 
A. read access to APT_ORCHHOME 
B. execute permissions on local copies of programs and scripts 
C. read/write permissions to the UNIX /etc directory 
D. read/write permissions to APT_ORCHHOME 
E. read/write access to disk and scratch disk resources 
Answer: A,B,E 
 
 
 
 QUESTIONS 
When importing a COBOL file definition, which two are required? (Choose two.) 
A. The file you are importing is accessible from your client workstation. 
B. The file you are importing contains level 01 items. 
C. The column definitions are in a COBOL copybook file and not, for example, in a COBOL source file. 
D. The file does not contain any OCCURS DEPENDING ON clauses. 
Answer: A,B