Wednesday, November 4, 2009

Informatica 8.x

PowerCenter 8.x components are
1. PowerCenter Domain 2. PowerCenter Repository 3. Administration Console
4. PowerCenter Client 5. Repository Service 6. Integration Service
1. PowerCenter Domain
A domain is the primary unit for management and administration of services in PowerCenter.
Components of a domain are Node, Service Manager and Application Services.
Node
Node is the logical representation of a machine in a domain. The machine on which PowerCenter
is installed hosts the domain and acts as its primary node. We can add other machines as nodes in the
domain and configure them to run application services such as the Integration Service or
Repository Service. All service requests from other nodes in the domain go through the primary
node, also called the ‘master gateway’.
The Service Manager
The Service Manager runs on each node within a domain and is responsible for starting and running
the application services. The Service Manager performs the following functions:
>Alerts. Provides notifications of events such as shutdowns and restarts
>Authentication. Authenticates user requests from the Administration Console, PowerCenter Client,
Metadata Manager, and Data Analyzer
>Domain configuration. Manages configuration details of the domain such as machine name and port
>Node configuration. Manages node metadata such as machine name and port
>Licensing. When an application service connects to the domain for the first time, the license is
registered; on subsequent connections the licensing information is verified
>Logging. Manages the event logs from each service. Message severities include ‘Fatal’, ‘Error’, ‘Warning’ and ‘Info’
>User management. Manages users, groups, roles, and privileges
Application services
The services that perform data movement, connect to different data sources and manage
data are called application services. They are the Repository Service, Integration Service, Web
Services Hub, SAP BW Service, Reporting Service and Metadata Manager Service. Which application services
run on a node depends on how we configure the node and the application service.
Domain Configuration
Configuring a domain involves assigning host names and port numbers to the nodes,
setting Resilience Timeout values, and providing connection information for the metadata database, SMTP
details, etc. All the configuration information for a domain is stored in a set of relational database
tables, the domain configuration database. Global properties that apply to application
services, such as ‘Maximum Restart Attempts’ and ‘Dispatch Mode’ (‘Round Robin’, ‘Metric Based’ or ‘Adaptive’),
are also configured under Domain Configuration.
2. PowerCenter Repository
The PowerCenter Repository is one of the better metadata stores among ETL products. The repository is
sufficiently normalized to store metadata at a very detailed level, which in turn means that updates to
the repository are quick and team-based development is smooth. The repository data
structure is also useful for analysis and reporting.
Access to the repository through MX views and the SDK extends the repository's capability from
simple storage of technical data to a database for analysis of the ETL metadata.
PowerCenter Repository is a collection of 355 tables which can be created on any major relational database.
The kinds of information that are stored in the repository are,
1. Repository configuration details
2. Mappings
3. Workflows
4. User Security
5. Process Data of session runs
For a quick understanding:
When a user creates a folder, a corresponding entry is made in table OPB_SUBJECT; attributes like
folder name, owner id and folder type (shared or not) are all stored there. When we create or import sources
and define field names, datatypes etc. in Source Analyzer, entries are made in OPB_SRC and OPB_SRC_FLD. When
targets and their fields are created or imported from any database, entries are made in tables like OPB_TARG
and OPB_TARG_FLD. Table OPB_MAPPING stores mapping attributes like mapping name, folder id, valid status and
mapping comments. Table OPB_WIDGET stores attributes like widget type, widget name, comments etc. Widget is
Informatica's internal name for a transformation. Table OPB_SESSION stores
configurations related to a session task, and table OPB_CNX_ATTR stores information related to connection objects.
Table OPB_WFLOW_RUN stores process details like workflow name, workflow start time, workflow completion time,
the server node it ran on etc. REP_ALL_SOURCES, REP_ALL_TARGETS and REP_ALL_MAPPINGS are a few of the many views created
over these tables.
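To make the table-and-view relationship concrete, here is a minimal sketch that rebuilds a tiny stand-in for two repository tables and one view in an in-memory SQLite database. Only the table and view names come from the text above. Every column name, the sample rows and the helper `invalid_mappings` are made up for illustration and do not claim to match the real repository schema.

```python
import sqlite3

# Toy stand-in for the repository. Column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OPB_SUBJECT (SUBJ_ID INTEGER PRIMARY KEY, SUBJ_NAME TEXT, IS_SHARED INTEGER);
CREATE TABLE OPB_MAPPING (MAPPING_ID INTEGER PRIMARY KEY, SUBJECT_ID INTEGER,
                          MAPPING_NAME TEXT, IS_VALID INTEGER);
-- A view that hides the folder/mapping join, in the spirit of the MX views.
CREATE VIEW REP_ALL_MAPPINGS AS
  SELECT s.SUBJ_NAME AS SUBJECT_AREA, m.MAPPING_NAME, m.IS_VALID
  FROM OPB_MAPPING m JOIN OPB_SUBJECT s ON s.SUBJ_ID = m.SUBJECT_ID;
""")

# Creating a folder and two mappings would produce rows like these.
conn.execute("INSERT INTO OPB_SUBJECT VALUES (1, 'SALES_DEV', 0)")
conn.executemany(
    "INSERT INTO OPB_MAPPING VALUES (?, ?, ?, ?)",
    [(10, 1, "m_load_orders", 1), (11, 1, "m_load_customers", 0)],
)


def invalid_mappings(connection, folder):
    """Return the names of invalid mappings in one folder, read through the view."""
    rows = connection.execute(
        "SELECT MAPPING_NAME FROM REP_ALL_MAPPINGS "
        "WHERE SUBJECT_AREA = ? AND IS_VALID = 0 ORDER BY MAPPING_NAME",
        (folder,),
    )
    return [row[0] for row in rows]


print(invalid_mappings(conn, "SALES_DEV"))
```

Reporting queries of this kind are normally written against the views rather than the OPB_ tables, so that the query does not depend on the internal table layout.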
PowerCenter applications access the PowerCenter repository through the Repository Service. The Repository Service
protects metadata in the repository by managing repository connections and using object-locking to ensure object
consistency.
We can create a repository as global or local. We use a global repository to store common objects that multiple
developers can use through shortcuts, and a local repository to develop mappings and workflows.
From a local repository, we can create shortcuts to objects in shared folders in the global repository. PowerCenter
supports versioning. A versioned repository can store multiple versions of an object.
3. Administration Console
The Administration Console is a web application that we use to administer the PowerCenter domain and PowerCenter security.
There are two pages in the console, the Domain Page and the Security Page. We can do the following in the Domain Page:
o Create & manage application services like Integration Service and Repository Service
o Create and manage nodes, licenses and folders
o Restart and shutdown nodes
o View log events
o Other domain management tasks like applying licenses and managing grids and resources
We can do the following in Security Page:
o Create, edit and delete native users and groups
o Configure a connection to an LDAP directory service. Import users and groups from the LDAP directory service
o Create, edit and delete Roles (Roles are collections of privileges)
o Assign roles and privileges to users and groups
o Create, edit, and delete operating system profiles. An operating system profile is a level of security that the
Integration Service uses to run workflows
4. PowerCenter Client
Designer, Workflow Manager, Workflow Monitor, Repository Manager and Data Stencil are the five client tools used
to design mappings and mapplets, create sessions to load data, and manage the repository.
A mapping is ETL code that pictorially depicts the logical data flow from source to target, including the transformations
applied to the data. Designer is the tool used to create mappings.
Designer has five window panes: Source Analyzer, Warehouse Designer, Transformation Developer, Mapping Designer
and Mapplet Designer.
Source Analyzer:
Allows us to import source table metadata from relational databases, flat files, XML and COBOL files. Note that
Source Analyzer imports only the source definition, not the source data itself. Source Analyzer also allows us to define our own source definitions.
Warehouse Designer:
Allows us to import target table definitions, which could come from relational databases, flat files, XML and COBOL files.
We can also create target definitions manually and group them into folders. Unlike Source Analyzer, it has an option
to create the tables physically in the database. Warehouse Designer does not allow
two tables with the same name, even if their column names differ or they come from different databases/schemas.
Transformation Developer:
Transformations such as Filters, Lookups and Expressions that are candidates for re-use are developed in this pane. Alternatively, a transformation developed in Mapping Designer can be made reusable by checking the ‘re-use’ option, after which it is displayed under the Transformation Developer folders.
Mapping Designer:
This is where we actually depict our ETL process. We bring in source definitions, target definitions and transformations such as filter, lookup and aggregate, and develop a logical ETL program. At this stage it is only a logical program, because the actual data load happens only after we create a session and a workflow.
Mapplet Designer:
We create a set of transformations to be used and re-used across mappings.
5. Repository Service
Having discussed the metadata repository, we now turn to the Repository Service, a separate, multi-threaded process that retrieves, inserts and updates metadata in the repository database tables. The Repository Service manages connections to the PowerCenter repository from PowerCenter client applications such as Designer, Workflow Manager, Workflow Monitor and Repository Manager, and from the Administration Console and the Integration Service. The Repository Service is responsible for ensuring the consistency of metadata in the repository.
Creation & Properties:
Use the PowerCenter Administration Console Navigator window to create a Repository Service. The properties needed to create it are:
Service Name – name of the service like rep_SalesPerformanceDev
Location – Domain and folder where the service is created
License – name of the license assigned to the service
Node, Primary Node & Backup Nodes – Node on which the service process runs
CodePage – The Repository Service uses the character set encoded in the repository code page when writing data to the repository
Database type & details – Type of database, username, password, connect string and tablespace name
The above properties are sufficient to create a Repository Service. However, the following properties are important for better performance and maintenance.
General Properties
> OperatingMode: Values are Normal and Exclusive. Use Exclusive mode to perform administrative tasks like enabling version control or promoting local to global repository
> EnableVersionControl: Creates a versioned repository
Node Assignments: The “High availability option” is a licensed feature that allows us to choose Primary & Backup nodes for continuous running of the Repository Service. Under a normal license we would see only one node to select from
Database Properties
> DatabaseArrayOperationSize: Number of rows to fetch each time an array database operation is issued, such as insert or fetch. Default is 100
> DatabasePoolSize: Maximum number of connections to the repository database that the Repository Service can establish. If the Repository Service tries to establish more connections than specified for DatabasePoolSize, it times out the connection attempt after the number of seconds specified for DatabaseConnectionTimeout
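The interplay of DatabasePoolSize and DatabaseConnectionTimeout can be pictured with a small sketch. This is not how the Repository Service is implemented; it only models the behaviour described above with a semaphore. The class name `RepositoryConnectionPool` and its methods are hypothetical.

```python
import threading


class RepositoryConnectionPool:
    """Toy model: at most pool_size connections, extra requests wait then time out."""

    def __init__(self, pool_size, connection_timeout):
        self._slots = threading.BoundedSemaphore(pool_size)  # plays DatabasePoolSize
        self._timeout = connection_timeout                   # plays DatabaseConnectionTimeout (seconds)

    def acquire(self):
        # Wait for a free slot; give up once the timeout has elapsed.
        if not self._slots.acquire(timeout=self._timeout):
            raise TimeoutError("no repository connection available")
        return object()  # placeholder for a real database connection

    def release(self):
        self._slots.release()


pool = RepositoryConnectionPool(pool_size=2, connection_timeout=0.1)
first = pool.acquire()
second = pool.acquire()
try:
    pool.acquire()  # third request exceeds the pool size
except TimeoutError as exc:
    print("third request timed out:", exc)

pool.release()          # one connection is returned to the pool
third = pool.acquire()  # now the request succeeds
```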
Advanced Properties
> CommentsRequiredForCheckin: Requires users to add comments when checking in repository objects.
> Error Severity Level: Level of error messages written to the Repository Service log. Specify one of the following message levels: Fatal, Error, Warning, Info, Trace & Debug
> EnableRepAgentCaching: Enables repository agent caching. Repository agent caching provides optimal performance of the repository when you run workflows. When you enable repository agent caching, the Repository Service process caches metadata requested by the Integration Service. Default is Yes.
> RACacheCapacity: Number of objects that the cache can contain when repository agent caching is enabled. You can increase the number of objects if there is available memory on the machine running the Repository Service process. The value must be between 100 and 10,000,000,000. Default is 10,000
> AllowWritesWithRACaching: Allows you to modify metadata in the repository when repository agent caching is enabled. When you allow writes, the Repository Service process flushes the cache each time you save metadata through the PowerCenter Client tools. You might want to disable writes to improve performance in a production environment where the Integration Service makes all changes to repository metadata. Default is Yes.
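The three caching properties can be read as a capacity-bounded cache that is flushed whenever metadata is saved. The sketch below models only that description. The class `RepositoryAgentCache`, the oldest-first eviction rule and the error raised when writes are disabled are assumptions made for illustration, since the text above does not say how the real cache behaves in those cases.

```python
class RepositoryAgentCache:
    """Toy model of repository agent caching (not the real implementation)."""

    def __init__(self, capacity=10_000, allow_writes=True):
        self.capacity = capacity          # plays RACacheCapacity
        self.allow_writes = allow_writes  # plays AllowWritesWithRACaching
        self._objects = {}

    def get(self, object_id, load_from_repository):
        # Serve from the cache when possible, otherwise read the repository.
        if object_id not in self._objects:
            if len(self._objects) >= self.capacity:
                # Assumption: evict the oldest cached object to make room.
                del self._objects[next(iter(self._objects))]
            self._objects[object_id] = load_from_repository(object_id)
        return self._objects[object_id]

    def save(self, object_id, value, write_to_repository):
        if not self.allow_writes:
            # Assumption: a rejected write is reported as an error.
            raise PermissionError("writes are disabled while caching is enabled")
        write_to_repository(object_id, value)
        self._objects.clear()  # the cache is flushed on every save


repository = {"m_load_orders": "version 1"}
reads = []


def load(object_id):
    reads.append(object_id)  # record every real repository read
    return repository[object_id]


cache = RepositoryAgentCache(capacity=100)
cache.get("m_load_orders", load)
cache.get("m_load_orders", load)  # served from the cache, no second read
cache.save("m_load_orders", "version 2", repository.__setitem__)
print(cache.get("m_load_orders", load), "after", len(reads), "repository reads")
```

The flush on every save is why disabling writes can help in production: if nothing is saved through the client tools, the cached metadata is never thrown away.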
Environment Variables
The database client code page on a node is usually controlled by an environment variable. For example, Oracle uses NLS_LANG, and IBM DB2 uses DB2CODEPAGE. All Integration Services and Repository Services that run on this node use the same environment variable. You can configure a Repository Service process to use a different value for the database client code page environment variable than the value set for the node.
You might want to configure the code page environment variable for a Repository Service process when the Repository Service process requires a different database client code page than the Integration Service process running on the same node.
For example, the Integration Service reads from and writes to databases using the UTF-8 code page. The Integration Service requires that the code page environment variable be set to UTF-8. However, you have a Shift-JIS repository that requires that the code page environment variable be set to Shift-JIS. Set the environment variable on the node to UTF-8. Then add the environment variable to the Repository Service process properties and set the value to Shift-JIS.
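The override described in this example amounts to layering process-level variables on top of node-level ones. A minimal sketch, assuming an Oracle client and using illustrative NLS_LANG values; the function `effective_environment` is hypothetical and only demonstrates the precedence rule.

```python
def effective_environment(node_env, process_env):
    """Process-level variables take precedence over node-level variables."""
    merged = dict(node_env)
    merged.update(process_env)
    return merged


# Set on the node: what the Integration Service process sees.
node_env = {"NLS_LANG": "AMERICAN_AMERICA.UTF8"}

# Added to the Repository Service process properties (illustrative value).
repository_process_env = {"NLS_LANG": "JAPANESE_JAPAN.JA16SJIS"}

integration_env = effective_environment(node_env, {})
repository_env = effective_environment(node_env, repository_process_env)

print("Integration Service:", integration_env["NLS_LANG"])
print("Repository Service: ", repository_env["NLS_LANG"])
```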
6. Integration Service (IS)
The key functions of IS are
§ Interprets the workflow and mapping metadata from the repository
§ Executes the instructions in the metadata
§ Manages the data from source system to target system within the memory and disk
The three main components of the Integration Service which enable data movement are,
§ Integration Service Process
§ Load Balancer
§ Data Transformation Manager
6.1 Integration Service Process (ISP)
The Integration Service starts one or more Integration Service processes to run and monitor workflows. When we run a workflow, the ISP starts and locks the workflow, runs the workflow tasks, and starts the process to run sessions. The functions of the Integration Service Process are,
§ Locks and reads the workflow
§ Manages workflow scheduling, i.e. maintains session dependencies
§ Reads the workflow parameter file
§ Creates the workflow log
§ Runs workflow tasks and evaluates the conditional links
§ Starts the DTM process to run the session
§ Writes historical run information to the repository
§ Sends post-session emails
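The "runs workflow tasks and evaluates the conditional links" step can be sketched as a walk over a task graph. This is a simplified model, not the ISP's actual algorithm: it assumes a task runs as soon as any one incoming link evaluates to true, and the task names, status strings and the function `run_workflow` are made up for illustration.

```python
from collections import deque


def run_workflow(tasks, links, start):
    """Run tasks in link order; follow a link only if its condition holds.

    tasks: task name -> callable returning a status string
    links: list of (source task, target task, condition on the source status)
    """
    statuses = {}
    pending = deque([start])
    while pending:
        name = pending.popleft()
        if name in statuses:  # each task runs at most once
            continue
        statuses[name] = tasks[name]()
        for source, target, condition in links:
            if source == name and condition(statuses[name]):
                pending.append(target)
    return statuses


tasks = {
    "Start": lambda: "SUCCEEDED",
    "s_load_orders": lambda: "SUCCEEDED",
    "s_aggregate_orders": lambda: "SUCCEEDED",
    "e_notify_failure": lambda: "SUCCEEDED",
}
links = [
    ("Start", "s_load_orders", lambda status: True),
    ("s_load_orders", "s_aggregate_orders", lambda status: status == "SUCCEEDED"),
    ("s_load_orders", "e_notify_failure", lambda status: status == "FAILED"),
]

result = run_workflow(tasks, links, "Start")
print(list(result))  # the failure branch is never reached
```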
6.2 Load Balancer
The Load Balancer dispatches tasks to achieve optimal performance. It dispatches tasks to a single node or across the nodes in a grid after performing a sequence of steps. Before looking at these steps we need to know about Resources, Resource Provision Thresholds, Dispatch Modes and Service Levels
§ Resources – we can configure the Integration Service to check the resources available on each node and match them with the resources required to run the task. For example, if a session uses an SAP source, the Load Balancer dispatches the session only to nodes where the SAP client is installed
§ Three Resource Provision Thresholds – Maximum CPU Run Queue Length: the maximum number of runnable threads waiting for CPU resources on the node. Maximum Memory %: the maximum percentage of virtual memory allocated on the node relative to the total physical memory size. Maximum Processes: the maximum number of running Session and Command tasks allowed for each Integration Service process running on the node
§ Three Dispatch Modes – Round-Robin: The Load Balancer dispatches tasks to available nodes in a round-robin fashion after checking the “Maximum Processes” threshold. Metric-based: Checks all three resource provision thresholds and dispatches tasks in a round-robin fashion. Adaptive: Checks all three resource provision thresholds and also ranks nodes according to current CPU availability
§ Service Levels establish priority among tasks that are waiting to be dispatched. The three components of a service level are Name, Dispatch Priority and Maximum Dispatch Wait Time. The maximum dispatch wait time is the amount of time a task can wait in the queue, which ensures that no task waits forever
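How a service level keeps a low-priority task from waiting forever can be sketched as follows. The sketch assumes that a lower number means a higher dispatch priority and that a task which has waited past its maximum dispatch wait time is promoted ahead of all others. Both are assumptions for illustration, as are the names `ServiceLevel`, `QueuedTask` and `next_task`.

```python
from dataclasses import dataclass


@dataclass
class ServiceLevel:
    name: str
    dispatch_priority: int    # assumption: lower number is dispatched first
    max_dispatch_wait: float  # seconds a task may wait in the queue


@dataclass
class QueuedTask:
    name: str
    level: ServiceLevel
    enqueued_at: float        # time the task entered the dispatch queue


def next_task(queue, now):
    """Pick the task to dispatch next from a non-empty dispatch queue."""

    def effective_priority(task):
        waited = now - task.enqueued_at
        if waited >= task.level.max_dispatch_wait:
            return 0  # assumption: an overdue task jumps to the front
        return task.level.dispatch_priority

    # Ties are broken by arrival order.
    return min(queue, key=lambda task: (effective_priority(task), task.enqueued_at))


high = ServiceLevel("High", dispatch_priority=1, max_dispatch_wait=3600)
low = ServiceLevel("Low", dispatch_priority=10, max_dispatch_wait=900)

queue = [
    QueuedTask("s_nightly_archive", low, enqueued_at=0),
    QueuedTask("s_load_orders", high, enqueued_at=100),
]

print(next_task(queue, now=200).name)   # the high-priority task goes first
print(next_task(queue, now=1000).name)  # the low-priority task is now overdue
```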
A. Dispatching Tasks on a node
1. The Load Balancer checks different resource provision thresholds on the node depending on the Dispatch mode set. If dispatching the task causes any threshold to be exceeded, the Load Balancer places the task in the dispatch queue, and it dispatches the task later
2. The Load Balancer dispatches all tasks to the node that runs the master Integration Service process
B. Dispatching Tasks on a grid
1. The Load Balancer verifies which nodes are currently running and enabled
2. The Load Balancer identifies nodes that have the PowerCenter resources required by the tasks in the workflow
3. The Load Balancer verifies that the resource provision thresholds on each candidate node are not exceeded. If dispatching the task causes a threshold to be exceeded, the Load Balancer places the task in the dispatch queue, and it dispatches the task later
4. The Load Balancer selects a node based on the dispatch mode
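The four grid steps can be put together in a short sketch. It is a simplified model of the description above, not the Load Balancer's real algorithm: the `Node` fields, the default threshold values, the use of the CPU run queue as a stand-in for "CPU availability" and the `last_index` bookkeeping for round-robin are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    running: bool = True
    resources: frozenset = frozenset()
    processes: int = 0
    max_processes: int = 10
    cpu_run_queue: float = 0.0
    max_cpu_run_queue: float = 10.0
    memory_pct: float = 0.0
    max_memory_pct: float = 150.0

    def within_thresholds(self, mode):
        # Round-robin checks only Maximum Processes; the other modes check all three.
        ok = self.processes + 1 <= self.max_processes
        if mode in ("metric", "adaptive"):
            ok = (ok and self.cpu_run_queue <= self.max_cpu_run_queue
                  and self.memory_pct <= self.max_memory_pct)
        return ok


def select_node(nodes, required, mode, last_index=-1):
    """Return the node to dispatch to, or None if the task must stay queued."""
    # Steps 1 and 2: running nodes that provide the required resources.
    candidates = [(i, n) for i, n in enumerate(nodes)
                  if n.running and required <= n.resources]
    # Step 3: drop nodes whose thresholds would be exceeded.
    candidates = [(i, n) for i, n in candidates if n.within_thresholds(mode)]
    if not candidates:
        return None
    # Step 4: choose according to the dispatch mode.
    if mode == "adaptive":
        # Assumption: the shortest CPU run queue means the most CPU available.
        return min(candidates, key=lambda c: c[1].cpu_run_queue)[1]
    after = [c for c in candidates if c[0] > last_index]  # round-robin with wrap-around
    return (after or candidates)[0][1]


nodes = [
    Node("node1", resources={"sap"}, processes=2, cpu_run_queue=3.0),
    Node("node2", processes=1),                      # no SAP client installed
    Node("node3", resources={"sap"}, processes=10),  # Maximum Processes reached
    Node("node4", resources={"sap"}, processes=1, cpu_run_queue=0.5),
    Node("node5", running=False, resources={"sap"}),
]

print(select_node(nodes, {"sap"}, "round_robin", last_index=0).name)
print(select_node(nodes, {"sap"}, "adaptive").name)
print(select_node(nodes, {"teradata"}, "metric"))
```

A result of None corresponds to the task being placed in the dispatch queue and dispatched later.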
6.3 Data Transformation Manager (DTM) Process
When the workflow reaches a session, the Integration Service Process starts the DTM process. The DTM is the process associated with the session task. The DTM process performs the following tasks:
§ Retrieves and validates session information from the repository.
§ Validates source and target code pages.
§ Verifies connection object permissions.
§ Performs pushdown optimization when the session is configured for pushdown optimization.
§ Adds partitions to the session when the session is configured for dynamic partitioning.
§ Expands the service process variables, session parameters, and mapping variables and parameters.
§ Creates the session log.
§ Runs pre-session shell commands, stored procedures, and SQL.
§ Sends a request to start worker DTM processes on other nodes when the session is configured to run on a grid.
§ Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data
§ Runs post-session stored procedures, SQL, and shell commands and sends post-session email
§ After the session is complete, reports execution result to ISP
Pictorial Representation of Workflow execution:
1. A PowerCenter Client requests the IS to start a workflow
2. The IS starts an ISP
3. The ISP consults the LB to select a node
4. The ISP starts the DTM on the node selected by the LB
