Friday, 20 November 2015

SSIS Interview Question and Answer (Last Post Date 20-11-2015)

Hello Friends, 
Here i have a collection of SSIS Interview Question. Hope it will help you little bit...



#  What do we mean by dataflow in SSIS?
Data flow is nothing but the flow of data from the corresponding sources to the referred destinations. In this process, the data transformations make changes to the data to make it ready for the data warehouse
.
Data flow task found in the control flow tools box.



# What is a breakpoint in SSIS? How is it setup? How do you disable it?
A breakpoint is a stopping point in the code. The breakpoint can give the Developer\DBA an
opportunity to review the status of the data, variables and the overall status of the SSIS package.
10 unique conditions exist for each breakpoint.
Breakpoints are setup in BIDS. In BIDS, navigate to the control flow interface. Right click on the
object where you want to set the breakpoint and select the ‘Edit Breakpoints…’ option.

# Can you name 5 or more of the native SSIS connection managers?
When we move to data flow tasks the Source and Destination Connection manager found.

1) OLEDB connection – Used to connect to any data source requiring an OLEDB connection (i.e.,SQL Server 2000)
2) Flat file connection – Used to make a connection to a single file in the File System. Required for reading information from a File System flat file
3) ADO.Net connection – Uses the .Net Provider to make a connection to SQL Server 2005 or other connection exposed through managed code (like C#) in a custom task
4) Analysis Services connection – Used to make a connection to an Analysis Services database or project. Required for the Analysis Services DDL Task and Analysis Services Processing Task
5) File connection – Used to reference a file or folder. The options are to either use or create a file or folder
6) Excel
# What is the use of Bulk Insert Task in SSIS?
Bulk Insert Task is used to upload large amount of data from flat files into Sql Server. It supports only OLE DB connections for destination database. Please note that it only Supports OLEDB connection for Destination.

In Bulk Insert à Option tab à Property Option
1.    Check constraints
2.    Keep nulls,
3.    Enable identity insert
4.    Table lock
5.    Fire triggers



#What is Conditional Split transformation in SSIS?
This is just like IF condition which checks for the given condition and based on the condition evaluation, the output will be sent to the appropriate OUTPUT path. It has ONE input and MANY outputs. Conditional Split transformation is used to send paths to different outputs based on some conditions.
For example, we can organize the transform for the students in a class who have marks greater than 40 to one path and the students who score less than 40 to another path.


# How do you eliminate quotes from being uploaded from a flat file to SQL Server? 
This can be done using TEXT QUALIFIER property. In the SSIS package on the Flat File Connection Manager Editor, enter quotes into the Text qualifier field then preview the data to ensure the quotes are not included.

# Can you explain how to setup a checkpoint file in SSIS?
The following items need to be configured on the properties tab for SSIS package:
CheckpointFileName – Specify the full path to the Checkpoint file that the package uses to save the value of package variables and log completed tasks. Rather than using a hard-coded path as shown above, it’s a good idea to use an expression that concatenates a path defined in a package variable and the package name.

CheckpointUsage – Determines if/how checkpoints are used. Choose from these options: Never(default), IfExists, or Always. Never indicates that you are not using Checkpoints. IfExists is the typical setting and implements the restart at the point of failure behavior. If a Checkpoint file is found it is used to restore package variable values and restart at the point of failure. If a Checkpoint file is not found the package starts execution with the first task. The Always choice raises an error if the Checkpoint file does not exist.

SaveCheckpoints – Choose from these options: True or False (default). You must select True to implement the Checkpoint behavior.

In Package Level we have to configure FailPackageonFailure to True to save data in check point file.

# What are the different values you can set for CheckpointUsageproperty ?
There are three values, which describe how a checkpoint file is used during package execution:
1) Never: The package will not use a checkpoint file and therefore will never restart.
2) If Exists: If a checkpoint file exists in the place you specified for the CheckpointFilename property, then it will be used, and the package will restart according to the checkpoints written.
3) Always: The package will always use a checkpoint file to restart, and if one does not exist, the package will fail.

# What is the ONLY Property you need to set on TASKS in order to configure CHECKPOINTS to RESTART package from failure?
The one property you have to set on the task is FailPackageOnFailure. This must be set for each task or container that you want to be the point for a checkpoint and restart. If you do not set this property to true and the task fails, no file will be written, and the next time you invoke the package, it will start from the beginning again.


# Where can we set the CHECKPOINTS, in DataFlow or ControlFlow ?
Checkpoints only happen at the Control Flow; it is not possible to checkpoint transformations or restart inside a Data Flow. The Data Flow Task can be a checkpoint, but it is treated as any other task.

# Can you explain different options for dynamic configurations in SSIS?

1) XML file
2) custom variables
3) Database per environment with the variables
4) Use a centralized database with all variables
 

This is the Old version Package Level configuration Where we can configure by
1. XML Configuration File
2. Environmental Variable
3. Registry  Entry
4. Parent Package Variable
5. SQL Server
#  What is the use of Percentage Sampling transformation in SSIS?
Percentage Sampling transformation is generally used for data mining. This transformation builds a random sample of set of output rows by choosing specified percentage of input rows. For example if the input has 1000 rows and if I specify 10 as percentage sample then the transformation returns 10% of the RANDOM records from the input data.







# What is the use of Term Extraction transformation in SSIS?
Term Extraction transformation is used to extract nouns or noun phrases or both noun and noun phrases only from English text. It extracts terms from text in a transformation input column and then writes the terms to a transformation output column. It can be also used to find out the content of a dataset.





# What is Data Viewer and what are the different types of Data Viewers in SSIS?
A Data Viewer allows viewing data at a point of time at runtime. If data viewer is placed before and after the Aggregate transform, we can see data flowing to the transformation at the runtime and how it looks like after the transformation occurred. The different types of data viewers are:

1. Grid
2. Histogram
3. Scatter Plot
4. Column Chart
From SQL 2012 onwards only the GRID view is available.

# What is Ignore Failure option in SSIS?
In Ignore Failure option, the error will be ignored and the data row will be directed to continue on the next transformation. Let’s say you have some JUNK data(wrong type of data or JUNK data) flowing from source, then using this option in SSIS we can REDIRECT the junk data records to another transformation instead of FAILING the package. This helps to MOVE only valid data to destination and JUNK can be captured into separate file.


In the Error output there are three Error Event
1.    Ignore Failure – Ignore the error and move to the next task
2.    Fail Component – If the error exists it not ignored and the package execution stops
3.    Redirect Row- Redirect the error output to another path
# Which are the different types of Control Flow components in SSIS?
The different types of Control Flow components are: Data Flow Tasks, SQL Server Tasks, Data Preparation Tasks, Work flow Tasks, Scripting Tasks, Analysis Services Tasks, Maintenance Tasks, Containers.

# What are containers? What are the different types of containers in SSIS?
Containers are objects that provide structures to packages and extra functionality to tasks. There are four types of containers in SSIS, they are: Foreach Loop Container, For Loop Container, Sequence Container and Task Host Container.

# What are the different types of Data flow components in SSIS?
There are 3 data flow components in SSIS.
1. Sources
2. Transformations
3. Destinations

# What are the different types of data sources available in SSIS?
There are 7 types of data sources provided by SSIS: a.) Data Reader source b.) Excel source c.) Flat file source d.)OLEDB source e.)Raw file source f.)XML source g.) Script component

#  What is SSIS Designer?
It is a graphical tool for creating packages. It has 4 tabs: Control Flow, Data Flow, Event Handlers and Package Explorer.

# What is Control Flow tab?
It is the tab in SSIS designer where various Tasks can be arranged and configured. This is the tab where we provide and control the program flow of the project.

# What is Data Flow tab?
This is the tab where we do all the work related to ETL job. It is the tab in SSIS Designer where we can extract data from sources, transform the data and then load them into destinations.

# What is the function of control flow tab in SSIS?
On the control flow tab, the tasks including dataflow task, containers and precedence constraints that connect containers and tasks can be arranged and configured.

# What is the function of Event handlers tab in SSIS?
On the Event handlers tab, workflows can be configured to respond to package events.
For example, we can configure Work Flow when ANY task Failes or Stops or Starts ..

# What is the function of Package explorer tab in SSIS?
This tab provides an explorer view of the package. You can see what is happening in the package. The Package is a container at the top of the hierarchy.


# What is Solution Explorer?
It is a place in SSIS Designer where all the projects, Data Sources, Data Source Views and other miscellaneous files can be viewed and accessed for modification.
# How do we convert data type in SSIS?
The Data Conversion Transformation in SSIS converts the data type of an input column to a different data type.

#How are variables useful in ssis package?
Variables can provide communication among objects in the package. Variables can provide communication between parent and child packages. Variables can also be used in expressions and scripts. This helps in providing dynamic values to tasks.

Control Flow à Right Click à Variables


Click on Move Variable button to change the scope of variable from Package to Container and Executable Task level. 


  
# Explain Aggregate Transformation in SSIS?
It aggregates data, similar you do in applying TSQL functions like Group By, Min, Max, Avg, and Count. For example you get total quantity and Total line item for each product in Aggregate Transformation Editor. First you determine input columns, then output column name in Output Alias table in datagrid, and also operations for each Output Alias in Operation columns of the same datagrid. Some of operation functions listed below :
• Group By
• Average
• Count
• Count Distinct : count distinct and non null column value
• Min, Max, Sum
In Advanced tab, you can do some optimization here, such as setting up Key Scale option (low, medium, high), Count Distinct scale option (low, medium, high), Auto Extend factor and Warn On Division By Zero. If you check Warn On Division By Zero, the component will give warning instead of error. Key Scale option will optimize transformation cache to certain number of key threshold. If you set it low, optimization will target to 500,000 keys written to cache, medium can handle up to 5 million keys, and high can handle up to 25 million keys, or you can specify particular number of keys here. Default value is unspecified. Similar to number of keys for Count Distinct scale option. It is used to optimize number of distinct value written to memory, default value is unspecified. Auto Extend Factor is used when you want some portion of memory is used for this component. Default value is 25% of memory.

# Explain Audit Transformation ?
It allows you to add auditing information as required in auditing world specified by HIPPA and Sarbanes-Oxley (SOX). Auditing options that you can add to transformed data through this transformation are :
1. Execution of Instance GUID : ID of execution instance of the package
2. PackageID : ID of the package
3. PackageName
4.VersionID : GUID version of the package
5. Execution StartTime
6.MachineName
7.UserName
8.TaskName
9.TaskID :uniqueidentifier type of the data flow task that contains audit transformation.



#  Explain Character Map Transformation?
It transforms some character. It gives options whether output result will override the existing column or add to new column. If you define it as new column, specify new column name. Operations available here are:
1. Uppercase
2. Lowercase
3. Byte reversal : such as from 0×1234 to 0×4321
4. Full width
5.Half width
6.Hiragana/katakana/traditional Chinese/simplified Chinese
7. Linguistic casing

# Explain Conditional split Transformation ?
It functions as if…then…else construct. It enables send input data to a satisfied conditional branch. For example you want to split product quantity between less than 500 and greater or equal to 500. You can give the conditional a name that easily identifies its purpose. Else section will be covered in Default Output Column name.
After you configure the component, it connect to subsequent transformation/destination, when connected, it pops up dialog box to let you choose which conditional options will apply to the destination transformation/destination.

#Explain Copy column Transformation?
This component simply copies a column to another new column. Just like ALIAS Column in T-Sql.

Here the Copy Columns is different than derived columns where copy columns is just change the column name (the Alias name of the columns in T-SQL) , but in derived columns we can use exression to make another completely new columns like ‘Firtst Name’ + ‘Middle Name’ +’Last Name’ is Full Employee Name.

# Explain Data conversion Transformation?
This component does conversion data type, similar to TSQL function CAST or CONVERT. If you wish to convery the data from one type to another then this is the best bet. But please make sure that you have COMPATABLE data in the column.

# Explain Data Mining query Transformation?  *** (Later On)
This component does prediction on the data or fills gap on it. Some good scenarios uses this component is:
1. Take some input columns as number of children, domestic income, and marital income to predict whether someone owns a house or not.
2. Take prediction what a customer would buy based analysis buying pattern on their shopping cart.
3. Filling blank data or default values when customer doesn’t fill some items in the questionnaire.

# Explain Derived column Transformation?
Derived column creates new column or put manipulation of several columns into new column. You can directly copy existing or create a new column using more than one column also.

# Explain Merge Transformation?
Merge transformation merges two paths into single path. It is useful when you want to break out data into path that handles errors after the errors are handled, the data are merge back into downstream or you want to merge 2 data sources. It is similar with Union All transformation, but Merge has some restrictions:
1. Data should be in sorted order
2. Data type , data length and other meta data attribute must be similar before merged.
3. Merge takes only 2 Input but Union All Takes Many Input

#  Explain Merge Join Transformation?
Merge Join transformation will merge output from 2 inputs and doing INNER or OUTER join on the data. But if you the data come from 1 OLEDB data source, it is better you join through SQL query rather than using Merge Join transformation. Merge Join is intended to join 2 different data source.

# Explain Multicast Transformation?
This transformation sends output to multiple output paths with no conditional as Conditional Split does. Takes ONE Input and makes the COPY of data and passes the same data through many outputs. In simple Give one input and take many outputs of the same data.

The best part is we need not configure it.

# Explain Percentage and row sampling Transformations?
This transformation will take data from source and randomly sampling data. It gives you 2 outputs. First is selected data and second one is unselected data. It is used in situation where you train data mining model. These two are used to take the SAMPLE of data from the input data.

#  Explain Sort Transformation?
This component will sort data, similar in TSQL command ORDER BY. Some transformations need sorted data.

#  Explain Union all Transformation?
It works in opposite way to Merge transformation. It can take output from more than 2 input paths and combines into single output path.

# What r the possible locations to save SSIS package?
You can save a package wherever you want.
SQL Server
Package Store
File System

# What is a package?
A discrete executable unit of work composed of a collection of control flow and other objects, including data sources, transformations, process sequence, and rules, errors and event handling, and data destinations.

# What is a workflow in SSIS?
A workflow is a set of instructions on how to execute tasks.
(It is a set of instructions on how to execute tasks such as sessions, emails and shell commands. a workflow is created form work flow mgr.

#  What is the diff between control flow Items and data flow Items?
The control flow is the highest level control process. It allows you to manage the run-time process activities of data flow and other processes within a package.
When we want to extract, transform and load data within a package. You add an SSIS dataflow task to the package control flow.

#  What are the main component of SSIS(project-architecture)?
SSIS archItecture has 4 main components
1.ssis service
2.ssis runtime engine & runtime executables
3.ssis dataflow engine & dataflow components
4.ssis clients

#  Different components in SSIS package?
Control flow
Data flow
Event handler
Package explorer

#  What are Connection Managers?
It is a bridge b/w package object and physical data. It provides logical representation of a connection at design time the properties of the connection mgr describes the physical connection that integration services creates when the package is run.

#  What is environment variable in SSIS?
An environment variable configuration sets a package property equal to the value in an environment variable.
Environmental configurations are useful for configuring properties that are dependent on the computer that is executing the package.

#  How to provide securIty to packages?
We can provide security in two ways
1. Package encryption
2.Password protection.
#  What are Precedence constraints?
Constraints that link executable, container, and tasks wIthin the package control flow and specify condItion that determine the sequence and condItions for determine whether executable run.


# What is Design time Deployment in SSIS ?
When you run a package from with inBIDS,it is built and temporarily deployed to the folder. By default the package will be deployed to the BIN folder in the Package’s Project folder and you can configure for custom folder for deployment. When the Package’s execution is completed and stopped in BIDS,the deployed package will be deleted and this is called as Design Time Deployment.


#Explain why variables called the most powerful component of SSIS.
Variable allows us to dynamically control the package at runtime. Example: You have some custom code or script that determines the query parameter’s value. Now, we cannot have fixed value for query parameter. In such scenarios, we can use variables and refer the variable to query parameter. We can use variables for like:
1. updating the properties at runtime, 
2. populating the query parameter value at runtime, 
3. used in script task, 
4. Error handling logic and 
5. With various looping logic.

# Can we add our custom code in SSIS?

We can customize SSIS through code by using Script Task. The main purpose of this task is to control the flow of the package. This is very useful in the scenario where the functionality you want to implement is not available in existing control flow item.
To add your own code:-
1. In control flow tab, drag and drop Script Task from toolbox. 
2. Double click on script task to open and select edit to open Script task editor. 
3. In script task editor, there are 3 main properties
    i.) General – Here you can specify name and description
    ii.) Script – through this we can add our code by clicking on Design Script button. The scripting language present is      VB.Net only.
    iii.) Expression

# Error Handling in SSIS?

An error handler allows us to create flows to handle errors in the package in quite an easy way. Through event handler tab, we can name the event on which we want to handle errors and the task that needs to be performed when such an error arises. We can also add sending mail functionality in event of any error through SMTP Task in Event handler. This is quite useful in event of any failure in office non-working hours. In Data flow, we can handle errors for each connection through following failure path or red arrow.

# What is a Precedence Constraint and what types of Precedence Constraint are there?

  • SSIS allows you to place as many as tasks you want to be placed in control flow. You can connect all these tasks using connectors called Precedence Constraints. Precedence Constraints allow you to define the logical sequence of tasks in the order they should be executed. You can also specify a condition to be evaluated before the next task in the flow is executed.
  • These are the types of precedence constraints and the condition could be either a constraint, an expression or both 
    • Success (next task will be executed only when the last task completed successfully) or
    • Failure (next task will be executed only when the last task failed) or
    • Complete (next task will be executed no matter the last task was completed or failed).

# What is the “RetainSameConnection” property and what is its impact?

Whenever a task uses a connection manager to connect to source or destination database, a connection is opened and closed with the execution of that task. Sometimes you might need to open a connection, execute multiple tasks and close it at the end of the execution. This is where RetainSameConnection property of the connection manager might help you. When you set this property to TRUE, the connection will be opened on first time it is used and remain open until execution of the package completes.
Generally used to Handle Temporary Table.

# What are the different SSIS log providers?

There are several places where you can log execution data generated by an SSIS event log:
    • SSIS log provider for Text files
    • SSIS log provider for Windows Event Log . for this 32 bit compilation needed.
    • SSIS log provider for XML files
    • SSIS log provider for SQL Profiler
    • SSIS log provider for SQL Server, which writes the data to the msdb..sysdtslog90 or msdb..sysssislog table depending on the SQL Server version.

# What is the transaction support feature in SSIS?

When you execute a package, every task of the package executes in its own transaction. What if you want to execute two or more tasks in a single transaction? This is where the transaction support feature helps. You can group all your logically related tasks in single group.  Next you can set the transaction property appropriately to enable a transaction so that all the tasks of the package run in a single transaction. This way you can ensure either all of the tasks complete successfully or if any of them fails, the transaction gets roll-backed too.

# What properties do you need to configure in order to use the transaction feature in SSIS?

  • Suppose you want to execute 5 tasks in a single transaction, in this case you can place all 5 tasks in a Sequence Container and set the TransactionOption and IsolationLevel properties appropriately.
    • The TransactionOption property expects one of these three values:
      • Supported - The container/task does not create a separate transaction, but if the parent object has already initiated a transaction then participate in it
      • Required - The container/task creates a new transaction irrespective of any transaction initiated by the parent object
      • NotSupported - The container/task neither creates a transaction nor participates in any transaction initiated by the parent object
  • Isolation level dictates how two more transaction maintains consistency and concurrency when they are running in parallel. To learn more about Transaction and Isolation Level, refer to this tip.

# When I enabled transactions in an SSIS package, it failed with this exception: "The Transaction Manager is not available. The DTC transaction failed to start."  What caused this exception and how can it be fixed?

SSIS uses the MS DTC (Microsoft Distributed Transaction Coordinator) Windows Service for transaction support.  As such, you need to ensure this service is running on the machine where you are actually executing the SSIS packages or the package execution will fail with the exception message as indicated in this question.

# What is the DisableEventHandlers property used for?

Consider you have a task or package with several event handlers, but for some reason you do not want event handlers to be called. One simple solution is to delete all of the event handlers, but that would not be viable if you want to use them in the future. This is where you can use the DisableEventHandlers property.  You can set this property to TRUE and all event handlers will be disabled. Please note with this property you simply disable the event handlers and you are not actually removing them.  This means you can set this value to FALSE and the event handlers will once again be executed.

# What is SSIS validation?

SSIS validates the package and all of it's tasks to ensure it has been configured correctly.  With a given set of configurations and values, all the tasks and package will execute successfully. In other words, during the validation process, SSIS checks if the source and destination locations are accessible and the meta data about the source and destination tables are stored with the package are correct, so that the task will not fail if executed. The validation process reports warnings and errors depending on the validation failure detected. For example, if the source/destination tables/columns get changed/dropped it will show as error.  Whereas if you are accessing more columns than used to write to the destination object this will be flagged as a warning. To learn about validation click here.

# Define design time validation versus run time validation.

Design time validation is performed when you are opening your package in BIDS whereas run time validation is performed when you are actually executing the package.

# Define early validation (package level validation) versus late validation (component level validation).

When a package is executed, the package goes through the validation process.  All of the components/tasks of package are validated before actually starting the package execution. This is called early validation or package level validation. During execution of a package, SSIS validates the component/task again before executing that particular component/task.  This is called late validation or component level validation.

# What is DelayValidation and what is the significance?

As I said before, during early validation all of the components of the package are validated along with the package itself. If any of the component/task fails to validate, SSIS will not start the package execution. In most cases this is fine, but what if the second task is dependent on the first task?  For example, say you are creating a table in the first task and referring to the same table in the second task? When early validation starts, it will not be able to validate the second task as the dependent table has not been created yet.  Keep in mind that early validation is performed before the package execution starts. So what should we do in this case?  How can we ensure the package is executed successfully and the logically flow of the package is correct?  This is where you can use the DelayValidation property.  In the above scenario you should set the DelayValidation property of the second task to TRUE in which case early validation i.e. package level validation is skipped for that task and that task would only be validated during late validation i.e. component level validation. Please note using the DelayValidation property you can only skip early validation for that specific task, there is no way to skip late or component level validation.

# What are the different components in the SSIS architecture?

  • The SSIS architecture comprises of four main components:
    • The SSIS runtime engine manages the workflow of the package
    • The data flow pipeline engine manages the flow of data from source to destination and in-memory transformations
    • The SSIS object model is used for programmatically creating, managing and monitoring SSIS packages
    • The SSIS windows service allows managing and monitoring packages
  • To learn more about the architecture click here.

# How is SSIS runtime engine different from the SSIS dataflow pipeline engine?

  • The SSIS Runtime Engine manages the workflow of the packages during runtime, which means its role is to execute the tasks in a defined sequence.  As you know, you can define the sequence using precedence constraints. This engine is also responsible for providing support for event logging, breakpoints in the BIDS designer, package configuration, transactions and connections. The SSIS Runtime engine has been designed to support concurrent/parallel execution of tasks in the package.
  • The Dataflow Pipeline Engine is responsible for executing the data flow tasks of the package. It creates a dataflow pipeline by allocating in-memory structure for storing data in-transit. This means, the engine pulls data from source, stores it in memory, executes the required transformation in the data stored in memory and finally loads the data to the destination. Like the SSIS runtime engine, the Dataflow pipeline has been designed to do its work in parallel by creating multiple threads and enabling them to run multiple execution trees/units in parallel.

# How is a synchronous (non-blocking) transformation different from an asynchronous (blocking) transformation in SQL Server Integration Services?

  • A transformation changes the data in the required format before loading it to the destination or passing the data down the path. The transformation can be categorized in Synchronous and Asynchronous transformation.
  • A transformation is called synchronous when it processes each incoming row (modify the data in required format in place only so that the layout of the result-set remains same) and passes them down the hierarchy/path. It means, output rows are synchronous with the input rows (1:1 relationship between input and output rows) and hence it uses the same allocated buffer set/memory and does not require additional memory. Please note, these kinds of transformations have lower memory requirements as they work on a row-by-row basis (and hence run quite faster) and do not block the data flow in the pipeline. Some of the examples are : Lookup, Derived Columns, Data Conversion, Copy column, Multicast, Row count transformations, etc.
  • A transformation is called Asynchronous when it requires all incoming rows to be stored locally in the memory before it can start producing output rows. For example, with an Aggregate Transformation, it requires all the rows to be loaded and stored in memory before it can aggregate and produce the output rows. This way you can see input rows are not in sync with output rows and more memory is required to store the whole set of data (no memory reuse) for both the data input and output. These kind of transformations have higher memory requirements (and there are high chances of buffer spooling to disk if insufficient memory is available) and generally runs slower. The asynchronous transformations are also called "blocking transformations" because of its nature of blocking the output rows unless all input rows are read into memory. To learn more about it click here.

# What is the difference between a partially blocking transformation versus a fully blocking transformation in SQL Server Integration Services?

  • Asynchronous transformations, as discussed in last question, can be further divided in two categories depending on their blocking behavior:
    • Partially Blocking Transformations do not block the output until a full read of the inputs occur.  However, they require new buffers/memory to be allocated to store the newly created result-set because the output from these kind of transformations differs from the input set. For example, Merge Join transformation joins two sorted inputs and produces a merged output. In this case if you notice, the data flow pipeline engine creates two input sets of memory, but the merged output from the transformation requires another set of output buffers as structure of the output rows which are different from the input rows. It means the memory requirement for this type of transformations is higher than synchronous transformations where the transformation is completed in place.
    • Full Blocking Transformations, apart from requiring an additional set of output buffers, also blocks the output completely unless the whole input set is read. For example, the Sort Transformation requires all input rows to be available before it can start sorting and pass down the rows to the output path. These kind of transformations are most expensive and should be used only as needed. For example, if you can get sorted data from the source system, use that logic instead of using a Sort transformation to sort the data in transit/memory. To learn more about it click here.

# What is an SSIS execution tree and how can I analyze the execution trees of a data flow task?

  • The work to be done in the data flow task is divided into multiple chunks, which are called execution units, by the dataflow pipeline engine.  Each represents a group of transformations. The individual execution unit is called an execution tree, which can be executed by separate thread along with other execution trees in a parallel manner. The memory structure is also called a data buffer, which gets created by the data flow pipeline engine and has the scope of each individual execution tree. An execution tree normally starts at either the source or an asynchronous transformation and ends at the first asynchronous transformation or a destination. During execution of the execution tree, the source reads the data, then stores the data to a buffer, executes the transformation in the buffer and passes the buffer to the next execution tree in the path by passing the pointers to the buffers. 
  • To see how many execution trees are getting created and how many rows are getting stored in each buffer for a individual data flow task, you can enable logging of these events of data flow task: PipelineExecutionTrees, PipelineComponentTime, PipelineInitialization, BufferSizeTunning, etc. 

# How can an SSIS package be scheduled to execute at a defined time or at a defined interval per day?

You can configure a SQL Server Agent Job with a job step type of SQL Server Integration Services Package, the job invokes the dtexec command line utility internally to execute the package. You can run the job (and in turn the SSIS package) on demand or you can create a schedule for a one time need or on a reoccurring basis.

# What is an SSIS Proxy account and why would you create it?

When we try to execute an SSIS package from a SQL Server Agent Job it fails with the message "Non-SysAdmins have been denied permission to run DTS Execution job steps without a proxy account". This error message is generated if the account under which SQL Server Agent Service is running and the job owner is not a sysadmin on the instance or the job step is not set to run under a proxy account associated with the SSIS subsystem. 

# How can you configure your SSIS package to run in 32-bit mode on 64-bit machine when using some data providers which are not available on the 64-bit platform?

In order to run an SSIS package in 32-bit mode the SSIS project property Run64BitRuntime needs to be set to "False".  The default configuration for this property is "True".  This configuration is an instruction to load the 32-bit runtime environment rather than 64-bit, and your packages will still run without any additional changes. The property can be found under SSIS Project Property Pages -> Configuration Properties -> Debugging.

# The data in the Flat File as follows:
"132","Ramakrishna"," " ,"Hyderabad"
"132","Radhika","17","Vangara"
How to remove double quotes from the file to process the data?

In the Flat File Connection Manager Editor, Enter double quotes in Text Qualifier text box:





# Will trigger fire when inserting data through SSIS package?

In the data flow task, go to the Advanced Editor of OLEDB Destination, and there should be a property "FastLoadOptions". Specify FIRE_TRIGGERS as an additional option


# What are the different types of Transformations you have worked

AGGEGATE -The Aggregate transformation applies aggregate functions to column values and copies the results to the transformation output. Besides aggregate functions, the transformation provides the GROUP BY clause, which you can use to specify groups to aggregate across.
The Aggregate Transformation supports following operations:
Group By, Sum, Average, Count, Count Distinct, Minimum, Maximum
AUDIT - Adds Package and Task level Metadata - such as Machine Name, Execution Instance, Package Name, Package ID, etc..
CHARACTER MAP - When it comes to string formatting in SSIS, Character Map transformation is very useful, used to convert data lower case, upper case.
CONDITIONAL SPLIT – used to split the input source data based on condition.
COPY COLUMN - Add a copy of column to the output, we can later transform the copy keeping the original for auditing.
DATA CONVERSION - Converts columns data types from one to another type. It stands for Explicit Column Conversion.
DATA MINING QUERY – Used to perform data mining query against analysis services and manage Predictions Graphs and Controls.
DERIVED COLUMN - Create a new (computed) column from given expressions.
EXPORT COLUMN – Used to export a Image specific column from the database to a flat file.
FUZZY GROUPING – Groups the rows in the dataset that contain similar values.
FUZZY LOOKUP - Used for Pattern Matching and Ranking based on fuzzy logic.
IMPORT COLUMN - Reads image specific column from database onto a flat file.
LOOKUP - Performs the lookup (searching) of a given reference object set against a data source. It is used to find exact matches only.
MERGE - Merges two sorted data sets of same column structure into a single output.
MERGE JOIN - Merges two sorted data sets into a single dataset using a join.
MULTI CAST - is used to create/distribute exact copies of the source dataset to one or more destination datasets.
ROW COUNT - Stores the resulting row count from the data flow / transformation into a variable.
ROW SAMPLING - Captures sample data by using a row count of the total rows in dataflow specified by rows or percentage.
UNION ALL - Merge multiple data sets into a single dataset.
PIVOT – Used for Normalization of data sources to reduce analomolies by converting rows into columns
UNPIVOT – Used for demoralizing the data structure by converts columns into rows incase of building Data Warehouses.

# How to enable Xp_CmdShell in Sql Server?
We can enable through either T-Sql or SQL Server Surface Area Configuration.
-- To allow advanced options to be changed.
EXEC sp_configure 'show advanced options', 1
GO-- To update the currently configured value for advanced options.RECONFIGURE
GO
-- To enable the feature.
EXEC sp_configure 'xp_cmdshell', 1
GO
-- To update the currently configured value for this feature.
RECONFIGURE
GO
# How to debug a package
For debugging a package, we have 3 options:
-Setting breakpoints in a package, container or Task
-Using Data Viewer in the Data flow task
-Setting Error redirection in the Data Flow task

# What is the LoggingMode property?
SSIS packages, tasks and containers have a property called LoggingMode. This property accepts 3 possible values:
Enabled - to enable logging of that component
Disabled - to disable logging of that component
UseParentSetting - to use parent's setting of that component to decide whether or not to log the data.



# How to configure Error Output in SSIS
We have 3 options to configure error output in ssis.
1. Ignore Failure
2. Redirect Row
3. Fail Component

# Incremental Load in SSIS
Using Slowly Changing Dimension
Using Lookup and Cache Transformation

# How to migrate Sql server 2005 Package to 2008 version
1. In BIDS, by right click on the "SSIS Packages" folder of an SSIS project and selecting "Upgrade All Packages".
2. Running "ssisupgrade.exe" from the command line (default physical location C:\Program Files\Microsoft SQL Server\100\DTS\Bin folder).
3. If you open a SSIS 2005 project in BIDS 2008, it will automatically launch the SSIS package upgrade wizard.
# What are Row Transformations, Partially Blocking Transformation,   Fully Blocking Transformation with examples.

In Row Transformation, each value is manipulated individually. In this transformation, the buffers can be re-used for other purposes like following:
OLEDB Datasource,         OLEDB Data Destinations
Other Row transformation within the package,     Other partially blocking transformations within the package.
examples of Row Transformations: Copy Column, Audit, Character Map
Partially Blocking Transformation:
These can re-use the buffer space allocated for available Row transformation and get new buffer space allocated exclusively for Transformation.
examples: Merge, Conditional Split, Multicast, Lookup, Import, Export Column
Fully Blocking Transformation:
It will make use of their own reserve buffer and will not share buffer space from other transformation or connection manager.
examples: Sort, Aggregate, Cache Transformation

# Difference between FileSystem and Sql server Deployment
File System Deployment: We can save the package on a physical location on hard drive or any shared folder with this option, and we should provide a full qualified path to stored package in the FileSystem option.
Sql Server Deployment: SSIS packages will be stored in the sysssispackages table of MSDB Database.


# Difference between Lookup and Fuzzy Lookup transformation
Lookup Transformation finds the exact match.
Fuzzy Lookup transformation matches input table with reference table. It finds the closest match and indicates the quality of the match.
# Difference between Full Cache and Partial Cache

Partial Cache: The lookup cache starts off empty at the beginning of the data flow. When a new row comes in, the lookup transform checks its cache for the matching values. If no match is found, it queries the database. If the match is found at the database, the values are cached so they can be used the next time a matching row comes in.

Full Cache: The default cache mode for lookup is Full cache. The database is queried once during the pre-execute phase of the data flow. The entire reference set is pulled into memory. This approach uses most of the memory. Caching takes place before any rows are read from the data flow source. Lookup operations will be very fast during execution.

# Cache Transformation
Cache Transformation: to cache the data used in the Lookup transform.
When to use Full cache mode:
When you're accessing a large portion of your reference set
When you have a small reference table
When your database is remote or under heavy load, and you want to reduce the number of queries sent to the server
When to use Partial cache mode:
When you're processing a small number of rows and it's not worth the time to charge the full cache.
When you have a large reference table.
When your data flow is adding new rows to your reference table.
When you want to limit the size of your reference table by modifying query with parameters from the data flow.

# Explain Slowly Changing Dimesnion (SCD)
The SCD concept is basically about how the data modifications are absorbed and maintained in a Dimension Table.
The new (modified) record and the old record(s) are identified using some kind of a flag like say IsActive, IsDeleted etc. or using Start and End Date fields to
indicate the validity of the record.

Types:
Type 1 - update the columns in the dimension row without preserving any change history.
Type 2 - preserve the change history in the dimension table and create a new row when there are changes.
Type 3 - some combination of Type 1 and Type 2, usually maintaining multiple instances of a column in the dimension row; e.g. a current value and one or more previous values.

# Different types of File Enumerators
Foreach ADO:
The ADO Enumerator enumerates rows in a table. For example, we can get the rows in the ADO records.The variable must be of Object data type.
Foreach ADO.NET Schema Rowset:
The ADO.Net Enumerator enumerates the schema information. For example, we can get the table from the database.
Foreach File:
The File Enumerator enumerates files in a folder. For example, we can get all the files which have the *.txt extension in a windows folder and its sub folders.
Foreach From Variable:
The Variable Enumerator enumerates objects that specified variables contain. Here enumerator objects are nothing but an array or data table.
Foreach Item:
The Item Enumerator enumerates the collections. For example, we can enumerate the names of executables and working directories that an “Execute Process” task uses.
Foreach Nodelist:
The Node List Enumerator enumerates the result of an XPath expression.
Foreach SMO:
The SMO Enumerator enumerates SQL Server Management Objects (SMO). For example, we can get the list of functions or views in a SQL Server database.

# What are containers? (For loop, Sequence Container)
SSIS Containers are controls that provide structure to SSIS packages. Containers support repeating control flows in packages and they group tasks and containers into meaningful units of work. Containers can include other containers in addition to tasks. 
Container Type
Container Description
Purpose of SSIS Container
Foreach Loop Container
This container runs a Control Flow repeatedly using an enumerator.
To repeat tasks for each element in a collection, for example retrieve files from a folder, running T-SQL statements that reside in multiple files, or running a command for multiple objects.
For Loop Container
This container runs a Control Flow repeatedly by checking  conditional expression (same as For Loop in programming language).
To repeat tasks until a specified expression evaluates to false. For example, a package can send a different e-mail message seven times, one time for every day of the week.
Sequence Container
Groups tasks as well as containers into Control Flows that are subsets of the package Control Flow.
This container group tasks and containers that must succeed or fail as a unit. For example, a package can group tasks that delete and add rows in a database table, and then commit or roll back all the tasks when one fails.

# Performance Optimization in SSIS ***

1. Avoid Asynchronous Transformation (Sort T/F) wherever possible. Sort T/F required all the incoming rows to be arrivaed before start processing. Instead of using Sort T/F, we get sorted rows from datasource using ORDER By clause.

2. Pulling High Volumes of Data
Drop all Non-Clustered Indexes and Clustered Index if exists, then Transfer and load the data into Destination Table. Create Clustered Index and Non-clustered indexes.

3. Avoid SELECT *
DataFlow Task uses buffer oriented architecture for data transfer and transformation. When data transfer from Source to Destination, the data first comes into the buffer, required transformations are done in the buffer itself and then written to Destination.
The size of buffer depends on the estimated row size. The estimated row size is equal to the maximum size of all columns in the row. So the more columns in a row means less number of rows in a buffer. Hence select only those columns which are required at the destination.
Even if we need all the columns from source, we should use the column name specifically in the SELECT statement, otherwise it takes another round for the source to gather meta-data about the columns when u are using SELECT *.

4. Effect of OLEDB Destination Settings
There are couple of settings with OLEDB destination which can impact the performance of data transfer as listed below.
Data Access Mode – This setting provides the 'fast load' option which internally uses a BULK INSERT statement for uploading data into the destination table instead of a simple INSERT statement (for each single row) as in the case for other options. So unless you have a reason for changing it, don't change this default value of fast load. If you select the 'fast load' option, there are also a couple of other settings which you can use as discussed below.

Keep Identity – By default this setting is unchecked which means the destination table (if it has an identity column) will create identity values on its own. If you check this setting, the dataflow engine will ensure that the source identity values are preserved and same value is inserted into the destination table.
Keep Nulls – Again by default this setting is unchecked which means default value will be inserted (if the default constraint is defined on the target column) during insert into the destination table if NULL value is coming from the source for that particular column. If you check this option then default constraint on the destination table's column will be ignored and preserved NULL of the source column will be inserted into the destination.

Table Lock
 – By default this setting is checked and the recommendation is to let it be checked unless the same table is being used by some other process at same time. It specifies a table lock will be acquired on the destination table instead of acquiring multiple row level locks, which could turn into lock escalation problems.
Check Constraints – Again by default this setting is checked and recommendation is to un-check it if you are sure that the incoming data is not going to violate constraints of the destination table. This setting specifies that the dataflow pipeline engine will validate the incoming data against the constraints of target table. If you un-check this option it will improve the performance of the data load.

#5 - Effect of Rows Per Batch and Maximum Insert Commit Size Settings:
Rows per batch:
The default value for this setting is -1 which specifies all incoming rows will be treated as a single batch. You can change this default behavior and break all incoming rows into multiple batches. The allowed value is only positive integer which specifies the maximum number of rows in a batch.

#6-Maximum insert commit size:
The default value for this setting is '2147483647' (largest value for 4 byte integer type) which specifies all incoming rows will be committed once on successful completion. You can specify a positive value for this setting to indicate that commit will be done for those number of records. Changing the default value for this setting will put overhead on the dataflow engine to commit several times. Yes that is true, but at the same time it will release the pressure on the transaction log and tempdb to grow specifically during high volume data transfers.

The above two settings are very important to understand to improve the performance of tempdb and the transaction log. For example if you leave 'Max insert commit size' to its default, the transaction log and tempdb will keep on growing during the extraction process and if you are transferring a high volume of data the tempdb will soon run out of memory as a result of this your extraction will fail. So it is recommended to set these values to an optimum value based on your environment.

#7 - DefaultBufferSize and DefaultBufferMaxRows :
The execution tree creates buffers for storing incoming rows and performing transformations.

The number of buffer created is dependent on how many rows fit into a buffer and how many rows fit into a buffer dependent on few other factors. The first consideration is the estimated row size, which is the sum of the maximum sizes of all the columns from the incoming records. The second consideration is the DefaultBufferMaxSize property of the data flow task. This property specifies the default maximum size of a buffer. The default value is 10 MB and its upper and lower boundaries are constrained by two internal properties of SSIS which are MaxBufferSize (100MB) and MinBufferSize (64 KB). It means the size of a buffer can be as small as 64 KB and as large as 100 MB. The third factor is, DefaultBufferMaxRows which is again a property of data flow task which specifies the default number of rows in a buffer. Its default value is 10000.

If the size exceeds the DefaultBufferMaxSize then it reduces the rows in the buffer. For better buffer performance you can do two things.
First you can remove unwanted columns from the source and set data type in each column appropriately, especially if your source is flat file. This will enable you to accommodate as many rows as possible in the buffer.
Second, if your system has sufficient memory available, you can tune these properties to have a small number of large buffers, which could improve performance. Beware if you change the values of these properties to a point where page spooling (see Best Practices #8) begins, it adversely impacts performance. So before you set a value for these properties, first thoroughly testing in your environment and set the values appropriately.

#8 - How DelayValidation property can help you
SSIS uses two types of validation.
First is package validation (early validation) which validates the package and all its components before starting the execution of the package.
Second SSIS uses component validation (late validation), which validates the components of the package once started.

Let's consider a scenario where the first component of the package creates an object i.e. a temporary table, which is being referenced by the second component of the package. During package validation, the first component has not yet executed, so no object has been created causing a package validation failure when validating the second component. SSIS will throw a validation exception and will not start the package execution. So how will you get this package running in this common scenario?

To help you in this scenario, every component has a DelayValidation (default=FALSE) property. If you set it to TRUE, early validation will be skipped and the component will be validated only at the component level (late validation) which is during package execution
9. Better performance with parallel execution
10. When to use events logging and when to avoid.

11. Monitoring the SSIS Performance with Performance Counters
Launch Performance Monitor:
1. Start -> All Programs -> Administrative Tools -> Performance
2. Load the SSIS related Counters
In the Performance Object, select SQL Server:SSIS Pipeline and SQL Server:SSIS Service.

SSIS provide a set of performance counters. Among them, the following few are helpful when you tune or debug your package:
Buffers in use
Flat buffers in use
Private buffers in use
Buffers spooled
Rows read
Rows written

“Buffers in use”, “Flat buffers in use” and “Private buffers in use” are useful to discover leaks. During package execution time, we will see these counters fluctuating. But once the package finishes execution, their values should return to the same value as what they were before the execution. Otherwise, buffers are leaked.

“Buffers spooled” has an initial value of 0. When it goes above 0, it indicates that the engine has started memory swapping. In a case like this, set Data Flow Task properties BLOBTempStoragePath and BufferTempStoragePath appropriately for maximal I/O bandwidth.
Buffers Spooled: The number of buffers currently written to the disk. If the data flow engine runs low on physical memory, buffers not currently used are written to disk and then reloaded when needed.

“Rows read” and “Rows written” show how many rows the entire Data Flow has processed.
12. FastParse property
Fast Parse option in SSIS can be used for very fast loading of flat file data. It will speed up parsing of integer, date and time types if the conversion does not have to be locale-sensitive. This option is set on a per-column basis using the Advanced Editor for the flat file source.

13. Checkpoint features helps in package restarting

# Upgrade DTS package to SSIS
1. In BIDS, from the Project Menu, select 'Migrate DTS 2000 Package'
2. In the Package Migration Wizard, choose the Source, Sql Server 2000 Server Name, Destination folder.
3. Select the List of packages that needs to be upgraded to SSIS
4. Specifty the Log file for Package Migration
# Events in SSIS
OnError : Runs when a task or container reports an error.
OnExecStatusChanged : Runs for all tasks and containers when the execution status changes to In Process, Success, or Failed.
OnInformation : Runs when SSIS outputs information messages during the validation and execution of a task or container.
OnPostExecute : Runs after a container or task successfully completes.
OnPostValidate : Executes after a container or task has successfully been validated.
OnPreExecute : Runs just before a container or task is executed.
OnPreValidate: Runs before the component is validated by the engine.
OnProgress : Executed when a progress message is sent by the SSIS engine, indicating tangible advancement of the task or container.
OnQueryCancel : Invoked when an Execute SQL Task is cancelled through manual intervention, such as stopping the package.
OnTaskFailed : Similar to OnError, but runs when a task fails rather than each time an error occurs.
OnVariableValueChanged: Runs when the value changes in a variable for which the RaiseChangeEvent property is set to True.
OnWarning Runs: when a task returns a warning event such as a column not being used in a data flow.



# Different ways to execute SSIS package
1. Using the Execute Package Utility (DTEXECUI.EXE) graphical interface one can execute an SSIS package that is stored in a File System, SQL Server or an SSIS Package Store.

DTEXECUI  provides a graphical user interface that can be used to specify the various options to be set when executing an SSIS package.  You can launch DTEXECUI by double-clicking on an SSIS package file (.dtsx).  You can also launch DTEXECUI from a Command Prompt then specify the package to execute.
2. Using the DTEXEC.EXE command line utility one can execute an SSIS package that is stored in a File System, SQL Server or an SSIS Package Store. The syntax to execute a SSIS package which is stored in a File System is shown below.
DTEXEC.EXE /F "C:\BulkInsert\BulkInsertTask.dtsx"

3. Test the SSIS package execution by running the package from BIDS:
-In Solution Explorer, right click the SSIS project folder that contains the package which you want to run and then click properties.
- In the SSIS Property Pages dialog box, select Build option under the Configuration Properties node and in the right side panel, provide the folder location where you want the SSIS package to be deployed within the OutputPath. Click OK to save the changes in the property page.
-Right click the package within Solution Explorer and select Execute Package option from the drop down menu 


4. Sql Server Agent:
Drill down to the SQL Server Agent node in the Object Explorer.  Right click on the Jobs node and select New Job from the popup menu.

The first step to setting up the proxy is to create a credential (alternatively you could use an existing credential).  Navigate to Security then Credentials in SSMS Object Explorer and right click to create a new credential

Navigate to SQL Server Agent then Proxies in SSMS Object Explorer and right click to create a new proxy

# How to deploy packages from one server to another server
1.To copy the deployment bundle

Locate the deployment bundle on the first server.
If you used the default location, the deployment bundle is the Bin\Deployment folder.
Right-click the Deployment folder and click Copy.
Locate the public share to which you want to copy the folder on the target computer and click Paste.
2: Running the Package Installation Wizard
1. On the destination computer, locate the deployment bundle.
2. In the Deployment folder, double-click the manifest file, Project1.SSISDeploymentManifest.
3. On the Welcome page of the Package Installation Wizard, click Next.
4. On the Deploy SSIS Packages page, select either File sytem or SQL Server deployment option, select the "Validate packages after installation" check box, and then click Next.
5. On the Specify Target SQL Server page, specify (local), in the Server name box.
6. If the instance of SQL Server supports Windows Authentication, select Use Windows Authentication; otherwise, select Use SQL Server Authentication and provide a user name and a password.
7. Verify that the "Rely on server storage for encryption" check box is cleared.
Click Next.
8. On the Select Installation Folder page, click Browse.
9. On the Confirm Installation page, click Next.
10. The wizard installs the packages. After installation is completed, the Configure Packages page opens.


3 comments:

  1. that is great and hard work goooooooooooooooooooddddddddd may god bless u

    ReplyDelete
  2. Awesome Joydeep!!
    That's really great collection.
    Keep up the great work sir.

    ReplyDelete
  3. Awesome Joydeep!!
    That's really great collection.
    Keep up the great work sir.

    ReplyDelete