ssis interview question

What is SQL Server Integration Services (SSIS)?

SQL Server Integration Services (SSIS) is component of SQL Server 2005 and later versions. SSIS is an enterprise scale ETL (Extraction, Transformation and Load) tool which allows you to develop data integration and workflow solutions. Apart from data integration, SSIS can be used to define workflows to automate updating multi-dimensional cubes and automating maintenance tasks for SQL Server databases.

How does SSIS differ from DTS?

SSIS is a successor to DTS (Data Transformation Services) and has been completely re-written from scratch to overcome the limitations of DTS which was available in SQL Server 2000 and earlier versions. A significant improvement is the segregation of the control/work flow from the data flow and the ability to use a buffer/memory oriented architecture for data flows and transformations which improve performance.

What is the Control Flow?

When you start working with SSIS, you first create a package which is nothing but a collection of tasks or package components. The control flow allows you to order the workflow, so you can ensure tasks/components get executed in the appropriate order.

What is the Data Flow Engine?

The Data Flow Engine, also called the SSIS pipeline engine, is responsible for managing the flow of data from the source to the destination and performing transformations (lookups, data cleansing etc.). Data flow uses memory oriented architecture, called buffers, during the data flow and transformations which allows it to execute extremely fast. This means the SSIS pipeline engine pulls data from the source, stores it in buffers (in-memory), does the requested transformations in the buffers and writes to the destination. The benefit is that it provides the fastest transformation as it happens in memory and we don't need to stage the data for transformations in most cases.

What is a Transformation?

A transformation simply means bringing in the data in a desired format. For example you are pulling data from the source and want to ensure only distinct records are written to the destination, so duplicates are removed. Anther example is if you have master/reference data and want to pull only related data from the source and hence you need some sort of lookup. There are around 30 transformation tasks available and this can be extended further with custom built tasks if needed.

What is a Task?

A task is very much like a method of any programming language which represents or carries out an individual unit of work. There are broadly two categories of tasks in SSIS, Control Flow tasks and Database Maintenance tasks. All Control Flow tasks are operational in nature except Data Flow tasks. Although there are around 30 control flow tasks which you can use in your package you can also develop your own custom tasks with your choice of .NET programming language.

What is a Precedence Constraint and what types of Precedence Constraint are there?

SSIS allows you to place as many as tasks you want to be placed in control flow. You can connect all these tasks using connectors called Precedence Constraints. Precedence Constraints allow you to define the logical sequence of tasks in the order they should be executed. You can also specify a condition to be evaluated before the next task in the flow is executed.
These are the types of precedence constraints and the condition could be either a constraint, an expression or both
- Success (next task will be executed only when the last task completed successfully) or
- Failure (next task will be executed only when the last task failed) or
- Complete (next task will be executed no matter the last task was completed or failed).

What is a container and how many types of containers are there?

A container is a logical grouping of tasks which allows you to manage the scope of the tasks together.
These are the types of containers in SSIS:
- Sequence Container - Used for grouping logically related tasks together
- For Loop Container - Used when you want to have repeating flow in package
- For Each Loop Container - Used for enumerating each object in a collection; for example a record set or a list of files.
Apart from the above mentioned containers, there is one more container called the Task Host Container which is not visible from the IDE, but every task is contained in it (the default container for all the tasks).

What are variables and what is variable scope?

A variable is used to store values. There are basically two types of variables, System Variable (like ErrorCode, ErrorDescription, PackageName etc) whose values you can use but cannot change and User Variable which you create, assign values and read as needed. A variable can hold a value of the data type you have chosen when you defined the variable.
Variables can have a different scope depending on where it was defined. For example you can have package level variables which are accessible to all the tasks in the package and there could also be container level variables which are accessible only to those tasks that are within the container.

What are SSIS Connection Managers?

When we talk of integrating data, we are actually pulling data from different sources and writing it to a destination. But how do you get connected to the source and destination systems? This is where the connection managers come into the picture. Connection manager represent a connection to a system which includes data provider information, the server name, database name, authentication mechanism, etc.

What is the RetainSameConnection property and what is its impact?

Whenever a task uses a connection manager to connect to source or destination database, a connection is opened and closed with the execution of that task. Sometimes you might need to open a connection, execute multiple tasks and close it at the end of the execution. This is where RetainSameConnection property of the connection manager might help you. When you set this property to TRUE, the connection will be opened on first time it is used and remain open until execution of the package completes.

What are a source and destination adapters?

A source adaptor basically indicates a source in Data Flow to pull data from. The source adapter uses a connection manager to connect to a source and along with it you can also specify the query method and query to pull data from the source.
Similar to a source adaptor, the destination adapter indicates a destination in the Data Flow to write data to. Again like the source adapter, the destination adapter also uses a connection manager to connect to a target system and along with that you also specify the target table and writing mode, i.e. write one row at a time or do a bulk insert as well as several other properties.
Please note, the source and destination adapters can both use the same connection manager if you are reading and writing to the same database.

What is the Data Path and how is it different from a Precedence Constraint?

Data Path is used in a Data Flow task to connect to different components of a Data Flow and show transition of the data from one component to another. A data path contains the meta information of the data flowing through it, such as the columns, data type, size, etc. When we talk about differences between the data path and precedence constraint; the data path is used in the data flow, which shows the flow of data. Whereas the precedence constraint is used in control flow, which shows control flow or transition from one task to another task.

What is a Data Viewer utility and what it is used for?

The data viewer utility is used in Business Intelligence Development Studio during development or when troubleshooting an SSIS Package. The data viewer utility is placed on a data path to see what data is flowing through that specific data path during execution. The data viewer utility displays rows from a single buffer at a time, so you can click on the next or previous icons to go forward and backward to display data.

What is an SSIS breakpoint? How do you configure it? How do you disable or delete it?

A breakpoint allows you to pause the execution of the package in Business Intelligence Development Studio during development or when troubleshooting an SSIS Package. You can right click on the task in control flow, click on Edit Breakpoint menu and from the Set Breakpoint window, you specify when you want execution to be halted/paused. For example OnPreExecute, OnPostExecute, OnError events, etc. To toggle a breakpoint, delete all breakpoints and disable all breakpoints go to the Debug menu and click on the respective menu item. You can event specify different conditions to hit the breakpoint as well.

What is SSIS event logging?

Like any other modern programming language, SSIS also raises different events during package execution life cycle. You can enable or write these events to trace the execution of your SSIS package and its tasks. You can also can write your custom message as a custom log. You can enable event logging at the package level as well as at the tasks level. You can also choose any specific event of a task or a package to be logged. This is essential when you are troubleshooting your package and trying to understand a performance problem or root cause of a failure.

What are the different SSIS log providers?

There are several places where you can log execution data generated by an SSIS event log:
- SSIS log provider for Text files
- SSIS log provider for Windows Event Log
- SSIS log provider for XML files
- SSIS log provider for SQL Profiler
- SSIS log provider for SQL Server, which writes the data to the msdb..sysdtslog90 or msdb..sysssislog table depending on the SQL Server version.

How do you enable SSIS event logging?

SSIS provides a granular level of control in deciding what to log and where to log. To enable event logging for an SSIS Package, right click in the control flow area of the package and click on Logging. In the Configure SSIS Logs window you will notice all the tasks of the package are listed on the left side of the tree view. You can specifically choose which tasks you want to enable logging. On the right side you will notice two tabs; on the Providers and Logs tab you specify where you want to write the logs, you can write it to one or more log providers together. On the Details tab you can specify what events do you want to log for the selected task.
Please note, enabling event logging is immensely helpful when you are troubleshooting a package, but also incurs additional overhead on SSIS in order to log the events and information. Hence you should only enabling event logging when needed and only choose events which you want to log. Avoid logging all the events unnecessarily.

What is the LoggingMode property?

SSIS packages and all of the associated tasks or components have a property called LoggingMode. This property accepts three possible values: Enabled - to enable logging of that component, Disabled - to disable logging of that component and UseParentSetting - to use parent's setting of that component to decide whether or not to log the data.

Difference between control flow and data flow

Control flow deals with orderly processing of individual, isolated tasks, these tasks are linked through precedence constraints in random order. Also the output for task has finite outcome i.e., Success, Failure, or Completion. A subsequent task does not initiate unless its predecessor has completed. Data flow, on the other hand, streams the data in pipeline manner from its source to a destination and modifying it in between by applying transformations. Another distinction between them is the absence of a mechanism that would allow direct transfer of data between individual control flow tasks. On the other hand, data flow lacks nesting capabilities provided by containers.

	Control Flow	Data Flow
	Process Oriented	Data Oriented
Made up of	Tasks and Container	Source, Transformation and Destination
Connected through	Precedence constraint	Paths
Smallest unit	Task	Component
Outcome	Finite- Success, Failure, Completion	Not fixed

If you want to send some data from Access database to SQL server database. What are different component of SSIS will you use?

In the data flow, we will use one OLE DB source, data conversion transformation and one OLE DB destination or SQL server destination. OLE DB source is data source is useful for reading data from Oracle, SQL Server and Access databases. Data Conversion transformation would be needed to remove datatype abnormality since there is difference in datatype between the two databases (Access and SQL Server) mentioned. If our database server is stored on and package is run from same machine, we can use SQL Server destination otherwise we need to use OLE DB destination. The SQL Server destination is the destination that optimizes the SQL Server.

Difference and similarity between merge and merge join transformation

	Merge Transofrmations	Merge Join Transformation
	The data from 2 input paths are merged into one	The data from 2 inputs are merged based on some common key.
Works as	UNION	JOIN (LEFT, RIGHT OR FULL)
Supports	2 Datasets	1 Dataset
Columns	Metadata for all columns needs to be same	Key columns metadata needs to be same.
Pre-requisites	Data must be sorted. Merged columns should have same datatype i.e. if merged column is EmployeeName with string of 25 character in Input 1, it can be of less than or equal to 25 characters for merging to happen.	Data must be sorted. Merged columns should have same datatype i.e. if merged column is EmployeeName with string of 25 character in Input 1, it can be of less than or equal to 25 characters for merging to happen.
Limitations	Only 2 input paths can be merged. Does not support error handling.	Does not support error handling.
Use	Merging of data from 2 data source Can create complex datasets using nesting merge transformation,	When data from 2 tables having foreign key relationship needs to present based on common key.

What is precedence constraint?

A precedence constraint is a link between 2 control flow tasks and lays down the condition on which the second task is run. They are used to control the workflow of the package. There are 3 kinds of precedence constraint – success (green arrow), failure (red arrow) or Completion script task (blue arrow). By default, when we add 2 tasks, it links by green arrow. The way the precedence constraint is evaluated can be based on outcome of the initial task. Also, we can add expression to evaluate such outcome. Any expression that can be judged as true or false can be used for such purpose. The precedence constraint is very useful in error handling in SSIS package.

Your browser may not support display of this image.

Explain why variables called the most powerful component of SSIS.

Variable allows us to dynamically control the package at runtime. Example: You have some custom code or script that determines the query parameter’s value. Now, we cannot have fixed value for query parameter. In such scenarios, we can use variables and refer the variable to query parameter. We can use variables for like:
1. updating the properties at runtime,
2. populating the query parameter value at runtime,
3. used in script task,
4. Error handling logic and
5. With various looping logic.

Can we add our custom code in SSIS?

We can customize SSIS through code by using Script Task. The main purpose of this task is to control the flow of the package. This is very useful in the scenario where the functionality you want to implement is not available in existing control flow item.

To add your own code:-
1. In control flow tab, drag and drop Script Task from toolbox.
2. Double click on script task to open and select edit to open Script task editor.
3. In script task editor, there are 3 main properties
i.) General – Here you can specify name and description
ii.) Script – through this we can add our code by clicking on Design Script button. The scripting language present is VB.Net only.
iii.) Expression

What is conditional split?

As the name suggest, this transformation splits the data based on condition and route them to different path. The logic for this transformation is based on CASE statement. The condition for this transformation is an expression. This transformation also provides us with default output, where rows matching no condition are routed. Conditional split is useful in scenarios like Telecom industry data you want to divide the customer data on gender, condition would be:

GENDER == ‘F’

Explain the use of containers in SSIS and also their types.

Containers can be defined as objects that stores one or more tasks. The primary purpose of container is grouping logically related tasks. Once the task is placed into the containers, we can perform various operations such as looping on container level until the desired criterion is met. Nesting of container is allowed. Container is placed inside the control flow.

There are 4 types of Container:-
1. Task Host container- Only one task is placed inside the container. This is default container.
2. Sequence Container – This container can be defined as subset of package control flow.
3. For loop container – Allows looping based on condition. Runs a control flow till condition is met.
4. For each loop container - Loop through container based on enumerator.

Why is the need for data conversion transformations?

This transformation converts the datatype of input columns to different datatype and then route the data to output columns.

This transformation can be used to:
1. Change the datatype
2. If datatype is string then for setting the column length
3. If datatype is numeric then for setting decimal precision.

This data conversion transformation is very useful where you want to merge the data from different source into one. This transformation can remove the abnormality of the data. Example à The Company’s offices are located at different part of world. Each office has separate attendance tracking system in place. Some offices stores data in Access database, some in Oracle and some in SQL Server. Now you want to take data from all the offices and merged into one system. Since the datatypes in all these databases vary, it would be difficult to perform merge directly. Using this transformation, we can normalize them into single datatype and perform merge.

Error Handling in SSIS?

An error handler allows us to create flows to handle errors in the package in quite an easy way. Through event handler tab, we can name the event on which we want to handle errors and the task that needs to be performed when such an error arises. We can also add sending mail functionality in event of any error through SMTP Task in Event handler. This is quite useful in event of any failure in office non-working hours. In Data flow, we can handle errors for each connection through following failure path or red arrow.

KANSIRIS