Top 22 Azure Data Factory (ADF) Interview Questions and Answers for 2025

ADF Interview Questions

Azure Data Factory is Microsoft's serverless data integration platform that allows users to create and manage workflows for data movement and transformation at scale. It provides seamless connectivity to a wide range of data sources and follows the ETL process. The interview questions below will help you prepare for an ADF role. You can gain practical knowledge by enrolling with Console Flare, where you will work on real scenarios under the guidance of industry experts. Here we list the top 22 basic to advanced ADF interview questions that will help you present yourself as a capable and motivated candidate.


Question 1: What is ADF and what are the core components of ADF?

Answer: Azure Data Factory is a cloud-based data integration service provided by Microsoft. With Data Factory, you can create, schedule, and orchestrate data pipelines that follow the ETL (Extract, Transform, and Load) process.

ADF can connect to multiple data sources and several services. In these pipelines, you can ingest data, perform transformations, and prepare data to load into a database.

Key Components of Azure Data Factory

1. Pipeline: A set of activities performed to accomplish a particular task is referred to as a pipeline. For example, a pipeline can extract, transform, and load data into the desired destination (see the JSON sketch after this list). Pipelines are reusable and flexible: you can execute them at a scheduled time, trigger them on a particular event, or trigger them manually.
2. Activities: The individual tasks performed within a pipeline, such as copying data, executing a stored procedure, running data flows, or executing other pipelines. Examples include the Copy activity, Lookup activity, and ForEach activity.

3. Datasets:

A dataset represents the data structure within a source or destination. Datasets are used to define data schema and location, making it easy to configure source and target endpoints.

4. Linked Services

    • Linked Services act as connection settings to data sources and destinations, such as Azure Blob Storage, SQL Server, REST APIs, and more.
    • They store authentication and configuration details necessary for accessing data systems.

5. Triggers

      • Triggers initiate pipeline executions, either on a schedule, in response to an event or based on a time window.
      • ADF supports Schedule Triggers, Tumbling Window Triggers, and Event-Based Triggers.

6. Data Flows:

  • Mapping Data Flow: Allows complex data transformations without requiring code, using a visual drag-and-drop interface.
  • Wrangling Data Flow: Provides data wrangling capabilities similar to Power Query, allowing for quick data exploration.
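
Putting these components together, here is a minimal, illustrative pipeline definition with a single Copy activity (all names such as CopySalesPipeline, SalesCsvDataset, and SalesSqlDataset are invented for the example):

{
    "name": "CopySalesPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopySalesData",
                "type": "Copy",
                "inputs": [ { "referenceName": "SalesCsvDataset", "type": "DatasetReference" } ],
                "outputs": [ { "referenceName": "SalesSqlDataset", "type": "DatasetReference" } ],
                "typeProperties": {
                    "source": { "type": "DelimitedTextSource" },
                    "sink": { "type": "AzureSqlSink" }
                }
            }
        ]
    }
}

The Copy activity reads from the dataset referenced in inputs and writes to the dataset in outputs; both datasets in turn point at linked services, which are covered in the next two questions.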
Question 2: What is a dataset?

Answer: A dataset is a structural representation of the data that you use in your pipeline. It specifies the data format, such as CSV, Excel, JSON, or a SQL table, along with schema and path details. Each dataset is connected to a linked service, which provides the required connection details such as authentication and endpoint information to access the data source. There are two types of datasets: input datasets and output datasets. An input dataset is read by an activity in the pipeline; an output dataset is written by an activity in the pipeline.
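
As an illustration (dataset and linked service names are invented), a delimited-text dataset for a CSV file in Blob Storage might be defined like this:

{
    "name": "SalesCsvDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": { "referenceName": "BlobStorageLinkedService", "type": "LinkedServiceReference" },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "sales",
                "fileName": "sales.csv"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
}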

Question 3: Explain the difference between a linked service and a dataset.

Ans. Linked services and datasets are essential components in ADF. A linked service provides the connection details required to access an external data source. A linked service is reusable and can be used across multiple pipelines. A dataset, on the other hand, provides structural information about the data, along with the schema and file path, within the linked service.
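
To make the contrast concrete, here is an illustrative Azure Blob Storage linked service that the dataset sketch from Question 2 could reference (account name and key are placeholders):

{
    "name": "BlobStorageLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
        }
    }
}

The linked service holds only connection and authentication details; the dataset layers the container, file name, format, and schema on top of it.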

Question 4: What is Integration Runtime?

Ans. Integration Runtime (IR) is the compute infrastructure that ADF uses to execute data integration operations. It is essential for connecting to data sources, performing transformations, and executing pipelines.

Types of Integration Runtime:

  • Azure Integration Runtime

It is the default option, fully managed by Azure. It is used for cloud-based data movement (between Azure services, e.g., Blob Storage and SQL Database) and transformation, and it can copy data across cloud regions, for example copying data from an Azure Blob Storage container to an Azure SQL Database.

  • Self-hosted integration runtime 

A self-hosted integration runtime is one that the user installs and manages on a local server or virtual machine. It is used for secure data movement between systems and is designed for on-premise and hybrid data movement and transformation, for example copying data from an on-premise database to an Azure Data Lake.

  • Azure-SSIS Integration Runtime

This is mainly designed for running SQL Server Integration Services (SSIS) packages in the Azure environment. It processes ETL jobs in the cloud using familiar SSIS tools and provides a fully managed, scalable environment for SSIS execution.

Question 5: Difference between parameter and variable in ADF.

Ans. Parameters and variables are used to store values in pipelines, datasets, or data flows.

Parameter 

A parameter is used to pass an external value into a pipeline, dataset, or data flow. Parameters are read-only: once assigned, they cannot be changed during execution. They are defined at the pipeline level or within datasets/data flows, and their values must be set before execution, for example through a trigger, a debug run, or manual execution. Parameters are ideal for dynamic configuration, such as a file path, table name, or filter condition based on input values, and they can be used in dynamic expressions to make components reusable across different scenarios. They are defined in the parameters section of the pipeline through the pipeline editor, trigger definitions, REST API calls, or custom code orchestration.

Example

Define a parameter FileName:

"parameters": {
    "FileName": {
        "type": "String",
        "defaultValue": "sample.csv"
    }
}

Use it in a dataset linked to a Blob Storage file path:

@concat('container/', pipeline().parameters.FileName)

Variables in Azure Data Factory

Variables store values temporarily in Azure Data Factory during pipeline execution. Variables are mutable and can be updated with the Set Variable and Append Variable activities. Variables are defined and used within the same pipeline. They are particularly useful for dynamic control-flow logic, as they help make decisions at runtime, such as whether to execute specific activities or repeat certain steps. Variables are defined in the Variables section of the pipeline and support the String, Boolean, and Array data types.

Define a pipeline variable FileCounter (declared as a String, since numeric variable types are not supported):

"variables": {
    "FileCounter": {
        "type": "String",
        "defaultValue": "0"
    }
}

Update the variable during execution:

Use the Set Variable activity to increment the counter:

@string(add(int(variables('FileCounter')), 1))

Question 6: What are triggers? Explain the different types of triggers in ADF. How can you schedule a pipeline in ADF?

Ans. Triggers are used to execute pipelines. There are three types of triggers.

Event Trigger: As the name suggests, the pipeline is triggered automatically whenever a specified event happens, such as a blob being created or deleted in a storage account.

Tumbling Window Trigger: Runs pipelines in time-bound slices or windows, useful for processing data incrementally, e.g., batch processing of data within a defined time window such as hourly logs or daily sales data.

Schedule Trigger: Executes the pipeline at a scheduled time, automating jobs that need to run at regular intervals, e.g., daily at 1 PM or every hour. To schedule a pipeline, attach a Schedule trigger to it and define the recurrence (a sample trigger definition follows).
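
A sketch of a Schedule trigger definition that runs a pipeline daily at 1 PM UTC (trigger and pipeline names are illustrative):

{
    "name": "DailyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2025-01-01T13:00:00Z",
                "timeZone": "UTC"
            }
        },
        "pipelines": [
            { "pipelineReference": { "referenceName": "CopySalesPipeline", "type": "PipelineReference" } }
        ]
    }
}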

Question 7: What are the different ways to execute a pipeline?

Ans.

• In debug mode
• Manual execution using Trigger Now
• Using an attached schedule, tumbling window, or event trigger

Question 8: What are ARM templates in ADF? What are they used for?

Ans. Whenever you create a pipeline, everything is stored as JSON. ARM (Azure Resource Manager) templates bundle these JSON definitions — pipelines, datasets, linked services, and all of the configuration that has been set up.

Use case – You can export the entire configuration of your data factory as an ARM template and deploy the same in test and production environments.
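
A heavily trimmed sketch of what an exported ARM template looks like (only the skeleton is shown; the real export contains one resource per pipeline, dataset, linked service, and trigger):

{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "factoryName": { "type": "string" }
    },
    "resources": [
        {
            "type": "Microsoft.DataFactory/factories/pipelines",
            "apiVersion": "2018-06-01",
            "name": "[concat(parameters('factoryName'), '/CopySalesPipeline')]",
            "properties": { "activities": [] }
        }
    ]
}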

Question 9: What difficulties have you faced while copying data from on-premise to the cloud, and how did you resolve them?

Ans. Network Latency and Bandwidth Limitations:

Due to the large volume of data and limited bandwidth, data movement was slow. To overcome this, compress the data to reduce its size, use ADF to split the data into chunks and transfer them in parallel, and optimize the self-hosted integration runtime.

If the data volume is very high, in terabytes or petabytes, a full load can be resource-intensive and time-consuming. Overcome this by implementing incremental data loads with change data capture or watermarking (a watermark query sketch follows this answer).

Data Security and Compliance:

Ensure secure data transfer by following regulations such as GDPR and HIPAA, encrypting data in transit with TLS/SSL, using Azure Key Vault, and applying data masking techniques.

Schema Mismatches: When the source data does not match the target schema, it creates issues. To overcome this, use Mapping Data Flows for dynamic mapping, enable "Allow schema drift", and apply data profiling techniques to avoid discrepancies.
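
As a sketch of the watermarking approach mentioned above (table, column, and activity names are invented), the Copy activity's source query can select only the rows changed since the last recorded watermark, which Lookup activities retrieve beforehand:

SELECT *
FROM dbo.Sales
WHERE LastModifiedDate > '@{activity('LookupOldWatermark').output.firstRow.WatermarkValue}'
  AND LastModifiedDate <= '@{activity('LookupNewWatermark').output.firstRow.WatermarkValue}'

After the copy succeeds, another activity updates the stored watermark to the new value so the next run picks up where this one left off.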

Question 10: What are the different activities used in ADF?

Ans.

• Copy Data Activity
• ForEach Activity
• Get Metadata Activity
• Set Variable Activity
• Lookup Activity
• Wait Activity
• Validation Activity
• Web Activity
• Webhook Activity
• Until Activity
• If Condition Activity
• Filter Activity
Question 11: Can you execute a ForEach activity inside another ForEach activity?

Ans. Not directly — ADF does not allow a ForEach activity to be nested inside another ForEach activity. The standard workaround is to put the inner ForEach in a separate (child) pipeline and call that pipeline from an Execute Pipeline activity inside the outer ForEach. This pattern is useful for hierarchical datasets: if you have a list of folders with files inside them, the outer loop iterates through each folder and the child pipeline's loop processes the files in that folder (see the sketch below).
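
A minimal sketch of the outer loop (activity, pipeline, and parameter names are invented): the ForEach iterates over folders returned by a Get Metadata activity and hands each folder to a child pipeline, which contains its own ForEach over the files.

{
    "name": "ForEachFolder",
    "type": "ForEach",
    "typeProperties": {
        "items": { "value": "@activity('GetFolderList').output.childItems", "type": "Expression" },
        "activities": [
            {
                "name": "ProcessFilesInFolder",
                "type": "ExecutePipeline",
                "typeProperties": {
                    "pipeline": { "referenceName": "ProcessFolderPipeline", "type": "PipelineReference" },
                    "parameters": {
                        "FolderName": { "value": "@item().name", "type": "Expression" }
                    }
                }
            }
        ]
    }
}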

Question 12: How do you delete files that are more than 30 days old?

Ans. To delete files older than 30 days, follow these steps:

• Get Metadata activity to retrieve the file information

First, configure the Get Metadata activity and enable the childItems field to retrieve the list of files in the folder.

• ForEach activity to iterate through files

Link the ForEach activity to the output of the Get Metadata activity and process each file individually inside the ForEach activity.

• Another Get Metadata activity to get file properties

Add a Get Metadata activity inside the ForEach loop to retrieve information such as the last modified date.

• If Condition activity

Apply the If Condition to check whether the file is more than 30 days old, using an expression such as:

@less(addDays(activity('Get Metadata Activity 2').output.lastModified, 30), utcNow())

• Delete activity

If the condition is true, use the Delete activity to delete the file from the source (see the If Condition sketch below).
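
A condensed sketch of the If Condition step (activity and dataset names are invented; a real Delete activity also needs its store and logging settings configured):

{
    "name": "CheckFileAge",
    "type": "IfCondition",
    "typeProperties": {
        "expression": {
            "value": "@less(addDays(activity('Get Metadata Activity 2').output.lastModified, 30), utcNow())",
            "type": "Expression"
        },
        "ifTrueActivities": [
            {
                "name": "DeleteOldFile",
                "type": "Delete",
                "typeProperties": {
                    "dataset": { "referenceName": "SourceFileDataset", "type": "DatasetReference" }
                }
            }
        ]
    }
}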

Question 13: How do you archive files after loading them in ADF?

Ans. To archive files after loading them into the target system in Azure Data Factory, set up the pipeline as follows.

• Use a Copy activity to load the file

First, use a Copy activity to load the file into the target system. After successful loading, move it to the ARCHIVE folder by using:

• Copy + Delete activities

In the Copy activity, set the source to the original file and the sink to the archive folder path, configuring it dynamically if you process multiple files:

@concat('<archive-folder-path>/', item().name)

After copying the file to the archive folder, delete the original file using the Delete activity.

• Web activity

In Azure Blob Storage, you can use a Web activity to call the Azure Storage REST API to move the file. With this approach, you can complete the archive process without a separate Delete activity.

Dynamic File Naming for the Archive Folder

Add a timestamp or unique identifier to avoid overwrite issues in the archived files.

Expression:

@concat('<archive-folder-path>/', item().name, '-', formatDateTime(utcNow(), 'yyyy-MM-dd-HH-mm-ss'))

Question 14: How do you send a notification to a Microsoft Teams channel from Azure Data Factory?

Ans. Using a Web activity, you can POST a message to a Microsoft Teams incoming webhook and send notifications to Teams.

Let's explore the steps.

• Set up an incoming webhook in MS Teams

Open MS Teams and navigate to the channel where you want to receive notifications.

To add the connector, click the three dots next to the channel name, select Connectors, and search for Incoming Webhook.

Configure the webhook and provide an appropriate name; you can also upload an icon for the notifications. After the webhook is created successfully, copy its URL for use in Azure Data Factory.

• Open the Azure Data Factory portal

Create the pipeline and add a Web activity to it. Configure the Web activity by providing a name and pasting the URL that you copied from MS Teams during the webhook creation.

Set the method to POST.

Add a key-value pair in the headers to indicate JSON:

{
    "Content-Type": "application/json"
}

Create the message: write the message that you want to send to the MS Teams channel in JSON format.

For example:

{
    "text": "Pipeline @{pipeline().Pipeline} completed successfully!",
    "summary": "ADF Notification"
}
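
Putting the pieces together, the Web activity definition might look like this sketch (the URL is a placeholder for the webhook URL copied from Teams; the activity name is illustrative):

{
    "name": "NotifyTeams",
    "type": "WebActivity",
    "typeProperties": {
        "url": "<incoming-webhook-url>",
        "method": "POST",
        "headers": { "Content-Type": "application/json" },
        "body": {
            "text": "Pipeline @{pipeline().Pipeline} completed successfully!",
            "summary": "ADF Notification"
        }
    }
}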

Question 15: How do you load a fixed-length flat file to Azure SQL (ASQL) from Azure Data Factory?

Ans. Here are the steps to load a fixed-length flat file to Azure SQL.

• First, prepare the fixed-length flat file

001John    Smith    5000.50
002Jane    Doe      4200.75

• Create an Azure SQL table according to the structure of the data

CREATE TABLE EmployeeData (
    EmployeeID INT,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Salary DECIMAL(10, 2)
);

• Create a Data Factory pipeline

Open the Azure portal, open the Author section, and create the linked services for the source storage account and for the sink Azure SQL Database.

• Configure the dataset

Go to the dataset section, click New, choose the file, set the delimiter to None, and skip the header rows.

• Use a Mapping Data Flow to specify the fixed-width column mapping. Use the Derived Column transformation to split fields according to their width (see the sketch below) and configure the dataset in ADF's sink to map columns to the SQL table. Finally, schedule and test the pipeline.
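
A sketch of the Derived Column expressions for the sample layout above, assuming the whole line is read into a single string column named Row (positions are 1-based in the data flow expression language; the widths are taken from the sample records):

EmployeeID = toInteger(substring(Row, 1, 3))
FirstName  = trim(substring(Row, 4, 8))
LastName   = trim(substring(Row, 12, 9))
Salary     = toDecimal(substring(Row, 21, 7))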
Question 16: How do you load a JSON file with a nested hierarchy to a CSV file?

Ans.

• Create Linked Services

Create a linked service for the storage account (Blob Storage or Data Lake) where the JSON file is located, and create a linked service for the sink destination where the CSV file will be saved.

• Create Datasets

Create a new dataset for the JSON source file and create a dataset for the CSV output file. Set the file format to CSV and specify the destination location.

• Build the Data Flow

Add a Data Flow activity and add a source transformation pointing to the JSON dataset that was set up previously. In the source options, choose the appropriate JSON document form and set up the schema by importing the JSON document or defining it manually as per the JSON structure.

• Flatten the JSON

Add a Flatten transformation, define the root array to unroll, and add mappings for the nested elements to create a flat structure.

• Derived Column transformation

If required, use a Derived Column transformation to modify the data.

Sink Configuration

Add a sink transformation, connect it to your CSV dataset, and configure the sink to write the result in CSV format. Check that the file name and path settings match the desired output.

Execute the Pipeline

Before running the full pipeline, debug it first; if debugging is successful, execute the pipeline.
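
For illustration (the data is invented), a nested order document and the flattened CSV rows produced when the Flatten transformation unrolls by the items array:

{
    "orderId": 1001,
    "customer": { "name": "John", "city": "Pune" },
    "items": [
        { "sku": "A1", "qty": 2 },
        { "sku": "B2", "qty": 1 }
    ]
}

orderId,customerName,customerCity,sku,qty
1001,John,Pune,A1,2
1001,John,Pune,B2,1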

Question 17: Difference between Mapping Data Flow and Wrangling Data Flow.

Ans. Mapping Data Flow is a code-free, visually designed ETL process. You can perform a wide range of transformations such as Aggregate, Join, Derived Column, Pivot, Unpivot, and Lookup. It is optimized for large scale: data flows are reusable, and execution runs in parallel on Spark clusters managed by Azure Data Factory.

Wrangling Data Flow, on the other hand, is designed for data preparation and exploration, which is useful for data analysis and data engineering. The data preparation is done with Power Query.

Question 18: What is schema drift and how do you handle it?

Ans. Schema Drift

Schema drift is the condition where the source data schema changes unexpectedly in Azure Data Factory: new columns get added or removed depending on the scenario, or data types change. Handling schema drift is essential to keep the workflow running properly.

Let's see how to handle schema drift.

• Enable schema drift
• In the source projection tab, allow schema drift. This enables the source to accommodate all incoming columns dynamically without needing to state a schema beforehand.

Enable schema drift in the sink transformation as well, to make sure all columns are dynamically transferred to the output.

• Derived Column

Use the Derived Column transformation to dynamically reference columns in your data flow, using expressions such as byName() or column().
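
For example, a Derived Column expression like the following (the column name is assumed for illustration) materializes a drifted column so it can be used downstream:

CustomerName = toString(byName('CustomerName'))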

Question 19: What are the different metadata options that you get in the Get Metadata activity in ADF?

Ans. The Get Metadata activity in Azure Data Factory allows you to retrieve metadata information. These are the various metadata options (a sample field list configuration follows the list):

itemName: The file name or folder name.

itemType: The type of the item (File or Folder).

size: The size of the file in bytes.

created: The date and time the file or folder was created.

lastModified: The last modified date and time of the file or folder.

childItems: A list of sub-files and sub-folders within a folder.

contentMD5: The MD5 hash of the file content.

structure: The structure of the file or relational table, including column names and types.

columnCount: The number of columns in the file or relational table.

exists: A boolean that indicates whether the file, folder, or table exists.
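
These options are requested through the activity's field list; a minimal sketch (activity and dataset names are invented):

{
    "name": "GetFileList",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": { "referenceName": "SourceFolderDataset", "type": "DatasetReference" },
        "fieldList": [ "childItems", "lastModified", "exists" ]
    }
}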

Question 20: If you want to use the output of a query in a later activity, which activity should you use?

Ans. You should use the Lookup activity if you want to use the output of a query in a later activity in the pipeline. In the pipeline, first add the Lookup activity and configure the source dataset. You can specify the query you want to execute in the activity settings. The Lookup activity will execute the query and provide the output.

The Lookup activity is useful in scenarios where you require conditional logic or need to parameterize later activities with the extracted data. For example, you can use the Lookup activity to retrieve the name of the next file to process:

SELECT TOP 1 FileName
FROM FileTracking
WHERE Status = 'Pending'
ORDER BY CreatedDate

Use the output in a Copy Data activity to process the next step, referencing the file name dynamically in the source:

@activity('LookupActivity').output.firstRow.FileName

Bonus: How do you verify the presence of a file in storage in ADF?

Ans. Use the Get Metadata activity, configure it to point to the relevant storage location, and add the exists field to the field list. This property checks the target file or folder and returns TRUE or FALSE. You can use this output for further processing, for example in an If Condition that branches depending on whether the file is present.

Question 21: Difference between pipeline and data flow.

Ans. A pipeline in Azure Data Factory is like a workflow or a container that manages the activities that move and transform data.

The pipeline manages the workflow, whereas data flows handle the data transformations.

A data flow in ADF is used for transforming data through a visual interface. You can perform operations like joins, aggregations, and data cleaning without writing code.

Question 22: How would you handle the scenario where one activity or pipeline fails and another should continue?

Ans. In Azure Data Factory, you can manage such scenarios with dependency conditions and failure-handling mechanisms.

Continue even on failure: if one activity fails, configure the dependency condition on the subsequent activity so that it still runs. ADF supports the dependency conditions Succeeded, Failed, Skipped, and Completed. Setting the dependency to "Completed" makes the next activity run whether the previous activity succeeds or fails, and a "Failed" dependency lets you branch into an error-handling path.

For this, select the connector between the two activities (or the activity's dependency settings) and change the dependency condition from "Succeeded" to "Completed" or "Failed" as required. If the pipelines are orchestrated from a parent pipeline through Execute Pipeline activities, the same dependency conditions apply to those activities.

Error Handling

An error-handling mechanism can handle the failure: add an If Condition (or a Failed dependency path) that evaluates the outcome of the previous activity and, if it failed, continues with the other dependent pipeline (see the dependsOn sketch below).
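
A sketch of the dependency definition (activity and pipeline names are invented): the downstream Execute Pipeline activity runs on Completed, i.e., whether the previous activity succeeded or failed.

{
    "name": "RunDownstreamPipeline",
    "type": "ExecutePipeline",
    "dependsOn": [
        {
            "activity": "CopySourceData",
            "dependencyConditions": [ "Completed" ]
        }
    ],
    "typeProperties": {
        "pipeline": { "referenceName": "DownstreamPipeline", "type": "PipelineReference" }
    }
}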

Conclusion

Azure Data Factory is an essential tool if you are trying to enter the data field. If you have experience with the Azure cloud, you can land a very well-paid job, as these skills are in high demand right now. Console Flare is an institute where you will have the chance to become an expert in this field, because you will be guided by industry experts.
