Introduction
ETL is one of the core tasks for any data engineer, and many on-premises and cloud solutions are available to implement it. In Microsoft Azure, Azure Data Factory is the ETL service used to build data pipelines from cloud or on-premises sources. It is a flexible, GUI-based ETL solution that provides many different connectors to on-premises and cloud sources. In this article, we will go through the basic components of Azure Data Factory and then implement a full use case step by step.
Data Factory Components
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-2.png)
Dataflow
A Dataflow is one of the main building blocks of Data Factory and simply represents the ETL job. Any ETL job has three components: source dataset(s), transformations, and target dataset(s). In Data Factory, an ETL job is called a Dataflow.
Activity
An Activity is a single task, or one of a collection of tasks, that can be executed sequentially or in parallel. There are different types of activities, such as the Dataflow activity discussed in the previous point, the Copy Data activity, the Databricks activity, and so on.
Linked Service
A Linked Service holds the connection details for a specific source. To connect to any data source or service in Data Factory, we need to create a Linked Service for it; you can think of it as a connection string with all the details required to connect to a specific source or service.
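If you prefer code over the portal, the same concept can be expressed with the azure-mgmt-datafactory Python SDK. The sketch below is illustrative only: the subscription, resource group, factory, and connection-string values are placeholders, and exact model signatures can differ between SDK versions.

```python
# Minimal sketch: a Linked Service created programmatically (placeholder values throughout).
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import LinkedServiceResource, AzureBlobStorageLinkedService

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

# The Linked Service is just the connection definition for one store or service.
blob_linked_service = LinkedServiceResource(
    properties=AzureBlobStorageLinkedService(
        connection_string="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
    )
)
adf_client.linked_services.create_or_update(
    "<your-resource-group>", "<your-data-factory>", "BlobStorageLinkedService", blob_linked_service
)
```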
Pipeline
A Pipeline is a workflow that orchestrates a group of activities in sequential or parallel order. A common use case is when a group of activities belongs to the same use case and affects the output data; you then create a pipeline to control the flow of data and the dependencies between the activities. In short, a Pipeline is a logical grouping of activities.
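To make the "logical grouping" idea concrete, here is a minimal, hedged sketch of a pipeline defined with the azure-mgmt-datafactory Python SDK: one Copy Data activity followed by a second activity that depends on it. All names (datasets, activities, factory) are placeholders, and signatures can vary between SDK versions.

```python
# Sketch of a pipeline grouping two activities; all names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, WaitActivity, ActivityDependency,
    DatasetReference, BlobSource, SqlSink,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

copy_activity = CopyActivity(
    name="CopyBlobToSql",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceBlobDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="TargetSqlDataset")],
    source=BlobSource(),
    sink=SqlSink(),
)

# A second activity that runs only after the copy succeeds, to show sequential ordering;
# activities without dependencies run in parallel.
wait_after_copy = WaitActivity(
    name="WaitAfterCopy",
    wait_time_in_seconds=10,
    depends_on=[ActivityDependency(activity="CopyBlobToSql", dependency_conditions=["Succeeded"])],
)

pipeline = PipelineResource(activities=[copy_activity, wait_after_copy])
adf_client.pipelines.create_or_update(
    "<your-resource-group>", "<your-data-factory>", "ExamplePipeline", pipeline
)
```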
Use Case
In this use case, we will use the Copy Data activity to migrate data from Blob Storage to a SQL Server database. The objective is to walk you through the Data Factory development process and see the Data Factory components in action.
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-10-1024x967.png)
1- Log in to the Azure portal
2- Click on Create a Resource
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-11.png)
3- Search for Data Factory and click on it
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-12.png)
4- Click on Create
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-14-1024x578.png)
5- Choose the Azure subscription used for billing, the resource group, the region, and the name of your data factory. In our case, we will name our Data Factory dvdevelopmentdf
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-15.png)
6- You can link your data factory development with a Git repository, either GitHub or an Azure DevOps repository. For now, click Configure Git later
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-16.png)
7- If needed, add tags to your Data Factory. Tags enable you to track expenses; for example, we added a team tag to track all resources that share the same team tag value
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-18.png)
8- Review settings and click Create
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-19.png)
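For completeness, steps 5 through 8 also have a programmatic equivalent. Below is a minimal sketch using the azure-mgmt-datafactory Python SDK, assuming placeholder subscription, resource group, and region values (the tags correspond to step 7):

```python
# Sketch: creating the same data factory from code instead of the portal (placeholders below).
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

factory = adf_client.factories.create_or_update(
    "<your-resource-group>",
    "dvdevelopmentdf",
    Factory(location="<region>", tags={"team": "data-engineering"}),  # tags are optional
)
print(factory.provisioning_state)  # "Succeeded" once the factory is ready
```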
9- Wait until the resource is created successfully, then click on Go to Resource
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-21.png)
10- Click on Author & Monitor
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-22-1024x368.png)
11- Next, we need to create Linked Services (connections) to the source and target of our Copy Data activity. To do that, click on Manage
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-23.png)
12- The following two demos show the process of creating a Linked Service for the SQL and Storage services
SQL Connection
Storage Connection
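As a code-level complement to these demos, the sketch below creates the Azure SQL Database Linked Service with the azure-mgmt-datafactory Python SDK (the Blob Storage one was sketched in the components section). The connection string is a placeholder; in practice, keep credentials in Azure Key Vault rather than inline, and note that parameter types can vary between SDK versions.

```python
# Sketch: Azure SQL Database Linked Service; server, database, and credentials are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureSqlDatabaseLinkedService, SecureString,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

sql_linked_service = LinkedServiceResource(
    properties=AzureSqlDatabaseLinkedService(
        connection_string=SecureString(
            value="Server=tcp:<server>.database.windows.net,1433;Database=<db>;"
                  "User ID=<user>;Password=<password>;"
        )
    )
)
adf_client.linked_services.create_or_update(
    "<your-resource-group>", "dvdevelopmentdf", "AzureSqlLinkedService", sql_linked_service
)
```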
13- Now we are ready to create a Copy Data activity. Go back to the main Data Factory window, click on Data Factory, then click on Copy Data
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-24-1024x303.png)
14- Enter the task details: set Task name to the name of your Copy Data task, then choose the task schedule to run it once or on a recurring schedule
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-25.png)
15- Next, we will enter the source details: choose the source connection
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-27-1024x735.png)
Choose the path to your file
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-28-1024x542.png)
Then choose the file configuration, such as the file format, column delimiter, and row delimiter; check First row as header if the first row contains the column names; and review the data preview to make sure the data is read successfully
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-29-1024x541.png)
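The same source definition can also be expressed in code. The hedged sketch below builds a delimited-text dataset with the delimiter and header options chosen above; the container, folder, file, and linked service names are placeholders, and model names may differ between SDK versions.

```python
# Sketch: a delimited-text source dataset pointing at a blob file (placeholder names below).
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    DatasetResource, DelimitedTextDataset, AzureBlobStorageLocation, LinkedServiceReference,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

source_dataset = DatasetResource(
    properties=DelimitedTextDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="BlobStorageLinkedService"
        ),
        location=AzureBlobStorageLocation(
            container="<container>", folder_path="<folder>", file_name="<file>.csv"
        ),
        column_delimiter=",",      # column delimiter chosen in the wizard
        row_delimiter="\n",        # row delimiter
        first_row_as_header=True,  # "First row as header"
    )
)
adf_client.datasets.create_or_update(
    "<your-resource-group>", "dvdevelopmentdf", "SourceBlobDataset", source_dataset
)
```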
16- Choose the target connection; in our case it is the SQL connection
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-30-1024x709.png)
17- We can create the target table dynamically or choose an existing table from the target connection. In our case, we will create the table dynamically using the source file schema
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-31-1024x659.png)
18- Next, we will define the column mapping between source and target columns; we can also change the data types of the output columns
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-32-1024x559.png)
19- The last configuration step is the general settings for our task. Check Data consistency verification to let Data Factory run validation checks between the source and target data, and choose a fault tolerance approach from the list; here we choose to skip incompatible rows. Enable logging to write task events to a destination; here we selected the storage connection, clicked Browse, and selected a directory we had already created for the logs. Click Next
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-33-1024x708.png)
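For reference, these wizard options map onto copy activity properties (enableSkipIncompatibleRow and validateDataConsistency in the underlying JSON definition). The sketch below is an illustration using the azure-mgmt-datafactory Python SDK; the Python property names are assumptions that may differ between SDK versions, and the logging settings are omitted.

```python
# Sketch: a copy activity with the fault-tolerance and consistency options from step 19.
from azure.mgmt.datafactory.models import CopyActivity, DatasetReference, BlobSource, SqlSink

copy_with_settings = CopyActivity(
    name="CopyBlobToSql",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceBlobDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="TargetSqlDataset")],
    source=BlobSource(),
    sink=SqlSink(),
    validate_data_consistency=True,     # "Data consistency verification" in the wizard
    enable_skip_incompatible_row=True,  # fault tolerance: skip incompatible rows
)
```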
20- Review Summary, then click Next
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-34-1024x604.png)
21- The last step is the deployment step: Data Factory validates our task, creates a target dataset, creates a pipeline for our activity, and then executes the pipeline. Note that the created pipeline contains only one activity, our Copy Data activity
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-35-1024x529.png)
Go to the Author window, and you can see that we have a pipeline with our activity
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-36-1024x574.png)
22- Check the Monitor window; you can see that our pipeline executed successfully
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-37-1024x219.png)
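The same run status shown in the Monitor window can also be fetched programmatically. Below is a minimal sketch, assuming placeholder resource names and the pipeline name shown in your Author window; it triggers a new run and polls its status.

```python
# Sketch: trigger a pipeline run and check its status (placeholder names below).
import time
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

run = adf_client.pipelines.create_run(
    "<your-resource-group>", "dvdevelopmentdf", "<your-pipeline-name>", parameters={}
)
time.sleep(30)  # give the small copy a moment to start/finish
pipeline_run = adf_client.pipeline_runs.get("<your-resource-group>", "dvdevelopmentdf", run.run_id)
print(pipeline_run.status)  # e.g. "InProgress" or "Succeeded"
```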
Now let’s check the data in the target table to validate our pipeline
![](https://blog.datavalley.technology/wp-content/uploads/2020/08/image-38.png)
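A quick way to validate the load outside the portal is to query the target table directly, for example with pyodbc. The server, database, credentials, and table name below are placeholders for this use case.

```python
# Sketch: count the loaded rows in the target table (placeholder connection details).
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<server>.database.windows.net;DATABASE=<db>;UID=<user>;PWD=<password>"
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM <schema>.<target_table>")  # should match the source file row count
print(cursor.fetchone()[0])
conn.close()
```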
Congratulations, you have executed your first pipeline successfully. You now have a practical, end-to-end sense of the Data Factory process and its components.