Connect ADL using ADF

In this article, we will learn how to connect to Azure Data Lake (ADL) using Azure Data Factory (ADF).

Azure Data Lake (ADL) is a Microsoft platform-as-a-service (PaaS) offering used to store unstructured (.csv, .txt), semi-structured (.json, .xml), and structured (.parquet, .orc) data at high volume. It also supports on-demand analytics jobs on the stored data.

Azure Data Factory (ADF) is a cloud-based data integration service used to orchestrate and transform data at scale.

We can connect to Azure Data Lake Storage using any of the below three approaches:

1. Azure Data Factory
2. Azure Databricks
3. Azure Synapse Analytics (formerly Azure SQL Data Warehouse)

 

Data Lake (ADLS) connectivity using ADF

Step by step process to connect:

Step 1:

Enter the URL ‘portal.azure.com’ in the browser

Step 2:

Once the site is loaded, click on the hamburger (three horizontal lines) icon at the top-left.

Step 3:

Select the 'All resources' option

Step 4:

Select the data factory from the listed resources. We can use the Type filter to show only data factories.

Note: Create a new data factory if none exists

Step 5:

From the listed data factories, click on the existing data factory you want to work on

Step 6:

Once the data factory details are loaded, select 'Overview'

Step 7:

In Overview, select the Author & Monitor section

Step 8:

Now, the Azure Data Factory authoring interface opens in another tab

Step 9:

Select ‘Manage’ from the left pane

Step 10:

Now, click on the 'Connections' link

Step 11:

Select 'Linked services', then click '+ New' to create a new linked service

Linked Services:

Linked services are much like connection strings: they define the connection information ADF needs to reach external data sources and services.

Step 12:

On the new linked service pane, select the 'Azure Data Lake Storage Gen2' option

Step 13:

Now, enter the required details for ADLS Gen2:

a. Select the authentication method from the dropdown; the options are Account Key, Service Principal, and Managed Identity

b. For the test connection, choose what to test:

 1. 'To linked service'
 2. 'To file path'

Click on the 'Test connection' button
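Behind the UI, ADF stores the linked service as a JSON definition (visible through the 'Code' view). A minimal sketch for an ADLS Gen2 linked service using Account Key authentication might look like the following; the name and the angle-bracket placeholders are illustrative, not values from this walkthrough:

```json
{
    "name": "AzureDataLakeStorageGen2LinkedService",
    "properties": {
        "type": "AzureBlobFS",
        "typeProperties": {
            "url": "https://<storage-account-name>.dfs.core.windows.net",
            "accountKey": {
                "type": "SecureString",
                "value": "<account-key>"
            }
        }
    }
}
```

Note that 'AzureBlobFS' is the linked service type ADF uses for ADLS Gen2. With Service Principal or Managed Identity authentication, the accountKey block is replaced by the corresponding credential properties.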

Step 14:

Now, it is time for pipeline creation


Pipeline:

A pipeline is where you define the workflow: a logical grouping of activities that together perform a task.

Pipeline Creation:

Step 15:

Select ‘Author’

Step 16:

There are three options under Author:


1. Pipelines
2. Datasets
3. Data flows


Step 17:

Select Pipelines option

Step 18:

Click on 'New pipeline', then provide a name for the new pipeline along with the other mandatory details

Step 19:

Now, open the Activities pane

Step 20:

Now, select the 'Copy data' activity and drag and drop it onto the designer canvas

Step 21:

Click on the 'Copy data' activity on the canvas
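Like linked services, the pipeline and its Copy activity have a JSON representation. A rough sketch, assuming hypothetical dataset names 'ParquetSourceDataset' and 'ParquetSinkDataset' (these are illustrative; your datasets are created in the later steps):

```json
{
    "name": "CopyFromDataLakePipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyParquetData",
                "type": "Copy",
                "inputs": [
                    { "referenceName": "ParquetSourceDataset", "type": "DatasetReference" }
                ],
                "outputs": [
                    { "referenceName": "ParquetSinkDataset", "type": "DatasetReference" }
                ],
                "typeProperties": {
                    "source": { "type": "ParquetSource" },
                    "sink": { "type": "ParquetSink" }
                }
            }
        ]
    }
}
```

The source and sink types vary with the chosen format; 'ParquetSource'/'ParquetSink' apply when both sides use Parquet.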

Step 22:

Select the 'Source' tab from the menu

Step 23:

Configure the source dataset: either choose an existing one from the dropdown, or create a new source dataset

Step 24:

Next, type 'data lake' in the search box

Step 25:

Select the 'Azure Data Lake Storage Gen2' option and click 'Continue'

Step 26:

Now select the format; the available formats are:


a. Avro
b. ORC
c. Binary
d. Delimited Text
e. Excel
f.  JSON
g. Parquet


Click on continue

Step 27:

Now, set the properties for the chosen format

Step 28:

For example, if we select 'Parquet' as the format, set the linked service for it

Step 29:

Select the file path and click continue

Step 30:

We have to select the source file folder for the Parquet files.

Click on 'Continue'; ADF then validates that the file path is correct.

Once verification is successful, we are ready to access the Data Lake path from ADF.
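The resulting source dataset also has a JSON definition. A sketch assuming a Parquet dataset, with an illustrative file system (container) name, folder path, and linked service reference:

```json
{
    "name": "ParquetSourceDataset",
    "properties": {
        "type": "Parquet",
        "linkedServiceName": {
            "referenceName": "AzureDataLakeStorageGen2LinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": "mycontainer",
                "folderPath": "raw/parquet"
            }
        }
    }
}
```

The 'AzureBlobFSLocation' location type corresponds to the ADLS Gen2 linked service; the fileSystem maps to the storage container and folderPath to the folder selected in the steps above.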


Conclusion


Follow steps 1 through 30 to enable connectivity between ADL and ADF.

