In this post, we learn how to connect to Azure Data Lake (ADL) using Azure Data Factory (ADF).
We can connect to Azure Data Lake Storage using any of the following three approaches:
1. Azure Data Factory
2. Azure Databricks
3. Azure Data Warehouse
Data Lake Storage (ADLS) connectivity using ADF
Step-by-step process to connect:
Step 1:
Enter the URL ‘portal.azure.com’ in the browser
Step 2:
Once the site has loaded, click the menu icon (three horizontal lines) at the top left.
Step 3:
Select the ‘All resources’ option
Step 4:
Find the data factory among the listed resources. You can use the Type filter to show only data factories.
Note: If no data factory exists yet, create a new one.
Step 5:
From the listed data factories, click the one you want to work with
Step 6:
Once the data factory details have loaded, select Overview
Step 7:
In Overview, select Author & Monitor
Step 8:
The Azure Data Factory authoring UI opens in a new browser tab
Step 9:
Select ‘Manage’ from the left pane
Step 10:
Click the ‘Connections’ link
Step 11:
Select ‘Linked services’ and click ‘+ New’ to create a new linked service
Linked Services:
Step 12:
In the New linked service pane, select the ‘Azure Data Lake Storage Gen2’ option
Step 13:
Enter the required details for ADLS Gen2:
a. Select the authentication method from the dropdown; the options are Account key, Service Principal and Managed Identity
b. Under Test connection, select either ‘To linked service’ or ‘To file path’ (if you choose the file path, enter it)
c. Click the Test connection button
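Behind this form, ADF stores the linked service as a JSON definition. The sketch below builds that JSON for the three authentication methods listed above. It assumes the ADLS Gen2 connector type is `AzureBlobFS` and uses the property names from Microsoft's ADLS Gen2 connector documentation as I understand them; the helper, the account name, the linked-service name and the secret values are hypothetical placeholders, so compare the output with the JSON your own factory shows before relying on it.

```python
import json


# Hypothetical helper: returns the JSON definition of an ADLS Gen2 linked
# service. "AzureBlobFS" is the connector type ADF uses for ADLS Gen2; the
# property names below are an assumption based on the connector docs.
def adls_gen2_linked_service(name, account, auth_method, **secrets):
    props = {"url": f"https://{account}.dfs.core.windows.net"}

    if auth_method == "Account key":
        # Storage account access key, stored as a SecureString.
        props["accountKey"] = {"type": "SecureString", "value": secrets["account_key"]}
    elif auth_method == "Service Principal":
        # Application (client) ID, its secret, and the tenant it belongs to.
        props["servicePrincipalId"] = secrets["client_id"]
        props["servicePrincipalCredentialType"] = "ServicePrincipalKey"
        props["servicePrincipalCredential"] = {
            "type": "SecureString",
            "value": secrets["client_secret"],
        }
        props["tenant"] = secrets["tenant_id"]
    elif auth_method == "Managed Identity":
        # The factory's system-assigned identity needs no secret here, only
        # the URL; access is granted through a role on the storage account.
        pass
    else:
        raise ValueError(f"Unsupported authentication method: {auth_method}")

    return {"name": name, "properties": {"type": "AzureBlobFS", "typeProperties": props}}


if __name__ == "__main__":
    # "AdlsGen2LinkedService" and "mystorageacct" are placeholder names.
    ls = adls_gen2_linked_service("AdlsGen2LinkedService", "mystorageacct", "Managed Identity")
    print(json.dumps(ls, indent=2))
```

Managed Identity keeps secrets out of the definition entirely, which is why it has no extra fields in this sketch.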
Step 14:
Next, create the pipeline
Pipeline:
A pipeline is a logical grouping of activities that together perform a task; it is where you define the workflow.
Pipeline Creation:
Step 15:
Select ‘Author’
Step 16:
There are three options under Author:
1. Pipelines
2. Datasets
3. Data flows
Step 17:
Select the Pipelines option
Step 18:
Click New pipeline, then provide a name for the pipeline and any other mandatory details
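A newly created pipeline is saved as a JSON document whose `activities` list stays empty until activities are added. The minimal sketch below produces that shape; the helper name and the pipeline name `CopyFromDataLake` are hypothetical, and the optional description is just one example of a property the UI lets you fill in.

```python
import json


# Hypothetical helper: the JSON shape of an empty ADF pipeline.
def new_pipeline(name, description=""):
    if not name.strip():
        raise ValueError("A pipeline name is mandatory")
    properties = {"activities": []}  # filled in once activities are added
    if description:
        properties["description"] = description
    return {"name": name, "properties": properties}


if __name__ == "__main__":
    pipeline = new_pipeline("CopyFromDataLake", "Reads Parquet files from ADLS Gen2")
    print(json.dumps(pipeline, indent=2))
```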
Step 19:
Open the Activities pane
Step 20:
Select the Copy data activity (under Move & transform) and drag it onto the pipeline canvas
Step 21:
Click the Copy data activity on the canvas
Step 22:
Select the Source tab
Step 23:
Configure the source dataset: choose it from the dropdown if it already exists, or click ‘+ New’ to create a new source dataset
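The Copy data activity placed on the canvas in Steps 20 to 23 is recorded as an activity of type `Copy` that references its source and sink datasets by name. The sketch below builds such an activity and appends it to a pipeline dictionary. It assumes both datasets are Parquet files in ADLS Gen2, hence `ParquetSource`/`ParquetSink` with `AzureBlobFSReadSettings`/`AzureBlobFSWriteSettings`; the helper, activity, dataset and pipeline names are hypothetical.

```python
import json


# Hypothetical helper: a Copy activity that reads from one dataset and writes
# to another. Parquet on ADLS Gen2 is assumed for both sides.
def copy_activity(name, source_dataset, sink_dataset):
    return {
        "name": name,
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {
                "type": "ParquetSource",
                # Read every file under the dataset's folder, including subfolders.
                "storeSettings": {"type": "AzureBlobFSReadSettings", "recursive": True},
            },
            "sink": {
                "type": "ParquetSink",
                "storeSettings": {"type": "AzureBlobFSWriteSettings"},
            },
        },
    }


if __name__ == "__main__":
    pipeline = {"name": "CopyFromDataLake", "properties": {"activities": []}}
    pipeline["properties"]["activities"].append(
        copy_activity("CopyParquet", "SourceParquetDataset", "SinkParquetDataset")
    )
    print(json.dumps(pipeline, indent=2))
```

Keeping the datasets as references rather than inline definitions mirrors how the UI lets one dataset be reused by several activities.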
Step 24:
In the search box, type ‘data lake’
Step 25:
Select the ‘Azure Data Lake Storage Gen2’ option and click Continue
Step 26:
Select the format. The available formats are:
a. Avro
b. ORC
c. Binary
d. Delimited Text
e. Excel
f. JSON
g. Parquet
Click Continue
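Each format in the list corresponds to a dataset type in the factory's JSON, and the matching Copy activity source type is typically that name followed by `Source`. The lookup below is a sketch of that mapping as I understand ADF's naming; the dictionary and helper names are hypothetical, and formats not offered in this dialog are deliberately left out.

```python
# Hypothetical lookup: UI format label -> ADF dataset "type" value.
FORMAT_TO_DATASET_TYPE = {
    "Avro": "Avro",
    "ORC": "Orc",
    "Binary": "Binary",
    "Delimited Text": "DelimitedText",
    "Excel": "Excel",
    "JSON": "Json",
    "Parquet": "Parquet",
}


def dataset_type(format_label):
    try:
        return FORMAT_TO_DATASET_TYPE[format_label]
    except KeyError:
        raise ValueError(f"Unknown format: {format_label}") from None


def copy_source_type(format_label):
    # e.g. "Delimited Text" -> "DelimitedTextSource"
    return dataset_type(format_label) + "Source"


if __name__ == "__main__":
    for label in FORMAT_TO_DATASET_TYPE:
        print(f"{label:15} dataset={dataset_type(label):14} source={copy_source_type(label)}")
```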
Step 27:
Set the properties for the chosen format
Step 28:
For example, if we select ‘Parquet’ as the format, set the linked service to the ADLS Gen2 linked service created in Step 13
Step 29:
Select the file path and click Continue
Step 30:
Select the source folder that contains the Parquet files.
Click Continue to verify that the file path is valid.
Once verification succeeds, the data lake path is accessible from ADF.
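Once the Parquet format, linked service and folder are set, the dataset's JSON ties them together through a `location` block. The sketch below assumes an ADLS Gen2 location type of `AzureBlobFSLocation` with a `fileSystem` (container) and `folderPath`; it also builds the equivalent `abfss://` URI, the usual way to address the same folder from tools such as Azure Databricks. The helper names and every value (`raw`, `sales/2024`, `mystorageacct`, `AdlsGen2LinkedService`) are hypothetical placeholders.

```python
import json


# Hypothetical helper: JSON for a Parquet dataset stored in ADLS Gen2.
def parquet_dataset(name, linked_service, file_system, folder_path, file_name=None):
    location = {
        "type": "AzureBlobFSLocation",
        "fileSystem": file_system,  # the ADLS Gen2 container
        "folderPath": folder_path,  # folder that holds the Parquet files
    }
    if file_name:
        location["fileName"] = file_name  # omit to point at the whole folder
    return {
        "name": name,
        "properties": {
            "type": "Parquet",
            "linkedServiceName": {"referenceName": linked_service, "type": "LinkedServiceReference"},
            "typeProperties": {"location": location, "compressionCodec": "snappy"},
            "schema": [],
        },
    }


def abfss_uri(account, file_system, folder_path):
    # The same folder expressed as an ABFS URI (used by Spark, for example).
    return f"abfss://{file_system}@{account}.dfs.core.windows.net/{folder_path.strip('/')}"


if __name__ == "__main__":
    ds = parquet_dataset("SourceParquetDataset", "AdlsGen2LinkedService", "raw", "sales/2024")
    print(json.dumps(ds, indent=2))
    print(abfss_uri("mystorageacct", "raw", "sales/2024"))
```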
Conclusion
Follow Steps 1 to 30 to enable connectivity between ADL and ADF.