Dataframe exchange between notebook cells is a Databricks technique for passing dataframe data from a Python cell to a Scala cell within a Python notebook.
Databricks notebooks let us run code written in another language by adding more command cells to a single notebook.
This means that a notebook created in one language can also run code in another language (Scala/R/SQL) within the same notebook.
Exchange data between cells
Sometimes we need to pass dataframe data from a Python cell to a Scala cell for processing, typically because we want functionality that Python does not yet support. In such cases, we can use cells in another language within the same notebook.
Example:
The Spark connector's bulk copy functionality is not supported in Python.
Please click the link below for more details about the bulk copy workaround in Python.
Click here 👉 bulkcopy workaround
So how do we handle this scenario in Python without much effort?
Steps to follow:
Step 1:
Navigate to the Databricks workspace
Step 2:
Open the Databricks cluster within the workspace
Step 3:
If the cluster is offline, restart it
Step 4:
Open the notebook created in Python
Step 5:
In the notebook, click the '+' symbol on an existing cell to create a new cell
Step 6:
Type '%scala' at the top of the new cell
Now the cell is ready to execute Scala code
Step 7:
Create a dataframe, or use an existing one, in a Python cell and load the data into it
Step 8:
Once the data is ready in the dataframe, do the following:
- In the Python cell, register the dataframe as a temporary view (here df is the dataframe; 'createOrReplaceTempView' replaces 'registerTempTable', which has been deprecated since Spark 2.0):
df.createOrReplaceTempView("sometemptable")
- In the Scala cell, use the code below to read the temporary view:
val scaladf = spark.table("sometemptable")
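The Python side of this step can be wrapped in a small helper. This is a minimal sketch, not part of the original steps: the function name share_with_scala and the view-name check are assumptions added for illustration, and df is assumed to be an existing Spark dataframe in the notebook.

```python
def share_with_scala(df, view_name="sometemptable"):
    """Register `df` as a temporary view so a %scala cell can read it.

    `share_with_scala` is a hypothetical helper, not a Databricks API.
    """
    # Assumption: restrict the name to a plain identifier so it can be
    # used unquoted from Scala or SQL cells.
    if not view_name.isidentifier():
        raise ValueError(f"not a simple view name: {view_name!r}")
    # The temp view is registered on the Spark session, which is what
    # lets a cell in another language of the same notebook look it up.
    df.createOrReplaceTempView(view_name)
    return view_name


# In a Python cell, with an existing dataframe `df`:
#   share_with_scala(df)
#
# In a cell that starts with %scala:
#   val scaladf = spark.table("sometemptable")
```

Returning the view name keeps the Python and Scala cells in sync when the name is chosen in one place only.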
That's it! The data is now available in the Scala cell.
Conclusion
The reverse also works: we can pass dataframe data from a Scala cell to a Python cell. Register the dataframe in the Scala cell with 'createOrReplaceTempView' (note the lowercase 'c'), then read the temporary view from the Python cell. 'registerTempTable' has been deprecated since Spark 2.0, so 'createOrReplaceTempView' is the function to use in Scala.
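For the Scala-to-Python direction, the Python side might look like the sketch below. It assumes a %scala cell has already registered a view; the view name scalatemptable and the helper read_from_scala are made up for this example, and spark is the session object that Databricks notebooks provide.

```python
def read_from_scala(session, view_name):
    """Return the temp view `view_name` as a Python dataframe.

    Hypothetical helper; `session` is expected to be a SparkSession.
    """
    # SparkSession.table looks up a table or temp view by name.
    return session.table(view_name)


# In a cell that starts with %scala (scaladf is an existing Scala dataframe):
#   scaladf.createOrReplaceTempView("scalatemptable")
#
# In a Python cell:
#   pydf = read_from_scala(spark, "scalatemptable")
```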