Dataframe exchange between Notebook cells

Dataframe exchange between Notebook cells is a Databricks programming technique for passing dataframe data from a Python cell to a Scala cell within a Python notebook.

Databricks notebooks allow us to run code written in another language by adding more command cells to a single notebook.

This means that a notebook created in one language can also contain code written in another language (Scala/R/SQL).

Exchange data between cells

Sometimes we need to pass dataframe data from a Python cell to a Scala cell for processing, for instance when we want to use functionality that is not yet supported in Python. In such cases, we can use cells written in another language within the same notebook.

Example:

The Spark connector's bulk copy functionality is not supported in Python.

Please click on the link below for more details about the bulk copy workaround in Python.

Click here 👉 bulkcopy workaround

So how do we handle this scenario in Python without much effort?

Steps to follow:

Step 1:
Navigate to the Databricks workspace
Step 2:
Open the Databricks cluster within the workspace
Step 3:
If the cluster is offline, just restart it
Step 4:
Open the notebook created in Python
Step 5:
In the notebook, click the '+' symbol on an existing cell to create a new cell
Step 6:
Type '%scala' at the top of the new cell
Now the cell is ready to execute any Scala code
Step 7:
In a Python cell, create a dataframe (or use an existing one) and load the data into it
Step 8:
Once the data is ready in the dataframe, do the following:
  • In the Python cell, register the dataframe as a temporary table with the snippet below (here df is the dataframe):
df.registerTempTable("sometemptable")
  • In the Scala cell, read the temporary table with the snippet below:
val scaladf = spark.table("sometemptable")
Conclusion

That's it! The data is now ready in the Scala cell.

We can also do the reverse and pass dataframe data from a Scala cell to a Python cell. The only difference is that 'registerTempTable' should not be used in Scala (it has been deprecated since Spark 2.0); we have to use the 'createOrReplaceTempView' function instead to create the temp table.


