How to read an .xlsx file in Databricks using Pandas

Step 1: To read an .xlsx file with Pandas, the openpyxl library must be installed on the Databricks cluster.

Steps to install the openpyxl library on the Databricks cluster:

Step 1.1: Select the Databricks cluster where you want to install the library.

Step 1.2: Click on Libraries.

Step 1.3: Click on Install New.

Step 1.4: Select PyPI.

Step 1.5: Enter openpyxl in the text box under Package and click Install.

Once the installation finishes, the library's Status is shown as Installed.
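
Before moving on, it can help to confirm from a notebook cell that the library is actually importable. The following is a minimal sketch and not part of the original steps: the helper name is_installed is hypothetical, and it relies only on the Python standard library.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the import system can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Report whether openpyxl is visible to the current Python environment.
if is_installed("openpyxl"):
    print("openpyxl is available")
else:
    print("openpyxl is missing; install it on the cluster first")
```

If the check reports that the library is missing, the notebook may be attached to a different cluster than the one where openpyxl was installed. Databricks notebooks also support %pip install openpyxl as a notebook-scoped alternative to the cluster UI.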

Step 2: Open a Databricks notebook attached to that cluster.

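Step 3: Write and run the code below to read the .xlsx file into a pandas DataFrame and then convert it to a Spark DataFrame:

import pandas

# Read the first sheet, using its first row as column names; dtype=None lets pandas infer column types.
ization = pandas.read_excel("/dbfs/mnt/Ization/Sample.xlsx", sheet_name=0, header=0, dtype=None)
# Convert the pandas DataFrame to a Spark DataFrame (spark is predefined in Databricks notebooks).
ization = spark.createDataFrame(ization)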
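
Note that the file path starts with /dbfs. Pandas reads through the local file API, so a DBFS location written as dbfs:/... has to be given in its /dbfs/... form. The sketch below illustrates that conversion; the helper name to_local_dbfs_path is hypothetical and only shows the idea.

```python
def to_local_dbfs_path(path: str) -> str:
    """Convert a 'dbfs:/...' URI into the '/dbfs/...' form used by local file APIs."""
    prefix = "dbfs:/"
    if path.startswith(prefix):
        # Drop the scheme and any extra leading slashes, then prepend the local mount point.
        return "/dbfs/" + path[len(prefix):].lstrip("/")
    # Anything else is assumed to be a local-style path already and is returned unchanged.
    return path


print(to_local_dbfs_path("dbfs:/mnt/Ization/Sample.xlsx"))
# prints: /dbfs/mnt/Ization/Sample.xlsx
```

The returned string is the kind of path that pandas.read_excel expects when it runs on the cluster.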

Optionally, you can register the Spark DataFrame as a temporary view so that it can be queried with SQL.

ization.createOrReplaceTempView("ization")
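
A temporary view exists only for the current Spark session, and it can be queried by name. As a small sketch of how that might look, the hypothetical helper build_preview_query below builds a SELECT statement for a view name; the commented line shows how it could be passed to spark.sql in a notebook.

```python
def build_preview_query(view_name: str, limit: int = 5) -> str:
    """Build a SQL statement that selects the first few rows of a view."""
    # Accept only simple identifiers so the name can be placed in the query safely.
    if not view_name.isidentifier():
        raise ValueError(f"Not a simple view name: {view_name!r}")
    return f"SELECT * FROM {view_name} LIMIT {int(limit)}"


print(build_preview_query("ization"))
# prints: SELECT * FROM ization LIMIT 5

# In a Databricks notebook, where spark is predefined, the query could be run as:
# spark.sql(build_preview_query("ization")).show()
```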

Thank you for reading and Happy Learning.
