Databricks ships its own Python module for Delta Live Tables, also called “dlt”. It is not published on PyPI, so PyPI’s unique-naming guarantee doesn’t apply to it, and it clashes with dltHub’s “dlt” PyPI package when you try to install the latter. To use the dlt package (by dltHub) in a Databricks notebook, you have to do the following at the cluster level:

Add an init script

We need to add an init script (which runs on cluster startup) that installs the dltHub dlt package and renames Databricks’ dlt package, so that the dltHub dlt package can be used without any issues.

Add an init.sh file somewhere in your Workspace directory with the following content:

#! /bin/bash

# move Databricks' dlt package to a different folder name
mv /databricks/spark/python/dlt/ /databricks/spark/python/dlt_dbricks

# Replace all mentions of `dlt` with `dlt_dbricks` so that Databricks' dlt 
# can be used as `dlt_dbricks` in the notebook instead
find /databricks/spark/python/dlt_dbricks/ -type f -exec sed -i 's/from dlt/from dlt_dbricks/g' {} \;

# Replace mentions of `dlt` with `dlt_dbricks` in DeltaLiveTablesHook.py to
# avoid import errors
sed -i "s/'dlt'/'dlt_dbricks'/g" /databricks/python_shell/dbruntime/DeltaLiveTablesHook.py
sed -i "s/from dlt/from dlt_dbricks/g" /databricks/python_shell/dbruntime/DeltaLiveTablesHook.py

# Install dltHub dlt
pip install dlt

Go back to your cluster settings, click “Edit” in the top right corner, scroll all the way down and expand the “Advanced Options” section. Then click on “Init scripts” and select the file you just created, as shown below:

[Screenshot: selecting the init script file under Advanced Options → Init scripts]

Clicking “Add” and then “Confirm” will trigger a restart.
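Once the cluster is back up, you can optionally check from a notebook cell that the script did its job. A minimal sketch — the paths below are the ones used in the init script above:

import os

# the Databricks Delta Live Tables module should now live under the renamed folder
print(os.path.isdir("/databricks/spark/python/dlt_dbricks"))   # expect: True

# and "dlt" should now resolve to the dltHub package installed by the init script
import dlt
print(dlt.__file__)      # should point into site-packages, not /databricks/spark/python
print(dlt.__version__)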

And we’re done!

Usage

Now, in every notebook you create on this cluster, you can use the dltHub package as normal via the dlt reference, and any of Databricks’ dlt functionality is available via the dlt_dbricks reference, as in the sketch below.
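A minimal, hypothetical example of what that can look like in a notebook cell — the pipeline name, destination, dataset name and sample rows are placeholders, not part of the setup above:

# dltHub's dlt, importable under its normal name once the init script has run
import dlt

pipeline = dlt.pipeline(
    pipeline_name="demo_pipeline",   # placeholder name
    destination="duckdb",            # placeholder; use a destination you have configured
                                     # (some destinations need an extra, e.g. dlt[duckdb])
    dataset_name="demo_dataset",
)
load_info = pipeline.run([{"id": 1}, {"id": 2}], table_name="numbers")
print(load_info)

# Databricks' Delta Live Tables module is still available under the renamed import
# (its decorators, such as @dlt_dbricks.table, only take effect inside a DLT pipeline)
import dlt_dbricks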