This weekend I wanted to use HBase with Zeppelin on my Hadoop cluster. The HBase interpreter is not available by default with my install of Hortonworks (HDP 2.6), so I had to install it. This was the first interpreter setup that did not go smoothly, so I thought I would share my experience. FYI – I am running an IaaS cluster in Azure on Ubuntu VMs.
Here are the simple steps I took to fulfill my weekend dream.
Installing the HBase Interpreter on the Zeppelin Server
You will need to SSH into the server running Zeppelin. You can find this server using Ambari: click on the Zeppelin Notebook menu item, then click or hover over Zeppelin Notebook in the summary section.
Once SSH-ed into the server, let’s look at the available interpreters. To view the interpreters, run the following:
sudo /usr/hdp/current/zeppelin-server/bin/install-interpreter.sh --list
A similar list should be displayed:
To install the HBase interpreter, run:
sudo /usr/hdp/current/zeppelin-server/bin/install-interpreter.sh --name hbase
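A quick way to confirm the install landed is to look for the interpreter's files on disk. This is only a sketch: it assumes install-interpreter.sh places the files in an interpreter/hbase directory under the Zeppelin home, and the helper name interpreter_installed is my own, not part of Zeppelin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if <zeppelin_home>/interpreter/<name>
# exists and contains at least one entry.
interpreter_installed() {
  local zeppelin_home="$1" name="$2"
  local dir="${zeppelin_home}/interpreter/${name}"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Assumed Zeppelin home, taken from the script path used above.
if interpreter_installed /usr/hdp/current/zeppelin-server hbase; then
  echo "hbase interpreter files found"
else
  echo "hbase interpreter files not found"
fi
```

If the files are not found, check the output of the install command before moving on to the UI.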
Once the installation is complete, we are ready to add the HBase interpreter using the Zeppelin UI.
Adding Interpreter to Zeppelin
Click Interpreter in the Zeppelin UI; it is under the drop-down arrow next to your logged-in user name at the top right.
Click the Create button to add the HBase interpreter. Give it the name hbase and select the group hbase (the install command we ran over SSH added hbase to the groups). Once you choose the hbase group, three default properties will be added (this is very helpful when a lot of properties are required).
Let’s leave the defaults for now. Click the Save button.
Create a new notebook and… we need to ensure the HBase interpreter is available. Almost forgot.
Click on the small gear icon in the top right of the notebook.
Select our HBase interpreter – it should be a blue-ish color when selected. Click the Save button.
Alright, back to the notebook.
Create a new notebook paragraph and let’s create a new HBase table with our new interpreter. Right now I want a table to capture comfort conditions inside my house. It is cold and dry, and I have IoT sensors streaming data to be stored in HBase. Maybe someday I can use an Azure Function to tell me to move south when it gets too cold in my house.
Let’s write and run the following:
%hbase create 'Sensors','Comfort'
You may get an error:
org.apache.zeppelin.interpreter.InterpreterException: HBase ruby sources is not available at '/usr/lib/hbase/lib/ruby'
I discovered that the default directory does not exist. There is a simple workaround: use the current client directory, where everything required appears to be available.
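Before changing the interpreter setting, it may help to confirm which directory actually contains lib/ruby. The sketch below tries a list of candidates and prints the first match. The helper name find_hbase_home is hypothetical, and /usr/hdp/current/hbase-client is an assumption based on the prefix of the dependency paths used later in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first candidate directory that
# contains lib/ruby; return non-zero if none does.
find_hbase_home() {
  local candidate
  for candidate in "$@"; do
    if [ -d "${candidate}/lib/ruby" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# /usr/lib/hbase is the default named in the error message;
# /usr/hdp/current/hbase-client is an assumed HDP client directory.
find_hbase_home /usr/lib/hbase /usr/hdp/current/hbase-client \
  || echo "no candidate contains lib/ruby"
```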
Go back to the Interpreter screen and edit your HBase interpreter. Update the HBase home property (hbase.home) with:
Or is it…
When the code is run again, I get the error:
org.jruby.exceptions.RaiseException: (NameError) cannot load Java class org.apache.hadoop.hbase.quotas.ThrottleType
I found that the following dependencies need to be added to the HBase interpreter. Thus far (fingers crossed) I have not had any additional problems after adding these dependencies.
/usr/hdp/current/hbase-client/lib/hbase-client.jar
/usr/hdp/current/hbase-client/lib/hbase-protocol.jar
/usr/hdp/current/hbase-client/lib/hbase-common.jar
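Since these paths are typed by hand into the interpreter settings, a quick existence check on the Zeppelin server can catch a typo or a layout difference before you restart the interpreter. This is a rough sketch; missing_jars is a hypothetical helper name, and the paths are the three listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each path that is not a regular file
# and return non-zero if any path is missing.
missing_jars() {
  local jar status=0
  for jar in "$@"; do
    if [ ! -f "$jar" ]; then
      echo "missing: $jar"
      status=1
    fi
  done
  return "$status"
}

lib=/usr/hdp/current/hbase-client/lib
if missing_jars "$lib/hbase-client.jar" "$lib/hbase-protocol.jar" "$lib/hbase-common.jar"; then
  echo "all three jars are present"
fi
```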
Everything should work now once the Interpreter has been saved and restarted. Running the code again should produce something similar to:
0 row(s) in 1.9130 seconds
Fun with HBase and Zeppelin
Let’s add some data; create a new notebook paragraph and add:
%hbase
put 'Sensors', 1, 'Comfort:Temperature', 68
put 'Sensors', 1, 'Comfort:Humidity', 0.22
put 'Sensors', 2, 'Comfort:Temperature', 65
put 'Sensors', 2, 'Comfort:Humidity', 0.23
put 'Sensors', 3, 'Comfort:Temperature', 60
put 'Sensors', 3, 'Comfort:Humidity', 0.21
put 'Sensors', 4, 'Comfort:Temperature', 55
put 'Sensors', 4, 'Comfort:Humidity', 0.20
put 'Sensors', 4, 'Comfort:Relocate', 'TRUE'
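Typing put lines by hand gets tedious once there are more than a few readings. As a rough sketch, a small shell function can print the two put commands for one reading so they can be pasted into a %hbase paragraph. The name sensor_puts and the row 5 reading are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print HBase shell put commands for one reading.
# Arguments: row key, temperature, humidity.
sensor_puts() {
  local row="$1" temperature="$2" humidity="$3"
  printf "put 'Sensors', %s, 'Comfort:Temperature', %s\n" "$row" "$temperature"
  printf "put 'Sensors', %s, 'Comfort:Humidity', %s\n" "$row" "$humidity"
}

# Example with an invented reading for row 5.
sensor_puts 5 52 0.19
```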
Let’s create a new notebook paragraph and view the data:
%hbase scan 'Sensors'
Your results should look similar to:
ROW    COLUMN+CELL
1      column=Comfort:Humidity, timestamp=1518416761608, value=0.22
1      column=Comfort:Temperature, timestamp=1518416761602, value=68
2      column=Comfort:Humidity, timestamp=1518416761617, value=0.23
2      column=Comfort:Temperature, timestamp=1518416761612, value=65
3      column=Comfort:Humidity, timestamp=1518416761625, value=0.21
3      column=Comfort:Temperature, timestamp=1518416761621, value=60
4      column=Comfort:Humidity, timestamp=1518416761634, value=0.2
4      column=Comfort:Relocate, timestamp=1518416761638, value=TRUE
4      column=Comfort:Temperature, timestamp=1518416761629, value=55
4 row(s) in 0.0150 seconds
Next time I will query this table from Zeppelin using Phoenix and Hive. Using Hive or Phoenix with HBase in Zeppelin is a great way to conduct data exploration and gain some quick insights.
I hope this helps those having the same problems I had. I will be sure to keep this article current if I come across new information.