Tuesday, February 8, 2011

Apache Cassandra/ Cassandra-cli

I came across a database system whose concepts are quite different from the traditional RDBMS. You may not know that this database system is used at your favorite (==!my) facebook and twitter. So what is it?? It's Apache Cassandra.
Apache Cassandra was originally developed at facebook, which released the code as open source in 2008; it later became an Apache Software Foundation project. The interesting part of this database system is that it combines two nice concepts,

  • Bigtable: A Distributed Storage System for Structured Data by Google Inc.
  • Dynamo: Highly Available Key Value Store by Amazon.com
For more information you can refer to these papers.



In this post I want to give a short introduction to these new concepts and show some usage. First let's go through the key concepts, which may look a bit weird to traditional RDBMS users. Here there are no concepts like database, schema, table and row, and the meaning of a column is a bit different too. But we can do a rough mapping between the RDBMS concepts and these ones, which you will see later in this post.

Here we go...........
  • Keyspace: A collection of "column family"s
  • Column Family: A collection of rows, each identified by a "key"
  • Super Column Family: A "column family" whose columns are super columns, each of which groups further columns under one name. This is used to build a tree-like structure inside a Keyspace.
  • Key: A unique identifier for a row in a "column family"
  • Column: Used to store data. It contains three values: name, value and timestamp
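
To make the nesting concrete, here is a minimal Python sketch of these concepts as plain in-memory structures. It is only a mental model, not how Cassandra stores data and not any client API; the names Column and keyspace and the sample key and values are made up for illustration.

```python
from collections import namedtuple

# A column holds three values: name, value and timestamp.
Column = namedtuple("Column", ["name", "value", "timestamp"])

# keyspace -> column family -> row key -> columns (indexed by column name)
keyspace = {
    "Users": {                      # a column family
        "some_key": {               # the key that identifies one row
            "first": Column("first", "Some", 1),
            "last": Column("last", "Body", 2),
        },
    },
}

row = keyspace["Users"]["some_key"]
print(sorted(row))          # the column names stored in this row
print(row["first"].value)   # the value of one column
```

Note that nothing forces two rows of the same column family to have the same columns, which is one of the differences from an RDBMS table.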
Now let's try these concepts. For this I use the cassandra-cli tool.

What do you need?
Java 1.6 (OpenJDK or Sun)

First we need to start Apache Cassandra by executing the following command
sudo sh CASSANDRA_HOME/bin/cassandra -f
This command will start the Cassandra server on localhost, port 9160. (We can also run Cassandra as a clustered database, which I will discuss in a later post.) The -f argument makes Cassandra stay in the foreground and log to standard out.
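
If you are not sure whether the server came up, a quick check is to see whether anything accepts TCP connections on that port. The following is a small Python sketch for that; port_is_open is a helper name I made up, and it only tests that the port is reachable, not that the thing listening there is really Cassandra.

```python
import socket


def port_is_open(host="localhost", port=9160, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unknown, and so on.
        return False


if __name__ == "__main__":
    print("listening" if port_is_open() else "not reachable")
```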

Then we can use the cassandra-cli tool by executing the following command
CASSANDRA_HOME/bin/cassandra-cli -host localhost -port 9160



Now we can type commands at the prompt. As the startup message says, you can use help or ? to get help, and exit or quit to leave cassandra-cli.

First we should create a Keyspace.
Command: 
create keyspace Keyspace1

Then we can create a column family in that keyspace. To do this we first have to select the keyspace in which we are going to create the column family.
Command: 
use Keyspace1
Creating the column family
Command: 
create column family Users with comparator=UTF8Type and default_validation_class=UTF8Type

Here I used the attributes comparator=UTF8Type and default_validation_class=UTF8Type so that column names and column values are treated as UTF8Type by default.

We can now use this column family to store data. As an example, let's store my details,
First name: Eranda 
Last name: Sooriyabandara
Age: 42

Commands:
set Users[eranda][first]='Eranda'
set Users[eranda][last]='Sooriyabandara'
set Users[eranda][age]=long(42)  

Here eranda is the key of a row, and first, last and age are its columns. So the three commands above all add values to one single row. long(42) stores the age column value as a long.

Now let's try retrieving data from Cassandra. To do this we can use the key as follows,
Command:
get Users[eranda]

Results:
=> (column=age, value=42, timestamp=1297178199133000)
=> (column=first, value=Eranda, timestamp=1297178178790000)
=> (column=last, value=Sooriyabandara, timestamp=1297178188354000)
Returned 3 results.
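
Two things in these results are worth a second look: the columns come back ordered by column name (age, first, last) and not in the order I inserted them, and each column carries its own timestamp. Here is a rough Python sketch that imitates this behaviour with a plain dictionary. The names store, set_column and get_row are made up, this is not the Cassandra API, and I am assuming the timestamps are microseconds since the Unix epoch (they do fall on the date of this post when read that way).

```python
import time
from datetime import datetime, timezone

# column family name -> row key -> column name -> (value, timestamp)
store = {}


def set_column(cf, key, name, value, timestamp=None):
    """Imitate `set CF[key][name]=value`; timestamp is in microseconds."""
    if timestamp is None:
        timestamp = int(time.time() * 1_000_000)
    store.setdefault(cf, {}).setdefault(key, {})[name] = (value, timestamp)


def get_row(cf, key):
    """Imitate `get CF[key]`: return (name, value, timestamp) sorted by name."""
    row = store.get(cf, {}).get(key, {})
    return [(name, *row[name]) for name in sorted(row)]


# Same insertion order and timestamps as in the session above.
set_column("Users", "eranda", "first", "Eranda", 1297178178790000)
set_column("Users", "eranda", "last", "Sooriyabandara", 1297178188354000)
set_column("Users", "eranda", "age", 42, 1297178199133000)

for name, value, ts in get_row("Users", "eranda"):
    when = datetime.fromtimestamp(ts / 1_000_000, tz=timezone.utc)
    print(f"=> (column={name}, value={value}, timestamp={ts})  {when:%Y-%m-%d} UTC")
```

In real Cassandra the ordering is decided by the comparator of the column family (UTF8Type here), so the sorted() call is only a stand-in for that.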

Now let's see how I executed these commands on my machine.



If you remember, I mentioned that we can map the concepts between RDBMS and Cassandra, and you may already have a feeling for it. But remember that this is only a rough mapping; the concepts are not the same.

database-keyspace
schema-super column family
table-column family
row-key

For more details you can refer to the Cassandra wiki.