Friday, April 3, 2015

Getting Started With Cassandra noSQL

Even though most of my experience is with Microsoft SQL Server and its BI stack, I like to tinker with other things. I am currently playing around with Cassandra noSQL from DataStax.

First off, what is noSQL? There are some really long, drawn-out explanations out there, but to put it simply, noSQL is an umbrella term for non-relational database systems (key-value, document, wide-column and graph stores) that were built to handle big data. They can handle large volumes of data, whether it's structured or not. Cassandra falls into the wide-column group.

I will be playing with noSQL a bit and exploring what you can do with it, but this post is just a tutorial on creating a connection and setting up a database in Cassandra. For this tutorial, I will be using the Cassandra community edition. You can download it from planetcassandra.org/cassandra. To interact with the instance of Cassandra, I will be using DevCenter from DataStax. You can download that from the DataStax download page.

I'm not going to cover the installation process, since it was really straightforward. I installed it on a Windows laptop and followed the few prompts that it gave me. I didn't need to change any of the installation options. DevCenter has no installer; it is just an executable. Be aware that you do need to have a Java JRE or JDK installed in order to use DevCenter.

In DevCenter, click the 'New Connection' icon.

I will just name the connection localhost. Type localhost into the 'Contact Hosts' field and then click the 'Add' button to the right. Leave all the other options the way they are and click 'Finish'.

Localhost will now be displayed in the connections pane.

Now it's time to create storage objects. To interact with this Cassandra instance, I will have to use CQL (Cassandra Query Language). You may find that it is not too different from T-SQL. The first object that needs to be created is a 'keyspace'. Think of this as a database in SQL Server. The following syntax creates a keyspace named 'WHO_Mortality'.

CREATE KEYSPACE "WHO_Mortality" WITH replication = {'class' : 'SimpleStrategy', 'replication_factor' : 1 };

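When creating a keyspace, replication is a required property. It tells Cassandra which replication strategy to use and how many copies of each row to keep across the nodes of the cluster. There are different types of replication that can be used, but for this simple tutorial, I am using 'SimpleStrategy' with a replication factor of 1, which keeps a single copy of the data and is all a one-node laptop install can do anyway. Run the statement by clicking the green run icon.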
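
For comparison, here is a rough sketch of what the other common strategy, 'NetworkTopologyStrategy', looks like. It sets a replica count per data center instead of one number for the whole cluster. The keyspace name and the data center names 'dc1' and 'dc2' are made up for illustration; on a real cluster they would have to match the data center names the cluster actually reports.

```cql
-- Hypothetical keyspace: 3 copies of each row in 'dc1', 2 copies in 'dc2'.
-- 'dc1' and 'dc2' are placeholder data center names, not real ones.
CREATE KEYSPACE IF NOT EXISTS "WHO_Mortality_MultiDC"
WITH replication = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2 };
```

On my single-node laptop there is nothing to gain from this, which is why I am sticking with 'SimpleStrategy'.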

Now that the keyspace has been created, the next step is to create tables. This database will eventually hold mortality data from the World Health Organization. A little deranged, but what did you expect? Besides, there is a lot of interesting health-related info in my staging database.

I will start with the 'Country' table. It's just a small table that will hold a list of countries and their IDs. Use the following syntax to create the Country table.

CREATE TABLE "WHO_Mortality".Country (
    CountryID int PRIMARY KEY,
    Country varchar
);
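
To check that the table works, a quick sketch like the one below should do. The ID and country name are placeholder values I made up, not real rows from the WHO data.

```cql
-- Placeholder row, just to prove the table accepts data.
INSERT INTO "WHO_Mortality".Country (CountryID, Country)
VALUES (1, 'Example Country');

-- Look the row up by its primary key.
SELECT CountryID, Country
FROM "WHO_Mortality".Country
WHERE CountryID = 1;
```

Filtering on CountryID works here because it is the primary key. Filtering on the Country column alone is a different story in Cassandra, since it is not part of the key.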

There are a few things I want to point out about these two statements. In the first statement, when creating the keyspace, notice that I added double quotes around the keyspace name. Enclosing an object name in double quotes makes it case sensitive. If the object is made case sensitive, then it must be typed exactly the same way, quotes included, every time it is referenced in code. Notice that I didn't do that for the table or column names. Unquoted names are not case sensitive; Cassandra simply stores them in lowercase. There is no reasoning behind the difference other than to show how that works. Also, every CQL statement must end with a semicolon, whether it's a truncate table statement, a create statement or a drop table statement. Every statement needs it.
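
Here is a small sketch of how the quoting rules play out against the keyspace and table created above. The statements that I expect to fail are left commented out.

```cql
-- These all refer to the same table, because the unquoted table name
-- is folded to lowercase ("country") no matter how it is typed.
SELECT * FROM "WHO_Mortality".Country;
SELECT * FROM "WHO_Mortality".country;
SELECT * FROM "WHO_Mortality".COUNTRY;

-- Expected to fail: without quotes the keyspace name is folded to
-- who_mortality, and no keyspace with that name exists.
-- SELECT * FROM WHO_Mortality.Country;

-- Expected to fail: quoting the column name makes it case sensitive,
-- but the column was created unquoted and is stored as countryid.
-- SELECT "CountryID" FROM "WHO_Mortality".Country;
```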

Now that I have the keyspace and a table, I can create the rest of my tables. One other incredibly important thing to remember is that Cassandra doesn't support foreign keys. You will also find that there are certain operations Cassandra can't handle, such as joins. Because of this, the database needs to be designed around the queries that will be run against it, which usually means one table per query and some duplicated data. Remember, noSQL is designed to handle incredibly large amounts of data, so the tables can be wider than what would typically be seen in a relational database system. This is just a simple tutorial on creating the keyspace and a table. More than likely, as my adventure in Cassandra continues, I will be altering this table and adding more columns until I am more familiar with the design concepts and have a finished database I can load.
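
To give a rough idea of what designing around the query means, here is a sketch of a table built for one question: "show me the deaths for a given country and year". The table name, columns and values are all hypothetical, since I haven't settled on the real design yet. Instead of joining to the Country table, the country name is stored right in the row.

```cql
-- Hypothetical query-first table: country is the partition key,
-- year and cause are clustering columns that order rows within it.
CREATE TABLE "WHO_Mortality".deaths_by_country (
    country varchar,
    year int,
    cause varchar,
    deaths int,
    PRIMARY KEY ((country), year, cause)
);

-- The query the table was designed for; no join required.
SELECT year, cause, deaths
FROM "WHO_Mortality".deaths_by_country
WHERE country = 'Example Country' AND year = 2010;

-- Adding a column later is a single statement (Region is a made-up column).
ALTER TABLE "WHO_Mortality".Country ADD Region varchar;
```

The trade-off is that the same country name gets repeated on every row, which would make a relational designer cringe, but it is the normal price for not having joins.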

The end goal for this noSQL database is another tutorial in which I load it with data that is stored in MSSQL Server.
