Cassandra Delete Column Example
Cassandra Deleting, Removing Columns from ColumnFamily Tutorial
Tutorial Details:
Program: Cassandra
Difficulty: Moderate
Estimated Completion Time: 15 minutes
Chances are, you're just getting your feet wet with Cassandra and now you're ready for some serious development. In this tutorial we will explore how to programmatically remove Columns from a ColumnFamily in the Cassandra data store using the Java programming language.
The version of Cassandra used in this tutorial is 0.7.3 with Thrift 0.5.
Just to set the context, in Cassandra a ColumnFamily is similar to a database table and a Cassandra Column is similar to a database column. The big difference between Cassandra and a traditional relational database is that Cassandra does not enforce a fixed schema. In other words, a column family can contain rows with a different number of columns. For example, a User column family might contain the columns (username, password) for one row and (username, password, description) for another. Row ids in Cassandra are analogous to the primary key of a table in a relational database: a row id identifies a single row of columns in a Cassandra column family. Row ids are typically variable-length strings.
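To make the row and column model concrete, here is a minimal in-memory sketch in plain Java. The class ColumnFamilyModel, its methods, and the sample values are hypothetical and exist only for illustration; this is not how Cassandra stores data, just a picture of rows that each carry their own set of columns.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a column family: row id -> (column name -> value).
// It only illustrates the data model and has nothing to do with Cassandra's storage engine.
final class ColumnFamilyModel {
    // Each row keeps its own sorted map of columns, so rows may differ in shape.
    private final Map<String, TreeMap<String, String>> rows = new LinkedHashMap<>();

    void set(String rowId, String column, String value) {
        rows.computeIfAbsent(rowId, k -> new TreeMap<>()).put(column, value);
    }

    // Removes one column from one row; every other row is left untouched.
    void remove(String rowId, String column) {
        TreeMap<String, String> row = rows.get(rowId);
        if (row != null) {
            row.remove(column);
        }
    }

    Map<String, String> get(String rowId) {
        return rows.getOrDefault(rowId, new TreeMap<>());
    }

    public static void main(String[] args) {
        ColumnFamilyModel user = new ColumnFamilyModel();
        // Row "1" has two columns, row "2" has three (sample values are made up).
        user.set("1", "username", "alice");
        user.set("1", "password", "secret");
        user.set("2", "username", "bob");
        user.set("2", "password", "hunter2");
        user.set("2", "description", "another user");

        System.out.println(user.get("1").keySet()); // [password, username]
        System.out.println(user.get("2").keySet()); // [description, password, username]
    }
}
```

Both rows live in the same column family even though they hold different columns, which is exactly the situation a column delete produces.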
Now that we know a bit more about column families and how they can have a different number of columns per row, we can explore how to delete a column for a specified row in a Cassandra ColumnFamily.
For the sake of example, we will use a simple User entity model with a default schema of username, password, and description. In our delete example we will populate the User column family with dummy data, then delete the “description” column for a given row and observe how the User entity has changed. Note: we will create and populate the User column family from the Cassandra command line interface and programmatically delete the description column from Java.
Let’s jump into the Cassandra command line interface, set up our User ColumnFamily, and populate it with dummy data.
First, let’s define our tutorials keyspace and User column family in a schema file that we can supply to the Cassandra command line interface. In your favorite text editor, add the following code and save the file as cassandra-tutorial.txt
create keyspace tutorials;
use tutorials;
create column family User with
comparator = UTF8Type and
column_metadata =
[
{column_name: username, validation_class: UTF8Type},
{column_name: password, validation_class: UTF8Type},
{column_name: description, validation_class: UTF8Type}
];
The first couple of lines create a new keyspace with the name “tutorials” and change our context to the newly created keyspace. The following lines create the User column family with the columns username, password, and description. Each column is defined in the metadata with the validation class UTF8Type. Setting the validation class to UTF8Type allows us to view the values in our User column family in human-readable form. Without the metadata, the values would default to binary and we would not be able to read our column data. If you're not familiar with the concepts of keyspaces and column families, I strongly recommend you stop here and get familiar with them in order for this tutorial to be useful.
Assuming your Cassandra server is running on localhost and the default port 9160, change directory into your Cassandra installation directory and run the following command:
> bin/cassandra-cli -h localhost -p 9160 -f cassandra-tutorial.txt
This creates our keyspace and User column family. Now we need to populate the User column family with dummy data. We will create two rows in the User column family with values for the username, password and description columns. Enter the following commands from your Cassandra installation directory.
> bin/cassandra-cli -h localhost -p 9160
use tutorials;
set User['100']['username'] = 'abelperez';
set User['100']['password'] = 'drowssap';
set User['100']['description'] = 'a nice dude';
get User['100'];
Your output should look something similar to the following (ignore timestamp values):
=> (column=description, value=a nice dude, timestamp=1301874610643000)
=> (column=password, value=drowssap, timestamp=1301874508235000)
=> (column=username, value=abelperez, timestamp=1301874460900000)
Returned 3 results.
Now let's create a second user in our User column family.
set User['101']['username'] = 'anthony';
set User['101']['password'] = 'bluevelvet';
set User['101']['description'] = 'some other dude';
get User['101'];
Now that we have set up and verified our test keyspace and column family, we can move on to the fun stuff. In order to programmatically interact with Cassandra from Java you will need to add the following Jar files to your project:
apache-cassandra-0.7.3.jar
libthrift-0.5.jar
log4j-1.2.16.jar
slf4j-api-1.6.1.jar
slf4j-log4j12-1.6.1.jar
These Jars contain the Cassandra API, Thrift API and required logging libraries.
All interaction with Cassandra begins with getting a connection to one of the Cassandra nodes in a cluster. In our example, we are connecting to the stand-alone Cassandra server running on localhost. Thrift provides the classes for connecting to Cassandra. Here is a simplified way to connect. Keep in mind that this code is for the sake of example only and is not the recommended way for a production environment. For a production environment it’s recommended that you keep things DRY and most likely use a connection pool. Many high-level Cassandra client APIs provide such things.
TTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
TProtocol protocol = new TBinaryProtocol(transport);
Cassandra.Client client = new Cassandra.Client(protocol);
transport.open();
Once a connection has been established, you need to specify the keyspace for this session. You can do so by invoking the set_keyspace function:
client.set_keyspace("tutorials");
Next we need to define the id of the row in the User column family that we plan to modify. This is the id 100 that we used when we set up the dummy data in the User column family of the tutorials keyspace. Note that the row id needs to be a byte buffer, so we derive a byte buffer from the string “100”.
ByteBuffer rowid = ByteBuffer.wrap("100".getBytes());
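One detail worth knowing: getBytes() without arguments uses the platform's default character set, so the same string could in principle produce different bytes on different machines. The sketch below pins the encoding to UTF-8 using only the JDK. The helper class RowKeys and its method names are hypothetical, introduced here for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for converting between strings and byte buffers.
final class RowKeys {
    // Encodes text as UTF-8 regardless of the platform default charset.
    static ByteBuffer toBytes(String text) {
        return ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
    }

    // Decodes a buffer back to text; duplicate() keeps the caller's buffer position unchanged.
    static String toText(ByteBuffer bytes) {
        return StandardCharsets.UTF_8.decode(bytes.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer rowid = toBytes("100");
        System.out.println(rowid.remaining()); // 3
        System.out.println(toText(rowid));     // 100
    }
}
```

The same helper would work for column names such as “description”, since they are passed to the API as byte buffers too.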
Next we need to define a column path and set its column family and column properties. You can think of a column path as something like a directory path: it is the lookup path to a single column. The column_family property is the name of the column family in which the column we plan to delete resides, and the column property names the column we want to delete. Column paths are very common, so it is useful to remember that a column path is simply the absolute location of a column in Cassandra.
Here we construct and set the properties of the ColumnPath required for our delete operation. We set the column name to “description” since this is the column we want to remove. Notice we wrap the string value “description” in a byte buffer. This is because Cassandra stores column names as binary values. Many applications store column name/values as UTF-8 serialized strings.
ColumnPath path = new ColumnPath();
path.column_family = "User";
path.column = ByteBuffer.wrap("description".getBytes());
Cassandra column family operations like inserts and deletes require that you specify a timestamp value. Internally Cassandra uses these timestamp values for things like conflict resolution: for a given column, the write with the highest timestamp wins. Conflict resolution in general is outside the scope of this tutorial, but it has one practical consequence here: a delete only takes effect if its timestamp is at least as recent as the timestamp of the column data we are trying to remove. The command line interface wrote our columns with timestamps in microseconds, so a value in milliseconds (for example from System.currentTimeMillis()) would be far smaller and the delete would be silently ignored. To keep things simple, we reuse the timestamp of the description column from the get User command we issued from the command line; the current time in microseconds would work just as well. Based on the example output listed above, the value of timestamp would be 1301874610643000. Note: this value will be different when you run the example.
long timestamp = 1301874610643000L;
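Since conflict resolution keeps whichever write carries the higher timestamp, reusing the column's own timestamp works, and so does any later one. The following is a simplified sketch of that rule in plain Java, not Cassandra's actual code. The class DeleteTimestamps and its methods are hypothetical names, and the tie-breaking in favor of the delete is my understanding of how Cassandra reconciles a delete against a column with the same timestamp.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of how a delete timestamp relates to a column timestamp.
final class DeleteTimestamps {
    // Wall-clock time in microseconds, the same scale as the timestamps in the CLI output.
    static long nowMicros() {
        return TimeUnit.MILLISECONDS.toMicros(System.currentTimeMillis());
    }

    // Simplified rule: the delete takes effect when it is not older than the column.
    static boolean deleteWins(long columnTimestamp, long deleteTimestamp) {
        return deleteTimestamp >= columnTimestamp;
    }

    public static void main(String[] args) {
        long columnTimestamp = 1301874610643000L; // from the example output above

        // Reusing the column's own timestamp is enough.
        System.out.println(deleteWins(columnTimestamp, columnTimestamp)); // true
        // A millisecond value is roughly 1000 times smaller, so the delete loses.
        System.out.println(deleteWins(columnTimestamp, 1301874610643L));  // false
        // The current time in microseconds wins on any machine with a sane clock.
        System.out.println(deleteWins(columnTimestamp, nowMicros()));
    }
}
```

In practice this means a helper like nowMicros() could replace the hard-coded value, which avoids copying timestamps out of the command line output.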
Last but not least, we need to define the consistency level for our delete operation. The consistency level controls how many replicas must acknowledge a read or write before it is reported as successful, relative to the replication factor defined in your schema definition. We will use a consistency level of ONE, which effectively means that the client remove operation will not respond until the delete has been written to at least one replica’s commit log and memtable. The Thrift API contains an enum ConsistencyLevel with the various consistency level values.
ConsistencyLevel consistency = ConsistencyLevel.ONE;
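To see how the consistency level relates to the replication factor, here is a small sketch that computes how many replicas must acknowledge an operation. The class ReplicaMath and its Level enum are hypothetical stand-ins covering only three levels; they are not the Thrift ConsistencyLevel enum. The quorum formula (replication factor divided by two, rounded down, plus one) is the usual definition of a majority.

```java
// Hypothetical illustration of replica acknowledgements per consistency level.
final class ReplicaMath {
    // Stand-in for three of the consistency levels; not the Thrift enum itself.
    enum Level { ONE, QUORUM, ALL }

    // Number of replicas that must acknowledge before the client call returns.
    static int requiredAcks(Level level, int replicationFactor) {
        switch (level) {
            case ONE:
                return 1;
            case QUORUM:
                return replicationFactor / 2 + 1; // strict majority of replicas
            case ALL:
                return replicationFactor;
            default:
                throw new IllegalArgumentException("unknown level: " + level);
        }
    }

    public static void main(String[] args) {
        int replicationFactor = 3; // assumed value for illustration
        System.out.println(requiredAcks(Level.ONE, replicationFactor));    // 1
        System.out.println(requiredAcks(Level.QUORUM, replicationFactor)); // 2
        System.out.println(requiredAcks(Level.ALL, replicationFactor));    // 3
    }
}
```

With a single stand-alone node, as in this tutorial, all three levels come down to one acknowledgement, which is why ONE is a natural choice here.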
Now that we have constructed and set up all the required components to execute the deletion, we can invoke the remove function of the Cassandra.Client API. As you might expect, the remove function takes the row id, column path, timestamp, and consistency level as parameters.
client.remove(rowid, path, timestamp, consistency);
Now let's wrap everything up in a main function and see it in action.
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
import java.nio.ByteBuffer;
/**
 * Cassandra delete column example.
 */
public class DeleteExample
{
    public static void main(String[] args) throws Exception {
        // open connection to Cassandra server
        TTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        TProtocol protocol = new TBinaryProtocol(transport);
        Cassandra.Client client = new Cassandra.Client(protocol);
        transport.open();
        // specify keyspace
        client.set_keyspace("tutorials");
        // specify row id
        ByteBuffer rowid = ByteBuffer.wrap("100".getBytes());
        // specify column path
        ColumnPath path = new ColumnPath();
        path.column_family = "User";
        path.column = ByteBuffer.wrap("description".getBytes());
        // specify timestamp value of data to be removed
        long timestamp = 1301874610643000L;
        // set consistency level to one
        ConsistencyLevel consistency = ConsistencyLevel.ONE;
        // remove the description column from the specified row
        // in the User column family
        client.remove(rowid, path, timestamp, consistency);
        // release resources
        transport.flush();
        transport.close();
    }
}
After we run the example we need to verify that the description column was actually removed for row 100 in the User column family. We can verify this from the command line interface with the following command:
get User['100'];
This should now result in something similar to the following:
=> (column=password, value=drowssap, timestamp=1301874508235000)
=> (column=username, value=abelperez, timestamp=1301874460900000)
Returned 2 results.
As you can see, the “description” column is no longer available for row 100 in the User column family. We have effectively removed the specified data. Also, if you run the get command for our second user with row id “101”, you will notice that our second user still contains three columns (username, password, description). This clearly shows how Cassandra supports a different number of columns per row in a column family.
In this tutorial we have only scratched the surface of delete operations. There are several more advanced techniques, such as batch deletes and deletes based on a range. Stay tuned for part two of this tutorial, where we will cover the more advanced delete topics.