Cassandra Partitioner - Order-Preserving Partitioner and Random Partitioner
Cassandra is a distributed system whose cluster can be composed of many nodes. When your application stores data in Cassandra, the Partitioner decides where in the cluster to store it. When your application requests data, the Partitioner determines which nodes in the cluster hold the range of keys that maps to the data you’re looking for. Cassandra has two types of partitioners: the Order-Preserving Partitioner and the Random Partitioner. The default is the Random Partitioner. You choose the Partitioner for your deployment in the storage-conf.xml configuration file. It’s important to note that once you have configured a Partitioner and stored data, you cannot change it; the only way to switch is to destroy your data and start all over. The reason is that SSTables in Cassandra are immutable: each SSTable is written once, in the order the Partitioner assigns to its keys, and is never modified in place afterwards.
The Order-Preserving Partitioner stores rows in the sort order of their keys, so each node owns a contiguous range of keys. This lets Cassandra know which nodes in the cluster hold which keys, and it makes range slice queries very efficient. On the flip side, keys tend to end up unevenly distributed across the cluster. In other words, some nodes can hold much more data than others.
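To make the trade-off concrete, here is a minimal Python sketch of order-preserving placement. It is not Cassandra code: the three-node ring, the token values and the function names (owner_by_order, nodes_for_range) are made up for illustration, and it assumes every key sorts at or below the last token.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical ring: node i owns every key up to and including NODE_TOKENS[i]
# that is greater than the previous token.
NODE_TOKENS = ["h", "p", "z"]


def owner_by_order(key: str) -> int:
    """Return the index of the node whose key range contains `key`.

    Assumes the key sorts at or below the last token ("z").
    """
    return bisect_left(NODE_TOKENS, key)


def nodes_for_range(start_key: str, end_key: str) -> list:
    """Nodes that must be contacted for a range query (start_key <= end_key)."""
    return list(range(owner_by_order(start_key), owner_by_order(end_key) + 1))


# Keys that share a prefix all sort into the same range, so one node gets most of them.
keys = ["alice", "bob"] + [f"user{i:03d}" for i in range(100)]
load = Counter(owner_by_order(k) for k in keys)

if __name__ == "__main__":
    print("keys per node:", dict(load))
    print("nodes for user010..user020:", nodes_for_range("user010", "user020"))
```

In this sketch the range query only touches the single node that owns the contiguous key range, while the load counter shows how keys with a common prefix pile up on one node.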
The Random Partitioner uses a different mechanism for distributing keys in the cluster. It uses an MD5 hash of the key to decide where to place the key in the Cassandra node ring. The nice effect is that keys are distributed evenly across the cluster: you can expect all nodes to hold close to an equal amount of data, which makes it a great load-balancing solution. The downside is that range queries are not efficient, since hashing scatters adjacent keys around the ring and Cassandra cannot tell up front which nodes hold a given key range.
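The hashing idea can be sketched in a few lines of Python. This is a simplified illustration, not Cassandra's exact token computation: it treats the full 128-bit MD5 digest as the token and splits the ring into equal slices for a hypothetical four-node cluster. The names token_for_key and owner_by_hash are invented for this example.

```python
import hashlib
from collections import Counter

RING_SIZE = 2 ** 128  # an MD5 digest is 128 bits wide
NUM_NODES = 4         # hypothetical cluster size


def token_for_key(key: str) -> int:
    """Map a key to a position on the ring using its MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def owner_by_hash(key: str) -> int:
    """Return the index of the node owning the equal-sized slice containing the token."""
    return token_for_key(key) * NUM_NODES // RING_SIZE


# Keys that are adjacent in sort order get unrelated tokens, so the load evens out.
keys = [f"user{i:04d}" for i in range(10000)]
load = Counter(owner_by_hash(k) for k in keys)

if __name__ == "__main__":
    print("keys per node:", dict(load))
    print("owners of the first ten keys:", [owner_by_hash(k) for k in keys[:10]])
```

Each node should end up with roughly a quarter of the keys, but neighbouring keys such as user0001 and user0002 usually land on different nodes, which is why a range query cannot be served from one contiguous slice of the ring.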