Apache Cassandra is a replicated NoSQL database and an ideal solution for situations that require maximum data redundancy, uptime, and horizontal scaling across multiple servers. It is an open source application that can easily be managed from a simple command-line interface using the Cassandra Query Language (CQL). CQL is very similar to Structured Query Language (SQL), making it easy to learn for users who are already familiar with SQL.
1 . Connect to your instance via SSH or by using PuTTY.
2 . Add the Java repository:
add-apt-repository ppa:webupd8team/java
3 . Add the Apache Cassandra repository:
echo "deb http://www.apache.org/dist/cassandra/debian 39x main" | tee /etc/apt/sources.list.d/cassandra.list
4 . Add the required PGP keys to use the repositories:
gpg --keyserver keys.gnupg.net --recv-keys 749D6EEC0353B12C
gpg --export --armor 749D6EEC0353B12C | apt-key add -
gpg --keyserver keys.gnupg.net --recv-keys A278B781FE4B2BDA
gpg --export --armor A278B781FE4B2BDA | apt-key add -
5 . Reload the APT configuration and update the software already installed on your instance:
apt update && apt upgrade
6 . Install Java, Cassandra and NTP. NTP (Network Time Protocol) keeps the clock of the instance synchronized, which matters because Cassandra uses timestamps to resolve conflicting writes:
apt install oracle-java8-set-default cassandra ntp
Important: You have to agree to the license terms for Oracle Java when installing it.
Repeat the steps above on each instance, for a total of three instances.
Cassandra's configuration files are located in the /etc/cassandra directory. cassandra.yaml is the file containing most of the Cassandra configuration, such as the ports used, file locations, and seed node IP addresses.
The key points to edit are:
1 . Edit the file /etc/cassandra/cassandra.yaml on each node:
On the first node (10.0.0.1):

cluster_name: 'Test Cluster'
num_tokens: 256
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.1,10.0.0.2"
listen_address: 10.0.0.1
rpc_address: 10.0.0.1
endpoint_snitch: GossipingPropertyFileSnitch

On the second node (10.0.0.2):

cluster_name: 'Test Cluster'
num_tokens: 256
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.1,10.0.0.2"
listen_address: 10.0.0.2
rpc_address: 10.0.0.2
endpoint_snitch: GossipingPropertyFileSnitch

On the third node (10.0.0.3):

cluster_name: 'Test Cluster'
num_tokens: 256
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.1,10.0.0.2"
listen_address: 10.0.0.3
rpc_address: 10.0.0.3
endpoint_snitch: GossipingPropertyFileSnitch
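The only values that differ between the three files are the two address lines. One way to apply them consistently is with sed. The sketch below runs against a stand-in config written to the current directory so it can be tried safely; on a real node you would point CONF at /etc/cassandra/cassandra.yaml, set NODE_IP to that node's private address, and drop the here-doc (it assumes the file still contains the stock single-line listen_address and rpc_address entries):

```shell
NODE_IP=10.0.0.1
CONF=cassandra.yaml          # use /etc/cassandra/cassandra.yaml on a real node

# Stand-in config for demonstration only; a real node already has this file.
cat > "$CONF" <<'EOF'
cluster_name: 'Test Cluster'
listen_address: localhost
rpc_address: localhost
endpoint_snitch: SimpleSnitch
EOF

# Rewrite the per-node values in place.
sed -i \
  -e "s/^listen_address:.*/listen_address: $NODE_IP/" \
  -e "s/^rpc_address:.*/rpc_address: $NODE_IP/" \
  -e "s/^endpoint_snitch:.*/endpoint_snitch: GossipingPropertyFileSnitch/" \
  "$CONF"

grep '^listen_address' "$CONF"   # listen_address: 10.0.0.1
```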
To be fault tolerant and to minimize the risk of data loss or downtime, Cassandra distributes data across the cluster. Whenever possible it stores replicas of the data on a different rack or datacenter, so that even a failing datacenter has minimal impact on the production environment.
2 . Edit the /etc/cassandra/cassandra-rackdc.properties file on each node and set the DC and rack information. You can use your own naming standard to determine the location of each node.
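For example, the first node's file could contain the following. The dc and rack values shown here are illustrative; they are free-form labels, but every node in the same physical location should use the same names:

```
# /etc/cassandra/cassandra-rackdc.properties
dc=datacenter1
rack=rack1
```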
3 . Remove the file /etc/cassandra/cassandra-topology.properties, as it is not used with GossipingPropertyFileSnitch:
rm /etc/cassandra/cassandra-topology.properties
1 . Start Cassandra and enable automatic launching on system boot:
systemctl enable cassandra
systemctl start cassandra
2 . Verify that the service is running:
systemctl -l status cassandra
3 . Check the status of the cluster with the nodetool status command:
root@scw-cassandra:~# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.0.3  119.29 KiB  256     60,7%             c9b13a33-147f-4293-8aaf-21ace6d1b756  rack2
UN  10.0.0.2  170.88 KiB  256     65,3%             2a100701-3da4-444a-892d-164d2222009c  rack1
UN  10.0.0.1  15.47 KiB   256     74,1%             93feee5d-3de8-4c0a-908d-2432f26a1a1e  rack1
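In scripts it can be handy to parse this output instead of reading it by eye. The sketch below counts the nodes in the Up/Normal ("UN") state; the here-doc stands in for real nodetool status output so you can try it without a cluster (on a live node you would pipe nodetool status into awk directly):

```shell
# Sample status lines; one node is deliberately shown Down/Normal ("DN").
cat <<'EOF' > status.txt
UN  10.0.0.3  119.29 KiB  256  60,7%  c9b13a33-147f-4293-8aaf-21ace6d1b756  rack2
UN  10.0.0.2  170.88 KiB  256  65,3%  2a100701-3da4-444a-892d-164d2222009c  rack1
DN  10.0.0.1  15.47 KiB   256  74,1%  93feee5d-3de8-4c0a-908d-2432f26a1a1e  rack1
EOF

# Count lines whose first column is exactly "UN".
up_nodes=$(awk '$1 == "UN"' status.txt | wc -l)
echo "$up_nodes"   # 2
```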
Once all nodes have started, the cluster is ready. You can use the cqlsh tool to interact with your cluster. It is installed by default on any of the nodes.
1 . Connect to your cluster:
cqlsh 10.0.0.1
Connected to scw-cassandra01 at 10.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.2.1 | CQL spec 3.4.0 | Native protocol v4]
Use HELP for help.
cqlsh>
2 . To quit the CQL shell, type EXIT and press Enter.
More information about the CQL syntax is available in the official documentation.
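As a short, illustrative session (the demo keyspace and users table are made-up names for this example): because the cluster uses GossipingPropertyFileSnitch, a keyspace can pin its replica count per datacenter with NetworkTopologyStrategy, using the datacenter name set in cassandra-rackdc.properties:

```sql
-- Keep 3 replicas of this keyspace in datacenter1.
CREATE KEYSPACE demo
  WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3};

USE demo;

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text
);

INSERT INTO users (id, name) VALUES (uuid(), 'alice');
SELECT * FROM users;
```

With three replicas on a three-node cluster, every node holds a full copy of the data, so the cluster keeps serving reads and writes even if one node fails.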
By default your cluster is named 'Test Cluster'. To change it to a friendlier name, follow these steps:
1 . Log in to the CQL shell with cqlsh and run the following command, replacing [new_cluster_name] with your new cluster name:
UPDATE system.local SET cluster_name = '[new_cluster_name]' WHERE KEY = 'local';
2 . Leave the CQL shell with the command EXIT.
3 . Edit the file /etc/cassandra/cassandra.yaml on each of the nodes and replace the value of the cluster_name variable with the new cluster name you just set.
4 . Save and close the file.
5 . Run the following command from your Linux terminal to flush the system keyspace to disk, so the change survives the restart without any data loss on the node:
nodetool flush system
6 . Restart Cassandra:
systemctl restart cassandra
7 . Log into the cluster with cqlsh and verify that the new cluster name is visible.
If you want to learn more about Cassandra, you may refer to the official documentation.