
Configuring a Cassandra Cluster on Ubuntu Bionic Beaver

Reviewed on 26 March 2024 • Published on 20 October 2018
  • compute
  • database
  • cluster
  • nosql
  • Cassandra-cluster
  • cloud
  • Ubuntu-Bionic-Beaver

Cassandra overview

Apache Cassandra is a replicated NoSQL database and an ideal solution for situations that require maximum data redundancy, uptime, and horizontal scaling across multiple servers. It is an open-source application that can easily be managed from a simple command-line interface using the Cassandra Query Language (CQL). CQL is very similar to the Structured Query Language (SQL), making it easy to learn for users who are already familiar with SQL.
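
For example, creating a keyspace and a table, inserting a row, and querying it in CQL looks much like SQL. The keyspace and table names below (demo, users) are purely illustrative:

    CREATE KEYSPACE demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

    CREATE TABLE demo.users (
        user_id uuid PRIMARY KEY,
        email text,
        created_at timestamp
    );

    INSERT INTO demo.users (user_id, email, created_at) VALUES (uuid(), 'jane@example.com', toTimestamp(now()));

    SELECT * FROM demo.users;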

Before you start

To complete the actions presented below, you must have:

  • A Scaleway account logged into the console
  • Owner status or IAM permissions allowing you to perform actions in the intended Organization
  • An SSH key
  • An Instance running on Ubuntu Bionic Beaver (18.04)
  • sudo privileges or access to the root user

Installing Cassandra

  1. Connect to your Instance via SSH or by using PuTTY.
  2. Add the Apache Cassandra repository:
    echo "deb http://www.apache.org/dist/cassandra/debian 41x main" | tee /etc/apt/sources.list.d/cassandra.list
  3. Add the required PGP keys to use the repositories:
    curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -
  4. Reload the APT configuration and update the software already installed on your Instance:
    apt update && apt upgrade
  5. Install Cassandra and NTP. NTP (Network Time Protocol) keeps the clock of the Instance synchronized, which matters for Cassandra because it uses write timestamps to resolve conflicting updates between nodes:
    apt install cassandra ntp

Repeat the steps above on each node, so that three Instances in total have Cassandra and NTP installed.
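
You can quickly verify the installation and the time synchronization on each Instance before moving on. These checks are optional; the exact version shown depends on the release available in the repository you added:

    # Display the installed Cassandra package version
    dpkg -l cassandra

    # Confirm that NTP is reaching its time servers
    ntpq -p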

Configuring additional nodes

The configuration files of Cassandra are located in the /etc/cassandra directory. cassandra.yaml is the file that contains most of the Cassandra configuration, such as ports used, file locations and seed node IP addresses.

The key points to edit are:

  • cluster_name: The name of your cluster; it can be any value you choose. All members of a cluster must have the same name.
  • num_tokens: This value represents the number of virtual nodes within a Cassandra instance. It is used to partition the data and to spread it throughout the cluster. A good starting value is 256.
  • seeds: These are the IP addresses of the cluster's seed servers. Seed nodes are known contact points that a joining or restarting node uses to obtain cluster information (such as the list of nodes in the cluster). All active nodes hold this information, so there is no single point of failure, but seed nodes are fixed, reliable locations while other machines may come and go. It is recommended to have three seed nodes per datacenter.
  • listen_address: This is the IP address that Cassandra listens on for internal (Cassandra to Cassandra) communication. The software tries to guess the IP address of your Instance if you leave it blank, but it is best to specify it yourself. This value is specific to each node (see the command after this list to find it).
  • rpc_address: This is the IP address that Cassandra listens on for client-based communication, such as through the CQL protocol. This value also changes on each node.
  • endpoint_snitch: Represents the 'snitch' used by Cassandra. A snitch tells Cassandra which datacenter and rack a node belongs to within a cluster. Several snitch types are available; refer to the official documentation for more information on this topic.
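
Before editing the file, note the private IP address of each Instance so you can fill in listen_address and rpc_address (the addresses 10.0.0.1, 10.0.0.2 and 10.0.0.3 used below are examples). A simple way to display the addresses assigned to an Instance is:

    hostname -I
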
  1. Edit the /etc/cassandra/cassandra.yaml file.

    • On Node 1:
      cluster_name: 'Test Cluster'
      num_tokens: 256
      seed_provider:
          - class_name: org.apache.cassandra.locator.SimpleSeedProvider
            parameters:
                - seeds: "10.0.0.1,10.0.0.2"
      listen_address: 10.0.0.1
      rpc_address: 10.0.0.1
      endpoint_snitch: GossipingPropertyFileSnitch
    • On Node 2:
      cluster_name: 'Test Cluster'
      num_tokens: 256
      seed_provider:
          - class_name: org.apache.cassandra.locator.SimpleSeedProvider
            parameters:
                - seeds: "10.0.0.1,10.0.0.2"
      listen_address: 10.0.0.2
      rpc_address: 10.0.0.2
      endpoint_snitch: GossipingPropertyFileSnitch
    • On Node 3:
      cluster_name: 'Test Cluster'
      num_tokens: 256
      seed_provider:
          - class_name: org.apache.cassandra.locator.SimpleSeedProvider
            parameters:
                - seeds: "10.0.0.1,10.0.0.2"
      listen_address: 10.0.0.3
      rpc_address: 10.0.0.3
      endpoint_snitch: GossipingPropertyFileSnitch

    To be fault-tolerant and to minimize the risk of data loss or downtime, Cassandra distributes data across the cluster. Whenever possible, it stores replicas on different racks or datacenters, so that even the failure of an entire datacenter has minimal impact on the production environment. A quick way to verify the settings above on each node is shown after the last step below.

  2. Edit the /etc/cassandra/cassandra-rackdc.properties file on each node and set the DC and rack information. You can use your own naming standard to determine the location of each node.

    • On Node 1:
    dc=dc1
    rack=rack1
    • On Node 2:
    dc=dc1
    rack=rack1
    • On Node 3:
    dc=dc1
    rack=rack2
  3. Remove the /etc/cassandra/cassandra-topology.properties file, as it is not needed when using the GossipingPropertyFileSnitch.

    rm /etc/cassandra/cassandra-topology.properties
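
Once all three steps are done, the following convenience check confirms the main settings on each node (it only covers the values discussed above):

    grep -E '^(cluster_name|num_tokens|listen_address|rpc_address|endpoint_snitch):' /etc/cassandra/cassandra.yaml
    cat /etc/cassandra/cassandra-rackdc.properties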

Activating Cassandra

  1. Start Cassandra and enable automatic launching on system boot.
    systemctl enable cassandra.service
    systemctl start cassandra.service
  2. Verify that the service is running.
    systemctl -l status cassandra.service
  3. Check the status of the cluster with the command nodetool status.
    root@scw-cassandra:~# nodetool status
    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
     --  Address   Load        Tokens  Owns (effective)  Host ID                               Rack
     UN  10.0.0.3  119.29 KiB  256     60.7%             c9b13a33-147f-4293-8aaf-21ace6d1b756  rack2
     UN  10.0.0.2  170.88 KiB  256     65.3%             2a100701-3da4-444a-892d-164d2222009c  rack1
     UN  10.0.0.1  15.47 KiB   256     74.1%             93feee5d-3de8-4c0a-908d-2432f26a1a1e  rack1
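
If a node is missing from this output, or clients cannot connect later on, check that Cassandra is listening on its ports: 7000 is the default inter-node port and 9042 the default CQL client port. For example:

    ss -ltnp | grep -E '7000|9042'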

Connecting to your cluster

Once all nodes have started, the cluster is ready. You can use the cqlsh tool to interact with your cluster. It is installed by default on all of the nodes.

  1. Connect to your cluster:
    cqlsh 10.0.0.1
    Connected to scw-cassandra01 at 10.0.0.1:9042.
    [cqlsh 5.0.1 | Cassandra 3.2.1 | CQL spec 3.4.0 | Native protocol v4]
    Use HELP for help.
    cqlsh>
  2. To quit the CQL shell, type EXIT and press enter.
    Note

    More information about the CQL syntax is available in the official documentation.
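
A typical first step on a new cluster is to create a keyspace that replicates data across the datacenter defined earlier (dc1). The keyspace name mykeyspace and the replication factor of 3 below are only examples; run the statements from the cqlsh prompt:

    CREATE KEYSPACE mykeyspace
      WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

    DESCRIBE KEYSPACES;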

Renaming your cluster

By default, your cluster is named 'Test Cluster'. To change it to a more friendly name, follow these steps:

  1. Log in to the CQL shell with cqlsh and run the following command. Replace [new_cluster_name] with your new cluster name.
    UPDATE system.local SET cluster_name = '[new_cluster_name]' WHERE KEY = 'local';
  2. Leave the CQL shell with the command EXIT.
  3. Edit the file /etc/cassandra/cassandra.yaml on each of the nodes and replace the value in the cluster_name variable with the new cluster name you just set.
  4. Save and close the file.
  5. Run the following command from your Linux terminal to flush the system keyspace to disk, so that the new cluster name is preserved on the node:
    nodetool flush system
  6. Restart Cassandra.
    systemctl restart cassandra.service
  7. Log into the cluster with cqlsh and verify the new cluster name is visible.
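
Alternatively, you can query the name without opening an interactive session; the address below assumes you are connecting to node 1:

    cqlsh 10.0.0.1 -e "SELECT cluster_name FROM system.local;"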