Introducing Scaleway NextGen NVMe Cloud Servers with Hot-Snapshots. 🚀

Help


Community tutorials and documentations

Install a multi-node Cockroach Database with HA Proxy

Install a multi-node Cockroach Database with HA Proxy

In this tutorial, you will learn how to set up a multi-node cockroach Database with HA proxy.

What you need

  • A Scaleway Account

What is CockroachDB and why you should use it

In a traditional SQL setup, all the complexity of replication needs to be managed by the system administrators. This task is quite substantial and can increase the cost of running your infrastructure in case of a mistake.

CockroachDB has been designed to avoid this kind of misconfiguration problems by having a design which is very reliable by default and ready to be used with minimal configuration.

Architecture of Cockroach DB

CockroachDB is a distributed SQL database built on top of a transactional and consistent key-value store. The primary design goals for the application are:

Here is a schema of the architecture of CockroachDB:

Schema cockroachDB

  • SQL: The SQL layer helps developers to run SQL queries as in a traditional environment. It provides all the familiar terms and concepts such as schema, tables, and indexes. CockroachDB uses its own SQL feature set. Please refer to the documentation for the complete feature set.

  • Distributed Key-Value Store: The SQL layer communicates with the distributed key-value store so that we can develop large tables and indexes as HBase, BigTable, and others.

  • Distributed Transactions: Transactions are the core part of this application, their implementation of this feature manages the transition from SQL to stores and ranges.

  • Node: Nodes are the servers that store your data. They can either be virtual or physical machines. The distributed key-value store routes messages to the different nodes of our cluster.

  • Store: Each node can contain one or more stores, and each store can hold many ranges. RocksDB, an open source storage engine manages them.

  • Range: Ranges are the lowest-level of key-value data. Each store contains ranges, and each range covers a segment of the larger key-space. Together they form the entire monolithic sorted map.

Installation of CockroachDB

  1. Spin up at least three instances to guarantee redundancy and fault tolerance. We use Ubuntu Xenial (16.04) in this example.
  2. Once the servers have booted we install CockroachDB on them:

     wget -qO- https://binaries.cockroachdb.com/cockroach-latest.linux-amd64.tgz | tar  xvz
    
  3. We copy the binary into the PATH. It makes it easy to execute cockroach commands from any shell:

     cp -i cockroach-latest.linux-amd64/cockroach /usr/local/bin
    

Note: It is recommended to run CockroachDB on instances with at least 1GB of RAM for optimal performances.

Creation of certificates

SSL certificates are needed to encrypt the connection between our different instances. It is possible to either use the cockroach cert commands or an alternative like OpenSSL. The example below makes use of cockroach cert:

Create a certificate and safe directory for the CA key.

We start by creating the required directories to store the certificates:

mkdir certs
mkdir cr-keys

Create the CA key pair:

Now we create the Certificate Authority (CA) certificate and key: ca.crt and ca.key.

cockroach cert create-ca --certs-dir=certs --ca-key=cr-keys/ca.key

Note: Keep this key secret. It is the key of your Certificate Authority and can be used to issue certificates that are signed by your server.

Creation of a client key pair for the root user:

Now we have to generate the key pair for the root user: client.root.crt and client.root.key. These files are used to secure the communication between the built-in SQL shell and the cluster.

cockroach cert create-client root --certs-dir=certs --ca-key=cr-keys/ca.key

Note: In later steps, we use the root user’s certificate to run cockroachDB client commands from a local machine. The root user’s certificate and the key have to be copied to a remote node as well if you plan to run commands directly as root on that node (for local debugging).

Create a key pair for the nodes:

Now we have to create a node certificate and key for each node of our cluster: node.crt and node.key. They will be used to secure communication between our different nodes:

cockroach cert create-node <node1 internal IP address> <node1 external IP address> <node1 public FQDN> --certs-dir=certs --ca-key=cr-keys/ca.key

We create the remote directory and copy the files to the first node:

ssh <username>@<node1 address> "mkdir certs"
scp certs/ca.crt certs/node.crt certs/node.key <username>@<node1 ip address>:~/certs

Then we delete the local files so that we can create the remaining certificates for our other nodes:

rm certs/node.crt certs/node.key

Note: This step is required as the new files will also be named node.crt and node.key. An alternative method to create the files without deleting the original files is to use the --overwrite attribute which overwrites the existing files.

Now we create the key pair for the second node:

cockroach cert create-node <node2 internal IP address> <node2 external IP address> <node2 public FQDN> --certs-dir=certs --ca-key=cr-keys/ca.key

We upload the files to the remote node:

ssh <username>@<node2 address> "mkdir certs"
scp certs/ca.crt certs/node.crt certs/node.key <username>@<node2 address>:~/certs

We repeat these steps for each additional node that we want to use in our cluster.

Note: It is recommended to use at least three nodes to have redundancy and replication services of CockroachDB enabled. it is possible to launch an unlimited number of nodes to scale your cluster as the workload grows.

Start the instances

Now we can start the cockroachDB daemon on the different nodes by typing the following command (This example is for the first node, you have to replace the parameter --host=<node1 address> with the IP of node2 and node3 on the two other instances):

cockroach start --certs-dir=certs --host=<node1 address> --join=<node1 ip address>:26257,<node2 ip address>:26257,<node3 ip address>:26257 --cache=25% --max-sql-memory=25% --background

We have to repeat this step on each node to start the server (don’t forget to edit the IP settings according to each node).

Initialize the cluster

Once we have started the different instances of Cockroach, they will not automatically form a cluster. We have to initialize it by running the following command:

cockroach init --certs-dir=certs --host=<address of any node>

We see the following message on the first node:

build:      CCL v1.1.6 @ 2018/03/12 17:58:05 (go1.8.3)
admin:      https://10.10.193.23:8080
sql:        postgresql://root@10.10.193.23:26257?    application_name=cockroach&sslmode=verify-full&sslrootcert=certs%2Fca.crt
logs:       /root/cockroach-data/logs
store[0]:   path=/root/cockroach-data
status:     initialized new cluster
clusterID:  4c613798-eb26-412b-8a3a-55654a446188
nodeID:     1
build:      CCL v1.1.6 @ 2018/03/12 17:58:05 (go1.8.3)

On the other nodes a message appears, confirming that they have joined the cluster:

status:     initialized new node, joined pre-existing cluster
clusterID:  4c613798-eb26-412b-8a3a-55654a446188
nodeID:     3

It is possible to verify the status of the cluster from the admin interface of Cockroach DB, available at https://<node1 ip address>:8080. Some statistics of the different nodes connected to our cluster are visible.

The administration interface is available on any of the nodes.

Note: Your browser may show a warning about the connection to this site. This message is due to the self-signed certificate that is used by cockroachDB. You can ignore the advice and configure your web browser to accept the certificate.

Synchronizing time

The clocks of your nodes have to be synchronized to ensure that the replication of your cluster works well. If a node is out of synch with at least the half of the other nodes in the cluster (a default offset of 500ms is set), it will shutdown itself.

It is therefore recommended to synchronize the clock of your nodes with a time server.

We configure ntp with the time servers of Scaleway and Ubuntu in this example:

Install the ntp client:

sudo apt-get install ntp

Stop the ntp service:

sudo service ntp stop

Edit the file /etc/ntp.conf in your favorite text editor and remove any lines starting with server or pool. Then add the following content to the file:

server ntp.int.scaleway.com iburst
server ntp.ubuntu.com iburst

Now it is time to restart the service:

sudo service ntp start

In the last step we verify that the node is using the correct NTP server:

sudo ntpq -p

The active NTP server is displayed with an asterisk.

Note: You should complete these steps on all of your nodes to make sure the clocks are synchronized.

Test your cluster

On our primary node, we launch the SQL client of CockroachDB:

cockroach sql --certs-dir=certs --host=<address of node1>

Now we create a cockroachtest database on the node:

CREATE DATABASE cockroachtest;

We quit the SQL client with \q or by pressing CRTL+C.

We can verify that the database has been replicated, by connecting to another node of the cluster:

cockroach sql --certs-dir=certs --host=<address of node2>

We do a listing of the databases, which includes cockroachtest:

> SHOW DATABASES;

It will return something like:

+--------------------+
|      Database      |
+--------------------+
| crdb_internal      |
| information_schema |
| cockroachtest      |
| pg_catalog         |
| system             |
+--------------------+
(5 rows)

Scale the cluster

Your database needs more computing power? Scale-it easily by adding additional nodes to the cluster.

Launch additional nodes like the ones you have configured during the initial setup of your cluster.

Configure HAproxy

The cluster is now ready to be used. We can use any of the nodes as an endpoint to our databases, but the load is not being distributed automatically between the nodes. CockroachDB provides built-in support for HAproxy, which allows us to distribute the load on equally on any of the nodes.

A load balancer increases the Performance and Reliability of an application.

For performance and availability reasons it is not recommended to run the HAproxy on the same node as your cockroachDB.

We start by running the cockroach gen haproxy command on the primary node to generate automatically the configuration file for HAproxy:

cockroach gen haproxy --certs-dir=certs --host=<address of the primary node> --port=26257

It generates an haproxy.cfg file with the details of your cluster already configured. It looks like the following example:

global
       maxconn 4096

defaults
     mode                tcp
     timeout connect     10s
     timeout client      1m
    timeout server      1m

listen psql
     bind :26257
     mode tcp
     balance roundrobin
     server cockroach1 <node1 address>:26257
     server cockroach2 <node2 address>:26257
     server cockroach3 <node3 address>:26257

We have to copy the file to the instance used for HAproxy:

 scp haproxy.cfg <username>@<haproxy ip address>:~/

Now we connect to the machine via SSH and install the proxy on it:

 sudo apt-get install haproxy

We start HAproxy with our configuration file:

 haproxy -f haproxy.cfg

The load balancer distributes the requests sent to our cluster equally between the different nodes.

Finally, we test the configuration of cockroachDB from the primary node:

 cockroach sql --certs-dir=certs --host=<haproxy ip address>

To check to which node we are connected, we type the following command:

 > SHOW node_id;

The result will look like:

+---------+
| node_id |
+---------+
|       2 |
+---------+
(1 row)

It shows us the ID of the node to which we are connected.

Note: In this setup the load balancer is a single point of failure. If the load balancer fails, our cluster becomes unavailable. It is therefore recommended to configure at least two load balancers and to use round-robin DNS to maximize the availability.

Conclusion

We were able to configure a highly available CockroachDB cluster in this tutorial. As you can see it is easy to configure and provides excellent possibilities to automatize the distribution of your databases, combined with the possibility to scale your cluster at any time if your requirements grow. If you want to explore all configuration options, you can find more information in the documentation of CockroachDB

Discover a New Cloud Experience

Deploy SSD Cloud Servers in seconds.