In this tutorial, you will learn how to set up a multi-node cockroach Database with HA proxy.
What you need
- A Scaleway Account
In a traditional SQL setup, all the complexity of replication needs to be managed by the system administrators. This task is quite substantial and can increase the cost of running your infrastructure in case of a mistake.
CockroachDB has been designed to avoid this kind of misconfiguration problems by having a design which is very reliable by default and ready to be used with minimal configuration.
CockroachDB is a distributed SQL database built on top of a transactional and consistent key-value store. The primary design goals for the application are:
Here is a schema of the architecture of CockroachDB:
SQL: The SQL layer helps developers to run SQL queries as in a traditional environment. It provides all the familiar terms and concepts such as schema, tables, and indexes. CockroachDB uses its own SQL feature set. Please refer to the documentation for the complete feature set.
Distributed Key-Value Store: The SQL layer communicates with the distributed key-value store so that we can develop large tables and indexes as HBase, BigTable, and others.
Distributed Transactions: Transactions are the core part of this application, their implementation of this feature manages the transition from SQL to stores and ranges.
Node: Nodes are the servers that store your data. They can either be virtual or physical machines. The distributed key-value store routes messages to the different nodes of our cluster.
Store: Each node can contain one or more stores, and each store can hold many ranges. RocksDB, an open source storage engine manages them.
Range: Ranges are the lowest-level of key-value data. Each store contains ranges, and each range covers a segment of the larger key-space. Together they form the entire monolithic sorted map.
Once the servers have booted we install CockroachDB on them:
wget -qO- https://binaries.cockroachdb.com/cockroach-latest.linux-amd64.tgz | tar xvz
We copy the binary into the
PATH. It makes it easy to execute cockroach commands from any shell:
cp -i cockroach-latest.linux-amd64/cockroach /usr/local/bin
Note: It is recommended to run CockroachDB on instances with at least 1GB of RAM for optimal performances.
SSL certificates are needed to encrypt the connection between our different instances. It is possible to either use the
cockroach cert commands or an alternative like OpenSSL. The example below makes use of
We start by creating the required directories to store the certificates:
mkdir certs mkdir cr-keys
Now we create the Certificate Authority (CA) certificate and key:
cockroach cert create-ca --certs-dir=certs --ca-key=cr-keys/ca.key
Note: Keep this key secret. It is the key of your Certificate Authority and can be used to issue certificates that are signed by your server.
Now we have to generate the key pair for the
client.root.key. These files are used to secure the communication between the built-in SQL shell and the cluster.
cockroach cert create-client root --certs-dir=certs --ca-key=cr-keys/ca.key
Note: In later steps, we use the root user’s certificate to run
cockroachDBclient commands from a local machine. The root user’s certificate and the key have to be copied to a remote node as well if you plan to run commands directly as root on that node (for local debugging).
Now we have to create a node certificate and key for each node of our cluster:
node.key. They will be used to secure communication between our different nodes:
cockroach cert create-node <node1 internal IP address> <node1 external IP address> <node1 public FQDN> --certs-dir=certs --ca-key=cr-keys/ca.key
We create the remote directory and copy the files to the first node:
ssh <username>@<node1 address> "mkdir certs" scp certs/ca.crt certs/node.crt certs/node.key <username>@<node1 ip address>:~/certs
Then we delete the local files so that we can create the remaining certificates for our other nodes:
rm certs/node.crt certs/node.key
Note: This step is required as the new files will also be named
node.key. An alternative method to create the files without deleting the original files is to use the
--overwriteattribute which overwrites the existing files.
Now we create the key pair for the second node:
cockroach cert create-node <node2 internal IP address> <node2 external IP address> <node2 public FQDN> --certs-dir=certs --ca-key=cr-keys/ca.key
We upload the files to the remote node:
ssh <username>@<node2 address> "mkdir certs" scp certs/ca.crt certs/node.crt certs/node.key <username>@<node2 address>:~/certs
We repeat these steps for each additional node that we want to use in our cluster.
Note: It is recommended to use at least three nodes to have redundancy and replication services of CockroachDB enabled. it is possible to launch an unlimited number of nodes to scale your cluster as the workload grows.
Now we can start the cockroachDB daemon on the different nodes by typing the following command (This example is for the first node, you have to replace the parameter
--host=<node1 address> with the IP of
node3 on the two other instances):
cockroach start --certs-dir=certs --host=<node1 address> --join=<node1 ip address>:26257,<node2 ip address>:26257,<node3 ip address>:26257 --cache=25% --max-sql-memory=25% --background
We have to repeat this step on each node to start the server (don’t forget to edit the IP settings according to each node).
Once we have started the different instances of Cockroach, they will not automatically form a cluster. We have to initialize it by running the following command:
cockroach init --certs-dir=certs --host=<address of any node>
We see the following message on the first node:
build: CCL v1.1.6 @ 2018/03/12 17:58:05 (go1.8.3) admin: https://10.10.193.23:8080 sql: postgresql://firstname.lastname@example.org:26257? application_name=cockroach&sslmode=verify-full&sslrootcert=certs%2Fca.crt logs: /root/cockroach-data/logs store: path=/root/cockroach-data status: initialized new cluster clusterID: 4c613798-eb26-412b-8a3a-55654a446188 nodeID: 1 build: CCL v1.1.6 @ 2018/03/12 17:58:05 (go1.8.3)
On the other nodes a message appears, confirming that they have joined the cluster:
status: initialized new node, joined pre-existing cluster clusterID: 4c613798-eb26-412b-8a3a-55654a446188 nodeID: 3
It is possible to verify the status of the cluster from the admin interface of Cockroach DB, available at
https://<node1 ip address>:8080. Some statistics of the different nodes connected to our cluster are visible.
The administration interface is available on any of the nodes.
Note: Your browser may show a warning about the connection to this site. This message is due to the self-signed certificate that is used by cockroachDB. You can ignore the advice and configure your web browser to accept the certificate.
The clocks of your nodes have to be synchronized to ensure that the replication of your cluster works well. If a node is out of synch with at least the half of the other nodes in the cluster (a default offset of 500ms is set), it will shutdown itself.
It is therefore recommended to synchronize the clock of your nodes with a time server.
ntp with the time servers of Scaleway and Ubuntu in this example:
Install the ntp client:
sudo apt-get install ntp
Stop the ntp service:
sudo service ntp stop
Edit the file
/etc/ntp.conf in your favorite text editor and remove any lines starting with
pool. Then add the following content to the file:
server ntp.int.scaleway.com iburst server ntp.ubuntu.com iburst
Now it is time to restart the service:
sudo service ntp start
In the last step we verify that the node is using the correct NTP server:
sudo ntpq -p
The active NTP server is displayed with an asterisk.
Note: You should complete these steps on all of your nodes to make sure the clocks are synchronized.
On our primary node, we launch the SQL client of CockroachDB:
cockroach sql --certs-dir=certs --host=<address of node1>
Now we create a
cockroachtest database on the node:
CREATE DATABASE cockroachtest;
We quit the SQL client with
\q or by pressing
We can verify that the database has been replicated, by connecting to another node of the cluster:
cockroach sql --certs-dir=certs --host=<address of node2>
We do a listing of the databases, which includes
> SHOW DATABASES;
It will return something like:
+--------------------+ | Database | +--------------------+ | crdb_internal | | information_schema | | cockroachtest | | pg_catalog | | system | +--------------------+ (5 rows)
Your database needs more computing power? Scale-it easily by adding additional nodes to the cluster.
Launch additional nodes like the ones you have configured during the initial setup of your cluster.
The cluster is now ready to be used. We can use any of the nodes as an endpoint to our databases, but the load is not being distributed automatically between the nodes. CockroachDB provides built-in support for HAproxy, which allows us to distribute the load on equally on any of the nodes.
A load balancer increases the Performance and Reliability of an application.
For performance and availability reasons it is not recommended to run the HAproxy on the same node as your cockroachDB.
We start by running the
cockroach gen haproxy command on the primary node to generate automatically the configuration file for HAproxy:
cockroach gen haproxy --certs-dir=certs --host=<address of the primary node> --port=26257
It generates an
haproxy.cfg file with the details of your cluster already configured. It looks like the following example:
global maxconn 4096 defaults mode tcp timeout connect 10s timeout client 1m timeout server 1m listen psql bind :26257 mode tcp balance roundrobin server cockroach1 <node1 address>:26257 server cockroach2 <node2 address>:26257 server cockroach3 <node3 address>:26257
We have to copy the file to the instance used for HAproxy:
scp haproxy.cfg <username>@<haproxy ip address>:~/
Now we connect to the machine via SSH and install the proxy on it:
sudo apt-get install haproxy
We start HAproxy with our configuration file:
haproxy -f haproxy.cfg
The load balancer distributes the requests sent to our cluster equally between the different nodes.
Finally, we test the configuration of cockroachDB from the primary node:
cockroach sql --certs-dir=certs --host=<haproxy ip address>
To check to which node we are connected, we type the following command:
> SHOW node_id;
The result will look like:
+---------+ | node_id | +---------+ | 2 | +---------+ (1 row)
It shows us the ID of the node to which we are connected.
Note: In this setup the load balancer is a single point of failure. If the load balancer fails, our cluster becomes unavailable. It is therefore recommended to configure at least two load balancers and to use round-robin DNS to maximize the availability.
We were able to configure a highly available CockroachDB cluster in this tutorial. As you can see it is easy to configure and provides excellent possibilities to automatize the distribution of your databases, combined with the possibility to scale your cluster at any time if your requirements grow. If you want to explore all configuration options, you can find more information in the documentation of CockroachDB