Installing a multi-node Cockroach database with HA Proxy
CockroachDB - Overview
In this tutorial, you will gain knowledge on how to deploy a multi-node Cockroach Database, which is a distributed SQL database system capable of handling data in a highly scalable and resilient manner. Additionally, you will also learn how to integrate the use of HAProxy, a popular open-source load balancing software, to improve the performance and availability of your Cockroach Database. By following this tutorial, you will be able to set up a robust and efficient database system for your applications.
You may need certain IAM permissions to carry out some actions described on this page. This means:
- you are the Owner of the Scaleway Organization in which the actions will be carried out, or
- you are an IAM user of the Organization, with a policy granting you the necessary permission sets
- You have an account and are logged into the Scaleway console
- You have configured your SSH Key
- You have an Instance running Ubuntu 16.04
What is CockroachDB and why you should use it
In a traditional SQL setup, all the complexity of replication needs to be managed by the system administrators. This task is quite substantial and can increase the cost of running your infrastructure in case of a mistake.
CockroachDB is designed to avoid this kind of misconfiguration problems by having a design which is reliable by default and ready to be used with minimal configuration.
Architecture of CockroachDB
CockroachDB is a distributed SQL database built on top of a transactional and consistent key-value store. The primary design goals for the application are:
- support for ACID transactions
- horizontal scalability
Below you find a schema of the architecture of CockroachDB:
- SQL: The SQL layer helps developers run SQL queries as in a traditional environment. It provides all the familiar terms and concepts such as schema, tables, and indexes. CockroachDB uses its own SQL feature set. Please refer to the documentation for the complete feature set.
- Distributed Key-Value Store: The SQL layer communicates with the distributed key-value store so that we can develop large tables and indexes as HBase, BigTable, and others.
- Distributed Transactions: Transactions are the core part of this application, their implementation of this feature manages the transition from SQL to stores and ranges.
- Node: Nodes are the servers that store your data. They can either be virtual or physical machines. The distributed key-value store routes messages to the different nodes of our cluster.
- Store: Each node can contain one or more stores, and each store can hold many ranges. RocksDB, an open source storage engine manages them.
- Range: Ranges are the lowest-level of key-value data. Each store contains ranges, and each range covers a segment of the larger key-space. Together they form the entire monolithic sorted map.
Installation of CockroachDB
- Spin up at least three Instances to guarantee redundancy and fault tolerance. We use Ubuntu Xenial (16.04) in this example.
- Once the servers have booted we install CockroachDB on them:
wget -qO- https://binaries.cockroachdb.com/cockroach-latest.linux-amd64.tgz | tar xvz
- We copy the binary into the
PATH. It makes it easy to execute cockroach commands from any shell:cp -i cockroach-latest.linux-amd64/cockroach /usr/local/bin
It is recommended to run CockroachDB on Instances with at least 1GB of RAM for optimal performances.
Creation of certificates
SSL certificates are needed to encrypt the connection between our different Instances. It is possible to either use the
cockroach cert commands or an alternative like OpenSSL. The example below makes use of
Create a certificate and safe directory for the CA key.
Start by creating the required directories to store the certificates:
mkdir certsmkdir cr-keys
Create the CA key pair:
Create the Certificate Authority (CA) certificate and key:
cockroach cert create-ca --certs-dir=certs --ca-key=cr-keys/ca.key
Keep this key secret. It is the key of your Certificate Authority and can be used to issue certificates that are signed by your server.
Creation of a client key pair for the root user:
Generate the key pair for the
client.root.key. These files are used to secure the communication between the built-in SQL shell and the cluster.
cockroach cert create-client root --certs-dir=certs --ca-key=cr-keys/ca.key
In later steps, we use the root user’s certificate to run
cockroachDB client commands from a local machine. The root user’s certificate and the key have to be copied to a remote node as well if you plan to run commands directly as root on that node (for local debugging).
Create a key pair for the nodes:
- Create a node certificate and key for each node of our cluster:
node.key. They will be used to secure communication between our different nodes:cockroach cert create-node <node1 internal IP address> <node1 external IP address> <node1 public FQDN> --certs-dir=certs --ca-key=cr-keys/ca.key
- Create the remote directory and copy the files to the first node:
ssh <username>@<node1 address> "mkdir certs"scp certs/ca.crt certs/node.crt certs/node.key <username>@<node1 ip address>:~/certs
- Delete the local files so that we can create the remaining certificates for our other nodes:
rm certs/node.crt certs/node.key
This step is required as the new files will also be named
node.key. An alternative method to create the files without deleting the original files is to use the
--overwrite attribute which overwrites the existing files.
- Create the key pair for the second node:
cockroach cert create-node <node2 internal IP address> <node2 external IP address> <node2 public FQDN> --certs-dir=certs --ca-key=cr-keys/ca.key
- Upload the files to the remote node:
ssh <username>@<node2 address> "mkdir certs"scp certs/ca.crt certs/node.crt certs/node.key <username>@<node2 address>:~/certs
Repeat these steps for each additional node that we want to use in our cluster.
It is recommended to use at least three nodes to have redundancy and replication services of CockroachDB enabled. It is possible to launch an unlimited number of nodes to scale your cluster as the workload grows.
Start the Instances
Start the cockroachDB daemon on the different nodes by typing the following command (This example is for the first node, you have to replace the parameter
--host=node1_address with the IP of
node3 on the two other Instances):
cockroach start --certs-dir=certs --host=<node1 address> --join=<node1 ip address>:26257,<node2 ip address>:26257,<node3 ip address>:26257 --cache=25% --max-sql-memory=25% --background
Repeat this step on each node to start the server (don’t forget to edit the IP settings according to each node).
Initialize the cluster
Once we have started the different Instances of Cockroach, they will not automatically form a cluster. We have to initialize it by running the following command:
cockroach init --certs-dir=certs --host=<address of any node>
The following message displays on the first node:
build: CCL v1.1.6 @ 2018/03/12 17:58:05 (go1.8.3)admin: https://10.10.193.23:8080sql: postgresql://email@example.com:26257? application_name=cockroach&sslmode=verify-full&sslrootcert=certs%2Fca.crtlogs: /root/cockroach-data/logsstore: path=/root/cockroach-datastatus: initialized new clusterclusterID: 4c613798-eb26-412b-8a3a-55654a446188nodeID: 1build: CCL v1.1.6 @ 2018/03/12 17:58:05 (go1.8.3)
On the other nodes a message appears, confirming that they have joined the cluster:
status: initialized new node, joined pre-existing clusterclusterID: 4c613798-eb26-412b-8a3a-55654a446188nodeID: 3
It is possible to verify the status of the cluster from the admin interface of Cockroach DB, available at
https://<node1 ip address>:8080. Some statistics of the different nodes connected to our cluster are visible.
The administration interface is available on any of the nodes.
Your browser may show a warning about the connection to this site. This message is due to the self-signed certificate that is used by cockroachDB. You can ignore the advice and configure your web browser to accept the certificate.
The clocks of your nodes have to be synchronized to ensure that the replication of your cluster works well. If a node is out of synch with at least the half of the other nodes in the cluster (a default offset of 500ms is set), it will shutdown itself.
It is therefore recommended to synchronize the clock of your nodes with a time server.
ntp with the time servers of Scaleway and Ubuntu in this example:
- Install the ntp client:
sudo apt-get install ntp
- Stop the ntp service:
sudo service ntp stop
- Edit the file
/etc/ntp.confin your favorite text editor and remove any lines starting with
pool. Then add the following content to the file:server ntp.int.scaleway.com iburstserver ntp.ubuntu.com iburst
- Restart the service:
sudo service ntp start
- Verify that the node is using the correct NTP server:
sudo ntpq -p
The active NTP server is displayed with an asterisk.
You should complete these steps on all of your nodes to make sure the clocks are synchronized.
Test your cluster
- On the primary node, launch the SQL client of CockroachDB:
cockroach sql --certs-dir=certs --host=<address of node1>
- Create a
cockroachtestdatabase on the node:CREATE DATABASE cockroachtest;
- Quit the SQL client with
\qor by pressing
- Verify that the database has been replicated, by connecting to another node of the cluster:
cockroach sql --certs-dir=certs --host=<address of node2>
- Do a listing of the databases, which includes
cockroachtest:> SHOW DATABASES;
It will return something like:
+--------------------+| Database |+--------------------+| crdb_internal || information_schema || cockroachtest || pg_catalog || system |+--------------------+(5 rows)
Scale the cluster
Your database needs more computing power? Scale-it easily by adding additional nodes to the cluster.
Launch additional nodes like the ones you have configured during the initial setup of your cluster.
The cluster is now ready to be used. We can use any of the nodes as an endpoint to our databases, but the load is not being distributed automatically between the nodes. CockroachDB provides built-in support for HAproxy, which allows us to distribute the load on equally on any of the nodes.
A Load Balancer increases the Performance and Reliability of an application.
For performance and availability reasons it is not recommended to run the HAproxy on the same node as your cockroachDB._
- Start by running the
cockroach gen haproxycommand on the primary node to generate automatically the configuration file for HAproxy:cockroach gen haproxy --certs-dir=certs --host=<address of the primary node> --port=26257
It generates an
haproxy.cfg file with the details of your cluster already configured. It looks like the following example:
globalmaxconn 4096defaultsmode tcptimeout connect 10stimeout client 1mtimeout server 1mlisten psqlbind :26257mode tcpbalance roundrobinserver cockroach1 <node1 address>:26257server cockroach2 <node2 address>:26257server cockroach3 <node3 address>:26257
- Copy the file to the instance used for HAproxy:
scp haproxy.cfg <username>@<haproxy ip address>:~/
- Connect to the machine via SSH and install the proxy on it:
sudo apt-get install haproxy
- Start HAproxy with our configuration file:
haproxy -f haproxy.cfg
The Load Balancer distributes the requests sent to our cluster equally between the different nodes. 5. Test the configuration of cockroachDB from the primary node:
cockroach sql --certs-dir=certs --host=<haproxy ip address>
- To check to which node you are connected, type the following command:
> SHOW node_id;
The result will look like:
+---------+| node_id |+---------+| 2 |+---------+(1 row)
It shows us the ID of the node to which we are connected.
In this setup the Load Balancer is a single point of failure. If the Load Balancer fails, our cluster becomes unavailable. It is therefore recommended to configure at least two load balancers and to use round-robin DNS to maximize the availability.
We were able to configure a highly available CockroachDB cluster in this tutorial. As you can see it is easy to configure and provides excellent possibilities to automatize the distribution of your databases, combined with the possibility to scale your cluster at any time if your requirements grow. If you want to explore all configuration options, you can find more information in the documentation of CockroachDB