Installing CockroachDB on Scaleway Instances

cockroachdbsqldatabase

Reviewed on April 09, 2024

CockroachDB is an open-source, distributed SQL database designed to be scalable, reliable, and consistent.

CockroachDB scales horizontally. This means that in case of increased data volume, you can add more servers to your CockroachDB cluster instead of scaling vertically by adding more resources (memory, CPU, or storage) to a single server.

Its reliability lies in the ability to overcome software and hardware failures. Indeed, CockroachDB uses replication, meaning data is stored across multiple nodes. Moreover, the Raft Consensus Algorithm ensures data consistency by requiring a majority of replicas to agree before a change is committed to the database. Additionally, the Raft algorithm can automatically detect and correct issues (such as identifying and restoring corrupted data) without human intervention.

Thanks to its serializable SQL transactions, CockroachDB also ensures that data remains consistent and accurate. To do so, it combines both the Raft algorithm to read data and timestamps and multiple versions of data (MVCC) to write data. MVCC provides you with a stable and unchanging view of the data whenever you start a transaction, regardless of other changes happening in the database at the same time.

This article shows you how to install CockroachDB using three nodes on a Private Network as well as a Scaleway Load Balancer to access the database and console.

Before you start

To complete the actions presented below, you must have:

A Scaleway account logged into the console
Owner status or IAM permissions allowing you to perform actions in the intended Organization
Installed CockroachDB locally
A network configuration allowing TCP communication on the following ports:
- 26257 for intra-cluster and client-cluster communication
- 8080 to expose your DB Console
Reviewed the production checklist and recommended topology patterns

Availability guidelines

Run each node on a separate machine. Since CockroachDB replicates across nodes, running more than one node per machine increases the risk of data loss if a machine fails. Likewise, if a machine has multiple disks or SSDs, run one node with multiple --store flags and not one node per disk.

For more details about stores, see the Start a Node documentation page.
When starting each node, use the --locality flag to describe the node's location, for example, --locality=region=fr-par,zone=fr-par-1. The key-value pairs should be ordered from most to least inclusive, and the keys and order of key-value pairs must be the same on all nodes.
When deploying in a single Availability Zone (AZ):
- To support the failure of any node, use at least 3 nodes with the default 3-way replication factor. In this case, if one node fails, each range retains 2 of its 3 replicas, a majority.
- To be able to tolerate 2 simultaneous node failures, use at least 5 nodes and increase the default replication factor for user data to 5. The replication factor for important internal data is 5 by default, so no adjustments are needed for internal data. In this case, if 2 nodes fail at the same time, each range retains 3 of its 5 replicas.
When deploying across multiple AZs:
- To support the failure of an entire AZ in a region, use at least 3 AZs per region and set --locality on each node to spread data evenly across regions and AZs. In this case, if one AZ goes offline, the 2 remaining AZs retain a majority of replicas.
- To ensure that ranges are split evenly across nodes, use the same number of nodes in each AZ. This is to avoid overloading any nodes with excessive resource consumption.
When deploying across multiple regions:
- To be able to tolerate the failure of 1 entire region, use at least 3 regions.

Access levels

You can configure different access levels for the DB Console. We recommend different configurations depending on your needs:

Access Level	Description
Partially open	Sets an ACL rule on the Load Balancer to allow only specific IP addresses to communicate on port `8080`.
Completely open	Does not set an ACL rule on the Load Balancer. All IP addresses will be able to access the console on port `8080`.
Completely closed	Does not create a console backend and frontend on the Load Balancer. In this case, a machine with SSH access to a node could use an SSH tunnel to access the DB Console.

Creating Instances

This procedure shows you how to create Instances for each node you plan to have in your cluster. Make sure to replace all example values with your own.

Important

To ensure survivability, you must run at least 3 nodes.

You can use any Instance type except for PLAY2-PICO Instances as they have 1 vCPU, which is below CockroachDB's minimum requirement of 4 vCPUs.

For more details, refer to the Cockroach hardware recommendations and cluster topology.

Run the following command to create a Private Network:

scw vpc private-network create name=cockroachdb region=fr-par subnets.0=192.168.0.0/24

Create a Public Gateway to access Instances.

scw vpc-gw gateway create name=external-access type=VPC-GW-S enable-bastion=true

Create a DHCP subnet that matches the one used in your Private Network.
```
scw vpc-gw dhcp create subnet=192.168.0.0/24
```

Configure your gateway with your Private Network using your newly created DHCP subnet.

scw vpc-gw gateway-network create enable-dhcp=true enable-masquerade=true \
gateway-id=<id> dhcp-id=<id> private-network-id=<id>

Run the following command to create an Instance in the fr-par-1 zone with a data volume of 30 GB.

scw instance server create name=db01 zone=fr-par-1 type=PRO2-XXS image=ubuntu_jammy ip=none additional-volumes.0=b:30G

Create a private NIC (Network Interface Card) for your Instance.

scw instance private-nic create zone=fr-par-1 server-id=<id> private-network-id=<id>

Repeat steps 5 and 6 to create the two other nodes in the fr-par-2 and fr-par-3 AZs.
Check your Instance's DHCP entry on the Private Network and retrieve its private IP.
```
scw vpc-gw dhcp-entry list gateway-network-id=<id>
```
Connect to the Instances using the gateway bastion.
```
ssh -J bastion@<gw-ip>:61000 root@db01.priv
```

Synchronizing clocks

CockroachDB requires moderate levels of clock synchronization to preserve data consistency.

For this reason, nodes spontaneously shut down when they detect that their clock is out of sync with at least half of the other nodes in the cluster by 80% of the maximum offset allowed (500ms by default). This is done to avoid consistency anomalies. However, it is best to prevent clocks from drifting too far off. To do so, a clock synchronization software can be run on each mode.

For the sake of this tutorial, we will be using NTP to keep offsets in single-digit milliseconds.

Note

Note that other methods of clock synchronization are suitable as well.

Connect to one of your nodes using SSH.
Disable timesyncd as it tends to be active by default on some Linux distributions by running the following command.
```
timedatectl set-ntp no
```
Note
You can also make sure timesyncd is off by running the timedatectl command.
Run the following command to install the NTP package.
```
apt-get install ntp -y
```
Run the following command to stop the NTP daemon.
```
service ntp stop
```
Configure the machine's clock with Google's NTP service, in the /etc/ntp.conf file, remove or comment out any lines starting with server or pool and add the following lines.
```
server time1.google.com iburst
server time2.google.com iburst
server time3.google.com iburst
server time4.google.com iburst
```
Restart the NTP daemon:
```
service ntp start
```
Important
We recommend using Google's NTP service because it handles "smearing" the leap second. If you use a different NTP service that does not smear the leap second, be sure to configure client-side smearing in the same way on each machine. See the production checklist for details.
Synchronize the machine's clock with Google's NTP service.
```
ntpd -b time.google.com
```
Verify that the machine is using a Google NTP server.
```
ntpq -p
```
Note
The active NTP server will be marked with an asterisk.

Important
You need to repeat this step for each machine running a CockroachDB node.

Setting up load balancing

Each CockroachDB node is an equally suitable SQL gateway to your cluster, but to ensure client performance and reliability, it is important to use load balancing:

Performance: Load balancers spread client traffic across nodes. This prevents nodes from being overwhelmed by requests and improves overall cluster performance (queries per second).
Reliability: Load balancers decouple client health from the health of a single CockroachDB node. In cases where a node fails, the load balancer redirects client traffic to available nodes.

Scaleway offers fully-managed Load Balancers to distribute traffic between Instances.

Create a Load Balancer.

scw lb lb create name=cockroachdb assign-flexible-ip=true type=LB-S

Attach the Load Balancer to the Private Network using the Private Network UUID.
```
scw lb private-network attach <lb-id> private-network-id=<id>
```
Configure a Load Balancer IP in your DNS, using Scaleway Domains and DNS for example.
```
scw dns record add <domain> type=A name=<entry> data=<lb-ip> ttl=3600
```

Create a Load Balancer certificate using Let’s Encrypt.

scw lb certificate create lb-id=<lb-ip> name=cockroach letsencrypt-common-name=<fqdn>

Create the DB communication backend and check the ready status of the node via the console.

scw lb backend create lb-id=<lb-ip> name=db forward-protocol=tcp forward-port=26257 \
forward-port-algorithm=leastconn sticky-sessions=table health-check.check-max-retries=5 \
server-ip.0=<ip db01> server-ip.1=<ip db02> server-ip.2=<ip db03> health-check.port=8080 \
health-check.http-config.method=GET "health-check.http-config.uri=/health?ready=1" \
health-check.http-config.code=200

Create the DB communication frontend. SSL is handled by the nodes.

scw lb frontend create lb-id=<lb-id> name=db inbound-port=26257 backend-id=<backend-id>

Create the console backend. The health check can be done via HTTP, even if the console is exposed via HTTPS.

scw lb backend create lb-id=<lb-ip> name=console forward-protocol=https forward-port=8080 \
forward-port-algorithm=leastconn sticky-sessions=table health-check.port=8080 \
server-ip.0=<ip db01> server-ip.1=<ip db02> server-ip.2=<ip db03> \
health-check.check-max-retries=5 health-check.http-config.method=GET \
"health-check.http-config.uri=/health" health-check.http-config.code=200 \
ssl-bridging=true ignore-ssl-server-verify=true

Create the console frontend. We use the Let’s Encrypt certificate to avoid a self-signed CA warning.

scw lb frontend create lb-id=<lb-ip> name=console inbound-port=443\
certificate-ids.0=<cert-id> backend-id=<backend-id>

Use the following command to set up an ACL on the frontend:

scw lb acl create frontend-id=<console-frontend-id> name=admin action.type=deny \
match.ip-subnet.0=<your ip> match.invert=true index=0

Configuring your network

Set up a firewall on each of your Instances, allowing TCP incoming communication on the following three ports:

26257 (tcp:26257) for inter-node communication (i.e., working as a cluster), for applications to connect to the Load Balancer, and for routing from the Load Balancer to the nodes
8080 (tcp:8080) for exposing your DB Console
22 (tcp:22) for SSH access

For further information, refer to the Configuring Firewalls for Instances tutorial.

Generating certificates

You can use cockroach cert commands, openssl commands, or Auto TLS cert generation](https://www.cockroachlabs.com/docs/releases/v21.1.html#v21-1-0-database-operations) (alpha) to generate security certificates. This section features the cockroach cert commands.

Locally, you will need to create the following certificates and keys:

A certificate authority (CA) key pair (ca.crt and ca.key).
A node key pair for each node, issued to its IP addresses and any common names the machine uses, as well as to the IP addresses and common names for machines running load balancers.
A client key pair for the root user. You will use this to run a sample workload against the cluster as well as some cockroach client commands from your local machine.

Install CockroachDB on your local machine.
Create two directories.
```
mkdir certs
mkdir my-safe-directory
```
certs
You will generate your CA certificate and all node and client certificates and keys in this directory and then upload some of the files to your nodes.
my-safe-directory
You will generate your CA key in this directory and then reference the key when generating node and client certificates. After that, you will keep the key safe and secret; you will not upload it to your nodes.

Create the CA certificate and key.

cockroach cert create-ca --certs-dir=certs --ca-key=my-safe-directory/ca.key

Create the certificate and key for the first node, issued to all common names you might use to refer to the node as well as to the load balancer instances:

cockroach cert create-node \
<ip db01> \
db01.priv  \
localhost \
127.0.0.1 \
<lb ip> \
<lb fqdn>  \
--certs-dir=certs \
--ca-key=my-safe-directory/ca.key

Upload the CA certificate, node certificate, and key to the first node.
```
scp -J bastion@<gateway ip>:61000 -r certs/ root@db01.priv:~/
```
Delete the local copy of the node certificate and key:
```
rm -f certs/node.*
```
Repeat steps 3 to 5 for each additional node.

Create a client certificate and key for the root user.

cockroach cert create-client root \
--certs-dir=certs \
--ca-key=my-safe-directory/ca.key

Starting the nodes

You will automate starting nodes using systemd.

Note

After completing these steps, nodes will not yet be live. They will complete the startup process and join together to form a cluster as soon as the cluster is initialized in the next step.

For each initial node of your cluster, complete the following steps:

Connect to the machine where you want the node to run via SSH. Ensure you are logged in as the root user.

Download the CockroachDB archive for Linux, and extract the binary:

curl https://binaries.cockroachdb.com/cockroach-v23.2.3.linux-amd64.tgz | tar -xz

Copy the binary into the PATH:
```
cp -i cockroach-v23.2.3.linux-amd64/cockroach /usr/local/bin/
```
CockroachDB uses custom-built versions of the GEOS libraries.

Copy the libraries to the location where CockroachDB expects to find them:

mkdir -p /usr/local/lib/cockroach
cp -i cockroach-v23.2.3.linux-amd64/lib/libgeos.so /usr/local/lib/cockroach/
cp -i cockroach-v23.2.3.linux-amd64/lib/libgeos_c.so /usr/local/lib/cockroach/

Create the Cockroach directory.
```
mkdir /var/lib/cockroach
```

Prepare and attach the data volume.

mkfs -t ext4 /dev/sdb
e2label /dev/sdb DATA
echo "LABEL=DATA  	/var/lib/cockroach ext4 defaults 0 1" >> /etc/fstab
mount -a

Move the certs directory to the cockroach directory.
```
mv certs /var/lib/cockroach/
```
Create a Unix user named cockroach.
```
useradd cockroach
```
Change the ownership of the cockroach directory to the user cockroach.
```
chown -R cockroach /var/lib/cockroach
```

Download the sample configuration template and save the file in the /etc/systemd/system/ directory:

curl -sSL -o /etc/systemd/system/cockroachdb.service https://raw.githubusercontent.com/cockroachdb/docs/master/_includes/v23.2/prod-deployment/securecockroachdb.service

Alternatively, you can create the file yourself and copy the script into it:

[Unit]
Description=Cockroach Database cluster node
Requires=network.target
[Service]
Type=notify
WorkingDirectory=/var/lib/cockroach
ExecStart=/usr/local/bin/cockroach start --certs-dir=certs --advertise-addr=<node1 address> --join=<node1 address>,<node2 address>,<node3 address> --cache=.25 --max-sql-memory=.25
TimeoutStopSec=300
Restart=always
RestartSec=10
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=cockroach
User=cockroach
[Install]
WantedBy=default.target

In the sample configuration template, specify values for the following flags:

Flag Description

--advertise-addr Specifies the IP address/hostname and port to tell other nodes to use. The port number can be omitted, in which case it defaults to 26257.

This value must route to an IP address the node is listening on (with --listen-addr unspecified, the node listens on all IP addresses).

In some networking scenarios, you may need to use --advertise-addr and/or --listen-addr differently. For more details, see Networking.

--join Identifies the address of 3 of the initial nodes of the cluster. These addresses should match the addresses that the target nodes are advertising.

Flag	Description
`--advertise-addr`	Specifies the IP address/hostname and port to tell other nodes to use. The port number can be omitted, in which case it defaults to `26257`. This value must route to an IP address the node is listening on (with `--listen-addr` unspecified, the node listens on all IP addresses). In some networking scenarios, you may need to use `--advertise-addr` and/or `--listen-addr` differently. For more details, see Networking.
`--join`	Identifies the address of 3 of the initial nodes of the cluster. These addresses should match the addresses that the target nodes are advertising.

When deploying across multiple data centers, or when there is otherwise high latency between nodes, it is recommended to set --locality as well, like for example --locality=region=fr-par,zone=fr-par-1. It is also required to use certain enterprise features. For more details, see Locality.

For other flags not explicitly set, the command uses default values. For example, the node stores data in --store=cockroach-data and binds DB Console HTTP requests to --http-port=8080. To set these options manually, see the Start a Node documentation page.

Start the CockroachDB cluster.
```
systemctl start cockroachdb
```
Repeat these steps for each additional node that you want in your cluster.

Initializing the cluster

On your local machine, run the cockroach init command to complete the node startup process and have them join together as a cluster.

Run the cockroach init command, with the --host flag set to the Load Balancer address to access any of the nodes:

cockroach init --certs-dir=certs --host=<lb-fqdn>

The console is available at https://<lb-fqdn>.

Testing the cluster

CockroachDB replicates and distributes data behind the scenes and uses a Gossip protocol to enable each node to locate data across the cluster. Once a cluster is live, any node can be used as an SQL gateway.

When using a Load Balancer, you should issue commands directly to the Load Balancer, which then routes traffic to the nodes.

Use the built-in SQL client locally as follows:

Launch the built-in SQL client on your local machine with the --host flag set to the address of the Load Balancer:
```
cockroach sql --certs-dir=certs --host=<lb-fqdn>
```
Create a database called securenodetest.
```
CREATE DATABASE securenodetest;
```

View the cluster's databases, which will include securenodetest.

SHOW DATABASES;
+--------------------+
|      Database      |
+--------------------+
| crdb_internal      |
| information_schema |
| securenodetest     |
| pg_catalog         |
| system             |
+--------------------+
(5 rows)

Create an admin user and password to log in to the console:

CREATE USER webadmin WITH LOGIN PASSWORD '<strong password>';

GRANT admin TO webadmin;

Use \q to exit the SQL shell.
Use the webadmin user to log into the console via https://<lb-fqdn>.

Monitoring the cluster

Despite CockroachDB's various built-in safeguards against failure, it is critical to actively monitor the overall health and performance of a cluster running in production and to create alerting rules that promptly send notifications when there are events that require investigation or intervention.

You can leverage Scaleway Cockpit to set up monitoring and alerting using CockroachDB Prometheus endpoint via Prometheus Remote Write capabilities. This can be done by installing Prometheus as an agent on each node.

For details about available monitoring options and the most important events and metrics to alert on, see [Monitoring and Alerting(https://www.cockroachlabs.com/docs/v23.2/monitoring-and-alerting).

Scaling the cluster

You can start the new nodes using the same procedure used to create the initial nodes (steps 1, 2, 4, 5, and 6). The join list can be set to the initial 3 nodes installed.

Using the database

Now that your deployment is working, you can:

Implement your data model.
Create users and grant them privileges.
Connect your application. Be sure to connect your application to the load balancer, not to a CockroachDB node.
Make backups of your data.

You may also want to adjust the way the cluster replicates data. For example, by default, a multi-node cluster replicates all data 3 times; you can change this replication factor or create additional rules for replicating individual databases and tables differently. For more information, see Configure Replication Zones.

Important

When running a cluster of 5 nodes or more, it is safest to increase the replication factor for important internal data to 5, even if you do not do so for user data. For the cluster as a whole to remain available, the ranges for this internal data must always retain a majority of their replicas.

Going further

Questions?

Visit our Help Center and find the answers to your most frequent questions.

Visit Help Center