Data Warehouse for ClickHouse® features and limitations
This page lists the different features and limitations of Scaleway Data Warehouse for ClickHouse®.
Features
Load Balancer
Every Scaleway Data Warehouse for ClickHouse® deployment automatically includes a Load Balancer, even when the deployment consists of a single node.
This Load Balancer automatically distributes queries across the nodes of your deployment. It is therefore not possible to know in advance which node will process a given query.
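Although the target node cannot be chosen, ClickHouse's built-in `hostName()` function reports which server executed a query after the fact. The following is a minimal sketch, assuming the function is available to your user on the managed service (the `serving_node` alias is only illustrative):

```sql
-- Returns the host name of the server that executed this query.
-- Running it several times through the Load Balancer may return
-- different hosts; the exact balancing behavior is not specified here.
SELECT hostName() AS serving_node;
```

This can help when troubleshooting, but the result should not be relied on to route subsequent queries to a specific node.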
Data replication
For better performance and ease of use, Data Warehouse for ClickHouse® replicates the data across all nodes of a deployment.
Replication is achieved by aliasing commands:
| Default command | Replaced by |
|---|---|
| CREATE DATABASE <database> | CREATE DATABASE <database> ON CLUSTER <Scaleway cluster> |
| DROP DATABASE <database> | DROP DATABASE <database> ON CLUSTER <Scaleway cluster> |
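As an illustration of the aliasing described above, the sketch below creates a database without an explicit `ON CLUSTER` clause. The database name `demo_db` is hypothetical, and the commented statement only restates the mapping from the table; it is not a description of how the service implements the rewrite.

```sql
-- What you type (the name demo_db is a hypothetical example):
CREATE DATABASE demo_db;

-- Per the table above, this is handled as if you had written:
-- CREATE DATABASE demo_db ON CLUSTER <Scaleway cluster>;

-- List databases to confirm that demo_db now exists:
SHOW DATABASES;
```

Because the command is applied on the whole cluster, the database should be visible regardless of which node the Load Balancer selects for later queries.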
Creating a table with an engine from the MergeTree family is also aliased, so that the corresponding Replicated engine is used instead:
| Default table | Replaced by |
|---|---|
| MergeTree | ReplicatedMergeTree |
| ReplacingMergeTree | ReplicatedReplacingMergeTree |
| CoalescingMergeTree | ReplicatedCoalescingMergeTree |
| SummingMergeTree | ReplicatedSummingMergeTree |
| AggregatingMergeTree | ReplicatedAggregatingMergeTree |
| CollapsingMergeTree | ReplicatedCollapsingMergeTree |
| VersionedCollapsingMergeTree | ReplicatedVersionedCollapsingMergeTree |
| GraphiteMergeTree | ReplicatedGraphiteMergeTree |
Refer to the official ClickHouse® documentation for more information on the MergeTree table family.
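The engine substitution can be checked with a small experiment. This is a sketch under assumptions: the table `events_demo` and its columns are hypothetical, and the `system.tables` system table is assumed to be readable by your user.

```sql
-- Create a table with the plain MergeTree engine in the current database.
-- Table and column names are hypothetical.
CREATE TABLE events_demo
(
    event_time DateTime,
    user_id    UInt64,
    event_type String
)
ENGINE = MergeTree
ORDER BY (user_id, event_time);

-- Inspect the engine that was actually applied. According to the
-- mapping above, this is expected to report ReplicatedMergeTree.
SELECT name, engine
FROM system.tables
WHERE database = currentDatabase() AND name = 'events_demo';
```

`SHOW CREATE TABLE events_demo;` is an alternative way to view the full definition that was stored for the table.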
Limitations
Sharding
Sharding cannot be manually configured in Data Warehouse for ClickHouse®. All nodes in the cluster contain a full copy of the data, meaning the deployment operates in a replicated (or "replica") mode rather than a sharded (or "distributed") architecture.
The total data capacity of the cluster is therefore limited to the storage of a single node, and a single query cannot be parallelized across shards to improve performance.
Queries are executed on each replica independently, so while high availability and read scalability are improved, compute resources are not horizontally scalable for large analytical workloads that would benefit from data distribution.
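The replicated layout can be inspected from SQL. The query below is a sketch that assumes the standard `system.clusters` system table is readable on the managed service. Given the description above, one would expect a single `shard_num` value with one `replica_num` per node.

```sql
-- List the shards and replicas that ClickHouse knows about.
-- In a replica-only deployment, every row should share the same shard_num.
SELECT cluster, shard_num, replica_num, host_name
FROM system.clusters
ORDER BY cluster, shard_num, replica_num;
```

As a purely hypothetical sizing example: a deployment of 3 nodes with 500 GB of storage each can hold about 500 GB of data, not 1.5 TB, because every node stores a full copy.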
Distributed table engine
Because there is no sharding, the Distributed table engine has no effect in a Data Warehouse for ClickHouse® deployment.