Skip to navigationSkip to main contentSkip to footerScaleway DocsAsk our AI
Ask our AI

How to manage data storage in a Data Warehouse for ClickHouse® deployment

Before you start

To complete the actions presented below, you must have:

  • A Scaleway account logged into the console
  • Owner status or IAM permissions allowing you to perform actions in the intended Organization

Overview

Scaleway Data Warehouse for ClickHouse® uses ClickHouse®'s native storage policies to decide where table data lives: on Scaleway Block storage, and/or on Object Storage (Amazon S3). Policies are applied at table-level during table creation, using the settings presented below. You can choose between three policies for your deployment:

  • tiered (default in Scaleway Data Warehouse for ClickHouse®)
  • s3_cache (cache mode)
  • default (local-only mode)

The table below provides an overview of each storage policy's behavior and use case.

Storage policyBehaviorWhen to use
tieredWrites on Block Storage first, and parts are automatically moved to Object Storage when disk is 90% full.General purpose setting for fast local writes and transparent spillover to Object Storage as the dataset grows.
s3_cacheData is stored on Object Storage. A local Block Storage cache layer keeps frequently-read parts to accelerate repeated reads.Large datasets that mostly live in Object Storage, with repeated reads on smaller subsets that benefit from local caching.
defaultData is stored on Block storage only.Small datasets where lowest latency for reads/writes on local disk is desired and capacity fits the Block Storage volume.

Tiered storage policy

Applying the tiered policy

Tiered is the deployment's default policy. Data is stored on the hot volume (Block Storage) until it reaches 90% of its capacity, then data is moved to the cold volume. This mechanism is controlled by move_factor:0.1.

Connect to you deployment, then run the SQL query below to apply the tiered storage policy.

Note

If you omit the storage_policy setting, ClickHouse® will automatically apply the tiered policy.

Moving data with the tiered policy

When using the tiered storage policy, you can manually move partitions between hot (Block Storage) and cold (Object Storage) volumes using the SQL queries below.

  • Moving specific partitions to the cold volume (Object Storage):

    ALTER TABLE tiered_table MOVE PARTITION '2024-08' TO VOLUME 'cold';
  • Moving a non-partitioned table (tuple()) to the cold volume (Object Storage):

    ALTER TABLE tiered_table_without_partitions MOVE PARTITION tuple() TO VOLUME 'cold';
  • Moving a partition to the hot volume (Block Storage):

    ALTER TABLE tiered_table MOVE PARTITION '2024-08' TO VOLUME 'hot';
  • Moving a non-partitioned table (tuple()) to the hot volume (Block Storage):

    ALTER TABLE tiered_table_without_partitions MOVE PARTITION tuple() TO VOLUME 'hot';

S3 cache policy

Connect to your deployment, then run the SQL query below to create a new table with the s3_cache policy enabled.

CREATE TABLE cache_table
(
  id UInt64,
  ts DateTime,
  value Float32
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY id
SETTINGS storage_policy = 's3_cache';

Local-only policy

Connect to your deployment, then run the SQL query below to create a new table with the default (Block Storage only) policy enabled.

CREATE TABLE local_table
(
  id UInt64,
  ts DateTime,
  value Float32
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY id
SETTINGS storage_policy = 'default';
Still need help?

Create a support ticket
No Results