The right storage fit for your infrastructure

Build
Constance Morales
8 min read

I’m Constance, Product Marketing Manager of Block Storage. Since arriving at Scaleway, I have realized that users are increasingly investing in storage capability, which has led us to accelerate the development of new storage solutions to fit as many of our clients’ needs as possible.

I have come to two conclusions:

  1. Storage definitely is a key requirement for numerous cloud use cases
  2. Storage represents a significant expense: 9% of the overall growth (+4% in 2022) of IT spending (even Gartner said so!)

How can startups, companies and individuals make sure their cloud storage usage is optimized and that every cent invested in storage is worth it?
Because, believe it or not, I've seen impressively well-built infrastructures in which storage was a real lever for better performance, cost efficiency, security and compliance.

So how do we exploit storage to fully benefit from it?

Keep reading for our guide to Scaleway Storage solutions and how they differ, and to discover which solution is best suited to your cloud usage. And a big thank you to Franck Pagny, Scaleway’s PM on Serverless Database, Marie Debard, Scaleway’s PM on Object Storage, and Thomas Deschamps, PM on Block Storage, who helped to build this guide!

Object Storage

Object Storage allows you to store large amounts of unstructured data (documents, images, videos, etc.) and to distribute them instantly, anywhere in the world.

How does Object Storage work?

Object Storage stores data as distinct units, or "objects". Each object has a unique identifier, and is bundled with highly customizable metadata, making it easier for you to control the way that you upload, download, access and analyze your stored data. Objects are stored in a flat address space, with no file paths, and kept in secure and scalable buckets, making it easier to locate and retrieve your data across regions.
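
To make the idea of a flat address space more concrete, here is a minimal in-memory sketch. It is not how Scaleway Object Storage is implemented: the `StoredObject` and `Bucket` classes, the `object_id` field and the example keys are all hypothetical names chosen for illustration.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: an opaque payload plus user-defined metadata."""
    key: str
    data: bytes
    metadata: dict = field(default_factory=dict)
    # Hypothetical identifier; a real service assigns its own.
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class Bucket:
    """Toy bucket with a flat key space: there are no real directories."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, key, data, **metadata):
        obj = StoredObject(key=key, data=data, metadata=metadata)
        self._objects[key] = obj
        return obj

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix=""):
        # "Folders" are only a naming convention: listing filters keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = Bucket("my-bucket")
bucket.put("photos/2022/beach.jpeg", b"...", content_type="image/jpeg", owner="alice")
bucket.put("photos/2023/city.jpeg", b"...", content_type="image/jpeg")
bucket.put("logs/app.log", b"...")
print(bucket.list_keys("photos/"))
```

The slashes in the keys look like paths, but they are just characters in a name. Grouping happens by prefix and by metadata rather than by walking a directory tree.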

Object Storage specificities

  • Designed for multi-cloud interoperability
  • Ease of data management:
    Intuitive storage classes (One-zone Infrequent Access, Multi-AZ Standard, Glacier)
    Use of HTTP to access data: retrieve data simply via different applications or web browsers. Scaleway Object Storage is compatible with S3 standards
    Lifecycle rules for object deletion or transitioning to a cold storage class
    Tags and metadata
    Observability dashboard
  • Resilient:
    Multi-AZ resiliency in the Paris region, with the same coming soon for the Amsterdam and Warsaw regions
    Erasure coding
    Object Storage powered by Hive, Scaleway's highly scalable and globally distributed database.
  • Secure:
    Object lock to prevent object deletion
    Fine-grained permission access management at user and application level, bucket policies

Object Storage use cases

Object Storage is a solution for any organization which needs to store a massive and growing amount of unstructured data in a scalable, efficient, and affordable way. Common use cases include:

Applications & websites
Great when content needs to be highly available and highly durable, such as for streaming videos and serving images, documents and other website files.

Data lakes
Object storage allows you to centralize a vast amount of data in its native format. Advantages like unlimited volume, low price and high scalability make object storage the go-to option to build a data lake.

Machine learning
Training models often requires storing a large amount of data, and Object Storage is a great solution for this.

Cost optimization on Object Storage

It is essential to choose the right storage for your intended usage.
Scaleway Object Storage offers three different storage classes whose price and performance vary, making it easy to optimize storage costs according to your needs.

  • Standard class: for data that is accessed frequently. Multi-AZ resiliency is offered in the Paris region, meaning that data is distributed across different Availability Zones (AZ). This keeps data resilient against the total loss of one AZ in the event of an accident or disaster. Standard class in AMS and WAW currently allows you to store data across one AZ, and will soon be multi-AZ.
  • One Zone - Infrequent Access (IA) class: data is stored across three racks inside the same data center. This storage class is a good choice for storing secondary backup copies or recreatable data.
  • Glacier class: ideal for data that is accessed infrequently.

You can also leverage storage with microservices. Data stored in Object Storage needs to be processed, and this usually means provisioning and configuring VMs, managing load balancers, and tweaking autoscaling rules. This is where Serverless comes in: Serverless functions free you from configuring and managing infrastructure and let you focus on your data. For example, you can schedule the automated transformation of all *.jpg or *.png images stored in an Object Storage bucket.
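
As a rough sketch of the first step such a function might take, the snippet below picks out the image keys to transform. The `handle(event, context)` entry point and the `event["keys"]` payload shape are assumptions made for illustration, not a documented Serverless interface, and the actual download, transformation and upload are left out.

```python
from fnmatch import fnmatchcase

IMAGE_PATTERNS = ("*.jpg", "*.png")


def select_images(keys, patterns=IMAGE_PATTERNS):
    """Return the keys whose lowercased name matches one of the patterns."""
    return [
        key for key in keys
        if any(fnmatchcase(key.lower(), pattern) for pattern in patterns)
    ]


def handle(event, context=None):
    """Hypothetical entry point; event["keys"] is an assumed payload shape."""
    to_process = select_images(event.get("keys", []))
    # A real function would download, transform and re-upload each object here.
    return {"to_process": to_process}


print(handle({"keys": ["a.jpg", "docs/b.pdf", "img/C.PNG"]}))
```

Matching is done on the lowercased key so that files such as `C.PNG` are not missed, while the original key is what gets returned for processing.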

Cold storage

Cold Storage is an Object Storage class that is used to store “cold data”, the opposite of “hot data”. Hot data needs to be easily and quickly accessible as it is accessed and used very frequently. Cold storage, on the contrary, represents data that isn’t used frequently and thus doesn’t need fast access.

How does Cold Storage work?

Scaleway’s cold storage service, Glacier, is engineered on specific hardware:

  • SMR disks that increase storage density and overall per-drive storage capacity
  • Motherboards manufactured internally that can power disks on and off on-demand with a SATA bus tree matrix.

This particular hardware configuration allows us to offer competitive pricing, but the compromise is that data stored on cold storage has limitations in relation to data retrieval. What limitations exactly?

An object stored in Glacier class is listed for you to see, but cannot be downloaded instantly. It needs to be restored to the Standard class first. It can take anywhere from a few seconds to 24 hours to retrieve the first byte of an average-sized file. To facilitate restoration and ensure fast restitution of your data, we recommend using average-sized files (larger than 1MB).

Cold Storage specificities

  • Cost-efficient
  • Extremely reliable and secure: at Scaleway the SMR disks dedicated to cold storage are stored in our most secure data center: the “data-bunker”, a former nuclear fallout shelter within which your data is totally secure and isolated from all natural and technological risks.
  • European: Scaleway only stores data in Europe, which means it's not subject to any extraterritorial legislation.

Cold storage use cases

Cold Storage use cases center around deep archiving: storing data that you need to keep for regulatory purposes but don’t need to access frequently or quickly.

Backups and archiving
Use S3 lifecycle management features and versioning to automatically archive data such as logs or backups to Scaleway Glacier after a certain period of time. Benefit from a lower price, and retrieve your data when needed.
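
A lifecycle rule of this kind can be sketched as a plain dictionary in the shape used by the S3 lifecycle configuration API. The `archive_rule` helper, the `logs/` prefix, the bucket name and the day counts are illustrative assumptions. Check the Scaleway documentation for the fields and storage class names actually supported before relying on it.

```python
def archive_rule(prefix, days_to_glacier, days_to_expire=None):
    """Build one S3-style lifecycle rule (hypothetical helper)."""
    rule = {
        "ID": f"archive-{prefix.strip('/') or 'all'}",
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        # Move matching objects to the cold class after N days.
        "Transitions": [{"Days": days_to_glacier, "StorageClass": "GLACIER"}],
    }
    if days_to_expire is not None:
        # Optionally delete the objects once they are no longer needed.
        rule["Expiration"] = {"Days": days_to_expire}
    return rule


lifecycle = {"Rules": [archive_rule("logs/", days_to_glacier=30, days_to_expire=365)]}
print(lifecycle)

# With an S3-compatible client such as boto3 (not imported here), the
# configuration could then be applied along these lines:
#   client.put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=lifecycle)
```

Building the rule as data keeps it easy to review and version alongside the rest of your infrastructure configuration.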

Legal archive
Archive highly restricted legal documents required by the law such as contracts, accounting data, administrative documents or access logs.
Stay compliant with GDPR and other local regulations while limiting your budget.

Cost optimization on Cold Storage

Now that you understand what Scaleway Glacier is designed for, let’s have a closer look at how Scaleway Glacier is perfect for cost optimization.

The amount of data you need to store inevitably grows over time, and some of this will include data that you use and access very infrequently or not at all. But that doesn’t mean that this largely unused data is useless, and this is where Glacier comes in. Thanks to lifecycle rule management, you will be able to reduce your storage costs by “freezing” infrequently accessed data in cold storage. This will represent an approximate 56% saving on your object storage budget (by moving 70% of your data to Glacier).
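
The arithmetic behind that figure can be sketched as follows. The price ratio of 0.2 (cold storage costing one fifth of the hot class per GB) is an assumption chosen because it reproduces the 56% above. It is not a price quote, so plug in the current prices for your own estimate.

```python
def storage_saving(fraction_moved, cold_to_hot_price_ratio):
    """Fraction of the storage bill saved by moving part of the data to cold storage.

    Each GB moved saves (1 - ratio) of its former cost, so the overall
    saving is fraction_moved * (1 - ratio).
    """
    if not 0 <= fraction_moved <= 1:
        raise ValueError("fraction_moved must be between 0 and 1")
    if cold_to_hot_price_ratio < 0:
        raise ValueError("cold_to_hot_price_ratio must not be negative")
    return fraction_moved * (1 - cold_to_hot_price_ratio)


# Assumed inputs: 70% of the data moved, cold price at 20% of the hot price.
saving = storage_saving(0.7, 0.2)
print(f"{saving:.0%}")
```

This simple model only covers the per-GB storage price and ignores any other line items on the bill.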

In addition, moving data from Object Storage “hot” classes to Glacier class (or vice versa) is free of charge.

On top of this, you can also set an expiration date for your data so that it is deleted after this time, to help you keep your budget even further under control.

Block Storage

Scaleway Block Storage provides network-attached storage that can be plugged in and out of cloud products such as Instances, like a virtual hard drive. Block Storage devices are independent from the local storage of Instances, and the fact that they are accessed over a network connection makes it easy to move them between Instances in the same Availability Zone. From the user’s point of view, once mounted, the block device behaves like a regular disk.

How does Block Storage work?

When you create a block volume attached to an Instance, the operating system detects it as a raw disk.

In the background, a block device is managed by our Ceph cluster as a collection of smaller pieces (called chunks or blocks). Each of these chunks is replicated three times to avoid data loss in the event of a storage medium failure. So when you provision a certain amount of Block Storage inside the data center, we provision three times this amount on multiple devices to ensure your data resiliency.
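
The toy model below illustrates the principle of three-way replication. It is deliberately simplistic: Ceph uses its own placement algorithm, whereas this sketch just rotates through a list of devices. The chunk size, device names and function names are hypothetical.

```python
import math


def raw_capacity_gb(provisioned_gb, replicas=3):
    """Raw capacity consumed when every chunk is stored `replicas` times."""
    return provisioned_gb * replicas


def place_chunks(volume_gb, chunk_gb, devices, replicas=3):
    """Assign each chunk of a volume to `replicas` distinct devices.

    Simple round-robin rotation, for illustration only (not Ceph's algorithm).
    Assumes the device names in `devices` are unique.
    """
    if replicas > len(devices):
        raise ValueError("need at least as many devices as replicas")
    n_chunks = math.ceil(volume_gb / chunk_gb)
    return {
        chunk: [devices[(chunk + r) % len(devices)] for r in range(replicas)]
        for chunk in range(n_chunks)
    }


placement = place_chunks(volume_gb=10, chunk_gb=4, devices=["d0", "d1", "d2", "d3"])
for chunk, replicas_on in placement.items():
    print(chunk, replicas_on)
print(raw_capacity_gb(10))
```

Because every chunk lives on three different devices, losing any single device still leaves two copies of each chunk it held.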

Block Storage’s specificities

  • Flexible volume per Instance, with up to 16 full SSD volumes, each one ranging from 1GB to 10TB.
  • Resilient: Block Storage redundantly stores data by replicating it three times on multiple devices, ensuring security and high availability.
  • Persistent: The boot-on-block feature allows Instances to boot from attached Block Storage volumes instead of local volumes. Save data even after Instance or cluster deletion. Ideal for quickly accessible and easily transferable data and backups.

Block Storage use cases

Block Storage at Scaleway consists of scalable and persistent SSD disks storing data on your virtual machines (VMs), making it easy to transfer to other VMs or to reinstall quickly on new VMs when you restart your machines.

  • Avoid disruption to business-critical applications: Scale your storage easily and quickly to meet your fluctuating business needs. Expand your Instance’s storage space up to 10TB if you exceed your current storage capacity.
  • Optimize Database Storage: Block Storage for Managed Database offers greater flexibility, allowing cost optimization by reducing compute power when not needed. In addition, several upcoming improvements on our Block Storage-based databases make it a better option for cloud-native uses. Some of our planned developments are:
    Snapshot exports to a different AZ
    Read replicas for both local storage and Block Storage databases
    Block Storage performance improvements
    Storage increases without any downtime

Cost optimization on Block Storage

Due to its replication, high availability and great performance, Block Storage is more expensive than other types of storage. However, you can nonetheless follow some simple rules of thumb to keep your costs low:

  1. Start with a reasonable volume size. Because Block Storage can be increased without any downtime, you don’t have to choose a huge amount of storage upfront: you can expand it in the future when needed.
  2. If you want to store files for which you don’t need fast access, Object Storage might be a better, more cost-effective solution. In this case, you will still benefit from the high availability and resilience of your files but won’t pay for the speed you don’t need.
  3. Snapshots are full-volume copies of your Instance, and you can snapshot your block volume whenever you want. To make it cost-effective, you can transfer those snapshots to Object Storage!

File Storage

File storage stores data as files and presents it to the user in a hierarchical directory structure. To access a stored file on a file storage system, you must use the specific path to where the file is located, such as

/home/myuser/myphotos/christmas2022/CostaRica/beach.jpeg.
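
As a small illustration of what the hierarchy gives you, Python's standard `pathlib` module can take this example path apart into its directory levels. Nothing here is specific to any file storage product.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/home/myuser/myphotos/christmas2022/CostaRica/beach.jpeg")

# Every level of the hierarchy is a separate, navigable component.
print(path.parts)
# The directory that contains the file.
print(path.parent)
# The file name and its extension.
print(path.name, path.suffix)
```

Each component is a real directory that can carry its own ownership and permissions, which is what sets a file system apart from a flat key space.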

Data can be stored either on a local computer hard drive or on a network-attached storage (NAS) device.

This storage system supports a range of file access management features, such as ownership and permissions across a set of authenticated users. It also supports multiple concurrent writes and ensures high data availability.

How does file storage work?

File storage solutions usually avoid using only a local computer hard drive and rely instead on protocols such as Network File System (NFS), RADOS (used in CephFS) or GlusterFS, which create virtual file systems on a file storage server and expose them to clients. Clients can see and interact with these virtual file systems exactly like a typical file system. As these virtual file systems are stored on file storage servers and not locally, they can be shared by multiple clients simultaneously. Behind the scenes, servers implementing file storage will handle data replication and/or distribution among different physical nodes to ensure high availability and durability.

File storage specificities

  • Better latency than Object Storage: Thanks to protocols used, file storage is optimized for low latency, and retrieving data is simple and fast.
  • User-friendly data management: With its clear file hierarchy, file storage enables users to retrieve data easily and quickly. Typically, file storage automatically handles certain metadata (such as the directory structure) on behalf of users. This is unlike the flat bucket structure provided by Object Storage, for instance, where additional metadata may be needed to classify multiple buckets.
  • Concurrency handling: file storage enables multiple clients to access the same data.
  • High availability: Data can be replicated across multiple storage systems and Availability Zones to ensure it’s always available.

File storage use cases

File storage can be used in a broad range of applications. For example:

Applications & website content
When hosting static files with tools such as Content Management Systems (e.g. WordPress, Drupal or Joomla), you can benefit from file storage to store images or video content. This enables the application to be containerized and scaled without having static storage as a bottleneck.

Data processing
Processing a large amount of data to perform operations such as data transformation or machine learning algorithms can require low storage latency to achieve higher performance. If data volumes and concurrent access stay reasonable, file storage can perform better than Object Storage for some use cases.

Drive or file system transfer
Providing personal cloud drives for end users to store their documents, images, or videos online is also a typical use case for file storage. Indeed, file storage’s hierarchical structure is already familiar to most end users who want to organize their files, and it also ensures high data availability.

Cost optimization on File storage

File storage enables a good balance between latency requirements and scalability, despite generally being more expensive than Block or Object Storage per GB stored. File storage is mainly designed to scale data storage and concurrent access up to a certain point, while keeping the well-known file structure hierarchy.

Furthermore, as many solutions have used a file system structure over the last decades, it enables many legacy applications to use it with no or little migration effort required. In such situations, file storage can be a good asset to limit investment in application refactoring while still improving performance and limiting maintenance costs.

Like other storage systems, file storage cost optimization also relies on assessing data access frequency and durability needs. Choosing a storage type adapted to these needs (hot/cold, single availability zone redundancy/multi-availability zone redundancy, etc.) can optimize costs drastically.

Final thoughts

In short, if you need compute power to build, test and launch an application, you’ll provision Instances, such as PLAY2, which need Block Storage, as a minimum requirement.

If you are a professional wedding photographer and you keep your clients’ photos for several years before erasing them, you’ll need cold storage, such as Scaleway Glacier, to store 8K ultra-high-definition footage or raw image data sets that don't require frequent access.

If you're an e-commerce company managing OLTP (Online Transaction Processing) workloads, you need to deploy and scale your PostgreSQL database seasonally. To do this, you’ll have to provision high-availability Block Storage to guarantee your data redundancy and integrity.

I hope this article helped you understand some of the storage possibilities available, and that you learnt some tips along the way. There is no secret recipe to building your perfectly optimized infrastructure just yet, but if you start by identifying your needs and take a step back to consider your options, you are going in the right direction. If this article made you want to ask a million questions, reach out to us on our Slack Community!
