Object Storage - How Is It Built? (3/3)
In this article, we will go through the infrastructure design on which our object storage service runs. The first challenge was to find the right balance between the network, CPUs and IOPS.
Starting today, we are publishing a series of articles about Object Storage.
In this series, we will start with a broad overview of the Object Storage technology currently in production at Scaleway.
In the coming weeks, two more articles, one on Object Storage internal management and one on Object Storage infrastructure, will follow. Stay tuned!
Suppose you want to store a large number of static files, such as videos, photos, and documents, on an on-premises NAS. With traditional storage technologies, you will run into the following problems:
1. Limited disk space: Every disk has a finite capacity. Once your disks are full, you need to add new ones and reconfigure your system. You can spread your data over several disks on several machines, but your infrastructure management costs increase accordingly.
2. Hardware failure: Suppose your data is stored on several disks. Because disks have a limited lifetime, they will eventually fail, at unpredictable times. As a countermeasure, you can use systems such as RAID to limit data loss. However, RAID only protects you within a single machine: if the machine itself becomes unavailable, all the data it holds becomes unavailable too, and the outage lasts until you restore the machine to a working state.
3. Costs: Now assume you have a working RAID array in a replicated setup. With this configuration, you pay for both used and unused space. In addition, you have to invest a significant amount of time in administering and upgrading the system and fixing security vulnerabilities.
4. Scalability issues: Your setup becomes popular among your friends and relatives, and they would like to store their data on it as well. What worked for you at one point may stop working as your requirements grow. The technical choices you made to store 3 TB per week are certainly not the ones you would make to store 3 TB per day or per hour. A significant increase in usage can become a real challenge for storage solutions, and investing time and energy to change your architecture is painful. In addition, storage media support a limited number of IOPS: heavy I/O loads may lead to hardware failures when components cannot keep up with the rate.
5. Integration into your application: Interfaces such as filesystems or storage middleware can run into multiple issues when operating over a large cluster of machines holding a large amount of data. Such technologies can also suffer from performance drops as the amount of stored data grows.
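To put the scalability point in numbers, here is a back-of-the-envelope sketch of how much data piles up in one year at each of the ingestion rates mentioned above. It assumes decimal units (1 PB = 1,000 TB), 52 weeks and 365 days per year, and a constant rate.

```python
# Back-of-the-envelope yearly growth for the ingestion rates mentioned above.
# Assumptions: decimal units (1 PB = 1,000 TB), 52 weeks / 365 days per year.
TB_PER_PB = 1_000

def yearly_tb(tb_per_period: float, periods_per_year: int) -> float:
    """Total terabytes written in one year at a constant ingestion rate."""
    return tb_per_period * periods_per_year

rates = {
    "3 TB per week": yearly_tb(3, 52),
    "3 TB per day": yearly_tb(3, 365),
    "3 TB per hour": yearly_tb(3, 365 * 24),
}

for label, total_tb in rates.items():
    print(f"{label}: {total_tb:,.0f} TB/year (~{total_tb / TB_PER_PB:.1f} PB)")
```

Going from weekly to daily multiplies the yearly volume by about 7, and from daily to hourly by 24: roughly 156 TB, 1.1 PB and 26 PB per year. A design sized for the first rate rarely survives the last one.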
Traditional architectures are becoming less suited to the demands of today's applications. Scaleway Object Storage offers a real solution to the challenges of modern infrastructure.
Object Storage technologies are designed to store large amounts of data (think hundreds of petabytes).
To do so, an object storage system does not rely on a single machine but on many interconnected machines. As demand for capacity grows, standardized machines are added regularly.
To ensure the availability of your data, we provide both hardware resiliency and data security.
We improve the durability of your data by storing each object on several disks located on different machines. This allows us to prevent data loss in case of hardware failure.
Because those copies are distributed across the cluster, availability improves as well. For instance, if one part of your file is corrupted or missing, we can easily recover it from a healthy part of the cluster.
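The sketch below illustrates the idea of keeping several copies on different machines and reading from whichever copy is still healthy. It is not Scaleway's placement algorithm: it uses rendezvous hashing, a standard technique swapped in for illustration, stores full copies rather than parts, and the machine names, the replication factor of 3, and the `place`/`put`/`get` helpers are all hypothetical.

```python
import hashlib
from collections.abc import Collection

# Hypothetical five-machine cluster and replication factor (illustrative only).
MACHINES = ["node-a", "node-b", "node-c", "node-d", "node-e"]
REPLICAS = 3

def score(machine: str, key: str) -> int:
    """Rendezvous (highest-random-weight) hashing score for a machine/key pair."""
    digest = hashlib.sha256(f"{machine}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def place(key: str) -> list[str]:
    """Deterministically pick REPLICAS distinct machines for an object key."""
    return sorted(MACHINES, key=lambda m: score(m, key), reverse=True)[:REPLICAS]

# Simulated disks: machine name -> {object key: object bytes}.
storage: dict[str, dict[str, bytes]] = {m: {} for m in MACHINES}

def put(key: str, data: bytes) -> None:
    """Write one full copy of the object to each selected machine."""
    for machine in place(key):
        storage[machine][key] = data

def get(key: str, failed: Collection[str] = ()) -> bytes:
    """Read from the first replica whose machine is healthy and still has the copy."""
    for machine in place(key):
        if machine not in failed and key in storage[machine]:
            return storage[machine][key]
    raise KeyError(f"no healthy replica left for {key!r}")

put("photos/cat.jpg", b"jpeg bytes")
holders = place("photos/cat.jpg")
# Two of the three machines holding a copy are down: the object is still readable.
print(get("photos/cat.jpg", failed={holders[0], holders[1]}))
```

Because placement is derived from the key alone, any node can recompute where the copies live without a central lookup table; that is one common design choice among several.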
We also have efficient intervention processes for the physical infrastructure in the event of hardware failure or unavailability, and we maintain reliable communication with the dedicated teams so that problems are solved quickly.
In addition, we invest a lot of time and effort in our monitoring solution, which allows us to act proactively on the health of the cluster.
On the software side, we developed a distributed and robust solution to ensure that no single customer can make the whole platform unavailable. Several middleware components handle the authentication of each request. Each request also triggers associated sub-processes (such as credential checks or billing records), each of which must be able to handle 100% of the load at any given time, a load well above 100 million requests per day.
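To make the 100% requirement concrete, here is a minimal sketch of a request pipeline in which billing and authentication are stacked as middleware in front of a storage backend. The stage names, the request format, and the ordering are hypothetical; the point is that any stage every request passes through must absorb the full traffic, which at 100 million requests per day already averages over a thousand requests per second before accounting for peaks.

```python
from collections import Counter
from typing import Callable

Request = dict
Handler = Callable[[Request], str]

# How many requests each stage has handled.
processed: Counter = Counter()

def billing(next_handler: Handler) -> Handler:
    """Hypothetical middleware: records a billing event for every request."""
    def handler(request: Request) -> str:
        processed["billing"] += 1
        return next_handler(request)
    return handler

def authentication(next_handler: Handler) -> Handler:
    """Hypothetical middleware: checks credentials before the backend runs."""
    def handler(request: Request) -> str:
        processed["auth"] += 1
        if "access_key" not in request:
            return "403 AccessDenied"
        return next_handler(request)
    return handler

def storage_backend(request: Request) -> str:
    processed["storage"] += 1
    return "200 OK"

# In this sketch, both middleware stages see every incoming request.
pipeline = billing(authentication(storage_backend))

for i in range(1_000):
    # Every tenth request is missing credentials and gets rejected.
    pipeline({} if i % 10 == 0 else {"access_key": "EXAMPLE"})

REQUESTS_PER_DAY = 100_000_000  # order of magnitude quoted above
SECONDS_PER_DAY = 24 * 60 * 60
average_rps = REQUESTS_PER_DAY / SECONDS_PER_DAY  # about 1,157 requests/s, before peaks

print(dict(processed), round(average_rps))
```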
You are only billed for the storage you actually use, so the price you pay is easy to calculate from your storage needs. You do not have to worry about setup and configuration (RAID configuration, filesystem requirements).
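As an illustration of pay-per-use billing, the sketch below prorates stored gigabytes over the time they were actually kept. The price per GB-month and the 730-hour billing month are placeholder assumptions, not Scaleway's tariff or billing method.

```python
# Placeholder values for illustration only; check your provider's pricing page.
PRICE_PER_GB_MONTH = 0.01  # assumed currency units per GB-month
HOURS_PER_MONTH = 730      # assumed length of a billing month

def monthly_cost(usage: list[tuple[float, float]]) -> float:
    """usage: (gigabytes stored, hours kept at that level) for each period of the month."""
    gb_hours = sum(gb * hours for gb, hours in usage)
    return gb_hours / HOURS_PER_MONTH * PRICE_PER_GB_MONTH

# 1,000 GB during the first half of the month, 3,000 GB during the second half:
# you pay for an average of 2,000 GB, not for a disk sized for the 3,000 GB peak.
cost = monthly_cost([(1_000, 365), (3_000, 365)])
print(f"{cost:.2f}")
```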
Get rid of the hassle of administering machines and dealing with security vulnerabilities, production start-up, and so on. All of this expertise and its costs are delegated entirely to your provider.
Several protocols allow you to interact with object storage.
The two protocols that dominate the market are S3 and OpenStack Swift.
We chose the S3 protocol because it is widely integrated into third-party applications. It also gives us a strategic position in this market by keeping to a minimum the changes needed to use our cloud.
In practice, only a few configuration edits are needed to save half of your storage costs. Make sure to check that the features your application needs are supported. Most of our work consists of implementing all the features offered by the competition (AWS S3, GCP Cloud Storage, Azure Blob Storage, etc.).
The S3 protocol is HTTP-based, so you can interact with object storage directly using the curl command, which returns responses in XML format. Communicating only through raw HTTP calls can be very tedious, which is why our customers mainly use SDKs or CLI clients (such as rclone), which also build in many mechanisms that simplify your life (retries, sync, and others).
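For a feel of what raw HTTP interaction involves, the snippet below parses a trimmed, made-up response body of the kind returned by an S3 ListObjectsV2 request (`GET /?list-type=2` on a bucket), using only Python's standard library. Request signing, which a real call needs, is left out; the bucket name, keys, and sizes are placeholders.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative ListObjectsV2 response body (values are made up).
SAMPLE_RESPONSE = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>my-bucket</Name>
  <Prefix></Prefix>
  <KeyCount>2</KeyCount>
  <MaxKeys>1000</MaxKeys>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>dir1/file1</Key>
    <Size>1024</Size>
  </Contents>
  <Contents>
    <Key>dir1/file2</Key>
    <Size>2048</Size>
  </Contents>
</ListBucketResult>"""

# S3 responses use a default XML namespace, so lookups need a prefix mapping.
NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

def list_keys(xml_body: str) -> list[tuple[str, int]]:
    """Extract (key, size in bytes) pairs from a ListObjectsV2 response body."""
    root = ET.fromstring(xml_body)
    return [
        (
            c.findtext("s3:Key", default="", namespaces=NS),
            int(c.findtext("s3:Size", default="0", namespaces=NS)),
        )
        for c in root.findall("s3:Contents", NS)
    ]

print(list_keys(SAMPLE_RESPONSE))
```

An SDK or CLI client handles this parsing for you, along with request signing, pagination (`IsTruncated` and continuation tokens), and retries.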
Object storage turns files into objects. The content of a file has no bearing on how it is stored: whether it is a video, a photo, text, or binary data, it is stored the same way.
Data is not organized in a hierarchy of directories; it lives in a flat address space.
Applications identify discrete data objects by their unique address.
Objects are grouped into buckets, each of which represents a space where objects are stored together.
The bucket name must be a unique identifier, which allows us to provide features such as bucket location or geo-replication.
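The flat namespace can be modeled in a few lines: a bucket is a mapping from unique keys to opaque bytes. This toy in-memory class is hypothetical and only illustrates the data model, not how Scaleway stores objects.

```python
class FlatObjectStore:
    """Toy in-memory model: buckets map unique keys to opaque bytes (no directories)."""

    def __init__(self) -> None:
        self._buckets: dict[str, dict[str, bytes]] = {}

    def create_bucket(self, name: str) -> None:
        # Bucket names act as unique identifiers across the whole store.
        if name in self._buckets:
            raise ValueError(f"bucket {name!r} already exists")
        self._buckets[name] = {}

    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        # The key is just a string: "dir1/file1" creates no intermediate directory.
        self._buckets[bucket][key] = data

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._buckets[bucket][key]

    def keys(self, bucket: str) -> list[str]:
        return sorted(self._buckets[bucket])

store = FlatObjectStore()
store.create_bucket("my-bucket")
# Content is opaque: video, photo, or text bytes are all stored the same way.
store.put_object("my-bucket", "dir1/file1", b"\x00\x01 binary payload")
store.put_object("my-bucket", "notes.txt", "hello".encode())
print(store.keys("my-bucket"))
```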
Within a bucket, all objects sit at the same, first level. This is one of the biggest differences from traditional systems such as filesystems. Let's look at an example:
Filesystem       | Object storage
- dir1/          | - dir1/
    - file1      | - dir1/file1
    - file2      | - dir1/file2
    - file3      | - dir1/file3
In this example, the directory dir1 is replaced by the prefix dir1. All objects (file1, file2 and file3) share the prefix dir1 but are no longer contained in a directory called dir1.
Object storage is fundamentally different from filesystem in this regard.
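Tools that display folders on top of a bucket typically emulate them from key prefixes. S3 listings support this through `prefix` and `delimiter` parameters, which group keys into "common prefixes". The function below is a simplified local re-implementation of that behavior for illustration, not the service's code.

```python
def list_with_delimiter(
    keys: list[str], prefix: str = "", delimiter: str = "/"
) -> tuple[list[str], list[str]]:
    """Simplified S3-style listing: objects directly under `prefix`, plus common prefixes."""
    contents: list[str] = []
    common_prefixes: set[str] = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue  # outside the requested "folder"
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the next delimiter is reported once, like a sub-folder.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return contents, sorted(common_prefixes)

keys = ["dir1/file1", "dir1/file2", "dir1/file3", "readme.txt"]
print(list_with_delimiter(keys))                  # (['readme.txt'], ['dir1/'])
print(list_with_delimiter(keys, prefix="dir1/"))  # (['dir1/file1', 'dir1/file2', 'dir1/file3'], [])
```

No object named `dir1` exists in this listing: the "folder" is only a view computed from the keys.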
Projects such as s3fs, which mount a bucket as a local filesystem, can help you migrate from a traditional filesystem to object storage.
Another key difference is that you cannot partially update the content of an object. Objects are immutable: if you want to modify one, the entire object must be uploaded again.
This has an impact on software that needs to update files frequently, such as databases. Databases rely on fast access times and on the ability to seek within a file (jump to a given byte to read or modify part of the file).
If your filesystem is actually backed by object storage, access times will grow by several orders of magnitude, to several milliseconds, which will considerably slow down your database. Moreover, if you modify only two bytes in a file of just a few MB, you will have to send the entire file back over the network to update the object.
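The cost of a small edit can be sketched with a toy store that only supports whole-object writes. The class and the `patch_bytes` helper are hypothetical; the sketch simply counts the bytes uploaded when two bytes change in a 5 MiB object.

```python
class ImmutableObjectStore:
    """Toy store: objects can only be written whole; tracks bytes sent over the 'network'."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}
        self.bytes_uploaded = 0

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = bytes(data)
        self.bytes_uploaded += len(data)

    def get(self, key: str) -> bytes:
        return self.objects[key]

def patch_bytes(store: ImmutableObjectStore, key: str, offset: int, new_bytes: bytes) -> None:
    """Read-modify-write: fetch the whole object, edit it locally, upload it again in full."""
    data = bytearray(store.get(key))
    data[offset:offset + len(new_bytes)] = new_bytes  # same length, so the size is unchanged
    store.put(key, bytes(data))

store = ImmutableObjectStore()
store.put("db/table.dat", bytes(5 * 1024 * 1024))  # a 5 MiB object of zero bytes
before = store.bytes_uploaded
patch_bytes(store, "db/table.dat", offset=1_000, new_bytes=b"\x01\x02")
print(store.bytes_uploaded - before)  # 5242880: two changed bytes cost a full 5 MiB upload
```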
In the next articles, we will see how Object Storage works internally and how its infrastructure was built.
Create your first bucket now!