
Replacing a failed drive in a software RAID

Reviewed on 19 September 2023 • Published on 26 August 2022

Each Elastic Metal server is configured with software RAID1 when it is installed from the Scaleway console. To change the RAID configuration of the server, for example to replace a failed drive, modify the RAID array in rescue mode.

Security & Identity (IAM):

You may need certain IAM permissions to carry out some actions described on this page. This means:

  • you are the Owner of the Scaleway Organization in which the actions will be carried out, or
  • you are an IAM user of the Organization, with a policy granting you the necessary permission sets

Removing the failed disk from the RAID configuration

Tip:

It is recommended to make a backup of your data before proceeding.

  1. Boot the server in rescue mode from the Scaleway console.

  2. Log in to the server using the rescue account:

    ssh em-XXX@<your_elastic_metal_ip>
    Tip:

    The rescue credentials are available from your server’s status page in the Scaleway console.

  3. Run the following command to make sure all disk caches are written to the disk:

    sync
  4. Mark the failed disk as failed using mdadm:

    mdadm --manage /dev/md0 --fail /dev/sdb2
  5. Display the existing RAID devices by running the following command:

    cat /proc/mdstat

    Output similar to the following displays:

    Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
    md126 : active (auto-read-only) raid1 sdb3[1] sda3[0]
          974869504 blocks super 1.2 [2/2] [UU]
          resync=PENDING
          bitmap: 8/8 pages [32KB], 65536KB chunk
    md127 : active (auto-read-only) raid1 sdb2[1](F) sda2[0]
          523264 blocks super 1.2 [2/2] [UU]
    unused devices: <none>

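    The faulty device is marked with (F). Array names can differ between systems (md126 and md127 in this example); in the mdadm commands, use the name of the array that contains the failed partition.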

  6. Remove the failed disk using the mdadm --manage command:

    mdadm --manage /dev/md0 --remove /dev/sdb2
  7. Contact technical support to replace the failed disk with a working one.
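
The (F) check from step 5 can also be scripted, which helps when several arrays are present. The following is a minimal sketch, not part of mdadm: the function name `failed_members` is hypothetical, and it assumes the `/proc/mdstat` layout shown in the example output above, where each array line starts with the array name followed by ` : `.

```shell
# Print "<array> <member>" for every RAID member flagged (F).
# Hypothetical helper, assuming the /proc/mdstat layout shown above.
# Usage: failed_members [path]   (path defaults to /proc/mdstat)
failed_members() {
    awk '
        # Array lines look like: md127 : active raid1 sdb2[1](F) sda2[0]
        $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)/) {
                    member = $i
                    sub(/\[.*$/, "", member)   # drop the "[1](F)" suffix
                    print $1, member
                }
            }
        }
    ' "${1:-/proc/mdstat}"
}
```

Run against the example output above, `failed_members` would report `md127 sdb2`, which gives the array and partition to pass to `mdadm --manage ... --remove`. It only reads the file and does not modify any array.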

Adding the replacement disk to the RAID

  1. Once the failed disk is replaced, copy the partition table of the source disk to the new disk:

    sfdisk -d /dev/sda | sfdisk /dev/sdb
    Important:

    The sfdisk command above overwrites the entire partition table on the new disk with that of the source disk. Modify the command if you need to preserve other partition information on the new disk.

  2. Add the new partition to the array with mdadm to rebuild the mirror from the source disk:

    mdadm --manage /dev/md0 --add /dev/sdb2
  3. Verify the status of the configuration:

    mdadm --detail /dev/md0
    Tip:

    Use the following command to follow the recovery progress of the mirror:

    cat /proc/mdstat
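
To follow the rebuild from a script rather than by reading the output manually, the progress percentage can be extracted from `/proc/mdstat`. This is a minimal sketch under one assumption: during a rebuild, the kernel reports a line of the form `recovery = 12.6% (...)` or `resync = 12.6% (...)`. The function name `recovery_progress` is hypothetical.

```shell
# Print the rebuild percentage (for example "12.6%") for each array that is
# currently recovering or resyncing; print nothing when no rebuild is running.
# Hypothetical helper. Usage: recovery_progress [path]   (defaults to /proc/mdstat)
recovery_progress() {
    sed -nE 's/.*(recovery|resync) *= *([0-9.]+%).*/\2/p' "${1:-/proc/mdstat}"
}
```

An empty result means that no array reports an active rebuild. Lines such as `resync=PENDING` are ignored because they carry no percentage.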