Migrating My Root Drive RAID1 Array to a Pair of NVMe Drives

I’ve always tried to keep upgrading my own Linux boxes. I enjoy it, and as I found out a few years ago, if I don’t keep updating hardware regularly I quickly lose the knowledge of how to do it.
My latest project is to move all the services off my workstation and onto a separate box. I’ve got a few RPis around the house running various services, but I’ve always used my workstation as a games machine/server/development box/everything else. In fact, for a number of years I stopped using Linux as a workstation at all and this machine was a headless server.
Because of this cycle of continuous upgrades, this computer has existed in some form for probably twenty years, always running some flavour of Linux (mainly Gentoo).
Currently it’s a space heater of a server build: a pair of Xeon E5-2697 v2 CPUs on a Supermicro X9DRi-LN4 motherboard with 128GB of ECC DDR3 RAM (very cheap!). That’s fine over winter (my office has no heating), but I do need to finish transferring all the services to the low power Debian box under my desk instead!
Anyway, this computer boots from a pair of SATA SSDs in a RAID1 array, with a six disk RAID10 array for data. That data array needs to be replaced by a single large drive once I’ve finished moving services to the new machine!
The motherboard is too old to EFI boot from NVMe drives. However, whilst browsing Reddit I came across some people talking about using an adaptor card to add four NVMe drives to a single slot, using bifurcation to give each drive its own four PCIe lanes, which is what NVMe devices need. So x4/x4/x4/x4 instead of x16.
This was not originally supported on this board, but it turns out Supermicro released a newer BIOS that does support bifurcation.
So I bought the card they suggest and a pair of 1TB NVMe drives. The drives are only PCIe v3 as that’s all the motherboard supports. PCIe is backwards/forwards compatible, but PCIe v4 drives are considerably more expensive than v3 ones. I may as well get a pair of these now; by the time I upgrade to a PCIe v4 motherboard the available drives will likely be larger and cheaper!
– Asus M.2 x16 Gen 4 adaptor card
– 2 x WD Blue SN570 NVMe SSD 1TB
The adaptor and drives arrived. The adaptor has a lovely heatsink that sandwiches the drives in, along with a small low noise fan.
The adaptor took ten minutes to install. In the BIOS setup, enabling bifurcation was a little tricky as the slots are numbered from the bottom; this one was CPU1/slot 1.
I had to recompile the kernel to add NVMe support, but once booted the pair of drives were there.
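For reference, the NVMe driver lives under Device Drivers -> NVME Support in menuconfig; the kernel options involved are roughly these (the exact set of extra NVMe options varies by kernel version):

# NVMe driver built in (=m also works if your initramfs includes it)
CONFIG_NVME_CORE=y
CONFIG_BLK_DEV_NVME=y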
After many, many years of using /dev/sdX to refer to storage devices (I was using SCSI hardware before SATA), it does seem a little strange to be running parted on /dev/nvme1n1 and getting partition devices like /dev/nvme1n1p2.
I know I should probably move to ZFS, but I’m knowledgeable enough about mdadm not to completely mess things up! And replacing a pair of RAID1 devices is just so easy with mdadm.

The workflow is:
– Partition the new drive.
– Add it to the RAID array as a spare.
– Fail the drive to be removed, then remove it.
– Wait until the RAID1 array is synced again.
– Repeat with the second drive.
– Resize the array, then resize the filesystem.

Procedure

fdisk /dev/nvme0n1

We can use fdisk again, as modern fdisk is GPT-aware. Previously we’d always used parted for GPT disks, but I prefer fdisk as I know it!
In fdisk (or scripted with sgdisk, as sketched below):
– Label the drive as GPT.
– Make a 256MB partition and mark it as an EFI System partition.
– Make a second partition covering the rest of the drive and mark it as type Linux RAID.
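Roughly the same layout can be created non-interactively with sgdisk; this is a sketch rather than what I actually typed, so double-check the device name before running it:

# ef00 = EFI System partition, fd00 = Linux RAID
sgdisk --clear \
       --new=1:0:+256M --typecode=1:ef00 \
       --new=2:0:0 --typecode=2:fd00 \
       /dev/nvme0n1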

Now add that drive to our RAID1 array. For some reason it was not added as a spare, but was instead immediately synced to make a three drive RAID1 array. I think this is because I originally created this array as a three drive array (for reasons I forget), and I guess that raid-devices count is stored in the array’s metadata.

mdadm /dev/md127 --add /dev/nvme0n1p2

We can watch the progress:

watch cat /proc/mdstat
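Alternatively (my addition, not in the original notes), mdadm itself will report the rebuild status and percentage:

mdadm --detail /dev/md127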

Once the sync has completed, we can fail and then remove the old drive:

mdadm --manage /dev/md127 --fail /dev/sdh3
mdadm /dev/md127 --remove /dev/sdh3

Then let’s update our mdadm.conf file

mdadm --detail --scan >> /etc/mdadm/mdadm.conf

Then remove the old lines:

vi /etc/mdadm/mdadm.conf

Finally, let’s wipe the RAID metadata from the old partition so that mdadm does not try to reassemble it into the array.

wipefs -a /dev/sdh3
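For what it’s worth, wipefs -a clears the md superblock signature along with any other filesystem signatures it finds; if you only want to clear the RAID metadata, the more targeted command is:

mdadm --zero-superblock /dev/sdh3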

A reboot is a good idea now to ensure the array is correctly assembled (and the new partition table re-read).

Now let’s copy the partition table to the second new drive.

sgdisk /dev/nvme0n1 -R /dev/nvme1n1

Then randomise the GUIDs (the copy leaves both drives with identical disk and partition GUIDs):

sgdisk -G /dev/nvme1n1

Check all is OK before continuing.
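A quick way to eyeball that (my own check, not part of the original write-up):

sgdisk -p /dev/nvme1n1   # print the copied partition table
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT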

Now repeat the process with the second new drive (again trimming the stale lines from mdadm.conf afterwards):

mdadm /dev/md127 --add /dev/nvme1n1p2
mdadm --manage /dev/md127 --fail /dev/sdg3
mdadm /dev/md127 --remove /dev/sdg3
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
wipefs -a /dev/sdg3

Resize after adding the new devices

mdadm --grow --size=max /dev/md127
df -h

Then resize the filesystem:

resize2fs -p /dev/md127
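To confirm the new size has taken effect (my own sanity check; this assumes the array holds the root filesystem):

mdadm --detail /dev/md127 | grep 'Array Size'
df -h /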

Benchmarks

dd if=/dev/zero of=/home/chris/TESTSDD bs=1G count=2 oflag=dsync 

2+0 records in
2+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 4.7162 s, 455 MB/s

dd if=/dev/zero of=/mnt/storage/TESTSDD bs=1G count=2 oflag=dsync

2+0 records in
2+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 10.0978 s, 213 MB/s

dd if=/home/chris/TESTSDD of=/dev/null bs=8k

262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 0.666261 s, 3.2 GB/s

dd if=/mnt/storage/TESTSDD of=/dev/null bs=8k

262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 0.573111 s, 3.7 GB/s

I did think that the write speed would be faster. But oflag=dsync forces synchronous writes, and dd is not the most accurate of benchmark tools anyway.
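For a more representative number, fio would be the usual tool; a minimal sequential write test might look something like this (assuming fio is installed, writing a 2GB scratch file under /home/chris):

fio --name=seqwrite --directory=/home/chris --rw=write \
    --bs=1M --size=2G --direct=1 --ioengine=libaio \
    --iodepth=16 --group_reporting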

Use of Backup in Anger!

I lost my entire RAID10 array yesterday. In a fit of “too much noise in the office” I removed the hot swap SCSI array box from my workstation, attached it to a wooden platform, and suspended it in a large plastic box using an old inner tube from my bike. This really reduced the noise. However, like a moron, I did not attach the SCSI cable properly and two drives got kicked from the array. That in itself was not a problem. What was a problem was that I then tried to re-assemble the array without checking the cable, and ended up wiping one of the RAID partitions. Still not a major issue, except I subsequently zeroed out the superblock of the missing drive in order to add it back in. Anyway, that was my array lost!

As my main backup strategy I use a homebrewed incremental rsync script to back up my Linux workstation every night to a 2TB ReadyNAS+ box (the Macs are backed up with a combination of Time Machine and SuperDuper!). So now I had a chance to test it out. After recreating the array and copying the data back across the network I was back up and running!
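For context, the general shape of that kind of incremental backup (a sketch with illustrative paths, not my actual script) is to hard-link unchanged files against the previous snapshot using rsync’s --link-dest:

#!/bin/sh
# Sketch only: destination layout and naming are illustrative, not my real setup.
SRC=/home/
DEST=/mnt/backup/SCOTGATEHome
DATE=$(date +%Y-%m-%d)

# Files unchanged since the last run are hard-linked from the "current" snapshot.
rsync -a --delete --link-dest="$DEST/current" "$SRC" "$DEST/$DATE/"
ln -snf "$DATE" "$DEST/current"

The restore itself, after recreating the array, went like this: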

mdadm --create /dev/md1 --chunk=256 -R  -l 10 -n 4 -p f2 /dev/sd[abcd]3 
echo 300000 >> /sys/block/md1/md/sync_speed_max 
watch cat /proc/mdstat 
mkfs.xfs /dev/md1
mount /home
rsync -avP /mnt/backup/SCOTGATEHome/current/ /home/

It took about an hour for the array to sync, and then three hours to copy the 156GB of files back over the network.

It all worked great, and I’m very pleased to know that my backup strategy is working!

Now back to completing the “silent and suspended hard drive array”!

Linux RAID Wiki

A lot of people keep coming back to a post from 2006 I wrote about expanding a RAID5 array. Although that post is still very much relevant, two years is a long time in the Linux kernel world and things have moved on. Rather than use the information here, you’re better off going to the Linux RAID wiki. This is very much an active and up to date resource for playing with, creating, and using Linux software RAID and mdadm. It contains the thoughts and resources of many people from the very interesting Linux kernel RAID mailing list.

Linux Raid Wiki

There’s a lot of outdated stuff concerning Linux software RAID out there. Over the last six months there’s been a concerted effort by people on the Linux RAID mailing list to improve this situation. The continuing results can be seen on this wiki:

http://linux-raid.osdl.org/

It’s a great resource already, and getting better. Go have a look if you want to know about the current status of just what cool stuff you can do.

RAID10 over 6 devices

I’ve been struggling to find the best layout for my six drive RAID10 device. I settled on the far layout with two copies (f2) and a 1M chunk size. As far as I understand it, this gives read/write speed close to RAID0: each block has two copies, stored on two different drives and striped across the array, so you get something like RAID0 speed with the redundancy of RAID1. Plus, since this is kernel RAID10, it’s a lot easier to create and more flexible than nesting RAID1 and RAID0 arrays by hand.

mdadm --create /dev/md1 --chunk=1024 --level=raid10 --layout=f2 --raid-devices=6 /dev/sd[defghi]1
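Once the array is assembled, the layout and chunk size can be double-checked (a quick sanity check, not from the original post):

mdadm --detail /dev/md1 | grep -E 'Layout|Chunk Size'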

So far the speed seems to be OK, though nothing at all spectacular. At the moment I think my limitation is that I am running a 64bit PCI-X U320 card in a 32bit PCI slot, since my motherboard only has those. I need to get myself a PCI-X dual socket 604 motherboard; trouble is, these are quite expensive!

In my quest to silence the machine I replaced the fan in my graphics card with a quieter one. Unfortunately, whilst doing so I knocked off a small component. The card works fine, except that I now get visual artefacts on any hardware accelerated graphics. I’ve always steered away from upgrading my motherboard because I would also need to buy a new GPU to go with PCI-E slots. But if this card is now damaged, perhaps I should splurge. Hmmm.

RAID 5 to 6 conversion possible?

A recent post on Raj’s blog made me think about giving RAID6 a play. That makes me wonder whether a RAID 5 to 6 conversion is possible. I would guess it’s impossible without completely reformatting, but then again growing a RAID 5 array sounds like a similar problem and that is now possible. I would suggest that both require restriping the data. Or am I really talking out of my behind?

I recently used mdadm’s new grow function to increase my three drive RAID5 array to four drives. This worked perfectly. I trust my backups, so I’m quite happy to risk the array’s integrity to convert it to RAID6. Hmmmm!

Growing a RAID5 array – MDADM

With the release of kernel 2.6.17, there’s new functionality to add a device (partition) to a RAID 5 array and make this new device part of the actual array rather than a spare.

I came across this post a while ago on the LKML.

LKML: Sander: RAID5 grow success (was: Re: 2.6.16-rc6-mm2)

So I gave it a go. My home directory is mounted on a 3 x 70GB SCSI RAID5 array, so I tried adding a further drive.

Although with the release of mdadm > 2.4 the one really critical part of the process is safer (it backs up the live data being copied during the critical section), I didn’t fancy risking growing a mounted array. So I did plenty of backups, then switched to the single user runlevel.

Basically the process involves adding a disk to the array as a spare, then growing the array onto this device.

mdadm --add /dev/md1 /dev/sdf1
mdadm --grow /dev/md1 --raid-devices=4
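The reshape can be watched in the same way as any other resync:

watch cat /proc/mdstat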

This then took about 3 hours to reshape the array.

The filesystem then needs to be expanded to fill up the new space.

fsck.ext3 /dev/md1
resize2fs /dev/md1

I then remounted the drive and, wahey, lots of extra space! Cool or what?

EDIT  It’s over 18 months since I wrote this post, and Linux kernel RAID and mdadm have continued to move on. All the info here is still current, but as extra info check out the Linux RAID Wiki.

EDIT 2  The Linux Raid Wiki has moved