So here we are, with the main server and the backup server that have been running faithfully for almost a decade about to get a serious hardware upgrade. Each of their arrays of eight 3TB drives is coming out and three spanking new Seagate IronWolf 14TB drives are replacing them.
In part 3 we finished most of the physical work. All we need to do now is install the new drives temporarily using the home-printed drive holders, wire the drives into the system and get the data shifted across. Basically, just software from now on.
What could possibly go wrong?
We took a break at the end of part 3 to ponder just that question. You can call it procrastination. I call it planning.
PLANNING HELPS, BELIEVE ME. If I’d started the data run right away it wouldn’t have been disastrous. But it would have been—shall we say—uncool.
You might think that hard drives know how to take care of themselves and when they start to get too hot will just throttle themselves down until they feel comfortable again. This is what the main processor on your motherboard does. Throttling slows down your computer but saves that crucial central component from burn-out.
Hard drives don’t behave like this, however. And high temperatures can be a factor in early failure. Not, however, nearly as much as people used to think.
I’ve already mentioned the famous study by Google published in 2007, which highlighted the shortcomings of SMART. That same paper also revealed that the conventional wisdom about temperature and drive failure—mainly a guess—was wrong. The correlation between running temperature and failure is weak, Google told us. Especially for new drives. Temperature only becomes more significant for drives in the second half of their natural lives, that is to say, drives that have been running for three years or more.
So with brand new drives, no problem. That became the new conventional wisdom.
But that’s not right, either. Astute analysis of Google’s data and conclusions points out that although drives are typically rated to run as hot as 60°C (70°C, in the case of these IronWolfs) this is significantly above the top temperature covered by Google’s data. Google was running live data centres, designed to preserve data, not test drives to destruction.
Not one of the 100,000 consumer hard drives that make up Google’s data set was allowed to exceed 50°C.
Not good enough. Overheating wouldn’t cause these IronWolfs to drop dead but it might well shorten their time on the planet. The IronWolf warranty runs for three years*; statistically, it would be reasonable to expect at least five years of service. No sense in chancing that.
My pre-migration tests revealed that with the drives set up and running, the airstream from the fan was escaping inefficiently around the sides. I can read the drive temperatures through FreeNAS and they seemed to be OK. But I didn’t want to take any chances—the three IronWolfs were going to be working hard during the data transfer and it was important to make sure they stayed within their operating temperature range. Or, better still, within the tolerances of Google’s data centres. And best of all, as low as possible.
Another session on the Prusa produced a couple of simple, grey airflow-directing cheeks. You can see one of them peeking in at the bottom left-hand side of the picture.
Nothing especially worth contributing to Thingiverse, just an ephemeral fix for the duration of the data transfer. Once that was done I’d be able to take out the old 3TB drives and there would be plenty of room to house the new IronWolfs properly and keep them properly cool.
Keep an eye on your drive temperatures, whatever operating system you’re running. Drives don’t like it hot.
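You don't need the FreeNAS web interface to do that, either. A sketch of a shell one-liner using smartctl, from the smartmontools package FreeNAS includes; the device names here are placeholders, so list your own first:

```shell
# Read each drive's temperature sensor via SMART.
# Device names (ada0, ada1, ada2) are examples; on FreeBSD/FreeNAS
# you can list your drives with: camcontrol devlist
for disk in ada0 ada1 ada2; do
    printf '%s: ' "$disk"
    smartctl -A "/dev/$disk" | awk '/Temperature_Celsius/ { print $10 " C" }'
done
```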
The Epic Data Migration
Before changing anything, I saved the FreeNAS configuration file from each of the servers onto a desktop machine. And onto a laptop, just in case. The config file stores settings such as tasks, services, user accounts, shares and so on—basically everything unique to the environment you’ve set up.
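On FreeNAS 11 that configuration lives in a small SQLite database, so the backup can be scripted as well as downloaded from the web UI's Save Config button. A sketch, with placeholder hostnames and the standard path:

```shell
# Pull the FreeNAS configuration database off the server.
# /data/freenas-v1.db is the standard location on FreeNAS 11;
# "freenas" stands in for your server's hostname or address.
scp root@freenas:/data/freenas-v1.db ~/backups/freenas-config-$(date +%F).db
```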
With everything physically installed and connected, I booted FreeNAS and created a new, empty zpool with the three IronWolfs, giving it a temporary name. Now to fill it with the old data.
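The web UI is the normal way to build a pool on FreeNAS, so the middleware keeps track of it, but under the hood the operation amounts to something like this. RAIDZ1 is an assumption about the layout, and the device names are placeholders:

```shell
# Create a pool from the three IronWolfs under a temporary name.
# RAIDZ1 across three drives gives one drive's worth of redundancy.
# Device names are examples; check yours before running anything like this.
zpool create newpool raidz1 /dev/ada4 /dev/ada5 /dev/ada6
```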
rsync's particular genius is moving data only when necessary. It can look at what's in the source pool, compare it with any data knocking around in the target pool and only bother to shift across what's needed.
Better still, rsync is clever enough to back up just the difference between two copies of a file. This is particularly important when you're working with very large files like disk images and virtual machine images.
In this particular case, of course, it was moving everything. But if I’d had to interrupt the transfer at some stage, rsync would know exactly how to pick up and carry on.
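The transfer itself boils down to a single command along these lines; the mount points are placeholders for wherever your pools live:

```shell
# Mirror the old pool onto the new one.
# -a preserves permissions, ownership and timestamps; -H keeps hard links;
# --delete removes anything in the target that no longer exists in the
# source, which matters on a re-run after an interrupted transfer.
# The trailing slash on the source copies its contents, not the directory.
rsync -aH --delete /mnt/oldpool/ /mnt/newpool/
```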
There’s a handy feature that’s very useful when you’re getting to know rsync. As you can imagine, a software tool that can shift your bits around fast and efficiently can do a lot of damage in the wrong hands, or in the right hands on a bad day.
Today’s “arts and crafts” applications—I’m thinking particularly of something like Photoshop or its excellent open source equivalent, Gimp—let you experiment with changes, giving you the ability to back out multiple steps if you mess up.
Something like that would be invaluable for what I was about to do.
However, rsync is a standard utility that works at a low level across a variety of different flavours of Unix. For this reason, it can't impose its own undo rules or take advantage of the unique abilities of any particular filesystem, which limits the scope for offering an "undo" feature for something as radical as swarming data from one place to another.
But programmers Andrew Tridgell and Paul Mackerras, who gave rsync to the world nearly a quarter of a century ago, were ahead of their time. They included a special flag you can add to the command line to show you what would happen if you carry out the command… without it actually happening:
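That flag is -n, or --dry-run in its long form. Here's a small demonstration you can run anywhere (the paths are throwaway temporary directories), showing that the rehearsal leaves the destination untouched:

```shell
# Rehearse a transfer with -n (--dry-run): rsync reports what it
# would copy without touching a single byte of the destination.
mkdir -p /tmp/rsync-demo/src /tmp/rsync-demo/dst
echo 'hello' > /tmp/rsync-demo/src/file.txt
rsync -avn /tmp/rsync-demo/src/ /tmp/rsync-demo/dst/
ls -A /tmp/rsync-demo/dst/    # prints nothing: the file was never copied
```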
ZFS has its own replication technique, ideologically purer but slightly more involved, requiring you to create a snapshot of the source data first. This is the preferred way of copying a whole zpool when there are users on the system, as the snapshot—the clue’s in the name—creates a very fast picture of the current state of the data. This picture is then used to define the backup.
Although the snapshot is fast, backing up that snapshot—as I discovered when I used ZFS replication instead of rsync on the backup server—isn’t significantly faster than rsync. You still have the same amount of data to shift.
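In outline, the ZFS route looks like this. Pool and snapshot names are placeholders; the -R flag on the send side carries any child datasets and their snapshots along with the stream:

```shell
# ZFS's own replication: snapshot the source, then stream it to the new pool.
zfs snapshot -r oldpool@migrate               # instant, read-only picture of the data
zfs send -R oldpool@migrate | zfs recv -F newpool   # stream that picture into the new pool
```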
New pools, old data
I did have users on the system and wanted to make sure the data landing up on the new drives exactly reflected the latest state of play. So I shut down all the file sharing to make sure that no new data would be accepted or existing data deleted.
The migration took just over one working day. Theoretically, moving 13TB across a 6Gb/s SATA III interface should be finished in a shade under five hours. The bottleneck here was the speed of the disks delivering and receiving data. The read-write heads move fast, pretty much vibrating like an electric toothbrush. But it’s physical, real-world speed, not digital electron speed.
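The back-of-envelope sum, taking the SATA III line rate at face value:

```shell
# 13 TB across a 6 Gb/s link, ignoring disk mechanics and protocol
# overhead: 6 Gb/s is 0.75 GB/s, so 13,000 GB / 0.75 GB/s in hours.
awk 'BEGIN { bytes = 13e12; rate = 6e9 / 8; printf "%.1f hours\n", bytes / rate / 3600 }'
# prints: 4.8 hours
```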
After dry-running rsync against both pools post-migration to double-check that all the data had made it across, I was in a position to remove the old pool.
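The dry-run flag doubles nicely as a verification pass. With --itemize-changes added, rsync prints one line per difference it finds, so silence means the two pools match; the mount points are again placeholders:

```shell
# Verify the migration: a dry run that itemizes every difference.
# No output means nothing left to transfer, i.e. the pools match.
rsync -aHn --delete --itemize-changes /mnt/oldpool/ /mnt/newpool/
```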
This involved detaching the old and new pools from FreeNAS—in ZFS that’s called exporting—powering down the system, and disconnecting and removing the old disks. I then physically installed the new disks into the server chassis, powered up and re-imported the new pool—the one with the temporary name—and renamed it with the old pool’s name.
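At the zpool level, the renaming trick looks like this, though on FreeNAS the detach and import are best done through the web UI so the middleware's records stay in step. Pool names are placeholders:

```shell
# Retire the old pool and give the new one its name.
zpool export oldpool           # detach the old pool
zpool export newpool           # detach the new pool too, ready for the rename
zpool import newpool oldpool   # re-import the new pool under the old name
```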
At this point, the server contained a new disk pool but with the original name, which FreeNAS recognised as if it were the old pool. I reset the FreeNAS configuration back to default, and uploaded the saved configuration.
As far as FreeNAS was concerned, everything was back as it was before the migration. That was the theory.
Well, not quite everything. I did have to delete and recreate all the shares—NFS, Windows and AFP—but this enabled me to double-check that permissions were properly set. It seems that there were differences between the old and new pools that the software was unable to resolve.
That aside, the technology worked flawlessly, both the software and the hardware doing what was required without failures.
Time will tell if my chosen configuration is the right one. I still need to move a few things around to make sure the data on both servers is identical and I’m getting the full backup I’m relying on.
Planning paid dividends. Perhaps it would have been a more entertaining story had there been more problems to write about. But for me, the learning point is that time spent planning saves time.
Where are we going with all this?
The switch from FreeNAS to TrueNAS Core we mentioned in part 2 isn’t expected to be production-ready until next year. A beta version is available for the curious-minded, but I wouldn’t recommend it to Tested Technology readers new to the game, and nobody should be using it for real data at this stage.
Throughout this story we've been calling FreeNAS "an operating system". That was shorthand: it would be more accurate to call it "middleware", a software layer between the actual operating system, FreeBSD, and the graphical interface the user sees.
I’m looking forward to the arrival of production-ready TrueNAS Core. But I’m not expecting much to change in the way I use my NAS. I know Linux well, but I won’t seize the opportunity to swap it in under the new TrueNAS Core middleware because, well, if it works, why break it?
I wouldn’t like to think that what you’ve been reading here has given you the impression I’m hunched over a terminal 24/7, tweaking and twiddling. The truth is, I don’t interact much with the server at that level. Everything, including regular data backups, data transfers, configuration backups, scrubs, snapshots and so on, is automated, either by the system or by scripts I’ve written myself.
And where are you going?
I hope this adventure of ours will have inspired you to investigate FreeNAS and perhaps give it a go either right away or when it assumes its grand new title of TrueNAS Core in a few months’ time.
Or, if you don’t want to go the full NAS route, you might be interested in investigating the very real advantages of the OpenZFS filesystem. We’re particularly fortunate to have had Seagate’s backing with the generous donation of some valuable assets. But you don’t need to start with multiple large drives. You can put a toe in the water with ZFS starting with a single drive and 8GB of RAM, pretty much the entry-level for PCs and laptops these days.
In fact, even a Raspberry Pi 3, costing less than £25, can run OpenZFS. And the very popular Ubuntu Linux operating system, which has had optional support for ZFS since October 2019, is currently adding an experimental module, ZSys, to make a ZFS-based install easy. Canonical, the outfit running Ubuntu, plans to backport a production-ready version of ZSys into the current Ubuntu 20.04 LTS operating system.
Whatever you do, Tested Technology wishes you the best of luck with it. Or, perhaps better than relying on luck—Tested Technology wishes you the best of planning.
Manek Dubash: 11-Aug-20