From: Wols Lists <antlists@youngman.org.uk>
To: Emmanuel Florac <eflorac@intellique.com>
Cc: Dave Chinner <david@fromorbit.com>,
linux-xfs@vger.kernel.org, linux-raid@vger.kernel.org
Subject: Re: Growing RAID10 with active XFS filesystem
Date: Fri, 12 Jan 2018 17:52:59 +0000 [thread overview]
Message-ID: <5A58F5FB.7000101@youngman.org.uk> (raw)
In-Reply-To: <20180112152538.2c4d8cd4@harpe.intellique.com>
On 12/01/18 14:25, Emmanuel Florac wrote:
> Le Fri, 12 Jan 2018 13:32:49 +0000
> Wols Lists <antlists@youngman.org.uk> écrivait:
>
>> On 11/01/18 03:07, Dave Chinner wrote:
>>> XFS comes from a different background - high performance, high
>>> reliability and hardware RAID storage. Think hundreds of drives in a
>>> filesystem, not a handful. i.e. The XFS world is largely enterprise
>>> and HPC storage, not small DIY solutions for a home or back-room
>>> office. We live in a different world, and MD rarely enters mine.
>>
>> So what happens when the hardware raid structure changes?
>
> hardware RAID controllers don't expose RAID structure to the software.
> So as far as XFS knows, a hardware RAID is just a very large disk.
> That's where the stripe unit and stripe width options to mkfs.xfs make
> sense.
Umm... so you can't partially populate a chassis and add more disks as
you need them? You have to pass stripe unit and width manually at
creation time, and then they're set in stone? Sorry, that doesn't sound
very enterprisey to me :-(
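For reference, this is the geometry being talked about - it's passed at
mkfs time. A sketch only; the device name, chunk size, and data-disk
count are hypothetical, so check your controller's actual layout:

```shell
# Hypothetical 8-disk hardware RAID6 (6 data disks) with a 256 KiB chunk.
# su = chunk size, sw = number of data-bearing disks in one stripe.
mkfs.xfs -d su=256k,sw=6 /dev/sdb

# After mounting, check what the filesystem recorded
# (sunit/swidth are reported in 512-byte sectors):
xfs_info /srv/data
```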
>
>> Ext allows you to grow a filesystem. Btrfs allows you to grow a
>> filesystem. Reiser allows you to grow a file system. Can you add more
>> disks to XFS and grow the filesystem?
>
> Of course. xfs_growfs is your friend. It worked on mounted filesystems
> many years before that functionality came to other filesystems.
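For anyone following along, the growfs step is a one-liner on the
*mounted* filesystem (mount point hypothetical):

```shell
# Grow the data section to fill whatever size the underlying device now is:
xfs_growfs /srv/data

# Or grow to an explicit size, given in filesystem blocks:
xfs_growfs -D 268435456 /srv/data
```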
>
>> My point is that all this causes geometries to change, and ext and
>> btrfs amongst others can clearly handle this. Can XFS?
>
> None of XFS, ext4, or btrfs can handle this. That's why Dave mentioned
> that growing your RAID is almost always the wrong solution.
> A much better solution is to add a new array and use LVM to aggregate
> it with the existing ones.
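Spelling that suggestion out, it would look something like this (the VG
and LV names, devices, and mount point are all made up):

```shell
# The new array becomes a PV, joins the existing volume group,
# and the logical volume grows over it:
pvcreate /dev/md1
vgextend vg0 /dev/md1
lvextend -l +100%FREE /dev/vg0/data
xfs_growfs /srv/data   # XFS then grows into the new space
```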
Isn't this what btrfs does with a rebalance? And I may well be wrong,
but I got the impression that some file systems could change stripe
geometries dynamically.
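The btrfs rebalance I'm thinking of is roughly this (a sketch only;
device and mount point are hypothetical):

```shell
# Add a device, then rewrite existing block groups across the new layout:
btrfs device add /dev/sde /mnt
btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt
btrfs balance status /mnt   # check progress
```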
Adding a new array, imho, breaks the KISS principle. I now have multiple
arrays sitting on the hard drives (wasting parity disks if I'm running
raid5/6), multiple LVM physical volumes on top of that, and then the
filesystem sitting on top of multiple volumes.
As a hobbyist I want one array, one LVM on top of it, and one filesystem
per volume. Anything else starts to get confusing. And if I were a
professional sys-admin I'd want that in spades! It's all very well
expecting a sys-admin to cope, but the fewer booby traps and landmines
left lying around, the better!
Squaring the circle, again :-(
>
> Basically growing an array and then the filesystem on it generally
> works OK, BUT it may kill performance (or not). YMMV. At the least, you
> *probably won't* get the performance gain that the new stripe width
> would give you if you were starting from scratch.
>
Point taken - but how are you going to back up your huge petabyte XFS
filesystem so you can recreate it and get full performance on the bigger
array? Catch-22 ...
>> Because if it can, it seems to me the obvious solution to changing
>> raid geometries is that you need to grow the filesystem, and get that
>> to adjust its geometries.
>
> Unfortunately that's nigh impossible. No filesystem in existence does
> that. The closest thing is ZFS's ability to dynamically change stripe
> sizes, but when you extend a ZFS zpool it doesn't rebalance existing
> files and data (and offers absolutely no way to do it). Sorry, no pony.
>
Well, how does RAID get away with it, rebalancing and restriping
everything? :-)
Yes I know, it's a major change if the original file system design
didn't allow for it, and major file system changes can be extremely
destructive to user data ...
>> Bear in mind, SUSE has now adopted XFS as the default filesystem for
>> partitions other than /. This means you are going to get a lot of
>> "hobbyist" systems running XFS on top of MD and LVM. Are you telling
>> me that XFS is actually very badly suited to be a default filesystem
>> for SUSE?
>
> Doesn't seem so. In fact XFS is less permissive than other filesystems,
> and it's a *darn good thing* IMO. It's better having frightening error
> messages "XFS force shutdown" than corrupted data, isn't it?
False dichotomy, I'm afraid. Do you really want a filesystem that
guarantees integrity, but trashes performance when you want to take
advantage of features such as resizing? I'd rather have integrity,
performance *and* features :-) (Pick any two, I know :-)
>
>> What concerns me here is, not having a clue how LVM handles changing
>> partition sizes, what effect this will have on filesystems ... The
>> problem is the Unix philosophy of "do one thing and do it well".
>> Sometimes that's just not practical.
>
> LVM volume changes are propagated to upper levels.
And what does the filesystem do with them? If LVM is sat on MD, what then?
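In fairness, the propagation can even be done in one step (names
hypothetical); whether that does the right thing when MD is underneath
is exactly my question:

```shell
# -r / --resizefs asks LVM to invoke the filesystem's own grow tool
# (via fsadm) after extending the logical volume:
lvextend -r -L +100G /dev/vg0/data
```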
>
> If you don't like Unix principles, use Windows then :)
>
The phrase "a rock and a hard place" comes to mind. Neither was
designed with commercial solidity, integrity, and reliability in mind.
And having used commercial systems I get the impression NIH is alive and
kicking far too much. Both Linux and Windows are much more reliable and
solid than they were, but too many of those features are bolt-ons, and
they feel like it ...
>> The Unix philosophy says "leave
>> partition management to lvm, leave redundancy to md, leave the files
>> to the filesystem, ..." and then the filesystem comes along and says
>> "hey, I can't do my job very well, if I don't have a clue about the
>> physical disk layout". It's a hard circle to square ... :-)
>
> Yeah, that was apparently the very same thinking that brought us ZFS.
>
>> (Anecdotes about btrfs are that it's made a right pig's ear of trying
>> to do everything itself.)
>>
>
> Not so sure. Btrfs is excellent, taking into account how little love it
> received for many years at Oracle.
>
Yep. The solid features are just that - solid. Snag is, a lot of the
nice features are still experimental, and dangerous! Parity raid, for
example ... and I've heard rumours that the flaws could be unfixable, at
least not until btrfs-2 whenever that gets started ...
When MD adds disks, it rewrites the array from one end to the other,
moving everything over to the new layout. Is there no way a filesystem
can do the same sort of thing? Okay, it would probably need to be a
defrag-like utility, and Linux prides itself on not needing defrag :-)
Or could it simply switch over to optimising for the new geometry,
accept the fact that the reshape will have caused hotspots, and every
time it rewrites (meta)data, it adjusts it to the new geometry to
reduce/remove hotspots over time?
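For comparison, the MD reshape I mean is just this (device names
hypothetical):

```shell
# Add a disk and restripe the whole array onto the new geometry in place:
mdadm --add /dev/md0 /dev/sde1
mdadm --grow /dev/md0 --raid-devices=5 --backup-file=/root/md0-grow.bak
cat /proc/mdstat   # reshape progress
```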
Cheers,
Wol