* implications of mixed mode
@ 2015-11-26 23:54 Lukas Pirl
2015-11-27 2:21 ` Qu Wenruo
2015-11-27 3:11 ` Duncan
0 siblings, 2 replies; 6+ messages in thread
From: Lukas Pirl @ 2015-11-26 23:54 UTC (permalink / raw)
To: linux-btrfs
Dear list,
if a larger RAID file system (say disk space of 8 TB in total) is
created in mixed mode, what are the implications?
From reading the mailing list and the Wiki, I can think of the following:
+ less hassle with "false positive" ENOSPC
- data and metadata have to have the same replication level
forever (e.g. RAID 1)
- higher fragmentation
(does this reduce with no(dir)atime?)
-> more work for autodefrag
Is that roughly what is to be expected? Any implications on recovery etc.?
In the specific case, the file system usage is as follows:
* data spread over ~20 subvolumes
* snapshotted with various frequencies
* compression is used
* mostly archive storage
* write once
* read infrequently
* ~500GB of daily rsync'ed system backup
Thanks in advance,
Lukas
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: implications of mixed mode
2015-11-26 23:54 implications of mixed mode Lukas Pirl
@ 2015-11-27 2:21 ` Qu Wenruo
2015-11-27 5:40 ` Roman Mamedov
2015-11-27 3:11 ` Duncan
1 sibling, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2015-11-27 2:21 UTC (permalink / raw)
To: Lukas Pirl, linux-btrfs
Lukas Pirl wrote on 2015/11/27 12:54 +1300:
> Dear list,
>
> if a larger RAID file system (say disk space of 8 TB in total) is
> created in mixed mode, what are the implications?
>
> From reading the mailing list and the Wiki, I can think of the following:
>
> + less hassle with "false positive" ENOSPC
If your "false positive" means unbalanced DATA/METADATA chunk
allocation, then yes.
> - data and metadata have to have the same replication level
> forever (e.g. RAID 1)
> - higher fragmentation
> (does this reduce with no(dir)atime?)
> -> more work for autodefrag
They are also true.
There are also some extra pros and cons due to the fixed (4K), small
(compared to the 16K default) nodesize:
+ Slightly higher performance
Node/leaf size is restricted to sectorsize; a smaller node/leaf means a
smaller range to lock.
In our SSD tests of highly concurrent operations, performance is
about 10% better overall than with the 16K nodesize.
And in extreme metadata-heavy cases, such as highly concurrent
sequential writes into small files, it can reach 8 times the performance
of the default 16K nodesize.
- Smaller maximum subvolume size
Since the tree blocks are smaller but the number of tree levels stays the
same (levels 0 - 7), the upper limit on subvolume size is reduced hugely
by the smaller node/leaf size.
That upper limit is quite hard to hit, though.
- (Possibly) less developer interest.
Other developers are trying to remove mixed-bg as a default, so I expect
the trend to be fewer developers focused on mixed-bg.
Hidden bugs then become harder and harder to hit and fix.
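
To put rough numbers on the subvolume size point, here is a small
back-of-the-envelope sketch. It assumes a tree block header of 101 bytes
and a key pointer of 33 bytes per child in an internal node; those two
sizes are my assumptions and are not stated in this thread, and real trees
are never perfectly packed, so treat the result as an upper bound only.

```python
HEADER_BYTES = 101   # assumed size of the on-disk tree block header
KEY_PTR_BYTES = 33   # assumed size of one key pointer in an internal node
MAX_LEVELS = 8       # levels 0..7, as stated above


def fanout(nodesize):
    """Maximum number of children one internal node can point to."""
    return (nodesize - HEADER_BYTES) // KEY_PTR_BYTES


def max_leaves(nodesize, levels=MAX_LEVELS):
    """Upper bound on leaves in a fully packed tree of the given height."""
    # levels - 1 layers of internal nodes sit above the leaf level (level 0).
    return fanout(nodesize) ** (levels - 1)


if __name__ == "__main__":
    for size in (4096, 16384):
        print(size, fanout(size), max_leaves(size))
    # How many times more leaves a 16K tree can address than a 4K tree.
    print(max_leaves(16384) // max_leaves(4096))
```

Under these assumptions the per-node fanout differs by about a factor of
four, and that factor is compounded once per internal level, which is why
the limit shrinks so much while still being very hard to reach in practice.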
>
> Is that roughly what is to be expected? Any implications on recovery etc.?
As long as your chunk tree and extent tree are OK, it shouldn't be much
different from a normal fs, at least for now.
Thanks,
Qu
>
> In the specific case, the file system usage is as follows:
> * data spread over ~20 subvolumes
> * snapshotted with various frequencies
> * compression is used
> * mostly archive storage
> * write once
> * read infrequently
> * ~500GB of daily rsync'ed system backup
>
> Thanks in advance,
>
> Lukas
* Re: implications of mixed mode
2015-11-26 23:54 implications of mixed mode Lukas Pirl
2015-11-27 2:21 ` Qu Wenruo
@ 2015-11-27 3:11 ` Duncan
2015-11-27 10:30 ` Lukas Pirl
1 sibling, 1 reply; 6+ messages in thread
From: Duncan @ 2015-11-27 3:11 UTC (permalink / raw)
To: linux-btrfs
Lukas Pirl posted on Fri, 27 Nov 2015 12:54:57 +1300 as excerpted:
> Dear list,
>
> if a larger RAID file system (say disk space of 8 TB in total) is
> created in mixed mode, what are the implications?
>
> From reading the mailing list and the Wiki, I can think of the
> following:
>
> + less hassle with "false positive" ENOSPC
> - data and metadata have to have the same replication level
> forever (e.g. RAID 1)
> - higher fragmentation
> (does this reduce with no(dir)atime?)
> -> more work for autodefrag
>
> Is that roughly what is to be expected? Any implications on recovery
> etc.?
To the best of my knowledge that looks reasonably accurate.
My big hesitancy would be over the fact that very few will run or test
mixed-mode at TB scale filesystem level, and where they do, it's likely
to be in order to work around the current (but set to soon be
eliminated) metadata-only (no data) dup mode limit on single-device,
since in that regard mixed-mode is treated as metadata and dup mode is
allowed.
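
For illustration, a minimal sketch of what such a setup would look like
on the command line. The helper name and the /dev/sdX device path are
hypothetical; it only builds the argument list and does not run anything.
Because mixed block groups hold data and metadata together, the same
profile is passed for both -d and -m. Whether a given btrfs-progs version
accepts a particular profile here is not something this sketch checks.

```python
def mkfs_mixed_argv(devices, profile):
    """Build (but do not run) an mkfs.btrfs command line for mixed mode."""
    if not devices:
        raise ValueError("at least one device is required")
    # Mixed block groups store data and metadata together, so a single
    # profile is used for both the data (-d) and metadata (-m) options.
    return ["mkfs.btrfs", "--mixed", "-d", profile, "-m", profile, *devices]


if __name__ == "__main__":
    # /dev/sdX is a placeholder, not a real device.
    print(" ".join(mkfs_mixed_argv(["/dev/sdX"], "dup")))
```

Keeping it as a pure function makes it easy to review the command before
handing it to anything that would actually format a device.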
So you're relatively more likely to run into rarely seen scaling issues
and perhaps bugs that nobody else has ever run into as (relatively)
nobody else runs mixed-mode on multi-terabyte-scale btrfs. If you want
to be the guinea pig and make it easier for others to try later on, after
you've flushed out the worst bugs, that's definitely one way to do it.
=:^]
> In the specific case, the file system usage is as follows:
> * data spread over ~20 subvolumes
> * snapshotted with various frequencies
> * compression is used
> * mostly archive storage
> * write once
> * read infrequently
> * ~500GB of daily rsync'ed system backup
It's worth noting that rsync... seems to stress btrfs more than pretty
much any other common single application. Its extremely heavy access
pattern just seems to trigger bugs that nothing else does, and while they
do tend to get fixed, it really does seem to push btrfs to the limits,
and there have been a /lot/ of rsync triggered btrfs bugs reported over
the years.
Between the stresses of rsyncing half a TiB daily and the relatively
untested quantity that is mixed-mode btrfs at multi-terabyte scales on
multi-devices, there's a reasonably high chance that you /will/ be
working with the devs on various bugs for awhile. If you're willing to
do it, great, somebody putting the filesystem thru those kinds of mixed-
mode paces at that scale is just the sort of thing we need to get
coverage on that particular not yet well tested corner case, but don't
expect it to be particularly stable for a couple kernel cycles anyway,
and after that, you'll still be running a particularly rare corner-case
that's likely to put new code thru its paces as well, so just be aware of
the relatively stony path you're signing up to navigate, should you
choose to go that route.
Meanwhile, assuming you're /not/ deliberately setting out to test a
rarely tested corner-case with stress tests known to rather too
frequently get the best of btrfs...
Why are you considering mixed-mode here? At that size the ENOSPC hassles
of unmixed-mode btrfs on say single-digit GiB and below really should be
dwarfed into insignificance, particularly since btrfs since 3.17 or so
deletes empty chunks instead of letting them build up to the point where
they're a problem, so what possible reason, other than simply to test it
and cover that corner-case, could justify mixed-mode at that sort of
scale?
Unless of course, given that you didn't mention number of devices or
individual device size, only the 8 TB total, you have in mind a raid of
something like 1000 8-GB USB sticks, or the like, in which case mixed-
mode on the individual sticks might make some sense (well, to the extent
that a 1000-device raid of /anything/ makes sense! =:^), given their 8-GB
each size.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: implications of mixed mode
2015-11-27 2:21 ` Qu Wenruo
@ 2015-11-27 5:40 ` Roman Mamedov
0 siblings, 0 replies; 6+ messages in thread
From: Roman Mamedov @ 2015-11-27 5:40 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Lukas Pirl, linux-btrfs
On Fri, 27 Nov 2015 10:21:31 +0800
Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> And some extra pros and cons due to fixed(4K) small(compared to 16K
> default) nodesize:
>
> + A little higher performance
> node/leaf size is restricted to sectorsize, smaller node/leaf,
> smaller range to lock.
> In our SSD test, operations with high concurrency, the performance is
> overall 10% better than 16K nodesize.
> And in extreme metadata operation case, like high concurrency on
> sequence write into small files, it can be 8 times the performance of
> default 16K nodesize.
This is surprising to read, as I thought 16K is generally faster and that's
why the default value was changed to it from 4K.
https://oss.oracle.com/~mason/blocksizes/
https://git.kernel.org/cgit/linux/kernel/git/mason/btrfs-progs.git/commit/?id=c652e4efb8e2dd76ef1627d8cd649c6af5905902
Seems like the 16K size prevents fragmentation, but since your SSDs do not care
much about fragmentation, that's not adding a benefit for them.
--
With respect,
Roman
* Re: implications of mixed mode
2015-11-27 3:11 ` Duncan
@ 2015-11-27 10:30 ` Lukas Pirl
2015-11-28 6:08 ` Duncan
0 siblings, 1 reply; 6+ messages in thread
From: Lukas Pirl @ 2015-11-27 10:30 UTC (permalink / raw)
To: Duncan; +Cc: linux-btrfs
On 11/27/2015 04:11 PM, Duncan wrote as excerpted:
> My big hesitancy would be over the fact that very few will run or test
> mixed-mode at TB scale filesystem level, and where they do, it's likely
> to be in order to work around the current (but set to soon be
> eliminated) metadata-only (no data) dup mode limit on single-device,
> since in that regard mixed-mode is treated as metadata and dup mode is
> allowed.
>
> So you're relatively more likely to run into rarely seen scaling issues
> and perhaps bugs that nobody else has ever run into as (relatively)
> nobody else runs mixed-mode on multi-terabyte-scale btrfs. If you want
> to be the guinea pig and make it easier for others to try later on, after
> you've flushed out the worst bugs, that's definitely one way to do it.
> =:^]
I see. This somehow aligns with Qu's answer.
> It's worth noting that rsync... seems to stress btrfs more than pretty
> much any other common single application. Its extremely heavy access
> pattern just seems to trigger bugs that nothing else does, and while they
> do tend to get fixed, it really does seem to push btrfs to the limits,
> and there have been a /lot/ of rsync triggered btrfs bugs reported over
> the years.
Well, IMHO btrfs /has/ to deal with rsync workloads if it wants to be
an alternative for larger storages but that is another story.
I have been running btrfs (non-mixed) with rsync workloads for quite a
while now and it is doing well (except for the deadlock that was around a
while ago). Maybe my network is just slow enough not to trigger any
unfixed weird issues with the intense access patterns of rsync.
Anyways, thanks for the hint!
> Between the stresses of rsyncing half a TiB daily and the relatively
> untested quantity that is mixed-mode btrfs at multi-terabyte scales on
> multi-devices, there's a reasonably high chance that you /will/ be
> working with the devs on various bugs for awhile. If you're willing to
> do it, great, somebody putting the filesystem thru those kinds of mixed-
> mode paces at that scale is just the sort of thing we need to get
> coverage on that particular not yet well tested corner case, but don't
> expect it to be particularly stable for a couple kernel cycles anyway,
> and after that, you'll still be running a particularly rare corner-case
> that's likely to put new code thru its paces as well, so just be aware of
> the relatively stony path you're signing up to navigate, should you
> choose to go that route.
Makes perfect sense. I think I sadly do not have the resources to be
that guinea pig…
> Meanwhile, assuming you're /not/ deliberately setting out to test a
> rarely tested corner-case with stress tests known to rather too
> frequently get the best of btrfs...
>
> Why are you considering mixed-mode here? At that size the ENOSPC hassles
> of unmixed-mode btrfs on say single-digit GiB and below really should be
> dwarfed into insignificance, particularly since btrfs since 3.17 or so
> deletes empty chunks instead of letting them build up to the point where
> they're a problem, so what possible reason, other than simply to test it
> and cover that corner-case, could justify mixed-mode at that sort of
> scale?
>
> Unless of course, given that you didn't mention number of devices or
> individual device size, only the 8 TB total, you have in mind a raid of
> something like 1000 8-GB USB sticks, or the like, in which case mixed-
> mode on the individual sticks might make some sense (well, to the extent
> that a 1000-device raid of /anything/ makes sense! =:^), given their 8-GB
> each size.
That is not the case. I was only considering it because I
wondered why mixed-mode is not generally preferred when data and
metadata have the same replication level.
Thanks Duncan!
Lukas
* Re: implications of mixed mode
2015-11-27 10:30 ` Lukas Pirl
@ 2015-11-28 6:08 ` Duncan
0 siblings, 0 replies; 6+ messages in thread
From: Duncan @ 2015-11-28 6:08 UTC (permalink / raw)
To: linux-btrfs
Lukas Pirl posted on Fri, 27 Nov 2015 23:30:05 +1300 as excerpted:
> On 11/27/2015 04:11 PM, Duncan wrote as excerpted:
>> My big hesitancy would be over the fact that very few will run or test
>> mixed-mode at TB scale filesystem level [s]o you're relatively more
>> likely to run into rarely seen scaling issues and perhaps bugs
>> It's worth noting that rsync... seems to stress btrfs more than pretty
>> much any other common single application. Its extremely heavy access
>> pattern just seems to trigger bugs that nothing else does, and while
>> they do tend to get fixed, it really does seem to push btrfs to the
>> limits, and there have been a /lot/ of rsync triggered btrfs bugs
>> reported over the years.
>
> Well, IMHO btrfs /has/ to deal with rsync workloads if it wants to be an
> alternative for larger storages but that is another story.
Yes; that's why they get fixed. =:^) I'm simply saying it's a strong
stressor, and on top of running the less tested setup that TB-scale mixed-
mode is, you're looking at an ideal test case for generating bugs. If
that's your intent...
But it seems not...
> I do run btrfs (non-mixed) with rsync workloads for quite a while now
> and it is doing well (except for the deadlock that has been around a
> while ago). Maybe my network is just slow enough to not trigger any
> unfixed weird issues with the intense access patterns of rsync. Anyways,
> thanks for the hint!
That's good to read. Actually, given the number of rsync-triggered bugs
fixed over the years, rsync should be reasonably solid now in the default
case, so it's not entirely surprising that it's working well for you.
But it's _still_ good to read, as rsync really has triggered quite a
number of bugs over the years, so if it's now working well, that really
does indicate btrfs is maturing and getting more solidly stable, now. =:^)
But of course if you do a less tested setup, rsyncing half a TB a day to
it, you're putting yourself in line to find a whole /new/ set of bugs.
That's primarily what I was saying.
> I think I sadly do not have the resources to be that guinea pig…
Makes sense.
It'd be nice to have that corner-case well tested, but because it /is/ a
reasonably rare corner-case, it's not like millions of users are going to
be stumbling over bugs if testing waits awhile or even never really
happens at all.
>> Meanwhile, assuming you're /not/ deliberately setting out to test,
>> [w]hy are you considering mixed-mode here? At that size the ENOSPC
>> hassles of unmixed-mode btrfs on single-digit GiB [small btrfs] really
>> should be dwarfed into insignificance, particularly since btrfs since
>> [auto-empty-chunk-deletion] so what possible reason [...] could
>> justify mixed-mode at that sort of scale?
> I just came to the consideration because I wondered why mixed-mode
> is not generally preferred when data and metadata have the
> same replication level.
The most direct answer is that mixed-mode is less efficient, and that it
really was designed for severely size constrained (under double-digit
GiB) btrfs, where the hassles of data vs. metadata chunks really are an
administration headache. It was never designed for even 100 GiB btrfs,
where the data vs. metadata hassles tend to be much less of a headache,
such that the inefficiency of mixed-mode tends to matter more than the
much more minor data vs. metadata hassles.
Compounding that are two additional factors, the first being that unlike
when mixed-mode was introduced, btrfs now deletes empty chunks, as well
as being a bit better at allocating smaller chunks as size gets tight, so
data vs. metadata hassles are at least an order of magnitude lower than
they were (many will never see ENOSPC due to poor data/metadata balance
at all now, while previously, it was generally only a matter of time if
people weren't routinely rebalancing to prevent it).
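
To make the mechanism concrete, here is a toy model of it. This is not how
the btrfs allocator is implemented; the class, the 10-unit chunk size and
the 4-chunk device are all made up for illustration. Space is handed out in
whole chunks that are typed data or metadata, a write fails once no
unallocated chunk is left for its type, and the reclaim_empty switch stands
in for the empty-chunk deletion mentioned above. With mixed=True both kinds
of writes share one pool, which is the mixed-mode case.

```python
import errno

CHUNK_UNITS = 10  # capacity of one chunk in arbitrary units (toy value)


class ToyFs:
    """Toy chunk allocator; an illustration only, not the btrfs allocator."""

    def __init__(self, total_chunks, reclaim_empty):
        self.free_chunks = total_chunks      # chunks not yet allocated
        self.reclaim_empty = reclaim_empty   # return empty chunks to the pool
        self.used = {}                       # kind -> list of per-chunk usage

    def write(self, kind, units):
        chunks = self.used.setdefault(kind, [])
        while units > 0:
            # Reuse the first chunk of this kind that still has room.
            target = next((i for i, c in enumerate(chunks) if c < CHUNK_UNITS), None)
            if target is None:
                if self.free_chunks == 0:
                    raise OSError(errno.ENOSPC, f"no room for a new {kind} chunk")
                self.free_chunks -= 1
                chunks.append(0)
                target = len(chunks) - 1
            step = min(units, CHUNK_UNITS - chunks[target])
            chunks[target] += step
            units -= step

    def delete(self, kind, units):
        chunks = self.used.setdefault(kind, [])
        # Free space starting from the most recently allocated chunk.
        for i in range(len(chunks) - 1, -1, -1):
            step = min(units, chunks[i])
            chunks[i] -= step
            units -= step
        if self.reclaim_empty:
            kept = [c for c in chunks if c > 0]
            self.free_chunks += len(chunks) - len(kept)
            self.used[kind] = kept


def hits_enospc(reclaim_empty, mixed=False):
    """Fill with data, delete it, then try to grow metadata."""
    data, meta = ("mixed", "mixed") if mixed else ("data", "meta")
    fs = ToyFs(total_chunks=4, reclaim_empty=reclaim_empty)
    fs.write(meta, 5)
    fs.write(data, 30)
    fs.delete(data, 30)
    try:
        fs.write(meta, 10)
    except OSError as exc:
        return exc.errno == errno.ENOSPC
    return False
```

In this toy, hits_enospc(False) reports a failure even though most of the
space is free, because it is all tied up in empty data chunks, while
hits_enospc(True) and hits_enospc(False, mixed=True) both succeed.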
The second additional factor is that mixed-mode gets /much/ less testing
at 100 GiB and above, because it simply wasn't designed for that and the
devs just don't test it, which means you're probably at least doubling
your chance of bugs. For that factor alone, many would actively negative-
recommend it, except for people who are really prepared to be guinea
pigs, and indeed, that's exactly what you've seen here.
But it's a reasonable question, and now you have reasonable answers. =:^)