implications of mixed mode


* implications of mixed mode
@ 2015-11-26 23:54 Lukas Pirl
  2015-11-27  2:21 ` Qu Wenruo
  2015-11-27  3:11 ` Duncan
  0 siblings, 2 replies; 6+ messages in thread
From: Lukas Pirl @ 2015-11-26 23:54 UTC
  To: linux-btrfs

Dear list,

if a larger RAID file system (say disk space of 8 TB in total) is
created in mixed mode, what are the implications?

From reading the mailing list and the Wiki, I can think of the following:

+ less hassle with "false positive" ENOSPC
- data and metadata have to have the same replication level
  forever (e.g. RAID 1)
- higher fragmentation
  (does this reduce with no(dir)atime?)
  -> more work for autodefrag

Is that roughly what is to be expected? Any implications on recovery etc.?

In the specific case, the file system usage is as follows:
* data spread over ~20 subvolumes
  * snapshotted with various frequencies
  * compression is used
* mostly archive storage
  * write once
  * read infrequently
* ~500GB of daily rsync'ed system backup

Thanks in advance,

Lukas

* Re: implications of mixed mode
  2015-11-26 23:54 implications of mixed mode Lukas Pirl
@ 2015-11-27  2:21 ` Qu Wenruo
  2015-11-27  5:40   ` Roman Mamedov
  2015-11-27  3:11 ` Duncan
  1 sibling, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2015-11-27  2:21 UTC
  To: Lukas Pirl, linux-btrfs



Lukas Pirl wrote on 2015/11/27 12:54 +1300:
> Dear list,
>
> if a larger RAID file system (say disk space of 8 TB in total) is
> created in mixed mode, what are the implications?
>
> From reading the mailing list and the Wiki, I can think of the following:
>
> + less hassle with "false positive" ENOSPC

If your "false positive" means unbalanced DATA/METADATA chunk 
allocation, then yes.
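
To make the "false positive" case concrete, here is a toy Python model, not btrfs code. In unmixed mode, data and metadata get separate chunk pools carved from unallocated space, so once data chunks have claimed the whole device, a metadata write can fail even though gigabytes are free inside data chunks; mixed mode draws both from one pool. The 8 GiB device, the 1 GiB data / 256 MiB metadata chunk sizes and the class and function names are illustrative assumptions, and the model ignores replication, per-chunk bookkeeping and automatic empty-chunk removal.

```python
"""Toy model of split vs. mixed chunk allocation (illustrative, not btrfs code)."""

MiB = 1024 ** 2
GiB = 1024 ** 3


class ToyFs:
    def __init__(self, device_bytes, mixed=False,
                 data_chunk=1 * GiB, meta_chunk=256 * MiB):
        self.mixed = mixed
        self.data_chunk = data_chunk  # assumed data chunk size
        self.meta_chunk = meta_chunk  # assumed metadata chunk size
        self.unallocated = device_bytes
        # kind -> [allocated chunk bytes, used bytes]
        self.pools = {"data": [0, 0], "metadata": [0, 0]}

    def _pool(self, kind):
        # Mixed mode: data and metadata share a single pool of chunks.
        return self.pools["data" if self.mixed else kind]

    def write(self, kind, nbytes):
        """Store nbytes of `kind`; return False on (toy) ENOSPC."""
        pool = self._pool(kind)
        split_meta = kind == "metadata" and not self.mixed
        chunk = self.meta_chunk if split_meta else self.data_chunk
        while pool[0] - pool[1] < nbytes:
            if self.unallocated == 0:
                return False  # no unallocated space left for a new chunk
            grab = min(chunk, self.unallocated)
            self.unallocated -= grab
            pool[0] += grab
        pool[1] += nbytes
        return True

    def delete(self, kind, nbytes):
        # Freed bytes stay inside already-allocated chunks; this toy does
        # not model returning completely empty chunks to unallocated space.
        pool = self._pool(kind)
        pool[1] -= min(nbytes, pool[1])


def scenario(mixed):
    fs = ToyFs(8 * GiB, mixed=mixed)
    fs.write("metadata", 1 * MiB)
    while fs.write("data", 64 * MiB):  # fill the device with data
        pass
    fs.delete("data", 4 * GiB)         # frees 4 GiB inside data chunks
    return fs.write("metadata", 300 * MiB)


print("split:", scenario(mixed=False), "mixed:", scenario(mixed=True))
```

With these assumptions the split layout reports ENOSPC for the final metadata write (`scenario(mixed=False)` returns `False`) while about 4 GiB sit free in data chunks, whereas the mixed layout succeeds.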

> - data and metadata have to have the same replication level
>    forever (e.g. RAID 1)
> - higher fragmentation
>    (does this reduce with no(dir)atime?)
>    -> more work for autodefrag

They are also true.
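
As a sketch of the "same replication level" constraint, the hypothetical helper below only builds (never runs) an `mkfs.btrfs` command line for a mixed-mode filesystem. It uses the `--mixed`, `--data`, `--metadata`, `--sectorsize` and `--nodesize` options, refuses differing data and metadata profiles, and sets nodesize equal to sectorsize, since in mixed mode node/leaf size is restricted to sectorsize. The function name, the device paths and the raid1 default are placeholders; mkfs.btrfs is expected to enforce these rules itself, so the checks here just make them explicit.

```python
import shlex


def mixed_mkfs_argv(devices, data_profile="raid1", metadata_profile="raid1",
                    sectorsize=4096):
    """Hypothetical helper: build an mkfs.btrfs argv for mixed mode."""
    if not devices:
        raise ValueError("at least one device is required")
    if data_profile != metadata_profile:
        # Mixed block groups hold data and metadata together, so they
        # cannot carry different replication profiles.
        raise ValueError("mixed mode needs identical data and metadata profiles")
    return [
        "mkfs.btrfs", "--mixed",
        "--data", data_profile, "--metadata", metadata_profile,
        "--sectorsize", str(sectorsize),
        "--nodesize", str(sectorsize),  # node/leaf size == sectorsize in mixed mode
        *devices,
    ]


# Print the command for review instead of executing it.
print(shlex.join(mixed_mkfs_argv(["/dev/sdX", "/dev/sdY"])))
```

Because the profile is chosen once at creation time for both kinds of data, changing it later means converting data and metadata together.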

And some extra pros and cons due to the fixed, small nodesize (4K,
compared to the 16K default):

+ Slightly higher performance
   Node/leaf size is restricted to sectorsize; a smaller node/leaf
   means a smaller range to lock.
   In our SSD tests with highly concurrent operations, performance
   was about 10% better overall than with a 16K nodesize.
   And in extreme metadata-heavy cases, like highly concurrent
   sequential writes into small files, it can be 8 times the
   performance of the default 16K nodesize.

- Smaller maximum subvolume size
   Since tree blocks are smaller but the maximum tree level stays the
   same (levels 0-7), the upper limit of a subvolume is reduced hugely
   by the smaller node/leaf size.
   That said, it's quite hard to hit that limit.

- (Possibly) less developer interest
   Other developers are trying to remove mixed-bg as a default, so I
   expect the trend to be fewer developers focused on mixed-bg.
   And hidden bugs will become harder and harder to hit and fix.
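
A rough back-of-the-envelope sketch of why the smaller node size shrinks the per-tree limit: with the maximum level fixed, the number of leaves one tree can reach grows as fanout^(levels - 1), and the fanout of an internal node scales with nodesize. The structure sizes below (a 101-byte tree block header and 33-byte key pointers) are assumed from the btrfs on-disk format and should be treated as approximations; the result is an upper bound on leaf space, not an actual subvolume size limit.

```python
# Assumed on-disk sizes (btrfs format); treat as approximations.
HEADER_SIZE = 101   # bytes in a tree block header
KEY_PTR_SIZE = 33   # bytes per key pointer in an internal node
MAX_LEVEL = 8       # tree levels 0..7, leaves at level 0


def node_fanout(nodesize):
    """Child pointers that fit in one internal node."""
    return (nodesize - HEADER_SIZE) // KEY_PTR_SIZE


def max_leaf_bytes(nodesize):
    """Upper bound on leaf space reachable from a full-height tree root."""
    leaves = node_fanout(nodesize) ** (MAX_LEVEL - 1)
    return leaves * nodesize


for size in (4096, 16384):
    print(f"nodesize {size}: fanout {node_fanout(size)}, "
          f"~{max_leaf_bytes(size) / 2**60:.3g} EiB of leaves")
print(f"16K/4K ratio: ~{max_leaf_bytes(16384) / max_leaf_bytes(4096):.3g}x")
```

Under these assumptions even the 4K bound lands around an exbibyte of leaves, which fits the remark that the limit is hard to hit, while the 16K bound is several orders of magnitude larger.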

>
> Is that roughly what is to be expected? Any implications on recovery etc.?

As long as your chunk tree and extent tree are OK, it shouldn't be much
different from a normal fs, at least for now.

Thanks,
Qu
>
> In the specific case, the file system usage is as follows:
> * data spread over ~20 subvolumes
>    * snapshotted with various frequencies
>    * compression is used
> * mostly archive storage
>    * write once
>    * read infrequently
> * ~500GB of daily rsync'ed system backup
>
> Thanks in advance,
>
> Lukas

* Re: implications of mixed mode
  2015-11-26 23:54 implications of mixed mode Lukas Pirl
  2015-11-27  2:21 ` Qu Wenruo
@ 2015-11-27  3:11 ` Duncan
  2015-11-27 10:30   ` Lukas Pirl
  1 sibling, 1 reply; 6+ messages in thread
From: Duncan @ 2015-11-27  3:11 UTC
  To: linux-btrfs

Lukas Pirl posted on Fri, 27 Nov 2015 12:54:57 +1300 as excerpted:

> Dear list,
> 
> if a larger RAID file system (say disk space of 8 TB in total) is
> created in mixed mode, what are the implications?
> 
> From reading the mailing list and the Wiki, I can think of the
> following:
> 
> + less hassle with "false positive" ENOSPC
> - data and metadata have to have the same replication level
>   forever (e.g. RAID 1)
> - higher fragmentation
>   (does this reduce with no(dir)atime?)
> -> more work for autodefrag
> 
> Is that roughly what is to be expected? Any implications on recovery
> etc.?

To the best of my knowledge that looks reasonably accurate.

My big hesitancy would be over the fact that very few will run or test
mixed-mode at TB-scale filesystem level, and where they do, it's likely
to be in order to work around the current (but set to soon be
eliminated) metadata-only (no data) dup mode limit on single devices,
since in that regard mixed-mode is treated as metadata and dup mode is
allowed.

So you're relatively more likely to run into rarely seen scaling issues 
and perhaps bugs that nobody else has ever run into as (relatively) 
nobody else runs mixed-mode on multi-terabyte-scale btrfs.  If you want 
to be the guinea pig and make it easier for others to try later on, after 
you've flushed out the worst bugs, that's definitely one way to do it.
=:^]

> In the specific case, the file system usage is as follows:
> * data spread over ~20 subvolumes
>   * snapshotted with various frequencies
>   * compression is used
> * mostly archive storage
>   * write once
>   * read infrequently
> * ~500GB of daily rsync'ed system backup

It's worth noting that rsync... seems to stress btrfs more than pretty
much any other common single application.  Its extremely heavy access
pattern just seems to trigger bugs that nothing else does, and while they
do tend to get fixed, it really does seem to push btrfs to its limits,
and there have been a /lot/ of rsync-triggered btrfs bugs reported over
the years.

Between the stresses of rsyncing half a TiB daily and the relatively
untested quantity that is mixed-mode btrfs at multi-terabyte scales on
multiple devices, there's a reasonably high chance that you /will/ be
working with the devs on various bugs for a while.  If you're willing to
do it, great: somebody putting the filesystem thru those kinds of
mixed-mode paces at that scale is just the sort of thing we need to get
coverage on that particular not-yet-well-tested corner case.  But don't
expect it to be particularly stable for a couple of kernel cycles anyway,
and after that, you'll still be running a particularly rare corner-case
that's likely to put new code thru its paces as well, so just be aware of
the relatively stony path you're signing up to navigate, should you
choose to go that route.

Meanwhile, assuming you're /not/ deliberately setting out to test a 
rarely tested corner-case with stress tests known to rather too 
frequently get the best of btrfs...

Why are you considering mixed-mode here?  At that size, the ENOSPC
hassles that unmixed-mode btrfs has at, say, single-digit GiB and below
really should be dwarfed into insignificance, particularly since btrfs
(since 3.17 or so) deletes empty chunks instead of letting them build up
to the point where they're a problem.  So what possible reason, other
than simply to test it and cover that corner-case, could justify
mixed-mode at that sort of scale?

Unless of course, given that you didn't mention the number of devices or
the individual device size, only the 8 TB total, you have in mind a raid
of something like 1000 8-GB USB sticks, or the like, in which case
mixed-mode on the individual sticks might make some sense (well, to the
extent that a 1000-device raid of /anything/ makes sense! =:^), given
their size of 8 GB each.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: implications of mixed mode
  2015-11-27  2:21 ` Qu Wenruo
@ 2015-11-27  5:40   ` Roman Mamedov
  0 siblings, 0 replies; 6+ messages in thread
From: Roman Mamedov @ 2015-11-27  5:40 UTC
  To: Qu Wenruo; +Cc: Lukas Pirl, linux-btrfs

On Fri, 27 Nov 2015 10:21:31 +0800
Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:

> And some extra pros and cons due to the fixed, small nodesize (4K,
> compared to the 16K default):
> 
> + Slightly higher performance
>    Node/leaf size is restricted to sectorsize; a smaller node/leaf
>    means a smaller range to lock.
>    In our SSD tests with highly concurrent operations, performance
>    was about 10% better overall than with a 16K nodesize.
>    And in extreme metadata-heavy cases, like highly concurrent
>    sequential writes into small files, it can be 8 times the
>    performance of the default 16K nodesize.

This is surprising to read, as I thought 16K is generally faster and that's
why the default value was changed to it from 4K.

https://oss.oracle.com/~mason/blocksizes/
https://git.kernel.org/cgit/linux/kernel/git/mason/btrfs-progs.git/commit/?id=c652e4efb8e2dd76ef1627d8cd649c6af5905902

Seems like the 16K size prevents fragmentation, but since your SSDs do not care
much about fragmentation, that's not adding a benefit for them.

-- 
With respect,
Roman

* Re: implications of mixed mode
  2015-11-27  3:11 ` Duncan
@ 2015-11-27 10:30   ` Lukas Pirl
  2015-11-28  6:08     ` Duncan
  0 siblings, 1 reply; 6+ messages in thread
From: Lukas Pirl @ 2015-11-27 10:30 UTC
  To: Duncan; +Cc: linux-btrfs

On 11/27/2015 04:11 PM, Duncan wrote as excerpted:
> My big hesitancy would be over the fact that very few will run or test
> mixed-mode at TB-scale filesystem level, and where they do, it's likely
> to be in order to work around the current (but set to soon be
> eliminated) metadata-only (no data) dup mode limit on single devices,
> since in that regard mixed-mode is treated as metadata and dup mode is
> allowed.
> 
> So you're relatively more likely to run into rarely seen scaling issues 
> and perhaps bugs that nobody else has ever run into as (relatively) 
> nobody else runs mixed-mode on multi-terabyte-scale btrfs.  If you want 
> to be the guinea pig and make it easier for others to try later on, after 
> you've flushed out the worst bugs, that's definitely one way to do it.
> =:^]

I see. This somehow aligns with Qu's answer.

> It's worth noting that rsync... seems to stress btrfs more than pretty
> much any other common single application.  Its extremely heavy access
> pattern just seems to trigger bugs that nothing else does, and while they
> do tend to get fixed, it really does seem to push btrfs to its limits,
> and there have been a /lot/ of rsync-triggered btrfs bugs reported over
> the years.

Well, IMHO btrfs /has/ to deal with rsync workloads if it wants to be
an alternative for larger storage systems, but that is another story.
I have been running btrfs (non-mixed) with rsync workloads for quite a
while now and it is doing well (except for the deadlock that was around
a while ago). Maybe my network is just slow enough not to trigger any
unfixed weird issues with the intense access patterns of rsync.
Anyway, thanks for the hint!

> Between the stresses of rsyncing half a TiB daily and the relatively
> untested quantity that is mixed-mode btrfs at multi-terabyte scales on
> multiple devices, there's a reasonably high chance that you /will/ be
> working with the devs on various bugs for a while.  If you're willing to
> do it, great: somebody putting the filesystem thru those kinds of
> mixed-mode paces at that scale is just the sort of thing we need to get
> coverage on that particular not-yet-well-tested corner case.  But don't
> expect it to be particularly stable for a couple of kernel cycles anyway,
> and after that, you'll still be running a particularly rare corner-case
> that's likely to put new code thru its paces as well, so just be aware of
> the relatively stony path you're signing up to navigate, should you
> choose to go that route.

Makes perfect sense. I think I sadly do not have the resources to be
that guinea pig…

> Meanwhile, assuming you're /not/ deliberately setting out to test a 
> rarely tested corner-case with stress tests known to rather too 
> frequently get the best of btrfs...
> 
> Why are you considering mixed-mode here?  At that size, the ENOSPC
> hassles that unmixed-mode btrfs has at, say, single-digit GiB and below
> really should be dwarfed into insignificance, particularly since btrfs
> (since 3.17 or so) deletes empty chunks instead of letting them build up
> to the point where they're a problem.  So what possible reason, other
> than simply to test it and cover that corner-case, could justify
> mixed-mode at that sort of scale?
> 
> Unless of course, given that you didn't mention the number of devices or
> the individual device size, only the 8 TB total, you have in mind a raid
> of something like 1000 8-GB USB sticks, or the like, in which case
> mixed-mode on the individual sticks might make some sense (well, to the
> extent that a 1000-device raid of /anything/ makes sense! =:^), given
> their size of 8 GB each.

That is not the case. I just came to consider it because I
wondered why mixed-mode is not generally preferred when data and
metadata have the same replication level.

Thanks Duncan!

Lukas

* Re: implications of mixed mode
  2015-11-27 10:30   ` Lukas Pirl
@ 2015-11-28  6:08     ` Duncan
  0 siblings, 0 replies; 6+ messages in thread
From: Duncan @ 2015-11-28  6:08 UTC
  To: linux-btrfs

Lukas Pirl posted on Fri, 27 Nov 2015 23:30:05 +1300 as excerpted:

> On 11/27/2015 04:11 PM, Duncan wrote as excerpted:
>> My big hesitancy would be over the fact that very few will run or test
>> mixed-mode at TB-scale filesystem level [s]o you're relatively more
>> likely to run into rarely seen scaling issues and perhaps bugs

>> It's worth noting that rsync... seems to stress btrfs more than pretty
>> much any other common single application.  Its extremely heavy access
>> pattern just seems to trigger bugs that nothing else does, and while
>> they do tend to get fixed, it really does seem to push btrfs to its
>> limits, and there have been a /lot/ of rsync-triggered btrfs bugs
>> reported over the years.
> 
> Well, IMHO btrfs /has/ to deal with rsync workloads if it wants to be an
> alternative for larger storage systems, but that is another story.

Yes; that's why they get fixed. =:^)  I'm simply saying it's a strong 
stressor, and on top of running the less tested setup that TB-scale mixed-
mode is, you're looking at an ideal test case for generating bugs.  If 
that's your intent...

But it seems not...

> I have been running btrfs (non-mixed) with rsync workloads for quite a
> while now and it is doing well (except for the deadlock that was around
> a while ago). Maybe my network is just slow enough not to trigger any
> unfixed weird issues with the intense access patterns of rsync. Anyway,
> thanks for the hint!

That's good to read.  Actually, given the number of rsync-triggered bugs 
fixed over the years, rsync should be reasonably solid now in the default 
case, so it's not entirely surprising that it's working well for you.  
But it's _still_ good to read, as rsync really has triggered quite a 
number of bugs over the years, so if it's now working well, that really 
does indicate btrfs is maturing and getting more solidly stable, now. =:^)

But of course if you do a less tested setup, rsyncing half a TB a day to 
it, you're putting yourself in line to find a whole /new/ set of bugs.  
That's primarily what I was saying.

> I think I sadly do not have the resources to be that guinea pig…

Makes sense.

It'd be nice to have that corner-case well tested, but because it /is/ a
reasonably rare corner-case, it's not like millions of users are going to
be stumbling over bugs if testing waits a while or even never really
happens at all.

>> Meanwhile, assuming you're /not/ deliberately setting out to test,
>> [w]hy are you considering mixed-mode here?  At that size the ENOSPC
>> hassles of unmixed-mode btrfs on single-digit GiB [small btrfs] really
>> should be dwarfed into insignificance, particularly since btrfs since
>> [auto-empty-chunk-deletion] so what possible reason [...] could
>> justify mixed-mode at that sort of scale?

> I just came to consider it because I wondered why mixed-mode
> is not generally preferred when data and metadata have the
> same replication level.

The most direct answer is that mixed-mode is less efficient, and that it
really was designed for severely size-constrained (under double-digit
GiB) btrfs, where the hassles of data vs. metadata chunks really are an
administration headache.  It was never designed for btrfs of even 100 GiB,
where the data vs. metadata hassles tend to be much less of a headache,
such that the inefficiency of mixed-mode tends to matter more than the
much more minor data vs. metadata hassles.

Compounding that are two additional factors, the first being that unlike 
when mixed-mode was introduced, btrfs now deletes empty chunks, as well 
as being a bit better at allocating smaller chunks as size gets tight, so 
data vs. metadata hassles are at least an order of magnitude lower than 
they were (many will never see ENOSPC due to poor data/metadata balance 
at all now, while previously, it was generally only a matter of time if 
people weren't routinely rebalancing to prevent it).

The second additional factor is that mixed-mode gets /much/ less testing
at 100 GiB and above, because it simply wasn't designed for that and the
devs just don't test it, which means you're probably at least doubling
your chance of bugs.  For that factor alone, many would actively
recommend against it, except for people who are really prepared to be
guinea pigs, and indeed, that's exactly what you've seen here.

But it's a reasonable question, and now you have reasonable answers. =:^)

