Btrfsck memory usage reduce idea

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Btrfsck memory usage reduce idea
@ 2016-03-07  5:42 Qu Wenruo
  2016-03-08  8:28 ` Satoru Takeuchi
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2016-03-07  5:42 UTC (permalink / raw)
  To: btrfs

Hi,

As many have already known, "btrfs check" is a memory eater.

The problem is, btrfsck checks extent tree in a very comprehensive method.
1) Create extent_ref for each extent item with backref
2) Iterate all other trees to add extent ref
3) If one extent_ref with all ref/backref matches, it's deleted.

The method is good, can found any extent mismatch problem when checking 
extent tree. (Although it has already iterated the whole fs)
For a large enough filesystem, it may have tegas of extents, and memory 
is easy eaten up.

We hope to fix it in the following method:
1) Only check extent backref when iterating extent tree
    Unlike current implement, we check one extent item and its backref
    only.

    If one backref can't be reached, then it's an error and output (or
    try to fix).
    After iterating all backref of an extent item, all related memory is
    freed and we won't bother recording anything for later use.

    That's to say, we only care backref mismatch case when checking
    extent tree.
    Case like missing EXTENT_ITEM for some extent is not checked here.

2) Check extent ref while iterating other trees
    We only check forward-ref while iterating one tree.

    In this step, we only check forward-ref, so we can find the remaining
    problem like missing EXTENT_ITEM for given extent.

Any further advice/suggestion? Or is there anyone already doing such work?

Thanks,
Qu

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Btrfsck memory usage reduce idea
  2016-03-07  5:42 Btrfsck memory usage reduce idea Qu Wenruo
@ 2016-03-08  8:28 ` Satoru Takeuchi
  2016-03-08  8:46   ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: Satoru Takeuchi @ 2016-03-08  8:28 UTC (permalink / raw)
  To: Qu Wenruo, btrfs

Hi Qu,

On 2016/03/07 14:42, Qu Wenruo wrote:
> Hi,
>
> As many have already known, "btrfs check" is a memory eater.
>
> The problem is, btrfsck checks extent tree in a very comprehensive method.
> 1) Create extent_ref for each extent item with backref
> 2) Iterate all other trees to add extent ref
> 3) If one extent_ref with all ref/backref matches, it's deleted.
>
> The method is good, can found any extent mismatch problem when checking extent tree. (Although it has already iterated the whole fs)
> For a large enough filesystem, it may have tegas of extents, and memory is easy eaten up.
>
>
> We hope to fix it in the following method:
> 1) Only check extent backref when iterating extent tree
>     Unlike current implement, we check one extent item and its backref
>     only.
>
>     If one backref can't be reached, then it's an error and output (or
>     try to fix).
>     After iterating all backref of an extent item, all related memory is
>     freed and we won't bother recording anything for later use.
>
>     That's to say, we only care backref mismatch case when checking
>     extent tree.
>     Case like missing EXTENT_ITEM for some extent is not checked here.
>
> 2) Check extent ref while iterating other trees
>     We only check forward-ref while iterating one tree.
>
>     In this step, we only check forward-ref, so we can find the remaining
>     problem like missing EXTENT_ITEM for given extent.
>
> Any further advice/suggestion? Or is there anyone already doing such work?

Thank you for your effort. I have basic questions.

1. Could you tell me what you'd like to do?

    a) Provide completely the same function with current
       implementation by other, more efficient way.
    b) Replace the current implementation with the quicker
       one that provides the limited function.
    c) Any other

2. Do you have the estimation that how long does the
    new algorithm take compare with the current one?

    # Of course, "currently not sure" is OK at this stage.

    I'm interested in it because there is the trade-off
    between speed and memory consumption in many case,
    and btrfsck takes very long time with a large filesystem.

Thanks,
Satoru

>
> Thanks,
> Qu
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Btrfsck memory usage reduce idea
  2016-03-08  8:28 ` Satoru Takeuchi
@ 2016-03-08  8:46   ` Qu Wenruo
  2016-03-09  0:13     ` Satoru Takeuchi
  2016-03-18 18:18     ` David Sterba
  0 siblings, 2 replies; 9+ messages in thread
From: Qu Wenruo @ 2016-03-08  8:46 UTC (permalink / raw)
  To: Satoru Takeuchi, btrfs



Satoru Takeuchi wrote on 2016/03/08 17:28 +0900:
> Hi Qu,
>
> On 2016/03/07 14:42, Qu Wenruo wrote:
>> Hi,
>>
>> As many have already known, "btrfs check" is a memory eater.
>>
>> The problem is, btrfsck checks extent tree in a very comprehensive
>> method.
>> 1) Create extent_ref for each extent item with backref
>> 2) Iterate all other trees to add extent ref
>> 3) If one extent_ref with all ref/backref matches, it's deleted.
>>
>> The method is good, can found any extent mismatch problem when
>> checking extent tree. (Although it has already iterated the whole fs)
>> For a large enough filesystem, it may have tegas of extents, and
>> memory is easy eaten up.
>>
>>
>> We hope to fix it in the following method:
>> 1) Only check extent backref when iterating extent tree
>>     Unlike current implement, we check one extent item and its backref
>>     only.
>>
>>     If one backref can't be reached, then it's an error and output (or
>>     try to fix).
>>     After iterating all backref of an extent item, all related memory is
>>     freed and we won't bother recording anything for later use.
>>
>>     That's to say, we only care backref mismatch case when checking
>>     extent tree.
>>     Case like missing EXTENT_ITEM for some extent is not checked here.
>>
>> 2) Check extent ref while iterating other trees
>>     We only check forward-ref while iterating one tree.
>>
>>     In this step, we only check forward-ref, so we can find the remaining
>>     problem like missing EXTENT_ITEM for given extent.
>>
>> Any further advice/suggestion? Or is there anyone already doing such
>> work?
>
> Thank you for your effort. I have basic questions.
>
> 1. Could you tell me what you'd like to do?
>
>     a) Provide completely the same function with current
>        implementation by other, more efficient way.

Same function, but less efficient.
It may takes longer time, more IO, but less memory.

And some error message will be output at different time.
E.g, error message for missing backref may be output at fs tree checking 
time, instead of extent tree checking time.

>     b) Replace the current implementation with the quicker
>        one that provides the limited function.
>     c) Any other
>
> 2. Do you have the estimation that how long does the
>     new algorithm take compare with the current one?

Depends on the fs hierarchy. But in all case, IO will be more than 
original implement.

The most efficient case would be, one subvolume and no dedup file.
(which means one file extent refer to one extent on data, no in-band or
out-band dedup).

In that case, old implement will iterate the whole metadata twice,
and new implement will iterate the whole metadata twice + extra.

For worst case, like inband dedup with multiple almost identical 
snapshot, things will be much much slower, more IO, more tree search, 
maybe O(n^2) or more. But memory usage should not be much different though.

In short, use more IO to trade for memory.

Anyway, for a large fs, it won't be possible to take a short time for a 
comprehensive fsck.

Thanks,
Qu

>     # Of course, "currently not sure" is OK at this stage.
>
>     I'm interested in it because there is the trade-off
>     between speed and memory consumption in many case,
>     and btrfsck takes very long time with a large filesystem.
>
> Thanks,
> Satoru
>
>>
>> Thanks,
>> Qu
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Btrfsck memory usage reduce idea
  2016-03-08  8:46   ` Qu Wenruo
@ 2016-03-09  0:13     ` Satoru Takeuchi
  2016-03-18 18:18     ` David Sterba
  1 sibling, 0 replies; 9+ messages in thread
From: Satoru Takeuchi @ 2016-03-09  0:13 UTC (permalink / raw)
  To: Qu Wenruo, btrfs

On 2016/03/08 17:46, Qu Wenruo wrote:
>
> Satoru Takeuchi wrote on 2016/03/08 17:28 +0900:
>> Hi Qu,
>>
>> On 2016/03/07 14:42, Qu Wenruo wrote:
>>> Hi,
>>>
>>> As many have already known, "btrfs check" is a memory eater.
>>>
>>> The problem is, btrfsck checks extent tree in a very comprehensive
>>> method.
>>> 1) Create extent_ref for each extent item with backref
>>> 2) Iterate all other trees to add extent ref
>>> 3) If one extent_ref with all ref/backref matches, it's deleted.
>>>
>>> The method is good, can found any extent mismatch problem when
>>> checking extent tree. (Although it has already iterated the whole fs)
>>> For a large enough filesystem, it may have tegas of extents, and
>>> memory is easy eaten up.
>>>
>>>
>>> We hope to fix it in the following method:
>>> 1) Only check extent backref when iterating extent tree
>>>     Unlike current implement, we check one extent item and its backref
>>>     only.
>>>
>>>     If one backref can't be reached, then it's an error and output (or
>>>     try to fix).
>>>     After iterating all backref of an extent item, all related memory is
>>>     freed and we won't bother recording anything for later use.
>>>
>>>     That's to say, we only care backref mismatch case when checking
>>>     extent tree.
>>>     Case like missing EXTENT_ITEM for some extent is not checked here.
>>>
>>> 2) Check extent ref while iterating other trees
>>>     We only check forward-ref while iterating one tree.
>>>
>>>     In this step, we only check forward-ref, so we can find the remaining
>>>     problem like missing EXTENT_ITEM for given extent.
>>>
>>> Any further advice/suggestion? Or is there anyone already doing such
>>> work?
>>
>> Thank you for your effort. I have basic questions.
>>
>> 1. Could you tell me what you'd like to do?
>>
>>     a) Provide completely the same function with current
>>        implementation by other, more efficient way.
>
> Same function, but less efficient.
> It may takes longer time, more IO, but less memory.

I see.

>
> And some error message will be output at different time.
> E.g, error message for missing backref may be output at fs tree checking time, instead of extent tree checking time.

It's OK if, finally, all error messages the same as the current
implementation are shown.

>
>>     b) Replace the current implementation with the quicker
>>        one that provides the limited function.
>>     c) Any other
>>
>> 2. Do you have the estimation that how long does the
>>     new algorithm take compare with the current one?
>
> Depends on the fs hierarchy. But in all case, IO will be more than original implement.
>
> The most efficient case would be, one subvolume and no dedup file.
> (which means one file extent refer to one extent on data, no in-band or
> out-band dedup).
>
> In that case, old implement will iterate the whole metadata twice,
> and new implement will iterate the whole metadata twice + extra.
>
> For worst case, like inband dedup with multiple almost identical snapshot,
 > things will be much much slower, more IO, more tree search, maybe O(n^2)
 > or more. But memory usage should not be much different though.
>
> In short, use more IO to trade for memory.
>
> Anyway, for a large fs, it won't be possible to take a short time for a comprehensive fsck.

Got it.

Thanks,
Satoru

>
> Thanks,
> Qu
>
>>     # Of course, "currently not sure" is OK at this stage.
>>
>>     I'm interested in it because there is the trade-off
>>     between speed and memory consumption in many case,
>>     and btrfsck takes very long time with a large filesystem.
>>
>> Thanks,
>> Satoru
>>
>>>
>>> Thanks,
>>> Qu
>>>
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Btrfsck memory usage reduce idea
  2016-03-08  8:46   ` Qu Wenruo
  2016-03-09  0:13     ` Satoru Takeuchi
@ 2016-03-18 18:18     ` David Sterba
  2016-03-21  2:15       ` Qu Wenruo
  1 sibling, 1 reply; 9+ messages in thread
From: David Sterba @ 2016-03-18 18:18 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Satoru Takeuchi, btrfs

On Tue, Mar 08, 2016 at 04:46:41PM +0800, Qu Wenruo wrote:
> > 1. Could you tell me what you'd like to do?
> >
> >     a) Provide completely the same function with current
> >        implementation by other, more efficient way.
> 
> Same function, but less efficient.
> It may takes longer time, more IO, but less memory.

IOW, there will be two options for the use to choose from, right? That's
what I'd expect. Be able to check the filesystem on a machine with less
memory at the cost of IO, but also do the faster check on a different
machine.

> And some error message will be output at different time.
> E.g, error message for missing backref may be output at fs tree checking 
> time, instead of extent tree checking time.

This is acceptable.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Btrfsck memory usage reduce idea
  2016-03-18 18:18     ` David Sterba
@ 2016-03-21  2:15       ` Qu Wenruo
  2016-03-21  9:43         ` Duncan
  2016-03-22 14:49         ` David Sterba
  0 siblings, 2 replies; 9+ messages in thread
From: Qu Wenruo @ 2016-03-21  2:15 UTC (permalink / raw)
  To: dsterba, Satoru Takeuchi, btrfs

David Sterba wrote on 2016/03/18 19:18 +0100:
> On Tue, Mar 08, 2016 at 04:46:41PM +0800, Qu Wenruo wrote:
>>> 1. Could you tell me what you'd like to do?
>>>
>>>      a) Provide completely the same function with current
>>>         implementation by other, more efficient way.
>>
>> Same function, but less efficient.
>> It may takes longer time, more IO, but less memory.
>
> IOW, there will be two options for the use to choose from, right? That's
> what I'd expect. Be able to check the filesystem on a machine with less
> memory at the cost of IO, but also do the faster check on a different
> machine.
>

I was planning to use the new extent tree check to replace current one, 
as a rework.
Am I always reworking things? :)

The point that I didn't want to keep the current behavior is, the old 
one is just OK or OOM, no one would know if it will OOM until it happens.

But the new one would be much flex than current behavior.
As it fully uses the IO cache provided by kernel.

For machine with lot of memory, the whole IO will be cached except the 
first read.
Making the difference between new and old implementation quite small.

For machine with less memory or the fs is just too large, at least 
btrfsck won't cause OOM.

Another point is, the new implementation would be smaller and simpler.
For non performance critical case, the simpler code the better.

But that's not settled yet, maybe just like 
multi-snapshot+quota+rebalance case, the old implementation may be times 
faster on some special case.
So my teammate and I will keeps the old one until enough feedback.

Thanks,
Qu

>> And some error message will be output at different time.
>> E.g, error message for missing backref may be output at fs tree checking
>> time, instead of extent tree checking time.
>
> This is acceptable.
>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Btrfsck memory usage reduce idea
  2016-03-21  2:15       ` Qu Wenruo
@ 2016-03-21  9:43         ` Duncan
  2016-03-22 14:49         ` David Sterba
  1 sibling, 0 replies; 9+ messages in thread
From: Duncan @ 2016-03-21  9:43 UTC (permalink / raw)
  To: linux-btrfs

Qu Wenruo posted on Mon, 21 Mar 2016 10:15:55 +0800 as excerpted:

> The point that I didn't want to keep the current behavior is, the old
> one is just OK or OOM, no one would know if it will OOM until it
> happens.
> 
> But the new one would be much flex than current behavior.
> As it fully uses the IO cache provided by kernel.
> 
> For machine with lot of memory, the whole IO will be cached except the
> first read.
> Making the difference between new and old implementation quite small.
> 
> For machine with less memory or the fs is just too large, at least
> btrfsck won't cause OOM.

That makes more sense to me, now.

I didn't initially post but I was worried about the tradeoff to much 
longer times, tho with lower memory requirements, on operations that can 
already take a long time.  But if as you say here the new, lower memory 
version makes use of cache so there shouldn't be a lot of difference on 
higher memory machines anyway, then yes, replacing the current 
implementation instead of providing an option so either implementation 
can be used, makes more sense, because otherwise one option will get more 
testing than the other, fixes to one might not carry over to the other, 
etc.

Tho as you suggest, possibly with a transition period during which both 
implementations are available, to work out any differences in results and/
or dramatically worse runtime cases.

Thanks. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Btrfsck memory usage reduce idea
  2016-03-21  2:15       ` Qu Wenruo
  2016-03-21  9:43         ` Duncan
@ 2016-03-22 14:49         ` David Sterba
  2016-03-23  1:12           ` Qu Wenruo
  1 sibling, 1 reply; 9+ messages in thread
From: David Sterba @ 2016-03-22 14:49 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Satoru Takeuchi, btrfs

On Mon, Mar 21, 2016 at 10:15:55AM +0800, Qu Wenruo wrote:
> > IOW, there will be two options for the use to choose from, right? That's
> > what I'd expect. Be able to check the filesystem on a machine with less
> > memory at the cost of IO, but also do the faster check on a different
> > machine.
> 
> I was planning to use the new extent tree check to replace current one, 
> as a rework.
> Am I always reworking things? :)

The problem with big reworks is that there are few people willing to
review them. So I'm not against doing such changes, especially in this
case it would be welcome, but I'm afraid that it could end up stalled
similar to the convert rewrite.

> The point that I didn't want to keep the current behavior is, the old 
> one is just OK or OOM, no one would know if it will OOM until it happens.
> 
> But the new one would be much flex than current behavior.
> As it fully uses the IO cache provided by kernel.

That's a good point, for the single implementation.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Btrfsck memory usage reduce idea
  2016-03-22 14:49         ` David Sterba
@ 2016-03-23  1:12           ` Qu Wenruo
  0 siblings, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2016-03-23  1:12 UTC (permalink / raw)
  To: dsterba, Satoru Takeuchi, btrfs



David Sterba wrote on 2016/03/22 15:49 +0100:
> On Mon, Mar 21, 2016 at 10:15:55AM +0800, Qu Wenruo wrote:
>>> IOW, there will be two options for the use to choose from, right? That's
>>> what I'd expect. Be able to check the filesystem on a machine with less
>>> memory at the cost of IO, but also do the faster check on a different
>>> machine.
>>
>> I was planning to use the new extent tree check to replace current one,
>> as a rework.
>> Am I always reworking things? :)
>
> The problem with big reworks is that there are few people willing to
> review them. So I'm not against doing such changes, especially in this
> case it would be welcome, but I'm afraid that it could end up stalled
> similar to the convert rewrite.

So for convert rework, unless some other developer reviews the patchset, 
it won't be merged, right?


To avoid the same problem, what about submitting small patchsets and 
replace extent tree fsck codes part by part?
(Although not sure if it's possible)

Reviewers would be much more happy reviewing 5 patches for 5 times, 
other than reviewing a big 25 patchset.

Thanks,
Qu

>
>> The point that I didn't want to keep the current behavior is, the old
>> one is just OK or OOM, no one would know if it will OOM until it happens.
>>
>> But the new one would be much flex than current behavior.
>> As it fully uses the IO cache provided by kernel.
>
> That's a good point, for the single implementation.
>
>



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-03-23  1:13 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-07  5:42 Btrfsck memory usage reduce idea Qu Wenruo
2016-03-08  8:28 ` Satoru Takeuchi
2016-03-08  8:46   ` Qu Wenruo
2016-03-09  0:13     ` Satoru Takeuchi
2016-03-18 18:18     ` David Sterba
2016-03-21  2:15       ` Qu Wenruo
2016-03-21  9:43         ` Duncan
2016-03-22 14:49         ` David Sterba
2016-03-23  1:12           ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).