Linux Btrfs filesystem development
 help / color / mirror / Atom feed
* [PATCH 0/3] btrfs: improve sibling keys check failure report
@ 2023-04-26 10:51 fdmanana
  2023-04-26 10:51 ` [PATCH 1/3] btrfs: abort transaction when sibling keys check fails for leaves fdmanana
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: fdmanana @ 2023-04-26 10:51 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

This makes reporting of sibling key check failures more useful, by
printing more exact stack traces in case the extent buffers are leaves
and dumping the extent buffers. This is motivated by a recent report [1].
More details on the changelogs of each patch.

[1] https://lore.kernel.org/linux-btrfs/20230423222213.5391.409509F4@e16-tech.com/

Filipe Manana (3):
  btrfs: abort transaction when sibling keys check fails for leaves
  btrfs: print extent buffers when sibling keys check fails
  btrfs: tag as unlikely the key comparison when checking sibling keys

 fs/btrfs/ctree.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] btrfs: abort transaction when sibling keys check fails for leaves
  2023-04-26 10:51 [PATCH 0/3] btrfs: improve sibling keys check failure report fdmanana
@ 2023-04-26 10:51 ` fdmanana
  2023-04-26 11:18   ` Qu Wenruo
  2023-04-26 10:51 ` [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails fdmanana
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: fdmanana @ 2023-04-26 10:51 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

If the sibling keys check fails before we move keys from one sibling
leaf to another, we are not aborting the transaction - we leave that to
some higher level caller of btrfs_search_slot() (or anything else that
uses it to insert items into a b+tree).

This means that the transaction abort will provide a stack trace that
omits the b+tree modification call chain. So change this to immediately
abort the transaction and therefore get a more useful stack trace that
shows us the call chain in the bt+tree modification code.

It's also important to immediately abort the transaction just in case
some higher level caller is not doing it, as this indicates a very
serious corruption and we should stop the possibility of doing further
damage.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index d94429e0f16a..a0b97a6d075a 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -3296,6 +3296,7 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root
 
 	if (check_sibling_keys(left, right)) {
 		ret = -EUCLEAN;
+		btrfs_abort_transaction(trans, ret);
 		btrfs_tree_unlock(right);
 		free_extent_buffer(right);
 		return ret;
@@ -3514,6 +3515,7 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root
 
 	if (check_sibling_keys(left, right)) {
 		ret = -EUCLEAN;
+		btrfs_abort_transaction(trans, ret);
 		goto out;
 	}
 	return __push_leaf_left(trans, path, min_data_size, empty, left,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails
  2023-04-26 10:51 [PATCH 0/3] btrfs: improve sibling keys check failure report fdmanana
  2023-04-26 10:51 ` [PATCH 1/3] btrfs: abort transaction when sibling keys check fails for leaves fdmanana
@ 2023-04-26 10:51 ` fdmanana
  2023-04-26 11:20   ` Qu Wenruo
  2023-04-26 10:51 ` [PATCH 3/3] btrfs: tag as unlikely the key comparison when checking sibling keys fdmanana
  2023-04-27 23:13 ` [PATCH 0/3] btrfs: improve sibling keys check failure report David Sterba
  3 siblings, 1 reply; 15+ messages in thread
From: fdmanana @ 2023-04-26 10:51 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When trying to move keys from one node/leaf to another sibling node/leaf,
if the sibling keys check fails we just print an error message with the
last key of the left sibling and the first key of the right sibling.
However it's also useful to print all the keys of each sibling, as it
may provide some clues to what went wrong, which code path may be
inserting keys in an incorrect order. So just do that, print the siblings
with btrfs_print_tree(), as it works for both leaves and nodes.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index a0b97a6d075a..a061d7092369 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left,
 	}
 
 	if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
+		btrfs_crit(left->fs_info, "left extent buffer:");
+		btrfs_print_tree(left, false);
+		btrfs_crit(left->fs_info, "right extent buffer:");
+		btrfs_print_tree(right, false);
 		btrfs_crit(left->fs_info,
 "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)",
 			   left_last.objectid, left_last.type,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] btrfs: tag as unlikely the key comparison when checking sibling keys
  2023-04-26 10:51 [PATCH 0/3] btrfs: improve sibling keys check failure report fdmanana
  2023-04-26 10:51 ` [PATCH 1/3] btrfs: abort transaction when sibling keys check fails for leaves fdmanana
  2023-04-26 10:51 ` [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails fdmanana
@ 2023-04-26 10:51 ` fdmanana
  2023-04-26 11:21   ` Qu Wenruo
  2023-04-27 23:13   ` David Sterba
  2023-04-27 23:13 ` [PATCH 0/3] btrfs: improve sibling keys check failure report David Sterba
  3 siblings, 2 replies; 15+ messages in thread
From: fdmanana @ 2023-04-26 10:51 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When checking siblings keys, before moving keys from one node/leaf to a
sibling node/leaf, it's very unexpected to have the last key of the left
sibling greater than or equals to the first key of the right sibling, as
that means we have a (serious) corruption that breaks the key ordering
properties of a b+tree. Since this is unexpected, surround the comparison
with the unlikely macro, which helps the compiler generate better code
for the most expected case (no existing b+tree corruption). This is also
what we do for other unexpected cases of invalid key ordering (like at
btrfs_set_item_key_safe()).

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index a061d7092369..c315fb793b30 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2707,7 +2707,7 @@ static bool check_sibling_keys(struct extent_buffer *left,
 		btrfs_item_key_to_cpu(right, &right_first, 0);
 	}
 
-	if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
+	if (unlikely(btrfs_comp_cpu_keys(&left_last, &right_first) >= 0)) {
 		btrfs_crit(left->fs_info, "left extent buffer:");
 		btrfs_print_tree(left, false);
 		btrfs_crit(left->fs_info, "right extent buffer:");
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] btrfs: abort transaction when sibling keys check fails for leaves
  2023-04-26 10:51 ` [PATCH 1/3] btrfs: abort transaction when sibling keys check fails for leaves fdmanana
@ 2023-04-26 11:18   ` Qu Wenruo
  0 siblings, 0 replies; 15+ messages in thread
From: Qu Wenruo @ 2023-04-26 11:18 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2023/4/26 18:51, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> If the sibling keys check fails before we move keys from one sibling
> leaf to another, we are not aborting the transaction - we leave that to
> some higher level caller of btrfs_search_slot() (or anything else that
> uses it to insert items into a b+tree).
> 
> This means that the transaction abort will provide a stack trace that
> omits the b+tree modification call chain. So change this to immediately
> abort the transaction and therefore get a more useful stack trace that
> shows us the call chain in the bt+tree modification code.
> 
> It's also important to immediately abort the transaction just in case
> some higher level caller is not doing it, as this indicates a very
> serious corruption and we should stop the possibility of doing further
> damage.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/ctree.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index d94429e0f16a..a0b97a6d075a 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -3296,6 +3296,7 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root
>   
>   	if (check_sibling_keys(left, right)) {
>   		ret = -EUCLEAN;
> +		btrfs_abort_transaction(trans, ret);
>   		btrfs_tree_unlock(right);
>   		free_extent_buffer(right);
>   		return ret;
> @@ -3514,6 +3515,7 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root
>   
>   	if (check_sibling_keys(left, right)) {
>   		ret = -EUCLEAN;
> +		btrfs_abort_transaction(trans, ret);
>   		goto out;
>   	}
>   	return __push_leaf_left(trans, path, min_data_size, empty, left,

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails
  2023-04-26 10:51 ` [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails fdmanana
@ 2023-04-26 11:20   ` Qu Wenruo
  2023-04-26 11:31     ` Filipe Manana
  2023-04-27 23:00     ` David Sterba
  0 siblings, 2 replies; 15+ messages in thread
From: Qu Wenruo @ 2023-04-26 11:20 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2023/4/26 18:51, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When trying to move keys from one node/leaf to another sibling node/leaf,
> if the sibling keys check fails we just print an error message with the
> last key of the left sibling and the first key of the right sibling.
> However it's also useful to print all the keys of each sibling, as it
> may provide some clues to what went wrong, which code path may be
> inserting keys in an incorrect order. So just do that, print the siblings
> with btrfs_print_tree(), as it works for both leaves and nodes.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

However my concern is, printing two tree blocks may be a little too large.
Definitely not something can be capture by one screen.

Although dumping single tree block is already too large for a single 
screen, I don't have any better way...

Thanks,
Qu
> ---
>   fs/btrfs/ctree.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index a0b97a6d075a..a061d7092369 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left,
>   	}
>   
>   	if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
> +		btrfs_crit(left->fs_info, "left extent buffer:");
> +		btrfs_print_tree(left, false);
> +		btrfs_crit(left->fs_info, "right extent buffer:");
> +		btrfs_print_tree(right, false);
>   		btrfs_crit(left->fs_info,
>   "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)",
>   			   left_last.objectid, left_last.type,

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] btrfs: tag as unlikely the key comparison when checking sibling keys
  2023-04-26 10:51 ` [PATCH 3/3] btrfs: tag as unlikely the key comparison when checking sibling keys fdmanana
@ 2023-04-26 11:21   ` Qu Wenruo
  2023-04-27 23:13   ` David Sterba
  1 sibling, 0 replies; 15+ messages in thread
From: Qu Wenruo @ 2023-04-26 11:21 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2023/4/26 18:51, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When checking siblings keys, before moving keys from one node/leaf to a
> sibling node/leaf, it's very unexpected to have the last key of the left
> sibling greater than or equals to the first key of the right sibling, as
> that means we have a (serious) corruption that breaks the key ordering
> properties of a b+tree. Since this is unexpected, surround the comparison
> with the unlikely macro, which helps the compiler generate better code
> for the most expected case (no existing b+tree corruption). This is also
> what we do for other unexpected cases of invalid key ordering (like at
> btrfs_set_item_key_safe()).
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/ctree.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index a061d7092369..c315fb793b30 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2707,7 +2707,7 @@ static bool check_sibling_keys(struct extent_buffer *left,
>   		btrfs_item_key_to_cpu(right, &right_first, 0);
>   	}
>   
> -	if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
> +	if (unlikely(btrfs_comp_cpu_keys(&left_last, &right_first) >= 0)) {
>   		btrfs_crit(left->fs_info, "left extent buffer:");
>   		btrfs_print_tree(left, false);
>   		btrfs_crit(left->fs_info, "right extent buffer:");

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails
  2023-04-26 11:20   ` Qu Wenruo
@ 2023-04-26 11:31     ` Filipe Manana
  2023-04-26 11:40       ` Qu Wenruo
  2023-04-27 23:00     ` David Sterba
  1 sibling, 1 reply; 15+ messages in thread
From: Filipe Manana @ 2023-04-26 11:31 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Apr 26, 2023 at 12:21 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2023/4/26 18:51, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When trying to move keys from one node/leaf to another sibling node/leaf,
> > if the sibling keys check fails we just print an error message with the
> > last key of the left sibling and the first key of the right sibling.
> > However it's also useful to print all the keys of each sibling, as it
> > may provide some clues to what went wrong, which code path may be
> > inserting keys in an incorrect order. So just do that, print the siblings
> > with btrfs_print_tree(), as it works for both leaves and nodes.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
>
> Reviewed-by: Qu Wenruo <wqu@suse.com>
>
> However my concern is, printing two tree blocks may be a little too large.
> Definitely not something can be capture by one screen.

What?
If I check syslog/dmesg, I can certainly access hundreds, thousands of lines...

I suppose by "capture by one screen" you mean a screenshot?

>
> Although dumping single tree block is already too large for a single
> screen, I don't have any better way...
>
> Thanks,
> Qu
> > ---
> >   fs/btrfs/ctree.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > index a0b97a6d075a..a061d7092369 100644
> > --- a/fs/btrfs/ctree.c
> > +++ b/fs/btrfs/ctree.c
> > @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left,
> >       }
> >
> >       if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
> > +             btrfs_crit(left->fs_info, "left extent buffer:");
> > +             btrfs_print_tree(left, false);
> > +             btrfs_crit(left->fs_info, "right extent buffer:");
> > +             btrfs_print_tree(right, false);
> >               btrfs_crit(left->fs_info,
> >   "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)",
> >                          left_last.objectid, left_last.type,

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails
  2023-04-26 11:31     ` Filipe Manana
@ 2023-04-26 11:40       ` Qu Wenruo
  2023-04-26 11:44         ` Filipe Manana
  0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2023-04-26 11:40 UTC (permalink / raw)
  To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs



On 2023/4/26 19:31, Filipe Manana wrote:
> On Wed, Apr 26, 2023 at 12:21 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2023/4/26 18:51, fdmanana@kernel.org wrote:
>>> From: Filipe Manana <fdmanana@suse.com>
>>>
>>> When trying to move keys from one node/leaf to another sibling node/leaf,
>>> if the sibling keys check fails we just print an error message with the
>>> last key of the left sibling and the first key of the right sibling.
>>> However it's also useful to print all the keys of each sibling, as it
>>> may provide some clues to what went wrong, which code path may be
>>> inserting keys in an incorrect order. So just do that, print the siblings
>>> with btrfs_print_tree(), as it works for both leaves and nodes.
>>>
>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>
>> Reviewed-by: Qu Wenruo <wqu@suse.com>
>>
>> However my concern is, printing two tree blocks may be a little too large.
>> Definitely not something can be capture by one screen.
> 
> What?
> If I check syslog/dmesg, I can certainly access hundreds, thousands of lines...
> 
> I suppose by "capture by one screen" you mean a screenshot?

Yep, mostly a phone shot of a physical monitor, which a lot of end users 
choose to use to report their initial trans abort.

E.g. 
https://lore.kernel.org/linux-btrfs/f057bdd1-bdd9-459f-b25f-6a2faa652499@betaapp.fastmail.com/

I guess the reason is, if the trans abort happens on the root fs, there 
will be no persistent log files anyway.
(Although it's still possible to pass the dmesg to another machine or go 
netconsole, but not everyone has such setup ready).

Thanks,
Qu
> 
>>
>> Although dumping single tree block is already too large for a single
>> screen, I don't have any better way...
>>
>> Thanks,
>> Qu
>>> ---
>>>    fs/btrfs/ctree.c | 4 ++++
>>>    1 file changed, 4 insertions(+)
>>>
>>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>>> index a0b97a6d075a..a061d7092369 100644
>>> --- a/fs/btrfs/ctree.c
>>> +++ b/fs/btrfs/ctree.c
>>> @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left,
>>>        }
>>>
>>>        if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
>>> +             btrfs_crit(left->fs_info, "left extent buffer:");
>>> +             btrfs_print_tree(left, false);
>>> +             btrfs_crit(left->fs_info, "right extent buffer:");
>>> +             btrfs_print_tree(right, false);
>>>                btrfs_crit(left->fs_info,
>>>    "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)",
>>>                           left_last.objectid, left_last.type,

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails
  2023-04-26 11:40       ` Qu Wenruo
@ 2023-04-26 11:44         ` Filipe Manana
  2023-04-26 11:50           ` Qu Wenruo
  0 siblings, 1 reply; 15+ messages in thread
From: Filipe Manana @ 2023-04-26 11:44 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

On Wed, Apr 26, 2023 at 12:40 PM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> On 2023/4/26 19:31, Filipe Manana wrote:
> > On Wed, Apr 26, 2023 at 12:21 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>
> >>
> >>
> >> On 2023/4/26 18:51, fdmanana@kernel.org wrote:
> >>> From: Filipe Manana <fdmanana@suse.com>
> >>>
> >>> When trying to move keys from one node/leaf to another sibling node/leaf,
> >>> if the sibling keys check fails we just print an error message with the
> >>> last key of the left sibling and the first key of the right sibling.
> >>> However it's also useful to print all the keys of each sibling, as it
> >>> may provide some clues to what went wrong, which code path may be
> >>> inserting keys in an incorrect order. So just do that, print the siblings
> >>> with btrfs_print_tree(), as it works for both leaves and nodes.
> >>>
> >>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> >>
> >> Reviewed-by: Qu Wenruo <wqu@suse.com>
> >>
> >> However my concern is, printing two tree blocks may be a little too large.
> >> Definitely not something can be capture by one screen.
> >
> > What?
> > If I check syslog/dmesg, I can certainly access hundreds, thousands of lines...
> >
> > I suppose by "capture by one screen" you mean a screenshot?
>
> Yep, mostly a phone shot of a physical monitor, which a lot of end users
> choose to use to report their initial trans abort.
>
> E.g.
> https://lore.kernel.org/linux-btrfs/f057bdd1-bdd9-459f-b25f-6a2faa652499@betaapp.fastmail.com/
>
> I guess the reason is, if the trans abort happens on the root fs, there
> will be no persistent log files anyway.
> (Although it's still possible to pass the dmesg to another machine or go
> netconsole, but not everyone has such setup ready).

So what?
I don't see how that invalidates printing extent buffers... Many users
are able to access dmesg/syslog
and paste what they get there... It's also useful for development
where we can certainly access everything...


>
> Thanks,
> Qu
> >
> >>
> >> Although dumping single tree block is already too large for a single
> >> screen, I don't have any better way...
> >>
> >> Thanks,
> >> Qu
> >>> ---
> >>>    fs/btrfs/ctree.c | 4 ++++
> >>>    1 file changed, 4 insertions(+)
> >>>
> >>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> >>> index a0b97a6d075a..a061d7092369 100644
> >>> --- a/fs/btrfs/ctree.c
> >>> +++ b/fs/btrfs/ctree.c
> >>> @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left,
> >>>        }
> >>>
> >>>        if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
> >>> +             btrfs_crit(left->fs_info, "left extent buffer:");
> >>> +             btrfs_print_tree(left, false);
> >>> +             btrfs_crit(left->fs_info, "right extent buffer:");
> >>> +             btrfs_print_tree(right, false);
> >>>                btrfs_crit(left->fs_info,
> >>>    "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)",
> >>>                           left_last.objectid, left_last.type,

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails
  2023-04-26 11:44         ` Filipe Manana
@ 2023-04-26 11:50           ` Qu Wenruo
  2023-04-26 12:00             ` Filipe Manana
  0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2023-04-26 11:50 UTC (permalink / raw)
  To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs



On 2023/4/26 19:44, Filipe Manana wrote:
> On Wed, Apr 26, 2023 at 12:40 PM Qu Wenruo <wqu@suse.com> wrote:
>>
>>
>>
>> On 2023/4/26 19:31, Filipe Manana wrote:
>>> On Wed, Apr 26, 2023 at 12:21 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2023/4/26 18:51, fdmanana@kernel.org wrote:
>>>>> From: Filipe Manana <fdmanana@suse.com>
>>>>>
>>>>> When trying to move keys from one node/leaf to another sibling node/leaf,
>>>>> if the sibling keys check fails we just print an error message with the
>>>>> last key of the left sibling and the first key of the right sibling.
>>>>> However it's also useful to print all the keys of each sibling, as it
>>>>> may provide some clues to what went wrong, which code path may be
>>>>> inserting keys in an incorrect order. So just do that, print the siblings
>>>>> with btrfs_print_tree(), as it works for both leaves and nodes.
>>>>>
>>>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>>>
>>>> Reviewed-by: Qu Wenruo <wqu@suse.com>
>>>>
>>>> However my concern is, printing two tree blocks may be a little too large.
>>>> Definitely not something can be capture by one screen.
>>>
>>> What?
>>> If I check syslog/dmesg, I can certainly access hundreds, thousands of lines...
>>>
>>> I suppose by "capture by one screen" you mean a screenshot?
>>
>> Yep, mostly a phone shot of a physical monitor, which a lot of end users
>> choose to use to report their initial trans abort.
>>
>> E.g.
>> https://lore.kernel.org/linux-btrfs/f057bdd1-bdd9-459f-b25f-6a2faa652499@betaapp.fastmail.com/
>>
>> I guess the reason is, if the trans abort happens on the root fs, there
>> will be no persistent log files anyway.
>> (Although it's still possible to pass the dmesg to another machine or go
>> netconsole, but not everyone has such setup ready).
> 
> So what?
> I don't see how that invalidates printing extent buffers... Many users
> are able to access dmesg/syslog
> and paste what they get there... It's also useful for development
> where we can certainly access everything...

That's why I gave the reviewed-by tag.

Maybe we can shrink the output to the first several and last several 
items in the future.

But for now since it's just a trans abort, it should be mostly fine for 
the end uesrs to access the full dump.

Thanks,
Qu
> 
> 
>>
>> Thanks,
>> Qu
>>>
>>>>
>>>> Although dumping single tree block is already too large for a single
>>>> screen, I don't have any better way...
>>>>
>>>> Thanks,
>>>> Qu
>>>>> ---
>>>>>     fs/btrfs/ctree.c | 4 ++++
>>>>>     1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>>>>> index a0b97a6d075a..a061d7092369 100644
>>>>> --- a/fs/btrfs/ctree.c
>>>>> +++ b/fs/btrfs/ctree.c
>>>>> @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left,
>>>>>         }
>>>>>
>>>>>         if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
>>>>> +             btrfs_crit(left->fs_info, "left extent buffer:");
>>>>> +             btrfs_print_tree(left, false);
>>>>> +             btrfs_crit(left->fs_info, "right extent buffer:");
>>>>> +             btrfs_print_tree(right, false);
>>>>>                 btrfs_crit(left->fs_info,
>>>>>     "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)",
>>>>>                            left_last.objectid, left_last.type,

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails
  2023-04-26 11:50           ` Qu Wenruo
@ 2023-04-26 12:00             ` Filipe Manana
  0 siblings, 0 replies; 15+ messages in thread
From: Filipe Manana @ 2023-04-26 12:00 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

On Wed, Apr 26, 2023 at 12:50 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2023/4/26 19:44, Filipe Manana wrote:
> > On Wed, Apr 26, 2023 at 12:40 PM Qu Wenruo <wqu@suse.com> wrote:
> >>
> >>
> >>
> >> On 2023/4/26 19:31, Filipe Manana wrote:
> >>> On Wed, Apr 26, 2023 at 12:21 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2023/4/26 18:51, fdmanana@kernel.org wrote:
> >>>>> From: Filipe Manana <fdmanana@suse.com>
> >>>>>
> >>>>> When trying to move keys from one node/leaf to another sibling node/leaf,
> >>>>> if the sibling keys check fails we just print an error message with the
> >>>>> last key of the left sibling and the first key of the right sibling.
> >>>>> However it's also useful to print all the keys of each sibling, as it
> >>>>> may provide some clues to what went wrong, which code path may be
> >>>>> inserting keys in an incorrect order. So just do that, print the siblings
> >>>>> with btrfs_print_tree(), as it works for both leaves and nodes.
> >>>>>
> >>>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> >>>>
> >>>> Reviewed-by: Qu Wenruo <wqu@suse.com>
> >>>>
> >>>> However my concern is, printing two tree blocks may be a little too large.
> >>>> Definitely not something can be capture by one screen.
> >>>
> >>> What?
> >>> If I check syslog/dmesg, I can certainly access hundreds, thousands of lines...
> >>>
> >>> I suppose by "capture by one screen" you mean a screenshot?
> >>
> >> Yep, mostly a phone shot of a physical monitor, which a lot of end users
> >> choose to use to report their initial trans abort.
> >>
> >> E.g.
> >> https://lore.kernel.org/linux-btrfs/f057bdd1-bdd9-459f-b25f-6a2faa652499@betaapp.fastmail.com/
> >>
> >> I guess the reason is, if the trans abort happens on the root fs, there
> >> will be no persistent log files anyway.
> >> (Although it's still possible to pass the dmesg to another machine or go
> >> netconsole, but not everyone has such setup ready).
> >
> > So what?
> > I don't see how that invalidates printing extent buffers... Many users
> > are able to access dmesg/syslog
> > and paste what they get there... It's also useful for development
> > where we can certainly access everything...
>
> That's why I gave the reviewed-by tag.
>
> Maybe we can shrink the output to the first several and last several
> items in the future.

That's not really useful for most cases, and I've certainly run into key
ordering bugs in the past where having an entire leaf printed made the
problem easier to debug,
sometimes it made it immediately obvious where the bug happened and why.

We are avoiding printing useful stuff just because we assume people will take
a photo from their phone... For those with access to dmesg/syslog, we
lose useful
information.

>
> But for now since it's just a trans abort, it should be mostly fine for
> the end uesrs to access the full dump.
>
> Thanks,
> Qu
> >
> >
> >>
> >> Thanks,
> >> Qu
> >>>
> >>>>
> >>>> Although dumping single tree block is already too large for a single
> >>>> screen, I don't have any better way...
> >>>>
> >>>> Thanks,
> >>>> Qu
> >>>>> ---
> >>>>>     fs/btrfs/ctree.c | 4 ++++
> >>>>>     1 file changed, 4 insertions(+)
> >>>>>
> >>>>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> >>>>> index a0b97a6d075a..a061d7092369 100644
> >>>>> --- a/fs/btrfs/ctree.c
> >>>>> +++ b/fs/btrfs/ctree.c
> >>>>> @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left,
> >>>>>         }
> >>>>>
> >>>>>         if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
> >>>>> +             btrfs_crit(left->fs_info, "left extent buffer:");
> >>>>> +             btrfs_print_tree(left, false);
> >>>>> +             btrfs_crit(left->fs_info, "right extent buffer:");
> >>>>> +             btrfs_print_tree(right, false);
> >>>>>                 btrfs_crit(left->fs_info,
> >>>>>     "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)",
> >>>>>                            left_last.objectid, left_last.type,

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails
  2023-04-26 11:20   ` Qu Wenruo
  2023-04-26 11:31     ` Filipe Manana
@ 2023-04-27 23:00     ` David Sterba
  1 sibling, 0 replies; 15+ messages in thread
From: David Sterba @ 2023-04-27 23:00 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: fdmanana, linux-btrfs

On Wed, Apr 26, 2023 at 07:20:57PM +0800, Qu Wenruo wrote:
> On 2023/4/26 18:51, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> > 
> > When trying to move keys from one node/leaf to another sibling node/leaf,
> > if the sibling keys check fails we just print an error message with the
> > last key of the left sibling and the first key of the right sibling.
> > However it's also useful to print all the keys of each sibling, as it
> > may provide some clues to what went wrong, which code path may be
> > inserting keys in an incorrect order. So just do that, print the siblings
> > with btrfs_print_tree(), as it works for both leaves and nodes.
> > 
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> 
> Reviewed-by: Qu Wenruo <wqu@suse.com>
> 
> However my concern is, printing two tree blocks may be a little too large.
> Definitely not something can be capture by one screen.
> 
> Although dumping single tree block is already too large for a single 
> screen, I don't have any better way...

For debugging with full logging we can add what we think would be
useful but for the screenshot debugging we can only choose the last
piece of information which I think is the stack trace.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] btrfs: tag as unlikely the key comparison when checking sibling keys
  2023-04-26 10:51 ` [PATCH 3/3] btrfs: tag as unlikely the key comparison when checking sibling keys fdmanana
  2023-04-26 11:21   ` Qu Wenruo
@ 2023-04-27 23:13   ` David Sterba
  1 sibling, 0 replies; 15+ messages in thread
From: David Sterba @ 2023-04-27 23:13 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Wed, Apr 26, 2023 at 11:51:37AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When checking siblings keys, before moving keys from one node/leaf to a
> sibling node/leaf, it's very unexpected to have the last key of the left
> sibling greater than or equals to the first key of the right sibling, as
> that means we have a (serious) corruption that breaks the key ordering
> properties of a b+tree. Since this is unexpected, surround the comparison
> with the unlikely macro, which helps the compiler generate better code
> for the most expected case (no existing b+tree corruption). This is also
> what we do for other unexpected cases of invalid key ordering (like at
> btrfs_set_item_key_safe()).
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>  fs/btrfs/ctree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index a061d7092369..c315fb793b30 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2707,7 +2707,7 @@ static bool check_sibling_keys(struct extent_buffer *left,
>  		btrfs_item_key_to_cpu(right, &right_first, 0);
>  	}
>  
> -	if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
> +	if (unlikely(btrfs_comp_cpu_keys(&left_last, &right_first) >= 0)) {
>  		btrfs_crit(left->fs_info, "left extent buffer:");
>  		btrfs_print_tree(left, false);
>  		btrfs_crit(left->fs_info, "right extent buffer:");

Yeah, this is the pattern for unlikely(), compiler unfortunately does
not recognize the cold functions like printk in the statement block so
the manual annotations are needed. There's a minor effect on assembly,
some of the instructions are reordered.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] btrfs: improve sibling keys check failure report
  2023-04-26 10:51 [PATCH 0/3] btrfs: improve sibling keys check failure report fdmanana
                   ` (2 preceding siblings ...)
  2023-04-26 10:51 ` [PATCH 3/3] btrfs: tag as unlikely the key comparison when checking sibling keys fdmanana
@ 2023-04-27 23:13 ` David Sterba
  3 siblings, 0 replies; 15+ messages in thread
From: David Sterba @ 2023-04-27 23:13 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Wed, Apr 26, 2023 at 11:51:34AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> This makes reporting of sibling key check failures more useful, by
> printing more exact stack traces in case the extent buffers are leaves
> and dumping the extent buffers. This is motivated by a recent report [1].
> More details on the changelogs of each patch.
> 
> [1] https://lore.kernel.org/linux-btrfs/20230423222213.5391.409509F4@e16-tech.com/
> 
> Filipe Manana (3):
>   btrfs: abort transaction when sibling keys check fails for leaves
>   btrfs: print extent buffers when sibling keys check fails
>   btrfs: tag as unlikely the key comparison when checking sibling keys

Added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-04-27 23:14 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-26 10:51 [PATCH 0/3] btrfs: improve sibling keys check failure report fdmanana
2023-04-26 10:51 ` [PATCH 1/3] btrfs: abort transaction when sibling keys check fails for leaves fdmanana
2023-04-26 11:18   ` Qu Wenruo
2023-04-26 10:51 ` [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails fdmanana
2023-04-26 11:20   ` Qu Wenruo
2023-04-26 11:31     ` Filipe Manana
2023-04-26 11:40       ` Qu Wenruo
2023-04-26 11:44         ` Filipe Manana
2023-04-26 11:50           ` Qu Wenruo
2023-04-26 12:00             ` Filipe Manana
2023-04-27 23:00     ` David Sterba
2023-04-26 10:51 ` [PATCH 3/3] btrfs: tag as unlikely the key comparison when checking sibling keys fdmanana
2023-04-26 11:21   ` Qu Wenruo
2023-04-27 23:13   ` David Sterba
2023-04-27 23:13 ` [PATCH 0/3] btrfs: improve sibling keys check failure report David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox