* [PATCH 0/3] btrfs: improve sibling keys check failure report
@ 2023-04-26 10:51 fdmanana
2023-04-26 10:51 ` [PATCH 1/3] btrfs: abort transaction when sibling keys check fails for leaves fdmanana
` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread
From: fdmanana @ 2023-04-26 10:51 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
This makes reporting of sibling key check failures more useful, by
printing more exact stack traces in case the extent buffers are leaves
and dumping the extent buffers. This is motivated by a recent report [1].
More details on the changelogs of each patch.
[1] https://lore.kernel.org/linux-btrfs/20230423222213.5391.409509F4@e16-tech.com/
Filipe Manana (3):
btrfs: abort transaction when sibling keys check fails for leaves
btrfs: print extent buffers when sibling keys check fails
btrfs: tag as unlikely the key comparison when checking sibling keys
fs/btrfs/ctree.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--
2.34.1
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH 1/3] btrfs: abort transaction when sibling keys check fails for leaves 2023-04-26 10:51 [PATCH 0/3] btrfs: improve sibling keys check failure report fdmanana @ 2023-04-26 10:51 ` fdmanana 2023-04-26 11:18 ` Qu Wenruo 2023-04-26 10:51 ` [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails fdmanana ` (2 subsequent siblings) 3 siblings, 1 reply; 15+ messages in thread From: fdmanana @ 2023-04-26 10:51 UTC (permalink / raw) To: linux-btrfs From: Filipe Manana <fdmanana@suse.com> If the sibling keys check fails before we move keys from one sibling leaf to another, we are not aborting the transaction - we leave that to some higher level caller of btrfs_search_slot() (or anything else that uses it to insert items into a b+tree). This means that the transaction abort will provide a stack trace that omits the b+tree modification call chain. So change this to immediately abort the transaction and therefore get a more useful stack trace that shows us the call chain in the bt+tree modification code. It's also important to immediately abort the transaction just in case some higher level caller is not doing it, as this indicates a very serious corruption and we should stop the possibility of doing further damage. Signed-off-by: Filipe Manana <fdmanana@suse.com> --- fs/btrfs/ctree.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index d94429e0f16a..a0b97a6d075a 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -3296,6 +3296,7 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root if (check_sibling_keys(left, right)) { ret = -EUCLEAN; + btrfs_abort_transaction(trans, ret); btrfs_tree_unlock(right); free_extent_buffer(right); return ret; @@ -3514,6 +3515,7 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root if (check_sibling_keys(left, right)) { ret = -EUCLEAN; + btrfs_abort_transaction(trans, ret); goto out; } return __push_leaf_left(trans, path, min_data_size, empty, left, -- 2.34.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] btrfs: abort transaction when sibling keys check fails for leaves 2023-04-26 10:51 ` [PATCH 1/3] btrfs: abort transaction when sibling keys check fails for leaves fdmanana @ 2023-04-26 11:18 ` Qu Wenruo 0 siblings, 0 replies; 15+ messages in thread From: Qu Wenruo @ 2023-04-26 11:18 UTC (permalink / raw) To: fdmanana, linux-btrfs On 2023/4/26 18:51, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > If the sibling keys check fails before we move keys from one sibling > leaf to another, we are not aborting the transaction - we leave that to > some higher level caller of btrfs_search_slot() (or anything else that > uses it to insert items into a b+tree). > > This means that the transaction abort will provide a stack trace that > omits the b+tree modification call chain. So change this to immediately > abort the transaction and therefore get a more useful stack trace that > shows us the call chain in the bt+tree modification code. > > It's also important to immediately abort the transaction just in case > some higher level caller is not doing it, as this indicates a very > serious corruption and we should stop the possibility of doing further > damage. > > Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Thanks, Qu > --- > fs/btrfs/ctree.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c > index d94429e0f16a..a0b97a6d075a 100644 > --- a/fs/btrfs/ctree.c > +++ b/fs/btrfs/ctree.c > @@ -3296,6 +3296,7 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root > > if (check_sibling_keys(left, right)) { > ret = -EUCLEAN; > + btrfs_abort_transaction(trans, ret); > btrfs_tree_unlock(right); > free_extent_buffer(right); > return ret; > @@ -3514,6 +3515,7 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root > > if (check_sibling_keys(left, right)) { > ret = -EUCLEAN; > + btrfs_abort_transaction(trans, ret); > goto out; > } > return __push_leaf_left(trans, path, min_data_size, empty, left, ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails 2023-04-26 10:51 [PATCH 0/3] btrfs: improve sibling keys check failure report fdmanana 2023-04-26 10:51 ` [PATCH 1/3] btrfs: abort transaction when sibling keys check fails for leaves fdmanana @ 2023-04-26 10:51 ` fdmanana 2023-04-26 11:20 ` Qu Wenruo 2023-04-26 10:51 ` [PATCH 3/3] btrfs: tag as unlikely the key comparison when checking sibling keys fdmanana 2023-04-27 23:13 ` [PATCH 0/3] btrfs: improve sibling keys check failure report David Sterba 3 siblings, 1 reply; 15+ messages in thread From: fdmanana @ 2023-04-26 10:51 UTC (permalink / raw) To: linux-btrfs From: Filipe Manana <fdmanana@suse.com> When trying to move keys from one node/leaf to another sibling node/leaf, if the sibling keys check fails we just print an error message with the last key of the left sibling and the first key of the right sibling. However it's also useful to print all the keys of each sibling, as it may provide some clues to what went wrong, which code path may be inserting keys in an incorrect order. So just do that, print the siblings with btrfs_print_tree(), as it works for both leaves and nodes. Signed-off-by: Filipe Manana <fdmanana@suse.com> --- fs/btrfs/ctree.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index a0b97a6d075a..a061d7092369 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left, } if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) { + btrfs_crit(left->fs_info, "left extent buffer:"); + btrfs_print_tree(left, false); + btrfs_crit(left->fs_info, "right extent buffer:"); + btrfs_print_tree(right, false); btrfs_crit(left->fs_info, "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)", left_last.objectid, left_last.type, -- 2.34.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails 2023-04-26 10:51 ` [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails fdmanana @ 2023-04-26 11:20 ` Qu Wenruo 2023-04-26 11:31 ` Filipe Manana 2023-04-27 23:00 ` David Sterba 0 siblings, 2 replies; 15+ messages in thread From: Qu Wenruo @ 2023-04-26 11:20 UTC (permalink / raw) To: fdmanana, linux-btrfs On 2023/4/26 18:51, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > When trying to move keys from one node/leaf to another sibling node/leaf, > if the sibling keys check fails we just print an error message with the > last key of the left sibling and the first key of the right sibling. > However it's also useful to print all the keys of each sibling, as it > may provide some clues to what went wrong, which code path may be > inserting keys in an incorrect order. So just do that, print the siblings > with btrfs_print_tree(), as it works for both leaves and nodes. > > Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> However my concern is, printing two tree blocks may be a little too large. Definitely not something can be capture by one screen. Although dumping single tree block is already too large for a single screen, I don't have any better way... Thanks, Qu > --- > fs/btrfs/ctree.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c > index a0b97a6d075a..a061d7092369 100644 > --- a/fs/btrfs/ctree.c > +++ b/fs/btrfs/ctree.c > @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left, > } > > if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) { > + btrfs_crit(left->fs_info, "left extent buffer:"); > + btrfs_print_tree(left, false); > + btrfs_crit(left->fs_info, "right extent buffer:"); > + btrfs_print_tree(right, false); > btrfs_crit(left->fs_info, > "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)", > left_last.objectid, left_last.type, ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails 2023-04-26 11:20 ` Qu Wenruo @ 2023-04-26 11:31 ` Filipe Manana 2023-04-26 11:40 ` Qu Wenruo 2023-04-27 23:00 ` David Sterba 1 sibling, 1 reply; 15+ messages in thread From: Filipe Manana @ 2023-04-26 11:31 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Wed, Apr 26, 2023 at 12:21 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > > > On 2023/4/26 18:51, fdmanana@kernel.org wrote: > > From: Filipe Manana <fdmanana@suse.com> > > > > When trying to move keys from one node/leaf to another sibling node/leaf, > > if the sibling keys check fails we just print an error message with the > > last key of the left sibling and the first key of the right sibling. > > However it's also useful to print all the keys of each sibling, as it > > may provide some clues to what went wrong, which code path may be > > inserting keys in an incorrect order. So just do that, print the siblings > > with btrfs_print_tree(), as it works for both leaves and nodes. > > > > Signed-off-by: Filipe Manana <fdmanana@suse.com> > > Reviewed-by: Qu Wenruo <wqu@suse.com> > > However my concern is, printing two tree blocks may be a little too large. > Definitely not something can be capture by one screen. What? If I check syslog/dmesg, I can certainly access hundreds, thousands of lines... I suppose by "capture by one screen" you mean a screenshot? > > Although dumping single tree block is already too large for a single > screen, I don't have any better way... > > Thanks, > Qu > > --- > > fs/btrfs/ctree.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c > > index a0b97a6d075a..a061d7092369 100644 > > --- a/fs/btrfs/ctree.c > > +++ b/fs/btrfs/ctree.c > > @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left, > > } > > > > if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) { > > + btrfs_crit(left->fs_info, "left extent buffer:"); > > + btrfs_print_tree(left, false); > > + btrfs_crit(left->fs_info, "right extent buffer:"); > > + btrfs_print_tree(right, false); > > btrfs_crit(left->fs_info, > > "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)", > > left_last.objectid, left_last.type, ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails 2023-04-26 11:31 ` Filipe Manana @ 2023-04-26 11:40 ` Qu Wenruo 2023-04-26 11:44 ` Filipe Manana 0 siblings, 1 reply; 15+ messages in thread From: Qu Wenruo @ 2023-04-26 11:40 UTC (permalink / raw) To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs On 2023/4/26 19:31, Filipe Manana wrote: > On Wed, Apr 26, 2023 at 12:21 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >> >> >> >> On 2023/4/26 18:51, fdmanana@kernel.org wrote: >>> From: Filipe Manana <fdmanana@suse.com> >>> >>> When trying to move keys from one node/leaf to another sibling node/leaf, >>> if the sibling keys check fails we just print an error message with the >>> last key of the left sibling and the first key of the right sibling. >>> However it's also useful to print all the keys of each sibling, as it >>> may provide some clues to what went wrong, which code path may be >>> inserting keys in an incorrect order. So just do that, print the siblings >>> with btrfs_print_tree(), as it works for both leaves and nodes. >>> >>> Signed-off-by: Filipe Manana <fdmanana@suse.com> >> >> Reviewed-by: Qu Wenruo <wqu@suse.com> >> >> However my concern is, printing two tree blocks may be a little too large. >> Definitely not something can be capture by one screen. > > What? > If I check syslog/dmesg, I can certainly access hundreds, thousands of lines... > > I suppose by "capture by one screen" you mean a screenshot? Yep, mostly a phone shot of a physical monitor, which a lot of end users choose to use to report their initial trans abort. E.g. https://lore.kernel.org/linux-btrfs/f057bdd1-bdd9-459f-b25f-6a2faa652499@betaapp.fastmail.com/ I guess the reason is, if the trans abort happens on the root fs, there will be no persistent log files anyway. (Although it's still possible to pass the dmesg to another machine or go netconsole, but not everyone has such setup ready). Thanks, Qu > >> >> Although dumping single tree block is already too large for a single >> screen, I don't have any better way... >> >> Thanks, >> Qu >>> --- >>> fs/btrfs/ctree.c | 4 ++++ >>> 1 file changed, 4 insertions(+) >>> >>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c >>> index a0b97a6d075a..a061d7092369 100644 >>> --- a/fs/btrfs/ctree.c >>> +++ b/fs/btrfs/ctree.c >>> @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left, >>> } >>> >>> if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) { >>> + btrfs_crit(left->fs_info, "left extent buffer:"); >>> + btrfs_print_tree(left, false); >>> + btrfs_crit(left->fs_info, "right extent buffer:"); >>> + btrfs_print_tree(right, false); >>> btrfs_crit(left->fs_info, >>> "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)", >>> left_last.objectid, left_last.type, ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails 2023-04-26 11:40 ` Qu Wenruo @ 2023-04-26 11:44 ` Filipe Manana 2023-04-26 11:50 ` Qu Wenruo 0 siblings, 1 reply; 15+ messages in thread From: Filipe Manana @ 2023-04-26 11:44 UTC (permalink / raw) To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs On Wed, Apr 26, 2023 at 12:40 PM Qu Wenruo <wqu@suse.com> wrote: > > > > On 2023/4/26 19:31, Filipe Manana wrote: > > On Wed, Apr 26, 2023 at 12:21 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > >> > >> > >> > >> On 2023/4/26 18:51, fdmanana@kernel.org wrote: > >>> From: Filipe Manana <fdmanana@suse.com> > >>> > >>> When trying to move keys from one node/leaf to another sibling node/leaf, > >>> if the sibling keys check fails we just print an error message with the > >>> last key of the left sibling and the first key of the right sibling. > >>> However it's also useful to print all the keys of each sibling, as it > >>> may provide some clues to what went wrong, which code path may be > >>> inserting keys in an incorrect order. So just do that, print the siblings > >>> with btrfs_print_tree(), as it works for both leaves and nodes. > >>> > >>> Signed-off-by: Filipe Manana <fdmanana@suse.com> > >> > >> Reviewed-by: Qu Wenruo <wqu@suse.com> > >> > >> However my concern is, printing two tree blocks may be a little too large. > >> Definitely not something can be capture by one screen. > > > > What? > > If I check syslog/dmesg, I can certainly access hundreds, thousands of lines... > > > > I suppose by "capture by one screen" you mean a screenshot? > > Yep, mostly a phone shot of a physical monitor, which a lot of end users > choose to use to report their initial trans abort. > > E.g. > https://lore.kernel.org/linux-btrfs/f057bdd1-bdd9-459f-b25f-6a2faa652499@betaapp.fastmail.com/ > > I guess the reason is, if the trans abort happens on the root fs, there > will be no persistent log files anyway. > (Although it's still possible to pass the dmesg to another machine or go > netconsole, but not everyone has such setup ready). So what? I don't see how that invalidates printing extent buffers... Many users are able to access dmesg/syslog and paste what they get there... It's also useful for development where we can certainly access everything... > > Thanks, > Qu > > > >> > >> Although dumping single tree block is already too large for a single > >> screen, I don't have any better way... > >> > >> Thanks, > >> Qu > >>> --- > >>> fs/btrfs/ctree.c | 4 ++++ > >>> 1 file changed, 4 insertions(+) > >>> > >>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c > >>> index a0b97a6d075a..a061d7092369 100644 > >>> --- a/fs/btrfs/ctree.c > >>> +++ b/fs/btrfs/ctree.c > >>> @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left, > >>> } > >>> > >>> if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) { > >>> + btrfs_crit(left->fs_info, "left extent buffer:"); > >>> + btrfs_print_tree(left, false); > >>> + btrfs_crit(left->fs_info, "right extent buffer:"); > >>> + btrfs_print_tree(right, false); > >>> btrfs_crit(left->fs_info, > >>> "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)", > >>> left_last.objectid, left_last.type, ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails 2023-04-26 11:44 ` Filipe Manana @ 2023-04-26 11:50 ` Qu Wenruo 2023-04-26 12:00 ` Filipe Manana 0 siblings, 1 reply; 15+ messages in thread From: Qu Wenruo @ 2023-04-26 11:50 UTC (permalink / raw) To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs On 2023/4/26 19:44, Filipe Manana wrote: > On Wed, Apr 26, 2023 at 12:40 PM Qu Wenruo <wqu@suse.com> wrote: >> >> >> >> On 2023/4/26 19:31, Filipe Manana wrote: >>> On Wed, Apr 26, 2023 at 12:21 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>>> >>>> >>>> >>>> On 2023/4/26 18:51, fdmanana@kernel.org wrote: >>>>> From: Filipe Manana <fdmanana@suse.com> >>>>> >>>>> When trying to move keys from one node/leaf to another sibling node/leaf, >>>>> if the sibling keys check fails we just print an error message with the >>>>> last key of the left sibling and the first key of the right sibling. >>>>> However it's also useful to print all the keys of each sibling, as it >>>>> may provide some clues to what went wrong, which code path may be >>>>> inserting keys in an incorrect order. So just do that, print the siblings >>>>> with btrfs_print_tree(), as it works for both leaves and nodes. >>>>> >>>>> Signed-off-by: Filipe Manana <fdmanana@suse.com> >>>> >>>> Reviewed-by: Qu Wenruo <wqu@suse.com> >>>> >>>> However my concern is, printing two tree blocks may be a little too large. >>>> Definitely not something can be capture by one screen. >>> >>> What? >>> If I check syslog/dmesg, I can certainly access hundreds, thousands of lines... >>> >>> I suppose by "capture by one screen" you mean a screenshot? >> >> Yep, mostly a phone shot of a physical monitor, which a lot of end users >> choose to use to report their initial trans abort. >> >> E.g. >> https://lore.kernel.org/linux-btrfs/f057bdd1-bdd9-459f-b25f-6a2faa652499@betaapp.fastmail.com/ >> >> I guess the reason is, if the trans abort happens on the root fs, there >> will be no persistent log files anyway. >> (Although it's still possible to pass the dmesg to another machine or go >> netconsole, but not everyone has such setup ready). > > So what? > I don't see how that invalidates printing extent buffers... Many users > are able to access dmesg/syslog > and paste what they get there... It's also useful for development > where we can certainly access everything... That's why I gave the reviewed-by tag. Maybe we can shrink the output to the first several and last several items in the future. But for now since it's just a trans abort, it should be mostly fine for the end uesrs to access the full dump. Thanks, Qu > > >> >> Thanks, >> Qu >>> >>>> >>>> Although dumping single tree block is already too large for a single >>>> screen, I don't have any better way... >>>> >>>> Thanks, >>>> Qu >>>>> --- >>>>> fs/btrfs/ctree.c | 4 ++++ >>>>> 1 file changed, 4 insertions(+) >>>>> >>>>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c >>>>> index a0b97a6d075a..a061d7092369 100644 >>>>> --- a/fs/btrfs/ctree.c >>>>> +++ b/fs/btrfs/ctree.c >>>>> @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left, >>>>> } >>>>> >>>>> if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) { >>>>> + btrfs_crit(left->fs_info, "left extent buffer:"); >>>>> + btrfs_print_tree(left, false); >>>>> + btrfs_crit(left->fs_info, "right extent buffer:"); >>>>> + btrfs_print_tree(right, false); >>>>> btrfs_crit(left->fs_info, >>>>> "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)", >>>>> left_last.objectid, left_last.type, ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails 2023-04-26 11:50 ` Qu Wenruo @ 2023-04-26 12:00 ` Filipe Manana 0 siblings, 0 replies; 15+ messages in thread From: Filipe Manana @ 2023-04-26 12:00 UTC (permalink / raw) To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs On Wed, Apr 26, 2023 at 12:50 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > > > On 2023/4/26 19:44, Filipe Manana wrote: > > On Wed, Apr 26, 2023 at 12:40 PM Qu Wenruo <wqu@suse.com> wrote: > >> > >> > >> > >> On 2023/4/26 19:31, Filipe Manana wrote: > >>> On Wed, Apr 26, 2023 at 12:21 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > >>>> > >>>> > >>>> > >>>> On 2023/4/26 18:51, fdmanana@kernel.org wrote: > >>>>> From: Filipe Manana <fdmanana@suse.com> > >>>>> > >>>>> When trying to move keys from one node/leaf to another sibling node/leaf, > >>>>> if the sibling keys check fails we just print an error message with the > >>>>> last key of the left sibling and the first key of the right sibling. > >>>>> However it's also useful to print all the keys of each sibling, as it > >>>>> may provide some clues to what went wrong, which code path may be > >>>>> inserting keys in an incorrect order. So just do that, print the siblings > >>>>> with btrfs_print_tree(), as it works for both leaves and nodes. > >>>>> > >>>>> Signed-off-by: Filipe Manana <fdmanana@suse.com> > >>>> > >>>> Reviewed-by: Qu Wenruo <wqu@suse.com> > >>>> > >>>> However my concern is, printing two tree blocks may be a little too large. > >>>> Definitely not something can be capture by one screen. > >>> > >>> What? > >>> If I check syslog/dmesg, I can certainly access hundreds, thousands of lines... > >>> > >>> I suppose by "capture by one screen" you mean a screenshot? > >> > >> Yep, mostly a phone shot of a physical monitor, which a lot of end users > >> choose to use to report their initial trans abort. > >> > >> E.g. > >> https://lore.kernel.org/linux-btrfs/f057bdd1-bdd9-459f-b25f-6a2faa652499@betaapp.fastmail.com/ > >> > >> I guess the reason is, if the trans abort happens on the root fs, there > >> will be no persistent log files anyway. > >> (Although it's still possible to pass the dmesg to another machine or go > >> netconsole, but not everyone has such setup ready). > > > > So what? > > I don't see how that invalidates printing extent buffers... Many users > > are able to access dmesg/syslog > > and paste what they get there... It's also useful for development > > where we can certainly access everything... > > That's why I gave the reviewed-by tag. > > Maybe we can shrink the output to the first several and last several > items in the future. That's not really useful for most cases, and I've certainly run into key ordering bugs in the past where having an entire leaf printed made the problem easier to debug, sometimes it made it immediately obvious where the bug happened and why. We are avoiding printing useful stuff just because we assume people will take a photo from their phone... For those with access to dmesg/syslog, we lose useful information. > > But for now since it's just a trans abort, it should be mostly fine for > the end uesrs to access the full dump. > > Thanks, > Qu > > > > > >> > >> Thanks, > >> Qu > >>> > >>>> > >>>> Although dumping single tree block is already too large for a single > >>>> screen, I don't have any better way... > >>>> > >>>> Thanks, > >>>> Qu > >>>>> --- > >>>>> fs/btrfs/ctree.c | 4 ++++ > >>>>> 1 file changed, 4 insertions(+) > >>>>> > >>>>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c > >>>>> index a0b97a6d075a..a061d7092369 100644 > >>>>> --- a/fs/btrfs/ctree.c > >>>>> +++ b/fs/btrfs/ctree.c > >>>>> @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left, > >>>>> } > >>>>> > >>>>> if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) { > >>>>> + btrfs_crit(left->fs_info, "left extent buffer:"); > >>>>> + btrfs_print_tree(left, false); > >>>>> + btrfs_crit(left->fs_info, "right extent buffer:"); > >>>>> + btrfs_print_tree(right, false); > >>>>> btrfs_crit(left->fs_info, > >>>>> "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)", > >>>>> left_last.objectid, left_last.type, ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails 2023-04-26 11:20 ` Qu Wenruo 2023-04-26 11:31 ` Filipe Manana @ 2023-04-27 23:00 ` David Sterba 1 sibling, 0 replies; 15+ messages in thread From: David Sterba @ 2023-04-27 23:00 UTC (permalink / raw) To: Qu Wenruo; +Cc: fdmanana, linux-btrfs On Wed, Apr 26, 2023 at 07:20:57PM +0800, Qu Wenruo wrote: > On 2023/4/26 18:51, fdmanana@kernel.org wrote: > > From: Filipe Manana <fdmanana@suse.com> > > > > When trying to move keys from one node/leaf to another sibling node/leaf, > > if the sibling keys check fails we just print an error message with the > > last key of the left sibling and the first key of the right sibling. > > However it's also useful to print all the keys of each sibling, as it > > may provide some clues to what went wrong, which code path may be > > inserting keys in an incorrect order. So just do that, print the siblings > > with btrfs_print_tree(), as it works for both leaves and nodes. > > > > Signed-off-by: Filipe Manana <fdmanana@suse.com> > > Reviewed-by: Qu Wenruo <wqu@suse.com> > > However my concern is, printing two tree blocks may be a little too large. > Definitely not something can be capture by one screen. > > Although dumping single tree block is already too large for a single > screen, I don't have any better way... For debugging with full logging we can add what we think would be useful but for the screenshot debugging we can only choose the last piece of information which I think is the stack trace. ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 3/3] btrfs: tag as unlikely the key comparison when checking sibling keys 2023-04-26 10:51 [PATCH 0/3] btrfs: improve sibling keys check failure report fdmanana 2023-04-26 10:51 ` [PATCH 1/3] btrfs: abort transaction when sibling keys check fails for leaves fdmanana 2023-04-26 10:51 ` [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails fdmanana @ 2023-04-26 10:51 ` fdmanana 2023-04-26 11:21 ` Qu Wenruo 2023-04-27 23:13 ` David Sterba 2023-04-27 23:13 ` [PATCH 0/3] btrfs: improve sibling keys check failure report David Sterba 3 siblings, 2 replies; 15+ messages in thread From: fdmanana @ 2023-04-26 10:51 UTC (permalink / raw) To: linux-btrfs From: Filipe Manana <fdmanana@suse.com> When checking siblings keys, before moving keys from one node/leaf to a sibling node/leaf, it's very unexpected to have the last key of the left sibling greater than or equals to the first key of the right sibling, as that means we have a (serious) corruption that breaks the key ordering properties of a b+tree. Since this is unexpected, surround the comparison with the unlikely macro, which helps the compiler generate better code for the most expected case (no existing b+tree corruption). This is also what we do for other unexpected cases of invalid key ordering (like at btrfs_set_item_key_safe()). Signed-off-by: Filipe Manana <fdmanana@suse.com> --- fs/btrfs/ctree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index a061d7092369..c315fb793b30 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -2707,7 +2707,7 @@ static bool check_sibling_keys(struct extent_buffer *left, btrfs_item_key_to_cpu(right, &right_first, 0); } - if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) { + if (unlikely(btrfs_comp_cpu_keys(&left_last, &right_first) >= 0)) { btrfs_crit(left->fs_info, "left extent buffer:"); btrfs_print_tree(left, false); btrfs_crit(left->fs_info, "right extent buffer:"); -- 2.34.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] btrfs: tag as unlikely the key comparison when checking sibling keys 2023-04-26 10:51 ` [PATCH 3/3] btrfs: tag as unlikely the key comparison when checking sibling keys fdmanana @ 2023-04-26 11:21 ` Qu Wenruo 2023-04-27 23:13 ` David Sterba 1 sibling, 0 replies; 15+ messages in thread From: Qu Wenruo @ 2023-04-26 11:21 UTC (permalink / raw) To: fdmanana, linux-btrfs On 2023/4/26 18:51, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > When checking siblings keys, before moving keys from one node/leaf to a > sibling node/leaf, it's very unexpected to have the last key of the left > sibling greater than or equals to the first key of the right sibling, as > that means we have a (serious) corruption that breaks the key ordering > properties of a b+tree. Since this is unexpected, surround the comparison > with the unlikely macro, which helps the compiler generate better code > for the most expected case (no existing b+tree corruption). This is also > what we do for other unexpected cases of invalid key ordering (like at > btrfs_set_item_key_safe()). > > Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Thanks, Qu > --- > fs/btrfs/ctree.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c > index a061d7092369..c315fb793b30 100644 > --- a/fs/btrfs/ctree.c > +++ b/fs/btrfs/ctree.c > @@ -2707,7 +2707,7 @@ static bool check_sibling_keys(struct extent_buffer *left, > btrfs_item_key_to_cpu(right, &right_first, 0); > } > > - if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) { > + if (unlikely(btrfs_comp_cpu_keys(&left_last, &right_first) >= 0)) { > btrfs_crit(left->fs_info, "left extent buffer:"); > btrfs_print_tree(left, false); > btrfs_crit(left->fs_info, "right extent buffer:"); ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] btrfs: tag as unlikely the key comparison when checking sibling keys 2023-04-26 10:51 ` [PATCH 3/3] btrfs: tag as unlikely the key comparison when checking sibling keys fdmanana 2023-04-26 11:21 ` Qu Wenruo @ 2023-04-27 23:13 ` David Sterba 1 sibling, 0 replies; 15+ messages in thread From: David Sterba @ 2023-04-27 23:13 UTC (permalink / raw) To: fdmanana; +Cc: linux-btrfs On Wed, Apr 26, 2023 at 11:51:37AM +0100, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > When checking siblings keys, before moving keys from one node/leaf to a > sibling node/leaf, it's very unexpected to have the last key of the left > sibling greater than or equals to the first key of the right sibling, as > that means we have a (serious) corruption that breaks the key ordering > properties of a b+tree. Since this is unexpected, surround the comparison > with the unlikely macro, which helps the compiler generate better code > for the most expected case (no existing b+tree corruption). This is also > what we do for other unexpected cases of invalid key ordering (like at > btrfs_set_item_key_safe()). > > Signed-off-by: Filipe Manana <fdmanana@suse.com> > --- > fs/btrfs/ctree.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c > index a061d7092369..c315fb793b30 100644 > --- a/fs/btrfs/ctree.c > +++ b/fs/btrfs/ctree.c > @@ -2707,7 +2707,7 @@ static bool check_sibling_keys(struct extent_buffer *left, > btrfs_item_key_to_cpu(right, &right_first, 0); > } > > - if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) { > + if (unlikely(btrfs_comp_cpu_keys(&left_last, &right_first) >= 0)) { > btrfs_crit(left->fs_info, "left extent buffer:"); > btrfs_print_tree(left, false); > btrfs_crit(left->fs_info, "right extent buffer:"); Yeah, this is the pattern for unlikely(), compiler unfortunately does not recognize the cold functions like printk in the statement block so the manual annotations are needed. There's a minor effect on assembly, some of the instructions are reordered. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] btrfs: improve sibling keys check failure report 2023-04-26 10:51 [PATCH 0/3] btrfs: improve sibling keys check failure report fdmanana ` (2 preceding siblings ...) 2023-04-26 10:51 ` [PATCH 3/3] btrfs: tag as unlikely the key comparison when checking sibling keys fdmanana @ 2023-04-27 23:13 ` David Sterba 3 siblings, 0 replies; 15+ messages in thread From: David Sterba @ 2023-04-27 23:13 UTC (permalink / raw) To: fdmanana; +Cc: linux-btrfs On Wed, Apr 26, 2023 at 11:51:34AM +0100, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > This makes reporting of sibling key check failures more useful, by > printing more exact stack traces in case the extent buffers are leaves > and dumping the extent buffers. This is motivated by a recent report [1]. > More details on the changelogs of each patch. > > [1] https://lore.kernel.org/linux-btrfs/20230423222213.5391.409509F4@e16-tech.com/ > > Filipe Manana (3): > btrfs: abort transaction when sibling keys check fails for leaves > btrfs: print extent buffers when sibling keys check fails > btrfs: tag as unlikely the key comparison when checking sibling keys Added to misc-next, thanks. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2023-04-27 23:14 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-04-26 10:51 [PATCH 0/3] btrfs: improve sibling keys check failure report fdmanana 2023-04-26 10:51 ` [PATCH 1/3] btrfs: abort transaction when sibling keys check fails for leaves fdmanana 2023-04-26 11:18 ` Qu Wenruo 2023-04-26 10:51 ` [PATCH 2/3] btrfs: print extent buffers when sibling keys check fails fdmanana 2023-04-26 11:20 ` Qu Wenruo 2023-04-26 11:31 ` Filipe Manana 2023-04-26 11:40 ` Qu Wenruo 2023-04-26 11:44 ` Filipe Manana 2023-04-26 11:50 ` Qu Wenruo 2023-04-26 12:00 ` Filipe Manana 2023-04-27 23:00 ` David Sterba 2023-04-26 10:51 ` [PATCH 3/3] btrfs: tag as unlikely the key comparison when checking sibling keys fdmanana 2023-04-26 11:21 ` Qu Wenruo 2023-04-27 23:13 ` David Sterba 2023-04-27 23:13 ` [PATCH 0/3] btrfs: improve sibling keys check failure report David Sterba
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox