From: Boris Burkov <boris@bur.io>
To: Filipe Manana <fdmanana@kernel.org>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 1/2] btrfs: warn on invalid slot in tree mod log rewind
Date: Wed, 31 May 2023 07:51:25 -0700 [thread overview]
Message-ID: <20230531145125.GA2249308@zen> (raw)
In-Reply-To: <CAL3q7H6Fage2sJLY6JEdzPVJGogOqHWWNnbpRL2gU3o24x9F6w@mail.gmail.com>
On Wed, May 31, 2023 at 01:37:32PM +0100, Filipe Manana wrote:
> On Tue, May 30, 2023 at 8:52 PM Boris Burkov <boris@bur.io> wrote:
> >
> > The way that tree mod log tracks the ultimate length of the eb, the
> > variable 'n', eventually turns up the correct value, but at intermediate
> > steps during the rewind, n can be inaccurate as a representation of the
> > end of the eb. For example, it doesn't get updated on move rewinds, and
> > it does get updated for add/remove in the middle of the eb.
> >
> > To detect cases with invalid moves, introduce a separate variable called
> > max_slot which tries to track the maximum valid slot in the rewind eb.
> > We can then warn if we do a move whose src range goes beyond the max
> > valid slot.
> >
> > There is a commented caveat that it is possible to have this value be an
> > overestimate due to the challenge of properly handling 'add' operations
> > in the middle of the eb, but in practice it doesn't cause enough of a
> > problem to throw out the max idea in favor of tracking every valid slot.
> >
> > Signed-off-by: Boris Burkov <boris@bur.io>
> > ---
> > fs/btrfs/tree-mod-log.c | 19 +++++++++++++++++++
> > 1 file changed, 19 insertions(+)
> >
> > diff --git a/fs/btrfs/tree-mod-log.c b/fs/btrfs/tree-mod-log.c
> > index a555baa0143a..1b1ae9ce9d1e 100644
> > --- a/fs/btrfs/tree-mod-log.c
> > +++ b/fs/btrfs/tree-mod-log.c
> > @@ -664,8 +664,10 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
> > unsigned long o_dst;
> > unsigned long o_src;
> > unsigned long p_size = sizeof(struct btrfs_key_ptr);
> > + u32 max_slot;
> >
> > n = btrfs_header_nritems(eb);
> > + max_slot = n - 1;
> > read_lock(&fs_info->tree_mod_log_lock);
> > while (tm && tm->seq >= time_seq) {
> > /*
> > @@ -684,6 +686,8 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
> > btrfs_set_node_ptr_generation(eb, tm->slot,
> > tm->generation);
> > n++;
> > + if (max_slot == (u32)-1 || tm->slot > max_slot)
> > + max_slot = tm->slot;
> > break;
> > case BTRFS_MOD_LOG_KEY_REPLACE:
> > BUG_ON(tm->slot >= n);
> > @@ -693,6 +697,15 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
> > tm->generation);
> > break;
> > case BTRFS_MOD_LOG_KEY_ADD:
> > + /*
> > + * It is possible we could have already removed keys behind the known
> > + * max slot, so this will be an overestimate. In practice, the copy
> > + * operation inserts them in increasing order, and overestimating just
> > + * means we miss some warnings, so it's OK. It isn't worth carefully
> > + * tracking the full array of valid slots to check against when moving.
> > + */
> > + if (tm->slot == max_slot)
> > + max_slot--;
> > /* if a move operation is needed it's in the log */
> > n--;
> > break;
> > @@ -701,6 +714,12 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
> > o_src = btrfs_node_key_ptr_offset(eb, tm->move.dst_slot);
> > memmove_extent_buffer(eb, o_dst, o_src,
> > tm->move.nr_items * p_size);
> > + WARN((tm->move.dst_slot + tm->move.nr_items - 1 > max_slot) ||
> > + (max_slot == (u32)-1 && tm->move.nr_items > 0),
>
> Why the: "tm->move.nr_items > 0" ?
> It's actually a bug, or in the best case a pointless operation that
> should not have been logged, to have a move operation for 0 items.
My thinking was that max_slot models the maximum valid slot with
(u32)-1 meaning "no slot is valid". But a move of size 0 doesn't read
any slot so doesn't constitute a bug per-se, just a dumb no-op tree mod
log entry. I can still warn on it, if you think that's better.
>
> > + "Move from invalid tree mod log slot eb %llu slot %d dst_slot %d nr_items %d seq %llu n %u max_slot %u\n",
> > + eb->start, tm->slot, tm->move.dst_slot, tm->move.nr_items,
> > + tm->seq, n, max_slot);
>
> Why trigger the warning after doing the memmove, and not before?
> I would say it's preferable, because in case the bad memmove triggers
> a crash/panic, we get the warning logged with some useful information.
>
> It's also possibly better to log using the btrfs_warn() helper, as it
> prints information about the affected filesystem, etc.
> So like a: if (WARN_ON(conditions)) btrfs_warn()
Good points, thanks for the review.
>
> Thanks.
>
>
> > + max_slot = tm->slot + tm->move.nr_items - 1;
> > break;
> > case BTRFS_MOD_LOG_ROOT_REPLACE:
> > /*
> > --
> > 2.40.1
> >
next prev parent reply other threads:[~2023-05-31 14:51 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-30 19:22 [PATCH 0/2] btrfs: fix logical_to_ino panic in btrfs_map_bio Boris Burkov
2023-05-30 19:22 ` [PATCH 1/2] btrfs: warn on invalid slot in tree mod log rewind Boris Burkov
2023-05-31 12:37 ` Filipe Manana
2023-05-31 14:51 ` Boris Burkov [this message]
2023-05-30 19:22 ` [PATCH 2/2] btrfs: insert tree mod log move in push_node_left Boris Burkov
2023-05-31 12:53 ` Filipe Manana
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230531145125.GA2249308@zen \
--to=boris@bur.io \
--cc=fdmanana@kernel.org \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox