Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Wang Yugui <wangyugui@e16-tech.com>
To: fdmanana@kernel.org, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 1/2] btrfs: fix btrfs_prev_leaf() to not return the same key twice
Date: Fri, 14 Apr 2023 11:43:41 +0800	[thread overview]
Message-ID: <20230414114340.57ED.409509F4@e16-tech.com> (raw)
In-Reply-To: <20230414035734.6DF7.409509F4@e16-tech.com>

Hi,

> Hi,
> 
> > From: Filipe Manana <fdmanana@suse.com>
> > 
> > A call to btrfs_prev_leaf() may end up returning a path that points to the
> > same item (key) again. This happens if while btrfs_prev_leaf(), after we
> > release the path, a concurrent insertion happens, which moves items off
> > from a sibbling into the front of the previous leaf, and an item with the
> > computed previous key does not exists.
> > 
> > For example, suppose we have the two following leaves:
> > 
> >   Leaf A
> > 
> >   -------------------------------------------------------------
> >   | ...   key (300 96 10)   key (300 96 15)   key (300 96 16) |
> >   -------------------------------------------------------------
> >               slot 20             slot 21             slot 22
> > 
> >   Leaf B
> > 
> >   -------------------------------------------------------------
> >   | key (300 96 20)   key (300 96 21)   key (300 96 22)   ... |
> >   -------------------------------------------------------------
> >       slot 0             slot 1             slot 2
> > 
> > If we call btrfs_prev_leaf(), from btrfs_previous_item() for example, with
> > a path pointing to leaf B and slot 0 and the following happens:
> > 
> > 1) At btrfs_prev_leaf() we compute the previous key to search as:
> >    (300 96 19), which is a key that does not exists in the tree;
> > 
> > 2) Then we call btrfs_release_path() at btrfs_prev_leaf();
> > 
> > 3) Some other task inserts a key at leaf A, that sorts before the key at
> >    slot 20, for example it has an objectid of 299. In order to make room
> >    for the new key, the key at slot 22 is moved to the front of leaf B.
> >    This happens at push_leaf_right(), called from split_leaf().
> > 
> >    After this leaf B now looks like:
> > 
> >   --------------------------------------------------------------------------------
> >   | key (300 96 16)    key (300 96 20)   key (300 96 21)   key (300 96 22)   ... |
> >   --------------------------------------------------------------------------------
> >        slot 0              slot 1             slot 2             slot 3
> > 
> > 4) At btrfs_prev_leaf() we call btrfs_search_slot() for the computed
> >    previous key: (300 96 19). Since the key does not exists,
> >    btrfs_search_slot() returns 1 and with a path pointing to leaf B
> >    and slot 1, the item with key (300 96 20);
> > 
> > 5) This makes btrfs_prev_leaf() return a path that points to slot 1 of
> >    leaf B, the same key as before it was called, since the key at slot 0
> >    of leaf B (300 96 16) is less than the computed previous key, which is
> >    (300 96 19);
> > 
> > 6) As a consequence btrfs_previous_item() returns a path that points again
> >    to the item with key (300 96 20).
> > 
> > For some users of btrfs_prev_leaf() or btrfs_previous_item() this may not
> > be functional a problem, despite not making sense to return a new path
> > pointing again to the same item/key. However for a caller such as
> > tree-log.c:log_dir_items(), this has a bad consequence, as it can result
> > in not logging some dir index deletions in case the directory is being
> > logged without holding the inode's VFS lock (logging triggered while
> > logging a child inode for example) - for the example scenario above, in
> > case the dir index keys 17, 18 and 19 were deleted in the current
> > transaction.
> 
> fstests(btrfs/080) had few frequency(<1/10) to fail after this pacth is applied.
> but it is yet not clear enough.
> 
> # cat results//btrfs/080.out.bad
> QA output created by 080
> Unexpected digest for file /mnt/scratch/20_40_30_986878191_snap/foobar_63
> (see /usr/hpc-bio/xfstests/results//btrfs/080.full for details)

fstests(btrfs/080) failure happened without this patch too.

the next info update may take some long time, because of the low reproduce
frequency.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/04/14



  reply	other threads:[~2023-04-14  3:44 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-12 10:33 [PATCH 0/2] btrfs: fix for btrfs_prev_leaf() and unexport it fdmanana
2023-04-12 10:33 ` [PATCH 1/2] btrfs: fix btrfs_prev_leaf() to not return the same key twice fdmanana
2023-04-13 19:57   ` Wang Yugui
2023-04-14  3:43     ` Wang Yugui [this message]
2023-04-17 10:18       ` Filipe Manana
2023-04-12 10:33 ` [PATCH 2/2] btrfs: unexport btrfs_prev_leaf() fdmanana
2023-04-15  7:14   ` Anand Jain
2023-04-12 13:47 ` [PATCH 0/2] btrfs: fix for btrfs_prev_leaf() and unexport it Josef Bacik
2023-04-17 18:53 ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230414114340.57ED.409509F4@e16-tech.com \
    --to=wangyugui@e16-tech.com \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox