linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Filipe Manana <fdmanana@kernel.org>
To: robbieko <robbieko@synology.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: reduce lock contention when eb cache miss for btree search
Date: Wed, 9 Oct 2024 16:09:52 +0100	[thread overview]
Message-ID: <CAL3q7H4sdxdnLx9uaSDrhVpJCdsik+ZPVCH4cjeZKCBKY_ohzw@mail.gmail.com> (raw)
In-Reply-To: <CAL3q7H4sjkQ2-s=jdP-bsF3Jpc7iqGTkOfKe1asfo17+a4Mvgg@mail.gmail.com>

On Wed, Oct 9, 2024 at 3:18 PM Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Wed, Oct 9, 2024 at 3:09 AM robbieko <robbieko@synology.com> wrote:
> >
> > From: Robbie Ko <robbieko@synology.com>
> >
> > When crawling btree, if an eb cache miss occurs, we change to use
> > the eb read lock and release all previous locks to reduce lock contention.
> >
> > Because we have prepared the check parameters and the read lock
> > of eb we hold, we can ensure that no race will occur during the check
> > and cause unexpected errors.
> >
> > Signed-off-by: Robbie Ko <robbieko@synology.com>
> > ---
> >  fs/btrfs/ctree.c | 88 ++++++++++++++++++++++++++++--------------------
> >  1 file changed, 52 insertions(+), 36 deletions(-)
> >
> > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > index 0cc919d15b14..0efbe61497f3 100644
> > --- a/fs/btrfs/ctree.c
> > +++ b/fs/btrfs/ctree.c
> > @@ -1515,12 +1515,12 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
> >         struct btrfs_tree_parent_check check = { 0 };
> >         u64 blocknr;
> >         u64 gen;
> > -       struct extent_buffer *tmp;
> > -       int ret;
> > +       struct extent_buffer *tmp = NULL;
> > +       int ret, err;
>
> Please avoid declaring two (or more) variables in the same line. 1 per
> line is prefered for readability and be consistent with this
> function's code.
>
> >         int parent_level;
> > -       bool unlock_up;
> > +       bool create = false;
> > +       bool lock = false;
> >
> > -       unlock_up = ((level + 1 < BTRFS_MAX_LEVEL) && p->locks[level + 1]);
> >         blocknr = btrfs_node_blockptr(*eb_ret, slot);
> >         gen = btrfs_node_ptr_generation(*eb_ret, slot);
> >         parent_level = btrfs_header_level(*eb_ret);
> > @@ -1551,52 +1551,66 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
> >                          */
> >                         if (btrfs_verify_level_key(tmp,
> >                                         parent_level - 1, &check.first_key, gen)) {
> > -                               free_extent_buffer(tmp);
> > -                               return -EUCLEAN;
> > +                               ret = -EUCLEAN;
> > +                               goto out;
> >                         }
> >                         *eb_ret = tmp;
> > -                       return 0;
> > +                       tmp = NULL;
> > +                       ret = 0;
> > +                       goto out;
>
> By setting tmp to NULL and jumping to the out label, we leak a
> reference on the tmp extent buffer.
>
> >                 }
> >
> >                 if (p->nowait) {
> > -                       free_extent_buffer(tmp);
> > -                       return -EAGAIN;
> > +                       ret = -EAGAIN;
> > +                       btrfs_release_path(p);
>
> To reduce the critical section sizes, please set ret to -EAGAIN after
> releasing the path.
>
> > +                       goto out;
> >                 }
> >
> > -               if (unlock_up)
> > -                       btrfs_unlock_up_safe(p, level + 1);
> > +               ret = -EAGAIN;
> > +               btrfs_unlock_up_safe(p, level + 1);
>
> Same here, set ret after the unlocking.
>
> But why set ret to -EAGAIN? Here we know p->nowait is false.

Nevermind this is because of the unconditional unlock now.

>
> > +               btrfs_tree_read_lock(tmp);
> > +               lock = true;
>
> And here, with the same reasoning, set 'lock' to true before calling
> btrfs_tree_read_lock().
>
> > +               btrfs_release_path(p);
> >
> >                 /* now we're allowed to do a blocking uptodate check */
> > -               ret = btrfs_read_extent_buffer(tmp, &check);
> > -               if (ret) {
> > -                       free_extent_buffer(tmp);
> > -                       btrfs_release_path(p);
> > -                       return ret;
> > +               err = btrfs_read_extent_buffer(tmp, &check);
> > +               if (err) {
> > +                       ret = err;
> > +                       goto out;
>
> Why do we need to have this 'err' variable?
> Why not use 'ret' and simplify?
>
> >                 }
> > -
> > -               if (unlock_up)
> > -                       ret = -EAGAIN;
> > -
> >                 goto out;
> >         } else if (p->nowait) {
> > -               return -EAGAIN;
> > -       }
> > -
> > -       if (unlock_up) {
> > -               btrfs_unlock_up_safe(p, level + 1);
> >                 ret = -EAGAIN;
> > -       } else {
> > -               ret = 0;
> > +               btrfs_release_path(p);
>
> Same here, set 'ret' to -EAGAIN after releasing the path.
>
> > +               goto out;
> >         }
> >
> > +       ret = -EAGAIN;
> > +       btrfs_unlock_up_safe(p, level + 1);
>
> Same here.
>
> But why set ret to -EAGAIN? Here we know p->nowait is false.

Nevermind this is because of the unconditional unlock now.

In both cases we still don't need the 'err' variable.
Just set 'ret' to -EAGAIN if btrfs_find_create_tree_block() below
doesn't return an error and btrfs_read_extent_buffer() below and above
don't return an error.

>
> > +
> >         if (p->reada != READA_NONE)
> >                 reada_for_search(fs_info, p, level, slot, key->objectid);
> >
> > -       tmp = read_tree_block(fs_info, blocknr, &check);
> > +       tmp = btrfs_find_create_tree_block(fs_info, blocknr, check.owner_root, check.level);
> >         if (IS_ERR(tmp)) {
> > +               ret = PTR_ERR(tmp);
> > +               tmp = NULL;
> >                 btrfs_release_path(p);
> > -               return PTR_ERR(tmp);
> > +               goto out;
> >         }
> > +       create = true;
> > +
> > +       btrfs_tree_read_lock(tmp);
> > +       lock = true;
>
> Same here, set 'lock' to true before the call to btrfs_tree_read_lock();
>
> > +       btrfs_release_path(p);
> > +
> > +       /* now we're allowed to do a blocking uptodate check */
>
> Please capitalize the first word of a sentence and end the sentence
> with punctuation.
> This is the preferred style and we strive to do that for new comments
> or code that moves comments around.
>
> > +       err = btrfs_read_extent_buffer(tmp, &check);
>
> This can block, so if p->nowait at this point we should instead exit
> with -EAGAIN and not call this function.
>
> > +       if (err) {
> > +               ret = err;
> > +               goto out;
> > +       }
>
> Same here, no need to use extra variable 'err', can just use 'ret'.
>
> > +
> >         /*
> >          * If the read above didn't mark this buffer up to date,
> >          * it will never end up being up to date.  Set ret to EIO now
> > @@ -1607,11 +1621,13 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
> >                 ret = -EIO;
> >
> >  out:
> > -       if (ret == 0) {
> > -               *eb_ret = tmp;
> > -       } else {
> > -               free_extent_buffer(tmp);
> > -               btrfs_release_path(p);
> > +       if (tmp) {
> > +               if (lock)
> > +                       btrfs_tree_read_unlock(tmp);
> > +               if (create && ret && ret != -EAGAIN)
> > +                       free_extent_buffer_stale(tmp);
> > +               else
> > +                       free_extent_buffer(tmp);
>
> So in the success case we no longer set *eb_ret to tmp. Why? The
> callers expect that.

Ok, it's because of the -EAGAIN due to having the unconditional unlock now.

And btw, have you observed any case where this improved anything? Any
benchmarks?

I can't see how this can make anything better because this function is
never called for a root node, and therefore level + 1 <
BTRFS_MAX_LEVEL.

But there is one case where this patch causes unnecessary path
releases and makes the caller retry a search:
when none of the levels above were locked. That's why we had the
following expression before:

unlock_up = ((level + 1 < BTRFS_MAX_LEVEL) && p->locks[level + 1]);

So we wouldn't make the caller retry the search if upper levels aren't
locked or p->skip_locking == 1.
But now after this patch we make the caller retry the search in that case.

Thanks.

>
> Thanks.
>
> >         }
> >
> >         return ret;
> > @@ -2198,7 +2214,7 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
> >                 }
> >
> >                 err = read_block_for_search(root, p, &b, level, slot, key);
> > -               if (err == -EAGAIN)
> > +               if (err == -EAGAIN && !p->nowait)
> >                         goto again;
> >                 if (err) {
> >                         ret = err;
> > @@ -2325,7 +2341,7 @@ int btrfs_search_old_slot(struct btrfs_root *root, const struct btrfs_key *key,
> >                 }
> >
> >                 err = read_block_for_search(root, p, &b, level, slot, key);
> > -               if (err == -EAGAIN)
> > +               if (err == -EAGAIN && !p->nowait)
> >                         goto again;
> >                 if (err) {
> >                         ret = err;
> > --
> > 2.17.1
> >
> >

  parent reply	other threads:[~2024-10-09 15:10 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-09  2:01 [PATCH] btrfs: reduce lock contention when eb cache miss for btree search robbieko
2024-10-09 14:18 ` Filipe Manana
2024-10-09 14:20   ` Filipe Manana
2024-10-09 15:09   ` Filipe Manana [this message]
2024-10-11  2:40     ` robbieko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAL3q7H4sdxdnLx9uaSDrhVpJCdsik+ZPVCH4cjeZKCBKY_ohzw@mail.gmail.com \
    --to=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=robbieko@synology.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).