From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D45B31E32DC for ; Wed, 9 Oct 2024 15:10:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728486631; cv=none; b=n0nHuj78lyCY7S+bdFeUHxXm0Q2gr/xJmqvGcVW7WMYvCEJlLy7Op1VzCJlqyixq5I7NXKNATTQ7oTe82gyRCgrb9RHThmJrvK6WShSoqc+est5cGofW5VuaKPPW3xL7KXEjwsZpzDKvefRSj6+3Xdd53Ap9t0NdHJKHUhc9Vyo= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728486631; c=relaxed/simple; bh=9RtjoNsCFoyAzTw3fL96naAYKHnJMvWaYteLNNqnBpk=; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject: To:Cc:Content-Type; b=XrJkcjEPMZPyajGz7U513jnK6PyARnf7bquw4wgjntiAaQePO0LPy4yTOt/Vr9xGrXq+DpZ4H+693QWCs2kwaNhmxWsWEvB2dhiI/kHWfbWfKMacU+ohefvD+/tgI9i/A01HaaMV3o2JexwD5JU3t0rnAiQ30FPVO/PU8dWCnLw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=L5KZZacK; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="L5KZZacK" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7188EC4CECF for ; Wed, 9 Oct 2024 15:10:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1728486631; bh=9RtjoNsCFoyAzTw3fL96naAYKHnJMvWaYteLNNqnBpk=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=L5KZZacKhZFoRxaVdC5BaZW6lSserA9r7bhOISRT4zCD1U+dTE6iPrfaMRkA+Rh94 C0r08Artrht4GlztaqKOkFBiV+9vJoLVfgALsriwm21JSjlEkTVC5iiMXSqQiuIhqD IQWjVPtCnPTXjHO+9IDn2EKwepvdf4ARtxivGv80wZPMLqSbcGzkyZgdocz3sd3V3c T+6CqVzHQEfWZd3tQ8clv85vVRT0eCOPE6fdaTufl1gIAxCnu/S6QCYMajQytAHtoU UgKW2V6Ut+M4pAwpnlUjo65MJwTBpROQDLQnTaQUDi219iboOkt11KUDUbU4pyQ89j fppZELAQMk6ew== Received: by mail-ej1-f48.google.com with SMTP id a640c23a62f3a-a995f56ea2dso479316366b.1 for ; Wed, 09 Oct 2024 08:10:31 -0700 (PDT) X-Gm-Message-State: AOJu0YyFozH07+JNxPLZvSsE3Ca6GQqOj+WRcC5uPOFGZrD0QhdUXKip bfF9NdbFPEDO7zH+cWXZJDCTrGFB+3hQzS2sUr+NYFHPNaMdDIam69gJqGLV8BzoqWk80gmUxW0 +l43ARy3J2swBC35l/3D09PPDbNs= X-Google-Smtp-Source: AGHT+IE9nUh05E3g1BS2Y8SC0TxN87G8qJfD9yluIXePnzvDio/YV+aZW0R5m5/EGeKr5xgmIGsO6Yp/Qi/2H4U0/9A= X-Received: by 2002:a17:907:7f94:b0:a7a:aa35:408c with SMTP id a640c23a62f3a-a998d10df8bmr235407066b.8.1728486629879; Wed, 09 Oct 2024 08:10:29 -0700 (PDT) Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 References: <20241009020149.23467-1-robbieko@synology.com> In-Reply-To: From: Filipe Manana Date: Wed, 9 Oct 2024 16:09:52 +0100 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH] btrfs: reduce lock contention when eb cache miss for btree search To: robbieko Cc: linux-btrfs@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, Oct 9, 2024 at 3:18=E2=80=AFPM Filipe Manana = wrote: > > On Wed, Oct 9, 2024 at 3:09=E2=80=AFAM robbieko w= rote: > > > > From: Robbie Ko > > > > When crawling btree, if an eb cache miss occurs, we change to use > > the eb read lock and release all previous locks to reduce lock contenti= on. > > > > Because we have prepared the check parameters and the read lock > > of eb we hold, we can ensure that no race will occur during the check > > and cause unexpected errors. > > > > Signed-off-by: Robbie Ko > > --- > > fs/btrfs/ctree.c | 88 ++++++++++++++++++++++++++++-------------------- > > 1 file changed, 52 insertions(+), 36 deletions(-) > > > > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c > > index 0cc919d15b14..0efbe61497f3 100644 > > --- a/fs/btrfs/ctree.c > > +++ b/fs/btrfs/ctree.c > > @@ -1515,12 +1515,12 @@ read_block_for_search(struct btrfs_root *root, = struct btrfs_path *p, > > struct btrfs_tree_parent_check check =3D { 0 }; > > u64 blocknr; > > u64 gen; > > - struct extent_buffer *tmp; > > - int ret; > > + struct extent_buffer *tmp =3D NULL; > > + int ret, err; > > Please avoid declaring two (or more) variables in the same line. 1 per > line is prefered for readability and be consistent with this > function's code. > > > int parent_level; > > - bool unlock_up; > > + bool create =3D false; > > + bool lock =3D false; > > > > - unlock_up =3D ((level + 1 < BTRFS_MAX_LEVEL) && p->locks[level = + 1]); > > blocknr =3D btrfs_node_blockptr(*eb_ret, slot); > > gen =3D btrfs_node_ptr_generation(*eb_ret, slot); > > parent_level =3D btrfs_header_level(*eb_ret); > > @@ -1551,52 +1551,66 @@ read_block_for_search(struct btrfs_root *root, = struct btrfs_path *p, > > */ > > if (btrfs_verify_level_key(tmp, > > parent_level - 1, &check.first_= key, gen)) { > > - free_extent_buffer(tmp); > > - return -EUCLEAN; > > + ret =3D -EUCLEAN; > > + goto out; > > } > > *eb_ret =3D tmp; > > - return 0; > > + tmp =3D NULL; > > + ret =3D 0; > > + goto out; > > By setting tmp to NULL and jumping to the out label, we leak a > reference on the tmp extent buffer. > > > } > > > > if (p->nowait) { > > - free_extent_buffer(tmp); > > - return -EAGAIN; > > + ret =3D -EAGAIN; > > + btrfs_release_path(p); > > To reduce the critical section sizes, please set ret to -EAGAIN after > releasing the path. > > > + goto out; > > } > > > > - if (unlock_up) > > - btrfs_unlock_up_safe(p, level + 1); > > + ret =3D -EAGAIN; > > + btrfs_unlock_up_safe(p, level + 1); > > Same here, set ret after the unlocking. > > But why set ret to -EAGAIN? Here we know p->nowait is false. Nevermind this is because of the unconditional unlock now. > > > + btrfs_tree_read_lock(tmp); > > + lock =3D true; > > And here, with the same reasoning, set 'lock' to true before calling > btrfs_tree_read_lock(). > > > + btrfs_release_path(p); > > > > /* now we're allowed to do a blocking uptodate check */ > > - ret =3D btrfs_read_extent_buffer(tmp, &check); > > - if (ret) { > > - free_extent_buffer(tmp); > > - btrfs_release_path(p); > > - return ret; > > + err =3D btrfs_read_extent_buffer(tmp, &check); > > + if (err) { > > + ret =3D err; > > + goto out; > > Why do we need to have this 'err' variable? > Why not use 'ret' and simplify? > > > } > > - > > - if (unlock_up) > > - ret =3D -EAGAIN; > > - > > goto out; > > } else if (p->nowait) { > > - return -EAGAIN; > > - } > > - > > - if (unlock_up) { > > - btrfs_unlock_up_safe(p, level + 1); > > ret =3D -EAGAIN; > > - } else { > > - ret =3D 0; > > + btrfs_release_path(p); > > Same here, set 'ret' to -EAGAIN after releasing the path. > > > + goto out; > > } > > > > + ret =3D -EAGAIN; > > + btrfs_unlock_up_safe(p, level + 1); > > Same here. > > But why set ret to -EAGAIN? Here we know p->nowait is false. Nevermind this is because of the unconditional unlock now. In both cases we still don't need the 'err' variable. Just set 'ret' to -EAGAIN if btrfs_find_create_tree_block() below doesn't return an error and btrfs_read_extent_buffer() below and above don't return an error. > > > + > > if (p->reada !=3D READA_NONE) > > reada_for_search(fs_info, p, level, slot, key->objectid= ); > > > > - tmp =3D read_tree_block(fs_info, blocknr, &check); > > + tmp =3D btrfs_find_create_tree_block(fs_info, blocknr, check.ow= ner_root, check.level); > > if (IS_ERR(tmp)) { > > + ret =3D PTR_ERR(tmp); > > + tmp =3D NULL; > > btrfs_release_path(p); > > - return PTR_ERR(tmp); > > + goto out; > > } > > + create =3D true; > > + > > + btrfs_tree_read_lock(tmp); > > + lock =3D true; > > Same here, set 'lock' to true before the call to btrfs_tree_read_lock(); > > > + btrfs_release_path(p); > > + > > + /* now we're allowed to do a blocking uptodate check */ > > Please capitalize the first word of a sentence and end the sentence > with punctuation. > This is the preferred style and we strive to do that for new comments > or code that moves comments around. > > > + err =3D btrfs_read_extent_buffer(tmp, &check); > > This can block, so if p->nowait at this point we should instead exit > with -EAGAIN and not call this function. > > > + if (err) { > > + ret =3D err; > > + goto out; > > + } > > Same here, no need to use extra variable 'err', can just use 'ret'. > > > + > > /* > > * If the read above didn't mark this buffer up to date, > > * it will never end up being up to date. Set ret to EIO now > > @@ -1607,11 +1621,13 @@ read_block_for_search(struct btrfs_root *root, = struct btrfs_path *p, > > ret =3D -EIO; > > > > out: > > - if (ret =3D=3D 0) { > > - *eb_ret =3D tmp; > > - } else { > > - free_extent_buffer(tmp); > > - btrfs_release_path(p); > > + if (tmp) { > > + if (lock) > > + btrfs_tree_read_unlock(tmp); > > + if (create && ret && ret !=3D -EAGAIN) > > + free_extent_buffer_stale(tmp); > > + else > > + free_extent_buffer(tmp); > > So in the success case we no longer set *eb_ret to tmp. Why? The > callers expect that. Ok, it's because of the -EAGAIN due to having the unconditional unlock now. And btw, have you observed any case where this improved anything? Any benchmarks? I can't see how this can make anything better because this function is never called for a root node, and therefore level + 1 < BTRFS_MAX_LEVEL. But there is one case where this patch causes unnecessary path releases and makes the caller retry a search: when none of the levels above were locked. That's why we had the following expression before: unlock_up =3D ((level + 1 < BTRFS_MAX_LEVEL) && p->locks[level + 1]); So we wouldn't make the caller retry the search if upper levels aren't locked or p->skip_locking =3D=3D 1. But now after this patch we make the caller retry the search in that case. Thanks. > > Thanks. > > > } > > > > return ret; > > @@ -2198,7 +2214,7 @@ int btrfs_search_slot(struct btrfs_trans_handle *= trans, struct btrfs_root *root, > > } > > > > err =3D read_block_for_search(root, p, &b, level, slot,= key); > > - if (err =3D=3D -EAGAIN) > > + if (err =3D=3D -EAGAIN && !p->nowait) > > goto again; > > if (err) { > > ret =3D err; > > @@ -2325,7 +2341,7 @@ int btrfs_search_old_slot(struct btrfs_root *root= , const struct btrfs_key *key, > > } > > > > err =3D read_block_for_search(root, p, &b, level, slot,= key); > > - if (err =3D=3D -EAGAIN) > > + if (err =3D=3D -EAGAIN && !p->nowait) > > goto again; > > if (err) { > > ret =3D err; > > -- > > 2.17.1 > > > >