From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19EA5C54EE9 for ; Wed, 7 Sep 2022 22:35:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229614AbiIGWfn (ORCPT ); Wed, 7 Sep 2022 18:35:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49276 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229535AbiIGWfm (ORCPT ); Wed, 7 Sep 2022 18:35:42 -0400 Received: from mout.gmx.net (mout.gmx.net [212.227.17.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4325B7E011 for ; Wed, 7 Sep 2022 15:35:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=gmx.net; s=badeba3b8450; t=1662590136; bh=GxKg9uoLa/tGN71ecePqvRVSX0p6GUQvFkhO5ZM4smk=; h=X-UI-Sender-Class:Date:Subject:To:Cc:References:From:In-Reply-To; b=jnFqJ2xciYpO2q6ys5pF69sRnEgUt3YfZOwJDiUkPF+vYzMXDtlB4GHtlzN8AmGkz ypu4+w039Z8S8rzd6otHefiF+UB+mlP3+oB271querwk/p1kpmwfoiiujVjN7ot8oJ 3AWf/vLMKv1RJvp9np5HARIWmXR/mA/wv7WdHHTg= X-UI-Sender-Class: 01bb95c1-4bf8-414a-932a-4f6e2808ef9c Received: from [0.0.0.0] ([149.28.201.231]) by mail.gmx.net (mrgmx104 [212.227.17.174]) with ESMTPSA (Nemesis) id 1Mr9Bu-1pBSOM0PLy-00oBRf; Thu, 08 Sep 2022 00:35:36 +0200 Message-ID: Date: Thu, 8 Sep 2022 06:35:32 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.2.0 Subject: Re: [PATCH] btrfs: don't update the block group item if used bytes are the same Content-Language: en-US To: Josef Bacik , Qu Wenruo , dsterba@suse.com Cc: linux-btrfs@vger.kernel.org References: <64e4434370badd801a79a782613c405830475dde.1657521468.git.wqu@suse.com> <8fe1128c-ebd9-87b6-a2ac-0a427223b456@gmx.com> From: Qu Wenruo In-Reply-To: <8fe1128c-ebd9-87b6-a2ac-0a427223b456@gmx.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: quoted-printable X-Provags-ID: V03:K1:yoNvqIvzrlkrIorhvHJwok3DfVahvYsV5hio7rOPM9j4NHyn44c WBJjW469bgnZkigvvEt7tVwT1FPy+9i5pHejVj8ETF8r0/9CWU/o74tyKFgbKHA9VHcKUxd S7se6kZj5CraL1P7StPjgj8OtIgl9yOfYIB77fWjPisgsZzesANzROPEDDH1pTtocVvE2OZ 0uXDtN7Uh+etjWTZ+f5gQ== X-UI-Out-Filterresults: notjunk:1;V03:K0:qZLF1Rhytio=:K6aKO7eUx4wTVvqIXTlnZr G+neQj+Mv2RWBpVZVTJdsDjgejtWjF2lBVpPVcEu54cqKBQxiB7m64Av0oSyB3sHZU+Cl86GB CP2XipMEoLgffPWUBYIR9WFng05vax7D49Tcqxo8R0MmF4lLAaDiaWrATjLJrIIpMKSff3c6I AFXPgxQzsZyiTDbMu6gDT3uPwdUW+qbGLu+dqv7uVQ6i28wFdUNySWKH49YjUGXqlkB8lhZxz x5F63U0oOZzHx53E9msRzJXCh+5ozOQtcegzvKOG14GSuYXG9IYV/jPJP2C4T07tcXQHSE0Uh lAcF7UbLlV/i/lVUPC6jV/5JrMdT1dQq5GlmeSliYsU8SeM/Yj8d4v6n2/BRWZ7x0jSCVI7kn Ea2GHWtY7NEOq2h9KX8qJ3Ev70/4YFzwp+2/Vn4kKqzje325zY4wczkVfA6lLtVqZFctxfV+H V49PD5Qo5FLsnGESVWX1Wzy4JyY90CUHrtfKd/BYXJYzhXaUOP7hFMBR3dv0X+lK96uzTxwFa 2jzq9OXvFaFqRXa7SN7fRppFHwVoycofFD82lwgLPuZ3Z+AdxnBRwo3Q3Akl6gSaSnz4dQH9f 7VmoIdgGSmcBLzEG3XpxTMZhtgReHJBRlbs1HlfTDcydD+4KtISkDt98CJOudZTcz68ie9oHw dsXiO32cqzqeWXuixi1H09oqMLWzyerUv4EPwOxAaWoyGsXkuIA3PN/WpLaetgMVU4Qw4ohUy Rg8uXAMmqx1AbzewzbRYxNugwr63EkD4ugLaq42kPKJT+HYXJMSoHvRIkrj5Kyh6JmPbxQ1pD HeMa9DR5PsOJ4u5XyO/eIiKX5E4veTFpFch/T1xI2K7W4/3xLkaRm+t2+PPY2gPWWyPsHUUTM ItPvnZWDbSVV6hpOU1igkzMZO3OVTF/2CkjDnuyRm3gtMxjVdhC0laEMUs8xm+W5ylz4mkUxK 9pNHaOncWj8WUFsQfYS2l8nO6rlLoa1Trb90XRD7QUSEIuDJeQVLeTyxLi5+043QlsL0vFiML 4O6s+j/6TtI6ajFtaTqX6YJIJPYDmMwrvoTUllwl3dea8f7r+5W4nBDku7w/Tu/neGqjPJMK+ hl8pAPv/RyVIgxgK7n1zPUl/UsKZ/QanCbgmPsDG7dbQC5E+tgi6w883O4qdqUnc4dGZqKXPJ oBAttbYGL/N+p56CUUS5a9Ez10 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org On 2022/9/8 06:20, Qu Wenruo wrote: > > > On 2022/9/8 01:29, Josef Bacik wrote: >> On Wed, Sep 07, 2022 at 10:31:58AM -0400, Josef Bacik wrote: >>> On Mon, Jul 11, 2022 at 02:37:52PM +0800, Qu Wenruo wrote: >>>> When committing a transaction, we will update block group items for a= ll >>>> dirty block groups. >>>> >>>> But in fact, dirty block groups don't always need to update their blo= ck >>>> group items. >>>> It's pretty common to have a metadata block group which experienced >>>> several CoW operations, but still have the same amount of used bytes. >>>> >>>> In that case, we may unnecessarily CoW a tree block doing nothing. >>>> >>>> This patch will introduce btrfs_block_group::commit_used member to >>>> remember the last used bytes, and use that new member to skip >>>> unnecessary block group item update. >>>> >>>> This would be more common for large fs, which metadata block group ca= n >>>> be as large as 1GiB, containing at most 64K metadata items. >>>> >>>> In that case, if CoW added and the deleted one metadata item near >>>> the end >>>> of the block group, then it's completely possible we don't need to >>>> touch >>>> the block group item at all. >>>> >>>> I don't have any benchmark to prove this, but this should not cause a= ny >>>> hurt either. >>>> >>>> Signed-off-by: Qu Wenruo >>> >>> I've been seeing random btrfs check failures on our overnight testing >>> since this >>> patch was merged.=C2=A0 I can't blame it directly yet, I've mostly see= n it on >>> TEST_DEV, and once while running generic/648.=C2=A0 I'm running it in = a >>> loop now to >>> reproduce and then fix it. >>> >>> We can start updating block groups before we're in the critical >>> section, so we >>> can update block_group->bytes_used while we're updating the block >>> group item in >>> a different thread.=C2=A0 So if we set the block_group item to some va= lue of >>> bytes_used, then update it in another thread, and then set >>> ->commit_used to the >>> new value we'll fail to update the block group item with the correct >>> value >>> later. >>> >>> We need to wrap this bit in the block_group->lock to avoid this >>> particular >>> problem.=C2=A0 Once I reproduce and validate the fix I'll send that, b= ut I >>> wanted to >>> reply in case that takes longer than I expect.=C2=A0 Thanks, >> >> Ok this is in fact the problem, this fixup made the problem go away. >> Thanks, > > This fix means, a bg members can change even we are at > update_block_group_item(). > > The old code is completely relying on the one time access on cache->used= . > > Anyway thanks for the fix. So this is only happening if we execute btrfs_start_dirty_block_groups(), which unlike btrfs_write_dirty_block_groups(), is not yet protected by transaction critical path. Thus we can have bg members changing halfway and caused the race. To David, do I need to send a updated version with extra comments on this? Thanks, Qu > > Thanks, > Qu >> >> Josef >> >> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c >> index 6e7bb1c0352d..1e2773b120d4 100644 >> --- a/fs/btrfs/block-group.c >> +++ b/fs/btrfs/block-group.c >> @@ -2694,10 +2694,16 @@ static int update_block_group_item(struct >> btrfs_trans_handle *trans, >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct extent_buffer *leaf; >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct btrfs_block_group_item bgi; >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct btrfs_key key; >> +=C2=A0=C2=A0=C2=A0 u64 used; >> >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 /* No change in used bytes, can safely s= kip it. */ >> -=C2=A0=C2=A0=C2=A0 if (cache->commit_used =3D=3D cache->used) >> +=C2=A0=C2=A0=C2=A0 spin_lock(&cache->lock); >> +=C2=A0=C2=A0=C2=A0 used =3D cache->used; >> +=C2=A0=C2=A0=C2=A0 if (cache->commit_used =3D=3D used) { >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 spin_unlock(&cache->lock); >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return 0; >> +=C2=A0=C2=A0=C2=A0 } >> +=C2=A0=C2=A0=C2=A0 spin_unlock(&cache->lock); >> >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 key.objectid =3D cache->start; >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 key.type =3D BTRFS_BLOCK_GROUP_ITEM_KEY; >> @@ -2712,13 +2718,14 @@ static int update_block_group_item(struct >> btrfs_trans_handle *trans, >> >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 leaf =3D path->nodes[0]; >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 bi =3D btrfs_item_ptr_offset(leaf, path-= >slots[0]); >> -=C2=A0=C2=A0=C2=A0 btrfs_set_stack_block_group_used(&bgi, cache->used)= ; >> + >> +=C2=A0=C2=A0=C2=A0 btrfs_set_stack_block_group_used(&bgi, used); >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 btrfs_set_stack_block_group_chunk_object= id(&bgi, >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 cache->global_root_id); >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 btrfs_set_stack_block_group_flags(&bgi, = cache->flags); >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 write_extent_buffer(leaf, &bgi, bi, size= of(bgi)); >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 btrfs_mark_buffer_dirty(leaf); >> -=C2=A0=C2=A0=C2=A0 cache->commit_used =3D cache->used; >> +=C2=A0=C2=A0=C2=A0 cache->commit_used =3D used; >> =C2=A0 fail: >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 btrfs_release_path(path); >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return ret;