From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E0E9DC25B06 for ; Fri, 12 Aug 2022 00:48:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234583AbiHLAsH (ORCPT ); Thu, 11 Aug 2022 20:48:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39024 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229524AbiHLAsG (ORCPT ); Thu, 11 Aug 2022 20:48:06 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C0D00A0247 for ; Thu, 11 Aug 2022 17:48:04 -0700 (PDT) Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27BLSqUn032683 for ; Thu, 11 Aug 2022 17:48:04 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=uneK66wQj1+/+aTEUmPWzVm58EK2GBlb0Ft2+meq328=; b=oPUMJ8vE/gX9sSfKi2NR7O8EJCT+Jet8cqyB/7CjC1ZBInycujjrRsdBIH1F0meSC7JX h1B2Z6qgGFfl84gB7NpMgGDuVqe0jVDgNcOV3SBPx/qEoCXkGHHiBCVfqs7jM8L3HkVB vAt/i1C6L+n1xU+kLXcwigKWASydfBrvAkM= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hw9qfh1rn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Thu, 11 Aug 2022 17:48:03 -0700 Received: from twshared30313.14.frc2.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:11d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Thu, 11 Aug 2022 17:48:02 -0700 Received: by devvm7778.ftw0.facebook.com (Postfix, from userid 558217) id 98F1C33AFC0B; Thu, 11 Aug 2022 17:48:00 -0700 (PDT) From: Ioannis Angelakopoulos To: , Subject: [RFC PATCH 2/2] btrfs: Lockdep annotations for the extent bits wait event Date: Thu, 11 Aug 2022 17:42:44 -0700 Message-ID: <20220812004241.1722846-3-iangelak@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220812004241.1722846-1-iangelak@fb.com> References: <20220812004241.1722846-1-iangelak@fb.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-FB-Internal: Safe Content-Type: text/plain X-Proofpoint-GUID: fZUC7Fu0KYYFUWE_iCXU0aDl_-YlmSB7 X-Proofpoint-ORIG-GUID: fZUC7Fu0KYYFUWE_iCXU0aDl_-YlmSB7 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-11_14,2022-08-11_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Add lockdep annotations in five places that lock extent bits: 1) find_lock_delalloc_range() in fs/btrfs/extent_io.c and its callers 2) btrfs_lock_and_flush_ordered_range() in fs/btrfs/ordered-data.c and its callers 3) extent_fiemap() in fs/btrfs/extent_io.c 4) btrfs_fallocate() in fs/btrfs/file.c 5) btrfs_finish_ordered_io() in fs/btrfs/inode.c To annotate the extent bits wait event we make use of a two level lockdep map (making use of the multilevel wait event lockdep annotation macros). The first level is used for high level functions such as btrfs_fallocate(= ) and the second level is used for low level functions, such as find_lock_delalloc_range(). If we used only one level then lockdep would complain because there are execution contexts where the extent bits annotation is acquired before th= e delalloc_mutex (i.e., in btrfs_fallocate()) and later delalloc_mutex is acquired before the extent bits annotation (i.e., find_lock_delalloc_range()). Normally, if the range of bits locked were t= he same for both btrfs_fallocate() and find_lock_delalloc_range() we would have a deadlock(). However, a call to btrfs_wait_ordered_range() from btrfs_fallocate() guarantees that the range of extent bits is unlocked before locking, thus the deadlock is averted. By using a two level lockdep map we "break" the dependency between the hi= gh and low levels, thus lockdep does not complain. Signed-off-by: Ioannis Angelakopoulos --- fs/btrfs/ctree.h | 1 + fs/btrfs/disk-io.c | 1 + fs/btrfs/extent_io.c | 114 +++++++++++++++++++++++++++++++++++++--- fs/btrfs/file.c | 10 ++-- fs/btrfs/inode.c | 22 ++++++-- fs/btrfs/ordered-data.c | 4 +- 6 files changed, 133 insertions(+), 19 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 44837545eef8..85f947ce6e6b 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1104,6 +1104,7 @@ struct btrfs_fs_info { struct lockdep_map btrfs_state_change_map[4]; struct lockdep_map btrfs_trans_pending_ordered_map; struct lockdep_map btrfs_ordered_extent_map; + struct lockdep_map btrfs_inode_extent_lock_map; =20 #ifdef CONFIG_BTRFS_FS_REF_VERIFY spinlock_t ref_verify_lock; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 6268dafeeb2d..afd971c31168 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2996,6 +2996,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_in= fo) btrfs_lockdep_init_map(fs_info, btrfs_trans_num_extwriters); btrfs_lockdep_init_map(fs_info, btrfs_trans_pending_ordered); btrfs_lockdep_init_map(fs_info, btrfs_ordered_extent); + btrfs_lockdep_init_map(fs_info, btrfs_inode_extent_lock); btrfs_state_lockdep_init_map(fs_info, btrfs_trans_commit_start, BTRFS_LOCKDEP_TRANS_COMMIT_START); btrfs_state_lockdep_init_map(fs_info, btrfs_trans_unblocked, diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 6e8e936a8a1e..3d2ef3d78e7f 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -891,6 +891,24 @@ int __clear_extent_bit(struct extent_io_tree *tree, = u64 start, u64 end, =20 } =20 +int __clear_extent_bit_lockdep(struct extent_io_tree *tree, u64 start, u= 64 end, + u32 bits, int wake, int delete, + struct extent_state **cached_state, + gfp_t mask, struct extent_changeset *changeset) +{ + int ret; + + ret =3D __clear_extent_bit(tree, start, end, bits, wake, delete, cached= _state, + mask, changeset); + + if ((tree->owner !=3D IO_TREE_FREE_SPACE_INODE_IO) && + (tree->owner =3D=3D IO_TREE_INODE_IO)) + btrfs_lockdep_release(tree->fs_info, btrfs_inode_extent_lock); + + return ret; +} + + static void wait_on_state(struct extent_io_tree *tree, struct extent_state *state) __releases(tree->lock) @@ -1470,6 +1488,14 @@ int clear_extent_bit(struct extent_io_tree *tree, = u64 start, u64 end, cached, GFP_NOFS, NULL); } =20 +int clear_extent_bit_lockdep(struct extent_io_tree *tree, u64 start, u64= end, + u32 bits, int wake, int delete, + struct extent_state **cached) +{ + return __clear_extent_bit_lockdep(tree, start, end, bits, wake, delete, + cached, GFP_NOFS, NULL); +} + int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64= end, u32 bits, struct extent_changeset *changeset) { @@ -1504,6 +1530,43 @@ int lock_extent_bits(struct extent_io_tree *tree, = u64 start, u64 end, break; WARN_ON(start > end); } + + return err; +} + +int lock_extent_bits_lockdep(struct extent_io_tree *tree, u64 start, u64= end, + struct extent_state **cached_state, bool nested) +{ + int err; +#ifdef CONFIG_LOCKDEP + int subclass =3D 1; +#endif + + /* The wait event occurs within lock_extent_bits() */ + if ((tree->owner !=3D IO_TREE_FREE_SPACE_INODE_IO) && + (tree->owner =3D=3D IO_TREE_INODE_IO)) { + if (nested) + btrfs_might_wait_for_event_nested(tree->fs_info, + btrfs_inode_extent_lock, + subclass); + else + btrfs_might_wait_for_event(tree->fs_info, + btrfs_inode_extent_lock); + } + + err =3D lock_extent_bits(tree, start, end, cached_state); + + if ((tree->owner !=3D IO_TREE_FREE_SPACE_INODE_IO) && + (tree->owner =3D=3D IO_TREE_INODE_IO)) { + if (nested) + btrfs_lockdep_acquire_nested(tree->fs_info, + btrfs_inode_extent_lock, + subclass); + else + btrfs_lockdep_acquire(tree->fs_info, + btrfs_inode_extent_lock); + } + return err; } =20 @@ -2094,14 +2157,15 @@ noinline_for_stack bool find_lock_delalloc_range(= struct inode *inode, } =20 /* step three, lock the state bits for the whole range */ - lock_extent_bits(tree, delalloc_start, delalloc_end, &cached_state); + lock_extent_bits_lockdep(tree, delalloc_start, delalloc_end, &cached_st= ate, + true); =20 /* then test to make sure it is all still delalloc */ ret =3D test_range_bit(tree, delalloc_start, delalloc_end, EXTENT_DELALLOC, 1, cached_state); if (!ret) { - unlock_extent_cached(tree, delalloc_start, delalloc_end, - &cached_state); + unlock_extent_cached_lockdep(tree, delalloc_start, delalloc_end, + &cached_state); __unlock_for_delalloc(inode, locked_page, delalloc_start, delalloc_end); cond_resched(); @@ -3774,6 +3838,20 @@ int btrfs_read_folio(struct file *file, struct fol= io *folio) * bio to do the cleanup. */ submit_one_bio(&bio_ctrl); + + /* + * Release the lockdep map here because + * btrfs_lock_and_flush_ordered_range() will exit with the lockdep map + * acquired as a reader. Actually bio worker would unlock the extent bi= ts + * and thus the lockdep map when it ran asynchronously, if extent bits = were + * annotated in a generic way. However, lockdep expects that the + * acquisition and release of the lockdep map occur in the same context= , + * thus we have to unlock the lockdep map here. + */ + if ((inode->io_tree.owner !=3D IO_TREE_FREE_SPACE_INODE_IO) && + (inode->io_tree.owner =3D=3D IO_TREE_INODE_IO)) + btrfs_lockdep_release(inode->io_tree.fs_info, + btrfs_inode_extent_lock); return ret; } =20 @@ -3793,6 +3871,18 @@ static inline void contiguous_readpages(struct pag= e *pages[], int nr_pages, REQ_RAHEAD, prev_em_start); put_page(pages[index]); } + + /* + * Release the lockdep map here because + * btrfs_lock_and_flush_ordered_range() will exit with the lockdep map + * acquired as a reader. We have to unlock here because + * contiguous_readpages() will be called in while loop by + * extent_readahead(), thus lockdep would complain about double locking= . + */ + if ((inode->io_tree.owner !=3D IO_TREE_FREE_SPACE_INODE_IO) && + (inode->io_tree.owner =3D=3D IO_TREE_INODE_IO)) + btrfs_lockdep_release(inode->io_tree.fs_info, + btrfs_inode_extent_lock); } =20 /* @@ -3829,6 +3919,16 @@ static noinline_for_stack int writepage_delalloc(s= truct btrfs_inode *inode, } ret =3D btrfs_run_delalloc_range(inode, page, delalloc_start, delalloc_end, &page_started, &nr_written, wbc); + + /* + * Release the lockdep map here, because btrfs_run_delalloc_range() + * will exit with the lockdep map acquired. + */ + if ((inode->io_tree.owner !=3D IO_TREE_FREE_SPACE_INODE_IO) && + (inode->io_tree.owner =3D=3D IO_TREE_INODE_IO)) + btrfs_lockdep_release(inode->io_tree.fs_info, + btrfs_inode_extent_lock); + if (ret) { btrfs_page_set_error(inode->root->fs_info, page, page_offset(page), PAGE_SIZE); @@ -5584,8 +5684,8 @@ int extent_fiemap(struct btrfs_inode *inode, struct= fiemap_extent_info *fieinfo, last_for_get_extent =3D isize; } =20 - lock_extent_bits(&inode->io_tree, start, start + len - 1, - &cached_state); + lock_extent_bits_lockdep(&inode->io_tree, start, start + len - 1, + &cached_state, false); =20 em =3D get_extent_skip_holes(inode, start, last_for_get_extent); if (!em) @@ -5697,8 +5797,8 @@ int extent_fiemap(struct btrfs_inode *inode, struct= fiemap_extent_info *fieinfo, ret =3D emit_last_fiemap_cache(fieinfo, &cache); free_extent_map(em); out: - unlock_extent_cached(&inode->io_tree, start, start + len - 1, - &cached_state); + unlock_extent_cached_lockdep(&inode->io_tree, start, start + len - 1, + &cached_state); =20 out_free_ulist: btrfs_free_path(path); diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index c9649b7b2f18..df51a0cc3d98 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1510,7 +1510,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inod= e, loff_t pos, *write_bytes =3D min_t(size_t, *write_bytes , num_bytes - pos + lockstart); } - unlock_extent(&inode->io_tree, lockstart, lockend); + unlock_extent_lockdep(&inode->io_tree, lockstart, lockend); =20 return ret; } @@ -3517,8 +3517,8 @@ static long btrfs_fallocate(struct file *file, int = mode, } =20 locked_end =3D alloc_end - 1; - lock_extent_bits(&BTRFS_I(inode)->io_tree, alloc_start, locked_end, - &cached_state); + lock_extent_bits_lockdep(&BTRFS_I(inode)->io_tree, alloc_start, locked_= end, + &cached_state, false); =20 btrfs_assert_inode_range_clean(BTRFS_I(inode), alloc_start, locked_end)= ; =20 @@ -3607,8 +3607,8 @@ static long btrfs_fallocate(struct file *file, int = mode, */ ret =3D btrfs_fallocate_update_isize(inode, actual_end, mode); out_unlock: - unlock_extent_cached(&BTRFS_I(inode)->io_tree, alloc_start, locked_end, - &cached_state); + unlock_extent_cached_lockdep(&BTRFS_I(inode)->io_tree, alloc_start, + locked_end, &cached_state); out: btrfs_inode_unlock(inode, BTRFS_ILOCK_MMAP); extent_changeset_free(data_reserved); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 8cdb173331c7..0974df5bcdd5 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3214,6 +3214,7 @@ int btrfs_finish_ordered_io(struct btrfs_ordered_ex= tent *ordered_extent) bool freespace_inode; bool truncated =3D false; bool clear_reserved_extent =3D true; + bool have_locked =3D false; unsigned int clear_bits =3D EXTENT_DEFRAG; =20 start =3D ordered_extent->file_offset; @@ -3272,7 +3273,12 @@ int btrfs_finish_ordered_io(struct btrfs_ordered_e= xtent *ordered_extent) } =20 clear_bits |=3D EXTENT_LOCKED; - lock_extent_bits(io_tree, start, end, &cached_state); + lock_extent_bits_lockdep(io_tree, start, end, &cached_state, true); + /* + * We need this helper boolean because we can jump to out tag without + * having acquired the lockdep map first and unlock it + */ + have_locked =3D true; =20 if (freespace_inode) trans =3D btrfs_join_transaction_spacecache(root); @@ -3338,9 +3344,14 @@ int btrfs_finish_ordered_io(struct btrfs_ordered_e= xtent *ordered_extent) } ret =3D 0; out: - clear_extent_bit(&inode->io_tree, start, end, clear_bits, - (clear_bits & EXTENT_LOCKED) ? 1 : 0, 0, - &cached_state); + if (have_locked) + clear_extent_bit_lockdep(&inode->io_tree, start, end, clear_bits, + (clear_bits & EXTENT_LOCKED) ? 1 : 0, 0, + &cached_state); + else + clear_extent_bit(&inode->io_tree, start, end, clear_bits, + (clear_bits & EXTENT_LOCKED) ? 1 : 0, 0, + &cached_state); =20 if (trans) btrfs_end_transaction(trans); @@ -5139,7 +5150,8 @@ int btrfs_cont_expand(struct btrfs_inode *inode, lo= ff_t oldsize, loff_t size) break; } free_extent_map(em); - unlock_extent_cached(io_tree, hole_start, block_end - 1, &cached_state)= ; + unlock_extent_cached_lockdep(io_tree, hole_start, block_end - 1, + &cached_state); return err; } =20 diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index eb24a6d20ff8..c6d377aa4abb 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -1043,7 +1043,7 @@ void btrfs_lock_and_flush_ordered_range(struct btrf= s_inode *inode, u64 start, cachedp =3D cached_state; =20 while (1) { - lock_extent_bits(&inode->io_tree, start, end, cachedp); + lock_extent_bits_lockdep(&inode->io_tree, start, end, cachedp, true); ordered =3D btrfs_lookup_ordered_range(inode, start, end - start + 1); if (!ordered) { @@ -1056,7 +1056,7 @@ void btrfs_lock_and_flush_ordered_range(struct btrf= s_inode *inode, u64 start, refcount_dec(&cache->refs); break; } - unlock_extent_cached(&inode->io_tree, start, end, cachedp); + unlock_extent_cached_lockdep(&inode->io_tree, start, end, cachedp); btrfs_start_ordered_extent(ordered, 1); btrfs_put_ordered_extent(ordered); } --=20 2.30.2