All of lore.kernel.org
 help / color / mirror / Atom feed
From: Filipe Manana <fdmanana@kernel.org>
To: dsterba@suse.cz
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 0/8] btrfs: some speedups for io tree operations and cleanups
Date: Fri, 22 Sep 2023 20:13:50 +0100	[thread overview]
Message-ID: <ZQ3nbut84wv6jWiT@debian0.Home> (raw)
In-Reply-To: <CAL3q7H62P3h3ABOXn2HjqQ3ZwBp1XBhHqbXsQnktrfHZCyGMMQ@mail.gmail.com>

On Fri, Sep 22, 2023 at 07:43:24PM +0100, Filipe Manana wrote:
> On Fri, Sep 22, 2023 at 7:24 PM David Sterba <dsterba@suse.cz> wrote:
> >
> > On Fri, Sep 22, 2023 at 11:39:01AM +0100, fdmanana@kernel.org wrote:
> > > From: Filipe Manana <fdmanana@suse.com>
> > >
> > > These are some changes to make some io tree operations more efficient and
> > > some cleanups. More details on the changelogs.
> > >
> > > Filipe Manana (8):
> > >   btrfs: make extent state merges more efficient during insertions
> > >   btrfs: update stale comment at extent_io_tree_release()
> > >   btrfs: make wait_extent_bit() static
> > >   btrfs: remove pointless memory barrier from extent_io_tree_release()
> > >   btrfs: collapse wait_on_state() to its caller wait_extent_bit()
> > >   btrfs: make extent_io_tree_release() more efficient
> > >   btrfs: use extent_io_tree_release() to empty dirty log pages
> > >   btrfs: make sure we cache next state in find_first_extent_bit()
> >
> > I see a lot of reports like:
> >
> > btrfs/004        [16:14:23][  468.732077] run fstests btrfs/004 at 2023-09-22 16:14:24
> > [  470.921989] BTRFS: device fsid f7d57de2-899a-4b33-b77a-084058ac36e9 devid 1 transid 11 /dev/vda scanned by mount (31993)
> > [  470.926438] BTRFS info (device vda): using crc32c (crc32c-generic) checksum algorithm
> > [  470.928013] BTRFS info (device vda): using free space tree
> > [  470.952723] BTRFS info (device vda): auto enabling async discard
> > [  472.385556] BTRFS: device fsid 097a012d-8e9b-4bd8-960d-87577821cbbe devid 1 transid 6 /dev/vdb scanned by mount (32061)
> > [  472.395192] BTRFS info (device vdb): using crc32c (crc32c-generic) checksum algorithm
> > [  472.398895] BTRFS info (device vdb): using free space tree
> > [  472.438755] BTRFS info (device vdb): auto enabling async discard
> > [  472.440900] BTRFS info (device vdb): checking UUID tree
> > [  472.534254] BUG: sleeping function called from invalid context at fs/btrfs/extent-io-tree.c:132
> > [  472.539305] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 32079, name: fsstress
> > [  472.542685] preempt_count: 1, expected: 0
> > [  472.543641] RCU nest depth: 0, expected: 0
> > [  472.544778] 6 locks held by fsstress/32079:
> > [  472.546916]  #0: ffff888017ad4648 (sb_internal#2){.+.+}-{0:0}, at: btrfs_sync_file+0x594/0xa20 [btrfs]
> > [  472.551474]  #1: ffff888014c72790 (btrfs_trans_completed){.+.+}-{0:0}, at: btrfs_commit_transaction+0x840/0x1580 [btrfs]
> > [  472.556334]  #2: ffff888014c72760 (btrfs_trans_super_committed){.+.+}-{0:0}, at: btrfs_commit_transaction+0x840/0x1580 [btrfs]
> > [  472.561372]  #3: ffff888014c72730 (btrfs_trans_unblocked){++++}-{0:0}, at: btrfs_commit_transaction+0x840/0x1580 [btrfs]
> > [  472.565793]  #4: ffff888014c70de0 (&fs_info->reloc_mutex){+.+.}-{3:3}, at: btrfs_commit_transaction+0x8ed/0x1580 [btrfs]
> > [  472.569931]  #5: ffff88802ec1c248 (&tree->lock#2){+.+.}-{2:2}, at: extent_io_tree_release+0x1c/0x120 [btrfs]
> > [  472.573099] Preemption disabled at:
> > [  472.573110] [<0000000000000000>] 0x0
> > [  472.575200] CPU: 0 PID: 32079 Comm: fsstress Not tainted 6.6.0-rc2-default+ #2197
> > [  472.577440] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
> > [  472.580902] Call Trace:
> > [  472.581738]  <TASK>
> > [  472.582480]  dump_stack_lvl+0x60/0x70
> > [  472.583621]  __might_resched+0x224/0x360
> > [  472.584591]  extent_io_tree_release+0xa5/0x120 [btrfs]
> > [  472.585864]  free_log_tree+0xec/0x250 [btrfs]
> > [  472.587007]  ? walk_log_tree+0x4a0/0x4a0 [btrfs]
> > [  472.588116]  ? reacquire_held_locks+0x280/0x280
> > [  472.589104]  ? btrfs_log_holes+0x430/0x430 [btrfs]
> > [  472.590296]  ? node_tag_clear+0xf4/0x160
> > [  472.591292]  btrfs_free_log+0x2c/0x60 [btrfs]
> > [  472.592468]  commit_fs_roots+0x1e0/0x440 [btrfs]
> > [  472.593569]  ? __lock_release.isra.0+0x14e/0x510
> > [  472.594470]  ? percpu_up_read+0xe0/0xe0 [btrfs]
> > [  472.595496]  ? btrfs_run_delayed_refs+0xf6/0x180 [btrfs]
> > [  472.596758]  ? btrfs_assert_delayed_root_empty+0x2d/0xd0 [btrfs]
> > [  472.598077]  ? btrfs_commit_transaction+0x932/0x1580 [btrfs]
> > [  472.599437]  btrfs_commit_transaction+0x94e/0x1580 [btrfs]
> > [  472.600845]  ? cleanup_transaction+0x650/0x650 [btrfs]
> > [  472.602164]  ? preempt_count_sub+0x18/0xc0
> > [  472.603111]  ? __rcu_read_unlock+0x67/0x90
> > [  472.604011]  ? preempt_count_add+0x71/0xd0
> > [  472.604840]  ? preempt_count_sub+0x18/0xc0
> > [  472.605664]  ? preempt_count_add+0x71/0xd0
> > [  472.606454]  ? preempt_count_sub+0x18/0xc0
> > [  472.607251]  ? __up_write+0x125/0x300
> > [  472.608059]  btrfs_sync_file+0x794/0xa20 [btrfs]
> > [  472.609105]  ? start_ordered_ops.constprop.0+0xd0/0xd0 [btrfs]
> > [  472.610392]  ? mark_held_locks+0x1a/0x80
> > [  472.611179]  __x64_sys_fdatasync+0x70/0xb0
> > [  472.614054]  do_syscall_64+0x3d/0x90
> > [  472.614584]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > [  472.615522] RIP: 0033:0x7f9bbbe983d4
> >
> >  115 void extent_io_tree_release(struct extent_io_tree *tree)
> >  116 {
> >  117         struct extent_state *state;
> >  118         struct extent_state *tmp;
> >  119
> >  120         spin_lock(&tree->lock);
> >  121         rbtree_postorder_for_each_entry_safe(state, tmp, &tree->state, rb_node) {
> >  122                 /* Clear node to keep free_extent_state() happy. */
> >  123                 RB_CLEAR_NODE(&state->rb_node);
> >  124                 ASSERT(!(state->state & EXTENT_LOCKED));
> >  125                 /*
> >  126                  * No need for a memory barrier here, as we are holding the tree
> >  127                  * lock and we only change the waitqueue while holding that lock
> >  128                  * (see wait_extent_bit()).
> >  129                  */
> >  130                 ASSERT(!waitqueue_active(&state->wq));
> >  131                 free_extent_state(state);
> >  132                 cond_resched();
> >  ^^^
> >
> > should be cond_resched_lock() as it's under spinlock but then I'm not
> > sure if relocking is still safe in the middle of the tree traversal.
> 
> cond_resched_lock() works here under the assumption that at this point no other
> tasks access the tree (as documented in an earlier patch) - if the
> assumption is broken in
> the future, then another task can access a node that was freed before
> the cond_resched_lock()
> call and result in a use-after-free or other weird issues.
> 
> But I think it's best to remove the cond_resched() call. I removed it
> and then added it again
> but didn't think properly and ran on a vm without several debugging
> configs turned on, likely
> why I didn't get that splat.
> 
> Do you want me to send a v2 with the cond_resched() line deleted?

Better, we can keep the cond_resched_lock() safely with this incremental on
top of the patch:

diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 6e3645afaa38..32788fb7837e 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -114,11 +114,14 @@ void extent_io_tree_init(struct btrfs_fs_info *fs_info,
  */
 void extent_io_tree_release(struct extent_io_tree *tree)
 {
+       struct rb_root root;
        struct extent_state *state;
        struct extent_state *tmp;
 
        spin_lock(&tree->lock);
-       rbtree_postorder_for_each_entry_safe(state, tmp, &tree->state, rb_node) {
+       root = tree->state;
+       tree->state = RB_ROOT;
+       rbtree_postorder_for_each_entry_safe(state, tmp, &root, rb_node) {
                /* Clear node to keep free_extent_state() happy. */
                RB_CLEAR_NODE(&state->rb_node);
                ASSERT(!(state->state & EXTENT_LOCKED));
@@ -129,9 +132,13 @@ void extent_io_tree_release(struct extent_io_tree *tree)
                 */
                ASSERT(!waitqueue_active(&state->wq));
                free_extent_state(state);
-               cond_resched();
+               cond_resched_lock(&tree->lock);
        }
-       tree->state = RB_ROOT;
+       /*
+        * Should still be empty even after a reschedule, no other task should
+        * be accessing the tree anymore.
+        */
+       ASSERT(RB_EMPTY_ROOT(&tree->state));
        spin_unlock(&tree->lock);
 }
 
Also pasted at: https://pastebin.com/raw/uVMP2e5b

Let me know if you prefer I send a v2 or squash this patch.

Thanks.

> 
> Thanks.
> 
> >
> >  133         }
> >  134         tree->state = RB_ROOT;
> >  135         spin_unlock(&tree->lock);
> >  136 }

  parent reply	other threads:[~2023-09-22 19:13 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-22 10:39 [PATCH 0/8] btrfs: some speedups for io tree operations and cleanups fdmanana
2023-09-22 10:39 ` [PATCH 1/8] btrfs: make extent state merges more efficient during insertions fdmanana
2023-09-22 10:39 ` [PATCH 2/8] btrfs: update stale comment at extent_io_tree_release() fdmanana
2023-09-22 10:39 ` [PATCH 3/8] btrfs: make wait_extent_bit() static fdmanana
2023-09-22 22:49   ` Anand Jain
2023-09-22 10:39 ` [PATCH 4/8] btrfs: remove pointless memory barrier from extent_io_tree_release() fdmanana
2023-09-29 15:25   ` David Sterba
2023-09-22 10:39 ` [PATCH 5/8] btrfs: collapse wait_on_state() to its caller wait_extent_bit() fdmanana
2023-09-22 22:54   ` Anand Jain
2023-09-22 10:39 ` [PATCH 6/8] btrfs: make extent_io_tree_release() more efficient fdmanana
2023-09-22 10:39 ` [PATCH 7/8] btrfs: use extent_io_tree_release() to empty dirty log pages fdmanana
2023-09-26  6:25   ` kernel test robot
2023-09-26  6:25     ` [LTP] " kernel test robot
2023-09-26 16:27     ` David Sterba
2023-09-26 16:27       ` [LTP] " David Sterba
2023-09-22 10:39 ` [PATCH 8/8] btrfs: make sure we cache next state in find_first_extent_bit() fdmanana
2023-09-22 18:18 ` [PATCH 0/8] btrfs: some speedups for io tree operations and cleanups David Sterba
2023-09-22 18:43   ` Filipe Manana
2023-09-22 19:08     ` David Sterba
2023-09-22 19:13     ` Filipe Manana [this message]
2023-09-22 19:15       ` David Sterba
2023-09-29 16:47 ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZQ3nbut84wv6jWiT@debian0.Home \
    --to=fdmanana@kernel.org \
    --cc=dsterba@suse.cz \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.