From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7780EC433F5 for ; Mon, 7 Mar 2022 11:28:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237975AbiCGL33 (ORCPT ); Mon, 7 Mar 2022 06:29:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57242 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240230AbiCGL10 (ORCPT ); Mon, 7 Mar 2022 06:27:26 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 199C83A3 for ; Mon, 7 Mar 2022 03:05:27 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id B1643B810AA for ; Mon, 7 Mar 2022 11:05:25 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C4AB5C340E9; Mon, 7 Mar 2022 11:05:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1646651124; bh=GUQXFeyhwlCOzqUgEtGAN4bsCDNeymFO7PlklaRgvQo=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=QrVdd7f83NHJHbU3LJMWwQQ0tDRDGNqDkfhYTnrbZ3iPfo4wEffjtCcY8tXfrUcpa nkbhe9uOKQmJuTxRg5kEFfVF/rIp68Oi6j1dUQjTWC2oAGq51udACnyjKOLg8n/iCQ I23eVIDdXi0dlBNm/Ez9scFfKVFpdx58dwcvIjYxBOqkO1rCbq2cJdMJPhY1O5ZPhH yVDDGeVdRZoWbLo7f0akoXSANGZAV2uOGBjVCeGeAEbNfWI/D5hYljr+nAZjSQeZxE O06Ylf/LDHB33rxNqFcSLjOsX5jMJEaUU9ylZ/P+nGTqkafgvp5aMEtp2FhpnK285P icUKkZSsyCcGA== Date: Mon, 7 Mar 2022 11:05:20 +0000 From: Filipe Manana To: Zygo Blaxell Cc: linux-btrfs@vger.kernel.org Subject: Re: [PATCH] btrfs: avoid full commit due to race when logging dentry deletion Message-ID: References: <78d0dffe5f48910e126886559d0c69194b32eab9.1646416779.git.fdmanana@suse.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org On Sat, Mar 05, 2022 at 06:19:28PM -0500, Zygo Blaxell wrote: > On Fri, Mar 04, 2022 at 06:10:57PM +0000, fdmanana@kernel.org wrote: > > From: Filipe Manana > > > > During a rename, when logging that a directory entry was deleted, we may > > race with another task that is logging the directory. Even though the > > directory is locked at the VFS level, its logging can be triggered when > > other task is logging some other inode that had, or still has, a dentry > > in the directory (because its last_unlink_trans matches the current > > transaction). > > > > The chances are slim, and if the race happens, recording the deletion > > through insert_dir_log_key() can fail with -EEXIST and result in marking > > the log for a full transaction commit, which will make the next fsync > > fallback to a transaction commit. The opposite can also happen, we log the > > key before the other task attempts to insert the same key, in which case > > it fails with -EEXIST and fallsback to a transaction commit or trigger an > > assertion at process_dir_items_leaf() due to the unexpected -EEXIST error. > > Nice! I was just about to report this. I had a test VM hit it a dozen > times running misc-next. > > Does the stack trace look like this? Yes, that's it. Thanks. > > [ 324.007183][ T5981] assertion failed: ret != -EEXIST, in fs/btrfs/tree-log.c:3857 > [ 324.014960][ T5981] ------------[ cut here ]------------ > [ 324.020696][ T5981] kernel BUG at fs/btrfs/ctree.h:3553! > [ 324.026478][ T5981] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI > [ 324.032768][ T5981] CPU: 3 PID: 5981 Comm: repro-ghost-dir Not tainted 5.17.0-5d855cca4965-misc-next+ #154 0ccd294c521679ce6b41b39a0ab10e4adb946622 > [ 324.046174][ T5981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 > [ 324.064520][ T5981] RIP: 0010:assertfail.constprop.0+0x1c/0x1e > [ 324.073000][ T5981] Code: b3 4c 89 ef e8 2e 28 ff ff e9 d4 22 df fe 55 89 f1 48 c7 c2 40 43 4e b3 48 89 fe 48 c7 c7 80 43 4e b3 48 89 e5 e8 ce 81 fd ff <0f> 0b e8 7c 14 9a fe be a7 0e 00 00 48 c7 c7 20 44 4e b3 e8 cc ff > [ 324.092521][ T5981] RSP: 0018:ffffc90005c1f198 EFLAGS: 00010282 > [ 324.094666][ T5981] RAX: 000000000000003d RBX: 00000000000008e4 RCX: 0000000000000000 > [ 324.099245][ T5981] RDX: 0000000000000001 RSI: ffffffffb36038e0 RDI: fffff52000b83e26 > [ 324.104262][ T5981] RBP: ffffc90005c1f198 R08: 000000000000003d R09: ffff8881f7016507 > [ 324.109447][ T5981] R10: ffffed103ee02ca0 R11: 0000000000000001 R12: 00000000ffffffef > [ 324.114318][ T5981] R13: ffff88812c4d17c8 R14: ffff8881da62e1f8 R15: ffffc90005c1f368 > [ 324.119009][ T5981] FS: 00007f82dd62a640(0000) GS:ffff8881f6e00000(0000) knlGS:0000000000000000 > [ 324.124619][ T5981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 324.128796][ T5981] CR2: 000055a3e6285078 CR3: 0000000122b88005 CR4: 0000000000170ee0 > [ 324.133953][ T5981] Call Trace: > [ 324.135883][ T5981] > [ 324.137797][ T5981] log_dir_items.cold+0x16/0x2c > [ 324.140806][ T5981] ? replay_one_extent+0xbe0/0xbe0 > [ 324.144779][ T5981] ? __asan_loadN+0xf/0x20 > [ 324.147926][ T5981] ? __kasan_check_read+0x11/0x20 > [ 324.151339][ T5981] ? check_chain_key+0x1ee/0x2e0 > [ 324.154930][ T5981] ? __this_cpu_preempt_check+0x13/0x20 > [ 324.158814][ T5981] log_directory_changes+0xf9/0x170 > [ 324.162212][ T5981] ? log_dir_items+0xba0/0xba0 > [ 324.165276][ T5981] ? do_raw_write_unlock+0x7d/0xe0 > [ 324.168437][ T5981] btrfs_log_inode+0x2345/0x26d0 > [ 324.171480][ T5981] ? __asan_loadN+0xf/0x20 > [ 324.174221][ T5981] ? log_directory_changes+0x170/0x170 > [ 324.177464][ T5981] ? __this_cpu_preempt_check+0x13/0x20 > [ 324.180617][ T5981] ? __this_cpu_preempt_check+0x13/0x20 > [ 324.183851][ T5981] ? __might_resched+0x129/0x1c0 > [ 324.186715][ T5981] ? __might_sleep+0x66/0xc0 > [ 324.189432][ T5981] ? __kasan_check_read+0x11/0x20 > [ 324.192502][ T5981] ? iget5_locked+0xbd/0x150 > [ 324.195165][ T5981] ? acls_after_inode_item+0x290/0x290 > [ 324.199052][ T5981] ? btrfs_iget+0xc7/0x150 > [ 324.202206][ T5981] ? btrfs_orphan_cleanup+0x4a0/0x4a0 > [ 324.205589][ T5981] ? free_extent_buffer+0x13/0x20 > [ 324.208814][ T5981] btrfs_log_inode+0x268b/0x26d0 > [ 324.212339][ T5981] ? log_directory_changes+0x170/0x170 > [ 324.215788][ T5981] ? __this_cpu_preempt_check+0x13/0x20 > [ 324.219077][ T5981] ? lock_downgrade+0x420/0x420 > [ 324.222273][ T5981] ? __mutex_lock+0x540/0xdc0 > [ 324.225490][ T5981] ? check_chain_key+0x1ee/0x2e0 > [ 324.228805][ T5981] ? __kasan_check_write+0x14/0x20 > [ 324.232127][ T5981] ? __mutex_unlock_slowpath+0x12a/0x430 > [ 324.235852][ T5981] ? mutex_lock_io_nested+0xcd0/0xcd0 > [ 324.239247][ T5981] ? wait_for_completion_io_timeout+0x20/0x20 > [ 324.243062][ T5981] ? lock_downgrade+0x420/0x420 > [ 324.246695][ T5981] ? lock_contended+0x770/0x770 > [ 324.250187][ T5981] btrfs_log_inode_parent+0x523/0x17c0 > [ 324.253818][ T5981] ? pvclock_clocksource_read+0xde/0x1a0 > [ 324.258191][ T5981] ? __asan_loadN+0xf/0x20 > [ 324.261498][ T5981] ? pvclock_clocksource_read+0xde/0x1a0 > [ 324.265875][ T5981] ? btrfs_end_log_trans+0x70/0x70 > [ 324.270084][ T5981] ? __this_cpu_preempt_check+0x13/0x20 > [ 324.274520][ T5981] ? lock_downgrade+0x420/0x420 > [ 324.277936][ T5981] ? do_raw_spin_unlock+0xad/0x110 > [ 324.281957][ T5981] ? dget_parent+0xb7/0x300 > [ 324.285570][ T5981] btrfs_log_dentry_safe+0x48/0x60 > [ 324.289434][ T5981] btrfs_sync_file+0x629/0xa40 > [ 324.292954][ T5981] ? start_ordered_ops.constprop.0+0x120/0x120 > [ 324.297919][ T5981] ? __fget_files+0x167/0x230 > [ 324.301706][ T5981] vfs_fsync_range+0x6d/0x110 > [ 324.304831][ T5981] ? start_ordered_ops.constprop.0+0x120/0x120 > [ 324.309013][ T5981] __x64_sys_fsync+0x45/0x70 > [ 324.311898][ T5981] do_syscall_64+0x5c/0xc0 > [ 324.315064][ T5981] ? lockdep_hardirqs_on+0xce/0x150 > [ 324.318504][ T5981] ? syscall_exit_to_user_mode+0x3b/0x50 > [ 324.322046][ T5981] ? do_syscall_64+0x69/0xc0 > [ 324.325180][ T5981] ? sysvec_call_function_single+0x57/0xc0 > [ 324.329166][ T5981] ? asm_sysvec_call_function_single+0xa/0x20 > [ 324.332977][ T5981] entry_SYSCALL_64_after_hwframe+0x44/0xae > [ 324.336984][ T5981] RIP: 0033:0x7f82e137b6ab > [ 324.339923][ T5981] Code: 0f 05 48 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8 53 f7 ff ff 8b 7c 24 0c 41 89 c0 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 b1 f7 ff ff 8b 44 > [ 324.353145][ T5981] RSP: 002b:00007f82dd629950 EFLAGS: 00000293 ORIG_RAX: 000000000000004a > [ 324.357209][ T5981] RAX: ffffffffffffffda RBX: 00007f82dd6299f0 RCX: 00007f82e137b6ab > [ 324.363329][ T5981] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 000000000000000c > [ 324.369064][ T5981] RBP: 000000000000000c R08: 0000000000000000 R09: 0000000000000000 > [ 324.374926][ T5981] R10: 00000000000001ff R11: 0000000000000293 R12: 0000000000000001 > [ 324.380237][ T5981] R13: 000055be30453140 R14: 00007f82dd6299d0 R15: 000055be31bfe098 > [ 324.385736][ T5981] > > > So make that code that records a dentry deletion to be inside a critical > > section delimited by the directory's log mutex. > > > > Signed-off-by: Filipe Manana > > --- > > > > David, this can be optionally squashed into the following patch: > > > > "btrfs: avoid logging all directory changes during renames" > > > > (misc-next only) > > > > fs/btrfs/tree-log.c | 12 ++++++++++++ > > 1 file changed, 12 insertions(+) > > > > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c > > index 4f61b38bb186..571dae8ad65e 100644 > > --- a/fs/btrfs/tree-log.c > > +++ b/fs/btrfs/tree-log.c > > @@ -7015,6 +7015,17 @@ void btrfs_log_new_name(struct btrfs_trans_handle *trans, > > goto out; > > } > > > > + /* > > + * Other concurrent task might be logging the old directory, > > + * as it can be triggered when logging other inode that had or > > + * still has a dentry in the old directory. So take the old > > + * directory's log_mutex to prevent getting an -EEXIST when > > + * logging a key to record the deletion, or having that other > > + * task logging the old directory get an -EEXIST if it attempts > > + * to log the same key after we just did it. In both cases that > > + * would result in falling back to a transaction commit. > > + */ > > + mutex_lock(&old_dir->log_mutex); > > ret = del_logged_dentry(trans, log, path, btrfs_ino(old_dir), > > old_dentry->d_name.name, > > old_dentry->d_name.len, old_dir_index); > > @@ -7028,6 +7039,7 @@ void btrfs_log_new_name(struct btrfs_trans_handle *trans, > > btrfs_ino(old_dir), > > old_dir_index, old_dir_index); > > } > > + mutex_unlock(&old_dir->log_mutex); > > > > btrfs_free_path(path); > > if (ret < 0) > > -- > > 2.33.0 > >