From: Su Yue <l@damenly.org>
To: Brian Foster <bfoster@redhat.com>
Cc: Su Yue <glass.su@suse.com>,
linux-bcachefs@vger.kernel.org, l@damenly.org
Subject: Re: [PATCH] bcachefs: grab s_umount only if snapshotting
Date: Tue, 16 Jan 2024 22:10:53 +0800 [thread overview]
Message-ID: <8r4p71d3.fsf@damenly.org> (raw)
In-Reply-To: <ZaaJBUlutcDw7s8D@bfoster>
On Tue 16 Jan 2024 at 08:47, Brian Foster <bfoster@redhat.com>
wrote:
> On Mon, Jan 15, 2024 at 10:21:25AM +0800, Su Yue wrote:
>> When I was testing mongodb over bcachefs with compression,
>> there is a lockdep warning when snapshotting mongodb data
>> volume.
>>
>> $ cat test.sh
>> prog=bcachefs
>>
>> $prog subvolume create /mnt/data
>> $prog subvolume create /mnt/data/snapshots
>>
>> while true;do
>> $prog subvolume snapshot /mnt/data
>> /mnt/data/snapshots/$(date +%s)
>> sleep 1s
>> done
>>
>> $ cat /etc/mongodb.conf
>> systemLog:
>> destination: file
>> logAppend: true
>> path: /mnt/data/mongod.log
>>
>> storage:
>> dbPath: /mnt/data/
>>
>> lockdep reports:
>> [ 3437.452330]
>> ======================================================
>> [ 3437.452750] WARNING: possible circular locking dependency
>> detected
>> [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E
>> [ 3437.453562]
>> ------------------------------------------------------
>> [ 3437.453981] bcachefs/35533 is trying to acquire lock:
>> [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0},
>> at: filename_create+0x62/0x190
>> [ 3437.454875]
>> but task is already holding lock:
>> [ 3437.455268] ffffa0a02b2b10e0
>> (&type->s_umount_key#48){.+.+}-{3:3}, at:
>> bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
>> [ 3437.456009]
>> which lock already depends on the new lock.
>>
>> [ 3437.456553]
>> the existing dependency chain (in reverse order)
>> is:
>> [ 3437.457054]
>> -> #3 (&type->s_umount_key#48){.+.+}-{3:3}:
>> [ 3437.457507] down_read+0x3e/0x170
>> [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
>> [ 3437.458206] __x64_sys_ioctl+0x93/0xd0
>> [ 3437.458498] do_syscall_64+0x42/0xf0
>> [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76
>> [ 3437.459155]
>> -> #2 (&c->snapshot_create_lock){++++}-{3:3}:
>> [ 3437.459615] down_read+0x3e/0x170
>> [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs]
>> [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs]
>> [ 3437.460686] notify_change+0x1f1/0x4a0
>> [ 3437.461283] do_truncate+0x7f/0xd0
>> [ 3437.461555] path_openat+0xa57/0xce0
>> [ 3437.461836] do_filp_open+0xb4/0x160
>> [ 3437.462116] do_sys_openat2+0x91/0xc0
>> [ 3437.462402] __x64_sys_openat+0x53/0xa0
>> [ 3437.462701] do_syscall_64+0x42/0xf0
>> [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76
>> [ 3437.463359]
>> -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}:
>> [ 3437.463843] down_write+0x3b/0xc0
>> [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs]
>> [ 3437.464493] vfs_write+0x21b/0x4c0
>> [ 3437.464653] ksys_write+0x69/0xf0
>> [ 3437.464839] do_syscall_64+0x42/0xf0
>> [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76
>> [ 3437.465231]
>> -> #0 (sb_writers#10){.+.+}-{0:0}:
>> [ 3437.465471] __lock_acquire+0x1455/0x21b0
>> [ 3437.465656] lock_acquire+0xc6/0x2b0
>> [ 3437.465822] mnt_want_write+0x46/0x1a0
>> [ 3437.465996] filename_create+0x62/0x190
>> [ 3437.466175] user_path_create+0x2d/0x50
>> [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
>> [ 3437.466617] __x64_sys_ioctl+0x93/0xd0
>> [ 3437.466791] do_syscall_64+0x42/0xf0
>> [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76
>> [ 3437.467180]
>> other info that might help us debug this:
>>
>> [ 3437.469670] 2 locks held by bcachefs/35533:
>> other info that might help us debug this:
>>
>> [ 3437.467507] Chain exists of:
>> sb_writers#10 --> &c->snapshot_create_lock -->
>> &type->s_umount_key#48
>>
>> [ 3437.467979] Possible unsafe locking scenario:
>>
>> [ 3437.468223] CPU0 CPU1
>> [ 3437.468405] ---- ----
>> [ 3437.468585] rlock(&type->s_umount_key#48);
>> [ 3437.468758]
>> lock(&c->snapshot_create_lock);
>> [ 3437.469030]
>> lock(&type->s_umount_key#48);
>> [ 3437.469291] rlock(sb_writers#10);
>> [ 3437.469434]
>> *** DEADLOCK ***
>>
>> [ 3437.469670] 2 locks held by bcachefs/35533:
>> [ 3437.469838] #0: ffffa0a02ce00a88
>> (&c->snapshot_create_lock){++++}-{3:3}, at:
>> bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs]
>> [ 3437.470294] #1: ffffa0a02b2b10e0
>> (&type->s_umount_key#48){.+.+}-{3:3}, at:
>> bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
>> [ 3437.470744]
>> stack backtrace:
>> [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded
>> Tainted: G E 6.7.0-rc7-custom+ #85
>> [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX,
>> 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
>> [ 3437.471694] Call Trace:
>> [ 3437.471795] <TASK>
>> [ 3437.471884] dump_stack_lvl+0x57/0x90
>> [ 3437.472035] check_noncircular+0x132/0x150
>> [ 3437.472202] __lock_acquire+0x1455/0x21b0
>> [ 3437.472369] lock_acquire+0xc6/0x2b0
>> [ 3437.472518] ? filename_create+0x62/0x190
>> [ 3437.472683] ? lock_is_held_type+0x97/0x110
>> [ 3437.472856] mnt_want_write+0x46/0x1a0
>> [ 3437.473025] ? filename_create+0x62/0x190
>> [ 3437.473204] filename_create+0x62/0x190
>> [ 3437.473380] user_path_create+0x2d/0x50
>> [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
>> [ 3437.473819] ? lock_acquire+0xc6/0x2b0
>> [ 3437.474002] ? __fget_files+0x2a/0x190
>> [ 3437.474195] ? __fget_files+0xbc/0x190
>> [ 3437.474380] ? lock_release+0xc5/0x270
>> [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0
>> [ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs]
>> [ 3437.475090] __x64_sys_ioctl+0x93/0xd0
>> [ 3437.475277] do_syscall_64+0x42/0xf0
>> [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76
>> [ 3437.475691] RIP: 0033:0x7f2743c313af
>> ======================================================
>>
>> In __bch2_ioctl_subvolume_create(), we grab s_umount
>> unconditionally
>> and unlock it at the end of the function. There is a comment
>> "why do we need this lock?" about the lock coming from
>> commit 42d237320e98 ("bcachefs: Snapshot creation, deletion")
>> The reason is that __bch2_ioctl_subvolume_create() calls
>> sync_inodes_sb() which enforce locked s_umount to writeback all
>> dirty
>> nodes before doing snapshot works.
>>
>> Fix it by read locking s_umount for snapshotting only and
>> unlocking
>> s_umount after sync_inodes_sb().
>>
>
> Heh, seems reasonable to me..
>
>> Signed-off-by: Su Yue <glass.su@suse.com>
>> ---
>> fs/bcachefs/fs-ioctl.c | 11 +++++------
>> 1 file changed, 5 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c
>> index e0a19a73c8e1..1346861ed944 100644
>> --- a/fs/bcachefs/fs-ioctl.c
>> +++ b/fs/bcachefs/fs-ioctl.c
>> @@ -337,11 +337,12 @@ static long
>> __bch2_ioctl_subvolume_create(struct bch_fs *c, struct file
>> *filp,
>> if (arg.flags & BCH_SUBVOL_SNAPSHOT_RO)
>> create_flags |= BCH_CREATE_SNAPSHOT_RO;
>>
>> - /* why do we need this lock? */
>> - down_read(&c->vfs_sb->s_umount);
>> -
>> - if (arg.flags & BCH_SUBVOL_SNAPSHOT_CREATE)
>> + if (arg.flags & BCH_SUBVOL_SNAPSHOT_CREATE) {
>> + /* sync_inodes_sb enforce s_umount is locked */
>> + down_read(&c->vfs_sb->s_umount);
>> sync_inodes_sb(c->vfs_sb);
>> + up_read(&c->vfs_sb->s_umount);
>> + }
>> retry:
>> if (arg.src_ptr) {
>> error = user_path_at(arg.dirfd,
>> @@ -425,8 +426,6 @@ static long
>> __bch2_ioctl_subvolume_create(struct bch_fs *c, struct file
>> *filp,
>> goto retry;
>> }
>> err1:
>> - up_read(&c->vfs_sb->s_umount);
>> -
>
> Can we get rid of this label now that it serves no purpose?
>
I think we can't. Two gotos are still there(under the retry label)
unless change them to return
--
Su
> Brian
>
>> return error;
>> }
>>
>> --
>> 2.43.0
>>
>>
next prev parent reply other threads:[~2024-01-16 15:25 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-15 2:21 [PATCH] bcachefs: grab s_umount only if snapshotting Su Yue
2024-01-16 13:47 ` Brian Foster
2024-01-16 14:10 ` Su Yue [this message]
2024-01-16 15:32 ` Brian Foster
2024-01-16 15:45 ` Glass Su
2024-01-16 17:17 ` Kent Overstreet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8r4p71d3.fsf@damenly.org \
--to=l@damenly.org \
--cc=bfoster@redhat.com \
--cc=glass.su@suse.com \
--cc=linux-bcachefs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox