* [PATCH] bcachefs: grab s_umount only if snapshotting
@ 2024-01-15 2:21 Su Yue
2024-01-16 13:47 ` Brian Foster
0 siblings, 1 reply; 6+ messages in thread
From: Su Yue @ 2024-01-15 2:21 UTC (permalink / raw)
To: linux-bcachefs; +Cc: l, Su Yue
When I was testing mongodb over bcachefs with compression,
there was a lockdep warning when snapshotting the mongodb data volume.
$ cat test.sh
prog=bcachefs
$prog subvolume create /mnt/data
$prog subvolume create /mnt/data/snapshots
while true; do
	$prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s)
	sleep 1s
done
$ cat /etc/mongodb.conf
systemLog:
  destination: file
  logAppend: true
  path: /mnt/data/mongod.log

storage:
  dbPath: /mnt/data/
lockdep reports:
[ 3437.452330] ======================================================
[ 3437.452750] WARNING: possible circular locking dependency detected
[ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E
[ 3437.453562] ------------------------------------------------------
[ 3437.453981] bcachefs/35533 is trying to acquire lock:
[ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190
[ 3437.454875]
but task is already holding lock:
[ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.456009]
which lock already depends on the new lock.
[ 3437.456553]
the existing dependency chain (in reverse order) is:
[ 3437.457054]
-> #3 (&type->s_umount_key#48){.+.+}-{3:3}:
[ 3437.457507] down_read+0x3e/0x170
[ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.458206] __x64_sys_ioctl+0x93/0xd0
[ 3437.458498] do_syscall_64+0x42/0xf0
[ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.459155]
-> #2 (&c->snapshot_create_lock){++++}-{3:3}:
[ 3437.459615] down_read+0x3e/0x170
[ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs]
[ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs]
[ 3437.460686] notify_change+0x1f1/0x4a0
[ 3437.461283] do_truncate+0x7f/0xd0
[ 3437.461555] path_openat+0xa57/0xce0
[ 3437.461836] do_filp_open+0xb4/0x160
[ 3437.462116] do_sys_openat2+0x91/0xc0
[ 3437.462402] __x64_sys_openat+0x53/0xa0
[ 3437.462701] do_syscall_64+0x42/0xf0
[ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.463359]
-> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}:
[ 3437.463843] down_write+0x3b/0xc0
[ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs]
[ 3437.464493] vfs_write+0x21b/0x4c0
[ 3437.464653] ksys_write+0x69/0xf0
[ 3437.464839] do_syscall_64+0x42/0xf0
[ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.465231]
-> #0 (sb_writers#10){.+.+}-{0:0}:
[ 3437.465471] __lock_acquire+0x1455/0x21b0
[ 3437.465656] lock_acquire+0xc6/0x2b0
[ 3437.465822] mnt_want_write+0x46/0x1a0
[ 3437.465996] filename_create+0x62/0x190
[ 3437.466175] user_path_create+0x2d/0x50
[ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.466617] __x64_sys_ioctl+0x93/0xd0
[ 3437.466791] do_syscall_64+0x42/0xf0
[ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.467180]
other info that might help us debug this:
[ 3437.467507] Chain exists of:
sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48
[ 3437.467979] Possible unsafe locking scenario:
[ 3437.468223] CPU0 CPU1
[ 3437.468405] ---- ----
[ 3437.468585] rlock(&type->s_umount_key#48);
[ 3437.468758] lock(&c->snapshot_create_lock);
[ 3437.469030] lock(&type->s_umount_key#48);
[ 3437.469291] rlock(sb_writers#10);
[ 3437.469434]
*** DEADLOCK ***
[ 3437.469670] 2 locks held by bcachefs/35533:
[ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs]
[ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.470744]
stack backtrace:
[ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ #85
[ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 3437.471694] Call Trace:
[ 3437.471795] <TASK>
[ 3437.471884] dump_stack_lvl+0x57/0x90
[ 3437.472035] check_noncircular+0x132/0x150
[ 3437.472202] __lock_acquire+0x1455/0x21b0
[ 3437.472369] lock_acquire+0xc6/0x2b0
[ 3437.472518] ? filename_create+0x62/0x190
[ 3437.472683] ? lock_is_held_type+0x97/0x110
[ 3437.472856] mnt_want_write+0x46/0x1a0
[ 3437.473025] ? filename_create+0x62/0x190
[ 3437.473204] filename_create+0x62/0x190
[ 3437.473380] user_path_create+0x2d/0x50
[ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.473819] ? lock_acquire+0xc6/0x2b0
[ 3437.474002] ? __fget_files+0x2a/0x190
[ 3437.474195] ? __fget_files+0xbc/0x190
[ 3437.474380] ? lock_release+0xc5/0x270
[ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0
[ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs]
[ 3437.475090] __x64_sys_ioctl+0x93/0xd0
[ 3437.475277] do_syscall_64+0x42/0xf0
[ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.475691] RIP: 0033:0x7f2743c313af
======================================================
In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally
and unlock it at the end of the function. There is a comment
"why do we need this lock?" above the lock, dating back to
commit 42d237320e98 ("bcachefs: Snapshot creation, deletion").
The reason is that __bch2_ioctl_subvolume_create() calls
sync_inodes_sb(), which requires s_umount to be held while it
writes back all dirty inodes before the snapshot work starts.

Fix it by read-locking s_umount only when snapshotting, and
unlocking s_umount right after sync_inodes_sb() returns.
Signed-off-by: Su Yue <glass.su@suse.com>
---
fs/bcachefs/fs-ioctl.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c
index e0a19a73c8e1..1346861ed944 100644
--- a/fs/bcachefs/fs-ioctl.c
+++ b/fs/bcachefs/fs-ioctl.c
@@ -337,11 +337,12 @@ static long __bch2_ioctl_subvolume_create(struct bch_fs *c, struct file *filp,
 	if (arg.flags & BCH_SUBVOL_SNAPSHOT_RO)
 		create_flags |= BCH_CREATE_SNAPSHOT_RO;
 
-	/* why do we need this lock? */
-	down_read(&c->vfs_sb->s_umount);
-
-	if (arg.flags & BCH_SUBVOL_SNAPSHOT_CREATE)
+	if (arg.flags & BCH_SUBVOL_SNAPSHOT_CREATE) {
+		/* sync_inodes_sb() requires s_umount to be held */
+		down_read(&c->vfs_sb->s_umount);
 		sync_inodes_sb(c->vfs_sb);
+		up_read(&c->vfs_sb->s_umount);
+	}
 retry:
 	if (arg.src_ptr) {
 		error = user_path_at(arg.dirfd,
@@ -425,8 +426,6 @@ static long __bch2_ioctl_subvolume_create(struct bch_fs *c, struct file *filp,
 		goto retry;
 	}
 err1:
-	up_read(&c->vfs_sb->s_umount);
-
 	return error;
 }
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH] bcachefs: grab s_umount only if snapshotting 2024-01-15 2:21 [PATCH] bcachefs: grab s_umount only if snapshotting Su Yue @ 2024-01-16 13:47 ` Brian Foster 2024-01-16 14:10 ` Su Yue 0 siblings, 1 reply; 6+ messages in thread From: Brian Foster @ 2024-01-16 13:47 UTC (permalink / raw) To: Su Yue; +Cc: linux-bcachefs, l On Mon, Jan 15, 2024 at 10:21:25AM +0800, Su Yue wrote: > When I was testing mongodb over bcachefs with compression, > there is a lockdep warning when snapshotting mongodb data volume. > > $ cat test.sh > prog=bcachefs > > $prog subvolume create /mnt/data > $prog subvolume create /mnt/data/snapshots > > while true;do > $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s) > sleep 1s > done > > $ cat /etc/mongodb.conf > systemLog: > destination: file > logAppend: true > path: /mnt/data/mongod.log > > storage: > dbPath: /mnt/data/ > > lockdep reports: > [ 3437.452330] ====================================================== > [ 3437.452750] WARNING: possible circular locking dependency detected > [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E > [ 3437.453562] ------------------------------------------------------ > [ 3437.453981] bcachefs/35533 is trying to acquire lock: > [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190 > [ 3437.454875] > but task is already holding lock: > [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] > [ 3437.456009] > which lock already depends on the new lock. 
> > [ 3437.456553] > the existing dependency chain (in reverse order) is: > [ 3437.457054] > -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: > [ 3437.457507] down_read+0x3e/0x170 > [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] > [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 > [ 3437.458498] do_syscall_64+0x42/0xf0 > [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > [ 3437.459155] > -> #2 (&c->snapshot_create_lock){++++}-{3:3}: > [ 3437.459615] down_read+0x3e/0x170 > [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] > [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] > [ 3437.460686] notify_change+0x1f1/0x4a0 > [ 3437.461283] do_truncate+0x7f/0xd0 > [ 3437.461555] path_openat+0xa57/0xce0 > [ 3437.461836] do_filp_open+0xb4/0x160 > [ 3437.462116] do_sys_openat2+0x91/0xc0 > [ 3437.462402] __x64_sys_openat+0x53/0xa0 > [ 3437.462701] do_syscall_64+0x42/0xf0 > [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > [ 3437.463359] > -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: > [ 3437.463843] down_write+0x3b/0xc0 > [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] > [ 3437.464493] vfs_write+0x21b/0x4c0 > [ 3437.464653] ksys_write+0x69/0xf0 > [ 3437.464839] do_syscall_64+0x42/0xf0 > [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > [ 3437.465231] > -> #0 (sb_writers#10){.+.+}-{0:0}: > [ 3437.465471] __lock_acquire+0x1455/0x21b0 > [ 3437.465656] lock_acquire+0xc6/0x2b0 > [ 3437.465822] mnt_want_write+0x46/0x1a0 > [ 3437.465996] filename_create+0x62/0x190 > [ 3437.466175] user_path_create+0x2d/0x50 > [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] > [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 > [ 3437.466791] do_syscall_64+0x42/0xf0 > [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > [ 3437.467180] > other info that might help us debug this: > > [ 3437.469670] 2 locks held by bcachefs/35533: > other info that might help us debug this: > > [ 3437.467507] Chain exists of: > sb_writers#10 --> &c->snapshot_create_lock --> 
&type->s_umount_key#48 > > [ 3437.467979] Possible unsafe locking scenario: > > [ 3437.468223] CPU0 CPU1 > [ 3437.468405] ---- ---- > [ 3437.468585] rlock(&type->s_umount_key#48); > [ 3437.468758] lock(&c->snapshot_create_lock); > [ 3437.469030] lock(&type->s_umount_key#48); > [ 3437.469291] rlock(sb_writers#10); > [ 3437.469434] > *** DEADLOCK *** > > [ 3437.469670] 2 locks held by bcachefs/35533: > [ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] > [ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] > [ 3437.470744] > stack backtrace: > [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ #85 > [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 > [ 3437.471694] Call Trace: > [ 3437.471795] <TASK> > [ 3437.471884] dump_stack_lvl+0x57/0x90 > [ 3437.472035] check_noncircular+0x132/0x150 > [ 3437.472202] __lock_acquire+0x1455/0x21b0 > [ 3437.472369] lock_acquire+0xc6/0x2b0 > [ 3437.472518] ? filename_create+0x62/0x190 > [ 3437.472683] ? lock_is_held_type+0x97/0x110 > [ 3437.472856] mnt_want_write+0x46/0x1a0 > [ 3437.473025] ? filename_create+0x62/0x190 > [ 3437.473204] filename_create+0x62/0x190 > [ 3437.473380] user_path_create+0x2d/0x50 > [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] > [ 3437.473819] ? lock_acquire+0xc6/0x2b0 > [ 3437.474002] ? __fget_files+0x2a/0x190 > [ 3437.474195] ? __fget_files+0xbc/0x190 > [ 3437.474380] ? lock_release+0xc5/0x270 > [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 > [ 3437.474764] ? 
__pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] > [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 > [ 3437.475277] do_syscall_64+0x42/0xf0 > [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > [ 3437.475691] RIP: 0033:0x7f2743c313af > ====================================================== > > In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally > and unlock it at the end of the function. There is a comment > "why do we need this lock?" about the lock coming from > commit 42d237320e98 ("bcachefs: Snapshot creation, deletion") > The reason is that __bch2_ioctl_subvolume_create() calls > sync_inodes_sb() which enforce locked s_umount to writeback all dirty > nodes before doing snapshot works. > > Fix it by read locking s_umount for snapshotting only and unlocking > s_umount after sync_inodes_sb(). > Heh, seems reasonable to me.. > Signed-off-by: Su Yue <glass.su@suse.com> > --- > fs/bcachefs/fs-ioctl.c | 11 +++++------ > 1 file changed, 5 insertions(+), 6 deletions(-) > > diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c > index e0a19a73c8e1..1346861ed944 100644 > --- a/fs/bcachefs/fs-ioctl.c > +++ b/fs/bcachefs/fs-ioctl.c > @@ -337,11 +337,12 @@ static long __bch2_ioctl_subvolume_create(struct bch_fs *c, struct file *filp, > if (arg.flags & BCH_SUBVOL_SNAPSHOT_RO) > create_flags |= BCH_CREATE_SNAPSHOT_RO; > > - /* why do we need this lock? */ > - down_read(&c->vfs_sb->s_umount); > - > - if (arg.flags & BCH_SUBVOL_SNAPSHOT_CREATE) > + if (arg.flags & BCH_SUBVOL_SNAPSHOT_CREATE) { > + /* sync_inodes_sb enforce s_umount is locked */ > + down_read(&c->vfs_sb->s_umount); > sync_inodes_sb(c->vfs_sb); > + up_read(&c->vfs_sb->s_umount); > + } > retry: > if (arg.src_ptr) { > error = user_path_at(arg.dirfd, > @@ -425,8 +426,6 @@ static long __bch2_ioctl_subvolume_create(struct bch_fs *c, struct file *filp, > goto retry; > } > err1: > - up_read(&c->vfs_sb->s_umount); > - Can we get rid of this label now that it serves no purpose? 
Brian > return error; > } > > -- > 2.43.0 > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] bcachefs: grab s_umount only if snapshotting 2024-01-16 13:47 ` Brian Foster @ 2024-01-16 14:10 ` Su Yue 2024-01-16 15:32 ` Brian Foster 0 siblings, 1 reply; 6+ messages in thread From: Su Yue @ 2024-01-16 14:10 UTC (permalink / raw) To: Brian Foster; +Cc: Su Yue, linux-bcachefs, l On Tue 16 Jan 2024 at 08:47, Brian Foster <bfoster@redhat.com> wrote: > On Mon, Jan 15, 2024 at 10:21:25AM +0800, Su Yue wrote: >> When I was testing mongodb over bcachefs with compression, >> there is a lockdep warning when snapshotting mongodb data >> volume. >> >> $ cat test.sh >> prog=bcachefs >> >> $prog subvolume create /mnt/data >> $prog subvolume create /mnt/data/snapshots >> >> while true;do >> $prog subvolume snapshot /mnt/data >> /mnt/data/snapshots/$(date +%s) >> sleep 1s >> done >> >> $ cat /etc/mongodb.conf >> systemLog: >> destination: file >> logAppend: true >> path: /mnt/data/mongod.log >> >> storage: >> dbPath: /mnt/data/ >> >> lockdep reports: >> [ 3437.452330] >> ====================================================== >> [ 3437.452750] WARNING: possible circular locking dependency >> detected >> [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E >> [ 3437.453562] >> ------------------------------------------------------ >> [ 3437.453981] bcachefs/35533 is trying to acquire lock: >> [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, >> at: filename_create+0x62/0x190 >> [ 3437.454875] >> but task is already holding lock: >> [ 3437.455268] ffffa0a02b2b10e0 >> (&type->s_umount_key#48){.+.+}-{3:3}, at: >> bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] >> [ 3437.456009] >> which lock already depends on the new lock. 
>> >> [ 3437.456553] >> the existing dependency chain (in reverse order) >> is: >> [ 3437.457054] >> -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: >> [ 3437.457507] down_read+0x3e/0x170 >> [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] >> [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 >> [ 3437.458498] do_syscall_64+0x42/0xf0 >> [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 >> [ 3437.459155] >> -> #2 (&c->snapshot_create_lock){++++}-{3:3}: >> [ 3437.459615] down_read+0x3e/0x170 >> [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] >> [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] >> [ 3437.460686] notify_change+0x1f1/0x4a0 >> [ 3437.461283] do_truncate+0x7f/0xd0 >> [ 3437.461555] path_openat+0xa57/0xce0 >> [ 3437.461836] do_filp_open+0xb4/0x160 >> [ 3437.462116] do_sys_openat2+0x91/0xc0 >> [ 3437.462402] __x64_sys_openat+0x53/0xa0 >> [ 3437.462701] do_syscall_64+0x42/0xf0 >> [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 >> [ 3437.463359] >> -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: >> [ 3437.463843] down_write+0x3b/0xc0 >> [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] >> [ 3437.464493] vfs_write+0x21b/0x4c0 >> [ 3437.464653] ksys_write+0x69/0xf0 >> [ 3437.464839] do_syscall_64+0x42/0xf0 >> [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 >> [ 3437.465231] >> -> #0 (sb_writers#10){.+.+}-{0:0}: >> [ 3437.465471] __lock_acquire+0x1455/0x21b0 >> [ 3437.465656] lock_acquire+0xc6/0x2b0 >> [ 3437.465822] mnt_want_write+0x46/0x1a0 >> [ 3437.465996] filename_create+0x62/0x190 >> [ 3437.466175] user_path_create+0x2d/0x50 >> [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] >> [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 >> [ 3437.466791] do_syscall_64+0x42/0xf0 >> [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 >> [ 3437.467180] >> other info that might help us debug this: >> >> [ 3437.469670] 2 locks held by bcachefs/35533: >> other info that might help us debug this: >> >> [ 3437.467507] Chain exists of: >> 
sb_writers#10 --> &c->snapshot_create_lock --> >> &type->s_umount_key#48 >> >> [ 3437.467979] Possible unsafe locking scenario: >> >> [ 3437.468223] CPU0 CPU1 >> [ 3437.468405] ---- ---- >> [ 3437.468585] rlock(&type->s_umount_key#48); >> [ 3437.468758] >> lock(&c->snapshot_create_lock); >> [ 3437.469030] >> lock(&type->s_umount_key#48); >> [ 3437.469291] rlock(sb_writers#10); >> [ 3437.469434] >> *** DEADLOCK *** >> >> [ 3437.469670] 2 locks held by bcachefs/35533: >> [ 3437.469838] #0: ffffa0a02ce00a88 >> (&c->snapshot_create_lock){++++}-{3:3}, at: >> bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] >> [ 3437.470294] #1: ffffa0a02b2b10e0 >> (&type->s_umount_key#48){.+.+}-{3:3}, at: >> bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] >> [ 3437.470744] >> stack backtrace: >> [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded >> Tainted: G E 6.7.0-rc7-custom+ #85 >> [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, >> 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 >> [ 3437.471694] Call Trace: >> [ 3437.471795] <TASK> >> [ 3437.471884] dump_stack_lvl+0x57/0x90 >> [ 3437.472035] check_noncircular+0x132/0x150 >> [ 3437.472202] __lock_acquire+0x1455/0x21b0 >> [ 3437.472369] lock_acquire+0xc6/0x2b0 >> [ 3437.472518] ? filename_create+0x62/0x190 >> [ 3437.472683] ? lock_is_held_type+0x97/0x110 >> [ 3437.472856] mnt_want_write+0x46/0x1a0 >> [ 3437.473025] ? filename_create+0x62/0x190 >> [ 3437.473204] filename_create+0x62/0x190 >> [ 3437.473380] user_path_create+0x2d/0x50 >> [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] >> [ 3437.473819] ? lock_acquire+0xc6/0x2b0 >> [ 3437.474002] ? __fget_files+0x2a/0x190 >> [ 3437.474195] ? __fget_files+0xbc/0x190 >> [ 3437.474380] ? lock_release+0xc5/0x270 >> [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 >> [ 3437.474764] ? 
__pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] >> [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 >> [ 3437.475277] do_syscall_64+0x42/0xf0 >> [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 >> [ 3437.475691] RIP: 0033:0x7f2743c313af >> ====================================================== >> >> In __bch2_ioctl_subvolume_create(), we grab s_umount >> unconditionally >> and unlock it at the end of the function. There is a comment >> "why do we need this lock?" about the lock coming from >> commit 42d237320e98 ("bcachefs: Snapshot creation, deletion") >> The reason is that __bch2_ioctl_subvolume_create() calls >> sync_inodes_sb() which enforce locked s_umount to writeback all >> dirty >> nodes before doing snapshot works. >> >> Fix it by read locking s_umount for snapshotting only and >> unlocking >> s_umount after sync_inodes_sb(). >> > > Heh, seems reasonable to me.. > >> Signed-off-by: Su Yue <glass.su@suse.com> >> --- >> fs/bcachefs/fs-ioctl.c | 11 +++++------ >> 1 file changed, 5 insertions(+), 6 deletions(-) >> >> diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c >> index e0a19a73c8e1..1346861ed944 100644 >> --- a/fs/bcachefs/fs-ioctl.c >> +++ b/fs/bcachefs/fs-ioctl.c >> @@ -337,11 +337,12 @@ static long >> __bch2_ioctl_subvolume_create(struct bch_fs *c, struct file >> *filp, >> if (arg.flags & BCH_SUBVOL_SNAPSHOT_RO) >> create_flags |= BCH_CREATE_SNAPSHOT_RO; >> >> - /* why do we need this lock? 
*/ >> - down_read(&c->vfs_sb->s_umount); >> - >> - if (arg.flags & BCH_SUBVOL_SNAPSHOT_CREATE) >> + if (arg.flags & BCH_SUBVOL_SNAPSHOT_CREATE) { >> + /* sync_inodes_sb enforce s_umount is locked */ >> + down_read(&c->vfs_sb->s_umount); >> sync_inodes_sb(c->vfs_sb); >> + up_read(&c->vfs_sb->s_umount); >> + } >> retry: >> if (arg.src_ptr) { >> error = user_path_at(arg.dirfd, >> @@ -425,8 +426,6 @@ static long >> __bch2_ioctl_subvolume_create(struct bch_fs *c, struct file >> *filp, >> goto retry; >> } >> err1: >> - up_read(&c->vfs_sb->s_umount); >> - > > Can we get rid of this label now that it serves no purpose? > I think we can't. Two gotos are still there(under the retry label) unless change them to return -- Su > Brian > >> return error; >> } >> >> -- >> 2.43.0 >> >> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] bcachefs: grab s_umount only if snapshotting 2024-01-16 14:10 ` Su Yue @ 2024-01-16 15:32 ` Brian Foster 2024-01-16 15:45 ` Glass Su 0 siblings, 1 reply; 6+ messages in thread From: Brian Foster @ 2024-01-16 15:32 UTC (permalink / raw) To: Su Yue; +Cc: Su Yue, linux-bcachefs On Tue, Jan 16, 2024 at 10:10:53PM +0800, Su Yue wrote: > > On Tue 16 Jan 2024 at 08:47, Brian Foster <bfoster@redhat.com> wrote: > > > On Mon, Jan 15, 2024 at 10:21:25AM +0800, Su Yue wrote: > > > When I was testing mongodb over bcachefs with compression, > > > there is a lockdep warning when snapshotting mongodb data volume. > > > > > > $ cat test.sh > > > prog=bcachefs > > > > > > $prog subvolume create /mnt/data > > > $prog subvolume create /mnt/data/snapshots > > > > > > while true;do > > > $prog subvolume snapshot /mnt/data > > > /mnt/data/snapshots/$(date +%s) > > > sleep 1s > > > done > > > > > > $ cat /etc/mongodb.conf > > > systemLog: > > > destination: file > > > logAppend: true > > > path: /mnt/data/mongod.log > > > > > > storage: > > > dbPath: /mnt/data/ > > > > > > lockdep reports: > > > [ 3437.452330] > > > ====================================================== > > > [ 3437.452750] WARNING: possible circular locking dependency > > > detected > > > [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E > > > [ 3437.453562] > > > ------------------------------------------------------ > > > [ 3437.453981] bcachefs/35533 is trying to acquire lock: > > > [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: > > > filename_create+0x62/0x190 > > > [ 3437.454875] > > > but task is already holding lock: > > > [ 3437.455268] ffffa0a02b2b10e0 > > > (&type->s_umount_key#48){.+.+}-{3:3}, at: > > > bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] > > > [ 3437.456009] > > > which lock already depends on the new lock. 
> > > > > > [ 3437.456553] > > > the existing dependency chain (in reverse order) > > > is: > > > [ 3437.457054] > > > -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: > > > [ 3437.457507] down_read+0x3e/0x170 > > > [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] > > > [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 > > > [ 3437.458498] do_syscall_64+0x42/0xf0 > > > [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > > > [ 3437.459155] > > > -> #2 (&c->snapshot_create_lock){++++}-{3:3}: > > > [ 3437.459615] down_read+0x3e/0x170 > > > [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] > > > [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] > > > [ 3437.460686] notify_change+0x1f1/0x4a0 > > > [ 3437.461283] do_truncate+0x7f/0xd0 > > > [ 3437.461555] path_openat+0xa57/0xce0 > > > [ 3437.461836] do_filp_open+0xb4/0x160 > > > [ 3437.462116] do_sys_openat2+0x91/0xc0 > > > [ 3437.462402] __x64_sys_openat+0x53/0xa0 > > > [ 3437.462701] do_syscall_64+0x42/0xf0 > > > [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > > > [ 3437.463359] > > > -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: > > > [ 3437.463843] down_write+0x3b/0xc0 > > > [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] > > > [ 3437.464493] vfs_write+0x21b/0x4c0 > > > [ 3437.464653] ksys_write+0x69/0xf0 > > > [ 3437.464839] do_syscall_64+0x42/0xf0 > > > [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > > > [ 3437.465231] > > > -> #0 (sb_writers#10){.+.+}-{0:0}: > > > [ 3437.465471] __lock_acquire+0x1455/0x21b0 > > > [ 3437.465656] lock_acquire+0xc6/0x2b0 > > > [ 3437.465822] mnt_want_write+0x46/0x1a0 > > > [ 3437.465996] filename_create+0x62/0x190 > > > [ 3437.466175] user_path_create+0x2d/0x50 > > > [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] > > > [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 > > > [ 3437.466791] do_syscall_64+0x42/0xf0 > > > [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > > > [ 3437.467180] > > > other info that might help us debug this: > > > > > 
> [ 3437.469670] 2 locks held by bcachefs/35533: > > > other info that might help us debug this: > > > > > > [ 3437.467507] Chain exists of: > > > sb_writers#10 --> &c->snapshot_create_lock --> > > > &type->s_umount_key#48 > > > > > > [ 3437.467979] Possible unsafe locking scenario: > > > > > > [ 3437.468223] CPU0 CPU1 > > > [ 3437.468405] ---- ---- > > > [ 3437.468585] rlock(&type->s_umount_key#48); > > > [ 3437.468758] lock(&c->snapshot_create_lock); > > > [ 3437.469030] lock(&type->s_umount_key#48); > > > [ 3437.469291] rlock(sb_writers#10); > > > [ 3437.469434] > > > *** DEADLOCK *** > > > > > > [ 3437.469670] 2 locks held by bcachefs/35533: > > > [ 3437.469838] #0: ffffa0a02ce00a88 > > > (&c->snapshot_create_lock){++++}-{3:3}, at: > > > bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] > > > [ 3437.470294] #1: ffffa0a02b2b10e0 > > > (&type->s_umount_key#48){.+.+}-{3:3}, at: > > > bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] > > > [ 3437.470744] > > > stack backtrace: > > > [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded > > > Tainted: G E 6.7.0-rc7-custom+ #85 > > > [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, > > > 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 > > > [ 3437.471694] Call Trace: > > > [ 3437.471795] <TASK> > > > [ 3437.471884] dump_stack_lvl+0x57/0x90 > > > [ 3437.472035] check_noncircular+0x132/0x150 > > > [ 3437.472202] __lock_acquire+0x1455/0x21b0 > > > [ 3437.472369] lock_acquire+0xc6/0x2b0 > > > [ 3437.472518] ? filename_create+0x62/0x190 > > > [ 3437.472683] ? lock_is_held_type+0x97/0x110 > > > [ 3437.472856] mnt_want_write+0x46/0x1a0 > > > [ 3437.473025] ? filename_create+0x62/0x190 > > > [ 3437.473204] filename_create+0x62/0x190 > > > [ 3437.473380] user_path_create+0x2d/0x50 > > > [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] > > > [ 3437.473819] ? lock_acquire+0xc6/0x2b0 > > > [ 3437.474002] ? __fget_files+0x2a/0x190 > > > [ 3437.474195] ? __fget_files+0xbc/0x190 > > > [ 3437.474380] ? 
lock_release+0xc5/0x270 > > > [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 > > > [ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] > > > [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 > > > [ 3437.475277] do_syscall_64+0x42/0xf0 > > > [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > > > [ 3437.475691] RIP: 0033:0x7f2743c313af > > > ====================================================== > > > > > > In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally > > > and unlock it at the end of the function. There is a comment > > > "why do we need this lock?" about the lock coming from > > > commit 42d237320e98 ("bcachefs: Snapshot creation, deletion") > > > The reason is that __bch2_ioctl_subvolume_create() calls > > > sync_inodes_sb() which enforce locked s_umount to writeback all > > > dirty > > > nodes before doing snapshot works. > > > > > > Fix it by read locking s_umount for snapshotting only and unlocking > > > s_umount after sync_inodes_sb(). > > > > > > > Heh, seems reasonable to me.. > > > > > Signed-off-by: Su Yue <glass.su@suse.com> > > > --- > > > fs/bcachefs/fs-ioctl.c | 11 +++++------ > > > 1 file changed, 5 insertions(+), 6 deletions(-) > > > > > > diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c > > > index e0a19a73c8e1..1346861ed944 100644 > > > --- a/fs/bcachefs/fs-ioctl.c > > > +++ b/fs/bcachefs/fs-ioctl.c > > > @@ -337,11 +337,12 @@ static long > > > __bch2_ioctl_subvolume_create(struct bch_fs *c, struct file *filp, > > > if (arg.flags & BCH_SUBVOL_SNAPSHOT_RO) > > > create_flags |= BCH_CREATE_SNAPSHOT_RO; > > > > > > - /* why do we need this lock? 
*/ > > > - down_read(&c->vfs_sb->s_umount); > > > - > > > - if (arg.flags & BCH_SUBVOL_SNAPSHOT_CREATE) > > > + if (arg.flags & BCH_SUBVOL_SNAPSHOT_CREATE) { > > > + /* sync_inodes_sb enforce s_umount is locked */ > > > + down_read(&c->vfs_sb->s_umount); > > > sync_inodes_sb(c->vfs_sb); > > > + up_read(&c->vfs_sb->s_umount); > > > + } > > > retry: > > > if (arg.src_ptr) { > > > error = user_path_at(arg.dirfd, > > > @@ -425,8 +426,6 @@ static long __bch2_ioctl_subvolume_create(struct > > > bch_fs *c, struct file *filp, > > > goto retry; > > > } > > > err1: > > > - up_read(&c->vfs_sb->s_umount); > > > - > > > > Can we get rid of this label now that it serves no purpose? > > > I think we can't. Two gotos are still there(under the retry label) > unless change them to return > Sorry.. yes, that's what I mean. Usually we can just return directly rather than goto for cases where there is no necessary clean up on exit. Brian > -- > Su > > > Brian > > > > > return error; > > > } > > > > > > -- > > > 2.43.0 > > > > > > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] bcachefs: grab s_umount only if snapshotting 2024-01-16 15:32 ` Brian Foster @ 2024-01-16 15:45 ` Glass Su 2024-01-16 17:17 ` Kent Overstreet 0 siblings, 1 reply; 6+ messages in thread From: Glass Su @ 2024-01-16 15:45 UTC (permalink / raw) To: Brian Foster; +Cc: Su Yue, Glass Su, linux-bcachefs@vger.kernel.org > On Jan 16, 2024, at 23:32, Brian Foster <bfoster@redhat.com> wrote: > > On Tue, Jan 16, 2024 at 10:10:53PM +0800, Su Yue wrote: >> >> On Tue 16 Jan 2024 at 08:47, Brian Foster <bfoster@redhat.com> wrote: >> >>> On Mon, Jan 15, 2024 at 10:21:25AM +0800, Su Yue wrote: >>>> When I was testing mongodb over bcachefs with compression, >>>> there is a lockdep warning when snapshotting mongodb data volume. >>>> >>>> $ cat test.sh >>>> prog=bcachefs >>>> >>>> $prog subvolume create /mnt/data >>>> $prog subvolume create /mnt/data/snapshots >>>> >>>> while true;do >>>> $prog subvolume snapshot /mnt/data >>>> /mnt/data/snapshots/$(date +%s) >>>> sleep 1s >>>> done >>>> >>>> $ cat /etc/mongodb.conf >>>> systemLog: >>>> destination: file >>>> logAppend: true >>>> path: /mnt/data/mongod.log >>>> >>>> storage: >>>> dbPath: /mnt/data/ >>>> >>>> lockdep reports: >>>> [ 3437.452330] >>>> ====================================================== >>>> [ 3437.452750] WARNING: possible circular locking dependency >>>> detected >>>> [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E >>>> [ 3437.453562] >>>> ------------------------------------------------------ >>>> [ 3437.453981] bcachefs/35533 is trying to acquire lock: >>>> [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: >>>> filename_create+0x62/0x190 >>>> [ 3437.454875] >>>> but task is already holding lock: >>>> [ 3437.455268] ffffa0a02b2b10e0 >>>> (&type->s_umount_key#48){.+.+}-{3:3}, at: >>>> bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] >>>> [ 3437.456009] >>>> which lock already depends on the new lock. 
>>>> >>>> [ 3437.456553] >>>> the existing dependency chain (in reverse order) >>>> is: >>>> [ 3437.457054] >>>> -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: >>>> [ 3437.457507] down_read+0x3e/0x170 >>>> [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] >>>> [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 >>>> [ 3437.458498] do_syscall_64+0x42/0xf0 >>>> [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 >>>> [ 3437.459155] >>>> -> #2 (&c->snapshot_create_lock){++++}-{3:3}: >>>> [ 3437.459615] down_read+0x3e/0x170 >>>> [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] >>>> [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] >>>> [ 3437.460686] notify_change+0x1f1/0x4a0 >>>> [ 3437.461283] do_truncate+0x7f/0xd0 >>>> [ 3437.461555] path_openat+0xa57/0xce0 >>>> [ 3437.461836] do_filp_open+0xb4/0x160 >>>> [ 3437.462116] do_sys_openat2+0x91/0xc0 >>>> [ 3437.462402] __x64_sys_openat+0x53/0xa0 >>>> [ 3437.462701] do_syscall_64+0x42/0xf0 >>>> [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 >>>> [ 3437.463359] >>>> -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: >>>> [ 3437.463843] down_write+0x3b/0xc0 >>>> [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] >>>> [ 3437.464493] vfs_write+0x21b/0x4c0 >>>> [ 3437.464653] ksys_write+0x69/0xf0 >>>> [ 3437.464839] do_syscall_64+0x42/0xf0 >>>> [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 >>>> [ 3437.465231] >>>> -> #0 (sb_writers#10){.+.+}-{0:0}: >>>> [ 3437.465471] __lock_acquire+0x1455/0x21b0 >>>> [ 3437.465656] lock_acquire+0xc6/0x2b0 >>>> [ 3437.465822] mnt_want_write+0x46/0x1a0 >>>> [ 3437.465996] filename_create+0x62/0x190 >>>> [ 3437.466175] user_path_create+0x2d/0x50 >>>> [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] >>>> [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 >>>> [ 3437.466791] do_syscall_64+0x42/0xf0 >>>> [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 >>>> [ 3437.467180] >>>> other info that might help us debug this: >>>> >>>>
[ 3437.467507] Chain exists of: >>>> sb_writers#10 --> &c->snapshot_create_lock --> >>>> &type->s_umount_key#48 >>>> >>>> [ 3437.467979] Possible unsafe locking scenario: >>>> >>>> [ 3437.468223] CPU0 CPU1 >>>> [ 3437.468405] ---- ---- >>>> [ 3437.468585] rlock(&type->s_umount_key#48); >>>> [ 3437.468758] lock(&c->snapshot_create_lock); >>>> [ 3437.469030] lock(&type->s_umount_key#48); >>>> [ 3437.469291] rlock(sb_writers#10); >>>> [ 3437.469434] >>>> *** DEADLOCK *** >>>> >>>> [ 3437.469670] 2 locks held by bcachefs/35533: >>>> [ 3437.469838] #0: ffffa0a02ce00a88 >>>> (&c->snapshot_create_lock){++++}-{3:3}, at: >>>> bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] >>>> [ 3437.470294] #1: ffffa0a02b2b10e0 >>>> (&type->s_umount_key#48){.+.+}-{3:3}, at: >>>> bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] >>>> [ 3437.470744] >>>> stack backtrace: >>>> [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded >>>> Tainted: G E 6.7.0-rc7-custom+ #85 >>>> [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, >>>> 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 >>>> [ 3437.471694] Call Trace: >>>> [ 3437.471795] <TASK> >>>> [ 3437.471884] dump_stack_lvl+0x57/0x90 >>>> [ 3437.472035] check_noncircular+0x132/0x150 >>>> [ 3437.472202] __lock_acquire+0x1455/0x21b0 >>>> [ 3437.472369] lock_acquire+0xc6/0x2b0 >>>> [ 3437.472518] ? filename_create+0x62/0x190 >>>> [ 3437.472683] ? lock_is_held_type+0x97/0x110 >>>> [ 3437.472856] mnt_want_write+0x46/0x1a0 >>>> [ 3437.473025] ? filename_create+0x62/0x190 >>>> [ 3437.473204] filename_create+0x62/0x190 >>>> [ 3437.473380] user_path_create+0x2d/0x50 >>>> [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] >>>> [ 3437.473819] ? lock_acquire+0xc6/0x2b0 >>>> [ 3437.474002] ? __fget_files+0x2a/0x190 >>>> [ 3437.474195] ? __fget_files+0xbc/0x190 >>>> [ 3437.474380] ? lock_release+0xc5/0x270 >>>> [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 >>>> [ 3437.474764] ? 
__pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] >>>> [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 >>>> [ 3437.475277] do_syscall_64+0x42/0xf0 >>>> [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 >>>> [ 3437.475691] RIP: 0033:0x7f2743c313af >>>> ====================================================== >>>> >>>> In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally >>>> and unlock it at the end of the function. There is a comment >>>> "why do we need this lock?" about the lock coming from >>>> commit 42d237320e98 ("bcachefs: Snapshot creation, deletion") >>>> The reason is that __bch2_ioctl_subvolume_create() calls >>>> sync_inodes_sb() which enforce locked s_umount to writeback all >>>> dirty >>>> nodes before doing snapshot works. >>>> >>>> Fix it by read locking s_umount for snapshotting only and unlocking >>>> s_umount after sync_inodes_sb(). >>>> >>> >>> Heh, seems reasonable to me.. >>> >>>> Signed-off-by: Su Yue <glass.su@suse.com> >>>> --- >>>> fs/bcachefs/fs-ioctl.c | 11 +++++------ >>>> 1 file changed, 5 insertions(+), 6 deletions(-) >>>> >>>> diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c >>>> index e0a19a73c8e1..1346861ed944 100644 >>>> --- a/fs/bcachefs/fs-ioctl.c >>>> +++ b/fs/bcachefs/fs-ioctl.c >>>> @@ -337,11 +337,12 @@ static long >>>> __bch2_ioctl_subvolume_create(struct bch_fs *c, struct file *filp, >>>> if (arg.flags & BCH_SUBVOL_SNAPSHOT_RO) >>>> create_flags |= BCH_CREATE_SNAPSHOT_RO; >>>> >>>> - /* why do we need this lock? 
*/ >>>> - down_read(&c->vfs_sb->s_umount); >>>> - >>>> - if (arg.flags & BCH_SUBVOL_SNAPSHOT_CREATE) >>>> + if (arg.flags & BCH_SUBVOL_SNAPSHOT_CREATE) { >>>> + /* sync_inodes_sb enforce s_umount is locked */ >>>> + down_read(&c->vfs_sb->s_umount); >>>> sync_inodes_sb(c->vfs_sb); >>>> + up_read(&c->vfs_sb->s_umount); >>>> + } >>>> retry: >>>> if (arg.src_ptr) { >>>> error = user_path_at(arg.dirfd, >>>> @@ -425,8 +426,6 @@ static long __bch2_ioctl_subvolume_create(struct >>>> bch_fs *c, struct file *filp, >>>> goto retry; >>>> } >>>> err1: >>>> - up_read(&c->vfs_sb->s_umount); >>>> - >>> >>> Can we get rid of this label now that it serves no purpose? >>> >> I think we can't. Two gotos are still there(under the retry label) >> unless change them to return >> > > Sorry.. yes, that's what I mean. Usually we can just return directly > rather than goto for cases where there is no necessary clean up on exit. > Of course… It's just a matter of personal taste and code style. Actually, I like neither jumping between labels nor returning inside a loop. — Su > Brian > >> -- >> Su > >>> Brian >>> >>>> return error; >>>> } >>>> >>>> -- >>>> 2.43.0
* Re: [PATCH] bcachefs: grab s_umount only if snapshotting 2024-01-16 15:45 ` Glass Su @ 2024-01-16 17:17 ` Kent Overstreet 0 siblings, 0 replies; 6+ messages in thread From: Kent Overstreet @ 2024-01-16 17:17 UTC (permalink / raw) To: Glass Su; +Cc: Brian Foster, Su Yue, linux-bcachefs@vger.kernel.org On Tue, Jan 16, 2024 at 03:45:29PM +0000, Glass Su wrote: > > > > On Jan 16, 2024, at 23:32, Brian Foster <bfoster@redhat.com> wrote: > > > > On Tue, Jan 16, 2024 at 10:10:53PM +0800, Su Yue wrote: > >> > >> On Tue 16 Jan 2024 at 08:47, Brian Foster <bfoster@redhat.com> wrote: > >> > >>> On Mon, Jan 15, 2024 at 10:21:25AM +0800, Su Yue wrote: > >>>> When I was testing mongodb over bcachefs with compression, > >>>> there is a lockdep warning when snapshotting mongodb data volume. > >>>> > >>>> $ cat test.sh > >>>> prog=bcachefs > >>>> > >>>> $prog subvolume create /mnt/data > >>>> $prog subvolume create /mnt/data/snapshots > >>>> > >>>> while true;do > >>>> $prog subvolume snapshot /mnt/data > >>>> /mnt/data/snapshots/$(date +%s) > >>>> sleep 1s > >>>> done > >>>> > >>>> $ cat /etc/mongodb.conf > >>>> systemLog: > >>>> destination: file > >>>> logAppend: true > >>>> path: /mnt/data/mongod.log > >>>> > >>>> storage: > >>>> dbPath: /mnt/data/ > >>>> > >>>> lockdep reports: > >>>> [ 3437.452330] > >>>> ====================================================== > >>>> [ 3437.452750] WARNING: possible circular locking dependency > >>>> detected > >>>> [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E > >>>> [ 3437.453562] > >>>> ------------------------------------------------------ > >>>> [ 3437.453981] bcachefs/35533 is trying to acquire lock: > >>>> [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: > >>>> filename_create+0x62/0x190 > >>>> [ 3437.454875] > >>>> but task is already holding lock: > >>>> [ 3437.455268] ffffa0a02b2b10e0 > >>>> (&type->s_umount_key#48){.+.+}-{3:3}, at: > >>>> bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] > >>>> [ 3437.456009] > >>>> 
which lock already depends on the new lock. > >>>> > >>>> [ 3437.456553] > >>>> the existing dependency chain (in reverse order) > >>>> is: > >>>> [ 3437.457054] > >>>> -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: > >>>> [ 3437.457507] down_read+0x3e/0x170 > >>>> [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] > >>>> [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 > >>>> [ 3437.458498] do_syscall_64+0x42/0xf0 > >>>> [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > >>>> [ 3437.459155] > >>>> -> #2 (&c->snapshot_create_lock){++++}-{3:3}: > >>>> [ 3437.459615] down_read+0x3e/0x170 > >>>> [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] > >>>> [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] > >>>> [ 3437.460686] notify_change+0x1f1/0x4a0 > >>>> [ 3437.461283] do_truncate+0x7f/0xd0 > >>>> [ 3437.461555] path_openat+0xa57/0xce0 > >>>> [ 3437.461836] do_filp_open+0xb4/0x160 > >>>> [ 3437.462116] do_sys_openat2+0x91/0xc0 > >>>> [ 3437.462402] __x64_sys_openat+0x53/0xa0 > >>>> [ 3437.462701] do_syscall_64+0x42/0xf0 > >>>> [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > >>>> [ 3437.463359] > >>>> -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: > >>>> [ 3437.463843] down_write+0x3b/0xc0 > >>>> [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] > >>>> [ 3437.464493] vfs_write+0x21b/0x4c0 > >>>> [ 3437.464653] ksys_write+0x69/0xf0 > >>>> [ 3437.464839] do_syscall_64+0x42/0xf0 > >>>> [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > >>>> [ 3437.465231] > >>>> -> #0 (sb_writers#10){.+.+}-{0:0}: > >>>> [ 3437.465471] __lock_acquire+0x1455/0x21b0 > >>>> [ 3437.465656] lock_acquire+0xc6/0x2b0 > >>>> [ 3437.465822] mnt_want_write+0x46/0x1a0 > >>>> [ 3437.465996] filename_create+0x62/0x190 > >>>> [ 3437.466175] user_path_create+0x2d/0x50 > >>>> [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] > >>>> [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 > >>>> [ 3437.466791] do_syscall_64+0x42/0xf0 > >>>> [ 3437.466957] 
entry_SYSCALL_64_after_hwframe+0x6e/0x76 > >>>> [ 3437.467180] > >>>> other info that might help us debug this: > >>>> > >>>> [ 3437.467507] Chain exists of: > >>>> sb_writers#10 --> &c->snapshot_create_lock --> > >>>> &type->s_umount_key#48 > >>>> > >>>> [ 3437.467979] Possible unsafe locking scenario: > >>>> > >>>> [ 3437.468223] CPU0 CPU1 > >>>> [ 3437.468405] ---- ---- > >>>> [ 3437.468585] rlock(&type->s_umount_key#48); > >>>> [ 3437.468758] lock(&c->snapshot_create_lock); > >>>> [ 3437.469030] lock(&type->s_umount_key#48); > >>>> [ 3437.469291] rlock(sb_writers#10); > >>>> [ 3437.469434] > >>>> *** DEADLOCK *** > >>>> > >>>> [ 3437.469670] 2 locks held by bcachefs/35533: > >>>> [ 3437.469838] #0: ffffa0a02ce00a88 > >>>> (&c->snapshot_create_lock){++++}-{3:3}, at: > >>>> bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] > >>>> [ 3437.470294] #1: ffffa0a02b2b10e0 > >>>> (&type->s_umount_key#48){.+.+}-{3:3}, at: > >>>> bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] > >>>> [ 3437.470744] > >>>> stack backtrace: > >>>> [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded > >>>> Tainted: G E 6.7.0-rc7-custom+ #85 > >>>> [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, > >>>> 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 > >>>> [ 3437.471694] Call Trace: > >>>> [ 3437.471795] <TASK> > >>>> [ 3437.471884] dump_stack_lvl+0x57/0x90 > >>>> [ 3437.472035] check_noncircular+0x132/0x150 > >>>> [ 3437.472202] __lock_acquire+0x1455/0x21b0 > >>>> [ 3437.472369] lock_acquire+0xc6/0x2b0 > >>>> [ 3437.472518] ? filename_create+0x62/0x190 > >>>> [ 3437.472683] ? lock_is_held_type+0x97/0x110 > >>>> [ 3437.472856] mnt_want_write+0x46/0x1a0 > >>>> [ 3437.473025] ? 
filename_create+0x62/0x190 > >>>> [ 3437.473204] filename_create+0x62/0x190 > >>>> [ 3437.473380] user_path_create+0x2d/0x50 > >>>> [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] > >>>> [ 3437.473819] ? lock_acquire+0xc6/0x2b0 > >>>> [ 3437.474002] ? __fget_files+0x2a/0x190 > >>>> [ 3437.474195] ? __fget_files+0xbc/0x190 > >>>> [ 3437.474380] ? lock_release+0xc5/0x270 > >>>> [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 > >>>> [ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] > >>>> [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 > >>>> [ 3437.475277] do_syscall_64+0x42/0xf0 > >>>> [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > >>>> [ 3437.475691] RIP: 0033:0x7f2743c313af > >>>> ====================================================== > >>>> > >>>> In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally > >>>> and unlock it at the end of the function. There is a comment > >>>> "why do we need this lock?" about the lock coming from > >>>> commit 42d237320e98 ("bcachefs: Snapshot creation, deletion") > >>>> The reason is that __bch2_ioctl_subvolume_create() calls > >>>> sync_inodes_sb() which enforce locked s_umount to writeback all > >>>> dirty > >>>> nodes before doing snapshot works. > >>>> > >>>> Fix it by read locking s_umount for snapshotting only and unlocking > >>>> s_umount after sync_inodes_sb(). > >>>> > >>> > >>> Heh, seems reasonable to me.. 
> >>> > >>>> Signed-off-by: Su Yue <glass.su@suse.com> > >>>> --- > >>>> fs/bcachefs/fs-ioctl.c | 11 +++++------ > >>>> 1 file changed, 5 insertions(+), 6 deletions(-) > >>>> > >>>> diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c > >>>> index e0a19a73c8e1..1346861ed944 100644 > >>>> --- a/fs/bcachefs/fs-ioctl.c > >>>> +++ b/fs/bcachefs/fs-ioctl.c > >>>> @@ -337,11 +337,12 @@ static long > >>>> __bch2_ioctl_subvolume_create(struct bch_fs *c, struct file *filp, > >>>> if (arg.flags & BCH_SUBVOL_SNAPSHOT_RO) > >>>> create_flags |= BCH_CREATE_SNAPSHOT_RO; > >>>> > >>>> - /* why do we need this lock? */ > >>>> - down_read(&c->vfs_sb->s_umount); > >>>> - > >>>> - if (arg.flags & BCH_SUBVOL_SNAPSHOT_CREATE) > >>>> + if (arg.flags & BCH_SUBVOL_SNAPSHOT_CREATE) { > >>>> + /* sync_inodes_sb enforce s_umount is locked */ > >>>> + down_read(&c->vfs_sb->s_umount); > >>>> sync_inodes_sb(c->vfs_sb); > >>>> + up_read(&c->vfs_sb->s_umount); > >>>> + } > >>>> retry: > >>>> if (arg.src_ptr) { > >>>> error = user_path_at(arg.dirfd, > >>>> @@ -425,8 +426,6 @@ static long __bch2_ioctl_subvolume_create(struct > >>>> bch_fs *c, struct file *filp, > >>>> goto retry; > >>>> } > >>>> err1: > >>>> - up_read(&c->vfs_sb->s_umount); > >>>> - > >>> > >>> Can we get rid of this label now that it serves no purpose? > >>> > >> I think we can't. Two gotos are still there(under the retry label) > >> unless change them to return > >> > > > > Sorry.. yes, that's what I mean. Usually we can just return directly > > rather than goto for cases where there is no necessary clean up on exit. > > > Of course… It’s just about personal flavor and code style. > Actually I don’t like neither of jumping between labels or returning in loop. I've got no real opinion on the matter (except that I can't wait for Rust so we don't have to be bothered), so applying this :) ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-01-16 17:17 UTC | newest] Thread overview: 6+ messages -- 2024-01-15 2:21 [PATCH] bcachefs: grab s_umount only if snapshotting Su Yue 2024-01-16 13:47 ` Brian Foster 2024-01-16 14:10 ` Su Yue 2024-01-16 15:32 ` Brian Foster 2024-01-16 15:45 ` Glass Su 2024-01-16 17:17 ` Kent Overstreet