public inbox for linux-bcachefs@vger.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Su Yue <glass.su@suse.com>
Cc: linux-bcachefs@vger.kernel.org, l@damenly.org
Subject: Re: [PATCH] bcachefs: grab s_umount only if snapshotting
Date: Tue, 16 Jan 2024 08:47:49 -0500	[thread overview]
Message-ID: <ZaaJBUlutcDw7s8D@bfoster> (raw)
In-Reply-To: <20240115022125.148669-1-glass.su@suse.com>

On Mon, Jan 15, 2024 at 10:21:25AM +0800, Su Yue wrote:
> When I was testing mongodb over bcachefs with compression,
> there is a lockdep warning when snapshotting mongodb data volume.
> 
> $ cat test.sh
> prog=bcachefs
> 
> $prog subvolume create /mnt/data
> $prog subvolume create /mnt/data/snapshots
> 
> while true;do
>     $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s)
>     sleep 1s
> done
> 
> $ cat /etc/mongodb.conf
> systemLog:
>   destination: file
>   logAppend: true
>   path: /mnt/data/mongod.log
> 
> storage:
>   dbPath: /mnt/data/
> 
> lockdep reports:
> [ 3437.452330] ======================================================
> [ 3437.452750] WARNING: possible circular locking dependency detected
> [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G            E
> [ 3437.453562] ------------------------------------------------------
> [ 3437.453981] bcachefs/35533 is trying to acquire lock:
> [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190
> [ 3437.454875]
>                but task is already holding lock:
> [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
> [ 3437.456009]
>                which lock already depends on the new lock.
> 
> [ 3437.456553]
>                the existing dependency chain (in reverse order) is:
> [ 3437.457054]
>                -> #3 (&type->s_umount_key#48){.+.+}-{3:3}:
> [ 3437.457507]        down_read+0x3e/0x170
> [ 3437.457772]        bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
> [ 3437.458206]        __x64_sys_ioctl+0x93/0xd0
> [ 3437.458498]        do_syscall_64+0x42/0xf0
> [ 3437.458779]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
> [ 3437.459155]
>                -> #2 (&c->snapshot_create_lock){++++}-{3:3}:
> [ 3437.459615]        down_read+0x3e/0x170
> [ 3437.459878]        bch2_truncate+0x82/0x110 [bcachefs]
> [ 3437.460276]        bchfs_truncate+0x254/0x3c0 [bcachefs]
> [ 3437.460686]        notify_change+0x1f1/0x4a0
> [ 3437.461283]        do_truncate+0x7f/0xd0
> [ 3437.461555]        path_openat+0xa57/0xce0
> [ 3437.461836]        do_filp_open+0xb4/0x160
> [ 3437.462116]        do_sys_openat2+0x91/0xc0
> [ 3437.462402]        __x64_sys_openat+0x53/0xa0
> [ 3437.462701]        do_syscall_64+0x42/0xf0
> [ 3437.462982]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
> [ 3437.463359]
>                -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}:
> [ 3437.463843]        down_write+0x3b/0xc0
> [ 3437.464223]        bch2_write_iter+0x5b/0xcc0 [bcachefs]
> [ 3437.464493]        vfs_write+0x21b/0x4c0
> [ 3437.464653]        ksys_write+0x69/0xf0
> [ 3437.464839]        do_syscall_64+0x42/0xf0
> [ 3437.465009]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
> [ 3437.465231]
>                -> #0 (sb_writers#10){.+.+}-{0:0}:
> [ 3437.465471]        __lock_acquire+0x1455/0x21b0
> [ 3437.465656]        lock_acquire+0xc6/0x2b0
> [ 3437.465822]        mnt_want_write+0x46/0x1a0
> [ 3437.465996]        filename_create+0x62/0x190
> [ 3437.466175]        user_path_create+0x2d/0x50
> [ 3437.466352]        bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
> [ 3437.466617]        __x64_sys_ioctl+0x93/0xd0
> [ 3437.466791]        do_syscall_64+0x42/0xf0
> [ 3437.466957]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
> [ 3437.467180]
>                other info that might help us debug this:
> 
> [ 3437.469670] 2 locks held by bcachefs/35533:
>                other info that might help us debug this:
> 
> [ 3437.467507] Chain exists of:
>                  sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48
> 
> [ 3437.467979]  Possible unsafe locking scenario:
> 
> [ 3437.468223]        CPU0                    CPU1
> [ 3437.468405]        ----                    ----
> [ 3437.468585]   rlock(&type->s_umount_key#48);
> [ 3437.468758]                                lock(&c->snapshot_create_lock);
> [ 3437.469030]                                lock(&type->s_umount_key#48);
> [ 3437.469291]   rlock(sb_writers#10);
> [ 3437.469434]
>                 *** DEADLOCK ***
> 
> [ 3437.469670] 2 locks held by bcachefs/35533:
> [ 3437.469838]  #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs]
> [ 3437.470294]  #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
> [ 3437.470744]
>                stack backtrace:
> [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G            E      6.7.0-rc7-custom+ #85
> [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
> [ 3437.471694] Call Trace:
> [ 3437.471795]  <TASK>
> [ 3437.471884]  dump_stack_lvl+0x57/0x90
> [ 3437.472035]  check_noncircular+0x132/0x150
> [ 3437.472202]  __lock_acquire+0x1455/0x21b0
> [ 3437.472369]  lock_acquire+0xc6/0x2b0
> [ 3437.472518]  ? filename_create+0x62/0x190
> [ 3437.472683]  ? lock_is_held_type+0x97/0x110
> [ 3437.472856]  mnt_want_write+0x46/0x1a0
> [ 3437.473025]  ? filename_create+0x62/0x190
> [ 3437.473204]  filename_create+0x62/0x190
> [ 3437.473380]  user_path_create+0x2d/0x50
> [ 3437.473555]  bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
> [ 3437.473819]  ? lock_acquire+0xc6/0x2b0
> [ 3437.474002]  ? __fget_files+0x2a/0x190
> [ 3437.474195]  ? __fget_files+0xbc/0x190
> [ 3437.474380]  ? lock_release+0xc5/0x270
> [ 3437.474567]  ? __x64_sys_ioctl+0x93/0xd0
> [ 3437.474764]  ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs]
> [ 3437.475090]  __x64_sys_ioctl+0x93/0xd0
> [ 3437.475277]  do_syscall_64+0x42/0xf0
> [ 3437.475454]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
> [ 3437.475691] RIP: 0033:0x7f2743c313af
> ======================================================
> 
> In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally
> and unlock it at the end of the function. There is a comment
> "why do we need this lock?" about the lock coming from
> commit 42d237320e98 ("bcachefs: Snapshot creation, deletion")
> The reason is that __bch2_ioctl_subvolume_create() calls
> sync_inodes_sb() which enforce locked s_umount to writeback all dirty
> nodes before doing snapshot works.
> 
> Fix it by read locking s_umount for snapshotting only and unlocking
> s_umount after sync_inodes_sb().
> 

Heh, seems reasonable to me..

> Signed-off-by: Su Yue <glass.su@suse.com>
> ---
>  fs/bcachefs/fs-ioctl.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c
> index e0a19a73c8e1..1346861ed944 100644
> --- a/fs/bcachefs/fs-ioctl.c
> +++ b/fs/bcachefs/fs-ioctl.c
> @@ -337,11 +337,12 @@ static long __bch2_ioctl_subvolume_create(struct bch_fs *c, struct file *filp,
>  	if (arg.flags & BCH_SUBVOL_SNAPSHOT_RO)
>  		create_flags |= BCH_CREATE_SNAPSHOT_RO;
>  
> -	/* why do we need this lock? */
> -	down_read(&c->vfs_sb->s_umount);
> -
> -	if (arg.flags & BCH_SUBVOL_SNAPSHOT_CREATE)
> +	if (arg.flags & BCH_SUBVOL_SNAPSHOT_CREATE) {
> +		/* sync_inodes_sb enforce s_umount is locked */
> +		down_read(&c->vfs_sb->s_umount);
>  		sync_inodes_sb(c->vfs_sb);
> +		up_read(&c->vfs_sb->s_umount);
> +	}
>  retry:
>  	if (arg.src_ptr) {
>  		error = user_path_at(arg.dirfd,
> @@ -425,8 +426,6 @@ static long __bch2_ioctl_subvolume_create(struct bch_fs *c, struct file *filp,
>  		goto retry;
>  	}
>  err1:
> -	up_read(&c->vfs_sb->s_umount);
> -

Can we get rid of this label now that it serves no purpose?

Brian

>  	return error;
>  }
>  
> -- 
> 2.43.0
> 
> 


  reply	other threads:[~2024-01-16 13:46 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-15  2:21 [PATCH] bcachefs: grab s_umount only if snapshotting Su Yue
2024-01-16 13:47 ` Brian Foster [this message]
2024-01-16 14:10   ` Su Yue
2024-01-16 15:32     ` Brian Foster
2024-01-16 15:45       ` Glass Su
2024-01-16 17:17         ` Kent Overstreet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZaaJBUlutcDw7s8D@bfoster \
    --to=bfoster@redhat.com \
    --cc=glass.su@suse.com \
    --cc=l@damenly.org \
    --cc=linux-bcachefs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox