From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: ZhengYuan Huang <gality369@gmail.com>, dsterba@suse.com, clm@fb.com
Cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org,
baijiaju1990@gmail.com, r33s3n6@gmail.com, zzzccc427@gmail.com
Subject: Re: [PATCH] btrfs: lock balance status ioctls against shutdown
Date: Tue, 12 May 2026 20:10:58 +0930 [thread overview]
Message-ID: <78e451d1-cc71-4e30-9ee8-903439f584cd@gmx.com> (raw)
In-Reply-To: <20260512065342.537219-1-gality369@gmail.com>
The word "shutdown" can be very confusing, as we also have a shutdown
ioctl and a super block shutdown operation callback.
If you do not mean the emergency shutdown ioctl, please change it to
"unmount" or something similar.
On 2026/5/12 16:23, ZhengYuan Huang wrote:
> [BUG]
> A KASAN slab-use-after-free was reported while querying balance
> progress:
>
> BUG: KASAN: slab-use-after-free in owner_on_cpu include/linux/sched.h:2282 [inline]
> BUG: KASAN: slab-use-after-free in mutex_can_spin_on_owner+0x1cf/0x1f0 kernel/locking/mutex.c:397
> Read of size 4 at addr ffff888012700034 by task syz.0.130/433
>
> Call Trace:
> ...
> owner_on_cpu include/linux/sched.h:2282 [inline]
> mutex_can_spin_on_owner+0x1cf/0x1f0 kernel/locking/mutex.c:397
> mutex_optimistic_spin kernel/locking/mutex.c:440 [inline]
> __mutex_lock_common kernel/locking/mutex.c:602 [inline]
> __mutex_lock+0x2e8/0x1d80 kernel/locking/mutex.c:760
> mutex_lock_nested+0x1b/0x30 kernel/locking/mutex.c:812
> btrfs_ioctl_balance_progress fs/btrfs/ioctl.c:3620 [inline]
> btrfs_ioctl+0x3f20/0x5b90 fs/btrfs/ioctl.c:5317
> vfs_ioctl fs/ioctl.c:51 [inline]
> __do_sys_ioctl fs/ioctl.c:597 [inline]
> __se_sys_ioctl fs/ioctl.c:583 [inline]
> __x64_sys_ioctl+0x197/0x1e0 fs/ioctl.c:583
> ...
>
> Allocated by task 291:
> ...
> slab_post_alloc_hook mm/slub.c:4978 [inline]
> slab_alloc_node mm/slub.c:5288 [inline]
> kmem_cache_alloc_node_noprof+0x1f9/0x7c0 mm/slub.c:5340
> alloc_task_struct_node kernel/fork.c:184 [inline]
> dup_task_struct kernel/fork.c:873 [inline]
> copy_process+0x3d3/0x6e30 kernel/fork.c:2012
> kernel_clone+0xe7/0x880 kernel/fork.c:2609
> __do_sys_clone+0xf5/0x150 kernel/fork.c:2750
> __se_sys_clone kernel/fork.c:2734 [inline]
> __x64_sys_clone+0xc3/0x160 kernel/fork.c:2734
> ...
>
> Freed by task 24:
> ...
> slab_free_hook mm/slub.c:2543 [inline]
> slab_free mm/slub.c:6642 [inline]
> kmem_cache_free+0x384/0x7a0 mm/slub.c:6752
> free_task_struct kernel/fork.c:189 [inline]
> free_task+0x106/0x160 kernel/fork.c:512
> __put_task_struct+0x209/0x3b0 kernel/fork.c:748
> __put_task_struct_rcu_cb+0x1e/0x30 kernel/fork.c:756
> rcu_do_batch+0x397/0xe40 kernel/rcu/tree.c:2605
> rcu_core+0x669/0xa10 kernel/rcu/tree.c:2861
> rcu_core_si+0xe/0x20 kernel/rcu/tree.c:2878
> handle_softirqs+0x1d6/0x840 kernel/softirq.c:622
> run_ksoftirqd kernel/softirq.c:1063 [inline]
> run_ksoftirqd+0x3e/0x80 kernel/softirq.c:1055
> smpboot_thread_fn+0x3a7/0x950 kernel/smpboot.c:160
> kthread+0x3f0/0x850 kernel/kthread.c:463
> ret_from_fork+0x50f/0x610 arch/x86/kernel/process.c:158
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>
> Last potentially related work creation:
> ...
> __call_rcu_common+0xcc/0x11b0 kernel/rcu/tree.c:3123
> call_rcu+0x15/0x30 kernel/rcu/tree.c:3243
> put_task_struct include/linux/sched/task.h:159 [inline]
> put_task_struct include/linux/sched/task.h:128 [inline]
> delayed_put_task_struct+0xbb/0x220 kernel/exit.c:231
> rcu_do_batch+0x397/0xe40 kernel/rcu/tree.c:2605
> rcu_core+0x669/0xa10 kernel/rcu/tree.c:2861
> rcu_core_si+0xe/0x20 kernel/rcu/tree.c:2878
> handle_softirqs+0x1d6/0x840 kernel/softirq.c:622
> __do_softirq kernel/softirq.c:656 [inline]
> invoke_softirq kernel/softirq.c:496 [inline]
> __irq_exit_rcu+0x1b0/0x200 kernel/softirq.c:723
> irq_exit_rcu+0xe/0x20 kernel/softirq.c:739
> instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1052 [inline]
> sysvec_apic_timer_interrupt+0x91/0xd0 arch/x86/kernel/apic/apic.c:1052
> asm_sysvec_apic_timer_interrupt+0x1b/0x20 arch/x86/include/asm/idtentry.h:569
>
> Second to last potentially related work creation:
> ...
> __call_rcu_common+0xcc/0x11b0 kernel/rcu/tree.c:3123
> call_rcu+0x15/0x30 kernel/rcu/tree.c:3243
> put_task_struct_rcu_user kernel/exit.c:237 [inline]
> put_task_struct_rcu_user+0x61/0xd0 kernel/exit.c:234
> release_task+0x108a/0x1ba0 kernel/exit.c:308
> wait_task_zombie kernel/exit.c:1269 [inline]
> wait_consider_task+0x1501/0x39e0 kernel/exit.c:1496
> do_wait_thread kernel/exit.c:1559 [inline]
> __do_wait+0x1fb/0x810 kernel/exit.c:1677
> do_wait+0x1da/0x4b0 kernel/exit.c:1711
> kernel_wait4+0x14e/0x270 kernel/exit.c:1870
> __do_sys_wait4+0x14e/0x160 kernel/exit.c:1898
> __se_sys_wait4 kernel/exit.c:1894 [inline]
> __x64_sys_wait4+0x9b/0x110 kernel/exit.c:1894
> x64_sys_call+0x16f0/0x26a0 arch/x86/include/generated/asm/syscalls_64.h:62
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0x93/0xf80 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> The buggy address belongs to the object at ffff888012700000
> which belongs to the cache task_struct of size 13320
> The buggy address is located 52 bytes inside of
> freed 13320-byte region [ffff888012700000, ffff888012703408)
>
> [CAUSE]
> BTRFS_IOC_BALANCE_PROGRESS and BTRFS_IOC_BALANCE_CTL only take fs_info
> and then touch fs_info->balance_mutex and balance state directly. They
> do not pin the superblock against shutdown and they do not reject a
> dying superblock before taking balance_mutex.
>
> During unmount, btrfs_put_super() runs close_ctree(), which pauses
> balance and tears down filesystem state, and btrfs_kill_super() later
> frees fs_info. If a balance status/control ioctl races with that
> teardown, it can enter mutex locking on a stale balance_mutex. The
> optimistic mutex spin path only tolerates speculative owner reads while
> the mutex object itself remains valid, so a stale mutex can surface as
> the observed task_struct UAF in owner_on_cpu().
>
> [FIX]
> Take s_umount in read mode around BALANCE_CTL and BALANCE_PROGRESS, and
> bail out once the superblock is already dying. This gives the read-only
> balance status ioctls a shutdown barrier without changing their
> semantics to require a writable mount.
Why not just follow btrfs_ioctl_balance() and take mnt_want_write_file()?
Balance is always a read-write operation; there is no such thing as a
read-only balance.
Thus I think it's completely fine to call mnt_want_write_file() even for
btrfs_ioctl_balance_progress() and btrfs_ioctl_balance_ctl(), passing
the struct file into those handlers like btrfs_ioctl_balance() does.
And this avoids the unnecessary low-level access to s_umount.
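For reference, a rough and untested sketch of what I mean for the
progress ioctl. This assumes the handler's signature is changed to take
the struct file (as btrfs_ioctl_balance() already does) so that
mnt_want_write_file() can be used; the balance_ctl copy-out body stays
as it is today:

```c
/*
 * Untested sketch: guard the progress ioctl the same way
 * btrfs_ioctl_balance() does, instead of touching s_umount.
 * Assumes the caller in btrfs_ioctl() is updated to pass @file.
 */
static long btrfs_ioctl_balance_progress(struct file *file,
					 void __user *arg)
{
	struct btrfs_fs_info *fs_info = btrfs_sb(file_inode(file)->i_sb);
	int ret;

	if (!capable(CAP_SYS_ADMIN))
		return -EPERM;

	/*
	 * Pins the mount for write access; fails (e.g. -EROFS) on a
	 * read-only mount, matching the behavior of the balance ioctl.
	 */
	ret = mnt_want_write_file(file);
	if (ret)
		return ret;

	mutex_lock(&fs_info->balance_mutex);
	/* ... existing balance_ctl check and copy-out logic unchanged ... */
	mutex_unlock(&fs_info->balance_mutex);

	mnt_drop_write_file(file);
	return ret;
}
```

btrfs_ioctl_balance_ctl() would get the same mnt_want_write_file() /
mnt_drop_write_file() pairing around its switch statement.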
>
> Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
> ---
> 1 file changed, 32 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index a39460bf68a7..a13fe50f2441 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3469,19 +3469,43 @@ static long btrfs_ioctl_balance(struct file *file, void __user *arg)
> return ret;
> }
>
> +static int btrfs_ioctl_lock_live_super(struct btrfs_fs_info *fs_info)
> +{
> + struct super_block *sb = fs_info->sb;
> +
> + down_read(&sb->s_umount);
> + if (!(sb->s_flags & SB_BORN) || (sb->s_flags & SB_DYING) || !sb->s_root) {
> + up_read(&sb->s_umount);
> + return -EIO;
> + }
> +
> + return 0;
> +}
> +
> static long btrfs_ioctl_balance_ctl(struct btrfs_fs_info *fs_info, int cmd)
> {
> + int ret;
> +
> if (!capable(CAP_SYS_ADMIN))
> return -EPERM;
>
> + ret = btrfs_ioctl_lock_live_super(fs_info);
> + if (ret)
> + return ret;
> +
> switch (cmd) {
> case BTRFS_BALANCE_CTL_PAUSE:
> - return btrfs_pause_balance(fs_info);
> + ret = btrfs_pause_balance(fs_info);
> + break;
> case BTRFS_BALANCE_CTL_CANCEL:
> - return btrfs_cancel_balance(fs_info);
> + ret = btrfs_cancel_balance(fs_info);
> + break;
> + default:
> + ret = -EINVAL;
> }
>
> - return -EINVAL;
> + up_read(&fs_info->sb->s_umount);
> + return ret;
> }
>
> static long btrfs_ioctl_balance_progress(struct btrfs_fs_info *fs_info,
> @@ -3493,6 +3517,10 @@ static long btrfs_ioctl_balance_progress(struct btrfs_fs_info *fs_info,
> if (!capable(CAP_SYS_ADMIN))
> return -EPERM;
>
> + ret = btrfs_ioctl_lock_live_super(fs_info);
> + if (ret)
> + return ret;
> +
> mutex_lock(&fs_info->balance_mutex);
> if (!fs_info->balance_ctl) {
> ret = -ENOTCONN;
> @@ -3511,6 +3539,7 @@ static long btrfs_ioctl_balance_progress(struct btrfs_fs_info *fs_info,
> ret = -EFAULT;
> out:
> mutex_unlock(&fs_info->balance_mutex);
> + up_read(&fs_info->sb->s_umount);
> return ret;
> }
>
Thread overview: 4+ messages
2026-05-12 6:53 [PATCH] btrfs: lock balance status ioctls against shutdown ZhengYuan Huang
2026-05-12 10:40 ` Qu Wenruo [this message]
2026-05-12 13:28 ` David Sterba
2026-05-12 13:35 ` David Sterba