All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	kvm@vger.kernel.org, virtualization@lists.linux.dev,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: Re: [PATCH 1/3] vhost_task: KVM: Don't wake KVM x86's recovery thread if vhost task was killed
Date: Tue, 26 Aug 2025 03:52:59 -0400	[thread overview]
Message-ID: <20250826034937-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20250826004012.3835150-2-seanjc@google.com>

On Mon, Aug 25, 2025 at 05:40:09PM -0700, Sean Christopherson wrote:
> Add a vhost_task_wake_safe() variant to handle the case where a vhost task
> has exited due to a signal, i.e. before being explicitly stopped by the
> owner of the task, and use the "safe" API in KVM when waking NX hugepage
> recovery tasks.  This fixes a bug where KVM will attempt to wake a task
> that has exited, which ultimately results in all manner of badness, e.g.
> 
>   Oops: general protection fault, probably for non-canonical address 0xff0e899fa1566052: 0000 [#1] SMP
>   CPU: 51 UID: 0 PID: 53807 Comm: tee Tainted: G S         O        6.17.0-smp--38183c31756a-next #826 NONE
>   Tainted: [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
>   Hardware name: Google LLC Indus/Indus_QC_03, BIOS 30.110.0 09/13/2024
>   RIP: 0010:queued_spin_lock_slowpath+0x123/0x250
>   Code: ... <48> 89 8c 02 c0 da 47 a2 83 79 08 00 75 08 f3 90 83 79 08 00 74 f8
>   RSP: 0018:ffffbf55cffe7cf8 EFLAGS: 00010006
>   RAX: ff0e899fff0e8562 RBX: 0000000000d00000 RCX: ffffa39b40aefac0
>   RDX: 0000000000000030 RSI: fffffffffffffff8 RDI: ffffa39d0592e68c
>   RBP: 0000000000d00000 R08: 00000000ffffff80 R09: 0000000400000000
>   R10: ffffa36cce4fe401 R11: 0000000000000800 R12: 0000000000000003
>   R13: 0000000000000000 R14: ffffa39d0592e68c R15: ffffa39b9e672000
>   FS:  00007f233b2e9740(0000) GS:ffffa39b9e672000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007f233b39fda0 CR3: 00000004d031f002 CR4: 00000000007726f0
>   PKRU: 55555554
>   Call Trace:
>    <TASK>
>    _raw_spin_lock_irqsave+0x50/0x60
>    try_to_wake_up+0x4f/0x5d0
>    set_nx_huge_pages+0xe4/0x1c0 [kvm]
>    param_attr_store+0x89/0xf0
>    module_attr_store+0x1e/0x30
>    kernfs_fop_write_iter+0xe4/0x160
>    vfs_write+0x2cb/0x420
>    ksys_write+0x7f/0xf0
>    do_syscall_64+0x6f/0x1f0
>    entry_SYSCALL_64_after_hwframe+0x4b/0x53
>   RIP: 0033:0x7f233b4178b3
>   R13: 0000000000000002 R14: 00000000226ff3d0 R15: 0000000000000002
>    </TASK>
> 
> Provide an API in vhost task instead of forcing KVM to solve the problem,
> as KVM would literally just add an equivalent to VHOST_TASK_FLAGS_KILLED,
> along with a new lock to protect said flag.  In general, forcing simple
> usage of vhost task to care about signals _and_ take non-trivial action to
> do the right thing isn't developer friendly, and is likely to lead to
> similar bugs in the future.
> 
> Debugged-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Link: https://lore.kernel.org/all/aKkLEtoDXKxAAWju@google.com
> Link: https://lore.kernel.org/all/aJ_vEP2EHj6l0xRT@google.com
> Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Fixes: d96c77bd4eeb ("KVM: x86: switch hugepage recovery thread to vhost_task")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>

OK but I dislike the API.

Default APIs should be safe. So vhost_task_wake_safe should be
vhost_task_wake

This also reduces the changes to kvm.


It does not look like we need the "unsafe" variant, so pls drop it.
If we do need it, it should be called __vhost_task_wake.






> ---
>  arch/x86/kvm/mmu/mmu.c           |  2 +-
>  include/linux/sched/vhost_task.h |  1 +
>  kernel/vhost_task.c              | 42 +++++++++++++++++++++++++++++---
>  3 files changed, 41 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 6e838cb6c9e1..d11730467fd4 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -7376,7 +7376,7 @@ static void kvm_wake_nx_recovery_thread(struct kvm *kvm)
>  	struct vhost_task *nx_thread = READ_ONCE(kvm->arch.nx_huge_page_recovery_thread);
>  
>  	if (nx_thread)
> -		vhost_task_wake(nx_thread);
> +		vhost_task_wake_safe(nx_thread);
>  }
>  
>  static int get_nx_huge_pages(char *buffer, const struct kernel_param *kp)
> diff --git a/include/linux/sched/vhost_task.h b/include/linux/sched/vhost_task.h
> index 25446c5d3508..5d5c187088f7 100644
> --- a/include/linux/sched/vhost_task.h
> +++ b/include/linux/sched/vhost_task.h
> @@ -10,5 +10,6 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *),
>  void vhost_task_start(struct vhost_task *vtsk);
>  void vhost_task_stop(struct vhost_task *vtsk);
>  void vhost_task_wake(struct vhost_task *vtsk);
> +void vhost_task_wake_safe(struct vhost_task *vtsk);
>  
>  #endif /* _LINUX_SCHED_VHOST_TASK_H */
> diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
> index bc738fa90c1d..5aa8ddf88d01 100644
> --- a/kernel/vhost_task.c
> +++ b/kernel/vhost_task.c
> @@ -67,18 +67,54 @@ static int vhost_task_fn(void *data)
>  	do_exit(0);
>  }
>  
> +static void __vhost_task_wake(struct vhost_task *vtsk)
> +{
> +	wake_up_process(vtsk->task);
> +}
> +
>  /**
>   * vhost_task_wake - wakeup the vhost_task
>   * @vtsk: vhost_task to wake
>   *
> - * wake up the vhost_task worker thread
> + * Wake up the vhost_task worker thread.  The caller is responsible for ensuring
> + * that the task hasn't exited.
>   */
>  void vhost_task_wake(struct vhost_task *vtsk)
>  {
> -	wake_up_process(vtsk->task);
> +	/*
> +	 * Checking VHOST_TASK_FLAGS_KILLED can race with signal delivery, but
> +	 * a race can only result in false negatives and this is just a sanity
> +	 * check, i.e. if KILLED is set, the caller is buggy no matter what.
> +	 */
> +	if (WARN_ON_ONCE(test_bit(VHOST_TASK_FLAGS_KILLED, &vtsk->flags)))
> +		return;
> +
> +	__vhost_task_wake(vtsk);
>  }
>  EXPORT_SYMBOL_GPL(vhost_task_wake);
>  
> +/**
> + * vhost_task_wake_safe - wakeup the vhost_task if it hasn't been killed
> + * @vtsk: vhost_task to wake
> + *
> + * Wake up the vhost_task worker thread if the task hasn't exited, e.g. due to
> + * a signal.
> + */
> +void vhost_task_wake_safe(struct vhost_task *vtsk)
> +{
> +	guard(mutex)(&vtsk->exit_mutex);
> +
> +	/* Attempting to wake a task that has been explicitly stopped is a bug. */
> +	if (WARN_ON_ONCE(test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)))
> +		return;
> +
> +	if (test_bit(VHOST_TASK_FLAGS_KILLED, &vtsk->flags))
> +		return;
> +
> +	__vhost_task_wake(vtsk);
> +}
> +EXPORT_SYMBOL_GPL(vhost_task_wake_safe);
> +
>  /**
>   * vhost_task_stop - stop a vhost_task
>   * @vtsk: vhost_task to stop
> @@ -91,7 +127,7 @@ void vhost_task_stop(struct vhost_task *vtsk)
>  	mutex_lock(&vtsk->exit_mutex);
>  	if (!test_bit(VHOST_TASK_FLAGS_KILLED, &vtsk->flags)) {
>  		set_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags);
> -		vhost_task_wake(vtsk);
> +		__vhost_task_wake(vtsk);
>  	}
>  	mutex_unlock(&vtsk->exit_mutex);
>  
> -- 
> 2.51.0.261.g7ce5a0a67e-goog


  reply	other threads:[~2025-08-26  7:53 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-26  0:40 [PATCH 0/3] vhost_task: KVM: Fix a race where KVM wakes an exited task Sean Christopherson
2025-08-26  0:40 ` [PATCH 1/3] vhost_task: KVM: Don't wake KVM x86's recovery thread if vhost task was killed Sean Christopherson
2025-08-26  7:52   ` Michael S. Tsirkin [this message]
2025-08-26 14:03     ` Sean Christopherson
2025-08-26 14:15       ` Sebastian Andrzej Siewior
2025-08-26 14:40       ` Michael S. Tsirkin
2025-08-26 14:43       ` Michael S. Tsirkin
2025-08-26  0:40 ` [PATCH 2/3] vhost_task: Allow caller to omit handle_sigkill() callback Sean Christopherson
2025-08-26  6:29   ` Sebastian Andrzej Siewior
2025-08-26  0:40 ` [PATCH 3/3] KVM: x86/mmu: Don't register a sigkill callback for NX hugepage recovery tasks Sean Christopherson
2025-08-28  2:19 ` [PATCH 0/3] vhost_task: KVM: Fix a race where KVM wakes an exited task Lei Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250826034937-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=bigeasy@linutronix.de \
    --cc=jasowang@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=virtualization@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.