* [PATCHv3 0/2]
@ 2025-02-27 23:06 Keith Busch
  2025-02-27 23:06 ` [PATCHv3 1/2] vhost: return task creation error instead of NULL Keith Busch
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Keith Busch @ 2025-02-27 23:06 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm; +Cc: leiyang, virtualization, x86, netdev, Keith Busch

From: Keith Busch <kbusch@kernel.org>

changes from v2:

  Fixed up the logical error in vhost on the new failure criteria

Keith Busch (1):
  vhost: return task creation error instead of NULL

Sean Christopherson (1):
  kvm: retry nx_huge_page_recovery_thread creation

 arch/x86/kvm/mmu/mmu.c    | 12 +++++-------
 drivers/vhost/vhost.c     |  2 +-
 include/linux/call_once.h | 16 +++++++++++-----
 kernel/vhost_task.c       |  4 ++--
 4 files changed, 19 insertions(+), 15 deletions(-)

-- 
2.43.5



* [PATCHv3 1/2] vhost: return task creation error instead of NULL
  2025-02-27 23:06 [PATCHv3 0/2] Keith Busch
@ 2025-02-27 23:06 ` Keith Busch
  2025-02-28 18:34   ` Mike Christie
  2025-02-27 23:06 ` [PATCHv3 2/2] kvm: retry nx_huge_page_recovery_thread creation Keith Busch
  2025-02-28  8:07 ` [PATCHv3 0/2] Lei Yang
  2 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2025-02-27 23:06 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm; +Cc: leiyang, virtualization, x86, netdev, Keith Busch

From: Keith Busch <kbusch@kernel.org>

Lets callers distinguish why the vhost task creation failed. No one
currently cares why it failed, so no real runtime change from this
patch, but that will not be the case for long.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 arch/x86/kvm/mmu/mmu.c | 2 +-
 drivers/vhost/vhost.c  | 2 +-
 kernel/vhost_task.c    | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d4ac4a1f8b81b..18ca1ea6dc240 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7471,7 +7471,7 @@ static void kvm_mmu_start_lpage_recovery(struct once *once)
 				      kvm_nx_huge_page_recovery_worker_kill,
 				      kvm, "kvm-nx-lpage-recovery");
 
-	if (!nx_thread)
+	if (IS_ERR(nx_thread))
 		return;
 
 	vhost_task_start(nx_thread);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 9ac25d08f473e..63612faeab727 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -666,7 +666,7 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
 
 	vtsk = vhost_task_create(vhost_run_work_list, vhost_worker_killed,
 				 worker, name);
-	if (!vtsk)
+	if (IS_ERR(vtsk))
 		goto free_worker;
 
 	mutex_init(&worker->mutex);
diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
index 8800f5acc0071..2ef2e1b800916 100644
--- a/kernel/vhost_task.c
+++ b/kernel/vhost_task.c
@@ -133,7 +133,7 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *),
 
 	vtsk = kzalloc(sizeof(*vtsk), GFP_KERNEL);
 	if (!vtsk)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 	init_completion(&vtsk->exited);
 	mutex_init(&vtsk->exit_mutex);
 	vtsk->data = arg;
@@ -145,7 +145,7 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *),
 	tsk = copy_process(NULL, 0, NUMA_NO_NODE, &args);
 	if (IS_ERR(tsk)) {
 		kfree(vtsk);
-		return NULL;
+		return ERR_PTR(PTR_ERR(tsk));
 	}
 
 	vtsk->task = tsk;
-- 
2.43.5



* [PATCHv3 2/2] kvm: retry nx_huge_page_recovery_thread creation
  2025-02-27 23:06 [PATCHv3 0/2] Keith Busch
  2025-02-27 23:06 ` [PATCHv3 1/2] vhost: return task creation error instead of NULL Keith Busch
@ 2025-02-27 23:06 ` Keith Busch
  2025-03-04 15:59   ` Simon Horman
  2025-02-28  8:07 ` [PATCHv3 0/2] Lei Yang
  2 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2025-02-27 23:06 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm; +Cc: leiyang, virtualization, x86, netdev, Keith Busch

From: Sean Christopherson <seanjc@google.com>

A VMM may send a signal to its threads while they've entered KVM_RUN. If
that thread happens to be trying to make the huge page recovery vhost
task, then it fails with -ERESTARTNOINTR. We need to retry if that
happens, so call_once needs to be retryable. Make call_once complete
only if what it called was successful.
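
As a rough user-space analogue of the intended once-with-retry behavior (the
pthread locking and the toy callback are illustrative only, not part of the
kernel implementation):

#include <errno.h>
#include <pthread.h>
#include <stdio.h>

enum { ONCE_NOT_STARTED, ONCE_RUNNING, ONCE_COMPLETED };

struct once {
	int state;
	pthread_mutex_t lock;
};

/* Retryable call_once: only a successful callback marks the once complete. */
static int call_once(struct once *once, int (*cb)(struct once *))
{
	int r;

	pthread_mutex_lock(&once->lock);
	if (once->state == ONCE_COMPLETED) {
		pthread_mutex_unlock(&once->lock);
		return 0;
	}
	once->state = ONCE_RUNNING;
	r = cb(once);
	once->state = r ? ONCE_NOT_STARTED : ONCE_COMPLETED;
	pthread_mutex_unlock(&once->lock);
	return r;
}

static int attempts;

/* Toy callback: pretend the first attempt is interrupted by a signal. */
static int start_worker(struct once *once)
{
	(void)once;
	return ++attempts == 1 ? -EINTR : 0;
}

int main(void)
{
	struct once nx_once = { ONCE_NOT_STARTED, PTHREAD_MUTEX_INITIALIZER };

	printf("first:  %d\n", call_once(&nx_once, start_worker)); /* -EINTR */
	printf("second: %d\n", call_once(&nx_once, start_worker)); /* 0 */
	printf("third:  %d\n", call_once(&nx_once, start_worker)); /* 0, cb not re-run */
	return 0;
}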

[implemented the kvm user side]
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 arch/x86/kvm/mmu/mmu.c    | 10 ++++------
 include/linux/call_once.h | 16 +++++++++++-----
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 18ca1ea6dc240..8160870398b90 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7460,7 +7460,7 @@ static bool kvm_nx_huge_page_recovery_worker(void *data)
 	return true;
 }
 
-static void kvm_mmu_start_lpage_recovery(struct once *once)
+static int kvm_mmu_start_lpage_recovery(struct once *once)
 {
 	struct kvm_arch *ka = container_of(once, struct kvm_arch, nx_once);
 	struct kvm *kvm = container_of(ka, struct kvm, arch);
@@ -7472,12 +7472,13 @@ static void kvm_mmu_start_lpage_recovery(struct once *once)
 				      kvm, "kvm-nx-lpage-recovery");
 
 	if (IS_ERR(nx_thread))
-		return;
+		return PTR_ERR(nx_thread);
 
 	vhost_task_start(nx_thread);
 
 	/* Make the task visible only once it is fully started. */
 	WRITE_ONCE(kvm->arch.nx_huge_page_recovery_thread, nx_thread);
+	return 0;
 }
 
 int kvm_mmu_post_init_vm(struct kvm *kvm)
@@ -7485,10 +7486,7 @@ int kvm_mmu_post_init_vm(struct kvm *kvm)
 	if (nx_hugepage_mitigation_hard_disabled)
 		return 0;
 
-	call_once(&kvm->arch.nx_once, kvm_mmu_start_lpage_recovery);
-	if (!kvm->arch.nx_huge_page_recovery_thread)
-		return -ENOMEM;
-	return 0;
+	return call_once(&kvm->arch.nx_once, kvm_mmu_start_lpage_recovery);
 }
 
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
diff --git a/include/linux/call_once.h b/include/linux/call_once.h
index 6261aa0b3fb00..ddcfd91493eaa 100644
--- a/include/linux/call_once.h
+++ b/include/linux/call_once.h
@@ -26,20 +26,26 @@ do {									\
 	__once_init((once), #once, &__key);				\
 } while (0)
 
-static inline void call_once(struct once *once, void (*cb)(struct once *))
+static inline int call_once(struct once *once, int (*cb)(struct once *))
 {
+	int r;
+
         /* Pairs with atomic_set_release() below.  */
         if (atomic_read_acquire(&once->state) == ONCE_COMPLETED)
-                return;
+		return 0;
 
         guard(mutex)(&once->lock);
         WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
         if (atomic_read(&once->state) != ONCE_NOT_STARTED)
-                return;
+                return -EINVAL;
 
         atomic_set(&once->state, ONCE_RUNNING);
-        cb(once);
-        atomic_set_release(&once->state, ONCE_COMPLETED);
+	r = cb(once);
+	if (r)
+		atomic_set(&once->state, ONCE_NOT_STARTED);
+	else
+		atomic_set_release(&once->state, ONCE_COMPLETED);
+	return r;
 }
 
 #endif /* _LINUX_CALL_ONCE_H */
-- 
2.43.5



* Re: [PATCHv3 0/2]
  2025-02-27 23:06 [PATCHv3 0/2] Keith Busch
  2025-02-27 23:06 ` [PATCHv3 1/2] vhost: return task creation error instead of NULL Keith Busch
  2025-02-27 23:06 ` [PATCHv3 2/2] kvm: retry nx_huge_page_recovery_thread creation Keith Busch
@ 2025-02-28  8:07 ` Lei Yang
  2025-02-28 14:15   ` Sean Christopherson
  2 siblings, 1 reply; 16+ messages in thread
From: Lei Yang @ 2025-02-28  8:07 UTC (permalink / raw)
  To: Keith Busch
  Cc: seanjc, pbonzini, kvm, virtualization, x86, netdev, Keith Busch

Hi Keith

V3 introduced a new bug; the following error messages appear in the qemu
output after applying this patch and booting up a guest.
Error messages:
error: kvm run failed Invalid argument
error: kvm run failed Invalid argument
EAX=00000000 EBX=00000000 ECX=00000000 EDX=000806f4
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200error: kvm run failed Invalid argument

TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=c5 5a 08 2d 00 00 00 00 00 00 00 00 00 00 00 00 56 54 46 00 <0f>
20 c0 a8 01 74 05 e9 2c ff ff ff e9 11 ff 90 00 00 00 00 00 00 00 00
00 00 00 00 00 00
EAX=00000000 EBX=00000000 ECX=00000000 EDX=000806f4
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=c5 5a 08 2d 00 00 00 00 00 00 00 00 00 00 00 00 56 54 46 00 <0f>
20 c0 a8 01 74 05 e9 2c ff ff ff e9 11 ff 90 00 00 00 00 00 00 00 00
00 00 00 00 00 00
EAX=00000000 EBX=00000000 ECX=00000000 EDX=000806f4
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=c5 5a 08 2d 00 00 00 00 00 00 00 00 00 00 00 00 56 54 46 00 <0f>
20 c0 a8 01 74 05 e9 2c ff ff ff e9 11 ff 90 00 00 00 00 00 00 00 00
00 00 00 00 00 00

Thanks
Lei

On Fri, Feb 28, 2025 at 7:06 AM Keith Busch <kbusch@meta.com> wrote:
>
> From: Keith Busch <kbusch@kernel.org>
>
> changes from v2:
>
>   Fixed up the logical error in vhost on the new failure criteria
>
> Keith Busch (1):
>   vhost: return task creation error instead of NULL
>
> Sean Christopherson (1):
>   kvm: retry nx_huge_page_recovery_thread creation
>
>  arch/x86/kvm/mmu/mmu.c    | 12 +++++-------
>  drivers/vhost/vhost.c     |  2 +-
>  include/linux/call_once.h | 16 +++++++++++-----
>  kernel/vhost_task.c       |  4 ++--
>  4 files changed, 19 insertions(+), 15 deletions(-)
>
> --
> 2.43.5
>



* Re: [PATCHv3 0/2]
  2025-02-28  8:07 ` [PATCHv3 0/2] Lei Yang
@ 2025-02-28 14:15   ` Sean Christopherson
  2025-02-28 14:32     ` Sean Christopherson
  0 siblings, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2025-02-28 14:15 UTC (permalink / raw)
  To: Lei Yang
  Cc: Keith Busch, pbonzini, kvm, virtualization, x86, netdev,
	Keith Busch

On Fri, Feb 28, 2025, Lei Yang wrote:
> Hi Keith
> 
> V3 introduced a new bug; the following error messages appear in the qemu
> output after applying this patch and booting up a guest.

Doh, my bug.  Not yet tested, but this should fix things.  Assuming it does, I'll
post a v3 so I can add my SoB.

diff --git a/include/linux/call_once.h b/include/linux/call_once.h
index ddcfd91493ea..b053f4701c94 100644
--- a/include/linux/call_once.h
+++ b/include/linux/call_once.h
@@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int (*cb)(struct once *))
                return 0;
 
         guard(mutex)(&once->lock);
-        WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
-        if (atomic_read(&once->state) != ONCE_NOT_STARTED)
+        if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
                 return -EINVAL;
 
+        if (atomic_read(&once->state) == ONCE_COMPLETED)
+                return 0;
+
         atomic_set(&once->state, ONCE_RUNNING);
        r = cb(once);
        if (r)


* Re: [PATCHv3 0/2]
  2025-02-28 14:15   ` Sean Christopherson
@ 2025-02-28 14:32     ` Sean Christopherson
  2025-02-28 14:58       ` Keith Busch
  0 siblings, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2025-02-28 14:32 UTC (permalink / raw)
  To: Lei Yang
  Cc: Keith Busch, pbonzini, kvm, virtualization, x86, netdev,
	Keith Busch

On Fri, Feb 28, 2025, Sean Christopherson wrote:
> On Fri, Feb 28, 2025, Lei Yang wrote:
> > Hi Keith
> > 
> > V3 introduced a new bug; the following error messages appear in the qemu
> > output after applying this patch and booting up a guest.
> 
> Doh, my bug.  Not yet tested, but this should fix things.  Assuming it does, I'll
> post a v3 so I can add my SoB.
         v4

Confirmed that it worked, but I'm deleting the pre-mutex check for ONCE_COMPLETED.
Will post v4 later today.

> diff --git a/include/linux/call_once.h b/include/linux/call_once.h
> index ddcfd91493ea..b053f4701c94 100644
> --- a/include/linux/call_once.h
> +++ b/include/linux/call_once.h
> @@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int (*cb)(struct once *))
>                 return 0;
>  
>          guard(mutex)(&once->lock);
> -        WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
> -        if (atomic_read(&once->state) != ONCE_NOT_STARTED)
> +        if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
>                  return -EINVAL;
>  
> +        if (atomic_read(&once->state) == ONCE_COMPLETED)
> +                return 0;
> +
>          atomic_set(&once->state, ONCE_RUNNING);
>         r = cb(once);
>         if (r)


* Re: [PATCHv3 0/2]
  2025-02-28 14:32     ` Sean Christopherson
@ 2025-02-28 14:58       ` Keith Busch
  2025-02-28 15:29         ` Sean Christopherson
  0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2025-02-28 14:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Lei Yang, Keith Busch, pbonzini, kvm, virtualization, x86, netdev

On Fri, Feb 28, 2025 at 06:32:47AM -0800, Sean Christopherson wrote:
> On Fri, Feb 28, 2025, Sean Christopherson wrote:
> > On Fri, Feb 28, 2025, Lei Yang wrote:
> > > Hi Keith
> > > 
> > > V3 introduced a new bug; the following error messages appear in the qemu
> > > output after applying this patch and booting up a guest.
> > 
> > Doh, my bug.  Not yet tested, but this should fix things.  Assuming it does, I'll
> > post a v3 so I can add my SoB.
>          v4
> 
> Confirmed that it worked, but deleting the pre-mutex check for ONCE_COMPLETED.
> Will post v4 later today.
> 
> > diff --git a/include/linux/call_once.h b/include/linux/call_once.h
> > index ddcfd91493ea..b053f4701c94 100644
> > --- a/include/linux/call_once.h
> > +++ b/include/linux/call_once.h
> > @@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int (*cb)(struct once *))
> >                 return 0;
> >  
> >          guard(mutex)(&once->lock);
> > -        WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
> > -        if (atomic_read(&once->state) != ONCE_NOT_STARTED)
> > +        if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
> >                  return -EINVAL;
> >  
> > +        if (atomic_read(&once->state) == ONCE_COMPLETED)
> > +                return 0;
> > +
> >          atomic_set(&once->state, ONCE_RUNNING);
> >         r = cb(once);
> >         if (r)

Possible suggestion since it seems odd to do an atomic_read twice on the
same value. Maybe make this a switch:

	switch (atomic_read(&once->state)) {
	case ONCE_NOT_STARTED:
		atomic_set(&once->state, ONCE_RUNNING);
		break;
	case ONCE_COMPLETED:
		return 0;
	case ONCE_RUNNING:
	default:
		WARN_ON(1);
		return -EINVAL;
	} 


* Re: [PATCHv3 0/2]
  2025-02-28 14:58       ` Keith Busch
@ 2025-02-28 15:29         ` Sean Christopherson
  2025-02-28 15:36           ` Keith Busch
  0 siblings, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2025-02-28 15:29 UTC (permalink / raw)
  To: Keith Busch
  Cc: Lei Yang, Keith Busch, pbonzini, kvm, virtualization, x86, netdev

On Fri, Feb 28, 2025, Keith Busch wrote:
> On Fri, Feb 28, 2025 at 06:32:47AM -0800, Sean Christopherson wrote:
> > > diff --git a/include/linux/call_once.h b/include/linux/call_once.h
> > > index ddcfd91493ea..b053f4701c94 100644
> > > --- a/include/linux/call_once.h
> > > +++ b/include/linux/call_once.h
> > > @@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int (*cb)(struct once *))
> > >                 return 0;
> > >  
> > >          guard(mutex)(&once->lock);
> > > -        WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
> > > -        if (atomic_read(&once->state) != ONCE_NOT_STARTED)
> > > +        if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
> > >                  return -EINVAL;
> > >  
> > > +        if (atomic_read(&once->state) == ONCE_COMPLETED)
> > > +                return 0;
> > > +
> > >          atomic_set(&once->state, ONCE_RUNNING);
> > >         r = cb(once);
> > >         if (r)
> 
> Possible suggestion since it seems odd to do an atomic_read twice on the
> same value.

Yeah, good call.  At the risk of getting too cute, how about this?

static inline int call_once(struct once *once, int (*cb)(struct once *))
{
	int r, state;

	/* Pairs with atomic_set_release() below.  */
	if (atomic_read_acquire(&once->state) == ONCE_COMPLETED)
		return 0;

	guard(mutex)(&once->lock);
	state = atomic_read(&once->state);
	if (unlikely(state != ONCE_NOT_STARTED))
		return WARN_ON_ONCE(state != ONCE_COMPLETED) ? -EINVAL : 0;

	atomic_set(&once->state, ONCE_RUNNING);
	r = cb(once);
	if (r)
		atomic_set(&once->state, ONCE_NOT_STARTED);
	else
		atomic_set_release(&once->state, ONCE_COMPLETED);
	return r;
}


* Re: [PATCHv3 0/2]
  2025-02-28 15:29         ` Sean Christopherson
@ 2025-02-28 15:36           ` Keith Busch
  2025-02-28 16:43             ` Paolo Bonzini
  0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2025-02-28 15:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Lei Yang, Keith Busch, pbonzini, kvm, virtualization, x86, netdev

On Fri, Feb 28, 2025 at 07:29:45AM -0800, Sean Christopherson wrote:
> On Fri, Feb 28, 2025, Keith Busch wrote:
> > On Fri, Feb 28, 2025 at 06:32:47AM -0800, Sean Christopherson wrote:
> > > > @@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int (*cb)(struct once *))
> > > >                 return 0;
> > > >  
> > > >          guard(mutex)(&once->lock);
> > > > -        WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
> > > > -        if (atomic_read(&once->state) != ONCE_NOT_STARTED)
> > > > +        if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
> > > >                  return -EINVAL;
> > > >  
> > > > +        if (atomic_read(&once->state) == ONCE_COMPLETED)
> > > > +                return 0;
> > > > +
> > > >          atomic_set(&once->state, ONCE_RUNNING);
> > > >         r = cb(once);
> > > >         if (r)
> > 
> > Possible suggestion since it seems odd to do an atomic_read twice on the
> > same value.
> 
> Yeah, good call.  At the risk of getting too cute, how about this?

Sure, that also looks good to me.


* Re: [PATCHv3 0/2]
  2025-02-28 15:36           ` Keith Busch
@ 2025-02-28 16:43             ` Paolo Bonzini
  2025-02-28 17:03               ` Sean Christopherson
  2025-07-08 22:17               ` Keith Busch
  0 siblings, 2 replies; 16+ messages in thread
From: Paolo Bonzini @ 2025-02-28 16:43 UTC (permalink / raw)
  To: Keith Busch, Sean Christopherson
  Cc: Lei Yang, Keith Busch, kvm, virtualization, x86, netdev

On 2/28/25 16:36, Keith Busch wrote:
> On Fri, Feb 28, 2025 at 07:29:45AM -0800, Sean Christopherson wrote:
>> On Fri, Feb 28, 2025, Keith Busch wrote:
>>> On Fri, Feb 28, 2025 at 06:32:47AM -0800, Sean Christopherson wrote:
>>>>> @@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int (*cb)(struct once *))
>>>>>                  return 0;
>>>>>   
>>>>>           guard(mutex)(&once->lock);
>>>>> -        WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
>>>>> -        if (atomic_read(&once->state) != ONCE_NOT_STARTED)
>>>>> +        if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
>>>>>                   return -EINVAL;
>>>>>   
>>>>> +        if (atomic_read(&once->state) == ONCE_COMPLETED)
>>>>> +                return 0;
>>>>> +
>>>>>           atomic_set(&once->state, ONCE_RUNNING);
>>>>>          r = cb(once);
>>>>>          if (r)
>>>
>>> Possible suggestion since it seems odd to do an atomic_read twice on the
>>> same value.
>>
>> Yeah, good call.  At the risk of getting too cute, how about this?
> 
> Sure, that also looks good to me.

Just to overthink it a bit more, I'm changing "if (r)" to "if (r < 0)". 
Not because it's particularly useful to return a meaningful nonzero 
value on the first initialization, but more because 0+ for success and 
-errno for failure is more common.
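
A toy user-space illustration of that convention, in case it helps (the
function and values are made up, just to show why callers check r < 0 rather
than r != 0 when a nonnegative return can carry information):

#include <errno.h>
#include <stdio.h>

/* 0+ means success (and may carry a count), -errno means failure. */
static int fill_buffer(char *buf, int len)
{
	int i;

	if (!buf)
		return -EINVAL;
	for (i = 0; i < len; i++)
		buf[i] = 'x';
	return len;		/* success: number of bytes written */
}

int main(void)
{
	char buf[8];
	int r = fill_buffer(buf, sizeof(buf));

	if (r < 0)		/* not: if (r) */
		fprintf(stderr, "fill_buffer failed: %d\n", r);
	else
		printf("wrote %d bytes\n", r);
	return 0;
}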

Queued with this change, thanks.

(Keith, I haven't forgotten about AVX by the way).

Paolo



* Re: [PATCHv3 0/2]
  2025-02-28 16:43             ` Paolo Bonzini
@ 2025-02-28 17:03               ` Sean Christopherson
  2025-07-08 22:17               ` Keith Busch
  1 sibling, 0 replies; 16+ messages in thread
From: Sean Christopherson @ 2025-02-28 17:03 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Keith Busch, Lei Yang, Keith Busch, kvm, virtualization, x86,
	netdev

On Fri, Feb 28, 2025, Paolo Bonzini wrote:
> On 2/28/25 16:36, Keith Busch wrote:
> > On Fri, Feb 28, 2025 at 07:29:45AM -0800, Sean Christopherson wrote:
> > > On Fri, Feb 28, 2025, Keith Busch wrote:
> > > > On Fri, Feb 28, 2025 at 06:32:47AM -0800, Sean Christopherson wrote:
> > > > > > @@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int (*cb)(struct once *))
> > > > > >                  return 0;
> > > > > >           guard(mutex)(&once->lock);
> > > > > > -        WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
> > > > > > -        if (atomic_read(&once->state) != ONCE_NOT_STARTED)
> > > > > > +        if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
> > > > > >                   return -EINVAL;
> > > > > > +        if (atomic_read(&once->state) == ONCE_COMPLETED)
> > > > > > +                return 0;
> > > > > > +
> > > > > >           atomic_set(&once->state, ONCE_RUNNING);
> > > > > >          r = cb(once);
> > > > > >          if (r)
> > > > 
> > > > Possible suggestion since it seems odd to do an atomic_read twice on the
> > > > same value.
> > > 
> > > Yeah, good call.  At the risk of getting too cute, how about this?
> > 
> > Sure, that also looks good to me.
> 
> Just to overthink it a bit more, I'm changing "if (r)" to "if (r < 0)". Not
> because it's particularly useful to return a meaningful nonzero value on the
> first initialization, but more because 0+ for success and -errno for failure
> is more common.
> 
> Queued with this change, thanks.

If it's not too late, the first patch can/should use ERR_CAST() instead of a
PTR_ERR() => ERR_PTR():

	tsk = copy_process(NULL, 0, NUMA_NO_NODE, &args);
	if (IS_ERR(tsk)) {
		kfree(vtsk);
		return ERR_CAST(tsk);
	}

And I was going to get greedy and replace spaces with tabs in call_once.

The changelog for this patch is also misleading.  KVM_RUN doesn't currently return
-ERESTARTNOINTR; it only ever returns -ENOMEM.  copy_process() is what returns
-ERESTARTNOINTR.

I also think it's worth calling out that it's a non-fatal signal.

--
From: Sean Christopherson <seanjc@google.com>
Date: Thu, 27 Feb 2025 15:06:31 -0800
Subject: [PATCH] KVM: x86/mmu: Allow retry of nx_huge_page_recovery_thread
 creation

A VMM may send a non-fatal signal to its threads, including vCPU tasks,
at any time, and thus may signal vCPU tasks during KVM_RUN.  If a vCPU
task receives the signal while its trying to spawn the huge page recovery
vhost task, then KVM_RUN will fail due to copy_process() returning
-ERESTARTNOINTR.

Rework call_once() to mark the call complete if and only if the called
function succeeds, and plumb the function's true error code back to the
call_once() invoker.  This provides userspace with the correct, non-fatal
error code so that the VMM doesn't terminate the VM on -ENOMEM, and allows
a subsequent KVM_RUN to succeed by virtue of retrying creation of the NX huge
page task.

Opportunistically replace spaces with tabs in call_once.h.

Fixes: 931656b9e2ff ("kvm: defer huge page recovery vhost task to later")
Co-developed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c    | 10 ++++------
 include/linux/call_once.h | 36 +++++++++++++++++++++---------------
 2 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 70af12b693a3..63bb77ee1bb1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7633,7 +7633,7 @@ static bool kvm_nx_huge_page_recovery_worker(void *data)
 	return true;
 }
 
-static void kvm_mmu_start_lpage_recovery(struct once *once)
+static int kvm_mmu_start_lpage_recovery(struct once *once)
 {
 	struct kvm_arch *ka = container_of(once, struct kvm_arch, nx_once);
 	struct kvm *kvm = container_of(ka, struct kvm, arch);
@@ -7645,12 +7645,13 @@ static void kvm_mmu_start_lpage_recovery(struct once *once)
 				      kvm, "kvm-nx-lpage-recovery");
 
 	if (IS_ERR(nx_thread))
-		return;
+		return PTR_ERR(nx_thread);
 
 	vhost_task_start(nx_thread);
 
 	/* Make the task visible only once it is fully started. */
 	WRITE_ONCE(kvm->arch.nx_huge_page_recovery_thread, nx_thread);
+	return 0;
 }
 
 int kvm_mmu_post_init_vm(struct kvm *kvm)
@@ -7658,10 +7659,7 @@ int kvm_mmu_post_init_vm(struct kvm *kvm)
 	if (nx_hugepage_mitigation_hard_disabled)
 		return 0;
 
-	call_once(&kvm->arch.nx_once, kvm_mmu_start_lpage_recovery);
-	if (!kvm->arch.nx_huge_page_recovery_thread)
-		return -ENOMEM;
-	return 0;
+	return call_once(&kvm->arch.nx_once, kvm_mmu_start_lpage_recovery);
 }
 
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
diff --git a/include/linux/call_once.h b/include/linux/call_once.h
index 6261aa0b3fb0..56cb9625b48b 100644
--- a/include/linux/call_once.h
+++ b/include/linux/call_once.h
@@ -9,15 +9,15 @@
 #define ONCE_COMPLETED   2
 
 struct once {
-        atomic_t state;
-        struct mutex lock;
+	atomic_t state;
+	struct mutex lock;
 };
 
 static inline void __once_init(struct once *once, const char *name,
 			       struct lock_class_key *key)
 {
-        atomic_set(&once->state, ONCE_NOT_STARTED);
-        __mutex_init(&once->lock, name, key);
+	atomic_set(&once->state, ONCE_NOT_STARTED);
+	__mutex_init(&once->lock, name, key);
 }
 
 #define once_init(once)							\
@@ -26,20 +26,26 @@ do {									\
 	__once_init((once), #once, &__key);				\
 } while (0)
 
-static inline void call_once(struct once *once, void (*cb)(struct once *))
+static inline int call_once(struct once *once, int (*cb)(struct once *))
 {
-        /* Pairs with atomic_set_release() below.  */
-        if (atomic_read_acquire(&once->state) == ONCE_COMPLETED)
-                return;
+	int r, state;
 
-        guard(mutex)(&once->lock);
-        WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
-        if (atomic_read(&once->state) != ONCE_NOT_STARTED)
-                return;
+	/* Pairs with atomic_set_release() below.  */
+	if (atomic_read_acquire(&once->state) == ONCE_COMPLETED)
+		return 0;
 
-        atomic_set(&once->state, ONCE_RUNNING);
-        cb(once);
-        atomic_set_release(&once->state, ONCE_COMPLETED);
+	guard(mutex)(&once->lock);
+	state = atomic_read(&once->state);
+	if (unlikely(state != ONCE_NOT_STARTED))
+		return WARN_ON_ONCE(state != ONCE_COMPLETED) ? -EINVAL : 0;
+
+	atomic_set(&once->state, ONCE_RUNNING);
+	r = cb(once);
+	if (r)
+		atomic_set(&once->state, ONCE_NOT_STARTED);
+	else
+		atomic_set_release(&once->state, ONCE_COMPLETED);
+	return r;
 }
 
 #endif /* _LINUX_CALL_ONCE_H */

base-commit: 7d2322b0472eb402e8a206ba9b332b6c75f6f130
-- 


* Re: [PATCHv3 1/2] vhost: return task creation error instead of NULL
  2025-02-27 23:06 ` [PATCHv3 1/2] vhost: return task creation error instead of NULL Keith Busch
@ 2025-02-28 18:34   ` Mike Christie
  0 siblings, 0 replies; 16+ messages in thread
From: Mike Christie @ 2025-02-28 18:34 UTC (permalink / raw)
  To: Keith Busch, seanjc, pbonzini, kvm
  Cc: leiyang, virtualization, x86, netdev, Keith Busch

On 2/27/25 5:06 PM, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Lets callers distinguish why the vhost task creation failed. No one
> currently cares why it failed, so no real runtime change from this
> patch, but that will not be the case for long.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  arch/x86/kvm/mmu/mmu.c | 2 +-
>  drivers/vhost/vhost.c  | 2 +-
>  kernel/vhost_task.c    | 4 ++--
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index d4ac4a1f8b81b..18ca1ea6dc240 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -7471,7 +7471,7 @@ static void kvm_mmu_start_lpage_recovery(struct once *once)
>  				      kvm_nx_huge_page_recovery_worker_kill,
>  				      kvm, "kvm-nx-lpage-recovery");
>  
> -	if (!nx_thread)
> +	if (IS_ERR(nx_thread))
>  		return;
>  
>  	vhost_task_start(nx_thread);
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 9ac25d08f473e..63612faeab727 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -666,7 +666,7 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
>  
>  	vtsk = vhost_task_create(vhost_run_work_list, vhost_worker_killed,
>  				 worker, name);
> -	if (!vtsk)
> +	if (IS_ERR(vtsk))
>  		goto free_worker;
>  
>  	mutex_init(&worker->mutex);
> diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
> index 8800f5acc0071..2ef2e1b800916 100644
> --- a/kernel/vhost_task.c
> +++ b/kernel/vhost_task.c
> @@ -133,7 +133,7 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *),
>  
>  	vtsk = kzalloc(sizeof(*vtsk), GFP_KERNEL);
>  	if (!vtsk)
> -		return NULL;
> +		return ERR_PTR(-ENOMEM);
>  	init_completion(&vtsk->exited);
>  	mutex_init(&vtsk->exit_mutex);
>  	vtsk->data = arg;
> @@ -145,7 +145,7 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *),
>  	tsk = copy_process(NULL, 0, NUMA_NO_NODE, &args);
>  	if (IS_ERR(tsk)) {
>  		kfree(vtsk);
> -		return NULL;
> +		return ERR_PTR(PTR_ERR(tsk));
>  	}
>  
>  	vtsk->task = tsk;


The vhost task parts look ok to me.

Reviewed-by: Mike Christie <michael.christie@oracle.com>


* Re: [PATCHv3 2/2] kvm: retry nx_huge_page_recovery_thread creation
  2025-02-27 23:06 ` [PATCHv3 2/2] kvm: retry nx_huge_page_recovery_thread creation Keith Busch
@ 2025-03-04 15:59   ` Simon Horman
  2025-03-04 16:07     ` Keith Busch
  0 siblings, 1 reply; 16+ messages in thread
From: Simon Horman @ 2025-03-04 15:59 UTC (permalink / raw)
  To: Keith Busch
  Cc: seanjc, pbonzini, kvm, leiyang, virtualization, x86, netdev,
	Keith Busch

On Thu, Feb 27, 2025 at 03:06:31PM -0800, Keith Busch wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> A VMM may send a signal to its threads while they've entered KVM_RUN. If
> that thread happens to be trying to make the huge page recovery vhost
> task, then it fails with -ERESTARTNOINTR. We need to retry if that
> happens, so call_once needs to be retryable. Make call_once complete
> only if what it called was successful.
> 
> [implemented the kvm user side]
> Signed-off-by: Keith Busch <kbusch@kernel.org>

...

> diff --git a/include/linux/call_once.h b/include/linux/call_once.h
> index 6261aa0b3fb00..ddcfd91493eaa 100644
> --- a/include/linux/call_once.h
> +++ b/include/linux/call_once.h
> @@ -26,20 +26,26 @@ do {									\
>  	__once_init((once), #once, &__key);				\
>  } while (0)
>  
> -static inline void call_once(struct once *once, void (*cb)(struct once *))
> +static inline int call_once(struct once *once, int (*cb)(struct once *))
>  {
> +	int r;
> +
>          /* Pairs with atomic_set_release() below.  */
>          if (atomic_read_acquire(&once->state) == ONCE_COMPLETED)
> -                return;
> +		return 0;
>  
>          guard(mutex)(&once->lock);
>          WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
>          if (atomic_read(&once->state) != ONCE_NOT_STARTED)
> -                return;
> +                return -EINVAL;

Hi Keith,

A minor nit from my side:

As you are changing this line, and it seems like there will be another
revision of this series anyway, please consider updating the indentation to
use tabs.

>  
>          atomic_set(&once->state, ONCE_RUNNING);
> -        cb(once);
> -        atomic_set_release(&once->state, ONCE_COMPLETED);
> +	r = cb(once);
> +	if (r)
> +		atomic_set(&once->state, ONCE_NOT_STARTED);
> +	else
> +		atomic_set_release(&once->state, ONCE_COMPLETED);
> +	return r;
>  }
>  
>  #endif /* _LINUX_CALL_ONCE_H */
> -- 
> 2.43.5
> 
> 


* Re: [PATCHv3 2/2] kvm: retry nx_huge_page_recovery_thread creation
  2025-03-04 15:59   ` Simon Horman
@ 2025-03-04 16:07     ` Keith Busch
  2025-05-01 15:12       ` Frederick Lawler
  0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2025-03-04 16:07 UTC (permalink / raw)
  To: Simon Horman
  Cc: Keith Busch, seanjc, pbonzini, kvm, leiyang, virtualization, x86,
	netdev

On Tue, Mar 04, 2025 at 03:59:22PM +0000, Simon Horman wrote:
> A minor nit from my side:
> 
> As you are changing this line, and it seems like there will be another
> revision of this series anyway, please consider updating the indentation to
> use tabs.

The patch is already applied to upstream and stable, and I think Paolo
must have taken care of the formatting because it looks good in the
commit now:

  https://web.git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=916b7f42b3b3b539a71c204a9b49fdc4ca92cd82


* Re: [PATCHv3 2/2] kvm: retry nx_huge_page_recovery_thread creation
  2025-03-04 16:07     ` Keith Busch
@ 2025-05-01 15:12       ` Frederick Lawler
  0 siblings, 0 replies; 16+ messages in thread
From: Frederick Lawler @ 2025-05-01 15:12 UTC (permalink / raw)
  To: Keith Busch
  Cc: Simon Horman, Keith Busch, seanjc, pbonzini, kvm, leiyang,
	virtualization, x86, netdev, kernel-team

Hi Keith,

On Tue, Mar 04, 2025 at 09:07:23AM -0700, Keith Busch wrote:
> On Tue, Mar 04, 2025 at 03:59:22PM +0000, Simon Horman wrote:
> > A minor nit from my side:
> > 
> > As you are changing this line, and it seems like there will be another
> > revision of this series anyway, please consider updating the indentation to
> > use tabs.
> 
> The patch is already applied to upstream and stable, and I think Paolo
> must have taken care of the formatting because it looks good in the
> commit now:
> 
>   https://web.git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=916b7f42b3b3b539a71c204a9b49fdc4ca92cd82

I backported this patch cleanly on top of 6.12.24 and ran it on some of
our production nodes. I compared 6.12.24 to the patched 6.12.24 over the
last three days, and we saw a drop in ENOMEM reports from firecracker.
Our VMs also appear to be consistently spinning up again.

When you said stable here, I checked the stable queue and lists, but I
didn't see this patch in 6.12 (only in the 6.14/6.15 stable trees). Can we
get this backported to 6.12 as well?

Fred


* Re: [PATCHv3 0/2]
  2025-02-28 16:43             ` Paolo Bonzini
  2025-02-28 17:03               ` Sean Christopherson
@ 2025-07-08 22:17               ` Keith Busch
  1 sibling, 0 replies; 16+ messages in thread
From: Keith Busch @ 2025-07-08 22:17 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Lei Yang, Keith Busch, kvm, virtualization,
	x86, netdev

On Fri, Feb 28, 2025 at 05:43:43PM +0100, Paolo Bonzini wrote:
> (Keith, I haven't forgotten about AVX by the way).

Hey, how's that going by the way? :) Just checking in as I'm still
having to carry this part out of tree.

