Subject: [PATCHv3 0/2]
From: Keith Busch @ 2025-02-27 23:06 UTC
To: seanjc, pbonzini, kvm
Cc: leiyang, virtualization, x86, netdev, Keith Busch

From: Keith Busch <kbusch@kernel.org>

changes from v2:

  Fixed up the logical error in vhost on the new failure criteria

Keith Busch (1):
  vhost: return task creation error instead of NULL

Sean Christopherson (1):
  kvm: retry nx_huge_page_recovery_thread creation

 arch/x86/kvm/mmu/mmu.c    | 12 +++++-------
 drivers/vhost/vhost.c     |  2 +-
 include/linux/call_once.h | 16 +++++++++++-----
 kernel/vhost_task.c       |  4 ++--
 4 files changed, 19 insertions(+), 15 deletions(-)

-- 
2.43.5
Subject: [PATCHv3 1/2] vhost: return task creation error instead of NULL
From: Keith Busch @ 2025-02-27 23:06 UTC
To: seanjc, pbonzini, kvm
Cc: leiyang, virtualization, x86, netdev, Keith Busch

From: Keith Busch <kbusch@kernel.org>

Lets callers distinguish why the vhost task creation failed.  No one
currently cares why it failed, so no real runtime change from this
patch, but that will not be the case for long.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 arch/x86/kvm/mmu/mmu.c | 2 +-
 drivers/vhost/vhost.c  | 2 +-
 kernel/vhost_task.c    | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d4ac4a1f8b81b..18ca1ea6dc240 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7471,7 +7471,7 @@ static void kvm_mmu_start_lpage_recovery(struct once *once)
 				      kvm_nx_huge_page_recovery_worker_kill,
 				      kvm, "kvm-nx-lpage-recovery");
 
-	if (!nx_thread)
+	if (IS_ERR(nx_thread))
 		return;
 
 	vhost_task_start(nx_thread);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 9ac25d08f473e..63612faeab727 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -666,7 +666,7 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
 
 	vtsk = vhost_task_create(vhost_run_work_list, vhost_worker_killed,
 				 worker, name);
-	if (!vtsk)
+	if (IS_ERR(vtsk))
 		goto free_worker;
 
 	mutex_init(&worker->mutex);
diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
index 8800f5acc0071..2ef2e1b800916 100644
--- a/kernel/vhost_task.c
+++ b/kernel/vhost_task.c
@@ -133,7 +133,7 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *),
 
 	vtsk = kzalloc(sizeof(*vtsk), GFP_KERNEL);
 	if (!vtsk)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 	init_completion(&vtsk->exited);
 	mutex_init(&vtsk->exit_mutex);
 	vtsk->data = arg;
@@ -145,7 +145,7 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *),
 	tsk = copy_process(NULL, 0, NUMA_NO_NODE, &args);
 	if (IS_ERR(tsk)) {
 		kfree(vtsk);
-		return ERR_PTR(PTR_ERR(tsk));
+		return ERR_PTR(PTR_ERR(tsk));
 	}
 
 	vtsk->task = tsk;
-- 
2.43.5
Subject: Re: [PATCHv3 1/2] vhost: return task creation error instead of NULL
From: Mike Christie @ 2025-02-28 18:34 UTC
To: Keith Busch, seanjc, pbonzini, kvm
Cc: leiyang, virtualization, x86, netdev, Keith Busch

On 2/27/25 5:06 PM, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Lets callers distinguish why the vhost task creation failed.  No one
> currently cares why it failed, so no real runtime change from this
> patch, but that will not be the case for long.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  arch/x86/kvm/mmu/mmu.c | 2 +-
>  drivers/vhost/vhost.c  | 2 +-
>  kernel/vhost_task.c    | 4 ++--
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index d4ac4a1f8b81b..18ca1ea6dc240 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -7471,7 +7471,7 @@ static void kvm_mmu_start_lpage_recovery(struct once *once)
>  				      kvm_nx_huge_page_recovery_worker_kill,
>  				      kvm, "kvm-nx-lpage-recovery");
>  
> -	if (!nx_thread)
> +	if (IS_ERR(nx_thread))
>  		return;
>  
>  	vhost_task_start(nx_thread);
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 9ac25d08f473e..63612faeab727 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -666,7 +666,7 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
>  
>  	vtsk = vhost_task_create(vhost_run_work_list, vhost_worker_killed,
>  				 worker, name);
> -	if (!vtsk)
> +	if (IS_ERR(vtsk))
>  		goto free_worker;
>  
>  	mutex_init(&worker->mutex);
> diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
> index 8800f5acc0071..2ef2e1b800916 100644
> --- a/kernel/vhost_task.c
> +++ b/kernel/vhost_task.c
> @@ -133,7 +133,7 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *),
>  
>  	vtsk = kzalloc(sizeof(*vtsk), GFP_KERNEL);
>  	if (!vtsk)
> -		return NULL;
> +		return ERR_PTR(-ENOMEM);
>  	init_completion(&vtsk->exited);
>  	mutex_init(&vtsk->exit_mutex);
>  	vtsk->data = arg;
> @@ -145,7 +145,7 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *),
>  	tsk = copy_process(NULL, 0, NUMA_NO_NODE, &args);
>  	if (IS_ERR(tsk)) {
>  		kfree(vtsk);
> -		return NULL;
> +		return ERR_PTR(PTR_ERR(tsk));
>  	}
>  
>  	vtsk->task = tsk;

The vhost task parts look ok to me.

Reviewed-by: Mike Christie <michael.christie@oracle.com>
Subject: [PATCHv3 2/2] kvm: retry nx_huge_page_recovery_thread creation
From: Keith Busch @ 2025-02-27 23:06 UTC
To: seanjc, pbonzini, kvm
Cc: leiyang, virtualization, x86, netdev, Keith Busch

From: Sean Christopherson <seanjc@google.com>

A VMM may send a signal to its threads while they've entered KVM_RUN.  If
that thread happens to be trying to make the huge page recovery vhost
task, then it fails with -ERESTARTNOINTR.  We need to retry if that
happens, so call_once needs to be retryable.  Make call_once complete
only if what it called was successful.

[implemented the kvm user side]
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 arch/x86/kvm/mmu/mmu.c    | 10 ++++------
 include/linux/call_once.h | 16 +++++++++++-----
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 18ca1ea6dc240..8160870398b90 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7460,7 +7460,7 @@ static bool kvm_nx_huge_page_recovery_worker(void *data)
 	return true;
 }
 
-static void kvm_mmu_start_lpage_recovery(struct once *once)
+static int kvm_mmu_start_lpage_recovery(struct once *once)
 {
 	struct kvm_arch *ka = container_of(once, struct kvm_arch, nx_once);
 	struct kvm *kvm = container_of(ka, struct kvm, arch);
@@ -7472,12 +7472,13 @@ static void kvm_mmu_start_lpage_recovery(struct once *once)
 			      kvm, "kvm-nx-lpage-recovery");
 
 	if (IS_ERR(nx_thread))
-		return;
+		return PTR_ERR(nx_thread);
 
 	vhost_task_start(nx_thread);
 
 	/* Make the task visible only once it is fully started. */
 	WRITE_ONCE(kvm->arch.nx_huge_page_recovery_thread, nx_thread);
+	return 0;
 }
 
 int kvm_mmu_post_init_vm(struct kvm *kvm)
@@ -7485,10 +7486,7 @@ int kvm_mmu_post_init_vm(struct kvm *kvm)
 	if (nx_hugepage_mitigation_hard_disabled)
 		return 0;
 
-	call_once(&kvm->arch.nx_once, kvm_mmu_start_lpage_recovery);
-	if (!kvm->arch.nx_huge_page_recovery_thread)
-		return -ENOMEM;
-	return 0;
+	return call_once(&kvm->arch.nx_once, kvm_mmu_start_lpage_recovery);
 }
 
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
diff --git a/include/linux/call_once.h b/include/linux/call_once.h
index 6261aa0b3fb00..ddcfd91493eaa 100644
--- a/include/linux/call_once.h
+++ b/include/linux/call_once.h
@@ -26,20 +26,26 @@ do {									\
 	__once_init((once), #once, &__key);				\
 } while (0)
 
-static inline void call_once(struct once *once, void (*cb)(struct once *))
+static inline int call_once(struct once *once, int (*cb)(struct once *))
 {
+	int r;
+
 	/* Pairs with atomic_set_release() below. */
 	if (atomic_read_acquire(&once->state) == ONCE_COMPLETED)
-		return;
+		return 0;
 
 	guard(mutex)(&once->lock);
 	WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
 	if (atomic_read(&once->state) != ONCE_NOT_STARTED)
-		return;
+		return -EINVAL;
 
 	atomic_set(&once->state, ONCE_RUNNING);
-	cb(once);
-	atomic_set_release(&once->state, ONCE_COMPLETED);
+	r = cb(once);
+	if (r)
+		atomic_set(&once->state, ONCE_NOT_STARTED);
+	else
+		atomic_set_release(&once->state, ONCE_COMPLETED);
+	return r;
 }
 
 #endif /* _LINUX_CALL_ONCE_H */
-- 
2.43.5
Subject: Re: [PATCHv3 2/2] kvm: retry nx_huge_page_recovery_thread creation
From: Simon Horman @ 2025-03-04 15:59 UTC
To: Keith Busch
Cc: seanjc, pbonzini, kvm, leiyang, virtualization, x86, netdev, Keith Busch

On Thu, Feb 27, 2025 at 03:06:31PM -0800, Keith Busch wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> A VMM may send a signal to its threads while they've entered KVM_RUN.  If
> that thread happens to be trying to make the huge page recovery vhost
> task, then it fails with -ERESTARTNOINTR.  We need to retry if that
> happens, so call_once needs to be retryable.  Make call_once complete
> only if what it called was successful.
> 
> [implemented the kvm user side]
> Signed-off-by: Keith Busch <kbusch@kernel.org>

...

> diff --git a/include/linux/call_once.h b/include/linux/call_once.h
> index 6261aa0b3fb00..ddcfd91493eaa 100644
> --- a/include/linux/call_once.h
> +++ b/include/linux/call_once.h
> @@ -26,20 +26,26 @@ do {									\
>  	__once_init((once), #once, &__key);				\
>  } while (0)
>  
> -static inline void call_once(struct once *once, void (*cb)(struct once *))
> +static inline int call_once(struct once *once, int (*cb)(struct once *))
>  {
> +	int r;
> +
>  	/* Pairs with atomic_set_release() below. */
>  	if (atomic_read_acquire(&once->state) == ONCE_COMPLETED)
> -		return;
> +		return 0;
>  
>  	guard(mutex)(&once->lock);
>  	WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
>  	if (atomic_read(&once->state) != ONCE_NOT_STARTED)
> -		return;
> +		return -EINVAL;

Hi Keith,

A minor nit from my side:

As you are changing this line, and it seems like there will be another
revision of this series anyway, please consider updating the indentation to
use tabs.

>  
>  	atomic_set(&once->state, ONCE_RUNNING);
> -	cb(once);
> -	atomic_set_release(&once->state, ONCE_COMPLETED);
> +	r = cb(once);
> +	if (r)
> +		atomic_set(&once->state, ONCE_NOT_STARTED);
> +	else
> +		atomic_set_release(&once->state, ONCE_COMPLETED);
> +	return r;
>  }
>  
>  #endif /* _LINUX_CALL_ONCE_H */
> -- 
> 2.43.5
Subject: Re: [PATCHv3 2/2] kvm: retry nx_huge_page_recovery_thread creation
From: Keith Busch @ 2025-03-04 16:07 UTC
To: Simon Horman
Cc: Keith Busch, seanjc, pbonzini, kvm, leiyang, virtualization, x86, netdev

On Tue, Mar 04, 2025 at 03:59:22PM +0000, Simon Horman wrote:
> A minor nit from my side:
> 
> As you are changing this line, and it seems like there will be another
> revision of this series anyway, please consider updating the indentation to
> use tabs.

The patch is already applied to upstream and stable, and I think Paolo
must have taken care of the formatting because it looks good in the
commit now:

  https://web.git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=916b7f42b3b3b539a71c204a9b49fdc4ca92cd82
Subject: Re: [PATCHv3 2/2] kvm: retry nx_huge_page_recovery_thread creation
From: Frederick Lawler @ 2025-05-01 15:12 UTC
To: Keith Busch
Cc: Simon Horman, Keith Busch, seanjc, pbonzini, kvm, leiyang, virtualization, x86, netdev, kernel-team

Hi Keith,

On Tue, Mar 04, 2025 at 09:07:23AM -0700, Keith Busch wrote:
> On Tue, Mar 04, 2025 at 03:59:22PM +0000, Simon Horman wrote:
> > A minor nit from my side:
> > 
> > As you are changing this line, and it seems like there will be another
> > revision of this series anyway, please consider updating the indentation to
> > use tabs.
> 
> The patch is already applied to upstream and stable, and I think Paolo
> must have taken care of the formatting because it looks good in the
> commit now:
> 
>   https://web.git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=916b7f42b3b3b539a71c204a9b49fdc4ca92cd82a

I backported this patch cleanly on top of 6.12.24 and ran it on some of
our production nodes.  I compared 6.12.24 to the patched 6.12.24 over the
last three days, and we saw a drop in ENOMEM reports from firecracker.
It also appears our VM's to be consistently spinning up again.

When you said stable here, I checked the stable queue and lists, but I
didn't see this patch in 6.12.  (only in the 6.14/15 stable)  Can we get
this backported to 6.12 as well?

Fred
Subject: Re: [PATCHv3 0/2]
From: Lei Yang @ 2025-02-28 8:07 UTC
To: Keith Busch
Cc: seanjc, pbonzini, kvm, virtualization, x86, netdev, Keith Busch

Hi Keith

V3 introduced a new bug, the following error messages from qemu output
after applying this patch to boot up a guest.

Error messages:

error: kvm run failed Invalid argument
error: kvm run failed Invalid argument
error: kvm run failed Invalid argument
EAX=00000000 EBX=00000000 ECX=00000000 EDX=000806f4
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=c5 5a 08 2d 00 00 00 00 00 00 00 00 00 00 00 00 56 54 46 00 <0f> 20 c0 a8 01 74 05 e9 2c ff ff ff e9 11 ff 90 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Thanks
Lei

On Fri, Feb 28, 2025 at 7:06 AM Keith Busch <kbusch@meta.com> wrote:
>
> From: Keith Busch <kbusch@kernel.org>
>
> changes from v2:
>
>   Fixed up the logical error in vhost on the new failure criteria
>
> Keith Busch (1):
>   vhost: return task creation error instead of NULL
>
> Sean Christopherson (1):
>   kvm: retry nx_huge_page_recovery_thread creation
>
>  arch/x86/kvm/mmu/mmu.c    | 12 +++++-------
>  drivers/vhost/vhost.c     |  2 +-
>  include/linux/call_once.h | 16 +++++++++++-----
>  kernel/vhost_task.c       |  4 ++--
>  4 files changed, 19 insertions(+), 15 deletions(-)
>
> --
> 2.43.5
Subject: Re: [PATCHv3 0/2]
From: Sean Christopherson @ 2025-02-28 14:15 UTC
To: Lei Yang
Cc: Keith Busch, pbonzini, kvm, virtualization, x86, netdev, Keith Busch

On Fri, Feb 28, 2025, Lei Yang wrote:
> Hi Keith
> 
> V3 introduced a new bug, the following error messages from qemu output
> after applying this patch to boot up a guest.

Doh, my bug.  Not yet tested, but this should fix things.  Assuming it does, I'll
post a v3 so I can add my SoB.

diff --git a/include/linux/call_once.h b/include/linux/call_once.h
index ddcfd91493ea..b053f4701c94 100644
--- a/include/linux/call_once.h
+++ b/include/linux/call_once.h
@@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int (*cb)(struct once *))
 		return 0;
 
 	guard(mutex)(&once->lock);
-	WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
-	if (atomic_read(&once->state) != ONCE_NOT_STARTED)
+	if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
 		return -EINVAL;
 
+	if (atomic_read(&once->state) == ONCE_COMPLETED)
+		return 0;
+
 	atomic_set(&once->state, ONCE_RUNNING);
 	r = cb(once);
 	if (r)
Subject: Re: [PATCHv3 0/2]
From: Sean Christopherson @ 2025-02-28 14:32 UTC
To: Lei Yang
Cc: Keith Busch, pbonzini, kvm, virtualization, x86, netdev, Keith Busch

On Fri, Feb 28, 2025, Sean Christopherson wrote:
> On Fri, Feb 28, 2025, Lei Yang wrote:
> > Hi Keith
> > 
> > V3 introduced a new bug, the following error messages from qemu output
> > after applying this patch to boot up a guest.
> 
> Doh, my bug.  Not yet tested, but this should fix things.  Assuming it does, I'll
> post a v3 so I can add my SoB.

v4

Confirmed that it worked, but deleting the pre-mutex check for ONCE_COMPLETED.
Will post v4 later today.

> diff --git a/include/linux/call_once.h b/include/linux/call_once.h
> index ddcfd91493ea..b053f4701c94 100644
> --- a/include/linux/call_once.h
> +++ b/include/linux/call_once.h
> @@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int (*cb)(struct once *))
>  		return 0;
>  
>  	guard(mutex)(&once->lock);
> -	WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
> -	if (atomic_read(&once->state) != ONCE_NOT_STARTED)
> +	if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
>  		return -EINVAL;
>  
> +	if (atomic_read(&once->state) == ONCE_COMPLETED)
> +		return 0;
> +
>  	atomic_set(&once->state, ONCE_RUNNING);
>  	r = cb(once);
>  	if (r)
Subject: Re: [PATCHv3 0/2]
From: Keith Busch @ 2025-02-28 14:58 UTC
To: Sean Christopherson
Cc: Lei Yang, Keith Busch, pbonzini, kvm, virtualization, x86, netdev

On Fri, Feb 28, 2025 at 06:32:47AM -0800, Sean Christopherson wrote:
> On Fri, Feb 28, 2025, Sean Christopherson wrote:
> > On Fri, Feb 28, 2025, Lei Yang wrote:
> > > Hi Keith
> > >
> > > V3 introduced a new bug, the following error messages from qemu output
> > > after applying this patch to boot up a guest.
> > 
> > Doh, my bug.  Not yet tested, but this should fix things.  Assuming it does, I'll
> > post a v3 so I can add my SoB.
> 
> v4
> 
> Confirmed that it worked, but deleting the pre-mutex check for ONCE_COMPLETED.
> Will post v4 later today.
> 
> > diff --git a/include/linux/call_once.h b/include/linux/call_once.h
> > index ddcfd91493ea..b053f4701c94 100644
> > --- a/include/linux/call_once.h
> > +++ b/include/linux/call_once.h
> > @@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int (*cb)(struct once *))
> >  		return 0;
> >  
> >  	guard(mutex)(&once->lock);
> > -	WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
> > -	if (atomic_read(&once->state) != ONCE_NOT_STARTED)
> > +	if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
> >  		return -EINVAL;
> >  
> > +	if (atomic_read(&once->state) == ONCE_COMPLETED)
> > +		return 0;
> > +
> >  	atomic_set(&once->state, ONCE_RUNNING);
> >  	r = cb(once);
> >  	if (r)

Possible suggestion since it seems odd to do an atomic_read twice on the
same value.  Maybe make this a switch:

	switch (atomic_read(&once->state)) {
	case ONCE_NOT_STARTED:
		atomic_set(&once->state, ONCE_RUNNING);
		break;
	case ONCE_COMPLETED:
		return 0;
	case ONCE_RUNNING:
	default:
		WARN_ON(1);
		return -EINVAL;
	}
Subject: Re: [PATCHv3 0/2]
From: Sean Christopherson @ 2025-02-28 15:29 UTC
To: Keith Busch
Cc: Lei Yang, Keith Busch, pbonzini, kvm, virtualization, x86, netdev

On Fri, Feb 28, 2025, Keith Busch wrote:
> On Fri, Feb 28, 2025 at 06:32:47AM -0800, Sean Christopherson wrote:
> > > diff --git a/include/linux/call_once.h b/include/linux/call_once.h
> > > index ddcfd91493ea..b053f4701c94 100644
> > > --- a/include/linux/call_once.h
> > > +++ b/include/linux/call_once.h
> > > @@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int (*cb)(struct once *))
> > >  		return 0;
> > >  
> > >  	guard(mutex)(&once->lock);
> > > -	WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
> > > -	if (atomic_read(&once->state) != ONCE_NOT_STARTED)
> > > +	if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
> > >  		return -EINVAL;
> > >  
> > > +	if (atomic_read(&once->state) == ONCE_COMPLETED)
> > > +		return 0;
> > > +
> > >  	atomic_set(&once->state, ONCE_RUNNING);
> > >  	r = cb(once);
> > >  	if (r)
> 
> Possible suggestion since it seems odd to do an atomic_read twice on the
> same value.

Yeah, good call.  At the risk of getting too cute, how about this?

static inline int call_once(struct once *once, int (*cb)(struct once *))
{
        int r, state;

        /* Pairs with atomic_set_release() below. */
        if (atomic_read_acquire(&once->state) == ONCE_COMPLETED)
                return 0;

        guard(mutex)(&once->lock);

        state = atomic_read(&once->state);
        if (unlikely(state != ONCE_NOT_STARTED))
                return WARN_ON_ONCE(state != ONCE_COMPLETED) ? -EINVAL : 0;

        atomic_set(&once->state, ONCE_RUNNING);
        r = cb(once);
        if (r)
                atomic_set(&once->state, ONCE_NOT_STARTED);
        else
                atomic_set_release(&once->state, ONCE_COMPLETED);
        return r;
}
Subject: Re: [PATCHv3 0/2]
From: Keith Busch @ 2025-02-28 15:36 UTC
To: Sean Christopherson
Cc: Lei Yang, Keith Busch, pbonzini, kvm, virtualization, x86, netdev

On Fri, Feb 28, 2025 at 07:29:45AM -0800, Sean Christopherson wrote:
> On Fri, Feb 28, 2025, Keith Busch wrote:
> > On Fri, Feb 28, 2025 at 06:32:47AM -0800, Sean Christopherson wrote:
> > > > @@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int (*cb)(struct once *))
> > > >  		return 0;
> > > >  
> > > >  	guard(mutex)(&once->lock);
> > > > -	WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
> > > > -	if (atomic_read(&once->state) != ONCE_NOT_STARTED)
> > > > +	if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
> > > >  		return -EINVAL;
> > > >  
> > > > +	if (atomic_read(&once->state) == ONCE_COMPLETED)
> > > > +		return 0;
> > > > +
> > > >  	atomic_set(&once->state, ONCE_RUNNING);
> > > >  	r = cb(once);
> > > >  	if (r)
> > 
> > Possible suggestion since it seems odd to do an atomic_read twice on the
> > same value.
> 
> Yeah, good call.  At the risk of getting too cute, how about this?

Sure, that also looks good to me.
Subject: Re: [PATCHv3 0/2]
From: Paolo Bonzini @ 2025-02-28 16:43 UTC
To: Keith Busch, Sean Christopherson
Cc: Lei Yang, Keith Busch, kvm, virtualization, x86, netdev

On 2/28/25 16:36, Keith Busch wrote:
> On Fri, Feb 28, 2025 at 07:29:45AM -0800, Sean Christopherson wrote:
>> On Fri, Feb 28, 2025, Keith Busch wrote:
>>> On Fri, Feb 28, 2025 at 06:32:47AM -0800, Sean Christopherson wrote:
>>>>> @@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int (*cb)(struct once *))
>>>>>  		return 0;
>>>>>  
>>>>>  	guard(mutex)(&once->lock);
>>>>> -	WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
>>>>> -	if (atomic_read(&once->state) != ONCE_NOT_STARTED)
>>>>> +	if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
>>>>> 	 	return -EINVAL;
>>>>>  
>>>>> +	if (atomic_read(&once->state) == ONCE_COMPLETED)
>>>>> +		return 0;
>>>>> +
>>>>>  	atomic_set(&once->state, ONCE_RUNNING);
>>>>>  	r = cb(once);
>>>>>  	if (r)
>>>
>>> Possible suggestion since it seems odd to do an atomic_read twice on the
>>> same value.
>>
>> Yeah, good call.  At the risk of getting too cute, how about this?
> 
> Sure, that also looks good to me.

Just to overthink it a bit more, I'm changing "if (r)" to "if (r < 0)".  Not
because it's particularly useful to return a meaningful nonzero value on the
first initialization, but more because 0+ for success and -errno for failure
is more common.

Queued with this change, thanks.  (Keith, I haven't forgotten about AVX by
the way).

Paolo
Subject: Re: [PATCHv3 0/2]
From: Sean Christopherson @ 2025-02-28 17:03 UTC
To: Paolo Bonzini
Cc: Keith Busch, Lei Yang, Keith Busch, kvm, virtualization, x86, netdev

On Fri, Feb 28, 2025, Paolo Bonzini wrote:
> On 2/28/25 16:36, Keith Busch wrote:
> > On Fri, Feb 28, 2025 at 07:29:45AM -0800, Sean Christopherson wrote:
> > > Yeah, good call.  At the risk of getting too cute, how about this?
> > 
> > Sure, that also looks good to me.
> 
> Just to overthink it a bit more, I'm changing "if (r)" to "if (r < 0)".  Not
> because it's particularly useful to return a meaningful nonzero value on the
> first initialization, but more because 0+ for success and -errno for failure
> is more common.
> 
> Queued with this change, thanks.

If it's not too late, the first patch can/should use ERR_CAST() instead of a
PTR_ERR() => ERR_PTR():

	tsk = copy_process(NULL, 0, NUMA_NO_NODE, &args);
	if (IS_ERR(tsk)) {
		kfree(vtsk);
		return ERR_CAST(tsk);
	}

And I was going to get greedy and replace spaces with tabs in call_once.

The changelog for this patch is also misleading.  KVM_RUN doesn't currently
return -ERESTARTNOINTR, it only ever returns -ENOMEM.  copy_process() is what
returns -ERESTARTNOINTR.  I also think it's worth calling out that it's a
non-fatal signal.

--
From: Sean Christopherson <seanjc@google.com>
Date: Thu, 27 Feb 2025 15:06:31 -0800
Subject: [PATCH] KVM: x86/mmu: Allow retry of nx_huge_page_recovery_thread
 creation

A VMM may send a non-fatal signal to its threads, including vCPU tasks,
at any time, and thus may signal vCPU tasks during KVM_RUN.  If a vCPU
task receives the signal while its trying to spawn the huge page recovery
vhost task, then KVM_RUN will fail due to copy_process() returning
-ERESTARTNOINTR.

Rework call_once() to mark the call complete if and only if the called
function succeeds, and plumb the function's true error code back to the
call_once() invoker.  This provides userspace with the correct, non-fatal
error code so that the VMM doesn't terminate the VM on -ENOMEM, and allows
a subsequent KVM_RUN to succeed by virtue of retrying creation of the NX
huge page task.

Opportunistically replace spaces with tabs in call_once.h.

Fixes: 931656b9e2ff ("kvm: defer huge page recovery vhost task to later")
Co-developed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c    | 10 ++++------
 include/linux/call_once.h | 36 +++++++++++++++++++++---------------
 2 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 70af12b693a3..63bb77ee1bb1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7633,7 +7633,7 @@ static bool kvm_nx_huge_page_recovery_worker(void *data)
 	return true;
 }
 
-static void kvm_mmu_start_lpage_recovery(struct once *once)
+static int kvm_mmu_start_lpage_recovery(struct once *once)
 {
 	struct kvm_arch *ka = container_of(once, struct kvm_arch, nx_once);
 	struct kvm *kvm = container_of(ka, struct kvm, arch);
@@ -7645,12 +7645,13 @@ static void kvm_mmu_start_lpage_recovery(struct once *once)
 			      kvm, "kvm-nx-lpage-recovery");
 
 	if (IS_ERR(nx_thread))
-		return;
+		return PTR_ERR(nx_thread);
 
 	vhost_task_start(nx_thread);
 
 	/* Make the task visible only once it is fully started. */
 	WRITE_ONCE(kvm->arch.nx_huge_page_recovery_thread, nx_thread);
+	return 0;
 }
 
 int kvm_mmu_post_init_vm(struct kvm *kvm)
@@ -7658,10 +7659,7 @@ int kvm_mmu_post_init_vm(struct kvm *kvm)
 	if (nx_hugepage_mitigation_hard_disabled)
 		return 0;
 
-	call_once(&kvm->arch.nx_once, kvm_mmu_start_lpage_recovery);
-	if (!kvm->arch.nx_huge_page_recovery_thread)
-		return -ENOMEM;
-	return 0;
+	return call_once(&kvm->arch.nx_once, kvm_mmu_start_lpage_recovery);
 }
 
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
diff --git a/include/linux/call_once.h b/include/linux/call_once.h
index 6261aa0b3fb0..56cb9625b48b 100644
--- a/include/linux/call_once.h
+++ b/include/linux/call_once.h
@@ -9,15 +9,15 @@
 #define ONCE_COMPLETED 2
 
 struct once {
-        atomic_t state;
-        struct mutex lock;
+	atomic_t state;
+	struct mutex lock;
 };
 
 static inline void __once_init(struct once *once, const char *name,
			       struct lock_class_key *key)
 {
-        atomic_set(&once->state, ONCE_NOT_STARTED);
-        __mutex_init(&once->lock, name, key);
+	atomic_set(&once->state, ONCE_NOT_STARTED);
+	__mutex_init(&once->lock, name, key);
 }
 
 #define once_init(once)							\
@@ -26,20 +26,26 @@ do {									\
 	__once_init((once), #once, &__key);				\
 } while (0)
 
-static inline void call_once(struct once *once, void (*cb)(struct once *))
+static inline int call_once(struct once *once, int (*cb)(struct once *))
 {
-        /* Pairs with atomic_set_release() below. */
-        if (atomic_read_acquire(&once->state) == ONCE_COMPLETED)
-                return;
+	int r, state;
 
-        guard(mutex)(&once->lock);
-        WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
-        if (atomic_read(&once->state) != ONCE_NOT_STARTED)
-                return;
+	/* Pairs with atomic_set_release() below. */
+	if (atomic_read_acquire(&once->state) == ONCE_COMPLETED)
+		return 0;
 
-        atomic_set(&once->state, ONCE_RUNNING);
-        cb(once);
-        atomic_set_release(&once->state, ONCE_COMPLETED);
+	guard(mutex)(&once->lock);
+	state = atomic_read(&once->state);
+	if (unlikely(state != ONCE_NOT_STARTED))
+		return WARN_ON_ONCE(state != ONCE_COMPLETED) ? -EINVAL : 0;
+
+	atomic_set(&once->state, ONCE_RUNNING);
+	r = cb(once);
+	if (r)
+		atomic_set(&once->state, ONCE_NOT_STARTED);
+	else
+		atomic_set_release(&once->state, ONCE_COMPLETED);
+	return r;
 }
 
 #endif /* _LINUX_CALL_ONCE_H */

base-commit: 7d2322b0472eb402e8a206ba9b332b6c75f6f130
--
Subject: Re: [PATCHv3 0/2]
From: Keith Busch @ 2025-07-08 22:17 UTC
To: Paolo Bonzini
Cc: Sean Christopherson, Lei Yang, Keith Busch, kvm, virtualization, x86, netdev

On Fri, Feb 28, 2025 at 05:43:43PM +0100, Paolo Bonzini wrote:
> (Keith, I haven't forgotten about AVX by the way).

Hey, how's that going by the way? :)

Just checking in as I'm still having to carry this part out of tree.