All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Keith Busch <kbusch@meta.com>
Cc: kvm@vger.kernel.org, x86@kernel.org,
	linux-kernel@vger.kernel.org,  Vlad Poenaru <thevlad@meta.com>,
	tj@kernel.org, Keith Busch <kbusch@kernel.org>,
	 Paolo Bonzini <pbonzini@redhat.com>, Alyssa Ross <hi@alyssa.is>
Subject: Re: [PATCH] kvm: defer huge page recovery vhost task to later
Date: Fri, 24 Jan 2025 12:07:24 -0800	[thread overview]
Message-ID: <Z5Py_JYc8nYHNgZS@google.com> (raw)
In-Reply-To: <20250123153543.2769928-1-kbusch@meta.com>

On Thu, Jan 23, 2025, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Some libraries want to ensure they are single threaded before forking,
> so making the kernel's kvm huge page recovery process a vhost task of
> the user process breaks those. The minijail library used by crosvm is
> one such affected application.
> 
> Defer the task to after the first VM_RUN call, which occurs after the
> parent process has forked all its jailed processes. This needs to happen
> only once for the kvm instance, so this patch introduces infrastructure
> to do that (Suggested-by Paolo).
> 
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Tested-by: Alyssa Ross <hi@alyssa.is>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 26b4ba7e7cb5e..a45ae60e84ab4 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -7447,20 +7447,28 @@ static bool kvm_nx_huge_page_recovery_worker(void *data)
>  	return true;
>  }
>  
> -int kvm_mmu_post_init_vm(struct kvm *kvm)
> +static void kvm_mmu_start_lpage_recovery(struct once *once)
>  {
> -	if (nx_hugepage_mitigation_hard_disabled)
> -		return 0;
> +	struct kvm_arch *ka = container_of(once, struct kvm_arch, nx_once);
> +	struct kvm *kvm = container_of(ka, struct kvm, arch);
>  
>  	kvm->arch.nx_huge_page_last = get_jiffies_64();
>  	kvm->arch.nx_huge_page_recovery_thread = vhost_task_create(
>  		kvm_nx_huge_page_recovery_worker, kvm_nx_huge_page_recovery_worker_kill,
>  		kvm, "kvm-nx-lpage-recovery");
>  
> +	if (kvm->arch.nx_huge_page_recovery_thread)
> +		vhost_task_start(kvm->arch.nx_huge_page_recovery_thread);
> +}
> +
> +int kvm_mmu_post_init_vm(struct kvm *kvm)
> +{
> +	if (nx_hugepage_mitigation_hard_disabled)
> +		return 0;
> +
> +	call_once(&kvm->arch.nx_once, kvm_mmu_start_lpage_recovery);
>  	if (!kvm->arch.nx_huge_page_recovery_thread)
>  		return -ENOMEM;
> -
> -	vhost_task_start(kvm->arch.nx_huge_page_recovery_thread);
>  	return 0;
>  }
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6e248152fa134..6d4a6734b2d69 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11471,6 +11471,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  	struct kvm_run *kvm_run = vcpu->run;
>  	int r;
>  
> +	r = kvm_mmu_post_init_vm(vcpu->kvm);
> +	if (r)
> +		return r;

This is broken.  If the module param is toggled before the first KVM_RUN, KVM
will hit a NULL pointer deref due to trying to start a non-existent vhost task:

  BUG: kernel NULL pointer dereference, address: 0000000000000040
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0 
  Oops: Oops: 0000 [#1] SMP
  CPU: 16 UID: 0 PID: 1190 Comm: bash Not tainted 6.13.0-rc3-9bb02e874121-x86/xen_msr_fixes-vm #2382
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:vhost_task_wake+0x5/0x10
  Call Trace:
   <TASK>
   set_nx_huge_pages+0xcc/0x1e0 [kvm]
   param_attr_store+0x8a/0xd0
   module_attr_store+0x1a/0x30
   kernfs_fop_write_iter+0x12f/0x1e0
   vfs_write+0x233/0x3e0
   ksys_write+0x60/0xd0
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f3b52710104
   </TASK>
  Modules linked in: kvm_intel kvm
  CR2: 0000000000000040
  ---[ end trace 0000000000000000 ]---
 

  parent reply	other threads:[~2025-01-24 20:07 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-23 15:35 [PATCH] kvm: defer huge page recovery vhost task to later Keith Busch
2025-01-24 15:28 ` Paolo Bonzini
2025-01-24 16:48   ` Keith Busch
2025-01-24 20:07 ` Sean Christopherson [this message]
2025-01-24 20:54   ` Keith Busch
2025-01-25  0:10     ` Sean Christopherson
2025-01-25  4:05       ` Keith Busch
  -- strict thread matches above, loose matches on Subject: below --
2025-01-14 18:22 Keith Busch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z5Py_JYc8nYHNgZS@google.com \
    --to=seanjc@google.com \
    --cc=hi@alyssa.is \
    --cc=kbusch@kernel.org \
    --cc=kbusch@meta.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=thevlad@meta.com \
    --cc=tj@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.