Date: Fri, 24 Jan 2025 12:07:24 -0800
In-Reply-To: <20250123153543.2769928-1-kbusch@meta.com>
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20250123153543.2769928-1-kbusch@meta.com>
Subject: Re: [PATCH] kvm: defer huge page recovery vhost task to later
From: Sean Christopherson
To: Keith Busch
Cc: kvm@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, Vlad Poenaru, tj@kernel.org, Keith Busch, Paolo Bonzini, Alyssa Ross

On Thu, Jan 23, 2025, Keith Busch wrote:
> From: Keith Busch
>
> Some libraries want to ensure they are single threaded before forking,
> so making the kernel's kvm huge page recovery process a vhost task of
> the user process breaks those. The minijail library used by crosvm is
> one such affected application.
>
> Defer the task to after the first VM_RUN call, which occurs after the
> parent process has forked all its jailed processes. This needs to happen
> only once for the kvm instance, so this patch introduces infrastructure
> to do that (Suggested-by Paolo).
>
> Cc: Sean Christopherson
> Cc: Paolo Bonzini
> Tested-by: Alyssa Ross
> Signed-off-by: Keith Busch
> ---
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 26b4ba7e7cb5e..a45ae60e84ab4 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -7447,20 +7447,28 @@ static bool kvm_nx_huge_page_recovery_worker(void *data)
>  	return true;
>  }
>  
> -int kvm_mmu_post_init_vm(struct kvm *kvm)
> +static void kvm_mmu_start_lpage_recovery(struct once *once)
>  {
> -	if (nx_hugepage_mitigation_hard_disabled)
> -		return 0;
> +	struct kvm_arch *ka = container_of(once, struct kvm_arch, nx_once);
> +	struct kvm *kvm = container_of(ka, struct kvm, arch);
>  
>  	kvm->arch.nx_huge_page_last = get_jiffies_64();
>  	kvm->arch.nx_huge_page_recovery_thread = vhost_task_create(
>  		kvm_nx_huge_page_recovery_worker, kvm_nx_huge_page_recovery_worker_kill,
>  		kvm, "kvm-nx-lpage-recovery");
>  
> +	if (kvm->arch.nx_huge_page_recovery_thread)
> +		vhost_task_start(kvm->arch.nx_huge_page_recovery_thread);
> +}
> +
> +int kvm_mmu_post_init_vm(struct kvm *kvm)
> +{
> +	if (nx_hugepage_mitigation_hard_disabled)
> +		return 0;
> +
> +	call_once(&kvm->arch.nx_once, kvm_mmu_start_lpage_recovery);
>  	if (!kvm->arch.nx_huge_page_recovery_thread)
>  		return -ENOMEM;
> -
> -	vhost_task_start(kvm->arch.nx_huge_page_recovery_thread);
>  	return 0;
>  }
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6e248152fa134..6d4a6734b2d69 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11471,6 +11471,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  	struct kvm_run *kvm_run = vcpu->run;
>  	int r;
>  
> +	r = kvm_mmu_post_init_vm(vcpu->kvm);
> +	if (r)
> +		return r;

This is broken.
If the module param is toggled before the first KVM_RUN, KVM will hit a NULL
pointer deref due to trying to start a non-existent vhost task:

  BUG: kernel NULL pointer dereference, address: 0000000000000040
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: Oops: 0000 [#1] SMP
  CPU: 16 UID: 0 PID: 1190 Comm: bash Not tainted 6.13.0-rc3-9bb02e874121-x86/xen_msr_fixes-vm #2382
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:vhost_task_wake+0x5/0x10
  Call Trace:
   set_nx_huge_pages+0xcc/0x1e0 [kvm]
   param_attr_store+0x8a/0xd0
   module_attr_store+0x1a/0x30
   kernfs_fop_write_iter+0x12f/0x1e0
   vfs_write+0x233/0x3e0
   ksys_write+0x60/0xd0
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f3b52710104
  Modules linked in: kvm_intel kvm
  CR2: 0000000000000040
  ---[ end trace 0000000000000000 ]---
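
For illustration only, here is a minimal userspace sketch of the failure mode
described above: the recovery task pointer stays NULL until the first run, so
a param write that arrives earlier must not touch it. Every name here
(struct vm, call_once_model(), vm_run(), param_toggle()) is hypothetical and
single-threaded. It is neither KVM's code nor the kernel's call_once(), and
the NULL check is only one possible guard, not necessarily the fix that gets
merged.

```c
/* Userspace model only: all names below are hypothetical, not KVM's. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct once { bool done; };

struct task { int wakeups; };

struct vm {
	struct once nx_once;
	struct task *recovery_task;	/* NULL until the first "run" */
	struct task task_storage;	/* stands in for a created task */
};

/* Single-threaded stand-in for a call-once helper; no locking here. */
static void call_once_model(struct once *once, void (*cb)(struct once *))
{
	if (once->done)
		return;
	cb(once);
	once->done = true;
}

/* Callback: find the owning vm from its embedded once, then "create" the task. */
static void start_recovery(struct once *once)
{
	struct vm *vm = container_of(once, struct vm, nx_once);

	vm->task_storage.wakeups = 0;
	vm->recovery_task = &vm->task_storage;
}

/* Models the first-run hook: the task is set up lazily, exactly once. */
static int vm_run(struct vm *vm)
{
	call_once_model(&vm->nx_once, start_recovery);
	return vm->recovery_task ? 0 : -1;
}

/*
 * Models the param write path. Returns true if a task was woken.
 * Without the NULL check, a write that arrives before vm_run() would
 * dereference a NULL task pointer.
 */
static bool param_toggle(struct vm *vm)
{
	if (!vm->recovery_task)
		return false;
	vm->recovery_task->wakeups++;
	return true;
}
```

With a zero-initialized struct vm, param_toggle() before vm_run() returns
false instead of crashing, and repeated vm_run() calls set the task up only
once.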