public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Yan Zhao <yan.y.zhao@intel.com>
To: Sean Christopherson <seanjc@google.com>
Cc: <pbonzini@redhat.com>, <rick.p.edgecombe@intel.com>,
	<kai.huang@intel.com>, <isaku.yamahata@intel.com>,
	<dmatlack@google.com>, <sagis@google.com>,
	<erdemaktas@google.com>, <graf@amazon.com>,
	<linux-kernel@vger.kernel.org>, <kvm@vger.kernel.org>
Subject: Re: [PATCH v2 1/4] KVM: x86/mmu: Introduce a quirk to control memslot zap behavior
Date: Tue, 24 Sep 2024 08:58:11 +0800	[thread overview]
Message-ID: <ZvIOox8CncED/gSL@yzhao56-desk.sh.intel.com> (raw)
In-Reply-To: <ZvG1Wki4GvIyVWqB@google.com>

On Mon, Sep 23, 2024 at 11:37:14AM -0700, Sean Christopherson wrote:
> On Wed, Jul 03, 2024, Yan Zhao wrote:
> > Introduce the quirk KVM_X86_QUIRK_SLOT_ZAP_ALL to allow users to select
> > KVM's behavior when a memslot is moved or deleted for KVM_X86_DEFAULT_VM
> > VMs. Make sure KVM behave as if the quirk is always disabled for
> > non-KVM_X86_DEFAULT_VM VMs.
>  
> ...
> 
> > Suggested-by: Kai Huang <kai.huang@intel.com>
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> 
> Bad Sean, bad.
> 
> > +/*
> > + * Zapping leaf SPTEs with memslot range when a memslot is moved/deleted.
> > + *
> > + * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
> > + * case scenario we'll have unused shadow pages lying around until they
> > + * are recycled due to age or when the VM is destroyed.
> > + */
> > +static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *slot)
> > +{
> > +	struct kvm_gfn_range range = {
> > +		.slot = slot,
> > +		.start = slot->base_gfn,
> > +		.end = slot->base_gfn + slot->npages,
> > +		.may_block = true,
> > +	};
> > +	bool flush = false;
> > +
> > +	write_lock(&kvm->mmu_lock);
> > +
> > +	if (kvm_memslots_have_rmaps(kvm))
> > +		flush = kvm_handle_gfn_range(kvm, &range, kvm_zap_rmap);
> 
> This, and Paolo's merged variant, break shadow paging.  As was tried in commit
> 4e103134b862 ("KVM: x86/mmu: Zap only the relevant pages when removing a memslot"),
> all shadow pages, i.e. non-leaf SPTEs, need to be zapped.  All of the accounting
> for a shadow page is tied to the memslot, i.e. the shadow page holds a reference
> to the memslot, for all intents and purposes.  Deleting the memslot without removing
> all relevant shadow pages results in NULL pointer derefs when tearing down the VM.
> 
> Note, that commit is/was buggy, and I suspect my follow-up attempt[*] was as well.
> https://lore.kernel.org/all/20190820200318.GA15808@linux.intel.com
> 
> Rather than trying to get this functional for shadow paging (which includes nested
> TDP), I think we should scrap the quirk idea and simply make this the behavior for
> S-EPT and nothing else.
Ok. Thanks for identifying this error. Will change code to this way.
BTW: update some findings regarding to the previous bug with Nvidia GPU
assignment:
I found that after v5.19-rc1+, even with nx_huge_pages=N, the bug is not
reproducible when only leaf entries of memslot are zapped.
(no more detailed info due to limited time to debug).


> 
>  BUG: kernel NULL pointer dereference, address: 00000000000000b0
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x0000) - not-present page
>  PGD 6085f43067 P4D 608c080067 PUD 608c081067 PMD 0 
>  Oops: Oops: 0000 [#1] SMP NOPTI
>  CPU: 79 UID: 0 PID: 187063 Comm: set_memory_regi Tainted: G        W          6.11.0-smp--24867312d167-cpl #395
>  Tainted: [W]=WARN
>  Hardware name: Google Astoria/astoria, BIOS 0.20240617.0-0 06/17/2024
>  RIP: 0010:__kvm_mmu_prepare_zap_page+0x3a9/0x7b0 [kvm]
>  Code:  <48> 8b 8e b0 00 00 00 48 8b 96 e0 00 00 00 48 c1 e9 09 48 29 c8 8b
>  RSP: 0018:ff314a25b19f7c28 EFLAGS: 00010212
>  Call Trace:
>   <TASK>
>   kvm_arch_flush_shadow_all+0x7a/0xf0 [kvm]
>   kvm_mmu_notifier_release+0x6c/0xb0 [kvm]
>   mmu_notifier_unregister+0x85/0x140
>   kvm_put_kvm+0x263/0x410 [kvm]
>   kvm_vm_release+0x21/0x30 [kvm]
>   __fput+0x8d/0x2c0
>   __se_sys_close+0x71/0xc0
>   do_syscall_64+0x83/0x160
>   entry_SYSCALL_64_after_hwframe+0x76/0x7e

  reply	other threads:[~2024-09-24  1:00 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-03  2:09 [PATCH v2 0/4] Introduce a quirk to control memslot zap behavior Yan Zhao
2024-07-03  2:10 ` [PATCH v2 1/4] KVM: x86/mmu: " Yan Zhao
2024-09-23 18:37   ` Sean Christopherson
2024-09-24  0:58     ` Yan Zhao [this message]
2024-09-24  7:45       ` Sean Christopherson
2024-09-24  8:37         ` Yan Zhao
2024-11-26  9:15       ` Yan Zhao
2024-07-03  2:11 ` [PATCH v2 2/4] KVM: selftests: Test slot move/delete with slot zap quirk enabled/disabled Yan Zhao
2024-07-03  2:12 ` [PATCH v2 3/4] KVM: selftests: Allow slot modification stress test with quirk disabled Yan Zhao
2024-09-24 12:26   ` Aishwarya TCV
2024-09-25  0:42     ` Yan Zhao
2024-09-30 18:15       ` Mark Brown
2024-07-03  2:12 ` [PATCH v2 4/4] KVM: selftests: Test memslot move in memslot_perf_test " Yan Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZvIOox8CncED/gSL@yzhao56-desk.sh.intel.com \
    --to=yan.y.zhao@intel.com \
    --cc=dmatlack@google.com \
    --cc=erdemaktas@google.com \
    --cc=graf@amazon.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kai.huang@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=sagis@google.com \
    --cc=seanjc@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox