From: bugzilla-daemon@kernel.org
To: kvm@vger.kernel.org
Subject: [Bug 219588] [6.13.0-rc2+]WARNING: CPU: 52 PID: 12253 at arch/x86/kvm/mmu/tdp_mmu.c:1001 tdp_mmu_map_handle_target_level+0x1f0/0x310 [kvm]
Date: Mon, 16 Dec 2024 05:42:30 +0000
Message-ID: <bug-219588-28872-3UuJWx01n1@https.bugzilla.kernel.org/>
In-Reply-To: <bug-219588-28872@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=219588
--- Comment #2 from leiyang@redhat.com ---
(In reply to Sean Christopherson from comment #1)
> On Wed, Dec 11, 2024, bugzilla-daemon@kernel.org wrote:
> > I hit a bug on the intel host, this problem occurs randomly:
> > [ 406.127925] ------------[ cut here ]------------
> > [ 406.132572] WARNING: CPU: 52 PID: 12253 at arch/x86/kvm/mmu/tdp_mmu.c:1001 tdp_mmu_map_handle_target_level+0x1f0/0x310 [kvm]
>
> Can you describe the host activity at the time of the WARN? E.g. is it under
> memory pressure and potentially swapping, is KSM or NUMA balancing active? I
> have a sound theory for how the scenario occurs on KVM's end, but I still
> think it's wrong for KVM to overwrite a writable SPTE with a read-only SPTE
> in this situation.
Sorry for the late reply; I spent some time investigating this problem. The
host dmesg prints this error message while an automated test is installing a
new guest. I found that the reproducer is to run this install case the first
time the machine runs this kernel. To avoid ambiguity:
1. It must be tested when the machine is running this kernel for the first
time. That means if I hit the problem and then reboot the host, I cannot
reproduce it again even if I run the same tests.
2. Given 1, I must also run this guest installation case; it cannot be
reproduced with other cases.
Compared with the other cases, the only difference is that this installation
case uses a PXE install based on an internal KS cfg.
Sure, I think it's running under memory pressure and swapping. Based on the
automation log, KSM is disabled and I didn't add NUMA to the QEMU command line.
If you have a machine that can clone avocado and run tp-qemu tests, you can
prepare the environment and then run this case:
unattended_install.cdrom.extra_cdrom_ks.default_install.aio_threads
>
> And does the VM have device memory or any other type of VM_PFNMAP or VM_IO
> memory exposed to it? E.g. an assigned device? If so, can you provide the
> register state from the other WARNs? If the PFNs are all in the same range,
> then maybe this is something funky with the VM_PFNMAP | VM_IO path.
I can confirm it has VM_IO, because it is running an installation case and the
VM is constantly performing I/O operations.
These are the memory and CPU settings my tests used; I hope they help you debug
this problem:
-m 29G \
-smp 48,maxcpus=48,cores=24,threads=1,dies=1,sockets=2 \
It looks like there are no other devices and VM_PFNMAP is not used. Please
correct me if I'm wrong.
>
> The WARN is a sanity check I added because it should be impossible for KVM to
> install a non-writable SPTE overtop an existing writable SPTE. Or so I
> thought. The WARN is benign in the sense that nothing bad will happen _in
> KVM_; KVM correctly handles the unexpected change, the WARN is there purely
> to flag that something unexpected happened.
>
>         if (new_spte == iter->old_spte)
>                 ret = RET_PF_SPURIOUS;
>         else if (tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte))
>                 return RET_PF_RETRY;
>         else if (is_shadow_present_pte(iter->old_spte) &&
>                  (!is_last_spte(iter->old_spte, iter->level) ||
>                   WARN_ON_ONCE(leaf_spte_change_needs_tlb_flush(iter->old_spte, new_spte))))  <====
>                 kvm_flush_remote_tlbs_gfn(vcpu->kvm, iter->gfn, iter->level);
>
> Cross referencing the register state
>
> RAX: 860000025e000bf7 RBX: ff4af92c619cf920 RCX: 0400000000000000
> RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000015
> RBP: ff4af92c619cf9e8 R08: 800000025e0009f5 R09: 0000000000000002
> R10: 000000005e000901 R11: 0000000000000001 R12: ff1e70694fc68000
> R13: 0000000000000005 R14: 0000000000000000 R15: ff4af92c619a1000
>
> with the disassembly
>
> 4885C8 TEST RAX,RCX
> 0F84EEFEFFFF JE 0000000000000-F1
> 4985C8 TEST R8,RCX
> 0F85E5FEFFFF JNE 0000000000000-F1
> 0F0B UD2
>
> RAX is the old SPTE and R8 is the new SPTE (RCX holds the bit 58 mask that
> both are tested against), i.e. the SPTE change is:
>
> 860000025e000bf7
> 800000025e0009f5
>
> On Intel, bits 57 and 58 are the host-writable and MMU-writable flags
>
> #define EPT_SPTE_HOST_WRITABLE BIT_ULL(57)
> #define EPT_SPTE_MMU_WRITABLE BIT_ULL(58)
>
> which means KVM is overwriting a writable SPTE with a non-writable SPTE
> because the current vCPU (a) hit a READ or EXEC fault on a non-present SPTE
> and (b) retrieved a non-writable PFN from the primary MMU, and that fault
> raced with a WRITE fault on a different vCPU that retrieved and installed a
> writable PFN.
>
> On a READ or EXEC fault, this code in hva_to_pfn_slow() should get a
> writable PFN. Given that KVM has a valid writable SPTE, the corresponding
> PTE in the primary MMU *must* be writable, otherwise there's a missing
> mmu_notifier invalidation.
>
>         /* map read fault as writable if possible */
>         if (!(flags & FOLL_WRITE) && kfp->map_writable &&
>             get_user_page_fast_only(kfp->hva, FOLL_WRITE, &wpage)) {
>                 put_page(page);
>                 page = wpage;
>                 flags |= FOLL_WRITE;
>         }
>
> out:
>         *pfn = kvm_resolve_pfn(kfp, page, NULL, flags & FOLL_WRITE);
>         return npages;
>
> Hmm, gup_fast_folio_allowed() has a few conditions where it will reject fast
> GUP, but they should be mutually exclusive with KVM having a writable SPTE.
> If the mapping is truncated or the folio is swapped out, secondary MMUs need
> to be invalidated before folio->mapping is nullified.
>
>         /*
>          * The mapping may have been truncated, in any case we cannot determine
>          * if this mapping is safe - fall back to slow path to determine how to
>          * proceed.
>          */
>         if (!mapping)
>                 return false;
>
> And secretmem can't be GUP'd, and it's not a long-term pin, so these checks
> don't apply either:
>
>         if (check_secretmem && secretmem_mapping(mapping))
>                 return false;
>         /* The only remaining allowed file system is shmem. */
>         return !reject_file_backed || shmem_mapping(mapping);
>
> Similarly, hva_to_pfn_remapped() should get a writable PFN if said PFN is
> writable in the primary MMU, regardless of the fault type.
>
> If this turns out to be a legitimate scenario, then I think it makes sense
> to add an is_access_allowed() check and treat the fault as spurious. But I
> would like to try to bottom out on what exactly is happening, because I'm
> mildly concerned something is buggy in the primary MMU.
If you need more info, please feel free to let me know. And if you send a
patch to fix this problem, I can help verify it, since I think I have found a
stable reproducer.
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.