From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	 Isaku Yamahata <isaku.yamahata@intel.com>
Subject: Re: [PATCH 7/7] KVM: VMX: Introduce test mode related to EPT violation VE
Date: Thu, 16 May 2024 18:40:02 -0700
Message-ID: <Zka1cub00xu37mHP@google.com>
In-Reply-To: <ZkVHh49Hn8gB3_9o@google.com>

On Wed, May 15, 2024, Sean Christopherson wrote:
> On Tue, May 07, 2024, Paolo Bonzini wrote:
> > @@ -5200,6 +5215,9 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
> >  	if (is_invalid_opcode(intr_info))
> >  		return handle_ud(vcpu);
> >  
> > +	if (KVM_BUG_ON(is_ve_fault(intr_info), vcpu->kvm))
> > +		return -EIO;
> 
> I've hit this three times now when running KVM-Unit-Tests (I'm pretty sure it's
> the EPT test, unsurprisingly).  And unless I screwed up my testing, I verified it
> still fires with Isaku's fix[*], though I'm suddenly having problems repro'ing.
> 
> I'll update tomorrow as to whether I botched my testing of Isaku's fix, or if
> there's another bug lurking.

*sigh*

AFAICT, I'm hitting a hardware issue.  The #VE occurs when the CPU does an A/D
assist on an entry in the L2's PML4 (L2 GPA 0x109fff8).  EPT A/D bits are disabled,
and KVM has write-protected the GPA (hooray for shadowing EPT entries).  The CPU
tries to write the PML4 entry to do the A/D assist and generates what appears to
be a spurious #VE.
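
For anyone following along, here is a minimal sketch of how the exit
qualification recorded in the #VE info area (0x6ab, third field of the ve_info
line in the VMCS dump in this mail) lines up with that description.  The bit
positions are the architectural EPT-violation qualification bits as I
understand them from the SDM; the names and helpers are made up for
illustration and are not KVM's macros, and bits 9 and 10 are deliberately left
undecoded.

```python
# Architectural EPT-violation exit qualification bits (per the Intel SDM).
# The names are illustrative only, not KVM identifiers.
QUAL_BITS = {
    0: "data read",
    1: "data write",
    2: "instruction fetch",
    3: "GPA readable",
    4: "GPA writable",
    5: "GPA executable",
    7: "guest linear address valid",
    8: "access to final translation",
}


def decode_ept_qualification(qual):
    """Return the names of the decoded bits that are set in qual."""
    return [name for bit, name in QUAL_BITS.items() if qual & (1 << bit)]


def is_paging_structure_access(qual):
    """True if the access was to a guest paging-structure entry.

    Bit 8 is only defined when bit 7 (linear address valid) is set; bit 8
    clear means the access was part of a page walk or an A/D update.
    """
    return bool(qual & (1 << 7)) and not qual & (1 << 8)


flags = decode_ept_qualification(0x6ab)
print(flags)
print("paging-structure access:", is_paging_structure_access(0x6ab))
```

Under those assumptions, 0x6ab decodes to a read+write access to a GPA that is
readable and executable but not writable, with a valid linear address and bit 8
clear, i.e. a write to a paging-structure entry during the walk, which is
consistent with an A/D assist hitting a write-protected page.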

Isaku, please forward this to the necessary folks at Intel.  I doubt whatever
is broken will block TDX, but it would be nice to get a root cause so we at least
know whether or not TDX is a ticking time bomb.

A branch with fixes (nested support for PROVE_VE is broken) and debug hooks can
be found here:

  https://github.com/sean-jc/linux vmx/prove_ve_fixes

The failing KUT is nVMX's ept_access_test_not_present.  It is 100% reproducible
on my system (in isolation and in sequence).

  ./x86/run x86/vmx.flat -smp 1 -cpu max,host-phys-bits,+vmx -m 2560 -append ept_access_test_not_present

I ruled out KVM TLB flushing bugs by doing a full INVEPT before every entry to L2.

I (more or less) ruled out KVM SPTE bugs by printing the failing translation
before every entry to L2, and adding KVM_MMU_WARN_ON() checks on the paths that
write SPTEs to assert that the SPTE value won't generate a #VE.

I ruled out a completely bogus EPT Violation by simply resuming the guest without
clearing the #VE info's busy field, and verifying by tracepoints that the same
EPT violation occurs (and gets fixed by KVM).

Unless I botched the SPTE printing, which doesn't seem to be the case as the
printed SPTEs match KVM's tracepoints, I'm out of ideas.

Basic system info:

  processor       : 1
  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 106
  model name      : Intel(R) Xeon(R) Platinum 8373C CPU @ 2.60GHz
  stepping        : 6
  microcode       : 0xd0003b9
  cpu MHz         : 2600.000
  cache size      : 55296 KB
  physical id     : 0
  siblings        : 72
  core id         : 1
  cpu cores       : 36
  address sizes   : 46 bits physical, 57 bits virtual

Relevant addresses printed from the test:

  PTE[4] @ 109fff8 = 9fed0007
  PTE[3] @ 9fed0ff0 = 9fed1007
  PTE[2] @ 9fed1000 = 9fed2007
  VA PTE @ 9fed2000 = 8000000007
  Created EPT @ 9feca008 = 11d2007
  Created EPT @ 11d2000 = 11d3007
  Created EPT @ 11d3000 = 11d4007
  L1 hva = 40000000, hpa = 40000000, L2 gva = ffffffff80000000, gpa = 8000000000
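
As a sanity check on the addresses above, the following sketch redoes L2's
4-level walk for gva 0xffffffff80000000 by hand.  It assumes 4-level paging
with 8-byte entries, takes the root from the guest CR3 in the VMCS dump
(0x109f000), and uses the entry values printed by the test; the helper names
are hypothetical.

```python
ENTRY_SIZE = 8                   # bytes per paging-structure entry
ADDR_MASK = 0x000ffffffffff000   # bits 51:12 hold the next table's address


def table_index(va, level):
    """Index into the table at the given level (4 = PML4 ... 1 = PT)."""
    return (va >> (12 + 9 * (level - 1))) & 0x1ff


def walk(cr3, va, entries):
    """Return (addresses of the entries consulted, final page address).

    entries holds the entry values read at levels 4, 3, 2, 1, in that order.
    """
    base = cr3 & ADDR_MASK
    addrs = []
    for level, entry in zip((4, 3, 2, 1), entries):
        addrs.append(base + table_index(va, level) * ENTRY_SIZE)
        base = entry & ADDR_MASK
    return addrs, base


# Entry values as printed by the test: PTE[4], PTE[3], PTE[2], VA PTE.
addrs, gpa = walk(0x109f000, 0xffffffff80000000,
                  [0x9fed0007, 0x9fed1007, 0x9fed2007, 0x8000000007])
print([hex(a) for a in addrs], hex(gpa))
```

The PML4 index for this gva is 511, so the first entry consulted sits at
0x109f000 + 511 * 8 = 0x109fff8, which is the GPA that takes the #VE.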

And the splat from KVM, with extra printing of the exploding translation, and a
dump of the VMCS.

  kvm: VM-Enter 109fff8, spte[4] = 0x8000000000000000
  kvm: VM-Enter 109fff8, spte[4] = 0x80100040becb6807, spte[3] = 0x80100040911ed807, spte[2] = 0x86100040f9e008f7
  kvm: VM-Enter 109fff8, spte[4] = 0x80100040becb6807, spte[3] = 0x80100040911ed807, spte[2] = 0x86100040f9e008f7
  kvm: VM-Enter 109fff8, spte[4] = 0x80100040becb6807, spte[3] = 0x80100040911ed807, spte[2] = 0x86100040f9e008f7
  kvm: VM-Enter 109fff8, spte[4] = 0x80100040becb6807, spte[3] = 0x80100040911ed807, spte[2] = 0x86100040f9e008f7
  kvm: VM-Enter 109fff8, spte[4] = 0x80100040becb6807, spte[3] = 0x80100040911ed807, spte[2] = 0x86100040f9e008f7
  kvm: VM-Enter 109fff8, spte[4] = 0x80100040becb6807, spte[3] = 0x80100040911ed807, spte[2] = 0x86100040f9e008f7
  kvm: VM-Enter 109fff8, spte[4] = 0x80100040becb6807, spte[3] = 0x80100040911ed807, spte[2] = 0x86100040f9e008f7
  kvm: VM-Enter 109fff8, spte[4] = 0x80100040becb6807, spte[3] = 0x80100040911ed807, spte[2] = 0x86100040f9e008f7
  kvm: VM-Enter 109fff8, spte[4] = 0x80100040becb6807, spte[3] = 0x80100040911ed807, spte[2] = 0x86100040f9e008f7
  kvm: VM-Enter 109fff8, spte[4] = 0x80100040becb6807, spte[3] = 0x80100040911ed807, spte[2] = 0x86100040f9e008f7
  kvm: VM-Enter 109fff8, spte[4] = 0x80100040becb6807, spte[3] = 0x80100040911ed807, spte[2] = 0x82100040f9e008f5

  ------------[ cut here ]------------
  WARNING: CPU: 93 PID: 16309 at arch/x86/kvm/vmx/vmx.c:5217 handle_exception_nmi+0x418/0x5d0 [kvm_intel]
  Modules linked in: kvm_intel kvm vfat fat dummy bridge stp llc spidev cdc_ncm cdc_eem cdc_ether usbnet mii xhci_pci xhci_hcd ehci_pci ehci_hcd gq(O) sha3_generic [last unloaded: kvm]
  CPU: 93 PID: 16309 Comm: qemu Tainted: G S      W  O       6.9.0-smp--317ea923d74d-vmenter #319
  Hardware name: Google Interlaken/interlaken, BIOS 0.20231025.0-0 10/25/2023
  RIP: 0010:handle_exception_nmi+0x418/0x5d0 [kvm_intel]
  Code: 48 89 75 c8 44 0f 79 75 c8 2e 0f 86 bf 01 00 00 48 89 df be 01 00 00 00 4c 89 fa e8 f2 78 ed ff b8 01 00 00 00 e9 74 ff ff ff <0f> 0b 4c 8b b3 b8 22 00 00 41 8b 36 83 fe 30 74 09 f6 05 5a ac 01
  RSP: 0018:ff3c22846acebb38 EFLAGS: 00010246
  RAX: 0000000000000001 RBX: ff3c2284dff2c580 RCX: ff3c22845cba9000
  RDX: 4813020000000002 RSI: 0000000000000000 RDI: ff3c2284dff2c580
  RBP: ff3c22846acebb70 R08: ff3c2284a3b3a180 R09: 0000000000000001
  R10: 0000000000000005 R11: ffffffffc0978d80 R12: 0000000080000300
  R13: 0000000000000000 R14: 0000000080000314 R15: 0000000080000314
  FS:  00007fc71fc006c0(0000) GS:ff3c22c2bf880000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 000000012c9fc005 CR4: 0000000000773ef0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   <TASK>
   vmx_handle_exit+0x565/0x7e0 [kvm_intel]
   vcpu_run+0x188b/0x22b0 [kvm]
   kvm_arch_vcpu_ioctl_run+0x358/0x680 [kvm]
   kvm_vcpu_ioctl+0x4ca/0x5b0 [kvm]
   __se_sys_ioctl+0x7b/0xd0
   __x64_sys_ioctl+0x21/0x30
   x64_sys_call+0x15ac/0x2e40
   do_syscall_64+0x85/0x160
   ? clear_bhb_loop+0x45/0xa0
   ? clear_bhb_loop+0x45/0xa0
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7fc7c5e2bfbb
  Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
  RSP: 002b:00007fc71fbffbf0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fc7c5e2bfbb
  RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000c
  RBP: 000055557d2ef5f0 R08: 00007fc7c600e1c8 R09: 00007fc7c67ab0b0
  R10: 0000000000000123 R11: 0000000000000246 R12: 0000000000000000
  R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
   </TASK>
  ---[ end trace 0000000000000000 ]---

  kvm_intel: VMCS 0000000034d8de8f, last attempted VM-entry on CPU 93
  kvm_intel: *** Guest State ***
  kvm_intel: CR0: actual=0x0000000080010031, shadow=0x0000000080010031, gh_mask=fffffffffffefff7
  kvm_intel: CR4: actual=0x0000000000002060, shadow=0x0000000000002020, gh_mask=fffffffffffef871
  kvm_intel: CR3 = 0x000000000109f000
  kvm_intel: PDPTR0 = 0x0000000000000000  PDPTR1 = 0x0000000000000000
  kvm_intel: PDPTR2 = 0x0000000000000000  PDPTR3 = 0x0000000000000000
  kvm_intel: RSP = 0x000000009fec6f20  RIP = 0x0000000000410d39
  kvm_intel: RFLAGS=0x00010097         DR7 = 0x0000000000000400
  kvm_intel: Sysenter RSP=000000009fec8000 CS:RIP=0008:00000000004001d8
  kvm_intel: CS:   sel=0x0008, attr=0x0a09b, limit=0xffffffff, base=0x0000000000000000
  kvm_intel: DS:   sel=0x0010, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
  kvm_intel: SS:   sel=0x0010, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
  kvm_intel: ES:   sel=0x0010, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
  kvm_intel: FS:   sel=0x0010, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
  kvm_intel: GS:   sel=0x0010, attr=0x0c093, limit=0xffffffff, base=0x00000000005390f0
  kvm_intel: GDTR:                           limit=0x0000106f, base=0x000000000042aee0
  kvm_intel: LDTR: sel=0x0000, attr=0x00082, limit=0x0000ffff, base=0x0000000000000000
  kvm_intel: IDTR:                           limit=0x00000fff, base=0x000000000054aa60
  kvm_intel: TR:   sel=0x0080, attr=0x0008b, limit=0x0000ffff, base=0x00000000005442c0
  kvm_intel: EFER= 0x0000000000000500
  kvm_intel: PAT = 0x0007040600070406
  kvm_intel: DebugCtl = 0x0000000000000000  DebugExceptions = 0x0000000000000000
  kvm_intel: Interruptibility = 00000000  ActivityState = 00000000
  kvm_intel: MSR guest autoload:
  kvm_intel:    0: msr=0x00000600 value=0x0000000000000000
  kvm_intel: *** Host State ***
  kvm_intel: RIP = 0xffffffffc098e6c0  RSP = 0xff3c22846aceba38
  kvm_intel: CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
  kvm_intel: FSBase=00007fc71fc006c0 GSBase=ff3c22c2bf880000 TRBase=fffffe5926d88000
  kvm_intel: GDTBase=fffffe5926d86000 IDTBase=fffffe0000000000
  kvm_intel: CR0=0000000080050033 CR3=000000012c9fc005 CR4=0000000000773ef0
  kvm_intel: Sysenter RSP=fffffe5926d88000 CS:RIP=0010:ffffffffb7801fa0
  kvm_intel: EFER= 0x0000000000000d01
  kvm_intel: PAT = 0x0407050600070106
  kvm_intel: MSR host autoload:
  kvm_intel:    0: msr=0x00000600 value=0xfffffe5926da0000
  kvm_intel: *** Control State ***
  kvm_intel: CPUBased=0xa5986dfa SecondaryExec=0x02040462 TertiaryExec=0x0000000000000000
  kvm_intel: PinBased=0x0000007f EntryControls=0000d3ff ExitControls=002befff
  kvm_intel: ExceptionBitmap=00160042 PFECmask=00000000 PFECmatch=00000000
  kvm_intel: VMEntry: intr_info=00000000 errcode=00000000 ilen=00000000
  kvm_intel: VMExit: intr_info=80000314 errcode=0000fff8 ilen=00000003
  kvm_intel:         reason=00000000 qualification=0000000000000000
  kvm_intel: IDTVectoring: info=00000000 errcode=00000000
  kvm_intel: TSC Offset = 0xffcd4eeccb7b3279
  kvm_intel: TSC Multiplier = 0x0001000000000000
  kvm_intel: EPT pointer = 0x0000000114fd601e
  kvm_intel: PLE Gap=00000000 Window=00000000
  kvm_intel: Virtual processor ID = 0x0001
  kvm_intel: VE info address = 0x0000000135a04000
  kvm_intel: ve_info: 0x00000030 0xffffffff 0x00000000000006ab 0xffffffff80000000 0x000000000109fff8 0x0000

  kvm: #VE 109fff8, spte[4] = 0x8010000136b61807, spte[3] = 0x8010000136b60807, spte[2] = 0x82100001950008f5
  kvm: VM-Enter 109fff8, spte[4] = 0x8010000136b61807, spte[3] = 0x8010000136b60807, spte[2] = 0x82100001950008f5
  kvm: VM-Enter 109fff8, spte[4] = 0x8010000136b61807, spte[3] = 0x8010000136b60807, spte[2] = 0x80100001b5790807, spte[1] = 0x861000019509f877
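
To make the spte[2] values in the log easier to read, here is a rough decoder
for the architectural EPT entry bits only (R/W/X, memory type, page size,
suppress #VE).  The field names are invented for this sketch; KVM's
software-defined bits are not interpreted, so the bit 58 change reported below
is shown but not explained.

```python
def decode_ept_entry(spte):
    """Decode the architectural bits of an EPT leaf entry."""
    return {
        "read": bool(spte & (1 << 0)),
        "write": bool(spte & (1 << 1)),
        "exec": bool(spte & (1 << 2)),
        "memtype": (spte >> 3) & 0x7,        # only meaningful for leaf entries
        "large_page": bool(spte & (1 << 7)),
        "suppress_ve": bool(spte & (1 << 63)),
    }


def changed_bits(old, new):
    """Return the bit positions that differ between two entry values."""
    diff = old ^ new
    return [bit for bit in range(64) if (diff >> bit) & 1]


# spte[2] before and after KVM write-protects the page (VM-Enter lines above).
print(changed_bits(0x86100040f9e008f7, 0x82100040f9e008f5))

# spte[2] as printed on the "#VE" line.
at_ve = decode_ept_entry(0x82100001950008f5)
print(at_ve)
```

If I have the bit positions right, the entry in place at #VE time is a
read/execute large-page leaf with write cleared and suppress #VE (bit 63) set,
so a write to it should have been delivered as an EPT Violation VM-Exit, not
as a #VE.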

