[BUG] KVM: NULL pointer dereference in kvm_tdp_mmu

* [BUG] KVM: NULL pointer dereference in kvm_tdp_mmu_map under memory pressure
@ 2026-04-08 10:29 punixcorn
  2026-04-08 11:21 ` punixcorn
  2026-04-08 14:18 ` Sean Christopherson
  0 siblings, 2 replies; 6+ messages in thread
From: punixcorn @ 2026-04-08 10:29 UTC
  To: seanjc, pbonzini; +Cc: kvm, linux-kernel, punixcorn

Under host memory pressure, a NULL pointer dereference occurs in
kvm_tdp_mmu_map() at offset 0x24. The exact root cause is unclear --
it may be an unhandled NULL return from tdp_mmu_alloc_sp(), or a
violated invariant elsewhere in the map path.

Crash log:

  BUG: kernel NULL pointer dereference, address: 0000000000000024
  #PF: supervisor read access in kernel mode
  Oops: 0000 [#1] SMP NOPTI
  CPU: 2 PID: 1110212 Comm: MainLoopThread Tainted: G U OE 6.19.10-arch1-1
  Hardware name: Default Default/NLXB, BIOS BQ141 06/27/2024
  RIP: 0010:kvm_tdp_mmu_map+0x471/0x880 [kvm]
  Code: 00 00 00 80 48 2b 35 76 72 5c c8 48 c7 44 24 20 00 00 00 00 48 01 f1 48 c1 e9 0c 48 c1 e1 06 48 03 0d 4b 72 5c c8 48 8b 71 28 <0f> b6 4e 24 83 e1 0f 39 ca 0f 85 a7 02 00 00 f6 c4 08 74 26 80 7b
  RSP: 0018:ffffce128333f790 EFLAGS: 00010286

Reproduction:

The issue was observed under heavy host memory pressure while running
a KVM guest (Android emulator via QEMU).

The crash is not reliably reproducible and appears to be
timing-dependent. Fault injection targeting tdp_mmu_alloc_sp()
increases the frequency of hitting the same code path without
triggering a panic, suggesting the retry path may be a viable
recovery, though the exact failure condition is still unclear.

Fault injection used:

  sp = tdp_mmu_alloc_sp(vcpu);
  if (!sp || (atomic_inc_return(&fail_counter) % 100 == 0)) {
      if (sp) tdp_mmu_free_sp(sp);
      goto retry;
  }

With this injection the guest continues running normally initially,
but eventually terminates after sustained injection pressure. This is
expected behavior given the repeated forced failures.

A speculative fix:
  if (!sp)
      goto retry;

This has not been fully verified. Sending for maintainer review.

Environment:
  Linux 6.19.10-arch1-1 x86_64
  GNU C 15.2.1
  Binutils 2.46

Signed-off-by: punixcorn <ohyunwoods663@gmail.com>

* Re: [BUG] KVM: NULL pointer dereference in kvm_tdp_mmu_map under memory pressure
  2026-04-08 10:29 punixcorn
@ 2026-04-08 11:21 ` punixcorn
  2026-04-08 14:18 ` Sean Christopherson
  1 sibling, 0 replies; 6+ messages in thread
From: punixcorn @ 2026-04-08 11:21 UTC
  To: seanjc, pbonzini; +Cc: kvm, linux-kernel

Following up with additional analysis from gdb.

The crash is at spte.h:263 in to_shadow_page(), not at the
tdp_mmu_alloc_sp() path as initially suspected.

  (gdb) list *(kvm_tdp_mmu_map+0x471)
  0x79451 is in kvm_tdp_mmu_map (mmu/spte.h:263)
  return (struct kvm_mmu_page *)page_private(page);

The crash location suggests page_private() is returning 0 for the
parent shadow page in tdp_mmu_init_child_sp(). The exact cause is
unclear. Sharing for maintainer review.
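
For reference, the bytes around the faulting instruction in the Code: line
can be decoded by hand. The sketch below is a minimal userspace helper, not
kernel code; the struct and function names are made up for illustration. It
splits a ModRM byte into its fields for the simple "base + 8-bit
displacement" form only (mod == 1, no SIB byte, no REX register extension).

```c
#include <stdint.h>

/* Hypothetical helper type: decoded ModRM fields for a "base + disp8" operand. */
struct modrm_disp8 {
	uint8_t mod;   /* 1 means [base + disp8] */
	uint8_t reg;   /* register operand, low 3 bits only */
	uint8_t rm;    /* base register, low 3 bits only */
	int8_t  disp;  /* signed 8-bit displacement */
};

/*
 * Split a ModRM byte and the displacement byte that follows it.
 * Only meaningful when mod == 1 and rm != 4 (i.e. no SIB byte).
 * Register numbers: 0=rax 1=rcx 2=rdx 3=rbx 4=rsp 5=rbp 6=rsi 7=rdi.
 */
static struct modrm_disp8 decode_modrm_disp8(uint8_t modrm, uint8_t disp)
{
	struct modrm_disp8 d = {
		.mod  = modrm >> 6,
		.reg  = (modrm >> 3) & 7,
		.rm   = modrm & 7,
		.disp = (int8_t)disp,
	};

	return d;
}
```

Applied to the trace, decode_modrm_disp8(0x71, 0x28) from "48 8b 71 28" gives
reg=6, rm=1, disp=0x28, i.e. mov 0x28(%rcx),%rsi, and
decode_modrm_disp8(0x4e, 0x24) from "<0f> b6 4e 24" gives reg=1, rm=6,
disp=0x24, i.e. movzbl 0x24(%rsi),%ecx. If that reading is right, the 64-bit
load (presumably page->private) completed, and the fault is the byte read at
offset 0x24 from the value it returned.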

My earlier speculative fix (checking sp == NULL) was incorrect.
I am not familiar enough with the KVM MMU internals to propose a
correct fix. Sharing this in case it helps maintainers narrow down
the root cause.

* Re: [BUG] KVM: NULL pointer dereference in kvm_tdp_mmu_map under memory pressure
  2026-04-08 10:29 punixcorn
  2026-04-08 11:21 ` punixcorn
@ 2026-04-08 14:18 ` Sean Christopherson
  1 sibling, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2026-04-08 14:18 UTC
  To: punixcorn; +Cc: pbonzini, kvm, linux-kernel

On Wed, Apr 08, 2026, punixcorn wrote:
> Under host memory pressure, a NULL pointer dereference occurs in
> kvm_tdp_mmu_map() at offset 0x24. The exact root cause is unclear --
> it may be an unhandled NULL return from tdp_mmu_alloc_sp(), or a
> violated invariant elsewhere in the map path.

It's pretty much guaranteed to be the latter.

tdp_mmu_alloc_sp() can't fail, as KVM ensures vcpu->arch.mmu_page_header_cache
holds enough pre-allocated entries to service the page fault.  Even if that
invariant fails and KVM exhausts the cache, it should still be impossible for
kvm_mmu_memory_cache_alloc() to return NULL because it will either use a fallback
allocation (after WARNing) and succeed, or BUG_ON() and prevent hitting the NULL
pointer deref.

  void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
  {
	void *p;

	if (WARN_ON(!mc->nobjs))
		p = mmu_memory_cache_alloc_obj(mc, GFP_ATOMIC | __GFP_ACCOUNT);
	else
		p = mc->objects[--mc->nobjs];
	BUG_ON(!p);
	return p;
  }

And even if _that_ didn't suffice, tdp_mmu_alloc_sp() itself dereferences the
returned sp, so the NULL pointer deref would happen earlier.

  static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
  {
	struct kvm_mmu_page *sp;

	sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
	sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);

	return sp;
  }
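
The "pre-fill, then pop" pattern described above can be sketched in
userspace. This is a simplified illustration, not KVM's implementation: the
mock_* names and the capacity of 8 are assumptions, calloc() stands in for
the kernel allocators, and abort() stands in for BUG_ON(). The point is that
allocation failure can only surface in the top-up step, before the fault is
handled, and never as a NULL return from the alloc step.

```c
#include <stdlib.h>

#define MOCK_CACHE_CAPACITY 8	/* hypothetical; not the kernel's capacity */

/* Userspace stand-in for struct kvm_mmu_memory_cache (illustrative only). */
struct mock_cache {
	int nobjs;
	size_t obj_size;
	void *objects[MOCK_CACHE_CAPACITY];
};

/* Fill the cache to at least 'min' objects. This is the only step allowed to fail. */
static int mock_cache_topup(struct mock_cache *mc, int min)
{
	if (min > MOCK_CACHE_CAPACITY)
		return -1;

	while (mc->nobjs < min) {
		void *obj = calloc(1, mc->obj_size);

		if (!obj)
			return -1;
		mc->objects[mc->nobjs++] = obj;
	}
	return 0;
}

/* Pop a pre-allocated object; never returns NULL. */
static void *mock_cache_alloc(struct mock_cache *mc)
{
	void *p;

	if (!mc->nobjs)
		p = calloc(1, mc->obj_size);	/* stands in for the WARN + fallback path */
	else
		p = mc->objects[--mc->nobjs];
	if (!p)
		abort();			/* stands in for BUG_ON(!p) */
	return p;
}
```

A caller would run mock_cache_topup() once up front and bail out on failure;
every later mock_cache_alloc() then either returns a valid object or stops
the program, so a "goto retry on NULL" check after it would be dead code.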

> 
> Crash log:
> 
>   BUG: kernel NULL pointer dereference, address: 0000000000000024
>   #PF: supervisor read access in kernel mode
>   Oops: 0000 [#1] SMP NOPTI
>   CPU: 2 PID: 1110212 Comm: MainLoopThread Tainted: G U OE 6.19.10-arch1-1
>   Hardware name: Default Default/NLXB, BIOS BQ141 06/27/2024
>   RIP: 0010:kvm_tdp_mmu_map+0x471/0x880 [kvm]
>   Code: 00 00 00 80 48 2b 35 76 72 5c c8 48 c7 44 24 20 00 00 00 00 48 01 f1 48 c1 e9 0c 48 c1 e1 06 48 03 0d 4b 72 5c c8 48 8b 71 28 <0f> b6 4e 24 83 e1 0f 39 ca 0f 85 a7 02 00 00 f6 c4 08 74 26 80 7b
>   RSP: 0018:ffffce128333f790 EFLAGS: 00010286

As noted in your response, I'm 99% certain this is the first dereference of the
shadow page in tdp_mmu_map_handle_target_level():

  static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
					     struct kvm_page_fault *fault,
					     struct tdp_iter *iter)
  {
	struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(iter->sptep));
	u64 new_spte;
	int ret = RET_PF_FIXED;
	bool wrprot = false;

	if (WARN_ON_ONCE(sp->role.level != fault->goal_level))  <============= "sp" is NULL
		return RET_PF_RETRY;


The code stream lines up with that on my builds, and "role" is at offset 0x24.
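
To make the link between the faulting address and the field offset concrete,
here is a small userspace sketch. The struct is a hypothetical stand-in, not
the real struct kvm_mmu_page: its padding is simply chosen so that the
"role" member lands at 0x24, matching the offset reported in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct kvm_mmu_page. Only the position of
 * 'role_word' matters; the padding is an assumption picked to match the
 * 0x24 offset discussed here, not the real field list.
 */
struct mock_mmu_page {
	uint8_t  pad[0x24];
	uint32_t role_word;	/* stands in for sp->role */
};

/* Address that a read of 'role_word' through pointer value 'sp' would touch. */
static uintptr_t role_access_address(uintptr_t sp)
{
	return sp + offsetof(struct mock_mmu_page, role_word);
}
```

With sp == 0, role_access_address() yields 0x24, which is the address in the
"BUG: kernel NULL pointer dereference, address: 0000000000000024" line.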

I can think of three possible sources of failure:

  1. KVM installed a non-leaf SPTE without doing set_page_private().
  2. iter->sptep is corrupted/garbage.
  3. iter->sptep points at a freed shadow page, i.e. page->private was nullified
     due to the page being freed and/or re-allocated.

#1 seems unlikely as I wouldn't expect such a bug to manifest intermittently; the
code is pretty fixed/straightforward.

#2 isn't very likely either, given that it's dereferencing the shadow page that
fails.  I.e. since KVM did _not_ fail grabbing the shadow page from iter->sptep,
iter->sptep isn't complete garbage.  But it's still a possibility, e.g. if sptep
is garbage but happens to still point at a valid struct page.

#3 is the most likely option, as it would "just" require a violation of RCU
protection somewhere.
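
Scenario #3 can be modelled in a few lines of userspace C. Everything below
is a hypothetical mock (the mock_* types and helpers are not kernel APIs): a
"page" carries a back-pointer to its shadow page in a private field, freeing
clears that field, and a reader that still holds the old page reference gets
NULL back from the lookup instead of a shadow page.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are the kernel's types. */
struct mock_sp {
	int level;
};

struct mock_page {
	uintptr_t private;	/* back-pointer to the owning mock_sp, 0 if none */
};

/* Associate a shadow page with its backing page (the role set_page_private() plays). */
static void mock_link(struct mock_page *page, struct mock_sp *sp)
{
	page->private = (uintptr_t)sp;
}

/* Model of the page being freed: the back-pointer is cleared. */
static void mock_free(struct mock_page *page)
{
	page->private = 0;
}

/* Model of the page -> shadow page lookup: returns whatever 'private' holds. */
static struct mock_sp *mock_to_sp(const struct mock_page *page)
{
	return (struct mock_sp *)page->private;
}
```

In this model, calling mock_to_sp() after mock_free() on the same page
returns NULL, and a subsequent read of ->level through that result is the
analogue of the crash. In the kernel, RCU protection is what is supposed to
keep a walker from ever observing the page in that state.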

Can you run with this as a debug patch?  With luck, the output will provide some
hint as to what's going wrong.

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7b1102d26f9c..0332faf8ef9a 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1174,6 +1174,17 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
        int ret = RET_PF_FIXED;
        bool wrprot = false;
 
+       if (WARN_ON_ONCE(!sp)) {
+               pr_warn("NULL sp.  sptep = %lx, spte = %llx, pt[0] = %lx, pt[1] = %lx, pt[2] = %lx, pt[3] = %lx, pt[4] = %lx\n",
+                       (unsigned long)iter->sptep, iter->old_spte,
+                       (unsigned long)iter->pt_path[0],
+                       (unsigned long)iter->pt_path[1],
+                       (unsigned long)iter->pt_path[2],
+                       (unsigned long)iter->pt_path[3],
+                       (unsigned long)iter->pt_path[4]);
+               return RET_PF_RETRY;
+       }
+
        if (WARN_ON_ONCE(sp->role.level != fault->goal_level))
                return RET_PF_RETRY;
 


> Reproduction:
> 
> The issue was observed under heavy host memory pressure while running
> a KVM guest (Android emulator via QEMU).

Can you elaborate on the environment?  Specifically, what is your host setup?
E.g. CPU and platform info, and your .config.

> This has not been fully verified. Sending for maintainer review.
> 
> Environment:
>   Linux 6.19.10-arch1-1 x86_64
>   GNU C 15.2.1
>   Binutils 2.46
> 
> Signed-off-by: punixcorn <ohyunwoods663@gmail.com>

* Re: [BUG] KVM: NULL pointer dereference in kvm_tdp_mmu_map under memory pressure
       [not found] <202604081418.sean.christopherson@intel.com>
@ 2026-04-08 15:36 ` punixcorn
  2026-04-08 16:33   ` Sean Christopherson
  0 siblings, 1 reply; 6+ messages in thread
From: punixcorn @ 2026-04-08 15:36 UTC
  To: seanjc, pbonzini; +Cc: kvm, linux-kernel, punixcorn

Hi Sean,

I attempted to trigger your debug patch via fault injection (zeroing
page_private on the allocated sp before it's linked), but the resulting
logs aren't meaningful -- every captured entry shows spte =
8000000000000000, a non-present SPTE, which doesn't reflect the real
crash scenario where the SPTE is present but page_private returns 0.
So I'm not sending those.

Natural reproduction is rare and I haven't caught it yet
with your patch applied.

Given that, what would you recommend as a next step? Would lockdep,
KASAN, or RCU debugging (CONFIG_PROVE_RCU) be worth enabling to catch
the violation when it happens naturally?

Environment:
- CPU: 13th Gen Intel(R) Core(TM) i5-13420H (12) @ 4.60 GHz
- RAM: 16GB (15Gi usable, 16Gi swap)
- OS: Arch Linux
- Kernel: 6.19.10-dirty #1 SMP PREEMPT_DYNAMIC Wed Apr 8 06:08:08 GMT 2026 x86_64
- /proc/cpuinfo: https://pastebin.com/pwvNYsCu
- .config: https://pastebin.com/z4fVZENs

The crash occurs while running an Android emulator (QEMU) under host
memory pressure.

Signed-off-by: punixcorn <ohyunwoods663@gmail.com>

* Re: [BUG] KVM: NULL pointer dereference in kvm_tdp_mmu_map under memory pressure
  2026-04-08 15:36 ` [BUG] KVM: NULL pointer dereference in kvm_tdp_mmu_map under memory pressure punixcorn
@ 2026-04-08 16:33   ` Sean Christopherson
  0 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2026-04-08 16:33 UTC
  To: punixcorn; +Cc: pbonzini, kvm, linux-kernel

On Wed, Apr 08, 2026, punixcorn wrote:
> Hi Sean,
> 
> I attempted to trigger your debug patch via fault injection (zeroing
> page_private on the allocated sp before it's linked), but the resulting
> logs aren't meaningful -- every captured entry shows spte =
> 8000000000000000, a non-present SPTE, which doesn't reflect the real
> crash scenario where the SPTE is present but page_private returns 0.
> So I'm not sending those.

Ya, I wouldn't expect synthetic injection to help root cause this.  
 
> Natural reproduction is rare and I haven't caught it yet with your patch
> applied.

How rare is rare?  Are we talking hours of runtime?  Days?

> Given that, what would you recommend as a next step?

If it's not too onerous, keep trying to reproduce with that initial debug patch.
If the time to repro is several hours (or more), I can try to provide a more
elaborate debug patch.

> Would lockdep, KASAN, or RCU debugging (CONFIG_PROVE_RCU) be worth enabling
> to catch the violation when it happens naturally?

Hmm, of those, KASAN has the best chance of being useful.  Though it might make
reproducing the bug even more difficult.

> Environment:
> - CPU: 13th Gen Intel(R) Core(TM) i5-13420H (12) @ 4.60 GHz
> - RAM: 16GB (15Gi usable, 16Gi swap)
> - OS: Arch Linux
> - Kernel: 6.19.10-dirty #1 SMP PREEMPT_DYNAMIC Wed Apr 8 06:08:08 GMT 2026 x86_64
> - /proc/cpuinfo: https://pastebin.com/pwvNYsCu
> - .config: https://pastebin.com/z4fVZENs
> 
> The crash occurs while running an Android emulator (QEMU) under host
> memory pressure.
> 
> Signed-off-by: punixcorn <ohyunwoods663@gmail.com>

* Re: [BUG] KVM: NULL pointer dereference in kvm_tdp_mmu_map under memory pressure
       [not found] <202604081633.sean.christopherson@intel.com>
@ 2026-04-08 18:43 ` punixcorn
  0 siblings, 0 replies; 6+ messages in thread
From: punixcorn @ 2026-04-08 18:43 UTC
  To: seanjc, pbonzini; +Cc: kvm, linux-kernel, punixcorn

To be honest, it could be days. The original crash happened only once 
in a month of heavy use, though my system has been hitting 100% RAM 
usage frequently. 

I suspect a specific transition, such as a guest memory zap during high
host contention, is the trigger. I am currently trying to reproduce
this by scripting a loop that reloads the guest project (Android emulator)
while the host is under heavy memory load, as that was the environment
when the crash occurred.
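
One possible way to generate the host memory load for such a loop is a small
helper that allocates a buffer and writes to every page, so the kernel has
to back the whole range with real memory. This is only a sketch under
assumptions: touch_pages() is a made-up name, the size to request is left to
the caller, and it is not the script actually used for this report.

```c
#include <stdlib.h>
#include <unistd.h>

/*
 * Allocate 'bytes' and write one byte per page so each page gets backed
 * by physical memory (or swap). Returns the number of pages touched, or
 * 0 if the allocation failed. The caller owns *out and must free() it.
 */
static size_t touch_pages(size_t bytes, char **out)
{
	long ps = sysconf(_SC_PAGESIZE);
	size_t page_size = ps > 0 ? (size_t)ps : 4096;
	size_t touched = 0;
	char *buf = malloc(bytes);

	*out = buf;
	if (!buf)
		return 0;

	for (size_t off = 0; off < bytes; off += page_size) {
		buf[off] = 1;
		touched++;
	}
	return touched;
}
```

A driver program could call touch_pages() with a size close to the host's
free memory, keep the buffer alive while the guest is reloaded, and free it
between iterations.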

I’ll keep the current debug patch running. If I can't catch it within 
the next 48 hours, I’d be very interested in that more elaborate 
debug patch you mentioned to help track the SPTE lifecycle more 
closely.

Signed-off-by: punixcorn <ohyunwoods663@gmail.com>
