Re: AMD SNP guest kdump broken since linuxnext-20250908

From: Sean Christopherson <seanjc@google.com>
To: Srikanth Aithal <sraithal@amd.com>
Cc: Linux-Next Mailing List <linux-next@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	 KVM <kvm@vger.kernel.org>, Ashish Kalra <Ashish.Kalra@amd.com>,
	 Ard Biesheuvel <ardb@kernel.org>, Borislav Petkov <bp@alien8.de>,
	Tom Lendacky <thomas.lendacky@amd.com>
Subject: Re: AMD SNP guest kdump broken since linuxnext-20250908
Date: Wed, 24 Sep 2025 06:25:01 -0700
Message-ID: <aNPxLQBxUau-FWtj@google.com>
In-Reply-To: <e8ace4cc-eb22-4117-b34d-16ecc1c8742d@amd.com>

+Ard and Boris (and Tom for good measure)

On Wed, Sep 24, 2025, Srikanth Aithal wrote:
> Hello all,
> 
> kdump on an SNP guest is broken in linux-next, starting with next-20250908 [1].
> 
> kdump on an SNP guest works with the following kernels as the guest kernel:
> 
> 1. https://git.kernel.org/pub/scm/virt/kvm/kvm.git, kvm/next
> 2. git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git next-20250905
> 3. git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git v6.17-rc7
> 
> The crash log during kdump varies each time. I have attached all variants of
> the error console logs to this bug report as files, as they are too large to
> include here.
> 
> kdump with other guest types (normal, SEV, SEV-ES) is working fine.
> 
> I attempted bisecting multiple times, but due to varying error console
> messages—sometimes with a call trace, sometimes just a hang with no error
> messages, and sometimes with extensive register dumps including KVM hardware
> error messages—I had no success until now. Additionally, a couple of
> linux-next bisect attempts pointed to a merge commit where the parent commits
> had no issues, suggesting a possible merge problem.
> 
> I am also attaching the host kernel config and guest kernel config used for
> these tests.
> 
> Tests were conducted with the following component versions:
> 
>  * Host kernel: next-20250919
>  * QEMU version: v10.1.0
>  * EDK2: edk2-stable202508
>  * Platform: Milan with the latest BIOS v2.20
> 
> 
> Thank you,
> 
> Srikanth Aithal <Srikanth.Aithal@amd.com>
> 
> root@ubuntu:~# echo c > /proc/sysrq-trigger
> [   26.686014] sysrq: Trigger a crash
> [   26.687006] Kernel panic - not syncing: sysrq triggered crash
> [   26.688594] CPU: 0 UID: 0 PID: 4235 Comm: bash Kdump: loaded Not tainted 6.17.0-rc7-next-20250923ce7f1a983b07 #1 PREEMPT(voluntary)
> [   26.691788] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
> [   26.693957] Call Trace:
> [   26.694681]  <TASK>
> [   26.695320]  vpanic+0x307/0x360
> [   26.696237]  panic+0x52/0x60
> [   26.697065]  sysrq_handle_crash+0x11/0x20
> [   26.698177]  __handle_sysrq+0xb6/0x170
> [   26.699220]  write_sysrq_trigger+0x50/0x70
> [   26.700358]  proc_reg_write+0x50/0x90
> [   26.701395]  ? preempt_count_add+0x42/0xa0
> [   26.702531]  vfs_write+0xf4/0x430
> [   26.703481]  ? handle_mm_fault+0xd0/0x200
> [   26.704602]  ksys_write+0x5c/0xd0
> [   26.705551]  do_syscall_64+0x4c/0x200
> [   26.706577]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [   26.707961] RIP: 0033:0x7f4cb8024574
> [   26.708974] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d d5 ea 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
> [   26.713912] RSP: 002b:00007ffdad4f3208 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> [   26.715976] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f4cb8024574
> [   26.717905] RDX: 0000000000000002 RSI: 0000564731e37b80 RDI: 0000000000000001
> [   26.719843] RBP: 00007ffdad4f3230 R08: 0000000000000073 R09: 0000000000000000
> [   26.721797] R10: 00000000ffffffff R11: 0000000000000202 R12: 0000000000000002
> [   26.723715] R13: 0000564731e37b80 R14: 00007f4cb810c5c0 R15: 00007f4cb8109ee0
> [   26.725658]  </TASK>
> 
> [1373710140.379273] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> [2800084354.542901] BUG: unable to handle page fault for address: ffffffff9a91e731
> [15541331571.597940] #PF: supervisor instruction fetch in kernel mode
> [11262208929.107056] #PF: error_code(0x0011) - permissions violation
> [15541331571.597940] PGD 800000e045067 P4D 800000e045067 PUD 800000e046063 PMD 80000021b8063 PTE 800800000e91e163

This is definitely a valid (i.e. not corrupted), NX mapping.
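
For readers who want to check that reading of the oops, here is a minimal
sketch that decodes the error code (0x0011) and the leaf PTE value from the
log above. The flag positions are the architectural x86-64 ones. The helper
names are made up for illustration, and treating bit 51 as the SEV encryption
(C) bit is an assumption about this platform, not something stated in the
thread.

```python
# Decode the x86-64 page-fault error code and leaf PTE from the oops above.
# Flag positions are the architectural x86-64 definitions. The C-bit
# position (51) is an ASSUMPTION for this platform, not stated in the thread.

PF_ERROR_BITS = {
    0: "protection violation (page was present)",
    1: "write access",
    2: "user-mode access",
    3: "reserved bit set",
    4: "instruction fetch",
}

PTE_BITS = {
    0: "present",
    1: "writable",
    2: "user",
    5: "accessed",
    6: "dirty",
    8: "global",
    63: "NX",
}

ASSUMED_C_BIT = 51  # hypothetical: SEV memory-encryption bit position


def set_bits(value, names):
    """Return the names of all bits listed in `names` that are set in `value`."""
    return [name for bit, name in sorted(names.items()) if (value >> bit) & 1]


def pte_phys_addr(pte, c_bit=ASSUMED_C_BIT):
    """Extract the 4 KiB frame address, masking out flags, NX and the C-bit."""
    addr_mask = ((1 << 52) - 1) & ~0xFFF & ~(1 << c_bit)
    return pte & addr_mask


error_code = 0x0011          # from "#PF: error_code(0x0011)"
pte = 0x800800000E91E163     # from the "PTE ..." field of the page-table dump

print(set_bits(error_code, PF_ERROR_BITS))
print(set_bits(pte, PTE_BITS))
print(hex(pte_phys_addr(pte)))
```

Under those assumptions the error code decodes to an instruction fetch that
hit a present page, and the PTE has both the present and NX flags set, which
matches the "supervisor instruction fetch" and "permissions violation" lines
in the log.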

> [1373710140.379273] Oops: Oops: 0011 [#1] SMP NOPTI
> [11262208929.107056] CPU: 0 UID: 0 PID: 4235 Comm: bash Kdump: loaded Not tainted 6.17.0-rc7-next-20250923ce7f1a983b07 #1 PREEMPT(voluntary)
> [2800084354.542901] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
> [12688583143.270684] RIP: 0010:early_set_pages_state+0x0/0x120

Given that a lore search on early_set_pages_state lights up Ard's series[*] to
clean up the boot code for SEV, and that said series is new in next-20250908 (NOT
in next-20250905), that seems like a likely culprit.

[*] https://lore.kernel.org/all/20250828102202.1849035-24-ardb+git@google.com

> [15541331571.597940] Code: 02 02 02 02 02 00 02 02 02 02 02 02 02 02 02 02 02 02 02 02 00 02 02 02 00 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 <02> 02 02 02 02 02 02 02 02 02 02 02 02 02 02 00 02 02 02 02 02 02
> [12688583143.270684] RSP: 0018:ffffb608807a7be0 EFLAGS: 00010006
> [1373710140.379273] RAX: ffff9ed0bfe53000 RBX: ffffffff9abecbe8 RCX: ffffb608807a7be8
> [2800084354.542901] RDX: 0000000000000001 RSI: 000000007fe53000 RDI: ffff9ed03fe53000
> [1373710140.379273] RBP: 0000000000000001 R08: 0000000000000001 R09: ffff9ed03fe53000
> [12688583143.270684] R10: 000000000f001000 R11: 0000000000000000 R12: ffff9ed03fe53000
> [15541331571.597940] R13: 0000000000000000 R14: ffff9ecfcf00a298 R15: 0000000000001000
> [11262208929.107056] FS:  00007f4cb7f05740(0000) GS:ffff9ed0a282c000(0000) knlGS:0000000000000000
> [2800084354.542901] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [18394079999.925196] CR2: ffffffff9a91e731 CR3: 000800000fb1c000 CR4: 00000000003506f0
> [12688583143.270684] Call Trace:
> [18394079999.925196]  <TASK>
> [2800084354.542901]  set_pages_state.part.0+0x63/0xa0
> [2800084354.542901]  snp_kexec_finish+0x432/0x490
> [12688583143.270684]  native_machine_crash_shutdown+0x65/0x90
> [15541331571.597940]  __crash_kexec+0x56/0x120
> [1373710140.379273]  ? __crash_kexec+0x104/0x120
> [12688583143.270684]  ? vpanic+0x2a2/0x360
> [18394079999.925196]  ? panic+0x52/0x60
> [11262208929.107056]  ? sysrq_handle_crash+0x11/0x20
> [16967705785.761568]  ? __handle_sysrq+0xb6/0x170
> [1373710140.379273]  ? write_sysrq_trigger+0x50/0x70
> [1373710140.379273]  ? proc_reg_write+0x50/0x90
> [18394079999.925196]  ? preempt_count_add+0x42/0xa0
> [2800084354.542901]  ? vfs_write+0xf4/0x430
> [11262208929.107056]  ? handle_mm_fault+0xd0/0x200
> [18394079999.925196]  ? ksys_write+0x5c/0xd0
> [12688583143.270684]  ? do_syscall_64+0x4c/0x200
> [11262208929.107056]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [15541331571.597940]  </TASK>
> [12688583143.270684] Modules linked in: efivarfs
> [2800084354.542901] CR2: ffffffff9a91e731
> [14114957357.434312] ---[ end trace 0000000000000000 ]---
> [11262208929.107056] RIP: 0010:early_set_pages_state+0x0/0x120
> [12688583143.270684] Code: 02 02 02 02 02 00 02 02 02 02 02 02 02 02 02 02 02 02 02 02 00 02 02 02 00 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 <02> 02 02 02 02 02 02 02 02 02 02 02 02 02 02 00 02 02 02 02 02 02
> [15541331571.597940] RSP: 0018:ffffb608807a7be0 EFLAGS: 00010006
> [14114957357.434312] RAX: ffff9ed0bfe53000 RBX: ffffffff9abecbe8 RCX: ffffb608807a7be8
> [2800084354.542901] RDX: 0000000000000001 RSI: 000000007fe53000 RDI: ffff9ed03fe53000
> [15541331571.597940] RBP: 0000000000000001 R08: 0000000000000001 R09: ffff9ed03fe53000
> [2800084354.542901] R10: 000000000f001000 R11: 0000000000000000 R12: ffff9ed03fe53000
> [2800084354.542901] R13: 0000000000000000 R14: ffff9ecfcf00a298 R15: 0000000000001000
> [2800084354.542901] FS:  00007f4cb7f05740(0000) GS:ffff9ed0a282c000(0000) knlGS:0000000000000000
> [14114957357.434312] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [11262208929.107056] CR2: ffffffff9a91e731 CR3: 000800000fb1c000 CR4: 00000000003506f0
> [12688583143.270684] Kernel panic - not syncing: Fatal exception
