kvm.vger.kernel.org archive mirror
* [PATCH v3 0/4] iommu/amd: Fix host kdump support for SNP
@ 2025-07-15 19:26 Ashish Kalra
  2025-07-15 19:26 ` [PATCH v3 1/4] iommu/amd: Add support to remap/unmap IOMMU buffers for kdump Ashish Kalra
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Ashish Kalra @ 2025-07-15 19:26 UTC (permalink / raw)
  To: joro, suravee.suthikulpanit, thomas.lendacky, Sairaj.ArunKodilkar,
	Vasant.Hegde, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm

From: Ashish Kalra <ashish.kalra@amd.com>

When a crash is triggered the kernel attempts to shut down SEV-SNP
using the SNP_SHUTDOWN_EX command. If active SNP VMs are present,
SNP_SHUTDOWN_EX fails as firmware checks all encryption-capable ASIDs
to ensure none are in use and that a DF_FLUSH is not required. 

This causes the kdump kernel to boot with IOMMU SNP enforcement still
enabled and IOMMU completion wait buffers (CWBs), command buffers,
device tables and event buffer registers remain locked and exclusive
to the previous kernel. Attempts to allocate and use new buffers in
the kdump kernel fail, as the hardware ignores writes to the locked
MMIO registers (per AMD IOMMU spec Section 2.12.2.1).

As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
remapping which is required for proper operation.

This results in repeated "Completion-Wait loop timed out" errors and a
second kernel panic: "Kernel panic - not syncing: timer doesn't work
through Interrupt-remapped IO-APIC"

The following MMIO registers are locked and ignore writes after failed
SNP shutdown:
Device Table Base Address Register
Command Buffer Base Address Register
Event Buffer Base Address Register
Completion Store Base Register/Exclusion Base Register
Completion Store Limit Register/Exclusion Range Limit Register

Instead of allocating new buffers, re-use the previous kernel’s pages
for completion wait buffers, command buffers, event buffers and device
tables and operate with the already enabled SNP configuration and
existing data structures.
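
In code terms the approach boils down to reading the base address back from
the locked register and mapping those pages instead of programming a new
buffer, e.g. for the command buffer (simplified from patch 1 below):

	paddr = readq(iommu->mmio_base + MMIO_CMD_BUF_OFFSET) & PM_ADDR_MASK;
	iommu->cmd_buf = iommu_memremap(paddr, CMD_BUFFER_SIZE);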

This approach is now used for kdump boot regardless of whether SNP is
enabled during kdump.

The fix enables successful crashkernel/kdump operation on SNP hosts
even when SNP_SHUTDOWN_EX fails.

Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")

v3:
- Move to an AMD IOMMU driver fix so that there is no need to do SNP_DECOMMISSION
during panic(), and kdump kernel boot is more agnostic to
whether or not SNP_SHUTDOWN is done properly (or even done at all),
i.e., even with active SNP guests. Fix crashkernel/kdump boot with IOMMU SNP/RMP
enforcement still enabled prior to kdump boot by reusing the previous kernel's
pages for the IOMMU completion wait buffers, command buffer and device table and
memremapping them during kdump boot.
- Rebased on linux-next.
- Split the original patch into smaller patches and prepare separate
patches for adding the iommu_memremap() helper and remapping/unmapping of
IOMMU buffers for kdump, reusing the device table for kdump, and skipping the
enabling of IOMMU buffers for kdump.
- Add new functions for remapping/unmapping IOMMU buffers and call
them from alloc_iommu_buffers/free_iommu_buffers in case of kdump boot,
else call the existing alloc/free variants for the CWB, command and event buffers.
- Skip SNP INIT in case of kdump boot.
- The final patch skips enabling the IOMMU command buffer and event buffer
for kdump boot, which fixes kdump on an SNP host.
- Add a comment that completion wait buffers are only re-used when SNP is
enabled.

Ashish Kalra (4):
  iommu/amd: Add support to remap/unmap IOMMU buffers for kdump
  iommu/amd: Reuse device table for kdump
  crypto: ccp: Skip SNP INIT for kdump boot
  iommu/amd: Fix host kdump support for SNP

 drivers/crypto/ccp/sev-dev.c        |   8 +
 drivers/iommu/amd/amd_iommu_types.h |   5 +
 drivers/iommu/amd/init.c            | 288 +++++++++++++++++++---------
 drivers/iommu/amd/iommu.c           |   2 +-
 4 files changed, 212 insertions(+), 91 deletions(-)

-- 
2.34.1



* [PATCH v3 1/4] iommu/amd: Add support to remap/unmap IOMMU buffers for kdump
  2025-07-15 19:26 [PATCH v3 0/4] iommu/amd: Fix host kdump support for SNP Ashish Kalra
@ 2025-07-15 19:26 ` Ashish Kalra
  2025-07-16  9:19   ` Vasant Hegde
  2025-07-15 19:27 ` [PATCH v3 2/4] iommu/amd: Reuse device table " Ashish Kalra
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Ashish Kalra @ 2025-07-15 19:26 UTC (permalink / raw)
  To: joro, suravee.suthikulpanit, thomas.lendacky, Sairaj.ArunKodilkar,
	Vasant.Hegde, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm

From: Ashish Kalra <ashish.kalra@amd.com>

After a panic if SNP is enabled in the previous kernel then the kdump
kernel boots with IOMMU SNP enforcement still enabled.

IOMMU completion wait buffers (CWBs), command buffers and event buffer
registers remain locked and exclusive to the previous kernel. Attempts
to allocate and use new buffers in the kdump kernel fail, as hardware
ignores writes to the locked MMIO registers as per AMD IOMMU spec
Section 2.12.2.1.

This results in repeated "Completion-Wait loop timed out" errors and a
second kernel panic: "Kernel panic - not syncing: timer doesn't work
through Interrupt-remapped IO-APIC"

The following MMIO registers are locked and ignore writes after failed
SNP shutdown:
Command Buffer Base Address Register
Event Log Base Address Register
Completion Store Base Register/Exclusion Base Register
Completion Store Limit Register/Exclusion Limit Register
As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
remapping, which is required for proper operation.

Reuse the pages of the previous kernel for completion wait buffers,
command buffers, event buffers and memremap them during kdump boot
and essentially work with an already enabled IOMMU configuration,
re-using the previous kernel's data structures.

Reusing of command buffers and event buffers is now done for kdump boot
irrespective of SNP being enabled during kdump.

Re-use of completion wait buffers is only done when SNP is enabled as
the exclusion base register is used for the completion wait buffer
(CWB) address only when SNP is enabled.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 drivers/iommu/amd/amd_iommu_types.h |   5 +
 drivers/iommu/amd/init.c            | 163 ++++++++++++++++++++++++++--
 drivers/iommu/amd/iommu.c           |   2 +-
 3 files changed, 157 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 9b64cd706c96..082eb1270818 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -791,6 +791,11 @@ struct amd_iommu {
 	u32 flags;
 	volatile u64 *cmd_sem;
 	atomic64_t cmd_sem_val;
+	/*
+	 * Track physical address to directly use it in build_completion_wait()
+	 * and avoid adding any special checks and handling for kdump.
+	 */
+	u64 cmd_sem_paddr;
 
 #ifdef CONFIG_AMD_IOMMU_DEBUGFS
 	/* DebugFS Info */
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index cadb2c735ffc..32295f26be1b 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -710,6 +710,23 @@ static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
 	pci_seg->alias_table = NULL;
 }
 
+static inline void *iommu_memremap(unsigned long paddr, size_t size)
+{
+	phys_addr_t phys;
+
+	if (!paddr)
+		return NULL;
+
+	/*
+	 * Obtain true physical address in kdump kernel when SME is enabled.
+	 * Currently, IOMMU driver does not support booting into an unencrypted
+	 * kdump kernel.
+	 */
+	phys = __sme_clr(paddr);
+
+	return ioremap_encrypted(phys, size);
+}
+
 /*
  * Allocates the command buffer. This buffer is per AMD IOMMU. We can
  * write commands to that buffer later and the IOMMU will execute them
@@ -942,8 +959,105 @@ static int iommu_init_ga_log(struct amd_iommu *iommu)
 static int __init alloc_cwwb_sem(struct amd_iommu *iommu)
 {
 	iommu->cmd_sem = iommu_alloc_4k_pages(iommu, GFP_KERNEL, 1);
+	if (!iommu->cmd_sem)
+		return -ENOMEM;
+	iommu->cmd_sem_paddr = iommu_virt_to_phys((void *)iommu->cmd_sem);
+	return 0;
+}
+
+static int __init remap_event_buffer(struct amd_iommu *iommu)
+{
+	u64 paddr;
+
+	pr_info_once("Re-using event buffer from the previous kernel\n");
+	/*
+	 * Read-back the event log base address register and apply
+	 * PM_ADDR_MASK to obtain the event log base address.
+	 */
+	paddr = readq(iommu->mmio_base + MMIO_EVT_BUF_OFFSET) & PM_ADDR_MASK;
+	iommu->evt_buf = iommu_memremap(paddr, EVT_BUFFER_SIZE);
+
+	return iommu->evt_buf ? 0 : -ENOMEM;
+}
+
+static int __init remap_command_buffer(struct amd_iommu *iommu)
+{
+	u64 paddr;
+
+	pr_info_once("Re-using command buffer from the previous kernel\n");
+	/*
+	 * Read-back the command buffer base address register and apply
+	 * PM_ADDR_MASK to obtain the command buffer base address.
+	 */
+	paddr = readq(iommu->mmio_base + MMIO_CMD_BUF_OFFSET) & PM_ADDR_MASK;
+	iommu->cmd_buf = iommu_memremap(paddr, CMD_BUFFER_SIZE);
+
+	return iommu->cmd_buf ? 0 : -ENOMEM;
+}
+
+static int __init remap_cwwb_sem(struct amd_iommu *iommu)
+{
+	u64 paddr;
+
+	if (check_feature(FEATURE_SNP)) {
+		/*
+		 * When SNP is enabled, the exclusion base register is used for the
+		 * completion wait buffer (CWB) address. Read and re-use it.
+		 */
+		pr_info_once("Re-using CWB buffers from the previous kernel\n");
+		/*
+		 * Read-back the exclusion base register and apply PM_ADDR_MASK
+		 * to obtain the exclusion range base address.
+		 */
+		paddr = readq(iommu->mmio_base + MMIO_EXCL_BASE_OFFSET) & PM_ADDR_MASK;
+		iommu->cmd_sem = iommu_memremap(paddr, PAGE_SIZE);
+		if (!iommu->cmd_sem)
+			return -ENOMEM;
+		iommu->cmd_sem_paddr = paddr;
+	} else {
+		return alloc_cwwb_sem(iommu);
+	}
+
+	return 0;
+}
+
+static int __init alloc_iommu_buffers(struct amd_iommu *iommu)
+{
+	int ret;
+
+	/*
+	 * IOMMU Completion Store Base MMIO, Command Buffer Base Address MMIO
+	 * registers are locked if SNP is enabled during kdump, reuse/remap
+	 * the previous kernel's allocated completion wait and command buffers
+	 * for kdump boot.
+	 */
+	if (is_kdump_kernel()) {
+		ret = remap_cwwb_sem(iommu);
+		if (ret)
+			return ret;
+
+		ret = remap_command_buffer(iommu);
+		if (ret)
+			return ret;
+
+		ret = remap_event_buffer(iommu);
+		if (ret)
+			return ret;
+	} else {
+		ret = alloc_cwwb_sem(iommu);
+		if (ret)
+			return ret;
+
+		ret = alloc_command_buffer(iommu);
+		if (ret)
+			return ret;
 
-	return iommu->cmd_sem ? 0 : -ENOMEM;
+		ret = alloc_event_buffer(iommu);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
 }
 
 static void __init free_cwwb_sem(struct amd_iommu *iommu)
@@ -951,6 +1065,38 @@ static void __init free_cwwb_sem(struct amd_iommu *iommu)
 	if (iommu->cmd_sem)
 		iommu_free_pages((void *)iommu->cmd_sem);
 }
+static void __init unmap_cwwb_sem(struct amd_iommu *iommu)
+{
+	if (iommu->cmd_sem) {
+		if (check_feature(FEATURE_SNP))
+			memunmap((void *)iommu->cmd_sem);
+		else
+			iommu_free_pages((void *)iommu->cmd_sem);
+	}
+}
+
+static void __init unmap_command_buffer(struct amd_iommu *iommu)
+{
+	memunmap((void *)iommu->cmd_buf);
+}
+
+static void __init unmap_event_buffer(struct amd_iommu *iommu)
+{
+	memunmap(iommu->evt_buf);
+}
+
+static void __init free_iommu_buffers(struct amd_iommu *iommu)
+{
+	if (is_kdump_kernel()) {
+		unmap_cwwb_sem(iommu);
+		unmap_command_buffer(iommu);
+		unmap_event_buffer(iommu);
+	} else {
+		free_cwwb_sem(iommu);
+		free_command_buffer(iommu);
+		free_event_buffer(iommu);
+	}
+}
 
 static void iommu_enable_xt(struct amd_iommu *iommu)
 {
@@ -1655,9 +1801,7 @@ static void __init free_sysfs(struct amd_iommu *iommu)
 static void __init free_iommu_one(struct amd_iommu *iommu)
 {
 	free_sysfs(iommu);
-	free_cwwb_sem(iommu);
-	free_command_buffer(iommu);
-	free_event_buffer(iommu);
+	free_iommu_buffers(iommu);
 	amd_iommu_free_ppr_log(iommu);
 	free_ga_log(iommu);
 	iommu_unmap_mmio_space(iommu);
@@ -1821,14 +1965,9 @@ static int __init init_iommu_one_late(struct amd_iommu *iommu)
 {
 	int ret;
 
-	if (alloc_cwwb_sem(iommu))
-		return -ENOMEM;
-
-	if (alloc_command_buffer(iommu))
-		return -ENOMEM;
-
-	if (alloc_event_buffer(iommu))
-		return -ENOMEM;
+	ret = alloc_iommu_buffers(iommu);
+	if (ret)
+		return ret;
 
 	iommu->int_enabled = false;
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 7dc6d2dd8dd1..8e1b475e39c7 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1201,7 +1201,7 @@ static void build_completion_wait(struct iommu_cmd *cmd,
 				  struct amd_iommu *iommu,
 				  u64 data)
 {
-	u64 paddr = iommu_virt_to_phys((void *)iommu->cmd_sem);
+	u64 paddr = iommu->cmd_sem_paddr;
 
 	memset(cmd, 0, sizeof(*cmd));
 	cmd->data[0] = lower_32_bits(paddr) | CMD_COMPL_WAIT_STORE_MASK;
-- 
2.34.1



* [PATCH v3 2/4] iommu/amd: Reuse device table for kdump
  2025-07-15 19:26 [PATCH v3 0/4] iommu/amd: Fix host kdump support for SNP Ashish Kalra
  2025-07-15 19:26 ` [PATCH v3 1/4] iommu/amd: Add support to remap/unmap IOMMU buffers for kdump Ashish Kalra
@ 2025-07-15 19:27 ` Ashish Kalra
  2025-07-16  9:42   ` Vasant Hegde
  2025-07-15 19:27 ` [PATCH v3 3/4] crypto: ccp: Skip SNP INIT for kdump boot Ashish Kalra
  2025-07-15 19:27 ` [PATCH v3 4/4] iommu/amd: Fix host kdump support for SNP Ashish Kalra
  3 siblings, 1 reply; 21+ messages in thread
From: Ashish Kalra @ 2025-07-15 19:27 UTC (permalink / raw)
  To: joro, suravee.suthikulpanit, thomas.lendacky, Sairaj.ArunKodilkar,
	Vasant.Hegde, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm

From: Ashish Kalra <ashish.kalra@amd.com>

After a panic if SNP is enabled in the previous kernel then the kdump
kernel boots with IOMMU SNP enforcement still enabled.

IOMMU device table register is locked and exclusive to the previous
kernel. Attempts to copy old device table from the previous kernel
fail in the kdump kernel as hardware ignores writes to the locked device
table base address register as per AMD IOMMU spec Section 2.12.2.1.

This results in repeated "Completion-Wait loop timed out" errors and a
second kernel panic: "Kernel panic - not syncing: timer doesn't work
through Interrupt-remapped IO-APIC".

Reuse device table instead of copying device table in case of kdump
boot and remove all copying device table code.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 drivers/iommu/amd/init.c | 97 ++++++++++++----------------------------
 1 file changed, 28 insertions(+), 69 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 32295f26be1b..18bd869a82d9 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -406,6 +406,9 @@ static void iommu_set_device_table(struct amd_iommu *iommu)
 
 	BUG_ON(iommu->mmio_base == NULL);
 
+	if (is_kdump_kernel())
+		return;
+
 	entry = iommu_virt_to_phys(dev_table);
 	entry |= (dev_table_size >> 12) - 1;
 	memcpy_toio(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET,
@@ -646,7 +649,10 @@ static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg)
 
 static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg)
 {
-	iommu_free_pages(pci_seg->dev_table);
+	if (is_kdump_kernel())
+		memunmap((void *)pci_seg->dev_table);
+	else
+		iommu_free_pages(pci_seg->dev_table);
 	pci_seg->dev_table = NULL;
 }
 
@@ -1128,15 +1134,12 @@ static void set_dte_bit(struct dev_table_entry *dte, u8 bit)
 	dte->data[i] |= (1UL << _bit);
 }
 
-static bool __copy_device_table(struct amd_iommu *iommu)
+static bool __reuse_device_table(struct amd_iommu *iommu)
 {
-	u64 int_ctl, int_tab_len, entry = 0;
 	struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
-	struct dev_table_entry *old_devtb = NULL;
-	u32 lo, hi, devid, old_devtb_size;
+	u32 lo, hi, old_devtb_size;
 	phys_addr_t old_devtb_phys;
-	u16 dom_id, dte_v, irq_v;
-	u64 tmp;
+	u64 entry;
 
 	/* Each IOMMU use separate device table with the same size */
 	lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
@@ -1161,66 +1164,22 @@ static bool __copy_device_table(struct amd_iommu *iommu)
 		pr_err("The address of old device table is above 4G, not trustworthy!\n");
 		return false;
 	}
-	old_devtb = (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT) && is_kdump_kernel())
-		    ? (__force void *)ioremap_encrypted(old_devtb_phys,
-							pci_seg->dev_table_size)
-		    : memremap(old_devtb_phys, pci_seg->dev_table_size, MEMREMAP_WB);
-
-	if (!old_devtb)
-		return false;
 
-	pci_seg->old_dev_tbl_cpy = iommu_alloc_pages_sz(
-		GFP_KERNEL | GFP_DMA32, pci_seg->dev_table_size);
+	/*
+	 * IOMMU Device Table Base Address MMIO register is locked
+	 * if SNP is enabled during kdump, reuse the previous kernel's
+	 * device table.
+	 */
+	pci_seg->old_dev_tbl_cpy = iommu_memremap(old_devtb_phys, pci_seg->dev_table_size);
 	if (pci_seg->old_dev_tbl_cpy == NULL) {
-		pr_err("Failed to allocate memory for copying old device table!\n");
-		memunmap(old_devtb);
+		pr_err("Failed to remap memory for reusing old device table!\n");
 		return false;
 	}
 
-	for (devid = 0; devid <= pci_seg->last_bdf; ++devid) {
-		pci_seg->old_dev_tbl_cpy[devid] = old_devtb[devid];
-		dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
-		dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
-
-		if (dte_v && dom_id) {
-			pci_seg->old_dev_tbl_cpy[devid].data[0] = old_devtb[devid].data[0];
-			pci_seg->old_dev_tbl_cpy[devid].data[1] = old_devtb[devid].data[1];
-			/* Reserve the Domain IDs used by previous kernel */
-			if (ida_alloc_range(&pdom_ids, dom_id, dom_id, GFP_ATOMIC) != dom_id) {
-				pr_err("Failed to reserve domain ID 0x%x\n", dom_id);
-				memunmap(old_devtb);
-				return false;
-			}
-			/* If gcr3 table existed, mask it out */
-			if (old_devtb[devid].data[0] & DTE_FLAG_GV) {
-				tmp = (DTE_GCR3_30_15 | DTE_GCR3_51_31);
-				pci_seg->old_dev_tbl_cpy[devid].data[1] &= ~tmp;
-				tmp = (DTE_GCR3_14_12 | DTE_FLAG_GV);
-				pci_seg->old_dev_tbl_cpy[devid].data[0] &= ~tmp;
-			}
-		}
-
-		irq_v = old_devtb[devid].data[2] & DTE_IRQ_REMAP_ENABLE;
-		int_ctl = old_devtb[devid].data[2] & DTE_IRQ_REMAP_INTCTL_MASK;
-		int_tab_len = old_devtb[devid].data[2] & DTE_INTTABLEN_MASK;
-		if (irq_v && (int_ctl || int_tab_len)) {
-			if ((int_ctl != DTE_IRQ_REMAP_INTCTL) ||
-			    (int_tab_len != DTE_INTTABLEN_512 &&
-			     int_tab_len != DTE_INTTABLEN_2K)) {
-				pr_err("Wrong old irq remapping flag: %#x\n", devid);
-				memunmap(old_devtb);
-				return false;
-			}
-
-			pci_seg->old_dev_tbl_cpy[devid].data[2] = old_devtb[devid].data[2];
-		}
-	}
-	memunmap(old_devtb);
-
 	return true;
 }
 
-static bool copy_device_table(void)
+static bool reuse_device_table(void)
 {
 	struct amd_iommu *iommu;
 	struct amd_iommu_pci_seg *pci_seg;
@@ -1228,17 +1187,17 @@ static bool copy_device_table(void)
 	if (!amd_iommu_pre_enabled)
 		return false;
 
-	pr_warn("Translation is already enabled - trying to copy translation structures\n");
+	pr_warn("Translation is already enabled - trying to reuse translation structures\n");
 
 	/*
 	 * All IOMMUs within PCI segment shares common device table.
-	 * Hence copy device table only once per PCI segment.
+	 * Hence reuse device table only once per PCI segment.
 	 */
 	for_each_pci_segment(pci_seg) {
 		for_each_iommu(iommu) {
 			if (pci_seg->id != iommu->pci_seg->id)
 				continue;
-			if (!__copy_device_table(iommu))
+			if (!__reuse_device_table(iommu))
 				return false;
 			break;
 		}
@@ -2917,8 +2876,8 @@ static void early_enable_iommu(struct amd_iommu *iommu)
  * This function finally enables all IOMMUs found in the system after
  * they have been initialized.
  *
- * Or if in kdump kernel and IOMMUs are all pre-enabled, try to copy
- * the old content of device table entries. Not this case or copy failed,
+ * Or if in kdump kernel and IOMMUs are all pre-enabled, try to reuse
+ * the old content of device table entries. Not this case or reuse failed,
  * just continue as normal kernel does.
  */
 static void early_enable_iommus(void)
@@ -2926,18 +2885,18 @@ static void early_enable_iommus(void)
 	struct amd_iommu *iommu;
 	struct amd_iommu_pci_seg *pci_seg;
 
-	if (!copy_device_table()) {
+	if (!reuse_device_table()) {
 		/*
-		 * If come here because of failure in copying device table from old
+		 * If come here because of failure in reusing device table from old
 		 * kernel with all IOMMUs enabled, print error message and try to
 		 * free allocated old_dev_tbl_cpy.
 		 */
 		if (amd_iommu_pre_enabled)
-			pr_err("Failed to copy DEV table from previous kernel.\n");
+			pr_err("Failed to reuse DEV table from previous kernel.\n");
 
 		for_each_pci_segment(pci_seg) {
 			if (pci_seg->old_dev_tbl_cpy != NULL) {
-				iommu_free_pages(pci_seg->old_dev_tbl_cpy);
+				memunmap((void *)pci_seg->old_dev_tbl_cpy);
 				pci_seg->old_dev_tbl_cpy = NULL;
 			}
 		}
@@ -2947,7 +2906,7 @@ static void early_enable_iommus(void)
 			early_enable_iommu(iommu);
 		}
 	} else {
-		pr_info("Copied DEV table from previous kernel.\n");
+		pr_info("Reused DEV table from previous kernel.\n");
 
 		for_each_pci_segment(pci_seg) {
 			iommu_free_pages(pci_seg->dev_table);
-- 
2.34.1



* [PATCH v3 3/4] crypto: ccp: Skip SNP INIT for kdump boot
  2025-07-15 19:26 [PATCH v3 0/4] iommu/amd: Fix host kdump support for SNP Ashish Kalra
  2025-07-15 19:26 ` [PATCH v3 1/4] iommu/amd: Add support to remap/unmap IOMMU buffers for kdump Ashish Kalra
  2025-07-15 19:27 ` [PATCH v3 2/4] iommu/amd: Reuse device table " Ashish Kalra
@ 2025-07-15 19:27 ` Ashish Kalra
  2025-07-16  9:20   ` Vasant Hegde
  2025-07-15 19:27 ` [PATCH v3 4/4] iommu/amd: Fix host kdump support for SNP Ashish Kalra
  3 siblings, 1 reply; 21+ messages in thread
From: Ashish Kalra @ 2025-07-15 19:27 UTC (permalink / raw)
  To: joro, suravee.suthikulpanit, thomas.lendacky, Sairaj.ArunKodilkar,
	Vasant.Hegde, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm

From: Ashish Kalra <ashish.kalra@amd.com>

If SNP is enabled and initialized in the previous kernel then SNP is
already initialized for kdump boot and attempting SNP INIT again
during kdump boot causes SNP INIT failure and eventually leads to
IOMMU failures.

Skip SNP INIT if doing kdump boot.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 17edc6bf5622..19df68963602 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -28,6 +28,7 @@
 #include <linux/fs_struct.h>
 #include <linux/psp.h>
 #include <linux/amd-iommu.h>
+#include <linux/crash_dump.h>
 
 #include <asm/smp.h>
 #include <asm/cacheflush.h>
@@ -1114,6 +1115,13 @@ static int __sev_snp_init_locked(int *error)
 	if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
 		return -ENODEV;
 
+	/*
+	 * Skip SNP INIT for kdump boot as SNP is already initialized if
+	 * SNP is enabled.
+	 */
+	if (is_kdump_kernel())
+		return 0;
+
 	sev = psp->sev_data;
 
 	if (sev->snp_initialized)
-- 
2.34.1



* [PATCH v3 4/4] iommu/amd: Fix host kdump support for SNP
  2025-07-15 19:26 [PATCH v3 0/4] iommu/amd: Fix host kdump support for SNP Ashish Kalra
                   ` (2 preceding siblings ...)
  2025-07-15 19:27 ` [PATCH v3 3/4] crypto: ccp: Skip SNP INIT for kdump boot Ashish Kalra
@ 2025-07-15 19:27 ` Ashish Kalra
  2025-07-16  9:46   ` Vasant Hegde
  3 siblings, 1 reply; 21+ messages in thread
From: Ashish Kalra @ 2025-07-15 19:27 UTC (permalink / raw)
  To: joro, suravee.suthikulpanit, thomas.lendacky, Sairaj.ArunKodilkar,
	Vasant.Hegde, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm

From: Ashish Kalra <ashish.kalra@amd.com>

When a crash is triggered the kernel attempts to shut down SEV-SNP
using the SNP_SHUTDOWN_EX command. If active SNP VMs are present,
SNP_SHUTDOWN_EX fails as firmware checks all encryption-capable ASIDs
to ensure none are in use and that a DF_FLUSH is not required. If a
DF_FLUSH is required, the firmware returns DFFLUSH_REQUIRED, causing
SNP_SHUTDOWN_EX to fail.

This causes the kdump kernel to boot with IOMMU SNP enforcement still
enabled and IOMMU completion wait buffers (CWBs), command buffers,
device tables and event buffer registers remain locked and exclusive
to the previous kernel. Attempts to allocate and use new buffers in
the kdump kernel fail, as the hardware ignores writes to the locked
MMIO registers (per AMD IOMMU spec Section 2.12.2.1).

As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
remapping which is required for proper operation.

This results in repeated "Completion-Wait loop timed out" errors and a
second kernel panic: "Kernel panic - not syncing: timer doesn't work
through Interrupt-remapped IO-APIC"

The following MMIO registers are locked and ignore writes after failed
SNP shutdown:
Device Table Base Address Register
Command Buffer Base Address Register
Event Buffer Base Address Register
Completion Store Base Register/Exclusion Base Register
Completion Store Limit Register/Exclusion Range Limit Register

Instead of allocating new buffers, re-use the previous kernel’s pages
for completion wait buffers, command buffers, event buffers and device
tables and operate with the already enabled SNP configuration and
existing data structures.

This approach is now used for kdump boot regardless of whether SNP is
enabled during kdump.

The fix enables successful crashkernel/kdump operation on SNP hosts
even when SNP_SHUTDOWN_EX fails.

Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 drivers/iommu/amd/init.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 18bd869a82d9..3f24fd775d6e 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -818,11 +818,16 @@ static void iommu_enable_command_buffer(struct amd_iommu *iommu)
 
 	BUG_ON(iommu->cmd_buf == NULL);
 
-	entry = iommu_virt_to_phys(iommu->cmd_buf);
-	entry |= MMIO_CMD_SIZE_512;
-
-	memcpy_toio(iommu->mmio_base + MMIO_CMD_BUF_OFFSET,
-		    &entry, sizeof(entry));
+	if (!is_kdump_kernel()) {
+		/*
+		 * Command buffer is re-used for kdump kernel and setting
+		 * of MMIO register is not required.
+		 */
+		entry = iommu_virt_to_phys(iommu->cmd_buf);
+		entry |= MMIO_CMD_SIZE_512;
+		memcpy_toio(iommu->mmio_base + MMIO_CMD_BUF_OFFSET,
+			    &entry, sizeof(entry));
+	}
 
 	amd_iommu_reset_cmd_buffer(iommu);
 }
@@ -873,10 +878,15 @@ static void iommu_enable_event_buffer(struct amd_iommu *iommu)
 
 	BUG_ON(iommu->evt_buf == NULL);
 
-	entry = iommu_virt_to_phys(iommu->evt_buf) | EVT_LEN_MASK;
-
-	memcpy_toio(iommu->mmio_base + MMIO_EVT_BUF_OFFSET,
-		    &entry, sizeof(entry));
+	if (!is_kdump_kernel()) {
+		/*
+		 * Event buffer is re-used for kdump kernel and setting
+		 * of MMIO register is not required.
+		 */
+		entry = iommu_virt_to_phys(iommu->evt_buf) | EVT_LEN_MASK;
+		memcpy_toio(iommu->mmio_base + MMIO_EVT_BUF_OFFSET,
+			    &entry, sizeof(entry));
+	}
 
 	/* set head and tail to zero manually */
 	writel(0x00, iommu->mmio_base + MMIO_EVT_HEAD_OFFSET);
-- 
2.34.1



* Re: [PATCH v3 1/4] iommu/amd: Add support to remap/unmap IOMMU buffers for kdump
  2025-07-15 19:26 ` [PATCH v3 1/4] iommu/amd: Add support to remap/unmap IOMMU buffers for kdump Ashish Kalra
@ 2025-07-16  9:19   ` Vasant Hegde
  2025-07-16 21:55     ` Kalra, Ashish
  0 siblings, 1 reply; 21+ messages in thread
From: Vasant Hegde @ 2025-07-16  9:19 UTC (permalink / raw)
  To: Ashish Kalra, joro, suravee.suthikulpanit, thomas.lendacky,
	Sairaj.ArunKodilkar, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm

Hi Ashish,


On 7/16/2025 12:56 AM, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> After a panic if SNP is enabled in the previous kernel then the kdump
> kernel boots with IOMMU SNP enforcement still enabled.
> 
> IOMMU completion wait buffers (CWBs), command buffers and event buffer
> registers remain locked and exclusive to the previous kernel. Attempts
> to allocate and use new buffers in the kdump kernel fail, as hardware
> ignores writes to the locked MMIO registers as per AMD IOMMU spec
> Section 2.12.2.1.
> 
> This results in repeated "Completion-Wait loop timed out" errors and a
> second kernel panic: "Kernel panic - not syncing: timer doesn't work
> through Interrupt-remapped IO-APIC"
> 
> The following MMIO registers are locked and ignore writes after failed
> SNP shutdown:
> Command Buffer Base Address Register
> Event Log Base Address Register
> Completion Store Base Register/Exclusion Base Register
> Completion Store Limit Register/Exclusion Limit Register
> As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
> remapping, which is required for proper operation.

There are a couple of other registers in the locked list. Can you please rephrase
the above paragraphs? Also, you don't need to call out individual registers here. You
can just add a link to the IOMMU spec.

Unrelated to this patch:
  I went through some of the SNP-related code in the IOMMU driver. One thing that
confused me: in the amd_iommu_snp_disable() code, why is the command buffer not
marked as shared? Any idea?


> 
> Reuse the pages of the previous kernel for completion wait buffers,
> command buffers, event buffers and memremap them during kdump boot
> and essentially work with an already enabled IOMMU configuration,
> re-using the previous kernel's data structures.
> 
> Reusing of command buffers and event buffers is now done for kdump boot
> irrespective of SNP being enabled during kdump.
> 
> Re-use of completion wait buffers is only done when SNP is enabled as
> the exclusion base register is used for the completion wait buffer
> (CWB) address only when SNP is enabled.
> 
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  drivers/iommu/amd/amd_iommu_types.h |   5 +
>  drivers/iommu/amd/init.c            | 163 ++++++++++++++++++++++++++--
>  drivers/iommu/amd/iommu.c           |   2 +-
>  3 files changed, 157 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
> index 9b64cd706c96..082eb1270818 100644
> --- a/drivers/iommu/amd/amd_iommu_types.h
> +++ b/drivers/iommu/amd/amd_iommu_types.h
> @@ -791,6 +791,11 @@ struct amd_iommu {
>  	u32 flags;
>  	volatile u64 *cmd_sem;
>  	atomic64_t cmd_sem_val;
> +	/*
> +	 * Track physical address to directly use it in build_completion_wait()
> +	 * and avoid adding any special checks and handling for kdump.
> +	 */
> +	u64 cmd_sem_paddr;

With this we are tracking both physical and virtual address? Is that really
needed? Can we just track PA and convert it into va?

>  
>  #ifdef CONFIG_AMD_IOMMU_DEBUGFS
>  	/* DebugFS Info */
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index cadb2c735ffc..32295f26be1b 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -710,6 +710,23 @@ static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
>  	pci_seg->alias_table = NULL;
>  }
>  
> +static inline void *iommu_memremap(unsigned long paddr, size_t size)
> +{
> +	phys_addr_t phys;
> +
> +	if (!paddr)
> +		return NULL;
> +
> +	/*
> +	 * Obtain true physical address in kdump kernel when SME is enabled.
> +	 * Currently, IOMMU driver does not support booting into an unencrypted
> +	 * kdump kernel.

You mean production kernel w/ SME and kdump kernel with non-SME is not supported?


> +	 */
> +	phys = __sme_clr(paddr);
> +
> +	return ioremap_encrypted(phys, size);

You are clearing C bit and then immediately remapping using encrypted mode. Also
existing code checks for C bit before calling ioremap_encrypted(). So I am not
clear why you do this.



> +}
> +
>  /*
>   * Allocates the command buffer. This buffer is per AMD IOMMU. We can
>   * write commands to that buffer later and the IOMMU will execute them
> @@ -942,8 +959,105 @@ static int iommu_init_ga_log(struct amd_iommu *iommu)
>  static int __init alloc_cwwb_sem(struct amd_iommu *iommu)
>  {
>  	iommu->cmd_sem = iommu_alloc_4k_pages(iommu, GFP_KERNEL, 1);
> +	if (!iommu->cmd_sem)
> +		return -ENOMEM;
> +	iommu->cmd_sem_paddr = iommu_virt_to_phys((void *)iommu->cmd_sem);
> +	return 0;
> +}
> +
> +static int __init remap_event_buffer(struct amd_iommu *iommu)
> +{
> +	u64 paddr;
> +
> +	pr_info_once("Re-using event buffer from the previous kernel\n");
> +	/*
> +	 * Read-back the event log base address register and apply
> +	 * PM_ADDR_MASK to obtain the event log base address.
> +	 */
> +	paddr = readq(iommu->mmio_base + MMIO_EVT_BUF_OFFSET) & PM_ADDR_MASK;
> +	iommu->evt_buf = iommu_memremap(paddr, EVT_BUFFER_SIZE);
> +
> +	return iommu->evt_buf ? 0 : -ENOMEM;
> +}
> +
> +static int __init remap_command_buffer(struct amd_iommu *iommu)
> +{
> +	u64 paddr;
> +
> +	pr_info_once("Re-using command buffer from the previous kernel\n");
> +	/*
> +	 * Read-back the command buffer base address register and apply
> +	 * PM_ADDR_MASK to obtain the command buffer base address.
> +	 */
> +	paddr = readq(iommu->mmio_base + MMIO_CMD_BUF_OFFSET) & PM_ADDR_MASK;
> +	iommu->cmd_buf = iommu_memremap(paddr, CMD_BUFFER_SIZE);
> +
> +	return iommu->cmd_buf ? 0 : -ENOMEM;
> +}
> +
> +static int __init remap_cwwb_sem(struct amd_iommu *iommu)
> +{
> +	u64 paddr;
> +
> +	if (check_feature(FEATURE_SNP)) {
> +		/*
> +		 * When SNP is enabled, the exclusion base register is used for the
> +		 * completion wait buffer (CWB) address. Read and re-use it.
> +		 */
> +		pr_info_once("Re-using CWB buffers from the previous kernel\n");
> +		/*
> +		 * Read-back the exclusion base register and apply PM_ADDR_MASK
> +		 * to obtain the exclusion range base address.
> +		 */
> +		paddr = readq(iommu->mmio_base + MMIO_EXCL_BASE_OFFSET) & PM_ADDR_MASK;
> +		iommu->cmd_sem = iommu_memremap(paddr, PAGE_SIZE);
> +		if (!iommu->cmd_sem)
> +			return -ENOMEM;
> +		iommu->cmd_sem_paddr = paddr;
> +	} else {
> +		return alloc_cwwb_sem(iommu);

I understand this one is different from the command/event buffers. But calling
the function remap_*() and then allocating memory internally is a bit odd.
It also differs from the previous functions.

> +	}
> +
> +	return 0;
> +}
> +
> +static int __init alloc_iommu_buffers(struct amd_iommu *iommu)
> +{
> +	int ret;
> +
> +	/*
> +	 * IOMMU Completion Store Base MMIO, Command Buffer Base Address MMIO
> +	 * registers are locked if SNP is enabled during kdump, reuse/remap

Redundant explanation, because the implementation is going to support the non-SNP
scenario as well.


-Vasant




* Re: [PATCH v3 3/4] crypto: ccp: Skip SNP INIT for kdump boot
  2025-07-15 19:27 ` [PATCH v3 3/4] crypto: ccp: Skip SNP INIT for kdump boot Ashish Kalra
@ 2025-07-16  9:20   ` Vasant Hegde
  2025-07-16 22:03     ` Kalra, Ashish
  0 siblings, 1 reply; 21+ messages in thread
From: Vasant Hegde @ 2025-07-16  9:20 UTC (permalink / raw)
  To: Ashish Kalra, joro, suravee.suthikulpanit, thomas.lendacky,
	Sairaj.ArunKodilkar, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm



On 7/16/2025 12:57 AM, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> If SNP is enabled and initialized in the previous kernel then SNP is
> already initialized for kdump boot and attempting SNP INIT again
> during kdump boot causes SNP INIT failure and eventually leads to
> IOMMU failures.
> 
> Skip SNP INIT if doing kdump boot.

Just double checking, do we need check for snp_rmptable_init()?

-Vasant



* Re: [PATCH v3 2/4] iommu/amd: Reuse device table for kdump
  2025-07-15 19:27 ` [PATCH v3 2/4] iommu/amd: Reuse device table " Ashish Kalra
@ 2025-07-16  9:42   ` Vasant Hegde
  2025-07-16 22:07     ` Kalra, Ashish
  0 siblings, 1 reply; 21+ messages in thread
From: Vasant Hegde @ 2025-07-16  9:42 UTC (permalink / raw)
  To: Ashish Kalra, joro, suravee.suthikulpanit, thomas.lendacky,
	Sairaj.ArunKodilkar, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm



On 7/16/2025 12:57 AM, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> After a panic if SNP is enabled in the previous kernel then the kdump
> kernel boots with IOMMU SNP enforcement still enabled.
> 
> IOMMU device table register is locked and exclusive to the previous
> kernel. Attempts to copy old device table from the previous kernel
> fail in the kdump kernel as hardware ignores writes to the locked device
> table base address register as per AMD IOMMU spec Section 2.12.2.1.
> 
> This results in repeated "Completion-Wait loop timed out" errors and a
> second kernel panic: "Kernel panic - not syncing: timer doesn't work
> through Interrupt-remapped IO-APIC".
> 
> Reuse device table instead of copying device table in case of kdump
> boot and remove all copying device table code.
> 
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  drivers/iommu/amd/init.c | 97 ++++++++++++----------------------------
>  1 file changed, 28 insertions(+), 69 deletions(-)
> 
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index 32295f26be1b..18bd869a82d9 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -406,6 +406,9 @@ static void iommu_set_device_table(struct amd_iommu *iommu)
>  
>  	BUG_ON(iommu->mmio_base == NULL);
>  
> +	if (is_kdump_kernel())

This is fine, but it's becoming too many places with the kdump check! I don't know
what the better way is here.
Is it worth keeping it like this, or should we instead add, say, IOMMU ops, so that
during init we check is_kdump_kernel() and adjust the ops?

@Joerg, any preference?
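
(A rough illustration of the ops idea, with hypothetical structure and names,
not something from the posted series:)

	struct amd_iommu_buf_ops {
		int  (*alloc_buffers)(struct amd_iommu *iommu);
		void (*free_buffers)(struct amd_iommu *iommu);
	};

	/* selected once during early init instead of sprinkling kdump checks */
	iommu->buf_ops = is_kdump_kernel() ? &kdump_buf_ops : &default_buf_ops;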


> +		return;
> +
>  	entry = iommu_virt_to_phys(dev_table);
>  	entry |= (dev_table_size >> 12) - 1;
>  	memcpy_toio(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET,
> @@ -646,7 +649,10 @@ static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg)
>  
>  static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg)
>  {
> -	iommu_free_pages(pci_seg->dev_table);
> +	if (is_kdump_kernel())
> +		memunmap((void *)pci_seg->dev_table);
> +	else
> +		iommu_free_pages(pci_seg->dev_table);
>  	pci_seg->dev_table = NULL;
>  }
>  
> @@ -1128,15 +1134,12 @@ static void set_dte_bit(struct dev_table_entry *dte, u8 bit)
>  	dte->data[i] |= (1UL << _bit);
>  }
>  
> -static bool __copy_device_table(struct amd_iommu *iommu)
> +static bool __reuse_device_table(struct amd_iommu *iommu)
>  {
> -	u64 int_ctl, int_tab_len, entry = 0;
>  	struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
> -	struct dev_table_entry *old_devtb = NULL;
> -	u32 lo, hi, devid, old_devtb_size;
> +	u32 lo, hi, old_devtb_size;
>  	phys_addr_t old_devtb_phys;
> -	u16 dom_id, dte_v, irq_v;
> -	u64 tmp;
> +	u64 entry;
>  
>  	/* Each IOMMU use separate device table with the same size */
>  	lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
> @@ -1161,66 +1164,22 @@ static bool __copy_device_table(struct amd_iommu *iommu)
>  		pr_err("The address of old device table is above 4G, not trustworthy!\n");
>  		return false;
>  	}
> -	old_devtb = (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT) && is_kdump_kernel())
> -		    ? (__force void *)ioremap_encrypted(old_devtb_phys,
> -							pci_seg->dev_table_size)
> -		    : memremap(old_devtb_phys, pci_seg->dev_table_size, MEMREMAP_WB);
> -
> -	if (!old_devtb)
> -		return false;
>  
> -	pci_seg->old_dev_tbl_cpy = iommu_alloc_pages_sz(
> -		GFP_KERNEL | GFP_DMA32, pci_seg->dev_table_size);
> +	/*
> +	 * IOMMU Device Table Base Address MMIO register is locked
> +	 * if SNP is enabled during kdump, reuse the previous kernel's
> +	 * device table.
> +	 */
> +	pci_seg->old_dev_tbl_cpy = iommu_memremap(old_devtb_phys, pci_seg->dev_table_size);
>  	if (pci_seg->old_dev_tbl_cpy == NULL) {
> -		pr_err("Failed to allocate memory for copying old device table!\n");
> -		memunmap(old_devtb);
> +		pr_err("Failed to remap memory for reusing old device table!\n");
>  		return false;
>  	}
>  
> -	for (devid = 0; devid <= pci_seg->last_bdf; ++devid) {
> -		pci_seg->old_dev_tbl_cpy[devid] = old_devtb[devid];
> -		dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
> -		dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
> -
> -		if (dte_v && dom_id) {
> -			pci_seg->old_dev_tbl_cpy[devid].data[0] = old_devtb[devid].data[0];
> -			pci_seg->old_dev_tbl_cpy[devid].data[1] = old_devtb[devid].data[1];
> -			/* Reserve the Domain IDs used by previous kernel */
> -			if (ida_alloc_range(&pdom_ids, dom_id, dom_id, GFP_ATOMIC) != dom_id) {
> -				pr_err("Failed to reserve domain ID 0x%x\n", dom_id);
> -				memunmap(old_devtb);
> -				return false;
> -			}
> -			/* If gcr3 table existed, mask it out */
> -			if (old_devtb[devid].data[0] & DTE_FLAG_GV) {
> -				tmp = (DTE_GCR3_30_15 | DTE_GCR3_51_31);
> -				pci_seg->old_dev_tbl_cpy[devid].data[1] &= ~tmp;
> -				tmp = (DTE_GCR3_14_12 | DTE_FLAG_GV);
> -				pci_seg->old_dev_tbl_cpy[devid].data[0] &= ~tmp;
> -			}
> -		}
> -
> -		irq_v = old_devtb[devid].data[2] & DTE_IRQ_REMAP_ENABLE;
> -		int_ctl = old_devtb[devid].data[2] & DTE_IRQ_REMAP_INTCTL_MASK;
> -		int_tab_len = old_devtb[devid].data[2] & DTE_INTTABLEN_MASK;
> -		if (irq_v && (int_ctl || int_tab_len)) {
> -			if ((int_ctl != DTE_IRQ_REMAP_INTCTL) ||
> -			    (int_tab_len != DTE_INTTABLEN_512 &&
> -			     int_tab_len != DTE_INTTABLEN_2K)) {
> -				pr_err("Wrong old irq remapping flag: %#x\n", devid);
> -				memunmap(old_devtb);
> -				return false;
> -			}
> -
> -			pci_seg->old_dev_tbl_cpy[devid].data[2] = old_devtb[devid].data[2];
> -		}
> -	}
> -	memunmap(old_devtb);
> -
>  	return true;
>  }
>  
> -static bool copy_device_table(void)
> +static bool reuse_device_table(void)
>  {
>  	struct amd_iommu *iommu;
>  	struct amd_iommu_pci_seg *pci_seg;
> @@ -1228,17 +1187,17 @@ static bool copy_device_table(void)
>  	if (!amd_iommu_pre_enabled)
>  		return false;
>  
> -	pr_warn("Translation is already enabled - trying to copy translation structures\n");
> +	pr_warn("Translation is already enabled - trying to reuse translation structures\n");
>  
>  	/*
>  	 * All IOMMUs within PCI segment shares common device table.
> -	 * Hence copy device table only once per PCI segment.
> +	 * Hence reuse device table only once per PCI segment.
>  	 */
>  	for_each_pci_segment(pci_seg) {
>  		for_each_iommu(iommu) {
>  			if (pci_seg->id != iommu->pci_seg->id)
>  				continue;
> -			if (!__copy_device_table(iommu))
> +			if (!__reuse_device_table(iommu))
>  				return false;
>  			break;
>  		}
> @@ -2917,8 +2876,8 @@ static void early_enable_iommu(struct amd_iommu *iommu)
>   * This function finally enables all IOMMUs found in the system after
>   * they have been initialized.
>   *
> - * Or if in kdump kernel and IOMMUs are all pre-enabled, try to copy
> - * the old content of device table entries. Not this case or copy failed,
> + * Or if in kdump kernel and IOMMUs are all pre-enabled, try to reuse
> + * the old content of device table entries. Not this case or reuse failed,
>   * just continue as normal kernel does.
>   */
>  static void early_enable_iommus(void)
> @@ -2926,18 +2885,18 @@ static void early_enable_iommus(void)
>  	struct amd_iommu *iommu;
>  	struct amd_iommu_pci_seg *pci_seg;
>  
> -	if (!copy_device_table()) {
> +	if (!reuse_device_table()) {

Hmmm. What happens if SNP is enabled and reuse_device_table() couldn't set up the
previous DTE?
In the non-SNP case it works fine, as we can rebuild a new DTE. But in the SNP case
we should fail the kdump, right?
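
(For illustration, one hypothetical shape of such a bail-out in
early_enable_iommus(); not part of the posted series:)

	if (!reuse_device_table()) {
		/*
		 * Hypothetical: with SNP enforcement still active the old
		 * device table cannot be replaced, so rebuilding a fresh
		 * one cannot work - give up on IOMMU init instead.
		 */
		if (amd_iommu_pre_enabled && check_feature(FEATURE_SNP)) {
			pr_err("SNP enabled but DEV table reuse failed\n");
			return;
		}
		...
	}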

-Vasant




* Re: [PATCH v3 4/4] iommu/amd: Fix host kdump support for SNP
  2025-07-15 19:27 ` [PATCH v3 4/4] iommu/amd: Fix host kdump support for SNP Ashish Kalra
@ 2025-07-16  9:46   ` Vasant Hegde
  2025-07-16 22:12     ` Kalra, Ashish
  0 siblings, 1 reply; 21+ messages in thread
From: Vasant Hegde @ 2025-07-16  9:46 UTC (permalink / raw)
  To: Ashish Kalra, joro, suravee.suthikulpanit, thomas.lendacky,
	Sairaj.ArunKodilkar, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm



On 7/16/2025 12:57 AM, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> When a crash is triggered the kernel attempts to shut down SEV-SNP
> using the SNP_SHUTDOWN_EX command. If active SNP VMs are present,
> SNP_SHUTDOWN_EX fails as firmware checks all encryption-capable ASIDs
> to ensure none are in use and that a DF_FLUSH is not required. If a
> DF_FLUSH is required, the firmware returns DFFLUSH_REQUIRED, causing
> SNP_SHUTDOWN_EX to fail.
> 
> This causes the kdump kernel to boot with IOMMU SNP enforcement still
> enabled and IOMMU completion wait buffers (CWBs), command buffers,
> device tables and event buffer registers remain locked and exclusive
> to the previous kernel. Attempts to allocate and use new buffers in
> the kdump kernel fail, as the hardware ignores writes to the locked
> MMIO registers (per AMD IOMMU spec Section 2.12.2.1).
> 
> As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
> remapping which is required for proper operation.
> 
> This results in repeated "Completion-Wait loop timed out" errors and a
> second kernel panic: "Kernel panic - not syncing: timer doesn't work
> through Interrupt-remapped IO-APIC"
> 
> The following MMIO registers are locked and ignore writes after failed
> SNP shutdown:
> Device Table Base Address Register
> Command Buffer Base Address Register
> Event Buffer Base Address Register
> Completion Store Base Register/Exclusion Base Register
> Completion Store Limit Register/Exclusion Range Limit Register
> 

Maybe you can rephrase the description, as the first patch covered some of these
details.

> Instead of allocating new buffers, re-use the previous kernel’s pages
> for completion wait buffers, command buffers, event buffers and device
> tables and operate with the already enabled SNP configuration and
> existing data structures.
> 
> This approach is now used for kdump boot regardless of whether SNP is
> enabled during kdump.
> 
> The fix enables successful crashkernel/kdump operation on SNP hosts
> even when SNP_SHUTDOWN_EX fails.
> 
> Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")

I am not sure why you have marked only this patch with a Fixes tag. Also, it won't
fix kdump if someone just backports only this patch, right?

-Vasant




* Re: [PATCH v3 1/4] iommu/amd: Add support to remap/unmap IOMMU buffers for kdump
  2025-07-16  9:19   ` Vasant Hegde
@ 2025-07-16 21:55     ` Kalra, Ashish
  2025-07-17  7:05       ` Vasant Hegde
  0 siblings, 1 reply; 21+ messages in thread
From: Kalra, Ashish @ 2025-07-16 21:55 UTC (permalink / raw)
  To: Vasant Hegde, joro, suravee.suthikulpanit, thomas.lendacky,
	Sairaj.ArunKodilkar, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm

Hello Vasant,

On 7/16/2025 4:19 AM, Vasant Hegde wrote:
> Hi Ashish,
> 
> 
> On 7/16/2025 12:56 AM, Ashish Kalra wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>>
>> After a panic if SNP is enabled in the previous kernel then the kdump
>> kernel boots with IOMMU SNP enforcement still enabled.
>>
>> IOMMU completion wait buffers (CWBs), command buffers and event buffer
>> registers remain locked and exclusive to the previous kernel. Attempts
>> to allocate and use new buffers in the kdump kernel fail, as hardware
>> ignores writes to the locked MMIO registers as per AMD IOMMU spec
>> Section 2.12.2.1.
>>
>> This results in repeated "Completion-Wait loop timed out" errors and a
>> second kernel panic: "Kernel panic - not syncing: timer doesn't work
>> through Interrupt-remapped IO-APIC"
>>
>> The following MMIO registers are locked and ignore writes after failed
>> SNP shutdown:
>> Command Buffer Base Address Register
>> Event Log Base Address Register
>> Completion Store Base Register/Exclusion Base Register
>> Completion Store Limit Register/Exclusion Limit Register
>> As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
>> remapping, which is required for proper operation.
> 
> There are a couple of other registers in the locked list. Can you please rephrase
> the above paragraphs? Also, you don't need to call out individual registers here. You
> can just add a link to the IOMMU spec.

Yes, I will drop listing the individual registers here and just provide the link
to the IOMMU specs.

> 
> Unrelated to this patch:
>   I went through some of the SNP-related code in the IOMMU driver. One thing that
> confused me: in the amd_iommu_snp_disable() code, why is the command buffer not
> marked as shared? Any idea?
> 

Yes that's interesting. 

This is as per the SNP Firmware ABI specs: 

from SNP_INIT_EX: 

The firmware initializes the IOMMU to perform RMP enforcement. The firmware also transitions
the event log, PPR log, and completion wait buffers of the IOMMU to an RMP page state that is 
read only to the hypervisor and cannot be assigned to guests

So during SNP_SHUTDOWN_EX, these same buffers are transitioned back to the shared state.

But I will investigate deeper and check why the command buffer is not marked as FW/Reclaim
state by the firmware.

> 
>>
>> Reuse the pages of the previous kernel for completion wait buffers,
>> command buffers, event buffers and memremap them during kdump boot
>> and essentially work with an already enabled IOMMU configuration,
>> re-using the previous kernel's data structures.
>>
>> Reusing of command buffers and event buffers is now done for kdump boot
>> irrespective of SNP being enabled during kdump.
>>
>> Re-use of completion wait buffers is only done when SNP is enabled as
>> the exclusion base register is used for the completion wait buffer
>> (CWB) address only when SNP is enabled.
>>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> ---
>>  drivers/iommu/amd/amd_iommu_types.h |   5 +
>>  drivers/iommu/amd/init.c            | 163 ++++++++++++++++++++++++++--
>>  drivers/iommu/amd/iommu.c           |   2 +-
>>  3 files changed, 157 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
>> index 9b64cd706c96..082eb1270818 100644
>> --- a/drivers/iommu/amd/amd_iommu_types.h
>> +++ b/drivers/iommu/amd/amd_iommu_types.h
>> @@ -791,6 +791,11 @@ struct amd_iommu {
>>  	u32 flags;
>>  	volatile u64 *cmd_sem;
>>  	atomic64_t cmd_sem_val;
>> +	/*
>> +	 * Track physical address to directly use it in build_completion_wait()
>> +	 * and avoid adding any special checks and handling for kdump.
>> +	 */
>> +	u64 cmd_sem_paddr;
> 
> With this we are tracking both physical and virtual address? Is that really
> needed? Can we just track PA and convert it into va?
>

I believe it is simpler to keep/track cmd_sem and use it directly, instead of doing
phys_to_virt() calls every time before using it.
 
>>  
>>  #ifdef CONFIG_AMD_IOMMU_DEBUGFS
>>  	/* DebugFS Info */
>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>> index cadb2c735ffc..32295f26be1b 100644
>> --- a/drivers/iommu/amd/init.c
>> +++ b/drivers/iommu/amd/init.c
>> @@ -710,6 +710,23 @@ static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
>>  	pci_seg->alias_table = NULL;
>>  }
>>  
>> +static inline void *iommu_memremap(unsigned long paddr, size_t size)
>> +{
>> +	phys_addr_t phys;
>> +
>> +	if (!paddr)
>> +		return NULL;
>> +
>> +	/*
>> +	 * Obtain true physical address in kdump kernel when SME is enabled.
>> +	 * Currently, IOMMU driver does not support booting into an unencrypted
>> +	 * kdump kernel.
> 
> You mean production kernel w/ SME and kdump kernel with non-SME is not supported?
> 

Yes. 

> 
>> +	 */
>> +	phys = __sme_clr(paddr);
>> +
>> +	return ioremap_encrypted(phys, size);
> 
> You are clearing C bit and then immediately remapping using encrypted mode. Also
> existing code checks for C bit before calling ioremap_encrypted(). So I am not
> clear why you do this.
> 
>

We need to clear the C-bit to get the correct physical address for remapping.

Which existing code checks for C-bit before calling ioremap_encrypted() ?

After getting the correct physical address we call ioremap_encrypted(), which
maps it with the C-bit set if SME is enabled, or else maps it
without the C-bit (so it handles both the SME and non-SME cases).
 
Earlier we used to check for the CC_ATTR_HOST_MEM_ENCRYPT flag and, if set,
call ioremap_encrypted() or otherwise call memremap(); but, as mentioned
above, ioremap_encrypted() works for both cases - SME or non-SME - hence
we use that approach.

> 
>> +}
>> +
>>  /*
>>   * Allocates the command buffer. This buffer is per AMD IOMMU. We can
>>   * write commands to that buffer later and the IOMMU will execute them
>> @@ -942,8 +959,105 @@ static int iommu_init_ga_log(struct amd_iommu *iommu)
>>  static int __init alloc_cwwb_sem(struct amd_iommu *iommu)
>>  {
>>  	iommu->cmd_sem = iommu_alloc_4k_pages(iommu, GFP_KERNEL, 1);
>> +	if (!iommu->cmd_sem)
>> +		return -ENOMEM;
>> +	iommu->cmd_sem_paddr = iommu_virt_to_phys((void *)iommu->cmd_sem);
>> +	return 0;
>> +}
>> +
>> +static int __init remap_event_buffer(struct amd_iommu *iommu)
>> +{
>> +	u64 paddr;
>> +
>> +	pr_info_once("Re-using event buffer from the previous kernel\n");
>> +	/*
>> +	 * Read-back the event log base address register and apply
>> +	 * PM_ADDR_MASK to obtain the event log base address.
>> +	 */
>> +	paddr = readq(iommu->mmio_base + MMIO_EVT_BUF_OFFSET) & PM_ADDR_MASK;
>> +	iommu->evt_buf = iommu_memremap(paddr, EVT_BUFFER_SIZE);
>> +
>> +	return iommu->evt_buf ? 0 : -ENOMEM;
>> +}
>> +
>> +static int __init remap_command_buffer(struct amd_iommu *iommu)
>> +{
>> +	u64 paddr;
>> +
>> +	pr_info_once("Re-using command buffer from the previous kernel\n");
>> +	/*
>> +	 * Read-back the command buffer base address register and apply
>> +	 * PM_ADDR_MASK to obtain the command buffer base address.
>> +	 */
>> +	paddr = readq(iommu->mmio_base + MMIO_CMD_BUF_OFFSET) & PM_ADDR_MASK;
>> +	iommu->cmd_buf = iommu_memremap(paddr, CMD_BUFFER_SIZE);
>> +
>> +	return iommu->cmd_buf ? 0 : -ENOMEM;
>> +}
>> +
>> +static int __init remap_cwwb_sem(struct amd_iommu *iommu)
>> +{
>> +	u64 paddr;
>> +
>> +	if (check_feature(FEATURE_SNP)) {
>> +		/*
>> +		 * When SNP is enabled, the exclusion base register is used for the
>> +		 * completion wait buffer (CWB) address. Read and re-use it.
>> +		 */
>> +		pr_info_once("Re-using CWB buffers from the previous kernel\n");
>> +		/*
>> +		 * Read-back the exclusion base register and apply PM_ADDR_MASK
>> +		 * to obtain the exclusion range base address.
>> +		 */
>> +		paddr = readq(iommu->mmio_base + MMIO_EXCL_BASE_OFFSET) & PM_ADDR_MASK;
>> +		iommu->cmd_sem = iommu_memremap(paddr, PAGE_SIZE);
>> +		if (!iommu->cmd_sem)
>> +			return -ENOMEM;
>> +		iommu->cmd_sem_paddr = paddr;
>> +	} else {
>> +		return alloc_cwwb_sem(iommu);
> 
> I understand this one is different from the command/event buffers. But calling
> the function remap_*() and then allocating memory internally is a bit odd.
> It also differs from the previous functions.
> 

Yes, I agree, but then what do we name it?

remap_or_alloc_cwb_sem() - does that sound OK?
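
(Roughly this kind of minimal wrapper, assuming remap_cwwb_sem() is trimmed down
to the remap-only path; sketch only:)

	static int __init remap_or_alloc_cwb_sem(struct amd_iommu *iommu)
	{
		/* SNP keeps the CWB address in the (locked) exclusion base register */
		if (check_feature(FEATURE_SNP))
			return remap_cwwb_sem(iommu);

		return alloc_cwwb_sem(iommu);
	}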

Thanks,
Ashish

>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int __init alloc_iommu_buffers(struct amd_iommu *iommu)
>> +{
>> +	int ret;
>> +
>> +	/*
>> +	 * IOMMU Completion Store Base MMIO, Command Buffer Base Address MMIO
>> +	 * registers are locked if SNP is enabled during kdump, reuse/remap
> 
> Redundant explanation, because the implementation is going to support the non-SNP
> scenario as well.
> 
> 
> -Vasant
> 
> 



* Re: [PATCH v3 3/4] crypto: ccp: Skip SNP INIT for kdump boot
  2025-07-16  9:20   ` Vasant Hegde
@ 2025-07-16 22:03     ` Kalra, Ashish
  2025-07-17  5:56       ` Vasant Hegde
  0 siblings, 1 reply; 21+ messages in thread
From: Kalra, Ashish @ 2025-07-16 22:03 UTC (permalink / raw)
  To: Vasant Hegde, joro, suravee.suthikulpanit, thomas.lendacky,
	Sairaj.ArunKodilkar, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm

Hello Vasant,

On 7/16/2025 4:20 AM, Vasant Hegde wrote:
> 
> 
> On 7/16/2025 12:57 AM, Ashish Kalra wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>>
>> If SNP is enabled and initialized in the previous kernel then SNP is
>> already initialized for kdump boot and attempting SNP INIT again
>> during kdump boot causes SNP INIT failure and eventually leads to
>> IOMMU failures.
>>
>> Skip SNP INIT if doing kdump boot.
> 
> Just double checking, do we need check for snp_rmptable_init()?
> 

Do you mean adding this check in snp_rmptable_init()?

We already have a check there for kexec boot: 

snp_rmptable_init()
{
...
...
        /*
         * Check if SEV-SNP is already enabled, this can happen in case of
         * kexec boot.
         */
        rdmsrq(MSR_AMD64_SYSCFG, val);
        if (val & MSR_AMD64_SYSCFG_SNP_EN)
                goto skip_enable;

And we still have to map the RMP table in the kernel, as SNP is still enabled
and initialized in this case for kdump boot, so that part of
snp_rmptable_init() is still required.

Additionally, for this patch I also have to skip SEV INIT, similar to what we
are doing for SNP INIT, as we get SEV INIT failure warnings because SEV is
also initialized during this kdump boot, just like SNP.

So I will move this check and skip to _sev_platform_init_locked() to handle
both the SEV and SNP INIT cases.
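Roughly along these lines (a sketch only; __sev_and_snp_init_locked() below
is a stand-in name for the existing init flow, not a real function):

static int __sev_and_snp_init_locked(struct sev_platform_init_args *args); /* stand-in */

static int _sev_platform_init_locked(struct sev_platform_init_args *args)
{
        /*
         * On a kdump boot the previous kernel has already issued SNP_INIT_EX
         * and SEV INIT; issuing them again only produces firmware failures,
         * so treat the platform as already initialized.
         */
        if (is_kdump_kernel())
                return 0;

        /* Existing SNP INIT + SEV INIT sequence, unchanged. */
        return __sev_and_snp_init_locked(args);
}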

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/4] iommu/amd: Reuse device table for kdump
  2025-07-16  9:42   ` Vasant Hegde
@ 2025-07-16 22:07     ` Kalra, Ashish
  2025-07-17  5:38       ` Sairaj Kodilkar
  2025-07-17  6:05       ` Vasant Hegde
  0 siblings, 2 replies; 21+ messages in thread
From: Kalra, Ashish @ 2025-07-16 22:07 UTC (permalink / raw)
  To: Vasant Hegde, joro, suravee.suthikulpanit, thomas.lendacky,
	Sairaj.ArunKodilkar, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm

Hello Vasant,

On 7/16/2025 4:42 AM, Vasant Hegde wrote:
> 
> 
> On 7/16/2025 12:57 AM, Ashish Kalra wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>>
>> After a panic if SNP is enabled in the previous kernel then the kdump
>> kernel boots with IOMMU SNP enforcement still enabled.
>>
>> IOMMU device table register is locked and exclusive to the previous
>> kernel. Attempts to copy old device table from the previous kernel
>> fails in kdump kernel as hardware ignores writes to the locked device
>> table base address register as per AMD IOMMU spec Section 2.12.2.1.
>>
>> This results in repeated "Completion-Wait loop timed out" errors and a
>> second kernel panic: "Kernel panic - not syncing: timer doesn't work
>> through Interrupt-remapped IO-APIC".
>>
>> Reuse device table instead of copying device table in case of kdump
>> boot and remove all copying device table code.
>>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> ---
>>  drivers/iommu/amd/init.c | 97 ++++++++++++----------------------------
>>  1 file changed, 28 insertions(+), 69 deletions(-)
>>
>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>> index 32295f26be1b..18bd869a82d9 100644
>> --- a/drivers/iommu/amd/init.c
>> +++ b/drivers/iommu/amd/init.c
>> @@ -406,6 +406,9 @@ static void iommu_set_device_table(struct amd_iommu *iommu)
>>  
>>  	BUG_ON(iommu->mmio_base == NULL);
>>  
>> +	if (is_kdump_kernel())
> 
> This is fine.. but its becoming too many places with kdump check! I don't know
> what is the better way here.
> Is it worth to keep it like this -OR- add say iommu ops that way during init we
> check is_kdump_kernel() and adjust the ops ?
> 
> @Joerg, any preference?
> 
> 
>> +		return;
>> +
>>  	entry = iommu_virt_to_phys(dev_table);
>>  	entry |= (dev_table_size >> 12) - 1;
>>  	memcpy_toio(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET,
>> @@ -646,7 +649,10 @@ static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg)
>>  
>>  static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg)
>>  {
>> -	iommu_free_pages(pci_seg->dev_table);
>> +	if (is_kdump_kernel())
>> +		memunmap((void *)pci_seg->dev_table);
>> +	else
>> +		iommu_free_pages(pci_seg->dev_table);
>>  	pci_seg->dev_table = NULL;
>>  }
>>  
>> @@ -1128,15 +1134,12 @@ static void set_dte_bit(struct dev_table_entry *dte, u8 bit)
>>  	dte->data[i] |= (1UL << _bit);
>>  }
>>  
>> -static bool __copy_device_table(struct amd_iommu *iommu)
>> +static bool __reuse_device_table(struct amd_iommu *iommu)
>>  {
>> -	u64 int_ctl, int_tab_len, entry = 0;
>>  	struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
>> -	struct dev_table_entry *old_devtb = NULL;
>> -	u32 lo, hi, devid, old_devtb_size;
>> +	u32 lo, hi, old_devtb_size;
>>  	phys_addr_t old_devtb_phys;
>> -	u16 dom_id, dte_v, irq_v;
>> -	u64 tmp;
>> +	u64 entry;
>>  
>>  	/* Each IOMMU use separate device table with the same size */
>>  	lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
>> @@ -1161,66 +1164,22 @@ static bool __copy_device_table(struct amd_iommu *iommu)
>>  		pr_err("The address of old device table is above 4G, not trustworthy!\n");
>>  		return false;
>>  	}
>> -	old_devtb = (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT) && is_kdump_kernel())
>> -		    ? (__force void *)ioremap_encrypted(old_devtb_phys,
>> -							pci_seg->dev_table_size)
>> -		    : memremap(old_devtb_phys, pci_seg->dev_table_size, MEMREMAP_WB);
>> -
>> -	if (!old_devtb)
>> -		return false;
>>  
>> -	pci_seg->old_dev_tbl_cpy = iommu_alloc_pages_sz(
>> -		GFP_KERNEL | GFP_DMA32, pci_seg->dev_table_size);
>> +	/*
>> +	 * IOMMU Device Table Base Address MMIO register is locked
>> +	 * if SNP is enabled during kdump, reuse the previous kernel's
>> +	 * device table.
>> +	 */
>> +	pci_seg->old_dev_tbl_cpy = iommu_memremap(old_devtb_phys, pci_seg->dev_table_size);
>>  	if (pci_seg->old_dev_tbl_cpy == NULL) {
>> -		pr_err("Failed to allocate memory for copying old device table!\n");
>> -		memunmap(old_devtb);
>> +		pr_err("Failed to remap memory for reusing old device table!\n");
>>  		return false;
>>  	}
>>  
>> -	for (devid = 0; devid <= pci_seg->last_bdf; ++devid) {
>> -		pci_seg->old_dev_tbl_cpy[devid] = old_devtb[devid];
>> -		dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
>> -		dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
>> -
>> -		if (dte_v && dom_id) {
>> -			pci_seg->old_dev_tbl_cpy[devid].data[0] = old_devtb[devid].data[0];
>> -			pci_seg->old_dev_tbl_cpy[devid].data[1] = old_devtb[devid].data[1];
>> -			/* Reserve the Domain IDs used by previous kernel */
>> -			if (ida_alloc_range(&pdom_ids, dom_id, dom_id, GFP_ATOMIC) != dom_id) {
>> -				pr_err("Failed to reserve domain ID 0x%x\n", dom_id);
>> -				memunmap(old_devtb);
>> -				return false;
>> -			}
>> -			/* If gcr3 table existed, mask it out */
>> -			if (old_devtb[devid].data[0] & DTE_FLAG_GV) {
>> -				tmp = (DTE_GCR3_30_15 | DTE_GCR3_51_31);
>> -				pci_seg->old_dev_tbl_cpy[devid].data[1] &= ~tmp;
>> -				tmp = (DTE_GCR3_14_12 | DTE_FLAG_GV);
>> -				pci_seg->old_dev_tbl_cpy[devid].data[0] &= ~tmp;
>> -			}
>> -		}
>> -
>> -		irq_v = old_devtb[devid].data[2] & DTE_IRQ_REMAP_ENABLE;
>> -		int_ctl = old_devtb[devid].data[2] & DTE_IRQ_REMAP_INTCTL_MASK;
>> -		int_tab_len = old_devtb[devid].data[2] & DTE_INTTABLEN_MASK;
>> -		if (irq_v && (int_ctl || int_tab_len)) {
>> -			if ((int_ctl != DTE_IRQ_REMAP_INTCTL) ||
>> -			    (int_tab_len != DTE_INTTABLEN_512 &&
>> -			     int_tab_len != DTE_INTTABLEN_2K)) {
>> -				pr_err("Wrong old irq remapping flag: %#x\n", devid);
>> -				memunmap(old_devtb);
>> -				return false;
>> -			}
>> -
>> -			pci_seg->old_dev_tbl_cpy[devid].data[2] = old_devtb[devid].data[2];
>> -		}
>> -	}
>> -	memunmap(old_devtb);
>> -
>>  	return true;
>>  }
>>  
>> -static bool copy_device_table(void)
>> +static bool reuse_device_table(void)
>>  {
>>  	struct amd_iommu *iommu;
>>  	struct amd_iommu_pci_seg *pci_seg;
>> @@ -1228,17 +1187,17 @@ static bool copy_device_table(void)
>>  	if (!amd_iommu_pre_enabled)
>>  		return false;
>>  
>> -	pr_warn("Translation is already enabled - trying to copy translation structures\n");
>> +	pr_warn("Translation is already enabled - trying to reuse translation structures\n");
>>  
>>  	/*
>>  	 * All IOMMUs within PCI segment shares common device table.
>> -	 * Hence copy device table only once per PCI segment.
>> +	 * Hence reuse device table only once per PCI segment.
>>  	 */
>>  	for_each_pci_segment(pci_seg) {
>>  		for_each_iommu(iommu) {
>>  			if (pci_seg->id != iommu->pci_seg->id)
>>  				continue;
>> -			if (!__copy_device_table(iommu))
>> +			if (!__reuse_device_table(iommu))
>>  				return false;
>>  			break;
>>  		}
>> @@ -2917,8 +2876,8 @@ static void early_enable_iommu(struct amd_iommu *iommu)
>>   * This function finally enables all IOMMUs found in the system after
>>   * they have been initialized.
>>   *
>> - * Or if in kdump kernel and IOMMUs are all pre-enabled, try to copy
>> - * the old content of device table entries. Not this case or copy failed,
>> + * Or if in kdump kernel and IOMMUs are all pre-enabled, try to reuse
>> + * the old content of device table entries. Not this case or reuse failed,
>>   * just continue as normal kernel does.
>>   */
>>  static void early_enable_iommus(void)
>> @@ -2926,18 +2885,18 @@ static void early_enable_iommus(void)
>>  	struct amd_iommu *iommu;
>>  	struct amd_iommu_pci_seg *pci_seg;
>>  
>> -	if (!copy_device_table()) {
>> +	if (!reuse_device_table()) {
> 
> Hmmm. What happens if SNP enabled and reuse_device_table() couldn't setup
> previous DTE?
> In non-SNP case it works fine as we can rebuild new DTE. But in SNP case we
> should fail the kdump right?
> 

That will happen automatically: if we can't set up the previous DTEs for the
SNP case, then IOMMU commands will time out and subsequently cause a panic,
as IRQ remapping won't be set up.

So this is as good as failing the kdump, which will have the same result.

Thanks,
Ashish


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 4/4] iommu/amd: Fix host kdump support for SNP
  2025-07-16  9:46   ` Vasant Hegde
@ 2025-07-16 22:12     ` Kalra, Ashish
  2025-07-17  6:22       ` Vasant Hegde
  0 siblings, 1 reply; 21+ messages in thread
From: Kalra, Ashish @ 2025-07-16 22:12 UTC (permalink / raw)
  To: Vasant Hegde, joro, suravee.suthikulpanit, thomas.lendacky,
	Sairaj.ArunKodilkar, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm

Hello Vasant,

On 7/16/2025 4:46 AM, Vasant Hegde wrote:
> 
> 
> On 7/16/2025 12:57 AM, Ashish Kalra wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>>
>> When a crash is triggered the kernel attempts to shut down SEV-SNP
>> using the SNP_SHUTDOWN_EX command. If active SNP VMs are present,
>> SNP_SHUTDOWN_EX fails as firmware checks all encryption-capable ASIDs
>> to ensure none are in use and that a DF_FLUSH is not required. If a
>> DF_FLUSH is required, the firmware returns DFFLUSH_REQUIRED, causing
>> SNP_SHUTDOWN_EX to fail.
>>
>> This casues the kdump kernel to boot with IOMMU SNP enforcement still
>> enabled and IOMMU completion wait buffers (CWBs), command buffers,
>> device tables and event buffer registers remain locked and exclusive
>> to the previous kernel. Attempts to allocate and use new buffers in
>> the kdump kernel fail, as the hardware ignores writes to the locked
>> MMIO registers (per AMD IOMMU spec Section 2.12.2.1).
>>
>> As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
>> remapping which is required for proper operation.
>>
>> This results in repeated "Completion-Wait loop timed out" errors and a
>> second kernel panic: "Kernel panic - not syncing: timer doesn't work
>> through Interrupt-remapped IO-APIC"
>>
>> The following MMIO registers are locked and ignore writes after failed
>> SNP shutdown:
>> Device Table Base Address Register
>> Command Buffer Base Address Register
>> Event Buffer Base Address Register
>> Completion Store Base Register/Exclusion Base Register
>> Completion Store Limit Register/Exclusion Range Limit Register
>>
> 
> May be you can rephrase the description as first patch covered some of these
> details

We do need to include the complete description here, as this is the final
patch of the series which fixes the kdump boot.

Do note that the description in the first patch only mentions the
IOMMU buffers - command, CWB and event buffers - for reuse, while this commit
log covers all of the reusing and remapping required - IOMMU buffers, device
table, etc.
 
>> Instead of allocating new buffers, re-use the previous kernel’s pages
>> for completion wait buffers, command buffers, event buffers and device
>> tables and operate with the already enabled SNP configuration and
>> existing data structures.
>>
>> This approach is now used for kdump boot regardless of whether SNP is
>> enabled during kdump.
>>
>> The fix enables successful crashkernel/kdump operation on SNP hosts
>> even when SNP_SHUTDOWN_EX fails.
>>
>> Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")
> 
> I am not sure why you have marked only this patch as Fixes? Also it won't fix
> the kdump if someone just backports only this patch right?
> 

As mentioned in the cover letter, this is the final patch of the series which
actually fixes the SNP kdump boot, so I kept the Fixes: tag as part of this patch.

I am not sure if I can add a Fixes: tag to all four patches in this series?

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/4] iommu/amd: Reuse device table for kdump
  2025-07-16 22:07     ` Kalra, Ashish
@ 2025-07-17  5:38       ` Sairaj Kodilkar
  2025-07-17  6:05       ` Vasant Hegde
  1 sibling, 0 replies; 21+ messages in thread
From: Sairaj Kodilkar @ 2025-07-17  5:38 UTC (permalink / raw)
  To: Kalra, Ashish, Vasant Hegde, joro, suravee.suthikulpanit,
	thomas.lendacky, Sairaj.ArunKodilkar, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm



On 7/17/2025 3:37 AM, Kalra, Ashish wrote:
> Hello Vasant,
> 
> On 7/16/2025 4:42 AM, Vasant Hegde wrote:
>>
>>
>> On 7/16/2025 12:57 AM, Ashish Kalra wrote:
>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>
>>> After a panic if SNP is enabled in the previous kernel then the kdump
>>> kernel boots with IOMMU SNP enforcement still enabled.
>>>
>>> IOMMU device table register is locked and exclusive to the previous
>>> kernel. Attempts to copy old device table from the previous kernel
>>> fails in kdump kernel as hardware ignores writes to the locked device
>>> table base address register as per AMD IOMMU spec Section 2.12.2.1.
>>>
>>> This results in repeated "Completion-Wait loop timed out" errors and a
>>> second kernel panic: "Kernel panic - not syncing: timer doesn't work
>>> through Interrupt-remapped IO-APIC".
>>>
>>> Reuse device table instead of copying device table in case of kdump
>>> boot and remove all copying device table code.
>>>
>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>> ---
>>>   drivers/iommu/amd/init.c | 97 ++++++++++++----------------------------
>>>   1 file changed, 28 insertions(+), 69 deletions(-)
>>>
>>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>>> index 32295f26be1b..18bd869a82d9 100644
>>> --- a/drivers/iommu/amd/init.c
>>> +++ b/drivers/iommu/amd/init.c
>>> @@ -406,6 +406,9 @@ static void iommu_set_device_table(struct amd_iommu *iommu)
>>>   
>>>   	BUG_ON(iommu->mmio_base == NULL);
>>>   
>>> +	if (is_kdump_kernel())
>>
>> This is fine.. but its becoming too many places with kdump check! I don't know
>> what is the better way here.
>> Is it worth to keep it like this -OR- add say iommu ops that way during init we
>> check is_kdump_kernel() and adjust the ops ?
>>
>> @Joerg, any preference?
>>
>>
>>> +		return;
>>> +
>>>   	entry = iommu_virt_to_phys(dev_table);
>>>   	entry |= (dev_table_size >> 12) - 1;
>>>   	memcpy_toio(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET,
>>> @@ -646,7 +649,10 @@ static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg)
>>>   
>>>   static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg)
>>>   {
>>> -	iommu_free_pages(pci_seg->dev_table);
>>> +	if (is_kdump_kernel())
>>> +		memunmap((void *)pci_seg->dev_table);
>>> +	else
>>> +		iommu_free_pages(pci_seg->dev_table);
>>>   	pci_seg->dev_table = NULL;
>>>   }
>>>   
>>> @@ -1128,15 +1134,12 @@ static void set_dte_bit(struct dev_table_entry *dte, u8 bit)
>>>   	dte->data[i] |= (1UL << _bit);
>>>   }
>>>   
>>> -static bool __copy_device_table(struct amd_iommu *iommu)
>>> +static bool __reuse_device_table(struct amd_iommu *iommu)
>>>   {
>>> -	u64 int_ctl, int_tab_len, entry = 0;
>>>   	struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
>>> -	struct dev_table_entry *old_devtb = NULL;
>>> -	u32 lo, hi, devid, old_devtb_size;
>>> +	u32 lo, hi, old_devtb_size;
>>>   	phys_addr_t old_devtb_phys;
>>> -	u16 dom_id, dte_v, irq_v;
>>> -	u64 tmp;
>>> +	u64 entry;
>>>   
>>>   	/* Each IOMMU use separate device table with the same size */
>>>   	lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
>>> @@ -1161,66 +1164,22 @@ static bool __copy_device_table(struct amd_iommu *iommu)
>>>   		pr_err("The address of old device table is above 4G, not trustworthy!\n");
>>>   		return false;
>>>   	}
>>> -	old_devtb = (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT) && is_kdump_kernel())
>>> -		    ? (__force void *)ioremap_encrypted(old_devtb_phys,
>>> -							pci_seg->dev_table_size)
>>> -		    : memremap(old_devtb_phys, pci_seg->dev_table_size, MEMREMAP_WB);
>>> -
>>> -	if (!old_devtb)
>>> -		return false;
>>>   
>>> -	pci_seg->old_dev_tbl_cpy = iommu_alloc_pages_sz(
>>> -		GFP_KERNEL | GFP_DMA32, pci_seg->dev_table_size);
>>> +	/*
>>> +	 * IOMMU Device Table Base Address MMIO register is locked
>>> +	 * if SNP is enabled during kdump, reuse the previous kernel's
>>> +	 * device table.
>>> +	 */
>>> +	pci_seg->old_dev_tbl_cpy = iommu_memremap(old_devtb_phys, pci_seg->dev_table_size);
>>>   	if (pci_seg->old_dev_tbl_cpy == NULL) {
>>> -		pr_err("Failed to allocate memory for copying old device table!\n");
>>> -		memunmap(old_devtb);
>>> +		pr_err("Failed to remap memory for reusing old device table!\n");
>>>   		return false;
>>>   	}
>>>   
>>> -	for (devid = 0; devid <= pci_seg->last_bdf; ++devid) {
>>> -		pci_seg->old_dev_tbl_cpy[devid] = old_devtb[devid];
>>> -		dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
>>> -		dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
>>> -
>>> -		if (dte_v && dom_id) {
>>> -			pci_seg->old_dev_tbl_cpy[devid].data[0] = old_devtb[devid].data[0];
>>> -			pci_seg->old_dev_tbl_cpy[devid].data[1] = old_devtb[devid].data[1];
>>> -			/* Reserve the Domain IDs used by previous kernel */
>>> -			if (ida_alloc_range(&pdom_ids, dom_id, dom_id, GFP_ATOMIC) != dom_id) {
>>> -				pr_err("Failed to reserve domain ID 0x%x\n", dom_id);
>>> -				memunmap(old_devtb);
>>> -				return false;
>>> -			}
>>> -			/* If gcr3 table existed, mask it out */
>>> -			if (old_devtb[devid].data[0] & DTE_FLAG_GV) {
>>> -				tmp = (DTE_GCR3_30_15 | DTE_GCR3_51_31);
>>> -				pci_seg->old_dev_tbl_cpy[devid].data[1] &= ~tmp;
>>> -				tmp = (DTE_GCR3_14_12 | DTE_FLAG_GV);
>>> -				pci_seg->old_dev_tbl_cpy[devid].data[0] &= ~tmp;
>>> -			}
>>> -		}
>>> -
>>> -		irq_v = old_devtb[devid].data[2] & DTE_IRQ_REMAP_ENABLE;
>>> -		int_ctl = old_devtb[devid].data[2] & DTE_IRQ_REMAP_INTCTL_MASK;
>>> -		int_tab_len = old_devtb[devid].data[2] & DTE_INTTABLEN_MASK;
>>> -		if (irq_v && (int_ctl || int_tab_len)) {
>>> -			if ((int_ctl != DTE_IRQ_REMAP_INTCTL) ||
>>> -			    (int_tab_len != DTE_INTTABLEN_512 &&
>>> -			     int_tab_len != DTE_INTTABLEN_2K)) {
>>> -				pr_err("Wrong old irq remapping flag: %#x\n", devid);
>>> -				memunmap(old_devtb);
>>> -				return false;
>>> -			}
>>> -
>>> -			pci_seg->old_dev_tbl_cpy[devid].data[2] = old_devtb[devid].data[2];
>>> -		}
>>> -	}
>>> -	memunmap(old_devtb);
>>> -
>>>   	return true;
>>>   }
>>>   
>>> -static bool copy_device_table(void)
>>> +static bool reuse_device_table(void)
>>>   {
>>>   	struct amd_iommu *iommu;
>>>   	struct amd_iommu_pci_seg *pci_seg;
>>> @@ -1228,17 +1187,17 @@ static bool copy_device_table(void)
>>>   	if (!amd_iommu_pre_enabled)
>>>   		return false;
>>>   
>>> -	pr_warn("Translation is already enabled - trying to copy translation structures\n");
>>> +	pr_warn("Translation is already enabled - trying to reuse translation structures\n");
>>>   
>>>   	/*
>>>   	 * All IOMMUs within PCI segment shares common device table.
>>> -	 * Hence copy device table only once per PCI segment.
>>> +	 * Hence reuse device table only once per PCI segment.
>>>   	 */
>>>   	for_each_pci_segment(pci_seg) {
>>>   		for_each_iommu(iommu) {
>>>   			if (pci_seg->id != iommu->pci_seg->id)
>>>   				continue;
>>> -			if (!__copy_device_table(iommu))
>>> +			if (!__reuse_device_table(iommu))
>>>   				return false;
>>>   			break;
>>>   		}
>>> @@ -2917,8 +2876,8 @@ static void early_enable_iommu(struct amd_iommu *iommu)
>>>    * This function finally enables all IOMMUs found in the system after
>>>    * they have been initialized.
>>>    *
>>> - * Or if in kdump kernel and IOMMUs are all pre-enabled, try to copy
>>> - * the old content of device table entries. Not this case or copy failed,
>>> + * Or if in kdump kernel and IOMMUs are all pre-enabled, try to reuse
>>> + * the old content of device table entries. Not this case or reuse failed,
>>>    * just continue as normal kernel does.
>>>    */
>>>   static void early_enable_iommus(void)
>>> @@ -2926,18 +2885,18 @@ static void early_enable_iommus(void)
>>>   	struct amd_iommu *iommu;
>>>   	struct amd_iommu_pci_seg *pci_seg;
>>>   
>>> -	if (!copy_device_table()) {
>>> +	if (!reuse_device_table()) {
>>
>> Hmmm. What happens if SNP enabled and reuse_device_table() couldn't setup
>> previous DTE?
>> In non-SNP case it works fine as we can rebuild new DTE. But in SNP case we
>> should fail the kdump right?
>>
> 
> Which will happen automatically, if we can't setup previous DTE for SNP case
> then IOMMU commands will time-out and subsequenly cause a panic as IRQ remapping
> won't be setup.
> 
> So this is as good as failing the kdump, which will have the same result.
> 

Maybe we can have a BUG_ON() when it fails to remap the DTE in the kdump
kernel and SNP is turned on?

It's hard to understand why kdump has failed just by looking at the
completion wait timeout. It will be easier to debug with a BUG_ON().

> Thanks,
> Ashish
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 3/4] crypto: ccp: Skip SNP INIT for kdump boot
  2025-07-16 22:03     ` Kalra, Ashish
@ 2025-07-17  5:56       ` Vasant Hegde
  0 siblings, 0 replies; 21+ messages in thread
From: Vasant Hegde @ 2025-07-17  5:56 UTC (permalink / raw)
  To: Kalra, Ashish, joro, suravee.suthikulpanit, thomas.lendacky,
	Sairaj.ArunKodilkar, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm

Hi Ashish,


On 7/17/2025 3:33 AM, Kalra, Ashish wrote:
> Hello Vasant,
> 
> On 7/16/2025 4:20 AM, Vasant Hegde wrote:
>>
>>
>> On 7/16/2025 12:57 AM, Ashish Kalra wrote:
>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>
>>> If SNP is enabled and initialized in the previous kernel then SNP is
>>> already initialized for kdump boot and attempting SNP INIT again
>>> during kdump boot causes SNP INIT failure and eventually leads to
>>> IOMMU failures.
>>>
>>> Skip SNP INIT if doing kdump boot.
>>
>> Just double checking, do we need check for snp_rmptable_init()?
>>
> 
> Do you mean adding this check in snp_rmptable_init() ?
> 
> We already have a check there for kexec boot: 
> 
> snp_rmptable_init()
> {
> ...
> ...
> 	/*
>          * Check if SEV-SNP is already enabled, this can happen in case of
>          * kexec boot.
>          */
>         rdmsrq(MSR_AMD64_SYSCFG, val);
>         if (val & MSR_AMD64_SYSCFG_SNP_EN)
>                 goto skip_enable;

Ah, OK. Thanks!

> 
> And we still have to map the RMP table in the kernel as SNP is still enabled
> and initialized in this case for kdump boot, so that is still required as
> part of snp_rmptable_init().
> 
> Additionally, for this patch i also have to skip SEV INIT similar to what we
> are doing for SNP INIT as we get SEV INIT failure warnings as SEV is also
> initialized during this kdump boot similar to SNP.

Sure thanks!

-Vasant



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/4] iommu/amd: Reuse device table for kdump
  2025-07-16 22:07     ` Kalra, Ashish
  2025-07-17  5:38       ` Sairaj Kodilkar
@ 2025-07-17  6:05       ` Vasant Hegde
  2025-07-17  6:51         ` Kalra, Ashish
  1 sibling, 1 reply; 21+ messages in thread
From: Vasant Hegde @ 2025-07-17  6:05 UTC (permalink / raw)
  To: Kalra, Ashish, joro, suravee.suthikulpanit, thomas.lendacky,
	Sairaj.ArunKodilkar, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm

Ashish,

On 7/17/2025 3:37 AM, Kalra, Ashish wrote:
> Hello Vasant,
> 
> On 7/16/2025 4:42 AM, Vasant Hegde wrote:
>>
>>
>> On 7/16/2025 12:57 AM, Ashish Kalra wrote:
>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>
>>> After a panic if SNP is enabled in the previous kernel then the kdump
>>> kernel boots with IOMMU SNP enforcement still enabled.
>>>
>>> IOMMU device table register is locked and exclusive to the previous
>>> kernel. Attempts to copy old device table from the previous kernel
>>> fails in kdump kernel as hardware ignores writes to the locked device
>>> table base address register as per AMD IOMMU spec Section 2.12.2.1.
>>>
>>> This results in repeated "Completion-Wait loop timed out" errors and a
>>> second kernel panic: "Kernel panic - not syncing: timer doesn't work
>>> through Interrupt-remapped IO-APIC".
>>>
>>> Reuse device table instead of copying device table in case of kdump
>>> boot and remove all copying device table code.
>>>
>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>> ---
>>>  drivers/iommu/amd/init.c | 97 ++++++++++++----------------------------
>>>  1 file changed, 28 insertions(+), 69 deletions(-)
>>>
>>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>>> index 32295f26be1b..18bd869a82d9 100644
>>> --- a/drivers/iommu/amd/init.c
>>> +++ b/drivers/iommu/amd/init.c
>>> @@ -406,6 +406,9 @@ static void iommu_set_device_table(struct amd_iommu *iommu)
>>>  
>>>  	BUG_ON(iommu->mmio_base == NULL);
>>>  
>>> +	if (is_kdump_kernel())
>>
>> This is fine.. but its becoming too many places with kdump check! I don't know
>> what is the better way here.
>> Is it worth to keep it like this -OR- add say iommu ops that way during init we
>> check is_kdump_kernel() and adjust the ops ?
>>
>> @Joerg, any preference?
>>
>>

.../...

>>>  			break;
>>>  		}
>>> @@ -2917,8 +2876,8 @@ static void early_enable_iommu(struct amd_iommu *iommu)
>>>   * This function finally enables all IOMMUs found in the system after
>>>   * they have been initialized.
>>>   *
>>> - * Or if in kdump kernel and IOMMUs are all pre-enabled, try to copy
>>> - * the old content of device table entries. Not this case or copy failed,
>>> + * Or if in kdump kernel and IOMMUs are all pre-enabled, try to reuse
>>> + * the old content of device table entries. Not this case or reuse failed,
>>>   * just continue as normal kernel does.
>>>   */
>>>  static void early_enable_iommus(void)
>>> @@ -2926,18 +2885,18 @@ static void early_enable_iommus(void)
>>>  	struct amd_iommu *iommu;
>>>  	struct amd_iommu_pci_seg *pci_seg;
>>>  
>>> -	if (!copy_device_table()) {
>>> +	if (!reuse_device_table()) {
>>
>> Hmmm. What happens if SNP enabled and reuse_device_table() couldn't setup
>> previous DTE?
>> In non-SNP case it works fine as we can rebuild new DTE. But in SNP case we
>> should fail the kdump right?
>>
> 
> Which will happen automatically, if we can't setup previous DTE for SNP case
> then IOMMU commands will time-out and subsequenly cause a panic as IRQ remapping
> won't be setup.

But what is the point in proceeding when we know it's going to fail? I think it's
better to fail here so that at least we know where/why it failed.

-Vasant


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 4/4] iommu/amd: Fix host kdump support for SNP
  2025-07-16 22:12     ` Kalra, Ashish
@ 2025-07-17  6:22       ` Vasant Hegde
  2025-07-17  6:55         ` Kalra, Ashish
  0 siblings, 1 reply; 21+ messages in thread
From: Vasant Hegde @ 2025-07-17  6:22 UTC (permalink / raw)
  To: Kalra, Ashish, joro, suravee.suthikulpanit, thomas.lendacky,
	Sairaj.ArunKodilkar, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm



On 7/17/2025 3:42 AM, Kalra, Ashish wrote:
> Hello Vasant,
> 
> On 7/16/2025 4:46 AM, Vasant Hegde wrote:
>>
>>
>> On 7/16/2025 12:57 AM, Ashish Kalra wrote:
>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>
>>> When a crash is triggered the kernel attempts to shut down SEV-SNP
>>> using the SNP_SHUTDOWN_EX command. If active SNP VMs are present,
>>> SNP_SHUTDOWN_EX fails as firmware checks all encryption-capable ASIDs
>>> to ensure none are in use and that a DF_FLUSH is not required. If a
>>> DF_FLUSH is required, the firmware returns DFFLUSH_REQUIRED, causing
>>> SNP_SHUTDOWN_EX to fail.
>>>
>>> This casues the kdump kernel to boot with IOMMU SNP enforcement still
>>> enabled and IOMMU completion wait buffers (CWBs), command buffers,
>>> device tables and event buffer registers remain locked and exclusive
>>> to the previous kernel. Attempts to allocate and use new buffers in
>>> the kdump kernel fail, as the hardware ignores writes to the locked
>>> MMIO registers (per AMD IOMMU spec Section 2.12.2.1).
>>>
>>> As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
>>> remapping which is required for proper operation.
>>>
>>> This results in repeated "Completion-Wait loop timed out" errors and a
>>> second kernel panic: "Kernel panic - not syncing: timer doesn't work
>>> through Interrupt-remapped IO-APIC"
>>>
>>> The following MMIO registers are locked and ignore writes after failed
>>> SNP shutdown:
>>> Device Table Base Address Register
>>> Command Buffer Base Address Register
>>> Event Buffer Base Address Register
>>> Completion Store Base Register/Exclusion Base Register
>>> Completion Store Limit Register/Exclusion Range Limit Register
>>>
>>
>> May be you can rephrase the description as first patch covered some of these
>> details
> 
> We do need to include the complete description here as this is the final
> patch of the series which fixes the kdump boot.
> 
> Do note, that the description in the first patch only mentions the 
> IOMMU buffers - command, CWB and event buffers for reuse and this commit
> log covers all reusing and remapping required - IOMMU buffers, device table,
> etc.
>  
>>> Instead of allocating new buffers, re-use the previous kernel’s pages
>>> for completion wait buffers, command buffers, event buffers and device
>>> tables and operate with the already enabled SNP configuration and
>>> existing data structures.
>>>
>>> This approach is now used for kdump boot regardless of whether SNP is
>>> enabled during kdump.
>>>
>>> The fix enables successful crashkernel/kdump operation on SNP hosts
>>> even when SNP_SHUTDOWN_EX fails.
>>>
>>> Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")
>>
>> I am not sure why you have marked only this patch as Fixes? Also it won't fix
>> the kdump if someone just backports only this patch right?
>>
> 
> As mentioned in the cover letter, this is the final patch of the series which 
> actually fixes the SNP kdump boot, so i kept Fixes: tag as part of this patch.
> > I am not sure if i can add Fixes: tag to all the four patches in this series ?

But just adding Fixes to this one patch adds more confusion and
complicates the backport process.

Is this really a fix? Did kdump ever work on an SNP-enabled system? If yes, then
add Fixes to all patches. If not, call it an enhancement.


-Vasant



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/4] iommu/amd: Reuse device table for kdump
  2025-07-17  6:05       ` Vasant Hegde
@ 2025-07-17  6:51         ` Kalra, Ashish
  0 siblings, 0 replies; 21+ messages in thread
From: Kalra, Ashish @ 2025-07-17  6:51 UTC (permalink / raw)
  To: Vasant Hegde, joro, suravee.suthikulpanit, thomas.lendacky,
	Sairaj.ArunKodilkar, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm

Hello Vasant and Sairaj,

On 7/17/2025 1:05 AM, Vasant Hegde wrote:
> Ashish,
> 
> On 7/17/2025 3:37 AM, Kalra, Ashish wrote:
>> Hello Vasant,
>>
>> On 7/16/2025 4:42 AM, Vasant Hegde wrote:
>>>
>>>
>>> On 7/16/2025 12:57 AM, Ashish Kalra wrote:
>>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>>
>>>> After a panic if SNP is enabled in the previous kernel then the kdump
>>>> kernel boots with IOMMU SNP enforcement still enabled.
>>>>
>>>> IOMMU device table register is locked and exclusive to the previous
>>>> kernel. Attempts to copy old device table from the previous kernel
>>>> fails in kdump kernel as hardware ignores writes to the locked device
>>>> table base address register as per AMD IOMMU spec Section 2.12.2.1.
>>>>
>>>> This results in repeated "Completion-Wait loop timed out" errors and a
>>>> second kernel panic: "Kernel panic - not syncing: timer doesn't work
>>>> through Interrupt-remapped IO-APIC".
>>>>
>>>> Reuse device table instead of copying device table in case of kdump
>>>> boot and remove all copying device table code.
>>>>
>>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>>> ---
>>>>  drivers/iommu/amd/init.c | 97 ++++++++++++----------------------------
>>>>  1 file changed, 28 insertions(+), 69 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>>>> index 32295f26be1b..18bd869a82d9 100644
>>>> --- a/drivers/iommu/amd/init.c
>>>> +++ b/drivers/iommu/amd/init.c
>>>> @@ -406,6 +406,9 @@ static void iommu_set_device_table(struct amd_iommu *iommu)
>>>>  
>>>>  	BUG_ON(iommu->mmio_base == NULL);
>>>>  
>>>> +	if (is_kdump_kernel())
>>>
>>> This is fine.. but its becoming too many places with kdump check! I don't know
>>> what is the better way here.
>>> Is it worth to keep it like this -OR- add say iommu ops that way during init we
>>> check is_kdump_kernel() and adjust the ops ?
>>>
>>> @Joerg, any preference?
>>>
>>>
> 
> .../...
> 
>>>>  			break;
>>>>  		}
>>>> @@ -2917,8 +2876,8 @@ static void early_enable_iommu(struct amd_iommu *iommu)
>>>>   * This function finally enables all IOMMUs found in the system after
>>>>   * they have been initialized.
>>>>   *
>>>> - * Or if in kdump kernel and IOMMUs are all pre-enabled, try to copy
>>>> - * the old content of device table entries. Not this case or copy failed,
>>>> + * Or if in kdump kernel and IOMMUs are all pre-enabled, try to reuse
>>>> + * the old content of device table entries. Not this case or reuse failed,
>>>>   * just continue as normal kernel does.
>>>>   */
>>>>  static void early_enable_iommus(void)
>>>> @@ -2926,18 +2885,18 @@ static void early_enable_iommus(void)
>>>>  	struct amd_iommu *iommu;
>>>>  	struct amd_iommu_pci_seg *pci_seg;
>>>>  
>>>> -	if (!copy_device_table()) {
>>>> +	if (!reuse_device_table()) {
>>>
>>> Hmmm. What happens if SNP enabled and reuse_device_table() couldn't setup
>>> previous DTE?
>>> In non-SNP case it works fine as we can rebuild new DTE. But in SNP case we
>>> should fail the kdump right?
>>>
>>
>> Which will happen automatically, if we can't setup previous DTE for SNP case
>> then IOMMU commands will time-out and subsequenly cause a panic as IRQ remapping
>> won't be setup.
> 
> But what is the point is proceeding when we know its going to fail? I think its
> better to fail here so that at least we know where/why it failed.
> 

Yes, that makes sense.

As Sairaj suggested, we can add a BUG_ON() if reuse_device_table() fails when
SNP is enabled.
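A rough sketch of where such a check could sit (hypothetical helper, named
here only for illustration; amd_iommu_snp_en is the existing flag):

/* Called from early_enable_iommus() when reuse_device_table() fails. */
static void kdump_reuse_failed_check(void)
{
        /*
         * With SNP enforcement still active the kdump kernel cannot rebuild
         * fresh DTEs, so fail loudly here instead of timing out later on
         * Completion-Wait with a far less useful panic message.
         */
        BUG_ON(is_kdump_kernel() && amd_iommu_snp_en);
}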

Thanks,
Ashish


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 4/4] iommu/amd: Fix host kdump support for SNP
  2025-07-17  6:22       ` Vasant Hegde
@ 2025-07-17  6:55         ` Kalra, Ashish
  0 siblings, 0 replies; 21+ messages in thread
From: Kalra, Ashish @ 2025-07-17  6:55 UTC (permalink / raw)
  To: Vasant Hegde, joro, suravee.suthikulpanit, thomas.lendacky,
	Sairaj.ArunKodilkar, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm

Hello Vasant,

On 7/17/2025 1:22 AM, Vasant Hegde wrote:
> 
> 
> On 7/17/2025 3:42 AM, Kalra, Ashish wrote:
>> Hello Vasant,
>>
>> On 7/16/2025 4:46 AM, Vasant Hegde wrote:
>>>
>>>
>>> On 7/16/2025 12:57 AM, Ashish Kalra wrote:
>>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>>
>>>> When a crash is triggered the kernel attempts to shut down SEV-SNP
>>>> using the SNP_SHUTDOWN_EX command. If active SNP VMs are present,
>>>> SNP_SHUTDOWN_EX fails as firmware checks all encryption-capable ASIDs
>>>> to ensure none are in use and that a DF_FLUSH is not required. If a
>>>> DF_FLUSH is required, the firmware returns DFFLUSH_REQUIRED, causing
>>>> SNP_SHUTDOWN_EX to fail.
>>>>
>>>> This casues the kdump kernel to boot with IOMMU SNP enforcement still
>>>> enabled and IOMMU completion wait buffers (CWBs), command buffers,
>>>> device tables and event buffer registers remain locked and exclusive
>>>> to the previous kernel. Attempts to allocate and use new buffers in
>>>> the kdump kernel fail, as the hardware ignores writes to the locked
>>>> MMIO registers (per AMD IOMMU spec Section 2.12.2.1).
>>>>
>>>> As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
>>>> remapping which is required for proper operation.
>>>>
>>>> This results in repeated "Completion-Wait loop timed out" errors and a
>>>> second kernel panic: "Kernel panic - not syncing: timer doesn't work
>>>> through Interrupt-remapped IO-APIC"
>>>>
>>>> The following MMIO registers are locked and ignore writes after failed
>>>> SNP shutdown:
>>>> Device Table Base Address Register
>>>> Command Buffer Base Address Register
>>>> Event Buffer Base Address Register
>>>> Completion Store Base Register/Exclusion Base Register
>>>> Completion Store Limit Register/Exclusion Range Limit Register
>>>>
>>>
>>> May be you can rephrase the description as first patch covered some of these
>>> details
>>
>> We do need to include the complete description here as this is the final
>> patch of the series which fixes the kdump boot.
>>
>> Do note, that the description in the first patch only mentions the 
>> IOMMU buffers - command, CWB and event buffers for reuse and this commit
>> log covers all reusing and remapping required - IOMMU buffers, device table,
>> etc.
>>  
>>>> Instead of allocating new buffers, re-use the previous kernel’s pages
>>>> for completion wait buffers, command buffers, event buffers and device
>>>> tables and operate with the already enabled SNP configuration and
>>>> existing data structures.
>>>>
>>>> This approach is now used for kdump boot regardless of whether SNP is
>>>> enabled during kdump.
>>>>
>>>> The fix enables successful crashkernel/kdump operation on SNP hosts
>>>> even when SNP_SHUTDOWN_EX fails.
>>>>
>>>> Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")
>>>
>>> I am not sure why you have marked only this patch as Fixes? Also it won't fix
>>> the kdump if someone just backports only this patch right?
>>>
>>
>> As mentioned in the cover letter, this is the final patch of the series which 
>> actually fixes the SNP kdump boot, so i kept Fixes: tag as part of this patch.
>>> I am not sure if i can add Fixes: tag to all the four patches in this series ?
> 
> But just adding Fixes to this one patch is adding more confusion and
> complicating backport process.
> 
> Is this really a fix? Did kdump ever worked on SNP enabled system? If yes then
> add Fixes to all patches. If not call it as an enhancement.
> 

Well, kdump only worked on SNP-enabled systems if there were no active SNP VMs.

But I think it makes more sense to remove the Fixes: tag from this patch series,
as this SNP kdump support is more or less a feature enhancement for SNP.

Thanks,
Ashish


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 1/4] iommu/amd: Add support to remap/unmap IOMMU buffers for kdump
  2025-07-16 21:55     ` Kalra, Ashish
@ 2025-07-17  7:05       ` Vasant Hegde
  2025-07-17  7:16         ` Kalra, Ashish
  0 siblings, 1 reply; 21+ messages in thread
From: Vasant Hegde @ 2025-07-17  7:05 UTC (permalink / raw)
  To: Kalra, Ashish, joro, suravee.suthikulpanit, thomas.lendacky,
	Sairaj.ArunKodilkar, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm

Ashish,


On 7/17/2025 3:25 AM, Kalra, Ashish wrote:
> Hello Vasant,
> 
> On 7/16/2025 4:19 AM, Vasant Hegde wrote:
>> Hi Ashish,
>>
>>
>> On 7/16/2025 12:56 AM, Ashish Kalra wrote:
>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>
>>> After a panic if SNP is enabled in the previous kernel then the kdump
>>> kernel boots with IOMMU SNP enforcement still enabled.
>>>
>>> IOMMU completion wait buffers (CWBs), command buffers and event buffer
>>> registers remain locked and exclusive to the previous kernel. Attempts
>>> to allocate and use new buffers in the kdump kernel fail, as hardware
>>> ignores writes to the locked MMIO registers as per AMD IOMMU spec
>>> Section 2.12.2.1.
>>>
>>> This results in repeated "Completion-Wait loop timed out" errors and a
>>> second kernel panic: "Kernel panic - not syncing: timer doesn't work
>>> through Interrupt-remapped IO-APIC"
>>>
>>> The following MMIO registers are locked and ignore writes after failed
>>> SNP shutdown:
>>> Command Buffer Base Address Register
>>> Event Log Base Address Register
>>> Completion Store Base Register/Exclusion Base Register
>>> Completion Store Limit Register/Exclusion Limit Register
>>> As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
>>> remapping, which is required for proper operation.
>>
>> There are couple of other registers in locked list. Can you please rephrase
>> above paras?  Also you don't need to callout indivisual registers here. You can
>> just add link to IOMMU spec.
> 
> Yes i will drop listing the individual registers here and just provide the link
> to the IOMMU specs.

Maybe you can rephrase it a bit so that it's clear that many registers
get locked. This patch fixes a few of them and the following patches will fix
the remaining ones.

> 
>>
>> Unrelated to this patch :
>>   I went to some of the SNP related code in IOMMU driver. One thing confused me
>> is in amd_iommu_snp_disable() code why Command buffer is not marked as shared?
>> any idea?
>>
> 
> Yes that's interesting. 
> 
> This is as per the SNP Firmware ABI specs: 
> 
> from SNP_INIT_EX: 
> 
> The firmware initializes the IOMMU to perform RMP enforcement. The firmware also transitions
> the event log, PPR log, and completion wait buffers of the IOMMU to an RMP page state that is 
> read only to the hypervisor and cannot be assigned to guests
> 
> So during SNP_SHUTDOWN_EX, transitioning these same buffers back to shared state.
> 
> But will investigate deeper and check why is command buffer not marked as FW/Reclaim state
> by firmware ? 

Sure.

> 
>>
>>>
>>> Reuse the pages of the previous kernel for completion wait buffers,
>>> command buffers, event buffers and memremap them during kdump boot
>>> and essentially work with an already enabled IOMMU configuration and
>>> re-using the previous kernel’s data structures.
>>>
>>> Reusing of command buffers and event buffers is now done for kdump boot
>>> irrespective of SNP being enabled during kdump.
>>>
>>> Re-use of completion wait buffers is only done when SNP is enabled as
>>> the exclusion base register is used for the completion wait buffer
>>> (CWB) address only when SNP is enabled.
>>>
>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>> ---
>>>  drivers/iommu/amd/amd_iommu_types.h |   5 +
>>>  drivers/iommu/amd/init.c            | 163 ++++++++++++++++++++++++++--
>>>  drivers/iommu/amd/iommu.c           |   2 +-
>>>  3 files changed, 157 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
>>> index 9b64cd706c96..082eb1270818 100644
>>> --- a/drivers/iommu/amd/amd_iommu_types.h
>>> +++ b/drivers/iommu/amd/amd_iommu_types.h
>>> @@ -791,6 +791,11 @@ struct amd_iommu {
>>>  	u32 flags;
>>>  	volatile u64 *cmd_sem;
>>>  	atomic64_t cmd_sem_val;
>>> +	/*
>>> +	 * Track physical address to directly use it in build_completion_wait()
>>> +	 * and avoid adding any special checks and handling for kdump.
>>> +	 */
>>> +	u64 cmd_sem_paddr;
>>
>> With this we are tracking both physical and virtual address? Is that really
>> needed? Can we just track PA and convert it into va?
>>
> 
> I believe it is simpler to keep/track cmd_sem and use it directly, instead of doing
> phys_to_virt() calls everytime before using it.
>  
>>>  
>>>  #ifdef CONFIG_AMD_IOMMU_DEBUGFS
>>>  	/* DebugFS Info */
>>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>>> index cadb2c735ffc..32295f26be1b 100644
>>> --- a/drivers/iommu/amd/init.c
>>> +++ b/drivers/iommu/amd/init.c
>>> @@ -710,6 +710,23 @@ static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
>>>  	pci_seg->alias_table = NULL;
>>>  }
>>>  
>>> +static inline void *iommu_memremap(unsigned long paddr, size_t size)
>>> +{
>>> +	phys_addr_t phys;
>>> +
>>> +	if (!paddr)
>>> +		return NULL;
>>> +
>>> +	/*
>>> +	 * Obtain true physical address in kdump kernel when SME is enabled.
>>> +	 * Currently, IOMMU driver does not support booting into an unencrypted
>>> +	 * kdump kernel.
>>
>> You mean production kernel w/ SME and kdump kernel with non-SME is not supported?
>>
> 
> Yes. 

Then can you please rephrase above comment?

> 
>>
>>> +	 */
>>> +	phys = __sme_clr(paddr);
>>> +
>>> +	return ioremap_encrypted(phys, size);
>>
>> You are clearing C bit and then immediately remapping using encrypted mode. Also
>> existing code checks for C bit before calling ioremap_encrypted(). So I am not
>> clear why you do this.
>>
>>
> 
> We need to clear the C-bit to get the correct physical address for remapping.
> 
> Which existing code checks for C-bit before calling ioremap_encrypted() ?
> 
> After getting the correct physical address we call ioremap_encrypted() which
> which map it with C-bit enabled if SME is enabled or else it will map it 
> without C-bit (so it handles both SME and non-SME cases).
>  
> Earlier we used to check for CC_ATTR_HOST_MEM_ENCRYPT flag and if set 
> then call ioremap_encrypted() or otherwise call memremap(), but then
> as mentioned above ioremap_encrypted() works for both cases - SME or
> non-SME, hence we use that approach.

If you want to keep it the current way then it needs a better comment. I'd say add
the CC_ATTR_HOST_MEM_ENCRYPT check so that it's easy to read.


> 
>>
>>> +}
>>> +
>>>  /*
>>>   * Allocates the command buffer. This buffer is per AMD IOMMU. We can
>>>   * write commands to that buffer later and the IOMMU will execute them
>>> @@ -942,8 +959,105 @@ static int iommu_init_ga_log(struct amd_iommu *iommu)
>>>  static int __init alloc_cwwb_sem(struct amd_iommu *iommu)
>>>  {
>>>  	iommu->cmd_sem = iommu_alloc_4k_pages(iommu, GFP_KERNEL, 1);
>>> +	if (!iommu->cmd_sem)
>>> +		return -ENOMEM;
>>> +	iommu->cmd_sem_paddr = iommu_virt_to_phys((void *)iommu->cmd_sem);
>>> +	return 0;
>>> +}
>>> +
>>> +static int __init remap_event_buffer(struct amd_iommu *iommu)
>>> +{
>>> +	u64 paddr;
>>> +
>>> +	pr_info_once("Re-using event buffer from the previous kernel\n");
>>> +	/*
>>> +	 * Read-back the event log base address register and apply
>>> +	 * PM_ADDR_MASK to obtain the event log base address.
>>> +	 */
>>> +	paddr = readq(iommu->mmio_base + MMIO_EVT_BUF_OFFSET) & PM_ADDR_MASK;
>>> +	iommu->evt_buf = iommu_memremap(paddr, EVT_BUFFER_SIZE);
>>> +
>>> +	return iommu->evt_buf ? 0 : -ENOMEM;
>>> +}
>>> +
>>> +static int __init remap_command_buffer(struct amd_iommu *iommu)
>>> +{
>>> +	u64 paddr;
>>> +
>>> +	pr_info_once("Re-using command buffer from the previous kernel\n");
>>> +	/*
>>> +	 * Read-back the command buffer base address register and apply
>>> +	 * PM_ADDR_MASK to obtain the command buffer base address.
>>> +	 */
>>> +	paddr = readq(iommu->mmio_base + MMIO_CMD_BUF_OFFSET) & PM_ADDR_MASK;
>>> +	iommu->cmd_buf = iommu_memremap(paddr, CMD_BUFFER_SIZE);
>>> +
>>> +	return iommu->cmd_buf ? 0 : -ENOMEM;
>>> +}
>>> +
>>> +static int __init remap_cwwb_sem(struct amd_iommu *iommu)
>>> +{
>>> +	u64 paddr;
>>> +
>>> +	if (check_feature(FEATURE_SNP)) {
>>> +		/*
>>> +		 * When SNP is enabled, the exclusion base register is used for the
>>> +		 * completion wait buffer (CWB) address. Read and re-use it.
>>> +		 */
>>> +		pr_info_once("Re-using CWB buffers from the previous kernel\n");
>>> +		/*
>>> +		 * Read-back the exclusion base register and apply PM_ADDR_MASK
>>> +		 * to obtain the exclusion range base address.
>>> +		 */
>>> +		paddr = readq(iommu->mmio_base + MMIO_EXCL_BASE_OFFSET) & PM_ADDR_MASK;
>>> +		iommu->cmd_sem = iommu_memremap(paddr, PAGE_SIZE);
>>> +		if (!iommu->cmd_sem)
>>> +			return -ENOMEM;
>>> +		iommu->cmd_sem_paddr = paddr;
>>> +	} else {
>>> +		return alloc_cwwb_sem(iommu);
>>
>> I understand this one is different from command/event buffer. But calling
>> function name as remap_*() and then allocating memory internally is bit odd.
>> Also this differs from previous functions.
>>
> 
> Yes i agree, but then what do we name it ?
> 
> remap_or_alloc_cwb_sem() does that sound Ok ?

May be.


-Vasant



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 1/4] iommu/amd: Add support to remap/unmap IOMMU buffers for kdump
  2025-07-17  7:05       ` Vasant Hegde
@ 2025-07-17  7:16         ` Kalra, Ashish
  0 siblings, 0 replies; 21+ messages in thread
From: Kalra, Ashish @ 2025-07-17  7:16 UTC (permalink / raw)
  To: Vasant Hegde, joro, suravee.suthikulpanit, thomas.lendacky,
	Sairaj.ArunKodilkar, herbert
  Cc: seanjc, pbonzini, will, robin.murphy, john.allen, davem, bp,
	michael.roth, iommu, linux-kernel, linux-crypto, kvm


On 7/17/2025 2:05 AM, Vasant Hegde wrote:
> Ashish,
> 
> 
> On 7/17/2025 3:25 AM, Kalra, Ashish wrote:
>> Hello Vasant,
>>
>> On 7/16/2025 4:19 AM, Vasant Hegde wrote:
>>> Hi Ashish,
>>>
>>>
>>> On 7/16/2025 12:56 AM, Ashish Kalra wrote:
>>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>>
>>>> After a panic if SNP is enabled in the previous kernel then the kdump
>>>> kernel boots with IOMMU SNP enforcement still enabled.
>>>>
>>>> IOMMU completion wait buffers (CWBs), command buffers and event buffer
>>>> registers remain locked and exclusive to the previous kernel. Attempts
>>>> to allocate and use new buffers in the kdump kernel fail, as hardware
>>>> ignores writes to the locked MMIO registers as per AMD IOMMU spec
>>>> Section 2.12.2.1.
>>>>
>>>> This results in repeated "Completion-Wait loop timed out" errors and a
>>>> second kernel panic: "Kernel panic - not syncing: timer doesn't work
>>>> through Interrupt-remapped IO-APIC"
>>>>
>>>> The following MMIO registers are locked and ignore writes after failed
>>>> SNP shutdown:
>>>> Command Buffer Base Address Register
>>>> Event Log Base Address Register
>>>> Completion Store Base Register/Exclusion Base Register
>>>> Completion Store Limit Register/Exclusion Limit Register
>>>> As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
>>>> remapping, which is required for proper operation.
>>>
>>> There are couple of other registers in locked list. Can you please rephrase
>>> above paras?  Also you don't need to callout indivisual registers here. You can
>>> just add link to IOMMU spec.
>>
>> Yes i will drop listing the individual registers here and just provide the link
>> to the IOMMU specs.
> 
> May be you can rephrase a bit so that its clear that there are many register
> gets locked. This patch fixes few of them and following patches will fix
> remaining ones.
> 
>>

Ok.

>>>
>>> Unrelated to this patch :
>>>   I went to some of the SNP related code in IOMMU driver. One thing confused me
>>> is in amd_iommu_snp_disable() code why Command buffer is not marked as shared?
>>> any idea?
>>>
>>
>> Yes that's interesting. 
>>
>> This is as per the SNP Firmware ABI specs: 
>>
>> from SNP_INIT_EX: 
>>
>> The firmware initializes the IOMMU to perform RMP enforcement. The firmware also transitions
>> the event log, PPR log, and completion wait buffers of the IOMMU to an RMP page state that is 
>> read only to the hypervisor and cannot be assigned to guests
>>
>> So during SNP_SHUTDOWN_EX, transitioning these same buffers back to shared state.
>>
>> But will investigate deeper and check why is command buffer not marked as FW/Reclaim state
>> by firmware ? 
> 
> Sure.
> 
>>

I did double-check this in the SEV firmware code:

Only the IOMMU event logs, PPR logs and completion wait buffers are transitioned to FW state,
and hence only those need to be transitioned from reclaimed to shared state at SNP_SHUTDOWN_EX (iommu_snp_shutdown).
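So, per IOMMU, the shutdown path only has to make those three buffers shared
again - roughly like this sketch (hypothetical helper name; it assumes the
existing iommu_make_shared() helper in init.c):

static int iommu_snp_make_buffers_shared(struct amd_iommu *iommu)
{
        int ret;

        /* Event log: transitioned to FW state by SNP_INIT_EX, reclaim it. */
        ret = iommu_make_shared(iommu->evt_buf, EVT_BUFFER_SIZE);
        if (ret)
                return ret;

        /* PPR log: likewise transitioned to FW state. */
        ret = iommu_make_shared(iommu->ppr_log, PPR_LOG_SIZE);
        if (ret)
                return ret;

        /* Completion wait buffer (CWB). */
        ret = iommu_make_shared((void *)iommu->cmd_sem, PAGE_SIZE);
        if (ret)
                return ret;

        /* The command buffer was never moved to FW state, so it is untouched. */
        return 0;
}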
 
>>>
>>>>
>>>> Reuse the pages of the previous kernel for completion wait buffers,
>>>> command buffers, event buffers and memremap them during kdump boot
>>>> and essentially work with an already enabled IOMMU configuration and
>>>> re-using the previous kernel’s data structures.
>>>>
>>>> Reusing of command buffers and event buffers is now done for kdump boot
>>>> irrespective of SNP being enabled during kdump.
>>>>
>>>> Re-use of completion wait buffers is only done when SNP is enabled as
>>>> the exclusion base register is used for the completion wait buffer
>>>> (CWB) address only when SNP is enabled.
>>>>
>>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>>> ---
>>>>  drivers/iommu/amd/amd_iommu_types.h |   5 +
>>>>  drivers/iommu/amd/init.c            | 163 ++++++++++++++++++++++++++--
>>>>  drivers/iommu/amd/iommu.c           |   2 +-
>>>>  3 files changed, 157 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
>>>> index 9b64cd706c96..082eb1270818 100644
>>>> --- a/drivers/iommu/amd/amd_iommu_types.h
>>>> +++ b/drivers/iommu/amd/amd_iommu_types.h
>>>> @@ -791,6 +791,11 @@ struct amd_iommu {
>>>>  	u32 flags;
>>>>  	volatile u64 *cmd_sem;
>>>>  	atomic64_t cmd_sem_val;
>>>> +	/*
>>>> +	 * Track physical address to directly use it in build_completion_wait()
>>>> +	 * and avoid adding any special checks and handling for kdump.
>>>> +	 */
>>>> +	u64 cmd_sem_paddr;
>>>
>>> With this we are tracking both physical and virtual address? Is that really
>>> needed? Can we just track PA and convert it into va?
>>>
>>
>> I believe it is simpler to keep/track cmd_sem and use it directly, instead of doing
>> phys_to_virt() calls everytime before using it.
>>  
>>>>  
>>>>  #ifdef CONFIG_AMD_IOMMU_DEBUGFS
>>>>  	/* DebugFS Info */
>>>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>>>> index cadb2c735ffc..32295f26be1b 100644
>>>> --- a/drivers/iommu/amd/init.c
>>>> +++ b/drivers/iommu/amd/init.c
>>>> @@ -710,6 +710,23 @@ static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
>>>>  	pci_seg->alias_table = NULL;
>>>>  }
>>>>  
>>>> +static inline void *iommu_memremap(unsigned long paddr, size_t size)
>>>> +{
>>>> +	phys_addr_t phys;
>>>> +
>>>> +	if (!paddr)
>>>> +		return NULL;
>>>> +
>>>> +	/*
>>>> +	 * Obtain true physical address in kdump kernel when SME is enabled.
>>>> +	 * Currently, IOMMU driver does not support booting into an unencrypted
>>>> +	 * kdump kernel.
>>>
>>> You mean production kernel w/ SME and kdump kernel with non-SME is not supported?
>>>
>>
>> Yes. 
> 
> Then can you please rephrase above comment?
> 
>>

Ok. 

>>>
>>>> +	 */
>>>> +	phys = __sme_clr(paddr);
>>>> +
>>>> +	return ioremap_encrypted(phys, size);
>>>
>>> You are clearing C bit and then immediately remapping using encrypted mode. Also
>>> existing code checks for C bit before calling ioremap_encrypted(). So I am not
>>> clear why you do this.
>>>
>>>
>>
>> We need to clear the C-bit to get the correct physical address for remapping.
>>
>> Which existing code checks for the C-bit before calling ioremap_encrypted()?
>>
>> After getting the correct physical address we call ioremap_encrypted(), which
>> maps it with the C-bit set if SME is enabled, or else maps it without the
>> C-bit (so it handles both the SME and non-SME cases).
>>  
>> Earlier we used to check the CC_ATTR_HOST_MEM_ENCRYPT flag and, if set,
>> call ioremap_encrypted(), otherwise call memremap(); but as mentioned
>> above, ioremap_encrypted() works for both cases - SME or non-SME - hence
>> we use that approach.
> 
> If you want to keep it the current way then it needs a better comment. I'd suggest adding the
> CC_ATTR_HOST_MEM_ENCRYPT check so that it's easier to read.
> 
>
Ok, I will add back the CC_ATTR_HOST_MEM_ENCRYPT check.

Thanks,
Ashish
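
For reference, a minimal sketch of what the helper could look like with the CC_ATTR_HOST_MEM_ENCRYPT check added back as agreed above (illustrative only, not the actual respin):

static inline void *iommu_memremap(unsigned long paddr, size_t size)
{
	phys_addr_t phys;

	if (!paddr)
		return NULL;

	/*
	 * With SME active, the previous kernel programmed the base address
	 * registers with the C-bit set; strip it to get the true physical
	 * address before remapping.
	 */
	phys = __sme_clr(paddr);

	/* SME host: map encrypted so the old kernel's contents are readable. */
	if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
		return ioremap_encrypted(phys, size);

	/* Non-SME host: a plain write-back mapping is sufficient. */
	return memremap(phys, size, MEMREMAP_WB);
}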
 
>>
>>>
>>>> +}
>>>> +
>>>>  /*
>>>>   * Allocates the command buffer. This buffer is per AMD IOMMU. We can
>>>>   * write commands to that buffer later and the IOMMU will execute them
>>>> @@ -942,8 +959,105 @@ static int iommu_init_ga_log(struct amd_iommu *iommu)
>>>>  static int __init alloc_cwwb_sem(struct amd_iommu *iommu)
>>>>  {
>>>>  	iommu->cmd_sem = iommu_alloc_4k_pages(iommu, GFP_KERNEL, 1);
>>>> +	if (!iommu->cmd_sem)
>>>> +		return -ENOMEM;
>>>> +	iommu->cmd_sem_paddr = iommu_virt_to_phys((void *)iommu->cmd_sem);
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int __init remap_event_buffer(struct amd_iommu *iommu)
>>>> +{
>>>> +	u64 paddr;
>>>> +
>>>> +	pr_info_once("Re-using event buffer from the previous kernel\n");
>>>> +	/*
>>>> +	 * Read-back the event log base address register and apply
>>>> +	 * PM_ADDR_MASK to obtain the event log base address.
>>>> +	 */
>>>> +	paddr = readq(iommu->mmio_base + MMIO_EVT_BUF_OFFSET) & PM_ADDR_MASK;
>>>> +	iommu->evt_buf = iommu_memremap(paddr, EVT_BUFFER_SIZE);
>>>> +
>>>> +	return iommu->evt_buf ? 0 : -ENOMEM;
>>>> +}
>>>> +
>>>> +static int __init remap_command_buffer(struct amd_iommu *iommu)
>>>> +{
>>>> +	u64 paddr;
>>>> +
>>>> +	pr_info_once("Re-using command buffer from the previous kernel\n");
>>>> +	/*
>>>> +	 * Read-back the command buffer base address register and apply
>>>> +	 * PM_ADDR_MASK to obtain the command buffer base address.
>>>> +	 */
>>>> +	paddr = readq(iommu->mmio_base + MMIO_CMD_BUF_OFFSET) & PM_ADDR_MASK;
>>>> +	iommu->cmd_buf = iommu_memremap(paddr, CMD_BUFFER_SIZE);
>>>> +
>>>> +	return iommu->cmd_buf ? 0 : -ENOMEM;
>>>> +}
>>>> +
>>>> +static int __init remap_cwwb_sem(struct amd_iommu *iommu)
>>>> +{
>>>> +	u64 paddr;
>>>> +
>>>> +	if (check_feature(FEATURE_SNP)) {
>>>> +		/*
>>>> +		 * When SNP is enabled, the exclusion base register is used for the
>>>> +		 * completion wait buffer (CWB) address. Read and re-use it.
>>>> +		 */
>>>> +		pr_info_once("Re-using CWB buffers from the previous kernel\n");
>>>> +		/*
>>>> +		 * Read-back the exclusion base register and apply PM_ADDR_MASK
>>>> +		 * to obtain the exclusion range base address.
>>>> +		 */
>>>> +		paddr = readq(iommu->mmio_base + MMIO_EXCL_BASE_OFFSET) & PM_ADDR_MASK;
>>>> +		iommu->cmd_sem = iommu_memremap(paddr, PAGE_SIZE);
>>>> +		if (!iommu->cmd_sem)
>>>> +			return -ENOMEM;
>>>> +		iommu->cmd_sem_paddr = paddr;
>>>> +	} else {
>>>> +		return alloc_cwwb_sem(iommu);
>>>
>>> I understand this one is different from the command/event buffers. But naming the
>>> function remap_*() and then allocating memory internally is a bit odd.
>>> Also, this differs from the previous functions.
>>>
>>
>> Yes, I agree, but then what do we name it?
>>
>> remap_or_alloc_cwb_sem() - does that sound OK?
> 
> May be.
> 
> 
> -Vasant
> 
> 
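
To make the naming suggestion concrete, one possible shape is sketched below. It simply rearranges the code quoted above under the suggested remap_or_alloc_*() name; the final name and structure were still under discussion:

static int __init remap_or_alloc_cwwb_sem(struct amd_iommu *iommu)
{
	u64 paddr;

	/*
	 * Without SNP the exclusion base register does not hold a CWB
	 * address, so fall back to allocating a fresh semaphore page.
	 */
	if (!check_feature(FEATURE_SNP))
		return alloc_cwwb_sem(iommu);

	/*
	 * With SNP enabled, the exclusion base register holds the previous
	 * kernel's completion wait buffer address; remap and reuse it.
	 */
	paddr = readq(iommu->mmio_base + MMIO_EXCL_BASE_OFFSET) & PM_ADDR_MASK;
	iommu->cmd_sem = iommu_memremap(paddr, PAGE_SIZE);
	if (!iommu->cmd_sem)
		return -ENOMEM;
	iommu->cmd_sem_paddr = paddr;

	return 0;
}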




Thread overview: 21+ messages
2025-07-15 19:26 [PATCH v3 0/4] iommu/amd: Fix host kdump support for SNP Ashish Kalra
2025-07-15 19:26 ` [PATCH v3 1/4] iommu/amd: Add support to remap/unmap IOMMU buffers for kdump Ashish Kalra
2025-07-16  9:19   ` Vasant Hegde
2025-07-16 21:55     ` Kalra, Ashish
2025-07-17  7:05       ` Vasant Hegde
2025-07-17  7:16         ` Kalra, Ashish
2025-07-15 19:27 ` [PATCH v3 2/4] iommu/amd: Reuse device table " Ashish Kalra
2025-07-16  9:42   ` Vasant Hegde
2025-07-16 22:07     ` Kalra, Ashish
2025-07-17  5:38       ` Sairaj Kodilkar
2025-07-17  6:05       ` Vasant Hegde
2025-07-17  6:51         ` Kalra, Ashish
2025-07-15 19:27 ` [PATCH v3 3/4] crypto: ccp: Skip SNP INIT for kdump boot Ashish Kalra
2025-07-16  9:20   ` Vasant Hegde
2025-07-16 22:03     ` Kalra, Ashish
2025-07-17  5:56       ` Vasant Hegde
2025-07-15 19:27 ` [PATCH v3 4/4] iommu/amd: Fix host kdump support for SNP Ashish Kalra
2025-07-16  9:46   ` Vasant Hegde
2025-07-16 22:12     ` Kalra, Ashish
2025-07-17  6:22       ` Vasant Hegde
2025-07-17  6:55         ` Kalra, Ashish
