Kernel KVM virtualization development
* [PATCH 0/2] riscv: ptdump: Refactor for KVM gstage ptdump support
@ 2026-07-01  8:50 Dylan.Wu
  2026-07-01  8:50 ` [PATCH 1/2] riscv: ptdump: Move pagetable definitions to common header Dylan.Wu
  2026-07-01  8:50 ` [PATCH 2/2] KVM: riscv: Register ptdump with debugfs on guest creation Dylan.Wu
  0 siblings, 2 replies; 5+ messages in thread
From: Dylan.Wu @ 2026-07-01  8:50 UTC
  To: palmer, pjw, aou, anup
  Cc: alex, atish.patra, zhouquan, linux-riscv, kvm, kvm-riscv,
	linux-kernel, Dylan.Wu

This series refactors the RISC-V ptdump implementation to support
dumping KVM gstage (stage-2) page tables via debugfs.

Patch 1 moves the pagetable walking state and level definitions to a
common header (arch/riscv/include/asm/ptdump.h), updates the attribute
parsing logic to use per-level bit definitions, and exports note_page()
so that it can be reused by the KVM ptdump implementation.

Patch 2 introduces KVM gstage ptdump support, registering a
'gstage_page_tables' file under the guest debugfs directory. This
allows userspace to inspect the gstage layout and permissions, which
is useful for architectural debugging and memory management audits.

Testing:
  - Booted on QEMU riscv64 with CONFIG_PTDUMP_DEBUGFS=y and
    CONFIG_PTDUMP_GSTAGE_DEBUGFS=y
  - Verified /sys/kernel/debug/kernel_page_tables output unchanged
  - Created a KVM guest and verified gstage_page_tables output: 
0x0000000080200000-0x0000000080209000    0x000000010c94e000        36K PTE   ..   D A U X W R V
0x000000008020a000-0x0000000080223000    0x000000010c9af000       100K PTE   ..   D A U X W R V
0x0000000080229000-0x000000008022b000    0x000000010c978000         8K PTE   ..   D A U X W R V
0x000000008022d000-0x000000008022e000    0x000000010c844000         4K PTE   ..   D A U X W R V
0x000000008022f000-0x0000000080232000    0x000000010c846000        12K PTE   ..   D A U X W R V
0x0000000080234000-0x0000000080238000    0x000000010ca13000        16K PTE   ..   D A U X W R V
0x000000008023a000-0x000000008023f000    0x00000001083bd000        20K PTE   ..   D A U X W R V
0x0000000080242000-0x000000008024b000    0x0000000106975000        36K PTE   ..   D A U X W R V
0x000000008024d000-0x0000000080252000    0x000000010cb54000        20K PTE   ..   D A U X W R V
0x0000000080255000-0x0000000080258000    0x000000010c164000        12K PTE   ..   D A U X W R V
0x000000008025a000-0x000000008025c000    0x00000001036a9000         8K PTE   ..   D A U X W R V
0x0000000080260000-0x0000000080275000    0x0000000107917000        84K PTE   ..   D A U X W R V
0x0000000080276000-0x000000008027e000    0x00000001078ed000        32K PTE   ..   D A U X W R V
0x000000008027f000-0x00000000802bb000    0x0000000108986000       240K PTE   ..   D A U X W R V
0x00000000802bc000-0x00000000802c0000    0x000000010762b000        16K PTE   ..   D A U X W R V
0x00000000802c2000-0x00000000802d2000    0x00000001083fd000        64K PTE   ..   D A U X W R V
0x00000000802d4000-0x00000000802e0000    0x000000010790b000        48K PTE   ..   D A U X W R V
0x00000000802e1000-0x00000000802e4000    0x0000000102f74000        12K PTE   ..   D A U X W R V
0x00000000802e5000-0x00000000802ef000    0x0000000104b9c000        40K PTE   ..   D A U X W R V
0x00000000802f1000-0x00000000802f2000    0x0000000103778000         4K PTE   ..   D A U X W R V
0x00000000802f3000-0x00000000802f9000    0x000000010377a000        24K PTE   ..   D A U X W R V
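
The size column above can be cross-checked against the address ranges. The
following is a minimal userspace sketch, not code from this series: the
helper name scale_size() is hypothetical, and the scaling loop is an
assumption modeled on the "KMGTPE" unit table visible in dump_addr().

```c
#include <assert.h>

/*
 * Hypothetical userspace helper (not part of the series): scale a byte
 * count the way the size column appears to be scaled. The unit string
 * matches the "KMGTPE" table in dump_addr(); the loop is an assumption.
 */
static char scale_size(unsigned long long bytes, unsigned long long *value)
{
	static const char units[] = "KMGTPE";
	const char *unit = units;
	unsigned long long delta = bytes >> 10;	/* bytes -> KiB */

	assert(value);

	/* Step to the next unit only while the value is a multiple of 1024. */
	while (!(delta & 1023) && unit[1]) {
		delta >>= 10;
		unit++;
	}

	*value = delta;
	return *unit;
}
```

For the first range, 0x80209000 - 0x80200000 is 0x9000 bytes, which this
sketch reports as 36 with unit 'K', in line with the first row of the dump.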


Dylan.Wu (2):
  riscv: ptdump: Move pagetable definitions to common header
  KVM: riscv: Register ptdump with debugfs on guest creation

 arch/riscv/include/asm/kvm_host.h |   6 +
 arch/riscv/include/asm/ptdump.h   |  42 +++++++
 arch/riscv/kvm/Kconfig            |  15 +++
 arch/riscv/kvm/Makefile           |   1 +
 arch/riscv/kvm/ptdump.c           | 178 ++++++++++++++++++++++++++++++
 arch/riscv/kvm/vm.c               |   5 +
 arch/riscv/mm/ptdump.c            | 102 ++++++-----------
 7 files changed, 282 insertions(+), 67 deletions(-)
 create mode 100644 arch/riscv/include/asm/ptdump.h
 create mode 100644 arch/riscv/kvm/ptdump.c

-- 
2.34.1



* [PATCH 1/2] riscv: ptdump: Move pagetable definitions to common header
  2026-07-01  8:50 [PATCH 0/2] riscv: ptdump: Refactor for KVM gstage ptdump support Dylan.Wu
@ 2026-07-01  8:50 ` Dylan.Wu
  2026-07-01  9:01   ` sashiko-bot
  2026-07-01  8:50 ` [PATCH 2/2] KVM: riscv: Register ptdump with debugfs on guest creation Dylan.Wu
  1 sibling, 1 reply; 5+ messages in thread
From: Dylan.Wu @ 2026-07-01  8:50 UTC
  To: palmer, pjw, aou, anup
  Cc: alex, atish.patra, zhouquan, linux-riscv, kvm, kvm-riscv,
	linux-kernel, Dylan.Wu

Move the pagetable walking state and level definitions to a new
arch/riscv/include/asm/ptdump.h header. This allows other parts
of the kernel (like KVM) to reuse the ptdump attribute parsing
logic.

To support different pagetable types (like G-stage), the attribute
parsing logic in dump_prot() is updated to use attribute bits
defined within the pg_level structure rather than a global array.

Assisted-by: YuanSheng: deepseek-v4-pro
Co-developed-by: Quan Zhou <zhouquan@iscas.ac.cn>
Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Signed-off-by: Dylan.Wu <fredwudi0305@gmail.com>
---
 arch/riscv/include/asm/ptdump.h |  42 +++++++++++++
 arch/riscv/mm/ptdump.c          | 102 +++++++++++---------------------
 2 files changed, 77 insertions(+), 67 deletions(-)
 create mode 100644 arch/riscv/include/asm/ptdump.h

diff --git a/arch/riscv/include/asm/ptdump.h b/arch/riscv/include/asm/ptdump.h
new file mode 100644
index 000000000..9173910fa
--- /dev/null
+++ b/arch/riscv/include/asm/ptdump.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_RISCV_PTDUMP_H
+#define _ASM_RISCV_PTDUMP_H
+
+#include <linux/ptdump.h>
+#include <linux/seq_file.h>
+
+struct addr_marker {
+	unsigned long start_address;
+	const char *name;
+};
+
+struct ptdump_prot_bits {
+	u64 mask;
+	const char *set;
+	const char *clear;
+};
+
+struct ptdump_pg_level {
+	const struct ptdump_prot_bits *bits;
+	const char *name;
+	u64 mask;
+	int num;
+};
+
+struct ptdump_pg_state {
+	struct ptdump_state ptdump;
+	struct seq_file *seq;
+	const struct addr_marker *marker;
+	const struct ptdump_pg_level *pg_level;
+	unsigned long start_address;
+	unsigned long start_pa;
+	unsigned long last_pa;
+	int level;
+	u64 current_prot;
+	bool check_wx;
+	unsigned long wx_pages;
+};
+
+void note_page(struct ptdump_state *pt_st, unsigned long addr, int level, u64 val);
+
+#endif /* _ASM_RISCV_PTDUMP_H */
diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
index f4b4a9fcb..7e55656cb 100644
--- a/arch/riscv/mm/ptdump.c
+++ b/arch/riscv/mm/ptdump.c
@@ -11,6 +11,7 @@
 #include <linux/ptdump.h>
 
 #include <linux/pgtable.h>
+#include <asm/ptdump.h>
 #include <asm/kasan.h>
 
 #define pt_dump_seq_printf(m, fmt, args...)	\
@@ -25,31 +26,6 @@
 		seq_puts(m, fmt);	\
 })
 
-/*
- * The page dumper groups page table entries of the same type into a single
- * description. It uses pg_state to track the range information while
- * iterating over the pte entries. When the continuity is broken it then
- * dumps out a description of the range.
- */
-struct pg_state {
-	struct ptdump_state ptdump;
-	struct seq_file *seq;
-	const struct addr_marker *marker;
-	unsigned long start_address;
-	unsigned long start_pa;
-	unsigned long last_pa;
-	int level;
-	u64 current_prot;
-	bool check_wx;
-	unsigned long wx_pages;
-};
-
-/* Address marker */
-struct addr_marker {
-	unsigned long start_address;
-	const char *name;
-};
-
 /* Private information for debugfs */
 struct ptd_mm_info {
 	struct mm_struct		*mm;
@@ -126,14 +102,7 @@ static struct ptd_mm_info efi_ptd_info = {
 };
 #endif
 
-/* Page Table Entry */
-struct prot_bits {
-	u64 mask;
-	const char *set;
-	const char *clear;
-};
-
-static const struct prot_bits pte_bits[] = {
+static const struct ptdump_prot_bits pte_bits[] = {
 	{
 #ifdef CONFIG_64BIT
 		.mask = _PAGE_NAPOT,
@@ -183,52 +152,48 @@ static const struct prot_bits pte_bits[] = {
 	}
 };
 
-/* Page Level */
-struct pg_level {
-	const char *name;
-	u64 mask;
-};
-
-static struct pg_level pg_level[] = {
+static struct ptdump_pg_level kernel_pg_levels[] = {
 	{ /* pgd */
 		.name = "PGD",
 	}, { /* p4d */
-		.name = (CONFIG_PGTABLE_LEVELS > 4) ? "P4D" : "PGD",
+		.name = "P4D",
 	}, { /* pud */
-		.name = (CONFIG_PGTABLE_LEVELS > 3) ? "PUD" : "PGD",
+		.name = "PUD",
 	}, { /* pmd */
-		.name = (CONFIG_PGTABLE_LEVELS > 2) ? "PMD" : "PGD",
+		.name = "PMD",
 	}, { /* pte */
 		.name = "PTE",
 	},
 };
 
-static void dump_prot(struct pg_state *st)
+static void dump_prot(struct ptdump_pg_state *st)
 {
 	unsigned int i;
+	const struct ptdump_pg_level *lvl = &st->pg_level[st->level];
+	const struct ptdump_prot_bits *bits = lvl->bits;
 
-	for (i = 0; i < ARRAY_SIZE(pte_bits); i++) {
+	for (i = 0; i < lvl->num; i++) {
 		char s[7];
 		unsigned long val;
 
-		val = st->current_prot & pte_bits[i].mask;
+		val = st->current_prot & bits[i].mask;
 		if (val) {
-			if (pte_bits[i].mask == _PAGE_SOFT)
-				snprintf(s, sizeof(s), pte_bits[i].set, val >> 8);
+			if (bits[i].mask == _PAGE_SOFT)
+				snprintf(s, sizeof(s), bits[i].set, val >> 8);
 #ifdef CONFIG_64BIT
-			else if (pte_bits[i].mask == _PAGE_MTMASK_SVPBMT) {
+			else if (bits[i].mask == _PAGE_MTMASK_SVPBMT) {
 				if (val == _PAGE_NOCACHE_SVPBMT)
-					snprintf(s, sizeof(s), pte_bits[i].set, "NC");
+					snprintf(s, sizeof(s), bits[i].set, "NC");
 				else if (val == _PAGE_IO_SVPBMT)
-					snprintf(s, sizeof(s), pte_bits[i].set, "IO");
+					snprintf(s, sizeof(s), bits[i].set, "IO");
 				else
-					snprintf(s, sizeof(s), pte_bits[i].set, "??");
+					snprintf(s, sizeof(s), bits[i].set, "??");
 			}
 #endif
 			else
-				strscpy(s, pte_bits[i].set);
+				strscpy(s, bits[i].set);
 		} else {
-			strscpy(s, pte_bits[i].clear);
+			strscpy(s, bits[i].clear);
 		}
 
 		pt_dump_seq_printf(st->seq, " %s", s);
@@ -240,7 +205,7 @@ static void dump_prot(struct pg_state *st)
 #else
 #define ADDR_FORMAT	"0x%08lx"
 #endif
-static void dump_addr(struct pg_state *st, unsigned long addr)
+static void dump_addr(struct ptdump_pg_state *st, unsigned long addr)
 {
 	static const char units[] = "KMGTPE";
 	const char *unit = units;
@@ -258,10 +223,10 @@ static void dump_addr(struct pg_state *st, unsigned long addr)
 	}
 
 	pt_dump_seq_printf(st->seq, "%9lu%c %s", delta, *unit,
-			   pg_level[st->level].name);
+			   st->pg_level[st->level].name);
 }
 
-static void note_prot_wx(struct pg_state *st, unsigned long addr)
+static void note_prot_wx(struct ptdump_pg_state *st, unsigned long addr)
 {
 	if (!st->check_wx)
 		return;
@@ -276,15 +241,15 @@ static void note_prot_wx(struct pg_state *st, unsigned long addr)
 	st->wx_pages += (addr - st->start_address) / PAGE_SIZE;
 }
 
-static void note_page(struct ptdump_state *pt_st, unsigned long addr,
-		      int level, u64 val)
+void note_page(struct ptdump_state *pt_st, unsigned long addr,
+	       int level, u64 val)
 {
-	struct pg_state *st = container_of(pt_st, struct pg_state, ptdump);
+	struct ptdump_pg_state *st = container_of(pt_st, struct ptdump_pg_state, ptdump);
 	u64 pa = PFN_PHYS(pte_pfn(__pte(val)));
 	u64 prot = 0;
 
 	if (level >= 0)
-		prot = val & pg_level[level].mask;
+		prot = val & st->pg_level[level].mask;
 
 	if (st->level == -1) {
 		st->level = level;
@@ -317,6 +282,7 @@ static void note_page(struct ptdump_state *pt_st, unsigned long addr,
 		st->last_pa = pa;
 	}
 }
+EXPORT_SYMBOL_GPL(note_page);
 
 static void note_page_pte(struct ptdump_state *pt_st, unsigned long addr, pte_t pte)
 {
@@ -352,9 +318,10 @@ static void note_page_flush(struct ptdump_state *pt_st)
 
 static void ptdump_walk(struct seq_file *s, struct ptd_mm_info *pinfo)
 {
-	struct pg_state st = {
+	struct ptdump_pg_state st = {
 		.seq = s,
 		.marker = pinfo->markers,
+		.pg_level = kernel_pg_levels,
 		.level = -1,
 		.ptdump = {
 			.note_page_pte = note_page_pte,
@@ -375,12 +342,13 @@ static void ptdump_walk(struct seq_file *s, struct ptd_mm_info *pinfo)
 
 bool ptdump_check_wx(void)
 {
-	struct pg_state st = {
+	struct ptdump_pg_state st = {
 		.seq = NULL,
 		.marker = (struct addr_marker[]) {
 			{0, NULL},
 			{-1, NULL},
 		},
+		.pg_level = kernel_pg_levels,
 		.level = -1,
 		.check_wx = true,
 		.ptdump = {
@@ -446,12 +414,12 @@ static int __init ptdump_init(void)
 
 	kernel_ptd_info.base_addr = KERN_VIRT_START;
 
-	pg_level[1].name = pgtable_l5_enabled ? "P4D" : "PGD";
-	pg_level[2].name = pgtable_l4_enabled ? "PUD" : "PGD";
+	kernel_pg_levels[1].name = pgtable_l5_enabled ? "P4D" : "PGD";
+	kernel_pg_levels[2].name = pgtable_l4_enabled ? "PUD" : "PGD";
 
-	for (i = 0; i < ARRAY_SIZE(pg_level); i++)
+	for (i = 0; i < ARRAY_SIZE(kernel_pg_levels); i++)
 		for (j = 0; j < ARRAY_SIZE(pte_bits); j++)
-			pg_level[i].mask |= pte_bits[j].mask;
+			kernel_pg_levels[i].mask |= pte_bits[j].mask;
 
 	debugfs_create_file("kernel_page_tables", 0400, NULL, &kernel_ptd_info,
 			    &ptdump_fops);
-- 
2.34.1



* [PATCH 2/2] KVM: riscv: Register ptdump with debugfs on guest creation
  2026-07-01  8:50 [PATCH 0/2] riscv: ptdump: Refactor for KVM gstage ptdump support Dylan.Wu
  2026-07-01  8:50 ` [PATCH 1/2] riscv: ptdump: Move pagetable definitions to common header Dylan.Wu
@ 2026-07-01  8:50 ` Dylan.Wu
  2026-07-01  9:06   ` sashiko-bot
  1 sibling, 1 reply; 5+ messages in thread
From: Dylan.Wu @ 2026-07-01  8:50 UTC
  To: palmer, pjw, aou, anup
  Cc: alex, atish.patra, zhouquan, linux-riscv, kvm, kvm-riscv,
	linux-kernel, Dylan.Wu

Introduce KVM ptdump to show the guest gstage pagetables. This registers
a 'gstage_page_tables' file under the guest debugfs directory.
Userspace can now inspect the gstage layout and permissions, which
is useful for architectural debugging and memory management audits.

Assisted-by: YuanSheng: deepseek-v4-pro
Co-developed-by: Quan Zhou <zhouquan@iscas.ac.cn>
Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Signed-off-by: Dylan.Wu <fredwudi0305@gmail.com>
---
 arch/riscv/include/asm/kvm_host.h |   6 +
 arch/riscv/kvm/Kconfig            |  15 +++
 arch/riscv/kvm/Makefile           |   1 +
 arch/riscv/kvm/ptdump.c           | 178 ++++++++++++++++++++++++++++++
 arch/riscv/kvm/vm.c               |   5 +
 5 files changed, 205 insertions(+)
 create mode 100644 arch/riscv/kvm/ptdump.c

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 60017ceec..04129c5f8 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -322,4 +322,10 @@ void kvm_riscv_vcpu_record_steal_time(struct kvm_vcpu *vcpu);
 /* Flags representing implementation specific details */
 DECLARE_STATIC_KEY_FALSE(kvm_riscv_vsstage_tlb_no_gpa);
 
+#ifdef CONFIG_PTDUMP_GSTAGE_DEBUGFS
+void kvm_s2_ptdump_create_debugfs(struct kvm *kvm);
+#else
+static inline void kvm_s2_ptdump_create_debugfs(struct kvm *kvm) {}
+#endif
+
 #endif /* __RISCV_KVM_HOST_H__ */
diff --git a/arch/riscv/kvm/Kconfig b/arch/riscv/kvm/Kconfig
index ec2cee0a3..0ceb4a452 100644
--- a/arch/riscv/kvm/Kconfig
+++ b/arch/riscv/kvm/Kconfig
@@ -38,3 +38,18 @@ config KVM
 	  If unsure, say N.
 
 endif # VIRTUALIZATION
+
+config PTDUMP_GSTAGE_DEBUGFS
+	bool "Present the gstage pagetables to debugfs"
+	depends on KVM
+	depends on DEBUG_KERNEL
+	depends on DEBUG_FS
+	depends on PTDUMP_DEBUGFS
+	default n
+	help
+	  Say Y here if you want to show the RISC-V KVM gstage guest page tables
+	  layout in a debugfs file. This information is primarily useful for
+	  architecture-specific kernel developers and KVM maintainers to
+	  investigate memory mapping and permission issues. It is probably
+	  not a good idea to enable this feature in a production kernel.
+	  If in doubt, say N.
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index 296c2ba05..0170c8c3b 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -42,3 +42,4 @@ kvm-y += vcpu_timer.o
 kvm-y += vcpu_vector.o
 kvm-y += vm.o
 kvm-y += vmid.o
+kvm-$(CONFIG_PTDUMP_GSTAGE_DEBUGFS) += ptdump.o
diff --git a/arch/riscv/kvm/ptdump.c b/arch/riscv/kvm/ptdump.c
new file mode 100644
index 000000000..972d45d69
--- /dev/null
+++ b/arch/riscv/kvm/ptdump.c
@@ -0,0 +1,178 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Debug helper used to dump the gstage pagetables of the system.
+ */
+#include <linux/debugfs.h>
+#include <linux/kvm_host.h>
+#include <linux/seq_file.h>
+#include <linux/mm.h>
+#include <asm/ptdump.h>
+#include <asm/pgtable.h>
+
+static const struct ptdump_prot_bits gstage_pte_bits[] = {
+	{
+		.mask = _PAGE_SOFT,
+		.set = "RSW(%d)",
+		.clear = "  ..  ",
+	}, {
+		.mask = _PAGE_DIRTY,
+		.set = "D",
+		.clear = ".",
+	}, {
+		.mask = _PAGE_ACCESSED,
+		.set = "A",
+		.clear = ".",
+	}, {
+		.mask = _PAGE_USER,
+		.set = "U",
+		.clear = ".",
+	}, {
+		.mask = _PAGE_EXEC,
+		.set = "X",
+		.clear = ".",
+	}, {
+		.mask = _PAGE_WRITE,
+		.set = "W",
+		.clear = ".",
+	}, {
+		.mask = _PAGE_READ,
+		.set = "R",
+		.clear = ".",
+	}, {
+		.mask = _PAGE_PRESENT,
+		.set = "V",
+		.clear = ".",
+	}
+};
+
+static struct ptdump_pg_level gstage_pg_levels[] = {
+	{ .name = "PGD" },
+	{ .name = "P4D" },
+	{ .name = "PUD" },
+	{ .name = "PMD" },
+	{ .name = "PTE" },
+};
+
+struct kvm_ptdump_state {
+	struct kvm *kvm;
+	struct ptdump_pg_state parser_state;
+	struct addr_marker marker[2];
+	struct ptdump_range range[2];
+};
+
+static void kvm_ptdump_walk_level(struct ptdump_state *pt_st,
+				  unsigned long *tbl, int level,
+				  unsigned long start_addr)
+{
+	unsigned long addr = start_addr;
+	unsigned long next, virt_addr;
+	int i;
+	unsigned long step = 1UL << (PAGE_SHIFT + (4 - level) * 9);
+
+	for (i = 0; i < PTRS_PER_PTE; i++, addr += step) {
+		unsigned long val = tbl[i];
+
+		next = addr + step;
+
+		if (level == 4 || (val & _PAGE_LEAF) || !(val & _PAGE_PRESENT)) {
+			note_page(pt_st, addr, level, val);
+		} else {
+			unsigned long pa = (val >> _PAGE_PFN_SHIFT) << PAGE_SHIFT;
+
+			virt_addr = (unsigned long)phys_to_virt(pa);
+
+			kvm_ptdump_walk_level(pt_st, (unsigned long *)virt_addr,
+					      level + 1, addr);
+		}
+	}
+}
+
+static int kvm_ptdump_visitor(struct seq_file *m, void *v)
+{
+	struct kvm_ptdump_state *st = m->private;
+	struct kvm *kvm = st->kvm;
+	unsigned long *pgd = (unsigned long *)kvm->arch.pgd;
+	int start_level = 5 - kvm->arch.pgd_levels;
+	int i, j;
+
+	st->parser_state.level = -1;
+	st->parser_state.start_address = 0;
+	st->parser_state.seq = m;
+
+	for (i = 0; i < ARRAY_SIZE(gstage_pg_levels); i++) {
+		gstage_pg_levels[i].bits = gstage_pte_bits;
+		gstage_pg_levels[i].num = ARRAY_SIZE(gstage_pte_bits);
+		gstage_pg_levels[i].mask = 0;
+		for (j = 0; j < ARRAY_SIZE(gstage_pte_bits); j++)
+			gstage_pg_levels[i].mask |= gstage_pte_bits[j].mask;
+	}
+
+	read_lock(&kvm->mmu_lock);
+	if (pgd) {
+		kvm_ptdump_walk_level(&st->parser_state.ptdump, pgd,
+			start_level, 0);
+	}
+	read_unlock(&kvm->mmu_lock);
+
+	note_page(&st->parser_state.ptdump, 0, -1, 0);
+	return 0;
+}
+
+static int kvm_ptdump_open(struct inode *inode, struct file *file)
+{
+	struct kvm *kvm = inode->i_private;
+	struct kvm_ptdump_state *st;
+	int ret;
+
+	if (!kvm_get_kvm_safe(kvm))
+		return -ENOENT;
+
+	st = kzalloc(sizeof(*st), GFP_KERNEL);
+	if (!st) {
+		kvm_put_kvm(kvm);
+		return -ENOMEM;
+	}
+
+	st->kvm = kvm;
+	st->marker[0].name = "Guest IPA";
+	st->marker[0].start_address = 0;
+	st->marker[1].start_address = -1UL;
+	st->range[0].start = 0;
+	st->range[0].end = -1UL;
+
+	st->parser_state.marker = st->marker;
+	st->parser_state.pg_level = gstage_pg_levels;
+	st->parser_state.ptdump.range = st->range;
+
+	ret = single_open(file, kvm_ptdump_visitor, st);
+	if (ret) {
+		kfree(st);
+		kvm_put_kvm(kvm);
+	}
+	return ret;
+}
+
+static int kvm_ptdump_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *seq = file->private_data;
+	struct kvm_ptdump_state *st = seq->private;
+	struct kvm *kvm = st->kvm;
+
+	kfree(st);
+	kvm_put_kvm(kvm);
+	return single_release(inode, file);
+}
+
+static const struct file_operations kvm_gstage_fops = {
+	.owner		= THIS_MODULE,
+	.open		= kvm_ptdump_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= kvm_ptdump_release,
+};
+
+void kvm_s2_ptdump_create_debugfs(struct kvm *kvm)
+{
+	debugfs_create_file("gstage_page_tables", 0400, kvm->debugfs_dentry, kvm,
+			    &kvm_gstage_fops);
+}
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index a9f083fee..464ad2eaf 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -269,3 +269,8 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 {
 	return -EINVAL;
 }
+
+void kvm_arch_create_vm_debugfs(struct kvm *kvm)
+{
+	kvm_s2_ptdump_create_debugfs(kvm);
+}
-- 
2.34.1



* Re: [PATCH 1/2] riscv: ptdump: Move pagetable definitions to common header
  2026-07-01  8:50 ` [PATCH 1/2] riscv: ptdump: Move pagetable definitions to common header Dylan.Wu
@ 2026-07-01  9:01   ` sashiko-bot
  0 siblings, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-07-01  9:01 UTC
  To: Dylan.Wu; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Uninitialized `bits` and `num` fields in `kernel_pg_levels` completely suppress page table attribute output in debugfs.
- [Low] Folded PMD levels on 32-bit RISC-V (RV32) will incorrectly display as 'PMD' instead of 'PGD'.
--

commit f96cd5eefaf170c680412811c0fe9c407b08f9a0
Author: Dylan.Wu <fredwudi0305@gmail.com>

riscv: ptdump: Move pagetable definitions to common header

Move the pagetable walking state and level definitions to a new
arch/riscv/include/asm/ptdump.h header to allow reuse of the attribute parsing
logic. The attribute parsing logic in dump_prot() is updated to use attribute
bits defined within the pg_level structure rather than a global array.

> diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
> index f4b4a9fcbbd88..7e55656cbbc92 100644
> --- a/arch/riscv/mm/ptdump.c
> +++ b/arch/riscv/mm/ptdump.c

[ ... ]

> @@ -183,52 +152,48 @@ static const struct prot_bits pte_bits[] = {
>  	}
>  };
>  
> -/* Page Level */
> -struct pg_level {
> -	const char *name;
> -	u64 mask;
> -};
> -
> -static struct pg_level pg_level[] = {
> +static struct ptdump_pg_level kernel_pg_levels[] = {
>  	{ /* pgd */
>  		.name = "PGD",
>  	}, { /* p4d */
> -		.name = (CONFIG_PGTABLE_LEVELS > 4) ? "P4D" : "PGD",
> +		.name = "P4D",
>  	}, { /* pud */
> -		.name = (CONFIG_PGTABLE_LEVELS > 3) ? "PUD" : "PGD",
> +		.name = "PUD",
>  	}, { /* pmd */
> -		.name = (CONFIG_PGTABLE_LEVELS > 2) ? "PMD" : "PGD",
> +		.name = "PMD",

[Severity: Low]
For 32-bit RISC-V (RV32) architectures that use a 2-level page table where PMD
is folded into PGD, will this cause the top-level entries to incorrectly
display as PMD instead of PGD?

The original code dynamically renamed the PMD level to PGD based on
CONFIG_PGTABLE_LEVELS, but this appears to be omitted from the new code.

>  	}, { /* pte */
>  		.name = "PTE",
>  	},
>  };
>  
> -static void dump_prot(struct pg_state *st)
> +static void dump_prot(struct ptdump_pg_state *st)
>  {
>  	unsigned int i;
> +	const struct ptdump_pg_level *lvl = &st->pg_level[st->level];
> +	const struct ptdump_prot_bits *bits = lvl->bits;
>  
> -	for (i = 0; i < ARRAY_SIZE(pte_bits); i++) {
> +	for (i = 0; i < lvl->num; i++) {

[Severity: Medium]
Will this loop ever execute?

The kernel_pg_levels array is statically initialized without setting the .bits
or .num fields. Since .num is implicitly 0, it looks like this loop is entirely
skipped, which will silently discard printing page table protection attributes
to debugfs.

[ ... ]

> @@ -446,12 +414,12 @@ static int __init ptdump_init(void)
>  
>  	kernel_ptd_info.base_addr = KERN_VIRT_START;
>  
> -	pg_level[1].name = pgtable_l5_enabled ? "P4D" : "PGD";
> -	pg_level[2].name = pgtable_l4_enabled ? "PUD" : "PGD";
> +	kernel_pg_levels[1].name = pgtable_l5_enabled ? "P4D" : "PGD";
> +	kernel_pg_levels[2].name = pgtable_l4_enabled ? "PUD" : "PGD";
>  
> -	for (i = 0; i < ARRAY_SIZE(pg_level); i++)
> +	for (i = 0; i < ARRAY_SIZE(kernel_pg_levels); i++)
>  		for (j = 0; j < ARRAY_SIZE(pte_bits); j++)
> -			pg_level[i].mask |= pte_bits[j].mask;
> +			kernel_pg_levels[i].mask |= pte_bits[j].mask;

Does this dynamic initialization loop need to populate the .bits and .num fields
for kernel_pg_levels as well?
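
One possible shape for that initialization, as a userspace sketch only. The
structs mirror the ones added in asm/ptdump.h; init_pg_levels() and
render_prot() are hypothetical names, and the two-entry pte_bits table uses
illustrative values rather than the kernel's full _PAGE_* list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

/* Userspace stand-ins mirroring the structs added in asm/ptdump.h. */
struct ptdump_prot_bits {
	uint64_t mask;
	const char *set;
	const char *clear;
};

struct ptdump_pg_level {
	const struct ptdump_prot_bits *bits;
	const char *name;
	uint64_t mask;
	int num;
};

/* Illustrative subset; the kernel table uses the _PAGE_* constants. */
static const struct ptdump_prot_bits pte_bits[] = {
	{ .mask = 1u << 1, .set = "R", .clear = "." },
	{ .mask = 1u << 0, .set = "V", .clear = "." },
};

static struct ptdump_pg_level kernel_pg_levels[] = {
	{ .name = "PGD" }, { .name = "P4D" }, { .name = "PUD" },
	{ .name = "PMD" }, { .name = "PTE" },
};

/* Hypothetical init step: fill .bits and .num alongside .mask. */
static void init_pg_levels(void)
{
	size_t i, j;

	for (i = 0; i < ARRAY_SIZE(kernel_pg_levels); i++) {
		kernel_pg_levels[i].bits = pte_bits;
		kernel_pg_levels[i].num = ARRAY_SIZE(pte_bits);
		for (j = 0; j < ARRAY_SIZE(pte_bits); j++)
			kernel_pg_levels[i].mask |= pte_bits[j].mask;
	}
}

/* dump_prot()-style loop; returns the number of columns written. */
static int render_prot(const struct ptdump_pg_level *lvl, uint64_t prot,
		       char *out, size_t len)
{
	size_t used = 0;
	int i;

	assert(lvl->num == 0 || lvl->bits);
	if (len)
		out[0] = '\0';

	for (i = 0; i < lvl->num; i++) {
		const char *s = (prot & lvl->bits[i].mask) ?
				lvl->bits[i].set : lvl->bits[i].clear;
		int n = snprintf(out + used, len - used, " %s", s);

		if (n < 0 || (size_t)n >= len - used)
			break;
		used += n;
	}
	return i;
}
```

Before init_pg_levels() runs, render_prot() writes no columns because .num
is still 0, which is the behavior described above; afterwards it writes one
column per entry in pte_bits.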

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701085030.124579-1-fredwudi0305@gmail.com?part=1


* Re: [PATCH 2/2] KVM: riscv: Register ptdump with debugfs on guest creation
  2026-07-01  8:50 ` [PATCH 2/2] KVM: riscv: Register ptdump with debugfs on guest creation Dylan.Wu
@ 2026-07-01  9:06   ` sashiko-bot
  0 siblings, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-07-01  9:06 UTC
  To: Dylan.Wu; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Lockless reads of the guest PGD pointer (`kvm->arch.pgd`) outside `kvm->mmu_lock` lead to a Use-After-Free (UAF) and potential kernel stack overflow via undefined behavior.
- [High] Unbounded full page table walk under `mmu_lock` combined with `single_open` buffer restarts causes severe lock contention.
- [Medium] Concurrent reads of the debugfs file cause a data race by modifying the global static array `gstage_pg_levels` without synchronization.
- [Medium] The page table walk hardcodes the loop bound to `PTRS_PER_PTE`, silently truncating the guest address space dump for concatenated root tables.
--

commit efa8af5af3c60372dfe6205bc307cb96b4b1cf7a
Author: Dylan.Wu <fredwudi0305@gmail.com>
Subject: KVM: riscv: Register ptdump with debugfs on guest creation

Introduce KVM ptdump to show the guest gstage pagetables. This registers
a 'gstage_page_tables' file under the guest debugfs directory.
Userspace can now inspect the gstage layout and permissions, which
is useful for architectural debugging and memory management audits.

> diff --git a/arch/riscv/kvm/ptdump.c b/arch/riscv/kvm/ptdump.c
> new file mode 100644
> index 0000000000000..972d45d691b2b
> --- /dev/null
> +++ b/arch/riscv/kvm/ptdump.c

[ ... ]

> +static void kvm_ptdump_walk_level(struct ptdump_state *pt_st,
> +				  unsigned long *tbl, int level,
> +				  unsigned long start_addr)
> +{
> +	unsigned long addr = start_addr;
> +	unsigned long next, virt_addr;
> +	int i;
> +	unsigned long step = 1UL << (PAGE_SHIFT + (4 - level) * 9);

[Severity: High]
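Can this shift count go out of range? If pgd_levels is read as 0 due to
concurrent VM destruction, start_level becomes 5. The shift count is then 3,
so step is wrong, and because the level == 4 check never matches, recursion
can continue to level 6 and beyond, where the shift count is negative and the
behavior is undefined.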
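
A bounds check before the shift would rule this out. The sketch below is a
userspace illustration under stated assumptions: gstage_level_step() and
PTDUMP_LEAF_LEVEL are hypothetical names, and PAGE_SHIFT is assumed to be 12.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT		12	/* assumed 4 KiB base pages */
#define PTDUMP_LEAF_LEVEL	4	/* PTE in the pgd/p4d/pud/pmd/pte numbering */

/*
 * Hypothetical helper: compute the address span covered by one entry at
 * 'level'. Levels outside 0..4 are rejected, so the shift count stays
 * within 12..48 and a bad start level cannot drive the walk any deeper.
 */
static bool gstage_level_step(int level, unsigned long long *step)
{
	assert(step);

	if (level < 0 || level > PTDUMP_LEAF_LEVEL)
		return false;

	*step = 1ULL << (PAGE_SHIFT + (PTDUMP_LEAF_LEVEL - level) * 9);
	return true;
}
```

With pgd_levels read as 0, the derived start level is 5 and the helper
returns false, so a caller could skip the walk instead of recursing.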

> +
> +	for (i = 0; i < PTRS_PER_PTE; i++, addr += step) {

[Severity: Medium]
Does this hardcoded loop bound truncate the guest address space dump? RISC-V 
KVM uses concatenated page tables for the top-level gstage directory (up to 
2048 entries for a 16KB root table), but this caps iteration at PTRS_PER_PTE.
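
The arithmetic behind the 2048 figure, as a small sketch. It assumes 8-byte
entries and a root index widened by 2 bits; gstage_entries_at() and
GSTAGE_ROOT_EXTRA_BITS are hypothetical names, not existing KVM symbols.

```c
#include <assert.h>

#define PTRS_PER_PTE		512	/* 4 KiB table of 8-byte entries */
#define GSTAGE_ROOT_EXTRA_BITS	2	/* assumption: root index is 2 bits wider */

/*
 * Hypothetical helper: number of entries to iterate at 'level' when the
 * walk starts at 'root_level'. Only the root table is concatenated.
 */
static unsigned int gstage_entries_at(int level, int root_level)
{
	assert(level >= root_level);

	if (level == root_level)
		return PTRS_PER_PTE << GSTAGE_ROOT_EXTRA_BITS;
	return PTRS_PER_PTE;
}
```

Under these assumptions the root level yields 2048 entries (16 KiB at 8
bytes each), while every lower level keeps the usual 512.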

> +		unsigned long val = tbl[i];

[ ... ]

> +static int kvm_ptdump_visitor(struct seq_file *m, void *v)
> +{
> +	struct kvm_ptdump_state *st = m->private;
> +	struct kvm *kvm = st->kvm;
> +	unsigned long *pgd = (unsigned long *)kvm->arch.pgd;
> +	int start_level = 5 - kvm->arch.pgd_levels;

[Severity: High]
Are these lockless reads of pgd and pgd_levels safe? If the VM is torn down 
concurrently, kvm_riscv_mmu_free_pgd() could clear these fields and free the 
pages. This read happens outside of mmu_lock, which might result in a dangling
pointer being passed to the page table walker.

> +	int i, j;
> +
> +	st->parser_state.level = -1;
> +	st->parser_state.start_address = 0;
> +	st->parser_state.seq = m;
> +
> +	for (i = 0; i < ARRAY_SIZE(gstage_pg_levels); i++) {
> +		gstage_pg_levels[i].bits = gstage_pte_bits;

[Severity: Medium]
Does this loop modify the global static array gstage_pg_levels without any 
synchronization? If multiple threads read the debugfs file concurrently, this
data race could cause corrupted or incomplete page table dumps.
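
One way to avoid writing shared state at read time is to make the table
const and fill it at build time. This is a userspace sketch, not a drop-in
replacement: the X_PAGE_* macros stand in for a subset of the kernel's
_PAGE_* bits, and GSTAGE_LEVEL() and gstage_level() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

/* Userspace stand-ins for the structs from asm/ptdump.h. */
struct ptdump_prot_bits {
	uint64_t mask;
	const char *set;
	const char *clear;
};

struct ptdump_pg_level {
	const struct ptdump_prot_bits *bits;
	const char *name;
	uint64_t mask;
	int num;
};

/* Stand-ins for a few _PAGE_* bits (RISC-V PTE bit positions). */
#define X_PAGE_PRESENT	(1ULL << 0)
#define X_PAGE_READ	(1ULL << 1)
#define X_PAGE_WRITE	(1ULL << 2)
#define X_PAGE_EXEC	(1ULL << 3)

static const struct ptdump_prot_bits gstage_pte_bits[] = {
	{ .mask = X_PAGE_EXEC,    .set = "X", .clear = "." },
	{ .mask = X_PAGE_WRITE,   .set = "W", .clear = "." },
	{ .mask = X_PAGE_READ,    .set = "R", .clear = "." },
	{ .mask = X_PAGE_PRESENT, .set = "V", .clear = "." },
};

#define GSTAGE_ALL_BITS \
	(X_PAGE_EXEC | X_PAGE_WRITE | X_PAGE_READ | X_PAGE_PRESENT)

/* Every field is a compile-time constant, so nothing is written at runtime. */
#define GSTAGE_LEVEL(n) {				\
	.bits = gstage_pte_bits,			\
	.name = (n),					\
	.mask = GSTAGE_ALL_BITS,			\
	.num = ARRAY_SIZE(gstage_pte_bits),		\
}

static const struct ptdump_pg_level gstage_pg_levels[] = {
	GSTAGE_LEVEL("PGD"), GSTAGE_LEVEL("P4D"), GSTAGE_LEVEL("PUD"),
	GSTAGE_LEVEL("PMD"), GSTAGE_LEVEL("PTE"),
};

/* Read-only accessor; concurrent readers share the table safely. */
static const struct ptdump_pg_level *gstage_level(int level)
{
	assert(level >= 0 && (size_t)level < ARRAY_SIZE(gstage_pg_levels));
	return &gstage_pg_levels[level];
}
```

The trade-off is that the combined mask has to be kept in sync with the bits
table by hand, which the runtime loop in the patch does automatically.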

> +		gstage_pg_levels[i].num = ARRAY_SIZE(gstage_pte_bits);
> +		gstage_pg_levels[i].mask = 0;
> +		for (j = 0; j < ARRAY_SIZE(gstage_pte_bits); j++)
> +			gstage_pg_levels[i].mask |= gstage_pte_bits[j].mask;
> +	}
> +
> +	read_lock(&kvm->mmu_lock);
> +	if (pgd) {
> +		kvm_ptdump_walk_level(&st->parser_state.ptdump, pgd,
> +			start_level, 0);
> +	}
> +	read_unlock(&kvm->mmu_lock);

[Severity: High]
Could walking the entire page table under mmu_lock cause severe lock contention
or a soft lockup for the guest VM, especially when combined with the buffer 
restarts from single_open() below?

> +
> +	note_page(&st->parser_state.ptdump, 0, -1, 0);
> +	return 0;
> +}
> +
> +static int kvm_ptdump_open(struct inode *inode, struct file *file)
> +{
> +	struct kvm *kvm = inode->i_private;
> +	struct kvm_ptdump_state *st;
> +	int ret;

[ ... ]

> +	st->parser_state.marker = st->marker;
> +	st->parser_state.pg_level = gstage_pg_levels;
> +	st->parser_state.ptdump.range = st->range;
> +
> +	ret = single_open(file, kvm_ptdump_visitor, st);

[Severity: High]
Does using single_open() here risk excessive retries? Since single_open() 
collects output into a contiguous buffer, if the output exceeds the initial 
buffer size, it discards it, doubles the allocation, and restarts the visitor. 
Each restart would repeat the entire recursive page table walk under the 
mmu_lock.
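
A rough model of how many times the visitor could run. This is a simplified
sketch of the doubling behavior described above, not the seq_file code
itself; seq_show_runs() is a hypothetical name and the initial buffer size
is a parameter because the exact value is an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model: the show callback is rerun with a doubled buffer until
 * the whole output fits, i.e. until output_len is smaller than the buffer.
 */
static unsigned int seq_show_runs(size_t output_len, size_t initial_size)
{
	size_t size = initial_size;
	unsigned int runs = 1;

	assert(initial_size > 0);
	assert(output_len <= SIZE_MAX / 2);	/* keeps the doubling from wrapping */

	while (output_len >= size) {
		size <<= 1;
		runs++;
	}
	return runs;
}
```

Assuming a 4 KiB initial buffer, a 1 MiB dump would need ten passes in this
model, each one a full walk under mmu_lock.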

> +	if (ret) {
> +		kfree(st);
> +		kvm_put_kvm(kvm);
> +	}
> +	return ret;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701085030.124579-1-fredwudi0305@gmail.com?part=2
