linux-kernel.vger.kernel.org archive mirror
* [PATCH V2] mm/ptdump: Take the memory hotplug lock inside ptdump_walk_pgd()
@ 2025-06-20  5:24 Anshuman Khandual
  2025-06-24 13:14 ` Alexander Gordeev
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Anshuman Khandual @ 2025-06-20  5:24 UTC
  To: linux-mm
  Cc: dev.jain, Anshuman Khandual, Catalin Marinas, Will Deacon,
	Ryan Roberts, Paul Walmsley, Palmer Dabbelt, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, Andrew Morton,
	linux-arm-kernel, linux-kernel, linux-riscv, linux-s390

Memory hot remove unmaps and tears down various kernel page table regions
as required. The ptdump code can race with concurrent modifications of the
kernel page tables. When leaf entries are modified concurrently, the dump
code may log stale or inconsistent information for a VA range, but this is
otherwise not harmful.

But when intermediate levels of kernel page table are freed, the dump code
will continue to use memory that has been freed and potentially reallocated
for another purpose. In such cases, the ptdump code may dereference bogus
addresses, leading to a number of potential problems.

To avoid the above mentioned race condition, platforms such as arm64, riscv
and s390 take the memory hotplug lock while dumping the kernel page tables
via the debugfs interface /sys/kernel/debug/kernel_page_tables.

A similar race condition exists while checking for pages that might have
been marked W+X via /sys/kernel/debug/kernel_page_tables/check_wx_pages,
which in turn calls ptdump_check_wx(). Instead of solving this race condition
again, let's just move the memory hotplug lock inside the generic
ptdump_walk_pgd(), which will benefit both scenarios.

Drop the get_online_mems() and put_online_mems() pairs from all existing
platform ptdump code paths.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
This patch applies on v6.16-rc2 and has been tested on arm64. It also
builds on riscv, s390, x86 and powerpc. Should the following Fixes tag
from V1 be retained as well?

Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove")

Changes in V2:

- Moved [get|put]_online_mems() inside generic ptdump_walk_pgd()

Changes in V1:

https://lore.kernel.org/all/20250609041214.285664-1-anshuman.khandual@arm.com/

 arch/arm64/mm/ptdump_debugfs.c | 3 ---
 arch/riscv/mm/ptdump.c         | 3 ---
 arch/s390/mm/dump_pagetables.c | 2 --
 mm/ptdump.c                    | 2 ++
 4 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/mm/ptdump_debugfs.c b/arch/arm64/mm/ptdump_debugfs.c
index 68bf1a125502d..1e308328c0796 100644
--- a/arch/arm64/mm/ptdump_debugfs.c
+++ b/arch/arm64/mm/ptdump_debugfs.c
@@ -1,6 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/debugfs.h>
-#include <linux/memory_hotplug.h>
 #include <linux/seq_file.h>
 
 #include <asm/ptdump.h>
@@ -9,9 +8,7 @@ static int ptdump_show(struct seq_file *m, void *v)
 {
 	struct ptdump_info *info = m->private;
 
-	get_online_mems();
 	ptdump_walk(m, info);
-	put_online_mems();
 	return 0;
 }
 DEFINE_SHOW_ATTRIBUTE(ptdump);
diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
index 32922550a50a3..3b51690cc8760 100644
--- a/arch/riscv/mm/ptdump.c
+++ b/arch/riscv/mm/ptdump.c
@@ -6,7 +6,6 @@
 #include <linux/efi.h>
 #include <linux/init.h>
 #include <linux/debugfs.h>
-#include <linux/memory_hotplug.h>
 #include <linux/seq_file.h>
 #include <linux/ptdump.h>
 
@@ -413,9 +412,7 @@ bool ptdump_check_wx(void)
 
 static int ptdump_show(struct seq_file *m, void *v)
 {
-	get_online_mems();
 	ptdump_walk(m, m->private);
-	put_online_mems();
 
 	return 0;
 }
diff --git a/arch/s390/mm/dump_pagetables.c b/arch/s390/mm/dump_pagetables.c
index ac604b1766609..9af2aae0a5152 100644
--- a/arch/s390/mm/dump_pagetables.c
+++ b/arch/s390/mm/dump_pagetables.c
@@ -247,11 +247,9 @@ static int ptdump_show(struct seq_file *m, void *v)
 		.marker = markers,
 	};
 
-	get_online_mems();
 	mutex_lock(&cpa_mutex);
 	ptdump_walk_pgd(&st.ptdump, &init_mm, NULL);
 	mutex_unlock(&cpa_mutex);
-	put_online_mems();
 	return 0;
 }
 DEFINE_SHOW_ATTRIBUTE(ptdump);
diff --git a/mm/ptdump.c b/mm/ptdump.c
index 9374f29cdc6f8..0a6965e2e7fa6 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -175,6 +175,7 @@ void ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm, pgd_t *pgd)
 {
 	const struct ptdump_range *range = st->range;
 
+	get_online_mems();
 	mmap_write_lock(mm);
 	while (range->start != range->end) {
 		walk_page_range_novma(mm, range->start, range->end,
@@ -182,6 +183,7 @@ void ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm, pgd_t *pgd)
 		range++;
 	}
 	mmap_write_unlock(mm);
+	put_online_mems();
 
 	/* Flush out the last page */
 	st->note_page_flush(st);
-- 
2.30.2



* Re: [PATCH V2] mm/ptdump: Take the memory hotplug lock inside ptdump_walk_pgd()
  2025-06-20  5:24 [PATCH V2] mm/ptdump: Take the memory hotplug lock inside ptdump_walk_pgd() Anshuman Khandual
@ 2025-06-24 13:14 ` Alexander Gordeev
  2025-06-24 13:24 ` Dev Jain
  2025-06-24 14:59 ` David Hildenbrand
  2 siblings, 0 replies; 4+ messages in thread
From: Alexander Gordeev @ 2025-06-24 13:14 UTC
  To: Anshuman Khandual
  Cc: linux-mm, dev.jain, Catalin Marinas, Will Deacon, Ryan Roberts,
	Paul Walmsley, Palmer Dabbelt, Gerald Schaefer, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Sven Schnelle,
	Andrew Morton, linux-arm-kernel, linux-kernel, linux-riscv,
	linux-s390

On Fri, Jun 20, 2025 at 10:54:27AM +0530, Anshuman Khandual wrote:
> Memory hot remove unmaps and tears down various kernel page table regions
> as required. The ptdump code can race with concurrent modifications of the
> kernel page tables. When leaf entries are modified concurrently, the dump
> code may log stale or inconsistent information for a VA range, but this is
> otherwise not harmful.
> 
> But when intermediate levels of kernel page table are freed, the dump code
> will continue to use memory that has been freed and potentially reallocated
> for another purpose. In such cases, the ptdump code may dereference bogus
> addresses, leading to a number of potential problems.
> 
> To avoid the above mentioned race condition, platforms such as arm64, riscv
> and s390 take the memory hotplug lock while dumping the kernel page tables
> via the debugfs interface /sys/kernel/debug/kernel_page_tables.
> 
> A similar race condition exists while checking for pages that might have
> been marked W+X via /sys/kernel/debug/kernel_page_tables/check_wx_pages,
> which in turn calls ptdump_check_wx(). Instead of solving this race condition
> again, let's just move the memory hotplug lock inside the generic
> ptdump_walk_pgd(), which will benefit both scenarios.
> 
> Drop the get_online_mems() and put_online_mems() pairs from all existing
> platform ptdump code paths.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Paul Walmsley <paul.walmsley@sifive.com>
> Cc: Palmer Dabbelt <palmer@dabbelt.com>
> Cc: Alexander Gordeev <agordeev@linux.ibm.com>
> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Sven Schnelle <svens@linux.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-riscv@lists.infradead.org
> Cc: linux-s390@vger.kernel.org
> Cc: linux-mm@kvack.org
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> This patch applies on v6.16-rc2 and has been tested on arm64. It also
> builds on riscv, s390, x86 and powerpc. Should the following Fixes tag
> from V1 be retained as well?
> 
> Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove")
> 
> Changes in V2:
> 
> - Moved [get|put]_online_mems() inside generic ptdump_walk_pgd()
> 
> Changes in V1:
> 
> https://lore.kernel.org/all/20250609041214.285664-1-anshuman.khandual@arm.com/
> 
>  arch/arm64/mm/ptdump_debugfs.c | 3 ---
>  arch/riscv/mm/ptdump.c         | 3 ---
>  arch/s390/mm/dump_pagetables.c | 2 --
>  mm/ptdump.c                    | 2 ++
>  4 files changed, 2 insertions(+), 8 deletions(-)

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390


* Re: [PATCH V2] mm/ptdump: Take the memory hotplug lock inside ptdump_walk_pgd()
  2025-06-20  5:24 [PATCH V2] mm/ptdump: Take the memory hotplug lock inside ptdump_walk_pgd() Anshuman Khandual
  2025-06-24 13:14 ` Alexander Gordeev
@ 2025-06-24 13:24 ` Dev Jain
  2025-06-24 14:59 ` David Hildenbrand
  2 siblings, 0 replies; 4+ messages in thread
From: Dev Jain @ 2025-06-24 13:24 UTC
  To: Anshuman Khandual, linux-mm
  Cc: Catalin Marinas, Will Deacon, Ryan Roberts, Paul Walmsley,
	Palmer Dabbelt, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Sven Schnelle, Andrew Morton, linux-arm-kernel, linux-kernel,
	linux-riscv, linux-s390


On 20/06/25 10:54 am, Anshuman Khandual wrote:
> Memory hot remove unmaps and tears down various kernel page table regions
> as required. The ptdump code can race with concurrent modifications of the
> kernel page tables. When leaf entries are modified concurrently, the dump
> code may log stale or inconsistent information for a VA range, but this is
> otherwise not harmful.
>
> But when intermediate levels of kernel page table are freed, the dump code
> will continue to use memory that has been freed and potentially reallocated
> for another purpose. In such cases, the ptdump code may dereference bogus
> addresses, leading to a number of potential problems.
>
> To avoid the above mentioned race condition, platforms such as arm64, riscv
> and s390 take the memory hotplug lock while dumping the kernel page tables
> via the debugfs interface /sys/kernel/debug/kernel_page_tables.
>
> A similar race condition exists while checking for pages that might have
> been marked W+X via /sys/kernel/debug/kernel_page_tables/check_wx_pages,
> which in turn calls ptdump_check_wx(). Instead of solving this race condition
> again, let's just move the memory hotplug lock inside the generic
> ptdump_walk_pgd(), which will benefit both scenarios.
>
> Drop the get_online_mems() and put_online_mems() pairs from all existing
> platform ptdump code paths.
>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Paul Walmsley <paul.walmsley@sifive.com>
> Cc: Palmer Dabbelt <palmer@dabbelt.com>
> Cc: Alexander Gordeev <agordeev@linux.ibm.com>
> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Sven Schnelle <svens@linux.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-riscv@lists.infradead.org
> Cc: linux-s390@vger.kernel.org
> Cc: linux-mm@kvack.org
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---

Reviewed-by: Dev Jain <dev.jain@arm.com>



* Re: [PATCH V2] mm/ptdump: Take the memory hotplug lock inside ptdump_walk_pgd()
  2025-06-20  5:24 [PATCH V2] mm/ptdump: Take the memory hotplug lock inside ptdump_walk_pgd() Anshuman Khandual
  2025-06-24 13:14 ` Alexander Gordeev
  2025-06-24 13:24 ` Dev Jain
@ 2025-06-24 14:59 ` David Hildenbrand
  2 siblings, 0 replies; 4+ messages in thread
From: David Hildenbrand @ 2025-06-24 14:59 UTC
  To: Anshuman Khandual, linux-mm
  Cc: dev.jain, Catalin Marinas, Will Deacon, Ryan Roberts,
	Paul Walmsley, Palmer Dabbelt, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Sven Schnelle, Andrew Morton, linux-arm-kernel, linux-kernel,
	linux-riscv, linux-s390

On 20.06.25 07:24, Anshuman Khandual wrote:
> Memory hot remove unmaps and tears down various kernel page table regions
> as required. The ptdump code can race with concurrent modifications of the
> kernel page tables. When leaf entries are modified concurrently, the dump
> code may log stale or inconsistent information for a VA range, but this is
> otherwise not harmful.
> 
> But when intermediate levels of kernel page table are freed, the dump code
> will continue to use memory that has been freed and potentially reallocated
> for another purpose. In such cases, the ptdump code may dereference bogus
> addresses, leading to a number of potential problems.
> 
> To avoid the above mentioned race condition, platforms such as arm64, riscv
> and s390 take the memory hotplug lock while dumping the kernel page tables
> via the debugfs interface /sys/kernel/debug/kernel_page_tables.
> 
> A similar race condition exists while checking for pages that might have
> been marked W+X via /sys/kernel/debug/kernel_page_tables/check_wx_pages,
> which in turn calls ptdump_check_wx(). Instead of solving this race condition
> again, let's just move the memory hotplug lock inside the generic
> ptdump_walk_pgd(), which will benefit both scenarios.
> 
> Drop the get_online_mems() and put_online_mems() pairs from all existing
> platform ptdump code paths.

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb

