linux-s390.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/1] kasan: Avoid sleepable page allocation from atomic context
@ 2025-05-07 12:48 Alexander Gordeev
  2025-05-07 12:48 ` [PATCH v5 1/1] " Alexander Gordeev
  0 siblings, 1 reply; 4+ messages in thread
From: Alexander Gordeev @ 2025-05-07 12:48 UTC (permalink / raw)
  To: Andrew Morton, Andrey Ryabinin, Daniel Axtens
  Cc: linux-kernel, linux-mm, kasan-dev, linux-s390, stable

Hi All,

Chages since v4:
- unused pages leak is avoided

Chages since v3:
- pfn_to_virt() changed to page_to_virt() due to compile error

Chages since v2:
- page allocation moved out of the atomic context

Chages since v1:
- Fixes: and -stable tags added to the patch description

Thanks!

Alexander Gordeev (1):
  kasan: Avoid sleepable page allocation from atomic context

 mm/kasan/shadow.c | 77 ++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 63 insertions(+), 14 deletions(-)

-- 
2.45.2


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v5 1/1] kasan: Avoid sleepable page allocation from atomic context
  2025-05-07 12:48 [PATCH v5 0/1] kasan: Avoid sleepable page allocation from atomic context Alexander Gordeev
@ 2025-05-07 12:48 ` Alexander Gordeev
  2025-05-08  0:05   ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: Alexander Gordeev @ 2025-05-07 12:48 UTC (permalink / raw)
  To: Andrew Morton, Andrey Ryabinin, Daniel Axtens
  Cc: linux-kernel, linux-mm, kasan-dev, linux-s390, stable

apply_to_pte_range() enters the lazy MMU mode and then invokes
kasan_populate_vmalloc_pte() callback on each page table walk
iteration. However, the callback can go into sleep when trying
to allocate a single page, e.g. if an architecutre disables
preemption on lazy MMU mode enter.

On s390 if make arch_enter_lazy_mmu_mode() -> preempt_enable()
and arch_leave_lazy_mmu_mode() -> preempt_disable(), such crash
occurs:

    [  553.332108] preempt_count: 1, expected: 0
    [  553.332117] no locks held by multipathd/2116.
    [  553.332128] CPU: 24 PID: 2116 Comm: multipathd Kdump: loaded Tainted:
    [  553.332139] Hardware name: IBM 3931 A01 701 (LPAR)
    [  553.332146] Call Trace:
    [  553.332152]  [<00000000158de23a>] dump_stack_lvl+0xfa/0x150
    [  553.332167]  [<0000000013e10d12>] __might_resched+0x57a/0x5e8
    [  553.332178]  [<00000000144eb6c2>] __alloc_pages+0x2ba/0x7c0
    [  553.332189]  [<00000000144d5cdc>] __get_free_pages+0x2c/0x88
    [  553.332198]  [<00000000145663f6>] kasan_populate_vmalloc_pte+0x4e/0x110
    [  553.332207]  [<000000001447625c>] apply_to_pte_range+0x164/0x3c8
    [  553.332218]  [<000000001448125a>] apply_to_pmd_range+0xda/0x318
    [  553.332226]  [<000000001448181c>] __apply_to_page_range+0x384/0x768
    [  553.332233]  [<0000000014481c28>] apply_to_page_range+0x28/0x38
    [  553.332241]  [<00000000145665da>] kasan_populate_vmalloc+0x82/0x98
    [  553.332249]  [<00000000144c88d0>] alloc_vmap_area+0x590/0x1c90
    [  553.332257]  [<00000000144ca108>] __get_vm_area_node.constprop.0+0x138/0x260
    [  553.332265]  [<00000000144d17fc>] __vmalloc_node_range+0x134/0x360
    [  553.332274]  [<0000000013d5dbf2>] alloc_thread_stack_node+0x112/0x378
    [  553.332284]  [<0000000013d62726>] dup_task_struct+0x66/0x430
    [  553.332293]  [<0000000013d63962>] copy_process+0x432/0x4b80
    [  553.332302]  [<0000000013d68300>] kernel_clone+0xf0/0x7d0
    [  553.332311]  [<0000000013d68bd6>] __do_sys_clone+0xae/0xc8
    [  553.332400]  [<0000000013d68dee>] __s390x_sys_clone+0xd6/0x118
    [  553.332410]  [<0000000013c9d34c>] do_syscall+0x22c/0x328
    [  553.332419]  [<00000000158e7366>] __do_syscall+0xce/0xf0
    [  553.332428]  [<0000000015913260>] system_call+0x70/0x98

Instead of allocating single pages per-PTE, bulk-allocate the
shadow memory prior to applying kasan_populate_vmalloc_pte()
callback on a page range.

Suggested-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: stable@vger.kernel.org
Fixes: 3c5c3cfb9ef4 ("kasan: support backing vmalloc space with real shadow memory")

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
---
 mm/kasan/shadow.c | 77 ++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 63 insertions(+), 14 deletions(-)

diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 88d1c9dcb507..660cc2148575 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -292,30 +292,81 @@ void __init __weak kasan_populate_early_vm_area_shadow(void *start,
 {
 }
 
+struct vmalloc_populate_data {
+	unsigned long start;
+	struct page **pages;
+};
+
 static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned long addr,
-				      void *unused)
+				      void *_data)
 {
-	unsigned long page;
+	struct vmalloc_populate_data *data = _data;
+	struct page *page;
 	pte_t pte;
+	int index;
 
 	if (likely(!pte_none(ptep_get(ptep))))
 		return 0;
 
-	page = __get_free_page(GFP_KERNEL);
-	if (!page)
-		return -ENOMEM;
-
-	__memset((void *)page, KASAN_VMALLOC_INVALID, PAGE_SIZE);
-	pte = pfn_pte(PFN_DOWN(__pa(page)), PAGE_KERNEL);
+	index = PFN_DOWN(addr - data->start);
+	page = data->pages[index];
+	__memset(page_to_virt(page), KASAN_VMALLOC_INVALID, PAGE_SIZE);
+	pte = pfn_pte(page_to_pfn(page), PAGE_KERNEL);
 
 	spin_lock(&init_mm.page_table_lock);
 	if (likely(pte_none(ptep_get(ptep)))) {
 		set_pte_at(&init_mm, addr, ptep, pte);
-		page = 0;
+		data->pages[index] = NULL;
 	}
 	spin_unlock(&init_mm.page_table_lock);
-	if (page)
-		free_page(page);
+
+	return 0;
+}
+
+static inline void free_pages_bulk(struct page **pages, int nr_pages)
+{
+	int i;
+
+	for (i = 0; i < nr_pages; i++) {
+		if (pages[i]) {
+			__free_pages(pages[i], 0);
+			pages[i] = NULL;
+		}
+	}
+}
+
+static int __kasan_populate_vmalloc(unsigned long start, unsigned long end)
+{
+	unsigned long nr_populated, nr_pages, nr_total = PFN_UP(end - start);
+	struct vmalloc_populate_data data;
+	int ret;
+
+	data.pages = (struct page **)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+	if (!data.pages)
+		return -ENOMEM;
+
+	while (nr_total) {
+		nr_pages = min(nr_total, PAGE_SIZE / sizeof(data.pages[0]));
+		nr_populated = alloc_pages_bulk(GFP_KERNEL, nr_pages, data.pages);
+		if (nr_populated != nr_pages) {
+			free_pages_bulk(data.pages, nr_populated);
+			free_page((unsigned long)data.pages);
+			return -ENOMEM;
+		}
+
+		data.start = start;
+		ret = apply_to_page_range(&init_mm, start, nr_pages * PAGE_SIZE,
+					  kasan_populate_vmalloc_pte, &data);
+		free_pages_bulk(data.pages, nr_pages);
+		if (ret)
+			return ret;
+
+		start += nr_pages * PAGE_SIZE;
+		nr_total -= nr_pages;
+	}
+
+	free_page((unsigned long)data.pages);
+
 	return 0;
 }
 
@@ -348,9 +399,7 @@ int kasan_populate_vmalloc(unsigned long addr, unsigned long size)
 	shadow_start = PAGE_ALIGN_DOWN(shadow_start);
 	shadow_end = PAGE_ALIGN(shadow_end);
 
-	ret = apply_to_page_range(&init_mm, shadow_start,
-				  shadow_end - shadow_start,
-				  kasan_populate_vmalloc_pte, NULL);
+	ret = __kasan_populate_vmalloc(shadow_start, shadow_end);
 	if (ret)
 		return ret;
 
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v5 1/1] kasan: Avoid sleepable page allocation from atomic context
  2025-05-07 12:48 ` [PATCH v5 1/1] " Alexander Gordeev
@ 2025-05-08  0:05   ` Andrew Morton
  2025-05-08 14:20     ` Alexander Gordeev
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2025-05-08  0:05 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Andrey Ryabinin, Daniel Axtens, linux-kernel, linux-mm, kasan-dev,
	linux-s390, stable

On Wed,  7 May 2025 14:48:03 +0200 Alexander Gordeev <agordeev@linux.ibm.com> wrote:

> apply_to_pte_range() enters the lazy MMU mode and then invokes
> kasan_populate_vmalloc_pte() callback on each page table walk
> iteration. However, the callback can go into sleep when trying
> to allocate a single page, e.g. if an architecutre disables
> preemption on lazy MMU mode enter.
> 
> On s390 if make arch_enter_lazy_mmu_mode() -> preempt_enable()
> and arch_leave_lazy_mmu_mode() -> preempt_disable(), such crash
> occurs:
> 
>     [  553.332108] preempt_count: 1, expected: 0
>     [  553.332117] no locks held by multipathd/2116.
>     [  553.332128] CPU: 24 PID: 2116 Comm: multipathd Kdump: loaded Tainted:
>     [  553.332139] Hardware name: IBM 3931 A01 701 (LPAR)
>     [  553.332146] Call Trace:
>     [  553.332152]  [<00000000158de23a>] dump_stack_lvl+0xfa/0x150
>     [  553.332167]  [<0000000013e10d12>] __might_resched+0x57a/0x5e8
>     [  553.332178]  [<00000000144eb6c2>] __alloc_pages+0x2ba/0x7c0
>     [  553.332189]  [<00000000144d5cdc>] __get_free_pages+0x2c/0x88
>     [  553.332198]  [<00000000145663f6>] kasan_populate_vmalloc_pte+0x4e/0x110
>     [  553.332207]  [<000000001447625c>] apply_to_pte_range+0x164/0x3c8
>     [  553.332218]  [<000000001448125a>] apply_to_pmd_range+0xda/0x318
>     [  553.332226]  [<000000001448181c>] __apply_to_page_range+0x384/0x768
>     [  553.332233]  [<0000000014481c28>] apply_to_page_range+0x28/0x38
>     [  553.332241]  [<00000000145665da>] kasan_populate_vmalloc+0x82/0x98
>     [  553.332249]  [<00000000144c88d0>] alloc_vmap_area+0x590/0x1c90
>     [  553.332257]  [<00000000144ca108>] __get_vm_area_node.constprop.0+0x138/0x260
>     [  553.332265]  [<00000000144d17fc>] __vmalloc_node_range+0x134/0x360
>     [  553.332274]  [<0000000013d5dbf2>] alloc_thread_stack_node+0x112/0x378
>     [  553.332284]  [<0000000013d62726>] dup_task_struct+0x66/0x430
>     [  553.332293]  [<0000000013d63962>] copy_process+0x432/0x4b80
>     [  553.332302]  [<0000000013d68300>] kernel_clone+0xf0/0x7d0
>     [  553.332311]  [<0000000013d68bd6>] __do_sys_clone+0xae/0xc8
>     [  553.332400]  [<0000000013d68dee>] __s390x_sys_clone+0xd6/0x118
>     [  553.332410]  [<0000000013c9d34c>] do_syscall+0x22c/0x328
>     [  553.332419]  [<00000000158e7366>] __do_syscall+0xce/0xf0
>     [  553.332428]  [<0000000015913260>] system_call+0x70/0x98

Is this a crash, or a warning?  From the description I suspect it was a
sleep-while-atomic warning?

Can we please have the complete dmesg output?


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v5 1/1] kasan: Avoid sleepable page allocation from atomic context
  2025-05-08  0:05   ` Andrew Morton
@ 2025-05-08 14:20     ` Alexander Gordeev
  0 siblings, 0 replies; 4+ messages in thread
From: Alexander Gordeev @ 2025-05-08 14:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andrey Ryabinin, Daniel Axtens, linux-kernel, linux-mm, kasan-dev,
	linux-s390, stable

On Wed, May 07, 2025 at 05:05:54PM -0700, Andrew Morton wrote:
> Is this a crash, or a warning?  From the description I suspect it was a
> sleep-while-atomic warning?

Correct, that is a complaint printed by __might_resched()

> Can we please have the complete dmesg output?

I posted v6 with this output:

[    0.663336] BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321
[    0.663348] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
[    0.663358] preempt_count: 1, expected: 0
[    0.663366] RCU nest depth: 0, expected: 0
[    0.663375] no locks held by kthreadd/2.
[    0.663383] Preemption disabled at:
[    0.663386] [<0002f3284cbb4eda>] apply_to_pte_range+0xfa/0x4a0
[    0.663405] CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted 6.15.0-rc5-gcc-kasan-00043-gd76bb1ebb558-dirty #162 PREEMPT 
[    0.663408] Hardware name: IBM 3931 A01 701 (KVM/Linux)
[    0.663409] Call Trace:
[    0.663410]  [<0002f3284c385f58>] dump_stack_lvl+0xe8/0x140 
[    0.663413]  [<0002f3284c507b9e>] __might_resched+0x66e/0x700 
[    0.663415]  [<0002f3284cc4f6c0>] __alloc_frozen_pages_noprof+0x370/0x4b0 
[    0.663419]  [<0002f3284ccc73c0>] alloc_pages_mpol+0x1a0/0x4a0 
[    0.663421]  [<0002f3284ccc8518>] alloc_frozen_pages_noprof+0x88/0xc0 
[    0.663424]  [<0002f3284ccc8572>] alloc_pages_noprof+0x22/0x120 
[    0.663427]  [<0002f3284cc341ac>] get_free_pages_noprof+0x2c/0xc0 
[    0.663429]  [<0002f3284cceba70>] kasan_populate_vmalloc_pte+0x50/0x120 
[    0.663433]  [<0002f3284cbb4ef8>] apply_to_pte_range+0x118/0x4a0 
[    0.663435]  [<0002f3284cbc7c14>] apply_to_pmd_range+0x194/0x3e0 
[    0.663437]  [<0002f3284cbc99be>] __apply_to_page_range+0x2fe/0x7a0 
[    0.663440]  [<0002f3284cbc9e88>] apply_to_page_range+0x28/0x40 
[    0.663442]  [<0002f3284ccebf12>] kasan_populate_vmalloc+0x82/0xa0 
[    0.663445]  [<0002f3284cc1578c>] alloc_vmap_area+0x34c/0xc10 
[    0.663448]  [<0002f3284cc1c2a6>] __get_vm_area_node+0x186/0x2a0 
[    0.663451]  [<0002f3284cc1e696>] __vmalloc_node_range_noprof+0x116/0x310 
[    0.663454]  [<0002f3284cc1d950>] __vmalloc_node_noprof+0xd0/0x110 
[    0.663457]  [<0002f3284c454b88>] alloc_thread_stack_node+0xf8/0x330 
[    0.663460]  [<0002f3284c458d56>] dup_task_struct+0x66/0x4d0 
[    0.663463]  [<0002f3284c45be90>] copy_process+0x280/0x4b90 
[    0.663465]  [<0002f3284c460940>] kernel_clone+0xd0/0x4b0 
[    0.663467]  [<0002f3284c46115e>] kernel_thread+0xbe/0xe0 
[    0.663469]  [<0002f3284c4e440e>] kthreadd+0x50e/0x7f0 
[    0.663472]  [<0002f3284c38c04a>] __ret_from_fork+0x8a/0xf0 
[    0.663475]  [<0002f3284ed57ff2>] ret_from_fork+0xa/0x38 

Thanks!

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-05-08 14:20 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-07 12:48 [PATCH v5 0/1] kasan: Avoid sleepable page allocation from atomic context Alexander Gordeev
2025-05-07 12:48 ` [PATCH v5 1/1] " Alexander Gordeev
2025-05-08  0:05   ` Andrew Morton
2025-05-08 14:20     ` Alexander Gordeev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).