* [PATCH v6 0/1] kasan: Avoid sleepable page allocation from atomic context
From: Alexander Gordeev @ 2025-05-08 14:15 UTC
To: Andrew Morton, Andrey Ryabinin, Daniel Axtens
Cc: linux-kernel, linux-mm, kasan-dev, linux-s390, stable
Hi All,
Changes since v5:
- full error message included in the commit description
Changes since v4:
- a leak of unused pages is avoided
Changes since v3:
- pfn_to_virt() changed to page_to_virt() due to a compile error
Changes since v2:
- page allocation moved out of the atomic context
Changes since v1:
- Fixes: and -stable tags added to the patch description
Thanks!
Alexander Gordeev (1):
kasan: Avoid sleepable page allocation from atomic context
mm/kasan/shadow.c | 77 ++++++++++++++++++++++++++++++++++++++---------
1 file changed, 63 insertions(+), 14 deletions(-)
--
2.45.2
* [PATCH v6 1/1] kasan: Avoid sleepable page allocation from atomic context
From: Alexander Gordeev @ 2025-05-08 14:15 UTC
To: Andrew Morton, Andrey Ryabinin, Daniel Axtens
Cc: linux-kernel, linux-mm, kasan-dev, linux-s390, stable
apply_to_pte_range() enters the lazy MMU mode and then invokes the
kasan_populate_vmalloc_pte() callback on each page table walk
iteration. However, the callback can sleep when trying to allocate
a single page, e.g. if an architecture disables preemption on lazy
MMU mode enter.

On s390, if arch_enter_lazy_mmu_mode() is made to call
preempt_disable() and arch_leave_lazy_mmu_mode() to call
preempt_enable(), the following crash occurs:
[ 0.663336] BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321
[ 0.663348] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
[ 0.663358] preempt_count: 1, expected: 0
[ 0.663366] RCU nest depth: 0, expected: 0
[ 0.663375] no locks held by kthreadd/2.
[ 0.663383] Preemption disabled at:
[ 0.663386] [<0002f3284cbb4eda>] apply_to_pte_range+0xfa/0x4a0
[ 0.663405] CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted 6.15.0-rc5-gcc-kasan-00043-gd76bb1ebb558-dirty #162 PREEMPT
[ 0.663408] Hardware name: IBM 3931 A01 701 (KVM/Linux)
[ 0.663409] Call Trace:
[ 0.663410] [<0002f3284c385f58>] dump_stack_lvl+0xe8/0x140
[ 0.663413] [<0002f3284c507b9e>] __might_resched+0x66e/0x700
[ 0.663415] [<0002f3284cc4f6c0>] __alloc_frozen_pages_noprof+0x370/0x4b0
[ 0.663419] [<0002f3284ccc73c0>] alloc_pages_mpol+0x1a0/0x4a0
[ 0.663421] [<0002f3284ccc8518>] alloc_frozen_pages_noprof+0x88/0xc0
[ 0.663424] [<0002f3284ccc8572>] alloc_pages_noprof+0x22/0x120
[ 0.663427] [<0002f3284cc341ac>] get_free_pages_noprof+0x2c/0xc0
[ 0.663429] [<0002f3284cceba70>] kasan_populate_vmalloc_pte+0x50/0x120
[ 0.663433] [<0002f3284cbb4ef8>] apply_to_pte_range+0x118/0x4a0
[ 0.663435] [<0002f3284cbc7c14>] apply_to_pmd_range+0x194/0x3e0
[ 0.663437] [<0002f3284cbc99be>] __apply_to_page_range+0x2fe/0x7a0
[ 0.663440] [<0002f3284cbc9e88>] apply_to_page_range+0x28/0x40
[ 0.663442] [<0002f3284ccebf12>] kasan_populate_vmalloc+0x82/0xa0
[ 0.663445] [<0002f3284cc1578c>] alloc_vmap_area+0x34c/0xc10
[ 0.663448] [<0002f3284cc1c2a6>] __get_vm_area_node+0x186/0x2a0
[ 0.663451] [<0002f3284cc1e696>] __vmalloc_node_range_noprof+0x116/0x310
[ 0.663454] [<0002f3284cc1d950>] __vmalloc_node_noprof+0xd0/0x110
[ 0.663457] [<0002f3284c454b88>] alloc_thread_stack_node+0xf8/0x330
[ 0.663460] [<0002f3284c458d56>] dup_task_struct+0x66/0x4d0
[ 0.663463] [<0002f3284c45be90>] copy_process+0x280/0x4b90
[ 0.663465] [<0002f3284c460940>] kernel_clone+0xd0/0x4b0
[ 0.663467] [<0002f3284c46115e>] kernel_thread+0xbe/0xe0
[ 0.663469] [<0002f3284c4e440e>] kthreadd+0x50e/0x7f0
[ 0.663472] [<0002f3284c38c04a>] __ret_from_fork+0x8a/0xf0
[ 0.663475] [<0002f3284ed57ff2>] ret_from_fork+0xa/0x38
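
To make the trigger concrete, the experiment above corresponds to a
sketch like the following. The hook names are the generic lazy MMU
interface; the bodies are illustrative assumptions, not the actual
s390 implementation:

static inline void arch_enter_lazy_mmu_mode(void)
{
	/* Everything up to the matching leave now runs atomically. */
	preempt_disable();
}

static inline void arch_leave_lazy_mmu_mode(void)
{
	preempt_enable();
}

With hooks like these, the GFP_KERNEL allocation in
kasan_populate_vmalloc_pte() runs with preemption disabled, which is
precisely what the __might_resched() check in the trace above reports.
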
Instead of allocating a single page per PTE, bulk-allocate the
shadow memory prior to applying the kasan_populate_vmalloc_pte()
callback on a page range.
Suggested-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: stable@vger.kernel.org
Fixes: 3c5c3cfb9ef4 ("kasan: support backing vmalloc space with real shadow memory")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
---
mm/kasan/shadow.c | 77 ++++++++++++++++++++++++++++++++++++++---------
1 file changed, 63 insertions(+), 14 deletions(-)
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 88d1c9dcb507..660cc2148575 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -292,30 +292,81 @@ void __init __weak kasan_populate_early_vm_area_shadow(void *start,
{
}
+struct vmalloc_populate_data {
+ unsigned long start;
+ struct page **pages;
+};
+
static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned long addr,
- void *unused)
+ void *_data)
{
- unsigned long page;
+ struct vmalloc_populate_data *data = _data;
+ struct page *page;
pte_t pte;
+ int index;
if (likely(!pte_none(ptep_get(ptep))))
return 0;
- page = __get_free_page(GFP_KERNEL);
- if (!page)
- return -ENOMEM;
-
- __memset((void *)page, KASAN_VMALLOC_INVALID, PAGE_SIZE);
- pte = pfn_pte(PFN_DOWN(__pa(page)), PAGE_KERNEL);
+ index = PFN_DOWN(addr - data->start);
+ page = data->pages[index];
+ __memset(page_to_virt(page), KASAN_VMALLOC_INVALID, PAGE_SIZE);
+ pte = pfn_pte(page_to_pfn(page), PAGE_KERNEL);
spin_lock(&init_mm.page_table_lock);
if (likely(pte_none(ptep_get(ptep)))) {
set_pte_at(&init_mm, addr, ptep, pte);
- page = 0;
+ data->pages[index] = NULL;
}
spin_unlock(&init_mm.page_table_lock);
- if (page)
- free_page(page);
+
+ return 0;
+}
+
+static inline void free_pages_bulk(struct page **pages, int nr_pages)
+{
+ int i;
+
+ for (i = 0; i < nr_pages; i++) {
+ if (pages[i]) {
+ __free_pages(pages[i], 0);
+ pages[i] = NULL;
+ }
+ }
+}
+
+static int __kasan_populate_vmalloc(unsigned long start, unsigned long end)
+{
+ unsigned long nr_populated, nr_pages, nr_total = PFN_UP(end - start);
+ struct vmalloc_populate_data data;
+ int ret;
+
+ data.pages = (struct page **)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+ if (!data.pages)
+ return -ENOMEM;
+
+ while (nr_total) {
+ nr_pages = min(nr_total, PAGE_SIZE / sizeof(data.pages[0]));
+ nr_populated = alloc_pages_bulk(GFP_KERNEL, nr_pages, data.pages);
+ if (nr_populated != nr_pages) {
+ free_pages_bulk(data.pages, nr_populated);
+ free_page((unsigned long)data.pages);
+ return -ENOMEM;
+ }
+
+ data.start = start;
+ ret = apply_to_page_range(&init_mm, start, nr_pages * PAGE_SIZE,
+ kasan_populate_vmalloc_pte, &data);
+ free_pages_bulk(data.pages, nr_pages);
+ if (ret)
+ return ret;
+
+ start += nr_pages * PAGE_SIZE;
+ nr_total -= nr_pages;
+ }
+
+ free_page((unsigned long)data.pages);
+
return 0;
}
@@ -348,9 +399,7 @@ int kasan_populate_vmalloc(unsigned long addr, unsigned long size)
shadow_start = PAGE_ALIGN_DOWN(shadow_start);
shadow_end = PAGE_ALIGN(shadow_end);
- ret = apply_to_page_range(&init_mm, shadow_start,
- shadow_end - shadow_start,
- kasan_populate_vmalloc_pte, NULL);
+ ret = __kasan_populate_vmalloc(shadow_start, shadow_end);
if (ret)
return ret;
--
2.45.2
* Re: [PATCH v6 1/1] kasan: Avoid sleepable page allocation from atomic context
From: Harry Yoo @ 2025-05-09 10:05 UTC
To: Alexander Gordeev
Cc: Andrew Morton, Andrey Ryabinin, Daniel Axtens, linux-kernel,
linux-mm, kasan-dev, linux-s390, stable
On Thu, May 08, 2025 at 04:15:46PM +0200, Alexander Gordeev wrote:
> apply_to_pte_range() enters the lazy MMU mode and then invokes the
> kasan_populate_vmalloc_pte() callback on each page table walk
> iteration. However, the callback can sleep when trying to allocate
> a single page, e.g. if an architecture disables preemption on lazy
> MMU mode enter.

... snip ...
FWIW, this patch looks good to me.
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
With a minor suggestion below.
> mm/kasan/shadow.c | 77 ++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 63 insertions(+), 14 deletions(-)
>
> diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
> index 88d1c9dcb507..660cc2148575 100644
> --- a/mm/kasan/shadow.c
> +++ b/mm/kasan/shadow.c
> @@ -292,30 +292,81 @@ void __init __weak kasan_populate_early_vm_area_shadow(void *start,
... snip ...
> +static int __kasan_populate_vmalloc(unsigned long start, unsigned long end)
> +{
> + unsigned long nr_populated, nr_pages, nr_total = PFN_UP(end - start);
> + struct vmalloc_populate_data data;
> + int ret;
> +
> + data.pages = (struct page **)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> + if (!data.pages)
> + return -ENOMEM;
> +
> + while (nr_total) {
> + nr_pages = min(nr_total, PAGE_SIZE / sizeof(data.pages[0]));
> + nr_populated = alloc_pages_bulk(GFP_KERNEL, nr_pages, data.pages);
> + if (nr_populated != nr_pages) {
> + free_pages_bulk(data.pages, nr_populated);
> + free_page((unsigned long)data.pages);
> + return -ENOMEM;
> + }
> +
> + data.start = start;
> + ret = apply_to_page_range(&init_mm, start, nr_pages * PAGE_SIZE,
> + kasan_populate_vmalloc_pte, &data);
> + free_pages_bulk(data.pages, nr_pages);
A minor suggestion:
I think this free_pages_bulk() can be moved outside the loop
(but with PAGE_SIZE / sizeof(data.pages[0]) instead of nr_pages),
because alloc_pages_bulk() simply skips allocating pages for any
non-NULL entries.
> If some pages in the array were not used, they don't have to be freed;
> on the next iteration of the loop alloc_pages_bulk() can skip
> allocating pages for the non-NULL entries.
> + if (ret)
> + return ret;
> +
> + start += nr_pages * PAGE_SIZE;
> + nr_total -= nr_pages;
> + }
> +
> + free_page((unsigned long)data.pages);
> +
> return 0;
> }
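
One possible shape of that restructuring, assuming the
alloc_pages_bulk() behavior described above (already-populated,
non-NULL slots are skipped and counted), is sketched below; ret is
assumed to be initialized to 0 before the loop:

	while (nr_total) {
		nr_pages = min(nr_total, PAGE_SIZE / sizeof(data.pages[0]));
		/* Slots still holding pages from the previous round are reused. */
		nr_populated = alloc_pages_bulk(GFP_KERNEL, nr_pages, data.pages);
		if (nr_populated != nr_pages) {
			ret = -ENOMEM;
			break;
		}

		data.start = start;
		ret = apply_to_page_range(&init_mm, start, nr_pages * PAGE_SIZE,
					  kasan_populate_vmalloc_pte, &data);
		if (ret)
			break;

		start += nr_pages * PAGE_SIZE;
		nr_total -= nr_pages;
	}

	/*
	 * Single sweep after the loop. The full array width is used
	 * because an earlier, larger round may leave pages at indexes
	 * beyond the final round's nr_pages.
	 */
	free_pages_bulk(data.pages, PAGE_SIZE / sizeof(data.pages[0]));
	free_page((unsigned long)data.pages);

	return ret;
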
--
Cheers,
Harry / Hyeonggon
* Re: [PATCH v6 1/1] kasan: Avoid sleepable page allocation from atomic context
From: Alexander Gordeev @ 2025-05-12 14:22 UTC
To: Harry Yoo
Cc: Andrew Morton, Andrey Ryabinin, Daniel Axtens, linux-kernel,
linux-mm, kasan-dev, linux-s390, stable
On Fri, May 09, 2025 at 07:05:56PM +0900, Harry Yoo wrote:
> > + while (nr_total) {
> > + nr_pages = min(nr_total, PAGE_SIZE / sizeof(data.pages[0]));
> > + nr_populated = alloc_pages_bulk(GFP_KERNEL, nr_pages, data.pages);
> > + if (nr_populated != nr_pages) {
> > + free_pages_bulk(data.pages, nr_populated);
> > + free_page((unsigned long)data.pages);
> > + return -ENOMEM;
> > + }
> > +
> > + data.start = start;
> > + ret = apply_to_page_range(&init_mm, start, nr_pages * PAGE_SIZE,
> > + kasan_populate_vmalloc_pte, &data);
> > + free_pages_bulk(data.pages, nr_pages);
>
> A minor suggestion:
>
> I think this free_pages_bulk() can be moved outside the loop
> (but with PAGE_SIZE / sizeof(data.pages[0]) instead of nr_pages),
Because we know the number of populated pages, I think we could
use that instead of the maximum (PAGE_SIZE / sizeof(data.pages[0])).
> because alloc_pages_bulk() simply skips allocating pages for any
> non-NULL entries.
>
> If some pages in the array were not used, they don't have to be freed;
> on the next iteration of the loop alloc_pages_bulk() can skip
> allocating pages for the non-NULL entries.
Thanks for the suggestion! I will send an updated version.
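
If the populated count is used instead of the maximum, one detail is
worth keeping in mind in a sketch (hypothetical, since the updated
version is not part of this thread): the final round can be narrower
than the earlier ones, so tracking the widest count seen keeps the
post-loop sweep from leaking pages left at higher indexes:

	unsigned long nr_populated_max = 0;

	/* In the loop, right after alloc_pages_bulk() succeeds: */
	nr_populated_max = max(nr_populated_max, nr_populated);

	/* After the loop, instead of sweeping the full array width: */
	free_pages_bulk(data.pages, nr_populated_max);
	free_page((unsigned long)data.pages);
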