* [PATCH v7 0/1] kasan: Avoid sleepable page allocation from atomic context @ 2025-05-12 14:27 Alexander Gordeev 2025-05-12 14:27 ` [PATCH v7 1/1] " Alexander Gordeev 0 siblings, 1 reply; 5+ messages in thread From: Alexander Gordeev @ 2025-05-12 14:27 UTC (permalink / raw) To: Andrew Morton, Andrey Ryabinin, Daniel Axtens, Harry Yoo Cc: linux-kernel, linux-mm, kasan-dev, linux-s390, stable Hi All, Chages since v6: - do not unnecessary free pages across iterations Chages since v5: - full error message included into commit description Chages since v4: - unused pages leak is avoided Chages since v3: - pfn_to_virt() changed to page_to_virt() due to compile error Chages since v2: - page allocation moved out of the atomic context Chages since v1: - Fixes: and -stable tags added to the patch description Thanks! Alexander Gordeev (1): kasan: Avoid sleepable page allocation from atomic context mm/kasan/shadow.c | 76 ++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 62 insertions(+), 14 deletions(-) -- 2.45.2 ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v7 1/1] kasan: Avoid sleepable page allocation from atomic context 2025-05-12 14:27 [PATCH v7 0/1] kasan: Avoid sleepable page allocation from atomic context Alexander Gordeev @ 2025-05-12 14:27 ` Alexander Gordeev 2025-05-12 15:33 ` Harry Yoo 0 siblings, 1 reply; 5+ messages in thread From: Alexander Gordeev @ 2025-05-12 14:27 UTC (permalink / raw) To: Andrew Morton, Andrey Ryabinin, Daniel Axtens, Harry Yoo Cc: linux-kernel, linux-mm, kasan-dev, linux-s390, stable apply_to_pte_range() enters the lazy MMU mode and then invokes kasan_populate_vmalloc_pte() callback on each page table walk iteration. However, the callback can go into sleep when trying to allocate a single page, e.g. if an architecutre disables preemption on lazy MMU mode enter. On s390 if make arch_enter_lazy_mmu_mode() -> preempt_enable() and arch_leave_lazy_mmu_mode() -> preempt_disable(), such crash occurs: [ 0.663336] BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321 [ 0.663348] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd [ 0.663358] preempt_count: 1, expected: 0 [ 0.663366] RCU nest depth: 0, expected: 0 [ 0.663375] no locks held by kthreadd/2. [ 0.663383] Preemption disabled at: [ 0.663386] [<0002f3284cbb4eda>] apply_to_pte_range+0xfa/0x4a0 [ 0.663405] CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted 6.15.0-rc5-gcc-kasan-00043-gd76bb1ebb558-dirty #162 PREEMPT [ 0.663408] Hardware name: IBM 3931 A01 701 (KVM/Linux) [ 0.663409] Call Trace: [ 0.663410] [<0002f3284c385f58>] dump_stack_lvl+0xe8/0x140 [ 0.663413] [<0002f3284c507b9e>] __might_resched+0x66e/0x700 [ 0.663415] [<0002f3284cc4f6c0>] __alloc_frozen_pages_noprof+0x370/0x4b0 [ 0.663419] [<0002f3284ccc73c0>] alloc_pages_mpol+0x1a0/0x4a0 [ 0.663421] [<0002f3284ccc8518>] alloc_frozen_pages_noprof+0x88/0xc0 [ 0.663424] [<0002f3284ccc8572>] alloc_pages_noprof+0x22/0x120 [ 0.663427] [<0002f3284cc341ac>] get_free_pages_noprof+0x2c/0xc0 [ 0.663429] [<0002f3284cceba70>] kasan_populate_vmalloc_pte+0x50/0x120 [ 0.663433] [<0002f3284cbb4ef8>] apply_to_pte_range+0x118/0x4a0 [ 0.663435] [<0002f3284cbc7c14>] apply_to_pmd_range+0x194/0x3e0 [ 0.663437] [<0002f3284cbc99be>] __apply_to_page_range+0x2fe/0x7a0 [ 0.663440] [<0002f3284cbc9e88>] apply_to_page_range+0x28/0x40 [ 0.663442] [<0002f3284ccebf12>] kasan_populate_vmalloc+0x82/0xa0 [ 0.663445] [<0002f3284cc1578c>] alloc_vmap_area+0x34c/0xc10 [ 0.663448] [<0002f3284cc1c2a6>] __get_vm_area_node+0x186/0x2a0 [ 0.663451] [<0002f3284cc1e696>] __vmalloc_node_range_noprof+0x116/0x310 [ 0.663454] [<0002f3284cc1d950>] __vmalloc_node_noprof+0xd0/0x110 [ 0.663457] [<0002f3284c454b88>] alloc_thread_stack_node+0xf8/0x330 [ 0.663460] [<0002f3284c458d56>] dup_task_struct+0x66/0x4d0 [ 0.663463] [<0002f3284c45be90>] copy_process+0x280/0x4b90 [ 0.663465] [<0002f3284c460940>] kernel_clone+0xd0/0x4b0 [ 0.663467] [<0002f3284c46115e>] kernel_thread+0xbe/0xe0 [ 0.663469] [<0002f3284c4e440e>] kthreadd+0x50e/0x7f0 [ 0.663472] [<0002f3284c38c04a>] __ret_from_fork+0x8a/0xf0 [ 0.663475] [<0002f3284ed57ff2>] ret_from_fork+0xa/0x38 Instead of allocating single pages per-PTE, bulk-allocate the shadow memory prior to applying kasan_populate_vmalloc_pte() callback on a page range. Suggested-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: stable@vger.kernel.org Fixes: 3c5c3cfb9ef4 ("kasan: support backing vmalloc space with real shadow memory") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> --- mm/kasan/shadow.c | 76 ++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 62 insertions(+), 14 deletions(-) diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c index 88d1c9dcb507..2bf00bf7e545 100644 --- a/mm/kasan/shadow.c +++ b/mm/kasan/shadow.c @@ -292,33 +292,83 @@ void __init __weak kasan_populate_early_vm_area_shadow(void *start, { } +struct vmalloc_populate_data { + unsigned long start; + struct page **pages; +}; + static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned long addr, - void *unused) + void *_data) { - unsigned long page; + struct vmalloc_populate_data *data = _data; + struct page *page; pte_t pte; + int index; if (likely(!pte_none(ptep_get(ptep)))) return 0; - page = __get_free_page(GFP_KERNEL); - if (!page) - return -ENOMEM; - - __memset((void *)page, KASAN_VMALLOC_INVALID, PAGE_SIZE); - pte = pfn_pte(PFN_DOWN(__pa(page)), PAGE_KERNEL); + index = PFN_DOWN(addr - data->start); + page = data->pages[index]; + __memset(page_to_virt(page), KASAN_VMALLOC_INVALID, PAGE_SIZE); + pte = pfn_pte(page_to_pfn(page), PAGE_KERNEL); spin_lock(&init_mm.page_table_lock); if (likely(pte_none(ptep_get(ptep)))) { set_pte_at(&init_mm, addr, ptep, pte); - page = 0; + data->pages[index] = NULL; } spin_unlock(&init_mm.page_table_lock); - if (page) - free_page(page); + return 0; } +static inline void free_pages_bulk(struct page **pages, int nr_pages) +{ + int i; + + for (i = 0; i < nr_pages; i++) { + if (pages[i]) { + __free_pages(pages[i], 0); + pages[i] = NULL; + } + } +} + +static int __kasan_populate_vmalloc(unsigned long start, unsigned long end) +{ + unsigned long nr_pages, nr_populated = 0, nr_total = PFN_UP(end - start); + struct vmalloc_populate_data data; + int ret = 0; + + data.pages = (struct page **)__get_free_page(GFP_KERNEL | __GFP_ZERO); + if (!data.pages) + return -ENOMEM; + + while (nr_total) { + nr_pages = min(nr_total, PAGE_SIZE / sizeof(data.pages[0])); + nr_populated = alloc_pages_bulk(GFP_KERNEL, nr_pages, data.pages); + if (nr_populated != nr_pages) { + ret = -ENOMEM; + break; + } + + data.start = start; + ret = apply_to_page_range(&init_mm, start, nr_pages * PAGE_SIZE, + kasan_populate_vmalloc_pte, &data); + if (ret) + break; + + start += nr_pages * PAGE_SIZE; + nr_total -= nr_pages; + } + + free_pages_bulk(data.pages, nr_populated); + free_page((unsigned long)data.pages); + + return ret; +} + int kasan_populate_vmalloc(unsigned long addr, unsigned long size) { unsigned long shadow_start, shadow_end; @@ -348,9 +398,7 @@ int kasan_populate_vmalloc(unsigned long addr, unsigned long size) shadow_start = PAGE_ALIGN_DOWN(shadow_start); shadow_end = PAGE_ALIGN(shadow_end); - ret = apply_to_page_range(&init_mm, shadow_start, - shadow_end - shadow_start, - kasan_populate_vmalloc_pte, NULL); + ret = __kasan_populate_vmalloc(shadow_start, shadow_end); if (ret) return ret; -- 2.45.2 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v7 1/1] kasan: Avoid sleepable page allocation from atomic context 2025-05-12 14:27 ` [PATCH v7 1/1] " Alexander Gordeev @ 2025-05-12 15:33 ` Harry Yoo 2025-05-12 16:31 ` Alexander Gordeev 0 siblings, 1 reply; 5+ messages in thread From: Harry Yoo @ 2025-05-12 15:33 UTC (permalink / raw) To: Alexander Gordeev Cc: Andrew Morton, Andrey Ryabinin, Daniel Axtens, linux-kernel, linux-mm, kasan-dev, linux-s390, stable On Mon, May 12, 2025 at 04:27:06PM +0200, Alexander Gordeev wrote: > apply_to_pte_range() enters the lazy MMU mode and then invokes > kasan_populate_vmalloc_pte() callback on each page table walk > iteration. However, the callback can go into sleep when trying > to allocate a single page, e.g. if an architecutre disables > preemption on lazy MMU mode enter. > > On s390 if make arch_enter_lazy_mmu_mode() -> preempt_enable() > and arch_leave_lazy_mmu_mode() -> preempt_disable(), such crash > occurs: > > [ 0.663336] BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321 > [ 0.663348] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd > [ 0.663358] preempt_count: 1, expected: 0 > [ 0.663366] RCU nest depth: 0, expected: 0 > [ 0.663375] no locks held by kthreadd/2. > [ 0.663383] Preemption disabled at: > [ 0.663386] [<0002f3284cbb4eda>] apply_to_pte_range+0xfa/0x4a0 > [ 0.663405] CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted 6.15.0-rc5-gcc-kasan-00043-gd76bb1ebb558-dirty #162 PREEMPT > [ 0.663408] Hardware name: IBM 3931 A01 701 (KVM/Linux) > [ 0.663409] Call Trace: > [ 0.663410] [<0002f3284c385f58>] dump_stack_lvl+0xe8/0x140 > [ 0.663413] [<0002f3284c507b9e>] __might_resched+0x66e/0x700 > [ 0.663415] [<0002f3284cc4f6c0>] __alloc_frozen_pages_noprof+0x370/0x4b0 > [ 0.663419] [<0002f3284ccc73c0>] alloc_pages_mpol+0x1a0/0x4a0 > [ 0.663421] [<0002f3284ccc8518>] alloc_frozen_pages_noprof+0x88/0xc0 > [ 0.663424] [<0002f3284ccc8572>] alloc_pages_noprof+0x22/0x120 > [ 0.663427] [<0002f3284cc341ac>] get_free_pages_noprof+0x2c/0xc0 > [ 0.663429] [<0002f3284cceba70>] kasan_populate_vmalloc_pte+0x50/0x120 > [ 0.663433] [<0002f3284cbb4ef8>] apply_to_pte_range+0x118/0x4a0 > [ 0.663435] [<0002f3284cbc7c14>] apply_to_pmd_range+0x194/0x3e0 > [ 0.663437] [<0002f3284cbc99be>] __apply_to_page_range+0x2fe/0x7a0 > [ 0.663440] [<0002f3284cbc9e88>] apply_to_page_range+0x28/0x40 > [ 0.663442] [<0002f3284ccebf12>] kasan_populate_vmalloc+0x82/0xa0 > [ 0.663445] [<0002f3284cc1578c>] alloc_vmap_area+0x34c/0xc10 > [ 0.663448] [<0002f3284cc1c2a6>] __get_vm_area_node+0x186/0x2a0 > [ 0.663451] [<0002f3284cc1e696>] __vmalloc_node_range_noprof+0x116/0x310 > [ 0.663454] [<0002f3284cc1d950>] __vmalloc_node_noprof+0xd0/0x110 > [ 0.663457] [<0002f3284c454b88>] alloc_thread_stack_node+0xf8/0x330 > [ 0.663460] [<0002f3284c458d56>] dup_task_struct+0x66/0x4d0 > [ 0.663463] [<0002f3284c45be90>] copy_process+0x280/0x4b90 > [ 0.663465] [<0002f3284c460940>] kernel_clone+0xd0/0x4b0 > [ 0.663467] [<0002f3284c46115e>] kernel_thread+0xbe/0xe0 > [ 0.663469] [<0002f3284c4e440e>] kthreadd+0x50e/0x7f0 > [ 0.663472] [<0002f3284c38c04a>] __ret_from_fork+0x8a/0xf0 > [ 0.663475] [<0002f3284ed57ff2>] ret_from_fork+0xa/0x38 > > Instead of allocating single pages per-PTE, bulk-allocate the > shadow memory prior to applying kasan_populate_vmalloc_pte() > callback on a page range. > > Suggested-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> > Cc: stable@vger.kernel.org > Fixes: 3c5c3cfb9ef4 ("kasan: support backing vmalloc space with real shadow memory") > > Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> > --- > mm/kasan/shadow.c | 76 ++++++++++++++++++++++++++++++++++++++--------- > 1 file changed, 62 insertions(+), 14 deletions(-) > > diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c > index 88d1c9dcb507..2bf00bf7e545 100644 > --- a/mm/kasan/shadow.c > +++ b/mm/kasan/shadow.c > @@ -292,33 +292,83 @@ void __init __weak kasan_populate_early_vm_area_shadow(void *start, > { > } > > +struct vmalloc_populate_data { > + unsigned long start; > + struct page **pages; > +}; > + > static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned long addr, > - void *unused) > + void *_data) > { > - unsigned long page; > + struct vmalloc_populate_data *data = _data; > + struct page *page; > pte_t pte; > + int index; > > if (likely(!pte_none(ptep_get(ptep)))) > return 0; > > - page = __get_free_page(GFP_KERNEL); > - if (!page) > - return -ENOMEM; > - > - __memset((void *)page, KASAN_VMALLOC_INVALID, PAGE_SIZE); > - pte = pfn_pte(PFN_DOWN(__pa(page)), PAGE_KERNEL); > + index = PFN_DOWN(addr - data->start); > + page = data->pages[index]; > + __memset(page_to_virt(page), KASAN_VMALLOC_INVALID, PAGE_SIZE); > + pte = pfn_pte(page_to_pfn(page), PAGE_KERNEL); > > spin_lock(&init_mm.page_table_lock); > if (likely(pte_none(ptep_get(ptep)))) { > set_pte_at(&init_mm, addr, ptep, pte); > - page = 0; > + data->pages[index] = NULL; > } > spin_unlock(&init_mm.page_table_lock); > - if (page) > - free_page(page); > + > return 0; > } > > +static inline void free_pages_bulk(struct page **pages, int nr_pages) > +{ > + int i; > + > + for (i = 0; i < nr_pages; i++) { > + if (pages[i]) { > + __free_pages(pages[i], 0); > + pages[i] = NULL; > + } > + } > +} > + > +static int __kasan_populate_vmalloc(unsigned long start, unsigned long end) > +{ > + unsigned long nr_pages, nr_populated = 0, nr_total = PFN_UP(end - start); > + struct vmalloc_populate_data data; > + int ret = 0; > + > + data.pages = (struct page **)__get_free_page(GFP_KERNEL | __GFP_ZERO); > + if (!data.pages) > + return -ENOMEM; > + > + while (nr_total) { > + nr_pages = min(nr_total, PAGE_SIZE / sizeof(data.pages[0])); > + nr_populated = alloc_pages_bulk(GFP_KERNEL, nr_pages, data.pages); > + if (nr_populated != nr_pages) { > + ret = -ENOMEM; > + break; > + } > + > + data.start = start; > + ret = apply_to_page_range(&init_mm, start, nr_pages * PAGE_SIZE, > + kasan_populate_vmalloc_pte, &data); > + if (ret) > + break; > + > + start += nr_pages * PAGE_SIZE; > + nr_total -= nr_pages; > + } > + > + free_pages_bulk(data.pages, nr_populated); Hi, Thanks for the update, but I don't think nr_populated is sufficient here. If nr_populated in the last iteration is smaller than its value in any previous iteration, it could lead to a memory leak. That's why I suggested (PAGE_SIZE / sizeof(data.pages[0])). ...but on second thought maybe touching the whole array is not efficient either. If this ends up making things complicated probably we should just merge v6 instead (v6 looks good)? micro-optimizing vmalloc shadow memory population doesn't seem worth it if it comes at the cost of complexity :) > + free_page((unsigned long)data.pages); > + > + return ret; > +} > + > int kasan_populate_vmalloc(unsigned long addr, unsigned long size) > { > unsigned long shadow_start, shadow_end; > @@ -348,9 +398,7 @@ int kasan_populate_vmalloc(unsigned long addr, unsigned long size) > shadow_start = PAGE_ALIGN_DOWN(shadow_start); > shadow_end = PAGE_ALIGN(shadow_end); > > - ret = apply_to_page_range(&init_mm, shadow_start, > - shadow_end - shadow_start, > - kasan_populate_vmalloc_pte, NULL); > + ret = __kasan_populate_vmalloc(shadow_start, shadow_end); > if (ret) > return ret; > > -- > 2.45.2 > -- Cheers, Harry / Hyeonggon ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v7 1/1] kasan: Avoid sleepable page allocation from atomic context 2025-05-12 15:33 ` Harry Yoo @ 2025-05-12 16:31 ` Alexander Gordeev 2025-05-13 0:30 ` Harry Yoo 0 siblings, 1 reply; 5+ messages in thread From: Alexander Gordeev @ 2025-05-12 16:31 UTC (permalink / raw) To: Harry Yoo Cc: Andrew Morton, Andrey Ryabinin, Daniel Axtens, linux-kernel, linux-mm, kasan-dev, linux-s390, stable On Tue, May 13, 2025 at 12:33:35AM +0900, Harry Yoo wrote: > Thanks for the update, but I don't think nr_populated is sufficient > here. If nr_populated in the last iteration is smaller than its value > in any previous iteration, it could lead to a memory leak. > > That's why I suggested (PAGE_SIZE / sizeof(data.pages[0])). > ...but on second thought maybe touching the whole array is not > efficient either. Yes, I did not like it and wanted to limit the number of pages, but did not realize that using nr_populated still could produce leaks. In addition I could simply do: max_populted = max(max_populted, nr_populated); ... free_pages_bulk(data.pages, max_populated); > If this ends up making things complicated probably we should just > merge v6 instead (v6 looks good)? micro-optimizing vmalloc shadow memory > population doesn't seem worth it if it comes at the cost of complexity :) v6 is okay, except that in v7 I use break instead of return: ret = apply_to_page_range(...); if (ret) break; and as result can call the final: free_page((unsigned long)data.pages); Frankly, I do not have strong opinion. > -- > Cheers, > Harry / Hyeonggon Thanks! ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v7 1/1] kasan: Avoid sleepable page allocation from atomic context 2025-05-12 16:31 ` Alexander Gordeev @ 2025-05-13 0:30 ` Harry Yoo 0 siblings, 0 replies; 5+ messages in thread From: Harry Yoo @ 2025-05-13 0:30 UTC (permalink / raw) To: Alexander Gordeev Cc: Andrew Morton, Andrey Ryabinin, Daniel Axtens, linux-kernel, linux-mm, kasan-dev, linux-s390, stable On Mon, May 12, 2025 at 06:31:30PM +0200, Alexander Gordeev wrote: > On Tue, May 13, 2025 at 12:33:35AM +0900, Harry Yoo wrote: > > Thanks for the update, but I don't think nr_populated is sufficient > > here. If nr_populated in the last iteration is smaller than its value > > in any previous iteration, it could lead to a memory leak. > > > > That's why I suggested (PAGE_SIZE / sizeof(data.pages[0])). > > ...but on second thought maybe touching the whole array is not > > efficient either. > > Yes, I did not like it and wanted to limit the number of pages, > but did not realize that using nr_populated still could produce > leaks. In addition I could simply do: > > max_populted = max(max_populted, nr_populated); > ... > free_pages_bulk(data.pages, max_populated); Yeah that could work, but given that it already confused you, I think we should focus on fixing the bug and defer further improvements later, since it will be backported to -stable. > > If this ends up making things complicated probably we should just > > merge v6 instead (v6 looks good)? micro-optimizing vmalloc shadow memory > > population doesn't seem worth it if it comes at the cost of complexity :) > > v6 is okay, except that in v7 I use break instead of return: > > ret = apply_to_page_range(...); > if (ret) > break; > > and as result can call the final: > > free_page((unsigned long)data.pages); Uh, I didn't realize that while reviewing. I think at this stage (-rc6) Andrew will prefer a fixup patch on top of v6. I think this [1] could fix it, but could you please verify it's correct and send a fixup patch (as a reply to v6)? [1] https://lore.kernel.org/mm-commits/aCKJYHPL_3xAewUB@hyeyoo -- Cheers, Harry / Hyeonggon ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-05-13 0:30 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-05-12 14:27 [PATCH v7 0/1] kasan: Avoid sleepable page allocation from atomic context Alexander Gordeev 2025-05-12 14:27 ` [PATCH v7 1/1] " Alexander Gordeev 2025-05-12 15:33 ` Harry Yoo 2025-05-12 16:31 ` Alexander Gordeev 2025-05-13 0:30 ` Harry Yoo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).