* [PATCH 1/2] riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled
2024-05-15 5:50 [PATCH 0/2] riscv: fix debug_pagealloc Nam Cao
@ 2024-05-15 5:50 ` Nam Cao
2024-05-22 11:16 ` Alexandre Ghiti
2024-05-15 5:50 ` [PATCH 2/2] riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context Nam Cao
` (2 subsequent siblings)
3 siblings, 1 reply; 7+ messages in thread
From: Nam Cao @ 2024-05-15 5:50 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
linux-riscv, linux-kernel
Cc: Nam Cao, stable
debug_pagealloc is a debug feature which clears the valid bit in page table
entry for freed pages to detect illegal accesses to freed memory.
For this feature to work, virtual mapping must have PAGE_SIZE resolution.
(No, we cannot map with huge pages and split them only when needed; because
pages can be allocated/freed in atomic context and page splitting cannot be
done in atomic context)
Force linear mapping to use small pages if debug_pagealloc is enabled.
Note that it is not necessary to force the entire linear mapping, but only
those that are given to memory allocator. Some parts of memory can keep
using huge page mapping (for example, kernel's executable code). But these
parts are minority, so keep it simple. This is just a debug feature, some
extra overhead should be acceptable.
Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
---
Interestingly this feature somehow still worked when first introduced.
My guess is that back then only 2MB page size is used. When a 4KB page is
freed, the entire 2MB will be (incorrectly) invalidated by this feature.
But 2MB is quite small, so no one else happen to use other 4KB pages in
this 2MB area. In other words, it used to work by luck.
Now larger page sizes are used, so this feature invalidate large chunk of
memory, and the probability that someone else access this chunk and
trigger a page fault is much higher.
arch/riscv/mm/init.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 2574f6a3b0e7..73914afa3aba 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -682,6 +682,9 @@ void __init create_pgd_mapping(pgd_t *pgdp,
static uintptr_t __init best_map_size(phys_addr_t pa, uintptr_t va,
phys_addr_t size)
{
+ if (debug_pagealloc_enabled())
+ return PAGE_SIZE;
+
if (pgtable_l5_enabled &&
!(pa & (P4D_SIZE - 1)) && !(va & (P4D_SIZE - 1)) && size >= P4D_SIZE)
return P4D_SIZE;
--
2.39.2
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH 1/2] riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled
2024-05-15 5:50 ` [PATCH 1/2] riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled Nam Cao
@ 2024-05-22 11:16 ` Alexandre Ghiti
0 siblings, 0 replies; 7+ messages in thread
From: Alexandre Ghiti @ 2024-05-22 11:16 UTC (permalink / raw)
To: Nam Cao, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Alexandre Ghiti, linux-riscv, linux-kernel
Cc: stable
Hi Nam,
On 15/05/2024 07:50, Nam Cao wrote:
> debug_pagealloc is a debug feature which clears the valid bit in page table
> entry for freed pages to detect illegal accesses to freed memory.
>
> For this feature to work, virtual mapping must have PAGE_SIZE resolution.
> (No, we cannot map with huge pages and split them only when needed; because
> pages can be allocated/freed in atomic context and page splitting cannot be
> done in atomic context)
>
> Force linear mapping to use small pages if debug_pagealloc is enabled.
>
> Note that it is not necessary to force the entire linear mapping, but only
> those that are given to memory allocator. Some parts of memory can keep
> using huge page mapping (for example, kernel's executable code). But these
> parts are minority, so keep it simple. This is just a debug feature, some
> extra overhead should be acceptable.
>
> Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> Cc: stable@vger.kernel.org
> ---
> Interestingly this feature somehow still worked when first introduced.
> My guess is that back then only 2MB page size is used. When a 4KB page is
> freed, the entire 2MB will be (incorrectly) invalidated by this feature.
> But 2MB is quite small, so no one else happen to use other 4KB pages in
> this 2MB area. In other words, it used to work by luck.
>
> Now larger page sizes are used, so this feature invalidate large chunk of
> memory, and the probability that someone else access this chunk and
> trigger a page fault is much higher.
>
> arch/riscv/mm/init.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 2574f6a3b0e7..73914afa3aba 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -682,6 +682,9 @@ void __init create_pgd_mapping(pgd_t *pgdp,
> static uintptr_t __init best_map_size(phys_addr_t pa, uintptr_t va,
> phys_addr_t size)
> {
> + if (debug_pagealloc_enabled())
> + return PAGE_SIZE;
> +
> if (pgtable_l5_enabled &&
> !(pa & (P4D_SIZE - 1)) && !(va & (P4D_SIZE - 1)) && size >= P4D_SIZE)
> return P4D_SIZE;
You can add:
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Thanks,
Alex
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/2] riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context
2024-05-15 5:50 [PATCH 0/2] riscv: fix debug_pagealloc Nam Cao
2024-05-15 5:50 ` [PATCH 1/2] riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled Nam Cao
@ 2024-05-15 5:50 ` Nam Cao
2024-05-22 11:22 ` Alexandre Ghiti
2024-05-15 7:39 ` [PATCH 0/2] riscv: fix debug_pagealloc Nam Cao
2024-05-22 23:51 ` patchwork-bot+linux-riscv
3 siblings, 1 reply; 7+ messages in thread
From: Nam Cao @ 2024-05-15 5:50 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
linux-riscv, linux-kernel
Cc: Nam Cao, stable
__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.
This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:
BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c
Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.
Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
---
arch/riscv/mm/pageattr.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
index 410056a50aa9..271d01a5ba4d 100644
--- a/arch/riscv/mm/pageattr.c
+++ b/arch/riscv/mm/pageattr.c
@@ -387,17 +387,33 @@ int set_direct_map_default_noflush(struct page *page)
}
#ifdef CONFIG_DEBUG_PAGEALLOC
+static int debug_pagealloc_set_page(pte_t *pte, unsigned long addr, void *data)
+{
+ int enable = *(int *)data;
+
+ unsigned long val = pte_val(ptep_get(pte));
+
+ if (enable)
+ val |= _PAGE_PRESENT;
+ else
+ val &= ~_PAGE_PRESENT;
+
+ set_pte(pte, __pte(val));
+
+ return 0;
+}
+
void __kernel_map_pages(struct page *page, int numpages, int enable)
{
if (!debug_pagealloc_enabled())
return;
- if (enable)
- __set_memory((unsigned long)page_address(page), numpages,
- __pgprot(_PAGE_PRESENT), __pgprot(0));
- else
- __set_memory((unsigned long)page_address(page), numpages,
- __pgprot(0), __pgprot(_PAGE_PRESENT));
+ unsigned long start = (unsigned long)page_address(page);
+ unsigned long size = PAGE_SIZE * numpages;
+
+ apply_to_existing_page_range(&init_mm, start, size, debug_pagealloc_set_page, &enable);
+
+ flush_tlb_kernel_range(start, start + size);
}
#endif
--
2.39.2
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH 2/2] riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context
2024-05-15 5:50 ` [PATCH 2/2] riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context Nam Cao
@ 2024-05-22 11:22 ` Alexandre Ghiti
0 siblings, 0 replies; 7+ messages in thread
From: Alexandre Ghiti @ 2024-05-22 11:22 UTC (permalink / raw)
To: Nam Cao, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Alexandre Ghiti, linux-riscv, linux-kernel
Cc: stable
On 15/05/2024 07:50, Nam Cao wrote:
> __kernel_map_pages() is a debug function which clears the valid bit in page
> table entry for deallocated pages to detect illegal memory accesses to
> freed pages.
>
> This function set/clear the valid bit using __set_memory(). __set_memory()
> acquires init_mm's semaphore, and this operation may sleep. This is
> problematic, because __kernel_map_pages() can be called in atomic context,
> and thus is illegal to sleep. An example warning that this causes:
>
> BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
> preempt_count: 2, expected: 0
> CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
> Hardware name: riscv-virtio,qemu (DT)
> Call Trace:
> [<ffffffff800060dc>] dump_backtrace+0x1c/0x24
> [<ffffffff8091ef6e>] show_stack+0x2c/0x38
> [<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
> [<ffffffff8092bb24>] dump_stack+0x14/0x1c
> [<ffffffff8003b7ac>] __might_resched+0x104/0x10e
> [<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
> [<ffffffff8093276a>] down_write+0x20/0x72
> [<ffffffff8000cf00>] __set_memory+0x82/0x2fa
> [<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
> [<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
> [<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
> [<ffffffff80011904>] copy_process+0x72c/0x17ec
> [<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
> [<ffffffff80012f62>] kernel_thread+0x82/0xa0
> [<ffffffff8003552c>] kthreadd+0x14a/0x1be
> [<ffffffff809357de>] ret_from_fork+0xe/0x1c
>
> Rewrite this function with apply_to_existing_page_range(). It is fine to
> not have any locking, because __kernel_map_pages() works with pages being
> allocated/deallocated and those pages are not changed by anyone else in the
> meantime.
>
> Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> Cc: stable@vger.kernel.org
> ---
> arch/riscv/mm/pageattr.c | 28 ++++++++++++++++++++++------
> 1 file changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
> index 410056a50aa9..271d01a5ba4d 100644
> --- a/arch/riscv/mm/pageattr.c
> +++ b/arch/riscv/mm/pageattr.c
> @@ -387,17 +387,33 @@ int set_direct_map_default_noflush(struct page *page)
> }
>
> #ifdef CONFIG_DEBUG_PAGEALLOC
> +static int debug_pagealloc_set_page(pte_t *pte, unsigned long addr, void *data)
> +{
> + int enable = *(int *)data;
> +
> + unsigned long val = pte_val(ptep_get(pte));
> +
> + if (enable)
> + val |= _PAGE_PRESENT;
> + else
> + val &= ~_PAGE_PRESENT;
> +
> + set_pte(pte, __pte(val));
> +
> + return 0;
> +}
> +
> void __kernel_map_pages(struct page *page, int numpages, int enable)
> {
> if (!debug_pagealloc_enabled())
> return;
>
> - if (enable)
> - __set_memory((unsigned long)page_address(page), numpages,
> - __pgprot(_PAGE_PRESENT), __pgprot(0));
> - else
> - __set_memory((unsigned long)page_address(page), numpages,
> - __pgprot(0), __pgprot(_PAGE_PRESENT));
> + unsigned long start = (unsigned long)page_address(page);
> + unsigned long size = PAGE_SIZE * numpages;
> +
> + apply_to_existing_page_range(&init_mm, start, size, debug_pagealloc_set_page, &enable);
> +
> + flush_tlb_kernel_range(start, start + size);
> }
> #endif
>
And you can add:
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Thanks!
Alex
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/2] riscv: fix debug_pagealloc
2024-05-15 5:50 [PATCH 0/2] riscv: fix debug_pagealloc Nam Cao
2024-05-15 5:50 ` [PATCH 1/2] riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled Nam Cao
2024-05-15 5:50 ` [PATCH 2/2] riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context Nam Cao
@ 2024-05-15 7:39 ` Nam Cao
2024-05-22 23:51 ` patchwork-bot+linux-riscv
3 siblings, 0 replies; 7+ messages in thread
From: Nam Cao @ 2024-05-15 7:39 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
linux-riscv, linux-kernel, Zong Li
Forgot to Cc the original author.
+Cc: Zong Li <zong.li@sifive.com>
On Wed, May 15, 2024 at 07:50:38AM +0200, Nam Cao wrote:
> The debug_pagealloc feature is not functional on RISCV. With this feature
> enabled (CONFIG_DEBUG_PAGEALLOC=y and debug_pagealloc=on), kernel crashes
> early during boot.
>
> QEMU command that can reproduce this problem:
> qemu-system-riscv64 -machine virt \
> -kernel Image \
> -append "console=ttyS0 root=/dev/vda debug_pagealloc=on" \
> -nographic \
> -drive "file=root.img,format=raw,id=hd0" \
> -device virtio-blk-device,drive=hd0 \
> -m 4G \
>
> This series makes debug_pagealloc functional.
>
> Nam Cao (2):
> riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled
> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context
>
> arch/riscv/mm/init.c | 3 +++
> arch/riscv/mm/pageattr.c | 28 ++++++++++++++++++++++------
> 2 files changed, 25 insertions(+), 6 deletions(-)
>
> --
> 2.39.2
>
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/2] riscv: fix debug_pagealloc
2024-05-15 5:50 [PATCH 0/2] riscv: fix debug_pagealloc Nam Cao
` (2 preceding siblings ...)
2024-05-15 7:39 ` [PATCH 0/2] riscv: fix debug_pagealloc Nam Cao
@ 2024-05-22 23:51 ` patchwork-bot+linux-riscv
3 siblings, 0 replies; 7+ messages in thread
From: patchwork-bot+linux-riscv @ 2024-05-22 23:51 UTC (permalink / raw)
To: Nam Cao; +Cc: linux-riscv, paul.walmsley, palmer, aou, alexghiti, linux-kernel
Hello:
This series was applied to riscv/linux.git (for-next)
by Palmer Dabbelt <palmer@rivosinc.com>:
On Wed, 15 May 2024 07:50:38 +0200 you wrote:
> The debug_pagealloc feature is not functional on RISCV. With this feature
> enabled (CONFIG_DEBUG_PAGEALLOC=y and debug_pagealloc=on), kernel crashes
> early during boot.
>
> QEMU command that can reproduce this problem:
> qemu-system-riscv64 -machine virt \
> -kernel Image \
> -append "console=ttyS0 root=/dev/vda debug_pagealloc=on" \
> -nographic \
> -drive "file=root.img,format=raw,id=hd0" \
> -device virtio-blk-device,drive=hd0 \
> -m 4G \
>
> [...]
Here is the summary with links:
- [1/2] riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled
https://git.kernel.org/riscv/c/c67ddf59ac44
- [2/2] riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context
https://git.kernel.org/riscv/c/fb1cf0878328
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 7+ messages in thread