* [RFC RESEND v2 01/13] mm/kfence: Add a new kunit test test_use_after_free_read_nofault()
2024-10-15 1:33 [RFC RESEND v2 00/13] powerpc/kfence: Improve kfence support Ritesh Harjani (IBM)
@ 2024-10-15 1:33 ` Ritesh Harjani (IBM)
2024-10-15 1:33 ` [RFC RESEND v2 02/13] powerpc: mm: Fix kfence page fault reporting Ritesh Harjani (IBM)
` (11 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-15 1:33 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Nirjhar Roy, Ritesh Harjani (IBM)
From: Nirjhar Roy <nirjhar@linux.ibm.com>
Faults from copy_from_kernel_nofault() need to be handled by the fixup
table and should not be handled by kfence. Otherwise, while reading
/proc/kcore, which uses copy_from_kernel_nofault(), kfence can generate
false negatives. This can happen when /proc/kcore ends up reading an
unmapped address from the kfence pool.
Let's add a testcase to cover this case.
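For reference, a minimal sketch (illustrative only; the actual kunit test
is in the diff below) of the semantics the new test expects:

	char dst;
	int ret;
	/* 'addr' points into the kfence pool and has already been freed,
	 * so the backing page is unmapped. */
	ret = copy_from_kernel_nofault(&dst, addr, 1);
	/* Expected: ret == -EFAULT, handled via the *_nofault() fixup
	 * table, with no KFENCE report generated. */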
Co-developed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Nirjhar Roy <nirjhar@linux.ibm.com>
Cc: kasan-dev@googlegroups.com
Cc: Alexander Potapenko <glider@google.com>
Cc: linux-mm@kvack.org
---
mm/kfence/kfence_test.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
index 00fd17285285..f65fb182466d 100644
--- a/mm/kfence/kfence_test.c
+++ b/mm/kfence/kfence_test.c
@@ -383,6 +383,22 @@ static void test_use_after_free_read(struct kunit *test)
KUNIT_EXPECT_TRUE(test, report_matches(&expect));
}
+static void test_use_after_free_read_nofault(struct kunit *test)
+{
+ const size_t size = 32;
+ char *addr;
+ char dst;
+ int ret;
+
+ setup_test_cache(test, size, 0, NULL);
+ addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
+ test_free(addr);
+ /* Use after free with *_nofault() */
+ ret = copy_from_kernel_nofault(&dst, addr, 1);
+ KUNIT_EXPECT_EQ(test, ret, -EFAULT);
+ KUNIT_EXPECT_FALSE(test, report_available());
+}
+
static void test_double_free(struct kunit *test)
{
const size_t size = 32;
@@ -780,6 +796,7 @@ static struct kunit_case kfence_test_cases[] = {
KFENCE_KUNIT_CASE(test_out_of_bounds_read),
KFENCE_KUNIT_CASE(test_out_of_bounds_write),
KFENCE_KUNIT_CASE(test_use_after_free_read),
+ KFENCE_KUNIT_CASE(test_use_after_free_read_nofault),
KFENCE_KUNIT_CASE(test_double_free),
KFENCE_KUNIT_CASE(test_invalid_addr_free),
KFENCE_KUNIT_CASE(test_corruption),
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC RESEND v2 02/13] powerpc: mm: Fix kfence page fault reporting
2024-10-15 1:33 [RFC RESEND v2 00/13] powerpc/kfence: Improve kfence support Ritesh Harjani (IBM)
2024-10-15 1:33 ` [RFC RESEND v2 01/13] mm/kfence: Add a new kunit test test_use_after_free_read_nofault() Ritesh Harjani (IBM)
@ 2024-10-15 1:33 ` Ritesh Harjani (IBM)
2024-10-15 6:42 ` Christophe Leroy
2024-10-15 1:33 ` [RFC RESEND v2 03/13] book3s64/hash: Remove kfence support temporarily Ritesh Harjani (IBM)
` (10 subsequent siblings)
12 siblings, 1 reply; 16+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-15 1:33 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM), Disha Goel
copy_from_kernel_nofault() can be called when reading /proc/kcore.
/proc/kcore can contain some unmapped kfence objects which, when read via
copy_from_kernel_nofault(), can cause page faults. Since the *_nofault()
functions define their own fixup table for handling faults, use that
instead of asking kfence to handle such faults.
Hence we search the exception tables for the nip which generated the
fault. If there is an entry, we let the fixup table handler handle the
page fault by returning an error from within ___do_page_fault().
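A simplified sketch of the intended ordering in the fault path (the
actual change is in the diff below):

	/* in ___do_page_fault(), on a bad kernel fault: */
	if (search_exception_tables(instruction_pointer(regs)))
		return SIGSEGV;	/* let the *_nofault() fixup handle it */
	if (kfence_handle_page_fault(address, is_write, regs))
		return 0;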
This can be easily triggered if someone tries to do dd from /proc/kcore.
dd if=/proc/kcore of=/dev/null bs=1M
<some example false negatives>
===============================
BUG: KFENCE: invalid read in copy_from_kernel_nofault+0xb0/0x1c8
Invalid read at 0x000000004f749d2e:
copy_from_kernel_nofault+0xb0/0x1c8
0xc0000000057f7950
read_kcore_iter+0x41c/0x9ac
proc_reg_read_iter+0xe4/0x16c
vfs_read+0x2e4/0x3b0
ksys_read+0x88/0x154
system_call_exception+0x124/0x340
system_call_common+0x160/0x2c4
BUG: KFENCE: use-after-free read in copy_from_kernel_nofault+0xb0/0x1c8
Use-after-free read at 0x000000008fbb08ad (in kfence-#0):
copy_from_kernel_nofault+0xb0/0x1c8
0xc0000000057f7950
read_kcore_iter+0x41c/0x9ac
proc_reg_read_iter+0xe4/0x16c
vfs_read+0x2e4/0x3b0
ksys_read+0x88/0x154
system_call_exception+0x124/0x340
system_call_common+0x160/0x2c4
Guessing the fix should go back to when we first got kfence on PPC32.
Fixes: 90cbac0e995d ("powerpc: Enable KFENCE for PPC32")
Reported-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/fault.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 81c77ddce2e3..fa825198f29f 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -439,9 +439,17 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
/*
* The kernel should never take an execute fault nor should it
* take a page fault to a kernel address or a page fault to a user
- * address outside of dedicated places
+ * address outside of dedicated places.
+ *
+ * Rather than kfence reporting false negatives, let the fixup table
+ * handler handle the page fault by returning SIGSEGV, if the fault
+ * has come from functions like copy_from_kernel_nofault().
*/
if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) {
+
+ if (search_exception_tables(instruction_pointer(regs)))
+ return SIGSEGV;
+
if (kfence_handle_page_fault(address, is_write, regs))
return 0;
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC RESEND v2 02/13] powerpc: mm: Fix kfence page fault reporting
2024-10-15 1:33 ` [RFC RESEND v2 02/13] powerpc: mm: Fix kfence page fault reporting Ritesh Harjani (IBM)
@ 2024-10-15 6:42 ` Christophe Leroy
2024-10-15 8:19 ` Ritesh Harjani
0 siblings, 1 reply; 16+ messages in thread
From: Christophe Leroy @ 2024-10-15 6:42 UTC (permalink / raw)
To: Ritesh Harjani (IBM), linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Hari Bathini, Aneesh Kumar K . V, Donet Tom,
Pavithra Prakash, LKML, Disha Goel
On 15/10/2024 at 03:33, Ritesh Harjani (IBM) wrote:
> copy_from_kernel_nofault() can be called when doing read of /proc/kcore.
> /proc/kcore can have some unmapped kfence objects which when read via
> copy_from_kernel_nofault() can cause page faults. Since *_nofault()
> functions define their own fixup table for handling fault, use that
> instead of asking kfence to handle such faults.
>
> Hence we search the exception tables for the nip which generated the
> fault. If there is an entry then we let the fixup table handler handle the
> page fault by returning an error from within ___do_page_fault().
>
> This can be easily triggered if someone tries to do dd from /proc/kcore.
> dd if=/proc/kcore of=/dev/null bs=1M
>
> <some example false negatives>
> ===============================
> BUG: KFENCE: invalid read in copy_from_kernel_nofault+0xb0/0x1c8
> Invalid read at 0x000000004f749d2e:
> copy_from_kernel_nofault+0xb0/0x1c8
> 0xc0000000057f7950
> read_kcore_iter+0x41c/0x9ac
> proc_reg_read_iter+0xe4/0x16c
> vfs_read+0x2e4/0x3b0
> ksys_read+0x88/0x154
> system_call_exception+0x124/0x340
> system_call_common+0x160/0x2c4
>
> BUG: KFENCE: use-after-free read in copy_from_kernel_nofault+0xb0/0x1c8
> Use-after-free read at 0x000000008fbb08ad (in kfence-#0):
> copy_from_kernel_nofault+0xb0/0x1c8
> 0xc0000000057f7950
> read_kcore_iter+0x41c/0x9ac
> proc_reg_read_iter+0xe4/0x16c
> vfs_read+0x2e4/0x3b0
> ksys_read+0x88/0x154
> system_call_exception+0x124/0x340
> system_call_common+0x160/0x2c4
>
> Guessing the fix should go back to when we first got kfence on PPC32.
>
> Fixes: 90cbac0e995d ("powerpc: Enable KFENCE for PPC32")
> Reported-by: Disha Goel <disgoel@linux.ibm.com>
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> ---
> arch/powerpc/mm/fault.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 81c77ddce2e3..fa825198f29f 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -439,9 +439,17 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
> /*
> * The kernel should never take an execute fault nor should it
> * take a page fault to a kernel address or a page fault to a user
> - * address outside of dedicated places
> + * address outside of dedicated places.
> + *
> + * Rather than kfence reporting false negatives, let the fixup table
> + * handler handle the page fault by returning SIGSEGV, if the fault
> + * has come from functions like copy_from_kernel_nofault().
> */
> if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) {
> +
> + if (search_exception_tables(instruction_pointer(regs)))
> + return SIGSEGV;
This is a heavy operation. It should at least be done only when KFENCE
is built-in.
kfence_handle_page_fault() bails out immediately when
is_kfence_address() returns false, and is_kfence_address() always
returns false when KFENCE is not built-in.
So you could check that before calling the heavyweight
search_exception_tables().
if (is_kfence_address(address) &&
!search_exception_tables(instruction_pointer(regs)) &&
kfence_handle_page_fault(address, is_write, regs))
return 0;
> + return SIGSEGV;
> +
> if (kfence_handle_page_fault(address, is_write, regs))
> return 0;
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC RESEND v2 02/13] powerpc: mm: Fix kfence page fault reporting
2024-10-15 6:42 ` Christophe Leroy
@ 2024-10-15 8:19 ` Ritesh Harjani
0 siblings, 0 replies; 16+ messages in thread
From: Ritesh Harjani @ 2024-10-15 8:19 UTC (permalink / raw)
To: Christophe Leroy, linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Hari Bathini, Aneesh Kumar K . V, Donet Tom,
Pavithra Prakash, LKML, Disha Goel
Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> On 15/10/2024 at 03:33, Ritesh Harjani (IBM) wrote:
>> copy_from_kernel_nofault() can be called when doing read of /proc/kcore.
>> /proc/kcore can have some unmapped kfence objects which when read via
>> copy_from_kernel_nofault() can cause page faults. Since *_nofault()
>> functions define their own fixup table for handling fault, use that
>> instead of asking kfence to handle such faults.
>>
>> Hence we search the exception tables for the nip which generated the
>> fault. If there is an entry then we let the fixup table handler handle the
>> page fault by returning an error from within ___do_page_fault().
>>
>> This can be easily triggered if someone tries to do dd from /proc/kcore.
>> dd if=/proc/kcore of=/dev/null bs=1M
>>
>> <some example false negatives>
>> ===============================
>> BUG: KFENCE: invalid read in copy_from_kernel_nofault+0xb0/0x1c8
>> Invalid read at 0x000000004f749d2e:
>> copy_from_kernel_nofault+0xb0/0x1c8
>> 0xc0000000057f7950
>> read_kcore_iter+0x41c/0x9ac
>> proc_reg_read_iter+0xe4/0x16c
>> vfs_read+0x2e4/0x3b0
>> ksys_read+0x88/0x154
>> system_call_exception+0x124/0x340
>> system_call_common+0x160/0x2c4
>>
>> BUG: KFENCE: use-after-free read in copy_from_kernel_nofault+0xb0/0x1c8
>> Use-after-free read at 0x000000008fbb08ad (in kfence-#0):
>> copy_from_kernel_nofault+0xb0/0x1c8
>> 0xc0000000057f7950
>> read_kcore_iter+0x41c/0x9ac
>> proc_reg_read_iter+0xe4/0x16c
>> vfs_read+0x2e4/0x3b0
>> ksys_read+0x88/0x154
>> system_call_exception+0x124/0x340
>> system_call_common+0x160/0x2c4
>>
>> Guessing the fix should go back to when we first got kfence on PPC32.
>>
>> Fixes: 90cbac0e995d ("powerpc: Enable KFENCE for PPC32")
>> Reported-by: Disha Goel <disgoel@linux.ibm.com>
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> ---
>> arch/powerpc/mm/fault.c | 10 +++++++++-
>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>> index 81c77ddce2e3..fa825198f29f 100644
>> --- a/arch/powerpc/mm/fault.c
>> +++ b/arch/powerpc/mm/fault.c
>> @@ -439,9 +439,17 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
>> /*
>> * The kernel should never take an execute fault nor should it
>> * take a page fault to a kernel address or a page fault to a user
>> - * address outside of dedicated places
>> + * address outside of dedicated places.
>> + *
>> + * Rather than kfence reporting false negatives, let the fixup table
>> + * handler handle the page fault by returning SIGSEGV, if the fault
>> + * has come from functions like copy_from_kernel_nofault().
>> */
>> if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) {
>> +
>> + if (search_exception_tables(instruction_pointer(regs)))
>> + return SIGSEGV;
>
> This is a heavy operation. It should at least be done only when KFENCE
> is built-in.
>
> kfence_handle_page_fault() bails out immediately when
> is_kfence_address() returns false, and is_kfence_address() returns
> always false when KFENCE is not built-in.
>
> So you could check that before calling the heavy weight
> search_exception_tables().
>
> if (is_kfence_address(address) &&
> !search_exception_tables(instruction_pointer(regs)) &&
> kfence_handle_page_fault(address, is_write, regs))
> return 0;
>
Yes, thanks for the input. I agree with the above. I will take that in v3.
I will wait for some time for any review comments on the other patches
before spinning a v3, though.
>
>
> > + return SIGSEGV;
>
>> +
>> if (kfence_handle_page_fault(address, is_write, regs))
>> return 0;
>>
-ritesh
^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC RESEND v2 03/13] book3s64/hash: Remove kfence support temporarily
2024-10-15 1:33 [RFC RESEND v2 00/13] powerpc/kfence: Improve kfence support Ritesh Harjani (IBM)
2024-10-15 1:33 ` [RFC RESEND v2 01/13] mm/kfence: Add a new kunit test test_use_after_free_read_nofault() Ritesh Harjani (IBM)
2024-10-15 1:33 ` [RFC RESEND v2 02/13] powerpc: mm: Fix kfence page fault reporting Ritesh Harjani (IBM)
@ 2024-10-15 1:33 ` Ritesh Harjani (IBM)
2024-10-15 1:33 ` [RFC RESEND v2 04/13] book3s64/hash: Refactor kernel linear map related calls Ritesh Harjani (IBM)
` (9 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-15 1:33 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
Kfence with Hash on book3s64 pseries is broken anyway: it fails to boot
due to the RMA size limitation. That is because kfence with Hash uses the
debug_pagealloc infrastructure, and debug_pagealloc allocates a linear map
for the entire DRAM size instead of just for the kfence objects.
This means that for 16TB of DRAM it will require (16TB >> PAGE_SHIFT)
bytes, which is 256MB, i.e. half of the RMA region on P8.
The crash kernel reserves 256MB, and we also need 2048 * 16KB * 3 for the
emergency stacks plus some more for paca allocations.
That means there is not enough memory in the RMA region to reserve the
full linear map if the DRAM size is too big (>= 16TB)
(the issue is seen above 8TB with the 256MB crash kernel reservation).
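As a rough worked example (assuming 64K pages, i.e. PAGE_SHIFT = 16, and
one byte of slot metadata per page, as in linear_map_hash_slots):

	linear map metadata for 16TB = 16TB >> 16 = 2^28 bytes = 256MB
	RMA size on P8               = 512MB
	crash kernel reservation     = 256MB
	emergency stacks             = 2048 * 16KB * 3 = 96MB (+ pacas)

	256MB + 256MB + 96MB already exceeds the 512MB RMA.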
Kfence does not require a linear map for the entire DRAM; it only needs
one for the kfence objects. So this patch temporarily removes the kfence
functionality, since the debug_pagealloc code needs some refactoring
first. Kfence support on Hash is brought back in later patches.
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/include/asm/kfence.h | 5 +++++
arch/powerpc/mm/book3s64/hash_utils.c | 16 +++++++++++-----
2 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/include/asm/kfence.h b/arch/powerpc/include/asm/kfence.h
index fab124ada1c7..f3a9476a71b3 100644
--- a/arch/powerpc/include/asm/kfence.h
+++ b/arch/powerpc/include/asm/kfence.h
@@ -10,6 +10,7 @@
#include <linux/mm.h>
#include <asm/pgtable.h>
+#include <asm/mmu.h>
#ifdef CONFIG_PPC64_ELF_ABI_V1
#define ARCH_FUNC_PREFIX "."
@@ -25,6 +26,10 @@ static inline void disable_kfence(void)
static inline bool arch_kfence_init_pool(void)
{
+#ifdef CONFIG_PPC64
+ if (!radix_enabled())
+ return false;
+#endif
return !kfence_disabled;
}
#endif
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index e1eadd03f133..296bb74dbf40 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -431,7 +431,7 @@ int htab_bolt_mapping(unsigned long vstart, unsigned long vend,
break;
cond_resched();
- if (debug_pagealloc_enabled_or_kfence() &&
+ if (debug_pagealloc_enabled() &&
(paddr >> PAGE_SHIFT) < linear_map_hash_count)
linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80;
}
@@ -814,7 +814,7 @@ static void __init htab_init_page_sizes(void)
bool aligned = true;
init_hpte_page_sizes();
- if (!debug_pagealloc_enabled_or_kfence()) {
+ if (!debug_pagealloc_enabled()) {
/*
* Pick a size for the linear mapping. Currently, we only
* support 16M, 1M and 4K which is the default
@@ -1134,7 +1134,7 @@ static void __init htab_initialize(void)
prot = pgprot_val(PAGE_KERNEL);
- if (debug_pagealloc_enabled_or_kfence()) {
+ if (debug_pagealloc_enabled()) {
linear_map_hash_count = memblock_end_of_DRAM() >> PAGE_SHIFT;
linear_map_hash_slots = memblock_alloc_try_nid(
linear_map_hash_count, 1, MEMBLOCK_LOW_LIMIT,
@@ -2120,7 +2120,7 @@ void hpt_do_stress(unsigned long ea, unsigned long hpte_group)
}
}
-#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
+#ifdef CONFIG_DEBUG_PAGEALLOC
static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
@@ -2194,7 +2194,13 @@ int hash__kernel_map_pages(struct page *page, int numpages, int enable)
local_irq_restore(flags);
return 0;
}
-#endif /* CONFIG_DEBUG_PAGEALLOC || CONFIG_KFENCE */
+#else /* CONFIG_DEBUG_PAGEALLOC */
+int hash__kernel_map_pages(struct page *page, int numpages,
+ int enable)
+{
+ return 0;
+}
+#endif /* CONFIG_DEBUG_PAGEALLOC */
void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base,
phys_addr_t first_memblock_size)
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC RESEND v2 04/13] book3s64/hash: Refactor kernel linear map related calls
2024-10-15 1:33 [RFC RESEND v2 00/13] powerpc/kfence: Improve kfence support Ritesh Harjani (IBM)
` (2 preceding siblings ...)
2024-10-15 1:33 ` [RFC RESEND v2 03/13] book3s64/hash: Remove kfence support temporarily Ritesh Harjani (IBM)
@ 2024-10-15 1:33 ` Ritesh Harjani (IBM)
2024-10-15 1:33 ` [RFC RESEND v2 05/13] book3s64/hash: Add hash_debug_pagealloc_add_slot() function Ritesh Harjani (IBM)
` (8 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-15 1:33 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
This just brings all linear map related handling to one place instead of
having those functions scattered across the hash_utils file.
This makes the code easier to review.
No functionality changes in this patch.
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 164 +++++++++++++-------------
1 file changed, 82 insertions(+), 82 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 296bb74dbf40..82151fff9648 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -273,6 +273,88 @@ void hash__tlbiel_all(unsigned int action)
WARN(1, "%s called on pre-POWER7 CPU\n", __func__);
}
+#ifdef CONFIG_DEBUG_PAGEALLOC
+static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
+
+static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
+{
+ unsigned long hash;
+ unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
+ unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize);
+ unsigned long mode = htab_convert_pte_flags(pgprot_val(PAGE_KERNEL), HPTE_USE_KERNEL_KEY);
+ long ret;
+
+ hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
+
+ /* Don't create HPTE entries for bad address */
+ if (!vsid)
+ return;
+
+ if (linear_map_hash_slots[lmi] & 0x80)
+ return;
+
+ ret = hpte_insert_repeating(hash, vpn, __pa(vaddr), mode,
+ HPTE_V_BOLTED,
+ mmu_linear_psize, mmu_kernel_ssize);
+
+ BUG_ON (ret < 0);
+ raw_spin_lock(&linear_map_hash_lock);
+ BUG_ON(linear_map_hash_slots[lmi] & 0x80);
+ linear_map_hash_slots[lmi] = ret | 0x80;
+ raw_spin_unlock(&linear_map_hash_lock);
+}
+
+static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long lmi)
+{
+ unsigned long hash, hidx, slot;
+ unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
+ unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize);
+
+ hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
+ raw_spin_lock(&linear_map_hash_lock);
+ if (!(linear_map_hash_slots[lmi] & 0x80)) {
+ raw_spin_unlock(&linear_map_hash_lock);
+ return;
+ }
+ hidx = linear_map_hash_slots[lmi] & 0x7f;
+ linear_map_hash_slots[lmi] = 0;
+ raw_spin_unlock(&linear_map_hash_lock);
+ if (hidx & _PTEIDX_SECONDARY)
+ hash = ~hash;
+ slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+ slot += hidx & _PTEIDX_GROUP_IX;
+ mmu_hash_ops.hpte_invalidate(slot, vpn, mmu_linear_psize,
+ mmu_linear_psize,
+ mmu_kernel_ssize, 0);
+}
+
+int hash__kernel_map_pages(struct page *page, int numpages, int enable)
+{
+ unsigned long flags, vaddr, lmi;
+ int i;
+
+ local_irq_save(flags);
+ for (i = 0; i < numpages; i++, page++) {
+ vaddr = (unsigned long)page_address(page);
+ lmi = __pa(vaddr) >> PAGE_SHIFT;
+ if (lmi >= linear_map_hash_count)
+ continue;
+ if (enable)
+ kernel_map_linear_page(vaddr, lmi);
+ else
+ kernel_unmap_linear_page(vaddr, lmi);
+ }
+ local_irq_restore(flags);
+ return 0;
+}
+#else /* CONFIG_DEBUG_PAGEALLOC */
+int hash__kernel_map_pages(struct page *page, int numpages,
+ int enable)
+{
+ return 0;
+}
+#endif /* CONFIG_DEBUG_PAGEALLOC */
+
/*
* 'R' and 'C' update notes:
* - Under pHyp or KVM, the updatepp path will not set C, thus it *will*
@@ -2120,88 +2202,6 @@ void hpt_do_stress(unsigned long ea, unsigned long hpte_group)
}
}
-#ifdef CONFIG_DEBUG_PAGEALLOC
-static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
-
-static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
-{
- unsigned long hash;
- unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
- unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize);
- unsigned long mode = htab_convert_pte_flags(pgprot_val(PAGE_KERNEL), HPTE_USE_KERNEL_KEY);
- long ret;
-
- hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
-
- /* Don't create HPTE entries for bad address */
- if (!vsid)
- return;
-
- if (linear_map_hash_slots[lmi] & 0x80)
- return;
-
- ret = hpte_insert_repeating(hash, vpn, __pa(vaddr), mode,
- HPTE_V_BOLTED,
- mmu_linear_psize, mmu_kernel_ssize);
-
- BUG_ON (ret < 0);
- raw_spin_lock(&linear_map_hash_lock);
- BUG_ON(linear_map_hash_slots[lmi] & 0x80);
- linear_map_hash_slots[lmi] = ret | 0x80;
- raw_spin_unlock(&linear_map_hash_lock);
-}
-
-static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long lmi)
-{
- unsigned long hash, hidx, slot;
- unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
- unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize);
-
- hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
- raw_spin_lock(&linear_map_hash_lock);
- if (!(linear_map_hash_slots[lmi] & 0x80)) {
- raw_spin_unlock(&linear_map_hash_lock);
- return;
- }
- hidx = linear_map_hash_slots[lmi] & 0x7f;
- linear_map_hash_slots[lmi] = 0;
- raw_spin_unlock(&linear_map_hash_lock);
- if (hidx & _PTEIDX_SECONDARY)
- hash = ~hash;
- slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
- slot += hidx & _PTEIDX_GROUP_IX;
- mmu_hash_ops.hpte_invalidate(slot, vpn, mmu_linear_psize,
- mmu_linear_psize,
- mmu_kernel_ssize, 0);
-}
-
-int hash__kernel_map_pages(struct page *page, int numpages, int enable)
-{
- unsigned long flags, vaddr, lmi;
- int i;
-
- local_irq_save(flags);
- for (i = 0; i < numpages; i++, page++) {
- vaddr = (unsigned long)page_address(page);
- lmi = __pa(vaddr) >> PAGE_SHIFT;
- if (lmi >= linear_map_hash_count)
- continue;
- if (enable)
- kernel_map_linear_page(vaddr, lmi);
- else
- kernel_unmap_linear_page(vaddr, lmi);
- }
- local_irq_restore(flags);
- return 0;
-}
-#else /* CONFIG_DEBUG_PAGEALLOC */
-int hash__kernel_map_pages(struct page *page, int numpages,
- int enable)
-{
- return 0;
-}
-#endif /* CONFIG_DEBUG_PAGEALLOC */
-
void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base,
phys_addr_t first_memblock_size)
{
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC RESEND v2 05/13] book3s64/hash: Add hash_debug_pagealloc_add_slot() function
2024-10-15 1:33 [RFC RESEND v2 00/13] powerpc/kfence: Improve kfence support Ritesh Harjani (IBM)
` (3 preceding siblings ...)
2024-10-15 1:33 ` [RFC RESEND v2 04/13] book3s64/hash: Refactor kernel linear map related calls Ritesh Harjani (IBM)
@ 2024-10-15 1:33 ` Ritesh Harjani (IBM)
2024-10-15 1:33 ` [RFC RESEND v2 06/13] book3s64/hash: Add hash_debug_pagealloc_alloc_slots() function Ritesh Harjani (IBM)
` (7 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-15 1:33 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
This adds a hash_debug_pagealloc_add_slot() function instead of open
coding that logic in htab_bolt_mapping(). This is required since we will
be separating the kfence functionality so that it no longer depends upon
debug_pagealloc.
No functionality change in this patch.
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 82151fff9648..6e3860224351 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -328,6 +328,14 @@ static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long lmi)
mmu_kernel_ssize, 0);
}
+static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr, int slot)
+{
+ if (!debug_pagealloc_enabled())
+ return;
+ if ((paddr >> PAGE_SHIFT) < linear_map_hash_count)
+ linear_map_hash_slots[paddr >> PAGE_SHIFT] = slot | 0x80;
+}
+
int hash__kernel_map_pages(struct page *page, int numpages, int enable)
{
unsigned long flags, vaddr, lmi;
@@ -353,6 +361,7 @@ int hash__kernel_map_pages(struct page *page, int numpages,
{
return 0;
}
+static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr, int slot) {}
#endif /* CONFIG_DEBUG_PAGEALLOC */
/*
@@ -513,9 +522,7 @@ int htab_bolt_mapping(unsigned long vstart, unsigned long vend,
break;
cond_resched();
- if (debug_pagealloc_enabled() &&
- (paddr >> PAGE_SHIFT) < linear_map_hash_count)
- linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80;
+ hash_debug_pagealloc_add_slot(paddr, ret);
}
return ret < 0 ? ret : 0;
}
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC RESEND v2 06/13] book3s64/hash: Add hash_debug_pagealloc_alloc_slots() function
2024-10-15 1:33 [RFC RESEND v2 00/13] powerpc/kfence: Improve kfence support Ritesh Harjani (IBM)
` (4 preceding siblings ...)
2024-10-15 1:33 ` [RFC RESEND v2 05/13] book3s64/hash: Add hash_debug_pagealloc_add_slot() function Ritesh Harjani (IBM)
@ 2024-10-15 1:33 ` Ritesh Harjani (IBM)
2024-10-15 1:33 ` [RFC RESEND v2 07/13] book3s64/hash: Refactor hash__kernel_map_pages() function Ritesh Harjani (IBM)
` (6 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-15 1:33 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
This adds a hash_debug_pagealloc_alloc_slots() function instead of open
coding that logic in htab_initialize(). This is required since we will be
separating the kfence functionality so that it no longer depends upon
debug_pagealloc.
Now that everything required for debug_pagealloc is under an #ifdef
config, bring the linear_map_hash_slots and linear_map_hash_count
variables under the same config too.
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 29 ++++++++++++++++-----------
1 file changed, 17 insertions(+), 12 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 6e3860224351..030c120d1399 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -123,8 +123,6 @@ EXPORT_SYMBOL_GPL(mmu_slb_size);
#ifdef CONFIG_PPC_64K_PAGES
int mmu_ci_restrictions;
#endif
-static u8 *linear_map_hash_slots;
-static unsigned long linear_map_hash_count;
struct mmu_hash_ops mmu_hash_ops __ro_after_init;
EXPORT_SYMBOL(mmu_hash_ops);
@@ -274,6 +272,8 @@ void hash__tlbiel_all(unsigned int action)
}
#ifdef CONFIG_DEBUG_PAGEALLOC
+static u8 *linear_map_hash_slots;
+static unsigned long linear_map_hash_count;
static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
@@ -328,6 +328,19 @@ static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long lmi)
mmu_kernel_ssize, 0);
}
+static inline void hash_debug_pagealloc_alloc_slots(void)
+{
+ if (!debug_pagealloc_enabled())
+ return;
+ linear_map_hash_count = memblock_end_of_DRAM() >> PAGE_SHIFT;
+ linear_map_hash_slots = memblock_alloc_try_nid(
+ linear_map_hash_count, 1, MEMBLOCK_LOW_LIMIT,
+ ppc64_rma_size, NUMA_NO_NODE);
+ if (!linear_map_hash_slots)
+ panic("%s: Failed to allocate %lu bytes max_addr=%pa\n",
+ __func__, linear_map_hash_count, &ppc64_rma_size);
+}
+
static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr, int slot)
{
if (!debug_pagealloc_enabled())
@@ -361,6 +374,7 @@ int hash__kernel_map_pages(struct page *page, int numpages,
{
return 0;
}
+static inline void hash_debug_pagealloc_alloc_slots(void) {}
static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr, int slot) {}
#endif /* CONFIG_DEBUG_PAGEALLOC */
@@ -1223,16 +1237,7 @@ static void __init htab_initialize(void)
prot = pgprot_val(PAGE_KERNEL);
- if (debug_pagealloc_enabled()) {
- linear_map_hash_count = memblock_end_of_DRAM() >> PAGE_SHIFT;
- linear_map_hash_slots = memblock_alloc_try_nid(
- linear_map_hash_count, 1, MEMBLOCK_LOW_LIMIT,
- ppc64_rma_size, NUMA_NO_NODE);
- if (!linear_map_hash_slots)
- panic("%s: Failed to allocate %lu bytes max_addr=%pa\n",
- __func__, linear_map_hash_count, &ppc64_rma_size);
- }
-
+ hash_debug_pagealloc_alloc_slots();
/* create bolted the linear mapping in the hash table */
for_each_mem_range(i, &base, &end) {
size = end - base;
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC RESEND v2 07/13] book3s64/hash: Refactor hash__kernel_map_pages() function
2024-10-15 1:33 [RFC RESEND v2 00/13] powerpc/kfence: Improve kfence support Ritesh Harjani (IBM)
` (5 preceding siblings ...)
2024-10-15 1:33 ` [RFC RESEND v2 06/13] book3s64/hash: Add hash_debug_pagealloc_alloc_slots() function Ritesh Harjani (IBM)
@ 2024-10-15 1:33 ` Ritesh Harjani (IBM)
2024-10-15 1:33 ` [RFC RESEND v2 08/13] book3s64/hash: Make kernel_map_linear_page() generic Ritesh Harjani (IBM)
` (5 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-15 1:33 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
This refactors the hash__kernel_map_pages() function to call
hash_debug_pagealloc_map_pages(). This will come in useful when we add
kfence support.
No functionality changes in this patch.
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 030c120d1399..da9b089c8e8b 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -349,7 +349,8 @@ static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr, int slot)
linear_map_hash_slots[paddr >> PAGE_SHIFT] = slot | 0x80;
}
-int hash__kernel_map_pages(struct page *page, int numpages, int enable)
+static int hash_debug_pagealloc_map_pages(struct page *page, int numpages,
+ int enable)
{
unsigned long flags, vaddr, lmi;
int i;
@@ -368,6 +369,12 @@ int hash__kernel_map_pages(struct page *page, int numpages, int enable)
local_irq_restore(flags);
return 0;
}
+
+int hash__kernel_map_pages(struct page *page, int numpages, int enable)
+{
+ return hash_debug_pagealloc_map_pages(page, numpages, enable);
+}
+
#else /* CONFIG_DEBUG_PAGEALLOC */
int hash__kernel_map_pages(struct page *page, int numpages,
int enable)
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC RESEND v2 08/13] book3s64/hash: Make kernel_map_linear_page() generic
2024-10-15 1:33 [RFC RESEND v2 00/13] powerpc/kfence: Improve kfence support Ritesh Harjani (IBM)
` (6 preceding siblings ...)
2024-10-15 1:33 ` [RFC RESEND v2 07/13] book3s64/hash: Refactor hash__kernel_map_pages() function Ritesh Harjani (IBM)
@ 2024-10-15 1:33 ` Ritesh Harjani (IBM)
2024-10-15 1:33 ` [RFC RESEND v2 09/13] book3s64/hash: Disable debug_pagealloc if it requires more memory Ritesh Harjani (IBM)
` (4 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-15 1:33 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
Currently the kernel_map_linear_page() function assumes it is working on
the linear_map_hash_slots array. But since later patches need a separate
linear map array for kfence, make kernel_map_linear_page() take the
linear map array and its lock as function arguments.
This is needed to separate out kfence from the debug_pagealloc
infrastructure.
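For illustration, after this change the helper is invoked with the
array/lock pair of its caller; the kfence arrays only appear in a later
patch of this series:

	/* debug_pagealloc (this patch): */
	kernel_map_linear_page(vaddr, lmi, linear_map_hash_slots,
			       &linear_map_hash_lock);
	/* kfence (added later in the series): */
	kernel_map_linear_page(vaddr, lmi, linear_map_kf_hash_slots,
			       &linear_map_kf_hash_lock);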
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 47 ++++++++++++++-------------
1 file changed, 25 insertions(+), 22 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index da9b089c8e8b..cc2eaa97982c 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -272,11 +272,8 @@ void hash__tlbiel_all(unsigned int action)
}
#ifdef CONFIG_DEBUG_PAGEALLOC
-static u8 *linear_map_hash_slots;
-static unsigned long linear_map_hash_count;
-static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
-
-static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
+static void kernel_map_linear_page(unsigned long vaddr, unsigned long idx,
+ u8 *slots, raw_spinlock_t *lock)
{
unsigned long hash;
unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
@@ -290,7 +287,7 @@ static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
if (!vsid)
return;
- if (linear_map_hash_slots[lmi] & 0x80)
+ if (slots[idx] & 0x80)
return;
ret = hpte_insert_repeating(hash, vpn, __pa(vaddr), mode,
@@ -298,36 +295,40 @@ static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
mmu_linear_psize, mmu_kernel_ssize);
BUG_ON (ret < 0);
- raw_spin_lock(&linear_map_hash_lock);
- BUG_ON(linear_map_hash_slots[lmi] & 0x80);
- linear_map_hash_slots[lmi] = ret | 0x80;
- raw_spin_unlock(&linear_map_hash_lock);
+ raw_spin_lock(lock);
+ BUG_ON(slots[idx] & 0x80);
+ slots[idx] = ret | 0x80;
+ raw_spin_unlock(lock);
}
-static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long lmi)
+static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long idx,
+ u8 *slots, raw_spinlock_t *lock)
{
- unsigned long hash, hidx, slot;
+ unsigned long hash, hslot, slot;
unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize);
hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
- raw_spin_lock(&linear_map_hash_lock);
- if (!(linear_map_hash_slots[lmi] & 0x80)) {
- raw_spin_unlock(&linear_map_hash_lock);
+ raw_spin_lock(lock);
+ if (!(slots[idx] & 0x80)) {
+ raw_spin_unlock(lock);
return;
}
- hidx = linear_map_hash_slots[lmi] & 0x7f;
- linear_map_hash_slots[lmi] = 0;
- raw_spin_unlock(&linear_map_hash_lock);
- if (hidx & _PTEIDX_SECONDARY)
+ hslot = slots[idx] & 0x7f;
+ slots[idx] = 0;
+ raw_spin_unlock(lock);
+ if (hslot & _PTEIDX_SECONDARY)
hash = ~hash;
slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
- slot += hidx & _PTEIDX_GROUP_IX;
+ slot += hslot & _PTEIDX_GROUP_IX;
mmu_hash_ops.hpte_invalidate(slot, vpn, mmu_linear_psize,
mmu_linear_psize,
mmu_kernel_ssize, 0);
}
+static u8 *linear_map_hash_slots;
+static unsigned long linear_map_hash_count;
+static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
static inline void hash_debug_pagealloc_alloc_slots(void)
{
if (!debug_pagealloc_enabled())
@@ -362,9 +363,11 @@ static int hash_debug_pagealloc_map_pages(struct page *page, int numpages,
if (lmi >= linear_map_hash_count)
continue;
if (enable)
- kernel_map_linear_page(vaddr, lmi);
+ kernel_map_linear_page(vaddr, lmi,
+ linear_map_hash_slots, &linear_map_hash_lock);
else
- kernel_unmap_linear_page(vaddr, lmi);
+ kernel_unmap_linear_page(vaddr, lmi,
+ linear_map_hash_slots, &linear_map_hash_lock);
}
local_irq_restore(flags);
return 0;
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC RESEND v2 09/13] book3s64/hash: Disable debug_pagealloc if it requires more memory
2024-10-15 1:33 [RFC RESEND v2 00/13] powerpc/kfence: Improve kfence support Ritesh Harjani (IBM)
` (7 preceding siblings ...)
2024-10-15 1:33 ` [RFC RESEND v2 08/13] book3s64/hash: Make kernel_map_linear_page() generic Ritesh Harjani (IBM)
@ 2024-10-15 1:33 ` Ritesh Harjani (IBM)
2024-10-15 1:33 ` [RFC RESEND v2 10/13] book3s64/hash: Add kfence functionality Ritesh Harjani (IBM)
` (3 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-15 1:33 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
Cap the size of the linear map to be allocated in the RMA region at
ppc64_rma_size / 4. If debug_pagealloc requires more memory than that,
then do not allocate any memory and disable debug_pagealloc.
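A rough worked example of the new cap (assuming a 512MB RMA and 64K
pages, as in the earlier description of the problem):

	max metadata = ppc64_rma_size / 4 = 128MB
	             = 128M one-byte slots
	             = 128M pages * 64KB = 8TB of DRAM

So for DRAM sizes above ~8TB, debug_pagealloc is now disabled instead of
failing the boot-time allocation.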
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index cc2eaa97982c..cffbb6499ac4 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -331,9 +331,19 @@ static unsigned long linear_map_hash_count;
static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
static inline void hash_debug_pagealloc_alloc_slots(void)
{
+ unsigned long max_hash_count = ppc64_rma_size / 4;
+
if (!debug_pagealloc_enabled())
return;
linear_map_hash_count = memblock_end_of_DRAM() >> PAGE_SHIFT;
+ if (unlikely(linear_map_hash_count > max_hash_count)) {
+ pr_info("linear map size (%llu) greater than 4 times RMA region (%llu). Disabling debug_pagealloc\n",
+ ((u64)linear_map_hash_count << PAGE_SHIFT),
+ ppc64_rma_size);
+ linear_map_hash_count = 0;
+ return;
+ }
+
linear_map_hash_slots = memblock_alloc_try_nid(
linear_map_hash_count, 1, MEMBLOCK_LOW_LIMIT,
ppc64_rma_size, NUMA_NO_NODE);
@@ -344,7 +354,7 @@ static inline void hash_debug_pagealloc_alloc_slots(void)
static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr, int slot)
{
- if (!debug_pagealloc_enabled())
+ if (!debug_pagealloc_enabled() || !linear_map_hash_count)
return;
if ((paddr >> PAGE_SHIFT) < linear_map_hash_count)
linear_map_hash_slots[paddr >> PAGE_SHIFT] = slot | 0x80;
@@ -356,6 +366,9 @@ static int hash_debug_pagealloc_map_pages(struct page *page, int numpages,
unsigned long flags, vaddr, lmi;
int i;
+ if (!debug_pagealloc_enabled() || !linear_map_hash_count)
+ return 0;
+
local_irq_save(flags);
for (i = 0; i < numpages; i++, page++) {
vaddr = (unsigned long)page_address(page);
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC RESEND v2 10/13] book3s64/hash: Add kfence functionality
2024-10-15 1:33 [RFC RESEND v2 00/13] powerpc/kfence: Improve kfence support Ritesh Harjani (IBM)
` (8 preceding siblings ...)
2024-10-15 1:33 ` [RFC RESEND v2 09/13] book3s64/hash: Disable debug_pagealloc if it requires more memory Ritesh Harjani (IBM)
@ 2024-10-15 1:33 ` Ritesh Harjani (IBM)
2024-10-15 1:33 ` [RFC RESEND v2 11/13] book3s64/radix: Refactoring common kfence related functions Ritesh Harjani (IBM)
` (2 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-15 1:33 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
Now that the linear map functionality of debug_pagealloc has been made
generic, enable kfence to use this generic infrastructure.
1. Define kfence related linear map variables.
- u8 *linear_map_kf_hash_slots;
- unsigned long linear_map_kf_hash_count;
- DEFINE_RAW_SPINLOCK(linear_map_kf_hash_lock);
2. The linear map size allocated in the RMA region is quite small:
(KFENCE_POOL_SIZE >> PAGE_SHIFT), which is 512 bytes by default.
3. The kfence pool memory is reserved using memblock_phys_alloc_range(),
which can come from anywhere.
(default 255 objects => ((1+255) * 2) << PAGE_SHIFT = 32MB; see the
worked sizes below)
4. The hash slot information for kfence memory gets added to the linear
map in hash_linear_map_add_slot() (which also handles debug_pagealloc).
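A quick sanity check of the sizes mentioned above (assuming 64K pages and
the default CONFIG_KFENCE_NUM_OBJECTS = 255):

	KFENCE_POOL_SIZE         = (255 + 1) * 2 * 64KB = 32MB
	linear_map_kf_hash_count = 32MB >> 16 = 512 one-byte slots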
Reported-by: Pavithra Prakash <pavrampu@linux.vnet.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/include/asm/kfence.h | 5 -
arch/powerpc/mm/book3s64/hash_utils.c | 162 +++++++++++++++++++++++---
2 files changed, 149 insertions(+), 18 deletions(-)
diff --git a/arch/powerpc/include/asm/kfence.h b/arch/powerpc/include/asm/kfence.h
index f3a9476a71b3..fab124ada1c7 100644
--- a/arch/powerpc/include/asm/kfence.h
+++ b/arch/powerpc/include/asm/kfence.h
@@ -10,7 +10,6 @@
#include <linux/mm.h>
#include <asm/pgtable.h>
-#include <asm/mmu.h>
#ifdef CONFIG_PPC64_ELF_ABI_V1
#define ARCH_FUNC_PREFIX "."
@@ -26,10 +25,6 @@ static inline void disable_kfence(void)
static inline bool arch_kfence_init_pool(void)
{
-#ifdef CONFIG_PPC64
- if (!radix_enabled())
- return false;
-#endif
return !kfence_disabled;
}
#endif
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index cffbb6499ac4..53e6f3a524eb 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -40,6 +40,7 @@
#include <linux/random.h>
#include <linux/elf-randomize.h>
#include <linux/of_fdt.h>
+#include <linux/kfence.h>
#include <asm/interrupt.h>
#include <asm/processor.h>
@@ -66,6 +67,7 @@
#include <asm/pte-walk.h>
#include <asm/asm-prototypes.h>
#include <asm/ultravisor.h>
+#include <asm/kfence.h>
#include <mm/mmu_decl.h>
@@ -271,7 +273,7 @@ void hash__tlbiel_all(unsigned int action)
WARN(1, "%s called on pre-POWER7 CPU\n", __func__);
}
-#ifdef CONFIG_DEBUG_PAGEALLOC
+#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
static void kernel_map_linear_page(unsigned long vaddr, unsigned long idx,
u8 *slots, raw_spinlock_t *lock)
{
@@ -325,11 +327,13 @@ static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long idx,
mmu_linear_psize,
mmu_kernel_ssize, 0);
}
+#endif
+#ifdef CONFIG_DEBUG_PAGEALLOC
static u8 *linear_map_hash_slots;
static unsigned long linear_map_hash_count;
static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
-static inline void hash_debug_pagealloc_alloc_slots(void)
+static void hash_debug_pagealloc_alloc_slots(void)
{
unsigned long max_hash_count = ppc64_rma_size / 4;
@@ -352,7 +356,8 @@ static inline void hash_debug_pagealloc_alloc_slots(void)
__func__, linear_map_hash_count, &ppc64_rma_size);
}
-static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr, int slot)
+static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr,
+ int slot)
{
if (!debug_pagealloc_enabled() || !linear_map_hash_count)
return;
@@ -386,20 +391,148 @@ static int hash_debug_pagealloc_map_pages(struct page *page, int numpages,
return 0;
}
-int hash__kernel_map_pages(struct page *page, int numpages, int enable)
+#else /* CONFIG_DEBUG_PAGEALLOC */
+static inline void hash_debug_pagealloc_alloc_slots(void) {}
+static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr, int slot) {}
+static int __maybe_unused
+hash_debug_pagealloc_map_pages(struct page *page, int numpages, int enable)
{
- return hash_debug_pagealloc_map_pages(page, numpages, enable);
+ return 0;
}
+#endif /* CONFIG_DEBUG_PAGEALLOC */
-#else /* CONFIG_DEBUG_PAGEALLOC */
-int hash__kernel_map_pages(struct page *page, int numpages,
- int enable)
+#ifdef CONFIG_KFENCE
+static u8 *linear_map_kf_hash_slots;
+static unsigned long linear_map_kf_hash_count;
+static DEFINE_RAW_SPINLOCK(linear_map_kf_hash_lock);
+
+static phys_addr_t kfence_pool;
+
+static inline void hash_kfence_alloc_pool(void)
+{
+
+ // allocate linear map for kfence within RMA region
+ linear_map_kf_hash_count = KFENCE_POOL_SIZE >> PAGE_SHIFT;
+ linear_map_kf_hash_slots = memblock_alloc_try_nid(
+ linear_map_kf_hash_count, 1,
+ MEMBLOCK_LOW_LIMIT, ppc64_rma_size,
+ NUMA_NO_NODE);
+ if (!linear_map_kf_hash_slots) {
+ pr_err("%s: memblock for linear map (%lu) failed\n", __func__,
+ linear_map_kf_hash_count);
+ goto err;
+ }
+
+ // allocate kfence pool early
+ kfence_pool = memblock_phys_alloc_range(KFENCE_POOL_SIZE, PAGE_SIZE,
+ MEMBLOCK_LOW_LIMIT, MEMBLOCK_ALLOC_ANYWHERE);
+ if (!kfence_pool) {
+ pr_err("%s: memblock for kfence pool (%lu) failed\n", __func__,
+ KFENCE_POOL_SIZE);
+ memblock_free(linear_map_kf_hash_slots,
+ linear_map_kf_hash_count);
+ linear_map_kf_hash_count = 0;
+ goto err;
+ }
+ memblock_mark_nomap(kfence_pool, KFENCE_POOL_SIZE);
+
+ return;
+err:
+ pr_info("Disabling kfence\n");
+ disable_kfence();
+}
+
+static inline void hash_kfence_map_pool(void)
+{
+ unsigned long kfence_pool_start, kfence_pool_end;
+ unsigned long prot = pgprot_val(PAGE_KERNEL);
+
+ if (!kfence_pool)
+ return;
+
+ kfence_pool_start = (unsigned long) __va(kfence_pool);
+ kfence_pool_end = kfence_pool_start + KFENCE_POOL_SIZE;
+ __kfence_pool = (char *) kfence_pool_start;
+ BUG_ON(htab_bolt_mapping(kfence_pool_start, kfence_pool_end,
+ kfence_pool, prot, mmu_linear_psize,
+ mmu_kernel_ssize));
+ memblock_clear_nomap(kfence_pool, KFENCE_POOL_SIZE);
+}
+
+static inline void hash_kfence_add_slot(phys_addr_t paddr, int slot)
{
+ unsigned long vaddr = (unsigned long) __va(paddr);
+ unsigned long lmi = (vaddr - (unsigned long)__kfence_pool)
+ >> PAGE_SHIFT;
+
+ if (!kfence_pool)
+ return;
+ BUG_ON(!is_kfence_address((void *)vaddr));
+ BUG_ON(lmi >= linear_map_kf_hash_count);
+ linear_map_kf_hash_slots[lmi] = slot | 0x80;
+}
+
+static int hash_kfence_map_pages(struct page *page, int numpages, int enable)
+{
+ unsigned long flags, vaddr, lmi;
+ int i;
+
+ WARN_ON_ONCE(!linear_map_kf_hash_count);
+ local_irq_save(flags);
+ for (i = 0; i < numpages; i++, page++) {
+ vaddr = (unsigned long)page_address(page);
+ lmi = (vaddr - (unsigned long)__kfence_pool) >> PAGE_SHIFT;
+
+ /* Ideally this should never happen */
+ if (lmi >= linear_map_kf_hash_count) {
+ WARN_ON_ONCE(1);
+ continue;
+ }
+
+ if (enable)
+ kernel_map_linear_page(vaddr, lmi,
+ linear_map_kf_hash_slots,
+ &linear_map_kf_hash_lock);
+ else
+ kernel_unmap_linear_page(vaddr, lmi,
+ linear_map_kf_hash_slots,
+ &linear_map_kf_hash_lock);
+ }
+ local_irq_restore(flags);
return 0;
}
-static inline void hash_debug_pagealloc_alloc_slots(void) {}
-static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr, int slot) {}
-#endif /* CONFIG_DEBUG_PAGEALLOC */
+#else
+static inline void hash_kfence_alloc_pool(void) {}
+static inline void hash_kfence_map_pool(void) {}
+static inline void hash_kfence_add_slot(phys_addr_t paddr, int slot) {}
+static int __maybe_unused
+hash_kfence_map_pages(struct page *page, int numpages, int enable)
+{
+ return 0;
+}
+#endif
+
+#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
+int hash__kernel_map_pages(struct page *page, int numpages, int enable)
+{
+ void *vaddr = page_address(page);
+
+ if (is_kfence_address(vaddr))
+ return hash_kfence_map_pages(page, numpages, enable);
+ else
+ return hash_debug_pagealloc_map_pages(page, numpages, enable);
+}
+
+static void hash_linear_map_add_slot(phys_addr_t paddr, int slot)
+{
+ if (is_kfence_address(__va(paddr)))
+ hash_kfence_add_slot(paddr, slot);
+ else
+ hash_debug_pagealloc_add_slot(paddr, slot);
+}
+#else
+static void hash_linear_map_add_slot(phys_addr_t paddr, int slot) {}
+#endif
/*
* 'R' and 'C' update notes:
@@ -559,7 +692,8 @@ int htab_bolt_mapping(unsigned long vstart, unsigned long vend,
break;
cond_resched();
- hash_debug_pagealloc_add_slot(paddr, ret);
+ // add slot info in debug_pagealloc / kfence linear map
+ hash_linear_map_add_slot(paddr, ret);
}
return ret < 0 ? ret : 0;
}
@@ -940,7 +1074,7 @@ static void __init htab_init_page_sizes(void)
bool aligned = true;
init_hpte_page_sizes();
- if (!debug_pagealloc_enabled()) {
+ if (!debug_pagealloc_enabled_or_kfence()) {
/*
* Pick a size for the linear mapping. Currently, we only
* support 16M, 1M and 4K which is the default
@@ -1261,6 +1395,7 @@ static void __init htab_initialize(void)
prot = pgprot_val(PAGE_KERNEL);
hash_debug_pagealloc_alloc_slots();
+ hash_kfence_alloc_pool();
/* create bolted the linear mapping in the hash table */
for_each_mem_range(i, &base, &end) {
size = end - base;
@@ -1277,6 +1412,7 @@ static void __init htab_initialize(void)
BUG_ON(htab_bolt_mapping(base, base + size, __pa(base),
prot, mmu_linear_psize, mmu_kernel_ssize));
}
+ hash_kfence_map_pool();
memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
/*
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC RESEND v2 11/13] book3s64/radix: Refactoring common kfence related functions
2024-10-15 1:33 [RFC RESEND v2 00/13] powerpc/kfence: Improve kfence support Ritesh Harjani (IBM)
` (9 preceding siblings ...)
2024-10-15 1:33 ` [RFC RESEND v2 10/13] book3s64/hash: Add kfence functionality Ritesh Harjani (IBM)
@ 2024-10-15 1:33 ` Ritesh Harjani (IBM)
2024-10-15 1:33 ` [RFC RESEND v2 12/13] book3s64/hash: Disable kfence if not early init Ritesh Harjani (IBM)
2024-10-15 1:33 ` [RFC RESEND v2 13/13] book3s64/hash: Early detect debug_pagealloc size requirement Ritesh Harjani (IBM)
12 siblings, 0 replies; 16+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-15 1:33 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
Both radix and hash on book3s need to detect whether kfence early init
is enabled or not. Hash needs to disable kfence if early init is not
enabled, because with kfence the linear map is mapped using PAGE_SIZE
rather than the 16M mapping.
We don't support multiple page sizes for the SLB entry used for the
kernel linear map on book3s64.
This patch refactors out the common functions required to detect whether
kfence early init is enabled or not.
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/include/asm/kfence.h | 8 ++++++--
arch/powerpc/mm/book3s64/pgtable.c | 13 +++++++++++++
arch/powerpc/mm/book3s64/radix_pgtable.c | 12 ------------
arch/powerpc/mm/init-common.c | 1 +
4 files changed, 20 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/include/asm/kfence.h b/arch/powerpc/include/asm/kfence.h
index fab124ada1c7..1f7cab58ab2c 100644
--- a/arch/powerpc/include/asm/kfence.h
+++ b/arch/powerpc/include/asm/kfence.h
@@ -15,7 +15,7 @@
#define ARCH_FUNC_PREFIX "."
#endif
-#ifdef CONFIG_KFENCE
+extern bool kfence_early_init;
extern bool kfence_disabled;
static inline void disable_kfence(void)
@@ -27,7 +27,11 @@ static inline bool arch_kfence_init_pool(void)
{
return !kfence_disabled;
}
-#endif
+
+static inline bool kfence_early_init_enabled(void)
+{
+ return IS_ENABLED(CONFIG_KFENCE) && kfence_early_init;
+}
#ifdef CONFIG_PPC64
static inline bool kfence_protect_page(unsigned long addr, bool protect)
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 5a4a75369043..374542528080 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -37,6 +37,19 @@ EXPORT_SYMBOL(__pmd_frag_nr);
unsigned long __pmd_frag_size_shift;
EXPORT_SYMBOL(__pmd_frag_size_shift);
+#ifdef CONFIG_KFENCE
+extern bool kfence_early_init;
+static int __init parse_kfence_early_init(char *arg)
+{
+ int val;
+
+ if (get_option(&arg, &val))
+ kfence_early_init = !!val;
+ return 0;
+}
+early_param("kfence.sample_interval", parse_kfence_early_init);
+#endif
+
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/*
* This is called when relaxing access to a hugepage. It's also called in the page
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index b0d927009af8..311e2112d782 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -363,18 +363,6 @@ static int __meminit create_physical_mapping(unsigned long start,
}
#ifdef CONFIG_KFENCE
-static bool __ro_after_init kfence_early_init = !!CONFIG_KFENCE_SAMPLE_INTERVAL;
-
-static int __init parse_kfence_early_init(char *arg)
-{
- int val;
-
- if (get_option(&arg, &val))
- kfence_early_init = !!val;
- return 0;
-}
-early_param("kfence.sample_interval", parse_kfence_early_init);
-
static inline phys_addr_t alloc_kfence_pool(void)
{
phys_addr_t kfence_pool;
diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index 2978fcbe307e..745097554bea 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -33,6 +33,7 @@ bool disable_kuep = !IS_ENABLED(CONFIG_PPC_KUEP);
bool disable_kuap = !IS_ENABLED(CONFIG_PPC_KUAP);
#ifdef CONFIG_KFENCE
bool __ro_after_init kfence_disabled;
+bool __ro_after_init kfence_early_init = !!CONFIG_KFENCE_SAMPLE_INTERVAL;
#endif
static int __init parse_nosmep(char *p)
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC RESEND v2 12/13] book3s64/hash: Disable kfence if not early init
2024-10-15 1:33 [RFC RESEND v2 00/13] powerpc/kfence: Improve kfence support Ritesh Harjani (IBM)
` (10 preceding siblings ...)
2024-10-15 1:33 ` [RFC RESEND v2 11/13] book3s64/radix: Refactoring common kfence related functions Ritesh Harjani (IBM)
@ 2024-10-15 1:33 ` Ritesh Harjani (IBM)
2024-10-15 1:33 ` [RFC RESEND v2 13/13] book3s64/hash: Early detect debug_pagealloc size requirement Ritesh Harjani (IBM)
12 siblings, 0 replies; 16+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-15 1:33 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
Enable kfence on book3s64 hash only when early init is enabled.
This is because kfence could cause the kernel linear map to be mapped at
PAGE_SIZE level instead of 16M (which I guess we don't want).
Also, currently there is no way to:
1. Make multiple page size entries for the SLB used for the kernel
linear map.
2. Easily get the hash slot details after the page table mapping for the
kernel linear map is set up. So even if kfence allocates the pool in
late init, we won't be able to get the hash slot details for the
kfence linear map.
Thus this patch disables kfence on hash if kfence early init is not
enabled.
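As a usage note (assuming the generic kfence boot parameter behaviour):
kfence_early_init is derived from kfence.sample_interval, so for example
a kernel built with CONFIG_KFENCE_SAMPLE_INTERVAL=0 but booted with

	kfence.sample_interval=100

still gets early init and can use kfence on hash, whereas a kernel that
only enables kfence later (after the hash table setup) now has kfence
disabled on hash by this patch.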
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 53e6f3a524eb..b6da25719e37 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -410,6 +410,8 @@ static phys_addr_t kfence_pool;
static inline void hash_kfence_alloc_pool(void)
{
+ if (!kfence_early_init_enabled())
+ goto err;
// allocate linear map for kfence within RMA region
linear_map_kf_hash_count = KFENCE_POOL_SIZE >> PAGE_SHIFT;
@@ -1074,7 +1076,7 @@ static void __init htab_init_page_sizes(void)
bool aligned = true;
init_hpte_page_sizes();
- if (!debug_pagealloc_enabled_or_kfence()) {
+ if (!debug_pagealloc_enabled() && !kfence_early_init_enabled()) {
/*
* Pick a size for the linear mapping. Currently, we only
* support 16M, 1M and 4K which is the default
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC RESEND v2 13/13] book3s64/hash: Early detect debug_pagealloc size requirement
2024-10-15 1:33 [RFC RESEND v2 00/13] powerpc/kfence: Improve kfence support Ritesh Harjani (IBM)
` (11 preceding siblings ...)
2024-10-15 1:33 ` [RFC RESEND v2 12/13] book3s64/hash: Disable kfence if not early init Ritesh Harjani (IBM)
@ 2024-10-15 1:33 ` Ritesh Harjani (IBM)
12 siblings, 0 replies; 16+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-15 1:33 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
Add a hash_supports_debug_pagealloc() helper to detect whether
debug_pagealloc can be supported on hash or not. This checks both that
the debug_pagealloc config is enabled and that the linear map fits
within a region of size ppc64_rma_size / 4.
This can then be used early during htab_init_page_sizes() to decide the
linear map page size if hash supports either debug_pagealloc or
kfence.
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index b6da25719e37..3ffc98b3deb1 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -329,25 +329,26 @@ static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long idx,
}
#endif
+static inline bool hash_supports_debug_pagealloc(void)
+{
+ unsigned long max_hash_count = ppc64_rma_size / 4;
+ unsigned long linear_map_count = memblock_end_of_DRAM() >> PAGE_SHIFT;
+
+ if (!debug_pagealloc_enabled() || linear_map_count > max_hash_count)
+ return false;
+ return true;
+}
+
#ifdef CONFIG_DEBUG_PAGEALLOC
static u8 *linear_map_hash_slots;
static unsigned long linear_map_hash_count;
static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
static void hash_debug_pagealloc_alloc_slots(void)
{
- unsigned long max_hash_count = ppc64_rma_size / 4;
-
- if (!debug_pagealloc_enabled())
- return;
- linear_map_hash_count = memblock_end_of_DRAM() >> PAGE_SHIFT;
- if (unlikely(linear_map_hash_count > max_hash_count)) {
- pr_info("linear map size (%llu) greater than 4 times RMA region (%llu). Disabling debug_pagealloc\n",
- ((u64)linear_map_hash_count << PAGE_SHIFT),
- ppc64_rma_size);
- linear_map_hash_count = 0;
+ if (!hash_supports_debug_pagealloc())
return;
- }
+ linear_map_hash_count = memblock_end_of_DRAM() >> PAGE_SHIFT;
linear_map_hash_slots = memblock_alloc_try_nid(
linear_map_hash_count, 1, MEMBLOCK_LOW_LIMIT,
ppc64_rma_size, NUMA_NO_NODE);
@@ -1076,7 +1077,7 @@ static void __init htab_init_page_sizes(void)
bool aligned = true;
init_hpte_page_sizes();
- if (!debug_pagealloc_enabled() && !kfence_early_init_enabled()) {
+ if (!hash_supports_debug_pagealloc() && !kfence_early_init_enabled()) {
/*
* Pick a size for the linear mapping. Currently, we only
* support 16M, 1M and 4K which is the default
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread