* [PATCH] mm: vmalloc: use VMALLOC_EARLY_START boundary for early vmap area
@ 2025-07-22 4:08 Jia He
2025-07-22 6:48 ` Dev Jain
2025-07-22 19:39 ` Uladzislau Rezki
0 siblings, 2 replies; 5+ messages in thread
From: Jia He @ 2025-07-22 4:08 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Andrew Morton, Uladzislau Rezki
Cc: Anshuman Khandual, Ryan Roberts, Peter Xu, Joey Gouly,
Yicong Yang, Matthew Wilcox (Oracle), linux-arm-kernel,
linux-kernel, linux-mm, Jia He
When VMALLOC_START is redefined to a new boundary, most subsystems
continue to function correctly. However, vm_area_register_early()
assumes the use of the global _vmlist_ structure before vmalloc_init()
is invoked. This assumption can lead to issues during early boot.
See the call trace below:
  start_kernel()
    setup_per_cpu_areas()
      pcpu_page_first_chunk()
        vm_area_register_early()
    mm_core_init()
      vmalloc_init()
The early vm areas will be added to vmlist at declare_kernel_vmas()
->declare_vma():
ffff800080010000 T _stext
ffff800080da0000 D __start_rodata
ffff800081890000 T __inittext_begin
ffff800081980000 D __initdata_begin
ffff800081ee0000 D _data
The starting address of the early areas is tied to the *old* VMALLOC_START
(i.e. 0xffff800080000000 on an arm64 N2 server).
If VMALLOC_START is redefined, it can disrupt early VM area allocation,
particularly in paths like pcpu_page_first_chunk()->vm_area_register_early().
To address this potential risk on arm64, introduce a new boundary,
VMALLOC_EARLY_START, to avoid boot issues when VMALLOC_START is
occasionally redefined.
Signed-off-by: Jia He <justin.he@arm.com>
---
arch/arm64/include/asm/pgtable.h | 2 ++
mm/vmalloc.c | 6 +++++-
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 192d86e1cc76..91031912a906 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -18,9 +18,11 @@
* VMALLOC range.
*
* VMALLOC_START: beginning of the kernel vmalloc space
+ * VMALLOC_EARLY_START: early vm area before vmalloc_init()
* VMALLOC_END: extends to the available space below vmemmap
*/
#define VMALLOC_START (MODULES_END)
+#define VMALLOC_EARLY_START (MODULES_END)
#if VA_BITS == VA_BITS_MIN
#define VMALLOC_END (VMEMMAP_START - SZ_8M)
#else
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6dbcdceecae1..86ab1e99641a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -50,6 +50,10 @@
#include "internal.h"
#include "pgalloc-track.h"
+#ifndef VMALLOC_EARLY_START
+#define VMALLOC_EARLY_START VMALLOC_START
+#endif
+
#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
static unsigned int __ro_after_init ioremap_max_page_shift = BITS_PER_LONG - 1;
@@ -3126,7 +3130,7 @@ void __init vm_area_add_early(struct vm_struct *vm)
*/
void __init vm_area_register_early(struct vm_struct *vm, size_t align)
{
- unsigned long addr = ALIGN(VMALLOC_START, align);
+ unsigned long addr = ALIGN(VMALLOC_EARLY_START, align);
struct vm_struct *cur, **p;
BUG_ON(vmap_initialized);
--
2.34.1
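The placement behaviour the description refers to can be sketched in user space. This is a simplified model, not the kernel code: struct early_area, early_list, add_early() and register_early() are hypothetical stand-ins for struct vm_struct, vmlist, vm_area_add_early() and vm_area_register_early(), the VMALLOC_END value is made up, and the walk is a plain first-fit search that may differ in detail from mm/vmalloc.c. It only shows that the first early address is derived from the aligned base macro, with the same #ifndef fallback the patch adds.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones come from the arch headers. */
#define VMALLOC_START UINT64_C(0xffff800080000000)
#define VMALLOC_END   UINT64_C(0xffff800100000000)	/* hypothetical end */

/* Same fallback idea as the mm/vmalloc.c hunk in the patch. */
#ifndef VMALLOC_EARLY_START
#define VMALLOC_EARLY_START VMALLOC_START
#endif

/* Round x up to a power-of-two alignment a. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

struct early_area {			/* stand-in for struct vm_struct */
	uint64_t addr;
	uint64_t size;
	struct early_area *next;
};

static struct early_area *early_list;	/* stand-in for vmlist, sorted by addr */

/* Insert an area whose address is already fixed, keeping the list sorted. */
static void add_early(struct early_area *vm)
{
	struct early_area **p = &early_list;

	while (*p && (*p)->addr < vm->addr)
		p = &(*p)->next;
	vm->next = *p;
	*p = vm;
}

/*
 * First-fit placement: start at the aligned base and take the first gap
 * that can hold vm->size. Returns 0 on success, -1 if nothing fits.
 */
static int register_early(struct early_area *vm, uint64_t align)
{
	uint64_t addr = ALIGN_UP(VMALLOC_EARLY_START, align);
	struct early_area *cur, **p;

	for (p = &early_list; (cur = *p) != NULL; p = &cur->next) {
		/* Gap between the candidate address and this area is big enough. */
		if (cur->addr >= addr && cur->addr - addr >= vm->size)
			break;
		/* Otherwise move the candidate past this area if it overlaps. */
		if (cur->addr + cur->size > addr)
			addr = ALIGN_UP(cur->addr + cur->size, align);
	}
	if (addr > VMALLOC_END - vm->size)
		return -1;
	vm->addr = addr;
	vm->next = *p;
	*p = vm;
	return 0;
}
```

In this model, with one pre-registered area covering 0xffff800080010000 to 0xffff800081ee0000 (the _stext to _data span listed above), a 64K request lands in the gap at the base address and a following 128K request lands right after the image area. Building with -DVMALLOC_EARLY_START=... moves only the early base, which is the decoupling the patch proposes.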
* Re: [PATCH] mm: vmalloc: use VMALLOC_EARLY_START boundary for early vmap area
2025-07-22 4:08 [PATCH] mm: vmalloc: use VMALLOC_EARLY_START boundary for early vmap area Jia He
@ 2025-07-22 6:48 ` Dev Jain
2025-07-28 6:19 ` Justin He
2025-07-22 19:39 ` Uladzislau Rezki
1 sibling, 1 reply; 5+ messages in thread
From: Dev Jain @ 2025-07-22 6:48 UTC (permalink / raw)
To: Jia He, Catalin Marinas, Will Deacon, Andrew Morton,
Uladzislau Rezki
Cc: Anshuman Khandual, Ryan Roberts, Peter Xu, Joey Gouly,
Yicong Yang, Matthew Wilcox (Oracle), linux-arm-kernel,
linux-kernel, linux-mm
On 22/07/25 9:38 am, Jia He wrote:
> When VMALLOC_START is redefined to a new boundary, most subsystems
> continue to function correctly. However, vm_area_register_early()
> assumes the use of the global _vmlist_ structure before vmalloc_init()
> is invoked. This assumption can lead to issues during early boot.
>
> See the calltrace as follows:
> start_kernel()
> setup_per_cpu_areas()
> pcpu_page_first_chunk()
> vm_area_register_early()
> mm_core_init()
> vmalloc_init()
>
> The early vm areas will be added to vmlist at declare_kernel_vmas()
> ->declare_vma():
> ffff800080010000 T _stext
> ffff800080da0000 D __start_rodata
> ffff800081890000 T __inittext_begin
> ffff800081980000 D __initdata_begin
> ffff800081ee0000 D _data
> The starting address of the early areas is tied to the *old* VMALLOC_START
> (i.e. 0xffff800080000000 on an arm64 N2 server).
>
> If VMALLOC_START is redefined, it can disrupt early VM area allocation,
> particularly in paths like pcpu_page_first_chunk()->vm_area_register_early().
>
> To address this potential risk on arm64, introduce a new boundary,
> VMALLOC_EARLY_START, to avoid boot issues when VMALLOC_START is
> occasionally redefined.
Sorry but I am unable to understand the point of the patch. If a particular
value of VMALLOC_START causes a problem because the vma declarations of the
kernel are tied to that value, surely we should be reasoning about what was
wrong about the new value, and not circumventing the actual problem
by introducing VMALLOC_EARLY_START?
Also by your patch description I don't think you have run into a reproducible
boot issue, so this patch is basically adding dead code because both macros
are defined to MODULES_END?
>
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
> arch/arm64/include/asm/pgtable.h | 2 ++
> mm/vmalloc.c | 6 +++++-
> 2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 192d86e1cc76..91031912a906 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -18,9 +18,11 @@
> * VMALLOC range.
> *
> * VMALLOC_START: beginning of the kernel vmalloc space
> + * VMALLOC_EARLY_START: early vm area before vmalloc_init()
> * VMALLOC_END: extends to the available space below vmemmap
> */
> #define VMALLOC_START (MODULES_END)
> +#define VMALLOC_EARLY_START (MODULES_END)
> #if VA_BITS == VA_BITS_MIN
> #define VMALLOC_END (VMEMMAP_START - SZ_8M)
> #else
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 6dbcdceecae1..86ab1e99641a 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -50,6 +50,10 @@
> #include "internal.h"
> #include "pgalloc-track.h"
>
> +#ifndef VMALLOC_EARLY_START
> +#define VMALLOC_EARLY_START VMALLOC_START
> +#endif
> +
> #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> static unsigned int __ro_after_init ioremap_max_page_shift = BITS_PER_LONG - 1;
>
> @@ -3126,7 +3130,7 @@ void __init vm_area_add_early(struct vm_struct *vm)
> */
> void __init vm_area_register_early(struct vm_struct *vm, size_t align)
> {
> - unsigned long addr = ALIGN(VMALLOC_START, align);
> + unsigned long addr = ALIGN(VMALLOC_EARLY_START, align);
> struct vm_struct *cur, **p;
>
> BUG_ON(vmap_initialized);
* Re: [PATCH] mm: vmalloc: use VMALLOC_EARLY_START boundary for early vmap area
2025-07-22 4:08 [PATCH] mm: vmalloc: use VMALLOC_EARLY_START boundary for early vmap area Jia He
2025-07-22 6:48 ` Dev Jain
@ 2025-07-22 19:39 ` Uladzislau Rezki
1 sibling, 0 replies; 5+ messages in thread
From: Uladzislau Rezki @ 2025-07-22 19:39 UTC (permalink / raw)
To: Jia He
Cc: Catalin Marinas, Will Deacon, Andrew Morton, Uladzislau Rezki,
Anshuman Khandual, Ryan Roberts, Peter Xu, Joey Gouly,
Yicong Yang, Matthew Wilcox (Oracle), linux-arm-kernel,
linux-kernel, linux-mm
On Tue, Jul 22, 2025 at 04:08:50AM +0000, Jia He wrote:
> When VMALLOC_START is redefined to a new boundary, most subsystems
> continue to function correctly. However, vm_area_register_early()
> assumes the use of the global _vmlist_ structure before vmalloc_init()
> is invoked. This assumption can lead to issues during early boot.
>
But we just should not redefine the macro. If there are such places,
those should be fixed, IMO.
--
Uladzislau Rezki
* RE: [PATCH] mm: vmalloc: use VMALLOC_EARLY_START boundary for early vmap area
2025-07-22 6:48 ` Dev Jain
@ 2025-07-28 6:19 ` Justin He
2025-07-29 16:26 ` Dev Jain
0 siblings, 1 reply; 5+ messages in thread
From: Justin He @ 2025-07-28 6:19 UTC (permalink / raw)
To: Dev Jain, Catalin Marinas, Will Deacon, Andrew Morton,
Uladzislau Rezki
Cc: Anshuman Khandual, Ryan Roberts, Peter Xu, Joey Gouly,
Yicong Yang, Matthew Wilcox (Oracle),
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Hi Dev,
> -----Original Message-----
> From: Dev Jain <Dev.Jain@arm.com>
> Sent: Tuesday, July 22, 2025 2:48 PM
> To: Justin He <Justin.He@arm.com>; Catalin Marinas
> <Catalin.Marinas@arm.com>; Will Deacon <will@kernel.org>; Andrew
> Morton <akpm@linux-foundation.org>; Uladzislau Rezki <urezki@gmail.com>
> Cc: Anshuman Khandual <Anshuman.Khandual@arm.com>; Ryan Roberts
> <Ryan.Roberts@arm.com>; Peter Xu <peterx@redhat.com>; Joey Gouly
> <Joey.Gouly@arm.com>; Yicong Yang <yangyicong@hisilicon.com>; Matthew
> Wilcox (Oracle) <willy@infradead.org>; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux-mm@kvack.org
> Subject: Re: [PATCH] mm: vmalloc: use VMALLOC_EARLY_START boundary for
> early vmap area
>
>
> On 22/07/25 9:38 am, Jia He wrote:
> > When VMALLOC_START is redefined to a new boundary, most subsystems
> > continue to function correctly. However, vm_area_register_early()
> > assumes the use of the global _vmlist_ structure before vmalloc_init()
> > is invoked. This assumption can lead to issues during early boot.
> >
> > See the calltrace as follows:
> > start_kernel()
> > setup_per_cpu_areas()
> > pcpu_page_first_chunk()
> > vm_area_register_early()
> > mm_core_init()
> > vmalloc_init()
> >
> > The early vm areas will be added to vmlist at declare_kernel_vmas()
> > ->declare_vma():
> > ffff800080010000 T _stext
> > ffff800080da0000 D __start_rodata
> > ffff800081890000 T __inittext_begin
> > ffff800081980000 D __initdata_begin
> > ffff800081ee0000 D _data
> > The starting address of the early areas is tied to the *old*
> > VMALLOC_START (i.e. 0xffff800080000000 on an arm64 N2 server).
> >
> > If VMALLOC_START is redefined, it can disrupt early VM area
> > allocation, particularly in paths like
> > pcpu_page_first_chunk()->vm_area_register_early().
> >
> > To address this potential risk on arm64, introduce a new boundary,
> > VMALLOC_EARLY_START, to avoid boot issues when VMALLOC_START is
> > occasionally redefined.
>
> Sorry but I am unable to understand the point of the patch. If a particular
> value of VMALLOC_START causes a problem because the vma declarations of
> the kernel are tied to that value, surely we should be reasoning about what
> was wrong about the new value, and not circumventing the actual problem by
> introducing VMALLOC_EARLY_START?
>
> Also by your patch description I don't think you have run into a reproducible
> boot issue, so this patch is basically adding dead code because both macros
> are defined to MODULES_END?
>
Please try this *debugging*-purpose patch, which can trigger the boot panic
more easily (I can always reproduce the boot panic on an ARM64 server):
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 192d86e1cc76..2be8db8d0205 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -20,7 +20,8 @@
* VMALLOC_START: beginning of the kernel vmalloc space
* VMALLOC_END: extends to the available space below vmemmap
*/
-#define VMALLOC_START (MODULES_END)
+//#define VMALLOC_START (MODULES_END)
+#define VMALLOC_START ((MODULES_END & PGDIR_MASK) + PGDIR_SIZE)
#if VA_BITS == VA_BITS_MIN
#define VMALLOC_END (VMEMMAP_START - SZ_8M)
#else
diff --git a/mm/percpu.c b/mm/percpu.c
index b35494c8ede2..53d187172b5e 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -3051,7 +3051,7 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
max_distance += ai->unit_size * ai->groups[highest_group].nr_units;
/* warn if maximum distance is further than 75% of vmalloc space */
- if (max_distance > VMALLOC_TOTAL * 3 / 4) {
+ if (1 || max_distance > VMALLOC_TOTAL * 3 / 4) {
pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
max_distance, VMALLOC_TOTAL);
#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
---
Cheers,
Justin He(Jia He)
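
For reference, the effect of the debug redefinition can be worked out with plain arithmetic. This is a sketch under stated assumptions: PGDIR_SHIFT is taken as 39 (48-bit VA, 4K pages, 4 levels), which is not stated in the thread, and MODULES_END is taken as the 0xffff800080000000 value quoted earlier for the N2 server. OLD_VMALLOC_START, NEW_VMALLOC_START and vmalloc_start_shift() are names made up for the example.

```c
#include <stdint.h>

/* Assumption: one PGD entry spans 2^39 bytes (48-bit VA, 4K pages). */
#define PGDIR_SHIFT 39
#define PGDIR_SIZE  (UINT64_C(1) << PGDIR_SHIFT)
#define PGDIR_MASK  (~(PGDIR_SIZE - 1))

/* Value quoted in the thread as the old VMALLOC_START (== MODULES_END). */
#define MODULES_END UINT64_C(0xffff800080000000)

#define OLD_VMALLOC_START (MODULES_END)
/* Same expression as the debug hunk: round down to a PGD boundary, add one entry. */
#define NEW_VMALLOC_START ((MODULES_END & PGDIR_MASK) + PGDIR_SIZE)

/* Distance by which the debug patch moves VMALLOC_START upward. */
static uint64_t vmalloc_start_shift(void)
{
	return NEW_VMALLOC_START - OLD_VMALLOC_START;
}
```

Under these assumptions NEW_VMALLOC_START evaluates to 0xffff808000000000, which is 0x7f80000000 bytes (510 GiB) above the old value and therefore above the kernel image areas (_stext at 0xffff800080010000 and so on) that were already declared relative to the old boundary.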
* Re: [PATCH] mm: vmalloc: use VMALLOC_EARLY_START boundary for early vmap area
2025-07-28 6:19 ` Justin He
@ 2025-07-29 16:26 ` Dev Jain
0 siblings, 0 replies; 5+ messages in thread
From: Dev Jain @ 2025-07-29 16:26 UTC (permalink / raw)
To: Justin He, Catalin Marinas, Will Deacon, Andrew Morton,
Uladzislau Rezki
Cc: Anshuman Khandual, Ryan Roberts, Peter Xu, Joey Gouly,
Yicong Yang, Matthew Wilcox (Oracle),
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
On 28/07/25 11:49 am, Justin He wrote:
> Hi Dev,
>
>> -----Original Message-----
>> From: Dev Jain <Dev.Jain@arm.com>
>> Sent: Tuesday, July 22, 2025 2:48 PM
>> To: Justin He <Justin.He@arm.com>; Catalin Marinas
>> <Catalin.Marinas@arm.com>; Will Deacon <will@kernel.org>; Andrew
>> Morton <akpm@linux-foundation.org>; Uladzislau Rezki <urezki@gmail.com>
>> Cc: Anshuman Khandual <Anshuman.Khandual@arm.com>; Ryan Roberts
>> <Ryan.Roberts@arm.com>; Peter Xu <peterx@redhat.com>; Joey Gouly
>> <Joey.Gouly@arm.com>; Yicong Yang <yangyicong@hisilicon.com>; Matthew
>> Wilcox (Oracle) <willy@infradead.org>; linux-arm-kernel@lists.infradead.org;
>> linux-kernel@vger.kernel.org; linux-mm@kvack.org
>> Subject: Re: [PATCH] mm: vmalloc: use VMALLOC_EARLY_START boundary for
>> early vmap area
>>
>>
>> On 22/07/25 9:38 am, Jia He wrote:
>>> When VMALLOC_START is redefined to a new boundary, most subsystems
>>> continue to function correctly. However, vm_area_register_early()
>>> assumes the use of the global _vmlist_ structure before vmalloc_init()
>>> is invoked. This assumption can lead to issues during early boot.
>>>
>>> See the calltrace as follows:
>>> start_kernel()
>>> setup_per_cpu_areas()
>>> pcpu_page_first_chunk()
>>> vm_area_register_early()
>>> mm_core_init()
>>> vmalloc_init()
>>>
>>> The early vm areas will be added to vmlist at declare_kernel_vmas()
>>> ->declare_vma():
>>> ffff800080010000 T _stext
>>> ffff800080da0000 D __start_rodata
>>> ffff800081890000 T __inittext_begin
>>> ffff800081980000 D __initdata_begin
>>> ffff800081ee0000 D _data
>>> The starting address of the early areas is tied to the *old*
>>> VMALLOC_START (i.e. 0xffff800080000000 on an arm64 N2 server).
>>>
>>> If VMALLOC_START is redefined, it can disrupt early VM area
>>> allocation, particularly in paths like
>>> pcpu_page_first_chunk()->vm_area_register_early().
>>>
>>> To address this potential risk on arm64, introduce a new boundary,
>>> VMALLOC_EARLY_START, to avoid boot issues when VMALLOC_START is
>>> occasionally redefined.
>> Sorry but I am unable to understand the point of the patch. If a particular
>> value of VMALLOC_START causes a problem because the vma declarations of
>> the kernel are tied to that value, surely we should be reasoning about what
>> was wrong about the new value, and not circumventing the actual problem by
>> introducing VMALLOC_EARLY_START?
>>
>> Also by your patch description I don't think you have run into a reproducible
>> boot issue, so this patch is basically adding dead code because both macros
>> are defined to MODULES_END?
>>
> Please try this *debugging*-purpose patch, which can trigger the boot panic
> more easily (I can always reproduce the boot panic on an ARM64 server):
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 192d86e1cc76..2be8db8d0205 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -20,7 +20,8 @@
> * VMALLOC_START: beginning of the kernel vmalloc space
> * VMALLOC_END: extends to the available space below vmemmap
> */
> -#define VMALLOC_START (MODULES_END)
> +//#define VMALLOC_START (MODULES_END)
> +#define VMALLOC_START ((MODULES_END & PGDIR_MASK) + PGDIR_SIZE)
> #if VA_BITS == VA_BITS_MIN
> #define VMALLOC_END (VMEMMAP_START - SZ_8M)
> #else
> diff --git a/mm/percpu.c b/mm/percpu.c
> index b35494c8ede2..53d187172b5e 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -3051,7 +3051,7 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
> max_distance += ai->unit_size * ai->groups[highest_group].nr_units;
>
> /* warn if maximum distance is further than 75% of vmalloc space */
> - if (max_distance > VMALLOC_TOTAL * 3 / 4) {
> + if (1 || max_distance > VMALLOC_TOTAL * 3 / 4) {
This will always return true - which leads to returning -EINVAL and then
panicking in setup_per_cpu_areas(). Probably you made this change by mistake
and are trying to say that the redefinition above panics?
> pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
> max_distance, VMALLOC_TOTAL);
> #ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
>
>
> ---
> Cheers,
> Justin He(Jia He)
>
>
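
The point about the modified condition can be illustrated in isolation. This is a minimal sketch, not the percpu code: distance_check_original() and distance_check_debug() are hypothetical helpers that only mirror the two forms of the if condition from the hunk, to show that the "1 ||" form no longer depends on max_distance at all.

```c
#include <stdbool.h>

/* Original form of the condition: true only above 75% of the vmalloc space. */
static bool distance_check_original(unsigned long max_distance,
				    unsigned long vmalloc_total)
{
	return max_distance > vmalloc_total * 3 / 4;
}

/* Debug form from the hunk: "1 || x" short-circuits, so x is never evaluated. */
static bool distance_check_debug(unsigned long max_distance,
				 unsigned long vmalloc_total)
{
	return 1 || max_distance > vmalloc_total * 3 / 4;
}
```

With the debug form the warning branch is taken for every input, so whatever follows in that branch (the -EINVAL return described above, when CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK is set) happens regardless of the VMALLOC_START value being tested.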