* [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization
@ 2026-03-19 8:31 Hao Ge
  2026-03-19 22:28 ` Andrew Morton
  2026-03-20 3:14 ` Andrew Morton
  0 siblings, 2 replies; 21+ messages in thread

From: Hao Ge @ 2026-03-19 8:31 UTC
To: Suren Baghdasaryan, Kent Overstreet, Andrew Morton
Cc: linux-mm, linux-kernel, Hao Ge

Due to initialization ordering, page_ext is allocated and initialized
relatively late during boot. Some pages have already been allocated
and freed before page_ext becomes available, leaving their codetag
uninitialized.

A clear example is in init_section_page_ext(): alloc_page_ext() calls
kmemleak_alloc(). If the slab cache has no free objects, it falls back
to the buddy allocator to allocate memory. However, at this point page_ext
is not yet fully initialized, so these newly allocated pages have no
codetag set. These pages may later be reclaimed by KASAN, which causes
the warning to trigger when they are freed because their codetag ref is
still empty.

Use a global array to track pages allocated before page_ext is fully
initialized, similar to how kmemleak tracks early allocations.
When page_ext initialization completes, set their codetag
to empty to avoid warnings when they are freed later.

The following warning may be noticed:

[ 9.582133] ------------[ cut here ]------------
[ 9.582137] alloc_tag was not set
[ 9.582139] WARNING: ./include/linux/alloc_tag.h:164 at __pgalloc_tag_sub+0x40f/0x550, CPU#5: systemd/1
[ 9.582190] CPU: 5 UID: 0 PID: 1 Comm: systemd Not tainted 7.0.0-rc4 #1 PREEMPT(lazy)
[ 9.582192] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 9.582194] RIP: 0010:__pgalloc_tag_sub+0x40f/0x550
[ 9.582196] Code: 00 00 4c 29 e5 48 8b 05 1f 88 56 05 48 8d 4c ad 00 48 8d 2c c8 e9 87 fd ff ff 0f 0b 0f 0b e9 f3 fe ff ff 48 8d 3d 61 2f ed 03 <67> 48 0f b9 3a e9 b3 fd ff ff 0f 0b eb e4 e8 5e cd 14 02 4c 89 c7
[ 9.582197] RSP: 0018:ffffc9000001f940 EFLAGS: 00010246
[ 9.582200] RAX: dffffc0000000000 RBX: 1ffff92000003f2b RCX: 1ffff110200d806c
[ 9.582201] RDX: ffff8881006c0360 RSI: 0000000000000004 RDI: ffffffff9bc7b460
[ 9.582202] RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff3a62324
[ 9.582203] R10: ffffffff9d311923 R11: 0000000000000000 R12: ffffea0004001b00
[ 9.582204] R13: 0000000000002000 R14: ffffea0000000000 R15: ffff8881006c0360
[ 9.582206] FS: 00007ffbbcf2d940(0000) GS:ffff888450479000(0000) knlGS:0000000000000000
[ 9.582208] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.582210] CR2: 000055ee3aa260d0 CR3: 0000000148b67005 CR4: 0000000000770ef0
[ 9.582211] PKRU: 55555554
[ 9.582212] Call Trace:
[ 9.582213] <TASK>
[ 9.582214] ? __pfx___pgalloc_tag_sub+0x10/0x10
[ 9.582216] ? check_bytes_and_report+0x68/0x140
[ 9.582219] __free_frozen_pages+0x2e4/0x1150
[ 9.582221] ? __free_slab+0xc2/0x2b0
[ 9.582224] qlist_free_all+0x4c/0xf0
[ 9.582227] kasan_quarantine_reduce+0x15d/0x180
[ 9.582229] __kasan_slab_alloc+0x69/0x90
[ 9.582232] kmem_cache_alloc_noprof+0x14a/0x500
[ 9.582234] do_getname+0x96/0x310
[ 9.582237] do_readlinkat+0x91/0x2f0
[ 9.582239] ? __pfx_do_readlinkat+0x10/0x10
[ 9.582240] ? get_random_bytes_user+0x1df/0x2c0
[ 9.582244] __x64_sys_readlinkat+0x96/0x100
[ 9.582246] do_syscall_64+0xce/0x650
[ 9.582250] ? __x64_sys_getrandom+0x13a/0x1e0
[ 9.582252] ? __pfx___x64_sys_getrandom+0x10/0x10
[ 9.582254] ? do_syscall_64+0x114/0x650
[ 9.582255] ? ksys_read+0xfc/0x1d0
[ 9.582258] ? __pfx_ksys_read+0x10/0x10
[ 9.582260] ? do_syscall_64+0x114/0x650
[ 9.582262] ? do_syscall_64+0x114/0x650
[ 9.582264] ? __pfx_fput_close_sync+0x10/0x10
[ 9.582266] ? file_close_fd_locked+0x178/0x2a0
[ 9.582268] ? __x64_sys_faccessat2+0x96/0x100
[ 9.582269] ? __x64_sys_close+0x7d/0xd0
[ 9.582271] ? do_syscall_64+0x114/0x650
[ 9.582273] ? do_syscall_64+0x114/0x650
[ 9.582275] ? clear_bhb_loop+0x50/0xa0
[ 9.582277] ? clear_bhb_loop+0x50/0xa0
[ 9.582279] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 9.582280] RIP: 0033:0x7ffbbda345ee
[ 9.582282] Code: 0f 1f 40 00 48 8b 15 29 38 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 0f 1f 40 00 f3 0f 1e fa 49 89 ca b8 0b 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fa 37 0d 00 f7 d8 64 89 01 48
[ 9.582284] RSP: 002b:00007ffe2ad8de58 EFLAGS: 00000202 ORIG_RAX: 000000000000010b
[ 9.582286] RAX: ffffffffffffffda RBX: 000055ee3aa25570 RCX: 00007ffbbda345ee
[ 9.582287] RDX: 000055ee3aa25570 RSI: 00007ffe2ad8dee0 RDI: 00000000ffffff9c
[ 9.582288] RBP: 0000000000001000 R08: 0000000000000003 R09: 0000000000001001
[ 9.582289] R10: 0000000000001000 R11: 0000000000000202 R12: 0000000000000033
[ 9.582290] R13: 00007ffe2ad8dee0 R14: 00000000ffffff9c R15: 00007ffe2ad8deb0
[ 9.582292] </TASK>
[ 9.582293] ---[ end trace 0000000000000000 ]---

Signed-off-by: Hao Ge <hao.ge@linux.dev>
---
 include/linux/alloc_tag.h |  3 ++
 lib/alloc_tag.c           | 80 +++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c           |  7 ++++
 3 files changed, 90 insertions(+)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index d40ac39bfbe8..941bb4e40d3f 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref)
 
 #ifdef CONFIG_MEM_ALLOC_PROFILING
 
+bool mem_profiling_is_available(void);
+void alloc_tag_add_early_pfn(unsigned long pfn);
+
 #define ALLOC_TAG_SECTION_NAME "alloc_tags"
 
 struct codetag_bytes {
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 58991ab09d84..a5bf4e72c154 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -6,6 +6,7 @@
 #include <linux/kallsyms.h>
 #include <linux/module.h>
 #include <linux/page_ext.h>
+#include <linux/pgalloc_tag.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_buf.h>
 #include <linux/seq_file.h>
@@ -26,6 +27,82 @@ static bool mem_profiling_support;
 
 static struct codetag_type *alloc_tag_cttype;
 
+/*
+ * State of the alloc_tag
+ *
+ * This is used to describe the states of the alloc_tag during bootup.
+ *
+ * When we need to allocate page_ext to store codetag, we face an
+ * initialization timing problem:
+ *
+ * Due to initialization order, pages may be allocated via buddy system
+ * before page_ext is fully allocated and initialized. Although these
+ * pages call the allocation hooks, the codetag will not be set because
+ * page_ext is not yet available.
+ *
+ * When these pages are later free to the buddy system, it triggers
+ * warnings because their codetag is actually empty if
+ * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled.
+ *
+ * Additionally, in this situation, we cannot record detailed allocation
+ * information for these pages.
+ */
+enum mem_profiling_state {
+	DOWN, /* No mem_profiling functionality yet */
+	UP /* Everything is working */
+};
+
+static enum mem_profiling_state mem_profiling_state = DOWN;
+
+bool mem_profiling_is_available(void)
+{
+	return mem_profiling_state == UP;
+}
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+
+#define EARLY_ALLOC_PFN_MAX 256
+
+static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX];
+static unsigned int early_pfn_count;
+static DEFINE_SPINLOCK(early_pfn_lock);
+
+void alloc_tag_add_early_pfn(unsigned long pfn)
+{
+	unsigned long flags;
+
+	if (mem_profiling_state != DOWN)
+		return;
+
+	spin_lock_irqsave(&early_pfn_lock, flags);
+	if (early_pfn_count >= EARLY_ALLOC_PFN_MAX) {
+		spin_unlock_irqrestore(&early_pfn_lock, flags);
+		return;
+	}
+	early_pfns[early_pfn_count++] = pfn;
+	spin_unlock_irqrestore(&early_pfn_lock, flags);
+}
+
+static void __init clear_early_alloc_pfn_tag_refs(void)
+{
+	unsigned int i;
+
+	for (i = 0; i < early_pfn_count; i++) {
+		unsigned long pfn = early_pfns[i];
+
+		if (pfn_valid(pfn)) {
+			struct page *page = pfn_to_page(pfn);
+
+			clear_page_tag_ref(page);
+		}
+	}
+	early_pfn_count = 0;
+}
+#else /* !CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+inline void alloc_tag_add_early_pfn(unsigned long pfn) {}
+static inline void clear_early_alloc_pfn_tag_refs(void) {}
+#endif
+
 #ifdef CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU
 DEFINE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
 EXPORT_SYMBOL(_shared_alloc_tag);
@@ -729,6 +806,7 @@ static int __init setup_early_mem_profiling(char *str)
 			compressed = true;
 		}
 		mem_profiling_support = true;
+		mem_profiling_state = UP;
 		pr_info("Memory allocation profiling is enabled %s compression and is turned %s!\n",
 			compressed ? "with" : "without", str_on_off(enable));
 	}
@@ -760,6 +838,8 @@ static __init bool need_page_alloc_tagging(void)
 
 static __init void init_page_alloc_tagging(void)
 {
+	mem_profiling_state = UP;
+	clear_early_alloc_pfn_tag_refs();
 }
 
 struct page_ext_operations page_alloc_tagging_ops = {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2d4b6f1a554e..336281d91d91 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
 		alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
 		update_page_tag_ref(handle, &ref);
 		put_page_tag_ref(handle);
+	} else {
+		/*
+		 * page_ext is not available yet, record the pfn so we can
+		 * clear the tag ref later when page_ext is initialized.
+		 */
+		if (!mem_profiling_is_available())
+			alloc_tag_add_early_pfn(page_to_pfn(page));
 	}
 }
 
-- 
2.25.1

* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization
  2026-03-19 8:31 [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization Hao Ge
@ 2026-03-19 22:28 ` Andrew Morton
  2026-03-19 23:44 ` Suren Baghdasaryan
  2026-03-20 3:14 ` Andrew Morton
  1 sibling, 1 reply; 21+ messages in thread

From: Andrew Morton @ 2026-03-19 22:28 UTC
To: Hao Ge; +Cc: Suren Baghdasaryan, Kent Overstreet, linux-mm, linux-kernel

On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote:

> Due to initialization ordering, page_ext is allocated and initialized
> relatively late during boot. Some pages have already been allocated
> and freed before page_ext becomes available, leaving their codetag
> uninitialized.
>
> A clear example is in init_section_page_ext(): alloc_page_ext() calls
> kmemleak_alloc(). If the slab cache has no free objects, it falls back
> to the buddy allocator to allocate memory. However, at this point page_ext
> is not yet fully initialized, so these newly allocated pages have no
> codetag set. These pages may later be reclaimed by KASAN,which causes
> the warning to trigger when they are freed because their codetag ref is
> still empty.
>
> Use a global array to track pages allocated before page_ext is fully
> initialized, similar to how kmemleak tracks early allocations.
> When page_ext initialization completes, set their codetag
> to empty to avoid warnings when they are freed later.
>
> ...
>
> --- a/include/linux/alloc_tag.h
> +++ b/include/linux/alloc_tag.h
> @@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref)
>
>  #ifdef CONFIG_MEM_ALLOC_PROFILING
>
> +bool mem_profiling_is_available(void);
> +void alloc_tag_add_early_pfn(unsigned long pfn);
> +
>  #define ALLOC_TAG_SECTION_NAME "alloc_tags"
>
>  struct codetag_bytes {
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index 58991ab09d84..a5bf4e72c154 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -6,6 +6,7 @@
>  #include <linux/kallsyms.h>
>  #include <linux/module.h>
>  #include <linux/page_ext.h>
> +#include <linux/pgalloc_tag.h>
>  #include <linux/proc_fs.h>
>  #include <linux/seq_buf.h>
>  #include <linux/seq_file.h>
> @@ -26,6 +27,82 @@ static bool mem_profiling_support;
>
>  static struct codetag_type *alloc_tag_cttype;
>
> +/*
> + * State of the alloc_tag
> + *
> + * This is used to describe the states of the alloc_tag during bootup.
> + *
> + * When we need to allocate page_ext to store codetag, we face an
> + * initialization timing problem:
> + *
> + * Due to initialization order, pages may be allocated via buddy system
> + * before page_ext is fully allocated and initialized. Although these
> + * pages call the allocation hooks, the codetag will not be set because
> + * page_ext is not yet available.
> + *
> + * When these pages are later free to the buddy system, it triggers
> + * warnings because their codetag is actually empty if
> + * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled.
> + *
> + * Additionally, in this situation, we cannot record detailed allocation
> + * information for these pages.
> + */
> +enum mem_profiling_state {
> +	DOWN, /* No mem_profiling functionality yet */
> +	UP /* Everything is working */
> +};
> +
> +static enum mem_profiling_state mem_profiling_state = DOWN;
> +
> +bool mem_profiling_is_available(void)
> +{
> +	return mem_profiling_state == UP;
> +}
> +
> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
> +
> +#define EARLY_ALLOC_PFN_MAX 256
> +
> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX];

It's unfortunate that this isn't __initdata.

> +static unsigned int early_pfn_count;
> +static DEFINE_SPINLOCK(early_pfn_lock);
> +
>
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
>  		alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
>  		update_page_tag_ref(handle, &ref);
>  		put_page_tag_ref(handle);
> +	} else {
> +		/*
> +		 * page_ext is not available yet, record the pfn so we can
> +		 * clear the tag ref later when page_ext is initialized.
> +		 */
> +		if (!mem_profiling_is_available())
> +			alloc_tag_add_early_pfn(page_to_pfn(page));
>  	}
>  }

All because of this, I believe. Is this fixable?

If we take that `else', we know we're running in __init code, yes? I
don't see how `__init pgalloc_tag_add_early()' could be made to work.
hrm. Something clever, please.

* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization
  2026-03-19 22:28 ` Andrew Morton
@ 2026-03-19 23:44 ` Suren Baghdasaryan
  2026-03-19 23:48 ` Suren Baghdasaryan
  0 siblings, 1 reply; 21+ messages in thread

From: Suren Baghdasaryan @ 2026-03-19 23:44 UTC
To: Andrew Morton; +Cc: Hao Ge, Kent Overstreet, linux-mm, linux-kernel

On Thu, Mar 19, 2026 at 3:28 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote:
>
> > Due to initialization ordering, page_ext is allocated and initialized
> > relatively late during boot. Some pages have already been allocated
> > and freed before page_ext becomes available, leaving their codetag
> > uninitialized.

Hi Hao,
Thanks for the report.
Hmm. So, we are allocating pages before page_ext is initialized...

> >
> > A clear example is in init_section_page_ext(): alloc_page_ext() calls
> > kmemleak_alloc(). If the slab cache has no free objects, it falls back
> > to the buddy allocator to allocate memory. However, at this point page_ext
> > is not yet fully initialized, so these newly allocated pages have no
> > codetag set. These pages may later be reclaimed by KASAN,which causes
> > the warning to trigger when they are freed because their codetag ref is
> > still empty.
> >
> > Use a global array to track pages allocated before page_ext is fully
> > initialized, similar to how kmemleak tracks early allocations.
> > When page_ext initialization completes, set their codetag
> > to empty to avoid warnings when they are freed later.
> >
> > ...
> >
> > --- a/include/linux/alloc_tag.h
> > +++ b/include/linux/alloc_tag.h
> > @@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref)
> >
> >  #ifdef CONFIG_MEM_ALLOC_PROFILING
> >
> > +bool mem_profiling_is_available(void);
> > +void alloc_tag_add_early_pfn(unsigned long pfn);
> > +
> >  #define ALLOC_TAG_SECTION_NAME "alloc_tags"
> >
> >  struct codetag_bytes {
> > diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > index 58991ab09d84..a5bf4e72c154 100644
> > --- a/lib/alloc_tag.c
> > +++ b/lib/alloc_tag.c
> > @@ -6,6 +6,7 @@
> >  #include <linux/kallsyms.h>
> >  #include <linux/module.h>
> >  #include <linux/page_ext.h>
> > +#include <linux/pgalloc_tag.h>
> >  #include <linux/proc_fs.h>
> >  #include <linux/seq_buf.h>
> >  #include <linux/seq_file.h>
> > @@ -26,6 +27,82 @@ static bool mem_profiling_support;
> >
> >  static struct codetag_type *alloc_tag_cttype;
> >
> > +/*
> > + * State of the alloc_tag
> > + *
> > + * This is used to describe the states of the alloc_tag during bootup.
> > + *
> > + * When we need to allocate page_ext to store codetag, we face an
> > + * initialization timing problem:
> > + *
> > + * Due to initialization order, pages may be allocated via buddy system
> > + * before page_ext is fully allocated and initialized. Although these
> > + * pages call the allocation hooks, the codetag will not be set because
> > + * page_ext is not yet available.
> > + *
> > + * When these pages are later free to the buddy system, it triggers
> > + * warnings because their codetag is actually empty if
> > + * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled.
> > + *
> > + * Additionally, in this situation, we cannot record detailed allocation
> > + * information for these pages.
> > + */
> > +enum mem_profiling_state {
> > +	DOWN, /* No mem_profiling functionality yet */
> > +	UP /* Everything is working */
> > +};
> > +
> > +static enum mem_profiling_state mem_profiling_state = DOWN;
> > +
> > +bool mem_profiling_is_available(void)
> > +{
> > +	return mem_profiling_state == UP;
> > +}
> > +
> > +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
> > +
> > +#define EARLY_ALLOC_PFN_MAX 256
> > +
> > +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX];
>
> It's unfortunate that this isn't __initdata.
>
> > +static unsigned int early_pfn_count;
> > +static DEFINE_SPINLOCK(early_pfn_lock);
> > +
> >
> > ...
> >
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
> >  		alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
> >  		update_page_tag_ref(handle, &ref);
> >  		put_page_tag_ref(handle);
> > +	} else {

This branch can be marked as "unlikely".

> > +		/*
> > +		 * page_ext is not available yet, record the pfn so we can
> > +		 * clear the tag ref later when page_ext is initialized.
> > +		 */
> > +		if (!mem_profiling_is_available())
> > +			alloc_tag_add_early_pfn(page_to_pfn(page));
> >  	}
> >  }
>
> All because of this, I believe. Is this fixable?
>
> If we take that `else', we know we're running in __init code, yes? I
> don't see how `__init pgalloc_tag_add_early()' could be made to work.
> hrm. Something clever, please.

We can have a pointer to a function that is initialized to point to
alloc_tag_add_early_pfn, which is defined as __init and uses
early_pfns which now can be defined as __initdata. After
clear_early_alloc_pfn_tag_refs() is done we reset that pointer to
NULL. __pgalloc_tag_add() instead of calling alloc_tag_add_early_pfn()
directly checks that pointer and if it's not NULL then calls the
function that it points to. This way __pgalloc_tag_add() which is not
an __init function will be invoking alloc_tag_add_early_pfn() __init
function only until we are done with initialization. I haven't tried
this but I think that should work. This also eliminates the need for
mem_profiling_state variable since we can use this function pointer
instead.

>

* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization
  2026-03-19 23:44 ` Suren Baghdasaryan
@ 2026-03-19 23:48 ` Suren Baghdasaryan
  2026-03-20 1:57 ` Hao Ge
  0 siblings, 1 reply; 21+ messages in thread

From: Suren Baghdasaryan @ 2026-03-19 23:48 UTC
To: Andrew Morton; +Cc: Hao Ge, Kent Overstreet, linux-mm, linux-kernel

On Thu, Mar 19, 2026 at 4:44 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Mar 19, 2026 at 3:28 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote:
> >
> > > Due to initialization ordering, page_ext is allocated and initialized
> > > relatively late during boot. Some pages have already been allocated
> > > and freed before page_ext becomes available, leaving their codetag
> > > uninitialized.
>
> Hi Hao,
> Thanks for the report.
> Hmm. So, we are allocating pages before page_ext is initialized...
>
> > >
> > > A clear example is in init_section_page_ext(): alloc_page_ext() calls
> > > kmemleak_alloc().

Forgot to ask. The example you are using here is for page_ext
allocation itself. Do you have any other examples where page
allocation happens before page_ext initialization? If that's the only
place, then we might be able to fix this in a simpler way by doing
something special for alloc_page_ext().

> > > If the slab cache has no free objects, it falls back
> > > ...
* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization
  2026-03-19 23:48 ` Suren Baghdasaryan
@ 2026-03-20 1:57 ` Hao Ge
  2026-03-20 2:14 ` Suren Baghdasaryan
  0 siblings, 1 reply; 21+ messages in thread

From: Hao Ge @ 2026-03-20 1:57 UTC
To: Suren Baghdasaryan, Andrew Morton; +Cc: Kent Overstreet, linux-mm, linux-kernel

On 2026/3/20 07:48, Suren Baghdasaryan wrote:
> On Thu, Mar 19, 2026 at 4:44 PM Suren Baghdasaryan <surenb@google.com> wrote:
>> On Thu, Mar 19, 2026 at 3:28 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>>> On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote:
>>>
>>>> Due to initialization ordering, page_ext is allocated and initialized
>>>> relatively late during boot. Some pages have already been allocated
>>>> and freed before page_ext becomes available, leaving their codetag
>>>> uninitialized.
>> Hi Hao,
>> Thanks for the report.
>> Hmm. So, we are allocating pages before page_ext is initialized...
>>
>>>> A clear example is in init_section_page_ext(): alloc_page_ext() calls
>>>> kmemleak_alloc().
> Forgot to ask. The example you are using here is for page_ext
> allocation itself. Do you have any other examples where page
> allocation happens before page_ext initialization? If that's the only
> place, then we might be able to fix this in a simpler way by doing
> something special for alloc_page_ext().

Hi Suren

To help illustrate the point, here's the debug log I added:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2d4b6f1a554e..ebfe636f5b07 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1293,6 +1293,9 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
 		alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
 		update_page_tag_ref(handle, &ref);
 		put_page_tag_ref(handle);
+	} else {
+		pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr);
+		dump_stack();
 	}
 }
 

And I caught the following logs:

[ 0.296399] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea000400c700 pfn=1049372 nr=1
[ 0.296400] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-rc4-dirty #12 PREEMPT(lazy)
[ 0.296402] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 0.296402] Call Trace:
[ 0.296403] <TASK>
[ 0.296403] dump_stack_lvl+0x53/0x70
[ 0.296405] __pgalloc_tag_add+0x3a3/0x6e0
[ 0.296406] ? __pfx___pgalloc_tag_add+0x10/0x10
[ 0.296407] ? kasan_unpoison+0x27/0x60
[ 0.296409] ? __kasan_unpoison_pages+0x2c/0x40
[ 0.296411] get_page_from_freelist+0xa54/0x1310
[ 0.296413] __alloc_frozen_pages_noprof+0x206/0x4c0
[ 0.296415] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10
[ 0.296417] ? stack_depot_save_flags+0x3f/0x680
[ 0.296418] ? ___slab_alloc+0x518/0x530
[ 0.296420] alloc_pages_mpol+0x13a/0x3f0
[ 0.296421] ? __pfx_alloc_pages_mpol+0x10/0x10
[ 0.296423] ? _raw_spin_lock_irqsave+0x8a/0xf0
[ 0.296424] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 0.296426] alloc_slab_page+0xc2/0x130
[ 0.296427] allocate_slab+0x77/0x2c0
[ 0.296429] ? syscall_enter_define_fields+0x3bb/0x5f0
[ 0.296430] ___slab_alloc+0x125/0x530
[ 0.296432] ? __trace_define_field+0x252/0x3d0
[ 0.296433] __kmalloc_noprof+0x329/0x630
[ 0.296435] ? syscall_enter_define_fields+0x3bb/0x5f0
[ 0.296436] syscall_enter_define_fields+0x3bb/0x5f0
[ 0.296438] ? __pfx_syscall_enter_define_fields+0x10/0x10
[ 0.296440] event_define_fields+0x326/0x540
[ 0.296441] __trace_early_add_events+0xac/0x3c0
[ 0.296443] trace_event_init+0x24c/0x460
[ 0.296445] trace_init+0x9/0x20
[ 0.296446] start_kernel+0x199/0x3c0
[ 0.296448] x86_64_start_reservations+0x18/0x30
[ 0.296449] x86_64_start_kernel+0xe2/0xf0
[ 0.296451] common_startup_64+0x13e/0x141
[ 0.296453] </TASK>
[ 0.312234] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea000400f900 pfn=1049572 nr=1
[ 0.312234] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-rc4-dirty #12 PREEMPT(lazy)
[ 0.312236] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 0.312236] Call Trace:
[ 0.312237] <TASK>
[ 0.312237] dump_stack_lvl+0x53/0x70
[ 0.312239] __pgalloc_tag_add+0x3a3/0x6e0
[ 0.312240] ? __pfx___pgalloc_tag_add+0x10/0x10
[ 0.312241] ? rmqueue.constprop.0+0x4fc/0x1ce0
[ 0.312243] ? kasan_unpoison+0x27/0x60
[ 0.312244] ? __kasan_unpoison_pages+0x2c/0x40
[ 0.312246] get_page_from_freelist+0xa54/0x1310
[ 0.312248] __alloc_frozen_pages_noprof+0x206/0x4c0
[ 0.312250] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10
[ 0.312253] alloc_slab_page+0x39/0x130
[ 0.312254] allocate_slab+0x77/0x2c0
[ 0.312255] ? alloc_cpumask_var_node+0xc7/0x230
[ 0.312257] ___slab_alloc+0x46d/0x530
[ 0.312259] __kmalloc_node_noprof+0x2fa/0x680
[ 0.312261] ? alloc_cpumask_var_node+0xc7/0x230
[ 0.312263] alloc_cpumask_var_node+0xc7/0x230
[ 0.312264] init_desc+0x141/0x6b0
[ 0.312266] alloc_desc+0x108/0x1b0
[ 0.312267] early_irq_init+0xee/0x1c0
[ 0.312268] ? __pfx_early_irq_init+0x10/0x10
[ 0.312271] start_kernel+0x1ab/0x3c0
[ 0.312272] x86_64_start_reservations+0x18/0x30
[ 0.312274] x86_64_start_kernel+0xe2/0xf0
[ 0.312275] common_startup_64+0x13e/0x141
[ 0.312277] </TASK>
[ 0.312834] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea000400fc00 pfn=1049584 nr=1
[ 0.312835] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-rc4-dirty #12 PREEMPT(lazy)
[ 0.312836] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 0.312837] Call Trace:
[ 0.312837] <TASK>
[ 0.312838] dump_stack_lvl+0x53/0x70
[ 0.312840] __pgalloc_tag_add+0x3a3/0x6e0
[ 0.312841] ? __pfx___pgalloc_tag_add+0x10/0x10
[ 0.312842] ? rmqueue.constprop.0+0x4fc/0x1ce0
[ 0.312844] ? kasan_unpoison+0x27/0x60
[ 0.312845] ? __kasan_unpoison_pages+0x2c/0x40
[ 0.312847] get_page_from_freelist+0xa54/0x1310
[ 0.312849] __alloc_frozen_pages_noprof+0x206/0x4c0
[ 0.312851] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10
[ 0.312853] alloc_pages_mpol+0x13a/0x3f0
[ 0.312855] ? __pfx_alloc_pages_mpol+0x10/0x10
[ 0.312856] ? xas_find+0x2d8/0x450
[ 0.312858] ? _raw_spin_lock+0x84/0xe0
[ 0.312859] ? __pfx__raw_spin_lock+0x10/0x10
[ 0.312861] alloc_pages_noprof+0xf6/0x2b0
[ 0.312862] __change_page_attr+0x293/0x850
[ 0.312864] ? __pfx___change_page_attr+0x10/0x10
[ 0.312865] ? _vm_unmap_aliases+0x2d0/0x650
[ 0.312868] ? __pfx__vm_unmap_aliases+0x10/0x10
[ 0.312869] __change_page_attr_set_clr+0x16c/0x360
[ 0.312871] ? spp_getpage+0xbb/0x1e0
[ 0.312872] change_page_attr_set_clr+0x220/0x3c0
[ 0.312873] ? flush_tlb_one_kernel+0xf/0x30
[ 0.312875] ? set_pte_vaddr_p4d+0x110/0x180
[ 0.312877] ? __pfx_change_page_attr_set_clr+0x10/0x10
[ 0.312878] ? __pfx_set_pte_vaddr_p4d+0x10/0x10
[ 0.312881] ? __pfx_mtree_load+0x10/0x10
[ 0.312883] ? __pfx_mtree_load+0x10/0x10
[ 0.312884] ? __asan_memcpy+0x3c/0x60
[ 0.312886] ? set_intr_gate+0x10c/0x150
[ 0.312888] set_memory_ro+0x76/0xa0
[ 0.312889] ? __pfx_set_memory_ro+0x10/0x10
[ 0.312891] idt_setup_apic_and_irq_gates+0x2c1/0x390

and more.

off topic - if we were to handle only alloc_page_ext() specifically, what
would be the most straightforward solution in your mind? I'd really
appreciate your insight.

Thanks.

>>>> If the slab cache has no free objects, it falls back
>>>> to the buddy allocator to allocate memory. However, at this point page_ext
>>>> is not yet fully initialized, so these newly allocated pages have no
>>>> codetag set. These pages may later be reclaimed by KASAN,which causes
>>>> the warning to trigger when they are freed because their codetag ref is
>>>> still empty.
>>>>
>>>> Use a global array to track pages allocated before page_ext is fully
>>>> initialized, similar to how kmemleak tracks early allocations.
>>>> When page_ext initialization completes, set their codetag >>>> to empty to avoid warnings when they are freed later. >>>> >>>> ... >>>> >>>> --- a/include/linux/alloc_tag.h >>>> +++ b/include/linux/alloc_tag.h >>>> @@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref) >>>> >>>> #ifdef CONFIG_MEM_ALLOC_PROFILING >>>> >>>> +bool mem_profiling_is_available(void); >>>> +void alloc_tag_add_early_pfn(unsigned long pfn); >>>> + >>>> #define ALLOC_TAG_SECTION_NAME "alloc_tags" >>>> >>>> struct codetag_bytes { >>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c >>>> index 58991ab09d84..a5bf4e72c154 100644 >>>> --- a/lib/alloc_tag.c >>>> +++ b/lib/alloc_tag.c >>>> @@ -6,6 +6,7 @@ >>>> #include <linux/kallsyms.h> >>>> #include <linux/module.h> >>>> #include <linux/page_ext.h> >>>> +#include <linux/pgalloc_tag.h> >>>> #include <linux/proc_fs.h> >>>> #include <linux/seq_buf.h> >>>> #include <linux/seq_file.h> >>>> @@ -26,6 +27,82 @@ static bool mem_profiling_support; >>>> >>>> static struct codetag_type *alloc_tag_cttype; >>>> >>>> +/* >>>> + * State of the alloc_tag >>>> + * >>>> + * This is used to describe the states of the alloc_tag during bootup. >>>> + * >>>> + * When we need to allocate page_ext to store codetag, we face an >>>> + * initialization timing problem: >>>> + * >>>> + * Due to initialization order, pages may be allocated via buddy system >>>> + * before page_ext is fully allocated and initialized. Although these >>>> + * pages call the allocation hooks, the codetag will not be set because >>>> + * page_ext is not yet available. >>>> + * >>>> + * When these pages are later free to the buddy system, it triggers >>>> + * warnings because their codetag is actually empty if >>>> + * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled. >>>> + * >>>> + * Additionally, in this situation, we cannot record detailed allocation >>>> + * information for these pages. 
>>>> + */ >>>> +enum mem_profiling_state { >>>> + DOWN, /* No mem_profiling functionality yet */ >>>> + UP /* Everything is working */ >>>> +}; >>>> + >>>> +static enum mem_profiling_state mem_profiling_state = DOWN; >>>> + >>>> +bool mem_profiling_is_available(void) >>>> +{ >>>> + return mem_profiling_state == UP; >>>> +} >>>> + >>>> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG >>>> + >>>> +#define EARLY_ALLOC_PFN_MAX 256 >>>> + >>>> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX]; >>> It's unfortunate that this isn't __initdata. >>> >>>> +static unsigned int early_pfn_count; >>>> +static DEFINE_SPINLOCK(early_pfn_lock); >>>> + >>>> >>>> ... >>>> >>>> --- a/mm/page_alloc.c >>>> +++ b/mm/page_alloc.c >>>> @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task, >>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); >>>> update_page_tag_ref(handle, &ref); >>>> put_page_tag_ref(handle); >>>> + } else { >> This branch can be marked as "unlikely". >> >>>> + /* >>>> + * page_ext is not available yet, record the pfn so we can >>>> + * clear the tag ref later when page_ext is initialized. >>>> + */ >>>> + if (!mem_profiling_is_available()) >>>> + alloc_tag_add_early_pfn(page_to_pfn(page)); >>>> } >>>> } >>> All because of this, I believe. Is this fixable? >>> >>> If we take that `else', we know we're running in __init code, yes? I >>> don't see how `__init pgalloc_tag_add_early()' could be made to work. >>> hrm. Something clever, please. >> We can have a pointer to a function that is initialized to point to >> alloc_tag_add_early_pfn, which is defined as __init and uses >> early_pfns which now can be defined as __initdata. After >> clear_early_alloc_pfn_tag_refs() is done we reset that pointer to >> NULL. __pgalloc_tag_add() instead of calling alloc_tag_add_early_pfn() >> directly checks that pointer and if it's not NULL then calls the >> function that it points to. 
This way __pgalloc_tag_add() which is not >> an __init function will be invoking alloc_tag_add_early_pfn() __init >> function only until we are done with initialization. I haven't tried >> this but I think that should work. This also eliminates the need for >> mem_profiling_state variable since we can use this function pointer >> instead. >> >>
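Below is a minimal user-space sketch of the function-pointer scheme proposed above. It shows how a pointer that is cleared after the early pfns are processed can replace the mem_profiling_state flag. Some assumptions apply. The names alloc_tag_add_early_pfn, early_pfns, early_pfn_count, EARLY_ALLOC_PFN_MAX and clear_early_alloc_pfn_tag_refs come from the thread, but their bodies here are guesses because the patch body is elided. The names tag_refs[], page_ext_ready, early_pfn_hook, page_ext_init_done() and free_would_warn() are hypothetical stand-ins for page_ext and the free-path check. The sketch does not model __init/__initdata or early_pfn_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * User-space model of the function-pointer idea. In the kernel,
 * alloc_tag_add_early_pfn() would be __init and early_pfns[] __initdata;
 * plain C has no such sections, so they are ordinary globals here.
 * The kernel version would also need early_pfn_lock; this model is
 * single-threaded.
 */
#define EARLY_ALLOC_PFN_MAX 256
#define NR_PFNS 1024                     /* hypothetical: size of modeled memory */

enum ref_state { REF_UNSET, REF_EMPTY, REF_TAGGED };

static enum ref_state tag_refs[NR_PFNS]; /* hypothetical stand-in for page_ext refs */
static bool page_ext_ready;              /* hypothetical: page_ext lookups succeed */

static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX];
static unsigned int early_pfn_count;

static void alloc_tag_add_early_pfn(unsigned long pfn)
{
	/* This model silently drops pfns once the array is full. */
	if (early_pfn_count < EARLY_ALLOC_PFN_MAX)
		early_pfns[early_pfn_count++] = pfn;
}

/* Hypothetical name. Non-NULL only until the early pfns are cleared. */
static void (*early_pfn_hook)(unsigned long pfn) = alloc_tag_add_early_pfn;

static void pgalloc_tag_add(unsigned long pfn)
{
	assert(pfn < NR_PFNS);
	if (page_ext_ready)
		tag_refs[pfn] = REF_TAGGED;
	else if (early_pfn_hook)         /* would be unlikely() in the kernel */
		early_pfn_hook(pfn);
}

static void clear_early_alloc_pfn_tag_refs(void)
{
	for (unsigned int i = 0; i < early_pfn_count; i++)
		tag_refs[early_pfns[i]] = REF_EMPTY; /* like set_codetag_empty() */
	early_pfn_count = 0;
	/* From here on nothing can reach the (would-be __init) function. */
	early_pfn_hook = NULL;
}

static void page_ext_init_done(void)     /* hypothetical init hook */
{
	page_ext_ready = true;
	clear_early_alloc_pfn_tag_refs();
}

/* Models the free-path check that prints "alloc_tag was not set". */
static bool free_would_warn(unsigned long pfn)
{
	assert(pfn < NR_PFNS);
	return tag_refs[pfn] == REF_UNSET;
}
```

In this model the hook is consulted only when the page_ext lookup cannot succeed, so the common path is unchanged. Once the hook is NULL, the early branch reduces to a single pointer test.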
* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization 2026-03-20 1:57 ` Hao Ge @ 2026-03-20 2:14 ` Suren Baghdasaryan 2026-03-23 9:15 ` Hao Ge 0 siblings, 1 reply; 21+ messages in thread From: Suren Baghdasaryan @ 2026-03-20 2:14 UTC (permalink / raw) To: Hao Ge; +Cc: Andrew Morton, Kent Overstreet, linux-mm, linux-kernel On Thu, Mar 19, 2026 at 6:58 PM Hao Ge <hao.ge@linux.dev> wrote: > > > On 2026/3/20 07:48, Suren Baghdasaryan wrote: > > On Thu, Mar 19, 2026 at 4:44 PM Suren Baghdasaryan <surenb@google.com> wrote: > >> On Thu, Mar 19, 2026 at 3:28 PM Andrew Morton <akpm@linux-foundation.org> wrote: > >>> On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote: > >>> > >>>> Due to initialization ordering, page_ext is allocated and initialized > >>>> relatively late during boot. Some pages have already been allocated > >>>> and freed before page_ext becomes available, leaving their codetag > >>>> uninitialized. > >> Hi Hao, > >> Thanks for the report. > >> Hmm. So, we are allocating pages before page_ext is initialized... > >> > >>>> A clear example is in init_section_page_ext(): alloc_page_ext() calls > >>>> kmemleak_alloc(). > > Forgot to ask. The example you are using here is for page_ext > > allocation itself. Do you have any other examples where page > > allocation happens before page_ext initialization? If that's the only > > place, then we might be able to fix this in a simpler way by doing > > something special for alloc_page_ext(). 
> > Hi Suren > > To help illustrate the point, here's the debug log I added: > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 2d4b6f1a554e..ebfe636f5b07 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1293,6 +1293,9 @@ void __pgalloc_tag_add(struct page *page, struct > task_struct *task, > alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > update_page_tag_ref(handle, &ref); > put_page_tag_ref(handle); > + } else { > + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! > page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > + dump_stack(); > } > } > > > And I caught the following logs: > > [ 0.296399] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea000400c700 pfn=1049372 nr=1 > [ 0.296400] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > 7.0.0-rc4-dirty #12 PREEMPT(lazy) > [ 0.296402] Hardware name: Red Hat KVM, BIOS > rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > [ 0.296402] Call Trace: > [ 0.296403] <TASK> > [ 0.296403] dump_stack_lvl+0x53/0x70 > [ 0.296405] __pgalloc_tag_add+0x3a3/0x6e0 > [ 0.296406] ? __pfx___pgalloc_tag_add+0x10/0x10 > [ 0.296407] ? kasan_unpoison+0x27/0x60 > [ 0.296409] ? __kasan_unpoison_pages+0x2c/0x40 > [ 0.296411] get_page_from_freelist+0xa54/0x1310 > [ 0.296413] __alloc_frozen_pages_noprof+0x206/0x4c0 > [ 0.296415] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > [ 0.296417] ? stack_depot_save_flags+0x3f/0x680 > [ 0.296418] ? ___slab_alloc+0x518/0x530 > [ 0.296420] alloc_pages_mpol+0x13a/0x3f0 > [ 0.296421] ? __pfx_alloc_pages_mpol+0x10/0x10 > [ 0.296423] ? _raw_spin_lock_irqsave+0x8a/0xf0 > [ 0.296424] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 > [ 0.296426] alloc_slab_page+0xc2/0x130 > [ 0.296427] allocate_slab+0x77/0x2c0 > [ 0.296429] ? syscall_enter_define_fields+0x3bb/0x5f0 > [ 0.296430] ___slab_alloc+0x125/0x530 > [ 0.296432] ? __trace_define_field+0x252/0x3d0 > [ 0.296433] __kmalloc_noprof+0x329/0x630 > [ 0.296435] ? 
syscall_enter_define_fields+0x3bb/0x5f0 > [ 0.296436] syscall_enter_define_fields+0x3bb/0x5f0 > [ 0.296438] ? __pfx_syscall_enter_define_fields+0x10/0x10 > [ 0.296440] event_define_fields+0x326/0x540 > [ 0.296441] __trace_early_add_events+0xac/0x3c0 > [ 0.296443] trace_event_init+0x24c/0x460 > [ 0.296445] trace_init+0x9/0x20 > [ 0.296446] start_kernel+0x199/0x3c0 > [ 0.296448] x86_64_start_reservations+0x18/0x30 > [ 0.296449] x86_64_start_kernel+0xe2/0xf0 > [ 0.296451] common_startup_64+0x13e/0x141 > [ 0.296453] </TASK> > > > [ 0.312234] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea000400f900 pfn=1049572 nr=1 > [ 0.312234] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > 7.0.0-rc4-dirty #12 PREEMPT(lazy) > [ 0.312236] Hardware name: Red Hat KVM, BIOS > rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > [ 0.312236] Call Trace: > [ 0.312237] <TASK> > [ 0.312237] dump_stack_lvl+0x53/0x70 > [ 0.312239] __pgalloc_tag_add+0x3a3/0x6e0 > [ 0.312240] ? __pfx___pgalloc_tag_add+0x10/0x10 > [ 0.312241] ? rmqueue.constprop.0+0x4fc/0x1ce0 > [ 0.312243] ? kasan_unpoison+0x27/0x60 > [ 0.312244] ? __kasan_unpoison_pages+0x2c/0x40 > [ 0.312246] get_page_from_freelist+0xa54/0x1310 > [ 0.312248] __alloc_frozen_pages_noprof+0x206/0x4c0 > [ 0.312250] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > [ 0.312253] alloc_slab_page+0x39/0x130 > [ 0.312254] allocate_slab+0x77/0x2c0 > [ 0.312255] ? alloc_cpumask_var_node+0xc7/0x230 > [ 0.312257] ___slab_alloc+0x46d/0x530 > [ 0.312259] __kmalloc_node_noprof+0x2fa/0x680 > [ 0.312261] ? alloc_cpumask_var_node+0xc7/0x230 > [ 0.312263] alloc_cpumask_var_node+0xc7/0x230 > [ 0.312264] init_desc+0x141/0x6b0 > [ 0.312266] alloc_desc+0x108/0x1b0 > [ 0.312267] early_irq_init+0xee/0x1c0 > [ 0.312268] ? 
__pfx_early_irq_init+0x10/0x10 > [ 0.312271] start_kernel+0x1ab/0x3c0 > [ 0.312272] x86_64_start_reservations+0x18/0x30 > [ 0.312274] x86_64_start_kernel+0xe2/0xf0 > [ 0.312275] common_startup_64+0x13e/0x141 > [ 0.312277] </TASK> > > [ 0.312834] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea000400fc00 pfn=1049584 nr=1 > [ 0.312835] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > 7.0.0-rc4-dirty #12 PREEMPT(lazy) > [ 0.312836] Hardware name: Red Hat KVM, BIOS > rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > [ 0.312837] Call Trace: > [ 0.312837] <TASK> > [ 0.312838] dump_stack_lvl+0x53/0x70 > [ 0.312840] __pgalloc_tag_add+0x3a3/0x6e0 > [ 0.312841] ? __pfx___pgalloc_tag_add+0x10/0x10 > [ 0.312842] ? rmqueue.constprop.0+0x4fc/0x1ce0 > [ 0.312844] ? kasan_unpoison+0x27/0x60 > [ 0.312845] ? __kasan_unpoison_pages+0x2c/0x40 > [ 0.312847] get_page_from_freelist+0xa54/0x1310 > [ 0.312849] __alloc_frozen_pages_noprof+0x206/0x4c0 > [ 0.312851] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > [ 0.312853] alloc_pages_mpol+0x13a/0x3f0 > [ 0.312855] ? __pfx_alloc_pages_mpol+0x10/0x10 > [ 0.312856] ? xas_find+0x2d8/0x450 > [ 0.312858] ? _raw_spin_lock+0x84/0xe0 > [ 0.312859] ? __pfx__raw_spin_lock+0x10/0x10 > [ 0.312861] alloc_pages_noprof+0xf6/0x2b0 > [ 0.312862] __change_page_attr+0x293/0x850 > [ 0.312864] ? __pfx___change_page_attr+0x10/0x10 > [ 0.312865] ? _vm_unmap_aliases+0x2d0/0x650 > [ 0.312868] ? __pfx__vm_unmap_aliases+0x10/0x10 > [ 0.312869] __change_page_attr_set_clr+0x16c/0x360 > [ 0.312871] ? spp_getpage+0xbb/0x1e0 > [ 0.312872] change_page_attr_set_clr+0x220/0x3c0 > [ 0.312873] ? flush_tlb_one_kernel+0xf/0x30 > [ 0.312875] ? set_pte_vaddr_p4d+0x110/0x180 > [ 0.312877] ? __pfx_change_page_attr_set_clr+0x10/0x10 > [ 0.312878] ? __pfx_set_pte_vaddr_p4d+0x10/0x10 > [ 0.312881] ? __pfx_mtree_load+0x10/0x10 > [ 0.312883] ? __pfx_mtree_load+0x10/0x10 > [ 0.312884] ? __asan_memcpy+0x3c/0x60 > [ 0.312886] ? 
set_intr_gate+0x10c/0x150 > [ 0.312888] set_memory_ro+0x76/0xa0 > [ 0.312889] ? __pfx_set_memory_ro+0x10/0x10 > [ 0.312891] idt_setup_apic_and_irq_gates+0x2c1/0x390 > > and more. Ok, it's not the only place. Got your point. > > off topic - if we were to handle only alloc_page_ext() specifically, > what would be the most straightforward > > solution in your mind? I'd really appreciate your insight. I was thinking if it's the only special case maybe we can handle it somehow differently, like we do when we allocate obj_ext vectors for slabs using __GFP_NO_OBJ_EXT. I haven't found a good solution yet but since it's not a special case we would not be able to use it even if I came up with something... I think your way is the most straight-forward but please try my suggestion to see if we can avoid extra overhead. Thanks, Suren. > > Thanks. > > > >>>> If the slab cache has no free objects, it falls back > >>>> to the buddy allocator to allocate memory. However, at this point page_ext > >>>> is not yet fully initialized, so these newly allocated pages have no > >>>> codetag set. These pages may later be reclaimed by KASAN,which causes > >>>> the warning to trigger when they are freed because their codetag ref is > >>>> still empty. > >>>> > >>>> Use a global array to track pages allocated before page_ext is fully > >>>> initialized, similar to how kmemleak tracks early allocations. > >>>> When page_ext initialization completes, set their codetag > >>>> to empty to avoid warnings when they are freed later. > >>>> > >>>> ... 
> >>>> > >>>> --- a/include/linux/alloc_tag.h > >>>> +++ b/include/linux/alloc_tag.h > >>>> @@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref) > >>>> > >>>> #ifdef CONFIG_MEM_ALLOC_PROFILING > >>>> > >>>> +bool mem_profiling_is_available(void); > >>>> +void alloc_tag_add_early_pfn(unsigned long pfn); > >>>> + > >>>> #define ALLOC_TAG_SECTION_NAME "alloc_tags" > >>>> > >>>> struct codetag_bytes { > >>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c > >>>> index 58991ab09d84..a5bf4e72c154 100644 > >>>> --- a/lib/alloc_tag.c > >>>> +++ b/lib/alloc_tag.c > >>>> @@ -6,6 +6,7 @@ > >>>> #include <linux/kallsyms.h> > >>>> #include <linux/module.h> > >>>> #include <linux/page_ext.h> > >>>> +#include <linux/pgalloc_tag.h> > >>>> #include <linux/proc_fs.h> > >>>> #include <linux/seq_buf.h> > >>>> #include <linux/seq_file.h> > >>>> @@ -26,6 +27,82 @@ static bool mem_profiling_support; > >>>> > >>>> static struct codetag_type *alloc_tag_cttype; > >>>> > >>>> +/* > >>>> + * State of the alloc_tag > >>>> + * > >>>> + * This is used to describe the states of the alloc_tag during bootup. > >>>> + * > >>>> + * When we need to allocate page_ext to store codetag, we face an > >>>> + * initialization timing problem: > >>>> + * > >>>> + * Due to initialization order, pages may be allocated via buddy system > >>>> + * before page_ext is fully allocated and initialized. Although these > >>>> + * pages call the allocation hooks, the codetag will not be set because > >>>> + * page_ext is not yet available. > >>>> + * > >>>> + * When these pages are later free to the buddy system, it triggers > >>>> + * warnings because their codetag is actually empty if > >>>> + * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled. > >>>> + * > >>>> + * Additionally, in this situation, we cannot record detailed allocation > >>>> + * information for these pages. 
> >>>> + */ > >>>> +enum mem_profiling_state { > >>>> + DOWN, /* No mem_profiling functionality yet */ > >>>> + UP /* Everything is working */ > >>>> +}; > >>>> + > >>>> +static enum mem_profiling_state mem_profiling_state = DOWN; > >>>> + > >>>> +bool mem_profiling_is_available(void) > >>>> +{ > >>>> + return mem_profiling_state == UP; > >>>> +} > >>>> + > >>>> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG > >>>> + > >>>> +#define EARLY_ALLOC_PFN_MAX 256 > >>>> + > >>>> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX]; > >>> It's unfortunate that this isn't __initdata. > >>> > >>>> +static unsigned int early_pfn_count; > >>>> +static DEFINE_SPINLOCK(early_pfn_lock); > >>>> + > >>>> > >>>> ... > >>>> > >>>> --- a/mm/page_alloc.c > >>>> +++ b/mm/page_alloc.c > >>>> @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task, > >>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > >>>> update_page_tag_ref(handle, &ref); > >>>> put_page_tag_ref(handle); > >>>> + } else { > >> This branch can be marked as "unlikely". > >> > >>>> + /* > >>>> + * page_ext is not available yet, record the pfn so we can > >>>> + * clear the tag ref later when page_ext is initialized. > >>>> + */ > >>>> + if (!mem_profiling_is_available()) > >>>> + alloc_tag_add_early_pfn(page_to_pfn(page)); > >>>> } > >>>> } > >>> All because of this, I believe. Is this fixable? > >>> > >>> If we take that `else', we know we're running in __init code, yes? I > >>> don't see how `__init pgalloc_tag_add_early()' could be made to work. > >>> hrm. Something clever, please. > >> We can have a pointer to a function that is initialized to point to > >> alloc_tag_add_early_pfn, which is defined as __init and uses > >> early_pfns which now can be defined as __initdata. After > >> clear_early_alloc_pfn_tag_refs() is done we reset that pointer to > >> NULL. 
__pgalloc_tag_add() instead of calling alloc_tag_add_early_pfn() > >> directly checks that pointer and if it's not NULL then calls the > >> function that it points to. This way __pgalloc_tag_add() which is not > >> an __init function will be invoking alloc_tag_add_early_pfn() __init > >> function only until we are done with initialization. I haven't tried > >> this but I think that should work. This also eliminates the need for > >> mem_profiling_state variable since we can use this function pointer > >> instead. > >> > >>
* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization 2026-03-20 2:14 ` Suren Baghdasaryan @ 2026-03-23 9:15 ` Hao Ge 2026-03-23 22:47 ` Suren Baghdasaryan 0 siblings, 1 reply; 21+ messages in thread From: Hao Ge @ 2026-03-23 9:15 UTC (permalink / raw) To: Suren Baghdasaryan; +Cc: Andrew Morton, Kent Overstreet, linux-mm, linux-kernel On 2026/3/20 10:14, Suren Baghdasaryan wrote: > On Thu, Mar 19, 2026 at 6:58 PM Hao Ge <hao.ge@linux.dev> wrote: >> >> On 2026/3/20 07:48, Suren Baghdasaryan wrote: >>> On Thu, Mar 19, 2026 at 4:44 PM Suren Baghdasaryan <surenb@google.com> wrote: >>>> On Thu, Mar 19, 2026 at 3:28 PM Andrew Morton <akpm@linux-foundation.org> wrote: >>>>> On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote: >>>>> >>>>>> Due to initialization ordering, page_ext is allocated and initialized >>>>>> relatively late during boot. Some pages have already been allocated >>>>>> and freed before page_ext becomes available, leaving their codetag >>>>>> uninitialized. >>>> Hi Hao, >>>> Thanks for the report. >>>> Hmm. So, we are allocating pages before page_ext is initialized... >>>> >>>>>> A clear example is in init_section_page_ext(): alloc_page_ext() calls >>>>>> kmemleak_alloc(). >>> Forgot to ask. The example you are using here is for page_ext >>> allocation itself. Do you have any other examples where page >>> allocation happens before page_ext initialization? If that's the only >>> place, then we might be able to fix this in a simpler way by doing >>> something special for alloc_page_ext(). 
>> Hi Suren >> >> To help illustrate the point, here's the debug log I added: >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 2d4b6f1a554e..ebfe636f5b07 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -1293,6 +1293,9 @@ void __pgalloc_tag_add(struct page *page, struct >> task_struct *task, >> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); >> update_page_tag_ref(handle, &ref); >> put_page_tag_ref(handle); >> + } else { >> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! >> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); >> + dump_stack(); >> } >> } >> >> >> And I caught the following logs: >> >> [ 0.296399] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea000400c700 pfn=1049372 nr=1 >> [ 0.296400] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >> 7.0.0-rc4-dirty #12 PREEMPT(lazy) >> [ 0.296402] Hardware name: Red Hat KVM, BIOS >> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >> [ 0.296402] Call Trace: >> [ 0.296403] <TASK> >> [ 0.296403] dump_stack_lvl+0x53/0x70 >> [ 0.296405] __pgalloc_tag_add+0x3a3/0x6e0 >> [ 0.296406] ? __pfx___pgalloc_tag_add+0x10/0x10 >> [ 0.296407] ? kasan_unpoison+0x27/0x60 >> [ 0.296409] ? __kasan_unpoison_pages+0x2c/0x40 >> [ 0.296411] get_page_from_freelist+0xa54/0x1310 >> [ 0.296413] __alloc_frozen_pages_noprof+0x206/0x4c0 >> [ 0.296415] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 >> [ 0.296417] ? stack_depot_save_flags+0x3f/0x680 >> [ 0.296418] ? ___slab_alloc+0x518/0x530 >> [ 0.296420] alloc_pages_mpol+0x13a/0x3f0 >> [ 0.296421] ? __pfx_alloc_pages_mpol+0x10/0x10 >> [ 0.296423] ? _raw_spin_lock_irqsave+0x8a/0xf0 >> [ 0.296424] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 >> [ 0.296426] alloc_slab_page+0xc2/0x130 >> [ 0.296427] allocate_slab+0x77/0x2c0 >> [ 0.296429] ? syscall_enter_define_fields+0x3bb/0x5f0 >> [ 0.296430] ___slab_alloc+0x125/0x530 >> [ 0.296432] ? __trace_define_field+0x252/0x3d0 >> [ 0.296433] __kmalloc_noprof+0x329/0x630 >> [ 0.296435] ? 
syscall_enter_define_fields+0x3bb/0x5f0 >> [ 0.296436] syscall_enter_define_fields+0x3bb/0x5f0 >> [ 0.296438] ? __pfx_syscall_enter_define_fields+0x10/0x10 >> [ 0.296440] event_define_fields+0x326/0x540 >> [ 0.296441] __trace_early_add_events+0xac/0x3c0 >> [ 0.296443] trace_event_init+0x24c/0x460 >> [ 0.296445] trace_init+0x9/0x20 >> [ 0.296446] start_kernel+0x199/0x3c0 >> [ 0.296448] x86_64_start_reservations+0x18/0x30 >> [ 0.296449] x86_64_start_kernel+0xe2/0xf0 >> [ 0.296451] common_startup_64+0x13e/0x141 >> [ 0.296453] </TASK> >> >> >> [ 0.312234] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea000400f900 pfn=1049572 nr=1 >> [ 0.312234] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >> 7.0.0-rc4-dirty #12 PREEMPT(lazy) >> [ 0.312236] Hardware name: Red Hat KVM, BIOS >> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >> [ 0.312236] Call Trace: >> [ 0.312237] <TASK> >> [ 0.312237] dump_stack_lvl+0x53/0x70 >> [ 0.312239] __pgalloc_tag_add+0x3a3/0x6e0 >> [ 0.312240] ? __pfx___pgalloc_tag_add+0x10/0x10 >> [ 0.312241] ? rmqueue.constprop.0+0x4fc/0x1ce0 >> [ 0.312243] ? kasan_unpoison+0x27/0x60 >> [ 0.312244] ? __kasan_unpoison_pages+0x2c/0x40 >> [ 0.312246] get_page_from_freelist+0xa54/0x1310 >> [ 0.312248] __alloc_frozen_pages_noprof+0x206/0x4c0 >> [ 0.312250] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 >> [ 0.312253] alloc_slab_page+0x39/0x130 >> [ 0.312254] allocate_slab+0x77/0x2c0 >> [ 0.312255] ? alloc_cpumask_var_node+0xc7/0x230 >> [ 0.312257] ___slab_alloc+0x46d/0x530 >> [ 0.312259] __kmalloc_node_noprof+0x2fa/0x680 >> [ 0.312261] ? alloc_cpumask_var_node+0xc7/0x230 >> [ 0.312263] alloc_cpumask_var_node+0xc7/0x230 >> [ 0.312264] init_desc+0x141/0x6b0 >> [ 0.312266] alloc_desc+0x108/0x1b0 >> [ 0.312267] early_irq_init+0xee/0x1c0 >> [ 0.312268] ? 
__pfx_early_irq_init+0x10/0x10 >> [ 0.312271] start_kernel+0x1ab/0x3c0 >> [ 0.312272] x86_64_start_reservations+0x18/0x30 >> [ 0.312274] x86_64_start_kernel+0xe2/0xf0 >> [ 0.312275] common_startup_64+0x13e/0x141 >> [ 0.312277] </TASK> >> >> [ 0.312834] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea000400fc00 pfn=1049584 nr=1 >> [ 0.312835] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >> 7.0.0-rc4-dirty #12 PREEMPT(lazy) >> [ 0.312836] Hardware name: Red Hat KVM, BIOS >> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >> [ 0.312837] Call Trace: >> [ 0.312837] <TASK> >> [ 0.312838] dump_stack_lvl+0x53/0x70 >> [ 0.312840] __pgalloc_tag_add+0x3a3/0x6e0 >> [ 0.312841] ? __pfx___pgalloc_tag_add+0x10/0x10 >> [ 0.312842] ? rmqueue.constprop.0+0x4fc/0x1ce0 >> [ 0.312844] ? kasan_unpoison+0x27/0x60 >> [ 0.312845] ? __kasan_unpoison_pages+0x2c/0x40 >> [ 0.312847] get_page_from_freelist+0xa54/0x1310 >> [ 0.312849] __alloc_frozen_pages_noprof+0x206/0x4c0 >> [ 0.312851] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 >> [ 0.312853] alloc_pages_mpol+0x13a/0x3f0 >> [ 0.312855] ? __pfx_alloc_pages_mpol+0x10/0x10 >> [ 0.312856] ? xas_find+0x2d8/0x450 >> [ 0.312858] ? _raw_spin_lock+0x84/0xe0 >> [ 0.312859] ? __pfx__raw_spin_lock+0x10/0x10 >> [ 0.312861] alloc_pages_noprof+0xf6/0x2b0 >> [ 0.312862] __change_page_attr+0x293/0x850 >> [ 0.312864] ? __pfx___change_page_attr+0x10/0x10 >> [ 0.312865] ? _vm_unmap_aliases+0x2d0/0x650 >> [ 0.312868] ? __pfx__vm_unmap_aliases+0x10/0x10 >> [ 0.312869] __change_page_attr_set_clr+0x16c/0x360 >> [ 0.312871] ? spp_getpage+0xbb/0x1e0 >> [ 0.312872] change_page_attr_set_clr+0x220/0x3c0 >> [ 0.312873] ? flush_tlb_one_kernel+0xf/0x30 >> [ 0.312875] ? set_pte_vaddr_p4d+0x110/0x180 >> [ 0.312877] ? __pfx_change_page_attr_set_clr+0x10/0x10 >> [ 0.312878] ? __pfx_set_pte_vaddr_p4d+0x10/0x10 >> [ 0.312881] ? __pfx_mtree_load+0x10/0x10 >> [ 0.312883] ? __pfx_mtree_load+0x10/0x10 >> [ 0.312884] ? __asan_memcpy+0x3c/0x60 >> [ 0.312886] ? 
set_intr_gate+0x10c/0x150 >> [ 0.312888] set_memory_ro+0x76/0xa0 >> [ 0.312889] ? __pfx_set_memory_ro+0x10/0x10 >> [ 0.312891] idt_setup_apic_and_irq_gates+0x2c1/0x390 >> >> and more. > Ok, it's not the only place. Got your point. > >> off topic - if we were to handle only alloc_page_ext() specifically, >> what would be the most straightforward >> >> solution in your mind? I'd really appreciate your insight. > I was thinking if it's the only special case maybe we can handle it > somehow differently, like we do when we allocate obj_ext vectors for > slabs using __GFP_NO_OBJ_EXT. I haven't found a good solution yet but > since it's not a special case we would not be able to use it even if I > came up with something... > I think your way is the most straight-forward but please try my > suggestion to see if we can avoid extra overhead. > Thanks, > Suren. Hi Suren Thank you for your feedback. After re-examining this issue, I realize my previous focus was misplaced. On closer inspection, this is not merely a bug: the warning points to a gap in memory allocation profiling. Specifically, the current implementation does not track page allocations made after the buddy allocator becomes available but before page_ext is initialized, so allocations made during this transition phase are missing from the profile. My approach is to dynamically allocate a codetag_ref whenever get_page_tag_ref fails, and keep these refs on a linked list that tracks every buddy allocation made before page_ext is initialized. However, this raises two performance concerns: 1. Free path overhead: freeing one of these pages requires walking the linked list to find its codetag_ref, an O(n) lookup per free operation. 2. Initialization overhead: init_page_alloc_tagging would have to walk the entire list to transfer each codetag_ref into page_ext, which could be costly if many pages were allocated early. What are your thoughts on this? I look forward to your suggestions. Thanks Hao > >> Thanks. >> >> >>>>>> If the slab cache has no free objects, it falls back >>>>>> to the buddy allocator to allocate memory. However, at this point page_ext >>>>>> is not yet fully initialized, so these newly allocated pages have no >>>>>> codetag set. These pages may later be reclaimed by KASAN,which causes >>>>>> the warning to trigger when they are freed because their codetag ref is >>>>>> still empty. >>>>>> >>>>>> Use a global array to track pages allocated before page_ext is fully >>>>>> initialized, similar to how kmemleak tracks early allocations. >>>>>> When page_ext initialization completes, set their codetag >>>>>> to empty to avoid warnings when they are freed later. >>>>>> >>>>>> ...
>>>>>> >>>>>> --- a/include/linux/alloc_tag.h >>>>>> +++ b/include/linux/alloc_tag.h >>>>>> @@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref) >>>>>> >>>>>> #ifdef CONFIG_MEM_ALLOC_PROFILING >>>>>> >>>>>> +bool mem_profiling_is_available(void); >>>>>> +void alloc_tag_add_early_pfn(unsigned long pfn); >>>>>> + >>>>>> #define ALLOC_TAG_SECTION_NAME "alloc_tags" >>>>>> >>>>>> struct codetag_bytes { >>>>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c >>>>>> index 58991ab09d84..a5bf4e72c154 100644 >>>>>> --- a/lib/alloc_tag.c >>>>>> +++ b/lib/alloc_tag.c >>>>>> @@ -6,6 +6,7 @@ >>>>>> #include <linux/kallsyms.h> >>>>>> #include <linux/module.h> >>>>>> #include <linux/page_ext.h> >>>>>> +#include <linux/pgalloc_tag.h> >>>>>> #include <linux/proc_fs.h> >>>>>> #include <linux/seq_buf.h> >>>>>> #include <linux/seq_file.h> >>>>>> @@ -26,6 +27,82 @@ static bool mem_profiling_support; >>>>>> >>>>>> static struct codetag_type *alloc_tag_cttype; >>>>>> >>>>>> +/* >>>>>> + * State of the alloc_tag >>>>>> + * >>>>>> + * This is used to describe the states of the alloc_tag during bootup. >>>>>> + * >>>>>> + * When we need to allocate page_ext to store codetag, we face an >>>>>> + * initialization timing problem: >>>>>> + * >>>>>> + * Due to initialization order, pages may be allocated via buddy system >>>>>> + * before page_ext is fully allocated and initialized. Although these >>>>>> + * pages call the allocation hooks, the codetag will not be set because >>>>>> + * page_ext is not yet available. >>>>>> + * >>>>>> + * When these pages are later free to the buddy system, it triggers >>>>>> + * warnings because their codetag is actually empty if >>>>>> + * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled. >>>>>> + * >>>>>> + * Additionally, in this situation, we cannot record detailed allocation >>>>>> + * information for these pages. 
>>>>>> + */ >>>>>> +enum mem_profiling_state { >>>>>> + DOWN, /* No mem_profiling functionality yet */ >>>>>> + UP /* Everything is working */ >>>>>> +}; >>>>>> + >>>>>> +static enum mem_profiling_state mem_profiling_state = DOWN; >>>>>> + >>>>>> +bool mem_profiling_is_available(void) >>>>>> +{ >>>>>> + return mem_profiling_state == UP; >>>>>> +} >>>>>> + >>>>>> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG >>>>>> + >>>>>> +#define EARLY_ALLOC_PFN_MAX 256 >>>>>> + >>>>>> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX]; >>>>> It's unfortunate that this isn't __initdata. >>>>> >>>>>> +static unsigned int early_pfn_count; >>>>>> +static DEFINE_SPINLOCK(early_pfn_lock); >>>>>> + >>>>>> >>>>>> ... >>>>>> >>>>>> --- a/mm/page_alloc.c >>>>>> +++ b/mm/page_alloc.c >>>>>> @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task, >>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); >>>>>> update_page_tag_ref(handle, &ref); >>>>>> put_page_tag_ref(handle); >>>>>> + } else { >>>> This branch can be marked as "unlikely". >>>> >>>>>> + /* >>>>>> + * page_ext is not available yet, record the pfn so we can >>>>>> + * clear the tag ref later when page_ext is initialized. >>>>>> + */ >>>>>> + if (!mem_profiling_is_available()) >>>>>> + alloc_tag_add_early_pfn(page_to_pfn(page)); >>>>>> } >>>>>> } >>>>> All because of this, I believe. Is this fixable? >>>>> >>>>> If we take that `else', we know we're running in __init code, yes? I >>>>> don't see how `__init pgalloc_tag_add_early()' could be made to work. >>>>> hrm. Something clever, please. >>>> We can have a pointer to a function that is initialized to point to >>>> alloc_tag_add_early_pfn, which is defined as __init and uses >>>> early_pfns which now can be defined as __initdata. After >>>> clear_early_alloc_pfn_tag_refs() is done we reset that pointer to >>>> NULL. 
__pgalloc_tag_add() instead of calling alloc_tag_add_early_pfn() >>>> directly checks that pointer and if it's not NULL then calls the >>>> function that it points to. This way __pgalloc_tag_add() which is not >>>> an __init function will be invoking alloc_tag_add_early_pfn() __init >>>> function only until we are done with initialization. I haven't tried >>>> this but I think that should work. This also eliminates the need for >>>> mem_profiling_state variable since we can use this function pointer >>>> instead. >>>> >>>>
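Below is a rough user-space model of the linked-list approach Hao describes. It shows where the two costs he lists come from: a list walk on every free, and a full walk at init time. All names here are hypothetical. The field tag_id stands in for a union codetag_ref, page_ext_tag[] stands in for page_ext, and malloc() stands in for whatever early allocator the kernel would use. In the kernel, the nodes themselves would presumably have to come from somewhere that does not re-enter the page allocator, and this model ignores that constraint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * User-space model of the linked-list idea. All names are hypothetical;
 * tag_id stands in for a union codetag_ref and page_ext_tag[] for page_ext.
 * malloc() stands in for whatever early allocator the kernel would use.
 */
#define NR_PFNS 64

struct early_ref {
	unsigned long pfn;
	int tag_id;
	struct early_ref *next;
};

static struct early_ref *early_list;
static size_t early_list_len;
static int page_ext_tag[NR_PFNS];

/* Called when get_page_tag_ref() fails: remember the tag for this pfn. */
static bool early_ref_add(unsigned long pfn, int tag_id)
{
	struct early_ref *r = malloc(sizeof(*r));

	if (!r)
		return false;
	r->pfn = pfn;
	r->tag_id = tag_id;
	r->next = early_list;            /* push at the head */
	early_list = r;
	early_list_len++;
	return true;
}

/*
 * Free path: walk the list to find and unlink the entry for @pfn.
 * This is the O(n)-per-free cost. Returns the tag, or -1 if absent;
 * *steps reports how many nodes were visited.
 */
static int early_ref_remove(unsigned long pfn, size_t *steps)
{
	struct early_ref **pp = &early_list;
	size_t n = 0;

	while (*pp) {
		struct early_ref *r = *pp;

		n++;
		if (r->pfn == pfn) {
			int tag_id = r->tag_id;

			*pp = r->next;
			free(r);
			early_list_len--;
			*steps = n;
			return tag_id;
		}
		pp = &r->next;
	}
	*steps = n;
	return -1;
}

/* Init path: one more full walk to move every entry into page_ext. */
static size_t early_ref_migrate(void)
{
	size_t moved = 0;

	while (early_list) {
		struct early_ref *r = early_list;

		early_list = r->next;
		assert(r->pfn < NR_PFNS);
		page_ext_tag[r->pfn] = r->tag_id;
		free(r);
		moved++;
	}
	early_list_len = 0;
	return moved;
}
```

Because entries are pushed at the head, the oldest early allocations are the most expensive to find on free. The lookup cost grows with however many pages were allocated before page_ext came up.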
* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization 2026-03-23 9:15 ` Hao Ge @ 2026-03-23 22:47 ` Suren Baghdasaryan 2026-03-24 9:43 ` Hao Ge 0 siblings, 1 reply; 21+ messages in thread From: Suren Baghdasaryan @ 2026-03-23 22:47 UTC (permalink / raw) To: Hao Ge; +Cc: Andrew Morton, Kent Overstreet, linux-mm, linux-kernel On Mon, Mar 23, 2026 at 2:16 AM Hao Ge <hao.ge@linux.dev> wrote: > > > On 2026/3/20 10:14, Suren Baghdasaryan wrote: > > On Thu, Mar 19, 2026 at 6:58 PM Hao Ge <hao.ge@linux.dev> wrote: > >> > >> On 2026/3/20 07:48, Suren Baghdasaryan wrote: > >>> On Thu, Mar 19, 2026 at 4:44 PM Suren Baghdasaryan <surenb@google.com> wrote: > >>>> On Thu, Mar 19, 2026 at 3:28 PM Andrew Morton <akpm@linux-foundation.org> wrote: > >>>>> On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote: > >>>>> > >>>>>> Due to initialization ordering, page_ext is allocated and initialized > >>>>>> relatively late during boot. Some pages have already been allocated > >>>>>> and freed before page_ext becomes available, leaving their codetag > >>>>>> uninitialized. > >>>> Hi Hao, > >>>> Thanks for the report. > >>>> Hmm. So, we are allocating pages before page_ext is initialized... > >>>> > >>>>>> A clear example is in init_section_page_ext(): alloc_page_ext() calls > >>>>>> kmemleak_alloc(). > >>> Forgot to ask. The example you are using here is for page_ext > >>> allocation itself. Do you have any other examples where page > >>> allocation happens before page_ext initialization? If that's the only > >>> place, then we might be able to fix this in a simpler way by doing > >>> something special for alloc_page_ext(). 
> >> Hi Suren > >> > >> To help illustrate the point, here's the debug log I added: > >> > >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >> index 2d4b6f1a554e..ebfe636f5b07 100644 > >> --- a/mm/page_alloc.c > >> +++ b/mm/page_alloc.c > >> @@ -1293,6 +1293,9 @@ void __pgalloc_tag_add(struct page *page, struct > >> task_struct *task, > >> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > >> update_page_tag_ref(handle, &ref); > >> put_page_tag_ref(handle); > >> + } else { > >> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! > >> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > >> + dump_stack(); > >> } > >> } > >> > >> > >> And I caught the following logs: > >> > >> [ 0.296399] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea000400c700 pfn=1049372 nr=1 > >> [ 0.296400] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > >> [ 0.296402] Hardware name: Red Hat KVM, BIOS > >> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >> [ 0.296402] Call Trace: > >> [ 0.296403] <TASK> > >> [ 0.296403] dump_stack_lvl+0x53/0x70 > >> [ 0.296405] __pgalloc_tag_add+0x3a3/0x6e0 > >> [ 0.296406] ? __pfx___pgalloc_tag_add+0x10/0x10 > >> [ 0.296407] ? kasan_unpoison+0x27/0x60 > >> [ 0.296409] ? __kasan_unpoison_pages+0x2c/0x40 > >> [ 0.296411] get_page_from_freelist+0xa54/0x1310 > >> [ 0.296413] __alloc_frozen_pages_noprof+0x206/0x4c0 > >> [ 0.296415] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > >> [ 0.296417] ? stack_depot_save_flags+0x3f/0x680 > >> [ 0.296418] ? ___slab_alloc+0x518/0x530 > >> [ 0.296420] alloc_pages_mpol+0x13a/0x3f0 > >> [ 0.296421] ? __pfx_alloc_pages_mpol+0x10/0x10 > >> [ 0.296423] ? _raw_spin_lock_irqsave+0x8a/0xf0 > >> [ 0.296424] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 > >> [ 0.296426] alloc_slab_page+0xc2/0x130 > >> [ 0.296427] allocate_slab+0x77/0x2c0 > >> [ 0.296429] ? syscall_enter_define_fields+0x3bb/0x5f0 > >> [ 0.296430] ___slab_alloc+0x125/0x530 > >> [ 0.296432] ? 
__trace_define_field+0x252/0x3d0 > >> [ 0.296433] __kmalloc_noprof+0x329/0x630 > >> [ 0.296435] ? syscall_enter_define_fields+0x3bb/0x5f0 > >> [ 0.296436] syscall_enter_define_fields+0x3bb/0x5f0 > >> [ 0.296438] ? __pfx_syscall_enter_define_fields+0x10/0x10 > >> [ 0.296440] event_define_fields+0x326/0x540 > >> [ 0.296441] __trace_early_add_events+0xac/0x3c0 > >> [ 0.296443] trace_event_init+0x24c/0x460 > >> [ 0.296445] trace_init+0x9/0x20 > >> [ 0.296446] start_kernel+0x199/0x3c0 > >> [ 0.296448] x86_64_start_reservations+0x18/0x30 > >> [ 0.296449] x86_64_start_kernel+0xe2/0xf0 > >> [ 0.296451] common_startup_64+0x13e/0x141 > >> [ 0.296453] </TASK> > >> > >> > >> [ 0.312234] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea000400f900 pfn=1049572 nr=1 > >> [ 0.312234] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > >> [ 0.312236] Hardware name: Red Hat KVM, BIOS > >> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >> [ 0.312236] Call Trace: > >> [ 0.312237] <TASK> > >> [ 0.312237] dump_stack_lvl+0x53/0x70 > >> [ 0.312239] __pgalloc_tag_add+0x3a3/0x6e0 > >> [ 0.312240] ? __pfx___pgalloc_tag_add+0x10/0x10 > >> [ 0.312241] ? rmqueue.constprop.0+0x4fc/0x1ce0 > >> [ 0.312243] ? kasan_unpoison+0x27/0x60 > >> [ 0.312244] ? __kasan_unpoison_pages+0x2c/0x40 > >> [ 0.312246] get_page_from_freelist+0xa54/0x1310 > >> [ 0.312248] __alloc_frozen_pages_noprof+0x206/0x4c0 > >> [ 0.312250] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > >> [ 0.312253] alloc_slab_page+0x39/0x130 > >> [ 0.312254] allocate_slab+0x77/0x2c0 > >> [ 0.312255] ? alloc_cpumask_var_node+0xc7/0x230 > >> [ 0.312257] ___slab_alloc+0x46d/0x530 > >> [ 0.312259] __kmalloc_node_noprof+0x2fa/0x680 > >> [ 0.312261] ? alloc_cpumask_var_node+0xc7/0x230 > >> [ 0.312263] alloc_cpumask_var_node+0xc7/0x230 > >> [ 0.312264] init_desc+0x141/0x6b0 > >> [ 0.312266] alloc_desc+0x108/0x1b0 > >> [ 0.312267] early_irq_init+0xee/0x1c0 > >> [ 0.312268] ? 
__pfx_early_irq_init+0x10/0x10 > >> [ 0.312271] start_kernel+0x1ab/0x3c0 > >> [ 0.312272] x86_64_start_reservations+0x18/0x30 > >> [ 0.312274] x86_64_start_kernel+0xe2/0xf0 > >> [ 0.312275] common_startup_64+0x13e/0x141 > >> [ 0.312277] </TASK> > >> > >> [ 0.312834] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea000400fc00 pfn=1049584 nr=1 > >> [ 0.312835] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > >> [ 0.312836] Hardware name: Red Hat KVM, BIOS > >> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >> [ 0.312837] Call Trace: > >> [ 0.312837] <TASK> > >> [ 0.312838] dump_stack_lvl+0x53/0x70 > >> [ 0.312840] __pgalloc_tag_add+0x3a3/0x6e0 > >> [ 0.312841] ? __pfx___pgalloc_tag_add+0x10/0x10 > >> [ 0.312842] ? rmqueue.constprop.0+0x4fc/0x1ce0 > >> [ 0.312844] ? kasan_unpoison+0x27/0x60 > >> [ 0.312845] ? __kasan_unpoison_pages+0x2c/0x40 > >> [ 0.312847] get_page_from_freelist+0xa54/0x1310 > >> [ 0.312849] __alloc_frozen_pages_noprof+0x206/0x4c0 > >> [ 0.312851] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > >> [ 0.312853] alloc_pages_mpol+0x13a/0x3f0 > >> [ 0.312855] ? __pfx_alloc_pages_mpol+0x10/0x10 > >> [ 0.312856] ? xas_find+0x2d8/0x450 > >> [ 0.312858] ? _raw_spin_lock+0x84/0xe0 > >> [ 0.312859] ? __pfx__raw_spin_lock+0x10/0x10 > >> [ 0.312861] alloc_pages_noprof+0xf6/0x2b0 > >> [ 0.312862] __change_page_attr+0x293/0x850 > >> [ 0.312864] ? __pfx___change_page_attr+0x10/0x10 > >> [ 0.312865] ? _vm_unmap_aliases+0x2d0/0x650 > >> [ 0.312868] ? __pfx__vm_unmap_aliases+0x10/0x10 > >> [ 0.312869] __change_page_attr_set_clr+0x16c/0x360 > >> [ 0.312871] ? spp_getpage+0xbb/0x1e0 > >> [ 0.312872] change_page_attr_set_clr+0x220/0x3c0 > >> [ 0.312873] ? flush_tlb_one_kernel+0xf/0x30 > >> [ 0.312875] ? set_pte_vaddr_p4d+0x110/0x180 > >> [ 0.312877] ? __pfx_change_page_attr_set_clr+0x10/0x10 > >> [ 0.312878] ? __pfx_set_pte_vaddr_p4d+0x10/0x10 > >> [ 0.312881] ? __pfx_mtree_load+0x10/0x10 > >> [ 0.312883] ? 
__pfx_mtree_load+0x10/0x10 > >> [ 0.312884] ? __asan_memcpy+0x3c/0x60 > >> [ 0.312886] ? set_intr_gate+0x10c/0x150 > >> [ 0.312888] set_memory_ro+0x76/0xa0 > >> [ 0.312889] ? __pfx_set_memory_ro+0x10/0x10 > >> [ 0.312891] idt_setup_apic_and_irq_gates+0x2c1/0x390 > >> > >> and more. > > Ok, it's not the only place. Got your point. > > > >> off topic - if we were to handle only alloc_page_ext() specifically, > >> what would be the most straightforward > >> > >> solution in your mind? I'd really appreciate your insight. > > I was thinking if it's the only special case maybe we can handle it > > somehow differently, like we do when we allocate obj_ext vectors for > > slabs using __GFP_NO_OBJ_EXT. I haven't found a good solution yet but > > since it's not a special case we would not be able to use it even if I > > came up with something... > > I think your way is the most straight-forward but please try my > > suggestion to see if we can avoid extra overhead. > > Thanks, > > Suren. > Hi Hao, > Hi Suren > > Thank you for your feedback. After re-examining this issue, > > I realize my previous focus was misplaced. > > Upon deeper consideration, I understand that this is not merely a bug, > > but rather a warning that indicates a gap in our memory profiling mechanism. > > Specifically, the current implementation appears to be missing memory > allocation > > tracking during the period between the buddy system allocation and page_ext > > initialization. > > This profiling gap means we may not be capturing all relevant memory > allocation > > events during this critical transition phase. Correct, this limitation exists because memory profiling relies on some kernel facilities (page_ext, obj_ext) which might not be initialized yet at the time of allocation. > > My approach is to dynamically allocate codetag_ref when get_page_tag_ref > fails, > > and maintain a linked list to track all buddy system allocations that > occur prior to page_ext initialization.
> > However, this introduces performance concerns: > > 1. Free Path Overhead: When freeing these pages, we would need to > traverse the entire linked list to locate > > the corresponding codetag_ref, resulting in O(n) lookup complexity > per free operation. > > 2. Initialization Overhead: During init_page_alloc_tagging, iterating > through the linked list to assign codetag_ref to > > page_ext would introduce additional traversal cost. > > If the number of pages is substantial, this could incur significant > overhead. What are your thoughts on this? I look forward to your > suggestions. My thinking is that these early allocations comprise a small portion of overall memory consumed by the system. So, instead of trying to record and handle them in some alternative way, we just accept that some counters might not be exactly accurate and ignore those early allocations. See how the early slab allocations are marked with the CODETAG_FLAG_INACCURATE flag and later reported as inaccurate. I think that's an acceptable alternative to introducing extra complexity and performance overhead. IOW, the benefits of accounting for these early allocations are low compared to the effort required to account for them. Unless you found a simple and performant way to do that... I think your earlier patch can effectively detect these early allocations and suppress the warnings. We should also mark these allocations with CODETAG_FLAG_INACCURATE. Thanks, Suren. > > > Thanks > > Hao > > > > >> Thanks. > >> > >> > >>>>>> If the slab cache has no free objects, it falls back > >>>>>> to the buddy allocator to allocate memory. However, at this point page_ext > >>>>>> is not yet fully initialized, so these newly allocated pages have no > >>>>>> codetag set. These pages may later be reclaimed by KASAN, which causes > >>>>>> the warning to trigger when they are freed because their codetag ref is > >>>>>> still empty.
> >>>>>> > >>>>>> Use a global array to track pages allocated before page_ext is fully > >>>>>> initialized, similar to how kmemleak tracks early allocations. > >>>>>> When page_ext initialization completes, set their codetag > >>>>>> to empty to avoid warnings when they are freed later. > >>>>>> > >>>>>> ... > >>>>>> > >>>>>> --- a/include/linux/alloc_tag.h > >>>>>> +++ b/include/linux/alloc_tag.h > >>>>>> @@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref) > >>>>>> > >>>>>> #ifdef CONFIG_MEM_ALLOC_PROFILING > >>>>>> > >>>>>> +bool mem_profiling_is_available(void); > >>>>>> +void alloc_tag_add_early_pfn(unsigned long pfn); > >>>>>> + > >>>>>> #define ALLOC_TAG_SECTION_NAME "alloc_tags" > >>>>>> > >>>>>> struct codetag_bytes { > >>>>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c > >>>>>> index 58991ab09d84..a5bf4e72c154 100644 > >>>>>> --- a/lib/alloc_tag.c > >>>>>> +++ b/lib/alloc_tag.c > >>>>>> @@ -6,6 +6,7 @@ > >>>>>> #include <linux/kallsyms.h> > >>>>>> #include <linux/module.h> > >>>>>> #include <linux/page_ext.h> > >>>>>> +#include <linux/pgalloc_tag.h> > >>>>>> #include <linux/proc_fs.h> > >>>>>> #include <linux/seq_buf.h> > >>>>>> #include <linux/seq_file.h> > >>>>>> @@ -26,6 +27,82 @@ static bool mem_profiling_support; > >>>>>> > >>>>>> static struct codetag_type *alloc_tag_cttype; > >>>>>> > >>>>>> +/* > >>>>>> + * State of the alloc_tag > >>>>>> + * > >>>>>> + * This is used to describe the states of the alloc_tag during bootup. > >>>>>> + * > >>>>>> + * When we need to allocate page_ext to store codetag, we face an > >>>>>> + * initialization timing problem: > >>>>>> + * > >>>>>> + * Due to initialization order, pages may be allocated via buddy system > >>>>>> + * before page_ext is fully allocated and initialized. Although these > >>>>>> + * pages call the allocation hooks, the codetag will not be set because > >>>>>> + * page_ext is not yet available. 
> >>>>>> + * > >>>>>> + * When these pages are later free to the buddy system, it triggers > >>>>>> + * warnings because their codetag is actually empty if > >>>>>> + * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled. > >>>>>> + * > >>>>>> + * Additionally, in this situation, we cannot record detailed allocation > >>>>>> + * information for these pages. > >>>>>> + */ > >>>>>> +enum mem_profiling_state { > >>>>>> + DOWN, /* No mem_profiling functionality yet */ > >>>>>> + UP /* Everything is working */ > >>>>>> +}; > >>>>>> + > >>>>>> +static enum mem_profiling_state mem_profiling_state = DOWN; > >>>>>> + > >>>>>> +bool mem_profiling_is_available(void) > >>>>>> +{ > >>>>>> + return mem_profiling_state == UP; > >>>>>> +} > >>>>>> + > >>>>>> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG > >>>>>> + > >>>>>> +#define EARLY_ALLOC_PFN_MAX 256 > >>>>>> + > >>>>>> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX]; > >>>>> It's unfortunate that this isn't __initdata. > >>>>> > >>>>>> +static unsigned int early_pfn_count; > >>>>>> +static DEFINE_SPINLOCK(early_pfn_lock); > >>>>>> + > >>>>>> > >>>>>> ... > >>>>>> > >>>>>> --- a/mm/page_alloc.c > >>>>>> +++ b/mm/page_alloc.c > >>>>>> @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task, > >>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > >>>>>> update_page_tag_ref(handle, &ref); > >>>>>> put_page_tag_ref(handle); > >>>>>> + } else { > >>>> This branch can be marked as "unlikely". > >>>> > >>>>>> + /* > >>>>>> + * page_ext is not available yet, record the pfn so we can > >>>>>> + * clear the tag ref later when page_ext is initialized. > >>>>>> + */ > >>>>>> + if (!mem_profiling_is_available()) > >>>>>> + alloc_tag_add_early_pfn(page_to_pfn(page)); > >>>>>> } > >>>>>> } > >>>>> All because of this, I believe. Is this fixable? > >>>>> > >>>>> If we take that `else', we know we're running in __init code, yes? I > >>>>> don't see how `__init pgalloc_tag_add_early()' could be made to work. 
> >>>>> hrm. Something clever, please. > >>>> We can have a pointer to a function that is initialized to point to > >>>> alloc_tag_add_early_pfn, which is defined as __init and uses > >>>> early_pfns which now can be defined as __initdata. After > >>>> clear_early_alloc_pfn_tag_refs() is done we reset that pointer to > >>>> NULL. __pgalloc_tag_add() instead of calling alloc_tag_add_early_pfn() > >>>> directly checks that pointer and if it's not NULL then calls the > >>>> function that it points to. This way __pgalloc_tag_add() which is not > >>>> an __init function will be invoking alloc_tag_add_early_pfn() __init > >>>> function only until we are done with initialization. I haven't tried > >>>> this but I think that should work. This also eliminates the need for > >>>> mem_profiling_state variable since we can use this function pointer > >>>> instead.
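Suren's recommendation in the 2026-03-23 reply above (keep detecting early allocations so the free path stays quiet, and mark the affected counters with CODETAG_FLAG_INACCURATE) can be modeled roughly as follows. This is a simplified userspace sketch, not kernel code. struct alloc_tag and union codetag_ref are cut down to the fields needed, CODETAG_EMPTY mirrors the kernel's `((void *)1)` sentinel that set_codetag_empty() stores under CONFIG_MEM_ALLOC_PROFILING_DEBUG, the flag's bit value here is arbitrary, and the helpers early_alloc(), clear_early_ref() and tag_sub() are hypothetical. Flagging the allocating call site's tag is only one reading of "mark these allocations"; the thread does not say exactly where the flag would be set.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel types. */
struct alloc_tag {
	unsigned int flags;
	long bytes;
};

union codetag_ref {
	struct alloc_tag *ct;
};

#define CODETAG_EMPTY ((struct alloc_tag *)1)	/* "intentionally not recorded" */
#define CODETAG_FLAG_INACCURATE 0x1u		/* bit value arbitrary in this model */

static int not_set_warnings;	/* counts "alloc_tag was not set" hits */

/* Early allocation: no ref slot exists yet, so flag the caller's tag instead. */
static void early_alloc(struct alloc_tag *tag)
{
	if (tag)
		tag->flags |= CODETAG_FLAG_INACCURATE;
}

/* Once page_ext is up: mark a recorded early page's ref as intentionally empty. */
static void clear_early_ref(union codetag_ref *ref)
{
	ref->ct = CODETAG_EMPTY;
}

/* Free path with debug checking, loosely modeled on alloc_tag_sub(). */
static void tag_sub(union codetag_ref *ref, long bytes)
{
	if (ref->ct == CODETAG_EMPTY) {
		ref->ct = NULL;		/* nothing was accounted, so nothing to undo */
		return;
	}
	if (ref->ct == NULL) {
		not_set_warnings++;	/* corresponds to the WARN in the original report */
		return;
	}
	ref->ct->bytes -= bytes;
	ref->ct = NULL;
}
```

Without the empty sentinel, an early page looks exactly like a genuinely missed ref, which is what triggers the warning; the flag on the tag then lets the counter be reported as inaccurate, as Suren describes for early slab allocations.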
* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization 2026-03-23 22:47 ` Suren Baghdasaryan @ 2026-03-24 9:43 ` Hao Ge 2026-03-25 0:21 ` Suren Baghdasaryan 0 siblings, 1 reply; 21+ messages in thread From: Hao Ge @ 2026-03-24 9:43 UTC (permalink / raw) To: Suren Baghdasaryan; +Cc: Andrew Morton, Kent Overstreet, linux-mm, linux-kernel On 2026/3/24 06:47, Suren Baghdasaryan wrote: > On Mon, Mar 23, 2026 at 2:16 AM Hao Ge <hao.ge@linux.dev> wrote: >> >> On 2026/3/20 10:14, Suren Baghdasaryan wrote: >>> On Thu, Mar 19, 2026 at 6:58 PM Hao Ge <hao.ge@linux.dev> wrote: >>>> On 2026/3/20 07:48, Suren Baghdasaryan wrote: >>>>> On Thu, Mar 19, 2026 at 4:44 PM Suren Baghdasaryan <surenb@google.com> wrote: >>>>>> On Thu, Mar 19, 2026 at 3:28 PM Andrew Morton <akpm@linux-foundation.org> wrote: >>>>>>> On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote: >>>>>>> >>>>>>>> Due to initialization ordering, page_ext is allocated and initialized >>>>>>>> relatively late during boot. Some pages have already been allocated >>>>>>>> and freed before page_ext becomes available, leaving their codetag >>>>>>>> uninitialized. >>>>>> Hi Hao, >>>>>> Thanks for the report. >>>>>> Hmm. So, we are allocating pages before page_ext is initialized... >>>>>> >>>>>>>> A clear example is in init_section_page_ext(): alloc_page_ext() calls >>>>>>>> kmemleak_alloc(). >>>>> Forgot to ask. The example you are using here is for page_ext >>>>> allocation itself. Do you have any other examples where page >>>>> allocation happens before page_ext initialization? If that's the only >>>>> place, then we might be able to fix this in a simpler way by doing >>>>> something special for alloc_page_ext(). 
>>>> Hi Suren >>>> >>>> To help illustrate the point, here's the debug log I added: >>>> >>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>> index 2d4b6f1a554e..ebfe636f5b07 100644 >>>> --- a/mm/page_alloc.c >>>> +++ b/mm/page_alloc.c >>>> @@ -1293,6 +1293,9 @@ void __pgalloc_tag_add(struct page *page, struct >>>> task_struct *task, >>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); >>>> update_page_tag_ref(handle, &ref); >>>> put_page_tag_ref(handle); >>>> + } else { >>>> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! >>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); >>>> + dump_stack(); >>>> } >>>> } >>>> >>>> >>>> And I caught the following logs: >>>> >>>> [ 0.296399] __pgalloc_tag_add: get_page_tag_ref failed! >>>> page=ffffea000400c700 pfn=1049372 nr=1 >>>> [ 0.296400] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) >>>> [ 0.296402] Hardware name: Red Hat KVM, BIOS >>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>> [ 0.296402] Call Trace: >>>> [ 0.296403] <TASK> >>>> [ 0.296403] dump_stack_lvl+0x53/0x70 >>>> [ 0.296405] __pgalloc_tag_add+0x3a3/0x6e0 >>>> [ 0.296406] ? __pfx___pgalloc_tag_add+0x10/0x10 >>>> [ 0.296407] ? kasan_unpoison+0x27/0x60 >>>> [ 0.296409] ? __kasan_unpoison_pages+0x2c/0x40 >>>> [ 0.296411] get_page_from_freelist+0xa54/0x1310 >>>> [ 0.296413] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>> [ 0.296415] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 >>>> [ 0.296417] ? stack_depot_save_flags+0x3f/0x680 >>>> [ 0.296418] ? ___slab_alloc+0x518/0x530 >>>> [ 0.296420] alloc_pages_mpol+0x13a/0x3f0 >>>> [ 0.296421] ? __pfx_alloc_pages_mpol+0x10/0x10 >>>> [ 0.296423] ? _raw_spin_lock_irqsave+0x8a/0xf0 >>>> [ 0.296424] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 >>>> [ 0.296426] alloc_slab_page+0xc2/0x130 >>>> [ 0.296427] allocate_slab+0x77/0x2c0 >>>> [ 0.296429] ? syscall_enter_define_fields+0x3bb/0x5f0 >>>> [ 0.296430] ___slab_alloc+0x125/0x530 >>>> [ 0.296432] ? 
__trace_define_field+0x252/0x3d0 >>>> [ 0.296433] __kmalloc_noprof+0x329/0x630 >>>> [ 0.296435] ? syscall_enter_define_fields+0x3bb/0x5f0 >>>> [ 0.296436] syscall_enter_define_fields+0x3bb/0x5f0 >>>> [ 0.296438] ? __pfx_syscall_enter_define_fields+0x10/0x10 >>>> [ 0.296440] event_define_fields+0x326/0x540 >>>> [ 0.296441] __trace_early_add_events+0xac/0x3c0 >>>> [ 0.296443] trace_event_init+0x24c/0x460 >>>> [ 0.296445] trace_init+0x9/0x20 >>>> [ 0.296446] start_kernel+0x199/0x3c0 >>>> [ 0.296448] x86_64_start_reservations+0x18/0x30 >>>> [ 0.296449] x86_64_start_kernel+0xe2/0xf0 >>>> [ 0.296451] common_startup_64+0x13e/0x141 >>>> [ 0.296453] </TASK> >>>> >>>> >>>> [ 0.312234] __pgalloc_tag_add: get_page_tag_ref failed! >>>> page=ffffea000400f900 pfn=1049572 nr=1 >>>> [ 0.312234] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) >>>> [ 0.312236] Hardware name: Red Hat KVM, BIOS >>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>> [ 0.312236] Call Trace: >>>> [ 0.312237] <TASK> >>>> [ 0.312237] dump_stack_lvl+0x53/0x70 >>>> [ 0.312239] __pgalloc_tag_add+0x3a3/0x6e0 >>>> [ 0.312240] ? __pfx___pgalloc_tag_add+0x10/0x10 >>>> [ 0.312241] ? rmqueue.constprop.0+0x4fc/0x1ce0 >>>> [ 0.312243] ? kasan_unpoison+0x27/0x60 >>>> [ 0.312244] ? __kasan_unpoison_pages+0x2c/0x40 >>>> [ 0.312246] get_page_from_freelist+0xa54/0x1310 >>>> [ 0.312248] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>> [ 0.312250] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 >>>> [ 0.312253] alloc_slab_page+0x39/0x130 >>>> [ 0.312254] allocate_slab+0x77/0x2c0 >>>> [ 0.312255] ? alloc_cpumask_var_node+0xc7/0x230 >>>> [ 0.312257] ___slab_alloc+0x46d/0x530 >>>> [ 0.312259] __kmalloc_node_noprof+0x2fa/0x680 >>>> [ 0.312261] ? alloc_cpumask_var_node+0xc7/0x230 >>>> [ 0.312263] alloc_cpumask_var_node+0xc7/0x230 >>>> [ 0.312264] init_desc+0x141/0x6b0 >>>> [ 0.312266] alloc_desc+0x108/0x1b0 >>>> [ 0.312267] early_irq_init+0xee/0x1c0 >>>> [ 0.312268] ? 
__pfx_early_irq_init+0x10/0x10 >>>> [ 0.312271] start_kernel+0x1ab/0x3c0 >>>> [ 0.312272] x86_64_start_reservations+0x18/0x30 >>>> [ 0.312274] x86_64_start_kernel+0xe2/0xf0 >>>> [ 0.312275] common_startup_64+0x13e/0x141 >>>> [ 0.312277] </TASK> >>>> >>>> [ 0.312834] __pgalloc_tag_add: get_page_tag_ref failed! >>>> page=ffffea000400fc00 pfn=1049584 nr=1 >>>> [ 0.312835] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) >>>> [ 0.312836] Hardware name: Red Hat KVM, BIOS >>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>> [ 0.312837] Call Trace: >>>> [ 0.312837] <TASK> >>>> [ 0.312838] dump_stack_lvl+0x53/0x70 >>>> [ 0.312840] __pgalloc_tag_add+0x3a3/0x6e0 >>>> [ 0.312841] ? __pfx___pgalloc_tag_add+0x10/0x10 >>>> [ 0.312842] ? rmqueue.constprop.0+0x4fc/0x1ce0 >>>> [ 0.312844] ? kasan_unpoison+0x27/0x60 >>>> [ 0.312845] ? __kasan_unpoison_pages+0x2c/0x40 >>>> [ 0.312847] get_page_from_freelist+0xa54/0x1310 >>>> [ 0.312849] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>> [ 0.312851] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 >>>> [ 0.312853] alloc_pages_mpol+0x13a/0x3f0 >>>> [ 0.312855] ? __pfx_alloc_pages_mpol+0x10/0x10 >>>> [ 0.312856] ? xas_find+0x2d8/0x450 >>>> [ 0.312858] ? _raw_spin_lock+0x84/0xe0 >>>> [ 0.312859] ? __pfx__raw_spin_lock+0x10/0x10 >>>> [ 0.312861] alloc_pages_noprof+0xf6/0x2b0 >>>> [ 0.312862] __change_page_attr+0x293/0x850 >>>> [ 0.312864] ? __pfx___change_page_attr+0x10/0x10 >>>> [ 0.312865] ? _vm_unmap_aliases+0x2d0/0x650 >>>> [ 0.312868] ? __pfx__vm_unmap_aliases+0x10/0x10 >>>> [ 0.312869] __change_page_attr_set_clr+0x16c/0x360 >>>> [ 0.312871] ? spp_getpage+0xbb/0x1e0 >>>> [ 0.312872] change_page_attr_set_clr+0x220/0x3c0 >>>> [ 0.312873] ? flush_tlb_one_kernel+0xf/0x30 >>>> [ 0.312875] ? set_pte_vaddr_p4d+0x110/0x180 >>>> [ 0.312877] ? __pfx_change_page_attr_set_clr+0x10/0x10 >>>> [ 0.312878] ? __pfx_set_pte_vaddr_p4d+0x10/0x10 >>>> [ 0.312881] ? __pfx_mtree_load+0x10/0x10 >>>> [ 0.312883] ? 
__pfx_mtree_load+0x10/0x10 >>>> [ 0.312884] ? __asan_memcpy+0x3c/0x60 >>>> [ 0.312886] ? set_intr_gate+0x10c/0x150 >>>> [ 0.312888] set_memory_ro+0x76/0xa0 >>>> [ 0.312889] ? __pfx_set_memory_ro+0x10/0x10 >>>> [ 0.312891] idt_setup_apic_and_irq_gates+0x2c1/0x390 >>>> >>>> and more. >>> Ok, it's not the only place. Got your point. >>> >>>> off topic - if we were to handle only alloc_page_ext() specifically, >>>> what would be the most straightforward >>>> >>>> solution in your mind? I'd really appreciate your insight. >>> I was thinking if it's the only special case maybe we can handle it >>> somehow differently, like we do when we allocate obj_ext vectors for >>> slabs using __GFP_NO_OBJ_EXT. I haven't found a good solution yet but >>> since it's not a special case we would not be able to use it even if I >>> came up with something... >>> I think your way is the most straight-forward but please try my >>> suggestion to see if we can avoid extra overhead. >>> Thanks, >>> Suren. Hi Suren > Hi Hao, > >> Hi Suren >> >> Thank you for your feedback. After re-examining this issue, >> >> I realize my previous focus was misplaced. >> >> Upon deeper consideration, I understand that this is not merely a bug, >> >> but rather a warning that indicates a gap in our memory profiling mechanism. >> >> Specifically, the current implementation appears to be missing memory >> allocation >> >> tracking during the period between the buddy system allocation and page_ext >> >> initialization. >> >> This profiling gap means we may not be capturing all relevant memory >> allocation >> >> events during this critical transition phase. > Correct, this limitation exists because memory profiling relies on > some kernel facilities (page_ext, obj_ext) which might not be > initialized yet at the time of allocation.
> >> My approach is to dynamically allocate codetag_ref when get_page_tag_ref >> fails, >> >> and maintain a linked list to track all buddy system allocations that >> occur prior to page_ext initialization. >> >> However, this introduces performance concerns: >> >> 1. Free Path Overhead: When freeing these pages, we would need to >> traverse the entire linked list to locate >> >> the corresponding codetag_ref, resulting in O(n) lookup complexity >> per free operation. >> >> 2. Initialization Overhead: During init_page_alloc_tagging, iterating >> through the linked list to assign codetag_ref to >> >> page_ext would introduce additional traversal cost. >> >> If the number of pages is substantial, this could incur significant >> overhead. What are your thoughts on this? I look forward to your >> suggestions. > My thinking is that these early allocations comprise a small portion > of overall memory consumed by the system. So, instead of trying to > record and handle them in some alternative way, we just accept that > some counters might not be exactly accurate and ignore those early > allocations. See how the early slab allocations are marked with the > CODETAG_FLAG_INACCURATE flag and later reported as inaccurate. I think > that's an acceptable alternative to introducing extra complexity and > performance overhead. IOW, the benefits of accounting for these early > allocations are low compared to the effort required to account for > them. Unless you found a simple and performant way to do that... I have been exploring possible solutions to this issue over the past few days, but so far I have not come up with a good approach. I counted the page allocations that occur before page_ext is allocated and initialized, and found that there are quite a lot of them. As with my earlier debug patch, I made the following changes and collected the corresponding logs.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2d4b6f1a554e..6db65b3d52d3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1293,6 +1293,8 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
 		alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
 		update_page_tag_ref(handle, &ref);
 		put_page_tag_ref(handle);
+	} else {
+		pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr);
 	}
 }
 
@@ -1314,6 +1316,8 @@ void __pgalloc_tag_sub(struct page *page, unsigned int nr)
 		alloc_tag_sub(&ref, PAGE_SIZE * nr);
 		update_page_tag_ref(handle, &ref);
 		put_page_tag_ref(handle);
+	} else {
+		pr_warn("__pgalloc_tag_sub: get_page_tag_ref failed! page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr);
 	}
 }

[ 0.261699] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001000 pfn=1048640 nr=2
[ 0.261711] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001100 pfn=1048644 nr=4
[ 0.261717] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001200 pfn=1048648 nr=4
[ 0.261721] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001300 pfn=1048652 nr=4
[ 0.261893] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001080 pfn=1048642 nr=2
[ 0.261917] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001400 pfn=1048656 nr=4
[ 0.262018] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001500 pfn=1048660 nr=2
[ 0.262024] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001600 pfn=1048664 nr=8
[ 0.262040] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001580 pfn=1048662 nr=1
[ 0.262048] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea00040015c0 pfn=1048663 nr=1
[ 0.262056] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001800 pfn=1048672 nr=2
[ 0.262064] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001880 pfn=1048674 nr=2
[ 0.262078] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001900 pfn=1048676 nr=2
[ 0.262196] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
[ 0.262213] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001980 pfn=1048678 nr=2
[ 0.262220] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001a00 pfn=1048680 nr=4
[ 0.262246] ODEBUG: selftest passed
[ 0.262268] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001b00 pfn=1048684 nr=1
[ 0.262318] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001b40 pfn=1048685 nr=1
[ 0.262368] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001b80 pfn=1048686 nr=1
[ 0.262418] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001bc0 pfn=1048687 nr=1
[ 0.262469] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001c00 pfn=1048688 nr=1
[ 0.262519] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001c40 pfn=1048689 nr=1
[ 0.262569] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001c80 pfn=1048690 nr=1
[ 0.262620] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001cc0 pfn=1048691 nr=1
[ 0.262670] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001d00 pfn=1048692 nr=1
[ 0.262721] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001d40 pfn=1048693 nr=1
[ 0.262771] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001d80 pfn=1048694 nr=1
[ 0.262821] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001dc0 pfn=1048695 nr=1
[ 0.262871] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001e00 pfn=1048696 nr=1
[ 0.262923] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001e40 pfn=1048697 nr=1
[ 0.262974] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001e80 pfn=1048698 nr=1
[ 0.263024] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001ec0 pfn=1048699 nr=1
[ 0.263074] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001f00 pfn=1048700 nr=1
[ 0.263124] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001f40 pfn=1048701 nr=1
[ 0.263174] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001f80 pfn=1048702 nr=1
[ 0.263224] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001fc0 pfn=1048703 nr=1
[ 0.263275] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002000 pfn=1048704 nr=1
[ 0.263325] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002040 pfn=1048705 nr=1
[ 0.263375] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002080 pfn=1048706 nr=1
[ 0.263427] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002400 pfn=1048720 nr=16
[ 0.263437] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea00040020c0 pfn=1048707 nr=1
[ 0.263463] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002100 pfn=1048708 nr=1
[ 0.263465] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002140 pfn=1048709 nr=1
[ 0.263467] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002180 pfn=1048710 nr=1
[ 0.263509] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002200 pfn=1048712 nr=4
[ 0.263512] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002800 pfn=1048736 nr=8
[ 0.263524] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea00040021c0 pfn=1048711 nr=1
[ 0.263536] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002300 pfn=1048716 nr=1
[ 0.263537] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002340 pfn=1048717 nr=1
[ 0.263539] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002380 pfn=1048718 nr=1
[ 0.263604] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004004000 pfn=1048832 nr=128
[ 0.263638] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004003000 pfn=1048768 nr=64
[ 0.263650] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002c00 pfn=1048752 nr=16
[ 0.263655] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea00040023c0 pfn=1048719 nr=1
[ 0.270582] __pgalloc_tag_sub: get_page_tag_ref failed! page=ffffea00040023c0 pfn=1048719 nr=1
[ 0.270591] ftrace: allocating 52717 entries in 208 pages
[ 0.270592] ftrace: allocated 208 pages with 3 groups
[ 0.270620] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002a00 pfn=1048744 nr=8
[ 0.270636] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea00040023c0 pfn=1048719 nr=1
[ 0.270643] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006000 pfn=1048960 nr=1
[ 0.270649] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006040 pfn=1048961 nr=1
[ 0.270658] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004007000 pfn=1049024 nr=64
[ 0.270659] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006080 pfn=1048962 nr=2
[ 0.270722] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006100 pfn=1048964 nr=1
[ 0.270730] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006140 pfn=1048965 nr=1
[ 0.270738] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006180 pfn=1048966 nr=1
[ 0.270777] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea00040061c0 pfn=1048967 nr=1
[ 0.270786] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006200 pfn=1048968 nr=1
[ 0.270792] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006240 pfn=1048969 nr=1
[ 0.270833] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006300 pfn=1048972 nr=4
[ 0.270891] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006280 pfn=1048970 nr=1
[ 0.270980] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea00040062c0 pfn=1048971 nr=1
[ 0.271071] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006400 pfn=1048976 nr=1
[ 0.271156] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006440 pfn=1048977 nr=1
[ 0.271185] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006480 pfn=1048978 nr=2
[ 0.271301] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006500 pfn=1048980 nr=1
[ 0.271655] Dynamic Preempt: lazy
[ 0.271662] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006580 pfn=1048982 nr=2
[ 0.271752] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006600 pfn=1048984 nr=4
[ 0.271762] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004010000 pfn=1049600 nr=4
[ 0.271824] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006540 pfn=1048981 nr=1
[ 0.271916] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006700 pfn=1048988 nr=2
[ 0.271964] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006780 pfn=1048990 nr=1
[ 0.272099] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea00040067c0 pfn=1048991 nr=1
[ 0.272138] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006800 pfn=1048992 nr=2
[ 0.272144] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006a00 pfn=1049000 nr=8
[ 0.272249] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006c00 pfn=1049008 nr=8
[ 0.272319] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006880 pfn=1048994 nr=2
[ 0.272351] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006900 pfn=1048996 nr=4
[ 0.272424] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006e00 pfn=1049016 nr=8
[ 0.272485] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008000 pfn=1049088 nr=8
[ 0.272535] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008200 pfn=1049096 nr=2
[ 0.272600] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008400 pfn=1049104 nr=8
[ 0.272663] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008300 pfn=1049100 nr=4
[ 0.272694] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008280 pfn=1049098 nr=2
[ 0.272708] __pgalloc_tag_add: get_page_tag_ref failed!
page=ffffea0004008600 pfn=1049112 nr=8 [ 0.272924] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008880 pfn=1049122 nr=2 [ 0.272934] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008900 pfn=1049124 nr=2 [ 0.272952] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008c00 pfn=1049136 nr=4 [ 0.273035] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008980 pfn=1049126 nr=2 [ 0.273062] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008e00 pfn=1049144 nr=8 [ 0.273674] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008d00 pfn=1049140 nr=1 [ 0.273884] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008d80 pfn=1049142 nr=2 [ 0.273943] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009000 pfn=1049152 nr=2 [ 0.274379] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009080 pfn=1049154 nr=2 [ 0.274575] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009200 pfn=1049160 nr=8 [ 0.274617] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009100 pfn=1049156 nr=4 [ 0.274794] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009400 pfn=1049168 nr=2 [ 0.274840] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009480 pfn=1049170 nr=2 [ 0.275057] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009500 pfn=1049172 nr=2 [ 0.275092] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009580 pfn=1049174 nr=2 [ 0.275134] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009600 pfn=1049176 nr=8 [ 0.275211] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009800 pfn=1049184 nr=4 [ 0.275510] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009900 pfn=1049188 nr=2 [ 0.275548] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009980 pfn=1049190 nr=2 [ 0.275976] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009a00 pfn=1049192 nr=8 [ 0.275987] __pgalloc_tag_add: get_page_tag_ref failed! 
page=ffffea0004009c00 pfn=1049200 nr=2 [ 0.276139] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009c80 pfn=1049202 nr=2 [ 0.276152] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008d40 pfn=1049141 nr=1 [ 0.276242] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009d00 pfn=1049204 nr=1 [ 0.276358] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009d40 pfn=1049205 nr=1 [ 0.276444] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009d80 pfn=1049206 nr=1 [ 0.276526] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009dc0 pfn=1049207 nr=1 [ 0.276615] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009e00 pfn=1049208 nr=1 [ 0.276696] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009e40 pfn=1049209 nr=1 [ 0.276792] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009e80 pfn=1049210 nr=1 [ 0.276827] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009f00 pfn=1049212 nr=2 [ 0.276891] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009ec0 pfn=1049211 nr=1 [ 0.276999] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009f80 pfn=1049214 nr=1 [ 0.277082] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009fc0 pfn=1049215 nr=1 [ 0.277172] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea000400a000 pfn=1049216 nr=1 [ 0.277257] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea000400a040 pfn=1049217 nr=1

and so on.

> I think your earlier patch can effectively detect these early
> allocations and suppress the warnings. We should also mark these
> allocations with CODETAG_FLAG_INACCURATE.

Thanks to an excellent AI review, I realized there are issues with my original patch. One problem is the fixed 256-element array. Another is that the tracking touches both the allocation and the free paths: we need to record entries in __pgalloc_tag_add and remove them again in __pgalloc_tag_sub, which introduces noticeable overhead.
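The two costs described above can be illustrated with a simplified, single-threaded user-space model of fixed-array tracking. This is a sketch, not the patch. The array size mirrors EARLY_ALLOC_PFN_MAX from the original patch. The names early_pfn_record(), early_pfn_forget() and early_pfn_dropped are hypothetical. The patch's early_pfn_lock spinlock is omitted. Removal on free matters because a PFN can be recycled before page_ext is ready: in the log above, pfn 1048719 is allocated, freed, and allocated again.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors EARLY_ALLOC_PFN_MAX from the original patch. */
#define EARLY_ALLOC_PFN_MAX 256

static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX];
static unsigned int early_pfn_count;
/* Hypothetical counter: early pages that could not be tracked. */
static unsigned long early_pfn_dropped;

/*
 * Allocation side (would run from __pgalloc_tag_add): an O(1) append,
 * but pages are silently left untracked once the array is full.
 */
static void early_pfn_record(unsigned long pfn)
{
	if (early_pfn_count >= EARLY_ALLOC_PFN_MAX) {
		early_pfn_dropped++;
		return;
	}
	early_pfns[early_pfn_count++] = pfn;
}

/*
 * Free side (would run from __pgalloc_tag_sub): a linear search,
 * so every free of a tracked page costs O(n).
 * Returns true if the pfn was found and removed.
 */
static bool early_pfn_forget(unsigned long pfn)
{
	for (unsigned int i = 0; i < early_pfn_count; i++) {
		if (early_pfns[i] == pfn) {
			/* Move the last entry into the hole to keep the array dense. */
			early_pfns[i] = early_pfns[--early_pfn_count];
			return true;
		}
	}
	return false;
}
```

In this model, recording stays cheap. However, every free of a tracked page pays for a scan, and once 256 entries are in use, further early pages are simply not tracked.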
I'm wondering whether we could instead set a bit in the page flags of pages allocated during early boot, before page_ext is available; I'll refer to it as EARLY_ALLOC_FLAGS. Then, in __pgalloc_tag_sub, we first check EARLY_ALLOC_FLAGS: if it is set, we clear it and return immediately; otherwise, we subtract from the tag count as usual. This approach seems somewhat similar to the idea behind mem_profiling_compressed.

I would appreciate your feedback and any better suggestions you might have.

Thanks
Hao

> Thanks,
> Suren.
>
>>
>> Thanks
>>
>> Hao
>>
>>>> Thanks.
>>>>
>>>>
>>>>>>>> If the slab cache has no free objects, it falls back
>>>>>>>> to the buddy allocator to allocate memory. However, at this point page_ext
>>>>>>>> is not yet fully initialized, so these newly allocated pages have no
>>>>>>>> codetag set. These pages may later be reclaimed by KASAN, which causes
>>>>>>>> the warning to trigger when they are freed because their codetag ref is
>>>>>>>> still empty.
>>>>>>>>
>>>>>>>> Use a global array to track pages allocated before page_ext is fully
>>>>>>>> initialized, similar to how kmemleak tracks early allocations.
>>>>>>>> When page_ext initialization completes, set their codetag
>>>>>>>> to empty to avoid warnings when they are freed later.
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>> --- a/include/linux/alloc_tag.h
>>>>>>>> +++ b/include/linux/alloc_tag.h
>>>>>>>> @@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref)
>>>>>>>>
>>>>>>>> #ifdef CONFIG_MEM_ALLOC_PROFILING
>>>>>>>>
>>>>>>>> +bool mem_profiling_is_available(void);
>>>>>>>> +void alloc_tag_add_early_pfn(unsigned long pfn);
>>>>>>>> +
>>>>>>>> #define ALLOC_TAG_SECTION_NAME "alloc_tags"
>>>>>>>>
>>>>>>>> struct codetag_bytes {
>>>>>>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>>>>>>>> index 58991ab09d84..a5bf4e72c154 100644
>>>>>>>> --- a/lib/alloc_tag.c
>>>>>>>> +++ b/lib/alloc_tag.c
>>>>>>>> @@ -6,6 +6,7 @@
>>>>>>>> #include <linux/kallsyms.h>
>>>>>>>> #include <linux/module.h>
>>>>>>>> #include <linux/page_ext.h>
>>>>>>>> +#include <linux/pgalloc_tag.h>
>>>>>>>> #include <linux/proc_fs.h>
>>>>>>>> #include <linux/seq_buf.h>
>>>>>>>> #include <linux/seq_file.h>
>>>>>>>> @@ -26,6 +27,82 @@ static bool mem_profiling_support;
>>>>>>>>
>>>>>>>> static struct codetag_type *alloc_tag_cttype;
>>>>>>>>
>>>>>>>> +/*
>>>>>>>> + * State of the alloc_tag
>>>>>>>> + *
>>>>>>>> + * This is used to describe the states of the alloc_tag during bootup.
>>>>>>>> + *
>>>>>>>> + * When we need to allocate page_ext to store codetag, we face an
>>>>>>>> + * initialization timing problem:
>>>>>>>> + *
>>>>>>>> + * Due to initialization order, pages may be allocated via buddy system
>>>>>>>> + * before page_ext is fully allocated and initialized. Although these
>>>>>>>> + * pages call the allocation hooks, the codetag will not be set because
>>>>>>>> + * page_ext is not yet available.
>>>>>>>> + *
>>>>>>>> + * When these pages are later free to the buddy system, it triggers
>>>>>>>> + * warnings because their codetag is actually empty if
>>>>>>>> + * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled.
>>>>>>>> + *
>>>>>>>> + * Additionally, in this situation, we cannot record detailed allocation
>>>>>>>> + * information for these pages.
>>>>>>>> + */
>>>>>>>> +enum mem_profiling_state {
>>>>>>>> + DOWN, /* No mem_profiling functionality yet */
>>>>>>>> + UP /* Everything is working */
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +static enum mem_profiling_state mem_profiling_state = DOWN;
>>>>>>>> +
>>>>>>>> +bool mem_profiling_is_available(void)
>>>>>>>> +{
>>>>>>>> + return mem_profiling_state == UP;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
>>>>>>>> +
>>>>>>>> +#define EARLY_ALLOC_PFN_MAX 256
>>>>>>>> +
>>>>>>>> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX];
>>>>>>> It's unfortunate that this isn't __initdata.
>>>>>>>
>>>>>>>> +static unsigned int early_pfn_count;
>>>>>>>> +static DEFINE_SPINLOCK(early_pfn_lock);
>>>>>>>> +
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>> --- a/mm/page_alloc.c
>>>>>>>> +++ b/mm/page_alloc.c
>>>>>>>> @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
>>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
>>>>>>>> update_page_tag_ref(handle, &ref);
>>>>>>>> put_page_tag_ref(handle);
>>>>>>>> + } else {
>>>>>> This branch can be marked as "unlikely".
>>>>>>
>>>>>>>> + /*
>>>>>>>> + * page_ext is not available yet, record the pfn so we can
>>>>>>>> + * clear the tag ref later when page_ext is initialized.
>>>>>>>> + */
>>>>>>>> + if (!mem_profiling_is_available())
>>>>>>>> + alloc_tag_add_early_pfn(page_to_pfn(page));
>>>>>>>> }
>>>>>>>> }
>>>>>>> All because of this, I believe. Is this fixable?
>>>>>>>
>>>>>>> If we take that `else', we know we're running in __init code, yes? I
>>>>>>> don't see how `__init pgalloc_tag_add_early()' could be made to work.
>>>>>>> hrm. Something clever, please.
>>>>>> We can have a pointer to a function that is initialized to point to
>>>>>> alloc_tag_add_early_pfn, which is defined as __init and uses
>>>>>> early_pfns which now can be defined as __initdata. After
>>>>>> clear_early_alloc_pfn_tag_refs() is done we reset that pointer to
>>>>>> NULL.
__pgalloc_tag_add() instead of calling alloc_tag_add_early_pfn() >>>>>> directly checks that pointer and if it's not NULL then calls the >>>>>> function that it points to. This way __pgalloc_tag_add() which is not >>>>>> an __init function will be invoking alloc_tag_add_early_pfn() __init >>>>>> function only until we are done with initialization. I haven't tried >>>>>> this but I think that should work. This also eliminates the need for >>>>>> mem_profiling_state variable since we can use this function pointer >>>>>> instead. >>>>>> >>>>>>
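Suren's function-pointer suggestion can be modeled in user space roughly as follows. This is an untested sketch under stated assumptions. alloc_tag_add_early_pfn() and clear_early_alloc_pfn_tag_refs() are names used in the patch. The pointer alloc_tag_add_early_pfn_fn and the helper record_early_pfn_if_needed() are hypothetical. The __init/__initdata annotations appear only as comments because they have no meaning outside the kernel. A non-NULL pointer also stands in for the "profiling not yet available" state, which is why the mem_profiling_state variable becomes unnecessary.

```c
#include <assert.h>
#include <stddef.h>

#define EARLY_ALLOC_PFN_MAX 256

/* Would be __initdata in the kernel, so it is discarded after boot. */
static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX];
static unsigned int early_pfn_count;

/* Would be __init: only ever reached through the pointer below. */
static void alloc_tag_add_early_pfn(unsigned long pfn)
{
	if (early_pfn_count < EARLY_ALLOC_PFN_MAX)
		early_pfns[early_pfn_count++] = pfn;
}

/* Hypothetical pointer: non-NULL only until the early pfns are handled. */
static void (*alloc_tag_add_early_pfn_fn)(unsigned long pfn) =
	alloc_tag_add_early_pfn;

/*
 * Hypothetical helper for the non-__init allocation path: it reaches the
 * __init function only while the pointer is still set.
 */
static void record_early_pfn_if_needed(unsigned long pfn)
{
	void (*fn)(unsigned long pfn) = alloc_tag_add_early_pfn_fn;

	if (fn)
		fn(pfn);
}

/*
 * Models clear_early_alloc_pfn_tag_refs(): handle the recorded pfns,
 * then clear the pointer so later allocations never reach __init code.
 * Returns how many pfns were handled.
 */
static unsigned int clear_early_alloc_pfn_tag_refs(void)
{
	unsigned int handled = early_pfn_count;

	/* The kernel version would set each recorded page's tag ref to empty here. */
	early_pfn_count = 0;
	alloc_tag_add_early_pfn_fn = NULL;
	return handled;
}
```

In a real kernel version, the pointer lives in ordinary data but references __init text. It may therefore need an annotation such as __refdata to avoid a modpost section-mismatch warning, and concurrent callers would probably want READ_ONCE()/WRITE_ONCE() around the pointer. Both points are assumptions, not something tested in this thread.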
* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization
From: Suren Baghdasaryan @ 2026-03-25 0:21 UTC
To: Hao Ge; +Cc: Andrew Morton, Kent Overstreet, linux-mm, linux-kernel

On Tue, Mar 24, 2026 at 2:43 AM Hao Ge <hao.ge@linux.dev> wrote: > > > On 2026/3/24 06:47, Suren Baghdasaryan wrote: > > On Mon, Mar 23, 2026 at 2:16 AM Hao Ge <hao.ge@linux.dev> wrote: > >> > >> On 2026/3/20 10:14, Suren Baghdasaryan wrote: > >>> On Thu, Mar 19, 2026 at 6:58 PM Hao Ge <hao.ge@linux.dev> wrote: > >>>> On 2026/3/20 07:48, Suren Baghdasaryan wrote: > >>>>> On Thu, Mar 19, 2026 at 4:44 PM Suren Baghdasaryan <surenb@google.com> wrote: > >>>>>> On Thu, Mar 19, 2026 at 3:28 PM Andrew Morton <akpm@linux-foundation.org> wrote: > >>>>>>> On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote: > >>>>>>> > >>>>>>>> Due to initialization ordering, page_ext is allocated and initialized > >>>>>>>> relatively late during boot. Some pages have already been allocated > >>>>>>>> and freed before page_ext becomes available, leaving their codetag > >>>>>>>> uninitialized. > >>>>>> Hi Hao, > >>>>>> Thanks for the report. > >>>>>> Hmm. So, we are allocating pages before page_ext is initialized... > >>>>>> > >>>>>>>> A clear example is in init_section_page_ext(): alloc_page_ext() calls > >>>>>>>> kmemleak_alloc(). > >>>>> Forgot to ask. The example you are using here is for page_ext > >>>>> allocation itself. Do you have any other examples where page > >>>>> allocation happens before page_ext initialization? If that's the only > >>>>> place, then we might be able to fix this in a simpler way by doing > >>>>> something special for alloc_page_ext().
> >>>> Hi Suren > >>>> > >>>> To help illustrate the point, here's the debug log I added: > >>>> > >>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >>>> index 2d4b6f1a554e..ebfe636f5b07 100644 > >>>> --- a/mm/page_alloc.c > >>>> +++ b/mm/page_alloc.c > >>>> @@ -1293,6 +1293,9 @@ void __pgalloc_tag_add(struct page *page, struct > >>>> task_struct *task, > >>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > >>>> update_page_tag_ref(handle, &ref); > >>>> put_page_tag_ref(handle); > >>>> + } else { > >>>> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! > >>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > >>>> + dump_stack(); > >>>> } > >>>> } > >>>> > >>>> > >>>> And I caught the following logs: > >>>> > >>>> [ 0.296399] __pgalloc_tag_add: get_page_tag_ref failed! > >>>> page=ffffea000400c700 pfn=1049372 nr=1 > >>>> [ 0.296400] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > >>>> [ 0.296402] Hardware name: Red Hat KVM, BIOS > >>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>> [ 0.296402] Call Trace: > >>>> [ 0.296403] <TASK> > >>>> [ 0.296403] dump_stack_lvl+0x53/0x70 > >>>> [ 0.296405] __pgalloc_tag_add+0x3a3/0x6e0 > >>>> [ 0.296406] ? __pfx___pgalloc_tag_add+0x10/0x10 > >>>> [ 0.296407] ? kasan_unpoison+0x27/0x60 > >>>> [ 0.296409] ? __kasan_unpoison_pages+0x2c/0x40 > >>>> [ 0.296411] get_page_from_freelist+0xa54/0x1310 > >>>> [ 0.296413] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>> [ 0.296415] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > >>>> [ 0.296417] ? stack_depot_save_flags+0x3f/0x680 > >>>> [ 0.296418] ? ___slab_alloc+0x518/0x530 > >>>> [ 0.296420] alloc_pages_mpol+0x13a/0x3f0 > >>>> [ 0.296421] ? __pfx_alloc_pages_mpol+0x10/0x10 > >>>> [ 0.296423] ? _raw_spin_lock_irqsave+0x8a/0xf0 > >>>> [ 0.296424] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 > >>>> [ 0.296426] alloc_slab_page+0xc2/0x130 > >>>> [ 0.296427] allocate_slab+0x77/0x2c0 > >>>> [ 0.296429] ? 
syscall_enter_define_fields+0x3bb/0x5f0 > >>>> [ 0.296430] ___slab_alloc+0x125/0x530 > >>>> [ 0.296432] ? __trace_define_field+0x252/0x3d0 > >>>> [ 0.296433] __kmalloc_noprof+0x329/0x630 > >>>> [ 0.296435] ? syscall_enter_define_fields+0x3bb/0x5f0 > >>>> [ 0.296436] syscall_enter_define_fields+0x3bb/0x5f0 > >>>> [ 0.296438] ? __pfx_syscall_enter_define_fields+0x10/0x10 > >>>> [ 0.296440] event_define_fields+0x326/0x540 > >>>> [ 0.296441] __trace_early_add_events+0xac/0x3c0 > >>>> [ 0.296443] trace_event_init+0x24c/0x460 > >>>> [ 0.296445] trace_init+0x9/0x20 > >>>> [ 0.296446] start_kernel+0x199/0x3c0 > >>>> [ 0.296448] x86_64_start_reservations+0x18/0x30 > >>>> [ 0.296449] x86_64_start_kernel+0xe2/0xf0 > >>>> [ 0.296451] common_startup_64+0x13e/0x141 > >>>> [ 0.296453] </TASK> > >>>> > >>>> > >>>> [ 0.312234] __pgalloc_tag_add: get_page_tag_ref failed! > >>>> page=ffffea000400f900 pfn=1049572 nr=1 > >>>> [ 0.312234] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > >>>> [ 0.312236] Hardware name: Red Hat KVM, BIOS > >>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>> [ 0.312236] Call Trace: > >>>> [ 0.312237] <TASK> > >>>> [ 0.312237] dump_stack_lvl+0x53/0x70 > >>>> [ 0.312239] __pgalloc_tag_add+0x3a3/0x6e0 > >>>> [ 0.312240] ? __pfx___pgalloc_tag_add+0x10/0x10 > >>>> [ 0.312241] ? rmqueue.constprop.0+0x4fc/0x1ce0 > >>>> [ 0.312243] ? kasan_unpoison+0x27/0x60 > >>>> [ 0.312244] ? __kasan_unpoison_pages+0x2c/0x40 > >>>> [ 0.312246] get_page_from_freelist+0xa54/0x1310 > >>>> [ 0.312248] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>> [ 0.312250] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > >>>> [ 0.312253] alloc_slab_page+0x39/0x130 > >>>> [ 0.312254] allocate_slab+0x77/0x2c0 > >>>> [ 0.312255] ? alloc_cpumask_var_node+0xc7/0x230 > >>>> [ 0.312257] ___slab_alloc+0x46d/0x530 > >>>> [ 0.312259] __kmalloc_node_noprof+0x2fa/0x680 > >>>> [ 0.312261] ? 
alloc_cpumask_var_node+0xc7/0x230 > >>>> [ 0.312263] alloc_cpumask_var_node+0xc7/0x230 > >>>> [ 0.312264] init_desc+0x141/0x6b0 > >>>> [ 0.312266] alloc_desc+0x108/0x1b0 > >>>> [ 0.312267] early_irq_init+0xee/0x1c0 > >>>> [ 0.312268] ? __pfx_early_irq_init+0x10/0x10 > >>>> [ 0.312271] start_kernel+0x1ab/0x3c0 > >>>> [ 0.312272] x86_64_start_reservations+0x18/0x30 > >>>> [ 0.312274] x86_64_start_kernel+0xe2/0xf0 > >>>> [ 0.312275] common_startup_64+0x13e/0x141 > >>>> [ 0.312277] </TASK> > >>>> > >>>> [ 0.312834] __pgalloc_tag_add: get_page_tag_ref failed! > >>>> page=ffffea000400fc00 pfn=1049584 nr=1 > >>>> [ 0.312835] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > >>>> [ 0.312836] Hardware name: Red Hat KVM, BIOS > >>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>> [ 0.312837] Call Trace: > >>>> [ 0.312837] <TASK> > >>>> [ 0.312838] dump_stack_lvl+0x53/0x70 > >>>> [ 0.312840] __pgalloc_tag_add+0x3a3/0x6e0 > >>>> [ 0.312841] ? __pfx___pgalloc_tag_add+0x10/0x10 > >>>> [ 0.312842] ? rmqueue.constprop.0+0x4fc/0x1ce0 > >>>> [ 0.312844] ? kasan_unpoison+0x27/0x60 > >>>> [ 0.312845] ? __kasan_unpoison_pages+0x2c/0x40 > >>>> [ 0.312847] get_page_from_freelist+0xa54/0x1310 > >>>> [ 0.312849] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>> [ 0.312851] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > >>>> [ 0.312853] alloc_pages_mpol+0x13a/0x3f0 > >>>> [ 0.312855] ? __pfx_alloc_pages_mpol+0x10/0x10 > >>>> [ 0.312856] ? xas_find+0x2d8/0x450 > >>>> [ 0.312858] ? _raw_spin_lock+0x84/0xe0 > >>>> [ 0.312859] ? __pfx__raw_spin_lock+0x10/0x10 > >>>> [ 0.312861] alloc_pages_noprof+0xf6/0x2b0 > >>>> [ 0.312862] __change_page_attr+0x293/0x850 > >>>> [ 0.312864] ? __pfx___change_page_attr+0x10/0x10 > >>>> [ 0.312865] ? _vm_unmap_aliases+0x2d0/0x650 > >>>> [ 0.312868] ? __pfx__vm_unmap_aliases+0x10/0x10 > >>>> [ 0.312869] __change_page_attr_set_clr+0x16c/0x360 > >>>> [ 0.312871] ? 
spp_getpage+0xbb/0x1e0 > >>>> [ 0.312872] change_page_attr_set_clr+0x220/0x3c0 > >>>> [ 0.312873] ? flush_tlb_one_kernel+0xf/0x30 > >>>> [ 0.312875] ? set_pte_vaddr_p4d+0x110/0x180 > >>>> [ 0.312877] ? __pfx_change_page_attr_set_clr+0x10/0x10 > >>>> [ 0.312878] ? __pfx_set_pte_vaddr_p4d+0x10/0x10 > >>>> [ 0.312881] ? __pfx_mtree_load+0x10/0x10 > >>>> [ 0.312883] ? __pfx_mtree_load+0x10/0x10 > >>>> [ 0.312884] ? __asan_memcpy+0x3c/0x60 > >>>> [ 0.312886] ? set_intr_gate+0x10c/0x150 > >>>> [ 0.312888] set_memory_ro+0x76/0xa0 > >>>> [ 0.312889] ? __pfx_set_memory_ro+0x10/0x10 > >>>> [ 0.312891] idt_setup_apic_and_irq_gates+0x2c1/0x390 > >>>> > >>>> and more. > >>> Ok, it's not the only place. Got your point. > >>> > >>>> off topic - if we were to handle only alloc_page_ext() specifically, > >>>> what would be the most straightforward > >>>> > >>>> solution in your mind? I'd really appreciate your insight. > >>> I was thinking if it's the only special case maybe we can handle it > >>> somehow differently, like we do when we allocate obj_ext vectors for > >>> slabs using __GFP_NO_OBJ_EXT. I haven't found a good solution yet but > >>> since it's not a special case we would not be able to use it even if I > >>> came up with something... > >>> I think your way is the most straight-forward but please try my > >>> suggestion to see if we can avoid extra overhead. > >>> Thanks, > >>> Suren. > > Hi Suren > > > > Hi Hao, > > > >> Hi Suren > >> > >> Thank you for your feedback. After re-examining this issue, > >> > >> I realize my previous focus was misplaced. > >> > >> Upon deeper consideration, I understand that this is not merely a bug, > >> > >> but rather a warning that indicates a gap in our memory profiling mechanism. > >> > >> Specifically, the current implementation appears to be missing memory > >> allocation > >> > >> tracking during the period between the buddy system allocation and page_ext > >> > >> initialization. 
> >> > >> This profiling gap means we may not be capturing all relevant memory > >> allocation > >> > >> events during this critical transition phase. > > Correct, this limitation exists because memory profiling relies on > > some kernel facilities (page_ext, obj_ext) which might not be > > initialized yet at the time of allocation. > > > >> My approach is to dynamically allocate codetag_ref when get_page_tag_ref > >> fails, > >> > >> and maintain a linked list to track all buddy system allocations that > >> occur prior to page_ext initialization. > >> > >> However, this introduces performance concerns: > >> > >> 1. Free Path Overhead: When freeing these pages, we would need to > >> traverse the entire linked list to locate > >> > >> the corresponding codetag_ref, resulting in O(n) lookup complexity > >> per free operation. > >> > >> 2. Initialization Overhead: During init_page_alloc_tagging, iterating > >> through the linked list to assign codetag_ref to > >> > >> page_ext would introduce additional traversal cost. > >> > >> If the number of pages is substantial, this could incur significant > >> overhead. What are your thoughts on this? I look forward to your > >> suggestions. > > My thinking is that these early allocations comprise a small portion > > of overall memory consumed by the system. So, instead of trying to > > record and handle them in some alternative way, we just accept that > > some counters might not be exactly accurate and ignore those early > > allocations. See how the early slab allocations are marked with the > > CODETAG_FLAG_INACCURATE flag and later reported as inaccurate. I think > > that's an acceptable alternative to introducing extra complexity and > > performance overhead. IOW, the benefits of accounting for these early > > allocations are low compared to the effort required to account for > > them. Unless you found a simple and performant way to do that...
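The model below combines Suren's "accept the inaccuracy" suggestion with the per-page flag idea that Hao calls EARLY_ALLOC_FLAGS elsewhere in this thread. It is a hypothetical, simplified user-space sketch. model_page, model_tag, EARLY_ALLOC_BIT and TAG_FLAG_INACCURATE are invented for illustration; the last one only plays the role of CODETAG_FLAG_INACCURATE, and no real page flag is allocated. An early allocation is never counted, so its tag is flagged as under-reporting. Freeing such a page just clears the bit instead of hitting the "alloc_tag was not set" path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL
/* Hypothetical stand-in for Hao's EARLY_ALLOC_FLAGS page bit. */
#define EARLY_ALLOC_BIT (1UL << 0)
/* Plays the role of CODETAG_FLAG_INACCURATE in this model. */
#define TAG_FLAG_INACCURATE (1U << 0)

struct model_tag {
	unsigned int flags;
	unsigned long bytes;	/* bytes currently attributed to this tag */
};

struct model_page {
	unsigned long flags;
	struct model_tag *ref;	/* models the codetag ref kept in page_ext */
};

static bool page_ext_ready;	/* false until page_ext is initialized */
static unsigned long missing_tag_warnings;

static void model_tag_add(struct model_page *page, struct model_tag *tag,
			  unsigned int nr)
{
	if (!page_ext_ready) {
		/*
		 * Nowhere to store the ref: mark the page as early and flag
		 * the tag, since its counter will under-report this allocation.
		 */
		page->flags |= EARLY_ALLOC_BIT;
		tag->flags |= TAG_FLAG_INACCURATE;
		return;
	}
	page->ref = tag;
	tag->bytes += MODEL_PAGE_SIZE * nr;
}

static void model_tag_sub(struct model_page *page, unsigned int nr)
{
	if (page->flags & EARLY_ALLOC_BIT) {
		/* Nothing was counted for this page, so there is nothing to undo. */
		page->flags &= ~EARLY_ALLOC_BIT;
		return;
	}
	if (!page->ref) {
		/* Corresponds to the "alloc_tag was not set" warning. */
		missing_tag_warnings++;
		return;
	}
	page->ref->bytes -= MODEL_PAGE_SIZE * nr;
	page->ref = NULL;
}
```

The design trade-off matches Suren's argument. The free path costs a single bit test and needs no lookup structure. The price is that tags which allocated before page_ext was ready report lower totals and are marked as inaccurate.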
> > > I have been exploring possible solutions to this issue over the past few > days, > > but so far I have not come up with a good approach. > > I have counted the number of memory allocations that occur earlier than the > > allocation and initialization of our page_ext, and found that there are > actually > > quite a lot of them.

Interesting... I wonder if it's because deferred_struct_pages defers page_ext
initialization. Can you check whether setting early_page_ext reduces or
eliminates these cases of allocation before page_ext initialization?

> > Similarly, I have made the following changes and collected the > corresponding logs. > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 2d4b6f1a554e..6db65b3d52d3 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1293,6 +1293,8 @@ void __pgalloc_tag_add(struct page *page, struct > task_struct *task, > alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > update_page_tag_ref(handle, &ref); > put_page_tag_ref(handle); > + } else{ > + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! > page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > } > } > > @@ -1314,6 +1316,8 @@ void __pgalloc_tag_sub(struct page *page, unsigned > int nr) > alloc_tag_sub(&ref, PAGE_SIZE * nr); > update_page_tag_ref(handle, &ref); > put_page_tag_ref(handle); > + } else{ > + pr_warn("__pgalloc_tag_sub: get_page_tag_ref failed! > page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > } > } > > [ 0.261699] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001000 pfn=1048640 nr=2 > [ 0.261711] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001100 pfn=1048644 nr=4 > [ 0.261717] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001200 pfn=1048648 nr=4 > [ 0.261721] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001300 pfn=1048652 nr=4 > [ 0.261893] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001080 pfn=1048642 nr=2 > [ 0.261917] __pgalloc_tag_add: get_page_tag_ref failed!
> page=ffffea0004001400 pfn=1048656 nr=4 > [ 0.262018] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001500 pfn=1048660 nr=2 > [ 0.262024] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001600 pfn=1048664 nr=8 > [ 0.262040] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001580 pfn=1048662 nr=1 > [ 0.262048] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea00040015c0 pfn=1048663 nr=1 > [ 0.262056] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001800 pfn=1048672 nr=2 > [ 0.262064] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001880 pfn=1048674 nr=2 > [ 0.262078] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001900 pfn=1048676 nr=2 > [ 0.262196] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 > [ 0.262213] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001980 pfn=1048678 nr=2 > [ 0.262220] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001a00 pfn=1048680 nr=4 > [ 0.262246] ODEBUG: selftest passed > [ 0.262268] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001b00 pfn=1048684 nr=1 > [ 0.262318] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001b40 pfn=1048685 nr=1 > [ 0.262368] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001b80 pfn=1048686 nr=1 > [ 0.262418] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001bc0 pfn=1048687 nr=1 > [ 0.262469] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001c00 pfn=1048688 nr=1 > [ 0.262519] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001c40 pfn=1048689 nr=1 > [ 0.262569] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001c80 pfn=1048690 nr=1 > [ 0.262620] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001cc0 pfn=1048691 nr=1 > [ 0.262670] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001d00 pfn=1048692 nr=1 > [ 0.262721] __pgalloc_tag_add: get_page_tag_ref failed! 
> page=ffffea0004001d40 pfn=1048693 nr=1 > [ 0.262771] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001d80 pfn=1048694 nr=1 > [ 0.262821] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001dc0 pfn=1048695 nr=1 > [ 0.262871] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001e00 pfn=1048696 nr=1 > [ 0.262923] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001e40 pfn=1048697 nr=1 > [ 0.262974] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001e80 pfn=1048698 nr=1 > [ 0.263024] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001ec0 pfn=1048699 nr=1 > [ 0.263074] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001f00 pfn=1048700 nr=1 > [ 0.263124] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001f40 pfn=1048701 nr=1 > [ 0.263174] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001f80 pfn=1048702 nr=1 > [ 0.263224] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004001fc0 pfn=1048703 nr=1 > [ 0.263275] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004002000 pfn=1048704 nr=1 > [ 0.263325] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004002040 pfn=1048705 nr=1 > [ 0.263375] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004002080 pfn=1048706 nr=1 > [ 0.263427] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004002400 pfn=1048720 nr=16 > [ 0.263437] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea00040020c0 pfn=1048707 nr=1 > [ 0.263463] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004002100 pfn=1048708 nr=1 > [ 0.263465] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004002140 pfn=1048709 nr=1 > [ 0.263467] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004002180 pfn=1048710 nr=1 > [ 0.263509] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004002200 pfn=1048712 nr=4 > [ 0.263512] __pgalloc_tag_add: get_page_tag_ref failed! 
> page=ffffea0004002800 pfn=1048736 nr=8 > [ 0.263524] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea00040021c0 pfn=1048711 nr=1 > [ 0.263536] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004002300 pfn=1048716 nr=1 > [ 0.263537] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004002340 pfn=1048717 nr=1 > [ 0.263539] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004002380 pfn=1048718 nr=1 > [ 0.263604] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004004000 pfn=1048832 nr=128 > [ 0.263638] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004003000 pfn=1048768 nr=64 > [ 0.263650] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004002c00 pfn=1048752 nr=16 > [ 0.263655] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea00040023c0 pfn=1048719 nr=1 > [ 0.270582] __pgalloc_tag_sub: get_page_tag_ref failed! > page=ffffea00040023c0 pfn=1048719 nr=1 > [ 0.270591] ftrace: allocating 52717 entries in 208 pages > [ 0.270592] ftrace: allocated 208 pages with 3 groups > [ 0.270620] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004002a00 pfn=1048744 nr=8 > [ 0.270636] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea00040023c0 pfn=1048719 nr=1 > [ 0.270643] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006000 pfn=1048960 nr=1 > [ 0.270649] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006040 pfn=1048961 nr=1 > [ 0.270658] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004007000 pfn=1049024 nr=64 > [ 0.270659] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006080 pfn=1048962 nr=2 > [ 0.270722] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006100 pfn=1048964 nr=1 > [ 0.270730] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006140 pfn=1048965 nr=1 > [ 0.270738] __pgalloc_tag_add: get_page_tag_ref failed! 
> page=ffffea0004006180 pfn=1048966 nr=1 > [ 0.270777] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea00040061c0 pfn=1048967 nr=1 > [ 0.270786] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006200 pfn=1048968 nr=1 > [ 0.270792] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006240 pfn=1048969 nr=1 > [ 0.270833] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006300 pfn=1048972 nr=4 > [ 0.270891] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006280 pfn=1048970 nr=1 > [ 0.270980] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea00040062c0 pfn=1048971 nr=1 > [ 0.271071] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006400 pfn=1048976 nr=1 > [ 0.271156] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006440 pfn=1048977 nr=1 > [ 0.271185] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006480 pfn=1048978 nr=2 > [ 0.271301] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006500 pfn=1048980 nr=1 > [ 0.271655] Dynamic Preempt: lazy > [ 0.271662] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006580 pfn=1048982 nr=2 > [ 0.271752] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006600 pfn=1048984 nr=4 > [ 0.271762] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004010000 pfn=1049600 nr=4 > [ 0.271824] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006540 pfn=1048981 nr=1 > [ 0.271916] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006700 pfn=1048988 nr=2 > [ 0.271964] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006780 pfn=1048990 nr=1 > [ 0.272099] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea00040067c0 pfn=1048991 nr=1 > [ 0.272138] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006800 pfn=1048992 nr=2 > [ 0.272144] __pgalloc_tag_add: get_page_tag_ref failed! 
> page=ffffea0004006a00 pfn=1049000 nr=8 > [ 0.272249] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006c00 pfn=1049008 nr=8 > [ 0.272319] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006880 pfn=1048994 nr=2 > [ 0.272351] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006900 pfn=1048996 nr=4 > [ 0.272424] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004006e00 pfn=1049016 nr=8 > [ 0.272485] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004008000 pfn=1049088 nr=8 > [ 0.272535] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004008200 pfn=1049096 nr=2 > [ 0.272600] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004008400 pfn=1049104 nr=8 > [ 0.272663] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004008300 pfn=1049100 nr=4 > [ 0.272694] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004008280 pfn=1049098 nr=2 > [ 0.272708] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004008600 pfn=1049112 nr=8 > > [ 0.272924] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004008880 pfn=1049122 nr=2 > [ 0.272934] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004008900 pfn=1049124 nr=2 > [ 0.272952] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004008c00 pfn=1049136 nr=4 > [ 0.273035] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004008980 pfn=1049126 nr=2 > [ 0.273062] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004008e00 pfn=1049144 nr=8 > [ 0.273674] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004008d00 pfn=1049140 nr=1 > [ 0.273884] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004008d80 pfn=1049142 nr=2 > [ 0.273943] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009000 pfn=1049152 nr=2 > [ 0.274379] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009080 pfn=1049154 nr=2 > [ 0.274575] __pgalloc_tag_add: get_page_tag_ref failed! 
> page=ffffea0004009200 pfn=1049160 nr=8 > [ 0.274617] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009100 pfn=1049156 nr=4 > [ 0.274794] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009400 pfn=1049168 nr=2 > [ 0.274840] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009480 pfn=1049170 nr=2 > [ 0.275057] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009500 pfn=1049172 nr=2 > [ 0.275092] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009580 pfn=1049174 nr=2 > [ 0.275134] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009600 pfn=1049176 nr=8 > [ 0.275211] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009800 pfn=1049184 nr=4 > [ 0.275510] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009900 pfn=1049188 nr=2 > [ 0.275548] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009980 pfn=1049190 nr=2 > [ 0.275976] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009a00 pfn=1049192 nr=8 > [ 0.275987] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009c00 pfn=1049200 nr=2 > [ 0.276139] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009c80 pfn=1049202 nr=2 > [ 0.276152] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004008d40 pfn=1049141 nr=1 > [ 0.276242] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009d00 pfn=1049204 nr=1 > [ 0.276358] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009d40 pfn=1049205 nr=1 > [ 0.276444] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009d80 pfn=1049206 nr=1 > [ 0.276526] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009dc0 pfn=1049207 nr=1 > [ 0.276615] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009e00 pfn=1049208 nr=1 > [ 0.276696] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009e40 pfn=1049209 nr=1 > [ 0.276792] __pgalloc_tag_add: get_page_tag_ref failed! 
> page=ffffea0004009e80 pfn=1049210 nr=1 > [ 0.276827] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009f00 pfn=1049212 nr=2 > [ 0.276891] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009ec0 pfn=1049211 nr=1 > [ 0.276999] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009f80 pfn=1049214 nr=1 > [ 0.277082] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea0004009fc0 pfn=1049215 nr=1 > [ 0.277172] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea000400a000 pfn=1049216 nr=1 > [ 0.277257] __pgalloc_tag_add: get_page_tag_ref failed! > page=ffffea000400a040 pfn=1049217 nr=1 > > and so on. > > > > I think your earlier patch can effectively detect these early > > allocations and suppress the warnings. We should also mark these > > allocations with CODETAG_FLAG_INACCURATE. > > Thanks to an excellent AI review, I realized there are issues with > > my original patch. One problem is the 256-element array; another Yes, if there are lots of such allocations, it's not appropriate. > > is that it involves allocation and free operations — meaning we need > > to record entries at __pgalloc_tag_add and remove them at __pgalloc_tag_sub, > > which introduces a noticeable overhead. I'm wondering if we can instead > set a flag > > bit in page flags during the early boot stage, which I'll refer to as > EARLY_ALLOC_FLAGS. > > Then, in __pgalloc_tag_sub, we first check for EARLY_ALLOC_FLAGS. If > set, we clear the > > flag and return immediately; otherwise, we perform the actual > subtraction of the tag count. > > This approach seems somewhat similar to the idea behind > mem_profiling_compressed. That seems doable but let's first check if we can make page_ext initialization happen before these allocations. That would be the ideal path. If it's not possible then we can focus on alternatives like the one you propose. > > I would appreciate your valuable feedback and any better suggestions you > might have. Thanks for pursuing this! 
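A minimal userspace sketch of the EARLY_ALLOC_FLAGS idea quoted above, useful when weighing it against the early-pfn array. It is not kernel code. It assumes a spare page-flag bit exists; EARLY_ALLOC_FLAGS is Hao's working name, not an existing kernel flag. struct page, its codetag ref and the alloc_tag counter are modeled by the hypothetical fake_page and fake_tag types, and model_tag_add() and model_tag_sub() stand in for __pgalloc_tag_add() and __pgalloc_tag_sub().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag bit: the kernel would have to find a spare page-flag bit. */
#define EARLY_ALLOC_FLAGS (1UL << 0)

/* Simplified stand-ins for struct page (plus its page_ext tag ref) and alloc_tag. */
struct fake_page {
	unsigned long flags;
	bool has_tag_ref;	/* true once a codetag ref has been stored */
};

struct fake_tag {
	long bytes;		/* bytes currently charged to this allocation site */
};

static bool page_ext_ready;	/* models "page_ext has been initialized" */

/* Models __pgalloc_tag_add(): charge the tag, or mark the page as an early allocation. */
static void model_tag_add(struct fake_page *page, struct fake_tag *tag, long size)
{
	if (page_ext_ready) {
		page->has_tag_ref = true;
		tag->bytes += size;
	} else {
		page->flags |= EARLY_ALLOC_FLAGS;	/* nothing is charged */
	}
}

/*
 * Models __pgalloc_tag_sub(). Early pages were never charged, so clear the
 * flag and return instead of tripping the "alloc_tag was not set" warning.
 * Returns false only when a tag ref is missing for a non-early page.
 */
static bool model_tag_sub(struct fake_page *page, struct fake_tag *tag, long size)
{
	if (page->flags & EARLY_ALLOC_FLAGS) {
		page->flags &= ~EARLY_ALLOC_FLAGS;
		return true;
	}
	if (!page->has_tag_ref) {
		fprintf(stderr, "alloc_tag was not set\n");
		return false;
	}
	page->has_tag_ref = false;
	tag->bytes -= size;
	return true;
}
```

The appeal of this design is that the free path pays only a flag test, with no array or list lookup. The open questions from the thread remain whether a page-flag bit can be spared, and whether page_ext can simply be initialized before these allocations, which Suren considers the ideal path.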
I'll help in any way I can. Suren. > > Thanks > > Hao > > > Thanks, > > Suren. > > > >> > >> Thanks > >> > >> Hao > >> > >>>> Thanks. > >>>> > >>>> > >>>>>>>> If the slab cache has no free objects, it falls back > >>>>>>>> to the buddy allocator to allocate memory. However, at this point page_ext > >>>>>>>> is not yet fully initialized, so these newly allocated pages have no > >>>>>>>> codetag set. These pages may later be reclaimed by KASAN,which causes > >>>>>>>> the warning to trigger when they are freed because their codetag ref is > >>>>>>>> still empty. > >>>>>>>> > >>>>>>>> Use a global array to track pages allocated before page_ext is fully > >>>>>>>> initialized, similar to how kmemleak tracks early allocations. > >>>>>>>> When page_ext initialization completes, set their codetag > >>>>>>>> to empty to avoid warnings when they are freed later. > >>>>>>>> > >>>>>>>> ... > >>>>>>>> > >>>>>>>> --- a/include/linux/alloc_tag.h > >>>>>>>> +++ b/include/linux/alloc_tag.h > >>>>>>>> @@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref) > >>>>>>>> > >>>>>>>> #ifdef CONFIG_MEM_ALLOC_PROFILING > >>>>>>>> > >>>>>>>> +bool mem_profiling_is_available(void); > >>>>>>>> +void alloc_tag_add_early_pfn(unsigned long pfn); > >>>>>>>> + > >>>>>>>> #define ALLOC_TAG_SECTION_NAME "alloc_tags" > >>>>>>>> > >>>>>>>> struct codetag_bytes { > >>>>>>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c > >>>>>>>> index 58991ab09d84..a5bf4e72c154 100644 > >>>>>>>> --- a/lib/alloc_tag.c > >>>>>>>> +++ b/lib/alloc_tag.c > >>>>>>>> @@ -6,6 +6,7 @@ > >>>>>>>> #include <linux/kallsyms.h> > >>>>>>>> #include <linux/module.h> > >>>>>>>> #include <linux/page_ext.h> > >>>>>>>> +#include <linux/pgalloc_tag.h> > >>>>>>>> #include <linux/proc_fs.h> > >>>>>>>> #include <linux/seq_buf.h> > >>>>>>>> #include <linux/seq_file.h> > >>>>>>>> @@ -26,6 +27,82 @@ static bool mem_profiling_support; > >>>>>>>> > >>>>>>>> static struct codetag_type *alloc_tag_cttype; > >>>>>>>> > >>>>>>>> 
+/* > >>>>>>>> + * State of the alloc_tag > >>>>>>>> + * > >>>>>>>> + * This is used to describe the states of the alloc_tag during bootup. > >>>>>>>> + * > >>>>>>>> + * When we need to allocate page_ext to store codetag, we face an > >>>>>>>> + * initialization timing problem: > >>>>>>>> + * > >>>>>>>> + * Due to initialization order, pages may be allocated via buddy system > >>>>>>>> + * before page_ext is fully allocated and initialized. Although these > >>>>>>>> + * pages call the allocation hooks, the codetag will not be set because > >>>>>>>> + * page_ext is not yet available. > >>>>>>>> + * > >>>>>>>> + * When these pages are later free to the buddy system, it triggers > >>>>>>>> + * warnings because their codetag is actually empty if > >>>>>>>> + * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled. > >>>>>>>> + * > >>>>>>>> + * Additionally, in this situation, we cannot record detailed allocation > >>>>>>>> + * information for these pages. > >>>>>>>> + */ > >>>>>>>> +enum mem_profiling_state { > >>>>>>>> + DOWN, /* No mem_profiling functionality yet */ > >>>>>>>> + UP /* Everything is working */ > >>>>>>>> +}; > >>>>>>>> + > >>>>>>>> +static enum mem_profiling_state mem_profiling_state = DOWN; > >>>>>>>> + > >>>>>>>> +bool mem_profiling_is_available(void) > >>>>>>>> +{ > >>>>>>>> + return mem_profiling_state == UP; > >>>>>>>> +} > >>>>>>>> + > >>>>>>>> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG > >>>>>>>> + > >>>>>>>> +#define EARLY_ALLOC_PFN_MAX 256 > >>>>>>>> + > >>>>>>>> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX]; > >>>>>>> It's unfortunate that this isn't __initdata. > >>>>>>> > >>>>>>>> +static unsigned int early_pfn_count; > >>>>>>>> +static DEFINE_SPINLOCK(early_pfn_lock); > >>>>>>>> + > >>>>>>>> > >>>>>>>> ... 
> >>>>>>>> > >>>>>>>> --- a/mm/page_alloc.c > >>>>>>>> +++ b/mm/page_alloc.c > >>>>>>>> @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task, > >>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > >>>>>>>> update_page_tag_ref(handle, &ref); > >>>>>>>> put_page_tag_ref(handle); > >>>>>>>> + } else { > >>>>>> This branch can be marked as "unlikely". > >>>>>> > >>>>>>>> + /* > >>>>>>>> + * page_ext is not available yet, record the pfn so we can > >>>>>>>> + * clear the tag ref later when page_ext is initialized. > >>>>>>>> + */ > >>>>>>>> + if (!mem_profiling_is_available()) > >>>>>>>> + alloc_tag_add_early_pfn(page_to_pfn(page)); > >>>>>>>> } > >>>>>>>> } > >>>>>>> All because of this, I believe. Is this fixable? > >>>>>>> > >>>>>>> If we take that `else', we know we're running in __init code, yes? I > >>>>>>> don't see how `__init pgalloc_tag_add_early()' could be made to work. > >>>>>>> hrm. Something clever, please. > >>>>>> We can have a pointer to a function that is initialized to point to > >>>>>> alloc_tag_add_early_pfn, which is defined as __init and uses > >>>>>> early_pfns which now can be defined as __initdata. After > >>>>>> clear_early_alloc_pfn_tag_refs() is done we reset that pointer to > >>>>>> NULL. __pgalloc_tag_add() instead of calling alloc_tag_add_early_pfn() > >>>>>> directly checks that pointer and if it's not NULL then calls the > >>>>>> function that it points to. This way __pgalloc_tag_add() which is not > >>>>>> an __init function will be invoking alloc_tag_add_early_pfn() __init > >>>>>> function only until we are done with initialization. I haven't tried > >>>>>> this but I think that should work. This also eliminates the need for > >>>>>> mem_profiling_state variable since we can use this function pointer > >>>>>> instead. > >>>>>> > >>>>>>
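Suren's function-pointer suggestion can be modeled in userspace as below. This is a sketch, not the patch. __init and __initdata are defined away here (in the kernel they place code and data in sections that are freed after boot). alloc_tag_add_early_pfn_ptr and model_pgalloc_tag_add_no_ext() are hypothetical names, and the printf() stands in for clearing each recorded page's tag ref. The model is single-threaded. In the kernel the pointer is read on the allocation path of any CPU, so accesses would likely need READ_ONCE()/WRITE_ONCE() or a static key. Storing the address of an __init function in ordinary data would probably also need a __refdata annotation to keep modpost from reporting a section mismatch. Both points are assumptions to verify against the real tree.

```c
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-ins: in the kernel these place symbols in init sections. */
#define __init
#define __initdata

#define EARLY_ALLOC_PFN_MAX 256

static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX] __initdata;
static unsigned int early_pfn_count __initdata;

/* Init-only recorder: must never run once init memory has been freed. */
static void __init alloc_tag_add_early_pfn(unsigned long pfn)
{
	if (early_pfn_count < EARLY_ALLOC_PFN_MAX)
		early_pfns[early_pfn_count++] = pfn;
}

/* Hypothetical pointer: non-NULL only until the early refs have been cleared. */
static void (*alloc_tag_add_early_pfn_ptr)(unsigned long pfn) = alloc_tag_add_early_pfn;

/* Runs once page_ext is ready; afterwards the __init recorder is unreachable. */
static void __init clear_early_alloc_pfn_tag_refs(void)
{
	for (unsigned int i = 0; i < early_pfn_count; i++)
		printf("clearing tag ref for pfn %lu\n", early_pfns[i]);
	alloc_tag_add_early_pfn_ptr = NULL;
}

/* Models the page_ext-missing branch of __pgalloc_tag_add(). */
static void model_pgalloc_tag_add_no_ext(unsigned long pfn)
{
	void (*fn)(unsigned long) = alloc_tag_add_early_pfn_ptr;

	if (fn)
		fn(pfn);
}
```

Once the pointer is NULL, the page_ext-missing branch reduces to one load and test, the separate mem_profiling_state variable is no longer needed, and early_pfns can live in __initdata as Andrew asked.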
* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization From: Hao Ge @ 2026-03-25 2:07 UTC To: Suren Baghdasaryan; +Cc: Andrew Morton, Kent Overstreet, linux-mm, linux-kernel On 2026/3/25 08:21, Suren Baghdasaryan wrote: > On Tue, Mar 24, 2026 at 2:43 AM Hao Ge <hao.ge@linux.dev> wrote: >> >> On 2026/3/24 06:47, Suren Baghdasaryan wrote: >>> On Mon, Mar 23, 2026 at 2:16 AM Hao Ge <hao.ge@linux.dev> wrote: >>>> On 2026/3/20 10:14, Suren Baghdasaryan wrote: >>>>> On Thu, Mar 19, 2026 at 6:58 PM Hao Ge <hao.ge@linux.dev> wrote: >>>>>> On 2026/3/20 07:48, Suren Baghdasaryan wrote: >>>>>>> On Thu, Mar 19, 2026 at 4:44 PM Suren Baghdasaryan <surenb@google.com> wrote: >>>>>>>> On Thu, Mar 19, 2026 at 3:28 PM Andrew Morton <akpm@linux-foundation.org> wrote: >>>>>>>>> On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote: >>>>>>>>> >>>>>>>>>> Due to initialization ordering, page_ext is allocated and initialized >>>>>>>>>> relatively late during boot. Some pages have already been allocated >>>>>>>>>> and freed before page_ext becomes available, leaving their codetag >>>>>>>>>> uninitialized. >>>>>>>> Hi Hao, >>>>>>>> Thanks for the report. >>>>>>>> Hmm. So, we are allocating pages before page_ext is initialized... >>>>>>>> >>>>>>>>>> A clear example is in init_section_page_ext(): alloc_page_ext() calls >>>>>>>>>> kmemleak_alloc(). >>>>>>> Forgot to ask. The example you are using here is for page_ext >>>>>>> allocation itself. Do you have any other examples where page >>>>>>> allocation happens before page_ext initialization? If that's the only >>>>>>> place, then we might be able to fix this in a simpler way by doing >>>>>>> something special for alloc_page_ext().
>>>>>> Hi Suren >>>>>> >>>>>> To help illustrate the point, here's the debug log I added: >>>>>> >>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>>>> index 2d4b6f1a554e..ebfe636f5b07 100644 >>>>>> --- a/mm/page_alloc.c >>>>>> +++ b/mm/page_alloc.c >>>>>> @@ -1293,6 +1293,9 @@ void __pgalloc_tag_add(struct page *page, struct >>>>>> task_struct *task, >>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); >>>>>> update_page_tag_ref(handle, &ref); >>>>>> put_page_tag_ref(handle); >>>>>> + } else { >>>>>> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! >>>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); >>>>>> + dump_stack(); >>>>>> } >>>>>> } >>>>>> >>>>>> >>>>>> And I caught the following logs: >>>>>> >>>>>> [ 0.296399] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>> page=ffffea000400c700 pfn=1049372 nr=1 >>>>>> [ 0.296400] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) >>>>>> [ 0.296402] Hardware name: Red Hat KVM, BIOS >>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>>>> [ 0.296402] Call Trace: >>>>>> [ 0.296403] <TASK> >>>>>> [ 0.296403] dump_stack_lvl+0x53/0x70 >>>>>> [ 0.296405] __pgalloc_tag_add+0x3a3/0x6e0 >>>>>> [ 0.296406] ? __pfx___pgalloc_tag_add+0x10/0x10 >>>>>> [ 0.296407] ? kasan_unpoison+0x27/0x60 >>>>>> [ 0.296409] ? __kasan_unpoison_pages+0x2c/0x40 >>>>>> [ 0.296411] get_page_from_freelist+0xa54/0x1310 >>>>>> [ 0.296413] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>>>> [ 0.296415] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 >>>>>> [ 0.296417] ? stack_depot_save_flags+0x3f/0x680 >>>>>> [ 0.296418] ? ___slab_alloc+0x518/0x530 >>>>>> [ 0.296420] alloc_pages_mpol+0x13a/0x3f0 >>>>>> [ 0.296421] ? __pfx_alloc_pages_mpol+0x10/0x10 >>>>>> [ 0.296423] ? _raw_spin_lock_irqsave+0x8a/0xf0 >>>>>> [ 0.296424] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 >>>>>> [ 0.296426] alloc_slab_page+0xc2/0x130 >>>>>> [ 0.296427] allocate_slab+0x77/0x2c0 >>>>>> [ 0.296429] ? 
syscall_enter_define_fields+0x3bb/0x5f0 >>>>>> [ 0.296430] ___slab_alloc+0x125/0x530 >>>>>> [ 0.296432] ? __trace_define_field+0x252/0x3d0 >>>>>> [ 0.296433] __kmalloc_noprof+0x329/0x630 >>>>>> [ 0.296435] ? syscall_enter_define_fields+0x3bb/0x5f0 >>>>>> [ 0.296436] syscall_enter_define_fields+0x3bb/0x5f0 >>>>>> [ 0.296438] ? __pfx_syscall_enter_define_fields+0x10/0x10 >>>>>> [ 0.296440] event_define_fields+0x326/0x540 >>>>>> [ 0.296441] __trace_early_add_events+0xac/0x3c0 >>>>>> [ 0.296443] trace_event_init+0x24c/0x460 >>>>>> [ 0.296445] trace_init+0x9/0x20 >>>>>> [ 0.296446] start_kernel+0x199/0x3c0 >>>>>> [ 0.296448] x86_64_start_reservations+0x18/0x30 >>>>>> [ 0.296449] x86_64_start_kernel+0xe2/0xf0 >>>>>> [ 0.296451] common_startup_64+0x13e/0x141 >>>>>> [ 0.296453] </TASK> >>>>>> >>>>>> >>>>>> [ 0.312234] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>> page=ffffea000400f900 pfn=1049572 nr=1 >>>>>> [ 0.312234] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) >>>>>> [ 0.312236] Hardware name: Red Hat KVM, BIOS >>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>>>> [ 0.312236] Call Trace: >>>>>> [ 0.312237] <TASK> >>>>>> [ 0.312237] dump_stack_lvl+0x53/0x70 >>>>>> [ 0.312239] __pgalloc_tag_add+0x3a3/0x6e0 >>>>>> [ 0.312240] ? __pfx___pgalloc_tag_add+0x10/0x10 >>>>>> [ 0.312241] ? rmqueue.constprop.0+0x4fc/0x1ce0 >>>>>> [ 0.312243] ? kasan_unpoison+0x27/0x60 >>>>>> [ 0.312244] ? __kasan_unpoison_pages+0x2c/0x40 >>>>>> [ 0.312246] get_page_from_freelist+0xa54/0x1310 >>>>>> [ 0.312248] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>>>> [ 0.312250] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 >>>>>> [ 0.312253] alloc_slab_page+0x39/0x130 >>>>>> [ 0.312254] allocate_slab+0x77/0x2c0 >>>>>> [ 0.312255] ? alloc_cpumask_var_node+0xc7/0x230 >>>>>> [ 0.312257] ___slab_alloc+0x46d/0x530 >>>>>> [ 0.312259] __kmalloc_node_noprof+0x2fa/0x680 >>>>>> [ 0.312261] ? 
alloc_cpumask_var_node+0xc7/0x230 >>>>>> [ 0.312263] alloc_cpumask_var_node+0xc7/0x230 >>>>>> [ 0.312264] init_desc+0x141/0x6b0 >>>>>> [ 0.312266] alloc_desc+0x108/0x1b0 >>>>>> [ 0.312267] early_irq_init+0xee/0x1c0 >>>>>> [ 0.312268] ? __pfx_early_irq_init+0x10/0x10 >>>>>> [ 0.312271] start_kernel+0x1ab/0x3c0 >>>>>> [ 0.312272] x86_64_start_reservations+0x18/0x30 >>>>>> [ 0.312274] x86_64_start_kernel+0xe2/0xf0 >>>>>> [ 0.312275] common_startup_64+0x13e/0x141 >>>>>> [ 0.312277] </TASK> >>>>>> >>>>>> [ 0.312834] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>> page=ffffea000400fc00 pfn=1049584 nr=1 >>>>>> [ 0.312835] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) >>>>>> [ 0.312836] Hardware name: Red Hat KVM, BIOS >>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>>>> [ 0.312837] Call Trace: >>>>>> [ 0.312837] <TASK> >>>>>> [ 0.312838] dump_stack_lvl+0x53/0x70 >>>>>> [ 0.312840] __pgalloc_tag_add+0x3a3/0x6e0 >>>>>> [ 0.312841] ? __pfx___pgalloc_tag_add+0x10/0x10 >>>>>> [ 0.312842] ? rmqueue.constprop.0+0x4fc/0x1ce0 >>>>>> [ 0.312844] ? kasan_unpoison+0x27/0x60 >>>>>> [ 0.312845] ? __kasan_unpoison_pages+0x2c/0x40 >>>>>> [ 0.312847] get_page_from_freelist+0xa54/0x1310 >>>>>> [ 0.312849] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>>>> [ 0.312851] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 >>>>>> [ 0.312853] alloc_pages_mpol+0x13a/0x3f0 >>>>>> [ 0.312855] ? __pfx_alloc_pages_mpol+0x10/0x10 >>>>>> [ 0.312856] ? xas_find+0x2d8/0x450 >>>>>> [ 0.312858] ? _raw_spin_lock+0x84/0xe0 >>>>>> [ 0.312859] ? __pfx__raw_spin_lock+0x10/0x10 >>>>>> [ 0.312861] alloc_pages_noprof+0xf6/0x2b0 >>>>>> [ 0.312862] __change_page_attr+0x293/0x850 >>>>>> [ 0.312864] ? __pfx___change_page_attr+0x10/0x10 >>>>>> [ 0.312865] ? _vm_unmap_aliases+0x2d0/0x650 >>>>>> [ 0.312868] ? __pfx__vm_unmap_aliases+0x10/0x10 >>>>>> [ 0.312869] __change_page_attr_set_clr+0x16c/0x360 >>>>>> [ 0.312871] ? 
spp_getpage+0xbb/0x1e0 >>>>>> [ 0.312872] change_page_attr_set_clr+0x220/0x3c0 >>>>>> [ 0.312873] ? flush_tlb_one_kernel+0xf/0x30 >>>>>> [ 0.312875] ? set_pte_vaddr_p4d+0x110/0x180 >>>>>> [ 0.312877] ? __pfx_change_page_attr_set_clr+0x10/0x10 >>>>>> [ 0.312878] ? __pfx_set_pte_vaddr_p4d+0x10/0x10 >>>>>> [ 0.312881] ? __pfx_mtree_load+0x10/0x10 >>>>>> [ 0.312883] ? __pfx_mtree_load+0x10/0x10 >>>>>> [ 0.312884] ? __asan_memcpy+0x3c/0x60 >>>>>> [ 0.312886] ? set_intr_gate+0x10c/0x150 >>>>>> [ 0.312888] set_memory_ro+0x76/0xa0 >>>>>> [ 0.312889] ? __pfx_set_memory_ro+0x10/0x10 >>>>>> [ 0.312891] idt_setup_apic_and_irq_gates+0x2c1/0x390 >>>>>> >>>>>> and more. >>>>> Ok, it's not the only place. Got your point. >>>>> >>>>>> off topic - if we were to handle only alloc_page_ext() specifically, >>>>>> what would be the most straightforward >>>>>> >>>>>> solution in your mind? I'd really appreciate your insight. >>>>> I was thinking if it's the only special case maybe we can handle it >>>>> somehow differently, like we do when we allocate obj_ext vectors for >>>>> slabs using __GFP_NO_OBJ_EXT. I haven't found a good solution yet but >>>>> since it's not a special case we would not be able to use it even if I >>>>> came up with something... >>>>> I think your way is the most straight-forward but please try my >>>>> suggestion to see if we can avoid extra overhead. >>>>> Thanks, >>>>> Suren. Hi Suren >> Hi Suren >> >> >>> Hi Hao, >>> >>>> Hi Suren >>>> >>>> Thank you for your feedback. After re-examining this issue, >>>> >>>> I realize my previous focus was misplaced. >>>> >>>> Upon deeper consideration, I understand that this is not merely a bug, >>>> >>>> but rather a warning that indicates a gap in our memory profiling mechanism. >>>> >>>> Specifically, the current implementation appears to be missing memory >>>> allocation >>>> >>>> tracking during the period between the buddy system allocation and page_ext >>>> >>>> initialization. 
>>>> >>>> This profiling gap means we may not be capturing all relevant memory >>>> allocation >>>> >>>> events during this critical transition phase. >>> Correct, this limitation exists because memory profiling relies on >>> some kernel facilities (page_ext, obj_ext) which might not be >>> initialized yet at the time of allocation. >>> >>>> My approach is to dynamically allocate codetag_ref when get_page_tag_ref >>>> fails, >>>> >>>> and maintain a linked list to track all buddy system allocations that >>>> occur prior to page_ext initialization. >>>> >>>> However, this introduces performance concerns: >>>> >>>> 1. Free Path Overhead: When freeing these pages, we would need to >>>> traverse the entire linked list to locate >>>> >>>> the corresponding codetag_ref, resulting in O(n) lookup complexity >>>> per free operation. >>>> >>>> 2. Initialization Overhead: During init_page_alloc_tagging, iterating >>>> through the linked list to assign codetag_ref to >>>> >>>> page_ext would introduce additional traversal cost. >>>> >>>> If the number of pages is substantial, this could incur significant >>>> overhead. What are your thoughts on this? I look forward to your >>>> suggestions. >>> My thinking is that these early allocations comprise a small portion >>> of overall memory consumed by the system. So, instead of trying to >>> record and handle them in some alternative way, we just accept that >>> some counters might not be exactly accurate and ignore those early >>> allocations. See how the early slab allocations are marked with the >>> CODETAG_FLAG_INACCURATE flag and later reported as inaccurate. I think >>> that's an acceptable alternative to introducing extra complexity and >>> performance overhead. IOW, the benefits of accounting for these early >>> allocations are low compared to the effort required to account for >>> them. Unless you found a simple and performant way to do that...
>> >> I have been exploring possible solutions to this issue over the past few >> days, >> >> but so far I have not come up with a good approach. >> >> I have counted the number of memory allocations that occur earlier than the >> >> allocation and initialization of our page_ext, and found that there are >> actually >> >> quite a lot of them. > Interesting... I wonder if it's because deferred_struct_pages defers > page_ext initialization. Can you check if setting early_page_ext > reduces or eliminates these allocations before page_ext init cases? Yes, you are correct. On my 8-core, 16 GB virtual machine, I used a global counter to record these allocations. With early_page_ext enabled, there were 130 allocations before page_ext initialization; without early_page_ext, there were 802. > >> Similarly, I have made the following changes and collected the >> corresponding logs. >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 2d4b6f1a554e..6db65b3d52d3 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -1293,6 +1293,8 @@ void __pgalloc_tag_add(struct page *page, struct >> task_struct *task, >> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); >> update_page_tag_ref(handle, &ref); >> put_page_tag_ref(handle); >> + } else{ >> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! >> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); >> } >> } >> >> @@ -1314,6 +1316,8 @@ void __pgalloc_tag_sub(struct page *page, unsigned >> int nr) >> alloc_tag_sub(&ref, PAGE_SIZE * nr); >> update_page_tag_ref(handle, &ref); >> put_page_tag_ref(handle); >> + } else{ >> + pr_warn("__pgalloc_tag_sub: get_page_tag_ref failed! >> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); >> } >> } >> >> [ 0.261699] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001000 pfn=1048640 nr=2 >> [ 0.261711] __pgalloc_tag_add: get_page_tag_ref failed! 
>> page=ffffea0004001100 pfn=1048644 nr=4 >> [ 0.261717] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001200 pfn=1048648 nr=4 >> [ 0.261721] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001300 pfn=1048652 nr=4 >> [ 0.261893] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001080 pfn=1048642 nr=2 >> [ 0.261917] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001400 pfn=1048656 nr=4 >> [ 0.262018] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001500 pfn=1048660 nr=2 >> [ 0.262024] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001600 pfn=1048664 nr=8 >> [ 0.262040] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001580 pfn=1048662 nr=1 >> [ 0.262048] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea00040015c0 pfn=1048663 nr=1 >> [ 0.262056] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001800 pfn=1048672 nr=2 >> [ 0.262064] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001880 pfn=1048674 nr=2 >> [ 0.262078] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001900 pfn=1048676 nr=2 >> [ 0.262196] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 >> [ 0.262213] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001980 pfn=1048678 nr=2 >> [ 0.262220] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001a00 pfn=1048680 nr=4 >> [ 0.262246] ODEBUG: selftest passed >> [ 0.262268] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001b00 pfn=1048684 nr=1 >> [ 0.262318] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001b40 pfn=1048685 nr=1 >> [ 0.262368] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001b80 pfn=1048686 nr=1 >> [ 0.262418] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001bc0 pfn=1048687 nr=1 >> [ 0.262469] __pgalloc_tag_add: get_page_tag_ref failed! 
>> page=ffffea0004001c00 pfn=1048688 nr=1 >> [ 0.262519] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001c40 pfn=1048689 nr=1 >> [ 0.262569] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001c80 pfn=1048690 nr=1 >> [ 0.262620] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001cc0 pfn=1048691 nr=1 >> [ 0.262670] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001d00 pfn=1048692 nr=1 >> [ 0.262721] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001d40 pfn=1048693 nr=1 >> [ 0.262771] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001d80 pfn=1048694 nr=1 >> [ 0.262821] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001dc0 pfn=1048695 nr=1 >> [ 0.262871] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001e00 pfn=1048696 nr=1 >> [ 0.262923] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001e40 pfn=1048697 nr=1 >> [ 0.262974] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001e80 pfn=1048698 nr=1 >> [ 0.263024] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001ec0 pfn=1048699 nr=1 >> [ 0.263074] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001f00 pfn=1048700 nr=1 >> [ 0.263124] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001f40 pfn=1048701 nr=1 >> [ 0.263174] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001f80 pfn=1048702 nr=1 >> [ 0.263224] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004001fc0 pfn=1048703 nr=1 >> [ 0.263275] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004002000 pfn=1048704 nr=1 >> [ 0.263325] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004002040 pfn=1048705 nr=1 >> [ 0.263375] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004002080 pfn=1048706 nr=1 >> [ 0.263427] __pgalloc_tag_add: get_page_tag_ref failed! 
>> page=ffffea0004002400 pfn=1048720 nr=16 >> [ 0.263437] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea00040020c0 pfn=1048707 nr=1 >> [ 0.263463] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004002100 pfn=1048708 nr=1 >> [ 0.263465] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004002140 pfn=1048709 nr=1 >> [ 0.263467] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004002180 pfn=1048710 nr=1 >> [ 0.263509] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004002200 pfn=1048712 nr=4 >> [ 0.263512] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004002800 pfn=1048736 nr=8 >> [ 0.263524] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea00040021c0 pfn=1048711 nr=1 >> [ 0.263536] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004002300 pfn=1048716 nr=1 >> [ 0.263537] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004002340 pfn=1048717 nr=1 >> [ 0.263539] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004002380 pfn=1048718 nr=1 >> [ 0.263604] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004004000 pfn=1048832 nr=128 >> [ 0.263638] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004003000 pfn=1048768 nr=64 >> [ 0.263650] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004002c00 pfn=1048752 nr=16 >> [ 0.263655] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea00040023c0 pfn=1048719 nr=1 >> [ 0.270582] __pgalloc_tag_sub: get_page_tag_ref failed! >> page=ffffea00040023c0 pfn=1048719 nr=1 >> [ 0.270591] ftrace: allocating 52717 entries in 208 pages >> [ 0.270592] ftrace: allocated 208 pages with 3 groups >> [ 0.270620] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004002a00 pfn=1048744 nr=8 >> [ 0.270636] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea00040023c0 pfn=1048719 nr=1 >> [ 0.270643] __pgalloc_tag_add: get_page_tag_ref failed! 
>> page=ffffea0004006000 pfn=1048960 nr=1 >> [ 0.270649] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006040 pfn=1048961 nr=1 >> [ 0.270658] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004007000 pfn=1049024 nr=64 >> [ 0.270659] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006080 pfn=1048962 nr=2 >> [ 0.270722] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006100 pfn=1048964 nr=1 >> [ 0.270730] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006140 pfn=1048965 nr=1 >> [ 0.270738] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006180 pfn=1048966 nr=1 >> [ 0.270777] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea00040061c0 pfn=1048967 nr=1 >> [ 0.270786] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006200 pfn=1048968 nr=1 >> [ 0.270792] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006240 pfn=1048969 nr=1 >> [ 0.270833] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006300 pfn=1048972 nr=4 >> [ 0.270891] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006280 pfn=1048970 nr=1 >> [ 0.270980] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea00040062c0 pfn=1048971 nr=1 >> [ 0.271071] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006400 pfn=1048976 nr=1 >> [ 0.271156] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006440 pfn=1048977 nr=1 >> [ 0.271185] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006480 pfn=1048978 nr=2 >> [ 0.271301] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006500 pfn=1048980 nr=1 >> [ 0.271655] Dynamic Preempt: lazy >> [ 0.271662] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006580 pfn=1048982 nr=2 >> [ 0.271752] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006600 pfn=1048984 nr=4 >> [ 0.271762] __pgalloc_tag_add: get_page_tag_ref failed! 
>> page=ffffea0004010000 pfn=1049600 nr=4 >> [ 0.271824] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006540 pfn=1048981 nr=1 >> [ 0.271916] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006700 pfn=1048988 nr=2 >> [ 0.271964] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006780 pfn=1048990 nr=1 >> [ 0.272099] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea00040067c0 pfn=1048991 nr=1 >> [ 0.272138] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006800 pfn=1048992 nr=2 >> [ 0.272144] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006a00 pfn=1049000 nr=8 >> [ 0.272249] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006c00 pfn=1049008 nr=8 >> [ 0.272319] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006880 pfn=1048994 nr=2 >> [ 0.272351] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006900 pfn=1048996 nr=4 >> [ 0.272424] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004006e00 pfn=1049016 nr=8 >> [ 0.272485] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004008000 pfn=1049088 nr=8 >> [ 0.272535] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004008200 pfn=1049096 nr=2 >> [ 0.272600] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004008400 pfn=1049104 nr=8 >> [ 0.272663] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004008300 pfn=1049100 nr=4 >> [ 0.272694] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004008280 pfn=1049098 nr=2 >> [ 0.272708] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004008600 pfn=1049112 nr=8 >> >> [ 0.272924] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004008880 pfn=1049122 nr=2 >> [ 0.272934] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004008900 pfn=1049124 nr=2 >> [ 0.272952] __pgalloc_tag_add: get_page_tag_ref failed! 
>> page=ffffea0004008c00 pfn=1049136 nr=4 >> [ 0.273035] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004008980 pfn=1049126 nr=2 >> [ 0.273062] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004008e00 pfn=1049144 nr=8 >> [ 0.273674] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004008d00 pfn=1049140 nr=1 >> [ 0.273884] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004008d80 pfn=1049142 nr=2 >> [ 0.273943] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009000 pfn=1049152 nr=2 >> [ 0.274379] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009080 pfn=1049154 nr=2 >> [ 0.274575] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009200 pfn=1049160 nr=8 >> [ 0.274617] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009100 pfn=1049156 nr=4 >> [ 0.274794] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009400 pfn=1049168 nr=2 >> [ 0.274840] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009480 pfn=1049170 nr=2 >> [ 0.275057] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009500 pfn=1049172 nr=2 >> [ 0.275092] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009580 pfn=1049174 nr=2 >> [ 0.275134] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009600 pfn=1049176 nr=8 >> [ 0.275211] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009800 pfn=1049184 nr=4 >> [ 0.275510] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009900 pfn=1049188 nr=2 >> [ 0.275548] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009980 pfn=1049190 nr=2 >> [ 0.275976] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009a00 pfn=1049192 nr=8 >> [ 0.275987] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009c00 pfn=1049200 nr=2 >> [ 0.276139] __pgalloc_tag_add: get_page_tag_ref failed! 
>> page=ffffea0004009c80 pfn=1049202 nr=2 >> [ 0.276152] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004008d40 pfn=1049141 nr=1 >> [ 0.276242] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009d00 pfn=1049204 nr=1 >> [ 0.276358] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009d40 pfn=1049205 nr=1 >> [ 0.276444] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009d80 pfn=1049206 nr=1 >> [ 0.276526] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009dc0 pfn=1049207 nr=1 >> [ 0.276615] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009e00 pfn=1049208 nr=1 >> [ 0.276696] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009e40 pfn=1049209 nr=1 >> [ 0.276792] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009e80 pfn=1049210 nr=1 >> [ 0.276827] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009f00 pfn=1049212 nr=2 >> [ 0.276891] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009ec0 pfn=1049211 nr=1 >> [ 0.276999] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009f80 pfn=1049214 nr=1 >> [ 0.277082] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea0004009fc0 pfn=1049215 nr=1 >> [ 0.277172] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea000400a000 pfn=1049216 nr=1 >> [ 0.277257] __pgalloc_tag_add: get_page_tag_ref failed! >> page=ffffea000400a040 pfn=1049217 nr=1 >> >> and so on. >> >> >>> I think your earlier patch can effectively detect these early >>> allocations and suppress the warnings. We should also mark these >>> allocations with CODETAG_FLAG_INACCURATE. >> Thanks to an excellent AI review, I realized there are issues with >> >> my original patch. One problem is the 256-element array; another > Yes, if there are lots of such allocations, it's not appropriate. 
> >> is that it involves allocation and free operations, meaning we need >> >> to record entries at __pgalloc_tag_add and remove them at __pgalloc_tag_sub, >> >> which introduces a noticeable overhead. I'm wondering if we can instead >> set a flag >> >> bit in page flags during the early boot stage, which I'll refer to as >> EARLY_ALLOC_FLAGS. >> >> Then, in __pgalloc_tag_sub, we first check for EARLY_ALLOC_FLAGS. If >> set, we clear the >> >> flag and return immediately; otherwise, we perform the actual >> subtraction of the tag count. >> >> This approach seems somewhat similar to the idea behind >> mem_profiling_compressed. > That seems doable but let's first check if we can make page_ext > initialization happen before these allocations. That would be the > ideal path. If it's not possible then we can focus on alternatives > like the one you propose. Yes, the ideal scenario would be to have page_ext initialization complete before these allocations occur. I walked through the code and found that this is essentially what FLATMEM already does: FLATMEM allocates page_ext before the buddy allocator is initialized, so it does not appear to hit the issue we are seeing now. https://elixir.bootlin.com/linux/v7.0-rc5/source/mm/mm_init.c#L2707 However, I am not sure whether SPARSEMEM can guarantee the same ordering. > >> I would appreciate your valuable feedback and any better suggestions you >> might have. > Thanks for pursuing this! I'll help in any way I can. > Suren. Thank you so much for your patient guidance and assistance. I truly appreciate your willingness to share your knowledge and insights. Thanks, Hao >> Thanks >> >> Hao >> >>> Thanks, >>> Suren. >>> >>>> Thanks >>>> >>>> Hao >>>> >>>>>> Thanks. >>>>>> >>>>>> >>>>>>>>>> If the slab cache has no free objects, it falls back >>>>>>>>>> to the buddy allocator to allocate memory.
However, at this point page_ext >>>>>>>>>> is not yet fully initialized, so these newly allocated pages have no >>>>>>>>>> codetag set. These pages may later be reclaimed by KASAN, which causes >>>>>>>>>> the warning to trigger when they are freed because their codetag ref is >>>>>>>>>> still empty. >>>>>>>>>> >>>>>>>>>> Use a global array to track pages allocated before page_ext is fully >>>>>>>>>> initialized, similar to how kmemleak tracks early allocations. >>>>>>>>>> When page_ext initialization completes, set their codetag >>>>>>>>>> to empty to avoid warnings when they are freed later. >>>>>>>>>> >>>>>>>>>> ... >>>>>>>>>> >>>>>>>>>> --- a/include/linux/alloc_tag.h >>>>>>>>>> +++ b/include/linux/alloc_tag.h >>>>>>>>>> @@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref) >>>>>>>>>> >>>>>>>>>> #ifdef CONFIG_MEM_ALLOC_PROFILING >>>>>>>>>> >>>>>>>>>> +bool mem_profiling_is_available(void); >>>>>>>>>> +void alloc_tag_add_early_pfn(unsigned long pfn); >>>>>>>>>> + >>>>>>>>>> #define ALLOC_TAG_SECTION_NAME "alloc_tags" >>>>>>>>>> >>>>>>>>>> struct codetag_bytes { >>>>>>>>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c >>>>>>>>>> index 58991ab09d84..a5bf4e72c154 100644 >>>>>>>>>> --- a/lib/alloc_tag.c >>>>>>>>>> +++ b/lib/alloc_tag.c >>>>>>>>>> @@ -6,6 +6,7 @@ >>>>>>>>>> #include <linux/kallsyms.h> >>>>>>>>>> #include <linux/module.h> >>>>>>>>>> #include <linux/page_ext.h> >>>>>>>>>> +#include <linux/pgalloc_tag.h> >>>>>>>>>> #include <linux/proc_fs.h> >>>>>>>>>> #include <linux/seq_buf.h> >>>>>>>>>> #include <linux/seq_file.h> >>>>>>>>>> @@ -26,6 +27,82 @@ static bool mem_profiling_support; >>>>>>>>>> >>>>>>>>>> static struct codetag_type *alloc_tag_cttype; >>>>>>>>>> >>>>>>>>>> +/* >>>>>>>>>> + * State of the alloc_tag >>>>>>>>>> + * >>>>>>>>>> + * This is used to describe the states of the alloc_tag during bootup.
>>>>>>>>>> + * >>>>>>>>>> + * When we need to allocate page_ext to store codetag, we face an >>>>>>>>>> + * initialization timing problem: >>>>>>>>>> + * >>>>>>>>>> + * Due to initialization order, pages may be allocated via buddy system >>>>>>>>>> + * before page_ext is fully allocated and initialized. Although these >>>>>>>>>> + * pages call the allocation hooks, the codetag will not be set because >>>>>>>>>> + * page_ext is not yet available. >>>>>>>>>> + * >>>>>>>>>> + * When these pages are later freed to the buddy system, it triggers >>>>>>>>>> + * warnings because their codetag is actually empty if >>>>>>>>>> + * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled. >>>>>>>>>> + * >>>>>>>>>> + * Additionally, in this situation, we cannot record detailed allocation >>>>>>>>>> + * information for these pages. >>>>>>>>>> + */ >>>>>>>>>> +enum mem_profiling_state { >>>>>>>>>> + DOWN, /* No mem_profiling functionality yet */ >>>>>>>>>> + UP /* Everything is working */ >>>>>>>>>> +}; >>>>>>>>>> + >>>>>>>>>> +static enum mem_profiling_state mem_profiling_state = DOWN; >>>>>>>>>> + >>>>>>>>>> +bool mem_profiling_is_available(void) >>>>>>>>>> +{ >>>>>>>>>> + return mem_profiling_state == UP; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG >>>>>>>>>> + >>>>>>>>>> +#define EARLY_ALLOC_PFN_MAX 256 >>>>>>>>>> + >>>>>>>>>> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX]; >>>>>>>>> It's unfortunate that this isn't __initdata. >>>>>>>>> >>>>>>>>>> +static unsigned int early_pfn_count; >>>>>>>>>> +static DEFINE_SPINLOCK(early_pfn_lock); >>>>>>>>>> + >>>>>>>>>> >>>>>>>>>> ...
>>>>>>>>>> >>>>>>>>>> --- a/mm/page_alloc.c >>>>>>>>>> +++ b/mm/page_alloc.c >>>>>>>>>> @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task, >>>>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); >>>>>>>>>> update_page_tag_ref(handle, &ref); >>>>>>>>>> put_page_tag_ref(handle); >>>>>>>>>> + } else { >>>>>>>> This branch can be marked as "unlikely". >>>>>>>> >>>>>>>>>> + /* >>>>>>>>>> + * page_ext is not available yet, record the pfn so we can >>>>>>>>>> + * clear the tag ref later when page_ext is initialized. >>>>>>>>>> + */ >>>>>>>>>> + if (!mem_profiling_is_available()) >>>>>>>>>> + alloc_tag_add_early_pfn(page_to_pfn(page)); >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>> All because of this, I believe. Is this fixable? >>>>>>>>> >>>>>>>>> If we take that `else', we know we're running in __init code, yes? I >>>>>>>>> don't see how `__init pgalloc_tag_add_early()' could be made to work. >>>>>>>>> hrm. Something clever, please. >>>>>>>> We can have a pointer to a function that is initialized to point to >>>>>>>> alloc_tag_add_early_pfn, which is defined as __init and uses >>>>>>>> early_pfns which now can be defined as __initdata. After >>>>>>>> clear_early_alloc_pfn_tag_refs() is done we reset that pointer to >>>>>>>> NULL. __pgalloc_tag_add() instead of calling alloc_tag_add_early_pfn() >>>>>>>> directly checks that pointer and if it's not NULL then calls the >>>>>>>> function that it points to. This way __pgalloc_tag_add() which is not >>>>>>>> an __init function will be invoking alloc_tag_add_early_pfn() __init >>>>>>>> function only until we are done with initialization. I haven't tried >>>>>>>> this but I think that should work. This also eliminates the need for >>>>>>>> mem_profiling_state variable since we can use this function pointer >>>>>>>> instead. >>>>>>>> >>>>>>>> ^ permalink raw reply [flat|nested] 21+ messages in thread
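The function-pointer suggestion above can be modeled in userspace C to show the intended control flow. This is a minimal sketch under stated assumptions, not the patch: every name in it (record_early_pfn, early_pfn_hook, pgalloc_tag_add_model, finish_early_init, page_ext_ready) is a hypothetical stand-in. In the kernel, record_early_pfn() would be marked __init and early_pfns[] __initdata, page_ext_ready stands in for "get_page_tag_ref() succeeded", and finish_early_init() stands in for the point where clear_early_alloc_pfn_tag_refs() sets each recorded page's codetag to empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; none of these names are kernel APIs. */
#define EARLY_PFN_MAX 256

/* In the kernel these would be __initdata. */
static unsigned long early_pfns[EARLY_PFN_MAX];
static unsigned int early_pfn_count;

/* Would be __init: only ever reached through early_pfn_hook. */
static void record_early_pfn(unsigned long pfn)
{
	if (early_pfn_count < EARLY_PFN_MAX)
		early_pfns[early_pfn_count++] = pfn;
}

/* Non-NULL while page_ext is unavailable; cleared once init completes. */
static void (*early_pfn_hook)(unsigned long pfn) = record_early_pfn;

/* Stands in for "get_page_tag_ref() succeeded". */
static int page_ext_ready;

/* Models the tail of __pgalloc_tag_add(): no direct call into __init code. */
static void pgalloc_tag_add_model(unsigned long pfn)
{
	void (*hook)(unsigned long pfn);

	if (page_ext_ready)
		return; /* normal path: the tag lives in page_ext */

	hook = early_pfn_hook;
	if (hook)
		hook(pfn);
}

/*
 * Models the end of early init: the real code would empty the codetag
 * of each recorded pfn here. The hook is then cleared so the __init
 * function is never called after init memory is freed.
 * Returns the number of pfns that were recorded.
 */
static unsigned int finish_early_init(void)
{
	assert(early_pfn_hook != NULL); /* must run only once */

	page_ext_ready = 1;
	early_pfn_hook = NULL;
	return early_pfn_count;
}
```

Once the pointer is cleared, a late allocation pays only a NULL check instead of calling into freed init text, and the pointer also replaces the mem_profiling_state variable, as suggested. The model is single-threaded; in the kernel the pointer would have to be cleared before init memory is freed, READ_ONCE()/WRITE_ONCE() on it would likely be prudent if allocations can race with the reset, and the pointer may need an annotation such as __refdata to keep modpost from reporting a section mismatch.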
* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization 2026-03-25 2:07 ` Hao Ge @ 2026-03-25 6:25 ` Suren Baghdasaryan 2026-03-25 7:35 ` Suren Baghdasaryan 0 siblings, 1 reply; 21+ messages in thread From: Suren Baghdasaryan @ 2026-03-25 6:25 UTC (permalink / raw) To: Hao Ge; +Cc: Andrew Morton, Kent Overstreet, linux-mm, linux-kernel On Tue, Mar 24, 2026 at 7:08 PM Hao Ge <hao.ge@linux.dev> wrote: > > > On 2026/3/25 08:21, Suren Baghdasaryan wrote: > > On Tue, Mar 24, 2026 at 2:43 AM Hao Ge <hao.ge@linux.dev> wrote: > >> > >> On 2026/3/24 06:47, Suren Baghdasaryan wrote: > >>> On Mon, Mar 23, 2026 at 2:16 AM Hao Ge <hao.ge@linux.dev> wrote: > >>>> On 2026/3/20 10:14, Suren Baghdasaryan wrote: > >>>>> On Thu, Mar 19, 2026 at 6:58 PM Hao Ge <hao.ge@linux.dev> wrote: > >>>>>> On 2026/3/20 07:48, Suren Baghdasaryan wrote: > >>>>>>> On Thu, Mar 19, 2026 at 4:44 PM Suren Baghdasaryan <surenb@google.com> wrote: > >>>>>>>> On Thu, Mar 19, 2026 at 3:28 PM Andrew Morton <akpm@linux-foundation.org> wrote: > >>>>>>>>> On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote: > >>>>>>>>> > >>>>>>>>>> Due to initialization ordering, page_ext is allocated and initialized > >>>>>>>>>> relatively late during boot. Some pages have already been allocated > >>>>>>>>>> and freed before page_ext becomes available, leaving their codetag > >>>>>>>>>> uninitialized. > >>>>>>>> Hi Hao, > >>>>>>>> Thanks for the report. > >>>>>>>> Hmm. So, we are allocating pages before page_ext is initialized... > >>>>>>>> > >>>>>>>>>> A clear example is in init_section_page_ext(): alloc_page_ext() calls > >>>>>>>>>> kmemleak_alloc(). > >>>>>>> Forgot to ask. The example you are using here is for page_ext > >>>>>>> allocation itself. Do you have any other examples where page > >>>>>>> allocation happens before page_ext initialization? 
If that's the only > >>>>>>> place, then we might be able to fix this in a simpler way by doing > >>>>>>> something special for alloc_page_ext(). > >>>>>> Hi Suren > >>>>>> > >>>>>> To help illustrate the point, here's the debug log I added: > >>>>>> > >>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >>>>>> index 2d4b6f1a554e..ebfe636f5b07 100644 > >>>>>> --- a/mm/page_alloc.c > >>>>>> +++ b/mm/page_alloc.c > >>>>>> @@ -1293,6 +1293,9 @@ void __pgalloc_tag_add(struct page *page, struct > >>>>>> task_struct *task, > >>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > >>>>>> update_page_tag_ref(handle, &ref); > >>>>>> put_page_tag_ref(handle); > >>>>>> + } else { > >>>>>> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! > >>>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > >>>>>> + dump_stack(); > >>>>>> } > >>>>>> } > >>>>>> > >>>>>> > >>>>>> And I caught the following logs: > >>>>>> > >>>>>> [ 0.296399] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>> page=ffffea000400c700 pfn=1049372 nr=1 > >>>>>> [ 0.296400] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > >>>>>> [ 0.296402] Hardware name: Red Hat KVM, BIOS > >>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>>>> [ 0.296402] Call Trace: > >>>>>> [ 0.296403] <TASK> > >>>>>> [ 0.296403] dump_stack_lvl+0x53/0x70 > >>>>>> [ 0.296405] __pgalloc_tag_add+0x3a3/0x6e0 > >>>>>> [ 0.296406] ? __pfx___pgalloc_tag_add+0x10/0x10 > >>>>>> [ 0.296407] ? kasan_unpoison+0x27/0x60 > >>>>>> [ 0.296409] ? __kasan_unpoison_pages+0x2c/0x40 > >>>>>> [ 0.296411] get_page_from_freelist+0xa54/0x1310 > >>>>>> [ 0.296413] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>>>> [ 0.296415] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > >>>>>> [ 0.296417] ? stack_depot_save_flags+0x3f/0x680 > >>>>>> [ 0.296418] ? ___slab_alloc+0x518/0x530 > >>>>>> [ 0.296420] alloc_pages_mpol+0x13a/0x3f0 > >>>>>> [ 0.296421] ? 
__pfx_alloc_pages_mpol+0x10/0x10 > >>>>>> [ 0.296423] ? _raw_spin_lock_irqsave+0x8a/0xf0 > >>>>>> [ 0.296424] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 > >>>>>> [ 0.296426] alloc_slab_page+0xc2/0x130 > >>>>>> [ 0.296427] allocate_slab+0x77/0x2c0 > >>>>>> [ 0.296429] ? syscall_enter_define_fields+0x3bb/0x5f0 > >>>>>> [ 0.296430] ___slab_alloc+0x125/0x530 > >>>>>> [ 0.296432] ? __trace_define_field+0x252/0x3d0 > >>>>>> [ 0.296433] __kmalloc_noprof+0x329/0x630 > >>>>>> [ 0.296435] ? syscall_enter_define_fields+0x3bb/0x5f0 > >>>>>> [ 0.296436] syscall_enter_define_fields+0x3bb/0x5f0 > >>>>>> [ 0.296438] ? __pfx_syscall_enter_define_fields+0x10/0x10 > >>>>>> [ 0.296440] event_define_fields+0x326/0x540 > >>>>>> [ 0.296441] __trace_early_add_events+0xac/0x3c0 > >>>>>> [ 0.296443] trace_event_init+0x24c/0x460 > >>>>>> [ 0.296445] trace_init+0x9/0x20 > >>>>>> [ 0.296446] start_kernel+0x199/0x3c0 > >>>>>> [ 0.296448] x86_64_start_reservations+0x18/0x30 > >>>>>> [ 0.296449] x86_64_start_kernel+0xe2/0xf0 > >>>>>> [ 0.296451] common_startup_64+0x13e/0x141 > >>>>>> [ 0.296453] </TASK> > >>>>>> > >>>>>> > >>>>>> [ 0.312234] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>> page=ffffea000400f900 pfn=1049572 nr=1 > >>>>>> [ 0.312234] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > >>>>>> [ 0.312236] Hardware name: Red Hat KVM, BIOS > >>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>>>> [ 0.312236] Call Trace: > >>>>>> [ 0.312237] <TASK> > >>>>>> [ 0.312237] dump_stack_lvl+0x53/0x70 > >>>>>> [ 0.312239] __pgalloc_tag_add+0x3a3/0x6e0 > >>>>>> [ 0.312240] ? __pfx___pgalloc_tag_add+0x10/0x10 > >>>>>> [ 0.312241] ? rmqueue.constprop.0+0x4fc/0x1ce0 > >>>>>> [ 0.312243] ? kasan_unpoison+0x27/0x60 > >>>>>> [ 0.312244] ? __kasan_unpoison_pages+0x2c/0x40 > >>>>>> [ 0.312246] get_page_from_freelist+0xa54/0x1310 > >>>>>> [ 0.312248] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>>>> [ 0.312250] ? 
__pfx___alloc_frozen_pages_noprof+0x10/0x10 > >>>>>> [ 0.312253] alloc_slab_page+0x39/0x130 > >>>>>> [ 0.312254] allocate_slab+0x77/0x2c0 > >>>>>> [ 0.312255] ? alloc_cpumask_var_node+0xc7/0x230 > >>>>>> [ 0.312257] ___slab_alloc+0x46d/0x530 > >>>>>> [ 0.312259] __kmalloc_node_noprof+0x2fa/0x680 > >>>>>> [ 0.312261] ? alloc_cpumask_var_node+0xc7/0x230 > >>>>>> [ 0.312263] alloc_cpumask_var_node+0xc7/0x230 > >>>>>> [ 0.312264] init_desc+0x141/0x6b0 > >>>>>> [ 0.312266] alloc_desc+0x108/0x1b0 > >>>>>> [ 0.312267] early_irq_init+0xee/0x1c0 > >>>>>> [ 0.312268] ? __pfx_early_irq_init+0x10/0x10 > >>>>>> [ 0.312271] start_kernel+0x1ab/0x3c0 > >>>>>> [ 0.312272] x86_64_start_reservations+0x18/0x30 > >>>>>> [ 0.312274] x86_64_start_kernel+0xe2/0xf0 > >>>>>> [ 0.312275] common_startup_64+0x13e/0x141 > >>>>>> [ 0.312277] </TASK> > >>>>>> > >>>>>> [ 0.312834] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>> page=ffffea000400fc00 pfn=1049584 nr=1 > >>>>>> [ 0.312835] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > >>>>>> [ 0.312836] Hardware name: Red Hat KVM, BIOS > >>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>>>> [ 0.312837] Call Trace: > >>>>>> [ 0.312837] <TASK> > >>>>>> [ 0.312838] dump_stack_lvl+0x53/0x70 > >>>>>> [ 0.312840] __pgalloc_tag_add+0x3a3/0x6e0 > >>>>>> [ 0.312841] ? __pfx___pgalloc_tag_add+0x10/0x10 > >>>>>> [ 0.312842] ? rmqueue.constprop.0+0x4fc/0x1ce0 > >>>>>> [ 0.312844] ? kasan_unpoison+0x27/0x60 > >>>>>> [ 0.312845] ? __kasan_unpoison_pages+0x2c/0x40 > >>>>>> [ 0.312847] get_page_from_freelist+0xa54/0x1310 > >>>>>> [ 0.312849] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>>>> [ 0.312851] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > >>>>>> [ 0.312853] alloc_pages_mpol+0x13a/0x3f0 > >>>>>> [ 0.312855] ? __pfx_alloc_pages_mpol+0x10/0x10 > >>>>>> [ 0.312856] ? xas_find+0x2d8/0x450 > >>>>>> [ 0.312858] ? _raw_spin_lock+0x84/0xe0 > >>>>>> [ 0.312859] ? 
__pfx__raw_spin_lock+0x10/0x10 > >>>>>> [ 0.312861] alloc_pages_noprof+0xf6/0x2b0 > >>>>>> [ 0.312862] __change_page_attr+0x293/0x850 > >>>>>> [ 0.312864] ? __pfx___change_page_attr+0x10/0x10 > >>>>>> [ 0.312865] ? _vm_unmap_aliases+0x2d0/0x650 > >>>>>> [ 0.312868] ? __pfx__vm_unmap_aliases+0x10/0x10 > >>>>>> [ 0.312869] __change_page_attr_set_clr+0x16c/0x360 > >>>>>> [ 0.312871] ? spp_getpage+0xbb/0x1e0 > >>>>>> [ 0.312872] change_page_attr_set_clr+0x220/0x3c0 > >>>>>> [ 0.312873] ? flush_tlb_one_kernel+0xf/0x30 > >>>>>> [ 0.312875] ? set_pte_vaddr_p4d+0x110/0x180 > >>>>>> [ 0.312877] ? __pfx_change_page_attr_set_clr+0x10/0x10 > >>>>>> [ 0.312878] ? __pfx_set_pte_vaddr_p4d+0x10/0x10 > >>>>>> [ 0.312881] ? __pfx_mtree_load+0x10/0x10 > >>>>>> [ 0.312883] ? __pfx_mtree_load+0x10/0x10 > >>>>>> [ 0.312884] ? __asan_memcpy+0x3c/0x60 > >>>>>> [ 0.312886] ? set_intr_gate+0x10c/0x150 > >>>>>> [ 0.312888] set_memory_ro+0x76/0xa0 > >>>>>> [ 0.312889] ? __pfx_set_memory_ro+0x10/0x10 > >>>>>> [ 0.312891] idt_setup_apic_and_irq_gates+0x2c1/0x390 > >>>>>> > >>>>>> and more. > >>>>> Ok, it's not the only place. Got your point. > >>>>> > >>>>>> off topic - if we were to handle only alloc_page_ext() specifically, > >>>>>> what would be the most straightforward > >>>>>> > >>>>>> solution in your mind? I'd really appreciate your insight. > >>>>> I was thinking if it's the only special case maybe we can handle it > >>>>> somehow differently, like we do when we allocate obj_ext vectors for > >>>>> slabs using __GFP_NO_OBJ_EXT. I haven't found a good solution yet but > >>>>> since it's not a special case we would not be able to use it even if I > >>>>> came up with something... > >>>>> I think your way is the most straight-forward but please try my > >>>>> suggestion to see if we can avoid extra overhead. > >>>>> Thanks, > >>>>> Suren. > Hi Suren > >> Hi Suren > >> > >> > >>> Hi Hao, > >>> > >>>> Hi Suren > >>>> > >>>> Thank you for your feedback. 
After re-examining this issue, > >>>> > >>>> I realize my previous focus was misplaced. > >>>> > >>>> Upon deeper consideration, I understand that this is not merely a bug, > >>>> > >>>> but rather a warning that indicates a gap in our memory profiling mechanism. > >>>> > >>>> Specifically, the current implementation appears to be missing memory > >>>> allocation > >>>> > >>>> tracking during the period between the buddy system allocation and page_ext > >>>> > >>>> initialization. > >>>> > >>>> This profiling gap means we may not be capturing all relevant memory > >>>> allocation > >>>> > >>>> events during this critical transition phase. > >>> Correct, this limitation exists because memory profiling relies on > >>> some kernel facilities (page_ext, obj_ext) which might not be > >>> initialized yet at the time of allocation. > >>> > >>>> My approach is to dynamically allocate codetag_ref when get_page_tag_ref > >>>> fails, > >>>> > >>>> and maintain a linked list to track all buddy system allocations that > >>>> occur prior to page_ext initialization. > >>>> > >>>> However, this introduces performance concerns: > >>>> > >>>> 1. Free Path Overhead: When freeing these pages, we would need to > >>>> traverse the entire linked list to locate > >>>> > >>>> the corresponding codetag_ref, resulting in O(n) lookup complexity > >>>> per free operation. > >>>> > >>>> 2. Initialization Overhead: During init_page_alloc_tagging, iterating > >>>> through the linked list to assign codetag_ref to > >>>> > >>>> page_ext would introduce additional traversal cost. > >>>> > >>>> If the number of pages is substantial, this could incur significant > >>>> overhead. What are your thoughts on this? I look forward to your > >>>> suggestions. > >>> My thinking is that these early allocations comprise a small portion > >>> of overall memory consumed by the system. So, instead of trying to
So, instead of trying to > >>> record and handle them in some alternative way, we just accept that > >>> some counters might not be exactly accurate and ignore those early > >>> allocations. See how the early slab allocations are marked with the > >>> CODETAG_FLAG_INACCURATE flag and later reported as inaccurate. I think > >>> that's an acceptable alternative to introducing extra complexity and > >>> performance overhead. IOW, the benefits of accounting for these early > >>> allocations are low compared to the effort required to account for > >>> them. Unless you found a simple and performant way to do that... > >> > >> I have been exploring possible solutions to this issue over the past few > >> days, > >> > >> but so far I have not come up with a good approach. > >> > >> I have counted the number of memory allocations that occur earlier than the > >> > >> allocation and initialization of our page_ext, and found that there are > >> actually > >> > >> quite a lot of them. > > Interesting... I wonder it's because deferred_struct_pages defers > > page_ext initialization. Can you check if setting early_page_ext > > reduces or eliminates these allocations before page_ext init cases? > > Yes, you are correct. In my 8-core 16GB virtual machine, I used a global > counter > > to record these allocations. With early_page_ext enabled, there were 130 > allocations > > before page_ext initialization. Without early_page_ext, there were 802 > allocations > > before page_ext initialization. > > > > > >> Similarly, I have made the following changes and collected the > >> corresponding logs. 
> >> > >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >> index 2d4b6f1a554e..6db65b3d52d3 100644 > >> --- a/mm/page_alloc.c > >> +++ b/mm/page_alloc.c > >> @@ -1293,6 +1293,8 @@ void __pgalloc_tag_add(struct page *page, struct > >> task_struct *task, > >> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > >> update_page_tag_ref(handle, &ref); > >> put_page_tag_ref(handle); > >> + } else{ > >> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! > >> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > >> } > >> } > >> > >> @@ -1314,6 +1316,8 @@ void __pgalloc_tag_sub(struct page *page, unsigned > >> int nr) > >> alloc_tag_sub(&ref, PAGE_SIZE * nr); > >> update_page_tag_ref(handle, &ref); > >> put_page_tag_ref(handle); > >> + } else{ > >> + pr_warn("__pgalloc_tag_sub: get_page_tag_ref failed! > >> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > >> } > >> } > >> > >> [ 0.261699] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001000 pfn=1048640 nr=2 > >> [ 0.261711] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001100 pfn=1048644 nr=4 > >> [ 0.261717] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001200 pfn=1048648 nr=4 > >> [ 0.261721] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001300 pfn=1048652 nr=4 > >> [ 0.261893] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001080 pfn=1048642 nr=2 > >> [ 0.261917] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001400 pfn=1048656 nr=4 > >> [ 0.262018] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001500 pfn=1048660 nr=2 > >> [ 0.262024] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001600 pfn=1048664 nr=8 > >> [ 0.262040] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001580 pfn=1048662 nr=1 > >> [ 0.262048] __pgalloc_tag_add: get_page_tag_ref failed! 
> >> page=ffffea00040015c0 pfn=1048663 nr=1 > >> [ 0.262056] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001800 pfn=1048672 nr=2 > >> [ 0.262064] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001880 pfn=1048674 nr=2 > >> [ 0.262078] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001900 pfn=1048676 nr=2 > >> [ 0.262196] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 > >> [ 0.262213] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001980 pfn=1048678 nr=2 > >> [ 0.262220] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001a00 pfn=1048680 nr=4 > >> [ 0.262246] ODEBUG: selftest passed > >> [ 0.262268] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001b00 pfn=1048684 nr=1 > >> [ 0.262318] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001b40 pfn=1048685 nr=1 > >> [ 0.262368] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001b80 pfn=1048686 nr=1 > >> [ 0.262418] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001bc0 pfn=1048687 nr=1 > >> [ 0.262469] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001c00 pfn=1048688 nr=1 > >> [ 0.262519] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001c40 pfn=1048689 nr=1 > >> [ 0.262569] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001c80 pfn=1048690 nr=1 > >> [ 0.262620] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001cc0 pfn=1048691 nr=1 > >> [ 0.262670] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001d00 pfn=1048692 nr=1 > >> [ 0.262721] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001d40 pfn=1048693 nr=1 > >> [ 0.262771] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001d80 pfn=1048694 nr=1 > >> [ 0.262821] __pgalloc_tag_add: get_page_tag_ref failed! 
> >> page=ffffea0004001dc0 pfn=1048695 nr=1 > >> [ 0.262871] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001e00 pfn=1048696 nr=1 > >> [ 0.262923] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001e40 pfn=1048697 nr=1 > >> [ 0.262974] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001e80 pfn=1048698 nr=1 > >> [ 0.263024] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001ec0 pfn=1048699 nr=1 > >> [ 0.263074] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001f00 pfn=1048700 nr=1 > >> [ 0.263124] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001f40 pfn=1048701 nr=1 > >> [ 0.263174] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001f80 pfn=1048702 nr=1 > >> [ 0.263224] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004001fc0 pfn=1048703 nr=1 > >> [ 0.263275] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004002000 pfn=1048704 nr=1 > >> [ 0.263325] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004002040 pfn=1048705 nr=1 > >> [ 0.263375] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004002080 pfn=1048706 nr=1 > >> [ 0.263427] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004002400 pfn=1048720 nr=16 > >> [ 0.263437] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea00040020c0 pfn=1048707 nr=1 > >> [ 0.263463] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004002100 pfn=1048708 nr=1 > >> [ 0.263465] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004002140 pfn=1048709 nr=1 > >> [ 0.263467] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004002180 pfn=1048710 nr=1 > >> [ 0.263509] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004002200 pfn=1048712 nr=4 > >> [ 0.263512] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004002800 pfn=1048736 nr=8 > >> [ 0.263524] __pgalloc_tag_add: get_page_tag_ref failed! 
> >> page=ffffea00040021c0 pfn=1048711 nr=1 > >> [ 0.263536] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004002300 pfn=1048716 nr=1 > >> [ 0.263537] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004002340 pfn=1048717 nr=1 > >> [ 0.263539] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004002380 pfn=1048718 nr=1 > >> [ 0.263604] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004004000 pfn=1048832 nr=128 > >> [ 0.263638] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004003000 pfn=1048768 nr=64 > >> [ 0.263650] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004002c00 pfn=1048752 nr=16 > >> [ 0.263655] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea00040023c0 pfn=1048719 nr=1 > >> [ 0.270582] __pgalloc_tag_sub: get_page_tag_ref failed! > >> page=ffffea00040023c0 pfn=1048719 nr=1 > >> [ 0.270591] ftrace: allocating 52717 entries in 208 pages > >> [ 0.270592] ftrace: allocated 208 pages with 3 groups > >> [ 0.270620] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004002a00 pfn=1048744 nr=8 > >> [ 0.270636] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea00040023c0 pfn=1048719 nr=1 > >> [ 0.270643] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006000 pfn=1048960 nr=1 > >> [ 0.270649] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006040 pfn=1048961 nr=1 > >> [ 0.270658] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004007000 pfn=1049024 nr=64 > >> [ 0.270659] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006080 pfn=1048962 nr=2 > >> [ 0.270722] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006100 pfn=1048964 nr=1 > >> [ 0.270730] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006140 pfn=1048965 nr=1 > >> [ 0.270738] __pgalloc_tag_add: get_page_tag_ref failed! 
> >> page=ffffea0004006180 pfn=1048966 nr=1 > >> [ 0.270777] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea00040061c0 pfn=1048967 nr=1 > >> [ 0.270786] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006200 pfn=1048968 nr=1 > >> [ 0.270792] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006240 pfn=1048969 nr=1 > >> [ 0.270833] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006300 pfn=1048972 nr=4 > >> [ 0.270891] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006280 pfn=1048970 nr=1 > >> [ 0.270980] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea00040062c0 pfn=1048971 nr=1 > >> [ 0.271071] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006400 pfn=1048976 nr=1 > >> [ 0.271156] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006440 pfn=1048977 nr=1 > >> [ 0.271185] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006480 pfn=1048978 nr=2 > >> [ 0.271301] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006500 pfn=1048980 nr=1 > >> [ 0.271655] Dynamic Preempt: lazy > >> [ 0.271662] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006580 pfn=1048982 nr=2 > >> [ 0.271752] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006600 pfn=1048984 nr=4 > >> [ 0.271762] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004010000 pfn=1049600 nr=4 > >> [ 0.271824] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006540 pfn=1048981 nr=1 > >> [ 0.271916] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006700 pfn=1048988 nr=2 > >> [ 0.271964] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006780 pfn=1048990 nr=1 > >> [ 0.272099] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea00040067c0 pfn=1048991 nr=1 > >> [ 0.272138] __pgalloc_tag_add: get_page_tag_ref failed! 
> >> page=ffffea0004006800 pfn=1048992 nr=2 > >> [ 0.272144] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006a00 pfn=1049000 nr=8 > >> [ 0.272249] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006c00 pfn=1049008 nr=8 > >> [ 0.272319] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006880 pfn=1048994 nr=2 > >> [ 0.272351] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006900 pfn=1048996 nr=4 > >> [ 0.272424] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004006e00 pfn=1049016 nr=8 > >> [ 0.272485] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004008000 pfn=1049088 nr=8 > >> [ 0.272535] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004008200 pfn=1049096 nr=2 > >> [ 0.272600] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004008400 pfn=1049104 nr=8 > >> [ 0.272663] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004008300 pfn=1049100 nr=4 > >> [ 0.272694] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004008280 pfn=1049098 nr=2 > >> [ 0.272708] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004008600 pfn=1049112 nr=8 > >> > >> [ 0.272924] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004008880 pfn=1049122 nr=2 > >> [ 0.272934] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004008900 pfn=1049124 nr=2 > >> [ 0.272952] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004008c00 pfn=1049136 nr=4 > >> [ 0.273035] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004008980 pfn=1049126 nr=2 > >> [ 0.273062] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004008e00 pfn=1049144 nr=8 > >> [ 0.273674] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004008d00 pfn=1049140 nr=1 > >> [ 0.273884] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004008d80 pfn=1049142 nr=2 > >> [ 0.273943] __pgalloc_tag_add: get_page_tag_ref failed! 
> >> page=ffffea0004009000 pfn=1049152 nr=2 > >> [ 0.274379] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009080 pfn=1049154 nr=2 > >> [ 0.274575] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009200 pfn=1049160 nr=8 > >> [ 0.274617] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009100 pfn=1049156 nr=4 > >> [ 0.274794] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009400 pfn=1049168 nr=2 > >> [ 0.274840] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009480 pfn=1049170 nr=2 > >> [ 0.275057] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009500 pfn=1049172 nr=2 > >> [ 0.275092] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009580 pfn=1049174 nr=2 > >> [ 0.275134] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009600 pfn=1049176 nr=8 > >> [ 0.275211] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009800 pfn=1049184 nr=4 > >> [ 0.275510] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009900 pfn=1049188 nr=2 > >> [ 0.275548] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009980 pfn=1049190 nr=2 > >> [ 0.275976] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009a00 pfn=1049192 nr=8 > >> [ 0.275987] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009c00 pfn=1049200 nr=2 > >> [ 0.276139] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009c80 pfn=1049202 nr=2 > >> [ 0.276152] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004008d40 pfn=1049141 nr=1 > >> [ 0.276242] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009d00 pfn=1049204 nr=1 > >> [ 0.276358] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009d40 pfn=1049205 nr=1 > >> [ 0.276444] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009d80 pfn=1049206 nr=1 > >> [ 0.276526] __pgalloc_tag_add: get_page_tag_ref failed! 
> >> page=ffffea0004009dc0 pfn=1049207 nr=1 > >> [ 0.276615] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009e00 pfn=1049208 nr=1 > >> [ 0.276696] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009e40 pfn=1049209 nr=1 > >> [ 0.276792] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009e80 pfn=1049210 nr=1 > >> [ 0.276827] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009f00 pfn=1049212 nr=2 > >> [ 0.276891] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009ec0 pfn=1049211 nr=1 > >> [ 0.276999] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009f80 pfn=1049214 nr=1 > >> [ 0.277082] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea0004009fc0 pfn=1049215 nr=1 > >> [ 0.277172] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea000400a000 pfn=1049216 nr=1 > >> [ 0.277257] __pgalloc_tag_add: get_page_tag_ref failed! > >> page=ffffea000400a040 pfn=1049217 nr=1 > >> > >> and so on. > >> > >> > >>> I think your earlier patch can effectively detect these early > >>> allocations and suppress the warnings. We should also mark these > >>> allocations with CODETAG_FLAG_INACCURATE. > >> Thanks to an excellent AI review, I realized there are issues with > >> > >> my original patch. One problem is the 256-element array; another > > Yes, if there are lots of such allocations, it's not appropriate. > > > >> is that it involves allocation and free operations — meaning we need > >> > >> to record entries at __pgalloc_tag_add and remove them at __pgalloc_tag_sub, > >> > >> which introduces a noticeable overhead. I'm wondering if we can instead > >> set a flag > >> > >> bit in page flags during the early boot stage, which I'll refer to as > >> EARLY_ALLOC_FLAGS. > >> > >> Then, in __pgalloc_tag_sub, we first check for EARLY_ALLOC_FLAGS. If > >> set, we clear the > >> > >> flag and return immediately; otherwise, we perform the actual > >> subtraction of the tag count. 
> >> > >> This approach seems somewhat similar to the idea behind > >> mem_profiling_compressed. > > That seems doable but let's first check if we can make page_ext > > initialization happen before these allocations. That would be the > > ideal path. If it's not possible then we can focus on alternatives > > like the one you propose. > > > Yes, the ideal scenario would be to have page_ext initialization > complete before > > these allocations occur. I just did a code walkthrough and found that > this resembles > > the FLATMEM implementation approach - FLATMEM allocates page_ext before > the buddy > > system initialization, so it doesn't seem to encounter the issue we're > facing now. > > https://elixir.bootlin.com/linux/v7.0-rc5/source/mm/mm_init.c#L2707 Yes, page_ext_init_flatmem() looks like an interesting option, but it would not work with sparsemem. TBH I would prefer to find a simple solution that can identify early init allocations, mark them inaccurate and suppress the warning rather than introduce some complex mechanism to account for them which would work only in some cases (flatmem). With your original approach I think the only real issue is the size of the array, which might be too small. The other issue you mentioned, about an allocated page being freed and then re-allocated after page_ext is initialized but before clear_page_tag_ref() is called, is not really a problem. Yes, we will lose that counter's value but it's similar to other early allocations which we just treat as inaccurate. We can also minimize the possibility of this happening by moving clear_page_tag_ref() into init_page_alloc_tagging(). I don't like the pageflag option you mentioned because it adds an extra condition check into __pgalloc_tag_sub() which will be executed even after the init stage is over. I'll look into this some more tomorrow as it's quite late now. Thanks, Suren. 
> > However, I'm not entirely certain whether SPARSEMEM can guarantee the > same behavior.
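The trade-off Suren describes can be illustrated with a small userspace model of the fixed-array approach from the original patch. This is a sketch under stated assumptions, not the patch itself: page_ext is modeled as a flat array of integer refs indexed by PFN, and `model_alloc_hook()`, `model_clear_early_refs()` and `model_free_would_warn()` are hypothetical names. It shows why capacity is the main concern: pages recorded in the array are cleared to "empty" and later free silently, while any early allocation beyond the capacity stays unset and would still trigger the "alloc_tag was not set" warning when freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only. Every name here is hypothetical; page_ext is
 * modeled as a flat array of integer refs indexed by PFN.
 */
#define MODEL_NR_PAGES     1024
#define EARLY_PFN_CAPACITY 256   /* same size as EARLY_ALLOC_PFN_MAX in the quoted patch */

#define REF_UNSET  0UL  /* ref never written: freeing this page would warn */
#define REF_EMPTY  1UL  /* models set_codetag_empty() */
#define REF_TAGGED 2UL  /* models a ref pointing at a real alloc_tag */

static unsigned long page_ext_ref[MODEL_NR_PAGES];
static bool page_ext_ready;

static unsigned long early_pfns[EARLY_PFN_CAPACITY];
static size_t early_pfn_count;
static size_t early_pfn_dropped; /* early allocations that did not fit */

/* Allocation hook: tag the page if page_ext exists, else remember its PFN. */
static void model_alloc_hook(unsigned long pfn)
{
	if (page_ext_ready) {
		page_ext_ref[pfn] = REF_TAGGED;
	} else if (early_pfn_count < EARLY_PFN_CAPACITY) {
		early_pfns[early_pfn_count++] = pfn;
	} else {
		early_pfn_dropped++; /* array too small: this page stays unset */
	}
}

/*
 * Models page_ext becoming ready and the early refs being cleared in one
 * step; the thread notes that the real code has a window between the two.
 */
static void model_clear_early_refs(void)
{
	page_ext_ready = true;
	for (size_t i = 0; i < early_pfn_count; i++)
		page_ext_ref[early_pfns[i]] = REF_EMPTY;
	early_pfn_count = 0;
}

/* Free-path check: true if the debug warning would fire for this page. */
static bool model_free_would_warn(unsigned long pfn)
{
	return page_ext_ref[pfn] == REF_UNSET;
}
```

For example, with 300 early single-page allocations and a 256-entry array, the last 44 pages would remain unset and still warn when freed.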
> > > > > >> I would appreciate your valuable feedback and any better suggestions you > >> might have. > > Thanks for pursuing this! I'll help in any way I can. > > Suren. > > Thank you so much for your patient guidance and assistance. > > I truly appreciate your willingness to share your knowledge and insights. > > Thanks, > Hao > > >> Thanks > >> > >> Hao > >> > >>> Thanks, > >>> Suren. > >>> > >>>> Thanks > >>>> > >>>> Hao > >>>> > >>>>>> Thanks. > >>>>>> > >>>>>> > >>>>>>>>>> If the slab cache has no free objects, it falls back > >>>>>>>>>> to the buddy allocator to allocate memory. However, at this point page_ext > >>>>>>>>>> is not yet fully initialized, so these newly allocated pages have no > >>>>>>>>>> codetag set. These pages may later be reclaimed by KASAN,which causes > >>>>>>>>>> the warning to trigger when they are freed because their codetag ref is > >>>>>>>>>> still empty. > >>>>>>>>>> > >>>>>>>>>> Use a global array to track pages allocated before page_ext is fully > >>>>>>>>>> initialized, similar to how kmemleak tracks early allocations. > >>>>>>>>>> When page_ext initialization completes, set their codetag > >>>>>>>>>> to empty to avoid warnings when they are freed later. > >>>>>>>>>> > >>>>>>>>>> ... 
> >>>>>>>>>> > >>>>>>>>>> --- a/include/linux/alloc_tag.h > >>>>>>>>>> +++ b/include/linux/alloc_tag.h > >>>>>>>>>> @@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref) > >>>>>>>>>> > >>>>>>>>>> #ifdef CONFIG_MEM_ALLOC_PROFILING > >>>>>>>>>> > >>>>>>>>>> +bool mem_profiling_is_available(void); > >>>>>>>>>> +void alloc_tag_add_early_pfn(unsigned long pfn); > >>>>>>>>>> + > >>>>>>>>>> #define ALLOC_TAG_SECTION_NAME "alloc_tags" > >>>>>>>>>> > >>>>>>>>>> struct codetag_bytes { > >>>>>>>>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c > >>>>>>>>>> index 58991ab09d84..a5bf4e72c154 100644 > >>>>>>>>>> --- a/lib/alloc_tag.c > >>>>>>>>>> +++ b/lib/alloc_tag.c > >>>>>>>>>> @@ -6,6 +6,7 @@ > >>>>>>>>>> #include <linux/kallsyms.h> > >>>>>>>>>> #include <linux/module.h> > >>>>>>>>>> #include <linux/page_ext.h> > >>>>>>>>>> +#include <linux/pgalloc_tag.h> > >>>>>>>>>> #include <linux/proc_fs.h> > >>>>>>>>>> #include <linux/seq_buf.h> > >>>>>>>>>> #include <linux/seq_file.h> > >>>>>>>>>> @@ -26,6 +27,82 @@ static bool mem_profiling_support; > >>>>>>>>>> > >>>>>>>>>> static struct codetag_type *alloc_tag_cttype; > >>>>>>>>>> > >>>>>>>>>> +/* > >>>>>>>>>> + * State of the alloc_tag > >>>>>>>>>> + * > >>>>>>>>>> + * This is used to describe the states of the alloc_tag during bootup. > >>>>>>>>>> + * > >>>>>>>>>> + * When we need to allocate page_ext to store codetag, we face an > >>>>>>>>>> + * initialization timing problem: > >>>>>>>>>> + * > >>>>>>>>>> + * Due to initialization order, pages may be allocated via buddy system > >>>>>>>>>> + * before page_ext is fully allocated and initialized. Although these > >>>>>>>>>> + * pages call the allocation hooks, the codetag will not be set because > >>>>>>>>>> + * page_ext is not yet available. 
> >>>>>>>>>> + * > >>>>>>>>>> + * When these pages are later free to the buddy system, it triggers > >>>>>>>>>> + * warnings because their codetag is actually empty if > >>>>>>>>>> + * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled. > >>>>>>>>>> + * > >>>>>>>>>> + * Additionally, in this situation, we cannot record detailed allocation > >>>>>>>>>> + * information for these pages. > >>>>>>>>>> + */ > >>>>>>>>>> +enum mem_profiling_state { > >>>>>>>>>> + DOWN, /* No mem_profiling functionality yet */ > >>>>>>>>>> + UP /* Everything is working */ > >>>>>>>>>> +}; > >>>>>>>>>> + > >>>>>>>>>> +static enum mem_profiling_state mem_profiling_state = DOWN; > >>>>>>>>>> + > >>>>>>>>>> +bool mem_profiling_is_available(void) > >>>>>>>>>> +{ > >>>>>>>>>> + return mem_profiling_state == UP; > >>>>>>>>>> +} > >>>>>>>>>> + > >>>>>>>>>> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG > >>>>>>>>>> + > >>>>>>>>>> +#define EARLY_ALLOC_PFN_MAX 256 > >>>>>>>>>> + > >>>>>>>>>> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX]; > >>>>>>>>> It's unfortunate that this isn't __initdata. > >>>>>>>>> > >>>>>>>>>> +static unsigned int early_pfn_count; > >>>>>>>>>> +static DEFINE_SPINLOCK(early_pfn_lock); > >>>>>>>>>> + > >>>>>>>>>> > >>>>>>>>>> ... > >>>>>>>>>> > >>>>>>>>>> --- a/mm/page_alloc.c > >>>>>>>>>> +++ b/mm/page_alloc.c > >>>>>>>>>> @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task, > >>>>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > >>>>>>>>>> update_page_tag_ref(handle, &ref); > >>>>>>>>>> put_page_tag_ref(handle); > >>>>>>>>>> + } else { > >>>>>>>> This branch can be marked as "unlikely". > >>>>>>>> > >>>>>>>>>> + /* > >>>>>>>>>> + * page_ext is not available yet, record the pfn so we can > >>>>>>>>>> + * clear the tag ref later when page_ext is initialized. 
> >>>>>>>>>> + */ > >>>>>>>>>> + if (!mem_profiling_is_available()) > >>>>>>>>>> + alloc_tag_add_early_pfn(page_to_pfn(page)); > >>>>>>>>>> } > >>>>>>>>>> } > >>>>>>>>> All because of this, I believe. Is this fixable? > >>>>>>>>> > >>>>>>>>> If we take that `else', we know we're running in __init code, yes? I > >>>>>>>>> don't see how `__init pgalloc_tag_add_early()' could be made to work. > >>>>>>>>> hrm. Something clever, please. > >>>>>>>> We can have a pointer to a function that is initialized to point to > >>>>>>>> alloc_tag_add_early_pfn, which is defined as __init and uses > >>>>>>>> early_pfns which now can be defined as __initdata. After > >>>>>>>> clear_early_alloc_pfn_tag_refs() is done we reset that pointer to > >>>>>>>> NULL. __pgalloc_tag_add() instead of calling alloc_tag_add_early_pfn() > >>>>>>>> directly checks that pointer and if it's not NULL then calls the > >>>>>>>> function that it points to. This way __pgalloc_tag_add() which is not > >>>>>>>> an __init function will be invoking alloc_tag_add_early_pfn() __init > >>>>>>>> function only until we are done with initialization. I haven't tried > >>>>>>>> this but I think that should work. This also eliminates the need for > >>>>>>>> mem_profiling_state variable since we can use this function pointer > >>>>>>>> instead.
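Suren's function-pointer suggestion can be sketched in userspace roughly as follows. The names `early_pfn_hook`, `record_early_pfn()`, `model_pgalloc_tag_add_fallback()` and `model_clear_early_refs()` are hypothetical, and the `__init`/`__initdata` annotations appear only in comments because userspace has no init sections. The idea shown is that the non-init allocation path only tests a pointer, which becomes NULL once the early records have been consumed, so the pointer itself can stand in for the `mem_profiling_state` variable.

```c
#include <assert.h>
#include <stddef.h>

#define EARLY_PFN_MAX 256 /* hypothetical capacity */

/* In the kernel these would be __initdata; here they are ordinary statics. */
static unsigned long early_pfns[EARLY_PFN_MAX];
static size_t early_pfn_count;

/* In the kernel this would be an __init function. */
static void record_early_pfn(unsigned long pfn)
{
	if (early_pfn_count < EARLY_PFN_MAX)
		early_pfns[early_pfn_count++] = pfn;
}

/*
 * Non-NULL only while early records are still wanted. A kernel version
 * may need READ_ONCE()/WRITE_ONCE() if other CPUs can allocate while
 * the pointer is being reset (an assumption about boot ordering).
 */
static void (*early_pfn_hook)(unsigned long pfn) = record_early_pfn;

/* Models the branch of __pgalloc_tag_add() taken when page_ext is missing. */
static void model_pgalloc_tag_add_fallback(unsigned long pfn)
{
	void (*hook)(unsigned long) = early_pfn_hook;

	if (hook)
		hook(pfn);
}

/* Models clear_early_alloc_pfn_tag_refs(): consume records, then drop the hook. */
static size_t model_clear_early_refs(void)
{
	size_t cleared = early_pfn_count;

	/* A real version would set each recorded page's codetag ref to empty here. */
	early_pfn_count = 0;
	early_pfn_hook = NULL;
	return cleared;
}
```

Once the hook is NULL, the fallback branch costs only a NULL check and nothing references the early array any more, which is what would allow the array and the recording function to live in init sections.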
* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization 2026-03-25 6:25 ` Suren Baghdasaryan @ 2026-03-25 7:35 ` Suren Baghdasaryan 2026-03-25 11:20 ` Hao Ge 0 siblings, 1 reply; 21+ messages in thread From: Suren Baghdasaryan @ 2026-03-25 7:35 UTC (permalink / raw) To: Hao Ge; +Cc: Andrew Morton, Kent Overstreet, linux-mm, linux-kernel On Tue, Mar 24, 2026 at 11:25 PM Suren Baghdasaryan <surenb@google.com> wrote: > > On Tue, Mar 24, 2026 at 7:08 PM Hao Ge <hao.ge@linux.dev> wrote: > > > > > > On 2026/3/25 08:21, Suren Baghdasaryan wrote: > > > On Tue, Mar 24, 2026 at 2:43 AM Hao Ge <hao.ge@linux.dev> wrote: > > >> > > >> On 2026/3/24 06:47, Suren Baghdasaryan wrote: > > >>> On Mon, Mar 23, 2026 at 2:16 AM Hao Ge <hao.ge@linux.dev> wrote: > > >>>> On 2026/3/20 10:14, Suren Baghdasaryan wrote: > > >>>>> On Thu, Mar 19, 2026 at 6:58 PM Hao Ge <hao.ge@linux.dev> wrote: > > >>>>>> On 2026/3/20 07:48, Suren Baghdasaryan wrote: > > >>>>>>> On Thu, Mar 19, 2026 at 4:44 PM Suren Baghdasaryan <surenb@google.com> wrote: > > >>>>>>>> On Thu, Mar 19, 2026 at 3:28 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > >>>>>>>>> On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote: > > >>>>>>>>> > > >>>>>>>>>> Due to initialization ordering, page_ext is allocated and initialized > > >>>>>>>>>> relatively late during boot. Some pages have already been allocated > > >>>>>>>>>> and freed before page_ext becomes available, leaving their codetag > > >>>>>>>>>> uninitialized. > > >>>>>>>> Hi Hao, > > >>>>>>>> Thanks for the report. > > >>>>>>>> Hmm. So, we are allocating pages before page_ext is initialized... > > >>>>>>>> > > >>>>>>>>>> A clear example is in init_section_page_ext(): alloc_page_ext() calls > > >>>>>>>>>> kmemleak_alloc(). > > >>>>>>> Forgot to ask. The example you are using here is for page_ext > > >>>>>>> allocation itself. 
Do you have any other examples where page > > >>>>>>> allocation happens before page_ext initialization? If that's the only > > >>>>>>> place, then we might be able to fix this in a simpler way by doing > > >>>>>>> something special for alloc_page_ext(). > > >>>>>> Hi Suren > > >>>>>> > > >>>>>> To help illustrate the point, here's the debug log I added: > > >>>>>> > > >>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > >>>>>> index 2d4b6f1a554e..ebfe636f5b07 100644 > > >>>>>> --- a/mm/page_alloc.c > > >>>>>> +++ b/mm/page_alloc.c > > >>>>>> @@ -1293,6 +1293,9 @@ void __pgalloc_tag_add(struct page *page, struct > > >>>>>> task_struct *task, > > >>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > > >>>>>> update_page_tag_ref(handle, &ref); > > >>>>>> put_page_tag_ref(handle); > > >>>>>> + } else { > > >>>>>> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! > > >>>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > > >>>>>> + dump_stack(); > > >>>>>> } > > >>>>>> } > > >>>>>> > > >>>>>> > > >>>>>> And I caught the following logs: > > >>>>>> > > >>>>>> [ 0.296399] __pgalloc_tag_add: get_page_tag_ref failed! > > >>>>>> page=ffffea000400c700 pfn=1049372 nr=1 > > >>>>>> [ 0.296400] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > > >>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > > >>>>>> [ 0.296402] Hardware name: Red Hat KVM, BIOS > > >>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > > >>>>>> [ 0.296402] Call Trace: > > >>>>>> [ 0.296403] <TASK> > > >>>>>> [ 0.296403] dump_stack_lvl+0x53/0x70 > > >>>>>> [ 0.296405] __pgalloc_tag_add+0x3a3/0x6e0 > > >>>>>> [ 0.296406] ? __pfx___pgalloc_tag_add+0x10/0x10 > > >>>>>> [ 0.296407] ? kasan_unpoison+0x27/0x60 > > >>>>>> [ 0.296409] ? __kasan_unpoison_pages+0x2c/0x40 > > >>>>>> [ 0.296411] get_page_from_freelist+0xa54/0x1310 > > >>>>>> [ 0.296413] __alloc_frozen_pages_noprof+0x206/0x4c0 > > >>>>>> [ 0.296415] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > > >>>>>> [ 0.296417] ? 
stack_depot_save_flags+0x3f/0x680 > > >>>>>> [ 0.296418] ? ___slab_alloc+0x518/0x530 > > >>>>>> [ 0.296420] alloc_pages_mpol+0x13a/0x3f0 > > >>>>>> [ 0.296421] ? __pfx_alloc_pages_mpol+0x10/0x10 > > >>>>>> [ 0.296423] ? _raw_spin_lock_irqsave+0x8a/0xf0 > > >>>>>> [ 0.296424] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 > > >>>>>> [ 0.296426] alloc_slab_page+0xc2/0x130 > > >>>>>> [ 0.296427] allocate_slab+0x77/0x2c0 > > >>>>>> [ 0.296429] ? syscall_enter_define_fields+0x3bb/0x5f0 > > >>>>>> [ 0.296430] ___slab_alloc+0x125/0x530 > > >>>>>> [ 0.296432] ? __trace_define_field+0x252/0x3d0 > > >>>>>> [ 0.296433] __kmalloc_noprof+0x329/0x630 > > >>>>>> [ 0.296435] ? syscall_enter_define_fields+0x3bb/0x5f0 > > >>>>>> [ 0.296436] syscall_enter_define_fields+0x3bb/0x5f0 > > >>>>>> [ 0.296438] ? __pfx_syscall_enter_define_fields+0x10/0x10 > > >>>>>> [ 0.296440] event_define_fields+0x326/0x540 > > >>>>>> [ 0.296441] __trace_early_add_events+0xac/0x3c0 > > >>>>>> [ 0.296443] trace_event_init+0x24c/0x460 > > >>>>>> [ 0.296445] trace_init+0x9/0x20 > > >>>>>> [ 0.296446] start_kernel+0x199/0x3c0 > > >>>>>> [ 0.296448] x86_64_start_reservations+0x18/0x30 > > >>>>>> [ 0.296449] x86_64_start_kernel+0xe2/0xf0 > > >>>>>> [ 0.296451] common_startup_64+0x13e/0x141 > > >>>>>> [ 0.296453] </TASK> > > >>>>>> > > >>>>>> > > >>>>>> [ 0.312234] __pgalloc_tag_add: get_page_tag_ref failed! > > >>>>>> page=ffffea000400f900 pfn=1049572 nr=1 > > >>>>>> [ 0.312234] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > > >>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > > >>>>>> [ 0.312236] Hardware name: Red Hat KVM, BIOS > > >>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > > >>>>>> [ 0.312236] Call Trace: > > >>>>>> [ 0.312237] <TASK> > > >>>>>> [ 0.312237] dump_stack_lvl+0x53/0x70 > > >>>>>> [ 0.312239] __pgalloc_tag_add+0x3a3/0x6e0 > > >>>>>> [ 0.312240] ? __pfx___pgalloc_tag_add+0x10/0x10 > > >>>>>> [ 0.312241] ? rmqueue.constprop.0+0x4fc/0x1ce0 > > >>>>>> [ 0.312243] ? 
kasan_unpoison+0x27/0x60 > > >>>>>> [ 0.312244] ? __kasan_unpoison_pages+0x2c/0x40 > > >>>>>> [ 0.312246] get_page_from_freelist+0xa54/0x1310 > > >>>>>> [ 0.312248] __alloc_frozen_pages_noprof+0x206/0x4c0 > > >>>>>> [ 0.312250] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > > >>>>>> [ 0.312253] alloc_slab_page+0x39/0x130 > > >>>>>> [ 0.312254] allocate_slab+0x77/0x2c0 > > >>>>>> [ 0.312255] ? alloc_cpumask_var_node+0xc7/0x230 > > >>>>>> [ 0.312257] ___slab_alloc+0x46d/0x530 > > >>>>>> [ 0.312259] __kmalloc_node_noprof+0x2fa/0x680 > > >>>>>> [ 0.312261] ? alloc_cpumask_var_node+0xc7/0x230 > > >>>>>> [ 0.312263] alloc_cpumask_var_node+0xc7/0x230 > > >>>>>> [ 0.312264] init_desc+0x141/0x6b0 > > >>>>>> [ 0.312266] alloc_desc+0x108/0x1b0 > > >>>>>> [ 0.312267] early_irq_init+0xee/0x1c0 > > >>>>>> [ 0.312268] ? __pfx_early_irq_init+0x10/0x10 > > >>>>>> [ 0.312271] start_kernel+0x1ab/0x3c0 > > >>>>>> [ 0.312272] x86_64_start_reservations+0x18/0x30 > > >>>>>> [ 0.312274] x86_64_start_kernel+0xe2/0xf0 > > >>>>>> [ 0.312275] common_startup_64+0x13e/0x141 > > >>>>>> [ 0.312277] </TASK> > > >>>>>> > > >>>>>> [ 0.312834] __pgalloc_tag_add: get_page_tag_ref failed! > > >>>>>> page=ffffea000400fc00 pfn=1049584 nr=1 > > >>>>>> [ 0.312835] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > > >>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > > >>>>>> [ 0.312836] Hardware name: Red Hat KVM, BIOS > > >>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > > >>>>>> [ 0.312837] Call Trace: > > >>>>>> [ 0.312837] <TASK> > > >>>>>> [ 0.312838] dump_stack_lvl+0x53/0x70 > > >>>>>> [ 0.312840] __pgalloc_tag_add+0x3a3/0x6e0 > > >>>>>> [ 0.312841] ? __pfx___pgalloc_tag_add+0x10/0x10 > > >>>>>> [ 0.312842] ? rmqueue.constprop.0+0x4fc/0x1ce0 > > >>>>>> [ 0.312844] ? kasan_unpoison+0x27/0x60 > > >>>>>> [ 0.312845] ? __kasan_unpoison_pages+0x2c/0x40 > > >>>>>> [ 0.312847] get_page_from_freelist+0xa54/0x1310 > > >>>>>> [ 0.312849] __alloc_frozen_pages_noprof+0x206/0x4c0 > > >>>>>> [ 0.312851] ? 
__pfx___alloc_frozen_pages_noprof+0x10/0x10 > > >>>>>> [ 0.312853] alloc_pages_mpol+0x13a/0x3f0 > > >>>>>> [ 0.312855] ? __pfx_alloc_pages_mpol+0x10/0x10 > > >>>>>> [ 0.312856] ? xas_find+0x2d8/0x450 > > >>>>>> [ 0.312858] ? _raw_spin_lock+0x84/0xe0 > > >>>>>> [ 0.312859] ? __pfx__raw_spin_lock+0x10/0x10 > > >>>>>> [ 0.312861] alloc_pages_noprof+0xf6/0x2b0 > > >>>>>> [ 0.312862] __change_page_attr+0x293/0x850 > > >>>>>> [ 0.312864] ? __pfx___change_page_attr+0x10/0x10 > > >>>>>> [ 0.312865] ? _vm_unmap_aliases+0x2d0/0x650 > > >>>>>> [ 0.312868] ? __pfx__vm_unmap_aliases+0x10/0x10 > > >>>>>> [ 0.312869] __change_page_attr_set_clr+0x16c/0x360 > > >>>>>> [ 0.312871] ? spp_getpage+0xbb/0x1e0 > > >>>>>> [ 0.312872] change_page_attr_set_clr+0x220/0x3c0 > > >>>>>> [ 0.312873] ? flush_tlb_one_kernel+0xf/0x30 > > >>>>>> [ 0.312875] ? set_pte_vaddr_p4d+0x110/0x180 > > >>>>>> [ 0.312877] ? __pfx_change_page_attr_set_clr+0x10/0x10 > > >>>>>> [ 0.312878] ? __pfx_set_pte_vaddr_p4d+0x10/0x10 > > >>>>>> [ 0.312881] ? __pfx_mtree_load+0x10/0x10 > > >>>>>> [ 0.312883] ? __pfx_mtree_load+0x10/0x10 > > >>>>>> [ 0.312884] ? __asan_memcpy+0x3c/0x60 > > >>>>>> [ 0.312886] ? set_intr_gate+0x10c/0x150 > > >>>>>> [ 0.312888] set_memory_ro+0x76/0xa0 > > >>>>>> [ 0.312889] ? __pfx_set_memory_ro+0x10/0x10 > > >>>>>> [ 0.312891] idt_setup_apic_and_irq_gates+0x2c1/0x390 > > >>>>>> > > >>>>>> and more. > > >>>>> Ok, it's not the only place. Got your point. > > >>>>> > > >>>>>> off topic - if we were to handle only alloc_page_ext() specifically, > > >>>>>> what would be the most straightforward > > >>>>>> > > >>>>>> solution in your mind? I'd really appreciate your insight. > > >>>>> I was thinking if it's the only special case maybe we can handle it > > >>>>> somehow differently, like we do when we allocate obj_ext vectors for > > >>>>> slabs using __GFP_NO_OBJ_EXT. 
I haven't found a good solution yet but > > >>>>> since it's not a special case we would not be able to use it even if I > > >>>>> came up with something... > > >>>>> I think your way is the most straight-forward but please try my > > >>>>> suggestion to see if we can avoid extra overhead. > > >>>>> Thanks, > > >>>>> Suren. > > Hi Suren > > >> Hi Suren > > >> > > >> > > >>> Hi Hao, > > >>> > > >>>> Hi Suren > > >>>> > > >>>> Thank you for your feedback. After re-examining this issue, > > >>>> > > >>>> I realize my previous focus was misplaced. > > >>>> > > >>>> Upon deeper consideration, I understand that this is not merely a bug, > > >>>> > > >>>> but rather a warning that indicates a gap in our memory profiling mechanism. > > >>>> > > >>>> Specifically, the current implementation appears to be missing memory > > >>>> allocation > > >>>> > > >>>> tracking during the period between the buddy system allocation and page_ext > > >>>> > > >>>> initialization. > > >>>> > > >>>> This profiling gap means we may not be capturing all relevant memory > > >>>> allocation > > >>>> > > >>>> events during this critical transition phase. > > >>> Correct, this limitation exists because memory profiling relies on > > >>> some kernel facilities (page_ext, objj_ext) which might not be > > >>> initialized yet at the time of allocation. > > >>> > > >>>> My approach is to dynamically allocate codetag_ref when get_page_tag_ref > > >>>> fails, > > >>>> > > >>>> and maintain a linked list to track all buddy system allocations that > > >>>> occur prior to page_ext initialization. > > >>>> > > >>>> However, this introduces performance concerns: > > >>>> > > >>>> 1. Free Path Overhead: When freeing these pages, we would need to > > >>>> traverse the entire linked list to locate > > >>>> > > >>>> the corresponding codetag_ref, resulting in O(n) lookup complexity > > >>>> per free operation. > > >>>> > > >>>> 2. 
Initialization Overhead: During init_page_alloc_tagging, iterating > > >>>> through the linked list to assign codetag_ref to > > >>>> > > >>>> page_ext would introduce additional traversal cost. > > >>>> > > >>>> If the number of pages is substantial, this could incur significant > > >>>> overhead. What are your thoughts on this? I look forward to your > > >>>> suggestions. > > >>> My thinking is that these early allocations comprise a small portion > > >>> of overall memory consumed by the system. So, instead of trying to > > >>> record and handle them in some alternative way, we just accept that > > >>> some counters might not be exactly accurate and ignore those early > > >>> allocations. See how the early slab allocations are marked with the > > >>> CODETAG_FLAG_INACCURATE flag and later reported as inaccurate. I think > > >>> that's an acceptable alternative to introducing extra complexity and > > >>> performance overhead. IOW, the benefits of accounting for these early > > >>> allocations are low compared to the effort required to account for > > >>> them. Unless you found a simple and performant way to do that... > > >> > > >> I have been exploring possible solutions to this issue over the past few > > >> days, > > >> > > >> but so far I have not come up with a good approach. > > >> > > >> I have counted the number of memory allocations that occur earlier than the > > >> > > >> allocation and initialization of our page_ext, and found that there are > > >> actually > > >> > > >> quite a lot of them. > > > Interesting... I wonder it's because deferred_struct_pages defers > > > page_ext initialization. Can you check if setting early_page_ext > > > reduces or eliminates these allocations before page_ext init cases? > > > > Yes, you are correct. In my 8-core 16GB virtual machine, I used a global > > counter > > > > to record these allocations. With early_page_ext enabled, there were 130 > > allocations > > > > before page_ext initialization. 
Without early_page_ext, there were 802 > > allocations > > > > before page_ext initialization. > > > > > > > > > >> Similarly, I have made the following changes and collected the > > >> corresponding logs. > > >> > > >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > >> index 2d4b6f1a554e..6db65b3d52d3 100644 > > >> --- a/mm/page_alloc.c > > >> +++ b/mm/page_alloc.c > > >> @@ -1293,6 +1293,8 @@ void __pgalloc_tag_add(struct page *page, struct > > >> task_struct *task, > > >> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > > >> update_page_tag_ref(handle, &ref); > > >> put_page_tag_ref(handle); > > >> + } else{ > > >> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! > > >> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > > >> } > > >> } > > >> > > >> @@ -1314,6 +1316,8 @@ void __pgalloc_tag_sub(struct page *page, unsigned > > >> int nr) > > >> alloc_tag_sub(&ref, PAGE_SIZE * nr); > > >> update_page_tag_ref(handle, &ref); > > >> put_page_tag_ref(handle); > > >> + } else{ > > >> + pr_warn("__pgalloc_tag_sub: get_page_tag_ref failed! > > >> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > > >> } > > >> } > > >> > > >> [ 0.261699] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001000 pfn=1048640 nr=2 > > >> [ 0.261711] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001100 pfn=1048644 nr=4 > > >> [ 0.261717] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001200 pfn=1048648 nr=4 > > >> [ 0.261721] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001300 pfn=1048652 nr=4 > > >> [ 0.261893] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001080 pfn=1048642 nr=2 > > >> [ 0.261917] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001400 pfn=1048656 nr=4 > > >> [ 0.262018] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001500 pfn=1048660 nr=2 > > >> [ 0.262024] __pgalloc_tag_add: get_page_tag_ref failed! 
> > >> page=ffffea0004001600 pfn=1048664 nr=8 > > >> [ 0.262040] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001580 pfn=1048662 nr=1 > > >> [ 0.262048] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea00040015c0 pfn=1048663 nr=1 > > >> [ 0.262056] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001800 pfn=1048672 nr=2 > > >> [ 0.262064] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001880 pfn=1048674 nr=2 > > >> [ 0.262078] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001900 pfn=1048676 nr=2 > > >> [ 0.262196] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 > > >> [ 0.262213] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001980 pfn=1048678 nr=2 > > >> [ 0.262220] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001a00 pfn=1048680 nr=4 > > >> [ 0.262246] ODEBUG: selftest passed > > >> [ 0.262268] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001b00 pfn=1048684 nr=1 > > >> [ 0.262318] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001b40 pfn=1048685 nr=1 > > >> [ 0.262368] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001b80 pfn=1048686 nr=1 > > >> [ 0.262418] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001bc0 pfn=1048687 nr=1 > > >> [ 0.262469] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001c00 pfn=1048688 nr=1 > > >> [ 0.262519] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001c40 pfn=1048689 nr=1 > > >> [ 0.262569] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001c80 pfn=1048690 nr=1 > > >> [ 0.262620] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001cc0 pfn=1048691 nr=1 > > >> [ 0.262670] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001d00 pfn=1048692 nr=1 > > >> [ 0.262721] __pgalloc_tag_add: get_page_tag_ref failed! 
> > >> page=ffffea0004001d40 pfn=1048693 nr=1 > > >> [ 0.262771] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001d80 pfn=1048694 nr=1 > > >> [ 0.262821] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001dc0 pfn=1048695 nr=1 > > >> [ 0.262871] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001e00 pfn=1048696 nr=1 > > >> [ 0.262923] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001e40 pfn=1048697 nr=1 > > >> [ 0.262974] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001e80 pfn=1048698 nr=1 > > >> [ 0.263024] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001ec0 pfn=1048699 nr=1 > > >> [ 0.263074] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001f00 pfn=1048700 nr=1 > > >> [ 0.263124] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001f40 pfn=1048701 nr=1 > > >> [ 0.263174] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001f80 pfn=1048702 nr=1 > > >> [ 0.263224] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004001fc0 pfn=1048703 nr=1 > > >> [ 0.263275] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004002000 pfn=1048704 nr=1 > > >> [ 0.263325] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004002040 pfn=1048705 nr=1 > > >> [ 0.263375] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004002080 pfn=1048706 nr=1 > > >> [ 0.263427] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004002400 pfn=1048720 nr=16 > > >> [ 0.263437] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea00040020c0 pfn=1048707 nr=1 > > >> [ 0.263463] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004002100 pfn=1048708 nr=1 > > >> [ 0.263465] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004002140 pfn=1048709 nr=1 > > >> [ 0.263467] __pgalloc_tag_add: get_page_tag_ref failed! 
> > >> page=ffffea0004002180 pfn=1048710 nr=1 > > >> [ 0.263509] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004002200 pfn=1048712 nr=4 > > >> [ 0.263512] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004002800 pfn=1048736 nr=8 > > >> [ 0.263524] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea00040021c0 pfn=1048711 nr=1 > > >> [ 0.263536] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004002300 pfn=1048716 nr=1 > > >> [ 0.263537] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004002340 pfn=1048717 nr=1 > > >> [ 0.263539] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004002380 pfn=1048718 nr=1 > > >> [ 0.263604] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004004000 pfn=1048832 nr=128 > > >> [ 0.263638] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004003000 pfn=1048768 nr=64 > > >> [ 0.263650] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004002c00 pfn=1048752 nr=16 > > >> [ 0.263655] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea00040023c0 pfn=1048719 nr=1 > > >> [ 0.270582] __pgalloc_tag_sub: get_page_tag_ref failed! > > >> page=ffffea00040023c0 pfn=1048719 nr=1 > > >> [ 0.270591] ftrace: allocating 52717 entries in 208 pages > > >> [ 0.270592] ftrace: allocated 208 pages with 3 groups > > >> [ 0.270620] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004002a00 pfn=1048744 nr=8 > > >> [ 0.270636] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea00040023c0 pfn=1048719 nr=1 > > >> [ 0.270643] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006000 pfn=1048960 nr=1 > > >> [ 0.270649] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006040 pfn=1048961 nr=1 > > >> [ 0.270658] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004007000 pfn=1049024 nr=64 > > >> [ 0.270659] __pgalloc_tag_add: get_page_tag_ref failed! 
> > >> page=ffffea0004006080 pfn=1048962 nr=2 > > >> [ 0.270722] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006100 pfn=1048964 nr=1 > > >> [ 0.270730] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006140 pfn=1048965 nr=1 > > >> [ 0.270738] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006180 pfn=1048966 nr=1 > > >> [ 0.270777] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea00040061c0 pfn=1048967 nr=1 > > >> [ 0.270786] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006200 pfn=1048968 nr=1 > > >> [ 0.270792] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006240 pfn=1048969 nr=1 > > >> [ 0.270833] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006300 pfn=1048972 nr=4 > > >> [ 0.270891] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006280 pfn=1048970 nr=1 > > >> [ 0.270980] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea00040062c0 pfn=1048971 nr=1 > > >> [ 0.271071] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006400 pfn=1048976 nr=1 > > >> [ 0.271156] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006440 pfn=1048977 nr=1 > > >> [ 0.271185] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006480 pfn=1048978 nr=2 > > >> [ 0.271301] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006500 pfn=1048980 nr=1 > > >> [ 0.271655] Dynamic Preempt: lazy > > >> [ 0.271662] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006580 pfn=1048982 nr=2 > > >> [ 0.271752] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006600 pfn=1048984 nr=4 > > >> [ 0.271762] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004010000 pfn=1049600 nr=4 > > >> [ 0.271824] __pgalloc_tag_add: get_page_tag_ref failed! 
> > >> page=ffffea0004006540 pfn=1048981 nr=1 > > >> [ 0.271916] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006700 pfn=1048988 nr=2 > > >> [ 0.271964] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006780 pfn=1048990 nr=1 > > >> [ 0.272099] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea00040067c0 pfn=1048991 nr=1 > > >> [ 0.272138] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006800 pfn=1048992 nr=2 > > >> [ 0.272144] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006a00 pfn=1049000 nr=8 > > >> [ 0.272249] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006c00 pfn=1049008 nr=8 > > >> [ 0.272319] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006880 pfn=1048994 nr=2 > > >> [ 0.272351] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006900 pfn=1048996 nr=4 > > >> [ 0.272424] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004006e00 pfn=1049016 nr=8 > > >> [ 0.272485] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004008000 pfn=1049088 nr=8 > > >> [ 0.272535] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004008200 pfn=1049096 nr=2 > > >> [ 0.272600] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004008400 pfn=1049104 nr=8 > > >> [ 0.272663] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004008300 pfn=1049100 nr=4 > > >> [ 0.272694] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004008280 pfn=1049098 nr=2 > > >> [ 0.272708] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004008600 pfn=1049112 nr=8 > > >> > > >> [ 0.272924] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004008880 pfn=1049122 nr=2 > > >> [ 0.272934] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004008900 pfn=1049124 nr=2 > > >> [ 0.272952] __pgalloc_tag_add: get_page_tag_ref failed! 
> > >> page=ffffea0004008c00 pfn=1049136 nr=4 > > >> [ 0.273035] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004008980 pfn=1049126 nr=2 > > >> [ 0.273062] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004008e00 pfn=1049144 nr=8 > > >> [ 0.273674] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004008d00 pfn=1049140 nr=1 > > >> [ 0.273884] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004008d80 pfn=1049142 nr=2 > > >> [ 0.273943] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009000 pfn=1049152 nr=2 > > >> [ 0.274379] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009080 pfn=1049154 nr=2 > > >> [ 0.274575] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009200 pfn=1049160 nr=8 > > >> [ 0.274617] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009100 pfn=1049156 nr=4 > > >> [ 0.274794] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009400 pfn=1049168 nr=2 > > >> [ 0.274840] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009480 pfn=1049170 nr=2 > > >> [ 0.275057] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009500 pfn=1049172 nr=2 > > >> [ 0.275092] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009580 pfn=1049174 nr=2 > > >> [ 0.275134] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009600 pfn=1049176 nr=8 > > >> [ 0.275211] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009800 pfn=1049184 nr=4 > > >> [ 0.275510] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009900 pfn=1049188 nr=2 > > >> [ 0.275548] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009980 pfn=1049190 nr=2 > > >> [ 0.275976] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009a00 pfn=1049192 nr=8 > > >> [ 0.275987] __pgalloc_tag_add: get_page_tag_ref failed! 
> > >> page=ffffea0004009c00 pfn=1049200 nr=2 > > >> [ 0.276139] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009c80 pfn=1049202 nr=2 > > >> [ 0.276152] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004008d40 pfn=1049141 nr=1 > > >> [ 0.276242] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009d00 pfn=1049204 nr=1 > > >> [ 0.276358] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009d40 pfn=1049205 nr=1 > > >> [ 0.276444] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009d80 pfn=1049206 nr=1 > > >> [ 0.276526] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009dc0 pfn=1049207 nr=1 > > >> [ 0.276615] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009e00 pfn=1049208 nr=1 > > >> [ 0.276696] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009e40 pfn=1049209 nr=1 > > >> [ 0.276792] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009e80 pfn=1049210 nr=1 > > >> [ 0.276827] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009f00 pfn=1049212 nr=2 > > >> [ 0.276891] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009ec0 pfn=1049211 nr=1 > > >> [ 0.276999] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009f80 pfn=1049214 nr=1 > > >> [ 0.277082] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea0004009fc0 pfn=1049215 nr=1 > > >> [ 0.277172] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea000400a000 pfn=1049216 nr=1 > > >> [ 0.277257] __pgalloc_tag_add: get_page_tag_ref failed! > > >> page=ffffea000400a040 pfn=1049217 nr=1 > > >> > > >> and so on. > > >> > > >> > > >>> I think your earlier patch can effectively detect these early > > >>> allocations and suppress the warnings. We should also mark these > > >>> allocations with CODETAG_FLAG_INACCURATE. 
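Suren's suggestion can be made concrete with a minimal user-space model. This is not kernel code. In the model, an early allocation cannot be charged, so it marks its call site as inaccurate. Once page_ext exists, the page's reference is set to an "empty" sentinel, so the debug check on free stays quiet. `CODETAG_EMPTY` and `CODETAG_FLAG_INACCURATE` mirror names used in this thread. The struct layouts, the `model_*` functions and the `warnings` counter are simplified assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space model; not the kernel's real definitions. */
#define CODETAG_EMPTY           ((struct alloc_tag *)1)
#define CODETAG_FLAG_INACCURATE (1u << 0)   /* modelled flag bit */

struct alloc_tag {
	long bytes;          /* bytes currently charged to this call site */
	unsigned int flags;  /* gets CODETAG_FLAG_INACCURATE if an alloc was missed */
};

/* Per-page reference; in the kernel this lives in page_ext. */
union codetag_ref {
	struct alloc_tag *ct;
};

static int warnings;     /* counts modelled "alloc_tag was not set" events */

/* Allocation hook. ref == NULL models get_page_tag_ref() failing
 * because page_ext is not initialized yet. */
static void model_tag_add(union codetag_ref *ref, struct alloc_tag *tag, long bytes)
{
	if (!ref) {
		/* Cannot charge the page: flag the call site instead. */
		tag->flags |= CODETAG_FLAG_INACCURATE;
		return;
	}
	ref->ct = tag;
	tag->bytes += bytes;
}

/* Free hook, modelled on the debug check that produced the warning. */
static void model_tag_sub(union codetag_ref *ref, long bytes)
{
	if (!ref->ct) {
		warnings++;          /* kernel: WARN "alloc_tag was not set" */
		return;
	}
	if (ref->ct == CODETAG_EMPTY) {
		ref->ct = NULL;      /* early page: nothing to subtract */
		return;
	}
	ref->ct->bytes -= bytes;
	ref->ct = NULL;
}

/* Run once page_ext exists, for every page known to be an early allocation.
 * The ref is the page's (zero-initialized) page_ext entry. */
static void model_clear_early_ref(union codetag_ref *ref)
{
	assert(ref->ct == NULL); /* early pages were never charged */
	ref->ct = CODETAG_EMPTY;
}
```

The last case in the test below shows what the thread is trying to avoid: an early page that was never marked empty still triggers the warning when it is freed.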
> > >> Thanks to an excellent AI review, I realized there are issues with > > >> > > >> my original patch. One problem is the 256-element array; another > > > Yes, if there are lots of such allocations, it's not appropriate. > > > > > >> is that it involves allocation and free operations, meaning we need > > >> > > >> to record entries at __pgalloc_tag_add and remove them at __pgalloc_tag_sub, > > >> > > >> which introduces a noticeable overhead. I'm wondering if we can instead > > >> set a flag > > >> > > >> bit in page flags during the early boot stage, which I'll refer to as > > >> EARLY_ALLOC_FLAGS. > > >> > > >> Then, in __pgalloc_tag_sub, we first check for EARLY_ALLOC_FLAGS. If > > >> set, we clear the > > >> > > >> flag and return immediately; otherwise, we perform the actual > > >> subtraction of the tag count. > > >> > > >> This approach seems somewhat similar to the idea behind > > >> mem_profiling_compressed. > > > That seems doable but let's first check if we can make page_ext > > > initialization happen before these allocations. That would be the > > > ideal path. If it's not possible then we can focus on alternatives > > > like the one you propose. > > > > > > Yes, the ideal scenario would be to have page_ext initialization > > complete before > > > > these allocations occur. I just did a code walkthrough and found that > > this resembles > > > > the FLATMEM implementation approach - FLATMEM allocates page_ext before > > the buddy > > > > system initialization, so it doesn't seem to encounter the issue we're > > facing now. > > > > https://elixir.bootlin.com/linux/v7.0-rc5/source/mm/mm_init.c#L2707 > > Yes, page_ext_init_flatmem() looks like an interesting option and it > would not work with sparsemem. TBH I would prefer to find a simple > solution that can identify early init allocations, mark them inaccurate > and suppress the warning rather than introduce some complex mechanism > to account for them which would work only in some cases (flatmem).
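Below is a rough user-space sketch of the EARLY_ALLOC_FLAGS idea. `PG_EARLY_ALLOC`, `struct model_page` and the `model_*` hooks are hypothetical stand-ins, not real page flags or kernel functions. The sketch shows the trade-off raised in the reply: the flag test in the free hook is cheap, but it runs on every free, not only during boot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit; EARLY_ALLOC_FLAGS is only a name proposed in the thread. */
#define PG_EARLY_ALLOC (1ul << 0)

struct model_page {
	unsigned long flags;
	bool has_tag_ref;      /* models a live codetag reference in page_ext */
};

static bool page_ext_ready;    /* false until page_ext is initialized */
static long charged;           /* bytes charged to the single modelled tag */

static void model_pgalloc_tag_add(struct model_page *page, long bytes)
{
	if (!page_ext_ready) {
		page->flags |= PG_EARLY_ALLOC;   /* remember: never charged */
		return;
	}
	page->has_tag_ref = true;
	charged += bytes;
}

static void model_pgalloc_tag_sub(struct model_page *page, long bytes)
{
	/* This extra test is executed on every free, even long after boot. */
	if (page->flags & PG_EARLY_ALLOC) {
		page->flags &= ~PG_EARLY_ALLOC;
		return;
	}
	assert(page->has_tag_ref);   /* stands in for the debug warning */
	page->has_tag_ref = false;
	charged -= bytes;
}
```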
> With your original approach I think the only real issue is the size of > the array that might be too small. The other issue you mentioned about > allocated page being freed and then re-allocated after page_ext is > initialized but before clear_page_tag_ref() is called is not really a > problem. Yes, we will lose that counter's value but it's similar to > other early allocations which we just treat as inaccurate. We can also > minimize the possibility of this happening by moving > clear_page_tag_ref() into init_page_alloc_tagging(). > > I don't like the pageflag option you mentioned because it adds an > extra condition check into __pgalloc_tag_sub() which will be executed > even after the init stage is over. > I'll look into this some more tomorrow as it's quite late now. Just thought of something. Are all these pages allocated by slab? If so, I think slab does not use page->lru (need to double-check) and we could add all these pages allocated during early init into a list and then set their page_ext reference to CODETAG_EMPTY in init_page_alloc_tagging(). > Thanks, > Suren. > > > > > However, I'm not entirely certain whether SPARSEMEM can guarantee the > > same behavior. > > > > > > > > > >> I would appreciate your valuable feedback and any better suggestions you > > >> might have. > > > Thanks for pursuing this! I'll help in any way I can. > > > Suren. > > > > Thank you so much for your patient guidance and assistance. > > > > I truly appreciate your willingness to share your knowledge and insights. > > > > Thanks, > > Hao > > > > >> Thanks > > >> > > >> Hao > > >> > > >>> Thanks, > > >>> Suren. > > >>> > > >>>> Thanks > > >>>> > > >>>> Hao > > >>>> > > >>>>>> Thanks. > > >>>>>> > > >>>>>> > > >>>>>>>>>> If the slab cache has no free objects, it falls back > > >>>>>>>>>> to the buddy allocator to allocate memory. However, at this point page_ext > > >>>>>>>>>> is not yet fully initialized, so these newly allocated pages have no > > >>>>>>>>>> codetag set.
These pages may later be reclaimed by KASAN,which causes > > >>>>>>>>>> the warning to trigger when they are freed because their codetag ref is > > >>>>>>>>>> still empty. > > >>>>>>>>>> > > >>>>>>>>>> Use a global array to track pages allocated before page_ext is fully > > >>>>>>>>>> initialized, similar to how kmemleak tracks early allocations. > > >>>>>>>>>> When page_ext initialization completes, set their codetag > > >>>>>>>>>> to empty to avoid warnings when they are freed later. > > >>>>>>>>>> > > >>>>>>>>>> ... > > >>>>>>>>>> > > >>>>>>>>>> --- a/include/linux/alloc_tag.h > > >>>>>>>>>> +++ b/include/linux/alloc_tag.h > > >>>>>>>>>> @@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref) > > >>>>>>>>>> > > >>>>>>>>>> #ifdef CONFIG_MEM_ALLOC_PROFILING > > >>>>>>>>>> > > >>>>>>>>>> +bool mem_profiling_is_available(void); > > >>>>>>>>>> +void alloc_tag_add_early_pfn(unsigned long pfn); > > >>>>>>>>>> + > > >>>>>>>>>> #define ALLOC_TAG_SECTION_NAME "alloc_tags" > > >>>>>>>>>> > > >>>>>>>>>> struct codetag_bytes { > > >>>>>>>>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c > > >>>>>>>>>> index 58991ab09d84..a5bf4e72c154 100644 > > >>>>>>>>>> --- a/lib/alloc_tag.c > > >>>>>>>>>> +++ b/lib/alloc_tag.c > > >>>>>>>>>> @@ -6,6 +6,7 @@ > > >>>>>>>>>> #include <linux/kallsyms.h> > > >>>>>>>>>> #include <linux/module.h> > > >>>>>>>>>> #include <linux/page_ext.h> > > >>>>>>>>>> +#include <linux/pgalloc_tag.h> > > >>>>>>>>>> #include <linux/proc_fs.h> > > >>>>>>>>>> #include <linux/seq_buf.h> > > >>>>>>>>>> #include <linux/seq_file.h> > > >>>>>>>>>> @@ -26,6 +27,82 @@ static bool mem_profiling_support; > > >>>>>>>>>> > > >>>>>>>>>> static struct codetag_type *alloc_tag_cttype; > > >>>>>>>>>> > > >>>>>>>>>> +/* > > >>>>>>>>>> + * State of the alloc_tag > > >>>>>>>>>> + * > > >>>>>>>>>> + * This is used to describe the states of the alloc_tag during bootup. 
> > >>>>>>>>>> + * > > >>>>>>>>>> + * When we need to allocate page_ext to store codetag, we face an > > >>>>>>>>>> + * initialization timing problem: > > >>>>>>>>>> + * > > >>>>>>>>>> + * Due to initialization order, pages may be allocated via buddy system > > >>>>>>>>>> + * before page_ext is fully allocated and initialized. Although these > > >>>>>>>>>> + * pages call the allocation hooks, the codetag will not be set because > > >>>>>>>>>> + * page_ext is not yet available. > > >>>>>>>>>> + * > > >>>>>>>>>> + * When these pages are later free to the buddy system, it triggers > > >>>>>>>>>> + * warnings because their codetag is actually empty if > > >>>>>>>>>> + * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled. > > >>>>>>>>>> + * > > >>>>>>>>>> + * Additionally, in this situation, we cannot record detailed allocation > > >>>>>>>>>> + * information for these pages. > > >>>>>>>>>> + */ > > >>>>>>>>>> +enum mem_profiling_state { > > >>>>>>>>>> + DOWN, /* No mem_profiling functionality yet */ > > >>>>>>>>>> + UP /* Everything is working */ > > >>>>>>>>>> +}; > > >>>>>>>>>> + > > >>>>>>>>>> +static enum mem_profiling_state mem_profiling_state = DOWN; > > >>>>>>>>>> + > > >>>>>>>>>> +bool mem_profiling_is_available(void) > > >>>>>>>>>> +{ > > >>>>>>>>>> + return mem_profiling_state == UP; > > >>>>>>>>>> +} > > >>>>>>>>>> + > > >>>>>>>>>> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG > > >>>>>>>>>> + > > >>>>>>>>>> +#define EARLY_ALLOC_PFN_MAX 256 > > >>>>>>>>>> + > > >>>>>>>>>> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX]; > > >>>>>>>>> It's unfortunate that this isn't __initdata. > > >>>>>>>>> > > >>>>>>>>>> +static unsigned int early_pfn_count; > > >>>>>>>>>> +static DEFINE_SPINLOCK(early_pfn_lock); > > >>>>>>>>>> + > > >>>>>>>>>> > > >>>>>>>>>> ... 
> > >>>>>>>>>> > > >>>>>>>>>> --- a/mm/page_alloc.c > > >>>>>>>>>> +++ b/mm/page_alloc.c > > >>>>>>>>>> @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task, > > >>>>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > > >>>>>>>>>> update_page_tag_ref(handle, &ref); > > >>>>>>>>>> put_page_tag_ref(handle); > > >>>>>>>>>> + } else { > > >>>>>>>> This branch can be marked as "unlikely". > > >>>>>>>> > > >>>>>>>>>> + /* > > >>>>>>>>>> + * page_ext is not available yet, record the pfn so we can > > >>>>>>>>>> + * clear the tag ref later when page_ext is initialized. > > >>>>>>>>>> + */ > > >>>>>>>>>> + if (!mem_profiling_is_available()) > > >>>>>>>>>> + alloc_tag_add_early_pfn(page_to_pfn(page)); > > >>>>>>>>>> } > > >>>>>>>>>> } > > >>>>>>>>> All because of this, I believe. Is this fixable? > > >>>>>>>>> > > >>>>>>>>> If we take that `else', we know we're running in __init code, yes? I > > >>>>>>>>> don't see how `__init pgalloc_tag_add_early()' could be made to work. > > >>>>>>>>> hrm. Something clever, please. > > >>>>>>>> We can have a pointer to a function that is initialized to point to > > >>>>>>>> alloc_tag_add_early_pfn, which is defined as __init and uses > > >>>>>>>> early_pfns which now can be defined as __initdata. After > > >>>>>>>> clear_early_alloc_pfn_tag_refs() is done we reset that pointer to > > >>>>>>>> NULL. __pgalloc_tag_add() instead of calling alloc_tag_add_early_pfn() > > >>>>>>>> directly checks that pointer and if it's not NULL then calls the > > >>>>>>>> function that it points to. This way __pgalloc_tag_add() which is not > > >>>>>>>> an __init function will be invoking alloc_tag_add_early_pfn() __init > > >>>>>>>> function only until we are done with initialization. I haven't tried > > >>>>>>>> this but I think that should work. This also eliminates the need for > > >>>>>>>> mem_profiling_state variable since we can use this function pointer > > >>>>>>>> instead. 
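The function-pointer scheme described above can be sketched as follows. This is a user-space model built on several assumptions:

- `__init` and `__initdata` are defined empty here. In the kernel they mark code and data that are discarded after boot.
- `model_page_ext[]` and `TAG_EMPTY` stand in for page_ext and CODETAG_EMPTY.
- The `early_pfn_lock` spinlock and any memory-ordering annotations a real patch would need are omitted.

`alloc_tag_add_early_pfn`, `early_pfns` and `clear_early_alloc_pfn_tag_refs` reuse names from the thread. `early_pfn_hook` and `model_pgalloc_tag_add_fallback` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins: in the kernel these place code/data in sections
 * that are freed once boot completes. */
#define __init
#define __initdata

#define EARLY_ALLOC_PFN_MAX 256
#define MODEL_NR_PFNS       1024   /* size of the modelled page_ext table */
#define TAG_EMPTY           1      /* models CODETAG_EMPTY */

static unsigned char model_page_ext[MODEL_NR_PFNS]; /* 0 = no tag set */

static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX] __initdata;
static unsigned int early_pfn_count __initdata;

/* Only ever reached through early_pfn_hook, and only during boot. */
static void __init alloc_tag_add_early_pfn(unsigned long pfn)
{
	if (early_pfn_count < EARLY_ALLOC_PFN_MAX)
		early_pfns[early_pfn_count++] = pfn;
	/* On overflow the pfn is dropped; its later free would still warn. */
}

/* Non-NULL while boot-time recording is active. */
static void (*early_pfn_hook)(unsigned long pfn) = alloc_tag_add_early_pfn;

/* Models the `else` branch of __pgalloc_tag_add(): this function is not
 * __init; it only calls the __init function while the pointer is set. */
static void model_pgalloc_tag_add_fallback(unsigned long pfn)
{
	void (*hook)(unsigned long pfn) = early_pfn_hook;

	if (hook)
		hook(pfn);
}

/* Once page_ext is ready: mark recorded pages empty, then disarm the hook
 * so no __init code is reachable after init memory is freed. */
static void __init clear_early_alloc_pfn_tag_refs(void)
{
	for (unsigned int i = 0; i < early_pfn_count; i++) {
		assert(early_pfns[i] < MODEL_NR_PFNS);
		model_page_ext[early_pfns[i]] = TAG_EMPTY;
	}
	early_pfn_hook = NULL;
}
```

Sizing remains a concern. Hao's later count of 802 allocations before page_ext initialization (without early_page_ext) already exceeds the 256 slots. In this sketch, entries past the limit would therefore be silently dropped.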
* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization 2026-03-25 7:35 ` Suren Baghdasaryan @ 2026-03-25 11:20 ` Hao Ge 2026-03-25 15:17 ` Suren Baghdasaryan 0 siblings, 1 reply; 21+ messages in thread From: Hao Ge @ 2026-03-25 11:20 UTC (permalink / raw) To: Suren Baghdasaryan; +Cc: Andrew Morton, Kent Overstreet, linux-mm, linux-kernel On 2026/3/25 15:35, Suren Baghdasaryan wrote: > On Tue, Mar 24, 2026 at 11:25 PM Suren Baghdasaryan <surenb@google.com> wrote: >> On Tue, Mar 24, 2026 at 7:08 PM Hao Ge <hao.ge@linux.dev> wrote: >>> >>> On 2026/3/25 08:21, Suren Baghdasaryan wrote: >>>> On Tue, Mar 24, 2026 at 2:43 AM Hao Ge <hao.ge@linux.dev> wrote: >>>>> On 2026/3/24 06:47, Suren Baghdasaryan wrote: >>>>>> On Mon, Mar 23, 2026 at 2:16 AM Hao Ge <hao.ge@linux.dev> wrote: >>>>>>> On 2026/3/20 10:14, Suren Baghdasaryan wrote: >>>>>>>> On Thu, Mar 19, 2026 at 6:58 PM Hao Ge <hao.ge@linux.dev> wrote: >>>>>>>>> On 2026/3/20 07:48, Suren Baghdasaryan wrote: >>>>>>>>>> On Thu, Mar 19, 2026 at 4:44 PM Suren Baghdasaryan <surenb@google.com> wrote: >>>>>>>>>>> On Thu, Mar 19, 2026 at 3:28 PM Andrew Morton <akpm@linux-foundation.org> wrote: >>>>>>>>>>>> On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Due to initialization ordering, page_ext is allocated and initialized >>>>>>>>>>>>> relatively late during boot. Some pages have already been allocated >>>>>>>>>>>>> and freed before page_ext becomes available, leaving their codetag >>>>>>>>>>>>> uninitialized. >>>>>>>>>>> Hi Hao, >>>>>>>>>>> Thanks for the report. >>>>>>>>>>> Hmm. So, we are allocating pages before page_ext is initialized... >>>>>>>>>>> >>>>>>>>>>>>> A clear example is in init_section_page_ext(): alloc_page_ext() calls >>>>>>>>>>>>> kmemleak_alloc(). >>>>>>>>>> Forgot to ask. The example you are using here is for page_ext >>>>>>>>>> allocation itself. 
Do you have any other examples where page >>>>>>>>>> allocation happens before page_ext initialization? If that's the only >>>>>>>>>> place, then we might be able to fix this in a simpler way by doing >>>>>>>>>> something special for alloc_page_ext(). >>>>>>>>> Hi Suren >>>>>>>>> >>>>>>>>> To help illustrate the point, here's the debug log I added: >>>>>>>>> >>>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>>>>>>> index 2d4b6f1a554e..ebfe636f5b07 100644 >>>>>>>>> --- a/mm/page_alloc.c >>>>>>>>> +++ b/mm/page_alloc.c >>>>>>>>> @@ -1293,6 +1293,9 @@ void __pgalloc_tag_add(struct page *page, struct >>>>>>>>> task_struct *task, >>>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); >>>>>>>>> update_page_tag_ref(handle, &ref); >>>>>>>>> put_page_tag_ref(handle); >>>>>>>>> + } else { >>>>>>>>> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); >>>>>>>>> + dump_stack(); >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> And I caught the following logs: >>>>>>>>> >>>>>>>>> [ 0.296399] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea000400c700 pfn=1049372 nr=1 >>>>>>>>> [ 0.296400] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >>>>>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) >>>>>>>>> [ 0.296402] Hardware name: Red Hat KVM, BIOS >>>>>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>>>>>>> [ 0.296402] Call Trace: >>>>>>>>> [ 0.296403] <TASK> >>>>>>>>> [ 0.296403] dump_stack_lvl+0x53/0x70 >>>>>>>>> [ 0.296405] __pgalloc_tag_add+0x3a3/0x6e0 >>>>>>>>> [ 0.296406] ? __pfx___pgalloc_tag_add+0x10/0x10 >>>>>>>>> [ 0.296407] ? kasan_unpoison+0x27/0x60 >>>>>>>>> [ 0.296409] ? __kasan_unpoison_pages+0x2c/0x40 >>>>>>>>> [ 0.296411] get_page_from_freelist+0xa54/0x1310 >>>>>>>>> [ 0.296413] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>>>>>>> [ 0.296415] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 >>>>>>>>> [ 0.296417] ? 
stack_depot_save_flags+0x3f/0x680 >>>>>>>>> [ 0.296418] ? ___slab_alloc+0x518/0x530 >>>>>>>>> [ 0.296420] alloc_pages_mpol+0x13a/0x3f0 >>>>>>>>> [ 0.296421] ? __pfx_alloc_pages_mpol+0x10/0x10 >>>>>>>>> [ 0.296423] ? _raw_spin_lock_irqsave+0x8a/0xf0 >>>>>>>>> [ 0.296424] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 >>>>>>>>> [ 0.296426] alloc_slab_page+0xc2/0x130 >>>>>>>>> [ 0.296427] allocate_slab+0x77/0x2c0 >>>>>>>>> [ 0.296429] ? syscall_enter_define_fields+0x3bb/0x5f0 >>>>>>>>> [ 0.296430] ___slab_alloc+0x125/0x530 >>>>>>>>> [ 0.296432] ? __trace_define_field+0x252/0x3d0 >>>>>>>>> [ 0.296433] __kmalloc_noprof+0x329/0x630 >>>>>>>>> [ 0.296435] ? syscall_enter_define_fields+0x3bb/0x5f0 >>>>>>>>> [ 0.296436] syscall_enter_define_fields+0x3bb/0x5f0 >>>>>>>>> [ 0.296438] ? __pfx_syscall_enter_define_fields+0x10/0x10 >>>>>>>>> [ 0.296440] event_define_fields+0x326/0x540 >>>>>>>>> [ 0.296441] __trace_early_add_events+0xac/0x3c0 >>>>>>>>> [ 0.296443] trace_event_init+0x24c/0x460 >>>>>>>>> [ 0.296445] trace_init+0x9/0x20 >>>>>>>>> [ 0.296446] start_kernel+0x199/0x3c0 >>>>>>>>> [ 0.296448] x86_64_start_reservations+0x18/0x30 >>>>>>>>> [ 0.296449] x86_64_start_kernel+0xe2/0xf0 >>>>>>>>> [ 0.296451] common_startup_64+0x13e/0x141 >>>>>>>>> [ 0.296453] </TASK> >>>>>>>>> >>>>>>>>> >>>>>>>>> [ 0.312234] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea000400f900 pfn=1049572 nr=1 >>>>>>>>> [ 0.312234] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >>>>>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) >>>>>>>>> [ 0.312236] Hardware name: Red Hat KVM, BIOS >>>>>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>>>>>>> [ 0.312236] Call Trace: >>>>>>>>> [ 0.312237] <TASK> >>>>>>>>> [ 0.312237] dump_stack_lvl+0x53/0x70 >>>>>>>>> [ 0.312239] __pgalloc_tag_add+0x3a3/0x6e0 >>>>>>>>> [ 0.312240] ? __pfx___pgalloc_tag_add+0x10/0x10 >>>>>>>>> [ 0.312241] ? rmqueue.constprop.0+0x4fc/0x1ce0 >>>>>>>>> [ 0.312243] ? kasan_unpoison+0x27/0x60 >>>>>>>>> [ 0.312244] ? 
__kasan_unpoison_pages+0x2c/0x40 >>>>>>>>> [ 0.312246] get_page_from_freelist+0xa54/0x1310 >>>>>>>>> [ 0.312248] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>>>>>>> [ 0.312250] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 >>>>>>>>> [ 0.312253] alloc_slab_page+0x39/0x130 >>>>>>>>> [ 0.312254] allocate_slab+0x77/0x2c0 >>>>>>>>> [ 0.312255] ? alloc_cpumask_var_node+0xc7/0x230 >>>>>>>>> [ 0.312257] ___slab_alloc+0x46d/0x530 >>>>>>>>> [ 0.312259] __kmalloc_node_noprof+0x2fa/0x680 >>>>>>>>> [ 0.312261] ? alloc_cpumask_var_node+0xc7/0x230 >>>>>>>>> [ 0.312263] alloc_cpumask_var_node+0xc7/0x230 >>>>>>>>> [ 0.312264] init_desc+0x141/0x6b0 >>>>>>>>> [ 0.312266] alloc_desc+0x108/0x1b0 >>>>>>>>> [ 0.312267] early_irq_init+0xee/0x1c0 >>>>>>>>> [ 0.312268] ? __pfx_early_irq_init+0x10/0x10 >>>>>>>>> [ 0.312271] start_kernel+0x1ab/0x3c0 >>>>>>>>> [ 0.312272] x86_64_start_reservations+0x18/0x30 >>>>>>>>> [ 0.312274] x86_64_start_kernel+0xe2/0xf0 >>>>>>>>> [ 0.312275] common_startup_64+0x13e/0x141 >>>>>>>>> [ 0.312277] </TASK> >>>>>>>>> >>>>>>>>> [ 0.312834] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea000400fc00 pfn=1049584 nr=1 >>>>>>>>> [ 0.312835] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >>>>>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) >>>>>>>>> [ 0.312836] Hardware name: Red Hat KVM, BIOS >>>>>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>>>>>>> [ 0.312837] Call Trace: >>>>>>>>> [ 0.312837] <TASK> >>>>>>>>> [ 0.312838] dump_stack_lvl+0x53/0x70 >>>>>>>>> [ 0.312840] __pgalloc_tag_add+0x3a3/0x6e0 >>>>>>>>> [ 0.312841] ? __pfx___pgalloc_tag_add+0x10/0x10 >>>>>>>>> [ 0.312842] ? rmqueue.constprop.0+0x4fc/0x1ce0 >>>>>>>>> [ 0.312844] ? kasan_unpoison+0x27/0x60 >>>>>>>>> [ 0.312845] ? __kasan_unpoison_pages+0x2c/0x40 >>>>>>>>> [ 0.312847] get_page_from_freelist+0xa54/0x1310 >>>>>>>>> [ 0.312849] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>>>>>>> [ 0.312851] ? 
__pfx___alloc_frozen_pages_noprof+0x10/0x10 >>>>>>>>> [ 0.312853] alloc_pages_mpol+0x13a/0x3f0 >>>>>>>>> [ 0.312855] ? __pfx_alloc_pages_mpol+0x10/0x10 >>>>>>>>> [ 0.312856] ? xas_find+0x2d8/0x450 >>>>>>>>> [ 0.312858] ? _raw_spin_lock+0x84/0xe0 >>>>>>>>> [ 0.312859] ? __pfx__raw_spin_lock+0x10/0x10 >>>>>>>>> [ 0.312861] alloc_pages_noprof+0xf6/0x2b0 >>>>>>>>> [ 0.312862] __change_page_attr+0x293/0x850 >>>>>>>>> [ 0.312864] ? __pfx___change_page_attr+0x10/0x10 >>>>>>>>> [ 0.312865] ? _vm_unmap_aliases+0x2d0/0x650 >>>>>>>>> [ 0.312868] ? __pfx__vm_unmap_aliases+0x10/0x10 >>>>>>>>> [ 0.312869] __change_page_attr_set_clr+0x16c/0x360 >>>>>>>>> [ 0.312871] ? spp_getpage+0xbb/0x1e0 >>>>>>>>> [ 0.312872] change_page_attr_set_clr+0x220/0x3c0 >>>>>>>>> [ 0.312873] ? flush_tlb_one_kernel+0xf/0x30 >>>>>>>>> [ 0.312875] ? set_pte_vaddr_p4d+0x110/0x180 >>>>>>>>> [ 0.312877] ? __pfx_change_page_attr_set_clr+0x10/0x10 >>>>>>>>> [ 0.312878] ? __pfx_set_pte_vaddr_p4d+0x10/0x10 >>>>>>>>> [ 0.312881] ? __pfx_mtree_load+0x10/0x10 >>>>>>>>> [ 0.312883] ? __pfx_mtree_load+0x10/0x10 >>>>>>>>> [ 0.312884] ? __asan_memcpy+0x3c/0x60 >>>>>>>>> [ 0.312886] ? set_intr_gate+0x10c/0x150 >>>>>>>>> [ 0.312888] set_memory_ro+0x76/0xa0 >>>>>>>>> [ 0.312889] ? __pfx_set_memory_ro+0x10/0x10 >>>>>>>>> [ 0.312891] idt_setup_apic_and_irq_gates+0x2c1/0x390 >>>>>>>>> >>>>>>>>> and more. >>>>>>>> Ok, it's not the only place. Got your point. >>>>>>>> >>>>>>>>> off topic - if we were to handle only alloc_page_ext() specifically, >>>>>>>>> what would be the most straightforward >>>>>>>>> >>>>>>>>> solution in your mind? I'd really appreciate your insight. >>>>>>>> I was thinking if it's the only special case maybe we can handle it >>>>>>>> somehow differently, like we do when we allocate obj_ext vectors for >>>>>>>> slabs using __GFP_NO_OBJ_EXT. I haven't found a good solution yet but >>>>>>>> since it's not a special case we would not be able to use it even if I >>>>>>>> came up with something... 
>>>>>>>> I think your way is the most straight-forward but please try my >>>>>>>> suggestion to see if we can avoid extra overhead. >>>>>>>> Thanks, >>>>>>>> Suren. >>> Hi Suren >>>>> Hi Suren >>>>> >>>>> >>>>>> Hi Hao, >>>>>> >>>>>>> Hi Suren >>>>>>> >>>>>>> Thank you for your feedback. After re-examining this issue, >>>>>>> >>>>>>> I realize my previous focus was misplaced. >>>>>>> >>>>>>> Upon deeper consideration, I understand that this is not merely a bug, >>>>>>> >>>>>>> but rather a warning that indicates a gap in our memory profiling mechanism. >>>>>>> >>>>>>> Specifically, the current implementation appears to be missing memory >>>>>>> allocation >>>>>>> >>>>>>> tracking during the period between the buddy system allocation and page_ext >>>>>>> >>>>>>> initialization. >>>>>>> >>>>>>> This profiling gap means we may not be capturing all relevant memory >>>>>>> allocation >>>>>>> >>>>>>> events during this critical transition phase. >>>>>> Correct, this limitation exists because memory profiling relies on >>>>>> some kernel facilities (page_ext, obj_ext) which might not be >>>>>> initialized yet at the time of allocation. >>>>>> >>>>>>> My approach is to dynamically allocate codetag_ref when get_page_tag_ref >>>>>>> fails, >>>>>>> >>>>>>> and maintain a linked list to track all buddy system allocations that >>>>>>> occur prior to page_ext initialization. >>>>>>> >>>>>>> However, this introduces performance concerns: >>>>>>> >>>>>>> 1. Free Path Overhead: When freeing these pages, we would need to >>>>>>> traverse the entire linked list to locate >>>>>>> >>>>>>> the corresponding codetag_ref, resulting in O(n) lookup complexity >>>>>>> per free operation. >>>>>>> >>>>>>> 2. Initialization Overhead: During init_page_alloc_tagging, iterating >>>>>>> through the linked list to assign codetag_ref to >>>>>>> >>>>>>> page_ext would introduce additional traversal cost.
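The two costs Hao lists can be seen in a small userspace model of the dynamically allocated, linked-list variant. This is a sketch under stated assumptions, not kernel code: `struct early_ref`, `early_ref_record()`, `early_ref_forget()` and `early_ref_drain()` are hypothetical names, malloc()/free() stand in for whatever allocator would be usable that early in boot, and no locking is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record for one allocation made before page_ext existed. */
struct early_ref {
	unsigned long pfn;      /* first pfn of the allocation */
	unsigned int nr;        /* number of pages in the allocation */
	struct early_ref *next;
};

static struct early_ref *early_head; /* singly linked list, newest first */
static size_t lookup_steps;          /* nodes visited by the free path */

/* Allocation path: get_page_tag_ref() failed, so remember the pages. */
static int early_ref_record(unsigned long pfn, unsigned int nr)
{
	struct early_ref *r = malloc(sizeof(*r));

	if (!r)
		return -1;
	r->pfn = pfn;
	r->nr = nr;
	r->next = early_head;
	early_head = r;
	return 0;
}

/* Free path: linear search, O(n) in the number of tracked allocations. */
static int early_ref_forget(unsigned long pfn)
{
	struct early_ref **pp = &early_head;

	while (*pp) {
		lookup_steps++;
		if ((*pp)->pfn == pfn) {
			struct early_ref *victim = *pp;

			*pp = victim->next;
			free(victim);
			return 1;
		}
		pp = &(*pp)->next;
	}
	return 0;
}

/*
 * init_page_alloc_tagging() analogue: a second walk over every entry.
 * A real version would copy each ref into page_ext; this model only frees it.
 */
static size_t early_ref_drain(void)
{
	size_t n = 0;

	while (early_head) {
		struct early_ref *r = early_head;

		early_head = r->next;
		free(r);
		n++;
	}
	return n;
}
```

Freeing the oldest of three tracked allocations visits all three nodes, which is the O(n) free-path lookup; the drain at init is a second full traversal over whatever is still on the list.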
>>>>>>> >>>>>>> If the number of pages is substantial, this could incur significant >>>>>>> overhead. What are your thoughts on this? I look forward to your >>>>>>> suggestions. >>>>>> My thinking is that these early allocations comprise a small portion >>>>>> of overall memory consumed by the system. So, instead of trying to >>>>>> record and handle them in some alternative way, we just accept that >>>>>> some counters might not be exactly accurate and ignore those early >>>>>> allocations. See how the early slab allocations are marked with the >>>>>> CODETAG_FLAG_INACCURATE flag and later reported as inaccurate. I think >>>>>> that's an acceptable alternative to introducing extra complexity and >>>>>> performance overhead. IOW, the benefits of accounting for these early >>>>>> allocations are low compared to the effort required to account for >>>>>> them. Unless you found a simple and performant way to do that... >>>>> I have been exploring possible solutions to this issue over the past few >>>>> days, >>>>> >>>>> but so far I have not come up with a good approach. >>>>> >>>>> I have counted the number of memory allocations that occur earlier than the >>>>> >>>>> allocation and initialization of our page_ext, and found that there are >>>>> actually >>>>> >>>>> quite a lot of them. >>>> Interesting... I wonder if it's because deferred_struct_pages defers >>>> page_ext initialization. Can you check if setting early_page_ext >>>> reduces or eliminates these allocations before page_ext init cases? >>> Yes, you are correct. In my 8-core 16GB virtual machine, I used a global >>> counter >>> >>> to record these allocations. With early_page_ext enabled, there were 130 >>> allocations >>> >>> before page_ext initialization. Without early_page_ext, there were 802 >>> allocations >>> >>> before page_ext initialization. >>> >>> >>>>> Similarly, I have made the following changes and collected the >>>>> corresponding logs.
>>>>> >>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>>> index 2d4b6f1a554e..6db65b3d52d3 100644 >>>>> --- a/mm/page_alloc.c >>>>> +++ b/mm/page_alloc.c >>>>> @@ -1293,6 +1293,8 @@ void __pgalloc_tag_add(struct page *page, struct >>>>> task_struct *task, >>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); >>>>> update_page_tag_ref(handle, &ref); >>>>> put_page_tag_ref(handle); >>>>> + } else{ >>>>> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); >>>>> } >>>>> } >>>>> >>>>> @@ -1314,6 +1316,8 @@ void __pgalloc_tag_sub(struct page *page, unsigned >>>>> int nr) >>>>> alloc_tag_sub(&ref, PAGE_SIZE * nr); >>>>> update_page_tag_ref(handle, &ref); >>>>> put_page_tag_ref(handle); >>>>> + } else{ >>>>> + pr_warn("__pgalloc_tag_sub: get_page_tag_ref failed! >>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); >>>>> } >>>>> } >>>>> >>>>> [ 0.261699] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001000 pfn=1048640 nr=2 >>>>> [ 0.261711] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001100 pfn=1048644 nr=4 >>>>> [ 0.261717] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001200 pfn=1048648 nr=4 >>>>> [ 0.261721] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001300 pfn=1048652 nr=4 >>>>> [ 0.261893] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001080 pfn=1048642 nr=2 >>>>> [ 0.261917] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001400 pfn=1048656 nr=4 >>>>> [ 0.262018] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001500 pfn=1048660 nr=2 >>>>> [ 0.262024] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001600 pfn=1048664 nr=8 >>>>> [ 0.262040] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001580 pfn=1048662 nr=1 >>>>> [ 0.262048] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>> page=ffffea00040015c0 pfn=1048663 nr=1 >>>>> [ 0.262056] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001800 pfn=1048672 nr=2 >>>>> [ 0.262064] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001880 pfn=1048674 nr=2 >>>>> [ 0.262078] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001900 pfn=1048676 nr=2 >>>>> [ 0.262196] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 >>>>> [ 0.262213] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001980 pfn=1048678 nr=2 >>>>> [ 0.262220] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001a00 pfn=1048680 nr=4 >>>>> [ 0.262246] ODEBUG: selftest passed >>>>> [ 0.262268] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001b00 pfn=1048684 nr=1 >>>>> [ 0.262318] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001b40 pfn=1048685 nr=1 >>>>> [ 0.262368] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001b80 pfn=1048686 nr=1 >>>>> [ 0.262418] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001bc0 pfn=1048687 nr=1 >>>>> [ 0.262469] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001c00 pfn=1048688 nr=1 >>>>> [ 0.262519] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001c40 pfn=1048689 nr=1 >>>>> [ 0.262569] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001c80 pfn=1048690 nr=1 >>>>> [ 0.262620] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001cc0 pfn=1048691 nr=1 >>>>> [ 0.262670] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001d00 pfn=1048692 nr=1 >>>>> [ 0.262721] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001d40 pfn=1048693 nr=1 >>>>> [ 0.262771] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001d80 pfn=1048694 nr=1 >>>>> [ 0.262821] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>> page=ffffea0004001dc0 pfn=1048695 nr=1 >>>>> [ 0.262871] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001e00 pfn=1048696 nr=1 >>>>> [ 0.262923] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001e40 pfn=1048697 nr=1 >>>>> [ 0.262974] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001e80 pfn=1048698 nr=1 >>>>> [ 0.263024] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001ec0 pfn=1048699 nr=1 >>>>> [ 0.263074] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001f00 pfn=1048700 nr=1 >>>>> [ 0.263124] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001f40 pfn=1048701 nr=1 >>>>> [ 0.263174] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001f80 pfn=1048702 nr=1 >>>>> [ 0.263224] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004001fc0 pfn=1048703 nr=1 >>>>> [ 0.263275] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004002000 pfn=1048704 nr=1 >>>>> [ 0.263325] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004002040 pfn=1048705 nr=1 >>>>> [ 0.263375] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004002080 pfn=1048706 nr=1 >>>>> [ 0.263427] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004002400 pfn=1048720 nr=16 >>>>> [ 0.263437] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea00040020c0 pfn=1048707 nr=1 >>>>> [ 0.263463] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004002100 pfn=1048708 nr=1 >>>>> [ 0.263465] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004002140 pfn=1048709 nr=1 >>>>> [ 0.263467] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004002180 pfn=1048710 nr=1 >>>>> [ 0.263509] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004002200 pfn=1048712 nr=4 >>>>> [ 0.263512] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>> page=ffffea0004002800 pfn=1048736 nr=8 >>>>> [ 0.263524] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea00040021c0 pfn=1048711 nr=1 >>>>> [ 0.263536] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004002300 pfn=1048716 nr=1 >>>>> [ 0.263537] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004002340 pfn=1048717 nr=1 >>>>> [ 0.263539] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004002380 pfn=1048718 nr=1 >>>>> [ 0.263604] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004004000 pfn=1048832 nr=128 >>>>> [ 0.263638] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004003000 pfn=1048768 nr=64 >>>>> [ 0.263650] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004002c00 pfn=1048752 nr=16 >>>>> [ 0.263655] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea00040023c0 pfn=1048719 nr=1 >>>>> [ 0.270582] __pgalloc_tag_sub: get_page_tag_ref failed! >>>>> page=ffffea00040023c0 pfn=1048719 nr=1 >>>>> [ 0.270591] ftrace: allocating 52717 entries in 208 pages >>>>> [ 0.270592] ftrace: allocated 208 pages with 3 groups >>>>> [ 0.270620] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004002a00 pfn=1048744 nr=8 >>>>> [ 0.270636] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea00040023c0 pfn=1048719 nr=1 >>>>> [ 0.270643] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006000 pfn=1048960 nr=1 >>>>> [ 0.270649] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006040 pfn=1048961 nr=1 >>>>> [ 0.270658] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004007000 pfn=1049024 nr=64 >>>>> [ 0.270659] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006080 pfn=1048962 nr=2 >>>>> [ 0.270722] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006100 pfn=1048964 nr=1 >>>>> [ 0.270730] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>> page=ffffea0004006140 pfn=1048965 nr=1 >>>>> [ 0.270738] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006180 pfn=1048966 nr=1 >>>>> [ 0.270777] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea00040061c0 pfn=1048967 nr=1 >>>>> [ 0.270786] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006200 pfn=1048968 nr=1 >>>>> [ 0.270792] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006240 pfn=1048969 nr=1 >>>>> [ 0.270833] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006300 pfn=1048972 nr=4 >>>>> [ 0.270891] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006280 pfn=1048970 nr=1 >>>>> [ 0.270980] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea00040062c0 pfn=1048971 nr=1 >>>>> [ 0.271071] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006400 pfn=1048976 nr=1 >>>>> [ 0.271156] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006440 pfn=1048977 nr=1 >>>>> [ 0.271185] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006480 pfn=1048978 nr=2 >>>>> [ 0.271301] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006500 pfn=1048980 nr=1 >>>>> [ 0.271655] Dynamic Preempt: lazy >>>>> [ 0.271662] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006580 pfn=1048982 nr=2 >>>>> [ 0.271752] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006600 pfn=1048984 nr=4 >>>>> [ 0.271762] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004010000 pfn=1049600 nr=4 >>>>> [ 0.271824] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006540 pfn=1048981 nr=1 >>>>> [ 0.271916] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006700 pfn=1048988 nr=2 >>>>> [ 0.271964] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006780 pfn=1048990 nr=1 >>>>> [ 0.272099] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>> page=ffffea00040067c0 pfn=1048991 nr=1 >>>>> [ 0.272138] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006800 pfn=1048992 nr=2 >>>>> [ 0.272144] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006a00 pfn=1049000 nr=8 >>>>> [ 0.272249] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006c00 pfn=1049008 nr=8 >>>>> [ 0.272319] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006880 pfn=1048994 nr=2 >>>>> [ 0.272351] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006900 pfn=1048996 nr=4 >>>>> [ 0.272424] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004006e00 pfn=1049016 nr=8 >>>>> [ 0.272485] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004008000 pfn=1049088 nr=8 >>>>> [ 0.272535] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004008200 pfn=1049096 nr=2 >>>>> [ 0.272600] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004008400 pfn=1049104 nr=8 >>>>> [ 0.272663] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004008300 pfn=1049100 nr=4 >>>>> [ 0.272694] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004008280 pfn=1049098 nr=2 >>>>> [ 0.272708] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004008600 pfn=1049112 nr=8 >>>>> >>>>> [ 0.272924] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004008880 pfn=1049122 nr=2 >>>>> [ 0.272934] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004008900 pfn=1049124 nr=2 >>>>> [ 0.272952] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004008c00 pfn=1049136 nr=4 >>>>> [ 0.273035] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004008980 pfn=1049126 nr=2 >>>>> [ 0.273062] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004008e00 pfn=1049144 nr=8 >>>>> [ 0.273674] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>> page=ffffea0004008d00 pfn=1049140 nr=1 >>>>> [ 0.273884] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004008d80 pfn=1049142 nr=2 >>>>> [ 0.273943] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009000 pfn=1049152 nr=2 >>>>> [ 0.274379] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009080 pfn=1049154 nr=2 >>>>> [ 0.274575] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009200 pfn=1049160 nr=8 >>>>> [ 0.274617] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009100 pfn=1049156 nr=4 >>>>> [ 0.274794] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009400 pfn=1049168 nr=2 >>>>> [ 0.274840] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009480 pfn=1049170 nr=2 >>>>> [ 0.275057] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009500 pfn=1049172 nr=2 >>>>> [ 0.275092] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009580 pfn=1049174 nr=2 >>>>> [ 0.275134] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009600 pfn=1049176 nr=8 >>>>> [ 0.275211] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009800 pfn=1049184 nr=4 >>>>> [ 0.275510] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009900 pfn=1049188 nr=2 >>>>> [ 0.275548] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009980 pfn=1049190 nr=2 >>>>> [ 0.275976] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009a00 pfn=1049192 nr=8 >>>>> [ 0.275987] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009c00 pfn=1049200 nr=2 >>>>> [ 0.276139] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009c80 pfn=1049202 nr=2 >>>>> [ 0.276152] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004008d40 pfn=1049141 nr=1 >>>>> [ 0.276242] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>> page=ffffea0004009d00 pfn=1049204 nr=1 >>>>> [ 0.276358] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009d40 pfn=1049205 nr=1 >>>>> [ 0.276444] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009d80 pfn=1049206 nr=1 >>>>> [ 0.276526] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009dc0 pfn=1049207 nr=1 >>>>> [ 0.276615] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009e00 pfn=1049208 nr=1 >>>>> [ 0.276696] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009e40 pfn=1049209 nr=1 >>>>> [ 0.276792] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009e80 pfn=1049210 nr=1 >>>>> [ 0.276827] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009f00 pfn=1049212 nr=2 >>>>> [ 0.276891] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009ec0 pfn=1049211 nr=1 >>>>> [ 0.276999] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009f80 pfn=1049214 nr=1 >>>>> [ 0.277082] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea0004009fc0 pfn=1049215 nr=1 >>>>> [ 0.277172] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea000400a000 pfn=1049216 nr=1 >>>>> [ 0.277257] __pgalloc_tag_add: get_page_tag_ref failed! >>>>> page=ffffea000400a040 pfn=1049217 nr=1 >>>>> >>>>> and so on. >>>>> >>>>> >>>>>> I think your earlier patch can effectively detect these early >>>>>> allocations and suppress the warnings. We should also mark these >>>>>> allocations with CODETAG_FLAG_INACCURATE. >>>>> Thanks to an excellent AI review, I realized there are issues with >>>>> >>>>> my original patch. One problem is the 256-element array; another >>>> Yes, if there are lots of such allocations, it's not appropriate. >>>> >>>>> is that it involves allocation and free operations — meaning we need >>>>> >>>>> to record entries at __pgalloc_tag_add and remove them at __pgalloc_tag_sub, >>>>> >>>>> which introduces a noticeable overhead. 
I'm wondering if we can instead >>>>> set a flag >>>>> >>>>> bit in page flags during the early boot stage, which I'll refer to as >>>>> EARLY_ALLOC_FLAGS. >>>>> >>>>> Then, in __pgalloc_tag_sub, we first check for EARLY_ALLOC_FLAGS. If >>>>> set, we clear the >>>>> >>>>> flag and return immediately; otherwise, we perform the actual >>>>> subtraction of the tag count. >>>>> >>>>> This approach seems somewhat similar to the idea behind >>>>> mem_profiling_compressed. >>>> That seems doable but let's first check if we can make page_ext >>>> initialization happen before these allocations. That would be the >>>> ideal path. If it's not possible then we can focus on alternatives >>>> like the one you propose. >>> >>> Yes, the ideal scenario would be to have page_ext initialization >>> complete before >>> >>> these allocations occur. I just did a code walkthrough and found that >>> this resembles >>> >>> the FLATMEM implementation approach - FLATMEM allocates page_ext before >>> the buddy >>> >>> system initialization, so it doesn't seem to encounter the issue we're >>> facing now. >>> >>> https://elixir.bootlin.com/linux/v7.0-rc5/source/mm/mm_init.c#L2707 >> Yes, page_ext_init_flatmem() looks like an interesting option and it >> would not work with sparsemem. TBH I would prefer to find a simple >> solution that can identify early init allocations, mark them inaccurate >> and suppress the warning rather than introduce some complex mechanism >> to account for them which would work only in some cases (flatmem). >> With your original approach I think the only real issue is the size of >> the array that might be too small. The other issue you mentioned about >> allocated page being freed and then re-allocated after page_ext is >> initialized but before clear_page_tag_ref() is called is not really a >> problem. Yes, we will lose that counter's value but it's similar to >> other early allocations which we just treat as inaccurate. We can also
We can also >> minimize the possibility of this happening by moving >> clear_page_tag_ref() into init_page_alloc_tagging(). >> >> I don't like the pageflag option you mentioned because it adds an >> extra condition check into __pgalloc_tag_sub() which will be executed >> even after the init stage is over. >> I'll look into this some more tomorrow as it's quite late now. Hi Suren > Just though of something. Are all these pages allocated by slab? If > so, I think slab does not use page->lru (need to double-check) and we > could add all these pages allocated during early init into a list and > then set their page_ext reference to CODETAG_EMPTY in > init_page_alloc_tagging(). Got your point. There will indeed be some non-SLAB memory allocations here, such as the following: CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) [ 0.326607] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 0.326608] Call Trace: [ 0.326608] <TASK> [ 0.326609] dump_stack_lvl+0x53/0x70 [ 0.326611] __pgalloc_tag_add+0x407/0x700 [ 0.326616] get_page_from_freelist+0xa54/0x1310 [ 0.326618] __alloc_frozen_pages_noprof+0x206/0x4c0 [ 0.326623] alloc_pages_mpol+0x13a/0x3f0 [ 0.326627] alloc_pages_noprof+0xf6/0x2b0 [ 0.326628] __pmd_alloc+0x743/0x9c0 [ 0.326630] vmap_range_noflush+0xac0/0x10a0 [ 0.326637] ioremap_page_range+0x17c/0x250 [ 0.326639] __ioremap_caller+0x437/0x5c0 [ 0.326645] acpi_os_map_iomem+0x4c0/0x660 [ 0.326647] acpi_tb_verify_temp_table+0x1c0/0x580 [ 0.326649] acpi_reallocate_root_table+0x2ad/0x460 [ 0.326655] acpi_early_init+0x111/0x460 [ 0.326657] start_kernel+0x271/0x3c0 [ 0.326659] x86_64_start_reservations+0x18/0x30 [ 0.326660] x86_64_start_kernel+0xe2/0xf0 [ 0.326662] common_startup_64+0x13e/0x141 [ 0.326663] </TASK> CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) [ 0.329167] Hardware name: Red Hat KVM, BIOS 
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 0.329167] Call Trace: [ 0.329167] <TASK> [ 0.329167] dump_stack_lvl+0x53/0x70 [ 0.329167] __pgalloc_tag_add+0x407/0x700 [ 0.329167] get_page_from_freelist+0xa54/0x1310 [ 0.329167] __alloc_frozen_pages_noprof+0x206/0x4c0 [ 0.329167] __alloc_pages_noprof+0x10/0x1b0 [ 0.329167] dup_task_struct+0x163/0x8c0 [ 0.329167] copy_process+0x390/0x4a70 [ 0.329167] kernel_clone+0xe1/0x830 [ 0.329167] kernel_thread+0xcb/0x110 [ 0.329167] kthreadd+0x8a2/0xc60 [ 0.329167] ret_from_fork+0x551/0x720 [ 0.329167] ret_from_fork_asm+0x1a/0x30 [ 0.329167] </TASK> CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) [ 0.329167] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 0.329167] Call Trace: [ 0.329167] <TASK> [ 0.329167] dump_stack_lvl+0x53/0x70 [ 0.329167] __pgalloc_tag_add+0x407/0x700 [ 0.329167] get_page_from_freelist+0xa54/0x1310 [ 0.329167] __alloc_frozen_pages_noprof+0x206/0x4c0 [ 0.329167] __alloc_pages_noprof+0x10/0x1b0 [ 0.329167] dup_task_struct+0x163/0x8c0 [ 0.329167] copy_process+0x390/0x4a70 [ 0.329167] kernel_clone+0xe1/0x830 [ 0.329167] kernel_thread+0xcb/0x110 [ 0.329167] kthreadd+0x8a2/0xc60 [ 0.329167] ret_from_fork+0x551/0x720 [ 0.329167] ret_from_fork_asm+0x1a/0x30 [ 0.329167] </TASK> CPU: 4 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) [ 0.434265] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 0.434266] Call Trace: [ 0.434266] <TASK> [ 0.434266] dump_stack_lvl+0x53/0x70 [ 0.434268] __pgalloc_tag_add+0x407/0x700 [ 0.434272] get_page_from_freelist+0xa54/0x1310 [ 0.434274] __alloc_frozen_pages_noprof+0x206/0x4c0 [ 0.434279] alloc_pages_exact_nid_noprof+0x10f/0x380 [ 0.434283] init_section_page_ext+0x167/0x370 [ 0.434284] page_ext_init+0x451/0x620 [ 0.434287] page_alloc_init_late+0x553/0x630 [ 0.434290] 
kernel_init_freeable+0x7be/0xd30 [ 0.434294] kernel_init+0x1f/0x1f0 [ 0.434295] ret_from_fork+0x551/0x720 [ 0.434301] ret_from_fork_asm+0x1a/0x30 [ 0.434303] </TASK> CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) [ 0.346712] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 0.346713] Call Trace: [ 0.346713] <TASK> [ 0.346714] dump_stack_lvl+0x53/0x70 [ 0.346715] __pgalloc_tag_add+0x407/0x700 [ 0.346720] get_page_from_freelist+0xa54/0x1310 [ 0.346723] __alloc_frozen_pages_noprof+0x206/0x4c0 [ 0.346729] __alloc_pages_noprof+0x10/0x1b0 [ 0.346731] alloc_cpu_data+0x96/0x210 [ 0.346732] rb_allocate_cpu_buffer+0xb93/0x1500 [ 0.346739] trace_rb_cpu_prepare+0x21a/0x4f0 [ 0.346753] cpuhp_invoke_callback+0x6db/0x14b0 [ 0.346755] __cpuhp_invoke_callback_range+0xde/0x1d0 [ 0.346759] _cpu_up+0x395/0x880 [ 0.346761] cpu_up+0x1bb/0x210 [ 0.346762] cpuhp_bringup_mask+0xd2/0x150 [ 0.346763] bringup_nonboot_cpus+0x12b/0x170 [ 0.346764] smp_init+0x2f/0x100 [ 0.346766] kernel_init_freeable+0x7a5/0xd30 [ 0.346769] kernel_init+0x1f/0x1f0 [ 0.346771] ret_from_fork+0x551/0x720 [ 0.346776] ret_from_fork_asm+0x1a/0x30 [ 0.346778] </TASK> and so on... In fact, I previously conducted extensive and prolonged stress testing on memory profiling. After our efforts to address several WARN cases, one remaining scenario we are addressing is the warning triggered during early slab cache reclaim — which is precisely the situation we are currently encountering (although I cannot guarantee that all edge cases have been covered by our stress testing). During the stress testing process, this warning did indeed manifest. However, the current environment triggers KASAN slab cache reclaim earlier than anticipated. 
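Suren's page->lru idea, and the limitation Hao points out, can be sketched as a userspace model. Everything here is hypothetical: `struct model_page` carries only a pfn and an lru node, `page_ext_ref[]` stands in for the per-page tag references kept in page_ext, and `REF_EMPTY` stands in for CODETAG_EMPTY. Whether slab really leaves page->lru free for this use is the open question from the thread (Suren said it needs double-checking); the model simply assumes it.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of struct list_head. */
struct list_node {
	struct list_node *prev, *next;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

/* Hypothetical page model: only the fields this sketch needs. */
struct model_page {
	unsigned long pfn;
	struct list_node lru; /* assumed free for reuse on early slab pages */
};

#define NR_PFNS   16
#define REF_UNSET 0UL /* what a freshly initialized page_ext slot holds */
#define REF_EMPTY 1UL /* stand-in for CODETAG_EMPTY */

static unsigned long page_ext_ref[NR_PFNS];  /* per-pfn tag refs */
static struct list_node early_pages = { &early_pages, &early_pages };
static int warnings;                         /* "alloc_tag was not set" */

/* Early slab page allocation: no page_ext yet, so park the page. */
static void early_slab_page_alloc(struct model_page *p)
{
	list_add_tail_node(&p->lru, &early_pages);
}

/* Once page_ext exists: mark each parked page as deliberately empty. */
static int model_init_page_alloc_tagging(void)
{
	int n = 0;

	while (early_pages.next != &early_pages) {
		struct list_node *node = early_pages.next;
		struct model_page *p = container_of(node, struct model_page, lru);

		page_ext_ref[p->pfn] = REF_EMPTY;
		list_del_node(node);
		n++;
	}
	return n;
}

/* Free hook: an unset ref is what produces the warning in this thread. */
static void model_pgalloc_tag_sub(const struct model_page *p)
{
	if (page_ext_ref[p->pfn] == REF_UNSET)
		warnings++;
	page_ext_ref[p->pfn] = REF_UNSET;
}
```

In the demo, two parked slab pages are freed without a warning, while a page allocated early by a non-slab path (like the __pmd_alloc() and dup_task_struct() traces above) was never parked and still trips the check. The free hook also gains no extra branch after init, which was Suren's objection to the page-flag variant.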
Although the memory allocated prior to page_ext initialization has a relatively low probability of being released in subsequent operations (at least we have not encountered such cases up to now), I remain uncertain whether there are any overlooked edge cases when considering only slab-backed pages. Thanks Hao >> Thanks, >> Suren. >> >>> However, I'm not entirely certain whether SPARSEMEM can guarantee the >>> same behavior. >>> >>> >>>>> I would appreciate your valuable feedback and any better suggestions you >>>>> might have. >>>> Thanks for pursuing this! I'll help in any way I can. >>>> Suren. >>> Thank you so much for your patient guidance and assistance. >>> >>> I truly appreciate your willingness to share your knowledge and insights. >>> >>> Thanks, >>> Hao >>> >>>>> Thanks >>>>> >>>>> Hao >>>>> >>>>>> Thanks, >>>>>> Suren. >>>>>> >>>>>>> Thanks >>>>>>> >>>>>>> Hao >>>>>>> >>>>>>>>> Thanks. >>>>>>>>> >>>>>>>>> >>>>>>>>>>>>> If the slab cache has no free objects, it falls back >>>>>>>>>>>>> to the buddy allocator to allocate memory. However, at this point page_ext >>>>>>>>>>>>> is not yet fully initialized, so these newly allocated pages have no >>>>>>>>>>>>> codetag set. These pages may later be reclaimed by KASAN, which causes >>>>>>>>>>>>> the warning to trigger when they are freed because their codetag ref is >>>>>>>>>>>>> still empty. >>>>>>>>>>>>> >>>>>>>>>>>>> Use a global array to track pages allocated before page_ext is fully >>>>>>>>>>>>> initialized, similar to how kmemleak tracks early allocations. >>>>>>>>>>>>> When page_ext initialization completes, set their codetag >>>>>>>>>>>>> to empty to avoid warnings when they are freed later. >>>>>>>>>>>>> >>>>>>>>>>>>> ...
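A rough model of the global-array approach described in this commit message, combined with Suren's suggestion elsewhere in the thread to mark early allocations with CODETAG_FLAG_INACCURATE. Assumptions: the cap is shrunk from the patch's 256 to 4 so overflow is easy to see; `struct model_tag`, `MODEL_FLAG_INACCURATE`, `early_pfn_dropped`, `record_cleared()` and `model_clear_early_refs()` are hypothetical; what the real patch does when the array fills up is elided in the quote, so this model just counts the overflow; and the spinlock the patch declares is omitted because the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The quoted patch uses 256; a tiny cap here makes overflow easy to show. */
#define MODEL_EARLY_PFN_MAX 4
/* Stand-in for CODETAG_FLAG_INACCURATE; the bit value is an assumption. */
#define MODEL_FLAG_INACCURATE (1u << 0)

struct model_tag {
	unsigned int flags;
};

static unsigned long early_pfns[MODEL_EARLY_PFN_MAX];
static unsigned int early_pfn_count;
static unsigned int early_pfn_dropped;  /* hypothetical: not in the patch */
static bool profiling_up;               /* models mem_profiling_state == UP */

/* Called when get_page_tag_ref() fails; locking omitted in this model. */
static void model_add_early_pfn(struct model_tag *tag, unsigned long pfn)
{
	if (profiling_up)
		return;
	/* The counter for this call site can no longer be exact. */
	tag->flags |= MODEL_FLAG_INACCURATE;
	if (early_pfn_count >= MODEL_EARLY_PFN_MAX) {
		early_pfn_dropped++;   /* this page will still warn when freed */
		return;
	}
	early_pfns[early_pfn_count++] = pfn;
}

/* Demo callback: records which pfns had their ref reset to "empty". */
static unsigned int cleared;
static unsigned long last_cleared;

static void record_cleared(unsigned long pfn)
{
	last_cleared = pfn;
	cleared++;
}

/* End of page_ext init: reset every recorded ref, then stop recording. */
static unsigned int model_clear_early_refs(void (*set_empty)(unsigned long))
{
	unsigned int i, n = early_pfn_count;

	for (i = 0; i < n; i++)
		set_empty(early_pfns[i]);
	early_pfn_count = 0;
	profiling_up = true;
	return n;
}
```

With six early allocations and room for four, two pages are never cleared and would still warn when freed. That is the array-size concern Suren raises; the call-site tag is flagged inaccurate either way.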
>>>>>>>>>>>>> >>>>>>>>>>>>> --- a/include/linux/alloc_tag.h >>>>>>>>>>>>> +++ b/include/linux/alloc_tag.h >>>>>>>>>>>>> @@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref) >>>>>>>>>>>>> >>>>>>>>>>>>> #ifdef CONFIG_MEM_ALLOC_PROFILING >>>>>>>>>>>>> >>>>>>>>>>>>> +bool mem_profiling_is_available(void); >>>>>>>>>>>>> +void alloc_tag_add_early_pfn(unsigned long pfn); >>>>>>>>>>>>> + >>>>>>>>>>>>> #define ALLOC_TAG_SECTION_NAME "alloc_tags" >>>>>>>>>>>>> >>>>>>>>>>>>> struct codetag_bytes { >>>>>>>>>>>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c >>>>>>>>>>>>> index 58991ab09d84..a5bf4e72c154 100644 >>>>>>>>>>>>> --- a/lib/alloc_tag.c >>>>>>>>>>>>> +++ b/lib/alloc_tag.c >>>>>>>>>>>>> @@ -6,6 +6,7 @@ >>>>>>>>>>>>> #include <linux/kallsyms.h> >>>>>>>>>>>>> #include <linux/module.h> >>>>>>>>>>>>> #include <linux/page_ext.h> >>>>>>>>>>>>> +#include <linux/pgalloc_tag.h> >>>>>>>>>>>>> #include <linux/proc_fs.h> >>>>>>>>>>>>> #include <linux/seq_buf.h> >>>>>>>>>>>>> #include <linux/seq_file.h> >>>>>>>>>>>>> @@ -26,6 +27,82 @@ static bool mem_profiling_support; >>>>>>>>>>>>> >>>>>>>>>>>>> static struct codetag_type *alloc_tag_cttype; >>>>>>>>>>>>> >>>>>>>>>>>>> +/* >>>>>>>>>>>>> + * State of the alloc_tag >>>>>>>>>>>>> + * >>>>>>>>>>>>> + * This is used to describe the states of the alloc_tag during bootup. >>>>>>>>>>>>> + * >>>>>>>>>>>>> + * When we need to allocate page_ext to store codetag, we face an >>>>>>>>>>>>> + * initialization timing problem: >>>>>>>>>>>>> + * >>>>>>>>>>>>> + * Due to initialization order, pages may be allocated via buddy system >>>>>>>>>>>>> + * before page_ext is fully allocated and initialized. Although these >>>>>>>>>>>>> + * pages call the allocation hooks, the codetag will not be set because >>>>>>>>>>>>> + * page_ext is not yet available. 
>>>>>>>>>>>>> + * >>>>>>>>>>>>> + * When these pages are later freed to the buddy system, it triggers >>>>>>>>>>>>> + * warnings because their codetag is actually empty if >>>>>>>>>>>>> + * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled. >>>>>>>>>>>>> + * >>>>>>>>>>>>> + * Additionally, in this situation, we cannot record detailed allocation >>>>>>>>>>>>> + * information for these pages. >>>>>>>>>>>>> + */ >>>>>>>>>>>>> +enum mem_profiling_state { >>>>>>>>>>>>> + DOWN, /* No mem_profiling functionality yet */ >>>>>>>>>>>>> + UP /* Everything is working */ >>>>>>>>>>>>> +}; >>>>>>>>>>>>> + >>>>>>>>>>>>> +static enum mem_profiling_state mem_profiling_state = DOWN; >>>>>>>>>>>>> + >>>>>>>>>>>>> +bool mem_profiling_is_available(void) >>>>>>>>>>>>> +{ >>>>>>>>>>>>> + return mem_profiling_state == UP; >>>>>>>>>>>>> +} >>>>>>>>>>>>> + >>>>>>>>>>>>> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG >>>>>>>>>>>>> + >>>>>>>>>>>>> +#define EARLY_ALLOC_PFN_MAX 256 >>>>>>>>>>>>> + >>>>>>>>>>>>> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX]; >>>>>>>>>>>> It's unfortunate that this isn't __initdata. >>>>>>>>>>>> >>>>>>>>>>>>> +static unsigned int early_pfn_count; >>>>>>>>>>>>> +static DEFINE_SPINLOCK(early_pfn_lock); >>>>>>>>>>>>> + >>>>>>>>>>>>> >>>>>>>>>>>>> ... >>>>>>>>>>>>> >>>>>>>>>>>>> --- a/mm/page_alloc.c >>>>>>>>>>>>> +++ b/mm/page_alloc.c >>>>>>>>>>>>> @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task, >>>>>>>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); >>>>>>>>>>>>> update_page_tag_ref(handle, &ref); >>>>>>>>>>>>> put_page_tag_ref(handle); >>>>>>>>>>>>> + } else { >>>>>>>>>>> This branch can be marked as "unlikely". >>>>>>>>>>> >>>>>>>>>>>>> + /* >>>>>>>>>>>>> + * page_ext is not available yet, record the pfn so we can >>>>>>>>>>>>> + * clear the tag ref later when page_ext is initialized.
>>>>>>>>>>>>> + */ >>>>>>>>>>>>> + if (!mem_profiling_is_available()) >>>>>>>>>>>>> + alloc_tag_add_early_pfn(page_to_pfn(page)); >>>>>>>>>>>>> } >>>>>>>>>>>>> } >>>>>>>>>>>> All because of this, I believe. Is this fixable? >>>>>>>>>>>> >>>>>>>>>>>> If we take that `else', we know we're running in __init code, yes? I >>>>>>>>>>>> don't see how `__init pgalloc_tag_add_early()' could be made to work. >>>>>>>>>>>> hrm. Something clever, please. >>>>>>>>>>> We can have a pointer to a function that is initialized to point to >>>>>>>>>>> alloc_tag_add_early_pfn, which is defined as __init and uses >>>>>>>>>>> early_pfns which now can be defined as __initdata. After >>>>>>>>>>> clear_early_alloc_pfn_tag_refs() is done we reset that pointer to >>>>>>>>>>> NULL. __pgalloc_tag_add() instead of calling alloc_tag_add_early_pfn() >>>>>>>>>>> directly checks that pointer and if it's not NULL then calls the >>>>>>>>>>> function that it points to. This way __pgalloc_tag_add() which is not >>>>>>>>>>> an __init function will be invoking alloc_tag_add_early_pfn() __init >>>>>>>>>>> function only until we are done with initialization. I haven't tried >>>>>>>>>>> this but I think that should work. This also eliminates the need for >>>>>>>>>>> mem_profiling_state variable since we can use this function pointer >>>>>>>>>>> instead. >>>>>>>>>>>
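Suren's function-pointer idea can be modeled as a small userspace C program. This is a minimal sketch under stated assumptions, not the patch itself: `INIT_CODE`/`INIT_DATA` are hypothetical stand-ins for `__init`/`__initdata` (they expand to nothing here), `tag_ref[]` and `page_ext_ready` are hypothetical stand-ins for page_ext, and `early_pfn_hook` is a hypothetical name for the pointer. The model is single-threaded; a kernel version would also have to handle concurrent allocators (for example with `READ_ONCE()`/`WRITE_ONCE()` on the pointer and the patch's `early_pfn_lock`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for __init / __initdata. In the kernel these put
 * code and data into sections that are discarded after boot; here they
 * expand to nothing so the sketch builds as an ordinary program. */
#define INIT_CODE
#define INIT_DATA

#define NR_PAGES 64
#define EARLY_ALLOC_PFN_MAX 256
/* Sentinel meaning "deliberately empty", modeled on CODETAG_EMPTY. */
#define CODETAG_EMPTY ((void *)1)

/* Hypothetical model of page_ext: one codetag ref per pfn. */
static void *tag_ref[NR_PAGES];
static bool page_ext_ready;

static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX] INIT_DATA;
static unsigned int early_pfn_count INIT_DATA;

/* Boot-only recorder; pfns beyond the array capacity are silently dropped. */
static void INIT_CODE alloc_tag_add_early_pfn(unsigned long pfn)
{
	if (early_pfn_count < EARLY_ALLOC_PFN_MAX)
		early_pfns[early_pfn_count++] = pfn;
}

/* Points at the boot-only recorder until the early refs are cleared. */
static void (*early_pfn_hook)(unsigned long pfn) = alloc_tag_add_early_pfn;

/* Runs once page_ext is available: mark every recorded page as empty, then
 * reset the hook so nothing can reach INIT_CODE after boot. */
static void INIT_CODE clear_early_alloc_pfn_tag_refs(void)
{
	for (unsigned int i = 0; i < early_pfn_count; i++)
		tag_ref[early_pfns[i]] = CODETAG_EMPTY;
	early_pfn_hook = NULL;
}

/* Allocation hook (not INIT_CODE): calls through the pointer only while set. */
static void pgalloc_tag_add(unsigned long pfn, void *tag)
{
	assert(pfn < NR_PAGES);
	if (page_ext_ready)
		tag_ref[pfn] = tag;
	else if (early_pfn_hook)
		early_pfn_hook(pfn);
}

/* Free hook: returns true where the debug check would warn
 * ("alloc_tag was not set"), i.e. the ref is NULL rather than a real tag
 * or CODETAG_EMPTY. Without page_ext there is nothing to check. */
static bool pgalloc_tag_sub(unsigned long pfn)
{
	bool warn;

	assert(pfn < NR_PAGES);
	if (!page_ext_ready)
		return false;
	warn = (tag_ref[pfn] == NULL);
	tag_ref[pfn] = NULL;
	return warn;
}
```

After `clear_early_alloc_pfn_tag_refs()` runs, `pgalloc_tag_add()` can no longer reach init-only code, which is what would let `early_pfns` become `__initdata`. The sketch keeps the patch's fixed-size array, so pfns beyond `EARLY_ALLOC_PFN_MAX` are still dropped and would still warn when freed.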
* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization 2026-03-25 11:20 ` Hao Ge @ 2026-03-25 15:17 ` Suren Baghdasaryan 2026-03-26 1:44 ` Hao Ge 0 siblings, 1 reply; 21+ messages in thread From: Suren Baghdasaryan @ 2026-03-25 15:17 UTC (permalink / raw) To: Hao Ge; +Cc: Andrew Morton, Kent Overstreet, linux-mm, linux-kernel On Wed, Mar 25, 2026 at 4:21 AM Hao Ge <hao.ge@linux.dev> wrote: > > > On 2026/3/25 15:35, Suren Baghdasaryan wrote: > > On Tue, Mar 24, 2026 at 11:25 PM Suren Baghdasaryan <surenb@google.com> wrote: > >> On Tue, Mar 24, 2026 at 7:08 PM Hao Ge <hao.ge@linux.dev> wrote: > >>> > >>> On 2026/3/25 08:21, Suren Baghdasaryan wrote: > >>>> On Tue, Mar 24, 2026 at 2:43 AM Hao Ge <hao.ge@linux.dev> wrote: > >>>>> On 2026/3/24 06:47, Suren Baghdasaryan wrote: > >>>>>> On Mon, Mar 23, 2026 at 2:16 AM Hao Ge <hao.ge@linux.dev> wrote: > >>>>>>> On 2026/3/20 10:14, Suren Baghdasaryan wrote: > >>>>>>>> On Thu, Mar 19, 2026 at 6:58 PM Hao Ge <hao.ge@linux.dev> wrote: > >>>>>>>>> On 2026/3/20 07:48, Suren Baghdasaryan wrote: > >>>>>>>>>> On Thu, Mar 19, 2026 at 4:44 PM Suren Baghdasaryan <surenb@google.com> wrote: > >>>>>>>>>>> On Thu, Mar 19, 2026 at 3:28 PM Andrew Morton <akpm@linux-foundation.org> wrote: > >>>>>>>>>>>> On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Due to initialization ordering, page_ext is allocated and initialized > >>>>>>>>>>>>> relatively late during boot. Some pages have already been allocated > >>>>>>>>>>>>> and freed before page_ext becomes available, leaving their codetag > >>>>>>>>>>>>> uninitialized. > >>>>>>>>>>> Hi Hao, > >>>>>>>>>>> Thanks for the report. > >>>>>>>>>>> Hmm. So, we are allocating pages before page_ext is initialized... > >>>>>>>>>>> > >>>>>>>>>>>>> A clear example is in init_section_page_ext(): alloc_page_ext() calls > >>>>>>>>>>>>> kmemleak_alloc(). > >>>>>>>>>> Forgot to ask. 
The example you are using here is for page_ext > >>>>>>>>>> allocation itself. Do you have any other examples where page > >>>>>>>>>> allocation happens before page_ext initialization? If that's the only > >>>>>>>>>> place, then we might be able to fix this in a simpler way by doing > >>>>>>>>>> something special for alloc_page_ext(). > >>>>>>>>> Hi Suren > >>>>>>>>> > >>>>>>>>> To help illustrate the point, here's the debug log I added: > >>>>>>>>> > >>>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >>>>>>>>> index 2d4b6f1a554e..ebfe636f5b07 100644 > >>>>>>>>> --- a/mm/page_alloc.c > >>>>>>>>> +++ b/mm/page_alloc.c > >>>>>>>>> @@ -1293,6 +1293,9 @@ void __pgalloc_tag_add(struct page *page, struct > >>>>>>>>> task_struct *task, > >>>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > >>>>>>>>> update_page_tag_ref(handle, &ref); > >>>>>>>>> put_page_tag_ref(handle); > >>>>>>>>> + } else { > >>>>>>>>> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > >>>>>>>>> + dump_stack(); > >>>>>>>>> } > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> And I caught the following logs: > >>>>>>>>> > >>>>>>>>> [ 0.296399] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea000400c700 pfn=1049372 nr=1 > >>>>>>>>> [ 0.296400] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >>>>>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > >>>>>>>>> [ 0.296402] Hardware name: Red Hat KVM, BIOS > >>>>>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>>>>>>> [ 0.296402] Call Trace: > >>>>>>>>> [ 0.296403] <TASK> > >>>>>>>>> [ 0.296403] dump_stack_lvl+0x53/0x70 > >>>>>>>>> [ 0.296405] __pgalloc_tag_add+0x3a3/0x6e0 > >>>>>>>>> [ 0.296406] ? __pfx___pgalloc_tag_add+0x10/0x10 > >>>>>>>>> [ 0.296407] ? kasan_unpoison+0x27/0x60 > >>>>>>>>> [ 0.296409] ? 
__kasan_unpoison_pages+0x2c/0x40 > >>>>>>>>> [ 0.296411] get_page_from_freelist+0xa54/0x1310 > >>>>>>>>> [ 0.296413] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>>>>>>> [ 0.296415] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > >>>>>>>>> [ 0.296417] ? stack_depot_save_flags+0x3f/0x680 > >>>>>>>>> [ 0.296418] ? ___slab_alloc+0x518/0x530 > >>>>>>>>> [ 0.296420] alloc_pages_mpol+0x13a/0x3f0 > >>>>>>>>> [ 0.296421] ? __pfx_alloc_pages_mpol+0x10/0x10 > >>>>>>>>> [ 0.296423] ? _raw_spin_lock_irqsave+0x8a/0xf0 > >>>>>>>>> [ 0.296424] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 > >>>>>>>>> [ 0.296426] alloc_slab_page+0xc2/0x130 > >>>>>>>>> [ 0.296427] allocate_slab+0x77/0x2c0 > >>>>>>>>> [ 0.296429] ? syscall_enter_define_fields+0x3bb/0x5f0 > >>>>>>>>> [ 0.296430] ___slab_alloc+0x125/0x530 > >>>>>>>>> [ 0.296432] ? __trace_define_field+0x252/0x3d0 > >>>>>>>>> [ 0.296433] __kmalloc_noprof+0x329/0x630 > >>>>>>>>> [ 0.296435] ? syscall_enter_define_fields+0x3bb/0x5f0 > >>>>>>>>> [ 0.296436] syscall_enter_define_fields+0x3bb/0x5f0 > >>>>>>>>> [ 0.296438] ? __pfx_syscall_enter_define_fields+0x10/0x10 > >>>>>>>>> [ 0.296440] event_define_fields+0x326/0x540 > >>>>>>>>> [ 0.296441] __trace_early_add_events+0xac/0x3c0 > >>>>>>>>> [ 0.296443] trace_event_init+0x24c/0x460 > >>>>>>>>> [ 0.296445] trace_init+0x9/0x20 > >>>>>>>>> [ 0.296446] start_kernel+0x199/0x3c0 > >>>>>>>>> [ 0.296448] x86_64_start_reservations+0x18/0x30 > >>>>>>>>> [ 0.296449] x86_64_start_kernel+0xe2/0xf0 > >>>>>>>>> [ 0.296451] common_startup_64+0x13e/0x141 > >>>>>>>>> [ 0.296453] </TASK> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> [ 0.312234] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>>>> page=ffffea000400f900 pfn=1049572 nr=1 > >>>>>>>>> [ 0.312234] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >>>>>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > >>>>>>>>> [ 0.312236] Hardware name: Red Hat KVM, BIOS > >>>>>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>>>>>>> [ 0.312236] Call Trace: > >>>>>>>>> [ 0.312237] <TASK> > >>>>>>>>> [ 0.312237] dump_stack_lvl+0x53/0x70 > >>>>>>>>> [ 0.312239] __pgalloc_tag_add+0x3a3/0x6e0 > >>>>>>>>> [ 0.312240] ? __pfx___pgalloc_tag_add+0x10/0x10 > >>>>>>>>> [ 0.312241] ? rmqueue.constprop.0+0x4fc/0x1ce0 > >>>>>>>>> [ 0.312243] ? kasan_unpoison+0x27/0x60 > >>>>>>>>> [ 0.312244] ? __kasan_unpoison_pages+0x2c/0x40 > >>>>>>>>> [ 0.312246] get_page_from_freelist+0xa54/0x1310 > >>>>>>>>> [ 0.312248] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>>>>>>> [ 0.312250] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > >>>>>>>>> [ 0.312253] alloc_slab_page+0x39/0x130 > >>>>>>>>> [ 0.312254] allocate_slab+0x77/0x2c0 > >>>>>>>>> [ 0.312255] ? alloc_cpumask_var_node+0xc7/0x230 > >>>>>>>>> [ 0.312257] ___slab_alloc+0x46d/0x530 > >>>>>>>>> [ 0.312259] __kmalloc_node_noprof+0x2fa/0x680 > >>>>>>>>> [ 0.312261] ? alloc_cpumask_var_node+0xc7/0x230 > >>>>>>>>> [ 0.312263] alloc_cpumask_var_node+0xc7/0x230 > >>>>>>>>> [ 0.312264] init_desc+0x141/0x6b0 > >>>>>>>>> [ 0.312266] alloc_desc+0x108/0x1b0 > >>>>>>>>> [ 0.312267] early_irq_init+0xee/0x1c0 > >>>>>>>>> [ 0.312268] ? __pfx_early_irq_init+0x10/0x10 > >>>>>>>>> [ 0.312271] start_kernel+0x1ab/0x3c0 > >>>>>>>>> [ 0.312272] x86_64_start_reservations+0x18/0x30 > >>>>>>>>> [ 0.312274] x86_64_start_kernel+0xe2/0xf0 > >>>>>>>>> [ 0.312275] common_startup_64+0x13e/0x141 > >>>>>>>>> [ 0.312277] </TASK> > >>>>>>>>> > >>>>>>>>> [ 0.312834] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>>>> page=ffffea000400fc00 pfn=1049584 nr=1 > >>>>>>>>> [ 0.312835] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >>>>>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > >>>>>>>>> [ 0.312836] Hardware name: Red Hat KVM, BIOS > >>>>>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>>>>>>> [ 0.312837] Call Trace: > >>>>>>>>> [ 0.312837] <TASK> > >>>>>>>>> [ 0.312838] dump_stack_lvl+0x53/0x70 > >>>>>>>>> [ 0.312840] __pgalloc_tag_add+0x3a3/0x6e0 > >>>>>>>>> [ 0.312841] ? __pfx___pgalloc_tag_add+0x10/0x10 > >>>>>>>>> [ 0.312842] ? rmqueue.constprop.0+0x4fc/0x1ce0 > >>>>>>>>> [ 0.312844] ? kasan_unpoison+0x27/0x60 > >>>>>>>>> [ 0.312845] ? __kasan_unpoison_pages+0x2c/0x40 > >>>>>>>>> [ 0.312847] get_page_from_freelist+0xa54/0x1310 > >>>>>>>>> [ 0.312849] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>>>>>>> [ 0.312851] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > >>>>>>>>> [ 0.312853] alloc_pages_mpol+0x13a/0x3f0 > >>>>>>>>> [ 0.312855] ? __pfx_alloc_pages_mpol+0x10/0x10 > >>>>>>>>> [ 0.312856] ? xas_find+0x2d8/0x450 > >>>>>>>>> [ 0.312858] ? _raw_spin_lock+0x84/0xe0 > >>>>>>>>> [ 0.312859] ? __pfx__raw_spin_lock+0x10/0x10 > >>>>>>>>> [ 0.312861] alloc_pages_noprof+0xf6/0x2b0 > >>>>>>>>> [ 0.312862] __change_page_attr+0x293/0x850 > >>>>>>>>> [ 0.312864] ? __pfx___change_page_attr+0x10/0x10 > >>>>>>>>> [ 0.312865] ? _vm_unmap_aliases+0x2d0/0x650 > >>>>>>>>> [ 0.312868] ? __pfx__vm_unmap_aliases+0x10/0x10 > >>>>>>>>> [ 0.312869] __change_page_attr_set_clr+0x16c/0x360 > >>>>>>>>> [ 0.312871] ? spp_getpage+0xbb/0x1e0 > >>>>>>>>> [ 0.312872] change_page_attr_set_clr+0x220/0x3c0 > >>>>>>>>> [ 0.312873] ? flush_tlb_one_kernel+0xf/0x30 > >>>>>>>>> [ 0.312875] ? set_pte_vaddr_p4d+0x110/0x180 > >>>>>>>>> [ 0.312877] ? __pfx_change_page_attr_set_clr+0x10/0x10 > >>>>>>>>> [ 0.312878] ? __pfx_set_pte_vaddr_p4d+0x10/0x10 > >>>>>>>>> [ 0.312881] ? __pfx_mtree_load+0x10/0x10 > >>>>>>>>> [ 0.312883] ? __pfx_mtree_load+0x10/0x10 > >>>>>>>>> [ 0.312884] ? 
__asan_memcpy+0x3c/0x60 > >>>>>>>>> [ 0.312886] ? set_intr_gate+0x10c/0x150 > >>>>>>>>> [ 0.312888] set_memory_ro+0x76/0xa0 > >>>>>>>>> [ 0.312889] ? __pfx_set_memory_ro+0x10/0x10 > >>>>>>>>> [ 0.312891] idt_setup_apic_and_irq_gates+0x2c1/0x390 > >>>>>>>>> > >>>>>>>>> and more. > >>>>>>>> Ok, it's not the only place. Got your point. > >>>>>>>> > >>>>>>>>> off topic - if we were to handle only alloc_page_ext() specifically, > >>>>>>>>> what would be the most straightforward > >>>>>>>>> > >>>>>>>>> solution in your mind? I'd really appreciate your insight. > >>>>>>>> I was thinking if it's the only special case maybe we can handle it > >>>>>>>> somehow differently, like we do when we allocate obj_ext vectors for > >>>>>>>> slabs using __GFP_NO_OBJ_EXT. I haven't found a good solution yet but > >>>>>>>> since it's not a special case we would not be able to use it even if I > >>>>>>>> came up with something... > >>>>>>>> I think your way is the most straight-forward but please try my > >>>>>>>> suggestion to see if we can avoid extra overhead. > >>>>>>>> Thanks, > >>>>>>>> Suren. > >>> Hi Suren > >>>>> Hi Suren > >>>>> > >>>>> > >>>>>> Hi Hao, > >>>>>> > >>>>>>> Hi Suren > >>>>>>> > >>>>>>> Thank you for your feedback. After re-examining this issue, > >>>>>>> > >>>>>>> I realize my previous focus was misplaced. > >>>>>>> > >>>>>>> Upon deeper consideration, I understand that this is not merely a bug, > >>>>>>> > >>>>>>> but rather a warning that indicates a gap in our memory profiling mechanism. > >>>>>>> > >>>>>>> Specifically, the current implementation appears to be missing memory > >>>>>>> allocation > >>>>>>> > >>>>>>> tracking during the period between the buddy system allocation and page_ext > >>>>>>> > >>>>>>> initialization. > >>>>>>> > >>>>>>> This profiling gap means we may not be capturing all relevant memory > >>>>>>> allocation > >>>>>>> > >>>>>>> events during this critical transition phase. 
> >>>>>> Correct, this limitation exists because memory profiling relies on > >>>>>> some kernel facilities (page_ext, obj_ext) which might not be > >>>>>> initialized yet at the time of allocation. > >>>>>> > >>>>>>> My approach is to dynamically allocate codetag_ref when get_page_tag_ref > >>>>>>> fails, > >>>>>>> > >>>>>>> and maintain a linked list to track all buddy system allocations that > >>>>>>> occur prior to page_ext initialization. > >>>>>>> > >>>>>>> However, this introduces performance concerns: > >>>>>>> > >>>>>>> 1. Free Path Overhead: When freeing these pages, we would need to > >>>>>>> traverse the entire linked list to locate > >>>>>>> > >>>>>>> the corresponding codetag_ref, resulting in O(n) lookup complexity > >>>>>>> per free operation. > >>>>>>> > >>>>>>> 2. Initialization Overhead: During init_page_alloc_tagging, iterating > >>>>>>> through the linked list to assign codetag_ref to > >>>>>>> > >>>>>>> page_ext would introduce additional traversal cost. > >>>>>>> > >>>>>>> If the number of pages is substantial, this could incur significant > >>>>>>> overhead. What are your thoughts on this? I look forward to your > >>>>>>> suggestions. > >>>>>> My thinking is that these early allocations comprise a small portion > >>>>>> of overall memory consumed by the system. So, instead of trying to > >>>>>> record and handle them in some alternative way, we just accept that > >>>>>> some counters might not be exactly accurate and ignore those early > >>>>>> allocations. See how the early slab allocations are marked with the > >>>>>> CODETAG_FLAG_INACCURATE flag and later reported as inaccurate. I think > >>>>>> that's an acceptable alternative to introducing extra complexity and > >>>>>> performance overhead. IOW, the benefits of accounting for these early > >>>>>> allocations are low compared to the effort required to account for > >>>>>> them. Unless you found a simple and performant way to do that...
> >>>>> I have been exploring possible solutions to this issue over the past few > >>>>> days, > >>>>> > >>>>> but so far I have not come up with a good approach. > >>>>> > >>>>> I have counted the number of memory allocations that occur earlier than the > >>>>> > >>>>> allocation and initialization of our page_ext, and found that there are > >>>>> actually > >>>>> > >>>>> quite a lot of them. > >>>> Interesting... I wonder if it's because deferred_struct_pages defers > >>>> page_ext initialization. Can you check if setting early_page_ext > >>>> reduces or eliminates these allocations before page_ext init cases? > >>> Yes, you are correct. In my 8-core 16GB virtual machine, I used a global > >>> counter > >>> > >>> to record these allocations. With early_page_ext enabled, there were 130 > >>> allocations > >>> > >>> before page_ext initialization. Without early_page_ext, there were 802 > >>> allocations > >>> > >>> before page_ext initialization. > >>> > >>> > >>>>> Similarly, I have made the following changes and collected the > >>>>> corresponding logs. > >>>>> > >>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >>>>> index 2d4b6f1a554e..6db65b3d52d3 100644 > >>>>> --- a/mm/page_alloc.c > >>>>> +++ b/mm/page_alloc.c > >>>>> @@ -1293,6 +1293,8 @@ void __pgalloc_tag_add(struct page *page, struct > >>>>> task_struct *task, > >>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > >>>>> update_page_tag_ref(handle, &ref); > >>>>> put_page_tag_ref(handle); > >>>>> + } else{ > >>>>> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > >>>>> } > >>>>> } > >>>>> > >>>>> @@ -1314,6 +1316,8 @@ void __pgalloc_tag_sub(struct page *page, unsigned > >>>>> int nr) > >>>>> alloc_tag_sub(&ref, PAGE_SIZE * nr); > >>>>> update_page_tag_ref(handle, &ref); > >>>>> put_page_tag_ref(handle); > >>>>> + } else{ > >>>>> + pr_warn("__pgalloc_tag_sub: get_page_tag_ref failed!
> >>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > >>>>> } > >>>>> } > >>>>> > >>>>> [ 0.261699] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001000 pfn=1048640 nr=2 > >>>>> [ 0.261711] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001100 pfn=1048644 nr=4 > >>>>> [ 0.261717] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001200 pfn=1048648 nr=4 > >>>>> [ 0.261721] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001300 pfn=1048652 nr=4 > >>>>> [ 0.261893] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001080 pfn=1048642 nr=2 > >>>>> [ 0.261917] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001400 pfn=1048656 nr=4 > >>>>> [ 0.262018] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001500 pfn=1048660 nr=2 > >>>>> [ 0.262024] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001600 pfn=1048664 nr=8 > >>>>> [ 0.262040] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001580 pfn=1048662 nr=1 > >>>>> [ 0.262048] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea00040015c0 pfn=1048663 nr=1 > >>>>> [ 0.262056] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001800 pfn=1048672 nr=2 > >>>>> [ 0.262064] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001880 pfn=1048674 nr=2 > >>>>> [ 0.262078] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001900 pfn=1048676 nr=2 > >>>>> [ 0.262196] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 > >>>>> [ 0.262213] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001980 pfn=1048678 nr=2 > >>>>> [ 0.262220] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001a00 pfn=1048680 nr=4 > >>>>> [ 0.262246] ODEBUG: selftest passed > >>>>> [ 0.262268] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>> page=ffffea0004001b00 pfn=1048684 nr=1 > >>>>> [ 0.262318] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001b40 pfn=1048685 nr=1 > >>>>> [ 0.262368] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001b80 pfn=1048686 nr=1 > >>>>> [ 0.262418] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001bc0 pfn=1048687 nr=1 > >>>>> [ 0.262469] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001c00 pfn=1048688 nr=1 > >>>>> [ 0.262519] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001c40 pfn=1048689 nr=1 > >>>>> [ 0.262569] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001c80 pfn=1048690 nr=1 > >>>>> [ 0.262620] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001cc0 pfn=1048691 nr=1 > >>>>> [ 0.262670] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001d00 pfn=1048692 nr=1 > >>>>> [ 0.262721] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001d40 pfn=1048693 nr=1 > >>>>> [ 0.262771] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001d80 pfn=1048694 nr=1 > >>>>> [ 0.262821] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001dc0 pfn=1048695 nr=1 > >>>>> [ 0.262871] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001e00 pfn=1048696 nr=1 > >>>>> [ 0.262923] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001e40 pfn=1048697 nr=1 > >>>>> [ 0.262974] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001e80 pfn=1048698 nr=1 > >>>>> [ 0.263024] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001ec0 pfn=1048699 nr=1 > >>>>> [ 0.263074] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001f00 pfn=1048700 nr=1 > >>>>> [ 0.263124] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001f40 pfn=1048701 nr=1 > >>>>> [ 0.263174] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>> page=ffffea0004001f80 pfn=1048702 nr=1 > >>>>> [ 0.263224] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004001fc0 pfn=1048703 nr=1 > >>>>> [ 0.263275] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004002000 pfn=1048704 nr=1 > >>>>> [ 0.263325] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004002040 pfn=1048705 nr=1 > >>>>> [ 0.263375] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004002080 pfn=1048706 nr=1 > >>>>> [ 0.263427] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004002400 pfn=1048720 nr=16 > >>>>> [ 0.263437] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea00040020c0 pfn=1048707 nr=1 > >>>>> [ 0.263463] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004002100 pfn=1048708 nr=1 > >>>>> [ 0.263465] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004002140 pfn=1048709 nr=1 > >>>>> [ 0.263467] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004002180 pfn=1048710 nr=1 > >>>>> [ 0.263509] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004002200 pfn=1048712 nr=4 > >>>>> [ 0.263512] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004002800 pfn=1048736 nr=8 > >>>>> [ 0.263524] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea00040021c0 pfn=1048711 nr=1 > >>>>> [ 0.263536] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004002300 pfn=1048716 nr=1 > >>>>> [ 0.263537] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004002340 pfn=1048717 nr=1 > >>>>> [ 0.263539] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004002380 pfn=1048718 nr=1 > >>>>> [ 0.263604] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004004000 pfn=1048832 nr=128 > >>>>> [ 0.263638] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>> page=ffffea0004003000 pfn=1048768 nr=64 > >>>>> [ 0.263650] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004002c00 pfn=1048752 nr=16 > >>>>> [ 0.263655] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea00040023c0 pfn=1048719 nr=1 > >>>>> [ 0.270582] __pgalloc_tag_sub: get_page_tag_ref failed! > >>>>> page=ffffea00040023c0 pfn=1048719 nr=1 > >>>>> [ 0.270591] ftrace: allocating 52717 entries in 208 pages > >>>>> [ 0.270592] ftrace: allocated 208 pages with 3 groups > >>>>> [ 0.270620] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004002a00 pfn=1048744 nr=8 > >>>>> [ 0.270636] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea00040023c0 pfn=1048719 nr=1 > >>>>> [ 0.270643] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006000 pfn=1048960 nr=1 > >>>>> [ 0.270649] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006040 pfn=1048961 nr=1 > >>>>> [ 0.270658] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004007000 pfn=1049024 nr=64 > >>>>> [ 0.270659] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006080 pfn=1048962 nr=2 > >>>>> [ 0.270722] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006100 pfn=1048964 nr=1 > >>>>> [ 0.270730] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006140 pfn=1048965 nr=1 > >>>>> [ 0.270738] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006180 pfn=1048966 nr=1 > >>>>> [ 0.270777] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea00040061c0 pfn=1048967 nr=1 > >>>>> [ 0.270786] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006200 pfn=1048968 nr=1 > >>>>> [ 0.270792] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006240 pfn=1048969 nr=1 > >>>>> [ 0.270833] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>> page=ffffea0004006300 pfn=1048972 nr=4 > >>>>> [ 0.270891] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006280 pfn=1048970 nr=1 > >>>>> [ 0.270980] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea00040062c0 pfn=1048971 nr=1 > >>>>> [ 0.271071] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006400 pfn=1048976 nr=1 > >>>>> [ 0.271156] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006440 pfn=1048977 nr=1 > >>>>> [ 0.271185] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006480 pfn=1048978 nr=2 > >>>>> [ 0.271301] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006500 pfn=1048980 nr=1 > >>>>> [ 0.271655] Dynamic Preempt: lazy > >>>>> [ 0.271662] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006580 pfn=1048982 nr=2 > >>>>> [ 0.271752] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006600 pfn=1048984 nr=4 > >>>>> [ 0.271762] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004010000 pfn=1049600 nr=4 > >>>>> [ 0.271824] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006540 pfn=1048981 nr=1 > >>>>> [ 0.271916] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006700 pfn=1048988 nr=2 > >>>>> [ 0.271964] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006780 pfn=1048990 nr=1 > >>>>> [ 0.272099] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea00040067c0 pfn=1048991 nr=1 > >>>>> [ 0.272138] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006800 pfn=1048992 nr=2 > >>>>> [ 0.272144] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006a00 pfn=1049000 nr=8 > >>>>> [ 0.272249] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006c00 pfn=1049008 nr=8 > >>>>> [ 0.272319] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>> page=ffffea0004006880 pfn=1048994 nr=2 > >>>>> [ 0.272351] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006900 pfn=1048996 nr=4 > >>>>> [ 0.272424] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004006e00 pfn=1049016 nr=8 > >>>>> [ 0.272485] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004008000 pfn=1049088 nr=8 > >>>>> [ 0.272535] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004008200 pfn=1049096 nr=2 > >>>>> [ 0.272600] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004008400 pfn=1049104 nr=8 > >>>>> [ 0.272663] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004008300 pfn=1049100 nr=4 > >>>>> [ 0.272694] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004008280 pfn=1049098 nr=2 > >>>>> [ 0.272708] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004008600 pfn=1049112 nr=8 > >>>>> > >>>>> [ 0.272924] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004008880 pfn=1049122 nr=2 > >>>>> [ 0.272934] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004008900 pfn=1049124 nr=2 > >>>>> [ 0.272952] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004008c00 pfn=1049136 nr=4 > >>>>> [ 0.273035] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004008980 pfn=1049126 nr=2 > >>>>> [ 0.273062] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004008e00 pfn=1049144 nr=8 > >>>>> [ 0.273674] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004008d00 pfn=1049140 nr=1 > >>>>> [ 0.273884] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004008d80 pfn=1049142 nr=2 > >>>>> [ 0.273943] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009000 pfn=1049152 nr=2 > >>>>> [ 0.274379] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>> page=ffffea0004009080 pfn=1049154 nr=2 > >>>>> [ 0.274575] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009200 pfn=1049160 nr=8 > >>>>> [ 0.274617] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009100 pfn=1049156 nr=4 > >>>>> [ 0.274794] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009400 pfn=1049168 nr=2 > >>>>> [ 0.274840] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009480 pfn=1049170 nr=2 > >>>>> [ 0.275057] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009500 pfn=1049172 nr=2 > >>>>> [ 0.275092] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009580 pfn=1049174 nr=2 > >>>>> [ 0.275134] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009600 pfn=1049176 nr=8 > >>>>> [ 0.275211] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009800 pfn=1049184 nr=4 > >>>>> [ 0.275510] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009900 pfn=1049188 nr=2 > >>>>> [ 0.275548] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009980 pfn=1049190 nr=2 > >>>>> [ 0.275976] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009a00 pfn=1049192 nr=8 > >>>>> [ 0.275987] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009c00 pfn=1049200 nr=2 > >>>>> [ 0.276139] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009c80 pfn=1049202 nr=2 > >>>>> [ 0.276152] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004008d40 pfn=1049141 nr=1 > >>>>> [ 0.276242] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009d00 pfn=1049204 nr=1 > >>>>> [ 0.276358] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009d40 pfn=1049205 nr=1 > >>>>> [ 0.276444] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009d80 pfn=1049206 nr=1 > >>>>> [ 0.276526] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>> page=ffffea0004009dc0 pfn=1049207 nr=1 > >>>>> [ 0.276615] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009e00 pfn=1049208 nr=1 > >>>>> [ 0.276696] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009e40 pfn=1049209 nr=1 > >>>>> [ 0.276792] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009e80 pfn=1049210 nr=1 > >>>>> [ 0.276827] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009f00 pfn=1049212 nr=2 > >>>>> [ 0.276891] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009ec0 pfn=1049211 nr=1 > >>>>> [ 0.276999] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009f80 pfn=1049214 nr=1 > >>>>> [ 0.277082] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea0004009fc0 pfn=1049215 nr=1 > >>>>> [ 0.277172] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea000400a000 pfn=1049216 nr=1 > >>>>> [ 0.277257] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>> page=ffffea000400a040 pfn=1049217 nr=1 > >>>>> > >>>>> and so on. > >>>>> > >>>>> > >>>>>> I think your earlier patch can effectively detect these early > >>>>>> allocations and suppress the warnings. We should also mark these > >>>>>> allocations with CODETAG_FLAG_INACCURATE. > >>>>> Thanks to an excellent AI review, I realized there are issues with > >>>>> > >>>>> my original patch. One problem is the 256-element array; another > >>>> Yes, if there are lots of such allocations, it's not appropriate. > >>>> > >>>>> is that it involves allocation and free operations — meaning we need > >>>>> > >>>>> to record entries at __pgalloc_tag_add and remove them at __pgalloc_tag_sub, > >>>>> > >>>>> which introduces a noticeable overhead. I'm wondering if we can instead > >>>>> set a flag > >>>>> > >>>>> bit in page flags during the early boot stage, which I'll refer to as > >>>>> EARLY_ALLOC_FLAGS. > >>>>> > >>>>> Then, in __pgalloc_tag_sub, we first check for EARLY_ALLOC_FLAGS. 
If > >>>>> set, we clear the > >>>>> > >>>>> flag and return immediately; otherwise, we perform the actual > >>>>> subtraction of the tag count. > >>>>> > >>>>> This approach seems somewhat similar to the idea behind > >>>>> mem_profiling_compressed. > >>>> That seems doable but let's first check if we can make page_ext > >>>> initialization happen before these allocations. That would be the > >>>> ideal path. If it's not possible then we can focus on alternatives > >>>> like the one you propose. > >>> > >>> Yes, the ideal scenario would be to have page_ext initialization > >>> complete before > >>> > >>> these allocations occur. I just did a code walkthrough and found that > >>> this resembles > >>> > >>> the FLATMEM implementation approach - FLATMEM allocates page_ext before > >>> the buddy > >>> > >>> system initialization, so it doesn't seem to encounter the issue we're > >>> facing now. > >>> > >>> https://elixir.bootlin.com/linux/v7.0-rc5/source/mm/mm_init.c#L2707 > >> Yes, page_ext_init_flatmem() looks like an interesting option and it > >> would not work with sparsemem. TBH I would prefer to find a simple > >> solution that can identify early init allocations, mark them inaccurate > >> and suppress the warning rather than introduce some complex mechanism > >> to account for them which would work only in some cases (flatmem). > >> With your original approach I think the only real issue is the size of > >> the array that might be too small. The other issue you mentioned about > >> allocated page being freed and then re-allocated after page_ext is > >> initialized but before clear_page_tag_ref() is called is not really a > >> problem. Yes, we will lose that counter's value but it's similar to > >> other early allocations which we just treat as inaccurate. We can also > >> minimize the possibility of this happening by moving > >> clear_page_tag_ref() into init_page_alloc_tagging().
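To make the trade-off concrete, here is a minimal userspace model of the EARLY_ALLOC_FLAGS idea proposed above. All names (`flag_page`, `TOY_EARLY_ALLOC`, `flag_tag_add()`, `flag_tag_sub()`) are hypothetical, a plain bool stands in for page_ext, and no real page-flag bit or kernel API is implied:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit standing in for the proposed EARLY_ALLOC_FLAGS. */
#define TOY_EARLY_ALLOC (1UL << 0)

/* Toy page: flags plus a stand-in for the codetag ref kept in page_ext. */
struct flag_page {
	unsigned long flags;
	bool tagged;
};

static bool toy_page_ext_ready;	/* models "page_ext is initialized" */
static long toy_tag_bytes;	/* models one allocation site's counter */

static void flag_tag_add(struct flag_page *page, long bytes)
{
	if (!toy_page_ext_ready) {
		/* Nowhere to store a codetag yet: just mark the page. */
		page->flags |= TOY_EARLY_ALLOC;
		return;
	}
	page->tagged = true;
	toy_tag_bytes += bytes;
}

static void flag_tag_sub(struct flag_page *page, long bytes)
{
	/* This extra test is paid on every free, even long after boot. */
	if (page->flags & TOY_EARLY_ALLOC) {
		page->flags &= ~TOY_EARLY_ALLOC;
		return;	/* never counted, so nothing to subtract */
	}
	if (!page->tagged)
		return;	/* where a debug build would warn "alloc_tag was not set" */
	page->tagged = false;
	toy_tag_bytes -= bytes;
}
```

The first branch of `flag_tag_sub()` is the extra free-path condition discussed in the thread: it is evaluated on every free, although it can only fire for pages allocated before page_ext existed.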
> >> > >> I don't like the pageflag option you mentioned because it adds an > >> extra condition check into __pgalloc_tag_sub() which will be executed > >> even after the init stage is over. > >> I'll look into this some more tomorrow as it's quite late now. > > > Hi Suren > > > > Just thought of something. Are all these pages allocated by slab? If > > so, I think slab does not use page->lru (need to double-check) and we > > could add all these pages allocated during early init into a list and > > then set their page_ext reference to CODETAG_EMPTY in > > init_page_alloc_tagging(). > > Got your point. > > > There will indeed be some non-SLAB memory allocations here, such as the > following: > > > CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) > [ 0.326607] Hardware name: Red Hat KVM, BIOS > rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > [ 0.326608] Call Trace: > [ 0.326608] <TASK> > [ 0.326609] dump_stack_lvl+0x53/0x70 > [ 0.326611] __pgalloc_tag_add+0x407/0x700 > [ 0.326616] get_page_from_freelist+0xa54/0x1310 > [ 0.326618] __alloc_frozen_pages_noprof+0x206/0x4c0 > [ 0.326623] alloc_pages_mpol+0x13a/0x3f0 > [ 0.326627] alloc_pages_noprof+0xf6/0x2b0 > [ 0.326628] __pmd_alloc+0x743/0x9c0 > [ 0.326630] vmap_range_noflush+0xac0/0x10a0 > [ 0.326637] ioremap_page_range+0x17c/0x250 > [ 0.326639] __ioremap_caller+0x437/0x5c0 > [ 0.326645] acpi_os_map_iomem+0x4c0/0x660 > [ 0.326647] acpi_tb_verify_temp_table+0x1c0/0x580 > [ 0.326649] acpi_reallocate_root_table+0x2ad/0x460 > [ 0.326655] acpi_early_init+0x111/0x460 > [ 0.326657] start_kernel+0x271/0x3c0 > [ 0.326659] x86_64_start_reservations+0x18/0x30 > [ 0.326660] x86_64_start_kernel+0xe2/0xf0 > [ 0.326662] common_startup_64+0x13e/0x141 > [ 0.326663] </TASK> > > CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted > 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) > [ 0.329167] Hardware name: Red Hat KVM, BIOS > rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org
04/01/2014 > [ 0.329167] Call Trace: > [ 0.329167] <TASK> > [ 0.329167] dump_stack_lvl+0x53/0x70 > [ 0.329167] __pgalloc_tag_add+0x407/0x700 > [ 0.329167] get_page_from_freelist+0xa54/0x1310 > [ 0.329167] __alloc_frozen_pages_noprof+0x206/0x4c0 > [ 0.329167] __alloc_pages_noprof+0x10/0x1b0 > [ 0.329167] dup_task_struct+0x163/0x8c0 > [ 0.329167] copy_process+0x390/0x4a70 > [ 0.329167] kernel_clone+0xe1/0x830 > [ 0.329167] kernel_thread+0xcb/0x110 > [ 0.329167] kthreadd+0x8a2/0xc60 > [ 0.329167] ret_from_fork+0x551/0x720 > [ 0.329167] ret_from_fork_asm+0x1a/0x30 > [ 0.329167] </TASK> > > CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted > 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) > [ 0.329167] Hardware name: Red Hat KVM, BIOS > rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > [ 0.329167] Call Trace: > [ 0.329167] <TASK> > [ 0.329167] dump_stack_lvl+0x53/0x70 > [ 0.329167] __pgalloc_tag_add+0x407/0x700 > [ 0.329167] get_page_from_freelist+0xa54/0x1310 > [ 0.329167] __alloc_frozen_pages_noprof+0x206/0x4c0 > [ 0.329167] __alloc_pages_noprof+0x10/0x1b0 > [ 0.329167] dup_task_struct+0x163/0x8c0 > [ 0.329167] copy_process+0x390/0x4a70 > [ 0.329167] kernel_clone+0xe1/0x830 > [ 0.329167] kernel_thread+0xcb/0x110 > [ 0.329167] kthreadd+0x8a2/0xc60 > [ 0.329167] ret_from_fork+0x551/0x720 > [ 0.329167] ret_from_fork_asm+0x1a/0x30 > [ 0.329167] </TASK> > > CPU: 4 UID: 0 PID: 1 Comm: swapper/0 Not tainted > 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) > [ 0.434265] Hardware name: Red Hat KVM, BIOS > rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > [ 0.434266] Call Trace: > [ 0.434266] <TASK> > [ 0.434266] dump_stack_lvl+0x53/0x70 > [ 0.434268] __pgalloc_tag_add+0x407/0x700 > [ 0.434272] get_page_from_freelist+0xa54/0x1310 > [ 0.434274] __alloc_frozen_pages_noprof+0x206/0x4c0 > [ 0.434279] alloc_pages_exact_nid_noprof+0x10f/0x380 > [ 0.434283] init_section_page_ext+0x167/0x370 > [ 0.434284] page_ext_init+0x451/0x620 > [ 0.434287] 
page_alloc_init_late+0x553/0x630 > [ 0.434290] kernel_init_freeable+0x7be/0xd30 > [ 0.434294] kernel_init+0x1f/0x1f0 > [ 0.434295] ret_from_fork+0x551/0x720 > [ 0.434301] ret_from_fork_asm+0x1a/0x30 > [ 0.434303] </TASK> > > CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted > 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) > [ 0.346712] Hardware name: Red Hat KVM, BIOS > rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > [ 0.346713] Call Trace: > [ 0.346713] <TASK> > [ 0.346714] dump_stack_lvl+0x53/0x70 > [ 0.346715] __pgalloc_tag_add+0x407/0x700 > [ 0.346720] get_page_from_freelist+0xa54/0x1310 > [ 0.346723] __alloc_frozen_pages_noprof+0x206/0x4c0 > [ 0.346729] __alloc_pages_noprof+0x10/0x1b0 > [ 0.346731] alloc_cpu_data+0x96/0x210 > [ 0.346732] rb_allocate_cpu_buffer+0xb93/0x1500 > [ 0.346739] trace_rb_cpu_prepare+0x21a/0x4f0 > [ 0.346753] cpuhp_invoke_callback+0x6db/0x14b0 > [ 0.346755] __cpuhp_invoke_callback_range+0xde/0x1d0 > [ 0.346759] _cpu_up+0x395/0x880 > [ 0.346761] cpu_up+0x1bb/0x210 > [ 0.346762] cpuhp_bringup_mask+0xd2/0x150 > [ 0.346763] bringup_nonboot_cpus+0x12b/0x170 > [ 0.346764] smp_init+0x2f/0x100 > [ 0.346766] kernel_init_freeable+0x7a5/0xd30 > [ 0.346769] kernel_init+0x1f/0x1f0 > [ 0.346771] ret_from_fork+0x551/0x720 > [ 0.346776] ret_from_fork_asm+0x1a/0x30 > [ 0.346778] </TASK> > > and so on... > > > In fact, I previously conducted extensive and prolonged stress testing > > on memory profiling. After our efforts to address several WARN cases, > > one remaining scenario we are addressing is the warning triggered during > > early slab cache reclaim — which is precisely the situation we are currently > > encountering (although I cannot guarantee that all edge cases have been > > covered by our stress testing). During the stress testing process, this > warning > > did indeed manifest. However, the current environment triggers KASAN slab > > cache reclaim earlier than anticipated. 
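Suren's page->lru suggestion can be sketched as a userspace model: pages that miss page_ext are threaded onto a list through a borrowed link field, and the list is drained once page_ext exists, setting each ref to an "empty" value. Everything here (`lru_page`, `toy_record_early()`, `toy_mark_early_pages_empty()`, the enum standing in for CODETAG_EMPTY) is hypothetical, and it assumes the link field is free to borrow, which is exactly the open question:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive list in the style of the kernel's struct list_head. */
struct toy_list { struct toy_list *prev, *next; };

static void toy_list_init(struct toy_list *head)
{
	head->prev = head->next = head;
}

static void toy_list_add_tail(struct toy_list *node, struct toy_list *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void toy_list_del(struct toy_list *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = node->next = node;
}

enum toy_ref { TOY_REF_UNSET, TOY_REF_EMPTY };

/* Assumes the page's owner leaves ->lru alone (not true for every owner). */
struct lru_page {
	struct toy_list lru;
	enum toy_ref ref;	/* stand-in for the codetag ref in page_ext */
};

static struct toy_list toy_early_pages;

/* Called where get_page_tag_ref() fails before page_ext is ready. */
static void toy_record_early(struct lru_page *page)
{
	toy_list_add_tail(&page->lru, &toy_early_pages);
}

/* Analogue of the fix-up in init_page_alloc_tagging(); returns pages fixed. */
static int toy_mark_early_pages_empty(void)
{
	int n = 0;

	while (toy_early_pages.next != &toy_early_pages) {
		struct toy_list *node = toy_early_pages.next;
		struct lru_page *page = (struct lru_page *)
			((char *)node - offsetof(struct lru_page, lru));

		toy_list_del(node);
		page->ref = TOY_REF_EMPTY;	/* like setting CODETAG_EMPTY */
		n++;
	}
	return n;
}
```

This only works if every early owner leaves `->lru` untouched. The traces in this message include `__pmd_alloc`, `dup_task_struct` and `alloc_cpu_data` callers, so that assumption would need checking per user; the sketch also ignores pages freed before the fix-up, which would have to be unlinked on free.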
> > > Although the memory allocated prior to page_ext initialization has a > relatively low probability of > > being released in subsequent operations (at least we have not > encountered such cases up to now), > > I remain uncertain whether there are any overlooked edge cases when > considering only slab-backed pages. Ok, I guess a specialized solution for slab would not work then. I want to check on my side and understand how the number of these early allocations scales: is it higher for bigger machines, or does it stay constant? If the latter, I think your original simple solution with some fixups can still work. I'll need to instrument my code to capture these early allocations and see where they originate. If you have a patch already doing that, it would help speed it up for me. Thanks, Suren. > > > Thanks > Hao > > >> Thanks, > >> Suren. > >> > >>> However, I'm not entirely certain whether SPARSEMEM can guarantee the > >>> same behavior. > >>> > >>> > >>>>> I would appreciate your valuable feedback and any better suggestions you > >>>>> might have. > >>>> Thanks for pursuing this! I'll help in any way I can. > >>>> Suren. > >>> Thank you so much for your patient guidance and assistance. > >>> > >>> I truly appreciate your willingness to share your knowledge and insights. > >>> > >>> Thanks, > >>> Hao > >>> > >>>>> Thanks > >>>>> > >>>>> Hao > >>>>> > >>>>>> Thanks, > >>>>>> Suren. > >>>>>> > >>>>>>> Thanks > >>>>>>> > >>>>>>> Hao > >>>>>>> > >>>>>>>>> Thanks. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>>>>> If the slab cache has no free objects, it falls back > >>>>>>>>>>>>> to the buddy allocator to allocate memory. However, at this point page_ext > >>>>>>>>>>>>> is not yet fully initialized, so these newly allocated pages have no > >>>>>>>>>>>>> codetag set. These pages may later be reclaimed by KASAN, which causes > >>>>>>>>>>>>> the warning to trigger when they are freed because their codetag ref is > >>>>>>>>>>>>> still empty.
> >>>>>>>>>>>>> > >>>>>>>>>>>>> Use a global array to track pages allocated before page_ext is fully > >>>>>>>>>>>>> initialized, similar to how kmemleak tracks early allocations. > >>>>>>>>>>>>> When page_ext initialization completes, set their codetag > >>>>>>>>>>>>> to empty to avoid warnings when they are freed later. > >>>>>>>>>>>>> > >>>>>>>>>>>>> ... > >>>>>>>>>>>>> > >>>>>>>>>>>>> --- a/include/linux/alloc_tag.h > >>>>>>>>>>>>> +++ b/include/linux/alloc_tag.h > >>>>>>>>>>>>> @@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref) > >>>>>>>>>>>>> > >>>>>>>>>>>>> #ifdef CONFIG_MEM_ALLOC_PROFILING > >>>>>>>>>>>>> > >>>>>>>>>>>>> +bool mem_profiling_is_available(void); > >>>>>>>>>>>>> +void alloc_tag_add_early_pfn(unsigned long pfn); > >>>>>>>>>>>>> + > >>>>>>>>>>>>> #define ALLOC_TAG_SECTION_NAME "alloc_tags" > >>>>>>>>>>>>> > >>>>>>>>>>>>> struct codetag_bytes { > >>>>>>>>>>>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c > >>>>>>>>>>>>> index 58991ab09d84..a5bf4e72c154 100644 > >>>>>>>>>>>>> --- a/lib/alloc_tag.c > >>>>>>>>>>>>> +++ b/lib/alloc_tag.c > >>>>>>>>>>>>> @@ -6,6 +6,7 @@ > >>>>>>>>>>>>> #include <linux/kallsyms.h> > >>>>>>>>>>>>> #include <linux/module.h> > >>>>>>>>>>>>> #include <linux/page_ext.h> > >>>>>>>>>>>>> +#include <linux/pgalloc_tag.h> > >>>>>>>>>>>>> #include <linux/proc_fs.h> > >>>>>>>>>>>>> #include <linux/seq_buf.h> > >>>>>>>>>>>>> #include <linux/seq_file.h> > >>>>>>>>>>>>> @@ -26,6 +27,82 @@ static bool mem_profiling_support; > >>>>>>>>>>>>> > >>>>>>>>>>>>> static struct codetag_type *alloc_tag_cttype; > >>>>>>>>>>>>> > >>>>>>>>>>>>> +/* > >>>>>>>>>>>>> + * State of the alloc_tag > >>>>>>>>>>>>> + * > >>>>>>>>>>>>> + * This is used to describe the states of the alloc_tag during bootup. 
> >>>>>>>>>>>>> + * > >>>>>>>>>>>>> + * When we need to allocate page_ext to store codetag, we face an > >>>>>>>>>>>>> + * initialization timing problem: > >>>>>>>>>>>>> + * > >>>>>>>>>>>>> + * Due to initialization order, pages may be allocated via buddy system > >>>>>>>>>>>>> + * before page_ext is fully allocated and initialized. Although these > >>>>>>>>>>>>> + * pages call the allocation hooks, the codetag will not be set because > >>>>>>>>>>>>> + * page_ext is not yet available. > >>>>>>>>>>>>> + * > >>>>>>>>>>>>> + * When these pages are later freed to the buddy system, it triggers > >>>>>>>>>>>>> + * warnings because their codetag is actually empty if > >>>>>>>>>>>>> + * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled. > >>>>>>>>>>>>> + * > >>>>>>>>>>>>> + * Additionally, in this situation, we cannot record detailed allocation > >>>>>>>>>>>>> + * information for these pages. > >>>>>>>>>>>>> + */ > >>>>>>>>>>>>> +enum mem_profiling_state { > >>>>>>>>>>>>> + DOWN, /* No mem_profiling functionality yet */ > >>>>>>>>>>>>> + UP /* Everything is working */ > >>>>>>>>>>>>> +}; > >>>>>>>>>>>>> + > >>>>>>>>>>>>> +static enum mem_profiling_state mem_profiling_state = DOWN; > >>>>>>>>>>>>> + > >>>>>>>>>>>>> +bool mem_profiling_is_available(void) > >>>>>>>>>>>>> +{ > >>>>>>>>>>>>> + return mem_profiling_state == UP; > >>>>>>>>>>>>> +} > >>>>>>>>>>>>> + > >>>>>>>>>>>>> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG > >>>>>>>>>>>>> + > >>>>>>>>>>>>> +#define EARLY_ALLOC_PFN_MAX 256 > >>>>>>>>>>>>> + > >>>>>>>>>>>>> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX]; > >>>>>>>>>>>> It's unfortunate that this isn't __initdata. > >>>>>>>>>>>> > >>>>>>>>>>>>> +static unsigned int early_pfn_count; > >>>>>>>>>>>>> +static DEFINE_SPINLOCK(early_pfn_lock); > >>>>>>>>>>>>> + > >>>>>>>>>>>>> > >>>>>>>>>>>>> ...
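The bounded-array approach in the patch can be modelled in a few lines to show why its size matters. This is a single-threaded userspace sketch, not the elided patch code: only `EARLY_ALLOC_PFN_MAX` comes from the patch, while the `toy_` names, the fake per-pfn ref table, and the overflow counter are assumptions (how the real helper handles a full array is not shown above):

```c
#include <assert.h>
#include <stdbool.h>

/* Same bound as the patch; everything else is a single-threaded model. */
#define EARLY_ALLOC_PFN_MAX 256
#define TOY_NR_PFNS 4096	/* toy memory size for the fake ref table */

static unsigned long toy_early_pfns[EARLY_ALLOC_PFN_MAX];
static unsigned int toy_early_pfn_count;
static unsigned int toy_early_pfn_dropped;	/* hypothetical overflow counter */
static bool toy_ref_empty[TOY_NR_PFNS];	/* stand-in for per-page codetag refs */

/* Models recording one early pfn; the patch also takes early_pfn_lock. */
static void toy_add_early_pfn(unsigned long pfn)
{
	if (toy_early_pfn_count < EARLY_ALLOC_PFN_MAX)
		toy_early_pfns[toy_early_pfn_count++] = pfn;
	else
		toy_early_pfn_dropped++;	/* this page would still warn on free */
}

/* Models the fix-up once page_ext exists: mark each recorded ref empty. */
static unsigned int toy_clear_early_refs(void)
{
	unsigned int i, n = toy_early_pfn_count;

	for (i = 0; i < n; i++) {
		assert(toy_early_pfns[i] < TOY_NR_PFNS);
		toy_ref_empty[toy_early_pfns[i]] = true;
	}
	toy_early_pfn_count = 0;
	return n;
}
```

Fed the 802 pre-page_ext allocations Hao counts later in the thread without early_page_ext, a 256-entry array drops 546 of them; the 130 counted with early_page_ext would all fit.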
> >>>>>>>>>>>>> > >>>>>>>>>>>>> --- a/mm/page_alloc.c > >>>>>>>>>>>>> +++ b/mm/page_alloc.c > >>>>>>>>>>>>> @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task, > >>>>>>>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > >>>>>>>>>>>>> update_page_tag_ref(handle, &ref); > >>>>>>>>>>>>> put_page_tag_ref(handle); > >>>>>>>>>>>>> + } else { > >>>>>>>>>>> This branch can be marked as "unlikely". > >>>>>>>>>>> > >>>>>>>>>>>>> + /* > >>>>>>>>>>>>> + * page_ext is not available yet, record the pfn so we can > >>>>>>>>>>>>> + * clear the tag ref later when page_ext is initialized. > >>>>>>>>>>>>> + */ > >>>>>>>>>>>>> + if (!mem_profiling_is_available()) > >>>>>>>>>>>>> + alloc_tag_add_early_pfn(page_to_pfn(page)); > >>>>>>>>>>>>> } > >>>>>>>>>>>>> } > >>>>>>>>>>>> All because of this, I believe. Is this fixable? > >>>>>>>>>>>> > >>>>>>>>>>>> If we take that `else', we know we're running in __init code, yes? I > >>>>>>>>>>>> don't see how `__init pgalloc_tag_add_early()' could be made to work. > >>>>>>>>>>>> hrm. Something clever, please. > >>>>>>>>>>> We can have a pointer to a function that is initialized to point to > >>>>>>>>>>> alloc_tag_add_early_pfn, which is defined as __init and uses > >>>>>>>>>>> early_pfns which now can be defined as __initdata. After > >>>>>>>>>>> clear_early_alloc_pfn_tag_refs() is done we reset that pointer to > >>>>>>>>>>> NULL. __pgalloc_tag_add() instead of calling alloc_tag_add_early_pfn() > >>>>>>>>>>> directly checks that pointer and if it's not NULL then calls the > >>>>>>>>>>> function that it points to. This way __pgalloc_tag_add() which is not > >>>>>>>>>>> an __init function will be invoking alloc_tag_add_early_pfn() __init > >>>>>>>>>>> function only until we are done with initialization. I haven't tried > >>>>>>>>>>> this but I think that should work. 
This also eliminates the need for > >>>>>>>>>>> mem_profiling_state variable since we can use this function pointer > >>>>>>>>>>> instead.
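Suren's function-pointer idea can be sketched in userspace as follows. The pointer starts out aimed at the recording helper (which in the kernel would be `__init` with `__initdata` storage), the failure branch is marked `unlikely()`, and the pointer is reset to NULL once the fix-up is done, so it also doubles as the "profiling is up" state. The `toy_` names are hypothetical, `unlikely()` is redefined via a GCC/Clang builtin, and the READ_ONCE() remark is a hedged guess about what kernel code would need:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's unlikely() branch hint (GCC/Clang). */
#define unlikely(x) __builtin_expect(!!(x), 0)

#define TOY_MAX_EARLY 16

static unsigned long toy_early[TOY_MAX_EARLY];
static unsigned int toy_nr_early;

/* In the kernel this would be __init, with its storage in __initdata. */
static void toy_add_early_pfn(unsigned long pfn)
{
	if (toy_nr_early < TOY_MAX_EARLY)
		toy_early[toy_nr_early++] = pfn;
}

/* Non-NULL only until the fix-up runs; replaces a separate state variable. */
static void (*toy_early_pfn_hook)(unsigned long pfn) = toy_add_early_pfn;

/* Models __pgalloc_tag_add(); ext_ok models get_page_tag_ref() succeeding. */
static void toy_pgalloc_tag_add(unsigned long pfn, bool ext_ok)
{
	if (unlikely(!ext_ok)) {
		/* Load the pointer once; kernel code would likely use READ_ONCE(). */
		void (*hook)(unsigned long) = toy_early_pfn_hook;

		if (hook)
			hook(pfn);
		return;
	}
	/* Normal path: update the codetag ref stored in page_ext. */
}

/* Models clear_early_alloc_pfn_tag_refs(): fix up refs, then drop the hook. */
static unsigned int toy_clear_early_alloc_refs(void)
{
	unsigned int n = toy_nr_early;

	/* ...each recorded pfn's ref would be set to empty here... */
	toy_early_pfn_hook = NULL;	/* must happen before __init memory is freed */
	return n;
}
```

After the reset, a late allocation that still misses page_ext simply skips recording, and the non-`__init` caller never jumps into freed init text.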
* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization 2026-03-25 15:17 ` Suren Baghdasaryan @ 2026-03-26 1:44 ` Hao Ge 2026-03-26 5:04 ` Suren Baghdasaryan 0 siblings, 1 reply; 21+ messages in thread From: Hao Ge @ 2026-03-26 1:44 UTC (permalink / raw) To: Suren Baghdasaryan; +Cc: Andrew Morton, Kent Overstreet, linux-mm, linux-kernel On 2026/3/25 23:17, Suren Baghdasaryan wrote: > On Wed, Mar 25, 2026 at 4:21 AM Hao Ge <hao.ge@linux.dev> wrote: >> >> On 2026/3/25 15:35, Suren Baghdasaryan wrote: >>> On Tue, Mar 24, 2026 at 11:25 PM Suren Baghdasaryan <surenb@google.com> wrote: >>>> On Tue, Mar 24, 2026 at 7:08 PM Hao Ge <hao.ge@linux.dev> wrote: >>>>> On 2026/3/25 08:21, Suren Baghdasaryan wrote: >>>>>> On Tue, Mar 24, 2026 at 2:43 AM Hao Ge <hao.ge@linux.dev> wrote: >>>>>>> On 2026/3/24 06:47, Suren Baghdasaryan wrote: >>>>>>>> On Mon, Mar 23, 2026 at 2:16 AM Hao Ge <hao.ge@linux.dev> wrote: >>>>>>>>> On 2026/3/20 10:14, Suren Baghdasaryan wrote: >>>>>>>>>> On Thu, Mar 19, 2026 at 6:58 PM Hao Ge <hao.ge@linux.dev> wrote: >>>>>>>>>>> On 2026/3/20 07:48, Suren Baghdasaryan wrote: >>>>>>>>>>>> On Thu, Mar 19, 2026 at 4:44 PM Suren Baghdasaryan <surenb@google.com> wrote: >>>>>>>>>>>>> On Thu, Mar 19, 2026 at 3:28 PM Andrew Morton <akpm@linux-foundation.org> wrote: >>>>>>>>>>>>>> On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Due to initialization ordering, page_ext is allocated and initialized >>>>>>>>>>>>>>> relatively late during boot. Some pages have already been allocated >>>>>>>>>>>>>>> and freed before page_ext becomes available, leaving their codetag >>>>>>>>>>>>>>> uninitialized. >>>>>>>>>>>>> Hi Hao, >>>>>>>>>>>>> Thanks for the report. >>>>>>>>>>>>> Hmm. So, we are allocating pages before page_ext is initialized... >>>>>>>>>>>>> >>>>>>>>>>>>>>> A clear example is in init_section_page_ext(): alloc_page_ext() calls >>>>>>>>>>>>>>> kmemleak_alloc(). 
>>>>>>>>>>>> Forgot to ask. The example you are using here is for page_ext >>>>>>>>>>>> allocation itself. Do you have any other examples where page >>>>>>>>>>>> allocation happens before page_ext initialization? If that's the only >>>>>>>>>>>> place, then we might be able to fix this in a simpler way by doing >>>>>>>>>>>> something special for alloc_page_ext(). >>>>>>>>>>> Hi Suren >>>>>>>>>>> >>>>>>>>>>> To help illustrate the point, here's the debug log I added: >>>>>>>>>>> >>>>>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>>>>>>>>> index 2d4b6f1a554e..ebfe636f5b07 100644 >>>>>>>>>>> --- a/mm/page_alloc.c >>>>>>>>>>> +++ b/mm/page_alloc.c >>>>>>>>>>> @@ -1293,6 +1293,9 @@ void __pgalloc_tag_add(struct page *page, struct >>>>>>>>>>> task_struct *task, >>>>>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); >>>>>>>>>>> update_page_tag_ref(handle, &ref); >>>>>>>>>>> put_page_tag_ref(handle); >>>>>>>>>>> + } else { >>>>>>>>>>> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); >>>>>>>>>>> + dump_stack(); >>>>>>>>>>> } >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> And I caught the following logs: >>>>>>>>>>> >>>>>>>>>>> [ 0.296399] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>>>> page=ffffea000400c700 pfn=1049372 nr=1 >>>>>>>>>>> [ 0.296400] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >>>>>>>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) >>>>>>>>>>> [ 0.296402] Hardware name: Red Hat KVM, BIOS >>>>>>>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>>>>>>>>> [ 0.296402] Call Trace: >>>>>>>>>>> [ 0.296403] <TASK> >>>>>>>>>>> [ 0.296403] dump_stack_lvl+0x53/0x70 >>>>>>>>>>> [ 0.296405] __pgalloc_tag_add+0x3a3/0x6e0 >>>>>>>>>>> [ 0.296406] ? __pfx___pgalloc_tag_add+0x10/0x10 >>>>>>>>>>> [ 0.296407] ? kasan_unpoison+0x27/0x60 >>>>>>>>>>> [ 0.296409] ? 
__kasan_unpoison_pages+0x2c/0x40 >>>>>>>>>>> [ 0.296411] get_page_from_freelist+0xa54/0x1310 >>>>>>>>>>> [ 0.296413] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>>>>>>>>> [ 0.296415] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 >>>>>>>>>>> [ 0.296417] ? stack_depot_save_flags+0x3f/0x680 >>>>>>>>>>> [ 0.296418] ? ___slab_alloc+0x518/0x530 >>>>>>>>>>> [ 0.296420] alloc_pages_mpol+0x13a/0x3f0 >>>>>>>>>>> [ 0.296421] ? __pfx_alloc_pages_mpol+0x10/0x10 >>>>>>>>>>> [ 0.296423] ? _raw_spin_lock_irqsave+0x8a/0xf0 >>>>>>>>>>> [ 0.296424] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 >>>>>>>>>>> [ 0.296426] alloc_slab_page+0xc2/0x130 >>>>>>>>>>> [ 0.296427] allocate_slab+0x77/0x2c0 >>>>>>>>>>> [ 0.296429] ? syscall_enter_define_fields+0x3bb/0x5f0 >>>>>>>>>>> [ 0.296430] ___slab_alloc+0x125/0x530 >>>>>>>>>>> [ 0.296432] ? __trace_define_field+0x252/0x3d0 >>>>>>>>>>> [ 0.296433] __kmalloc_noprof+0x329/0x630 >>>>>>>>>>> [ 0.296435] ? syscall_enter_define_fields+0x3bb/0x5f0 >>>>>>>>>>> [ 0.296436] syscall_enter_define_fields+0x3bb/0x5f0 >>>>>>>>>>> [ 0.296438] ? __pfx_syscall_enter_define_fields+0x10/0x10 >>>>>>>>>>> [ 0.296440] event_define_fields+0x326/0x540 >>>>>>>>>>> [ 0.296441] __trace_early_add_events+0xac/0x3c0 >>>>>>>>>>> [ 0.296443] trace_event_init+0x24c/0x460 >>>>>>>>>>> [ 0.296445] trace_init+0x9/0x20 >>>>>>>>>>> [ 0.296446] start_kernel+0x199/0x3c0 >>>>>>>>>>> [ 0.296448] x86_64_start_reservations+0x18/0x30 >>>>>>>>>>> [ 0.296449] x86_64_start_kernel+0xe2/0xf0 >>>>>>>>>>> [ 0.296451] common_startup_64+0x13e/0x141 >>>>>>>>>>> [ 0.296453] </TASK> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> [ 0.312234] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>>>>>>>> page=ffffea000400f900 pfn=1049572 nr=1 >>>>>>>>>>> [ 0.312234] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >>>>>>>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) >>>>>>>>>>> [ 0.312236] Hardware name: Red Hat KVM, BIOS >>>>>>>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>>>>>>>>> [ 0.312236] Call Trace: >>>>>>>>>>> [ 0.312237] <TASK> >>>>>>>>>>> [ 0.312237] dump_stack_lvl+0x53/0x70 >>>>>>>>>>> [ 0.312239] __pgalloc_tag_add+0x3a3/0x6e0 >>>>>>>>>>> [ 0.312240] ? __pfx___pgalloc_tag_add+0x10/0x10 >>>>>>>>>>> [ 0.312241] ? rmqueue.constprop.0+0x4fc/0x1ce0 >>>>>>>>>>> [ 0.312243] ? kasan_unpoison+0x27/0x60 >>>>>>>>>>> [ 0.312244] ? __kasan_unpoison_pages+0x2c/0x40 >>>>>>>>>>> [ 0.312246] get_page_from_freelist+0xa54/0x1310 >>>>>>>>>>> [ 0.312248] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>>>>>>>>> [ 0.312250] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 >>>>>>>>>>> [ 0.312253] alloc_slab_page+0x39/0x130 >>>>>>>>>>> [ 0.312254] allocate_slab+0x77/0x2c0 >>>>>>>>>>> [ 0.312255] ? alloc_cpumask_var_node+0xc7/0x230 >>>>>>>>>>> [ 0.312257] ___slab_alloc+0x46d/0x530 >>>>>>>>>>> [ 0.312259] __kmalloc_node_noprof+0x2fa/0x680 >>>>>>>>>>> [ 0.312261] ? alloc_cpumask_var_node+0xc7/0x230 >>>>>>>>>>> [ 0.312263] alloc_cpumask_var_node+0xc7/0x230 >>>>>>>>>>> [ 0.312264] init_desc+0x141/0x6b0 >>>>>>>>>>> [ 0.312266] alloc_desc+0x108/0x1b0 >>>>>>>>>>> [ 0.312267] early_irq_init+0xee/0x1c0 >>>>>>>>>>> [ 0.312268] ? __pfx_early_irq_init+0x10/0x10 >>>>>>>>>>> [ 0.312271] start_kernel+0x1ab/0x3c0 >>>>>>>>>>> [ 0.312272] x86_64_start_reservations+0x18/0x30 >>>>>>>>>>> [ 0.312274] x86_64_start_kernel+0xe2/0xf0 >>>>>>>>>>> [ 0.312275] common_startup_64+0x13e/0x141 >>>>>>>>>>> [ 0.312277] </TASK> >>>>>>>>>>> >>>>>>>>>>> [ 0.312834] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>>>>>>>> page=ffffea000400fc00 pfn=1049584 nr=1 >>>>>>>>>>> [ 0.312835] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >>>>>>>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) >>>>>>>>>>> [ 0.312836] Hardware name: Red Hat KVM, BIOS >>>>>>>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>>>>>>>>> [ 0.312837] Call Trace: >>>>>>>>>>> [ 0.312837] <TASK> >>>>>>>>>>> [ 0.312838] dump_stack_lvl+0x53/0x70 >>>>>>>>>>> [ 0.312840] __pgalloc_tag_add+0x3a3/0x6e0 >>>>>>>>>>> [ 0.312841] ? __pfx___pgalloc_tag_add+0x10/0x10 >>>>>>>>>>> [ 0.312842] ? rmqueue.constprop.0+0x4fc/0x1ce0 >>>>>>>>>>> [ 0.312844] ? kasan_unpoison+0x27/0x60 >>>>>>>>>>> [ 0.312845] ? __kasan_unpoison_pages+0x2c/0x40 >>>>>>>>>>> [ 0.312847] get_page_from_freelist+0xa54/0x1310 >>>>>>>>>>> [ 0.312849] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>>>>>>>>> [ 0.312851] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 >>>>>>>>>>> [ 0.312853] alloc_pages_mpol+0x13a/0x3f0 >>>>>>>>>>> [ 0.312855] ? __pfx_alloc_pages_mpol+0x10/0x10 >>>>>>>>>>> [ 0.312856] ? xas_find+0x2d8/0x450 >>>>>>>>>>> [ 0.312858] ? _raw_spin_lock+0x84/0xe0 >>>>>>>>>>> [ 0.312859] ? __pfx__raw_spin_lock+0x10/0x10 >>>>>>>>>>> [ 0.312861] alloc_pages_noprof+0xf6/0x2b0 >>>>>>>>>>> [ 0.312862] __change_page_attr+0x293/0x850 >>>>>>>>>>> [ 0.312864] ? __pfx___change_page_attr+0x10/0x10 >>>>>>>>>>> [ 0.312865] ? _vm_unmap_aliases+0x2d0/0x650 >>>>>>>>>>> [ 0.312868] ? __pfx__vm_unmap_aliases+0x10/0x10 >>>>>>>>>>> [ 0.312869] __change_page_attr_set_clr+0x16c/0x360 >>>>>>>>>>> [ 0.312871] ? spp_getpage+0xbb/0x1e0 >>>>>>>>>>> [ 0.312872] change_page_attr_set_clr+0x220/0x3c0 >>>>>>>>>>> [ 0.312873] ? flush_tlb_one_kernel+0xf/0x30 >>>>>>>>>>> [ 0.312875] ? set_pte_vaddr_p4d+0x110/0x180 >>>>>>>>>>> [ 0.312877] ? __pfx_change_page_attr_set_clr+0x10/0x10 >>>>>>>>>>> [ 0.312878] ? __pfx_set_pte_vaddr_p4d+0x10/0x10 >>>>>>>>>>> [ 0.312881] ? __pfx_mtree_load+0x10/0x10 >>>>>>>>>>> [ 0.312883] ? __pfx_mtree_load+0x10/0x10 >>>>>>>>>>> [ 0.312884] ? 
__asan_memcpy+0x3c/0x60 >>>>>>>>>>> [ 0.312886] ? set_intr_gate+0x10c/0x150 >>>>>>>>>>> [ 0.312888] set_memory_ro+0x76/0xa0 >>>>>>>>>>> [ 0.312889] ? __pfx_set_memory_ro+0x10/0x10 >>>>>>>>>>> [ 0.312891] idt_setup_apic_and_irq_gates+0x2c1/0x390 >>>>>>>>>>> >>>>>>>>>>> and more. >>>>>>>>>> Ok, it's not the only place. Got your point. >>>>>>>>>> >>>>>>>>>>> off topic - if we were to handle only alloc_page_ext() specifically, >>>>>>>>>>> what would be the most straightforward >>>>>>>>>>> >>>>>>>>>>> solution in your mind? I'd really appreciate your insight. >>>>>>>>>> I was thinking if it's the only special case maybe we can handle it >>>>>>>>>> somehow differently, like we do when we allocate obj_ext vectors for >>>>>>>>>> slabs using __GFP_NO_OBJ_EXT. I haven't found a good solution yet but >>>>>>>>>> since it's not a special case we would not be able to use it even if I >>>>>>>>>> came up with something... >>>>>>>>>> I think your way is the most straight-forward but please try my >>>>>>>>>> suggestion to see if we can avoid extra overhead. >>>>>>>>>> Thanks, >>>>>>>>>> Suren. >>>>> Hi Suren >>>>>>> Hi Suren >>>>>>> >>>>>>> >>>>>>>> Hi Hao, >>>>>>>> >>>>>>>>> Hi Suren >>>>>>>>> >>>>>>>>> Thank you for your feedback. After re-examining this issue, >>>>>>>>> >>>>>>>>> I realize my previous focus was misplaced. >>>>>>>>> >>>>>>>>> Upon deeper consideration, I understand that this is not merely a bug, >>>>>>>>> >>>>>>>>> but rather a warning that indicates a gap in our memory profiling mechanism. >>>>>>>>> >>>>>>>>> Specifically, the current implementation appears to be missing memory >>>>>>>>> allocation >>>>>>>>> >>>>>>>>> tracking during the period between the buddy system allocation and page_ext >>>>>>>>> >>>>>>>>> initialization. >>>>>>>>> >>>>>>>>> This profiling gap means we may not be capturing all relevant memory >>>>>>>>> allocation >>>>>>>>> >>>>>>>>> events during this critical transition phase. 
>>>>>>>> Correct, this limitation exists because memory profiling relies on >>>>>>>> some kernel facilities (page_ext, obj_ext) which might not be >>>>>>>> initialized yet at the time of allocation. >>>>>>>> >>>>>>>>> My approach is to dynamically allocate codetag_ref when get_page_tag_ref >>>>>>>>> fails, >>>>>>>>> >>>>>>>>> and maintain a linked list to track all buddy system allocations that >>>>>>>>> occur prior to page_ext initialization. >>>>>>>>> >>>>>>>>> However, this introduces performance concerns: >>>>>>>>> >>>>>>>>> 1. Free Path Overhead: When freeing these pages, we would need to >>>>>>>>> traverse the entire linked list to locate >>>>>>>>> >>>>>>>>> the corresponding codetag_ref, resulting in O(n) lookup complexity >>>>>>>>> per free operation. >>>>>>>>> >>>>>>>>> 2. Initialization Overhead: During init_page_alloc_tagging, iterating >>>>>>>>> through the linked list to assign codetag_ref to >>>>>>>>> >>>>>>>>> page_ext would introduce additional traversal cost. >>>>>>>>> >>>>>>>>> If the number of pages is substantial, this could incur significant >>>>>>>>> overhead. What are your thoughts on this? I look forward to your >>>>>>>>> suggestions. >>>>>>>> My thinking is that these early allocations comprise a small portion >>>>>>>> of overall memory consumed by the system. So, instead of trying to >>>>>>>> record and handle them in some alternative way, we just accept that >>>>>>>> some counters might not be exactly accurate and ignore those early >>>>>>>> allocations. See how the early slab allocations are marked with the >>>>>>>> CODETAG_FLAG_INACCURATE flag and later reported as inaccurate. I think >>>>>>>> that's an acceptable alternative to introducing extra complexity and >>>>>>>> performance overhead. IOW, the benefits of accounting for these early >>>>>>>> allocations are low compared to the effort required to account for >>>>>>>> them. Unless you found a simple and performant way to do that...
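The "accept and flag" alternative Suren describes avoids any per-page bookkeeping: when the allocation hook cannot reach page_ext, it skips the count and marks the call site's tag instead, so the report can say the number is incomplete. This is a userspace sketch covering only the accounting side (suppressing the free-time warning is a separate step); `TOY_FLAG_INACCURATE` merely stands in for CODETAG_FLAG_INACCURATE, and the report format is made up, not the kernel's output:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bit standing in for CODETAG_FLAG_INACCURATE. */
#define TOY_FLAG_INACCURATE (1u << 0)

struct toy_tag {
	const char *site;	/* allocation call site, e.g. "file.c:123 func" */
	long bytes;
	unsigned int flags;
};

/* Allocation hook: ext_ok models get_page_tag_ref() succeeding. */
static void toy_tag_add(struct toy_tag *tag, bool ext_ok, long bytes)
{
	if (!ext_ok) {
		/* No page_ext yet: skip the count but remember the gap. */
		tag->flags |= TOY_FLAG_INACCURATE;
		return;
	}
	tag->bytes += bytes;
}

/* Formats one report line; the output format here is illustrative only. */
static int toy_tag_report(const struct toy_tag *tag, char *buf, size_t len)
{
	return snprintf(buf, len, "%ld %s%s", tag->bytes, tag->site,
			(tag->flags & TOY_FLAG_INACCURATE) ? " (inaccurate)" : "");
}
```

Because the early allocation was never added, a matching free that finds no tag should subtract nothing, which keeps the counter consistent even though it under-reports.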
>>>>>>> I have been exploring possible solutions to this issue over the past few >>>>>>> days, >>>>>>> >>>>>>> but so far I have not come up with a good approach. >>>>>>> >>>>>>> I have counted the number of memory allocations that occur earlier than the >>>>>>> >>>>>>> allocation and initialization of our page_ext, and found that there are >>>>>>> actually >>>>>>> >>>>>>> quite a lot of them. >>>>>> Interesting... I wonder if it's because deferred_struct_pages defers >>>>>> page_ext initialization. Can you check if setting early_page_ext >>>>>> reduces or eliminates these allocations before page_ext init cases? >>>>> Yes, you are correct. In my 8-core 16GB virtual machine, I used a global >>>>> counter >>>>> >>>>> to record these allocations. With early_page_ext enabled, there were 130 >>>>> allocations >>>>> >>>>> before page_ext initialization. Without early_page_ext, there were 802 >>>>> allocations >>>>> >>>>> before page_ext initialization. >>>>> >>>>> >>>>>>> Similarly, I have made the following changes and collected the >>>>>>> corresponding logs. >>>>>>> >>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>>>>> index 2d4b6f1a554e..6db65b3d52d3 100644 >>>>>>> --- a/mm/page_alloc.c >>>>>>> +++ b/mm/page_alloc.c >>>>>>> @@ -1293,6 +1293,8 @@ void __pgalloc_tag_add(struct page *page, struct >>>>>>> task_struct *task, >>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); >>>>>>> update_page_tag_ref(handle, &ref); >>>>>>> put_page_tag_ref(handle); >>>>>>> + } else{ >>>>>>> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> @@ -1314,6 +1316,8 @@ void __pgalloc_tag_sub(struct page *page, unsigned >>>>>>> int nr) >>>>>>> alloc_tag_sub(&ref, PAGE_SIZE * nr); >>>>>>> update_page_tag_ref(handle, &ref); >>>>>>> put_page_tag_ref(handle); >>>>>>> + } else{ >>>>>>> + pr_warn("__pgalloc_tag_sub: get_page_tag_ref failed!
>>>>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> [ 0.261699] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001000 pfn=1048640 nr=2 >>>>>>> [ 0.261711] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001100 pfn=1048644 nr=4 >>>>>>> [ 0.261717] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001200 pfn=1048648 nr=4 >>>>>>> [ 0.261721] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001300 pfn=1048652 nr=4 >>>>>>> [ 0.261893] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001080 pfn=1048642 nr=2 >>>>>>> [ 0.261917] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001400 pfn=1048656 nr=4 >>>>>>> [ 0.262018] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001500 pfn=1048660 nr=2 >>>>>>> [ 0.262024] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001600 pfn=1048664 nr=8 >>>>>>> [ 0.262040] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001580 pfn=1048662 nr=1 >>>>>>> [ 0.262048] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea00040015c0 pfn=1048663 nr=1 >>>>>>> [ 0.262056] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001800 pfn=1048672 nr=2 >>>>>>> [ 0.262064] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001880 pfn=1048674 nr=2 >>>>>>> [ 0.262078] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001900 pfn=1048676 nr=2 >>>>>>> [ 0.262196] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 >>>>>>> [ 0.262213] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001980 pfn=1048678 nr=2 >>>>>>> [ 0.262220] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001a00 pfn=1048680 nr=4 >>>>>>> [ 0.262246] ODEBUG: selftest passed >>>>>>> [ 0.262268] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>>>> page=ffffea0004001b00 pfn=1048684 nr=1 >>>>>>> [ 0.262318] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001b40 pfn=1048685 nr=1 >>>>>>> [ 0.262368] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001b80 pfn=1048686 nr=1 >>>>>>> [ 0.262418] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001bc0 pfn=1048687 nr=1 >>>>>>> [ 0.262469] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001c00 pfn=1048688 nr=1 >>>>>>> [ 0.262519] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001c40 pfn=1048689 nr=1 >>>>>>> [ 0.262569] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001c80 pfn=1048690 nr=1 >>>>>>> [ 0.262620] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001cc0 pfn=1048691 nr=1 >>>>>>> [ 0.262670] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001d00 pfn=1048692 nr=1 >>>>>>> [ 0.262721] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001d40 pfn=1048693 nr=1 >>>>>>> [ 0.262771] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001d80 pfn=1048694 nr=1 >>>>>>> [ 0.262821] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001dc0 pfn=1048695 nr=1 >>>>>>> [ 0.262871] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001e00 pfn=1048696 nr=1 >>>>>>> [ 0.262923] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001e40 pfn=1048697 nr=1 >>>>>>> [ 0.262974] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001e80 pfn=1048698 nr=1 >>>>>>> [ 0.263024] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001ec0 pfn=1048699 nr=1 >>>>>>> [ 0.263074] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001f00 pfn=1048700 nr=1 >>>>>>> [ 0.263124] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001f40 pfn=1048701 nr=1 >>>>>>> [ 0.263174] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>>>> page=ffffea0004001f80 pfn=1048702 nr=1 >>>>>>> [ 0.263224] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004001fc0 pfn=1048703 nr=1 >>>>>>> [ 0.263275] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004002000 pfn=1048704 nr=1 >>>>>>> [ 0.263325] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004002040 pfn=1048705 nr=1 >>>>>>> [ 0.263375] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004002080 pfn=1048706 nr=1 >>>>>>> [ 0.263427] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004002400 pfn=1048720 nr=16 >>>>>>> [ 0.263437] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea00040020c0 pfn=1048707 nr=1 >>>>>>> [ 0.263463] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004002100 pfn=1048708 nr=1 >>>>>>> [ 0.263465] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004002140 pfn=1048709 nr=1 >>>>>>> [ 0.263467] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004002180 pfn=1048710 nr=1 >>>>>>> [ 0.263509] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004002200 pfn=1048712 nr=4 >>>>>>> [ 0.263512] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004002800 pfn=1048736 nr=8 >>>>>>> [ 0.263524] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea00040021c0 pfn=1048711 nr=1 >>>>>>> [ 0.263536] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004002300 pfn=1048716 nr=1 >>>>>>> [ 0.263537] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004002340 pfn=1048717 nr=1 >>>>>>> [ 0.263539] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004002380 pfn=1048718 nr=1 >>>>>>> [ 0.263604] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004004000 pfn=1048832 nr=128 >>>>>>> [ 0.263638] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>>>> page=ffffea0004003000 pfn=1048768 nr=64 >>>>>>> [ 0.263650] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004002c00 pfn=1048752 nr=16 >>>>>>> [ 0.263655] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea00040023c0 pfn=1048719 nr=1 >>>>>>> [ 0.270582] __pgalloc_tag_sub: get_page_tag_ref failed! >>>>>>> page=ffffea00040023c0 pfn=1048719 nr=1 >>>>>>> [ 0.270591] ftrace: allocating 52717 entries in 208 pages >>>>>>> [ 0.270592] ftrace: allocated 208 pages with 3 groups >>>>>>> [ 0.270620] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004002a00 pfn=1048744 nr=8 >>>>>>> [ 0.270636] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea00040023c0 pfn=1048719 nr=1 >>>>>>> [ 0.270643] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006000 pfn=1048960 nr=1 >>>>>>> [ 0.270649] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006040 pfn=1048961 nr=1 >>>>>>> [ 0.270658] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004007000 pfn=1049024 nr=64 >>>>>>> [ 0.270659] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006080 pfn=1048962 nr=2 >>>>>>> [ 0.270722] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006100 pfn=1048964 nr=1 >>>>>>> [ 0.270730] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006140 pfn=1048965 nr=1 >>>>>>> [ 0.270738] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006180 pfn=1048966 nr=1 >>>>>>> [ 0.270777] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea00040061c0 pfn=1048967 nr=1 >>>>>>> [ 0.270786] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006200 pfn=1048968 nr=1 >>>>>>> [ 0.270792] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006240 pfn=1048969 nr=1 >>>>>>> [ 0.270833] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>>>> page=ffffea0004006300 pfn=1048972 nr=4 >>>>>>> [ 0.270891] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006280 pfn=1048970 nr=1 >>>>>>> [ 0.270980] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea00040062c0 pfn=1048971 nr=1 >>>>>>> [ 0.271071] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006400 pfn=1048976 nr=1 >>>>>>> [ 0.271156] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006440 pfn=1048977 nr=1 >>>>>>> [ 0.271185] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006480 pfn=1048978 nr=2 >>>>>>> [ 0.271301] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006500 pfn=1048980 nr=1 >>>>>>> [ 0.271655] Dynamic Preempt: lazy >>>>>>> [ 0.271662] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006580 pfn=1048982 nr=2 >>>>>>> [ 0.271752] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006600 pfn=1048984 nr=4 >>>>>>> [ 0.271762] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004010000 pfn=1049600 nr=4 >>>>>>> [ 0.271824] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006540 pfn=1048981 nr=1 >>>>>>> [ 0.271916] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006700 pfn=1048988 nr=2 >>>>>>> [ 0.271964] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006780 pfn=1048990 nr=1 >>>>>>> [ 0.272099] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea00040067c0 pfn=1048991 nr=1 >>>>>>> [ 0.272138] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006800 pfn=1048992 nr=2 >>>>>>> [ 0.272144] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006a00 pfn=1049000 nr=8 >>>>>>> [ 0.272249] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006c00 pfn=1049008 nr=8 >>>>>>> [ 0.272319] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>>>> page=ffffea0004006880 pfn=1048994 nr=2 >>>>>>> [ 0.272351] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006900 pfn=1048996 nr=4 >>>>>>> [ 0.272424] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004006e00 pfn=1049016 nr=8 >>>>>>> [ 0.272485] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004008000 pfn=1049088 nr=8 >>>>>>> [ 0.272535] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004008200 pfn=1049096 nr=2 >>>>>>> [ 0.272600] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004008400 pfn=1049104 nr=8 >>>>>>> [ 0.272663] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004008300 pfn=1049100 nr=4 >>>>>>> [ 0.272694] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004008280 pfn=1049098 nr=2 >>>>>>> [ 0.272708] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004008600 pfn=1049112 nr=8 >>>>>>> >>>>>>> [ 0.272924] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004008880 pfn=1049122 nr=2 >>>>>>> [ 0.272934] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004008900 pfn=1049124 nr=2 >>>>>>> [ 0.272952] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004008c00 pfn=1049136 nr=4 >>>>>>> [ 0.273035] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004008980 pfn=1049126 nr=2 >>>>>>> [ 0.273062] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004008e00 pfn=1049144 nr=8 >>>>>>> [ 0.273674] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004008d00 pfn=1049140 nr=1 >>>>>>> [ 0.273884] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004008d80 pfn=1049142 nr=2 >>>>>>> [ 0.273943] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009000 pfn=1049152 nr=2 >>>>>>> [ 0.274379] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>>>> page=ffffea0004009080 pfn=1049154 nr=2 >>>>>>> [ 0.274575] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009200 pfn=1049160 nr=8 >>>>>>> [ 0.274617] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009100 pfn=1049156 nr=4 >>>>>>> [ 0.274794] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009400 pfn=1049168 nr=2 >>>>>>> [ 0.274840] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009480 pfn=1049170 nr=2 >>>>>>> [ 0.275057] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009500 pfn=1049172 nr=2 >>>>>>> [ 0.275092] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009580 pfn=1049174 nr=2 >>>>>>> [ 0.275134] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009600 pfn=1049176 nr=8 >>>>>>> [ 0.275211] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009800 pfn=1049184 nr=4 >>>>>>> [ 0.275510] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009900 pfn=1049188 nr=2 >>>>>>> [ 0.275548] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009980 pfn=1049190 nr=2 >>>>>>> [ 0.275976] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009a00 pfn=1049192 nr=8 >>>>>>> [ 0.275987] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009c00 pfn=1049200 nr=2 >>>>>>> [ 0.276139] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009c80 pfn=1049202 nr=2 >>>>>>> [ 0.276152] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004008d40 pfn=1049141 nr=1 >>>>>>> [ 0.276242] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009d00 pfn=1049204 nr=1 >>>>>>> [ 0.276358] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009d40 pfn=1049205 nr=1 >>>>>>> [ 0.276444] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009d80 pfn=1049206 nr=1 >>>>>>> [ 0.276526] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>>>> page=ffffea0004009dc0 pfn=1049207 nr=1 >>>>>>> [ 0.276615] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009e00 pfn=1049208 nr=1 >>>>>>> [ 0.276696] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009e40 pfn=1049209 nr=1 >>>>>>> [ 0.276792] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009e80 pfn=1049210 nr=1 >>>>>>> [ 0.276827] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009f00 pfn=1049212 nr=2 >>>>>>> [ 0.276891] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009ec0 pfn=1049211 nr=1 >>>>>>> [ 0.276999] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009f80 pfn=1049214 nr=1 >>>>>>> [ 0.277082] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea0004009fc0 pfn=1049215 nr=1 >>>>>>> [ 0.277172] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea000400a000 pfn=1049216 nr=1 >>>>>>> [ 0.277257] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>> page=ffffea000400a040 pfn=1049217 nr=1 >>>>>>> >>>>>>> and so on. >>>>>>> >>>>>>> >>>>>>>> I think your earlier patch can effectively detect these early >>>>>>>> allocations and suppress the warnings. We should also mark these >>>>>>>> allocations with CODETAG_FLAG_INACCURATE. >>>>>>> Thanks to an excellent AI review, I realized there are issues with >>>>>>> >>>>>>> my original patch. One problem is the 256-element array; another >>>>>> Yes, if there are lots of such allocations, it's not appropriate. >>>>>> >>>>>>> is that it involves allocation and free operations — meaning we need >>>>>>> >>>>>>> to record entries at __pgalloc_tag_add and remove them at __pgalloc_tag_sub, >>>>>>> >>>>>>> which introduces a noticeable overhead. I'm wondering if we can instead >>>>>>> set a flag >>>>>>> >>>>>>> bit in page flags during the early boot stage, which I'll refer to as >>>>>>> EARLY_ALLOC_FLAGS. >>>>>>> >>>>>>> Then, in __pgalloc_tag_sub, we first check for EARLY_ALLOC_FLAGS. 
If >>>>>>> set, we clear the >>>>>>> >>>>>>> flag and return immediately; otherwise, we perform the actual >>>>>>> subtraction of the tag count. >>>>>>> >>>>>>> This approach seems somewhat similar to the idea behind >>>>>>> mem_profiling_compressed. >>>>>> That seems doable but let's first check if we can make page_ext >>>>>> initialization happen before these allocations. That would be the >>>>>> ideal path. If it's not possible then we can focus on alternatives >>>>>> like the one you propose. >>>>> Yes, the ideal scenario would be to have page_ext initialization >>>>> complete before >>>>> >>>>> these allocations occur. I just did a code walkthrough and found that >>>>> this resembles >>>>> >>>>> the FLATMEM implementation approach - FLATMEM allocates page_ext before >>>>> the buddy >>>>> >>>>> system initialization, so it doesn't seem to encounter the issue we're >>>>> facing now. >>>>> >>>>> https://elixir.bootlin.com/linux/v7.0-rc5/source/mm/mm_init.c#L2707 >>>> Yes, page_ext_init_flatmem() looks like an interesting option, but it >>>> would not work with sparsemem. TBH I would prefer to find a simple >>>> solution that can identify early init allocations, mark them inaccurate >>>> and suppress the warning rather than introduce some complex mechanism >>>> to account for them which would work only in some cases (flatmem). >>>> With your original approach I think the only real issue is the size of >>>> the array that might be too small. The other issue you mentioned about >>>> an allocated page being freed and then re-allocated after page_ext is >>>> initialized but before clear_page_tag_ref() is called is not really a >>>> problem. Yes, we will lose that counter's value but it's similar to >>>> other early allocations which we just treat as inaccurate. We can also >>>> minimize the possibility of this happening by moving >>>> clear_page_tag_ref() into init_page_alloc_tagging().
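To make the trade-offs in the paragraph above concrete, here is a small user-space simulation. It is not kernel code and every name in it is hypothetical: page_ext is modeled as an array of tag pointers where NULL means "tag never set", a static sentinel stands in for CODETAG_EMPTY, and the early-PFN array is deliberately tiny so that overflow is easy to trigger. The append uses a C11 compare-and-swap loop rather than the spinlock of the first version of the patch. Under these assumptions it shows that every early page warns on free without a clearing pass, that the clearing pass silences them while keeping the real tag of a page re-allocated after page_ext came up, and that pages which did not fit in the array still warn.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PFNS 64
#define EARLY_ALLOC_PFN_MAX 4 /* deliberately tiny; the patch uses 256 */

struct codetag {
	const char *site;
};

static struct codetag empty_tag = { "<empty>" }; /* models CODETAG_EMPTY */
static struct codetag real_tag = { "foo.c:42" }; /* hypothetical call site */

static struct codetag *page_ext_ref[NR_PFNS];    /* NULL: tag never set */
static bool page_ext_ready;
static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX];
static atomic_int early_pfn_count;
static int last_cleared;

/* Bounded lock-free append; PFNs beyond the array size are silently lost. */
static void add_early_pfn(unsigned long pfn)
{
	int old = atomic_load(&early_pfn_count);

	assert(pfn < NR_PFNS);
	do {
		if (old >= EARLY_ALLOC_PFN_MAX)
			return;
	} while (!atomic_compare_exchange_weak(&early_pfn_count, &old, old + 1));
	early_pfns[old] = pfn;
}

static void sim_alloc(unsigned long pfn)
{
	if (page_ext_ready)
		page_ext_ref[pfn] = &real_tag;
	else
		add_early_pfn(pfn); /* no page_ext yet: remember the PFN */
}

/* Returns 1 if this free would hit "alloc_tag was not set", else 0. */
static int sim_free(unsigned long pfn)
{
	int warn;

	if (!page_ext_ready)
		return 0; /* nothing can be checked before page_ext exists */
	warn = page_ext_ref[pfn] == NULL;
	page_ext_ref[pfn] = NULL;
	return warn;
}

/* Plays the role of the clearing pass run from init_page_alloc_tagging(). */
static void clear_early_refs(void)
{
	int n = atomic_load(&early_pfn_count);

	last_cleared = 0;
	for (int i = 0; i < n; i++) {
		unsigned long pfn = early_pfns[i];

		/* A page re-allocated after page_ext came up already has a real tag. */
		if (page_ext_ref[pfn] == NULL) {
			page_ext_ref[pfn] = &empty_tag;
			last_cleared++;
		}
	}
	atomic_store(&early_pfn_count, 0);
}

/*
 * Allocates PFNs 0..n_early-1 before page_ext is ready (n_early >= 1), frees
 * PFN 0 early and re-allocates it afterwards, optionally runs the clearing
 * pass, then frees everything. Returns how many frees would warn.
 */
static int simulate_boot(int n_early, bool clear)
{
	int warnings = 0;

	for (int pfn = 0; pfn < NR_PFNS; pfn++)
		page_ext_ref[pfn] = NULL;
	page_ext_ready = false;
	atomic_store(&early_pfn_count, 0);

	for (int pfn = 0; pfn < n_early; pfn++)
		sim_alloc(pfn);
	sim_free(0);  /* freed before page_ext exists: goes unnoticed */

	page_ext_ready = true;
	sim_alloc(0); /* re-allocated: now gets a real tag */
	if (clear)
		clear_early_refs();

	for (int pfn = 0; pfn < n_early; pfn++)
		warnings += sim_free(pfn);
	return warnings;
}
```

For example, simulate_boot(3, false) returns 2, simulate_boot(3, true) returns 0, and simulate_boot(6, true) returns 2 because PFNs 4 and 5 overflowed the four-entry array.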
>>>> >>>> I don't like the pageflag option you mentioned because it adds an >>>> extra condition check into __pgalloc_tag_sub() which will be executed >>>> even after the init stage is over. >>>> I'll look into this some more tomorrow as it's quite late now. >> >> Hi Suren >> >> >>> Just thought of something. Are all these pages allocated by slab? If >>> so, I think slab does not use page->lru (need to double-check) and we >>> could add all these pages allocated during early init into a list and >>> then set their page_ext reference to CODETAG_EMPTY in >>> init_page_alloc_tagging(). >> Got your point. >> >> >> There will indeed be some non-SLAB memory allocations here, such as the >> following: >> >> >> CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) >> [ 0.326607] Hardware name: Red Hat KVM, BIOS >> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >> [ 0.326608] Call Trace: >> [ 0.326608] <TASK> >> [ 0.326609] dump_stack_lvl+0x53/0x70 >> [ 0.326611] __pgalloc_tag_add+0x407/0x700 >> [ 0.326616] get_page_from_freelist+0xa54/0x1310 >> [ 0.326618] __alloc_frozen_pages_noprof+0x206/0x4c0 >> [ 0.326623] alloc_pages_mpol+0x13a/0x3f0 >> [ 0.326627] alloc_pages_noprof+0xf6/0x2b0 >> [ 0.326628] __pmd_alloc+0x743/0x9c0 >> [ 0.326630] vmap_range_noflush+0xac0/0x10a0 >> [ 0.326637] ioremap_page_range+0x17c/0x250 >> [ 0.326639] __ioremap_caller+0x437/0x5c0 >> [ 0.326645] acpi_os_map_iomem+0x4c0/0x660 >> [ 0.326647] acpi_tb_verify_temp_table+0x1c0/0x580 >> [ 0.326649] acpi_reallocate_root_table+0x2ad/0x460 >> [ 0.326655] acpi_early_init+0x111/0x460 >> [ 0.326657] start_kernel+0x271/0x3c0 >> [ 0.326659] x86_64_start_reservations+0x18/0x30 >> [ 0.326660] x86_64_start_kernel+0xe2/0xf0 >> [ 0.326662] common_startup_64+0x13e/0x141 >> [ 0.326663] </TASK> >> >> CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted >> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) >> [ 0.329167] Hardware name: Red Hat KVM, BIOS >>
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >> [ 0.329167] Call Trace: >> [ 0.329167] <TASK> >> [ 0.329167] dump_stack_lvl+0x53/0x70 >> [ 0.329167] __pgalloc_tag_add+0x407/0x700 >> [ 0.329167] get_page_from_freelist+0xa54/0x1310 >> [ 0.329167] __alloc_frozen_pages_noprof+0x206/0x4c0 >> [ 0.329167] __alloc_pages_noprof+0x10/0x1b0 >> [ 0.329167] dup_task_struct+0x163/0x8c0 >> [ 0.329167] copy_process+0x390/0x4a70 >> [ 0.329167] kernel_clone+0xe1/0x830 >> [ 0.329167] kernel_thread+0xcb/0x110 >> [ 0.329167] kthreadd+0x8a2/0xc60 >> [ 0.329167] ret_from_fork+0x551/0x720 >> [ 0.329167] ret_from_fork_asm+0x1a/0x30 >> [ 0.329167] </TASK> >> >> CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted >> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) >> [ 0.329167] Hardware name: Red Hat KVM, BIOS >> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >> [ 0.329167] Call Trace: >> [ 0.329167] <TASK> >> [ 0.329167] dump_stack_lvl+0x53/0x70 >> [ 0.329167] __pgalloc_tag_add+0x407/0x700 >> [ 0.329167] get_page_from_freelist+0xa54/0x1310 >> [ 0.329167] __alloc_frozen_pages_noprof+0x206/0x4c0 >> [ 0.329167] __alloc_pages_noprof+0x10/0x1b0 >> [ 0.329167] dup_task_struct+0x163/0x8c0 >> [ 0.329167] copy_process+0x390/0x4a70 >> [ 0.329167] kernel_clone+0xe1/0x830 >> [ 0.329167] kernel_thread+0xcb/0x110 >> [ 0.329167] kthreadd+0x8a2/0xc60 >> [ 0.329167] ret_from_fork+0x551/0x720 >> [ 0.329167] ret_from_fork_asm+0x1a/0x30 >> [ 0.329167] </TASK> >> >> CPU: 4 UID: 0 PID: 1 Comm: swapper/0 Not tainted >> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) >> [ 0.434265] Hardware name: Red Hat KVM, BIOS >> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >> [ 0.434266] Call Trace: >> [ 0.434266] <TASK> >> [ 0.434266] dump_stack_lvl+0x53/0x70 >> [ 0.434268] __pgalloc_tag_add+0x407/0x700 >> [ 0.434272] get_page_from_freelist+0xa54/0x1310 >> [ 0.434274] __alloc_frozen_pages_noprof+0x206/0x4c0 >> [ 0.434279] alloc_pages_exact_nid_noprof+0x10f/0x380 >> [ 0.434283] 
init_section_page_ext+0x167/0x370 >> [ 0.434284] page_ext_init+0x451/0x620 >> [ 0.434287] page_alloc_init_late+0x553/0x630 >> [ 0.434290] kernel_init_freeable+0x7be/0xd30 >> [ 0.434294] kernel_init+0x1f/0x1f0 >> [ 0.434295] ret_from_fork+0x551/0x720 >> [ 0.434301] ret_from_fork_asm+0x1a/0x30 >> [ 0.434303] </TASK> >> >> CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted >> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) >> [ 0.346712] Hardware name: Red Hat KVM, BIOS >> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >> [ 0.346713] Call Trace: >> [ 0.346713] <TASK> >> [ 0.346714] dump_stack_lvl+0x53/0x70 >> [ 0.346715] __pgalloc_tag_add+0x407/0x700 >> [ 0.346720] get_page_from_freelist+0xa54/0x1310 >> [ 0.346723] __alloc_frozen_pages_noprof+0x206/0x4c0 >> [ 0.346729] __alloc_pages_noprof+0x10/0x1b0 >> [ 0.346731] alloc_cpu_data+0x96/0x210 >> [ 0.346732] rb_allocate_cpu_buffer+0xb93/0x1500 >> [ 0.346739] trace_rb_cpu_prepare+0x21a/0x4f0 >> [ 0.346753] cpuhp_invoke_callback+0x6db/0x14b0 >> [ 0.346755] __cpuhp_invoke_callback_range+0xde/0x1d0 >> [ 0.346759] _cpu_up+0x395/0x880 >> [ 0.346761] cpu_up+0x1bb/0x210 >> [ 0.346762] cpuhp_bringup_mask+0xd2/0x150 >> [ 0.346763] bringup_nonboot_cpus+0x12b/0x170 >> [ 0.346764] smp_init+0x2f/0x100 >> [ 0.346766] kernel_init_freeable+0x7a5/0xd30 >> [ 0.346769] kernel_init+0x1f/0x1f0 >> [ 0.346771] ret_from_fork+0x551/0x720 >> [ 0.346776] ret_from_fork_asm+0x1a/0x30 >> [ 0.346778] </TASK> >> >> and so on... >> >> >> In fact, I previously conducted extensive and prolonged stress testing >> >> on memory profiling. After our efforts to address several WARN cases, >> >> one remaining scenario we are addressing is the warning triggered during >> >> early slab cache reclaim — which is precisely the situation we are currently >> >> encountering (although I cannot guarantee that all edge cases have been >> >> covered by our stress testing). During the stress testing process, this >> warning >> >> did indeed manifest. 
However, the current environment triggers KASAN slab
>>
>> cache reclaim earlier than anticipated.
>>
>>
>> Although the memory allocated prior to page_ext initialization has a relatively low probability of
>>
>> being released in subsequent operations (at least we have not encountered such cases up to now),
>>
>> I remain uncertain whether there are any overlooked edge cases when considering only slab-backed pages.

Hi Suren

> Ok, I guess a specialized solution for slab would not work then. I want
> to check on my side and understand how the number of these early
> allocations scales. Is it higher for bigger machines, or does it stay constant?
> If the latter, I think your original simple solution with some fixups
> can still work. I'll need to instrument my code to capture these early
> allocations and see where they originate. If you have a patch already
> doing that, it would help speed it up for me.
> Thanks,
> Suren.

OK, my V2 patch is as follows:

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index d40ac39bfbe8..bf226c2be2ad 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -74,6 +74,8 @@ static inline void set_codetag_empty(union codetag_ref *ref)
 
 #ifdef CONFIG_MEM_ALLOC_PROFILING
 
+void alloc_tag_add_early_pfn(unsigned long pfn);
+
 #define ALLOC_TAG_SECTION_NAME "alloc_tags"
 
 struct codetag_bytes {
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index 38a82d65e58e..951d33362268 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -181,7 +181,7 @@ static inline struct alloc_tag *__pgalloc_tag_get(struct page *page)
 
 	if (get_page_tag_ref(page, &ref, &handle)) {
 		alloc_tag_sub_check(&ref);
-		if (ref.ct)
+		if (ref.ct && !is_codetag_empty(&ref))
 			tag = ct_to_alloc_tag(ref.ct);
 		put_page_tag_ref(handle);
 	}
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 58991ab09d84..55c134a71cd0 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -6,6 +6,7 @@
 #include <linux/kallsyms.h>
 #include <linux/module.h>
 #include <linux/page_ext.h>
+#include <linux/pgalloc_tag.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_buf.h>
 #include <linux/seq_file.h>
@@ -26,6 +27,86 @@ static bool mem_profiling_support;
 
 static struct codetag_type *alloc_tag_cttype;
 
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+
+/*
+ * page_ext is allocated and initialized relatively late during boot.
+ * Some pages are allocated before page_ext becomes available.
+ * Track these early PFNs and clear their codetag refs later to avoid
+ * warnings when they are freed.
+ */
+
+#define EARLY_ALLOC_PFN_MAX 256
+
+static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX] __initdata;
+static atomic_t early_pfn_count __initdata = ATOMIC_INIT(0);
+
+static void __init __alloc_tag_add_early_pfn(unsigned long pfn)
+{
+	int old_idx, new_idx;
+
+	do {
+		old_idx = atomic_read(&early_pfn_count);
+		if (old_idx >= EARLY_ALLOC_PFN_MAX)
+			return;
+		new_idx = old_idx + 1;
+	} while (!atomic_try_cmpxchg(&early_pfn_count, &old_idx, new_idx));
+
+	early_pfns[old_idx] = pfn;
+}
+
+static void (*alloc_tag_add_early_pfn_ptr)(unsigned long pfn) __refdata =
+		__alloc_tag_add_early_pfn;
+
+void alloc_tag_add_early_pfn(unsigned long pfn)
+{
+	if (static_key_enabled(&mem_profiling_compressed))
+		return;
+
+	if (alloc_tag_add_early_pfn_ptr)
+		alloc_tag_add_early_pfn_ptr(pfn);
+}
+
+static void __init clear_early_alloc_pfn_tag_refs(void)
+{
+	unsigned int i;
+
+	for (i = 0; i < atomic_read(&early_pfn_count); i++) {
+		unsigned long pfn = early_pfns[i];
+
+		if (pfn_valid(pfn)) {
+			struct page *page = pfn_to_page(pfn);
+			union pgtag_ref_handle handle;
+			union codetag_ref ref;
+
+			if (get_page_tag_ref(page, &ref, &handle)) {
+				/*
+				 * An early-allocated page could be freed and reallocated
+				 * after its page_ext is initialized but before we clear it.
+				 * In that case, it already has a valid tag set.
+				 * We should not overwrite that valid tag with CODETAG_EMPTY.
+				 */
+				if (ref.ct) {
+					put_page_tag_ref(handle);
+					continue;
+				}
+
+				set_codetag_empty(&ref);
+				update_page_tag_ref(handle, &ref);
+				put_page_tag_ref(handle);
+			}
+		}
+	}
+
+	atomic_set(&early_pfn_count, 0);
+
+	alloc_tag_add_early_pfn_ptr = NULL;
+}
+#else /* !CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+inline void alloc_tag_add_early_pfn(unsigned long pfn) {}
+static inline void __init clear_early_alloc_pfn_tag_refs(void) {}
+#endif
+
 #ifdef CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU
 DEFINE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
 EXPORT_SYMBOL(_shared_alloc_tag);
@@ -760,6 +841,7 @@ static __init bool need_page_alloc_tagging(void)
 
 static __init void init_page_alloc_tagging(void)
 {
+	clear_early_alloc_pfn_tag_refs();
 }
 
 struct page_ext_operations page_alloc_tagging_ops = {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2d4b6f1a554e..5ce5c4ba401f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1293,6 +1293,12 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
 		alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
 		update_page_tag_ref(handle, &ref);
 		put_page_tag_ref(handle);
+	} else {
+		/*
+		 * page_ext is not available yet, record the pfn so we can
+		 * clear the tag ref later when page_ext is initialized.
+		 */
+		alloc_tag_add_early_pfn(page_to_pfn(page));
 	}
 }
 

The 256-entry array is left unchanged for now, but I will record locally how often each kind of early allocation occurs. Hopefully this will be helpful to you.

Thanks
Hao

>
>>
>> Thanks
>> Hao
>>
>>>> Thanks,
>>>> Suren.
>>>>
>>>>> However, I'm not entirely certain whether SPARSEMEM can guarantee the
>>>>> same behavior.
>>>>>
>>>>>
>>>>>>> I would appreciate your valuable feedback and any better suggestions you
>>>>>>> might have.
>>>>>> Thanks for pursuing this! I'll help in any way I can.
>>>>>> Suren.
>>>>> Thank you so much for your patient guidance and assistance.
>>>>>
>>>>> I truly appreciate your willingness to share your knowledge and insights.
>>>>> >>>>> Thanks, >>>>> Hao >>>>> >>>>>>> Thanks >>>>>>> >>>>>>> Hao >>>>>>> >>>>>>>> Thanks, >>>>>>>> Suren. >>>>>>>> >>>>>>>>> Thanks >>>>>>>>> >>>>>>>>> Hao >>>>>>>>> >>>>>>>>>>> Thanks. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>> If the slab cache has no free objects, it falls back >>>>>>>>>>>>>>> to the buddy allocator to allocate memory. However, at this point page_ext >>>>>>>>>>>>>>> is not yet fully initialized, so these newly allocated pages have no >>>>>>>>>>>>>>> codetag set. These pages may later be reclaimed by KASAN,which causes >>>>>>>>>>>>>>> the warning to trigger when they are freed because their codetag ref is >>>>>>>>>>>>>>> still empty. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Use a global array to track pages allocated before page_ext is fully >>>>>>>>>>>>>>> initialized, similar to how kmemleak tracks early allocations. >>>>>>>>>>>>>>> When page_ext initialization completes, set their codetag >>>>>>>>>>>>>>> to empty to avoid warnings when they are freed later. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ... 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> --- a/include/linux/alloc_tag.h >>>>>>>>>>>>>>> +++ b/include/linux/alloc_tag.h >>>>>>>>>>>>>>> @@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> #ifdef CONFIG_MEM_ALLOC_PROFILING >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> +bool mem_profiling_is_available(void); >>>>>>>>>>>>>>> +void alloc_tag_add_early_pfn(unsigned long pfn); >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> #define ALLOC_TAG_SECTION_NAME "alloc_tags" >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> struct codetag_bytes { >>>>>>>>>>>>>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c >>>>>>>>>>>>>>> index 58991ab09d84..a5bf4e72c154 100644 >>>>>>>>>>>>>>> --- a/lib/alloc_tag.c >>>>>>>>>>>>>>> +++ b/lib/alloc_tag.c >>>>>>>>>>>>>>> @@ -6,6 +6,7 @@ >>>>>>>>>>>>>>> #include <linux/kallsyms.h> >>>>>>>>>>>>>>> #include <linux/module.h> >>>>>>>>>>>>>>> #include <linux/page_ext.h> >>>>>>>>>>>>>>> +#include <linux/pgalloc_tag.h> >>>>>>>>>>>>>>> #include <linux/proc_fs.h> >>>>>>>>>>>>>>> #include <linux/seq_buf.h> >>>>>>>>>>>>>>> #include <linux/seq_file.h> >>>>>>>>>>>>>>> @@ -26,6 +27,82 @@ static bool mem_profiling_support; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> static struct codetag_type *alloc_tag_cttype; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> +/* >>>>>>>>>>>>>>> + * State of the alloc_tag >>>>>>>>>>>>>>> + * >>>>>>>>>>>>>>> + * This is used to describe the states of the alloc_tag during bootup. >>>>>>>>>>>>>>> + * >>>>>>>>>>>>>>> + * When we need to allocate page_ext to store codetag, we face an >>>>>>>>>>>>>>> + * initialization timing problem: >>>>>>>>>>>>>>> + * >>>>>>>>>>>>>>> + * Due to initialization order, pages may be allocated via buddy system >>>>>>>>>>>>>>> + * before page_ext is fully allocated and initialized. Although these >>>>>>>>>>>>>>> + * pages call the allocation hooks, the codetag will not be set because >>>>>>>>>>>>>>> + * page_ext is not yet available. 
>>>>>>>>>>>>>>> + * >>>>>>>>>>>>>>> + * When these pages are later free to the buddy system, it triggers >>>>>>>>>>>>>>> + * warnings because their codetag is actually empty if >>>>>>>>>>>>>>> + * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled. >>>>>>>>>>>>>>> + * >>>>>>>>>>>>>>> + * Additionally, in this situation, we cannot record detailed allocation >>>>>>>>>>>>>>> + * information for these pages. >>>>>>>>>>>>>>> + */ >>>>>>>>>>>>>>> +enum mem_profiling_state { >>>>>>>>>>>>>>> + DOWN, /* No mem_profiling functionality yet */ >>>>>>>>>>>>>>> + UP /* Everything is working */ >>>>>>>>>>>>>>> +}; >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> +static enum mem_profiling_state mem_profiling_state = DOWN; >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> +bool mem_profiling_is_available(void) >>>>>>>>>>>>>>> +{ >>>>>>>>>>>>>>> + return mem_profiling_state == UP; >>>>>>>>>>>>>>> +} >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> +#define EARLY_ALLOC_PFN_MAX 256 >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX]; >>>>>>>>>>>>>> It's unfortunate that this isn't __initdata. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> +static unsigned int early_pfn_count; >>>>>>>>>>>>>>> +static DEFINE_SPINLOCK(early_pfn_lock); >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> --- a/mm/page_alloc.c >>>>>>>>>>>>>>> +++ b/mm/page_alloc.c >>>>>>>>>>>>>>> @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task, >>>>>>>>>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); >>>>>>>>>>>>>>> update_page_tag_ref(handle, &ref); >>>>>>>>>>>>>>> put_page_tag_ref(handle); >>>>>>>>>>>>>>> + } else { >>>>>>>>>>>>> This branch can be marked as "unlikely". >>>>>>>>>>>>> >>>>>>>>>>>>>>> + /* >>>>>>>>>>>>>>> + * page_ext is not available yet, record the pfn so we can >>>>>>>>>>>>>>> + * clear the tag ref later when page_ext is initialized. 
>>>>>>>>>>>>>>> + */ >>>>>>>>>>>>>>> + if (!mem_profiling_is_available()) >>>>>>>>>>>>>>> + alloc_tag_add_early_pfn(page_to_pfn(page)); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> } >>>>>>>>>>>>>> All because of this, I believe. Is this fixable? >>>>>>>>>>>>>> >>>>>>>>>>>>>> If we take that `else', we know we're running in __init code, yes? I >>>>>>>>>>>>>> don't see how `__init pgalloc_tag_add_early()' could be made to work. >>>>>>>>>>>>>> hrm. Something clever, please. >>>>>>>>>>>>> We can have a pointer to a function that is initialized to point to >>>>>>>>>>>>> alloc_tag_add_early_pfn, which is defined as __init and uses >>>>>>>>>>>>> early_pfns which now can be defined as __initdata. After >>>>>>>>>>>>> clear_early_alloc_pfn_tag_refs() is done we reset that pointer to >>>>>>>>>>>>> NULL. __pgalloc_tag_add() instead of calling alloc_tag_add_early_pfn() >>>>>>>>>>>>> directly checks that pointer and if it's not NULL then calls the >>>>>>>>>>>>> function that it points to. This way __pgalloc_tag_add() which is not >>>>>>>>>>>>> an __init function will be invoking alloc_tag_add_early_pfn() __init >>>>>>>>>>>>> function only until we are done with initialization. I haven't tried >>>>>>>>>>>>> this but I think that should work. This also eliminates the need for >>>>>>>>>>>>> mem_profiling_state variable since we can use this function pointer >>>>>>>>>>>>> instead.
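Suren's function-pointer suggestion can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the eventual patch: `EARLY_ALLOC_PFN_MAX`, `early_pfns`, `alloc_tag_add_early_pfn()` and `clear_early_alloc_pfn_tag_refs()` follow the names in the patch, while `early_pfn_hook` and `pgalloc_tag_add_no_ext()` are hypothetical. The `__init`/`__initdata` annotations and `early_pfn_lock` are omitted because the model runs single-threaded; in the kernel the pointer would likely also need `READ_ONCE()`/`WRITE_ONCE()` or the lock, and must be cleared before init memory is freed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model of the function-pointer idea. In the kernel,
 * alloc_tag_add_early_pfn() would be __init and early_pfns[] __initdata;
 * those annotations and early_pfn_lock are omitted (single-threaded).
 */
#define EARLY_ALLOC_PFN_MAX 256

static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX];
static unsigned int early_pfn_count;

/* Would live in .init.text: records a pfn allocated before page_ext. */
static void alloc_tag_add_early_pfn(unsigned long pfn)
{
	if (early_pfn_count < EARLY_ALLOC_PFN_MAX)
		early_pfns[early_pfn_count++] = pfn;
	/* Overflow is silently dropped in this sketch. */
}

/* Hypothetical pointer: non-NULL only until early init is finished. */
static void (*early_pfn_hook)(unsigned long pfn) = alloc_tag_add_early_pfn;

/* Hypothetical stand-in for the "no page_ext" branch of __pgalloc_tag_add(). */
static void pgalloc_tag_add_no_ext(unsigned long pfn)
{
	void (*hook)(unsigned long pfn) = early_pfn_hook;

	if (hook)
		hook(pfn);
}

/*
 * Models clear_early_alloc_pfn_tag_refs(): the kernel version would set
 * each recorded page's codetag_ref to CODETAG_EMPTY before returning.
 * Clearing the hook afterwards means non-__init code can no longer reach
 * __init text or data once init memory is freed.
 */
static unsigned int clear_early_alloc_pfn_tag_refs(void)
{
	unsigned int cleared = early_pfn_count;

	assert(early_pfn_hook != NULL); /* must run exactly once */
	early_pfn_count = 0;
	early_pfn_hook = NULL;
	return cleared;
}
```

The design point is that the hot, non-`__init` allocation path only tests a pointer; once the pointer is NULL it never calls into discarded code, which also makes a separate `mem_profiling_state` variable unnecessary, as Suren notes.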
* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization 2026-03-26 1:44 ` Hao Ge @ 2026-03-26 5:04 ` Suren Baghdasaryan 2026-03-26 5:33 ` Hao Ge 0 siblings, 1 reply; 21+ messages in thread From: Suren Baghdasaryan @ 2026-03-26 5:04 UTC (permalink / raw) To: Hao Ge; +Cc: Andrew Morton, Kent Overstreet, linux-mm, linux-kernel On Wed, Mar 25, 2026 at 6:45 PM Hao Ge <hao.ge@linux.dev> wrote: > > > On 2026/3/25 23:17, Suren Baghdasaryan wrote: > > On Wed, Mar 25, 2026 at 4:21 AM Hao Ge <hao.ge@linux.dev> wrote: > >> > >> On 2026/3/25 15:35, Suren Baghdasaryan wrote: > >>> On Tue, Mar 24, 2026 at 11:25 PM Suren Baghdasaryan <surenb@google.com> wrote: > >>>> On Tue, Mar 24, 2026 at 7:08 PM Hao Ge <hao.ge@linux.dev> wrote: > >>>>> On 2026/3/25 08:21, Suren Baghdasaryan wrote: > >>>>>> On Tue, Mar 24, 2026 at 2:43 AM Hao Ge <hao.ge@linux.dev> wrote: > >>>>>>> On 2026/3/24 06:47, Suren Baghdasaryan wrote: > >>>>>>>> On Mon, Mar 23, 2026 at 2:16 AM Hao Ge <hao.ge@linux.dev> wrote: > >>>>>>>>> On 2026/3/20 10:14, Suren Baghdasaryan wrote: > >>>>>>>>>> On Thu, Mar 19, 2026 at 6:58 PM Hao Ge <hao.ge@linux.dev> wrote: > >>>>>>>>>>> On 2026/3/20 07:48, Suren Baghdasaryan wrote: > >>>>>>>>>>>> On Thu, Mar 19, 2026 at 4:44 PM Suren Baghdasaryan <surenb@google.com> wrote: > >>>>>>>>>>>>> On Thu, Mar 19, 2026 at 3:28 PM Andrew Morton <akpm@linux-foundation.org> wrote: > >>>>>>>>>>>>>> On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Due to initialization ordering, page_ext is allocated and initialized > >>>>>>>>>>>>>>> relatively late during boot. Some pages have already been allocated > >>>>>>>>>>>>>>> and freed before page_ext becomes available, leaving their codetag > >>>>>>>>>>>>>>> uninitialized. > >>>>>>>>>>>>> Hi Hao, > >>>>>>>>>>>>> Thanks for the report. > >>>>>>>>>>>>> Hmm. So, we are allocating pages before page_ext is initialized... 
> >>>>>>>>>>>>> > >>>>>>>>>>>>>>> A clear example is in init_section_page_ext(): alloc_page_ext() calls > >>>>>>>>>>>>>>> kmemleak_alloc(). > >>>>>>>>>>>> Forgot to ask. The example you are using here is for page_ext > >>>>>>>>>>>> allocation itself. Do you have any other examples where page > >>>>>>>>>>>> allocation happens before page_ext initialization? If that's the only > >>>>>>>>>>>> place, then we might be able to fix this in a simpler way by doing > >>>>>>>>>>>> something special for alloc_page_ext(). > >>>>>>>>>>> Hi Suren > >>>>>>>>>>> > >>>>>>>>>>> To help illustrate the point, here's the debug log I added: > >>>>>>>>>>> > >>>>>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >>>>>>>>>>> index 2d4b6f1a554e..ebfe636f5b07 100644 > >>>>>>>>>>> --- a/mm/page_alloc.c > >>>>>>>>>>> +++ b/mm/page_alloc.c > >>>>>>>>>>> @@ -1293,6 +1293,9 @@ void __pgalloc_tag_add(struct page *page, struct > >>>>>>>>>>> task_struct *task, > >>>>>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > >>>>>>>>>>> update_page_tag_ref(handle, &ref); > >>>>>>>>>>> put_page_tag_ref(handle); > >>>>>>>>>>> + } else { > >>>>>>>>>>> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > >>>>>>>>>>> + dump_stack(); > >>>>>>>>>>> } > >>>>>>>>>>> } > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> And I caught the following logs: > >>>>>>>>>>> > >>>>>>>>>>> [ 0.296399] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>>>> page=ffffea000400c700 pfn=1049372 nr=1 > >>>>>>>>>>> [ 0.296400] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >>>>>>>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > >>>>>>>>>>> [ 0.296402] Hardware name: Red Hat KVM, BIOS > >>>>>>>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>>>>>>>>> [ 0.296402] Call Trace: > >>>>>>>>>>> [ 0.296403] <TASK> > >>>>>>>>>>> [ 0.296403] dump_stack_lvl+0x53/0x70 > >>>>>>>>>>> [ 0.296405] __pgalloc_tag_add+0x3a3/0x6e0 > >>>>>>>>>>> [ 0.296406] ? 
__pfx___pgalloc_tag_add+0x10/0x10 > >>>>>>>>>>> [ 0.296407] ? kasan_unpoison+0x27/0x60 > >>>>>>>>>>> [ 0.296409] ? __kasan_unpoison_pages+0x2c/0x40 > >>>>>>>>>>> [ 0.296411] get_page_from_freelist+0xa54/0x1310 > >>>>>>>>>>> [ 0.296413] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>>>>>>>>> [ 0.296415] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > >>>>>>>>>>> [ 0.296417] ? stack_depot_save_flags+0x3f/0x680 > >>>>>>>>>>> [ 0.296418] ? ___slab_alloc+0x518/0x530 > >>>>>>>>>>> [ 0.296420] alloc_pages_mpol+0x13a/0x3f0 > >>>>>>>>>>> [ 0.296421] ? __pfx_alloc_pages_mpol+0x10/0x10 > >>>>>>>>>>> [ 0.296423] ? _raw_spin_lock_irqsave+0x8a/0xf0 > >>>>>>>>>>> [ 0.296424] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 > >>>>>>>>>>> [ 0.296426] alloc_slab_page+0xc2/0x130 > >>>>>>>>>>> [ 0.296427] allocate_slab+0x77/0x2c0 > >>>>>>>>>>> [ 0.296429] ? syscall_enter_define_fields+0x3bb/0x5f0 > >>>>>>>>>>> [ 0.296430] ___slab_alloc+0x125/0x530 > >>>>>>>>>>> [ 0.296432] ? __trace_define_field+0x252/0x3d0 > >>>>>>>>>>> [ 0.296433] __kmalloc_noprof+0x329/0x630 > >>>>>>>>>>> [ 0.296435] ? syscall_enter_define_fields+0x3bb/0x5f0 > >>>>>>>>>>> [ 0.296436] syscall_enter_define_fields+0x3bb/0x5f0 > >>>>>>>>>>> [ 0.296438] ? __pfx_syscall_enter_define_fields+0x10/0x10 > >>>>>>>>>>> [ 0.296440] event_define_fields+0x326/0x540 > >>>>>>>>>>> [ 0.296441] __trace_early_add_events+0xac/0x3c0 > >>>>>>>>>>> [ 0.296443] trace_event_init+0x24c/0x460 > >>>>>>>>>>> [ 0.296445] trace_init+0x9/0x20 > >>>>>>>>>>> [ 0.296446] start_kernel+0x199/0x3c0 > >>>>>>>>>>> [ 0.296448] x86_64_start_reservations+0x18/0x30 > >>>>>>>>>>> [ 0.296449] x86_64_start_kernel+0xe2/0xf0 > >>>>>>>>>>> [ 0.296451] common_startup_64+0x13e/0x141 > >>>>>>>>>>> [ 0.296453] </TASK> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> [ 0.312234] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>>>>>> page=ffffea000400f900 pfn=1049572 nr=1 > >>>>>>>>>>> [ 0.312234] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >>>>>>>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > >>>>>>>>>>> [ 0.312236] Hardware name: Red Hat KVM, BIOS > >>>>>>>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>>>>>>>>> [ 0.312236] Call Trace: > >>>>>>>>>>> [ 0.312237] <TASK> > >>>>>>>>>>> [ 0.312237] dump_stack_lvl+0x53/0x70 > >>>>>>>>>>> [ 0.312239] __pgalloc_tag_add+0x3a3/0x6e0 > >>>>>>>>>>> [ 0.312240] ? __pfx___pgalloc_tag_add+0x10/0x10 > >>>>>>>>>>> [ 0.312241] ? rmqueue.constprop.0+0x4fc/0x1ce0 > >>>>>>>>>>> [ 0.312243] ? kasan_unpoison+0x27/0x60 > >>>>>>>>>>> [ 0.312244] ? __kasan_unpoison_pages+0x2c/0x40 > >>>>>>>>>>> [ 0.312246] get_page_from_freelist+0xa54/0x1310 > >>>>>>>>>>> [ 0.312248] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>>>>>>>>> [ 0.312250] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > >>>>>>>>>>> [ 0.312253] alloc_slab_page+0x39/0x130 > >>>>>>>>>>> [ 0.312254] allocate_slab+0x77/0x2c0 > >>>>>>>>>>> [ 0.312255] ? alloc_cpumask_var_node+0xc7/0x230 > >>>>>>>>>>> [ 0.312257] ___slab_alloc+0x46d/0x530 > >>>>>>>>>>> [ 0.312259] __kmalloc_node_noprof+0x2fa/0x680 > >>>>>>>>>>> [ 0.312261] ? alloc_cpumask_var_node+0xc7/0x230 > >>>>>>>>>>> [ 0.312263] alloc_cpumask_var_node+0xc7/0x230 > >>>>>>>>>>> [ 0.312264] init_desc+0x141/0x6b0 > >>>>>>>>>>> [ 0.312266] alloc_desc+0x108/0x1b0 > >>>>>>>>>>> [ 0.312267] early_irq_init+0xee/0x1c0 > >>>>>>>>>>> [ 0.312268] ? __pfx_early_irq_init+0x10/0x10 > >>>>>>>>>>> [ 0.312271] start_kernel+0x1ab/0x3c0 > >>>>>>>>>>> [ 0.312272] x86_64_start_reservations+0x18/0x30 > >>>>>>>>>>> [ 0.312274] x86_64_start_kernel+0xe2/0xf0 > >>>>>>>>>>> [ 0.312275] common_startup_64+0x13e/0x141 > >>>>>>>>>>> [ 0.312277] </TASK> > >>>>>>>>>>> > >>>>>>>>>>> [ 0.312834] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>>>>>> page=ffffea000400fc00 pfn=1049584 nr=1 > >>>>>>>>>>> [ 0.312835] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >>>>>>>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > >>>>>>>>>>> [ 0.312836] Hardware name: Red Hat KVM, BIOS > >>>>>>>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>>>>>>>>> [ 0.312837] Call Trace: > >>>>>>>>>>> [ 0.312837] <TASK> > >>>>>>>>>>> [ 0.312838] dump_stack_lvl+0x53/0x70 > >>>>>>>>>>> [ 0.312840] __pgalloc_tag_add+0x3a3/0x6e0 > >>>>>>>>>>> [ 0.312841] ? __pfx___pgalloc_tag_add+0x10/0x10 > >>>>>>>>>>> [ 0.312842] ? rmqueue.constprop.0+0x4fc/0x1ce0 > >>>>>>>>>>> [ 0.312844] ? kasan_unpoison+0x27/0x60 > >>>>>>>>>>> [ 0.312845] ? __kasan_unpoison_pages+0x2c/0x40 > >>>>>>>>>>> [ 0.312847] get_page_from_freelist+0xa54/0x1310 > >>>>>>>>>>> [ 0.312849] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>>>>>>>>> [ 0.312851] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > >>>>>>>>>>> [ 0.312853] alloc_pages_mpol+0x13a/0x3f0 > >>>>>>>>>>> [ 0.312855] ? __pfx_alloc_pages_mpol+0x10/0x10 > >>>>>>>>>>> [ 0.312856] ? xas_find+0x2d8/0x450 > >>>>>>>>>>> [ 0.312858] ? _raw_spin_lock+0x84/0xe0 > >>>>>>>>>>> [ 0.312859] ? __pfx__raw_spin_lock+0x10/0x10 > >>>>>>>>>>> [ 0.312861] alloc_pages_noprof+0xf6/0x2b0 > >>>>>>>>>>> [ 0.312862] __change_page_attr+0x293/0x850 > >>>>>>>>>>> [ 0.312864] ? __pfx___change_page_attr+0x10/0x10 > >>>>>>>>>>> [ 0.312865] ? _vm_unmap_aliases+0x2d0/0x650 > >>>>>>>>>>> [ 0.312868] ? __pfx__vm_unmap_aliases+0x10/0x10 > >>>>>>>>>>> [ 0.312869] __change_page_attr_set_clr+0x16c/0x360 > >>>>>>>>>>> [ 0.312871] ? spp_getpage+0xbb/0x1e0 > >>>>>>>>>>> [ 0.312872] change_page_attr_set_clr+0x220/0x3c0 > >>>>>>>>>>> [ 0.312873] ? flush_tlb_one_kernel+0xf/0x30 > >>>>>>>>>>> [ 0.312875] ? set_pte_vaddr_p4d+0x110/0x180 > >>>>>>>>>>> [ 0.312877] ? __pfx_change_page_attr_set_clr+0x10/0x10 > >>>>>>>>>>> [ 0.312878] ? __pfx_set_pte_vaddr_p4d+0x10/0x10 > >>>>>>>>>>> [ 0.312881] ? 
__pfx_mtree_load+0x10/0x10 > >>>>>>>>>>> [ 0.312883] ? __pfx_mtree_load+0x10/0x10 > >>>>>>>>>>> [ 0.312884] ? __asan_memcpy+0x3c/0x60 > >>>>>>>>>>> [ 0.312886] ? set_intr_gate+0x10c/0x150 > >>>>>>>>>>> [ 0.312888] set_memory_ro+0x76/0xa0 > >>>>>>>>>>> [ 0.312889] ? __pfx_set_memory_ro+0x10/0x10 > >>>>>>>>>>> [ 0.312891] idt_setup_apic_and_irq_gates+0x2c1/0x390 > >>>>>>>>>>> > >>>>>>>>>>> and more. > >>>>>>>>>> Ok, it's not the only place. Got your point. > >>>>>>>>>> > >>>>>>>>>>> off topic - if we were to handle only alloc_page_ext() specifically, > >>>>>>>>>>> what would be the most straightforward > >>>>>>>>>>> > >>>>>>>>>>> solution in your mind? I'd really appreciate your insight. > >>>>>>>>>> I was thinking if it's the only special case maybe we can handle it > >>>>>>>>>> somehow differently, like we do when we allocate obj_ext vectors for > >>>>>>>>>> slabs using __GFP_NO_OBJ_EXT. I haven't found a good solution yet but > >>>>>>>>>> since it's not a special case we would not be able to use it even if I > >>>>>>>>>> came up with something... > >>>>>>>>>> I think your way is the most straight-forward but please try my > >>>>>>>>>> suggestion to see if we can avoid extra overhead. > >>>>>>>>>> Thanks, > >>>>>>>>>> Suren. > >>>>> Hi Suren > >>>>>>> Hi Suren > >>>>>>> > >>>>>>> > >>>>>>>> Hi Hao, > >>>>>>>> > >>>>>>>>> Hi Suren > >>>>>>>>> > >>>>>>>>> Thank you for your feedback. After re-examining this issue, > >>>>>>>>> > >>>>>>>>> I realize my previous focus was misplaced. > >>>>>>>>> > >>>>>>>>> Upon deeper consideration, I understand that this is not merely a bug, > >>>>>>>>> > >>>>>>>>> but rather a warning that indicates a gap in our memory profiling mechanism. > >>>>>>>>> > >>>>>>>>> Specifically, the current implementation appears to be missing memory > >>>>>>>>> allocation > >>>>>>>>> > >>>>>>>>> tracking during the period between the buddy system allocation and page_ext > >>>>>>>>> > >>>>>>>>> initialization. 
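The gap Hao describes can be reduced to a toy model of the ordering problem. Everything here is hypothetical and only mimics the behavior discussed in the thread: a page allocated while page_ext is unavailable gets no tag, and once page_ext exists, freeing that page finds an unset reference, which is what trips the "alloc_tag was not set" check under CONFIG_MEM_ALLOC_PROFILING_DEBUG. The last helper models the proposed fix of marking recorded early pages as CODETAG_EMPTY.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the ordering gap; none of these names
 * are kernel APIs. REF_UNSET models a zeroed codetag_ref, REF_EMPTY
 * models CODETAG_EMPTY and REF_TAG models a ref pointing at a real tag.
 */
#define NR_PAGES  8
#define REF_UNSET 0UL
#define REF_EMPTY 1UL
#define REF_TAG   2UL

static bool page_ext_ready;               /* has page_ext been initialized? */
static unsigned long page_refs[NR_PAGES]; /* per-page ref stored in page_ext */
static unsigned int not_set_warnings;     /* models "alloc_tag was not set" */

static void model_alloc(unsigned long pfn)
{
	assert(pfn < NR_PAGES);
	if (page_ext_ready)
		page_refs[pfn] = REF_TAG;
	/* else: nowhere to store the tag, so the allocation goes unrecorded */
}

static void model_free(unsigned long pfn)
{
	assert(pfn < NR_PAGES);
	if (!page_ext_ready)
		return;                 /* nothing to check without page_ext */
	if (page_refs[pfn] == REF_UNSET)
		not_set_warnings++;     /* debug check fires on the free path */
	page_refs[pfn] = REF_UNSET;
}

/* The proposed fix: mark a recorded early page as intentionally empty. */
static void model_mark_early_empty(unsigned long pfn)
{
	assert(pfn < NR_PAGES);
	page_refs[pfn] = REF_EMPTY;
}
```

In this model, a page allocated before `page_ext_ready` and freed afterwards produces exactly one warning, while the same page marked empty first produces none. The model also shows the accounting side of the gap: those early bytes were never added to any counter.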
> >>>>>>>>> > >>>>>>>>> This profiling gap means we may not be capturing all relevant memory > >>>>>>>>> allocation > >>>>>>>>> > >>>>>>>>> events during this critical transition phase. > >>>>>>>> Correct, this limitation exists because memory profiling relies on > >>>>>>>> some kernel facilities (page_ext, obj_ext) which might not be > >>>>>>>> initialized yet at the time of allocation. > >>>>>>>> > >>>>>>>>> My approach is to dynamically allocate codetag_ref when get_page_tag_ref > >>>>>>>>> fails, > >>>>>>>>> > >>>>>>>>> and maintain a linked list to track all buddy system allocations that > >>>>>>>>> occur prior to page_ext initialization. > >>>>>>>>> > >>>>>>>>> However, this introduces performance concerns: > >>>>>>>>> > >>>>>>>>> 1. Free Path Overhead: When freeing these pages, we would need to > >>>>>>>>> traverse the entire linked list to locate > >>>>>>>>> > >>>>>>>>> the corresponding codetag_ref, resulting in O(n) lookup complexity > >>>>>>>>> per free operation. > >>>>>>>>> > >>>>>>>>> 2. Initialization Overhead: During init_page_alloc_tagging, iterating > >>>>>>>>> through the linked list to assign codetag_ref to > >>>>>>>>> > >>>>>>>>> page_ext would introduce additional traversal cost. > >>>>>>>>> > >>>>>>>>> If the number of pages is substantial, this could incur significant > >>>>>>>>> overhead. What are your thoughts on this? I look forward to your > >>>>>>>>> suggestions. > >>>>>>>> My thinking is that these early allocations comprise a small portion > >>>>>>>> of overall memory consumed by the system. So, instead of trying to > >>>>>>>> record and handle them in some alternative way, we just accept that > >>>>>>>> some counters might not be exactly accurate and ignore those early > >>>>>>>> allocations. See how the early slab allocations are marked with the > >>>>>>>> CODETAG_FLAG_INACCURATE flag and later reported as inaccurate.
I think > >>>>>>>> that's an acceptable alternative to introducing extra complexity and > >>>>>>>> performance overhead. IOW, the benefits of accounting for these early > >>>>>>>> allocations are low compared to the effort required to account for > >>>>>>>> them. Unless you found a simple and performant way to do that... > >>>>>>> I have been exploring possible solutions to this issue over the past few > >>>>>>> days, > >>>>>>> > >>>>>>> but so far I have not come up with a good approach. > >>>>>>> > >>>>>>> I have counted the number of memory allocations that occur earlier than the > >>>>>>> > >>>>>>> allocation and initialization of our page_ext, and found that there are > >>>>>>> actually > >>>>>>> > >>>>>>> quite a lot of them. > >>>>>> Interesting... I wonder if it's because deferred_struct_pages defers > >>>>>> page_ext initialization. Can you check if setting early_page_ext > >>>>>> reduces or eliminates these allocations before page_ext init cases? > >>>>> Yes, you are correct. In my 8-core 16GB virtual machine, I used a global > >>>>> counter > >>>>> > >>>>> to record these allocations. With early_page_ext enabled, there were 130 > >>>>> allocations > >>>>> > >>>>> before page_ext initialization. Without early_page_ext, there were 802 > >>>>> allocations > >>>>> > >>>>> before page_ext initialization. > >>>>> > >>>>> > >>>>>>> Similarly, I have made the following changes and collected the > >>>>>>> corresponding logs. > >>>>>>> > >>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >>>>>>> index 2d4b6f1a554e..6db65b3d52d3 100644 > >>>>>>> --- a/mm/page_alloc.c > >>>>>>> +++ b/mm/page_alloc.c > >>>>>>> @@ -1293,6 +1293,8 @@ void __pgalloc_tag_add(struct page *page, struct > >>>>>>> task_struct *task, > >>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > >>>>>>> update_page_tag_ref(handle, &ref); > >>>>>>> put_page_tag_ref(handle); > >>>>>>> + } else{ > >>>>>>> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed!
> >>>>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > >>>>>>> } > >>>>>>> } > >>>>>>> > >>>>>>> @@ -1314,6 +1316,8 @@ void __pgalloc_tag_sub(struct page *page, unsigned > >>>>>>> int nr) > >>>>>>> alloc_tag_sub(&ref, PAGE_SIZE * nr); > >>>>>>> update_page_tag_ref(handle, &ref); > >>>>>>> put_page_tag_ref(handle); > >>>>>>> + } else{ > >>>>>>> + pr_warn("__pgalloc_tag_sub: get_page_tag_ref failed! > >>>>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > >>>>>>> } > >>>>>>> } > >>>>>>> > >>>>>>> [ 0.261699] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001000 pfn=1048640 nr=2 > >>>>>>> [ 0.261711] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001100 pfn=1048644 nr=4 > >>>>>>> [ 0.261717] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001200 pfn=1048648 nr=4 > >>>>>>> [ 0.261721] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001300 pfn=1048652 nr=4 > >>>>>>> [ 0.261893] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001080 pfn=1048642 nr=2 > >>>>>>> [ 0.261917] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001400 pfn=1048656 nr=4 > >>>>>>> [ 0.262018] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001500 pfn=1048660 nr=2 > >>>>>>> [ 0.262024] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001600 pfn=1048664 nr=8 > >>>>>>> [ 0.262040] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001580 pfn=1048662 nr=1 > >>>>>>> [ 0.262048] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea00040015c0 pfn=1048663 nr=1 > >>>>>>> [ 0.262056] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001800 pfn=1048672 nr=2 > >>>>>>> [ 0.262064] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001880 pfn=1048674 nr=2 > >>>>>>> [ 0.262078] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>> page=ffffea0004001900 pfn=1048676 nr=2 > >>>>>>> [ 0.262196] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 > >>>>>>> [ 0.262213] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001980 pfn=1048678 nr=2 > >>>>>>> [ 0.262220] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001a00 pfn=1048680 nr=4 > >>>>>>> [ 0.262246] ODEBUG: selftest passed > >>>>>>> [ 0.262268] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001b00 pfn=1048684 nr=1 > >>>>>>> [ 0.262318] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001b40 pfn=1048685 nr=1 > >>>>>>> [ 0.262368] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001b80 pfn=1048686 nr=1 > >>>>>>> [ 0.262418] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001bc0 pfn=1048687 nr=1 > >>>>>>> [ 0.262469] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001c00 pfn=1048688 nr=1 > >>>>>>> [ 0.262519] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001c40 pfn=1048689 nr=1 > >>>>>>> [ 0.262569] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001c80 pfn=1048690 nr=1 > >>>>>>> [ 0.262620] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001cc0 pfn=1048691 nr=1 > >>>>>>> [ 0.262670] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001d00 pfn=1048692 nr=1 > >>>>>>> [ 0.262721] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001d40 pfn=1048693 nr=1 > >>>>>>> [ 0.262771] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001d80 pfn=1048694 nr=1 > >>>>>>> [ 0.262821] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001dc0 pfn=1048695 nr=1 > >>>>>>> [ 0.262871] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001e00 pfn=1048696 nr=1 > >>>>>>> [ 0.262923] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>> page=ffffea0004001e40 pfn=1048697 nr=1 > >>>>>>> [ 0.262974] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001e80 pfn=1048698 nr=1 > >>>>>>> [ 0.263024] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001ec0 pfn=1048699 nr=1 > >>>>>>> [ 0.263074] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001f00 pfn=1048700 nr=1 > >>>>>>> [ 0.263124] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001f40 pfn=1048701 nr=1 > >>>>>>> [ 0.263174] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001f80 pfn=1048702 nr=1 > >>>>>>> [ 0.263224] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004001fc0 pfn=1048703 nr=1 > >>>>>>> [ 0.263275] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004002000 pfn=1048704 nr=1 > >>>>>>> [ 0.263325] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004002040 pfn=1048705 nr=1 > >>>>>>> [ 0.263375] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004002080 pfn=1048706 nr=1 > >>>>>>> [ 0.263427] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004002400 pfn=1048720 nr=16 > >>>>>>> [ 0.263437] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea00040020c0 pfn=1048707 nr=1 > >>>>>>> [ 0.263463] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004002100 pfn=1048708 nr=1 > >>>>>>> [ 0.263465] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004002140 pfn=1048709 nr=1 > >>>>>>> [ 0.263467] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004002180 pfn=1048710 nr=1 > >>>>>>> [ 0.263509] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004002200 pfn=1048712 nr=4 > >>>>>>> [ 0.263512] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004002800 pfn=1048736 nr=8 > >>>>>>> [ 0.263524] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>> page=ffffea00040021c0 pfn=1048711 nr=1 > >>>>>>> [ 0.263536] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004002300 pfn=1048716 nr=1 > >>>>>>> [ 0.263537] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004002340 pfn=1048717 nr=1 > >>>>>>> [ 0.263539] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004002380 pfn=1048718 nr=1 > >>>>>>> [ 0.263604] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004004000 pfn=1048832 nr=128 > >>>>>>> [ 0.263638] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004003000 pfn=1048768 nr=64 > >>>>>>> [ 0.263650] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004002c00 pfn=1048752 nr=16 > >>>>>>> [ 0.263655] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea00040023c0 pfn=1048719 nr=1 > >>>>>>> [ 0.270582] __pgalloc_tag_sub: get_page_tag_ref failed! > >>>>>>> page=ffffea00040023c0 pfn=1048719 nr=1 > >>>>>>> [ 0.270591] ftrace: allocating 52717 entries in 208 pages > >>>>>>> [ 0.270592] ftrace: allocated 208 pages with 3 groups > >>>>>>> [ 0.270620] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004002a00 pfn=1048744 nr=8 > >>>>>>> [ 0.270636] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea00040023c0 pfn=1048719 nr=1 > >>>>>>> [ 0.270643] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006000 pfn=1048960 nr=1 > >>>>>>> [ 0.270649] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006040 pfn=1048961 nr=1 > >>>>>>> [ 0.270658] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004007000 pfn=1049024 nr=64 > >>>>>>> [ 0.270659] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006080 pfn=1048962 nr=2 > >>>>>>> [ 0.270722] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006100 pfn=1048964 nr=1 > >>>>>>> [ 0.270730] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>> page=ffffea0004006140 pfn=1048965 nr=1 > >>>>>>> [ 0.270738] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006180 pfn=1048966 nr=1 > >>>>>>> [ 0.270777] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea00040061c0 pfn=1048967 nr=1 > >>>>>>> [ 0.270786] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006200 pfn=1048968 nr=1 > >>>>>>> [ 0.270792] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006240 pfn=1048969 nr=1 > >>>>>>> [ 0.270833] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006300 pfn=1048972 nr=4 > >>>>>>> [ 0.270891] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006280 pfn=1048970 nr=1 > >>>>>>> [ 0.270980] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea00040062c0 pfn=1048971 nr=1 > >>>>>>> [ 0.271071] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006400 pfn=1048976 nr=1 > >>>>>>> [ 0.271156] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006440 pfn=1048977 nr=1 > >>>>>>> [ 0.271185] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006480 pfn=1048978 nr=2 > >>>>>>> [ 0.271301] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006500 pfn=1048980 nr=1 > >>>>>>> [ 0.271655] Dynamic Preempt: lazy > >>>>>>> [ 0.271662] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006580 pfn=1048982 nr=2 > >>>>>>> [ 0.271752] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006600 pfn=1048984 nr=4 > >>>>>>> [ 0.271762] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004010000 pfn=1049600 nr=4 > >>>>>>> [ 0.271824] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006540 pfn=1048981 nr=1 > >>>>>>> [ 0.271916] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006700 pfn=1048988 nr=2 > >>>>>>> [ 0.271964] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>> page=ffffea0004006780 pfn=1048990 nr=1 > >>>>>>> [ 0.272099] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea00040067c0 pfn=1048991 nr=1 > >>>>>>> [ 0.272138] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006800 pfn=1048992 nr=2 > >>>>>>> [ 0.272144] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006a00 pfn=1049000 nr=8 > >>>>>>> [ 0.272249] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006c00 pfn=1049008 nr=8 > >>>>>>> [ 0.272319] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006880 pfn=1048994 nr=2 > >>>>>>> [ 0.272351] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006900 pfn=1048996 nr=4 > >>>>>>> [ 0.272424] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004006e00 pfn=1049016 nr=8 > >>>>>>> [ 0.272485] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004008000 pfn=1049088 nr=8 > >>>>>>> [ 0.272535] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004008200 pfn=1049096 nr=2 > >>>>>>> [ 0.272600] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004008400 pfn=1049104 nr=8 > >>>>>>> [ 0.272663] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004008300 pfn=1049100 nr=4 > >>>>>>> [ 0.272694] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004008280 pfn=1049098 nr=2 > >>>>>>> [ 0.272708] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004008600 pfn=1049112 nr=8 > >>>>>>> > >>>>>>> [ 0.272924] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004008880 pfn=1049122 nr=2 > >>>>>>> [ 0.272934] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004008900 pfn=1049124 nr=2 > >>>>>>> [ 0.272952] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004008c00 pfn=1049136 nr=4 > >>>>>>> [ 0.273035] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>> page=ffffea0004008980 pfn=1049126 nr=2 > >>>>>>> [ 0.273062] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004008e00 pfn=1049144 nr=8 > >>>>>>> [ 0.273674] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004008d00 pfn=1049140 nr=1 > >>>>>>> [ 0.273884] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004008d80 pfn=1049142 nr=2 > >>>>>>> [ 0.273943] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009000 pfn=1049152 nr=2 > >>>>>>> [ 0.274379] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009080 pfn=1049154 nr=2 > >>>>>>> [ 0.274575] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009200 pfn=1049160 nr=8 > >>>>>>> [ 0.274617] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009100 pfn=1049156 nr=4 > >>>>>>> [ 0.274794] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009400 pfn=1049168 nr=2 > >>>>>>> [ 0.274840] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009480 pfn=1049170 nr=2 > >>>>>>> [ 0.275057] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009500 pfn=1049172 nr=2 > >>>>>>> [ 0.275092] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009580 pfn=1049174 nr=2 > >>>>>>> [ 0.275134] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009600 pfn=1049176 nr=8 > >>>>>>> [ 0.275211] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009800 pfn=1049184 nr=4 > >>>>>>> [ 0.275510] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009900 pfn=1049188 nr=2 > >>>>>>> [ 0.275548] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009980 pfn=1049190 nr=2 > >>>>>>> [ 0.275976] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009a00 pfn=1049192 nr=8 > >>>>>>> [ 0.275987] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>> page=ffffea0004009c00 pfn=1049200 nr=2 > >>>>>>> [ 0.276139] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009c80 pfn=1049202 nr=2 > >>>>>>> [ 0.276152] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004008d40 pfn=1049141 nr=1 > >>>>>>> [ 0.276242] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009d00 pfn=1049204 nr=1 > >>>>>>> [ 0.276358] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009d40 pfn=1049205 nr=1 > >>>>>>> [ 0.276444] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009d80 pfn=1049206 nr=1 > >>>>>>> [ 0.276526] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009dc0 pfn=1049207 nr=1 > >>>>>>> [ 0.276615] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009e00 pfn=1049208 nr=1 > >>>>>>> [ 0.276696] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009e40 pfn=1049209 nr=1 > >>>>>>> [ 0.276792] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009e80 pfn=1049210 nr=1 > >>>>>>> [ 0.276827] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009f00 pfn=1049212 nr=2 > >>>>>>> [ 0.276891] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009ec0 pfn=1049211 nr=1 > >>>>>>> [ 0.276999] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009f80 pfn=1049214 nr=1 > >>>>>>> [ 0.277082] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea0004009fc0 pfn=1049215 nr=1 > >>>>>>> [ 0.277172] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea000400a000 pfn=1049216 nr=1 > >>>>>>> [ 0.277257] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>> page=ffffea000400a040 pfn=1049217 nr=1 > >>>>>>> > >>>>>>> and so on. > >>>>>>> > >>>>>>> > >>>>>>>> I think your earlier patch can effectively detect these early > >>>>>>>> allocations and suppress the warnings. 
We should also mark these > >>>>>>>> allocations with CODETAG_FLAG_INACCURATE. > >>>>>>> Thanks to an excellent AI review, I realized there are issues with > >>>>>>> > >>>>>>> my original patch. One problem is the 256-element array; another > >>>>>> Yes, if there are lots of such allocations, it's not appropriate. > >>>>>> > >>>>>>> is that it involves allocation and free operations — meaning we need > >>>>>>> > >>>>>>> to record entries at __pgalloc_tag_add and remove them at __pgalloc_tag_sub, > >>>>>>> > >>>>>>> which introduces a noticeable overhead. I'm wondering if we can instead > >>>>>>> set a flag > >>>>>>> > >>>>>>> bit in page flags during the early boot stage, which I'll refer to as > >>>>>>> EARLY_ALLOC_FLAGS. > >>>>>>> > >>>>>>> Then, in __pgalloc_tag_sub, we first check for EARLY_ALLOC_FLAGS. If > >>>>>>> set, we clear the > >>>>>>> > >>>>>>> flag and return immediately; otherwise, we perform the actual > >>>>>>> subtraction of the tag count. > >>>>>>> > >>>>>>> This approach seems somewhat similar to the idea behind > >>>>>>> mem_profiling_compressed. > >>>>>> That seems doable but let's first check if we can make page_ext > >>>>>> initialization happen before these allocations. That would be the > >>>>>> ideal path. If it's not possible then we can focus on alternatives > >>>>>> like the one you propose. > >>>>> Yes, the ideal scenario would be to have page_ext initialization > >>>>> complete before > >>>>> > >>>>> these allocations occur. I just did a code walkthrough and found that > >>>>> this resembles > >>>>> > >>>>> the FLATMEM implementation approach - FLATMEM allocates page_ext before > >>>>> the buddy > >>>>> > >>>>> system initialization, so it doesn't seem to encounter the issue we're > >>>>> facing now. > >>>>> > >>>>> https://elixir.bootlin.com/linux/v7.0-rc5/source/mm/mm_init.c#L2707 > >>>> Yes, page_ext_init_flatmem() looks like an interesting option and it > >>>> would not work with sparsemem. 
TBH I would prefer to find a simple > >>>> solution that can identify early init allocations, mark them inaccurate > >>>> and suppress the warning rather than introduce some complex mechanism > >>>> to account for them which would work only in some cases (flatmem). > >>>> With your original approach I think the only real issue is the size of > >>>> the array that might be too small. The other issue you mentioned about > >>>> an allocated page being freed and then re-allocated after page_ext is > >>>> initialized but before clear_page_tag_ref() is called is not really a > >>>> problem. Yes, we will lose that counter's value but it's similar to > >>>> other early allocations which we just treat as inaccurate. We can also > >>>> minimize the possibility of this happening by moving > >>>> clear_page_tag_ref() into init_page_alloc_tagging(). > >>>> > >>>> I don't like the pageflag option you mentioned because it adds an > >>>> extra condition check into __pgalloc_tag_sub() which will be executed > >>>> even after the init stage is over. > >>>> I'll look into this some more tomorrow as it's quite late now. > >> > >> Hi Suren > >> > >> > >>> Just thought of something. Are all these pages allocated by slab? If > >>> so, I think slab does not use page->lru (need to double-check) and we > >>> could add all these pages allocated during early init into a list and > >>> then set their page_ext reference to CODETAG_EMPTY in > >>> init_page_alloc_tagging(). > >> Got your point.
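Suren's list-based suggestion can be modeled in a few lines of userspace C. This is a minimal sketch, not kernel code. `struct fake_page`, its `early_next` link, and the `TAG_NONE`/`TAG_EMPTY` values are all hypothetical: `early_next` stands in for reusing `page->lru`, `TAG_NONE` for a zeroed `codetag_ref`, and `TAG_EMPTY` for `CODETAG_EMPTY`. The sketch assumes every tracked page is still allocated when the list is walked. A page freed before the walk would first have to be unlinked, and the sketch does not handle that case.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model of tracking early pages on an intrusive list.
 * Hypothetical stand-ins: early_next ~ reusing page->lru,
 * TAG_NONE ~ a zeroed codetag_ref, TAG_EMPTY ~ CODETAG_EMPTY.
 */
#define TAG_NONE  0UL
#define TAG_EMPTY 1UL

struct fake_page {
	unsigned long tag;		/* models the page_ext codetag ref */
	struct fake_page *early_next;	/* models the reused list link */
};

static struct fake_page *early_list;

/* Allocation path while page_ext is unavailable: O(1), no fixed cap. */
void track_early_page(struct fake_page *page)
{
	assert(page != NULL);
	page->early_next = early_list;
	early_list = page;
}

/* Models init_page_alloc_tagging(): mark each tracked page, then unlink it. */
unsigned int mark_early_pages_empty(void)
{
	unsigned int n = 0;

	while (early_list) {
		struct fake_page *page = early_list;

		early_list = page->early_next;
		page->early_next = NULL;
		page->tag = TAG_EMPTY;
		n++;
	}
	return n;
}
```

In this model, tracking costs O(1) per allocation, has no fixed capacity, and leaves the free path untouched. The trade-off is that every tracked page needs a list link it is free to reuse. Suren flagged that point as needing double-checking, even for slab pages.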
> >> > >> > >> There will indeed be some non-SLAB memory allocations here, such as the > >> following: > >> > >> > >> CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) > >> [ 0.326607] Hardware name: Red Hat KVM, BIOS > >> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >> [ 0.326608] Call Trace: > >> [ 0.326608] <TASK> > >> [ 0.326609] dump_stack_lvl+0x53/0x70 > >> [ 0.326611] __pgalloc_tag_add+0x407/0x700 > >> [ 0.326616] get_page_from_freelist+0xa54/0x1310 > >> [ 0.326618] __alloc_frozen_pages_noprof+0x206/0x4c0 > >> [ 0.326623] alloc_pages_mpol+0x13a/0x3f0 > >> [ 0.326627] alloc_pages_noprof+0xf6/0x2b0 > >> [ 0.326628] __pmd_alloc+0x743/0x9c0 > >> [ 0.326630] vmap_range_noflush+0xac0/0x10a0 > >> [ 0.326637] ioremap_page_range+0x17c/0x250 > >> [ 0.326639] __ioremap_caller+0x437/0x5c0 > >> [ 0.326645] acpi_os_map_iomem+0x4c0/0x660 > >> [ 0.326647] acpi_tb_verify_temp_table+0x1c0/0x580 > >> [ 0.326649] acpi_reallocate_root_table+0x2ad/0x460 > >> [ 0.326655] acpi_early_init+0x111/0x460 > >> [ 0.326657] start_kernel+0x271/0x3c0 > >> [ 0.326659] x86_64_start_reservations+0x18/0x30 > >> [ 0.326660] x86_64_start_kernel+0xe2/0xf0 > >> [ 0.326662] common_startup_64+0x13e/0x141 > >> [ 0.326663] </TASK> > >> > >> CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted > >> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) > >> [ 0.329167] Hardware name: Red Hat KVM, BIOS > >> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >> [ 0.329167] Call Trace: > >> [ 0.329167] <TASK> > >> [ 0.329167] dump_stack_lvl+0x53/0x70 > >> [ 0.329167] __pgalloc_tag_add+0x407/0x700 > >> [ 0.329167] get_page_from_freelist+0xa54/0x1310 > >> [ 0.329167] __alloc_frozen_pages_noprof+0x206/0x4c0 > >> [ 0.329167] __alloc_pages_noprof+0x10/0x1b0 > >> [ 0.329167] dup_task_struct+0x163/0x8c0 > >> [ 0.329167] copy_process+0x390/0x4a70 > >> [ 0.329167] kernel_clone+0xe1/0x830 > >> [ 0.329167] kernel_thread+0xcb/0x110 > >> [ 0.329167] 
kthreadd+0x8a2/0xc60 > >> [ 0.329167] ret_from_fork+0x551/0x720 > >> [ 0.329167] ret_from_fork_asm+0x1a/0x30 > >> [ 0.329167] </TASK> > >> > >> CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted > >> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) > >> [ 0.329167] Hardware name: Red Hat KVM, BIOS > >> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >> [ 0.329167] Call Trace: > >> [ 0.329167] <TASK> > >> [ 0.329167] dump_stack_lvl+0x53/0x70 > >> [ 0.329167] __pgalloc_tag_add+0x407/0x700 > >> [ 0.329167] get_page_from_freelist+0xa54/0x1310 > >> [ 0.329167] __alloc_frozen_pages_noprof+0x206/0x4c0 > >> [ 0.329167] __alloc_pages_noprof+0x10/0x1b0 > >> [ 0.329167] dup_task_struct+0x163/0x8c0 > >> [ 0.329167] copy_process+0x390/0x4a70 > >> [ 0.329167] kernel_clone+0xe1/0x830 > >> [ 0.329167] kernel_thread+0xcb/0x110 > >> [ 0.329167] kthreadd+0x8a2/0xc60 > >> [ 0.329167] ret_from_fork+0x551/0x720 > >> [ 0.329167] ret_from_fork_asm+0x1a/0x30 > >> [ 0.329167] </TASK> > >> > >> CPU: 4 UID: 0 PID: 1 Comm: swapper/0 Not tainted > >> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) > >> [ 0.434265] Hardware name: Red Hat KVM, BIOS > >> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >> [ 0.434266] Call Trace: > >> [ 0.434266] <TASK> > >> [ 0.434266] dump_stack_lvl+0x53/0x70 > >> [ 0.434268] __pgalloc_tag_add+0x407/0x700 > >> [ 0.434272] get_page_from_freelist+0xa54/0x1310 > >> [ 0.434274] __alloc_frozen_pages_noprof+0x206/0x4c0 > >> [ 0.434279] alloc_pages_exact_nid_noprof+0x10f/0x380 > >> [ 0.434283] init_section_page_ext+0x167/0x370 > >> [ 0.434284] page_ext_init+0x451/0x620 > >> [ 0.434287] page_alloc_init_late+0x553/0x630 > >> [ 0.434290] kernel_init_freeable+0x7be/0xd30 > >> [ 0.434294] kernel_init+0x1f/0x1f0 > >> [ 0.434295] ret_from_fork+0x551/0x720 > >> [ 0.434301] ret_from_fork_asm+0x1a/0x30 > >> [ 0.434303] </TASK> > >> > >> CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted > >> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) > >> [ 
0.346712] Hardware name: Red Hat KVM, BIOS > >> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >> [ 0.346713] Call Trace: > >> [ 0.346713] <TASK> > >> [ 0.346714] dump_stack_lvl+0x53/0x70 > >> [ 0.346715] __pgalloc_tag_add+0x407/0x700 > >> [ 0.346720] get_page_from_freelist+0xa54/0x1310 > >> [ 0.346723] __alloc_frozen_pages_noprof+0x206/0x4c0 > >> [ 0.346729] __alloc_pages_noprof+0x10/0x1b0 > >> [ 0.346731] alloc_cpu_data+0x96/0x210 > >> [ 0.346732] rb_allocate_cpu_buffer+0xb93/0x1500 > >> [ 0.346739] trace_rb_cpu_prepare+0x21a/0x4f0 > >> [ 0.346753] cpuhp_invoke_callback+0x6db/0x14b0 > >> [ 0.346755] __cpuhp_invoke_callback_range+0xde/0x1d0 > >> [ 0.346759] _cpu_up+0x395/0x880 > >> [ 0.346761] cpu_up+0x1bb/0x210 > >> [ 0.346762] cpuhp_bringup_mask+0xd2/0x150 > >> [ 0.346763] bringup_nonboot_cpus+0x12b/0x170 > >> [ 0.346764] smp_init+0x2f/0x100 > >> [ 0.346766] kernel_init_freeable+0x7a5/0xd30 > >> [ 0.346769] kernel_init+0x1f/0x1f0 > >> [ 0.346771] ret_from_fork+0x551/0x720 > >> [ 0.346776] ret_from_fork_asm+0x1a/0x30 > >> [ 0.346778] </TASK> > >> > >> and so on... > >> > >> > >> In fact, I previously conducted extensive and prolonged stress testing > >> > >> on memory profiling. After our efforts to address several WARN cases, > >> > >> one remaining scenario we are addressing is the warning triggered during > >> > >> early slab cache reclaim — which is precisely the situation we are currently > >> > >> encountering (although I cannot guarantee that all edge cases have been > >> > >> covered by our stress testing). During the stress testing process, this > >> warning > >> > >> did indeed manifest. However, the current environment triggers KASAN slab > >> > >> cache reclaim earlier than anticipated. 
> >> > >> > >> Although the memory allocated prior to page_ext initialization has a > >> relatively low probability of > >> > >> being released in subsequent operations (at least we have not > >> encountered such cases up to now), > >> > >> I remain uncertain whether there are any overlooked edge cases when > >> considering only slab-backed pages. > > Hi Suren > > > > Ok, I guess specialized solution for slab would not work then. I want > > to check on my side and understand how the number of these early > > allocation scales. Is it higher for bigger machines or stays constant. > > If the latter I think your original simple solution with some fixups > > can still work. I'll need to instrument my code to capture these early > > allocations and see where they originate. If you have a patch already > > doing that it would help speed it up for me. > > Thanks, > > Suren. > > OK, my V2 patch is as follows: Thanks! I'll go over it but first I need to check if the number of early allocations is constant or dependent on some factors like machine size (as I mentioned before). I hope to carve out some time to investigate that this Friday. We should also probably start a separate thread for this v2 as this email thread is getting painfully long. 
>
>
> diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
> index d40ac39bfbe8..bf226c2be2ad 100644
> --- a/include/linux/alloc_tag.h
> +++ b/include/linux/alloc_tag.h
> @@ -74,6 +74,8 @@ static inline void set_codetag_empty(union codetag_ref *ref)
>
>  #ifdef CONFIG_MEM_ALLOC_PROFILING
>
> +void alloc_tag_add_early_pfn(unsigned long pfn);
> +
>  #define ALLOC_TAG_SECTION_NAME "alloc_tags"
>
>  struct codetag_bytes {
> diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
> index 38a82d65e58e..951d33362268 100644
> --- a/include/linux/pgalloc_tag.h
> +++ b/include/linux/pgalloc_tag.h
> @@ -181,7 +181,7 @@ static inline struct alloc_tag *__pgalloc_tag_get(struct page *page)
>
>  	if (get_page_tag_ref(page, &ref, &handle)) {
>  		alloc_tag_sub_check(&ref);
> -		if (ref.ct)
> +		if (ref.ct && !is_codetag_empty(&ref))
>  			tag = ct_to_alloc_tag(ref.ct);
>  		put_page_tag_ref(handle);
>  	}
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index 58991ab09d84..55c134a71cd0 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -6,6 +7,7 @@
>  #include <linux/kallsyms.h>
>  #include <linux/module.h>
>  #include <linux/page_ext.h>
> +#include <linux/pgalloc_tag.h>
>  #include <linux/proc_fs.h>
>  #include <linux/seq_buf.h>
>  #include <linux/seq_file.h>
> @@ -26,6 +27,86 @@ static bool mem_profiling_support;
>
>  static struct codetag_type *alloc_tag_cttype;
>
> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
> +
> +/*
> + * page_ext is allocated and initialized relatively late during boot.
> + * Some pages are allocated before page_ext becomes available.
> + * Track these early PFNs and clear their codetag refs later to avoid
> + * warnings when they are freed.
> + */
> +
> +#define EARLY_ALLOC_PFN_MAX 256
> +
> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX] __initdata;
> +static atomic_t early_pfn_count __initdata = ATOMIC_INIT(0);
> +
> +static void __init __alloc_tag_add_early_pfn(unsigned long pfn)
> +{
> +	int old_idx, new_idx;
> +
> +	do {
> +		old_idx = atomic_read(&early_pfn_count);
> +		if (old_idx >= EARLY_ALLOC_PFN_MAX)
> +			return;
> +		new_idx = old_idx + 1;
> +	} while (!atomic_try_cmpxchg(&early_pfn_count, &old_idx, new_idx));
> +
> +	early_pfns[old_idx] = pfn;
> +}
> +
> +static void (*alloc_tag_add_early_pfn_ptr)(unsigned long pfn) __refdata =
> +	__alloc_tag_add_early_pfn;
> +
> +void alloc_tag_add_early_pfn(unsigned long pfn)
> +{
> +	if (static_key_enabled(&mem_profiling_compressed))
> +		return;
> +
> +	if (alloc_tag_add_early_pfn_ptr)
> +		alloc_tag_add_early_pfn_ptr(pfn);
> +}
> +
> +static void __init clear_early_alloc_pfn_tag_refs(void)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < atomic_read(&early_pfn_count); i++) {
> +		unsigned long pfn = early_pfns[i];
> +
> +		if (pfn_valid(pfn)) {
> +			struct page *page = pfn_to_page(pfn);
> +			union pgtag_ref_handle handle;
> +			union codetag_ref ref;
> +
> +			if (get_page_tag_ref(page, &ref, &handle)) {
> +				/*
> +				 * An early-allocated page could be freed and reallocated
> +				 * after its page_ext is initialized but before we clear it.
> +				 * In that case, it already has a valid tag set.
> +				 * We should not overwrite that valid tag with CODETAG_EMPTY.
> +				 */
> +				if (ref.ct) {
> +					put_page_tag_ref(handle);
> +					continue;
> +				}
> +
> +				set_codetag_empty(&ref);
> +				update_page_tag_ref(handle, &ref);
> +				put_page_tag_ref(handle);
> +			}
> +		}
> +	}
> +
> +	atomic_set(&early_pfn_count, 0);
> +
> +	alloc_tag_add_early_pfn_ptr = NULL;
> +}
> +#else /* !CONFIG_MEM_ALLOC_PROFILING_DEBUG */
> +inline void alloc_tag_add_early_pfn(unsigned long pfn) {}
> +static inline void __init clear_early_alloc_pfn_tag_refs(void) {}
> +#endif
> +
>  #ifdef CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU
>  DEFINE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
>  EXPORT_SYMBOL(_shared_alloc_tag);
> @@ -760,6 +841,7 @@ static __init bool need_page_alloc_tagging(void)
>
>  static __init void init_page_alloc_tagging(void)
>  {
> +	clear_early_alloc_pfn_tag_refs();
>  }
>
>  struct page_ext_operations page_alloc_tagging_ops = {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2d4b6f1a554e..5ce5c4ba401f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1293,6 +1293,12 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
>  		alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
>  		update_page_tag_ref(handle, &ref);
>  		put_page_tag_ref(handle);
> +	} else {
> +		/*
> +		 * page_ext is not available yet, record the pfn so we can
> +		 * clear the tag ref later when page_ext is initialized.
> +		 */
> +		alloc_tag_add_early_pfn(page_to_pfn(page));
>  	}
>  }
>
> Although this 256-entry array remains unmodified for now, I will locally > record the occurrence counts > > of these various early memory allocations. Hopefully this will be > helpful to you. > > > Thanks > > Hao > > > > >> > >> Thanks > >> Hao > >> > >>>> Thanks, > >>>> Suren. > >>>> > >>>>> However, I'm not entirely certain whether SPARSEMEM can guarantee the > >>>>> same behavior. > >>>>> > >>>>> > >>>>>>> I would appreciate your valuable feedback and any better suggestions you > >>>>>>> might have. > >>>>>> Thanks for pursuing this! I'll help in any way I can. > >>>>>> Suren.
> >>>>> Thank you so much for your patient guidance and assistance. > >>>>> > >>>>> I truly appreciate your willingness to share your knowledge and insights. > >>>>> > >>>>> Thanks, > >>>>> Hao > >>>>> > >>>>>>> Thanks > >>>>>>> > >>>>>>> Hao > >>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Suren. > >>>>>>>> > >>>>>>>>> Thanks > >>>>>>>>> > >>>>>>>>> Hao > >>>>>>>>> > >>>>>>>>>>> Thanks. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>>>>> If the slab cache has no free objects, it falls back > >>>>>>>>>>>>>>> to the buddy allocator to allocate memory. However, at this point page_ext > >>>>>>>>>>>>>>> is not yet fully initialized, so these newly allocated pages have no > >>>>>>>>>>>>>>> codetag set. These pages may later be reclaimed by KASAN,which causes > >>>>>>>>>>>>>>> the warning to trigger when they are freed because their codetag ref is > >>>>>>>>>>>>>>> still empty. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Use a global array to track pages allocated before page_ext is fully > >>>>>>>>>>>>>>> initialized, similar to how kmemleak tracks early allocations. > >>>>>>>>>>>>>>> When page_ext initialization completes, set their codetag > >>>>>>>>>>>>>>> to empty to avoid warnings when they are freed later. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> ... 
> >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> --- a/include/linux/alloc_tag.h > >>>>>>>>>>>>>>> +++ b/include/linux/alloc_tag.h > >>>>>>>>>>>>>>> @@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref) > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> #ifdef CONFIG_MEM_ALLOC_PROFILING > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> +bool mem_profiling_is_available(void); > >>>>>>>>>>>>>>> +void alloc_tag_add_early_pfn(unsigned long pfn); > >>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>> #define ALLOC_TAG_SECTION_NAME "alloc_tags" > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> struct codetag_bytes { > >>>>>>>>>>>>>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c > >>>>>>>>>>>>>>> index 58991ab09d84..a5bf4e72c154 100644 > >>>>>>>>>>>>>>> --- a/lib/alloc_tag.c > >>>>>>>>>>>>>>> +++ b/lib/alloc_tag.c > >>>>>>>>>>>>>>> @@ -6,6 +6,7 @@ > >>>>>>>>>>>>>>> #include <linux/kallsyms.h> > >>>>>>>>>>>>>>> #include <linux/module.h> > >>>>>>>>>>>>>>> #include <linux/page_ext.h> > >>>>>>>>>>>>>>> +#include <linux/pgalloc_tag.h> > >>>>>>>>>>>>>>> #include <linux/proc_fs.h> > >>>>>>>>>>>>>>> #include <linux/seq_buf.h> > >>>>>>>>>>>>>>> #include <linux/seq_file.h> > >>>>>>>>>>>>>>> @@ -26,6 +27,82 @@ static bool mem_profiling_support; > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> static struct codetag_type *alloc_tag_cttype; > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> +/* > >>>>>>>>>>>>>>> + * State of the alloc_tag > >>>>>>>>>>>>>>> + * > >>>>>>>>>>>>>>> + * This is used to describe the states of the alloc_tag during bootup. > >>>>>>>>>>>>>>> + * > >>>>>>>>>>>>>>> + * When we need to allocate page_ext to store codetag, we face an > >>>>>>>>>>>>>>> + * initialization timing problem: > >>>>>>>>>>>>>>> + * > >>>>>>>>>>>>>>> + * Due to initialization order, pages may be allocated via buddy system > >>>>>>>>>>>>>>> + * before page_ext is fully allocated and initialized. Although these > >>>>>>>>>>>>>>> + * pages call the allocation hooks, the codetag will not be set because > >>>>>>>>>>>>>>> + * page_ext is not yet available. 
> >>>>>>>>>>>>>>> + * > >>>>>>>>>>>>>>> + * When these pages are later free to the buddy system, it triggers > >>>>>>>>>>>>>>> + * warnings because their codetag is actually empty if > >>>>>>>>>>>>>>> + * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled. > >>>>>>>>>>>>>>> + * > >>>>>>>>>>>>>>> + * Additionally, in this situation, we cannot record detailed allocation > >>>>>>>>>>>>>>> + * information for these pages. > >>>>>>>>>>>>>>> + */ > >>>>>>>>>>>>>>> +enum mem_profiling_state { > >>>>>>>>>>>>>>> + DOWN, /* No mem_profiling functionality yet */ > >>>>>>>>>>>>>>> + UP /* Everything is working */ > >>>>>>>>>>>>>>> +}; > >>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>> +static enum mem_profiling_state mem_profiling_state = DOWN; > >>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>> +bool mem_profiling_is_available(void) > >>>>>>>>>>>>>>> +{ > >>>>>>>>>>>>>>> + return mem_profiling_state == UP; > >>>>>>>>>>>>>>> +} > >>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG > >>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>> +#define EARLY_ALLOC_PFN_MAX 256 > >>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX]; > >>>>>>>>>>>>>> It's unfortunate that this isn't __initdata. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> +static unsigned int early_pfn_count; > >>>>>>>>>>>>>>> +static DEFINE_SPINLOCK(early_pfn_lock); > >>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> ... > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> --- a/mm/page_alloc.c > >>>>>>>>>>>>>>> +++ b/mm/page_alloc.c > >>>>>>>>>>>>>>> @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task, > >>>>>>>>>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > >>>>>>>>>>>>>>> update_page_tag_ref(handle, &ref); > >>>>>>>>>>>>>>> put_page_tag_ref(handle); > >>>>>>>>>>>>>>> + } else { > >>>>>>>>>>>>> This branch can be marked as "unlikely". 
> >>>>>>>>>>>>> > >>>>>>>>>>>>>>> + /* > >>>>>>>>>>>>>>> + * page_ext is not available yet, record the pfn so we can > >>>>>>>>>>>>>>> + * clear the tag ref later when page_ext is initialized. > >>>>>>>>>>>>>>> + */ > >>>>>>>>>>>>>>> + if (!mem_profiling_is_available()) > >>>>>>>>>>>>>>> + alloc_tag_add_early_pfn(page_to_pfn(page)); > >>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>> } > >>>>>>>>>>>>>> All because of this, I believe. Is this fixable? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> If we take that `else', we know we're running in __init code, yes? I > >>>>>>>>>>>>>> don't see how `__init pgalloc_tag_add_early()' could be made to work. > >>>>>>>>>>>>>> hrm. Something clever, please. > >>>>>>>>>>>>> We can have a pointer to a function that is initialized to point to > >>>>>>>>>>>>> alloc_tag_add_early_pfn, which is defined as __init and uses > >>>>>>>>>>>>> early_pfns which now can be defined as __initdata. After > >>>>>>>>>>>>> clear_early_alloc_pfn_tag_refs() is done we reset that pointer to > >>>>>>>>>>>>> NULL. __pgalloc_tag_add() instead of calling alloc_tag_add_early_pfn() > >>>>>>>>>>>>> directly checks that pointer and if it's not NULL then calls the > >>>>>>>>>>>>> function that it points to. This way __pgalloc_tag_add() which is not > >>>>>>>>>>>>> an __init function will be invoking alloc_tag_add_early_pfn() __init > >>>>>>>>>>>>> function only until we are done with initialization. I haven't tried > >>>>>>>>>>>>> this but I think that should work. This also eliminates the need for > >>>>>>>>>>>>> mem_profiling_state variable since we can use this function pointer > >>>>>>>>>>>>> instead. > >>>>>>>>>>>>>
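The function-pointer scheme Suren describes can be illustrated with a small userspace model. The kernel annotations (`__init`, `__initdata`, `__refdata`) have no userspace equivalent, so they appear only in comments. The names `record_early_pfn()`, `early_pfn_hook`, `maybe_record_early_pfn()`, `finish_early_tracking()` and `pending_early_pfns()` are hypothetical. The always-present path calls through the pointer only while it is non-NULL, so once the pointer is cleared the init-only recorder is never reached again.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model of the "pointer to an __init-only helper" idea.
 * In the kernel the recorder and its array would be __init/__initdata
 * and the pointer __refdata; here they are ordinary statics.
 * All names are hypothetical.
 */
#define EARLY_MAX 256

static unsigned long early_pfns[EARLY_MAX];
static unsigned int early_count;

static void record_early_pfn(unsigned long pfn);

/* Points at the init-only recorder until early tracking is finished. */
static void (*early_pfn_hook)(unsigned long pfn) = record_early_pfn;

/* Models the __init helper: it must only run while the hook is armed. */
static void record_early_pfn(unsigned long pfn)
{
	assert(early_pfn_hook == record_early_pfn);
	if (early_count < EARLY_MAX)
		early_pfns[early_count++] = pfn;
}

/* Models the NULL check in the always-present allocation path. */
void maybe_record_early_pfn(unsigned long pfn)
{
	if (early_pfn_hook)
		early_pfn_hook(pfn);
}

/*
 * Models the end of clear_early_alloc_pfn_tag_refs(): consume the recorded
 * PFNs (here they are only counted), then disarm the hook.
 */
unsigned int finish_early_tracking(void)
{
	unsigned int n = early_count;

	early_count = 0;
	early_pfn_hook = NULL;
	return n;
}

unsigned int pending_early_pfns(void)
{
	return early_count;
}
```

For example, if `maybe_record_early_pfn()` is called twice before `finish_early_tracking()`, both PFNs are recorded and the finish call returns 2. A call made afterwards is silently ignored. The same NULL check also replaces a separate state variable such as `mem_profiling_state`, which is the simplification Suren points out.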
* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization 2026-03-26 5:04 ` Suren Baghdasaryan @ 2026-03-26 5:33 ` Hao Ge 2026-03-26 8:23 ` Suren Baghdasaryan 0 siblings, 1 reply; 21+ messages in thread From: Hao Ge @ 2026-03-26 5:33 UTC (permalink / raw) To: Suren Baghdasaryan; +Cc: Andrew Morton, Kent Overstreet, linux-mm, linux-kernel On 2026/3/26 13:04, Suren Baghdasaryan wrote: > On Wed, Mar 25, 2026 at 6:45 PM Hao Ge <hao.ge@linux.dev> wrote: >> >> On 2026/3/25 23:17, Suren Baghdasaryan wrote: >>> On Wed, Mar 25, 2026 at 4:21 AM Hao Ge <hao.ge@linux.dev> wrote: >>>> On 2026/3/25 15:35, Suren Baghdasaryan wrote: >>>>> On Tue, Mar 24, 2026 at 11:25 PM Suren Baghdasaryan <surenb@google.com> wrote: >>>>>> On Tue, Mar 24, 2026 at 7:08 PM Hao Ge <hao.ge@linux.dev> wrote: >>>>>>> On 2026/3/25 08:21, Suren Baghdasaryan wrote: >>>>>>>> On Tue, Mar 24, 2026 at 2:43 AM Hao Ge <hao.ge@linux.dev> wrote: >>>>>>>>> On 2026/3/24 06:47, Suren Baghdasaryan wrote: >>>>>>>>>> On Mon, Mar 23, 2026 at 2:16 AM Hao Ge <hao.ge@linux.dev> wrote: >>>>>>>>>>> On 2026/3/20 10:14, Suren Baghdasaryan wrote: >>>>>>>>>>>> On Thu, Mar 19, 2026 at 6:58 PM Hao Ge <hao.ge@linux.dev> wrote: >>>>>>>>>>>>> On 2026/3/20 07:48, Suren Baghdasaryan wrote: >>>>>>>>>>>>>> On Thu, Mar 19, 2026 at 4:44 PM Suren Baghdasaryan <surenb@google.com> wrote: >>>>>>>>>>>>>>> On Thu, Mar 19, 2026 at 3:28 PM Andrew Morton <akpm@linux-foundation.org> wrote: >>>>>>>>>>>>>>>> On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Due to initialization ordering, page_ext is allocated and initialized >>>>>>>>>>>>>>>>> relatively late during boot. Some pages have already been allocated >>>>>>>>>>>>>>>>> and freed before page_ext becomes available, leaving their codetag >>>>>>>>>>>>>>>>> uninitialized. >>>>>>>>>>>>>>> Hi Hao, >>>>>>>>>>>>>>> Thanks for the report. >>>>>>>>>>>>>>> Hmm. 
So, we are allocating pages before page_ext is initialized... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> A clear example is in init_section_page_ext(): alloc_page_ext() calls >>>>>>>>>>>>>>>>> kmemleak_alloc(). >>>>>>>>>>>>>> Forgot to ask. The example you are using here is for page_ext >>>>>>>>>>>>>> allocation itself. Do you have any other examples where page >>>>>>>>>>>>>> allocation happens before page_ext initialization? If that's the only >>>>>>>>>>>>>> place, then we might be able to fix this in a simpler way by doing >>>>>>>>>>>>>> something special for alloc_page_ext(). >>>>>>>>>>>>> Hi Suren >>>>>>>>>>>>> >>>>>>>>>>>>> To help illustrate the point, here's the debug log I added: >>>>>>>>>>>>> >>>>>>>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>>>>>>>>>>> index 2d4b6f1a554e..ebfe636f5b07 100644 >>>>>>>>>>>>> --- a/mm/page_alloc.c >>>>>>>>>>>>> +++ b/mm/page_alloc.c >>>>>>>>>>>>> @@ -1293,6 +1293,9 @@ void __pgalloc_tag_add(struct page *page, struct >>>>>>>>>>>>> task_struct *task, >>>>>>>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); >>>>>>>>>>>>> update_page_tag_ref(handle, &ref); >>>>>>>>>>>>> put_page_tag_ref(handle); >>>>>>>>>>>>> + } else { >>>>>>>>>>>>> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>>>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); >>>>>>>>>>>>> + dump_stack(); >>>>>>>>>>>>> } >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> And I caught the following logs: >>>>>>>>>>>>> >>>>>>>>>>>>> [ 0.296399] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>>>>>>>>>> page=ffffea000400c700 pfn=1049372 nr=1 >>>>>>>>>>>>> [ 0.296400] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >>>>>>>>>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) >>>>>>>>>>>>> [ 0.296402] Hardware name: Red Hat KVM, BIOS >>>>>>>>>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>>>>>>>>>>> [ 0.296402] Call Trace: >>>>>>>>>>>>> [ 0.296403] <TASK> >>>>>>>>>>>>> [ 0.296403] dump_stack_lvl+0x53/0x70 >>>>>>>>>>>>> [ 0.296405] __pgalloc_tag_add+0x3a3/0x6e0 >>>>>>>>>>>>> [ 0.296406] ? __pfx___pgalloc_tag_add+0x10/0x10 >>>>>>>>>>>>> [ 0.296407] ? kasan_unpoison+0x27/0x60 >>>>>>>>>>>>> [ 0.296409] ? __kasan_unpoison_pages+0x2c/0x40 >>>>>>>>>>>>> [ 0.296411] get_page_from_freelist+0xa54/0x1310 >>>>>>>>>>>>> [ 0.296413] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>>>>>>>>>>> [ 0.296415] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 >>>>>>>>>>>>> [ 0.296417] ? stack_depot_save_flags+0x3f/0x680 >>>>>>>>>>>>> [ 0.296418] ? ___slab_alloc+0x518/0x530 >>>>>>>>>>>>> [ 0.296420] alloc_pages_mpol+0x13a/0x3f0 >>>>>>>>>>>>> [ 0.296421] ? __pfx_alloc_pages_mpol+0x10/0x10 >>>>>>>>>>>>> [ 0.296423] ? _raw_spin_lock_irqsave+0x8a/0xf0 >>>>>>>>>>>>> [ 0.296424] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 >>>>>>>>>>>>> [ 0.296426] alloc_slab_page+0xc2/0x130 >>>>>>>>>>>>> [ 0.296427] allocate_slab+0x77/0x2c0 >>>>>>>>>>>>> [ 0.296429] ? syscall_enter_define_fields+0x3bb/0x5f0 >>>>>>>>>>>>> [ 0.296430] ___slab_alloc+0x125/0x530 >>>>>>>>>>>>> [ 0.296432] ? __trace_define_field+0x252/0x3d0 >>>>>>>>>>>>> [ 0.296433] __kmalloc_noprof+0x329/0x630 >>>>>>>>>>>>> [ 0.296435] ? syscall_enter_define_fields+0x3bb/0x5f0 >>>>>>>>>>>>> [ 0.296436] syscall_enter_define_fields+0x3bb/0x5f0 >>>>>>>>>>>>> [ 0.296438] ? 
__pfx_syscall_enter_define_fields+0x10/0x10 >>>>>>>>>>>>> [ 0.296440] event_define_fields+0x326/0x540 >>>>>>>>>>>>> [ 0.296441] __trace_early_add_events+0xac/0x3c0 >>>>>>>>>>>>> [ 0.296443] trace_event_init+0x24c/0x460 >>>>>>>>>>>>> [ 0.296445] trace_init+0x9/0x20 >>>>>>>>>>>>> [ 0.296446] start_kernel+0x199/0x3c0 >>>>>>>>>>>>> [ 0.296448] x86_64_start_reservations+0x18/0x30 >>>>>>>>>>>>> [ 0.296449] x86_64_start_kernel+0xe2/0xf0 >>>>>>>>>>>>> [ 0.296451] common_startup_64+0x13e/0x141 >>>>>>>>>>>>> [ 0.296453] </TASK> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> [ 0.312234] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>>>>>> page=ffffea000400f900 pfn=1049572 nr=1 >>>>>>>>>>>>> [ 0.312234] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >>>>>>>>>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) >>>>>>>>>>>>> [ 0.312236] Hardware name: Red Hat KVM, BIOS >>>>>>>>>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>>>>>>>>>>> [ 0.312236] Call Trace: >>>>>>>>>>>>> [ 0.312237] <TASK> >>>>>>>>>>>>> [ 0.312237] dump_stack_lvl+0x53/0x70 >>>>>>>>>>>>> [ 0.312239] __pgalloc_tag_add+0x3a3/0x6e0 >>>>>>>>>>>>> [ 0.312240] ? __pfx___pgalloc_tag_add+0x10/0x10 >>>>>>>>>>>>> [ 0.312241] ? rmqueue.constprop.0+0x4fc/0x1ce0 >>>>>>>>>>>>> [ 0.312243] ? kasan_unpoison+0x27/0x60 >>>>>>>>>>>>> [ 0.312244] ? __kasan_unpoison_pages+0x2c/0x40 >>>>>>>>>>>>> [ 0.312246] get_page_from_freelist+0xa54/0x1310 >>>>>>>>>>>>> [ 0.312248] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>>>>>>>>>>> [ 0.312250] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 >>>>>>>>>>>>> [ 0.312253] alloc_slab_page+0x39/0x130 >>>>>>>>>>>>> [ 0.312254] allocate_slab+0x77/0x2c0 >>>>>>>>>>>>> [ 0.312255] ? alloc_cpumask_var_node+0xc7/0x230 >>>>>>>>>>>>> [ 0.312257] ___slab_alloc+0x46d/0x530 >>>>>>>>>>>>> [ 0.312259] __kmalloc_node_noprof+0x2fa/0x680 >>>>>>>>>>>>> [ 0.312261] ? 
alloc_cpumask_var_node+0xc7/0x230 >>>>>>>>>>>>> [ 0.312263] alloc_cpumask_var_node+0xc7/0x230 >>>>>>>>>>>>> [ 0.312264] init_desc+0x141/0x6b0 >>>>>>>>>>>>> [ 0.312266] alloc_desc+0x108/0x1b0 >>>>>>>>>>>>> [ 0.312267] early_irq_init+0xee/0x1c0 >>>>>>>>>>>>> [ 0.312268] ? __pfx_early_irq_init+0x10/0x10 >>>>>>>>>>>>> [ 0.312271] start_kernel+0x1ab/0x3c0 >>>>>>>>>>>>> [ 0.312272] x86_64_start_reservations+0x18/0x30 >>>>>>>>>>>>> [ 0.312274] x86_64_start_kernel+0xe2/0xf0 >>>>>>>>>>>>> [ 0.312275] common_startup_64+0x13e/0x141 >>>>>>>>>>>>> [ 0.312277] </TASK> >>>>>>>>>>>>> >>>>>>>>>>>>> [ 0.312834] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>>>>>> page=ffffea000400fc00 pfn=1049584 nr=1 >>>>>>>>>>>>> [ 0.312835] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >>>>>>>>>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) >>>>>>>>>>>>> [ 0.312836] Hardware name: Red Hat KVM, BIOS >>>>>>>>>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>>>>>>>>>>> [ 0.312837] Call Trace: >>>>>>>>>>>>> [ 0.312837] <TASK> >>>>>>>>>>>>> [ 0.312838] dump_stack_lvl+0x53/0x70 >>>>>>>>>>>>> [ 0.312840] __pgalloc_tag_add+0x3a3/0x6e0 >>>>>>>>>>>>> [ 0.312841] ? __pfx___pgalloc_tag_add+0x10/0x10 >>>>>>>>>>>>> [ 0.312842] ? rmqueue.constprop.0+0x4fc/0x1ce0 >>>>>>>>>>>>> [ 0.312844] ? kasan_unpoison+0x27/0x60 >>>>>>>>>>>>> [ 0.312845] ? __kasan_unpoison_pages+0x2c/0x40 >>>>>>>>>>>>> [ 0.312847] get_page_from_freelist+0xa54/0x1310 >>>>>>>>>>>>> [ 0.312849] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>>>>>>>>>>> [ 0.312851] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 >>>>>>>>>>>>> [ 0.312853] alloc_pages_mpol+0x13a/0x3f0 >>>>>>>>>>>>> [ 0.312855] ? __pfx_alloc_pages_mpol+0x10/0x10 >>>>>>>>>>>>> [ 0.312856] ? xas_find+0x2d8/0x450 >>>>>>>>>>>>> [ 0.312858] ? _raw_spin_lock+0x84/0xe0 >>>>>>>>>>>>> [ 0.312859] ? __pfx__raw_spin_lock+0x10/0x10 >>>>>>>>>>>>> [ 0.312861] alloc_pages_noprof+0xf6/0x2b0 >>>>>>>>>>>>> [ 0.312862] __change_page_attr+0x293/0x850 >>>>>>>>>>>>> [ 0.312864] ? 
__pfx___change_page_attr+0x10/0x10 >>>>>>>>>>>>> [ 0.312865] ? _vm_unmap_aliases+0x2d0/0x650 >>>>>>>>>>>>> [ 0.312868] ? __pfx__vm_unmap_aliases+0x10/0x10 >>>>>>>>>>>>> [ 0.312869] __change_page_attr_set_clr+0x16c/0x360 >>>>>>>>>>>>> [ 0.312871] ? spp_getpage+0xbb/0x1e0 >>>>>>>>>>>>> [ 0.312872] change_page_attr_set_clr+0x220/0x3c0 >>>>>>>>>>>>> [ 0.312873] ? flush_tlb_one_kernel+0xf/0x30 >>>>>>>>>>>>> [ 0.312875] ? set_pte_vaddr_p4d+0x110/0x180 >>>>>>>>>>>>> [ 0.312877] ? __pfx_change_page_attr_set_clr+0x10/0x10 >>>>>>>>>>>>> [ 0.312878] ? __pfx_set_pte_vaddr_p4d+0x10/0x10 >>>>>>>>>>>>> [ 0.312881] ? __pfx_mtree_load+0x10/0x10 >>>>>>>>>>>>> [ 0.312883] ? __pfx_mtree_load+0x10/0x10 >>>>>>>>>>>>> [ 0.312884] ? __asan_memcpy+0x3c/0x60 >>>>>>>>>>>>> [ 0.312886] ? set_intr_gate+0x10c/0x150 >>>>>>>>>>>>> [ 0.312888] set_memory_ro+0x76/0xa0 >>>>>>>>>>>>> [ 0.312889] ? __pfx_set_memory_ro+0x10/0x10 >>>>>>>>>>>>> [ 0.312891] idt_setup_apic_and_irq_gates+0x2c1/0x390 >>>>>>>>>>>>> >>>>>>>>>>>>> and more. >>>>>>>>>>>> Ok, it's not the only place. Got your point. >>>>>>>>>>>> >>>>>>>>>>>>> off topic - if we were to handle only alloc_page_ext() specifically, >>>>>>>>>>>>> what would be the most straightforward >>>>>>>>>>>>> >>>>>>>>>>>>> solution in your mind? I'd really appreciate your insight. >>>>>>>>>>>> I was thinking if it's the only special case maybe we can handle it >>>>>>>>>>>> somehow differently, like we do when we allocate obj_ext vectors for >>>>>>>>>>>> slabs using __GFP_NO_OBJ_EXT. I haven't found a good solution yet but >>>>>>>>>>>> since it's not a special case we would not be able to use it even if I >>>>>>>>>>>> came up with something... >>>>>>>>>>>> I think your way is the most straight-forward but please try my >>>>>>>>>>>> suggestion to see if we can avoid extra overhead. >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Suren. 
>>>>>>> Hi Suren >>>>>>>>> Hi Suren >>>>>>>>> >>>>>>>>> >>>>>>>>>> Hi Hao, >>>>>>>>>> >>>>>>>>>>> Hi Suren >>>>>>>>>>> >>>>>>>>>>> Thank you for your feedback. After re-examining this issue, >>>>>>>>>>> >>>>>>>>>>> I realize my previous focus was misplaced. >>>>>>>>>>> >>>>>>>>>>> Upon deeper consideration, I understand that this is not merely a bug, >>>>>>>>>>> >>>>>>>>>>> but rather a warning that indicates a gap in our memory profiling mechanism. >>>>>>>>>>> >>>>>>>>>>> Specifically, the current implementation appears to be missing memory >>>>>>>>>>> allocation >>>>>>>>>>> >>>>>>>>>>> tracking during the period between the buddy system allocation and page_ext >>>>>>>>>>> >>>>>>>>>>> initialization. >>>>>>>>>>> >>>>>>>>>>> This profiling gap means we may not be capturing all relevant memory >>>>>>>>>>> allocation >>>>>>>>>>> >>>>>>>>>>> events during this critical transition phase. >>>>>>>>>> Correct, this limitation exists because memory profiling relies on >>>>>>>>>> some kernel facilities (page_ext, obj_ext) which might not be >>>>>>>>>> initialized yet at the time of allocation. >>>>>>>>>> >>>>>>>>>>> My approach is to dynamically allocate codetag_ref when get_page_tag_ref >>>>>>>>>>> fails, >>>>>>>>>>> >>>>>>>>>>> and maintain a linked list to track all buddy system allocations that >>>>>>>>>>> occur prior to page_ext initialization. >>>>>>>>>>> >>>>>>>>>>> However, this introduces performance concerns: >>>>>>>>>>> >>>>>>>>>>> 1. Free Path Overhead: When freeing these pages, we would need to >>>>>>>>>>> traverse the entire linked list to locate >>>>>>>>>>> >>>>>>>>>>> the corresponding codetag_ref, resulting in O(n) lookup complexity >>>>>>>>>>> per free operation. >>>>>>>>>>> >>>>>>>>>>> 2. Initialization Overhead: During init_page_alloc_tagging, iterating >>>>>>>>>>> through the linked list to assign codetag_ref to >>>>>>>>>>> >>>>>>>>>>> page_ext would introduce additional traversal cost.
>>>>>>>>>>> >>>>>>>>>>> If the number of pages is substantial, this could incur significant >>>>>>>>>>> overhead. What are your thoughts on this? I look forward to your >>>>>>>>>>> suggestions. >>>>>>>>>> My thinking is that these early allocations comprise a small portion >>>>>>>>>> of overall memory consumed by the system. So, instead of trying to >>>>>>>>>> record and handle them in some alternative way, we just accept that >>>>>>>>>> some counters might not be exactly accurate and ignore those early >>>>>>>>>> allocations. See how the early slab allocations are marked with the >>>>>>>>>> CODETAG_FLAG_INACCURATE flag and later reported as inaccurate. I think >>>>>>>>>> that's an acceptable alternative to introducing extra complexity and >>>>>>>>>> performance overhead. IOW, the benefits of accounting for these early >>>>>>>>>> allocations are low compared to the effort required to account for >>>>>>>>>> them. Unless you found a simple and performant way to do that... >>>>>>>>> I have been exploring possible solutions to this issue over the past few >>>>>>>>> days, >>>>>>>>> >>>>>>>>> but so far I have not come up with a good approach. >>>>>>>>> >>>>>>>>> I have counted the number of memory allocations that occur earlier than the >>>>>>>>> >>>>>>>>> allocation and initialization of our page_ext, and found that there are >>>>>>>>> actually >>>>>>>>> >>>>>>>>> quite a lot of them. >>>>>>>> Interesting... I wonder if it's because deferred_struct_pages defers >>>>>>>> page_ext initialization. Can you check if setting early_page_ext >>>>>>>> reduces or eliminates these cases of allocation before page_ext init? >>>>>>> Yes, you are correct. In my 8-core 16GB virtual machine, I used a global >>>>>>> counter >>>>>>> >>>>>>> to record these allocations. With early_page_ext enabled, there were 130 >>>>>>> allocations >>>>>>> >>>>>>> before page_ext initialization. Without early_page_ext, there were 802 >>>>>>> allocations >>>>>>> >>>>>>> before page_ext initialization.
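Suren's suggestion above is to accept that early allocations are not tracked and to flag the affected counters as inaccurate, as is already done for early slab allocations with CODETAG_FLAG_INACCURATE. Combined with the global counter Hao used to count early allocations, it can be modelled in a few lines. This is a hedged userspace sketch: `TAG_FLAG_INACCURATE`, `early_alloc_count`, `tag_add()` and `tag_is_inaccurate()` are hypothetical and are not the kernel's alloc_tag API. Flagging the tag of the allocating call site, which is still known when page_ext is not, is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical flag bit, modelled on CODETAG_FLAG_INACCURATE. */
#define TAG_FLAG_INACCURATE 0x1u

/* Hypothetical per-call-site counters with a flags word. */
struct tag {
	long bytes;
	long calls;
	unsigned int flags;
};

/* Hypothetical per-page reference; a NULL ref pointer = page_ext not ready. */
struct tag_ref {
	struct tag *ct;
};

/* Hypothetical global count of allocations made before page_ext was ready. */
static atomic_long early_alloc_count;

/* Charge @bytes to @t, or flag @t as approximate if the page has no ref yet. */
static void tag_add(struct tag_ref *ref, struct tag *t, long bytes)
{
	if (!ref) {
		atomic_fetch_add(&early_alloc_count, 1);
		/* single-threaded model: a plain OR is enough here */
		t->flags |= TAG_FLAG_INACCURATE;
		return;
	}
	ref->ct = t;
	t->bytes += bytes;
	t->calls++;
}

/* A report would print the counters of flagged tags as approximate. */
static int tag_is_inaccurate(const struct tag *t)
{
	return (t->flags & TAG_FLAG_INACCURATE) != 0;
}
```

This only covers the allocation side. The free of such a page still has to avoid the "alloc_tag was not set" warning, which is what the array and list proposals in this thread try to solve.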
>>>>>>> >>>>>>> >>>>>>>>> Similarly, I have made the following changes and collected the >>>>>>>>> corresponding logs. >>>>>>>>> >>>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>>>>>>> index 2d4b6f1a554e..6db65b3d52d3 100644 >>>>>>>>> --- a/mm/page_alloc.c >>>>>>>>> +++ b/mm/page_alloc.c >>>>>>>>> @@ -1293,6 +1293,8 @@ void __pgalloc_tag_add(struct page *page, struct >>>>>>>>> task_struct *task, >>>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); >>>>>>>>> update_page_tag_ref(handle, &ref); >>>>>>>>> put_page_tag_ref(handle); >>>>>>>>> + } else{ >>>>>>>>> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> @@ -1314,6 +1316,8 @@ void __pgalloc_tag_sub(struct page *page, unsigned >>>>>>>>> int nr) >>>>>>>>> alloc_tag_sub(&ref, PAGE_SIZE * nr); >>>>>>>>> update_page_tag_ref(handle, &ref); >>>>>>>>> put_page_tag_ref(handle); >>>>>>>>> + } else{ >>>>>>>>> + pr_warn("__pgalloc_tag_sub: get_page_tag_ref failed! >>>>>>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> [ 0.261699] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001000 pfn=1048640 nr=2 >>>>>>>>> [ 0.261711] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001100 pfn=1048644 nr=4 >>>>>>>>> [ 0.261717] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001200 pfn=1048648 nr=4 >>>>>>>>> [ 0.261721] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001300 pfn=1048652 nr=4 >>>>>>>>> [ 0.261893] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001080 pfn=1048642 nr=2 >>>>>>>>> [ 0.261917] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001400 pfn=1048656 nr=4 >>>>>>>>> [ 0.262018] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>>>>>> page=ffffea0004001500 pfn=1048660 nr=2 >>>>>>>>> [ 0.262024] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001600 pfn=1048664 nr=8 >>>>>>>>> [ 0.262040] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001580 pfn=1048662 nr=1 >>>>>>>>> [ 0.262048] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea00040015c0 pfn=1048663 nr=1 >>>>>>>>> [ 0.262056] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001800 pfn=1048672 nr=2 >>>>>>>>> [ 0.262064] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001880 pfn=1048674 nr=2 >>>>>>>>> [ 0.262078] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001900 pfn=1048676 nr=2 >>>>>>>>> [ 0.262196] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 >>>>>>>>> [ 0.262213] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001980 pfn=1048678 nr=2 >>>>>>>>> [ 0.262220] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001a00 pfn=1048680 nr=4 >>>>>>>>> [ 0.262246] ODEBUG: selftest passed >>>>>>>>> [ 0.262268] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001b00 pfn=1048684 nr=1 >>>>>>>>> [ 0.262318] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001b40 pfn=1048685 nr=1 >>>>>>>>> [ 0.262368] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001b80 pfn=1048686 nr=1 >>>>>>>>> [ 0.262418] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001bc0 pfn=1048687 nr=1 >>>>>>>>> [ 0.262469] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001c00 pfn=1048688 nr=1 >>>>>>>>> [ 0.262519] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001c40 pfn=1048689 nr=1 >>>>>>>>> [ 0.262569] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001c80 pfn=1048690 nr=1 >>>>>>>>> [ 0.262620] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>>>>>> page=ffffea0004001cc0 pfn=1048691 nr=1 >>>>>>>>> [ 0.262670] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001d00 pfn=1048692 nr=1 >>>>>>>>> [ 0.262721] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001d40 pfn=1048693 nr=1 >>>>>>>>> [ 0.262771] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001d80 pfn=1048694 nr=1 >>>>>>>>> [ 0.262821] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001dc0 pfn=1048695 nr=1 >>>>>>>>> [ 0.262871] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001e00 pfn=1048696 nr=1 >>>>>>>>> [ 0.262923] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001e40 pfn=1048697 nr=1 >>>>>>>>> [ 0.262974] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001e80 pfn=1048698 nr=1 >>>>>>>>> [ 0.263024] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001ec0 pfn=1048699 nr=1 >>>>>>>>> [ 0.263074] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001f00 pfn=1048700 nr=1 >>>>>>>>> [ 0.263124] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001f40 pfn=1048701 nr=1 >>>>>>>>> [ 0.263174] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001f80 pfn=1048702 nr=1 >>>>>>>>> [ 0.263224] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004001fc0 pfn=1048703 nr=1 >>>>>>>>> [ 0.263275] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004002000 pfn=1048704 nr=1 >>>>>>>>> [ 0.263325] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004002040 pfn=1048705 nr=1 >>>>>>>>> [ 0.263375] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004002080 pfn=1048706 nr=1 >>>>>>>>> [ 0.263427] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004002400 pfn=1048720 nr=16 >>>>>>>>> [ 0.263437] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>>>>>> page=ffffea00040020c0 pfn=1048707 nr=1 >>>>>>>>> [ 0.263463] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004002100 pfn=1048708 nr=1 >>>>>>>>> [ 0.263465] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004002140 pfn=1048709 nr=1 >>>>>>>>> [ 0.263467] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004002180 pfn=1048710 nr=1 >>>>>>>>> [ 0.263509] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004002200 pfn=1048712 nr=4 >>>>>>>>> [ 0.263512] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004002800 pfn=1048736 nr=8 >>>>>>>>> [ 0.263524] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea00040021c0 pfn=1048711 nr=1 >>>>>>>>> [ 0.263536] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004002300 pfn=1048716 nr=1 >>>>>>>>> [ 0.263537] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004002340 pfn=1048717 nr=1 >>>>>>>>> [ 0.263539] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004002380 pfn=1048718 nr=1 >>>>>>>>> [ 0.263604] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004004000 pfn=1048832 nr=128 >>>>>>>>> [ 0.263638] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004003000 pfn=1048768 nr=64 >>>>>>>>> [ 0.263650] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004002c00 pfn=1048752 nr=16 >>>>>>>>> [ 0.263655] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea00040023c0 pfn=1048719 nr=1 >>>>>>>>> [ 0.270582] __pgalloc_tag_sub: get_page_tag_ref failed! >>>>>>>>> page=ffffea00040023c0 pfn=1048719 nr=1 >>>>>>>>> [ 0.270591] ftrace: allocating 52717 entries in 208 pages >>>>>>>>> [ 0.270592] ftrace: allocated 208 pages with 3 groups >>>>>>>>> [ 0.270620] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004002a00 pfn=1048744 nr=8 >>>>>>>>> [ 0.270636] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>>>>>> page=ffffea00040023c0 pfn=1048719 nr=1 >>>>>>>>> [ 0.270643] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006000 pfn=1048960 nr=1 >>>>>>>>> [ 0.270649] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006040 pfn=1048961 nr=1 >>>>>>>>> [ 0.270658] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004007000 pfn=1049024 nr=64 >>>>>>>>> [ 0.270659] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006080 pfn=1048962 nr=2 >>>>>>>>> [ 0.270722] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006100 pfn=1048964 nr=1 >>>>>>>>> [ 0.270730] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006140 pfn=1048965 nr=1 >>>>>>>>> [ 0.270738] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006180 pfn=1048966 nr=1 >>>>>>>>> [ 0.270777] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea00040061c0 pfn=1048967 nr=1 >>>>>>>>> [ 0.270786] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006200 pfn=1048968 nr=1 >>>>>>>>> [ 0.270792] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006240 pfn=1048969 nr=1 >>>>>>>>> [ 0.270833] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006300 pfn=1048972 nr=4 >>>>>>>>> [ 0.270891] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006280 pfn=1048970 nr=1 >>>>>>>>> [ 0.270980] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea00040062c0 pfn=1048971 nr=1 >>>>>>>>> [ 0.271071] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006400 pfn=1048976 nr=1 >>>>>>>>> [ 0.271156] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006440 pfn=1048977 nr=1 >>>>>>>>> [ 0.271185] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006480 pfn=1048978 nr=2 >>>>>>>>> [ 0.271301] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>>>>>> page=ffffea0004006500 pfn=1048980 nr=1 >>>>>>>>> [ 0.271655] Dynamic Preempt: lazy >>>>>>>>> [ 0.271662] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006580 pfn=1048982 nr=2 >>>>>>>>> [ 0.271752] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006600 pfn=1048984 nr=4 >>>>>>>>> [ 0.271762] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004010000 pfn=1049600 nr=4 >>>>>>>>> [ 0.271824] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006540 pfn=1048981 nr=1 >>>>>>>>> [ 0.271916] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006700 pfn=1048988 nr=2 >>>>>>>>> [ 0.271964] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006780 pfn=1048990 nr=1 >>>>>>>>> [ 0.272099] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea00040067c0 pfn=1048991 nr=1 >>>>>>>>> [ 0.272138] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006800 pfn=1048992 nr=2 >>>>>>>>> [ 0.272144] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006a00 pfn=1049000 nr=8 >>>>>>>>> [ 0.272249] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006c00 pfn=1049008 nr=8 >>>>>>>>> [ 0.272319] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006880 pfn=1048994 nr=2 >>>>>>>>> [ 0.272351] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006900 pfn=1048996 nr=4 >>>>>>>>> [ 0.272424] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004006e00 pfn=1049016 nr=8 >>>>>>>>> [ 0.272485] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004008000 pfn=1049088 nr=8 >>>>>>>>> [ 0.272535] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004008200 pfn=1049096 nr=2 >>>>>>>>> [ 0.272600] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004008400 pfn=1049104 nr=8 >>>>>>>>> [ 0.272663] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>>>>>> page=ffffea0004008300 pfn=1049100 nr=4 >>>>>>>>> [ 0.272694] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004008280 pfn=1049098 nr=2 >>>>>>>>> [ 0.272708] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004008600 pfn=1049112 nr=8 >>>>>>>>> >>>>>>>>> [ 0.272924] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004008880 pfn=1049122 nr=2 >>>>>>>>> [ 0.272934] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004008900 pfn=1049124 nr=2 >>>>>>>>> [ 0.272952] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004008c00 pfn=1049136 nr=4 >>>>>>>>> [ 0.273035] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004008980 pfn=1049126 nr=2 >>>>>>>>> [ 0.273062] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004008e00 pfn=1049144 nr=8 >>>>>>>>> [ 0.273674] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004008d00 pfn=1049140 nr=1 >>>>>>>>> [ 0.273884] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004008d80 pfn=1049142 nr=2 >>>>>>>>> [ 0.273943] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009000 pfn=1049152 nr=2 >>>>>>>>> [ 0.274379] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009080 pfn=1049154 nr=2 >>>>>>>>> [ 0.274575] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009200 pfn=1049160 nr=8 >>>>>>>>> [ 0.274617] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009100 pfn=1049156 nr=4 >>>>>>>>> [ 0.274794] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009400 pfn=1049168 nr=2 >>>>>>>>> [ 0.274840] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009480 pfn=1049170 nr=2 >>>>>>>>> [ 0.275057] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009500 pfn=1049172 nr=2 >>>>>>>>> [ 0.275092] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>>>>>> page=ffffea0004009580 pfn=1049174 nr=2 >>>>>>>>> [ 0.275134] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009600 pfn=1049176 nr=8 >>>>>>>>> [ 0.275211] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009800 pfn=1049184 nr=4 >>>>>>>>> [ 0.275510] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009900 pfn=1049188 nr=2 >>>>>>>>> [ 0.275548] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009980 pfn=1049190 nr=2 >>>>>>>>> [ 0.275976] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009a00 pfn=1049192 nr=8 >>>>>>>>> [ 0.275987] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009c00 pfn=1049200 nr=2 >>>>>>>>> [ 0.276139] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009c80 pfn=1049202 nr=2 >>>>>>>>> [ 0.276152] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004008d40 pfn=1049141 nr=1 >>>>>>>>> [ 0.276242] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009d00 pfn=1049204 nr=1 >>>>>>>>> [ 0.276358] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009d40 pfn=1049205 nr=1 >>>>>>>>> [ 0.276444] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009d80 pfn=1049206 nr=1 >>>>>>>>> [ 0.276526] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009dc0 pfn=1049207 nr=1 >>>>>>>>> [ 0.276615] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009e00 pfn=1049208 nr=1 >>>>>>>>> [ 0.276696] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009e40 pfn=1049209 nr=1 >>>>>>>>> [ 0.276792] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009e80 pfn=1049210 nr=1 >>>>>>>>> [ 0.276827] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009f00 pfn=1049212 nr=2 >>>>>>>>> [ 0.276891] __pgalloc_tag_add: get_page_tag_ref failed! 
>>>>>>>>> page=ffffea0004009ec0 pfn=1049211 nr=1 >>>>>>>>> [ 0.276999] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009f80 pfn=1049214 nr=1 >>>>>>>>> [ 0.277082] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea0004009fc0 pfn=1049215 nr=1 >>>>>>>>> [ 0.277172] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea000400a000 pfn=1049216 nr=1 >>>>>>>>> [ 0.277257] __pgalloc_tag_add: get_page_tag_ref failed! >>>>>>>>> page=ffffea000400a040 pfn=1049217 nr=1 >>>>>>>>> >>>>>>>>> and so on. >>>>>>>>> >>>>>>>>> >>>>>>>>>> I think your earlier patch can effectively detect these early >>>>>>>>>> allocations and suppress the warnings. We should also mark these >>>>>>>>>> allocations with CODETAG_FLAG_INACCURATE. >>>>>>>>> Thanks to an excellent AI review, I realized there are issues with >>>>>>>>> >>>>>>>>> my original patch. One problem is the 256-element array; another >>>>>>>> Yes, if there are lots of such allocations, it's not appropriate. >>>>>>>> >>>>>>>>> is that it involves allocation and free operations — meaning we need >>>>>>>>> >>>>>>>>> to record entries at __pgalloc_tag_add and remove them at __pgalloc_tag_sub, >>>>>>>>> >>>>>>>>> which introduces a noticeable overhead. I'm wondering if we can instead >>>>>>>>> set a flag >>>>>>>>> >>>>>>>>> bit in page flags during the early boot stage, which I'll refer to as >>>>>>>>> EARLY_ALLOC_FLAGS. >>>>>>>>> >>>>>>>>> Then, in __pgalloc_tag_sub, we first check for EARLY_ALLOC_FLAGS. If >>>>>>>>> set, we clear the >>>>>>>>> >>>>>>>>> flag and return immediately; otherwise, we perform the actual >>>>>>>>> subtraction of the tag count. >>>>>>>>> >>>>>>>>> This approach seems somewhat similar to the idea behind >>>>>>>>> mem_profiling_compressed. >>>>>>>> That seems doable but let's first check if we can make page_ext >>>>>>>> initialization happen before these allocations. That would be the >>>>>>>> ideal path. 
If it's not possible then we can focus on alternatives >>>>>>>> like the one you propose. >>>>>>> Yes, the ideal scenario would be to have page_ext initialization >>>>>>> complete before >>>>>>> >>>>>>> these allocations occur. I just did a code walkthrough and found that >>>>>>> this resembles >>>>>>> >>>>>>> the FLATMEM implementation approach - FLATMEM allocates page_ext before >>>>>>> the buddy >>>>>>> >>>>>>> system initialization, so it doesn't seem to encounter the issue we're >>>>>>> facing now. >>>>>>> >>>>>>> https://elixir.bootlin.com/linux/v7.0-rc5/source/mm/mm_init.c#L2707 >>>>>> Yes, page_ext_init_flatmem() looks like an interesting option, but it >>>>>> would not work with sparsemem. TBH I would prefer to find a simple >>>>>> solution that can identify early init allocations, mark them inaccurate >>>>>> and suppress the warning rather than introduce some complex mechanism >>>>>> to account for them which would work only in some cases (flatmem). >>>>>> With your original approach I think the only real issue is the size of >>>>>> the array that might be too small. The other issue you mentioned about >>>>>> an allocated page being freed and then re-allocated after page_ext is >>>>>> initialized but before clear_page_tag_ref() is called is not really a >>>>>> problem. Yes, we will lose that counter's value but it's similar to >>>>>> other early allocations which we just treat as inaccurate. We can also >>>>>> minimize the possibility of this happening by moving >>>>>> clear_page_tag_ref() into init_page_alloc_tagging(). >>>>>> >>>>>> I don't like the pageflag option you mentioned because it adds an >>>>>> extra condition check into __pgalloc_tag_sub() which will be executed >>>>>> even after the init stage is over. >>>>>> I'll look into this some more tomorrow as it's quite late now. >>>> Hi Suren >>>> >>>> >>>>> Just thought of something. Are all these pages allocated by slab?
If >>>>> so, I think slab does not use page->lru (need to double-check) and we >>>>> could add all these pages allocated during early init into a list and >>>>> then set their page_ext reference to CODETAG_EMPTY in >>>>> init_page_alloc_tagging(). >>>> Got your point. >>>> >>>> >>>> There will indeed be some non-SLAB memory allocations here, such as the >>>> following: >>>> >>>> >>>> CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted >>>> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) >>>> [ 0.326607] Hardware name: Red Hat KVM, BIOS >>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>> [ 0.326608] Call Trace: >>>> [ 0.326608] <TASK> >>>> [ 0.326609] dump_stack_lvl+0x53/0x70 >>>> [ 0.326611] __pgalloc_tag_add+0x407/0x700 >>>> [ 0.326616] get_page_from_freelist+0xa54/0x1310 >>>> [ 0.326618] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>> [ 0.326623] alloc_pages_mpol+0x13a/0x3f0 >>>> [ 0.326627] alloc_pages_noprof+0xf6/0x2b0 >>>> [ 0.326628] __pmd_alloc+0x743/0x9c0 >>>> [ 0.326630] vmap_range_noflush+0xac0/0x10a0 >>>> [ 0.326637] ioremap_page_range+0x17c/0x250 >>>> [ 0.326639] __ioremap_caller+0x437/0x5c0 >>>> [ 0.326645] acpi_os_map_iomem+0x4c0/0x660 >>>> [ 0.326647] acpi_tb_verify_temp_table+0x1c0/0x580 >>>> [ 0.326649] acpi_reallocate_root_table+0x2ad/0x460 >>>> [ 0.326655] acpi_early_init+0x111/0x460 >>>> [ 0.326657] start_kernel+0x271/0x3c0 >>>> [ 0.326659] x86_64_start_reservations+0x18/0x30 >>>> [ 0.326660] x86_64_start_kernel+0xe2/0xf0 >>>> [ 0.326662] common_startup_64+0x13e/0x141 >>>> [ 0.326663] </TASK> >>>> >>>> CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted >>>> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) >>>> [ 0.329167] Hardware name: Red Hat KVM, BIOS >>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>> [ 0.329167] Call Trace: >>>> [ 0.329167] <TASK> >>>> [ 0.329167] dump_stack_lvl+0x53/0x70 >>>> [ 0.329167] __pgalloc_tag_add+0x407/0x700 >>>> [ 0.329167] get_page_from_freelist+0xa54/0x1310 >>>> [ 0.329167] 
__alloc_frozen_pages_noprof+0x206/0x4c0 >>>> [ 0.329167] __alloc_pages_noprof+0x10/0x1b0 >>>> [ 0.329167] dup_task_struct+0x163/0x8c0 >>>> [ 0.329167] copy_process+0x390/0x4a70 >>>> [ 0.329167] kernel_clone+0xe1/0x830 >>>> [ 0.329167] kernel_thread+0xcb/0x110 >>>> [ 0.329167] kthreadd+0x8a2/0xc60 >>>> [ 0.329167] ret_from_fork+0x551/0x720 >>>> [ 0.329167] ret_from_fork_asm+0x1a/0x30 >>>> [ 0.329167] </TASK> >>>> >>>> CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted >>>> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) >>>> [ 0.329167] Hardware name: Red Hat KVM, BIOS >>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>> [ 0.329167] Call Trace: >>>> [ 0.329167] <TASK> >>>> [ 0.329167] dump_stack_lvl+0x53/0x70 >>>> [ 0.329167] __pgalloc_tag_add+0x407/0x700 >>>> [ 0.329167] get_page_from_freelist+0xa54/0x1310 >>>> [ 0.329167] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>> [ 0.329167] __alloc_pages_noprof+0x10/0x1b0 >>>> [ 0.329167] dup_task_struct+0x163/0x8c0 >>>> [ 0.329167] copy_process+0x390/0x4a70 >>>> [ 0.329167] kernel_clone+0xe1/0x830 >>>> [ 0.329167] kernel_thread+0xcb/0x110 >>>> [ 0.329167] kthreadd+0x8a2/0xc60 >>>> [ 0.329167] ret_from_fork+0x551/0x720 >>>> [ 0.329167] ret_from_fork_asm+0x1a/0x30 >>>> [ 0.329167] </TASK> >>>> >>>> CPU: 4 UID: 0 PID: 1 Comm: swapper/0 Not tainted >>>> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) >>>> [ 0.434265] Hardware name: Red Hat KVM, BIOS >>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>> [ 0.434266] Call Trace: >>>> [ 0.434266] <TASK> >>>> [ 0.434266] dump_stack_lvl+0x53/0x70 >>>> [ 0.434268] __pgalloc_tag_add+0x407/0x700 >>>> [ 0.434272] get_page_from_freelist+0xa54/0x1310 >>>> [ 0.434274] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>> [ 0.434279] alloc_pages_exact_nid_noprof+0x10f/0x380 >>>> [ 0.434283] init_section_page_ext+0x167/0x370 >>>> [ 0.434284] page_ext_init+0x451/0x620 >>>> [ 0.434287] page_alloc_init_late+0x553/0x630 >>>> [ 0.434290] kernel_init_freeable+0x7be/0xd30 >>>> 
[ 0.434294] kernel_init+0x1f/0x1f0 >>>> [ 0.434295] ret_from_fork+0x551/0x720 >>>> [ 0.434301] ret_from_fork_asm+0x1a/0x30 >>>> [ 0.434303] </TASK> >>>> >>>> CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted >>>> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) >>>> [ 0.346712] Hardware name: Red Hat KVM, BIOS >>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 >>>> [ 0.346713] Call Trace: >>>> [ 0.346713] <TASK> >>>> [ 0.346714] dump_stack_lvl+0x53/0x70 >>>> [ 0.346715] __pgalloc_tag_add+0x407/0x700 >>>> [ 0.346720] get_page_from_freelist+0xa54/0x1310 >>>> [ 0.346723] __alloc_frozen_pages_noprof+0x206/0x4c0 >>>> [ 0.346729] __alloc_pages_noprof+0x10/0x1b0 >>>> [ 0.346731] alloc_cpu_data+0x96/0x210 >>>> [ 0.346732] rb_allocate_cpu_buffer+0xb93/0x1500 >>>> [ 0.346739] trace_rb_cpu_prepare+0x21a/0x4f0 >>>> [ 0.346753] cpuhp_invoke_callback+0x6db/0x14b0 >>>> [ 0.346755] __cpuhp_invoke_callback_range+0xde/0x1d0 >>>> [ 0.346759] _cpu_up+0x395/0x880 >>>> [ 0.346761] cpu_up+0x1bb/0x210 >>>> [ 0.346762] cpuhp_bringup_mask+0xd2/0x150 >>>> [ 0.346763] bringup_nonboot_cpus+0x12b/0x170 >>>> [ 0.346764] smp_init+0x2f/0x100 >>>> [ 0.346766] kernel_init_freeable+0x7a5/0xd30 >>>> [ 0.346769] kernel_init+0x1f/0x1f0 >>>> [ 0.346771] ret_from_fork+0x551/0x720 >>>> [ 0.346776] ret_from_fork_asm+0x1a/0x30 >>>> [ 0.346778] </TASK> >>>> >>>> and so on... >>>> >>>> >>>> In fact, I previously conducted extensive and prolonged stress testing >>>> >>>> on memory profiling. After our efforts to address several WARN cases, >>>> >>>> one remaining scenario we are addressing is the warning triggered during >>>> >>>> early slab cache reclaim — which is precisely the situation we are currently >>>> >>>> encountering (although I cannot guarantee that all edge cases have been >>>> >>>> covered by our stress testing). During the stress testing process, this >>>> warning >>>> >>>> did indeed manifest. 
However, the current environment triggers KASAN slab >>>> >>>> cache reclaim earlier than anticipated. >>>> >>>> >>>> Although the memory allocated prior to page_ext initialization has a >>>> relatively low probability of >>>> >>>> being released in subsequent operations (at least we have not >>>> encountered such cases up to now), >>>> >>>> I remain uncertain whether there are any overlooked edge cases when >>>> considering only slab-backed pages. >> Hi Suren >> >> >>> Ok, I guess a specialized solution for slab would not work then. I want >>> to check on my side and understand how the number of these early >>> allocations scales. Is it higher for bigger machines, or does it stay constant? >>> If the latter, I think your original simple solution with some fixups >>> can still work. I'll need to instrument my code to capture these early >>> allocations and see where they originate. If you have a patch already >>> doing that it would help speed it up for me. >>> Thanks, >>> Suren. >> OK, my V2 patch is as follows: Hi Suren > Thanks! I'll go over it but first I need to check if the number of > early allocations is constant or dependent on some factors like > machine size (as I mentioned before). I hope to carve out some time to > investigate that this Friday. > We should also probably start a separate thread for this v2 as this > email thread is getting painfully long. OK, right. Meanwhile, I can share the test data from my side. With early_page_ext disabled, I tested the following scenarios; here is the data.
8C16G:   alloc_count = 802
8C32G:   alloc_count = 790
16C32G:  alloc_count = 994
16C64G:  alloc_count = 992
32C64G:  alloc_count = 1364
64C64G:  alloc_count = 2226
128C64G: alloc_count = 3913

(nC = number of CPUs, nG = memory size in GB.) The count barely changes with memory size but grows with the number of CPUs. I think that makes sense, as this involves memory allocations related to CPU boot, like this:

[ 0.345299] dump_stack_lvl+0x53/0x70
[ 0.345301] __pgalloc_tag_add+0x407/0x700
[ 0.345306] get_page_from_freelist+0xa54/0x1310
[ 0.345308] __alloc_frozen_pages_noprof+0x206/0x4c0
[ 0.345314] __alloc_pages_noprof+0x10/0x1b0
[ 0.345316] alloc_cpu_data+0x96/0x210
[ 0.345318] rb_allocate_cpu_buffer+0xb93/0x1500
[ 0.345325] trace_rb_cpu_prepare+0x21a/0x4f0
[ 0.345327] cpuhp_invoke_callback+0x6db/0x14b0
[ 0.345329] __cpuhp_invoke_callback_range+0xde/0x1d0
[ 0.345333] _cpu_up+0x395/0x880
[ 0.345335] cpu_up+0x1bb/0x210
[ 0.345336] cpuhp_bringup_mask+0xd2/0x150
[ 0.345337] bringup_nonboot_cpus+0x12b/0x170
[ 0.345338] smp_init+0x2f/0x100
[ 0.345340] kernel_init_freeable+0x7a5/0xd30
[ 0.345344] kernel_init+0x1f/0x1f0

I will send out version 2 as soon as possible.
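Suren asked whether the count grows with machine size or stays constant. A quick check on the numbers above: between configurations that differ only in CPU count, each additional CPU adds roughly 23 to 27 early allocations (for example, (994 - 790) / (16 - 8) = 25.5). Doubling memory at a fixed CPU count changes the count by under 2%. The snippet below only recomputes those ratios from the reported data. It is not a new measurement, and reading the trend as linear in CPU count is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* One reported configuration (all measured with early_page_ext disabled). */
struct sample {
	int cpus;
	int mem_gb;
	int alloc_count;
};

static const struct sample samples[] = {
	{   8, 16,  802 }, {   8, 32,  790 },
	{  16, 32,  994 }, {  16, 64,  992 },
	{  32, 64, 1364 }, {  64, 64, 2226 },
	{ 128, 64, 3913 },
};

#define NSAMPLES (sizeof(samples) / sizeof(samples[0]))

/* Extra early allocations per added CPU between two samples. */
static double per_cpu_slope(const struct sample *a, const struct sample *b)
{
	return (double)(b->alloc_count - a->alloc_count) / (b->cpus - a->cpus);
}

/* Relative change in alloc_count from sample a to sample b. */
static double relative_change(const struct sample *a, const struct sample *b)
{
	return (double)(b->alloc_count - a->alloc_count) / a->alloc_count;
}
```

Every configuration in the table exceeds the 256 entries of EARLY_ALLOC_PFN_MAX in the v2 patch, which is consistent with Suren's concern that a fixed-size array may be too small.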
Thanks Hao > >> >> diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h >> index d40ac39bfbe8..bf226c2be2ad 100644 >> --- a/include/linux/alloc_tag.h >> +++ b/include/linux/alloc_tag.h >> @@ -74,6 +74,8 @@ static inline void set_codetag_empty(union codetag_ref >> *ref) >> >> #ifdef CONFIG_MEM_ALLOC_PROFILING >> >> +void alloc_tag_add_early_pfn(unsigned long pfn); >> + >> #define ALLOC_TAG_SECTION_NAME "alloc_tags" >> >> struct codetag_bytes { >> diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h >> index 38a82d65e58e..951d33362268 100644 >> --- a/include/linux/pgalloc_tag.h >> +++ b/include/linux/pgalloc_tag.h >> @@ -181,7 +181,7 @@ static inline struct alloc_tag >> *__pgalloc_tag_get(struct page *page) >> >> if (get_page_tag_ref(page, &ref, &handle)) { >> alloc_tag_sub_check(&ref); >> - if (ref.ct) >> + if (ref.ct && !is_codetag_empty(&ref)) >> tag = ct_to_alloc_tag(ref.ct); >> put_page_tag_ref(handle); >> } >> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c >> index 58991ab09d84..55c134a71cd0 100644 >> --- a/lib/alloc_tag.c >> +++ b/lib/alloc_tag.c >> @@ -6,6 +6,7 @@ >> #include <linux/kallsyms.h> >> #include <linux/module.h> >> #include <linux/page_ext.h> >> +#include <linux/pgalloc_tag.h> >> #include <linux/proc_fs.h> >> #include <linux/seq_buf.h> >> #include <linux/seq_file.h> >> @@ -26,6 +27,85 @@ static bool mem_profiling_support; >> >> static struct codetag_type *alloc_tag_cttype; >> >> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG >> + >> +/* >> + * page_ext is allocated and initialized relatively late during boot. >> + * Some pages are allocated before page_ext becomes available. >> + * Track these early PFNs and clear their codetag refs later to avoid >> + * warnings when they are freed. 
>> + */ >> + >> +#define EARLY_ALLOC_PFN_MAX 256 >> + >> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX] __initdata; >> +static atomic_t early_pfn_count __initdata = ATOMIC_INIT(0); >> + >> +static void __init __alloc_tag_add_early_pfn(unsigned long pfn) >> +{ >> + int old_idx, new_idx; >> + >> + do { >> + old_idx = atomic_read(&early_pfn_count); >> + if (old_idx >= EARLY_ALLOC_PFN_MAX) >> + return; >> + new_idx = old_idx + 1; >> + } while (!atomic_try_cmpxchg(&early_pfn_count, &old_idx, new_idx)); >> + >> + early_pfns[old_idx] = pfn; >> +} >> + >> +static void (*alloc_tag_add_early_pfn_ptr)(unsigned long pfn) __refdata = >> + __alloc_tag_add_early_pfn; >> + >> +void alloc_tag_add_early_pfn(unsigned long pfn) >> +{ >> + if (static_key_enabled(&mem_profiling_compressed)) >> + return; >> + >> + if (alloc_tag_add_early_pfn_ptr) >> + alloc_tag_add_early_pfn_ptr(pfn); >> +} >> + >> +static void __init clear_early_alloc_pfn_tag_refs(void) >> +{ >> + unsigned int i; >> + >> + for (i = 0; i < atomic_read(&early_pfn_count); i++) { >> + unsigned long pfn = early_pfns[i]; >> + >> + if (pfn_valid(pfn)) { >> + struct page *page = pfn_to_page(pfn); >> + union pgtag_ref_handle handle; >> + union codetag_ref ref; >> + >> + if (get_page_tag_ref(page, &ref, &handle)) { >> + /* >> + * An early-allocated page could be freed and reallocated >> + * after its page_ext is initialized but before we >> clear it. >> + * In that case, it already has a valid tag set. >> + * We should not overwrite that valid tag with >> CODETAG_EMPTY. 
>> + */ >> + if (ref.ct) { >> + put_page_tag_ref(handle); >> + continue; >> + } >> + >> + set_codetag_empty(&ref); >> + update_page_tag_ref(handle, &ref); >> + put_page_tag_ref(handle); >> + } >> + } >> + >> + atomic_set(&early_pfn_count, 0); >> + >> + alloc_tag_add_early_pfn_ptr = NULL; >> +} >> +#else /* !CONFIG_MEM_ALLOC_PROFILING_DEBUG */ >> +inline void alloc_tag_add_early_pfn(unsigned long pfn) {} >> +static inline void __init clear_early_alloc_pfn_tag_refs(void) {} >> +#endif >> + >> #ifdef CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU >> DEFINE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag); >> EXPORT_SYMBOL(_shared_alloc_tag); >> @@ -760,6 +840,7 @@ static __init bool need_page_alloc_tagging(void) >> >> static __init void init_page_alloc_tagging(void) >> { >> + clear_early_alloc_pfn_tag_refs(); >> } >> >> struct page_ext_operations page_alloc_tagging_ops = { >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 2d4b6f1a554e..5ce5c4ba401f 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -1293,6 +1293,12 @@ void __pgalloc_tag_add(struct page *page, struct >> task_struct *task, >> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); >> update_page_tag_ref(handle, &ref); >> put_page_tag_ref(handle); >> + } else { >> + /* >> + * page_ext is not available yet, record the pfn so we can >> + * clear the tag ref later when page_ext is initialized. >> + */ >> + alloc_tag_add_early_pfn(page_to_pfn(page)); >> } >> } >> >> Although this 256-entry array remains unmodified for now, I will locally >> record the occurrence counts >> >> of these various early memory allocations. Hopefully this will be >> helpful to you. >> >> >> Thanks >> >> Hao >> >>>> Thanks >>>> Hao >>>> >>>>>> Thanks, >>>>>> Suren. >>>>>> >>>>>>> However, I'm not entirely certain whether SPARSEMEM can guarantee the >>>>>>> same behavior. >>>>>>> >>>>>>> >>>>>>>>> I would appreciate your valuable feedback and any better suggestions you >>>>>>>>> might have. 
>>>>>>>> Thanks for pursuing this! I'll help in any way I can. >>>>>>>> Suren. >>>>>>> Thank you so much for your patient guidance and assistance. >>>>>>> >>>>>>> I truly appreciate your willingness to share your knowledge and insights. >>>>>>> >>>>>>> Thanks, >>>>>>> Hao >>>>>>> >>>>>>>>> Thanks >>>>>>>>> >>>>>>>>> Hao >>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Suren. >>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> >>>>>>>>>>> Hao >>>>>>>>>>> >>>>>>>>>>>>> Thanks. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>> If the slab cache has no free objects, it falls back >>>>>>>>>>>>>>>>> to the buddy allocator to allocate memory. However, at this point page_ext >>>>>>>>>>>>>>>>> is not yet fully initialized, so these newly allocated pages have no >>>>>>>>>>>>>>>>> codetag set. These pages may later be reclaimed by KASAN,which causes >>>>>>>>>>>>>>>>> the warning to trigger when they are freed because their codetag ref is >>>>>>>>>>>>>>>>> still empty. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Use a global array to track pages allocated before page_ext is fully >>>>>>>>>>>>>>>>> initialized, similar to how kmemleak tracks early allocations. >>>>>>>>>>>>>>>>> When page_ext initialization completes, set their codetag >>>>>>>>>>>>>>>>> to empty to avoid warnings when they are freed later. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ... 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> --- a/include/linux/alloc_tag.h >>>>>>>>>>>>>>>>> +++ b/include/linux/alloc_tag.h >>>>>>>>>>>>>>>>> @@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> #ifdef CONFIG_MEM_ALLOC_PROFILING >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> +bool mem_profiling_is_available(void); >>>>>>>>>>>>>>>>> +void alloc_tag_add_early_pfn(unsigned long pfn); >>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>> #define ALLOC_TAG_SECTION_NAME "alloc_tags" >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> struct codetag_bytes { >>>>>>>>>>>>>>>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c >>>>>>>>>>>>>>>>> index 58991ab09d84..a5bf4e72c154 100644 >>>>>>>>>>>>>>>>> --- a/lib/alloc_tag.c >>>>>>>>>>>>>>>>> +++ b/lib/alloc_tag.c >>>>>>>>>>>>>>>>> @@ -6,6 +6,7 @@ >>>>>>>>>>>>>>>>> #include <linux/kallsyms.h> >>>>>>>>>>>>>>>>> #include <linux/module.h> >>>>>>>>>>>>>>>>> #include <linux/page_ext.h> >>>>>>>>>>>>>>>>> +#include <linux/pgalloc_tag.h> >>>>>>>>>>>>>>>>> #include <linux/proc_fs.h> >>>>>>>>>>>>>>>>> #include <linux/seq_buf.h> >>>>>>>>>>>>>>>>> #include <linux/seq_file.h> >>>>>>>>>>>>>>>>> @@ -26,6 +27,82 @@ static bool mem_profiling_support; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> static struct codetag_type *alloc_tag_cttype; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> +/* >>>>>>>>>>>>>>>>> + * State of the alloc_tag >>>>>>>>>>>>>>>>> + * >>>>>>>>>>>>>>>>> + * This is used to describe the states of the alloc_tag during bootup. >>>>>>>>>>>>>>>>> + * >>>>>>>>>>>>>>>>> + * When we need to allocate page_ext to store codetag, we face an >>>>>>>>>>>>>>>>> + * initialization timing problem: >>>>>>>>>>>>>>>>> + * >>>>>>>>>>>>>>>>> + * Due to initialization order, pages may be allocated via buddy system >>>>>>>>>>>>>>>>> + * before page_ext is fully allocated and initialized. Although these >>>>>>>>>>>>>>>>> + * pages call the allocation hooks, the codetag will not be set because >>>>>>>>>>>>>>>>> + * page_ext is not yet available. 
>>>>>>>>>>>>>>>>> + * >>>>>>>>>>>>>>>>> + * When these pages are later free to the buddy system, it triggers >>>>>>>>>>>>>>>>> + * warnings because their codetag is actually empty if >>>>>>>>>>>>>>>>> + * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled. >>>>>>>>>>>>>>>>> + * >>>>>>>>>>>>>>>>> + * Additionally, in this situation, we cannot record detailed allocation >>>>>>>>>>>>>>>>> + * information for these pages. >>>>>>>>>>>>>>>>> + */ >>>>>>>>>>>>>>>>> +enum mem_profiling_state { >>>>>>>>>>>>>>>>> + DOWN, /* No mem_profiling functionality yet */ >>>>>>>>>>>>>>>>> + UP /* Everything is working */ >>>>>>>>>>>>>>>>> +}; >>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>> +static enum mem_profiling_state mem_profiling_state = DOWN; >>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>> +bool mem_profiling_is_available(void) >>>>>>>>>>>>>>>>> +{ >>>>>>>>>>>>>>>>> + return mem_profiling_state == UP; >>>>>>>>>>>>>>>>> +} >>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG >>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>> +#define EARLY_ALLOC_PFN_MAX 256 >>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX]; >>>>>>>>>>>>>>>> It's unfortunate that this isn't __initdata. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> +static unsigned int early_pfn_count; >>>>>>>>>>>>>>>>> +static DEFINE_SPINLOCK(early_pfn_lock); >>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> --- a/mm/page_alloc.c >>>>>>>>>>>>>>>>> +++ b/mm/page_alloc.c >>>>>>>>>>>>>>>>> @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task, >>>>>>>>>>>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); >>>>>>>>>>>>>>>>> update_page_tag_ref(handle, &ref); >>>>>>>>>>>>>>>>> put_page_tag_ref(handle); >>>>>>>>>>>>>>>>> + } else { >>>>>>>>>>>>>>> This branch can be marked as "unlikely". 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> + /* >>>>>>>>>>>>>>>>> + * page_ext is not available yet, record the pfn so we can >>>>>>>>>>>>>>>>> + * clear the tag ref later when page_ext is initialized. >>>>>>>>>>>>>>>>> + */ >>>>>>>>>>>>>>>>> + if (!mem_profiling_is_available()) >>>>>>>>>>>>>>>>> + alloc_tag_add_early_pfn(page_to_pfn(page)); >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> All because of this, I believe. Is this fixable? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> If we take that `else', we know we're running in __init code, yes? I >>>>>>>>>>>>>>>> don't see how `__init pgalloc_tag_add_early()' could be made to work. >>>>>>>>>>>>>>>> hrm. Something clever, please. >>>>>>>>>>>>>>> We can have a pointer to a function that is initialized to point to >>>>>>>>>>>>>>> alloc_tag_add_early_pfn, which is defined as __init and uses >>>>>>>>>>>>>>> early_pfns which now can be defined as __initdata. After >>>>>>>>>>>>>>> clear_early_alloc_pfn_tag_refs() is done we reset that pointer to >>>>>>>>>>>>>>> NULL. __pgalloc_tag_add() instead of calling alloc_tag_add_early_pfn() >>>>>>>>>>>>>>> directly checks that pointer and if it's not NULL then calls the >>>>>>>>>>>>>>> function that it points to. This way __pgalloc_tag_add() which is not >>>>>>>>>>>>>>> an __init function will be invoking alloc_tag_add_early_pfn() __init >>>>>>>>>>>>>>> function only until we are done with initialization. I haven't tried >>>>>>>>>>>>>>> this but I think that should work. This also eliminates the need for >>>>>>>>>>>>>>> mem_profiling_state variable since we can use this function pointer >>>>>>>>>>>>>>> instead. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization 2026-03-26 5:33 ` Hao Ge @ 2026-03-26 8:23 ` Suren Baghdasaryan 0 siblings, 0 replies; 21+ messages in thread From: Suren Baghdasaryan @ 2026-03-26 8:23 UTC (permalink / raw) To: Hao Ge; +Cc: Andrew Morton, Kent Overstreet, linux-mm, linux-kernel On Wed, Mar 25, 2026 at 10:34 PM Hao Ge <hao.ge@linux.dev> wrote: > > > On 2026/3/26 13:04, Suren Baghdasaryan wrote: > > On Wed, Mar 25, 2026 at 6:45 PM Hao Ge <hao.ge@linux.dev> wrote: > >> > >> On 2026/3/25 23:17, Suren Baghdasaryan wrote: > >>> On Wed, Mar 25, 2026 at 4:21 AM Hao Ge <hao.ge@linux.dev> wrote: > >>>> On 2026/3/25 15:35, Suren Baghdasaryan wrote: > >>>>> On Tue, Mar 24, 2026 at 11:25 PM Suren Baghdasaryan <surenb@google.com> wrote: > >>>>>> On Tue, Mar 24, 2026 at 7:08 PM Hao Ge <hao.ge@linux.dev> wrote: > >>>>>>> On 2026/3/25 08:21, Suren Baghdasaryan wrote: > >>>>>>>> On Tue, Mar 24, 2026 at 2:43 AM Hao Ge <hao.ge@linux.dev> wrote: > >>>>>>>>> On 2026/3/24 06:47, Suren Baghdasaryan wrote: > >>>>>>>>>> On Mon, Mar 23, 2026 at 2:16 AM Hao Ge <hao.ge@linux.dev> wrote: > >>>>>>>>>>> On 2026/3/20 10:14, Suren Baghdasaryan wrote: > >>>>>>>>>>>> On Thu, Mar 19, 2026 at 6:58 PM Hao Ge <hao.ge@linux.dev> wrote: > >>>>>>>>>>>>> On 2026/3/20 07:48, Suren Baghdasaryan wrote: > >>>>>>>>>>>>>> On Thu, Mar 19, 2026 at 4:44 PM Suren Baghdasaryan <surenb@google.com> wrote: > >>>>>>>>>>>>>>> On Thu, Mar 19, 2026 at 3:28 PM Andrew Morton <akpm@linux-foundation.org> wrote: > >>>>>>>>>>>>>>>> On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Due to initialization ordering, page_ext is allocated and initialized > >>>>>>>>>>>>>>>>> relatively late during boot. Some pages have already been allocated > >>>>>>>>>>>>>>>>> and freed before page_ext becomes available, leaving their codetag > >>>>>>>>>>>>>>>>> uninitialized. 
> >>>>>>>>>>>>>>> Hi Hao, > >>>>>>>>>>>>>>> Thanks for the report. > >>>>>>>>>>>>>>> Hmm. So, we are allocating pages before page_ext is initialized... > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> A clear example is in init_section_page_ext(): alloc_page_ext() calls > >>>>>>>>>>>>>>>>> kmemleak_alloc(). > >>>>>>>>>>>>>> Forgot to ask. The example you are using here is for page_ext > >>>>>>>>>>>>>> allocation itself. Do you have any other examples where page > >>>>>>>>>>>>>> allocation happens before page_ext initialization? If that's the only > >>>>>>>>>>>>>> place, then we might be able to fix this in a simpler way by doing > >>>>>>>>>>>>>> something special for alloc_page_ext(). > >>>>>>>>>>>>> Hi Suren > >>>>>>>>>>>>> > >>>>>>>>>>>>> To help illustrate the point, here's the debug log I added: > >>>>>>>>>>>>> > >>>>>>>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >>>>>>>>>>>>> index 2d4b6f1a554e..ebfe636f5b07 100644 > >>>>>>>>>>>>> --- a/mm/page_alloc.c > >>>>>>>>>>>>> +++ b/mm/page_alloc.c > >>>>>>>>>>>>> @@ -1293,6 +1293,9 @@ void __pgalloc_tag_add(struct page *page, struct > >>>>>>>>>>>>> task_struct *task, > >>>>>>>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > >>>>>>>>>>>>> update_page_tag_ref(handle, &ref); > >>>>>>>>>>>>> put_page_tag_ref(handle); > >>>>>>>>>>>>> + } else { > >>>>>>>>>>>>> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>>>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > >>>>>>>>>>>>> + dump_stack(); > >>>>>>>>>>>>> } > >>>>>>>>>>>>> } > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> And I caught the following logs: > >>>>>>>>>>>>> > >>>>>>>>>>>>> [ 0.296399] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>>>>>>>> page=ffffea000400c700 pfn=1049372 nr=1 > >>>>>>>>>>>>> [ 0.296400] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >>>>>>>>>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > >>>>>>>>>>>>> [ 0.296402] Hardware name: Red Hat KVM, BIOS > >>>>>>>>>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>>>>>>>>>>> [ 0.296402] Call Trace: > >>>>>>>>>>>>> [ 0.296403] <TASK> > >>>>>>>>>>>>> [ 0.296403] dump_stack_lvl+0x53/0x70 > >>>>>>>>>>>>> [ 0.296405] __pgalloc_tag_add+0x3a3/0x6e0 > >>>>>>>>>>>>> [ 0.296406] ? __pfx___pgalloc_tag_add+0x10/0x10 > >>>>>>>>>>>>> [ 0.296407] ? kasan_unpoison+0x27/0x60 > >>>>>>>>>>>>> [ 0.296409] ? __kasan_unpoison_pages+0x2c/0x40 > >>>>>>>>>>>>> [ 0.296411] get_page_from_freelist+0xa54/0x1310 > >>>>>>>>>>>>> [ 0.296413] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>>>>>>>>>>> [ 0.296415] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > >>>>>>>>>>>>> [ 0.296417] ? stack_depot_save_flags+0x3f/0x680 > >>>>>>>>>>>>> [ 0.296418] ? ___slab_alloc+0x518/0x530 > >>>>>>>>>>>>> [ 0.296420] alloc_pages_mpol+0x13a/0x3f0 > >>>>>>>>>>>>> [ 0.296421] ? __pfx_alloc_pages_mpol+0x10/0x10 > >>>>>>>>>>>>> [ 0.296423] ? _raw_spin_lock_irqsave+0x8a/0xf0 > >>>>>>>>>>>>> [ 0.296424] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 > >>>>>>>>>>>>> [ 0.296426] alloc_slab_page+0xc2/0x130 > >>>>>>>>>>>>> [ 0.296427] allocate_slab+0x77/0x2c0 > >>>>>>>>>>>>> [ 0.296429] ? syscall_enter_define_fields+0x3bb/0x5f0 > >>>>>>>>>>>>> [ 0.296430] ___slab_alloc+0x125/0x530 > >>>>>>>>>>>>> [ 0.296432] ? __trace_define_field+0x252/0x3d0 > >>>>>>>>>>>>> [ 0.296433] __kmalloc_noprof+0x329/0x630 > >>>>>>>>>>>>> [ 0.296435] ? syscall_enter_define_fields+0x3bb/0x5f0 > >>>>>>>>>>>>> [ 0.296436] syscall_enter_define_fields+0x3bb/0x5f0 > >>>>>>>>>>>>> [ 0.296438] ? 
__pfx_syscall_enter_define_fields+0x10/0x10 > >>>>>>>>>>>>> [ 0.296440] event_define_fields+0x326/0x540 > >>>>>>>>>>>>> [ 0.296441] __trace_early_add_events+0xac/0x3c0 > >>>>>>>>>>>>> [ 0.296443] trace_event_init+0x24c/0x460 > >>>>>>>>>>>>> [ 0.296445] trace_init+0x9/0x20 > >>>>>>>>>>>>> [ 0.296446] start_kernel+0x199/0x3c0 > >>>>>>>>>>>>> [ 0.296448] x86_64_start_reservations+0x18/0x30 > >>>>>>>>>>>>> [ 0.296449] x86_64_start_kernel+0xe2/0xf0 > >>>>>>>>>>>>> [ 0.296451] common_startup_64+0x13e/0x141 > >>>>>>>>>>>>> [ 0.296453] </TASK> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> [ 0.312234] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>>>>>> page=ffffea000400f900 pfn=1049572 nr=1 > >>>>>>>>>>>>> [ 0.312234] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >>>>>>>>>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > >>>>>>>>>>>>> [ 0.312236] Hardware name: Red Hat KVM, BIOS > >>>>>>>>>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>>>>>>>>>>> [ 0.312236] Call Trace: > >>>>>>>>>>>>> [ 0.312237] <TASK> > >>>>>>>>>>>>> [ 0.312237] dump_stack_lvl+0x53/0x70 > >>>>>>>>>>>>> [ 0.312239] __pgalloc_tag_add+0x3a3/0x6e0 > >>>>>>>>>>>>> [ 0.312240] ? __pfx___pgalloc_tag_add+0x10/0x10 > >>>>>>>>>>>>> [ 0.312241] ? rmqueue.constprop.0+0x4fc/0x1ce0 > >>>>>>>>>>>>> [ 0.312243] ? kasan_unpoison+0x27/0x60 > >>>>>>>>>>>>> [ 0.312244] ? __kasan_unpoison_pages+0x2c/0x40 > >>>>>>>>>>>>> [ 0.312246] get_page_from_freelist+0xa54/0x1310 > >>>>>>>>>>>>> [ 0.312248] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>>>>>>>>>>> [ 0.312250] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > >>>>>>>>>>>>> [ 0.312253] alloc_slab_page+0x39/0x130 > >>>>>>>>>>>>> [ 0.312254] allocate_slab+0x77/0x2c0 > >>>>>>>>>>>>> [ 0.312255] ? alloc_cpumask_var_node+0xc7/0x230 > >>>>>>>>>>>>> [ 0.312257] ___slab_alloc+0x46d/0x530 > >>>>>>>>>>>>> [ 0.312259] __kmalloc_node_noprof+0x2fa/0x680 > >>>>>>>>>>>>> [ 0.312261] ? 
alloc_cpumask_var_node+0xc7/0x230 > >>>>>>>>>>>>> [ 0.312263] alloc_cpumask_var_node+0xc7/0x230 > >>>>>>>>>>>>> [ 0.312264] init_desc+0x141/0x6b0 > >>>>>>>>>>>>> [ 0.312266] alloc_desc+0x108/0x1b0 > >>>>>>>>>>>>> [ 0.312267] early_irq_init+0xee/0x1c0 > >>>>>>>>>>>>> [ 0.312268] ? __pfx_early_irq_init+0x10/0x10 > >>>>>>>>>>>>> [ 0.312271] start_kernel+0x1ab/0x3c0 > >>>>>>>>>>>>> [ 0.312272] x86_64_start_reservations+0x18/0x30 > >>>>>>>>>>>>> [ 0.312274] x86_64_start_kernel+0xe2/0xf0 > >>>>>>>>>>>>> [ 0.312275] common_startup_64+0x13e/0x141 > >>>>>>>>>>>>> [ 0.312277] </TASK> > >>>>>>>>>>>>> > >>>>>>>>>>>>> [ 0.312834] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>>>>>> page=ffffea000400fc00 pfn=1049584 nr=1 > >>>>>>>>>>>>> [ 0.312835] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >>>>>>>>>>>>> 7.0.0-rc4-dirty #12 PREEMPT(lazy) > >>>>>>>>>>>>> [ 0.312836] Hardware name: Red Hat KVM, BIOS > >>>>>>>>>>>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>>>>>>>>>>> [ 0.312837] Call Trace: > >>>>>>>>>>>>> [ 0.312837] <TASK> > >>>>>>>>>>>>> [ 0.312838] dump_stack_lvl+0x53/0x70 > >>>>>>>>>>>>> [ 0.312840] __pgalloc_tag_add+0x3a3/0x6e0 > >>>>>>>>>>>>> [ 0.312841] ? __pfx___pgalloc_tag_add+0x10/0x10 > >>>>>>>>>>>>> [ 0.312842] ? rmqueue.constprop.0+0x4fc/0x1ce0 > >>>>>>>>>>>>> [ 0.312844] ? kasan_unpoison+0x27/0x60 > >>>>>>>>>>>>> [ 0.312845] ? __kasan_unpoison_pages+0x2c/0x40 > >>>>>>>>>>>>> [ 0.312847] get_page_from_freelist+0xa54/0x1310 > >>>>>>>>>>>>> [ 0.312849] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>>>>>>>>>>> [ 0.312851] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10 > >>>>>>>>>>>>> [ 0.312853] alloc_pages_mpol+0x13a/0x3f0 > >>>>>>>>>>>>> [ 0.312855] ? __pfx_alloc_pages_mpol+0x10/0x10 > >>>>>>>>>>>>> [ 0.312856] ? xas_find+0x2d8/0x450 > >>>>>>>>>>>>> [ 0.312858] ? _raw_spin_lock+0x84/0xe0 > >>>>>>>>>>>>> [ 0.312859] ? 
__pfx__raw_spin_lock+0x10/0x10 > >>>>>>>>>>>>> [ 0.312861] alloc_pages_noprof+0xf6/0x2b0 > >>>>>>>>>>>>> [ 0.312862] __change_page_attr+0x293/0x850 > >>>>>>>>>>>>> [ 0.312864] ? __pfx___change_page_attr+0x10/0x10 > >>>>>>>>>>>>> [ 0.312865] ? _vm_unmap_aliases+0x2d0/0x650 > >>>>>>>>>>>>> [ 0.312868] ? __pfx__vm_unmap_aliases+0x10/0x10 > >>>>>>>>>>>>> [ 0.312869] __change_page_attr_set_clr+0x16c/0x360 > >>>>>>>>>>>>> [ 0.312871] ? spp_getpage+0xbb/0x1e0 > >>>>>>>>>>>>> [ 0.312872] change_page_attr_set_clr+0x220/0x3c0 > >>>>>>>>>>>>> [ 0.312873] ? flush_tlb_one_kernel+0xf/0x30 > >>>>>>>>>>>>> [ 0.312875] ? set_pte_vaddr_p4d+0x110/0x180 > >>>>>>>>>>>>> [ 0.312877] ? __pfx_change_page_attr_set_clr+0x10/0x10 > >>>>>>>>>>>>> [ 0.312878] ? __pfx_set_pte_vaddr_p4d+0x10/0x10 > >>>>>>>>>>>>> [ 0.312881] ? __pfx_mtree_load+0x10/0x10 > >>>>>>>>>>>>> [ 0.312883] ? __pfx_mtree_load+0x10/0x10 > >>>>>>>>>>>>> [ 0.312884] ? __asan_memcpy+0x3c/0x60 > >>>>>>>>>>>>> [ 0.312886] ? set_intr_gate+0x10c/0x150 > >>>>>>>>>>>>> [ 0.312888] set_memory_ro+0x76/0xa0 > >>>>>>>>>>>>> [ 0.312889] ? __pfx_set_memory_ro+0x10/0x10 > >>>>>>>>>>>>> [ 0.312891] idt_setup_apic_and_irq_gates+0x2c1/0x390 > >>>>>>>>>>>>> > >>>>>>>>>>>>> and more. > >>>>>>>>>>>> Ok, it's not the only place. Got your point. > >>>>>>>>>>>> > >>>>>>>>>>>>> off topic - if we were to handle only alloc_page_ext() specifically, > >>>>>>>>>>>>> what would be the most straightforward > >>>>>>>>>>>>> > >>>>>>>>>>>>> solution in your mind? I'd really appreciate your insight. > >>>>>>>>>>>> I was thinking if it's the only special case maybe we can handle it > >>>>>>>>>>>> somehow differently, like we do when we allocate obj_ext vectors for > >>>>>>>>>>>> slabs using __GFP_NO_OBJ_EXT. I haven't found a good solution yet but > >>>>>>>>>>>> since it's not a special case we would not be able to use it even if I > >>>>>>>>>>>> came up with something... 
> >>>>>>>>>>>> I think your way is the most straight-forward but please try my > >>>>>>>>>>>> suggestion to see if we can avoid extra overhead. > >>>>>>>>>>>> Thanks, > >>>>>>>>>>>> Suren. > >>>>>>> Hi Suren > >>>>>>>>> Hi Suren > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> Hi Hao, > >>>>>>>>>> > >>>>>>>>>>> Hi Suren > >>>>>>>>>>> > >>>>>>>>>>> Thank you for your feedback. After re-examining this issue, > >>>>>>>>>>> > >>>>>>>>>>> I realize my previous focus was misplaced. > >>>>>>>>>>> > >>>>>>>>>>> Upon deeper consideration, I understand that this is not merely a bug, > >>>>>>>>>>> > >>>>>>>>>>> but rather a warning that indicates a gap in our memory profiling mechanism. > >>>>>>>>>>> > >>>>>>>>>>> Specifically, the current implementation appears to be missing memory > >>>>>>>>>>> allocation > >>>>>>>>>>> > >>>>>>>>>>> tracking during the period between the buddy system allocation and page_ext > >>>>>>>>>>> > >>>>>>>>>>> initialization. > >>>>>>>>>>> > >>>>>>>>>>> This profiling gap means we may not be capturing all relevant memory > >>>>>>>>>>> allocation > >>>>>>>>>>> > >>>>>>>>>>> events during this critical transition phase. > >>>>>>>>>> Correct, this limitation exists because memory profiling relies on > >>>>>>>>>> some kernel facilities (page_ext, objj_ext) which might not be > >>>>>>>>>> initialized yet at the time of allocation. > >>>>>>>>>> > >>>>>>>>>>> My approach is to dynamically allocate codetag_ref when get_page_tag_ref > >>>>>>>>>>> fails, > >>>>>>>>>>> > >>>>>>>>>>> and maintain a linked list to track all buddy system allocations that > >>>>>>>>>>> occur prior to page_ext initialization. > >>>>>>>>>>> > >>>>>>>>>>> However, this introduces performance concerns: > >>>>>>>>>>> > >>>>>>>>>>> 1. Free Path Overhead: When freeing these pages, we would need to > >>>>>>>>>>> traverse the entire linked list to locate > >>>>>>>>>>> > >>>>>>>>>>> the corresponding codetag_ref, resulting in O(n) lookup complexity > >>>>>>>>>>> per free operation. 
> >>>>>>>>>>> > >>>>>>>>>>> 2. Initialization Overhead: During init_page_alloc_tagging, iterating > >>>>>>>>>>> through the linked list to assign codetag_ref to > >>>>>>>>>>> > >>>>>>>>>>> page_ext would introduce additional traversal cost. > >>>>>>>>>>> > >>>>>>>>>>> If the number of pages is substantial, this could incur significant > >>>>>>>>>>> overhead. What are your thoughts on this? I look forward to your > >>>>>>>>>>> suggestions. > >>>>>>>>>> My thinking is that these early allocations comprise a small portion > >>>>>>>>>> of overall memory consumed by the system. So, instead of trying to > >>>>>>>>>> record and handle them in some alternative way, we just accept that > >>>>>>>>>> some counters might not be exactly accurate and ignore those early > >>>>>>>>>> allocations. See how the early slab allocations are marked with the > >>>>>>>>>> CODETAG_FLAG_INACCURATE flag and later reported as inaccurate. I think > >>>>>>>>>> that's an acceptable alternative to introducing extra complexity and > >>>>>>>>>> performance overhead. IOW, the benefits of accounting for these early > >>>>>>>>>> allocations are low compared to the effort required to account for > >>>>>>>>>> them. Unless you found a simple and performant way to do that... > >>>>>>>>> I have been exploring possible solutions to this issue over the past few > >>>>>>>>> days, > >>>>>>>>> > >>>>>>>>> but so far I have not come up with a good approach. > >>>>>>>>> > >>>>>>>>> I have counted the number of memory allocations that occur earlier than the > >>>>>>>>> > >>>>>>>>> allocation and initialization of our page_ext, and found that there are > >>>>>>>>> actually > >>>>>>>>> > >>>>>>>>> quite a lot of them. > >>>>>>>> Interesting... I wonder it's because deferred_struct_pages defers > >>>>>>>> page_ext initialization. Can you check if setting early_page_ext > >>>>>>>> reduces or eliminates these allocations before page_ext init cases? > >>>>>>> Yes, you are correct. 
In my 8-core 16GB virtual machine, I used a global > >>>>>>> counter > >>>>>>> > >>>>>>> to record these allocations. With early_page_ext enabled, there were 130 > >>>>>>> allocations > >>>>>>> > >>>>>>> before page_ext initialization. Without early_page_ext, there were 802 > >>>>>>> allocations > >>>>>>> > >>>>>>> before page_ext initialization. > >>>>>>> > >>>>>>> > >>>>>>>>> Similarly, I have made the following changes and collected the > >>>>>>>>> corresponding logs. > >>>>>>>>> > >>>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >>>>>>>>> index 2d4b6f1a554e..6db65b3d52d3 100644 > >>>>>>>>> --- a/mm/page_alloc.c > >>>>>>>>> +++ b/mm/page_alloc.c > >>>>>>>>> @@ -1293,6 +1293,8 @@ void __pgalloc_tag_add(struct page *page, struct > >>>>>>>>> task_struct *task, > >>>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > >>>>>>>>> update_page_tag_ref(handle, &ref); > >>>>>>>>> put_page_tag_ref(handle); > >>>>>>>>> + } else{ > >>>>>>>>> + pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > >>>>>>>>> } > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> @@ -1314,6 +1316,8 @@ void __pgalloc_tag_sub(struct page *page, unsigned > >>>>>>>>> int nr) > >>>>>>>>> alloc_tag_sub(&ref, PAGE_SIZE * nr); > >>>>>>>>> update_page_tag_ref(handle, &ref); > >>>>>>>>> put_page_tag_ref(handle); > >>>>>>>>> + } else{ > >>>>>>>>> + pr_warn("__pgalloc_tag_sub: get_page_tag_ref failed! > >>>>>>>>> page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr); > >>>>>>>>> } > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> [ 0.261699] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001000 pfn=1048640 nr=2 > >>>>>>>>> [ 0.261711] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001100 pfn=1048644 nr=4 > >>>>>>>>> [ 0.261717] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001200 pfn=1048648 nr=4 > >>>>>>>>> [ 0.261721] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>>>> page=ffffea0004001300 pfn=1048652 nr=4 > >>>>>>>>> [ 0.261893] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001080 pfn=1048642 nr=2 > >>>>>>>>> [ 0.261917] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001400 pfn=1048656 nr=4 > >>>>>>>>> [ 0.262018] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001500 pfn=1048660 nr=2 > >>>>>>>>> [ 0.262024] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001600 pfn=1048664 nr=8 > >>>>>>>>> [ 0.262040] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001580 pfn=1048662 nr=1 > >>>>>>>>> [ 0.262048] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea00040015c0 pfn=1048663 nr=1 > >>>>>>>>> [ 0.262056] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001800 pfn=1048672 nr=2 > >>>>>>>>> [ 0.262064] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001880 pfn=1048674 nr=2 > >>>>>>>>> [ 0.262078] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001900 pfn=1048676 nr=2 > >>>>>>>>> [ 0.262196] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 > >>>>>>>>> [ 0.262213] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001980 pfn=1048678 nr=2 > >>>>>>>>> [ 0.262220] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001a00 pfn=1048680 nr=4 > >>>>>>>>> [ 0.262246] ODEBUG: selftest passed > >>>>>>>>> [ 0.262268] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001b00 pfn=1048684 nr=1 > >>>>>>>>> [ 0.262318] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001b40 pfn=1048685 nr=1 > >>>>>>>>> [ 0.262368] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001b80 pfn=1048686 nr=1 > >>>>>>>>> [ 0.262418] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>>>> page=ffffea0004001bc0 pfn=1048687 nr=1 > >>>>>>>>> [ 0.262469] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001c00 pfn=1048688 nr=1 > >>>>>>>>> [ 0.262519] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001c40 pfn=1048689 nr=1 > >>>>>>>>> [ 0.262569] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001c80 pfn=1048690 nr=1 > >>>>>>>>> [ 0.262620] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001cc0 pfn=1048691 nr=1 > >>>>>>>>> [ 0.262670] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001d00 pfn=1048692 nr=1 > >>>>>>>>> [ 0.262721] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001d40 pfn=1048693 nr=1 > >>>>>>>>> [ 0.262771] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001d80 pfn=1048694 nr=1 > >>>>>>>>> [ 0.262821] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001dc0 pfn=1048695 nr=1 > >>>>>>>>> [ 0.262871] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001e00 pfn=1048696 nr=1 > >>>>>>>>> [ 0.262923] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001e40 pfn=1048697 nr=1 > >>>>>>>>> [ 0.262974] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001e80 pfn=1048698 nr=1 > >>>>>>>>> [ 0.263024] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001ec0 pfn=1048699 nr=1 > >>>>>>>>> [ 0.263074] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001f00 pfn=1048700 nr=1 > >>>>>>>>> [ 0.263124] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001f40 pfn=1048701 nr=1 > >>>>>>>>> [ 0.263174] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004001f80 pfn=1048702 nr=1 > >>>>>>>>> [ 0.263224] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>>>> page=ffffea0004001fc0 pfn=1048703 nr=1 > >>>>>>>>> [ 0.263275] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004002000 pfn=1048704 nr=1 > >>>>>>>>> [ 0.263325] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004002040 pfn=1048705 nr=1 > >>>>>>>>> [ 0.263375] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004002080 pfn=1048706 nr=1 > >>>>>>>>> [ 0.263427] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004002400 pfn=1048720 nr=16 > >>>>>>>>> [ 0.263437] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea00040020c0 pfn=1048707 nr=1 > >>>>>>>>> [ 0.263463] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004002100 pfn=1048708 nr=1 > >>>>>>>>> [ 0.263465] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004002140 pfn=1048709 nr=1 > >>>>>>>>> [ 0.263467] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004002180 pfn=1048710 nr=1 > >>>>>>>>> [ 0.263509] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004002200 pfn=1048712 nr=4 > >>>>>>>>> [ 0.263512] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004002800 pfn=1048736 nr=8 > >>>>>>>>> [ 0.263524] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea00040021c0 pfn=1048711 nr=1 > >>>>>>>>> [ 0.263536] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004002300 pfn=1048716 nr=1 > >>>>>>>>> [ 0.263537] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004002340 pfn=1048717 nr=1 > >>>>>>>>> [ 0.263539] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004002380 pfn=1048718 nr=1 > >>>>>>>>> [ 0.263604] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004004000 pfn=1048832 nr=128 > >>>>>>>>> [ 0.263638] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>>>> page=ffffea0004003000 pfn=1048768 nr=64 > >>>>>>>>> [ 0.263650] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004002c00 pfn=1048752 nr=16 > >>>>>>>>> [ 0.263655] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea00040023c0 pfn=1048719 nr=1 > >>>>>>>>> [ 0.270582] __pgalloc_tag_sub: get_page_tag_ref failed! > >>>>>>>>> page=ffffea00040023c0 pfn=1048719 nr=1 > >>>>>>>>> [ 0.270591] ftrace: allocating 52717 entries in 208 pages > >>>>>>>>> [ 0.270592] ftrace: allocated 208 pages with 3 groups > >>>>>>>>> [ 0.270620] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004002a00 pfn=1048744 nr=8 > >>>>>>>>> [ 0.270636] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea00040023c0 pfn=1048719 nr=1 > >>>>>>>>> [ 0.270643] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006000 pfn=1048960 nr=1 > >>>>>>>>> [ 0.270649] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006040 pfn=1048961 nr=1 > >>>>>>>>> [ 0.270658] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004007000 pfn=1049024 nr=64 > >>>>>>>>> [ 0.270659] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006080 pfn=1048962 nr=2 > >>>>>>>>> [ 0.270722] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006100 pfn=1048964 nr=1 > >>>>>>>>> [ 0.270730] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006140 pfn=1048965 nr=1 > >>>>>>>>> [ 0.270738] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006180 pfn=1048966 nr=1 > >>>>>>>>> [ 0.270777] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea00040061c0 pfn=1048967 nr=1 > >>>>>>>>> [ 0.270786] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006200 pfn=1048968 nr=1 > >>>>>>>>> [ 0.270792] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>>>> page=ffffea0004006240 pfn=1048969 nr=1 > >>>>>>>>> [ 0.270833] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006300 pfn=1048972 nr=4 > >>>>>>>>> [ 0.270891] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006280 pfn=1048970 nr=1 > >>>>>>>>> [ 0.270980] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea00040062c0 pfn=1048971 nr=1 > >>>>>>>>> [ 0.271071] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006400 pfn=1048976 nr=1 > >>>>>>>>> [ 0.271156] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006440 pfn=1048977 nr=1 > >>>>>>>>> [ 0.271185] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006480 pfn=1048978 nr=2 > >>>>>>>>> [ 0.271301] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006500 pfn=1048980 nr=1 > >>>>>>>>> [ 0.271655] Dynamic Preempt: lazy > >>>>>>>>> [ 0.271662] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006580 pfn=1048982 nr=2 > >>>>>>>>> [ 0.271752] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006600 pfn=1048984 nr=4 > >>>>>>>>> [ 0.271762] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004010000 pfn=1049600 nr=4 > >>>>>>>>> [ 0.271824] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006540 pfn=1048981 nr=1 > >>>>>>>>> [ 0.271916] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006700 pfn=1048988 nr=2 > >>>>>>>>> [ 0.271964] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006780 pfn=1048990 nr=1 > >>>>>>>>> [ 0.272099] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea00040067c0 pfn=1048991 nr=1 > >>>>>>>>> [ 0.272138] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006800 pfn=1048992 nr=2 > >>>>>>>>> [ 0.272144] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>>>> page=ffffea0004006a00 pfn=1049000 nr=8 > >>>>>>>>> [ 0.272249] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006c00 pfn=1049008 nr=8 > >>>>>>>>> [ 0.272319] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006880 pfn=1048994 nr=2 > >>>>>>>>> [ 0.272351] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006900 pfn=1048996 nr=4 > >>>>>>>>> [ 0.272424] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004006e00 pfn=1049016 nr=8 > >>>>>>>>> [ 0.272485] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004008000 pfn=1049088 nr=8 > >>>>>>>>> [ 0.272535] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004008200 pfn=1049096 nr=2 > >>>>>>>>> [ 0.272600] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004008400 pfn=1049104 nr=8 > >>>>>>>>> [ 0.272663] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004008300 pfn=1049100 nr=4 > >>>>>>>>> [ 0.272694] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004008280 pfn=1049098 nr=2 > >>>>>>>>> [ 0.272708] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004008600 pfn=1049112 nr=8 > >>>>>>>>> > >>>>>>>>> [ 0.272924] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004008880 pfn=1049122 nr=2 > >>>>>>>>> [ 0.272934] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004008900 pfn=1049124 nr=2 > >>>>>>>>> [ 0.272952] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004008c00 pfn=1049136 nr=4 > >>>>>>>>> [ 0.273035] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004008980 pfn=1049126 nr=2 > >>>>>>>>> [ 0.273062] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004008e00 pfn=1049144 nr=8 > >>>>>>>>> [ 0.273674] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>>>> page=ffffea0004008d00 pfn=1049140 nr=1 > >>>>>>>>> [ 0.273884] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004008d80 pfn=1049142 nr=2 > >>>>>>>>> [ 0.273943] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009000 pfn=1049152 nr=2 > >>>>>>>>> [ 0.274379] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009080 pfn=1049154 nr=2 > >>>>>>>>> [ 0.274575] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009200 pfn=1049160 nr=8 > >>>>>>>>> [ 0.274617] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009100 pfn=1049156 nr=4 > >>>>>>>>> [ 0.274794] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009400 pfn=1049168 nr=2 > >>>>>>>>> [ 0.274840] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009480 pfn=1049170 nr=2 > >>>>>>>>> [ 0.275057] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009500 pfn=1049172 nr=2 > >>>>>>>>> [ 0.275092] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009580 pfn=1049174 nr=2 > >>>>>>>>> [ 0.275134] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009600 pfn=1049176 nr=8 > >>>>>>>>> [ 0.275211] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009800 pfn=1049184 nr=4 > >>>>>>>>> [ 0.275510] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009900 pfn=1049188 nr=2 > >>>>>>>>> [ 0.275548] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009980 pfn=1049190 nr=2 > >>>>>>>>> [ 0.275976] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009a00 pfn=1049192 nr=8 > >>>>>>>>> [ 0.275987] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009c00 pfn=1049200 nr=2 > >>>>>>>>> [ 0.276139] __pgalloc_tag_add: get_page_tag_ref failed! 
> >>>>>>>>> page=ffffea0004009c80 pfn=1049202 nr=2 > >>>>>>>>> [ 0.276152] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004008d40 pfn=1049141 nr=1 > >>>>>>>>> [ 0.276242] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009d00 pfn=1049204 nr=1 > >>>>>>>>> [ 0.276358] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009d40 pfn=1049205 nr=1 > >>>>>>>>> [ 0.276444] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009d80 pfn=1049206 nr=1 > >>>>>>>>> [ 0.276526] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009dc0 pfn=1049207 nr=1 > >>>>>>>>> [ 0.276615] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009e00 pfn=1049208 nr=1 > >>>>>>>>> [ 0.276696] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009e40 pfn=1049209 nr=1 > >>>>>>>>> [ 0.276792] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009e80 pfn=1049210 nr=1 > >>>>>>>>> [ 0.276827] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009f00 pfn=1049212 nr=2 > >>>>>>>>> [ 0.276891] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009ec0 pfn=1049211 nr=1 > >>>>>>>>> [ 0.276999] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009f80 pfn=1049214 nr=1 > >>>>>>>>> [ 0.277082] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea0004009fc0 pfn=1049215 nr=1 > >>>>>>>>> [ 0.277172] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea000400a000 pfn=1049216 nr=1 > >>>>>>>>> [ 0.277257] __pgalloc_tag_add: get_page_tag_ref failed! > >>>>>>>>> page=ffffea000400a040 pfn=1049217 nr=1 > >>>>>>>>> > >>>>>>>>> and so on. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> I think your earlier patch can effectively detect these early > >>>>>>>>>> allocations and suppress the warnings. We should also mark these > >>>>>>>>>> allocations with CODETAG_FLAG_INACCURATE. 
> >>>>>>>>> Thanks to an excellent AI review, I realized there are issues with > >>>>>>>>> > >>>>>>>>> my original patch. One problem is the 256-element array; another > >>>>>>>> Yes, if there are lots of such allocations, it's not appropriate. > >>>>>>>> > >>>>>>>>> is that it involves allocation and free operations — meaning we need > >>>>>>>>> > >>>>>>>>> to record entries at __pgalloc_tag_add and remove them at __pgalloc_tag_sub, > >>>>>>>>> > >>>>>>>>> which introduces a noticeable overhead. I'm wondering if we can instead > >>>>>>>>> set a flag > >>>>>>>>> > >>>>>>>>> bit in page flags during the early boot stage, which I'll refer to as > >>>>>>>>> EARLY_ALLOC_FLAGS. > >>>>>>>>> > >>>>>>>>> Then, in __pgalloc_tag_sub, we first check for EARLY_ALLOC_FLAGS. If > >>>>>>>>> set, we clear the > >>>>>>>>> > >>>>>>>>> flag and return immediately; otherwise, we perform the actual > >>>>>>>>> subtraction of the tag count. > >>>>>>>>> > >>>>>>>>> This approach seems somewhat similar to the idea behind > >>>>>>>>> mem_profiling_compressed. > >>>>>>>> That seems doable but let's first check if we can make page_ext > >>>>>>>> initialization happen before these allocations. That would be the > >>>>>>>> ideal path. If it's not possible then we can focus on alternatives > >>>>>>>> like the one you propose. > >>>>>>> Yes, the ideal scenario would be to have page_ext initialization > >>>>>>> complete before > >>>>>>> > >>>>>>> these allocations occur. I just did a code walkthrough and found that > >>>>>>> this resembles > >>>>>>> > >>>>>>> the FLATMEM implementation approach - FLATMEM allocates page_ext before > >>>>>>> the buddy > >>>>>>> > >>>>>>> system initialization, so it doesn't seem to encounter the issue we're > >>>>>>> facing now. > >>>>>>> > >>>>>>> https://elixir.bootlin.com/linux/v7.0-rc5/source/mm/mm_init.c#L2707 > >>>>>> Yes, page_ext_init_flatmem() looks like an interesting option and it > >>>>>> would not work with sparsemem. 
TBH I would prefer to find a simple > >>>>>> solution that can identify early init allocations, mark them inaccurate > >>>>>> and suppress the warning rather than introduce some complex mechanism > >>>>>> to account for them which would work only in some cases (flatmem). > >>>>>> With your original approach I think the only real issue is the size of > >>>>>> the array that might be too small. The other issue you mentioned about > >>>>>> an allocated page being freed and then re-allocated after page_ext is > >>>>>> initialized but before clear_page_tag_ref() is called is not really a > >>>>>> problem. Yes, we will lose that counter's value but it's similar to > >>>>>> other early allocations which we just treat as inaccurate. We can also > >>>>>> minimize the possibility of this happening by moving > >>>>>> clear_page_tag_ref() into init_page_alloc_tagging(). > >>>>>> > >>>>>> I don't like the pageflag option you mentioned because it adds an > >>>>>> extra condition check into __pgalloc_tag_sub() which will be executed > >>>>>> even after the init stage is over. > >>>>>> I'll look into this some more tomorrow as it's quite late now. > >>>> Hi Suren > >>>> > >>>> > >>>>> Just thought of something. Are all these pages allocated by slab? If > >>>>> so, I think slab does not use page->lru (need to double-check) and we > >>>>> could add all these pages allocated during early init into a list and > >>>>> then set their page_ext reference to CODETAG_EMPTY in > >>>>> init_page_alloc_tagging(). > >>>> Got your point.
> >>>> > >>>> > >>>> There will indeed be some non-SLAB memory allocations here, such as the > >>>> following: > >>>> > >>>> > >>>> CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted > >>>> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) > >>>> [ 0.326607] Hardware name: Red Hat KVM, BIOS > >>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>> [ 0.326608] Call Trace: > >>>> [ 0.326608] <TASK> > >>>> [ 0.326609] dump_stack_lvl+0x53/0x70 > >>>> [ 0.326611] __pgalloc_tag_add+0x407/0x700 > >>>> [ 0.326616] get_page_from_freelist+0xa54/0x1310 > >>>> [ 0.326618] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>> [ 0.326623] alloc_pages_mpol+0x13a/0x3f0 > >>>> [ 0.326627] alloc_pages_noprof+0xf6/0x2b0 > >>>> [ 0.326628] __pmd_alloc+0x743/0x9c0 > >>>> [ 0.326630] vmap_range_noflush+0xac0/0x10a0 > >>>> [ 0.326637] ioremap_page_range+0x17c/0x250 > >>>> [ 0.326639] __ioremap_caller+0x437/0x5c0 > >>>> [ 0.326645] acpi_os_map_iomem+0x4c0/0x660 > >>>> [ 0.326647] acpi_tb_verify_temp_table+0x1c0/0x580 > >>>> [ 0.326649] acpi_reallocate_root_table+0x2ad/0x460 > >>>> [ 0.326655] acpi_early_init+0x111/0x460 > >>>> [ 0.326657] start_kernel+0x271/0x3c0 > >>>> [ 0.326659] x86_64_start_reservations+0x18/0x30 > >>>> [ 0.326660] x86_64_start_kernel+0xe2/0xf0 > >>>> [ 0.326662] common_startup_64+0x13e/0x141 > >>>> [ 0.326663] </TASK> > >>>> > >>>> CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted > >>>> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) > >>>> [ 0.329167] Hardware name: Red Hat KVM, BIOS > >>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>> [ 0.329167] Call Trace: > >>>> [ 0.329167] <TASK> > >>>> [ 0.329167] dump_stack_lvl+0x53/0x70 > >>>> [ 0.329167] __pgalloc_tag_add+0x407/0x700 > >>>> [ 0.329167] get_page_from_freelist+0xa54/0x1310 > >>>> [ 0.329167] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>> [ 0.329167] __alloc_pages_noprof+0x10/0x1b0 > >>>> [ 0.329167] dup_task_struct+0x163/0x8c0 > >>>> [ 0.329167] copy_process+0x390/0x4a70 > >>>> [ 
0.329167] kernel_clone+0xe1/0x830 > >>>> [ 0.329167] kernel_thread+0xcb/0x110 > >>>> [ 0.329167] kthreadd+0x8a2/0xc60 > >>>> [ 0.329167] ret_from_fork+0x551/0x720 > >>>> [ 0.329167] ret_from_fork_asm+0x1a/0x30 > >>>> [ 0.329167] </TASK> > >>>> > >>>> CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted > >>>> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) > >>>> [ 0.329167] Hardware name: Red Hat KVM, BIOS > >>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>> [ 0.329167] Call Trace: > >>>> [ 0.329167] <TASK> > >>>> [ 0.329167] dump_stack_lvl+0x53/0x70 > >>>> [ 0.329167] __pgalloc_tag_add+0x407/0x700 > >>>> [ 0.329167] get_page_from_freelist+0xa54/0x1310 > >>>> [ 0.329167] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>> [ 0.329167] __alloc_pages_noprof+0x10/0x1b0 > >>>> [ 0.329167] dup_task_struct+0x163/0x8c0 > >>>> [ 0.329167] copy_process+0x390/0x4a70 > >>>> [ 0.329167] kernel_clone+0xe1/0x830 > >>>> [ 0.329167] kernel_thread+0xcb/0x110 > >>>> [ 0.329167] kthreadd+0x8a2/0xc60 > >>>> [ 0.329167] ret_from_fork+0x551/0x720 > >>>> [ 0.329167] ret_from_fork_asm+0x1a/0x30 > >>>> [ 0.329167] </TASK> > >>>> > >>>> CPU: 4 UID: 0 PID: 1 Comm: swapper/0 Not tainted > >>>> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) > >>>> [ 0.434265] Hardware name: Red Hat KVM, BIOS > >>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>> [ 0.434266] Call Trace: > >>>> [ 0.434266] <TASK> > >>>> [ 0.434266] dump_stack_lvl+0x53/0x70 > >>>> [ 0.434268] __pgalloc_tag_add+0x407/0x700 > >>>> [ 0.434272] get_page_from_freelist+0xa54/0x1310 > >>>> [ 0.434274] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>> [ 0.434279] alloc_pages_exact_nid_noprof+0x10f/0x380 > >>>> [ 0.434283] init_section_page_ext+0x167/0x370 > >>>> [ 0.434284] page_ext_init+0x451/0x620 > >>>> [ 0.434287] page_alloc_init_late+0x553/0x630 > >>>> [ 0.434290] kernel_init_freeable+0x7be/0xd30 > >>>> [ 0.434294] kernel_init+0x1f/0x1f0 > >>>> [ 0.434295] ret_from_fork+0x551/0x720 > >>>> [ 0.434301] 
ret_from_fork_asm+0x1a/0x30 > >>>> [ 0.434303] </TASK> > >>>> > >>>> CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted > >>>> 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy) > >>>> [ 0.346712] Hardware name: Red Hat KVM, BIOS > >>>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > >>>> [ 0.346713] Call Trace: > >>>> [ 0.346713] <TASK> > >>>> [ 0.346714] dump_stack_lvl+0x53/0x70 > >>>> [ 0.346715] __pgalloc_tag_add+0x407/0x700 > >>>> [ 0.346720] get_page_from_freelist+0xa54/0x1310 > >>>> [ 0.346723] __alloc_frozen_pages_noprof+0x206/0x4c0 > >>>> [ 0.346729] __alloc_pages_noprof+0x10/0x1b0 > >>>> [ 0.346731] alloc_cpu_data+0x96/0x210 > >>>> [ 0.346732] rb_allocate_cpu_buffer+0xb93/0x1500 > >>>> [ 0.346739] trace_rb_cpu_prepare+0x21a/0x4f0 > >>>> [ 0.346753] cpuhp_invoke_callback+0x6db/0x14b0 > >>>> [ 0.346755] __cpuhp_invoke_callback_range+0xde/0x1d0 > >>>> [ 0.346759] _cpu_up+0x395/0x880 > >>>> [ 0.346761] cpu_up+0x1bb/0x210 > >>>> [ 0.346762] cpuhp_bringup_mask+0xd2/0x150 > >>>> [ 0.346763] bringup_nonboot_cpus+0x12b/0x170 > >>>> [ 0.346764] smp_init+0x2f/0x100 > >>>> [ 0.346766] kernel_init_freeable+0x7a5/0xd30 > >>>> [ 0.346769] kernel_init+0x1f/0x1f0 > >>>> [ 0.346771] ret_from_fork+0x551/0x720 > >>>> [ 0.346776] ret_from_fork_asm+0x1a/0x30 > >>>> [ 0.346778] </TASK> > >>>> > >>>> and so on... > >>>> > >>>> > >>>> In fact, I previously conducted extensive and prolonged stress testing > >>>> > >>>> on memory profiling. After our efforts to address several WARN cases, > >>>> > >>>> one remaining scenario we are addressing is the warning triggered during > >>>> > >>>> early slab cache reclaim — which is precisely the situation we are currently > >>>> > >>>> encountering (although I cannot guarantee that all edge cases have been > >>>> > >>>> covered by our stress testing). During the stress testing process, this > >>>> warning > >>>> > >>>> did indeed manifest. 
However, the current environment triggers KASAN slab > >>>> > >>>> cache reclaim earlier than anticipated. > >>>> > >>>> > >>>> Although the memory allocated prior to page_ext initialization has a > >>>> relatively low probability of > >>>> > >>>> being released in subsequent operations (at least we have not > >>>> encountered such cases up to now), > >>>> > >>>> I remain uncertain whether there are any overlooked edge cases when > >>>> considering only slab-backed pages. > >> Hi Suren > >> > >> > >>> Ok, I guess a specialized solution for slab would not work then. I want > >>> to check on my side and understand how the number of these early > >>> allocations scales. Is it higher for bigger machines or does it stay constant? > >>> If the latter, I think your original simple solution with some fixups > >>> can still work. I'll need to instrument my code to capture these early > >>> allocations and see where they originate. If you have a patch already > >>> doing that it would help speed it up for me. > >>> Thanks, > >>> Suren. > >> OK, my V2 patch is as follows: > > Hi Suren > > > > Thanks! I'll go over it but first I need to check if the number of > > early allocations is constant or dependent on some factors like > > machine size (as I mentioned before). I hope to carve out some time to > > investigate that this Friday. > > We should also probably start a separate thread for this v2 as this > > email thread is getting painfully long. > > OK, right, but I can share the test data from my side. > > With early_page_ext disabled, I tested the following scenarios, and I > will share my data. > > 8C16G: alloc_count = 802 > 8C32G: alloc_count = 790 > 16C32G: alloc_count = 994 > 16C64G: alloc_count = 992 > 32C64G: alloc_count = 1364 > 64C64G: alloc_count = 2226 > 128C64G: alloc_count = 3913 > I think it makes sense for the value to grow with the number of CPUs, Yep, makes sense. Thanks, this is very helpful data!
Now we need to come up with a solution that can scale :) > as this involves memory allocations related to CPU boot, like this: > > [ 0.345299] dump_stack_lvl+0x53/0x70 > [ 0.345301] __pgalloc_tag_add+0x407/0x700 > [ 0.345306] get_page_from_freelist+0xa54/0x1310 > [ 0.345308] __alloc_frozen_pages_noprof+0x206/0x4c0 > [ 0.345314] __alloc_pages_noprof+0x10/0x1b0 > [ 0.345316] alloc_cpu_data+0x96/0x210 > [ 0.345318] rb_allocate_cpu_buffer+0xb93/0x1500 > [ 0.345325] trace_rb_cpu_prepare+0x21a/0x4f0 > [ 0.345327] cpuhp_invoke_callback+0x6db/0x14b0 > [ 0.345329] __cpuhp_invoke_callback_range+0xde/0x1d0 > [ 0.345333] _cpu_up+0x395/0x880 > [ 0.345335] cpu_up+0x1bb/0x210 > [ 0.345336] cpuhp_bringup_mask+0xd2/0x150 > [ 0.345337] bringup_nonboot_cpus+0x12b/0x170 > [ 0.345338] smp_init+0x2f/0x100 > [ 0.345340] kernel_init_freeable+0x7a5/0xd30 > [ 0.345344] kernel_init+0x1f/0x1f0 > > > I will send out version 2 as soon as possible. > > Thanks > > Hao > > > > >> > >> diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h > >> index d40ac39bfbe8..bf226c2be2ad 100644 > >> --- a/include/linux/alloc_tag.h > >> +++ b/include/linux/alloc_tag.h > >> @@ -74,6 +74,8 @@ static inline void set_codetag_empty(union codetag_ref > >> *ref) > >> > >> #ifdef CONFIG_MEM_ALLOC_PROFILING > >> > >> +void alloc_tag_add_early_pfn(unsigned long pfn); > >> + > >> #define ALLOC_TAG_SECTION_NAME "alloc_tags" > >> > >> struct codetag_bytes { > >> diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h > >> index 38a82d65e58e..951d33362268 100644 > >> --- a/include/linux/pgalloc_tag.h > >> +++ b/include/linux/pgalloc_tag.h > >> @@ -181,7 +181,7 @@ static inline struct alloc_tag > >> *__pgalloc_tag_get(struct page *page) > >> > >> if (get_page_tag_ref(page, &ref, &handle)) { > >> alloc_tag_sub_check(&ref); > >> - if (ref.ct) > >> + if (ref.ct && !is_codetag_empty(&ref)) > >> tag = ct_to_alloc_tag(ref.ct); > >> put_page_tag_ref(handle); > >> } > >> diff --git a/lib/alloc_tag.c 
b/lib/alloc_tag.c > >> index 58991ab09d84..55c134a71cd0 100644 > >> --- a/lib/alloc_tag.c > >> +++ b/lib/alloc_tag.c > >> @@ -6,6 +6,7 @@ > >> #include <linux/kallsyms.h> > >> #include <linux/module.h> > >> #include <linux/page_ext.h> > >> +#include <linux/pgalloc_tag.h> > >> #include <linux/proc_fs.h> > >> #include <linux/seq_buf.h> > >> #include <linux/seq_file.h> > >> @@ -26,6 +27,85 @@ static bool mem_profiling_support; > >> > >> static struct codetag_type *alloc_tag_cttype; > >> > >> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG > >> + > >> +/* > >> + * page_ext is allocated and initialized relatively late during boot. > >> + * Some pages are allocated before page_ext becomes available. > >> + * Track these early PFNs and clear their codetag refs later to avoid > >> + * warnings when they are freed. > >> + */ > >> + > >> +#define EARLY_ALLOC_PFN_MAX 256 > >> + > >> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX] __initdata; > >> +static atomic_t early_pfn_count __initdata = ATOMIC_INIT(0); > >> + > >> +static void __init __alloc_tag_add_early_pfn(unsigned long pfn) > >> +{ > >> + int old_idx, new_idx; > >> + > >> + do { > >> + old_idx = atomic_read(&early_pfn_count); > >> + if (old_idx >= EARLY_ALLOC_PFN_MAX) > >> + return; > >> + new_idx = old_idx + 1; > >> + } while (!atomic_try_cmpxchg(&early_pfn_count, &old_idx, new_idx)); > >> + > >> + early_pfns[old_idx] = pfn; > >> +} > >> + > >> +static void (*alloc_tag_add_early_pfn_ptr)(unsigned long pfn) __refdata = > >> + __alloc_tag_add_early_pfn; > >> + > >> +void alloc_tag_add_early_pfn(unsigned long pfn) > >> +{ > >> + if (static_key_enabled(&mem_profiling_compressed)) > >> + return; > >> + > >> + if (alloc_tag_add_early_pfn_ptr) > >> + alloc_tag_add_early_pfn_ptr(pfn); > >> +} > >> + > >> +static void __init clear_early_alloc_pfn_tag_refs(void) > >> +{ > >> + unsigned int i; > >> + > >> + for (i = 0; i < atomic_read(&early_pfn_count); i++) { > >> + unsigned long pfn = early_pfns[i]; > >> + > >> + if 
(pfn_valid(pfn)) { > >> + struct page *page = pfn_to_page(pfn); > >> + union pgtag_ref_handle handle; > >> + union codetag_ref ref; > >> + > >> + if (get_page_tag_ref(page, &ref, &handle)) { > >> + /* > >> + * An early-allocated page could be freed and reallocated > >> + * after its page_ext is initialized but before we > >> clear it. > >> + * In that case, it already has a valid tag set. > >> + * We should not overwrite that valid tag with > >> CODETAG_EMPTY. > >> + */ > >> + if (ref.ct) { > >> + put_page_tag_ref(handle); > >> + continue; > >> + } > >> + > >> + set_codetag_empty(&ref); > >> + update_page_tag_ref(handle, &ref); > >> + put_page_tag_ref(handle); > >> + } > >> + } > >> + > >> + atomic_set(&early_pfn_count, 0); > >> + > >> + alloc_tag_add_early_pfn_ptr = NULL; > >> +} > >> +#else /* !CONFIG_MEM_ALLOC_PROFILING_DEBUG */ > >> +inline void alloc_tag_add_early_pfn(unsigned long pfn) {} > >> +static inline void __init clear_early_alloc_pfn_tag_refs(void) {} > >> +#endif > >> + > >> #ifdef CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU > >> DEFINE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag); > >> EXPORT_SYMBOL(_shared_alloc_tag); > >> @@ -760,6 +840,7 @@ static __init bool need_page_alloc_tagging(void) > >> > >> static __init void init_page_alloc_tagging(void) > >> { > >> + clear_early_alloc_pfn_tag_refs(); > >> } > >> > >> struct page_ext_operations page_alloc_tagging_ops = { > >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >> index 2d4b6f1a554e..5ce5c4ba401f 100644 > >> --- a/mm/page_alloc.c > >> +++ b/mm/page_alloc.c > >> @@ -1293,6 +1293,12 @@ void __pgalloc_tag_add(struct page *page, struct > >> task_struct *task, > >> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr); > >> update_page_tag_ref(handle, &ref); > >> put_page_tag_ref(handle); > >> + } else { > >> + /* > >> + * page_ext is not available yet, record the pfn so we can > >> + * clear the tag ref later when page_ext is initialized. 
> >> + */ > >> + alloc_tag_add_early_pfn(page_to_pfn(page)); > >> } > >> } > >> > >> Although this 256-entry array remains unmodified for now, I will locally > >> record the occurrence counts > >> > >> of these various early memory allocations. Hopefully this will be > >> helpful to you. > >> > >> > >> Thanks > >> > >> Hao > >> > >>>> Thanks > >>>> Hao > >>>> > >>>>>> Thanks, > >>>>>> Suren. > >>>>>> > >>>>>>> However, I'm not entirely certain whether SPARSEMEM can guarantee the > >>>>>>> same behavior. > >>>>>>> > >>>>>>> > >>>>>>>>> I would appreciate your valuable feedback and any better suggestions you > >>>>>>>>> might have. > >>>>>>>> Thanks for pursuing this! I'll help in any way I can. > >>>>>>>> Suren. > >>>>>>> Thank you so much for your patient guidance and assistance. > >>>>>>> > >>>>>>> I truly appreciate your willingness to share your knowledge and insights. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Hao > >>>>>>> > >>>>>>>>> Thanks > >>>>>>>>> > >>>>>>>>> Hao > >>>>>>>>> > >>>>>>>>>> Thanks, > >>>>>>>>>> Suren. > >>>>>>>>>> > >>>>>>>>>>> Thanks > >>>>>>>>>>> > >>>>>>>>>>> Hao > >>>>>>>>>>> > >>>>>>>>>>>>> Thanks. > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> If the slab cache has no free objects, it falls back > >>>>>>>>>>>>>>>>> to the buddy allocator to allocate memory. However, at this point page_ext > >>>>>>>>>>>>>>>>> is not yet fully initialized, so these newly allocated pages have no > >>>>>>>>>>>>>>>>> codetag set. These pages may later be reclaimed by KASAN,which causes > >>>>>>>>>>>>>>>>> the warning to trigger when they are freed because their codetag ref is > >>>>>>>>>>>>>>>>> still empty. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Use a global array to track pages allocated before page_ext is fully > >>>>>>>>>>>>>>>>> initialized, similar to how kmemleak tracks early allocations. > >>>>>>>>>>>>>>>>> When page_ext initialization completes, set their codetag > >>>>>>>>>>>>>>>>> to empty to avoid warnings when they are freed later. 
> >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> ... > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> --- a/include/linux/alloc_tag.h > >>>>>>>>>>>>>>>>> +++ b/include/linux/alloc_tag.h > >>>>>>>>>>>>>>>>> @@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref) > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> #ifdef CONFIG_MEM_ALLOC_PROFILING > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> +bool mem_profiling_is_available(void); > >>>>>>>>>>>>>>>>> +void alloc_tag_add_early_pfn(unsigned long pfn); > >>>>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>>>> #define ALLOC_TAG_SECTION_NAME "alloc_tags" > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> struct codetag_bytes { > >>>>>>>>>>>>>>>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c > >>>>>>>>>>>>>>>>> index 58991ab09d84..a5bf4e72c154 100644 > >>>>>>>>>>>>>>>>> --- a/lib/alloc_tag.c > >>>>>>>>>>>>>>>>> +++ b/lib/alloc_tag.c > >>>>>>>>>>>>>>>>> @@ -6,6 +6,7 @@ > >>>>>>>>>>>>>>>>> #include <linux/kallsyms.h> > >>>>>>>>>>>>>>>>> #include <linux/module.h> > >>>>>>>>>>>>>>>>> #include <linux/page_ext.h> > >>>>>>>>>>>>>>>>> +#include <linux/pgalloc_tag.h> > >>>>>>>>>>>>>>>>> #include <linux/proc_fs.h> > >>>>>>>>>>>>>>>>> #include <linux/seq_buf.h> > >>>>>>>>>>>>>>>>> #include <linux/seq_file.h> > >>>>>>>>>>>>>>>>> @@ -26,6 +27,82 @@ static bool mem_profiling_support; > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> static struct codetag_type *alloc_tag_cttype; > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> +/* > >>>>>>>>>>>>>>>>> + * State of the alloc_tag > >>>>>>>>>>>>>>>>> + * > >>>>>>>>>>>>>>>>> + * This is used to describe the states of the alloc_tag during bootup. > >>>>>>>>>>>>>>>>> + * > >>>>>>>>>>>>>>>>> + * When we need to allocate page_ext to store codetag, we face an > >>>>>>>>>>>>>>>>> + * initialization timing problem: > >>>>>>>>>>>>>>>>> + * > >>>>>>>>>>>>>>>>> + * Due to initialization order, pages may be allocated via buddy system > >>>>>>>>>>>>>>>>> + * before page_ext is fully allocated and initialized. 
Although these > >>>>>>>>>>>>>>>>> + * pages call the allocation hooks, the codetag will not be set because > >>>>>>>>>>>>>>>>> + * page_ext is not yet available. > >>>>>>>>>>>>>>>>> + * > >>>>>>>>>>>>>>>>> + * When these pages are later free to the buddy system, it triggers > >>>>>>>>>>>>>>>>> + * warnings because their codetag is actually empty if > >>>>>>>>>>>>>>>>> + * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled. > >>>>>>>>>>>>>>>>> + * > >>>>>>>>>>>>>>>>> + * Additionally, in this situation, we cannot record detailed allocation > >>>>>>>>>>>>>>>>> + * information for these pages. > >>>>>>>>>>>>>>>>> + */ > >>>>>>>>>>>>>>>>> +enum mem_profiling_state { > >>>>>>>>>>>>>>>>> + DOWN, /* No mem_profiling functionality yet */ > >>>>>>>>>>>>>>>>> + UP /* Everything is working */ > >>>>>>>>>>>>>>>>> +}; > >>>>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>>>> +static enum mem_profiling_state mem_profiling_state = DOWN; > >>>>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>>>> +bool mem_profiling_is_available(void) > >>>>>>>>>>>>>>>>> +{ > >>>>>>>>>>>>>>>>> + return mem_profiling_state == UP; > >>>>>>>>>>>>>>>>> +} > >>>>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>>>> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG > >>>>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>>>> +#define EARLY_ALLOC_PFN_MAX 256 > >>>>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>>>> +static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX]; > >>>>>>>>>>>>>>>> It's unfortunate that this isn't __initdata. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> +static unsigned int early_pfn_count; > >>>>>>>>>>>>>>>>> +static DEFINE_SPINLOCK(early_pfn_lock); > >>>>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> ... 
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> --- a/mm/page_alloc.c
> >>>>>>>>>>>>>>>>> +++ b/mm/page_alloc.c
> >>>>>>>>>>>>>>>>> @@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
> >>>>>>>>>>>>>>>>> alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
> >>>>>>>>>>>>>>>>> update_page_tag_ref(handle, &ref);
> >>>>>>>>>>>>>>>>> put_page_tag_ref(handle);
> >>>>>>>>>>>>>>>>> + } else {
> >>>>>>>>>>>>>>> This branch can be marked as "unlikely".
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> + /*
> >>>>>>>>>>>>>>>>> + * page_ext is not available yet, record the pfn so we can
> >>>>>>>>>>>>>>>>> + * clear the tag ref later when page_ext is initialized.
> >>>>>>>>>>>>>>>>> + */
> >>>>>>>>>>>>>>>>> + if (!mem_profiling_is_available())
> >>>>>>>>>>>>>>>>> + alloc_tag_add_early_pfn(page_to_pfn(page));
> >>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>> All because of this, I believe. Is this fixable?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> If we take that `else', we know we're running in __init code, yes? I
> >>>>>>>>>>>>>>>> don't see how `__init pgalloc_tag_add_early()' could be made to work.
> >>>>>>>>>>>>>>>> hrm. Something clever, please.
> >>>>>>>>>>>>>>> We can have a pointer to a function that is initialized to point to
> >>>>>>>>>>>>>>> alloc_tag_add_early_pfn, which is defined as __init and uses
> >>>>>>>>>>>>>>> early_pfns which now can be defined as __initdata. After
> >>>>>>>>>>>>>>> clear_early_alloc_pfn_tag_refs() is done we reset that pointer to
> >>>>>>>>>>>>>>> NULL. __pgalloc_tag_add() instead of calling alloc_tag_add_early_pfn()
> >>>>>>>>>>>>>>> directly checks that pointer and if it's not NULL then calls the
> >>>>>>>>>>>>>>> function that it points to. This way __pgalloc_tag_add() which is not
> >>>>>>>>>>>>>>> an __init function will be invoking alloc_tag_add_early_pfn() __init
> >>>>>>>>>>>>>>> function only until we are done with initialization. I haven't tried
> >>>>>>>>>>>>>>> this but I think that should work. This also eliminates the need for
> >>>>>>>>>>>>>>> mem_profiling_state variable since we can use this function pointer
> >>>>>>>>>>>>>>> instead.
> >>>>>>>>>>>>>>>
* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization
  2026-03-19 8:31 [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization Hao Ge
  2026-03-19 22:28 ` Andrew Morton
@ 2026-03-20 3:14 ` Andrew Morton
  2026-03-20 4:18 ` Suren Baghdasaryan
  1 sibling, 1 reply; 21+ messages in thread

From: Andrew Morton @ 2026-03-20 3:14 UTC
To: Hao Ge; +Cc: Suren Baghdasaryan, Kent Overstreet, linux-mm, linux-kernel

On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote:

> Due to initialization ordering, page_ext is allocated and initialized
> relatively late during boot. Some pages have already been allocated
> and freed before page_ext becomes available, leaving their codetag
> uninitialized.
>
> A clear example is in init_section_page_ext(): alloc_page_ext() calls
> kmemleak_alloc(). If the slab cache has no free objects, it falls back
> to the buddy allocator to allocate memory. However, at this point page_ext
> is not yet fully initialized, so these newly allocated pages have no
> codetag set. These pages may later be reclaimed by KASAN, which causes
> the warning to trigger when they are freed because their codetag ref is
> still empty.
>
> Use a global array to track pages allocated before page_ext is fully
> initialized, similar to how kmemleak tracks early allocations.
> When page_ext initialization completes, set their codetag
> to empty to avoid warnings when they are freed later.

AI review asks questions:
https://sashiko.dev/#/patchset/20260319083153.2488005-1-hao.ge%40linux.dev
* Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization
  2026-03-20 3:14 ` Andrew Morton
@ 2026-03-20 4:18 ` Suren Baghdasaryan
  0 siblings, 0 replies; 21+ messages in thread

From: Suren Baghdasaryan @ 2026-03-20 4:18 UTC
To: Andrew Morton; +Cc: Hao Ge, Kent Overstreet, linux-mm, linux-kernel

On Thu, Mar 19, 2026 at 8:14 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@linux.dev> wrote:
>
> > Due to initialization ordering, page_ext is allocated and initialized
> > relatively late during boot. Some pages have already been allocated
> > and freed before page_ext becomes available, leaving their codetag
> > uninitialized.
> >
> > A clear example is in init_section_page_ext(): alloc_page_ext() calls
> > kmemleak_alloc(). If the slab cache has no free objects, it falls back
> > to the buddy allocator to allocate memory. However, at this point page_ext
> > is not yet fully initialized, so these newly allocated pages have no
> > codetag set. These pages may later be reclaimed by KASAN, which causes
> > the warning to trigger when they are freed because their codetag ref is
> > still empty.
> >
> > Use a global array to track pages allocated before page_ext is fully
> > initialized, similar to how kmemleak tracks early allocations.
> > When page_ext initialization completes, set their codetag
> > to empty to avoid warnings when they are freed later.
>
> AI review asks questions:
> https://sashiko.dev/#/patchset/20260319083153.2488005-1-hao.ge%40linux.dev

Impressive!
end of thread

Thread overview: 21+ messages
2026-03-19 8:31 [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization Hao Ge
2026-03-19 22:28 ` Andrew Morton
2026-03-19 23:44 ` Suren Baghdasaryan
2026-03-19 23:48 ` Suren Baghdasaryan
2026-03-20 1:57 ` Hao Ge
2026-03-20 2:14 ` Suren Baghdasaryan
2026-03-23 9:15 ` Hao Ge
2026-03-23 22:47 ` Suren Baghdasaryan
2026-03-24 9:43 ` Hao Ge
2026-03-25 0:21 ` Suren Baghdasaryan
2026-03-25 2:07 ` Hao Ge
2026-03-25 6:25 ` Suren Baghdasaryan
2026-03-25 7:35 ` Suren Baghdasaryan
2026-03-25 11:20 ` Hao Ge
2026-03-25 15:17 ` Suren Baghdasaryan
2026-03-26 1:44 ` Hao Ge
2026-03-26 5:04 ` Suren Baghdasaryan
2026-03-26 5:33 ` Hao Ge
2026-03-26 8:23 ` Suren Baghdasaryan
2026-03-20 3:14 ` Andrew Morton
2026-03-20 4:18 ` Suren Baghdasaryan