linux-mm.kvack.org archive mirror
* [PATCH] mm: tag kernel stack pages
@ 2025-08-20 20:20 Vishal Moola (Oracle)
  2025-08-21 12:44 ` David Hildenbrand
  2025-09-03  7:49 ` David Hildenbrand
  0 siblings, 2 replies; 8+ messages in thread
From: Vishal Moola (Oracle) @ 2025-08-20 20:20 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, David Hildenbrand,
	Vishal Moola (Oracle)

Currently, we have no way to distinguish a kernel stack page from an
unidentified page. Being able to track this information can be
beneficial for optimizing kernel memory usage (e.g. analyzing
fragmentation, location etc.). Knowing a page is being used for a kernel
stack gives us more insight about pages that are certainly immovable and
important to kernel functionality.

Add a new pagetype, and tag pages alongside the kernel stack accounting.
Also, ensure the type is dumped to /proc/kpageflags and the page-types
tool can find it.

Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
---
 fs/proc/page.c                         |  3 ++-
 include/linux/page-flags.h             |  5 +++++
 include/uapi/linux/kernel-page-flags.h |  1 +
 kernel/fork.c                          | 19 +++++++++++++++++--
 tools/mm/page-types.c                  |  1 +
 5 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 771e0b6bc630..46be207c5a02 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -201,7 +201,8 @@ u64 stable_page_flags(const struct page *page)
 
 	if (ps.flags & PAGE_SNAPSHOT_PG_BUDDY)
 		u |= 1 << KPF_BUDDY;
-
+	if (folio_test_stack(folio))
+		u |= 1 << KPF_KSTACK;
 	if (folio_test_offline(folio))
 		u |= 1 << KPF_OFFLINE;
 	if (folio_test_pgtable(folio))
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index d53a86e68c89..5ee6ffbdbf83 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -933,6 +933,7 @@ enum pagetype {
 	PGTY_zsmalloc		= 0xf6,
 	PGTY_unaccepted		= 0xf7,
 	PGTY_large_kmalloc	= 0xf8,
+	PGTY_kstack		= 0xf9,
 
 	PGTY_mapcount_underflow = 0xff
 };
@@ -995,6 +996,10 @@ static __always_inline void __ClearPage##uname(struct page *page)	\
 	page->page_type = UINT_MAX;					\
 }
 
+/* PageStack() indicates that a page is used by kernel stacks.
+ */
+PAGE_TYPE_OPS(Stack, kstack, stack)
+
 /*
  * PageBuddy() indicates that the page is free and in the buddy system
  * (see mm/page_alloc.c).
diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
index ff8032227876..56175b497ace 100644
--- a/include/uapi/linux/kernel-page-flags.h
+++ b/include/uapi/linux/kernel-page-flags.h
@@ -36,5 +36,6 @@
 #define KPF_ZERO_PAGE		24
 #define KPF_IDLE		25
 #define KPF_PGTABLE		26
+#define KPF_KSTACK		27
 
 #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 5115be549234..c8a6e1495acf 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -440,15 +440,22 @@ static void account_kernel_stack(struct task_struct *tsk, int account)
 		struct vm_struct *vm_area = task_stack_vm_area(tsk);
 		int i;
 
-		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++)
+		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) {
 			mod_lruvec_page_state(vm_area->pages[i], NR_KERNEL_STACK_KB,
 					      account * (PAGE_SIZE / 1024));
+			__SetPageStack(vm_area->pages[i]);
+		}
 	} else {
 		void *stack = task_stack_page(tsk);
+		struct page *page = virt_to_head_page(stack);
+		int i;
 
 		/* All stack pages are in the same node. */
 		mod_lruvec_kmem_state(stack, NR_KERNEL_STACK_KB,
 				      account * (THREAD_SIZE / 1024));
+
+		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++, page++)
+			__SetPageStack(page);
 	}
 }
 
@@ -461,8 +468,16 @@ void exit_task_stack_account(struct task_struct *tsk)
 		int i;
 
 		vm_area = task_stack_vm_area(tsk);
-		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++)
+		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) {
 			memcg_kmem_uncharge_page(vm_area->pages[i], 0);
+			__ClearPageStack(vm_area->pages[i]);
+		}
+	} else {
+		struct page *page = virt_to_head_page(task_stack_page(tsk));
+		int i;
+
+		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++, page++)
+			__ClearPageStack(page);
 	}
 }
 
diff --git a/tools/mm/page-types.c b/tools/mm/page-types.c
index d7e5e8902af8..4031fdbad3e7 100644
--- a/tools/mm/page-types.c
+++ b/tools/mm/page-types.c
@@ -127,6 +127,7 @@ static const char * const page_flag_names[] = {
 	[KPF_PGTABLE]		= "g:pgtable",
 	[KPF_ZERO_PAGE]		= "z:zero_page",
 	[KPF_IDLE]              = "i:idle_page",
+	[KPF_KSTACK]		= "k:kernel_stack",
 
 	[KPF_RESERVED]		= "r:reserved",
 	[KPF_MLOCKED]		= "m:mlocked",
-- 
2.50.1
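
For readers who want to check the proposed bit from userspace:
/proc/kpageflags exposes one 64-bit flag word per page frame, indexed by
PFN, so looking for tagged pages reduces to a seek and a bit test. The
following is a minimal sketch, not part of the patch. KPF_KSTACK = 27 is
the value proposed above and only exists with this patch applied; the
helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define KPF_KSTACK 27 /* bit proposed by this patch; not in mainline */

/* Byte offset of the 64-bit flag word for a given PFN in /proc/kpageflags. */
static uint64_t kpageflags_offset(uint64_t pfn)
{
	return pfn * sizeof(uint64_t);
}

/* Returns 1 if the flag word has the (proposed) kernel-stack bit set. */
static int kpf_is_kstack(uint64_t flags)
{
	return (int)((flags >> KPF_KSTACK) & 1);
}

/*
 * Read the flag word for one PFN. Returns 0 on success, -1 on error.
 * Opening /proc/kpageflags normally requires root. The offset is cast to
 * long for fseek(), so this sketch assumes a 64-bit long.
 */
static int read_kpageflags(uint64_t pfn, uint64_t *flags)
{
	FILE *f;
	int ok;

	assert(flags != NULL);
	f = fopen("/proc/kpageflags", "rb");
	if (!f)
		return -1;
	ok = fseek(f, (long)kpageflags_offset(pfn), SEEK_SET) == 0 &&
	     fread(flags, sizeof(*flags), 1, f) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

The page-types tool does essentially this walk for every PFN and prints
the symbolic names from its page_flag_names[] table.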




* Re: [PATCH] mm: tag kernel stack pages
  2025-08-20 20:20 [PATCH] mm: tag kernel stack pages Vishal Moola (Oracle)
@ 2025-08-21 12:44 ` David Hildenbrand
  2025-09-02 19:52   ` Matthew Wilcox
  2025-09-03  7:49 ` David Hildenbrand
  1 sibling, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2025-08-21 12:44 UTC (permalink / raw)
  To: Vishal Moola (Oracle), linux-mm
  Cc: linux-kernel, Andrew Morton, Matthew Wilcox

On 20.08.25 22:20, Vishal Moola (Oracle) wrote:
> Currently, we have no way to distinguish a kernel stack page from an
> unidentified page. Being able to track this information can be
> beneficial for optimizing kernel memory usage (i.e. analyzing
> fragmentation, location etc.). Knowing a page is being used for a kernel
> stack gives us more insight about pages that are certainly immovable and
> important to kernel functionality.

It's a very niche use case. Anything that's not clearly a folio or a 
special movable_ops page is certainly immovable. So we can identify 
pretty reliably what's movable and what's not.

Happy to learn how you would want to use that knowledge to reduce 
fragmentation. :)

So this reads a bit hand-wavy.

> 
> Add a new pagetype, and tag pages alongside the kernel stack accounting.
> Also, ensure the type is dumped to /proc/kpageflags and the page-types
> tool can find it.
> 
> Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> ---
>   fs/proc/page.c                         |  3 ++-
>   include/linux/page-flags.h             |  5 +++++
>   include/uapi/linux/kernel-page-flags.h |  1 +
>   kernel/fork.c                          | 19 +++++++++++++++++--
>   tools/mm/page-types.c                  |  1 +
>   5 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/proc/page.c b/fs/proc/page.c
> index 771e0b6bc630..46be207c5a02 100644
> --- a/fs/proc/page.c
> +++ b/fs/proc/page.c
> @@ -201,7 +201,8 @@ u64 stable_page_flags(const struct page *page)
>   
>   	if (ps.flags & PAGE_SNAPSHOT_PG_BUDDY)
>   		u |= 1 << KPF_BUDDY;
> -
> +	if (folio_test_stack(folio))
> +		u |= 1 << KPF_KSTACK;
>   	if (folio_test_offline(folio))
>   		u |= 1 << KPF_OFFLINE;
>   	if (folio_test_pgtable(folio))
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index d53a86e68c89..5ee6ffbdbf83 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -933,6 +933,7 @@ enum pagetype {
>   	PGTY_zsmalloc		= 0xf6,
>   	PGTY_unaccepted		= 0xf7,
>   	PGTY_large_kmalloc	= 0xf8,
> +	PGTY_kstack		= 0xf9,
>   
>   	PGTY_mapcount_underflow = 0xff
>   };
> @@ -995,6 +996,10 @@ static __always_inline void __ClearPage##uname(struct page *page)	\
>   	page->page_type = UINT_MAX;					\
>   }
>   
> +/* PageStack() indicates that a page is used by kernel stacks.
> + */
> +PAGE_TYPE_OPS(Stack, kstack, stack)
> +
>   /*
>    * PageBuddy() indicates that the page is free and in the buddy system
>    * (see mm/page_alloc.c).
> diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
> index ff8032227876..56175b497ace 100644
> --- a/include/uapi/linux/kernel-page-flags.h
> +++ b/include/uapi/linux/kernel-page-flags.h
> @@ -36,5 +36,6 @@
>   #define KPF_ZERO_PAGE		24
>   #define KPF_IDLE		25
>   #define KPF_PGTABLE		26
> +#define KPF_KSTACK		27
>   
>   #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 5115be549234..c8a6e1495acf 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -440,15 +440,22 @@ static void account_kernel_stack(struct task_struct *tsk, int account)
>   		struct vm_struct *vm_area = task_stack_vm_area(tsk);
>   		int i;
>   
> -		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++)
> +		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) {
>   			mod_lruvec_page_state(vm_area->pages[i], NR_KERNEL_STACK_KB,
>   					      account * (PAGE_SIZE / 1024));
> +			__SetPageStack(vm_area->pages[i]);
> +		}
>   	} else {
>   		void *stack = task_stack_page(tsk);
> +		struct page *page = virt_to_head_page(stack);
> +		int i;
>   
>   		/* All stack pages are in the same node. */
>   		mod_lruvec_kmem_state(stack, NR_KERNEL_STACK_KB,
>   				      account * (THREAD_SIZE / 1024));
> +
> +		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++, page++)
> +			__SetPageStack(page);
>   	}
>   }
>   
> @@ -461,8 +468,16 @@ void exit_task_stack_account(struct task_struct *tsk)
>   		int i;
>   
>   		vm_area = task_stack_vm_area(tsk);
> -		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++)
> +		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) {
>   			memcg_kmem_uncharge_page(vm_area->pages[i], 0);
> +			__ClearPageStack(vm_area->pages[i]);
> +		}
> +	} else {
> +		struct page *page = virt_to_head_page(task_stack_page(tsk));
> +		int i;
> +
> +		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++, page++)
> +			__ClearPageStack(page);
>   	}
>   }

Note that exit_task_stack_account() calls
account_kernel_stack(tsk, -1), where you would do a nonsensical
__SetPageStack() first.

... so this would better be done in account_kernel_stack() based on the 
"int account" flag.

But I wonder whether this should go to the actual place where we
alloc/free.

Now that it's no longer required to clear page types when freeing, 
alloc_thread_stack_node() might be a better place to set it, and to 
leave it set until freed.
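
The first suggestion above (keying the tagging on the sign of "account")
can be sketched with a small userspace model. This is only an
illustration under assumptions: struct model_page and the model_*()
helpers are hypothetical stand-ins, and storing the type in the top byte
of page_type is an assumed encoding; only PGTY_kstack = 0xf9 and the
UINT_MAX "no type" value come from the patch.

```c
#include <assert.h>
#include <limits.h>

#define PGTY_KSTACK 0xf9u /* value proposed by the patch */

/* Simplified stand-in for struct page: only the page_type word. */
struct model_page {
	unsigned int page_type; /* UINT_MAX means "no type" */
};

static void model_set_stack(struct model_page *page)
{
	/* Setting a type on an already-typed page would be a bug. */
	assert(page->page_type == UINT_MAX);
	page->page_type = PGTY_KSTACK << 24; /* assumed encoding */
}

static void model_clear_stack(struct model_page *page)
{
	/* Only a page currently typed as a stack may be cleared. */
	assert(page->page_type >> 24 == PGTY_KSTACK);
	page->page_type = UINT_MAX;
}

static int model_is_stack(const struct model_page *page)
{
	return page->page_type >> 24 == PGTY_KSTACK;
}

/*
 * Tag on charge (account > 0) and untag on uncharge (account < 0), so the
 * uncharge path never sets the type only to clear it again.
 */
static void model_account_stack(struct model_page *pages, int nr, int account)
{
	for (int i = 0; i < nr; i++) {
		if (account > 0)
			model_set_stack(&pages[i]);
		else if (account < 0)
			model_clear_stack(&pages[i]);
	}
}
```

With this shape, the extra loops in exit_task_stack_account() would not
be needed, since the -1 call already clears the type.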

I'll leave it to Willy whether we actually want this type; I cannot spot it
under [1], but if we have sufficient types available, why not.

BUT

staring at [1], we allocate from vmalloc, so I would assume that these 
will be vmalloc-typed pages in the future and we cannot change the type 
later.


[1] https://kernelnewbies.org/MatthewWilcox/Memdescs

-- 
Cheers

David / dhildenb



* Re: [PATCH] mm: tag kernel stack pages
  2025-08-21 12:44 ` David Hildenbrand
@ 2025-09-02 19:52   ` Matthew Wilcox
  2025-09-04 10:31     ` David Hildenbrand
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Wilcox @ 2025-09-02 19:52 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Vishal Moola (Oracle), linux-mm, linux-kernel, Andrew Morton

On Thu, Aug 21, 2025 at 02:44:31PM +0200, David Hildenbrand wrote:
> On 20.08.25 22:20, Vishal Moola (Oracle) wrote:
> > Currently, we have no way to distinguish a kernel stack page from an
> > unidentified page. Being able to track this information can be
> > beneficial for optimizing kernel memory usage (i.e. analyzing
> > fragmentation, location etc.). Knowing a page is being used for a kernel
> > stack gives us more insight about pages that are certainly immovable and
> > important to kernel functionality.
> 
> It's a very niche use case. Anything that's not clearly a folio or a special
> movable_ops page is certainly immovable. So we can identify pretty reliably
> what's movable and what's not.
> 
> Happy to learn how you would want to use that knowledge to reduce
> fragmentation. :)
> 
> So this reads a bit hand-wavy.

I have a theory that we should always be attempting to do aligned
allocations if we can, falling back to individual allocations if
we can't.  This is an attempt to gather some data to inform us whether
that theory is true, and to help us measure whether any effort we
take to improve that situation is effective.

Eyeballing the output of tools/mm/page-types certainly lends
some credence to this.  On x86-64 with its 16KiB stacks and 4KiB
page size, we often see four consecutive pages allocated as type
KernelStack, and as you'd expect only about 25% of the time are they
aligned to a 16KiB boundary.  That is, at least 75% of the time they
prevent _two_ order-2 pages from being available.
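
The arithmetic behind the 25%/75% figures can be checked with a short
sketch. It assumes the start of a four-page run is equally likely to fall
at any offset within an aligned order-2 block, which is an assumption for
illustration rather than measured data; the function names are
hypothetical.

```c
#include <assert.h>

/*
 * Number of naturally aligned 2^order-page blocks overlapped by a run of
 * nr_pages physically contiguous pages starting at start_pfn.
 */
static unsigned long blocks_touched(unsigned long start_pfn,
				    unsigned long nr_pages,
				    unsigned int order)
{
	unsigned long first, last;

	assert(nr_pages > 0);
	first = start_pfn >> order;
	last = (start_pfn + nr_pages - 1) >> order;
	return last - first + 1;
}

/*
 * Of the 2^order possible start offsets within an aligned block, count
 * how many make a 2^order-page run straddle more than one block.
 */
static unsigned int misaligned_offsets(unsigned int order)
{
	unsigned long nr = 1UL << order;
	unsigned int count = 0;

	for (unsigned long off = 0; off < nr; off++)
		if (blocks_touched(off, nr, order) > 1)
			count++;
	return count;
}
```

For order 2 (16KiB stacks on 4KiB pages), misaligned_offsets(2) counts 3
of the 4 possible offsets, i.e. 75% of runs occupy two order-2 blocks
instead of one.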

As you say, they're not movable.  I'm not sure if it makes sense to
go to the effort of making them movable; it'd require interacting
with the scheduler (to prevent the task we're relocating from
being scheduled), and I don't think the realtime people would be
terribly keen on that idea.  So that isn't one of the ideas we
have on the table for improving matters.

Ideas we have been batting around:

 - Have kernel stacks try to do an order-N allocation and vmap()
   the result, fall back to current implementation
 - Have vmalloc try to do an order-N allocation, fall back down the
   orders on failure to allocate
 - Change the alloc_bulk implementation to do the order-N allocation
   and fall back

I'm sure other possibilities also exist.
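
The "try order-N, fall back down the orders" idea from the list above can
be sketched as a userspace model. Nothing here is existing kernel API:
try_alloc_fn, alloc_fallback() and demo_max_order_available() are
hypothetical names, and the allocator is reduced to a callback that says
whether a block of a given order is available.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook: nonzero if a block of 2^order pages is available. */
typedef int (*try_alloc_fn)(unsigned int order, void *ctx);

/*
 * Cover nr_pages pages with the largest blocks available. Start at
 * max_order, never use a block larger than what is still needed, and step
 * down one order whenever the hook refuses. The order of each block is
 * recorded in out[]. Returns the number of blocks, or -1 if an order-0
 * request fails or out[] is too small (blocks already recorded are not
 * released in this model).
 */
static int alloc_fallback(unsigned long nr_pages, unsigned int max_order,
			  try_alloc_fn try_alloc, void *ctx,
			  unsigned int *out, size_t out_len)
{
	size_t n = 0;
	unsigned int order = max_order;

	assert(try_alloc != NULL && out != NULL);
	while (nr_pages > 0) {
		/* Shrink the order until the block fits the remainder. */
		while (order > 0 && (1UL << order) > nr_pages)
			order--;
		if (!try_alloc(order, ctx)) {
			if (order == 0)
				return -1;
			order--; /* fall back to the next smaller order */
			continue;
		}
		if (n == out_len)
			return -1;
		out[n++] = order;
		nr_pages -= 1UL << order;
	}
	return (int)n;
}

/* Demo hook simulating fragmentation: only orders <= *ctx succeed. */
static int demo_max_order_available(unsigned int order, void *ctx)
{
	return order <= *(unsigned int *)ctx;
}
```

For a four-page stack with max_order = 2, the model yields one order-2
block when order-2 is available, two order-1 blocks when only order-1 is,
and four order-0 pages otherwise, the last case matching the current
page-at-a-time behaviour described above.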

> staring at [1], we allocate from vmalloc, so I would assume that these will
> be vmalloc-typed pages in the future and we cannot change the type later.
> 
> [1] https://kernelnewbies.org/MatthewWilcox/Memdescs

I see the vmalloc subtype as being a "we don't know any better" type.
We could allocate another subtype of type 0 to mean "kernel stacks"
and have it be implicit that kernel stacks are allocated from vmalloc.
This would probably require that we have a vmalloc interface that lets us
specify a subtype, which I think is probably something we'd want anyway.

I think it's fine to say "This doesn't add enough value to merge it
upstream".  I will note one minor advantage which is that typing these
pages as PGTY_kstack today prevents them from being inadvertently mapped
to userspace (whether by malicious code or innocent bug).



* Re: [PATCH] mm: tag kernel stack pages
  2025-08-20 20:20 [PATCH] mm: tag kernel stack pages Vishal Moola (Oracle)
  2025-08-21 12:44 ` David Hildenbrand
@ 2025-09-03  7:49 ` David Hildenbrand
  2025-09-03 18:19   ` Vishal Moola (Oracle)
  1 sibling, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2025-09-03  7:49 UTC (permalink / raw)
  To: Vishal Moola (Oracle), linux-mm; +Cc: linux-kernel, Andrew Morton

[resending my original mail because it might have landed in the spam folder]

On 20.08.25 22:20, Vishal Moola (Oracle) wrote:
> Currently, we have no way to distinguish a kernel stack page from an
> unidentified page. Being able to track this information can be
> beneficial for optimizing kernel memory usage (i.e. analyzing
> fragmentation, location etc.). Knowing a page is being used for a kernel
> stack gives us more insight about pages that are certainly immovable and
> important to kernel functionality.

It's a very niche use case. Anything that's not clearly a folio or a
special movable_ops page is certainly immovable. So we can identify
pretty reliably what's movable and what's not.

Happy to learn how you would want to use that knowledge to reduce
fragmentation. 🙂

So this reads a bit hand-wavy.

> 
> Add a new pagetype, and tag pages alongside the kernel stack accounting.
> Also, ensure the type is dumped to /proc/kpageflags and the page-types
> tool can find it.
> 
> Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> ---
>   fs/proc/page.c                         |  3 ++-
>   include/linux/page-flags.h             |  5 +++++
>   include/uapi/linux/kernel-page-flags.h |  1 +
>   kernel/fork.c                          | 19 +++++++++++++++++--
>   tools/mm/page-types.c                  |  1 +
>   5 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/proc/page.c b/fs/proc/page.c
> index 771e0b6bc630..46be207c5a02 100644
> --- a/fs/proc/page.c
> +++ b/fs/proc/page.c
> @@ -201,7 +201,8 @@ u64 stable_page_flags(const struct page *page)
>   
>   	if (ps.flags & PAGE_SNAPSHOT_PG_BUDDY)
>   		u |= 1 << KPF_BUDDY;
> -
> +	if (folio_test_stack(folio))
> +		u |= 1 << KPF_KSTACK;
>   	if (folio_test_offline(folio))
>   		u |= 1 << KPF_OFFLINE;
>   	if (folio_test_pgtable(folio))
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index d53a86e68c89..5ee6ffbdbf83 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -933,6 +933,7 @@ enum pagetype {
>   	PGTY_zsmalloc		= 0xf6,
>   	PGTY_unaccepted		= 0xf7,
>   	PGTY_large_kmalloc	= 0xf8,
> +	PGTY_kstack		= 0xf9,
>   
>   	PGTY_mapcount_underflow = 0xff
>   };
> @@ -995,6 +996,10 @@ static __always_inline void __ClearPage##uname(struct page *page)	\
>   	page->page_type = UINT_MAX;					\
>   }
>   
> +/* PageStack() indicates that a page is used by kernel stacks.
> + */
> +PAGE_TYPE_OPS(Stack, kstack, stack)
> +
>   /*
>    * PageBuddy() indicates that the page is free and in the buddy system
>    * (see mm/page_alloc.c).
> diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
> index ff8032227876..56175b497ace 100644
> --- a/include/uapi/linux/kernel-page-flags.h
> +++ b/include/uapi/linux/kernel-page-flags.h
> @@ -36,5 +36,6 @@
>   #define KPF_ZERO_PAGE		24
>   #define KPF_IDLE		25
>   #define KPF_PGTABLE		26
> +#define KPF_KSTACK		27
>   
>   #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 5115be549234..c8a6e1495acf 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -440,15 +440,22 @@ static void account_kernel_stack(struct task_struct *tsk, int account)
>   		struct vm_struct *vm_area = task_stack_vm_area(tsk);
>   		int i;
>   
> -		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++)
> +		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) {
>   			mod_lruvec_page_state(vm_area->pages[i], NR_KERNEL_STACK_KB,
>   					      account * (PAGE_SIZE / 1024));
> +			__SetPageStack(vm_area->pages[i]);
> +		}
>   	} else {
>   		void *stack = task_stack_page(tsk);
> +		struct page *page = virt_to_head_page(stack);
> +		int i;
>   
>   		/* All stack pages are in the same node. */
>   		mod_lruvec_kmem_state(stack, NR_KERNEL_STACK_KB,
>   				      account * (THREAD_SIZE / 1024));
> +
> +		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++, page++)
> +			__SetPageStack(page);
>   	}
>   }
>   
> @@ -461,8 +468,16 @@ void exit_task_stack_account(struct task_struct *tsk)
>   		int i;
>   
>   		vm_area = task_stack_vm_area(tsk);
> -		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++)
> +		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) {
>   			memcg_kmem_uncharge_page(vm_area->pages[i], 0);
> +			__ClearPageStack(vm_area->pages[i]);
> +		}
> +	} else {
> +		struct page *page = virt_to_head_page(task_stack_page(tsk));
> +		int i;
> +
> +		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++, page++)
> +			__ClearPageStack(page);
>   	}

Note that exit_task_stack_account() calls
account_kernel_stack(tsk, -1), where you would do a nonsensical
__SetPageStack() first.

... so this would better be done in account_kernel_stack() based on the
"int account" flag.

But I wonder whether this should go to the actual place where we
alloc/free.

Now that it's no longer required to clear page types when freeing,
alloc_thread_stack_node() might be a better place to set it, and to
leave it set until freed.

I'll leave it to Willy whether we actually want this type; I cannot spot it
under [1], but if we have sufficient types available, why not.

BUT

staring at [1], we allocate from vmalloc, so I would assume that these
will be vmalloc-typed pages in the future and we cannot change the type
later.


[1] https://kernelnewbies.org/MatthewWilcox/Memdescs


-- 
Cheers

David / dhildenb




* Re: [PATCH] mm: tag kernel stack pages
  2025-09-03  7:49 ` David Hildenbrand
@ 2025-09-03 18:19   ` Vishal Moola (Oracle)
  2025-09-04 10:23     ` David Hildenbrand
  0 siblings, 1 reply; 8+ messages in thread
From: Vishal Moola (Oracle) @ 2025-09-03 18:19 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: linux-mm, linux-kernel, Andrew Morton

On Wed, Sep 03, 2025 at 09:49:06AM +0200, David Hildenbrand wrote:
> [resending my original mail because it might have landed in the spam folder]

Ah, indeed the original mail was found in my spam folder. Thanks for
resending.

> On 20.08.25 22:20, Vishal Moola (Oracle) wrote:
> > Currently, we have no way to distinguish a kernel stack page from an
> > unidentified page. Being able to track this information can be
> > beneficial for optimizing kernel memory usage (i.e. analyzing
> > fragmentation, location etc.). Knowing a page is being used for a kernel
> > stack gives us more insight about pages that are certainly immovable and
> > important to kernel functionality.
> 
> It's a very niche use case. Anything that's not clearly a folio or a
> special movable_ops page is certainly immovable. So we can identify
> pretty reliably what's movable and what's not.
> 
> Happy to learn how you would want to use that knowledge to reduce
> fragmentation. 🙂
> 
> So this reads a bit hand-wavy.

My thoughts align with Matthew's response. If we decide "This doesn't add
enough value to merge it upstream", that's fine by me.

Otherwise if we think this is useful, I can respin this with your
suggestion below.

> But I wonder whether this should go to the actual place where we
> alloc/free.
> 
> Now that it's no longer required to clear page types when freeing,
> alloc_thread_stack_node() might be a better place to set it, and to
> leave it set until freed.

I think this would be a better place to implement it as well.

> I'll leave it to Willy whether we actually want this type; I cannot spot it
> under [1], but if we have sufficient types available, why not.
> 
> BUT
> 
> staring at [1], we allocate from vmalloc, so I would assume that these
> will be vmalloc-typed pages in the future and we cannot change the type
> later.
> 
> 
> [1] https://kernelnewbies.org/MatthewWilcox/Memdescs
> 
> 
> -- 
> Cheers
> 
> David / dhildenb
> 



* Re: [PATCH] mm: tag kernel stack pages
  2025-09-03 18:19   ` Vishal Moola (Oracle)
@ 2025-09-04 10:23     ` David Hildenbrand
  2025-09-05 17:47       ` Vishal Moola (Oracle)
  0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2025-09-04 10:23 UTC (permalink / raw)
  To: Vishal Moola (Oracle); +Cc: linux-mm, linux-kernel, Andrew Morton

On 03.09.25 20:19, Vishal Moola (Oracle) wrote:
> On Wed, Sep 03, 2025 at 09:49:06AM +0200, David Hildenbrand wrote:
>> [resending my original mail because it might have landed in the spam folder]
> 
> Ah, indeed the original mail was found in my spam folder. Thanks for
> resending.
> 
>> On 20.08.25 22:20, Vishal Moola (Oracle) wrote:
>>> Currently, we have no way to distinguish a kernel stack page from an
>>> unidentified page. Being able to track this information can be
>>> beneficial for optimizing kernel memory usage (i.e. analyzing
>>> fragmentation, location etc.). Knowing a page is being used for a kernel
>>> stack gives us more insight about pages that are certainly immovable and
>>> important to kernel functionality.
>>
>> It's a very niche use case. Anything that's not clearly a folio or a
>> special movable_ops page is certainly immovable. So we can identify
>> pretty reliably what's movable and what's not.
>>
>> Happy to learn how you would want to use that knowledge to reduce
>> fragmentation. 🙂
>>
>> So this reads a bit hand-wavy.
> 
> My thoughts align with Matthew's response. If we decide "This doesn't add
> enough value to merge it upstream", that's fine by me.
> 
> Otherwise if we think this is useful, I can respin this with your
> suggestion below.

As raised in my other mail, I assume there is no way to just have any 
stack pages in any kernel config marked appropriately (slab allocation 
discussion)?

If so, I prefer to not add it.

If there is a way to just make it consistent, then no strong opinion 
from my side. Willy is the page-type guard :)

BTW, I was wondering if page-owner could be useful instead.

-- 
Cheers

David / dhildenb




* Re: [PATCH] mm: tag kernel stack pages
  2025-09-02 19:52   ` Matthew Wilcox
@ 2025-09-04 10:31     ` David Hildenbrand
  0 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2025-09-04 10:31 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Vishal Moola (Oracle), linux-mm, linux-kernel, Andrew Morton

On 02.09.25 21:52, Matthew Wilcox wrote:
> On Thu, Aug 21, 2025 at 02:44:31PM +0200, David Hildenbrand wrote:
>> On 20.08.25 22:20, Vishal Moola (Oracle) wrote:
>>> Currently, we have no way to distinguish a kernel stack page from an
>>> unidentified page. Being able to track this information can be
>>> beneficial for optimizing kernel memory usage (i.e. analyzing
>>> fragmentation, location etc.). Knowing a page is being used for a kernel
>>> stack gives us more insight about pages that are certainly immovable and
>>> important to kernel functionality.
>>
>> It's a very niche use case. Anything that's not clearly a folio or a special
>> movable_ops page is certainly immovable. So we can identify pretty reliably
>> what's movable and what's not.
>>
>> Happy to learn how you would want to use that knowledge to reduce
>> fragmentation. :)
>>
>> So this reads a bit hand-wavy.
> 
> I have a theory that we should always be attempting to do aligned
> allocations if we can, falling back to individual allocations if
> we can't.  This is an attempt to gather some data to inform us whether
> that theory is true, and to help us measure whether any effort we
> take to improve that situation is effective.
> 
> Eyeballing the output of tools/mm/page-types certainly lends
> some credence to this.  On x86-64 with its 16KiB stacks and 4KiB
> page size, we often see four consecutive pages allocated as type
> KernelStack, and as you'd expect only about 25% of the time are they
> aligned to a 16KiB boundary.  That is, at least 75% of the time they
> prevent _two_ order-2 pages from being available.

I assume, ideally, you'd also know whether all these stack pages belong 
to the same thread, not various ones, right? ("context" can matter as well)

> 
> As you say, they're not movable.  I'm not sure if it makes sense to
> go to the effort of making them movable; it'd require interacting
> with the scheduler (to prevent the task we're relocating from
> being scheduled), and I don't think the realtime people would be
> terribly keen on that idea.  So that isn't one of the ideas we
> have on the table for improving matters.

Yeah, while possible I am also not sure if we always want that.

> 
> Ideas we have been batting around:
> 
>   - Have kernel stacks try to do an order-N allocation and vmap()
>     the result, fall back to current implementation
>   - Have vmalloc try to do an order-N allocation, fall back down the
>     orders on failure to allocate
>   - Change the alloc_bulk implementation to do the order-N allocation
>     and fall back
> 
> I'm sure other possibilities also exist.
> 
>> staring at [1], we allocate from vmalloc, so I would assume that these will
>> be vmalloc-typed pages in the future and we cannot change the type later.
>>
>> [1] https://kernelnewbies.org/MatthewWilcox/Memdescs
> 
> I see the vmalloc subtype as being a "we don't know any better" type.

I guess this could get nasty once we would have metadata assigned to the 
vmalloc allocations (struct vmdesc).

> We could allocate another subtype of type 0 to mean "kernel stacks"
> and have it be implicit that kernel stacks are allocated from vmalloc.

Yes, that would work.

> This would probably require that we have a vmalloc interface that lets us
> specify a subtype, which I think is probably something we'd want anyway.

vmalloc subtypes don't sound like a bad idea.

> 
> I think it's fine to say "This doesn't add enough value to merge it
> upstream".  I will note one minor advantage which is that typing these
> pages as PGTY_kstack today prevents them from being inadvertently mapped
> to userspace (whether by malicious code or innocent bug).

Yes, as raised elsewhere, if we can do this consistently today (stack -> 
PGTY_kstack), fine with me.

-- 
Cheers

David / dhildenb




* Re: [PATCH] mm: tag kernel stack pages
  2025-09-04 10:23     ` David Hildenbrand
@ 2025-09-05 17:47       ` Vishal Moola (Oracle)
  0 siblings, 0 replies; 8+ messages in thread
From: Vishal Moola (Oracle) @ 2025-09-05 17:47 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: linux-mm, linux-kernel, Andrew Morton

On Thu, Sep 04, 2025 at 12:23:31PM +0200, David Hildenbrand wrote:
> On 03.09.25 20:19, Vishal Moola (Oracle) wrote:
> > On Wed, Sep 03, 2025 at 09:49:06AM +0200, David Hildenbrand wrote:
> > > [resending my original mail because it might have landed in the spam folder]
> > 
> > Ah, indeed the original mail was found in my spam folder. Thanks for
> > resending.
> > 
> > > On 20.08.25 22:20, Vishal Moola (Oracle) wrote:
> > > > Currently, we have no way to distinguish a kernel stack page from an
> > > > unidentified page. Being able to track this information can be
> > > > beneficial for optimizing kernel memory usage (i.e. analyzing
> > > > fragmentation, location etc.). Knowing a page is being used for a kernel
> > > > stack gives us more insight about pages that are certainly immovable and
> > > > important to kernel functionality.
> > > 
> > > It's a very niche use case. Anything that's not clearly a folio or a
> > > special movable_ops page is certainly immovable. So we can identify
> > > pretty reliably what's movable and what's not.
> > > 
> > > Happy to learn how you would want to use that knowledge to reduce
> > > fragmentation. 🙂
> > > 
> > > So this reads a bit hand-wavy.
> > 
> > My thoughts align with Matthew's response. If we decide "This doesn't add
> > enough value to merge it upstream", that's fine by me.
> > 
> > Otherwise if we think this is useful, I can respin this with your
> > suggestion below.
> 
> As raised in my other mail, I assume there is no way to just have any stack
> pages in any kernel config marked appropriately (slab allocation
> discussion)?
> 
> If so, I prefer to not add it.

I agree, this shouldn't be tied to specific kernel configs. We can leave this
out of tree.

I didn't know a page could only have one type, and trying to handle that
doesn't help explore what we're interested in right now anyway.

> If there is a way to just make it consistent, then no strong opinion from my
> side. Willy is the page-type guard :)
> 
> BTW, I was wondering if page-owner could be useful instead.

Thanks for the suggestion, page-owner looks useful for playing around with
different kernel stack allocation methods :)

> -- 
> Cheers
> 
> David / dhildenb
> 



