* [PATCH v2 0/2] mm: introduce per-order mTHP split counters
From: Lance Yang @ 2024-06-28 13:07 UTC
To: akpm
Cc: dj456119, 21cnbao, ryan.roberts, david, shy828301, ziy, libang.li,
baolin.wang, linux-kernel, linux-mm, Lance Yang
Hi all,
Currently, the split counters in THP statistics do not include
PTE-mapped mTHP. Therefore, we propose introducing per-order mTHP split
counters to monitor the frequency of mTHP splits. This will help developers
better analyze and optimize system performance.
/sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
split
split_failed
split_deferred
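These can be read directly from sysfs; for example (the 64K size and the
value shown here are purely illustrative):

    $ cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/split
    0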
---
Changes since v1 [1]
====================
- mm: add per-order mTHP split counters
- Update the changelog
- Drop '_page' from mTHP split counter names (per David and Ryan)
- Store the order of the folio in a variable and reuse it later
(per Bang)
- mm: add docs for per-order mTHP split counters
- Improve the doc suggested by Ryan
[1] https://lore.kernel.org/linux-mm/20240424135148.30422-1-ioworker0@gmail.com
Lance Yang (2):
mm: add per-order mTHP split counters
mm: add docs for per-order mTHP split counters
Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
include/linux/huge_mm.h | 3 +++
mm/huge_memory.c | 19 ++++++++++++++-----
3 files changed, 33 insertions(+), 5 deletions(-)
--
2.45.2
* [PATCH v2 1/2] mm: add per-order mTHP split counters
From: Lance Yang @ 2024-06-28 13:07 UTC
To: akpm
Cc: dj456119, 21cnbao, ryan.roberts, david, shy828301, ziy, libang.li,
baolin.wang, linux-kernel, linux-mm, Lance Yang, Mingzhe Yang
Currently, the split counters in THP statistics do not include
PTE-mapped mTHP. Therefore, we propose introducing per-order mTHP split
counters to monitor the frequency of mTHP splits. This will help developers
better analyze and optimize system performance.
/sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
split
split_failed
split_deferred
Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
include/linux/huge_mm.h | 3 +++
mm/huge_memory.c | 19 ++++++++++++++-----
2 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 212cca384d7e..cee3c5da8f0e 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -284,6 +284,9 @@ enum mthp_stat_item {
MTHP_STAT_FILE_ALLOC,
MTHP_STAT_FILE_FALLBACK,
MTHP_STAT_FILE_FALLBACK_CHARGE,
+ MTHP_STAT_SPLIT,
+ MTHP_STAT_SPLIT_FAILED,
+ MTHP_STAT_SPLIT_DEFERRED,
__MTHP_STAT_COUNT
};
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c7ce28f6b7f3..a633206375af 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -559,6 +559,9 @@ DEFINE_MTHP_STAT_ATTR(swpout_fallback, MTHP_STAT_SWPOUT_FALLBACK);
DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
+DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT);
+DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
+DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
static struct attribute *stats_attrs[] = {
&anon_fault_alloc_attr.attr,
@@ -569,6 +572,9 @@ static struct attribute *stats_attrs[] = {
&file_alloc_attr.attr,
&file_fallback_attr.attr,
&file_fallback_charge_attr.attr,
+ &split_attr.attr,
+ &split_failed_attr.attr,
+ &split_deferred_attr.attr,
NULL,
};
@@ -3068,7 +3074,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
struct anon_vma *anon_vma = NULL;
struct address_space *mapping = NULL;
- bool is_thp = folio_test_pmd_mappable(folio);
+ int order = folio_order(folio);
int extra_pins, ret;
pgoff_t end;
bool is_hzp;
@@ -3076,7 +3082,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
- if (new_order >= folio_order(folio))
+ if (new_order >= order)
return -EINVAL;
if (folio_test_anon(folio)) {
@@ -3253,8 +3259,9 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
i_mmap_unlock_read(mapping);
out:
xas_destroy(&xas);
- if (is_thp)
+ if (order >= HPAGE_PMD_ORDER)
count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
+ count_mthp_stat(order, !ret ? MTHP_STAT_SPLIT : MTHP_STAT_SPLIT_FAILED);
return ret;
}
@@ -3278,13 +3285,14 @@ void deferred_split_folio(struct folio *folio)
#ifdef CONFIG_MEMCG
struct mem_cgroup *memcg = folio_memcg(folio);
#endif
+ int order = folio_order(folio);
unsigned long flags;
/*
* Order 1 folios have no space for a deferred list, but we also
* won't waste much memory by not adding them to the deferred list.
*/
- if (folio_order(folio) <= 1)
+ if (order <= 1)
return;
/*
@@ -3305,8 +3313,9 @@ void deferred_split_folio(struct folio *folio)
spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
if (list_empty(&folio->_deferred_list)) {
- if (folio_test_pmd_mappable(folio))
+ if (order >= HPAGE_PMD_ORDER)
count_vm_event(THP_DEFERRED_SPLIT_PAGE);
+ count_mthp_stat(order, MTHP_STAT_SPLIT_DEFERRED);
list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
ds_queue->split_queue_len++;
#ifdef CONFIG_MEMCG
--
2.45.2
* [PATCH v2 2/2] mm: add docs for per-order mTHP split counters
From: Lance Yang @ 2024-06-28 13:07 UTC
To: akpm
Cc: dj456119, 21cnbao, ryan.roberts, david, shy828301, ziy, libang.li,
baolin.wang, linux-kernel, linux-mm, Lance Yang, Mingzhe Yang
This commit introduces documentation for mTHP split counters in
transhuge.rst.
Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 1f72b00af5d3..709fe10b60f4 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -514,6 +514,22 @@ file_fallback_charge
falls back to using small pages even though the allocation was
successful.
+split
+ is incremented every time a huge page is successfully split into
+ base pages. This can happen for a variety of reasons but a common
+ reason is that a huge page is old and is being reclaimed.
+ This action implies splitting any block mappings into PTEs.
+
+split_failed
+ is incremented if kernel fails to split huge
+ page. This can happen if the page was pinned by somebody.
+
+split_deferred
+ is incremented when a huge page is put onto split
+ queue. This happens when a huge page is partially unmapped and
+ splitting it would free up some memory. Pages on split queue are
+ going to be split under memory pressure.
+
As the system ages, allocating huge pages may be expensive as the
system uses memory compaction to copy data around memory to free a
huge page for use. There are some counters in ``/proc/vmstat`` to help
--
2.45.2
* Re: [PATCH v2 2/2] mm: add docs for per-order mTHP split counters
From: Barry Song @ 2024-06-29 3:08 UTC
To: Lance Yang
Cc: akpm, dj456119, ryan.roberts, david, shy828301, ziy, libang.li,
baolin.wang, linux-kernel, linux-mm, Mingzhe Yang
On Sat, Jun 29, 2024 at 1:09 AM Lance Yang <ioworker0@gmail.com> wrote:
>
> This commit introduces documentation for mTHP split counters in
> transhuge.rst.
>
> Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
Reviewed-by: Barry Song <baohua@kernel.org>
> [...]
* Re: [PATCH v2 2/2] mm: add docs for per-order mTHP split counters
From: Lance Yang @ 2024-06-29 14:30 UTC
To: Barry Song
Cc: akpm, dj456119, ryan.roberts, david, shy828301, ziy, libang.li,
baolin.wang, linux-kernel, linux-mm, Mingzhe Yang
Hi Barry,
Thanks a lot for taking the time to review!
On Sat, Jun 29, 2024 at 11:08 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sat, Jun 29, 2024 at 1:09 AM Lance Yang <ioworker0@gmail.com> wrote:
> >
> > This commit introduces documentation for mTHP split counters in
> > transhuge.rst.
> >
> > Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
>
> Reviewed-by: Barry Song <baohua@kernel.org>
Have a nice weekend ;)
Lance
> [...]
* Re: [PATCH v2 1/2] mm: add per-order mTHP split counters
From: Barry Song @ 2024-07-01 0:02 UTC
To: Lance Yang
Cc: akpm, dj456119, ryan.roberts, david, shy828301, ziy, libang.li,
baolin.wang, linux-kernel, linux-mm, Mingzhe Yang
On Sat, Jun 29, 2024 at 1:09 AM Lance Yang <ioworker0@gmail.com> wrote:
>
> Currently, the split counters in THP statistics do not include
> PTE-mapped mTHP. Therefore, we propose introducing per-order mTHP split
> counters to monitor the frequency of mTHP splits. This will help developers
> better analyze and optimize system performance.
>
> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
> split
> split_failed
> split_deferred
>
> Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
Personally, I'm not convinced that using a temporary variable 'order' makes
the code more readable. Functions like folio_test_pmd_mappable() seem more
readable to me. In any case, it's a minor issue.
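For reference, the two styles being weighed look roughly like this (an
illustrative sketch only, not code taken verbatim from the patch):

	/* caching the order up front, as the patch does */
	int order = folio_order(folio);
	...
	if (order >= HPAGE_PMD_ORDER)
		count_vm_event(THP_SPLIT_PAGE);

	/* versus calling the predicate helper at each use */
	if (folio_test_pmd_mappable(folio))
		count_vm_event(THP_SPLIT_PAGE);

The two tests are equivalent, since folio_test_pmd_mappable() is itself
defined in terms of the folio's order.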
Acked-by: Barry Song <baohua@kernel.org>
> [...]
* Re: [PATCH v2 1/2] mm: add per-order mTHP split counters
From: Lance Yang @ 2024-07-01 1:42 UTC
To: Barry Song
Cc: akpm, dj456119, ryan.roberts, david, shy828301, ziy, libang.li,
baolin.wang, linux-kernel, linux-mm, Mingzhe Yang
Hi Barry,
Thanks for taking the time to review!
On Mon, Jul 1, 2024 at 8:02 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sat, Jun 29, 2024 at 1:09 AM Lance Yang <ioworker0@gmail.com> wrote:
> >
> > Currently, the split counters in THP statistics do not include
> > PTE-mapped mTHP. Therefore, we propose introducing per-order mTHP split
> > counters to monitor the frequency of mTHP splits. This will help developers
> > better analyze and optimize system performance.
> >
> > /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
> > split
> > split_failed
> > split_deferred
> >
> > Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
>
> Personally, I'm not convinced that using a temporary variable 'order' makes
> the code more readable. Functions like folio_test_pmd_mappable() seem more
> readable to me.

Agreed. Using functions like folio_test_pmd_mappable() is more readable
for THP checks.

> In any case, it's a minor issue.

I'd like to hear other opinions as well ;)
>
> Acked-by: Barry Song <baohua@kernel.org>
Thanks again for your time!
Lance
> [...]
* Re: [PATCH v2 1/2] mm: add per-order mTHP split counters
From: Baolin Wang @ 2024-07-01 2:23 UTC
To: Barry Song, Lance Yang
Cc: akpm, dj456119, ryan.roberts, david, shy828301, ziy, libang.li,
linux-kernel, linux-mm, Mingzhe Yang
On 2024/7/1 08:02, Barry Song wrote:
> On Sat, Jun 29, 2024 at 1:09 AM Lance Yang <ioworker0@gmail.com> wrote:
>>
>> Currently, the split counters in THP statistics do not include
>> PTE-mapped mTHP. Therefore, we propose introducing per-order mTHP split
>> counters to monitor the frequency of mTHP splits. This will help developers
>> better analyze and optimize system performance.
>>
>> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
>> split
>> split_failed
>> split_deferred
>>
>> Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
>> Signed-off-by: Lance Yang <ioworker0@gmail.com>
>
> Personally, I'm not convinced that using a temporary variable 'order' makes
> the code more readable. Functions like folio_test_pmd_mappable() seem more
> readable to me. In any case, it's a minor issue.
Yes, I have the same opinion as Barry. With that:
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Acked-by: Barry Song <baohua@kernel.org>
> [...]
* Re: [PATCH v2 1/2] mm: add per-order mTHP split counters
From: Ryan Roberts @ 2024-07-01 8:18 UTC
To: Lance Yang, akpm
Cc: dj456119, 21cnbao, david, shy828301, ziy, libang.li, baolin.wang,
linux-kernel, linux-mm, Mingzhe Yang
On 28/06/2024 14:07, Lance Yang wrote:
> Currently, the split counters in THP statistics do not include
> PTE-mapped mTHP. Therefore, we propose introducing per-order mTHP split
> counters to monitor the frequency of mTHP splits. This will help developers
> better analyze and optimize system performance.
>
> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
> split
> split_failed
> split_deferred
>
> Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
LGTM!
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> [...]
* Re: [PATCH v2 2/2] mm: add docs for per-order mTHP split counters
From: Ryan Roberts @ 2024-07-01 8:31 UTC
To: Lance Yang, akpm
Cc: dj456119, 21cnbao, david, shy828301, ziy, libang.li, baolin.wang,
linux-kernel, linux-mm, Mingzhe Yang
On 28/06/2024 14:07, Lance Yang wrote:
> This commit introduces documentation for mTHP split counters in
> transhuge.rst.
>
> Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> ---
> Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index 1f72b00af5d3..709fe10b60f4 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -514,6 +514,22 @@ file_fallback_charge
> falls back to using small pages even though the allocation was
> successful.
I note at the top of this section there is a note:
Monitoring usage
================
.. note::
   Currently the below counters only record events relating to
   PMD-sized THP. Events relating to other THP sizes are not included.
Which is out of date, now that we support mTHP stats. Perhaps it should be removed?
>
> +split
> + is incremented every time a huge page is successfully split into
> + base pages. This can happen for a variety of reasons but a common
> + reason is that a huge page is old and is being reclaimed.
> + This action implies splitting any block mappings into PTEs.
Now that I'm reading this, I'm reminded that Yang Shi suggested at LSFMM that
a potential aid to solving the swap-out fragmentation problem is to split high
orders to lower (but not 0) orders. I don't know if we would take that route,
but in principle it sounds like splitting mTHP to smaller mTHP might be
something we want some day. I wonder if we should spec this counter to also
include splits to smaller orders and not just splits to base pages?

Actually, looking at the code, I think split_huge_page_to_list_to_order(order>0)
would already increment this counter without actually splitting to base pages.
So the documentation should probably just reflect that.
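(For context, the per-order helper is keyed on the source folio's order
rather than the target order, which is why any new_order would bump the same
counter. As I recall, it is defined in include/linux/huge_mm.h roughly as:

	static inline void count_mthp_stat(int order, enum mthp_stat_item item)
	{
		/* mTHP stats only cover orders 1..PMD_ORDER */
		if (order <= 0 || order > PMD_ORDER)
			return;

		this_cpu_inc(mthp_stats.stats[order][item]);
	}

so split_huge_page_to_list_to_order() passes the folio's original order no
matter what order it splits to.)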
> [...]
* Re: [PATCH v2 1/2] mm: add per-order mTHP split counters
From: Lance Yang @ 2024-07-01 10:36 UTC
To: Baolin Wang
Cc: Barry Song, akpm, dj456119, ryan.roberts, david, shy828301, ziy,
libang.li, linux-kernel, linux-mm, Mingzhe Yang
Hi Baolin,
Thanks for taking the time to review!
On Mon, Jul 1, 2024 at 10:23 AM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
>
>
> On 2024/7/1 08:02, Barry Song wrote:
> > On Sat, Jun 29, 2024 at 1:09 AM Lance Yang <ioworker0@gmail.com> wrote:
> >>
> >> Currently, the split counters in THP statistics do not include
> >> PTE-mapped mTHP. Therefore, we propose introducing per-order mTHP split
> >> counters to monitor the frequency of mTHP splits. This will help developers
> >> better analyze and optimize system performance.
> >>
> >> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
> >> split
> >> split_failed
> >> split_deferred
> >>
> >> Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> >> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> >
> > Personally, I'm not convinced that using a temporary variable 'order' makes
> > the code more readable. Functions like folio_test_pmd_mappable() seem more
> > readable to me. In any case, it's a minor issue.
>
> Yes, I have the same opinion as Barry. With that:
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Thanks again for your opinion!
Lance
> [...]
* Re: [PATCH v2 1/2] mm: add per-order mTHP split counters
From: Lance Yang @ 2024-07-01 10:37 UTC
To: Ryan Roberts
Cc: akpm, dj456119, 21cnbao, david, shy828301, ziy, libang.li,
baolin.wang, linux-kernel, linux-mm, Mingzhe Yang
Hi Ryan,
On Mon, Jul 1, 2024 at 4:18 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 28/06/2024 14:07, Lance Yang wrote:
> > Currently, the split counters in THP statistics do not include
> > PTE-mapped mTHP. Therefore, we propose introducing per-order mTHP split
> > counters to monitor the frequency of mTHP splits. This will help developers
> > better analyze and optimize system performance.
> >
> > /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
> > split
> > split_failed
> > split_deferred
> >
> > Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
>
> LGTM!
>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Thanks for taking the time to review!
Lance
> [...]
* Re: [PATCH v2 2/2] mm: add docs for per-order mTHP split counters
From: Lance Yang @ 2024-07-01 10:50 UTC
To: Ryan Roberts
Cc: akpm, dj456119, 21cnbao, david, shy828301, ziy, libang.li,
baolin.wang, linux-kernel, linux-mm, Mingzhe Yang
On Mon, Jul 1, 2024 at 4:31 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 28/06/2024 14:07, Lance Yang wrote:
> > This commit introduces documentation for mTHP split counters in
> > transhuge.rst.
> >
> > Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
> > ---
> > Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
> > 1 file changed, 16 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index 1f72b00af5d3..709fe10b60f4 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -514,6 +514,22 @@ file_fallback_charge
> > falls back to using small pages even though the allocation was
> > successful.
>
>
> I note at the top of this section there is a note:
>
> Monitoring usage
> ================
>
> .. note::
>    Currently the below counters only record events relating to
>    PMD-sized THP. Events relating to other THP sizes are not included.
>
> Which is out of date, now that we support mTHP stats. Perhaps it should be removed?
Good catch! Let's remove that in this patch ;)
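i.e. something like this (a sketch of the removal; this hunk is mine, not
from a posted patch):

	-.. note::
	-   Currently the below counters only record events relating to
	-   PMD-sized THP. Events relating to other THP sizes are not included.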
>
> >
> > +split
> > + is incremented every time a huge page is successfully split into
> > + base pages. This can happen for a variety of reasons but a common
> > + reason is that a huge page is old and is being reclaimed.
> > + This action implies splitting any block mappings into PTEs.
>
> > Now that I'm reading this, I'm reminded that Yang Shi suggested at LSFMM that
> > a potential aid to solving the swap-out fragmentation problem is to split high
> > orders to lower (but not 0) orders. I don't know if we would take that route,
> > but in principle it sounds like splitting mTHP to smaller mTHP might be
> > something we want some day. I wonder if we should spec this counter to also
> > include splits to smaller orders and not just splits to base pages?
> >
> > Actually, looking at the code, I think split_huge_page_to_list_to_order(order>0)
> > would already increment this counter without actually splitting to base pages.
> > So the documentation should probably just reflect that.
Yep, you're right.
It’s important that the documentation reflects that to ensure consistency.
How about "... is successfully split into smaller orders. This can..."?
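The entry would then read something like (a sketch of the proposed wording,
keeping the rest of the paragraph as is):

	split
		is incremented every time a huge page is successfully split
		into smaller orders. This can happen for a variety of reasons
		but a common reason is that a huge page is old and is being
		reclaimed. This action implies splitting any block mappings
		into PTEs.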
Thanks,
Lance
> [...]
* Re: [PATCH v2 2/2] mm: add docs for per-order mTHP split counters
From: Ryan Roberts @ 2024-07-01 11:46 UTC
To: Lance Yang
Cc: akpm, dj456119, 21cnbao, david, shy828301, ziy, libang.li,
baolin.wang, linux-kernel, linux-mm, Mingzhe Yang
On 01/07/2024 11:50, Lance Yang wrote:
> On Mon, Jul 1, 2024 at 4:31 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>> [...]
>>
>> Actually, looking at the code, I think split_huge_page_to_list_to_order(order>0)
>> would already increment this counter without actually splitting to base pages.
>> So the documentation should probably just reflect that.
>
> Yep, you're right.
>
> It’s important that the documentation reflects that to ensure consistency.
>
> How about "... is successfully split into smaller orders. This can..."?
fine by me.
> [...]