linux-trace-kernel.vger.kernel.org archive mirror
* [PATCH -next v6 0/2] Make memory reclamation measurable
@ 2024-01-05  1:36 Bixuan Cui
  2024-01-05  1:36 ` [PATCH -next v6 1/2] mm: shrinker: add new event to trace shrink count Bixuan Cui
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Bixuan Cui @ 2024-01-05  1:36 UTC (permalink / raw)
  To: akpm, rostedt, mhiramat, mathieu.desnoyers
  Cc: linux-kernel, linux-trace-kernel, linux-mm, opensource.kernel,
	cuibixuan

When system memory is low, kswapd reclaims memory. The key steps of
memory reclamation include:
1. shrink_lruvec
  * shrink_active_list, moves folios from the active LRU to the inactive LRU
  * shrink_inactive_list, reclaims folios from the inactive LRU list
2. shrink_slab
  * shrinker->count_objects(), calculates the freeable memory
  * shrinker->scan_objects(), reclaims the slab memory

The existing tracepoints in vmscan are as follows:

--do_try_to_free_pages
--shrink_zones
--trace_mm_vmscan_node_reclaim_begin (tracer)
--shrink_node
--shrink_node_memcgs
  --trace_mm_vmscan_memcg_shrink_begin (tracer)
  --shrink_lruvec
    --shrink_list
      --shrink_active_list
	  --trace_mm_vmscan_lru_shrink_active (tracer)
      --shrink_inactive_list
	  --trace_mm_vmscan_lru_shrink_inactive (tracer)
    --shrink_active_list
  --shrink_slab
    --do_shrink_slab
    --shrinker->count_objects()
    --trace_mm_shrink_slab_start (tracer)
    --shrinker->scan_objects()
    --trace_mm_shrink_slab_end (tracer)
  --trace_mm_vmscan_memcg_shrink_end (tracer)
--trace_mm_vmscan_node_reclaim_end (tracer)

If we can get the duration and the amount of LRU and slab shrinking,
then we can measure memory reclamation, as follows:

Measuring memory reclamation with bpf:
  LRU FILE:
    CPU  COMM     ShrinkActive(us)  ShrinkInactive(us)  Reclaim(page)
    7    kswapd0  26                51                  32
    7    kswapd0  52                47                  13
  SLAB:
    CPU  COMM     OBJ_NAME                 Count_Dur(us)  Freeable(page)  Scan_Dur(us)  Reclaim(page)
    1    kswapd0  super_cache_scan.cfi_jt  2              341             3225          128
    7    kswapd0  super_cache_scan.cfi_jt  0              2247            8524          1024
    7    kswapd0  super_cache_scan.cfi_jt  2367           0               0             0

To make this possible, add new tracepoints to
shrink_active_list()/shrink_inactive_list() and around
shrinker->count_objects().

Changes:
v6: * Add Reviewed-by from Steven Rostedt.
v5: * Use 'DECLARE_EVENT_CLASS(mm_vmscan_lru_shrink_start_template' to
      replace 'TRACE_EVENT(mm_vmscan_lru_shrink_inactive/active_start'
    * Add the explanation for adding new shrink lru events into 'mm: vmscan: add new event to trace shrink lru'
v4: * Add Reviewed-by and Changelog to every patch.
v3: * Swap the positions of 'nid' and 'freeable' to prevent a hole in the trace event.
v2: * Modify trace_mm_vmscan_lru_shrink_inactive() in evict_folios() at the same time to fix a build error.
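
The v3 note about the hole can be illustrated with a small layout sketch.
This is not kernel code: it models the four fields of the
mm_shrink_count_end entry with Python's ctypes, assuming an LP64 target
(8-byte pointers and long, 4-byte int) and ignoring the common trace entry
header. The class and helper names are hypothetical.

```python
import ctypes

# Fixed-width stand-ins for LP64 kernel types (assumption):
# pointer -> 8 bytes, long -> 8 bytes, int -> 4 bytes.
PTR = ctypes.c_uint64
LONG = ctypes.c_int64
INT = ctypes.c_int32


class NidFirst(ctypes.Structure):
    """Hypothetical layout with 'nid' placed before 'freeable'."""
    _fields_ = [("shr", PTR), ("shrink", PTR), ("nid", INT), ("freeable", LONG)]


class FreeableFirst(ctypes.Structure):
    """Field order used by the patch: 'freeable' before 'nid'."""
    _fields_ = [("shr", PTR), ("shrink", PTR), ("freeable", LONG), ("nid", INT)]


def interior_holes(struct_type):
    """Return the number of padding bytes between consecutive fields."""
    holes = 0
    end_of_previous = 0
    for name, _ctype in struct_type._fields_:
        descriptor = getattr(struct_type, name)
        holes += descriptor.offset - end_of_previous
        end_of_previous = descriptor.offset + descriptor.size
    return holes


print(interior_holes(NidFirst), interior_holes(FreeableFirst))
```

On a typical 64-bit host the first layout needs padding after 'nid' so that
'freeable' is 8-byte aligned, while the second has no padding between fields.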

cuibixuan (2):
  mm: shrinker: add new event to trace shrink count
  mm: vmscan: add new event to trace shrink lru

 include/trace/events/vmscan.h | 80 ++++++++++++++++++++++++++++++++++-
 mm/shrinker.c                 |  4 ++
 mm/vmscan.c                   | 11 +++--
 3 files changed, 90 insertions(+), 5 deletions(-)

-- 
2.17.1



* [PATCH -next v6 1/2] mm: shrinker: add new event to trace shrink count
  2024-01-05  1:36 [PATCH -next v6 0/2] Make memory reclamation measurable Bixuan Cui
@ 2024-01-05  1:36 ` Bixuan Cui
  2024-01-05  1:36 ` [PATCH -next v6 2/2] mm: vmscan: add new event to trace shrink lru Bixuan Cui
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Bixuan Cui @ 2024-01-05  1:36 UTC (permalink / raw)
  To: akpm, rostedt, mhiramat, mathieu.desnoyers
  Cc: linux-kernel, linux-trace-kernel, linux-mm, opensource.kernel,
	cuibixuan

From: cuibixuan <cuibixuan@vivo.com>

do_shrink_slab() calculates the freeable memory through shrinker->count_objects(),
and then reclaims the memory through shrinker->scan_objects(). During reclaim,
shrinker->count_objects() itself can take a significant amount of time:

	Function          Time spent (us)
	ext4_es_count     4302
	ext4_es_scan      12
	super_cache_count 4195
	super_cache_scan  2103

Therefore, adding trace events around count_objects() makes it possible to
measure the time taken by slab memory reclaim more accurately.

Example of output:
         kswapd0-103     [003] .....  1098.317942: mm_shrink_count_start: kfree_rcu_shrink_count.cfi_jt+0x0/0x8 00000000c540ff51: nid: 0
         kswapd0-103     [003] .....  1098.317951: mm_shrink_count_end: kfree_rcu_shrink_count.cfi_jt+0x0/0x8 00000000c540ff51: nid: 0 freeable:36
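
As a rough illustration of how the two events above could be consumed, the
following sketch pairs mm_shrink_count_start with mm_shrink_count_end and
reports the time spent in count_objects(). It is only a userspace sketch: the
regular expression assumes the text layout shown in the example output, and
the function name count_durations is hypothetical.

```python
import re

# Matches "<task> [cpu] <flags> <sec>.<usec>: <event>: <rest>" as in the
# example output; the exact layout is an assumption based on those lines.
EVENT_RE = re.compile(
    r"^\s*(?P<task>\S+)\s+\[(?P<cpu>\d+)\]\s+\S+\s+"
    r"(?P<sec>\d+)\.(?P<usec>\d{6}):\s+(?P<event>\w+):\s+(?P<rest>.*)$"
)

SAMPLE = [
    "         kswapd0-103     [003] .....  1098.317942: mm_shrink_count_start: "
    "kfree_rcu_shrink_count.cfi_jt+0x0/0x8 00000000c540ff51: nid: 0",
    "         kswapd0-103     [003] .....  1098.317951: mm_shrink_count_end: "
    "kfree_rcu_shrink_count.cfi_jt+0x0/0x8 00000000c540ff51: nid: 0 freeable:36",
]


def count_durations(lines):
    """Return (symbol, freeable, duration_us) for each start/end pair."""
    pending = {}
    results = []
    for line in lines:
        m = EVENT_RE.match(line)
        if m is None:
            continue
        fields = m["rest"].split()
        if len(fields) < 2:
            continue
        # Pair events by task and shrinker pointer.
        key = (m["task"], fields[1].rstrip(":"))
        # Integer microseconds avoid floating-point rounding of timestamps.
        ts_us = int(m["sec"]) * 1_000_000 + int(m["usec"])
        if m["event"] == "mm_shrink_count_start":
            pending[key] = ts_us
        elif m["event"] == "mm_shrink_count_end" and key in pending:
            freeable = re.search(r"freeable:(-?\d+)", m["rest"])
            if freeable is None:
                continue
            symbol = fields[0].split("+")[0]
            results.append((symbol, int(freeable.group(1)), ts_us - pending.pop(key)))
    return results


for symbol, freeable, duration_us in count_durations(SAMPLE):
    print(symbol, freeable, duration_us)
```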

Signed-off-by: Bixuan Cui <cuibixuan@vivo.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Changes:
v6: * Add Reviewed-by from Steven Rostedt.
v5: * Use 'DECLARE_EVENT_CLASS(mm_vmscan_lru_shrink_start_template' to
      replace 'TRACE_EVENT(mm_vmscan_lru_shrink_inactive/active_start'
    * Add the explanation for adding new shrink lru events into 'mm: vmscan: add new event to trace shrink lru'
v4: * Add Reviewed-by and Changelog to every patch.
v3: * Swap the positions of 'nid' and 'freeable' to prevent the hole in the trace event.
v2: * Modify trace_mm_vmscan_lru_shrink_inactive() in evict_folios() at the same time to fix build error (Andrew pointed out).

 include/trace/events/vmscan.h | 49 +++++++++++++++++++++++++++++++++++
 mm/shrinker.c                 |  4 +++
 2 files changed, 53 insertions(+)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 1a488c30afa5..b99cd28c9815 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -196,6 +196,55 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_memcg_softlimit_re
 );
 #endif /* CONFIG_MEMCG */
 
+TRACE_EVENT(mm_shrink_count_start,
+	TP_PROTO(struct shrinker *shr, struct shrink_control *sc),
+
+	TP_ARGS(shr, sc),
+
+	TP_STRUCT__entry(
+		__field(struct shrinker *, shr)
+		__field(void *, shrink)
+		__field(int, nid)
+	),
+
+	TP_fast_assign(
+		__entry->shr = shr;
+		__entry->shrink = shr->count_objects;
+		__entry->nid = sc->nid;
+	),
+
+	TP_printk("%pS %p: nid: %d",
+		__entry->shrink,
+		__entry->shr,
+		__entry->nid)
+);
+
+TRACE_EVENT(mm_shrink_count_end,
+	TP_PROTO(struct shrinker *shr, struct shrink_control *sc, long freeable),
+
+	TP_ARGS(shr, sc, freeable),
+
+	TP_STRUCT__entry(
+		__field(struct shrinker *, shr)
+		__field(void *, shrink)
+		__field(long, freeable)
+		__field(int, nid)
+	),
+
+	TP_fast_assign(
+		__entry->shr = shr;
+		__entry->shrink = shr->count_objects;
+		__entry->freeable = freeable;
+		__entry->nid = sc->nid;
+	),
+
+	TP_printk("%pS %p: nid: %d freeable:%ld",
+		__entry->shrink,
+		__entry->shr,
+		__entry->nid,
+		__entry->freeable)
+);
+
 TRACE_EVENT(mm_shrink_slab_start,
 	TP_PROTO(struct shrinker *shr, struct shrink_control *sc,
 		long nr_objects_to_shrink, unsigned long cache_items,
diff --git a/mm/shrinker.c b/mm/shrinker.c
index dd91eab43ed3..d0c7bf61db61 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -379,7 +379,11 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 					  : SHRINK_BATCH;
 	long scanned = 0, next_deferred;
 
+	trace_mm_shrink_count_start(shrinker, shrinkctl);
+
 	freeable = shrinker->count_objects(shrinker, shrinkctl);
+
+	trace_mm_shrink_count_end(shrinker, shrinkctl, freeable);
 	if (freeable == 0 || freeable == SHRINK_EMPTY)
 		return freeable;
 
-- 
2.17.1



* [PATCH -next v6 2/2] mm: vmscan: add new event to trace shrink lru
  2024-01-05  1:36 [PATCH -next v6 0/2] Make memory reclamation measurable Bixuan Cui
  2024-01-05  1:36 ` [PATCH -next v6 1/2] mm: shrinker: add new event to trace shrink count Bixuan Cui
@ 2024-01-05  1:36 ` Bixuan Cui
  2024-01-15  6:27 ` [PATCH -next v6 0/2] Make memory reclamation measurable Bixuan Cui
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Bixuan Cui @ 2024-01-05  1:36 UTC (permalink / raw)
  To: akpm, rostedt, mhiramat, mathieu.desnoyers
  Cc: linux-kernel, linux-trace-kernel, linux-mm, opensource.kernel,
	cuibixuan

From: cuibixuan <cuibixuan@vivo.com>

Page reclaim is an important part of memory reclaim, including:
  * shrink_active_list(), moves folios from the active LRU to the inactive LRU
  * shrink_inactive_list(), reclaims folios from the inactive LRU list

Add new events so that the execution time of these functions can be
calculated, to better evaluate memory reclaim as a whole.

Example of output:
         kswapd0-103     [007] .....  1098.353020: mm_vmscan_lru_shrink_active_start: nid=0
         kswapd0-103     [007] .....  1098.353040: mm_vmscan_lru_shrink_active_end: nid=0 nr_taken=32 nr_active=0 nr_deactivated=32 nr_referenced=0 priority=6 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
         kswapd0-103     [007] .....  1098.353040: mm_vmscan_lru_shrink_inactive_start: nid=0
         kswapd0-103     [007] .....  1098.353094: mm_vmscan_lru_shrink_inactive_end: nid=0 nr_scanned=32 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0 nr_ref_keep=32 nr_unmap_fail=0 priority=6 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
         kswapd0-103     [007] .....  1098.353094: mm_vmscan_lru_shrink_inactive_start: nid=0
         kswapd0-103     [007] .....  1098.353162: mm_vmscan_lru_shrink_inactive_end: nid=0 nr_scanned=32 nr_reclaimed=21 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0 nr_ref_keep=11 nr_unmap_fail=0 priority=6 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
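
The start/end pairs in the output above can be turned into durations with a
few lines of userspace code. The sketch below is only an illustration: it
assumes the text layout of the example lines, pairs events per task and per
list type, and uses a hypothetical function name, lru_shrink_durations.

```python
import re

# Layout assumed from the example output: "<task> [cpu] <flags> <ts>: <event>: k=v ..."
LINE_RE = re.compile(
    r"^\s*(?P<task>\S+)\s+\[\d+\]\s+\S+\s+(?P<sec>\d+)\.(?P<usec>\d{6}):\s+"
    r"mm_vmscan_lru_shrink_(?P<kind>active|inactive)_(?P<edge>start|end):\s+(?P<rest>.*)$"
)

SAMPLE = """
         kswapd0-103     [007] .....  1098.353020: mm_vmscan_lru_shrink_active_start: nid=0
         kswapd0-103     [007] .....  1098.353040: mm_vmscan_lru_shrink_active_end: nid=0 nr_taken=32 nr_active=0 nr_deactivated=32 nr_referenced=0 priority=6 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
         kswapd0-103     [007] .....  1098.353040: mm_vmscan_lru_shrink_inactive_start: nid=0
         kswapd0-103     [007] .....  1098.353094: mm_vmscan_lru_shrink_inactive_end: nid=0 nr_scanned=32 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0 nr_ref_keep=32 nr_unmap_fail=0 priority=6 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
         kswapd0-103     [007] .....  1098.353094: mm_vmscan_lru_shrink_inactive_start: nid=0
         kswapd0-103     [007] .....  1098.353162: mm_vmscan_lru_shrink_inactive_end: nid=0 nr_scanned=32 nr_reclaimed=21 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0 nr_ref_keep=11 nr_unmap_fail=0 priority=6 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
"""


def lru_shrink_durations(lines):
    """Return (kind, duration_us, fields) for each matched start/end pair."""
    started = {}
    pairs = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue
        ts_us = int(m["sec"]) * 1_000_000 + int(m["usec"])
        key = (m["task"], m["kind"])
        if m["edge"] == "start":
            started[key] = ts_us
        elif key in started:
            # The *_end events carry key=value statistics such as nr_reclaimed.
            fields = dict(re.findall(r"(\w+)=(\S+)", m["rest"]))
            pairs.append((m["kind"], ts_us - started.pop(key), fields))
    return pairs


for kind, duration_us, fields in lru_shrink_durations(SAMPLE.splitlines()):
    print(kind, duration_us, fields.get("nr_reclaimed", "-"))
```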

Signed-off-by: Bixuan Cui <cuibixuan@vivo.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Changes:
v6: * Add Reviewed-by from Steven Rostedt.
v5: * Use 'DECLARE_EVENT_CLASS(mm_vmscan_lru_shrink_start_template' to
      replace 'TRACE_EVENT(mm_vmscan_lru_shrink_inactive/active_start'
    * Add the explanation for adding new shrink lru events into 'mm: vmscan: add new event to trace shrink lru'
v4: * Add Reviewed-by and Changelog to every patch.
v3: * Swap the positions of 'nid' and 'freeable' to prevent the hole in the trace event.
v2: * Modify trace_mm_vmscan_lru_shrink_inactive() in evict_folios() at the same time to fix build error (Andrew pointed out).

 include/trace/events/vmscan.h | 31 +++++++++++++++++++++++++++++--
 mm/vmscan.c                   | 11 ++++++++---
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index b99cd28c9815..4793d952c248 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -395,7 +395,34 @@ TRACE_EVENT(mm_vmscan_write_folio,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
 
-TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
+DECLARE_EVENT_CLASS(mm_vmscan_lru_shrink_start_template,
+
+	TP_PROTO(int nid),
+
+	TP_ARGS(nid),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+	),
+
+	TP_fast_assign(
+		__entry->nid = nid;
+	),
+
+	TP_printk("nid=%d", __entry->nid)
+);
+
+DEFINE_EVENT(mm_vmscan_lru_shrink_start_template, mm_vmscan_lru_shrink_inactive_start,
+	TP_PROTO(int nid),
+	TP_ARGS(nid)
+);
+
+DEFINE_EVENT(mm_vmscan_lru_shrink_start_template, mm_vmscan_lru_shrink_active_start,
+	TP_PROTO(int nid),
+	TP_ARGS(nid)
+);
+
+TRACE_EVENT(mm_vmscan_lru_shrink_inactive_end,
 
 	TP_PROTO(int nid,
 		unsigned long nr_scanned, unsigned long nr_reclaimed,
@@ -446,7 +473,7 @@ TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
 
-TRACE_EVENT(mm_vmscan_lru_shrink_active,
+TRACE_EVENT(mm_vmscan_lru_shrink_active_end,
 
 	TP_PROTO(int nid, unsigned long nr_taken,
 		unsigned long nr_active, unsigned long nr_deactivated,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4e3b835c6b4a..a44d9624d60f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1906,6 +1906,8 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	bool stalled = false;
 
+	trace_mm_vmscan_lru_shrink_inactive_start(pgdat->node_id);
+
 	while (unlikely(too_many_isolated(pgdat, file, sc))) {
 		if (stalled)
 			return 0;
@@ -1990,7 +1992,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	if (file)
 		sc->nr.file_taken += nr_taken;
 
-	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
+	trace_mm_vmscan_lru_shrink_inactive_end(pgdat->node_id,
 			nr_scanned, nr_reclaimed, &stat, sc->priority, file);
 	return nr_reclaimed;
 }
@@ -2028,6 +2030,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	int file = is_file_lru(lru);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 
+	trace_mm_vmscan_lru_shrink_active_start(pgdat->node_id);
+
 	lru_add_drain();
 
 	spin_lock_irq(&lruvec->lru_lock);
@@ -2107,7 +2111,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 		lru_note_cost(lruvec, file, 0, nr_rotated);
 	mem_cgroup_uncharge_list(&l_active);
 	free_unref_page_list(&l_active);
-	trace_mm_vmscan_lru_shrink_active(pgdat->node_id, nr_taken, nr_activate,
+	trace_mm_vmscan_lru_shrink_active_end(pgdat->node_id, nr_taken, nr_activate,
 			nr_deactivate, nr_rotated, sc->priority, file);
 }
 
@@ -4524,9 +4528,10 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	if (list_empty(&list))
 		return scanned;
 retry:
+	trace_mm_vmscan_lru_shrink_inactive_start(pgdat->node_id);
 	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
 	sc->nr_reclaimed += reclaimed;
-	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
+	trace_mm_vmscan_lru_shrink_inactive_end(pgdat->node_id,
 			scanned, reclaimed, &stat, sc->priority,
 			type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
 
-- 
2.17.1



* Re: [PATCH -next v6 0/2] Make memory reclamation measurable
  2024-01-05  1:36 [PATCH -next v6 0/2] Make memory reclamation measurable Bixuan Cui
  2024-01-05  1:36 ` [PATCH -next v6 1/2] mm: shrinker: add new event to trace shrink count Bixuan Cui
  2024-01-05  1:36 ` [PATCH -next v6 2/2] mm: vmscan: add new event to trace shrink lru Bixuan Cui
@ 2024-01-15  6:27 ` Bixuan Cui
  2024-01-24  2:41 ` Bixuan Cui
  2024-02-21  1:44 ` Bixuan Cui
  4 siblings, 0 replies; 12+ messages in thread
From: Bixuan Cui @ 2024-01-15  6:27 UTC (permalink / raw)
  To: akpm, rostedt, mhiramat, mathieu.desnoyers
  Cc: linux-kernel, linux-trace-kernel, linux-mm, opensource.kernel

ping~

On 2024/1/5 9:36, Bixuan Cui wrote:
> [...]


* Re: [PATCH -next v6 0/2] Make memory reclamation measurable
  2024-01-05  1:36 [PATCH -next v6 0/2] Make memory reclamation measurable Bixuan Cui
                   ` (2 preceding siblings ...)
  2024-01-15  6:27 ` [PATCH -next v6 0/2] Make memory reclamation measurable Bixuan Cui
@ 2024-01-24  2:41 ` Bixuan Cui
  2024-02-21  1:44 ` Bixuan Cui
  4 siblings, 0 replies; 12+ messages in thread
From: Bixuan Cui @ 2024-01-24  2:41 UTC (permalink / raw)
  To: akpm, rostedt, mhiramat, mathieu.desnoyers
  Cc: linux-kernel, linux-trace-kernel, linux-mm, opensource.kernel

ping~

On 2024/1/5 9:36, Bixuan Cui wrote:
> [...]


* Re: [PATCH -next v6 0/2] Make memory reclamation measurable
  2024-01-05  1:36 [PATCH -next v6 0/2] Make memory reclamation measurable Bixuan Cui
                   ` (3 preceding siblings ...)
  2024-01-24  2:41 ` Bixuan Cui
@ 2024-02-21  1:44 ` Bixuan Cui
  2024-02-21  2:22   ` Steven Rostedt
  4 siblings, 1 reply; 12+ messages in thread
From: Bixuan Cui @ 2024-02-21  1:44 UTC (permalink / raw)
  To: akpm, rostedt, mhiramat, mathieu.desnoyers
  Cc: linux-kernel, linux-trace-kernel, linux-mm, opensource.kernel

ping~

On 2024/1/5 9:36, Bixuan Cui wrote:
> [...]


* Re: [PATCH -next v6 0/2] Make memory reclamation measurable
  2024-02-21  1:44 ` Bixuan Cui
@ 2024-02-21  2:22   ` Steven Rostedt
  2024-02-21  3:00     ` Bixuan Cui
  0 siblings, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2024-02-21  2:22 UTC (permalink / raw)
  To: Bixuan Cui
  Cc: akpm, mhiramat, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, linux-mm, opensource.kernel

On Wed, 21 Feb 2024 09:44:32 +0800
Bixuan Cui <cuibixuan@vivo.com> wrote:

> ping~
> 

It's up to the memory management folks to decide on this.

-- Steve


> On 2024/1/5 9:36, Bixuan Cui wrote:
> > [...]



* Re: [PATCH -next v6 0/2] Make memory reclamation measurable
  2024-02-21  2:22   ` Steven Rostedt
@ 2024-02-21  3:00     ` Bixuan Cui
  2024-02-21  7:44       ` Michal Hocko
  0 siblings, 1 reply; 12+ messages in thread
From: Bixuan Cui @ 2024-02-21  3:00 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: akpm, mhiramat, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, linux-mm, opensource.kernel



On 2024/2/21 10:22, Steven Rostedt wrote:
> It's up to the memory management folks to decide on this. -- Steve
Noted with thanks.

Bixuan Cui


* Re: [PATCH -next v6 0/2] Make memory reclamation measurable
  2024-02-21  3:00     ` Bixuan Cui
@ 2024-02-21  7:44       ` Michal Hocko
  2024-03-07  7:40         ` Bixuan Cui
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2024-02-21  7:44 UTC (permalink / raw)
  To: Bixuan Cui
  Cc: Steven Rostedt, akpm, mhiramat, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, linux-mm, opensource.kernel

On Wed 21-02-24 11:00:53, Bixuan Cui wrote:
> 
> 
> On 2024/2/21 10:22, Steven Rostedt wrote:
> > It's up to the memory management folks to decide on this. -- Steve
> Noted with thanks.

It would be really helpful to have more details on why we need those
trace points.

It is my understanding that you would like to have more fine-grained
numbers for the time duration of different parts of the reclaim process.
I can imagine this could be useful in some cases, but is it useful enough
and for a wider variety of workloads? Is that worth dedicated static
tracepoints? Why are ad-hoc dynamic tracepoints or BPF for a very
special situation not sufficient?

In other words, tell us more about the use cases and why this is
generally useful.

Thanks!
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH -next v6 0/2] Make memory reclamation measurable
  2024-02-21  7:44       ` Michal Hocko
@ 2024-03-07  7:40         ` Bixuan Cui
  2024-03-07  9:26           ` Michal Hocko
  0 siblings, 1 reply; 12+ messages in thread
From: Bixuan Cui @ 2024-03-07  7:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Steven Rostedt, akpm, mhiramat, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, linux-mm, opensource.kernel



On 2024/2/21 15:44, Michal Hocko wrote:
> It would be really helpful to have more details on why we need those
> trace points. It is my understanding that you would like to have more
> fine-grained numbers for the time duration of different parts of the
> reclaim process. I can imagine this could be useful in some cases, but is
> it useful enough and for a wider variety of workloads? Is that worth
> dedicated static tracepoints? Why are ad-hoc dynamic tracepoints or BPF
> for a very special situation not sufficient? In other words, tell us
> more about the use cases and why this is generally useful.
Thank you for your reply. I'm sorry that I forgot to describe the
detailed reasons.

Memory reclamation usually occurs under high memory pressure (low free
memory) and is performed by kswapd. In embedded systems, CPU resources
are limited, and it is common for kswapd and critical processes (which
typically require a large amount of memory and trigger memory
reclamation) to compete for CPU resources. This in turn slows down the
critical process and causes lag, such as dropped frames or slower
startup times in mobile games.
Currently, with the help of kernel trace events or tools like Perfetto, 
we can only see that kswapd is competing for CPU and the frequency of 
memory reclamation triggers, but we do not have detailed information or 
metrics about memory reclamation, such as the duration and amount of 
each reclamation, or who is releasing memory (super_cache, f2fs, ext4), 
etc. This makes it impossible to locate the above problems.

So far this patch has helped us solve 2 real performance problems
(kswapd preempting the CPU and delaying the game):
1. Increased memory allocation in the game (across different versions)
led to degraded kswapd behavior.
     This was found by calculating the total amount of Reclaim(page)
during the game startup phase.

2. The adoption of a different file system in the new system version
resulted in a slower reclamation rate.
     This was discovered through a change in OBJ_NAME, for example from
super_cache_scan to ext4_es_scan.

It is also possible to calculate the memory reclamation rate and use it
to compare the memory performance of different versions.
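
A minimal sketch of such a rate calculation, assuming each traced
reclaim pass has already been reduced to a duration and a page count.
The ReclaimSample type, the function name and the sample values are
hypothetical and only illustrate the arithmetic:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ReclaimSample:
    duration_us: float    # time spent in one reclaim pass, in microseconds
    reclaimed_pages: int  # pages freed by that pass


def reclaim_rate_pages_per_ms(samples: Iterable[ReclaimSample]) -> float:
    """Return reclaimed pages per millisecond over all samples."""
    samples = list(samples)
    total_us = sum(s.duration_us for s in samples)
    total_pages = sum(s.reclaimed_pages for s in samples)
    if total_us == 0:
        return 0.0  # nothing measured, avoid dividing by zero
    return total_pages / (total_us / 1000.0)
```

Comparing this number for the same scenario (for example game startup)
on two system versions gives a single figure for how much slower or
faster reclamation has become.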



The main reasons for adding static tracepoints are:
1. To subdivide the time spent in the shrinker->count_objects() and 
shrinker->scan_objects() functions within the do_shrink_slab function. 
Using BPF kprobe, we can only track the time spent in the do_shrink_slab 
function.
2. When tracing frequently called functions, static tracepoints (BPF
tp/tracepoint) have a lower performance impact than dynamic
tracepoints (BPF kprobe).
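
To illustrate point 1, here is a rough sketch of how begin/end events
around each phase could be paired into per-shrinker durations in
post-processing. The event tuple layout and the "count"/"scan" and
"begin"/"end" labels are placeholders for illustration, not the actual
tracepoint names or fields:

```python
from collections import defaultdict


def phase_durations(events):
    """Sum the time spent per (shrinker, phase).

    events: iterable of (timestamp_us, cpu, shrinker, phase, edge), where
    phase is "count" or "scan" and edge is "begin" or "end". This layout
    is hypothetical.
    """
    open_at = {}                 # (cpu, shrinker, phase) -> begin timestamp
    totals = defaultdict(float)  # (shrinker, phase) -> accumulated time
    for ts, cpu, shrinker, phase, edge in sorted(events, key=lambda e: e[0]):
        key = (cpu, shrinker, phase)
        if edge == "begin":
            open_at[key] = ts
        elif edge == "end" and key in open_at:
            totals[(shrinker, phase)] += ts - open_at.pop(key)
    return dict(totals)
```

Keying the open intervals by CPU keeps concurrent reclaim on different
CPUs from being mixed up. An "end" without a matching "begin" is
ignored.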

Thanks
Bixuan Cui


* Re: [PATCH -next v6 0/2] Make memory reclamation measurable
  2024-03-07  7:40         ` Bixuan Cui
@ 2024-03-07  9:26           ` Michal Hocko
  2024-03-08  8:37             ` Bixuan Cui
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2024-03-07  9:26 UTC (permalink / raw)
  To: Bixuan Cui
  Cc: Steven Rostedt, akpm, mhiramat, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, linux-mm, opensource.kernel

On Thu 07-03-24 15:40:29, Bixuan Cui wrote:
[...]
> Currently, with the help of kernel trace events or tools like Perfetto, we
> can only see that kswapd is competing for CPU and the frequency of memory
> reclamation triggers, but we do not have detailed information or metrics
> about memory reclamation, such as the duration and amount of each
> reclamation, or who is releasing memory (super_cache, f2fs, ext4), etc. This
> makes it impossible to locate the above problems.

I am not sure I agree with you here. We do provide insight into LRU and
shrinker reclaim. Why isn't that enough? In general I would advise you
to focus more on describing why the existing infrastructure is
insufficient (having examples would be really appreciated).

> Currently this patch helps us solve 2 actual performance problems (kswapd
> preempts the CPU causing game delay)
> 1. The increased memory allocation in the game (across different versions)
> has led to the degradation of kswapd.
>     This is found by calculating the total amount of Reclaim(page) during
> the game startup phase.
> 
> 2. The adoption of a different file system in the new system version has
> resulted in a slower reclamation rate.
>     This is discovered through the OBJ_NAME change. For example, OBJ_NAME
> changes from super_cache_scan to ext4_es_scan.
> 
> Subsequently, it is also possible to calculate the memory reclamation rate
> to evaluate the memory performance of different versions.

Why can't you achieve this with the existing tracing or /proc/vmstat
infrastructure?
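
A minimal sketch of what the /proc/vmstat route could look like,
assuming the counters pgscan_kswapd, pgsteal_kswapd and slabs_scanned
are present on the running kernel. The helper names are made up for
illustration:

```python
RECLAIM_KEYS = ("pgscan_kswapd", "pgsteal_kswapd", "slabs_scanned")


def parse_vmstat(text):
    """Parse '/proc/vmstat'-style 'name value' lines into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def reclaim_delta(before, after, keys=RECLAIM_KEYS):
    """Difference of the selected counters between two snapshots."""
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}


def read_vmstat(path="/proc/vmstat"):
    """Take one snapshot of the live counters."""
    with open(path) as f:
        return parse_vmstat(f.read())
```

Taking one snapshot before and one after the phase of interest gives
the pages scanned and reclaimed by kswapd in between. These counters
are system-wide totals, so they carry no per-shrinker breakdown and no
timing.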

> The main reasons for adding static tracepoints are:
> 1. To subdivide the time spent in the shrinker->count_objects() and
> shrinker->scan_objects() functions within the do_shrink_slab function. Using
> BPF kprobe, we can only track the time spent in the do_shrink_slab function.
> 2. When tracing frequently called functions, static tracepoints (BPF
> tp/tracepoint) have lower performance impact compared to dynamic tracepoints
> (BPF kprobe).

You can track the time a process has been preempted by other means, no?
We have context switching tracepoints in place. Have you considered that
option?
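
As a rough sketch of that option, assuming sched_switch records have
already been reduced to (timestamp_us, prev_pid, prev_state, next_pid)
tuples in time order. The tuple layout and function name are
hypothetical. A prev_state starting with "R" means the task was still
runnable when it was switched out, i.e. it was preempted:

```python
def preempted_time_us(switches, pid):
    """Total time `pid` spent off-CPU after being preempted.

    switches: time-ordered (timestamp_us, prev_pid, prev_state, next_pid)
    tuples derived from sched_switch events (layout is an assumption).
    """
    total = 0.0
    out_at = None  # timestamp at which pid was last preempted
    for ts, prev_pid, prev_state, next_pid in switches:
        if prev_pid == pid:
            # Only count the gap if the task was still runnable.
            out_at = ts if prev_state.startswith("R") else None
        elif next_pid == pid and out_at is not None:
            total += ts - out_at
            out_at = None
    return total
```

Gaps that follow a voluntary sleep (prev_state "S", "D", ...) are not
counted, so the result reflects only time lost to other tasks such as
kswapd taking the CPU.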
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH -next v6 0/2] Make memory reclamation measurable
  2024-03-07  9:26           ` Michal Hocko
@ 2024-03-08  8:37             ` Bixuan Cui
  0 siblings, 0 replies; 12+ messages in thread
From: Bixuan Cui @ 2024-03-08  8:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Steven Rostedt, akpm, mhiramat, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, linux-mm, opensource.kernel



On 2024/3/7 17:26, Michal Hocko wrote:
>> The main reasons for adding static tracepoints are:
>> 1. To subdivide the time spent in the shrinker->count_objects() and
>> shrinker->scan_objects() functions within the do_shrink_slab function. Using
>> BPF kprobe, we can only track the time spent in the do_shrink_slab function.
>> 2. When tracing frequently called functions, static tracepoints (BPF
>> tp/tracepoint) have lower performance impact compared to dynamic tracepoints
>> (BPF kprobe).
> You can track the time a process has been preempted by other means, no? We
> have context switching tracepoints in place. Have you considered that
> option?
Let me think about it...

Thanks
Bixuan Cui



Thread overview: 12+ messages
2024-01-05  1:36 [PATCH -next v6 0/2] Make memory reclamation measurable Bixuan Cui
2024-01-05  1:36 ` [PATCH -next v6 1/2] mm: shrinker: add new event to trace shrink count Bixuan Cui
2024-01-05  1:36 ` [PATCH -next v6 2/2] mm: vmscan: add new event to trace shrink lru Bixuan Cui
2024-01-15  6:27 ` [PATCH -next v6 0/2] Make memory reclamation measurable Bixuan Cui
2024-01-24  2:41 ` Bixuan Cui
2024-02-21  1:44 ` Bixuan Cui
2024-02-21  2:22   ` Steven Rostedt
2024-02-21  3:00     ` Bixuan Cui
2024-02-21  7:44       ` Michal Hocko
2024-03-07  7:40         ` Bixuan Cui
2024-03-07  9:26           ` Michal Hocko
2024-03-08  8:37             ` Bixuan Cui
