[PATCH] mm/lruvec: trace LRU add drains and drain-all queuing

Linux Trace Kernel

* [PATCH] mm/lruvec: trace LRU add drains and drain-all queuing
@ 2026-06-09  4:11 JP Kobryn
  2026-06-09  7:44 ` Barry Song
  0 siblings, 1 reply; 8+ messages in thread
From: JP Kobryn @ 2026-06-09  4:11 UTC (permalink / raw)
  To: linux-mm, willy, shakeel.butt, usama.arif, akpm, vbabka, mhocko,
	rostedt, mhiramat, mathieu.desnoyers, kasong, qi.zheng, baohua,
	axelrasmussen, yuanchu, weixugc, chrisl, shikemeng, nphamcs,
	baoquan.he, youngjun.park
  Cc: linux-kernel, linux-trace-kernel

LRU add batches can be drained before they reach capacity. This can be a
source of LRU lock contention, but it is not currently possible to
attribute these drains to callers with existing tracepoints.

Add mm_lru_add_drain to report the CPU and lru_add batch count when an
lru_add batch is drained. This allows tracing to distinguish full drains
from partial drains and attribute them to the calling stack.

Add mm_lru_drain_all_queue to report when lru_add_drain_all() queues
per-CPU drain work. This captures the requester stack and target CPU for
remote drain work. The event is named for the drain-all queuing itself,
not for lru_add, because the queued work may be needed to drain batches
other than lru_add.

Signed-off-by: JP Kobryn <jp.kobryn@linux.dev>
---
 include/trace/events/pagemap.h | 40 ++++++++++++++++++++++++++++++++++
 mm/swap.c                      |  6 ++++-
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/pagemap.h b/include/trace/events/pagemap.h
index 171524d3526d..ea8fc46bedb0 100644
--- a/include/trace/events/pagemap.h
+++ b/include/trace/events/pagemap.h
@@ -77,6 +77,46 @@ TRACE_EVENT(mm_lru_activate,
 	TP_printk("folio=%p pfn=0x%lx", __entry->folio, __entry->pfn)
 );
 
+TRACE_EVENT(mm_lru_add_drain,
+
+	TP_PROTO(int cpu, unsigned int nr),
+
+	TP_ARGS(cpu, nr),
+
+	TP_STRUCT__entry(
+		__field(int,		cpu	)
+		__field(unsigned int,	nr	)
+	),
+
+	TP_fast_assign(
+		__entry->cpu	= cpu;
+		__entry->nr	= nr;
+	),
+
+	TP_printk("cpu=%d nr=%u", __entry->cpu, __entry->nr)
+);
+
+TRACE_EVENT(mm_lru_drain_all_queue,
+
+	TP_PROTO(int target_cpu, bool force_all_cpus),
+
+	TP_ARGS(target_cpu, force_all_cpus),
+
+	TP_STRUCT__entry(
+		__field(int,	target_cpu	)
+		__field(bool,	force_all_cpus	)
+	),
+
+	TP_fast_assign(
+		__entry->target_cpu	= target_cpu;
+		__entry->force_all_cpus	= force_all_cpus;
+	),
+
+	TP_printk("target_cpu=%d force_all_cpus=%s",
+		__entry->target_cpu,
+		__entry->force_all_cpus ? "true" : "false")
+);
+
 #endif /* _TRACE_PAGEMAP_H */
 
 /* This part must be outside protection */
diff --git a/mm/swap.c b/mm/swap.c
index 588f50d8f1a8..c385b93582eb 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -694,9 +694,12 @@ void lru_add_drain_cpu(int cpu)
 {
 	struct cpu_fbatches *fbatches = &per_cpu(cpu_fbatches, cpu);
 	struct folio_batch *fbatch = &fbatches->lru_add;
+	unsigned int nr_folios_add = folio_batch_count(fbatch);
 
-	if (folio_batch_count(fbatch))
+	if (nr_folios_add) {
 		folio_batch_move_lru(fbatch, lru_add);
+		trace_mm_lru_add_drain(cpu, nr_folios_add);
+	}
 
 	fbatch = &fbatches->lru_move_tail;
 	/* Disabling interrupts below acts as a compiler barrier. */
@@ -928,6 +931,7 @@ static inline void __lru_add_drain_all(bool force_all_cpus)
 		if (cpu_needs_drain(cpu)) {
 			INIT_WORK(work, lru_add_drain_per_cpu);
 			queue_work_on(cpu, mm_percpu_wq, work);
+			trace_mm_lru_drain_all_queue(cpu, force_all_cpus);
 			__cpumask_set_cpu(cpu, &has_work);
 		}
 	}
-- 
2.54.0
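
Here is a rough illustration of how the new event could be consumed. This userspace C sketch parses the `cpu=%d nr=%u` payload emitted by the mm_lru_add_drain TP_printk() above. It flags a drain as partial when the drain moved fewer folios than a caller-supplied batch capacity. The capacity is a parameter rather than a constant because the folio batch size is kernel-internal. The value 31 (PAGEVEC_SIZE in recent trees) is only an assumption, so check it against your kernel. The sketch also assumes the usual trace_pipe line layout, with task, CPU, flags and timestamp followed by `event: payload`. The helper and struct names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One decoded mm_lru_add_drain record (hypothetical helper type). */
struct lru_add_drain_rec {
	int cpu;
	unsigned int nr;
};

/*
 * Parse the "cpu=%d nr=%u" payload of an mm_lru_add_drain line, e.g. one
 * read from trace_pipe. Matching "mm_lru_add_drain:" (with the colon)
 * avoids picking up differently named events such as *_drain_all.
 * Returns true on success.
 */
static bool parse_lru_add_drain(const char *line, struct lru_add_drain_rec *rec)
{
	const char *p = strstr(line, "mm_lru_add_drain:");

	if (!p)
		return false;
	p = strstr(p, "cpu=");
	if (!p)
		return false;
	return sscanf(p, "cpu=%d nr=%u", &rec->cpu, &rec->nr) == 2;
}

/* A drain is "partial" if the batch held fewer folios than its capacity. */
static bool is_partial_drain(const struct lru_add_drain_rec *rec,
			     unsigned int batch_capacity)
{
	return rec->nr < batch_capacity;
}
```

You could feed it lines from trace_pipe and tally partial versus full drains per CPU. That would let you act on the full/partial distinction described in the commit message.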



* Re: [PATCH] mm/lruvec: trace LRU add drains and drain-all queuing
  2026-06-09  4:11 [PATCH] mm/lruvec: trace LRU add drains and drain-all queuing JP Kobryn
@ 2026-06-09  7:44 ` Barry Song
  2026-06-10  0:07   ` JP Kobryn
  0 siblings, 1 reply; 8+ messages in thread
From: Barry Song @ 2026-06-09  7:44 UTC (permalink / raw)
  To: JP Kobryn
  Cc: linux-mm, willy, shakeel.butt, usama.arif, akpm, vbabka, mhocko,
	rostedt, mhiramat, mathieu.desnoyers, kasong, qi.zheng,
	axelrasmussen, yuanchu, weixugc, chrisl, shikemeng, nphamcs,
	baoquan.he, youngjun.park, linux-kernel, linux-trace-kernel

On Tue, Jun 9, 2026 at 12:12 PM JP Kobryn <jp.kobryn@linux.dev> wrote:
>
> @@ -928,6 +931,7 @@ static inline void __lru_add_drain_all(bool force_all_cpus)
>                 if (cpu_needs_drain(cpu)) {
>                         INIT_WORK(work, lru_add_drain_per_cpu);
>                         queue_work_on(cpu, mm_percpu_wq, work);
> +                       trace_mm_lru_drain_all_queue(cpu, force_all_cpus);

Do you need tracing on each CPU individually, or is tracing the
entire __lru_add_drain_all() invocation sufficient?

Do you also need this_gen and lru_drain_gen to be traced?

By the way, I'm not sure drain_all_queue is the best name here.
Why not simply use add_drain_all()? It would match the existing
function name better.

Best Regards
Barry


* Re: [PATCH] mm/lruvec: trace LRU add drains and drain-all queuing
  2026-06-09  7:44 ` Barry Song
@ 2026-06-10  0:07   ` JP Kobryn
  2026-06-10  0:16     ` JP Kobryn
  0 siblings, 1 reply; 8+ messages in thread
From: JP Kobryn @ 2026-06-10  0:07 UTC (permalink / raw)
  To: Barry Song
  Cc: linux-mm, willy, shakeel.butt, usama.arif, akpm, vbabka, mhocko,
	rostedt, mhiramat, mathieu.desnoyers, kasong, qi.zheng,
	axelrasmussen, yuanchu, weixugc, chrisl, shikemeng, nphamcs,
	baoquan.he, youngjun.park, linux-kernel, linux-trace-kernel

On 6/9/26 12:44 AM, Barry Song wrote:
> On Tue, Jun 9, 2026 at 12:12 PM JP Kobryn <jp.kobryn@linux.dev> wrote:
>>
>> @@ -928,6 +931,7 @@ static inline void __lru_add_drain_all(bool force_all_cpus)
>>                 if (cpu_needs_drain(cpu)) {
>>                         INIT_WORK(work, lru_add_drain_per_cpu);
>>                         queue_work_on(cpu, mm_percpu_wq, work);
>> +                       trace_mm_lru_drain_all_queue(cpu, force_all_cpus);
> 
> Do you need tracing on each CPU individually, or is tracing the
> entire __lru_add_drain_all() invocation sufficient?

I think the latter would be fine. The remote work will invoke the
mm_lru_add_drain tracepoint, which will show up as kworker stacks. Since
the event already has the CPU, we could see where queued drains actually
ran.

> 
> Do you also need this_gen and lru_drain_gen to be traced?

As trace parameters, I don't think they give meaningful info on who is
making requests to drain all. But since I'm going to move the trace call
from inside the CPU loop to earlier in the function, we can still see
requesters even if the function exits early because a parallel drain
generation already satisfied the request.

> 
> By the way, I'm not sure drain_all_queue is the best name here.
> Why not simply use add_drain_all()? It would match the existing
> function name better.

The new tracepoint name can be mm_lru_add_drain_all.
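
To make the placement question concrete, here is a simplified, single-threaded userspace model of the generation check that this_gen and lru_drain_gen implement in __lru_add_drain_all(). It is a sketch under stated assumptions, not the mm/swap.c code. model_add_drain_all(), trace_add_drain_all() and the counters are hypothetical stand-ins, and the model omits the real function's memory barriers, mutex and per-CPU work queuing. It shows that emitting the proposed mm_lru_add_drain_all event at entry, before the early exit, records every requester. That includes callers whose request a concurrent drain already satisfied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model state; the real counter is static inside the function. */
static unsigned int lru_drain_gen;	/* bumped by each drain that runs */
static unsigned int nr_requests_traced;	/* events the "tracepoint" emitted */
static unsigned int nr_drains_run;	/* drains that got past the check */

/* Stand-in for the proposed trace_mm_lru_add_drain_all(force_all_cpus). */
static void trace_add_drain_all(bool force_all_cpus)
{
	nr_requests_traced++;
	printf("mm_lru_add_drain_all: force_all_cpus=%s\n",
	       force_all_cpus ? "true" : "false");
}

/*
 * this_gen is the generation the caller sampled before taking the lock.
 * Tracing first means callers that bail out below are still recorded.
 */
static void model_add_drain_all(unsigned int this_gen, bool force_all_cpus)
{
	trace_add_drain_all(force_all_cpus);

	/* A drain that started after we sampled this_gen covers us. */
	if (this_gen != lru_drain_gen && !force_all_cpus)
		return;

	lru_drain_gen++;
	nr_drains_run++;	/* the real code queues per-CPU work here */
}
```

For instance, calling model_add_drain_all(0, false) twice drains once but traces twice. The second caller sampled generation 0 before the first drain bumped it, so it returns early. A subsequent model_add_drain_all(0, true) drains regardless.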



* Re: [PATCH] mm/lruvec: trace LRU add drains and drain-all queuing
  2026-06-10  0:07   ` JP Kobryn
@ 2026-06-10  0:16     ` JP Kobryn
  2026-06-10  1:21       ` Shakeel Butt
  0 siblings, 1 reply; 8+ messages in thread
From: JP Kobryn @ 2026-06-10  0:16 UTC (permalink / raw)
  To: Barry Song
  Cc: linux-mm, willy, shakeel.butt, usama.arif, akpm, vbabka, mhocko,
	rostedt, mhiramat, mathieu.desnoyers, kasong, qi.zheng,
	axelrasmussen, yuanchu, weixugc, chrisl, shikemeng, nphamcs,
	baoquan.he, youngjun.park, linux-kernel, linux-trace-kernel

On 6/9/26 5:07 PM, JP Kobryn wrote:
> On 6/9/26 12:44 AM, Barry Song wrote:
>> On Tue, Jun 9, 2026 at 12:12 PM JP Kobryn <jp.kobryn@linux.dev> wrote:
>>>
>>> @@ -928,6 +931,7 @@ static inline void __lru_add_drain_all(bool force_all_cpus)
>>>                 if (cpu_needs_drain(cpu)) {
>>>                         INIT_WORK(work, lru_add_drain_per_cpu);
>>>                         queue_work_on(cpu, mm_percpu_wq, work);
>>> +                       trace_mm_lru_drain_all_queue(cpu, force_all_cpus);
>>
>> Do you need tracing on each CPU individually, or is tracing the
>> entire __lru_add_drain_all() invocation sufficient?
> 
> I think the latter would be fine. The remote work will invoke the
> mm_lru_add_drain tracepoint, which will show up as kworker stacks. Since
> the event already has the CPU, we could see where queued drains actually
> ran.

Actually, if it's just a single invocation and the only event data is
the force flag, a tracepoint may not even be needed. Other probes can be
attached at function entry to read that single argument. I can drop this
from v2 and keep the single mm_lru_add_drain tracepoint.


* Re: [PATCH] mm/lruvec: trace LRU add drains and drain-all queuing
  2026-06-10  0:16     ` JP Kobryn
@ 2026-06-10  1:21       ` Shakeel Butt
  2026-06-10 18:54         ` JP Kobryn
  0 siblings, 1 reply; 8+ messages in thread
From: Shakeel Butt @ 2026-06-10  1:21 UTC (permalink / raw)
  To: JP Kobryn
  Cc: Barry Song, linux-mm, willy, usama.arif, akpm, vbabka, mhocko,
	rostedt, mhiramat, mathieu.desnoyers, kasong, qi.zheng,
	axelrasmussen, yuanchu, weixugc, chrisl, shikemeng, nphamcs,
	baoquan.he, youngjun.park, linux-kernel, linux-trace-kernel

On Tue, Jun 09, 2026 at 05:16:15PM -0700, JP Kobryn wrote:
> On 6/9/26 5:07 PM, JP Kobryn wrote:
> > On 6/9/26 12:44 AM, Barry Song wrote:
> >> On Tue, Jun 9, 2026 at 12:12 PM JP Kobryn <jp.kobryn@linux.dev> wrote:
> >>>
> >>> @@ -928,6 +931,7 @@ static inline void __lru_add_drain_all(bool force_all_cpus)
> >>>                 if (cpu_needs_drain(cpu)) {
> >>>                         INIT_WORK(work, lru_add_drain_per_cpu);
> >>>                         queue_work_on(cpu, mm_percpu_wq, work);
> >>> +                       trace_mm_lru_drain_all_queue(cpu, force_all_cpus);
> >>
> >> Do you need tracing on each CPU individually, or is tracing the
> >> entire __lru_add_drain_all() invocation sufficient?
> > 
> > I think the latter would be fine. The remote work will invoke the
> > mm_lru_add_drain tracepoint, which will show up as kworker stacks. Since
> > the event already has the CPU, we could see where queued drains actually
> > ran.
> 
> Actually, if it's just a single invocation and the only event data is
> the force flag, a tracepoint may not even be needed. Other probes can be
> attached at function entry to read that single argument. I can drop this
> from v2 and keep the single mm_lru_add_drain tracepoint.

No, we do want to trace the callers requesting to drain all the CPUs.
If you trace only lru_add_drain_cpu(), you will only see that a drain was
requested for a given CPU, with no information on the requester.

Also, as Barry said, I think a single trace for the whole
__lru_add_drain_all() is good enough.


* Re: [PATCH] mm/lruvec: trace LRU add drains and drain-all queuing
  2026-06-10  1:21       ` Shakeel Butt
@ 2026-06-10 18:54         ` JP Kobryn
  2026-06-10 19:20           ` JP Kobryn
  0 siblings, 1 reply; 8+ messages in thread
From: JP Kobryn @ 2026-06-10 18:54 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Barry Song, linux-mm, willy, usama.arif, akpm, vbabka, mhocko,
	rostedt, mhiramat, mathieu.desnoyers, kasong, qi.zheng,
	axelrasmussen, yuanchu, weixugc, chrisl, shikemeng, nphamcs,
	baoquan.he, youngjun.park, linux-kernel, linux-trace-kernel

On 6/9/26 6:21 PM, Shakeel Butt wrote:
> On Tue, Jun 09, 2026 at 05:16:15PM -0700, JP Kobryn wrote:
>> On 6/9/26 5:07 PM, JP Kobryn wrote:
>>> On 6/9/26 12:44 AM, Barry Song wrote:
>>>> On Tue, Jun 9, 2026 at 12:12 PM JP Kobryn <jp.kobryn@linux.dev> wrote:
>>>>>
>>>>> @@ -928,6 +931,7 @@ static inline void __lru_add_drain_all(bool force_all_cpus)
>>>>>                 if (cpu_needs_drain(cpu)) {
>>>>>                         INIT_WORK(work, lru_add_drain_per_cpu);
>>>>>                         queue_work_on(cpu, mm_percpu_wq, work);
>>>>> +                       trace_mm_lru_drain_all_queue(cpu, force_all_cpus);
>>>>
>>>> Do you need tracing on each CPU individually, or is tracing the
>>>> entire __lru_add_drain_all() invocation sufficient?
>>>
>>> I think the latter would be fine. The remote work will invoke the
>>> mm_lru_add_drain tracepoint, which will show up as kworker stacks. Since
>>> the event already has the CPU, we could see where queued drains actually
>>> ran.
>>
>> Actually, if it's just a single invocation and the only event data is
>> the force flag, a tracepoint may not even be needed. Other probes can be
>> attached at function entry to read that single argument. I can drop this
>> from v2 and keep the single mm_lru_add_drain tracepoint.
> 
> No, we do want to trace the callers requesting to drain all the CPUs.
> If you trace only lru_add_drain_cpu(), you will only see that a drain was
> requested for a given CPU, with no information on the requester.
> 
> Also, as Barry said, I think a single trace for the whole
> __lru_add_drain_all() is good enough.

Right, but couldn't that already be done with fentry or kprobe? If we
only need the calling stack and the argument value of force_all_cpus, I
don't see a strong need for a dedicated tracepoint.
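
For comparison, the kprobe variant of that alternative might look roughly like this sketch. It registers a kprobe event through tracefs and attaches a `stacktrace` trigger so that each requester's kernel stack is recorded. It assumes tracefs is mounted at /sys/kernel/tracing, the kernel has CONFIG_KPROBE_EVENTS, and the program runs as root.

The patch's hunk header shows __lru_add_drain_all() is `static inline`, so that symbol may not exist to probe. The sketch is therefore meant to be pointed at its callers. In recent mainline trees, lru_add_drain_all() passes force_all_cpus=false and lru_cache_disable() passes true on SMP; verify both against your tree. The group/event names and helper functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Assumed tracefs mount point; older setups use /sys/kernel/debug/tracing. */
#define TRACEFS "/sys/kernel/tracing"

/* Build a kprobe_events definition of the form "p:GROUP/EVENT SYMBOL". */
static int format_kprobe_def(char *buf, size_t len, const char *group,
			     const char *event, const char *symbol)
{
	int n = snprintf(buf, len, "p:%s/%s %s", group, event, symbol);

	/* Reject encoding errors and truncated definitions. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write one line to a tracefs control file opened with @mode ("a" or "w"). */
static int tracefs_write(const char *file, const char *line, const char *mode)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), "%s/%s", TRACEFS, file);
	f = fopen(path, mode);
	if (!f)
		return -errno;
	fprintf(f, "%s\n", line);
	/* fclose() flushes the buffer, so tracefs write errors surface here. */
	return fclose(f) ? -errno : 0;
}

/*
 * Register a kprobe on @symbol and attach a "stacktrace" trigger so the
 * caller's kernel stack is recorded each time the probe fires.
 */
static int probe_with_stack(const char *group, const char *event,
			    const char *symbol)
{
	char def[128], trigger[128];
	int ret;

	if (format_kprobe_def(def, sizeof(def), group, event, symbol))
		return -EINVAL;
	ret = tracefs_write("kprobe_events", def, "a");	/* like ">>" */
	if (ret)
		return ret;
	snprintf(trigger, sizeof(trigger), "events/%s/%s/trigger", group, event);
	return tracefs_write(trigger, "stacktrace", "w");
}
```

For example, you could call probe_with_stack("lru_probe", "drain_all", "lru_add_drain_all") and probe_with_stack("lru_probe", "cache_disable", "lru_cache_disable"), then read trace_pipe. That should show requester stacks for both forced and non-forced drain-all calls, provided neither symbol is inlined.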


* Re: [PATCH] mm/lruvec: trace LRU add drains and drain-all queuing
  2026-06-10 18:54         ` JP Kobryn
@ 2026-06-10 19:20           ` JP Kobryn
  2026-06-10 19:38             ` Shakeel Butt
  0 siblings, 1 reply; 8+ messages in thread
From: JP Kobryn @ 2026-06-10 19:20 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Barry Song, linux-mm, willy, usama.arif, akpm, vbabka, mhocko,
	rostedt, mhiramat, mathieu.desnoyers, kasong, qi.zheng,
	axelrasmussen, yuanchu, weixugc, chrisl, shikemeng, nphamcs,
	baoquan.he, youngjun.park, linux-kernel, linux-trace-kernel

On 6/10/26 11:54 AM, JP Kobryn wrote:
> On 6/9/26 6:21 PM, Shakeel Butt wrote:
>> On Tue, Jun 09, 2026 at 05:16:15PM -0700, JP Kobryn wrote:
>>> On 6/9/26 5:07 PM, JP Kobryn wrote:
>>>> On 6/9/26 12:44 AM, Barry Song wrote:
>>>>> On Tue, Jun 9, 2026 at 12:12 PM JP Kobryn <jp.kobryn@linux.dev> wrote:
>>>>>>
>>>>>>
>>>>>> -       if (folio_batch_count(fbatch))
>>>>>> +       if (nr_folios_add) {
>>>>>>                 folio_batch_move_lru(fbatch, lru_add);
>>>>>> +               trace_mm_lru_add_drain(cpu, nr_folios_add);
>>>>>> +       }
>>>>>>
>>>>>>         fbatch = &fbatches->lru_move_tail;
>>>>>>         /* Disabling interrupts below acts as a compiler barrier. */
>>>>>> @@ -928,6 +931,7 @@ static inline void __lru_add_drain_all(bool force_all_cpus)
>>>>>>                 if (cpu_needs_drain(cpu)) {
>>>>>>                         INIT_WORK(work, lru_add_drain_per_cpu);
>>>>>>                         queue_work_on(cpu, mm_percpu_wq, work);
>>>>>> +                       trace_mm_lru_drain_all_queue(cpu, force_all_cpus);
>>>>>
>>>>> Do you need tracing on each CPU individually, or is tracing the
>>>>> entire __lru_add_drain_all() invocation sufficient?
>>>>
>>>> I think the latter would be fine. The remote work will invoke the
>>>> mm_lru_add_drain tracepoint, which will show up as kworker stacks. Since
>>>> the event already has the CPU, we could see where queued drains actually
>>>> ran.
>>>
>>> Actually, if it's just a single invocation and the only event data is the
>>> force flag, a tracepoint may not even be needed. Other probes can be
>>> attached at function entry and read the single argument. I can
>>> drop this from v2 and keep the single mm_lru_add_drain tracepoint.
>>
>> No, we do want to trace the callers requesting a drain of all the CPUs. If
>> you trace just lru_add_drain_cpu(), then you will only see that a drain was
>> requested for a given CPU, with no information about the requester.
>>
>> Also, as Barry said, I think a single trace for the whole
>> __lru_add_drain_all() is good enough.
> 
> Right, but couldn't that already be done with fentry or kprobe? If we
> only need the calling stack and the argument value of force_all_cpus, I
> don't see a strong need for a dedicated tracepoint.

Never mind that. I see it's declared inline, so I'll add a tracepoint and
send v3.
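For illustration only, here is a minimal userspace model in C of the shape discussed in this thread: one drain-all event per __lru_add_drain_all() call carrying force_all_cpus, plus one mm_lru_add_drain-style event (CPU and batch count) for each CPU whose lru_add batch is drained. It is not kernel code and not the v3 patch. The model_* names, the fixed CPU count and the event log are hypothetical; per-CPU drains run synchronously instead of as queued work, and cpu_needs_drain() is reduced to "the lru_add batch is non-empty".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model parameters; not kernel values. */
#define MODEL_NR_CPUS    4
#define MODEL_MAX_EVENTS 64

enum model_event_kind {
	MODEL_EV_DRAIN_ALL,	/* stands in for the drain-all event */
	MODEL_EV_ADD_DRAIN,	/* stands in for mm_lru_add_drain */
};

struct model_event {
	enum model_event_kind kind;
	int cpu;		/* MODEL_EV_ADD_DRAIN only */
	unsigned int nr;	/* MODEL_EV_ADD_DRAIN only: folios drained */
	bool force_all_cpus;	/* MODEL_EV_DRAIN_ALL only */
};

/* Per-CPU lru_add batch sizes and a simple in-memory event log. */
static unsigned int model_batch_count[MODEL_NR_CPUS];
static struct model_event model_events[MODEL_MAX_EVENTS];
static int model_nr_events;

static void model_emit(struct model_event ev)
{
	assert(model_nr_events < MODEL_MAX_EVENTS);
	model_events[model_nr_events++] = ev;
}

/* Like lru_add_drain_cpu() in the patch: report only non-empty drains. */
static void model_drain_cpu(int cpu)
{
	unsigned int nr;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	nr = model_batch_count[cpu];
	if (nr) {
		model_batch_count[cpu] = 0;	/* stands in for folio_batch_move_lru() */
		model_emit((struct model_event){
			.kind = MODEL_EV_ADD_DRAIN, .cpu = cpu, .nr = nr });
	}
}

/*
 * One event per drain-all call, not one per CPU. force_all_cpus is only
 * recorded; the model does not reproduce its effect in the kernel.
 */
static void model_drain_all(bool force_all_cpus)
{
	model_emit((struct model_event){
		.kind = MODEL_EV_DRAIN_ALL, .force_all_cpus = force_all_cpus });

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_batch_count[cpu])	/* simplified cpu_needs_drain() */
			model_drain_cpu(cpu);
}
```

With CPU 1 holding 5 folios and CPU 3 holding 2, model_drain_all(false) logs one drain-all event followed by add-drain events (cpu=1, nr=5) and (cpu=3, nr=2): the request is recorded once, while each per-CPU drain still reports where it ran and how many folios it moved.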



* Re: [PATCH] mm/lruvec: trace LRU add drains and drain-all queuing
  2026-06-10 19:20           ` JP Kobryn
@ 2026-06-10 19:38             ` Shakeel Butt
  0 siblings, 0 replies; 8+ messages in thread
From: Shakeel Butt @ 2026-06-10 19:38 UTC
  To: JP Kobryn
  Cc: Barry Song, linux-mm, willy, usama.arif, akpm, vbabka, mhocko,
	rostedt, mhiramat, mathieu.desnoyers, kasong, qi.zheng,
	axelrasmussen, yuanchu, weixugc, chrisl, shikemeng, nphamcs,
	baoquan.he, youngjun.park, linux-kernel, linux-trace-kernel

On Wed, Jun 10, 2026 at 12:20:19PM -0700, JP Kobryn wrote:
> On 6/10/26 11:54 AM, JP Kobryn wrote:
> > On 6/9/26 6:21 PM, Shakeel Butt wrote:
> >> On Tue, Jun 09, 2026 at 05:16:15PM -0700, JP Kobryn wrote:
> >>> On 6/9/26 5:07 PM, JP Kobryn wrote:
> >>>> On 6/9/26 12:44 AM, Barry Song wrote:
> >>>>> On Tue, Jun 9, 2026 at 12:12 PM JP Kobryn <jp.kobryn@linux.dev> wrote:
> >>>>>>

[...]

> >>>>> Do you need tracing on each CPU individually, or is tracing the
> >>>>> entire __lru_add_drain_all() invocation sufficient?
> >>>>
> >>>> I think the latter would be fine. The remote work will invoke the
> >>>> mm_lru_add_drain tracepoint, which will show up as kworker stacks. Since
> >>>> the event already has the CPU, we could see where queued drains actually
> >>>> ran.
> >>>
> >>> Actually, if it's just a single invocation and the only event data is the
> >>> force flag, a tracepoint may not even be needed. Other probes can be
> >>> attached at function entry and read the single argument. I can
> >>> drop this from v2 and keep the single mm_lru_add_drain tracepoint.
> >>
> >> No, we do want to trace the callers requesting a drain of all the CPUs. If
> >> you trace just lru_add_drain_cpu(), then you will only see that a drain was
> >> requested for a given CPU, with no information about the requester.
> >>
> >> Also, as Barry said, I think a single trace for the whole
> >> __lru_add_drain_all() is good enough.
> > 
> > Right, but couldn't that already be done with fentry or kprobe? If we
> > only need the calling stack and the argument value of force_all_cpus, I
> > don't see a strong need for a dedicated tracepoint.
> 
> Never mind that. I see it's declared inline, so I'll add a tracepoint and
> send v3.

Thanks. BTW, even without the inline keyword, the compiler can still decide
to inline a function, so kprobe/fentry are not always reliable.
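A userspace analogy for the inlining point (hypothetical names, not kernel code): a probe on a function's entry needs an out-of-line copy of that function to exist, whereas a hook compiled into the function body, as a tracepoint is, is carried along into every caller the body gets inlined into. Whether inlined_drain_all() below is actually inlined is left to the compiler; the hook fires once per call either way.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a tracepoint: counts hits, keeps last argument. */
static int hook_hits;
static bool hook_last_force_all_cpus;

static void model_trace_hook(bool force_all_cpus)
{
	hook_hits++;
	hook_last_force_all_cpus = force_all_cpus;
}

/*
 * Declared inline, like __lru_add_drain_all(). The compiler may copy this
 * body into each caller, leaving no single entry point for an entry probe.
 * The hook call is part of the body, so every copy keeps it.
 */
static inline void inlined_drain_all(bool force_all_cpus)
{
	model_trace_hook(force_all_cpus);
	/* per-CPU drain work would be queued here */
}

/* Two hypothetical callers passing different flag values. */
static void model_caller_a(void) { inlined_drain_all(false); }
static void model_caller_b(void) { inlined_drain_all(true); }
```

Building with `gcc -O2 -c` and listing the object's symbols with `nm` is one way to see whether an out-of-line inlined_drain_all survived; that outcome is the compiler's decision, which is why an entry probe is not a dependable substitute for a hook in the body.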


Thread overview: 8+ messages
2026-06-09  4:11 [PATCH] mm/lruvec: trace LRU add drains and drain-all queuing JP Kobryn
2026-06-09  7:44 ` Barry Song
2026-06-10  0:07   ` JP Kobryn
2026-06-10  0:16     ` JP Kobryn
2026-06-10  1:21       ` Shakeel Butt
2026-06-10 18:54         ` JP Kobryn
2026-06-10 19:20           ` JP Kobryn
2026-06-10 19:38             ` Shakeel Butt
