* [PATCH v3] mm/lruvec: trace LRU add drains and drain-all requests
@ 2026-06-10 19:52 JP Kobryn
2026-06-10 21:03 ` Barry Song
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: JP Kobryn @ 2026-06-10 19:52 UTC
To: linux-mm, willy, shakeel.butt, usama.arif, akpm, vbabka, mhocko,
rostedt, mhiramat, mathieu.desnoyers, kasong, qi.zheng, baohua,
axelrasmussen, yuanchu, weixugc, chrisl, shikemeng, nphamcs,
baoquan.he, youngjun.park
Cc: linux-kernel, linux-trace-kernel
LRU add batches can be drained before they reach capacity. This can be a
source of LRU lock contention, but it is not currently possible to
attribute these drains to callers with existing tracepoints.
Add mm_lru_add_drain to report the CPU and lru_add batch count when an
lru_add batch is drained. This allows tracing to distinguish full drains
from partial drains and attribute them to the calling stack.
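Here is one way the new event could be consumed. This is a hedged userspace sketch that totals the nr field per CPU from text trace output, so that CPUs doing many drains with only a few folios each stand out. It assumes each line contains the usual ftrace form `mm_lru_add_drain: cpu=N nr=M`, following the TP_printk format in this patch. MAX_CPUS, struct cpu_drain_stats and account_drain_line() are hypothetical names. Attribution to call stacks is not covered.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CPUS 64 /* arbitrary bound for this sketch (assumption) */

struct cpu_drain_stats {
	unsigned long drains; /* mm_lru_add_drain events seen for this CPU */
	unsigned long folios; /* sum of their nr fields */
};

/*
 * Account one line of text trace output. Returns 0 if the line carried an
 * mm_lru_add_drain event, -1 otherwise (other events or malformed input).
 */
int account_drain_line(const char *line, struct cpu_drain_stats *stats)
{
	const char *p;
	int cpu;
	unsigned int nr;

	assert(line != NULL && stats != NULL);

	/* The trailing ": " keeps mm_lru_add_drain_all lines from matching. */
	p = strstr(line, "mm_lru_add_drain: ");
	if (p == NULL)
		return -1;
	if (sscanf(p, "mm_lru_add_drain: cpu=%d nr=%u", &cpu, &nr) != 2)
		return -1;
	if (cpu < 0 || cpu >= MAX_CPUS)
		return -1;

	stats[cpu].drains++;
	stats[cpu].folios += nr;
	return 0;
}
```

For any CPU, dividing folios by drains gives the average batch size at drain time. A low value together with a high drain count points at early, partial drains.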
Add mm_lru_add_drain_all to capture callers of __lru_add_drain_all and
whether they set the force flag for all CPUs. The tracepoint mirrors
the signature of the enclosing function. It is still needed because
__lru_add_drain_all() is static inline and may be inlined into its
callers, in which case there is no function entry to probe.
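In the same spirit, here is a minimal sketch that counts drain-all requests by the value of the force flag. It parses back the "true"/"false" string emitted by the TP_printk in this patch. The names are hypothetical, and the same assumption about the ftrace text layout applies.

```c
#include <assert.h>
#include <string.h>

struct drain_all_stats {
	unsigned long forced;   /* force_all_cpus=true */
	unsigned long unforced; /* force_all_cpus=false */
};

/* Returns 0 if the line carried an mm_lru_add_drain_all event, else -1. */
int account_drain_all_line(const char *line, struct drain_all_stats *st)
{
	static const char key[] = "mm_lru_add_drain_all: force_all_cpus=";
	const char *p;

	assert(line != NULL && st != NULL);

	p = strstr(line, key);
	if (p == NULL)
		return -1;
	p += sizeof(key) - 1; /* skip to the value printed by TP_printk */

	if (strncmp(p, "true", 4) == 0)
		st->forced++;
	else if (strncmp(p, "false", 5) == 0)
		st->unforced++;
	else
		return -1;
	return 0;
}
```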
Signed-off-by: JP Kobryn <jp.kobryn@linux.dev>
---
 include/trace/events/pagemap.h | 37 ++++++++++++++++++++++++++++++++++
 mm/swap.c                      |  7 ++++++-
 2 files changed, 43 insertions(+), 1 deletion(-)
diff --git a/include/trace/events/pagemap.h b/include/trace/events/pagemap.h
index 171524d3526d..ff3da07ccb40 100644
--- a/include/trace/events/pagemap.h
+++ b/include/trace/events/pagemap.h
@@ -77,6 +77,43 @@ TRACE_EVENT(mm_lru_activate,
 	TP_printk("folio=%p pfn=0x%lx", __entry->folio, __entry->pfn)
 );
 
+TRACE_EVENT(mm_lru_add_drain,
+
+	TP_PROTO(int cpu, unsigned int nr),
+
+	TP_ARGS(cpu, nr),
+
+	TP_STRUCT__entry(
+		__field(int, cpu )
+		__field(unsigned int, nr )
+	),
+
+	TP_fast_assign(
+		__entry->cpu = cpu;
+		__entry->nr = nr;
+	),
+
+	TP_printk("cpu=%d nr=%u", __entry->cpu, __entry->nr)
+);
+
+TRACE_EVENT(mm_lru_add_drain_all,
+
+	TP_PROTO(bool force_all_cpus),
+
+	TP_ARGS(force_all_cpus),
+
+	TP_STRUCT__entry(
+		__field(bool, force_all_cpus )
+	),
+
+	TP_fast_assign(
+		__entry->force_all_cpus = force_all_cpus;
+	),
+
+	TP_printk("force_all_cpus=%s",
+		  __entry->force_all_cpus ? "true" : "false")
+);
+
 #endif /* _TRACE_PAGEMAP_H */
 
 /* This part must be outside protection */
diff --git a/mm/swap.c b/mm/swap.c
index 588f50d8f1a8..e14b7612f896 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -694,9 +694,12 @@ void lru_add_drain_cpu(int cpu)
 {
 	struct cpu_fbatches *fbatches = &per_cpu(cpu_fbatches, cpu);
 	struct folio_batch *fbatch = &fbatches->lru_add;
+	unsigned int nr_folios_add = folio_batch_count(fbatch);
 
-	if (folio_batch_count(fbatch))
+	if (nr_folios_add) {
 		folio_batch_move_lru(fbatch, lru_add);
+		trace_mm_lru_add_drain(cpu, nr_folios_add);
+	}
 
 	fbatch = &fbatches->lru_move_tail;
 	/* Disabling interrupts below acts as a compiler barrier. */
@@ -869,6 +872,8 @@ static inline void __lru_add_drain_all(bool force_all_cpus)
 	if (WARN_ON(!mm_percpu_wq))
 		return;
 
+	trace_mm_lru_add_drain_all(force_all_cpus);
+
 	/*
 	 * Guarantee folio_batch counter stores visible by this CPU
 	 * are visible to other CPUs before loading the current drain
--
2.54.0
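To make the TRACE_EVENT() pieces in this patch concrete: TP_STRUCT__entry describes the fixed record stored in the ring buffer. TP_fast_assign fills that record when the event fires. TP_printk formats the record only when the trace is read. The following userspace analogue models that split for both new events. It is an illustration, not the code the kernel macros generate, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Analogue of TP_STRUCT__entry(): the fixed-size record kept per event. */
struct lru_add_drain_entry {
	int cpu;
	unsigned int nr;
};

struct lru_add_drain_all_entry {
	bool force_all_cpus;
};

/* Analogue of TP_fast_assign(): copy the arguments, no formatting at event time. */
void drain_assign(struct lru_add_drain_entry *e, int cpu, unsigned int nr)
{
	e->cpu = cpu;
	e->nr = nr;
}

void drain_all_assign(struct lru_add_drain_all_entry *e, bool force_all_cpus)
{
	e->force_all_cpus = force_all_cpus;
}

/* Analogue of TP_printk(): format the stored record when it is read. */
int drain_print(char *buf, size_t len, const struct lru_add_drain_entry *e)
{
	assert(buf != NULL && e != NULL);
	return snprintf(buf, len, "cpu=%d nr=%u", e->cpu, e->nr);
}

int drain_all_print(char *buf, size_t len,
		    const struct lru_add_drain_all_entry *e)
{
	assert(buf != NULL && e != NULL);
	return snprintf(buf, len, "force_all_cpus=%s",
			e->force_all_cpus ? "true" : "false");
}
```

This split keeps the event-time cost of the tracepoint to a few stores. String formatting happens only on the read side.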
* Re: [PATCH v3] mm/lruvec: trace LRU add drains and drain-all requests
2026-06-10 19:52 [PATCH v3] mm/lruvec: trace LRU add drains and drain-all requests JP Kobryn
@ 2026-06-10 21:03 ` Barry Song
2026-06-10 21:13 ` Shakeel Butt
2026-06-17 11:11 ` David Hildenbrand (Arm)
2 siblings, 0 replies; 6+ messages in thread
From: Barry Song @ 2026-06-10 21:03 UTC
To: JP Kobryn
Cc: linux-mm, willy, shakeel.butt, usama.arif, akpm, vbabka, mhocko,
rostedt, mhiramat, mathieu.desnoyers, kasong, qi.zheng,
axelrasmussen, yuanchu, weixugc, chrisl, shikemeng, nphamcs,
baoquan.he, youngjun.park, linux-kernel, linux-trace-kernel
On Thu, Jun 11, 2026 at 3:53 AM JP Kobryn <jp.kobryn@linux.dev> wrote:
> [...]
> Signed-off-by: JP Kobryn <jp.kobryn@linux.dev>
Reviewed-by: Barry Song <baohua@kernel.org>
Some minor nits:
[...]
> +	unsigned int nr_folios_add = folio_batch_count(fbatch);
>
> -	if (folio_batch_count(fbatch))
> +	if (nr_folios_add) {
>  		folio_batch_move_lru(fbatch, lru_add);
> +		trace_mm_lru_add_drain(cpu, nr_folios_add);
> +	}
Would "nr_folios" work here, given the surrounding lru_add context?
Alternatively, nr_folios_added might make the meaning a little clearer.
Best Regards
Barry
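On the naming nit: whatever the variable is called, the patch has to read the count before calling folio_batch_move_lru(). The move is expected to leave the batch empty, so a later folio_batch_count() would report 0. The toy userspace model below shows that ordering using nr_folios, one of the names suggested above. toy_batch, TOY_BATCH_SIZE and the toy_* functions are hypothetical stand-ins for the kernel types, not kernel API.

```c
#include <assert.h>

#define TOY_BATCH_SIZE 31 /* hypothetical capacity for this model */

struct toy_batch {
	unsigned int nr;
	int items[TOY_BATCH_SIZE];
};

/* What the toy tracepoint last reported, so the ordering can be checked. */
static unsigned int last_traced_nr;
static int trace_calls;

static void toy_trace_drain(int cpu, unsigned int nr)
{
	(void)cpu;
	last_traced_nr = nr;
	trace_calls++;
}

void toy_batch_add(struct toy_batch *b, int item)
{
	assert(b->nr < TOY_BATCH_SIZE); /* this model does not handle a full batch */
	b->items[b->nr++] = item;
}

/* Stand-in for folio_batch_move_lru(): hand off the items, then empty the batch. */
static void toy_move(struct toy_batch *b)
{
	b->nr = 0;
}

/* Same shape as the patched lru_add_drain_cpu(): sample, move, then trace. */
void toy_drain_cpu(struct toy_batch *b, int cpu)
{
	unsigned int nr_folios = b->nr; /* must be read before toy_move() */

	if (nr_folios) {
		toy_move(b);
		toy_trace_drain(cpu, nr_folios);
	}
}
```

Reading b->nr after toy_move() instead would make every event report nr=0. An empty batch produces no event, matching the `if` in the patch.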
* Re: [PATCH v3] mm/lruvec: trace LRU add drains and drain-all requests
2026-06-10 19:52 [PATCH v3] mm/lruvec: trace LRU add drains and drain-all requests JP Kobryn
2026-06-10 21:03 ` Barry Song
@ 2026-06-10 21:13 ` Shakeel Butt
2026-06-17 11:11 ` David Hildenbrand (Arm)
2 siblings, 0 replies; 6+ messages in thread
From: Shakeel Butt @ 2026-06-10 21:13 UTC
To: JP Kobryn
Cc: linux-mm, willy, usama.arif, akpm, vbabka, mhocko, rostedt,
mhiramat, mathieu.desnoyers, kasong, qi.zheng, baohua,
axelrasmussen, yuanchu, weixugc, chrisl, shikemeng, nphamcs,
baoquan.he, youngjun.park, linux-kernel, linux-trace-kernel
On Wed, Jun 10, 2026 at 12:52:20PM -0700, JP Kobryn wrote:
> [...]
> Signed-off-by: JP Kobryn <jp.kobryn@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
* Re: [PATCH v3] mm/lruvec: trace LRU add drains and drain-all requests
2026-06-10 19:52 [PATCH v3] mm/lruvec: trace LRU add drains and drain-all requests JP Kobryn
2026-06-10 21:03 ` Barry Song
2026-06-10 21:13 ` Shakeel Butt
@ 2026-06-17 11:11 ` David Hildenbrand (Arm)
2026-06-17 15:03 ` Shakeel Butt
2 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-17 11:11 UTC
To: JP Kobryn, linux-mm, willy, shakeel.butt, usama.arif, akpm,
vbabka, mhocko, rostedt, mhiramat, mathieu.desnoyers, kasong,
qi.zheng, baohua, axelrasmussen, yuanchu, weixugc, chrisl,
shikemeng, nphamcs, baoquan.he, youngjun.park
Cc: linux-kernel, linux-trace-kernel
On 6/10/26 21:52, JP Kobryn wrote:
> [...]
Given that trace events can quickly become stable ABI [1], are we really sure we
want to add this?
[1] https://lore.kernel.org/r/20260603130006.7d2c4a62@gandalf.local.home
--
Cheers,
David
* Re: [PATCH v3] mm/lruvec: trace LRU add drains and drain-all requests
2026-06-17 11:11 ` David Hildenbrand (Arm)
@ 2026-06-17 15:03 ` Shakeel Butt
2026-06-17 18:18 ` Vlastimil Babka (SUSE)
0 siblings, 1 reply; 6+ messages in thread
From: Shakeel Butt @ 2026-06-17 15:03 UTC
To: David Hildenbrand (Arm)
Cc: JP Kobryn, linux-mm, willy, usama.arif, akpm, vbabka, mhocko,
rostedt, mhiramat, mathieu.desnoyers, kasong, qi.zheng, baohua,
axelrasmussen, yuanchu, weixugc, chrisl, shikemeng, nphamcs,
baoquan.he, youngjun.park, linux-kernel, linux-trace-kernel
On Wed, Jun 17, 2026 at 01:11:16PM +0200, David Hildenbrand (Arm) wrote:
> On 6/10/26 21:52, JP Kobryn wrote:
> > [...]
>
> Given that trace events can quickly become stable ABI [1], are we really sure we
> want to add this?
Yes, I think so, as this is useful for getting insight into LRU cache draining.
Whether trace events are stable is secondary IMHO. If in the future we
rearchitect LRU page handling so that there is no cache draining anymore, we
can make these no-ops.
>
> [1] https://lore.kernel.org/r/20260603130006.7d2c4a62@gandalf.local.home
>
> --
> Cheers,
>
> David
* Re: [PATCH v3] mm/lruvec: trace LRU add drains and drain-all requests
2026-06-17 15:03 ` Shakeel Butt
@ 2026-06-17 18:18 ` Vlastimil Babka (SUSE)
0 siblings, 0 replies; 6+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-06-17 18:18 UTC
To: Shakeel Butt, David Hildenbrand (Arm)
Cc: JP Kobryn, linux-mm, willy, usama.arif, akpm, mhocko, rostedt,
mhiramat, mathieu.desnoyers, kasong, qi.zheng, baohua,
axelrasmussen, yuanchu, weixugc, chrisl, shikemeng, nphamcs,
baoquan.he, youngjun.park, linux-kernel, linux-trace-kernel
On 6/17/26 17:03, Shakeel Butt wrote:
> On Wed, Jun 17, 2026 at 01:11:16PM +0200, David Hildenbrand (Arm) wrote:
>> On 6/10/26 21:52, JP Kobryn wrote:
>> > [...]
>>
>> Given that trace events can quickly become stable ABI [1], are we really sure we
>> want to add this?
>
> Yes, I think so, as this is useful for getting insight into LRU cache draining.
> Whether trace events are stable is secondary IMHO. If in the future we
> rearchitect LRU page handling so that there is no cache draining anymore, we
> can make these no-ops.
Yeah, and I don't recall a change to an mm tracepoint ever breaking someone
who then complained and made us revert it. These are niche enough, so I
think the risk is low.
>>
>> [1] https://lore.kernel.org/r/20260603130006.7d2c4a62@gandalf.local.home
>>
>> --
>> Cheers,
>>
>> David
Thread overview: 6+ messages
2026-06-10 19:52 [PATCH v3] mm/lruvec: trace LRU add drains and drain-all requests JP Kobryn
2026-06-10 21:03 ` Barry Song
2026-06-10 21:13 ` Shakeel Butt
2026-06-17 11:11 ` David Hildenbrand (Arm)
2026-06-17 15:03 ` Shakeel Butt
2026-06-17 18:18 ` Vlastimil Babka (SUSE)