linux-mm.kvack.org archive mirror
* [PATCH] Add accumulated call counter for memory allocation profiling
@ 2024-06-17 15:32 David Wang
  2024-06-30 19:33 ` Suren Baghdasaryan
  0 siblings, 1 reply; 7+ messages in thread
From: David Wang @ 2024-06-17 15:32 UTC (permalink / raw)
  To: surenb, kent.overstreet, akpm; +Cc: linux-mm, linux-kernel, David Wang

An accumulated call counter can be used to evaluate the rate
of memory allocation via delta(counter)/delta(time).
This metric can help analyze performance behavior,
e.g. when tuning cache sizes.

Signed-off-by: David Wang <00107082@163.com>
---
 include/linux/alloc_tag.h | 11 +++++++----
 lib/alloc_tag.c           |  7 +++----
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index abd24016a900..62734244c0b9 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -18,6 +18,7 @@
 struct alloc_tag_counters {
 	u64 bytes;
 	u64 calls;
+	u64 accu_calls;
 };
 
 /*
@@ -102,14 +103,15 @@ static inline bool mem_alloc_profiling_enabled(void)
 
 static inline struct alloc_tag_counters alloc_tag_read(struct alloc_tag *tag)
 {
-	struct alloc_tag_counters v = { 0, 0 };
+	struct alloc_tag_counters v = { 0, 0, 0 };
 	struct alloc_tag_counters *counter;
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		counter = per_cpu_ptr(tag->counters, cpu);
-		v.bytes += counter->bytes;
-		v.calls += counter->calls;
+		counter		= per_cpu_ptr(tag->counters, cpu);
+		v.bytes		+= counter->bytes;
+		v.calls		+= counter->calls;
+		v.accu_calls	+= counter->accu_calls;
 	}
 
 	return v;
@@ -145,6 +147,7 @@ static inline void __alloc_tag_ref_set(union codetag_ref *ref, struct alloc_tag
 	 * counter because when we free each part the counter will be decremented.
 	 */
 	this_cpu_inc(tag->counters->calls);
+	this_cpu_inc(tag->counters->accu_calls);
 }
 
 static inline void alloc_tag_ref_set(union codetag_ref *ref, struct alloc_tag *tag)
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 11ed973ac359..c4059362d828 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -66,8 +66,8 @@ static void allocinfo_stop(struct seq_file *m, void *arg)
 static void print_allocinfo_header(struct seq_buf *buf)
 {
 	/* Output format version, so we can change it. */
-	seq_buf_printf(buf, "allocinfo - version: 1.0\n");
-	seq_buf_printf(buf, "#     <size>  <calls> <tag info>\n");
+	seq_buf_printf(buf, "allocinfo - version: 1.1\n");
+	seq_buf_printf(buf, "#     <size>  <calls> <tag info> <accumulated calls>\n");
 }
 
 static void alloc_tag_to_text(struct seq_buf *out, struct codetag *ct)
@@ -78,8 +78,7 @@ static void alloc_tag_to_text(struct seq_buf *out, struct codetag *ct)
 
 	seq_buf_printf(out, "%12lli %8llu ", bytes, counter.calls);
 	codetag_to_text(out, ct);
-	seq_buf_putc(out, ' ');
-	seq_buf_putc(out, '\n');
+	seq_buf_printf(out, " %llu\n", counter.accu_calls);
 }
 
 static int allocinfo_show(struct seq_file *m, void *arg)
-- 
2.39.2




* Re: [PATCH] Add accumulated call counter for memory allocation profiling
  2024-06-17 15:32 [PATCH] Add accumulated call counter for memory allocation profiling David Wang
@ 2024-06-30 19:33 ` Suren Baghdasaryan
  2024-06-30 19:52   ` Kent Overstreet
  2024-07-01  2:23   ` David Wang
  0 siblings, 2 replies; 7+ messages in thread
From: Suren Baghdasaryan @ 2024-06-30 19:33 UTC (permalink / raw)
  To: David Wang; +Cc: kent.overstreet, akpm, linux-mm, linux-kernel

On Mon, Jun 17, 2024 at 8:33 AM David Wang <00107082@163.com> wrote:
>
> An accumulated call counter can be used to evaluate the rate
> of memory allocation via delta(counter)/delta(time).
> This metric can help analyze performance behavior,
> e.g. when tuning cache sizes.

Sorry for the delay, David.
IIUC with this counter you can identify the number of allocations ever
made from a specific code location. Could you please clarify the usage
a bit more? Is the goal to see which locations are the most active and
the rate at which allocations are made there? How will that
information be used?
I'm a bit cautious here because each counter will take more space and
use some additional cpu cycles.
Thanks,
Suren.
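To make the difference between the two counters concrete, here is a minimal userspace model (not kernel code and not part of the patch). It assumes, as the patch's context lines suggest, that an allocation bumps `calls` and a free decrements it, so `calls` tracks live allocations, while the new `accu_calls` is only ever incremented and therefore counts every allocation ever made from the tag. `NR_CPUS_MODEL`, `struct model_tag` and the `model_*` helpers are hypothetical stand-ins for the kernel's per-CPU machinery.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model; NR_CPUS_MODEL stands in for possible CPUs. */
#define NR_CPUS_MODEL 4

struct alloc_tag_counters {
	uint64_t bytes;
	uint64_t calls;		/* live allocations: +1 on alloc, -1 on free */
	uint64_t accu_calls;	/* allocations ever made: never decremented */
};

/* One counter slot per CPU, like tag->counters in the kernel. */
struct model_tag {
	struct alloc_tag_counters counters[NR_CPUS_MODEL];
};

/* Allocation path: both call counters go up (cf. __alloc_tag_ref_set()). */
static void model_alloc(struct model_tag *tag, int cpu, uint64_t bytes)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	tag->counters[cpu].bytes += bytes;
	tag->counters[cpu].calls++;
	tag->counters[cpu].accu_calls++;
}

/* Free path: only bytes and calls go down; accu_calls is left alone. */
static void model_free(struct model_tag *tag, int cpu, uint64_t bytes)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	tag->counters[cpu].bytes -= bytes;	/* a single slot may wrap */
	tag->counters[cpu].calls--;
}

/* Read path: sum every CPU's slot (cf. alloc_tag_read()). */
static struct alloc_tag_counters model_read(const struct model_tag *tag)
{
	struct alloc_tag_counters v = { 0, 0, 0 };

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		v.bytes += tag->counters[cpu].bytes;
		v.calls += tag->counters[cpu].calls;
		v.accu_calls += tag->counters[cpu].accu_calls;
	}
	return v;
}
```

As in `alloc_tag_read()`, only the sum across CPUs is meaningful: a free that runs on a different CPU than its allocation makes one slot wrap below zero, and unsigned arithmetic undoes that when the slots are added together. After two allocations and one free, `calls` reads 1 while `accu_calls` reads 2.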




* Re: [PATCH] Add accumulated call counter for memory allocation profiling
  2024-06-30 19:33 ` Suren Baghdasaryan
@ 2024-06-30 19:52   ` Kent Overstreet
  2024-07-01  2:23   ` David Wang
  1 sibling, 0 replies; 7+ messages in thread
From: Kent Overstreet @ 2024-06-30 19:52 UTC (permalink / raw)
  To: Suren Baghdasaryan; +Cc: David Wang, akpm, linux-mm, linux-kernel

On Sun, Jun 30, 2024 at 12:33:14PM GMT, Suren Baghdasaryan wrote:
> On Mon, Jun 17, 2024 at 8:33 AM David Wang <00107082@163.com> wrote:
> >
> > An accumulated call counter can be used to evaluate the rate
> > of memory allocation via delta(counter)/delta(time).
> > This metric can help analyze performance behavior,
> > e.g. when tuning cache sizes.
> 
> Sorry for the delay, David.
> IIUC with this counter you can identify the number of allocations ever
> made from a specific code location. Could you please clarify the usage
> a bit more? Is the goal to see which locations are the most active and
> the rate at which allocations are made there? How will that
> information be used?
> I'm a bit cautious here because each counter will take more space and
> use some additional cpu cycles.
> Thanks,
> Suren.

Maybe behind another kconfig option?



* Re:Re: [PATCH] Add accumulated call counter for memory allocation profiling
  2024-06-30 19:33 ` Suren Baghdasaryan
  2024-06-30 19:52   ` Kent Overstreet
@ 2024-07-01  2:23   ` David Wang
  2024-07-01 21:58     ` Kent Overstreet
  1 sibling, 1 reply; 7+ messages in thread
From: David Wang @ 2024-07-01  2:23 UTC (permalink / raw)
  To: Suren Baghdasaryan; +Cc: kent.overstreet, akpm, linux-mm, linux-kernel

Hi Suren,

At 2024-07-01 03:33:14, "Suren Baghdasaryan" <surenb@google.com> wrote:
>On Mon, Jun 17, 2024 at 8:33 AM David Wang <00107082@163.com> wrote:
>>
>> An accumulated call counter can be used to evaluate the rate
>> of memory allocation via delta(counter)/delta(time).
>> This metric can help analyze performance behavior,
>> e.g. when tuning cache sizes.
>
>Sorry for the delay, David.
>IIUC with this counter you can identify the number of allocations ever
>made from a specific code location. Could you please clarify the usage
>a bit more? Is the goal to see which locations are the most active and
>the rate at which allocations are made there? How will that
>information be used?

Cumulative counters can be sampled with timestamps: say a monitoring tool reads value V1 at time T1,
then, one sampling interval later, reads value V2 at time T2. The average allocation rate over that interval
is then (V2-V1)/(T2-T1). (The accuracy depends on the sampling interval.)

This information "may" help identify where memory allocation is unnecessarily frequent,
so that some performance can be gained by making fewer memory allocations.
The performance "gain" is just a guess; I do not have a valid example.
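The sampling arithmetic described above can be sketched in C as follows. This is a hypothetical helper for a monitoring tool, not code from the thread; `struct sample` and `alloc_rate()` are illustrative names, and the timestamps are assumed to come from a monotonic clock, in seconds.

```c
#include <assert.h>
#include <stdint.h>

/* One reading of a tag's cumulative counter (hypothetical tool-side type). */
struct sample {
	double t_sec;		/* timestamp in seconds, from a monotonic clock */
	uint64_t accu_calls;	/* accumulated calls read from /proc/allocinfo */
};

/*
 * Average allocation rate in calls per second between two samples:
 * (V2 - V1) / (T2 - T1). Unsigned subtraction also handles a single
 * 64-bit wraparound between the two samples.
 */
static double alloc_rate(struct sample s1, struct sample s2)
{
	uint64_t delta;

	assert(s2.t_sec > s1.t_sec);
	delta = s2.accu_calls - s1.accu_calls;
	return (double)delta / (s2.t_sec - s1.t_sec);
}
```

This only works on a monotonic counter such as the proposed `accu_calls`. The existing `calls` column drops on every free, so its delta measures net growth in live allocations rather than the allocation rate. A counter that goes backwards (for example after a reboot) would still need separate handling, as Prometheus does for counter resets.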



>I'm a bit cautious here because each counter will take more space and
>use some additional cpu cycles.
>Thanks,
>Suren.
>



Thanks~!
David


* Re: [PATCH] Add accumulated call counter for memory allocation profiling
  2024-07-01  2:23   ` David Wang
@ 2024-07-01 21:58     ` Kent Overstreet
  2024-09-12  2:27       ` David Wang
  0 siblings, 1 reply; 7+ messages in thread
From: Kent Overstreet @ 2024-07-01 21:58 UTC (permalink / raw)
  To: David Wang; +Cc: Suren Baghdasaryan, akpm, linux-mm, linux-kernel

On Mon, Jul 01, 2024 at 10:23:32AM GMT, David Wang wrote:
> Hi Suren,
> 
> At 2024-07-01 03:33:14, "Suren Baghdasaryan" <surenb@google.com> wrote:
> >On Mon, Jun 17, 2024 at 8:33 AM David Wang <00107082@163.com> wrote:
> >>
> >> An accumulated call counter can be used to evaluate the rate
> >> of memory allocation via delta(counter)/delta(time).
> >> This metric can help analyze performance behavior,
> >> e.g. when tuning cache sizes.
> >
> >Sorry for the delay, David.
> >IIUC with this counter you can identify the number of allocations ever
> >made from a specific code location. Could you please clarify the usage
> >a bit more? Is the goal to see which locations are the most active and
> >the rate at which allocations are made there? How will that
> >information be used?
>
> Cumulative counters can be sampled with timestamps: say a monitoring tool reads value V1 at time T1,
> then, one sampling interval later, reads value V2 at time T2. The average allocation rate over that interval
> is then (V2-V1)/(T2-T1). (The accuracy depends on the sampling interval.)
>
> This information "may" help identify where memory allocation is unnecessarily frequent,
> so that some performance can be gained by making fewer memory allocations.
> The performance "gain" is just a guess; I do not have a valid example.

Easier to just run perf...



* Re: [PATCH] Add accumulated call counter for memory allocation profiling
  2024-07-01 21:58     ` Kent Overstreet
@ 2024-09-12  2:27       ` David Wang
  2024-09-12 16:12         ` Suren Baghdasaryan
  0 siblings, 1 reply; 7+ messages in thread
From: David Wang @ 2024-09-12  2:27 UTC (permalink / raw)
  To: kent.overstreet, surenb; +Cc: akpm, linux-kernel, linux-mm

At 2024-07-02 05:58:50, "Kent Overstreet" <kent.overstreet@linux.dev> wrote:
>On Mon, Jul 01, 2024 at 10:23:32AM GMT, David Wang wrote:
>> Hi Suren,
>> 
>> At 2024-07-01 03:33:14, "Suren Baghdasaryan" <surenb@google.com> wrote:
>> >On Mon, Jun 17, 2024 at 8:33 AM David Wang <00107082@163.com> wrote:
>> >>
>> >> An accumulated call counter can be used to evaluate the rate
>> >> of memory allocation via delta(counter)/delta(time).
>> >> This metric can help analyze performance behavior,
>> >> e.g. when tuning cache sizes.
>> >
>> >Sorry for the delay, David.
>> >IIUC with this counter you can identify the number of allocations ever
>> >made from a specific code location. Could you please clarify the usage
>> >a bit more? Is the goal to see which locations are the most active and
>> >the rate at which allocations are made there? How will that
>> >information be used?
>>
>> Cumulative counters can be sampled with timestamps: say a monitoring tool reads value V1 at time T1,
>> then, one sampling interval later, reads value V2 at time T2. The average allocation rate over that interval
>> is then (V2-V1)/(T2-T1). (The accuracy depends on the sampling interval.)
>>
>> This information "may" help identify where memory allocation is unnecessarily frequent,
>> so that some performance can be gained by making fewer memory allocations.
>> The performance "gain" is just a guess; I do not have a valid example.
>
>Easier to just run perf...

Hi, 

To Kent:
It is a funny coincidence: I came back to this while trying to debug a performance issue in bcachefs :)

Yes, it is true that a performance bottleneck can be identified with perf, but perf normally
is not running continuously (well, there are some continuous profiling projects out there).
Also, memory allocation is normally not the biggest bottleneck,
so its impact may not be easily picked up by perf.

Admittedly, in the case of https://lore.kernel.org/lkml/20240906154354.61915-1-00107082@163.com/,
the memory allocation was picked up by perf.
But with this patch, it is easier to see that the memory allocation behavior is quite different:
when performance was bad, the average rate for
"fs/bcachefs/io_write.c:113 func:__bio_alloc_page_pool" was 400k/s,
while when performance was good, the rate was below 200/s.

(I have a sampling tool that collects /proc/allocinfo and stores the data in Prometheus;
the rate is calculated and plotted with the Prometheus expression:
irate(mem_profiling_count_total{file=~"fs/bcachefs.*", func="__bio_alloc_page_pool"}[5m]))

I hope this is a valid example demonstrating the usefulness of cumulative memory allocation
counters for investigating performance issues.
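For illustration, a collector like the one described above has to split each line of the proposed version 1.1 output, `<size> <calls> <tag info> <accumulated calls>`, where the tag info itself contains spaces. The following is a hypothetical parser (`struct allocinfo_entry` and `parse_allocinfo_line()` are not from the thread). It assumes the layout printed by the patched `alloc_tag_to_text()`, so the accumulated count is always the last space-separated field.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Fields of one "allocinfo - version: 1.1" line (hypothetical tool-side type). */
struct allocinfo_entry {
	long long bytes;		/* <size>, printed with %lli */
	unsigned long long calls;	/* <calls>: live allocations */
	char tag[256];			/* <tag info>, e.g. "file.c:113 func:foo" */
	unsigned long long accu_calls;	/* <accumulated calls> */
};

/* Returns 0 on success, -1 if the line does not match the expected layout. */
static int parse_allocinfo_line(const char *line, struct allocinfo_entry *e)
{
	int tag_start = 0;
	const char *last_space;
	size_t tag_len;
	char *end;

	assert(line != NULL && e != NULL);

	/* %n records how many characters the two numeric fields consumed. */
	if (sscanf(line, "%lld %llu %n", &e->bytes, &e->calls, &tag_start) != 2)
		return -1;

	/* The accumulated count is the last space-separated field. */
	last_space = strrchr(line, ' ');
	if (last_space == NULL || last_space < line + tag_start)
		return -1;
	e->accu_calls = strtoull(last_space + 1, &end, 10);
	if (end == last_space + 1 || (*end != '\0' && *end != '\n'))
		return -1;

	/* Everything between the numbers and the last space is the tag info. */
	tag_len = (size_t)(last_space - (line + tag_start));
	if (tag_len == 0 || tag_len >= sizeof(e->tag))
		return -1;
	memcpy(e->tag, line + tag_start, tag_len);
	e->tag[tag_len] = '\0';
	return 0;
}
```

A real collector would skip the two header lines (the version line and the `#` column line) before calling it, then export `accu_calls` per tag as a counter so that the rate can be computed on the Prometheus side.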


Thanks
David

 




* Re: [PATCH] Add accumulated call counter for memory allocation profiling
  2024-09-12  2:27       ` David Wang
@ 2024-09-12 16:12         ` Suren Baghdasaryan
  0 siblings, 0 replies; 7+ messages in thread
From: Suren Baghdasaryan @ 2024-09-12 16:12 UTC (permalink / raw)
  To: David Wang, Yu Zhao; +Cc: kent.overstreet, akpm, linux-kernel, linux-mm

On Wed, Sep 11, 2024 at 7:28 PM David Wang <00107082@163.com> wrote:
>
> At 2024-07-02 05:58:50, "Kent Overstreet" <kent.overstreet@linux.dev> wrote:
> >On Mon, Jul 01, 2024 at 10:23:32AM GMT, David Wang wrote:
> >> Hi Suren,
> >>
> >> At 2024-07-01 03:33:14, "Suren Baghdasaryan" <surenb@google.com> wrote:
> >> >On Mon, Jun 17, 2024 at 8:33 AM David Wang <00107082@163.com> wrote:
> >> >>
> >> >> An accumulated call counter can be used to evaluate the rate
> >> >> of memory allocation via delta(counter)/delta(time).
> >> >> This metric can help analyze performance behavior,
> >> >> e.g. when tuning cache sizes.
> >> >
> >> >Sorry for the delay, David.
> >> >IIUC with this counter you can identify the number of allocations ever
> >> >made from a specific code location. Could you please clarify the usage
> >> >a bit more? Is the goal to see which locations are the most active and
> >> >the rate at which allocations are made there? How will that
> >> >information be used?
> >>
> >> Cumulative counters can be sampled with timestamps: say a monitoring tool reads value V1 at time T1,
> >> then, one sampling interval later, reads value V2 at time T2. The average allocation rate over that interval
> >> is then (V2-V1)/(T2-T1). (The accuracy depends on the sampling interval.)
> >>
> >> This information "may" help identify where memory allocation is unnecessarily frequent,
> >> so that some performance can be gained by making fewer memory allocations.
> >> The performance "gain" is just a guess; I do not have a valid example.
> >
> >Easier to just run perf...
>
> Hi,
>
> To Kent:
> It is a funny coincidence: I came back to this while trying to debug a performance issue in bcachefs :)
>
> Yes, it is true that a performance bottleneck can be identified with perf, but perf normally
> is not running continuously (well, there are some continuous profiling projects out there).
> Also, memory allocation is normally not the biggest bottleneck,
> so its impact may not be easily picked up by perf.
>
> Admittedly, in the case of https://lore.kernel.org/lkml/20240906154354.61915-1-00107082@163.com/,
> the memory allocation was picked up by perf.
> But with this patch, it is easier to see that the memory allocation behavior is quite different:
> when performance was bad, the average rate for
> "fs/bcachefs/io_write.c:113 func:__bio_alloc_page_pool" was 400k/s,
> while when performance was good, the rate was below 200/s.
>
> (I have a sampling tool that collects /proc/allocinfo and stores the data in Prometheus;
> the rate is calculated and plotted with the Prometheus expression:
> irate(mem_profiling_count_total{file=~"fs/bcachefs.*", func="__bio_alloc_page_pool"}[5m]))
>
> I hope this is a valid example demonstrating the usefulness of cumulative memory allocation
> counters for investigating performance issues.

Hi David,
I agree with Kent that this feature should be behind a kconfig flag.
We don't want to impose the overhead on users who do not need this
feature.
Thanks,
Suren.




Thread overview: 7+ messages
2024-06-17 15:32 [PATCH] Add accumulated call counter for memory allocation profiling David Wang
2024-06-30 19:33 ` Suren Baghdasaryan
2024-06-30 19:52   ` Kent Overstreet
2024-07-01  2:23   ` David Wang
2024-07-01 21:58     ` Kent Overstreet
2024-09-12  2:27       ` David Wang
2024-09-12 16:12         ` Suren Baghdasaryan
