All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jiri Olsa <olsajiri@gmail.com>
To: Andrii Nakryiko <andrii@kernel.org>
Cc: linux-trace-kernel@vger.kernel.org, peterz@infradead.org,
	mingo@kernel.org, oleg@redhat.com, rostedt@goodmis.org,
	mhiramat@kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org, liaochang1@huawei.com,
	kernel-team@meta.com
Subject: Re: [PATCH perf/core 4/4] uprobes: reuse return_instances between multiple uretprobes within task
Date: Fri, 6 Dec 2024 15:09:03 +0100	[thread overview]
Message-ID: <Z1MFfyNLRilb8G6r@krava> (raw)
In-Reply-To: <20241206002417.3295533-5-andrii@kernel.org>

On Thu, Dec 05, 2024 at 04:24:17PM -0800, Andrii Nakryiko wrote:
> Instead of constantly allocating and freeing very short-lived
> struct return_instance, reuse it as much as possible within current
> task. For that, store a linked list of reusable return_instances within
> current->utask.
> 
> The only complication is that ri_timer() might be still processing such
> return_instance. And so while the main uretprobe processing logic might
> be already done with return_instance and would be OK to immediately
> reuse it for the next uretprobe instance, it's not correct to
> unconditionally reuse it just like that.
> 
> Instead we make sure that ri_timer() can't possibly be processing it by
> using seqcount_t, with ri_timer() being "a writer", while
> free_ret_instance() being "a reader". If, after we unlink return
> instance from utask->return_instances list, we know that ri_timer()
> hasn't gotten to processing utask->return_instances yet, then we can be
> sure that immediate return_instance reuse is OK, and so we put it
> onto utask->ri_pool for future (potentially, almost immediate) reuse.
> 
> This change shows improvements both in single CPU performance (by
> avoiding relatively expensive kmalloc/free combon) and in terms of
> multi-CPU scalability, where you can see that per-CPU throughput doesn't
> decline as steeply with increased number of CPUs (which were previously
> attributed to kmalloc()/free() through profiling):
> 
> BASELINE (latest perf/core)
> ===========================
> uretprobe-nop         ( 1 cpus):    1.898 ± 0.002M/s  (  1.898M/s/cpu)
> uretprobe-nop         ( 2 cpus):    3.574 ± 0.011M/s  (  1.787M/s/cpu)
> uretprobe-nop         ( 3 cpus):    5.279 ± 0.066M/s  (  1.760M/s/cpu)
> uretprobe-nop         ( 4 cpus):    6.824 ± 0.047M/s  (  1.706M/s/cpu)
> uretprobe-nop         ( 5 cpus):    8.339 ± 0.060M/s  (  1.668M/s/cpu)
> uretprobe-nop         ( 6 cpus):    9.812 ± 0.047M/s  (  1.635M/s/cpu)
> uretprobe-nop         ( 7 cpus):   11.030 ± 0.048M/s  (  1.576M/s/cpu)
> uretprobe-nop         ( 8 cpus):   12.453 ± 0.126M/s  (  1.557M/s/cpu)
> uretprobe-nop         (10 cpus):   14.838 ± 0.044M/s  (  1.484M/s/cpu)
> uretprobe-nop         (12 cpus):   17.092 ± 0.115M/s  (  1.424M/s/cpu)
> uretprobe-nop         (14 cpus):   19.576 ± 0.022M/s  (  1.398M/s/cpu)
> uretprobe-nop         (16 cpus):   22.264 ± 0.015M/s  (  1.391M/s/cpu)
> uretprobe-nop         (24 cpus):   33.534 ± 0.078M/s  (  1.397M/s/cpu)
> uretprobe-nop         (32 cpus):   43.262 ± 0.127M/s  (  1.352M/s/cpu)
> uretprobe-nop         (40 cpus):   53.252 ± 0.080M/s  (  1.331M/s/cpu)
> uretprobe-nop         (48 cpus):   55.778 ± 0.045M/s  (  1.162M/s/cpu)
> uretprobe-nop         (56 cpus):   56.850 ± 0.227M/s  (  1.015M/s/cpu)
> uretprobe-nop         (64 cpus):   62.005 ± 0.077M/s  (  0.969M/s/cpu)
> uretprobe-nop         (72 cpus):   66.445 ± 0.236M/s  (  0.923M/s/cpu)
> uretprobe-nop         (80 cpus):   68.353 ± 0.180M/s  (  0.854M/s/cpu)
> 
> THIS PATCHSET (on top of latest perf/core)
> ==========================================
> uretprobe-nop         ( 1 cpus):    2.253 ± 0.004M/s  (  2.253M/s/cpu)
> uretprobe-nop         ( 2 cpus):    4.281 ± 0.003M/s  (  2.140M/s/cpu)
> uretprobe-nop         ( 3 cpus):    6.389 ± 0.027M/s  (  2.130M/s/cpu)
> uretprobe-nop         ( 4 cpus):    8.328 ± 0.005M/s  (  2.082M/s/cpu)
> uretprobe-nop         ( 5 cpus):   10.353 ± 0.001M/s  (  2.071M/s/cpu)
> uretprobe-nop         ( 6 cpus):   12.513 ± 0.010M/s  (  2.086M/s/cpu)
> uretprobe-nop         ( 7 cpus):   14.525 ± 0.017M/s  (  2.075M/s/cpu)
> uretprobe-nop         ( 8 cpus):   15.633 ± 0.013M/s  (  1.954M/s/cpu)
> uretprobe-nop         (10 cpus):   19.532 ± 0.011M/s  (  1.953M/s/cpu)
> uretprobe-nop         (12 cpus):   21.405 ± 0.009M/s  (  1.784M/s/cpu)
> uretprobe-nop         (14 cpus):   24.857 ± 0.020M/s  (  1.776M/s/cpu)
> uretprobe-nop         (16 cpus):   26.466 ± 0.018M/s  (  1.654M/s/cpu)
> uretprobe-nop         (24 cpus):   40.513 ± 0.222M/s  (  1.688M/s/cpu)
> uretprobe-nop         (32 cpus):   54.180 ± 0.074M/s  (  1.693M/s/cpu)
> uretprobe-nop         (40 cpus):   66.100 ± 0.082M/s  (  1.652M/s/cpu)
> uretprobe-nop         (48 cpus):   70.544 ± 0.068M/s  (  1.470M/s/cpu)
> uretprobe-nop         (56 cpus):   74.494 ± 0.055M/s  (  1.330M/s/cpu)
> uretprobe-nop         (64 cpus):   79.317 ± 0.029M/s  (  1.239M/s/cpu)
> uretprobe-nop         (72 cpus):   84.875 ± 0.020M/s  (  1.179M/s/cpu)
> uretprobe-nop         (80 cpus):   92.318 ± 0.224M/s  (  1.154M/s/cpu)

nice! left few comments but overall lgtm

thanks,
jirka

  parent reply	other threads:[~2024-12-06 14:09 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-06  0:24 [PATCH perf/core 0/4] Improve performance and scalability of uretprobes Andrii Nakryiko
2024-12-06  0:24 ` [PATCH perf/core 1/4] uprobes: simplify session consumer tracking Andrii Nakryiko
2024-12-06  9:38   ` [tip: perf/core] uprobes: Simplify " tip-bot2 for Andrii Nakryiko
2024-12-06 14:07   ` [PATCH perf/core 1/4] uprobes: simplify " Jiri Olsa
2024-12-06 17:50     ` Andrii Nakryiko
2024-12-09 10:32   ` [tip: perf/core] uprobes: Simplify " tip-bot2 for Andrii Nakryiko
2024-12-09 14:55   ` tip-bot2 for Andrii Nakryiko
2024-12-06  0:24 ` [PATCH perf/core 2/4] uprobes: decouple return_instance list traversal and freeing Andrii Nakryiko
2024-12-06  9:38   ` [tip: perf/core] uprobes: Decouple " tip-bot2 for Andrii Nakryiko
2024-12-09 10:32   ` tip-bot2 for Andrii Nakryiko
2024-12-09 14:55   ` tip-bot2 for Andrii Nakryiko
2024-12-06  0:24 ` [PATCH perf/core 3/4] uprobes: ensure return_instance is detached from the list before freeing Andrii Nakryiko
2024-12-06  9:38   ` [tip: perf/core] uprobes: Ensure " tip-bot2 for Andrii Nakryiko
2024-12-09 10:32   ` tip-bot2 for Andrii Nakryiko
2024-12-09 14:55   ` tip-bot2 for Andrii Nakryiko
2024-12-06  0:24 ` [PATCH perf/core 4/4] uprobes: reuse return_instances between multiple uretprobes within task Andrii Nakryiko
2024-12-06  9:38   ` [tip: perf/core] uprobes: Reuse " tip-bot2 for Andrii Nakryiko
2024-12-06 14:07   ` [PATCH perf/core 4/4] uprobes: reuse " Jiri Olsa
2024-12-06 18:00     ` Andrii Nakryiko
2024-12-07  0:36       ` Jiri Olsa
2024-12-06 14:09   ` Jiri Olsa [this message]
2024-12-09 10:32   ` [tip: perf/core] uprobes: Reuse " tip-bot2 for Andrii Nakryiko
2024-12-09 14:55   ` tip-bot2 for Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z1MFfyNLRilb8G6r@krava \
    --to=olsajiri@gmail.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=kernel-team@meta.com \
    --cc=liaochang1@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.