Re: [PATCH v3 0/2] uprobes: Improve scalability by reducing the contention on siglock

From: "Liao, Chang" <liaochang1@huawei.com>
To: <oleg@redhat.com>
Cc: <linux-kernel@vger.kernel.org>,
	<linux-trace-kernel@vger.kernel.org>,
	<linux-perf-users@vger.kernel.org>, <bpf@vger.kernel.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrii Nakryiko <andrii@kernel.org>
Subject: Re: [PATCH v3 0/2] uprobes: Improve scalability by reducing the contention on siglock
Date: Sat, 14 Sep 2024 10:53:01 +0800
Message-ID: <cfa88a34-617b-9a24-a648-55262a4e8a4c@huawei.com>
In-Reply-To: <20240815014629.2685155-1-liaochang1@huawei.com>

Hi, Oleg

Kindly ping.

This series has been pending for a month. Is there any issue I have overlooked?

Thanks.

On 2024/8/15 9:46, Liao Chang wrote:
> Profiling the BPF selftest benchmark on an ARM64 platform reveals that
> significant contention on current->sighand->siglock is the scalability
> bottleneck. The reason is straightforward: all producer threads of the
> benchmark have to contend for this spinlock in order to restore the
> TIF_SIGPENDING bit in thread_info, which might have been cleared in
> uprobe_deny_signal().
> 
> The contention on current->sighand->siglock is unnecessary, and this
> series removes it entirely. I've used the script developed by Andrii in
> [1] to run the benchmark. The CPU used was a Kunpeng916 (Hi1616) with
> 4 NUMA nodes and 64 cores@2.4GHz, running the kernel from the next tree
> plus the optimization in [2] for get_xol_insn_slot().
> 
> before-opt
> ----------
> uprobe-nop      ( 1 cpus):    0.907 ± 0.003M/s  (  0.907M/s/cpu)
> uprobe-nop      ( 2 cpus):    1.676 ± 0.008M/s  (  0.838M/s/cpu)
> uprobe-nop      ( 4 cpus):    3.210 ± 0.003M/s  (  0.802M/s/cpu)
> uprobe-nop      ( 8 cpus):    4.457 ± 0.003M/s  (  0.557M/s/cpu)
> uprobe-nop      (16 cpus):    3.724 ± 0.011M/s  (  0.233M/s/cpu)
> uprobe-nop      (32 cpus):    2.761 ± 0.003M/s  (  0.086M/s/cpu)
> uprobe-nop      (64 cpus):    1.293 ± 0.015M/s  (  0.020M/s/cpu)
> 
> uprobe-push     ( 1 cpus):    0.883 ± 0.001M/s  (  0.883M/s/cpu)
> uprobe-push     ( 2 cpus):    1.642 ± 0.005M/s  (  0.821M/s/cpu)
> uprobe-push     ( 4 cpus):    3.086 ± 0.002M/s  (  0.771M/s/cpu)
> uprobe-push     ( 8 cpus):    3.390 ± 0.003M/s  (  0.424M/s/cpu)
> uprobe-push     (16 cpus):    2.652 ± 0.005M/s  (  0.166M/s/cpu)
> uprobe-push     (32 cpus):    2.713 ± 0.005M/s  (  0.085M/s/cpu)
> uprobe-push     (64 cpus):    1.313 ± 0.009M/s  (  0.021M/s/cpu)
> 
> uprobe-ret      ( 1 cpus):    1.774 ± 0.000M/s  (  1.774M/s/cpu)
> uprobe-ret      ( 2 cpus):    3.350 ± 0.001M/s  (  1.675M/s/cpu)
> uprobe-ret      ( 4 cpus):    6.604 ± 0.000M/s  (  1.651M/s/cpu)
> uprobe-ret      ( 8 cpus):    6.706 ± 0.005M/s  (  0.838M/s/cpu)
> uprobe-ret      (16 cpus):    5.231 ± 0.001M/s  (  0.327M/s/cpu)
> uprobe-ret      (32 cpus):    5.743 ± 0.003M/s  (  0.179M/s/cpu)
> uprobe-ret      (64 cpus):    4.726 ± 0.016M/s  (  0.074M/s/cpu)
> 
> after-opt
> ---------
> uprobe-nop      ( 1 cpus):    0.985 ± 0.002M/s  (  0.985M/s/cpu)
> uprobe-nop      ( 2 cpus):    1.773 ± 0.005M/s  (  0.887M/s/cpu)
> uprobe-nop      ( 4 cpus):    3.304 ± 0.001M/s  (  0.826M/s/cpu)
> uprobe-nop      ( 8 cpus):    5.328 ± 0.002M/s  (  0.666M/s/cpu)
> uprobe-nop      (16 cpus):    6.475 ± 0.002M/s  (  0.405M/s/cpu)
> uprobe-nop      (32 cpus):    4.831 ± 0.082M/s  (  0.151M/s/cpu)
> uprobe-nop      (64 cpus):    2.564 ± 0.053M/s  (  0.040M/s/cpu)
> 
> uprobe-push     ( 1 cpus):    0.964 ± 0.001M/s  (  0.964M/s/cpu)
> uprobe-push     ( 2 cpus):    1.766 ± 0.002M/s  (  0.883M/s/cpu)
> uprobe-push     ( 4 cpus):    3.290 ± 0.009M/s  (  0.823M/s/cpu)
> uprobe-push     ( 8 cpus):    4.670 ± 0.002M/s  (  0.584M/s/cpu)
> uprobe-push     (16 cpus):    5.197 ± 0.004M/s  (  0.325M/s/cpu)
> uprobe-push     (32 cpus):    5.068 ± 0.161M/s  (  0.158M/s/cpu)
> uprobe-push     (64 cpus):    2.605 ± 0.026M/s  (  0.041M/s/cpu)
> 
> uprobe-ret      ( 1 cpus):    1.833 ± 0.001M/s  (  1.833M/s/cpu)
> uprobe-ret      ( 2 cpus):    3.384 ± 0.003M/s  (  1.692M/s/cpu)
> uprobe-ret      ( 4 cpus):    6.677 ± 0.004M/s  (  1.669M/s/cpu)
> uprobe-ret      ( 8 cpus):    6.854 ± 0.005M/s  (  0.857M/s/cpu)
> uprobe-ret      (16 cpus):    6.508 ± 0.006M/s  (  0.407M/s/cpu)
> uprobe-ret      (32 cpus):    5.793 ± 0.009M/s  (  0.181M/s/cpu)
> uprobe-ret      (64 cpus):    4.743 ± 0.016M/s  (  0.074M/s/cpu)
> 
> The benchmark results above demonstrate an obvious improvement in the
> scalability of trig-uprobe-nop and trig-uprobe-push, whose peak
> throughput rises from 4.457M/s to 6.475M/s and from 3.390M/s to
> 5.197M/s, respectively.
> 
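For anyone who wants to re-check the peak figures, here is a small sketch that recomputes them from the uprobe-nop and uprobe-push rows quoted above. The numbers are copied from the tables; the helper names `peak` and `summarize` are only for illustration and are not part of the benchmark script in [1].

```python
# Throughput in M/s, copied from the before-opt and after-opt tables above,
# keyed by the number of CPUs (producer threads).
before = {
    "uprobe-nop":  {1: 0.907, 2: 1.676, 4: 3.210, 8: 4.457, 16: 3.724, 32: 2.761, 64: 1.293},
    "uprobe-push": {1: 0.883, 2: 1.642, 4: 3.086, 8: 3.390, 16: 2.652, 32: 2.713, 64: 1.313},
}
after = {
    "uprobe-nop":  {1: 0.985, 2: 1.773, 4: 3.304, 8: 5.328, 16: 6.475, 32: 4.831, 64: 2.564},
    "uprobe-push": {1: 0.964, 2: 1.766, 4: 3.290, 8: 4.670, 16: 5.197, 32: 5.068, 64: 2.605},
}


def peak(series):
    """Return (cpus, throughput) for the highest total throughput in a series."""
    cpus = max(series, key=series.get)
    return cpus, series[cpus]


def summarize(name):
    """Compare the peak throughput of one benchmark before and after the series."""
    b_cpus, b_peak = peak(before[name])
    a_cpus, a_peak = peak(after[name])
    return {
        "before": (b_cpus, b_peak),
        "after": (a_cpus, a_peak),
        "gain": a_peak / b_peak,  # ratio of peak throughputs
    }


for name in before:
    s = summarize(name)
    print(f"{name}: peak {s['before'][1]:.3f}M/s at {s['before'][0]} cpus -> "
          f"{s['after'][1]:.3f}M/s at {s['after'][0]} cpus (x{s['gain']:.2f})")
```

Note that the peak also moves from 8 to 16 CPUs for both benchmarks, so the gain is in where the curve tops out as well as in how high it goes.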
> v3->v2:
> Renaming the flag in [2/2], s/deny_signal/signal_denied/g.
> 
> v2->v1:
> Oleg pointed out that _DENY_SIGNAL will be replaced by _ACK upon
> completion of the single-step, which leaves handle_singlestep() no
> chance to restore the removed TIF_SIGPENDING [3], as well as some other
> cases in question. So this revision uses a flag in uprobe_task to track
> the denied TIF_SIGPENDING instead of a new UPROBE_SSTEP state.
> 
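To make the flag idea easier to follow, here is a toy user-space model of it. This is not the kernel code: `ToyTask`, `deny_signal` and `finish_singlestep` are hypothetical names, and the two booleans only stand in for TIF_SIGPENDING and for the `signal_denied` flag in uprobe_task mentioned above. The point of the sketch is that the restore step reads a flag owned by the task itself rather than re-deriving the pending state.

```python
from dataclasses import dataclass


@dataclass
class ToyTask:
    """Hypothetical stand-in for a task; not a kernel structure."""
    sigpending: bool = False     # models the TIF_SIGPENDING bit
    signal_denied: bool = False  # models the per-task flag from [2/2]


def deny_signal(task: ToyTask) -> None:
    """During a single-step, hide a pending signal and remember that we did."""
    if task.sigpending:
        task.sigpending = False
        task.signal_denied = True


def finish_singlestep(task: ToyTask) -> None:
    """After the single-step, restore the bit only if it was hidden earlier."""
    if task.signal_denied:
        task.sigpending = True
        task.signal_denied = False


# A task with a pending signal: the bit is hidden, then put back.
t = ToyTask(sigpending=True)
deny_signal(t)
hidden_state = (t.sigpending, t.signal_denied)
finish_singlestep(t)
restored_state = (t.sigpending, t.signal_denied)

# A task without a pending signal: nothing is hidden, nothing is restored.
idle = ToyTask()
deny_signal(idle)
finish_singlestep(idle)
```

In this model both steps touch only the task's own state, which is the property the series relies on to drop the siglock from this path.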
> [1] https://lore.kernel.org/all/20240731214256.3588718-1-andrii@kernel.org
> [2] https://lore.kernel.org/all/20240727094405.1362496-1-liaochang1@huawei.com
> [3] https://lore.kernel.org/all/20240801082407.1618451-1-liaochang1@huawei.com
> 
> Liao Chang (2):
>   uprobes: Remove redundant spinlock in uprobe_deny_signal()
>   uprobes: Remove the spinlock within handle_singlestep()
> 
>  include/linux/uprobes.h |  1 +
>  kernel/events/uprobes.c | 10 +++++-----
>  2 files changed, 6 insertions(+), 5 deletions(-)
> 

-- 
BR
Liao, Chang

Thread overview: 12+ messages
2024-08-15  1:46 [PATCH v3 0/2] uprobes: Improve scalability by reducing the contention on siglock Liao Chang
2024-08-15  1:46 ` [PATCH v3 1/2] uprobes: Remove redundant spinlock in uprobe_deny_signal() Liao Chang
2024-10-22  4:01   ` Masami Hiramatsu
2024-08-15  1:46 ` [PATCH v3 2/2] uprobes: Remove the spinlock within handle_singlestep() Liao Chang
2024-10-22  4:01   ` Masami Hiramatsu
2024-09-14  2:53 ` Liao, Chang [this message]
2024-09-15 15:18   ` [PATCH v3 0/2] uprobes: Improve scalability by reducing the contention on siglock Oleg Nesterov
2024-09-18  2:05     ` Liao, Chang
2024-10-11 19:34       ` Andrii Nakryiko
2024-10-21 10:43         ` Liao, Chang
2024-10-21 17:18           ` Andrii Nakryiko
2024-10-22  6:18             ` Liao, Chang
