From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from szxga06-in.huawei.com (szxga06-in.huawei.com [45.249.212.32])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id E626D4C92;
	Sat, 14 Sep 2024 02:53:05 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.32
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1726282389; cv=none; b=BU+asmbGYTGkP9tPEQ3NFG/uTLwIOkRW2pNQV3WYScDCovFD6aQk18vSDSwATn+RQ/sxfHQGfMNmw/eU7FR7XtUOvB7Zzy4Vb7VLTNICEypHmnOHD69duTd7y5jUe55mBurqlagG9QZGVRnIjZVS2klf5JwdsSHQlKAh5GD8miU=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1726282389; c=relaxed/simple;
	bh=CzdkWVv9/yiMtGtG4wA7P6VDMMTifj0mpX3iXE4TtcU=;
	h=Message-ID:Date:MIME-Version:Subject:From:To:CC:References:
	 In-Reply-To:Content-Type; b=sAosRz5Wnqa1LOb3NSXPuC/7ke7FXlEgYmSqAe+iuG3kmziARu34wPjc9sx4A38/vjmqcQ5hwQMQPeAUqCULOqoIHLgPJd3c2cO2ibw6uOB6EfEFis+PX7Fvw44AUcH0Z/M7Z4BqTWikhNVN/W6Pbq4wCzi9i6gWLbdx7aDsr9w=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; arc=none smtp.client-ip=45.249.212.32
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com
Received: from mail.maildlp.com (unknown [172.19.88.234])
	by szxga06-in.huawei.com (SkyGuard) with ESMTP id 4X5G262rV7z23jX3;
	Sat, 14 Sep 2024 10:53:02 +0800 (CST)
Received: from kwepemd200013.china.huawei.com (unknown [7.221.188.133])
	by mail.maildlp.com (Postfix) with ESMTPS id 1523A140134;
	Sat, 14 Sep 2024 10:53:03 +0800 (CST)
Received: from [10.67.110.108] (10.67.110.108) by
 kwepemd200013.china.huawei.com (7.221.188.133) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.2.1258.34; Sat, 14 Sep 2024 10:53:02 +0800
Message-ID: <cfa88a34-617b-9a24-a648-55262a4e8a4c@huawei.com>
Date: Sat, 14 Sep 2024 10:53:01 +0800
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id: <bpf.vger.kernel.org>
List-Subscribe: <mailto:bpf+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:bpf+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
 Thunderbird/102.11.2
Subject: Re: [PATCH v3 0/2] uprobes: Improve scalability by reducing the
 contention on siglock
From: "Liao, Chang" <liaochang1@huawei.com>
To: <oleg@redhat.com>
CC: <linux-kernel@vger.kernel.org>, <linux-trace-kernel@vger.kernel.org>,
	<linux-perf-users@vger.kernel.org>, <bpf@vger.kernel.org>, Masami Hiramatsu
	<mhiramat@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Andrii Nakryiko
	<andrii@kernel.org>
References: <20240815014629.2685155-1-liaochang1@huawei.com>
In-Reply-To: <20240815014629.2685155-1-liaochang1@huawei.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
X-ClientProxiedBy: dggems704-chm.china.huawei.com (10.3.19.181) To
 kwepemd200013.china.huawei.com (7.221.188.133)

Hi, Oleg

Kindly ping.

This series have been pending for a month. Is thre any issue I overlook?

Thanks.

在 2024/8/15 9:46, Liao Chang 写道:
> The profiling result of BPF selftest on ARM64 platform reveals the
> significant contention on the current->sighand->siglock is the
> scalability bottleneck. The reason is also very straightforward that all
> producer threads of benchmark have to contend the spinlock mentioned to
> resume the TIF_SIGPENDING bit in thread_info that might be removed in
> uprobe_deny_signal().
> 
> The contention on current->sighand->siglock is unnecessary, this series
> remove them thoroughly. I've use the script developed by Andrii in [1]
> to run benchmark. The CPU used was Kunpeng916 (Hi1616), 4 NUMA nodes,
> 64 cores@2.4GHz running the kernel on next tree + the optimization in
> [2] for get_xol_insn_slot().
> 
> before-opt
> ----------
> uprobe-nop      ( 1 cpus):    0.907 ± 0.003M/s  (  0.907M/s/cpu)
> uprobe-nop      ( 2 cpus):    1.676 ± 0.008M/s  (  0.838M/s/cpu)
> uprobe-nop      ( 4 cpus):    3.210 ± 0.003M/s  (  0.802M/s/cpu)
> uprobe-nop      ( 8 cpus):    4.457 ± 0.003M/s  (  0.557M/s/cpu)
> uprobe-nop      (16 cpus):    3.724 ± 0.011M/s  (  0.233M/s/cpu)
> uprobe-nop      (32 cpus):    2.761 ± 0.003M/s  (  0.086M/s/cpu)
> uprobe-nop      (64 cpus):    1.293 ± 0.015M/s  (  0.020M/s/cpu)
> 
> uprobe-push     ( 1 cpus):    0.883 ± 0.001M/s  (  0.883M/s/cpu)
> uprobe-push     ( 2 cpus):    1.642 ± 0.005M/s  (  0.821M/s/cpu)
> uprobe-push     ( 4 cpus):    3.086 ± 0.002M/s  (  0.771M/s/cpu)
> uprobe-push     ( 8 cpus):    3.390 ± 0.003M/s  (  0.424M/s/cpu)
> uprobe-push     (16 cpus):    2.652 ± 0.005M/s  (  0.166M/s/cpu)
> uprobe-push     (32 cpus):    2.713 ± 0.005M/s  (  0.085M/s/cpu)
> uprobe-push     (64 cpus):    1.313 ± 0.009M/s  (  0.021M/s/cpu)
> 
> uprobe-ret      ( 1 cpus):    1.774 ± 0.000M/s  (  1.774M/s/cpu)
> uprobe-ret      ( 2 cpus):    3.350 ± 0.001M/s  (  1.675M/s/cpu)
> uprobe-ret      ( 4 cpus):    6.604 ± 0.000M/s  (  1.651M/s/cpu)
> uprobe-ret      ( 8 cpus):    6.706 ± 0.005M/s  (  0.838M/s/cpu)
> uprobe-ret      (16 cpus):    5.231 ± 0.001M/s  (  0.327M/s/cpu)
> uprobe-ret      (32 cpus):    5.743 ± 0.003M/s  (  0.179M/s/cpu)
> uprobe-ret      (64 cpus):    4.726 ± 0.016M/s  (  0.074M/s/cpu)
> 
> after-opt
> ---------
> uprobe-nop      ( 1 cpus):    0.985 ± 0.002M/s  (  0.985M/s/cpu)
> uprobe-nop      ( 2 cpus):    1.773 ± 0.005M/s  (  0.887M/s/cpu)
> uprobe-nop      ( 4 cpus):    3.304 ± 0.001M/s  (  0.826M/s/cpu)
> uprobe-nop      ( 8 cpus):    5.328 ± 0.002M/s  (  0.666M/s/cpu)
> uprobe-nop      (16 cpus):    6.475 ± 0.002M/s  (  0.405M/s/cpu)
> uprobe-nop      (32 cpus):    4.831 ± 0.082M/s  (  0.151M/s/cpu)
> uprobe-nop      (64 cpus):    2.564 ± 0.053M/s  (  0.040M/s/cpu)
> 
> uprobe-push     ( 1 cpus):    0.964 ± 0.001M/s  (  0.964M/s/cpu)
> uprobe-push     ( 2 cpus):    1.766 ± 0.002M/s  (  0.883M/s/cpu)
> uprobe-push     ( 4 cpus):    3.290 ± 0.009M/s  (  0.823M/s/cpu)
> uprobe-push     ( 8 cpus):    4.670 ± 0.002M/s  (  0.584M/s/cpu)
> uprobe-push     (16 cpus):    5.197 ± 0.004M/s  (  0.325M/s/cpu)
> uprobe-push     (32 cpus):    5.068 ± 0.161M/s  (  0.158M/s/cpu)
> uprobe-push     (64 cpus):    2.605 ± 0.026M/s  (  0.041M/s/cpu)
> 
> uprobe-ret      ( 1 cpus):    1.833 ± 0.001M/s  (  1.833M/s/cpu)
> uprobe-ret      ( 2 cpus):    3.384 ± 0.003M/s  (  1.692M/s/cpu)
> uprobe-ret      ( 4 cpus):    6.677 ± 0.004M/s  (  1.669M/s/cpu)
> uprobe-ret      ( 8 cpus):    6.854 ± 0.005M/s  (  0.857M/s/cpu)
> uprobe-ret      (16 cpus):    6.508 ± 0.006M/s  (  0.407M/s/cpu)
> uprobe-ret      (32 cpus):    5.793 ± 0.009M/s  (  0.181M/s/cpu)
> uprobe-ret      (64 cpus):    4.743 ± 0.016M/s  (  0.074M/s/cpu)
> 
> Above benchmark results demonstrates a obivious improvement in the
> scalability of trig-uprobe-nop and trig-uprobe-push, the peak throughput
> of which are from 4.5M/s to 6.4M/s and 3.3M/s to 5.1M/s individually.
> 
> v3->v2:
> Renaming the flag in [2/2], s/deny_signal/signal_denied/g.
> 
> v2->v1:
> Oleg pointed out the _DENY_SIGNAL will be replaced by _ACK upon the
> completion of singlestep which leads to handle_singlestep() has no
> chance to restore the removed TIF_SIGPENDING [3] and some case in
> question. So this revision proposes to use a flag in uprobe_task to
> track the denied TIF_SIGPENDING instead of new UPROBE_SSTEP state.
> 
> [1] https://lore.kernel.org/all/20240731214256.3588718-1-andrii@kernel.org
> [2] https://lore.kernel.org/all/20240727094405.1362496-1-liaochang1@huawei.com
> [3] https://lore.kernel.org/all/20240801082407.1618451-1-liaochang1@huawei.com
> 
> Liao Chang (2):
>   uprobes: Remove redundant spinlock in uprobe_deny_signal()
>   uprobes: Remove the spinlock within handle_singlestep()
> 
>  include/linux/uprobes.h |  1 +
>  kernel/events/uprobes.c | 10 +++++-----
>  2 files changed, 6 insertions(+), 5 deletions(-)
> 

-- 
BR
Liao, Chang