linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] arm64: uprobes: Optimize cache flushes for xol slot
@ 2024-09-19 12:17 Liao Chang
  2024-09-19 14:18 ` Oleg Nesterov
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Liao Chang @ 2024-09-19 12:17 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, catalin.marinas, will, mark.rutland
  Cc: linux-kernel, linux-trace-kernel, linux-arm-kernel

The profiling of single-thread selftests bench reveals a bottlenect in
caches_clean_inval_pou() on ARM64. On my local testing machine, this
function takes approximately 34% of CPU cycles for trig-uprobe-nop and
trig-uprobe-push.

This patch add a check to avoid unnecessary cache flush when writing
instruction to the xol slot. If the instruction is same with the
existing instruction in slot, there is no need to synchronize D/I cache.
Since xol slot allocation and updates occur on the hot path of uprobe
handling, The upstream kernel running on Kunpeng916 (Hi1616), 4 NUMA
nodes, 64 cores@ 2.4GHz reveals this optimization has obvious gain for
nop and push testcases.

Before (next-20240918)
----------------------
uprobe-nop      ( 1 cpus):    0.418 ± 0.001M/s  (  0.418M/s/cpu)
uprobe-push     ( 1 cpus):    0.411 ± 0.005M/s  (  0.411M/s/cpu)
uprobe-ret      ( 1 cpus):    2.052 ± 0.002M/s  (  2.052M/s/cpu)
uretprobe-nop   ( 1 cpus):    0.350 ± 0.000M/s  (  0.350M/s/cpu)
uretprobe-push  ( 1 cpus):    0.353 ± 0.000M/s  (  0.353M/s/cpu)
uretprobe-ret   ( 1 cpus):    1.074 ± 0.001M/s  (  1.074M/s/cpu)

After
-----
uprobe-nop      ( 1 cpus):    0.926 ± 0.000M/s  (  0.926M/s/cpu)
uprobe-push     ( 1 cpus):    0.910 ± 0.001M/s  (  0.910M/s/cpu)
uprobe-ret      ( 1 cpus):    2.056 ± 0.001M/s  (  2.056M/s/cpu)
uretprobe-nop   ( 1 cpus):    0.653 ± 0.001M/s  (  0.653M/s/cpu)
uretprobe-push  ( 1 cpus):    0.645 ± 0.000M/s  (  0.645M/s/cpu)
uretprobe-ret   ( 1 cpus):    1.093 ± 0.001M/s  (  1.093M/s/cpu)

Signed-off-by: Liao Chang <liaochang1@huawei.com>
---
 arch/arm64/kernel/probes/uprobes.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/kernel/probes/uprobes.c b/arch/arm64/kernel/probes/uprobes.c
index d49aef2657cd..5ee27509d6f6 100644
--- a/arch/arm64/kernel/probes/uprobes.c
+++ b/arch/arm64/kernel/probes/uprobes.c
@@ -17,12 +17,16 @@ void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
 	void *xol_page_kaddr = kmap_atomic(page);
 	void *dst = xol_page_kaddr + (vaddr & ~PAGE_MASK);
 
+	if (!memcmp(dst, src, len))
+		goto done;
+
 	/* Initialize the slot */
 	memcpy(dst, src, len);
 
 	/* flush caches (dcache/icache) */
 	sync_icache_aliases((unsigned long)dst, (unsigned long)dst + len);
 
+done:
 	kunmap_atomic(xol_page_kaddr);
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-11-08 16:49 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-19 12:17 [PATCH] arm64: uprobes: Optimize cache flushes for xol slot Liao Chang
2024-09-19 14:18 ` Oleg Nesterov
2024-09-20  8:58   ` Liao, Chang
2024-09-20 11:03     ` Oleg Nesterov
2024-09-20 15:32     ` Catalin Marinas
2024-09-20 17:32       ` Oleg Nesterov
2024-09-22 14:09         ` Will Deacon
2024-09-22 14:39           ` Oleg Nesterov
2024-09-23 11:16           ` Liao, Chang
2024-09-23  1:57       ` Liao, Chang
2024-09-23  7:18         ` Will Deacon
2024-09-23 10:52           ` Oleg Nesterov
2024-09-26 12:06             ` Liao, Chang
2024-09-26 16:08               ` Oleg Nesterov
2024-09-23 11:16           ` Liao, Chang
2024-09-23 16:03           ` Catalin Marinas
2024-11-06  9:55 ` Liao, Chang
2024-11-07 18:35   ` Catalin Marinas
2024-11-08 16:49 ` Catalin Marinas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).