* [PATCH] Loongarch: entry: fix syscall_get_arguments() VS no-bultin-memcpy
@ 2025-08-26 11:32 Wentao Guan
2025-08-28 5:47 ` Yanteng Si
2025-08-28 6:46 ` Xi Ruoyao
0 siblings, 2 replies; 11+ messages in thread
From: Wentao Guan @ 2025-08-26 11:32 UTC (permalink / raw)
To: chenhuacai; +Cc: kernel, xry111, linux-kernel, loongarch, zhanjun, Wentao Guan
LoongArch builds with -fno-builtin-memcpy in its Makefile,
which causes an extra memcpy() call in the syscall hot path:
syscall_enter_audit
->syscall_get_arguments->memcpy(__memcpy_fast)
->audit_syscall_entry
Avoid memcpy() altogether and copy the args from regs manually,
which raises the UnixBench syscall multi-core score by 5%.
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
---
arch/loongarch/include/asm/syscall.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/loongarch/include/asm/syscall.h b/arch/loongarch/include/asm/syscall.h
index 81d2733f7b94..171af2edd569 100644
--- a/arch/loongarch/include/asm/syscall.h
+++ b/arch/loongarch/include/asm/syscall.h
@@ -65,7 +65,11 @@ static inline void syscall_get_arguments(struct task_struct *task,
unsigned long *args)
{
args[0] = regs->orig_a0;
- memcpy(&args[1], &regs->regs[5], 5 * sizeof(long));
+ args[1] = regs->regs[5];
+ args[2] = regs->regs[6];
+ args[3] = regs->regs[7];
+ args[4] = regs->regs[8];
+ args[5] = regs->regs[9];
}
static inline void syscall_set_arguments(struct task_struct *task,
--
2.20.1
* Re: [PATCH] Loongarch: entry: fix syscall_get_arguments() VS no-bultin-memcpy
2025-08-26 11:32 [PATCH] Loongarch: entry: fix syscall_get_arguments() VS no-bultin-memcpy Wentao Guan
@ 2025-08-28 5:47 ` Yanteng Si
2025-08-28 6:24 ` Huacai Chen
2025-08-28 6:44 ` Wentao Guan
2025-08-28 6:46 ` Xi Ruoyao
1 sibling, 2 replies; 11+ messages in thread
From: Yanteng Si @ 2025-08-28 5:47 UTC (permalink / raw)
To: Wentao Guan, chenhuacai; +Cc: kernel, xry111, linux-kernel, loongarch, zhanjun
On 8/26/25 7:32 PM, Wentao Guan wrote:
> LoongArch builds with -fno-builtin-memcpy in its Makefile,
> which causes an extra memcpy() call in the syscall hot path:
>
> syscall_enter_audit
> ->syscall_get_arguments->memcpy(__memcpy_fast)
> ->audit_syscall_entry
>
> Avoid memcpy() altogether and copy the args from regs manually,
> which raises the UnixBench syscall multi-core score by 5%.
5%? Awesome!
>
> Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Reviewed-by: Yanteng Si <siyanteng@cqsoftware.com.cn>
Thanks,
Yanteng
> ---
> arch/loongarch/include/asm/syscall.h | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/loongarch/include/asm/syscall.h b/arch/loongarch/include/asm/syscall.h
> index 81d2733f7b94..171af2edd569 100644
> --- a/arch/loongarch/include/asm/syscall.h
> +++ b/arch/loongarch/include/asm/syscall.h
> @@ -65,7 +65,11 @@ static inline void syscall_get_arguments(struct task_struct *task,
> unsigned long *args)
> {
> args[0] = regs->orig_a0;
> - memcpy(&args[1], &regs->regs[5], 5 * sizeof(long));
> + args[1] = regs->regs[5];
> + args[2] = regs->regs[6];
> + args[3] = regs->regs[7];
> + args[4] = regs->regs[8];
> + args[5] = regs->regs[9];
> }
>
> static inline void syscall_set_arguments(struct task_struct *task,
* Re: [PATCH] Loongarch: entry: fix syscall_get_arguments() VS no-bultin-memcpy
2025-08-28 5:47 ` Yanteng Si
@ 2025-08-28 6:24 ` Huacai Chen
2025-08-28 6:43 ` Wentao Guan
2025-08-28 6:44 ` Wentao Guan
1 sibling, 1 reply; 11+ messages in thread
From: Huacai Chen @ 2025-08-28 6:24 UTC (permalink / raw)
To: Yanteng Si; +Cc: Wentao Guan, kernel, xry111, linux-kernel, loongarch, zhanjun
On Thu, Aug 28, 2025 at 1:47 PM Yanteng Si <si.yanteng@linux.dev> wrote:
>
> On 8/26/25 7:32 PM, Wentao Guan wrote:
> > LoongArch builds with -fno-builtin-memcpy in its Makefile,
> > which causes an extra memcpy() call in the syscall hot path:
> >
> > syscall_enter_audit
> > ->syscall_get_arguments->memcpy(__memcpy_fast)
> > ->audit_syscall_entry
> >
> > Avoid memcpy() altogether and copy the args from regs manually,
> > which raises the UnixBench syscall multi-core score by 5%.
> 5%? Awesome!
> >
> > Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
> Reviewed-by: Yanteng Si <siyanteng@cqsoftware.com.cn>
But why is this a "fix"? It is an optimization.
Huacai
>
> Thanks,
> Yanteng
> > ---
> > arch/loongarch/include/asm/syscall.h | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/loongarch/include/asm/syscall.h b/arch/loongarch/include/asm/syscall.h
> > index 81d2733f7b94..171af2edd569 100644
> > --- a/arch/loongarch/include/asm/syscall.h
> > +++ b/arch/loongarch/include/asm/syscall.h
> > @@ -65,7 +65,11 @@ static inline void syscall_get_arguments(struct task_struct *task,
> > unsigned long *args)
> > {
> > args[0] = regs->orig_a0;
> > - memcpy(&args[1], &regs->regs[5], 5 * sizeof(long));
> > + args[1] = regs->regs[5];
> > + args[2] = regs->regs[6];
> > + args[3] = regs->regs[7];
> > + args[4] = regs->regs[8];
> > + args[5] = regs->regs[9];
> > }
> >
> > static inline void syscall_set_arguments(struct task_struct *task,
>
* Re: [PATCH] Loongarch: entry: fix syscall_get_arguments() VS no-bultin-memcpy
2025-08-28 6:24 ` Huacai Chen
@ 2025-08-28 6:43 ` Wentao Guan
0 siblings, 0 replies; 11+ messages in thread
From: Wentao Guan @ 2025-08-28 6:43 UTC (permalink / raw)
To: Huacai Chen, Yanteng Si
Cc: WANG Xuerui, Xi Ruoyao, linux-kernel, loongarch, 占俊
Dear Huacai,
Hmm, let me change "fix" to "optimize" in the title...
BRs
Wentao Guan
* Re: [PATCH] Loongarch: entry: fix syscall_get_arguments() VS no-bultin-memcpy
2025-08-28 5:47 ` Yanteng Si
2025-08-28 6:24 ` Huacai Chen
@ 2025-08-28 6:44 ` Wentao Guan
1 sibling, 0 replies; 11+ messages in thread
From: Wentao Guan @ 2025-08-28 6:44 UTC (permalink / raw)
To: Yanteng Si, chenhuacai
Cc: WANG Xuerui, Xi Ruoyao, linux-kernel, loongarch, 占俊
Hello,
Thanks for your review.
I tested the patch on our v6.6 kernel.
UnixBench syscall results on the 3A6000 platform:
1 thread: 966 -> 996
8 threads: 6173 -> 6501
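For reference, the relative gains behind these scores can be worked out with a small helper. This is only an arithmetic sketch; the function name pct_change is hypothetical and not part of the patch or of UnixBench.

```c
#include <assert.h>

/* Relative change from `before` to `after`, expressed in percent. */
static double pct_change(double before, double after)
{
    assert(before > 0.0);   /* a zero or negative baseline is meaningless here */
    return (after - before) / before * 100.0;
}
```

With the numbers above, pct_change(966, 996) comes to roughly 3.1 and pct_change(6173, 6501) to roughly 5.3, so the gain is larger in the 8-thread run than in the single-thread run.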
Best Regards
Wentao Guan
* Re: [PATCH] Loongarch: entry: fix syscall_get_arguments() VS no-bultin-memcpy
2025-08-26 11:32 [PATCH] Loongarch: entry: fix syscall_get_arguments() VS no-bultin-memcpy Wentao Guan
2025-08-28 5:47 ` Yanteng Si
@ 2025-08-28 6:46 ` Xi Ruoyao
2025-08-28 7:06 ` Wentao Guan
1 sibling, 1 reply; 11+ messages in thread
From: Xi Ruoyao @ 2025-08-28 6:46 UTC (permalink / raw)
To: Wentao Guan, chenhuacai; +Cc: kernel, linux-kernel, loongarch, zhanjun
On Tue, 2025-08-26 at 19:32 +0800, Wentao Guan wrote:
> LoongArch builds with -fno-builtin-memcpy in its Makefile,
I still think we should just remove -fno-builtin-memcpy if GCC >= 14.
There's some wide-spread misunderstanding like "-fno-builtin-memcpy can
stop the compiler from using memcpy for copying small structs" but the
fact is very much contrary to it.
The compiler uses memcpy to copy any struct whose size is larger than a
threshold, with or without -fbuiltin-memcpy, but with -fbuiltin-memcpy
the compiler can expand some of those copies into ld/st sequences,
reducing the actual number of memcpy calls.
The reason we didn't use -fbuiltin-memcpy is GCC < 14 expands memcpy for
various sizes in a very stupid way, see https://gcc.gnu.org/PR109465.
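
As a rough illustration of the two forms being compared in this thread
(a sketch only: struct demo_regs and the function names are hypothetical
stand-ins, not kernel code), the fixed-size memcpy() and the explicit
assignments produce the same result. With -fno-builtin-memcpy the compiler
has to treat memcpy() as an ordinary function and emit a call in the first
variant; without that flag it is free to, but not required to, expand the
fixed-size copy inline.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct pt_regs; only the fields used here. */
struct demo_regs {
    unsigned long regs[32];
    unsigned long orig_a0;
};

/* Variant A: fixed-size memcpy(), as in the code before the patch. */
static void get_args_memcpy(const struct demo_regs *regs, unsigned long *args)
{
    args[0] = regs->orig_a0;
    memcpy(&args[1], &regs->regs[5], 5 * sizeof(long));
}

/* Variant B: explicit assignments, as in the patch. */
static void get_args_manual(const struct demo_regs *regs, unsigned long *args)
{
    args[0] = regs->orig_a0;
    args[1] = regs->regs[5];
    args[2] = regs->regs[6];
    args[3] = regs->regs[7];
    args[4] = regs->regs[8];
    args[5] = regs->regs[9];
}

/* Self-check: both variants must fill the argument array identically. */
static void demo_check(void)
{
    struct demo_regs regs = { .orig_a0 = 1 };
    unsigned long a[6], b[6];

    for (int i = 0; i < 32; i++)
        regs.regs[i] = 100 + i;

    get_args_memcpy(&regs, a);
    get_args_manual(&regs, b);
    assert(memcmp(a, b, sizeof(a)) == 0);
}
```

Compiling such a file with and without -fno-builtin-memcpy and comparing the
generated assembly is one way to see whether the call in variant A survives.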
--
Xi Ruoyao <xry111@xry111.site>
* Re: [PATCH] Loongarch: entry: fix syscall_get_arguments() VS no-bultin-memcpy
2025-08-28 6:46 ` Xi Ruoyao
@ 2025-08-28 7:06 ` Wentao Guan
2025-08-28 7:09 ` Huacai Chen
2025-08-28 17:18 ` Wentao Guan
0 siblings, 2 replies; 11+ messages in thread
From: Wentao Guan @ 2025-08-28 7:06 UTC (permalink / raw)
To: Xi Ruoyao, chenhuacai
Cc: WANG Xuerui, linux-kernel, loongarch, 占俊
Hello Ruoyao,
Thanks for your review.
I will test and send test results later.
(using different GCC versions and with the three flags removed.)
BRs
Wentao Guan
* Re: [PATCH] Loongarch: entry: fix syscall_get_arguments() VS no-bultin-memcpy
2025-08-28 7:06 ` Wentao Guan
@ 2025-08-28 7:09 ` Huacai Chen
2025-08-28 7:11 ` Xi Ruoyao
2025-08-28 17:18 ` Wentao Guan
1 sibling, 1 reply; 11+ messages in thread
From: Huacai Chen @ 2025-08-28 7:09 UTC (permalink / raw)
To: Wentao Guan
Cc: Xi Ruoyao, WANG Xuerui, linux-kernel, loongarch, 占俊
On Thu, Aug 28, 2025 at 3:06 PM Wentao Guan <guanwentao@uniontech.com> wrote:
>
> Hello Ruoyao,
>
> Thanks for your review.
> I will test and send test results later.
> (using different GCC versions and with the three flags removed.)
I think kernel's memcpy is still needed for CPUs without UAL.
Huacai
>
> BRs
> Wentao Guan
* Re: [PATCH] Loongarch: entry: fix syscall_get_arguments() VS no-bultin-memcpy
2025-08-28 7:09 ` Huacai Chen
@ 2025-08-28 7:11 ` Xi Ruoyao
2025-08-28 7:14 ` Xi Ruoyao
0 siblings, 1 reply; 11+ messages in thread
From: Xi Ruoyao @ 2025-08-28 7:11 UTC (permalink / raw)
To: Huacai Chen, Wentao Guan
Cc: WANG Xuerui, linux-kernel, loongarch, 占俊
On Thu, 2025-08-28 at 15:09 +0800, Huacai Chen wrote:
> On Thu, Aug 28, 2025 at 3:06 PM Wentao Guan <guanwentao@uniontech.com>
> wrote:
> >
> > Hello Ruoyao,
> >
> > Thanks for your review.
> > I will test and send test results later.
> > (using different GCC versions and with the three flags removed.)
> I think kernel's memcpy is still needed for CPUs without UAL.
GCC's memcpy expansion handles the no-UAL case correctly. It won't
generate unaligned accesses unless -mno-strict-align is given.
--
Xi Ruoyao <xry111@xry111.site>
* Re: [PATCH] Loongarch: entry: fix syscall_get_arguments() VS no-bultin-memcpy
2025-08-28 7:11 ` Xi Ruoyao
@ 2025-08-28 7:14 ` Xi Ruoyao
0 siblings, 0 replies; 11+ messages in thread
From: Xi Ruoyao @ 2025-08-28 7:14 UTC (permalink / raw)
To: Huacai Chen, Wentao Guan
Cc: WANG Xuerui, linux-kernel, loongarch, 占俊
On Thu, 2025-08-28 at 15:11 +0800, Xi Ruoyao wrote:
> On Thu, 2025-08-28 at 15:09 +0800, Huacai Chen wrote:
> > On Thu, Aug 28, 2025 at 3:06 PM Wentao Guan
> > <guanwentao@uniontech.com>
> > wrote:
> > >
> > > Hello Ruoyao,
> > >
> > > Thanks for your review.
> > > I will test and send test results later.
> > > (using different GCC versions and with the three flags removed.)
> > I think kernel's memcpy is still needed for CPUs without UAL.
>
> GCC's memcpy expansion handles the no-UAL case correctly. It won't
> generate unaligned accesses unless -mno-strict-align is given.
P.S. The kernel's memcpy is indeed still needed, because when GCC deduces
that it's not beneficial to expand a __builtin_memcpy into ld/st sequences,
it'll just call memcpy.
--
Xi Ruoyao <xry111@xry111.site>
* Re: [PATCH] Loongarch: entry: fix syscall_get_arguments() VS no-bultin-memcpy
2025-08-28 7:06 ` Wentao Guan
2025-08-28 7:09 ` Huacai Chen
@ 2025-08-28 17:18 ` Wentao Guan
1 sibling, 0 replies; 11+ messages in thread
From: Wentao Guan @ 2025-08-28 17:18 UTC (permalink / raw)
To: 关文涛, Xi Ruoyao, chenhuacai
Cc: WANG Xuerui, linux-kernel, loongarch, 占俊
Hello Ruoyao,
The quick test results are very interesting...:
UnixBench syscall on 3A6000 platform:
./Run syscall -i 10 -c 8
kernel source: v6.16
kernel config:
https://github.com/deepin-community/kernel/blob/linux-6.12.y/arch/loongarch/configs/deepin_loongarch_desktop_defconfig
compiler: this GCC 15 build [1]
result:
1. with no patch:
System Call Overhead 15000.0 8893011.8 5928.7
2. with patch [2], which drops -fno-builtin-memcpy -fno-builtin-memmove -fno-builtin-memset:
System Call Overhead 15000.0 8510415.2 5673.6
lower than with no patch???
3. with my original patch [3] (+4.9%):
System Call Overhead 15000.0 9336022.1 6224.0
BRs
Wentao Guan
[1]: https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/15.2.0
[2]:
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 4b19f9337..53bbcf31c 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -746,6 +746,9 @@ config ARCH_SUSPEND_POSSIBLE
config ARCH_HIBERNATION_POSSIBLE
def_bool y
+config ARCH_MEMORY_LIBCALL_DISABLE
+ def_bool KASAN || GCC_VERSION >= 140000
+
source "kernel/power/Kconfig"
source "drivers/acpi/Kconfig"
source "drivers/cpufreq/Kconfig"
diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
index b0703a4e0..fba837b22 100644
--- a/arch/loongarch/Makefile
+++ b/arch/loongarch/Makefile
@@ -120,7 +120,7 @@ endif
cflags-y += $(call cc-option, -mno-check-zero-division)
-ifndef CONFIG_KASAN
+ifeq ($(CONFIG_ARCH_MEMORY_LIBCALL_DISABLE),)
cflags-y += -fno-builtin-memcpy -fno-builtin-memmove -fno-builtin-memset
endif
[3]:
diff --git a/arch/loongarch/include/asm/syscall.h b/arch/loongarch/include/asm/syscall.h
index 81d2733f7..171af2edd 100644
--- a/arch/loongarch/include/asm/syscall.h
+++ b/arch/loongarch/include/asm/syscall.h
@@ -65,7 +65,11 @@ static inline void syscall_get_arguments(struct task_struct *task,
unsigned long *args)
{
args[0] = regs->orig_a0;
- memcpy(&args[1], &regs->regs[5], 5 * sizeof(long));
+ args[1] = regs->regs[5];
+ args[2] = regs->regs[6];
+ args[3] = regs->regs[7];
+ args[4] = regs->regs[8];
+ args[5] = regs->regs[9];
}
static inline void syscall_set_arguments(struct task_struct *task,
[4]: perf diff
# Event 'cycles:P'
#
# Baseline Delta Abs Shared Object Symbol >
# ........ ......... .................................. .....................................................>
#
12.72% -7.07% [kernel.kallsyms] [k] syscall_trace_enter
0.00% +2.94% [kernel.kallsyms] [k] __memcpy_fast
33.69% +1.41% [kernel.kallsyms] [k] do_syscall
2.64% +0.32% [kernel.kallsyms] [k] __audit_syscall_entry
4.92% +0.32% libc.so.6 [.] __GI___umask
3.54% +0.30% [kernel.kallsyms] [k] alloc_fd
4.36% +0.24% libc.so.6 [.] __GI___getuid
4.63% +0.24% libc.so.6 [.] syscall
4.50% +0.22% libc.so.6 [.] __GI___dup
4.84% +0.22% [kernel.kallsyms] [k] syscall_exit_work
4.81% +0.22% libc.so.6 [.] __close
+0.20% [kernel.kallsyms] [k] __memcpy
1.48% +0.15% [kernel.kallsyms] [k] audit_reset_context
0.69% +0.13% [kernel.kallsyms] [k] __se_sys_close
2.13% +0.10% [kernel.kallsyms] [k] file_close_fd_locked
0.86% +0.08% [kernel.kallsyms] [k] map_id_range_up
0.71% +0.06% [kernel.kallsyms] [k] __task_pid_nr_ns
0.41% +0.04% [kernel.kallsyms] [k] fd_install
0.56% +0.04% [kernel.kallsyms] [k] __rcu_read_unlock
0.57% +0.04% [kernel.kallsyms] [k] fput_close_sync
0.07% -0.04% [kernel.kallsyms] [k] idle_exit
1.37% +0.04% [kernel.kallsyms] [k] _raw_spin_lock
0.69% +0.03% [kernel.kallsyms] [k] filp_flush
0.76% +0.03% syscall [.] main
0.52% +0.03% [kernel.kallsyms] [k] from_kuid_munged
0.45% +0.03% [kernel.kallsyms] [k] _raw_spin_unlock
0.07% -0.03% [kernel.kallsyms] [k] finish_task_switch.isra.0
1.72% +0.03% [kernel.kallsyms] [k] ktime_get_coarse_real_ts64
0.67% +0.03% [kernel.kallsyms] [k] __rcu_read_lock
0.03% -0.03% deepin-deepinid-daemon [.] 0x0000000000003018
1.28% -0.02% [kernel.kallsyms] [k] __audit_syscall_exit
0.12% +0.02% syscall [.] getuid@plt
0.02% -0.02% dde-session-daemon [.] 0x0000000000299d10
0.02% -0.02% ld-linux-loongarch-lp64d.so.1 [.] do_lookup_x
0.03% -0.01% [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.01% -0.01% ld-linux-loongarch-lp64d.so.1 [.] _dl_lookup_symbol_x
0.30% +0.01% [kernel.kallsyms] [k] sys_getpid
0.14% -0.01% syscall [.] umask@plt
0.01% -0.01% libdbus-1.so.3.32.4 [.] 0x0000000000015ac0
0.02% -0.01% [kernel.kallsyms] [k] clear_page
0.01% -0.01% deepin-sync-helper [.] 0x0000000000003a84
0.14% +0.01% syscall [.] close@plt