linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 0/2] syscall: Cleanup and improve syscall_get_arguments()
@ 2025-11-27 12:36 Jinjie Ruan
  2025-11-27 12:36 ` [PATCH 1/2] syscall.h: Remove unused SYSCALL_MAX_ARGS Jinjie Ruan
  2025-11-27 12:36 ` [PATCH 2/2] arm64: Avoid memcpy() for syscall_get_arguments() Jinjie Ruan
  0 siblings, 2 replies; 7+ messages in thread
From: Jinjie Ruan @ 2025-11-27 12:36 UTC
  To: linux, catalin.marinas, will, chris, jcmvbkbc, akpm, macro,
	charlie, deller, ldv, rostedt, tglx, linux-arm-kernel,
	linux-kernel
  Cc: ruanjinjie

Remove the unused SYSCALL_MAX_ARGS macro and avoid memcpy() in
syscall_get_arguments() on arm64.

Jinjie Ruan (2):
  syscall.h: Remove unused SYSCALL_MAX_ARGS
  arm64: Avoid memcpy() for syscall_get_arguments()

 arch/arm/include/asm/syscall.h    |  2 --
 arch/arm64/include/asm/syscall.h  | 10 +++++-----
 arch/xtensa/include/asm/syscall.h |  1 -
 3 files changed, 5 insertions(+), 8 deletions(-)

-- 
2.34.1




* [PATCH 1/2] syscall.h: Remove unused SYSCALL_MAX_ARGS
  2025-11-27 12:36 [PATCH 0/2] syscall: Cleanup and improve syscall_get_arguments() Jinjie Ruan
@ 2025-11-27 12:36 ` Jinjie Ruan
  2025-11-27 12:36 ` [PATCH 2/2] arm64: Avoid memcpy() for syscall_get_arguments() Jinjie Ruan
  1 sibling, 0 replies; 7+ messages in thread
From: Jinjie Ruan @ 2025-11-27 12:36 UTC
  To: linux, catalin.marinas, will, chris, jcmvbkbc, akpm, macro,
	charlie, deller, ldv, rostedt, tglx, linux-arm-kernel,
	linux-kernel
  Cc: ruanjinjie

SYSCALL_MAX_ARGS appears to have been unused since
commit 32d92586629a ("syscalls: Remove start and number from
syscall_set_arguments() args"), so remove it.

Fixes: 32d92586629a ("syscalls: Remove start and number from syscall_set_arguments() args")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm/include/asm/syscall.h    | 2 --
 arch/arm64/include/asm/syscall.h  | 2 --
 arch/xtensa/include/asm/syscall.h | 1 -
 3 files changed, 5 deletions(-)

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index 18b102a30741..574bbcc55382 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -92,8 +92,6 @@ static inline void syscall_set_nr(struct task_struct *task,
 		(nr & __NR_SYSCALL_MASK);
 }
 
-#define SYSCALL_MAX_ARGS 7
-
 static inline void syscall_get_arguments(struct task_struct *task,
 					 struct pt_regs *regs,
 					 unsigned long *args)
diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index 712daa90e643..f3853047c28e 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -77,8 +77,6 @@ static inline void syscall_set_nr(struct task_struct *task,
 	}
 }
 
-#define SYSCALL_MAX_ARGS 6
-
 static inline void syscall_get_arguments(struct task_struct *task,
 					 struct pt_regs *regs,
 					 unsigned long *args)
diff --git a/arch/xtensa/include/asm/syscall.h b/arch/xtensa/include/asm/syscall.h
index 7db3b489c8ad..bab7cdd96cbe 100644
--- a/arch/xtensa/include/asm/syscall.h
+++ b/arch/xtensa/include/asm/syscall.h
@@ -61,7 +61,6 @@ static inline void syscall_set_return_value(struct task_struct *task,
 	regs->areg[2] = (long) error ? error : val;
 }
 
-#define SYSCALL_MAX_ARGS 6
 #define XTENSA_SYSCALL_ARGUMENT_REGS {6, 3, 4, 5, 8, 9}
 
 static inline void syscall_get_arguments(struct task_struct *task,
-- 
2.34.1




* [PATCH 2/2] arm64: Avoid memcpy() for syscall_get_arguments()
  2025-11-27 12:36 [PATCH 0/2] syscall: Cleanup and improve syscall_get_arguments() Jinjie Ruan
  2025-11-27 12:36 ` [PATCH 1/2] syscall.h: Remove unused SYSCALL_MAX_ARGS Jinjie Ruan
@ 2025-11-27 12:36 ` Jinjie Ruan
  2025-11-27 14:35   ` Dmitry V. Levin
  2025-12-01 10:13   ` Mark Rutland
  1 sibling, 2 replies; 7+ messages in thread
From: Jinjie Ruan @ 2025-11-27 12:36 UTC
  To: linux, catalin.marinas, will, chris, jcmvbkbc, akpm, macro,
	charlie, deller, ldv, rostedt, tglx, linux-arm-kernel,
	linux-kernel
  Cc: ruanjinjie

Do not use memcpy() to extract syscall arguments from struct pt_regs
but rather just perform direct assignments.

With the Generic Entry patch [1] applied and audit enabled, perf bench
basic syscall on a Kunpeng 920 shows roughly a 1% performance uplift.
The change also aligns the implementation with x86 and RISC-V.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, #272]
       f8008444        str     x4, [x2], #8
       a9409404        ldp     x4, x5, [x0, #8]
       a9009424        stp     x4, x5, [x1, #8]
       a9418400        ldp     x0, x1, [x0, #24]
       a9010440        stp     x0, x1, [x2, #16]
       f9401060        ldr     x0, [x3, #32]
       f9001040        str     x0, [x2, #32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, #8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, #24]
       f9408a81        ldr     x1, [x20, #272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

[1]: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/include/asm/syscall.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index f3853047c28e..f3564ba97f7e 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -82,9 +82,11 @@ static inline void syscall_get_arguments(struct task_struct *task,
 					 unsigned long *args)
 {
 	args[0] = regs->orig_x0;
-	args++;
-
-	memcpy(args, &regs->regs[1], 5 * sizeof(args[0]));
+	args[1] = regs->regs[1];
+	args[2] = regs->regs[2];
+	args[3] = regs->regs[3];
+	args[4] = regs->regs[4];
+	args[5] = regs->regs[5];
 }
 
 static inline void syscall_set_arguments(struct task_struct *task,
-- 
2.34.1




* Re: [PATCH 2/2] arm64: Avoid memcpy() for syscall_get_arguments()
  2025-11-27 12:36 ` [PATCH 2/2] arm64: Avoid memcpy() for syscall_get_arguments() Jinjie Ruan
@ 2025-11-27 14:35   ` Dmitry V. Levin
  2025-12-01 10:13   ` Mark Rutland
  1 sibling, 0 replies; 7+ messages in thread
From: Dmitry V. Levin @ 2025-11-27 14:35 UTC
  To: Jinjie Ruan
  Cc: linux, catalin.marinas, will, chris, jcmvbkbc, akpm, macro,
	charlie, deller, rostedt, tglx, linux-arm-kernel, linux-kernel

On Thu, Nov 27, 2025 at 08:36:30PM +0800, Jinjie Ruan wrote:
> Do not use memcpy() to extract syscall arguments from struct pt_regs
> but rather just perform direct assignments.
> 
> The performance benchmarks with Generic Entry patch[1] with audit on
> from perf bench basic syscall on kunpeng920 gives roughly a 1%
> performance uplift and also aligns the implementation with
> x86 and RISC-V.
> 
> | Metric     | W/O this patch | With this patch | Change    |
> | ---------- | -------------- | --------------- | --------- |
> | Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
> | usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
> | ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |
> 
> Before:
> <syscall_get_arguments.constprop.0>:
>        aa0103e2        mov     x2, x1
>        91002003        add     x3, x0, #0x8
>        f9408804        ldr     x4, [x0, #272]
>        f8008444        str     x4, [x2], #8
>        a9409404        ldp     x4, x5, [x0, #8]
>        a9009424        stp     x4, x5, [x1, #8]
>        a9418400        ldp     x0, x1, [x0, #24]
>        a9010440        stp     x0, x1, [x2, #16]
>        f9401060        ldr     x0, [x3, #32]
>        f9001040        str     x0, [x2, #32]
>        d65f03c0        ret
>        d503201f        nop
> 
> After:
>        a9408e82        ldp     x2, x3, [x20, #8]
>        2a1603e0        mov     w0, w22
>        f9400e84        ldr     x4, [x20, #24]
>        f9408a81        ldr     x1, [x20, #272]
>        9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>
> 
> [1]: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/include/asm/syscall.h | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
> index f3853047c28e..f3564ba97f7e 100644
> --- a/arch/arm64/include/asm/syscall.h
> +++ b/arch/arm64/include/asm/syscall.h
> @@ -82,9 +82,11 @@ static inline void syscall_get_arguments(struct task_struct *task,
>  					 unsigned long *args)
>  {
>  	args[0] = regs->orig_x0;
> -	args++;
> -
> -	memcpy(args, &regs->regs[1], 5 * sizeof(args[0]));
> +	args[1] = regs->regs[1];
> +	args[2] = regs->regs[2];
> +	args[3] = regs->regs[3];
> +	args[4] = regs->regs[4];
> +	args[5] = regs->regs[5];
>  }
>  
>  static inline void syscall_set_arguments(struct task_struct *task,

Please keep syscall_get_arguments() and syscall_set_arguments() in sync:
if you replace memcpy() with direct assignments in one of these functions,
please mirror the change in the other.


-- 
ldv



* Re: [PATCH 2/2] arm64: Avoid memcpy() for syscall_get_arguments()
  2025-11-27 12:36 ` [PATCH 2/2] arm64: Avoid memcpy() for syscall_get_arguments() Jinjie Ruan
  2025-11-27 14:35   ` Dmitry V. Levin
@ 2025-12-01 10:13   ` Mark Rutland
  2025-12-01 10:26     ` david laight
  1 sibling, 1 reply; 7+ messages in thread
From: Mark Rutland @ 2025-12-01 10:13 UTC
  To: Jinjie Ruan
  Cc: linux, catalin.marinas, will, chris, jcmvbkbc, akpm, macro,
	charlie, deller, ldv, rostedt, tglx, linux-arm-kernel,
	linux-kernel

On Thu, Nov 27, 2025 at 08:36:30PM +0800, Jinjie Ruan wrote:
> Do not use memcpy() to extract syscall arguments from struct pt_regs
> but rather just perform direct assignments.
> 
> The performance benchmarks with Generic Entry patch[1] with audit on
> from perf bench basic syscall on kunpeng920 gives roughly a 1%
> performance uplift and also aligns the implementation with
> x86 and RISC-V.
> 
> | Metric     | W/O this patch | With this patch | Change    |
> | ---------- | -------------- | --------------- | --------- |
> | Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
> | usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
> | ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |
> 
> Before:
> <syscall_get_arguments.constprop.0>:
>        aa0103e2        mov     x2, x1
>        91002003        add     x3, x0, #0x8
>        f9408804        ldr     x4, [x0, #272]
>        f8008444        str     x4, [x2], #8
>        a9409404        ldp     x4, x5, [x0, #8]
>        a9009424        stp     x4, x5, [x1, #8]
>        a9418400        ldp     x0, x1, [x0, #24]
>        a9010440        stp     x0, x1, [x2, #16]
>        f9401060        ldr     x0, [x3, #32]
>        f9001040        str     x0, [x2, #32]
>        d65f03c0        ret
>        d503201f        nop
> 
> After:
>        a9408e82        ldp     x2, x3, [x20, #8]
>        2a1603e0        mov     w0, w22
>        f9400e84        ldr     x4, [x20, #24]
>        f9408a81        ldr     x1, [x20, #272]
>        9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

It's probably worth noting that __audit_syscall_entry() only takes 4
syscall arguments, and hence the compiler has elided the copy of
regs->regs[4] and regs->regs[5], which it apparently couldn't manage
before.
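
The point above can be reproduced outside the kernel with a small
user-space sketch. Everything here is a stand-in: struct mock_pt_regs,
get_args_memcpy(), get_args_direct(), consume_four() and caller() are
hypothetical names, and only the two copy bodies are taken from the diff.
Both variants fill args[] identically; the difference is that with six
separate assignments the stores to args[4] and args[5] are plainly dead
once the helper is inlined into a caller that reads only four values.
Whether a given compiler also manages this for the memcpy() form is not
something the sketch proves.

```c
#include <string.h>

/* Hypothetical stand-in for arm64's struct pt_regs: only the fields used here. */
struct mock_pt_regs {
	unsigned long regs[31];
	unsigned long orig_x0;
};

/* Old shape: first argument from orig_x0, the other five via memcpy(). */
static inline void get_args_memcpy(const struct mock_pt_regs *regs,
				   unsigned long *args)
{
	args[0] = regs->orig_x0;
	args++;

	memcpy(args, &regs->regs[1], 5 * sizeof(args[0]));
}

/* New shape: six independent assignments. */
static inline void get_args_direct(const struct mock_pt_regs *regs,
				   unsigned long *args)
{
	args[0] = regs->orig_x0;
	args[1] = regs->regs[1];
	args[2] = regs->regs[2];
	args[3] = regs->regs[3];
	args[4] = regs->regs[4];
	args[5] = regs->regs[5];
}

/* Hypothetical consumer that, like __audit_syscall_entry(), uses four arguments. */
static inline unsigned long consume_four(unsigned long a0, unsigned long a1,
					 unsigned long a2, unsigned long a3)
{
	return a0 + a1 + a2 + a3;
}

/* Hypothetical caller: args[4] and args[5] are written but never read. */
static inline unsigned long caller(const struct mock_pt_regs *regs)
{
	unsigned long args[6];

	get_args_direct(regs, args);
	return consume_four(args[0], args[1], args[2], args[3]);
}
```

Building this with optimisation enabled and inspecting the disassembly of
caller() is one way to check what a particular toolchain does with the
unused elements.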

> [1]: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/arm64/include/asm/syscall.h | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
> index f3853047c28e..f3564ba97f7e 100644
> --- a/arch/arm64/include/asm/syscall.h
> +++ b/arch/arm64/include/asm/syscall.h
> @@ -82,9 +82,11 @@ static inline void syscall_get_arguments(struct task_struct *task,
>  					 unsigned long *args)
>  {
>  	args[0] = regs->orig_x0;
> -	args++;
> -
> -	memcpy(args, &regs->regs[1], 5 * sizeof(args[0]));
> +	args[1] = regs->regs[1];
> +	args[2] = regs->regs[2];
> +	args[3] = regs->regs[3];
> +	args[4] = regs->regs[4];
> +	args[5] = regs->regs[5];
>  }

FWIW, I think this is clearer than the 'args++' and the memcpy(), so I'm
happy with this regardless of the performance concern.

However, as Dmitry says, we should keep this structurally the same as
syscall_set_arguments(), and so we should update that in the same way.
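
A minimal sketch of what the mirrored setter could look like follows. It
is not taken from the series: it assumes the existing arm64
syscall_set_arguments() copies all six values into regs->regs[0..5] and
then refreshes orig_x0 from regs->regs[0], and it uses a hypothetical
struct mock_pt_regs plus the made-up names set_args_direct() and
get_args_direct() so that it builds in user space.

```c
/* Hypothetical stand-in for arm64's struct pt_regs: only the fields used here. */
struct mock_pt_regs {
	unsigned long regs[31];
	unsigned long orig_x0;
};

/* Setter written with direct assignments, mirroring the getter in the patch. */
static inline void set_args_direct(struct mock_pt_regs *regs,
				   const unsigned long *args)
{
	regs->regs[0] = args[0];
	regs->regs[1] = args[1];
	regs->regs[2] = args[2];
	regs->regs[3] = args[3];
	regs->regs[4] = args[4];
	regs->regs[5] = args[5];

	/*
	 * Assumed behaviour: keep orig_x0 in step with regs[0] so that a
	 * later get returns the newly written first argument.
	 */
	regs->orig_x0 = regs->regs[0];
}

/* Getter as proposed in the patch, repeated so the sketch is self-contained. */
static inline void get_args_direct(const struct mock_pt_regs *regs,
				   unsigned long *args)
{
	args[0] = regs->orig_x0;
	args[1] = regs->regs[1];
	args[2] = regs->regs[2];
	args[3] = regs->regs[3];
	args[4] = regs->regs[4];
	args[5] = regs->regs[5];
}
```

Written this way the two helpers read as exact inverses of each other,
which is the structural symmetry being asked for.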

Mark.



* Re: [PATCH 2/2] arm64: Avoid memcpy() for syscall_get_arguments()
  2025-12-01 10:13   ` Mark Rutland
@ 2025-12-01 10:26     ` david laight
  2025-12-01 10:30       ` Mark Rutland
  0 siblings, 1 reply; 7+ messages in thread
From: david laight @ 2025-12-01 10:26 UTC
  To: Mark Rutland
  Cc: Jinjie Ruan, linux, catalin.marinas, will, chris, jcmvbkbc, akpm,
	macro, charlie, deller, ldv, rostedt, tglx, linux-arm-kernel,
	linux-kernel

On Mon, 1 Dec 2025 10:13:54 +0000
Mark Rutland <mark.rutland@arm.com> wrote:

> On Thu, Nov 27, 2025 at 08:36:30PM +0800, Jinjie Ruan wrote:
> > Do not use memcpy() to extract syscall arguments from struct pt_regs
> > but rather just perform direct assignments.
> > 
> > The performance benchmarks with Generic Entry patch[1] with audit on
> > from perf bench basic syscall on kunpeng920 gives roughly a 1%
> > performance uplift and also aligns the implementation with
> > x86 and RISC-V.
> > 
> > | Metric     | W/O this patch | With this patch | Change    |
> > | ---------- | -------------- | --------------- | --------- |
> > | Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
> > | usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
> > | ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |
> > 
> > Before:
> > <syscall_get_arguments.constprop.0>:
> >        aa0103e2        mov     x2, x1
> >        91002003        add     x3, x0, #0x8
> >        f9408804        ldr     x4, [x0, #272]
> >        f8008444        str     x4, [x2], #8
> >        a9409404        ldp     x4, x5, [x0, #8]
> >        a9009424        stp     x4, x5, [x1, #8]
> >        a9418400        ldp     x0, x1, [x0, #24]
> >        a9010440        stp     x0, x1, [x2, #16]
> >        f9401060        ldr     x0, [x3, #32]
> >        f9001040        str     x0, [x2, #32]
> >        d65f03c0        ret
> >        d503201f        nop
> > 
> > After:
> >        a9408e82        ldp     x2, x3, [x20, #8]
> >        2a1603e0        mov     w0, w22
> >        f9400e84        ldr     x4, [x20, #24]
> >        f9408a81        ldr     x1, [x20, #272]
> >        9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>  
> 
> It's probably worth noting that __audit_syscall_entry() only takes 4
> syscall arguments, and hence the compiler has elided the copy of
> regs->regs[4] and regs->regs[5], which it apparently couldn't manage
> before.

Hasn't it actually inlined it and completely optimised away the regs[] array?
It looks (from the asm) as though syscall_get_arguments() is followed by:
	fn(regs[0], regs[1], regs[2], regs[3])

    David

> 
> > [1]: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/
> > Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> > ---
> >  arch/arm64/include/asm/syscall.h | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
> > index f3853047c28e..f3564ba97f7e 100644
> > --- a/arch/arm64/include/asm/syscall.h
> > +++ b/arch/arm64/include/asm/syscall.h
> > @@ -82,9 +82,11 @@ static inline void syscall_get_arguments(struct task_struct *task,
> >  					 unsigned long *args)
> >  {
> >  	args[0] = regs->orig_x0;
> > -	args++;
> > -
> > -	memcpy(args, &regs->regs[1], 5 * sizeof(args[0]));
> > +	args[1] = regs->regs[1];
> > +	args[2] = regs->regs[2];
> > +	args[3] = regs->regs[3];
> > +	args[4] = regs->regs[4];
> > +	args[5] = regs->regs[5];
> >  }  
> 
> FWIW, I think this is clearer than the 'args++' and the memcpy(), so I'm
> happy with this regardless of the performance concern.
> 
> However, as Dmitry says, we should keep this structurally the same as
> syscall_set_arguments(), and so we should update that in the same way.
> 
> Mark.
> 




* Re: [PATCH 2/2] arm64: Avoid memcpy() for syscall_get_arguments()
  2025-12-01 10:26     ` david laight
@ 2025-12-01 10:30       ` Mark Rutland
  0 siblings, 0 replies; 7+ messages in thread
From: Mark Rutland @ 2025-12-01 10:30 UTC
  To: david laight
  Cc: Jinjie Ruan, linux, catalin.marinas, will, chris, jcmvbkbc, akpm,
	macro, charlie, deller, ldv, rostedt, tglx, linux-arm-kernel,
	linux-kernel

On Mon, Dec 01, 2025 at 10:26:33AM +0000, david laight wrote:
> On Mon, 1 Dec 2025 10:13:54 +0000
> Mark Rutland <mark.rutland@arm.com> wrote:
> > On Thu, Nov 27, 2025 at 08:36:30PM +0800, Jinjie Ruan wrote:
> > > Before:
> > > <syscall_get_arguments.constprop.0>:
> > >        aa0103e2        mov     x2, x1
> > >        91002003        add     x3, x0, #0x8
> > >        f9408804        ldr     x4, [x0, #272]
> > >        f8008444        str     x4, [x2], #8
> > >        a9409404        ldp     x4, x5, [x0, #8]
> > >        a9009424        stp     x4, x5, [x1, #8]
> > >        a9418400        ldp     x0, x1, [x0, #24]
> > >        a9010440        stp     x0, x1, [x2, #16]
> > >        f9401060        ldr     x0, [x3, #32]
> > >        f9001040        str     x0, [x2, #32]
> > >        d65f03c0        ret
> > >        d503201f        nop
> > > 
> > > After:
> > >        a9408e82        ldp     x2, x3, [x20, #8]
> > >        2a1603e0        mov     w0, w22
> > >        f9400e84        ldr     x4, [x20, #24]
> > >        f9408a81        ldr     x1, [x20, #272]
> > >        9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>  
> > 
> > It's probably worth noting that __audit_syscall_entry() only takes 4
> > syscall arguments, and hence the compiler has elided the copy of
> > regs->regs[4] and regs->regs[5], which it apparently couldn't manage
> > before.
> 
> Hasn't it actually inlined it and completely optimised away the regs[] array?
> It looks (from the asm) as though syscall_get_arguments() is followed by:
> 	fn(regs[0], regs[1], regs[2], regs[3])

Yes; I was assuming that people could infer that.

I was pointing out that the elision of copies/loads of regs->regs[4] and
regs->regs[5] in particular was not a bug.

Mark.


