linux-arm-kernel.lists.infradead.org archive mirror
From: david laight <david.laight@runbox.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>,
	linux@armlinux.org.uk, catalin.marinas@arm.com, will@kernel.org,
	chris@zankel.net, jcmvbkbc@gmail.com, akpm@linux-foundation.org,
	macro@orcam.me.uk, charlie@rivosinc.com, deller@gmx.de,
	ldv@strace.io, rostedt@goodmis.org, tglx@linutronix.de,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] arm64: Avoid memcpy() for syscall_get_arguments()
Date: Mon, 1 Dec 2025 10:26:33 +0000
Message-ID: <20251201102633.17a99afc@pumpkin>
In-Reply-To: <aS1qYhHhZK3CD_bU@J2N7QTR9R3>

On Mon, 1 Dec 2025 10:13:54 +0000
Mark Rutland <mark.rutland@arm.com> wrote:

> On Thu, Nov 27, 2025 at 08:36:30PM +0800, Jinjie Ruan wrote:
> > Do not use memcpy() to extract syscall arguments from struct pt_regs
> > but rather just perform direct assignments.
> > 
> > The performance benchmarks with Generic Entry patch[1] with audit on
> > from perf bench basic syscall on kunpeng920 gives roughly a 1%
> > performance uplift and also aligns the implementation with
> > x86 and RISC-V.
> > 
> > | Metric     | W/O this patch | With this patch | Change    |
> > | ---------- | -------------- | --------------- | --------- |
> > | Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
> > | usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
> > | ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |
> > 
> > Before:
> > <syscall_get_arguments.constprop.0>:
> >        aa0103e2        mov     x2, x1
> >        91002003        add     x3, x0, #0x8
> >        f9408804        ldr     x4, [x0, #272]
> >        f8008444        str     x4, [x2], #8
> >        a9409404        ldp     x4, x5, [x0, #8]
> >        a9009424        stp     x4, x5, [x1, #8]
> >        a9418400        ldp     x0, x1, [x0, #24]
> >        a9010440        stp     x0, x1, [x2, #16]
> >        f9401060        ldr     x0, [x3, #32]
> >        f9001040        str     x0, [x2, #32]
> >        d65f03c0        ret
> >        d503201f        nop
> > 
> > After:
> >        a9408e82        ldp     x2, x3, [x20, #8]
> >        2a1603e0        mov     w0, w22
> >        f9400e84        ldr     x4, [x20, #24]
> >        f9408a81        ldr     x1, [x20, #272]
> >        9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>  
> 
> It's probably worth noting that __audit_syscall_entry() only takes 4
> syscall arguments, and hence the compiler has elided the copy of
> regs->regs[4] and regs->regs[5], which it apparently couldn't manage
> before.

Hasn't the compiler actually inlined syscall_get_arguments() and completely
optimised away the args[] array?
It looks (from the asm) as though syscall_get_arguments() is followed by:
	fn(args[0], args[1], args[2], args[3])
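
To illustrate the shape I mean, here is a minimal user-space sketch. It is not
the kernel code: struct demo_pt_regs, demo_get_arguments(), demo_fn() and
demo_caller() are hypothetical stand-ins, and whether a given compiler really
removes the array depends on the optimisation level and on inlining.

```c
#include <assert.h>

/* Hypothetical user-space stand-in for struct pt_regs; only the fields used here. */
struct demo_pt_regs {
	unsigned long regs[31];
	unsigned long orig_x0;
};

/* Same shape as the patched helper: six direct assignments. */
static inline void demo_get_arguments(const struct demo_pt_regs *regs,
				      unsigned long *args)
{
	args[0] = regs->orig_x0;
	args[1] = regs->regs[1];
	args[2] = regs->regs[2];
	args[3] = regs->regs[3];
	args[4] = regs->regs[4];
	args[5] = regs->regs[5];
}

/* Hypothetical four-argument consumer standing in for fn(). */
static unsigned long demo_fn(unsigned long a0, unsigned long a1,
			     unsigned long a2, unsigned long a3)
{
	return a0 + 10 * a1 + 100 * a2 + 1000 * a3;
}

static unsigned long demo_caller(const struct demo_pt_regs *regs)
{
	unsigned long args[6];

	demo_get_arguments(regs, args);
	/*
	 * args[4] and args[5] are never read, so an optimising compiler is
	 * free to drop those stores and keep args[0..3] in registers.
	 */
	return demo_fn(args[0], args[1], args[2], args[3]);
}

static void demo_check(void)
{
	struct demo_pt_regs regs = { .regs = { 9, 2, 3, 4, 5, 6 }, .orig_x0 = 1 };

	/* regs[0] (9) is ignored: the first argument comes from orig_x0. */
	assert(demo_caller(&regs) == 1 + 10 * 2 + 100 * 3 + 1000 * 4);
}
```

With plain assignments every element is a separate scalar store, so the unused
ones are easy for the compiler to discard once the helper is inlined.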

    David

> 
> > [1]: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/
> > Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> > ---
> >  arch/arm64/include/asm/syscall.h | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
> > index f3853047c28e..f3564ba97f7e 100644
> > --- a/arch/arm64/include/asm/syscall.h
> > +++ b/arch/arm64/include/asm/syscall.h
> > @@ -82,9 +82,11 @@ static inline void syscall_get_arguments(struct task_struct *task,
> >  					 unsigned long *args)
> >  {
> >  	args[0] = regs->orig_x0;
> > -	args++;
> > -
> > -	memcpy(args, &regs->regs[1], 5 * sizeof(args[0]));
> > +	args[1] = regs->regs[1];
> > +	args[2] = regs->regs[2];
> > +	args[3] = regs->regs[3];
> > +	args[4] = regs->regs[4];
> > +	args[5] = regs->regs[5];
> >  }  
> 
> FWIW, I think this is clearer than the 'args++' and the memcpy(), so I'm
> happy with this regardless of the performance concern.
> 
> However, as Dmitry says, we should keep this structurally the same as
> syscall_set_arguments(), and so we should update that in the same way.
> 
> Mark.
> 
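
For reference, a rough sketch of what the matching syscall_set_arguments()
change might look like. This is an assumption, not taken from the tree: it
supposes the setter writes six values into regs->regs[0..5] and mirrors the
first one into orig_x0. The mock_ names and struct mock_pt_regs are
hypothetical user-space stand-ins.

```c
#include <assert.h>

/* Hypothetical user-space stand-in for struct pt_regs; only the fields used here. */
struct mock_pt_regs {
	unsigned long regs[31];
	unsigned long orig_x0;
};

/* Getter in the form proposed by the patch. */
static inline void mock_syscall_get_arguments(const struct mock_pt_regs *regs,
					      unsigned long *args)
{
	args[0] = regs->orig_x0;
	args[1] = regs->regs[1];
	args[2] = regs->regs[2];
	args[3] = regs->regs[3];
	args[4] = regs->regs[4];
	args[5] = regs->regs[5];
}

/* Setter rewritten the same way, with direct assignments instead of memcpy(). */
static inline void mock_syscall_set_arguments(struct mock_pt_regs *regs,
					      const unsigned long *args)
{
	regs->regs[0] = args[0];
	regs->regs[1] = args[1];
	regs->regs[2] = args[2];
	regs->regs[3] = args[3];
	regs->regs[4] = args[4];
	regs->regs[5] = args[5];
	/* Assumption: orig_x0 mirrors the first argument so the getter returns it. */
	regs->orig_x0 = regs->regs[0];
}

/* Round trip: whatever is set must come back unchanged from the getter. */
static void mock_roundtrip_check(void)
{
	struct mock_pt_regs regs = { .orig_x0 = 0 };
	const unsigned long in[6] = { 10, 11, 12, 13, 14, 15 };
	unsigned long out[6] = { 0 };
	int i;

	mock_syscall_set_arguments(&regs, in);
	mock_syscall_get_arguments(&regs, out);
	for (i = 0; i < 6; i++)
		assert(out[i] == in[i]);
}
```

Keeping both helpers in this form makes the orig_x0 special case the only
asymmetry between them.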



Thread overview: 7+ messages
2025-11-27 12:36 [PATCH 0/2] syscall: Cleanup and improve syscall_get_arguments() Jinjie Ruan
2025-11-27 12:36 ` [PATCH 1/2] syscall.h: Remove unused SYSCALL_MAX_ARGS Jinjie Ruan
2025-11-27 12:36 ` [PATCH 2/2] arm64: Avoid memcpy() for syscall_get_arguments() Jinjie Ruan
2025-11-27 14:35   ` Dmitry V. Levin
2025-12-01 10:13   ` Mark Rutland
2025-12-01 10:26     ` david laight [this message]
2025-12-01 10:30       ` Mark Rutland
