From: david laight <david.laight@runbox.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>,
	linux@armlinux.org.uk, catalin.marinas@arm.com, will@kernel.org,
	chris@zankel.net, jcmvbkbc@gmail.com, akpm@linux-foundation.org,
	macro@orcam.me.uk, charlie@rivosinc.com, deller@gmx.de,
	ldv@strace.io, rostedt@goodmis.org, tglx@linutronix.de,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] arm64: Avoid memcpy() for syscall_get_arguments()
Date: Mon, 1 Dec 2025 10:26:33 +0000
Message-ID: <20251201102633.17a99afc@pumpkin>
In-Reply-To: <aS1qYhHhZK3CD_bU@J2N7QTR9R3>

On Mon, 1 Dec 2025 10:13:54 +0000
Mark Rutland <mark.rutland@arm.com> wrote:

> On Thu, Nov 27, 2025 at 08:36:30PM +0800, Jinjie Ruan wrote:
> > Do not use memcpy() to extract syscall arguments from struct pt_regs
> > but rather just perform direct assignments.
> > 
> > The performance benchmarks with Generic Entry patch[1] with audit on
> > from perf bench basic syscall on kunpeng920 gives roughly a 1%
> > performance uplift and also aligns the implementation with
> > x86 and RISC-V.
> > 
> > | Metric     | W/O this patch | With this patch | Change    |
> > | ---------- | -------------- | --------------- | --------- |
> > | Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
> > | usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
> > | ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |
> > 
> > Before:
> > <syscall_get_arguments.constprop.0>:
> >        aa0103e2        mov     x2, x1
> >        91002003        add     x3, x0, #0x8
> >        f9408804        ldr     x4, [x0, #272]
> >        f8008444        str     x4, [x2], #8
> >        a9409404        ldp     x4, x5, [x0, #8]
> >        a9009424        stp     x4, x5, [x1, #8]
> >        a9418400        ldp     x0, x1, [x0, #24]
> >        a9010440        stp     x0, x1, [x2, #16]
> >        f9401060        ldr     x0, [x3, #32]
> >        f9001040        str     x0, [x2, #32]
> >        d65f03c0        ret
> >        d503201f        nop
> > 
> > After:
> >        a9408e82        ldp     x2, x3, [x20, #8]
> >        2a1603e0        mov     w0, w22
> >        f9400e84        ldr     x4, [x20, #24]
> >        f9408a81        ldr     x1, [x20, #272]
> >        9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>  
> 
> It's probably worth noting that __audit_syscall_entry() only takes 4
> syscall arguments, and hence the compiler has elided the copy of
> regs->regs[4] and regs->regs[5], which it apparently couldn't manage
> before.

Hasn't it actually inlined it and completely optimised away the args[] array?
It looks (from the asm) as though syscall_get_arguments() is followed by:
	fn(args[0], args[1], args[2], args[3])
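
To make that reading concrete, here is a minimal userspace sketch of the
caller shape I have in mind. The struct is a simplified stand-in for arm64's
struct pt_regs (not the kernel layout), and get_args(), consume4() and
audit_like_entry() are hypothetical names for illustration only. With direct
assignments the compiler can see that the caller's local array never escapes,
so it is free to forward the four loads straight into the call and drop the
stores for elements 4 and 5.

```c
#include <assert.h>

/* Simplified stand-in for arm64's struct pt_regs; NOT the kernel layout. */
struct pt_regs {
	unsigned long regs[31];
	unsigned long orig_x0;
};

/* Body of the patched syscall_get_arguments(), minus the task argument. */
static inline void get_args(const struct pt_regs *regs, unsigned long *args)
{
	args[0] = regs->orig_x0;
	args[1] = regs->regs[1];
	args[2] = regs->regs[2];
	args[3] = regs->regs[3];
	args[4] = regs->regs[4];
	args[5] = regs->regs[5];
}

/* Hypothetical consumer standing in for __audit_syscall_entry(). */
unsigned long consume4(int nr, unsigned long a0, unsigned long a1,
		       unsigned long a2, unsigned long a3)
{
	return (unsigned long)nr + a0 + a1 + a2 + a3;
}

/* Hypothetical caller: fills a local array, then uses only four entries. */
unsigned long audit_like_entry(const struct pt_regs *regs, int nr)
{
	unsigned long args[6];

	get_args(regs, args);
	return consume4(nr, args[0], args[1], args[2], args[3]);
}

/* Usage check: regs[0], regs[4] and regs[5] must not affect the result. */
void self_check(void)
{
	struct pt_regs r = { .regs = { 99, 1, 2, 3, 4, 5 }, .orig_x0 = 10 };

	assert(audit_like_entry(&r, 7) == 23UL);
}
```

Whether a given compiler really drops the array depends on the optimisation
level and on get_args() being inlined, so treat this as an illustration of
the pattern rather than a claim about the exact code generated.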

    David

> 
> > [1]: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/
> > Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> > ---
> >  arch/arm64/include/asm/syscall.h | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
> > index f3853047c28e..f3564ba97f7e 100644
> > --- a/arch/arm64/include/asm/syscall.h
> > +++ b/arch/arm64/include/asm/syscall.h
> > @@ -82,9 +82,11 @@ static inline void syscall_get_arguments(struct task_struct *task,
> >  					 unsigned long *args)
> >  {
> >  	args[0] = regs->orig_x0;
> > -	args++;
> > -
> > -	memcpy(args, &regs->regs[1], 5 * sizeof(args[0]));
> > +	args[1] = regs->regs[1];
> > +	args[2] = regs->regs[2];
> > +	args[3] = regs->regs[3];
> > +	args[4] = regs->regs[4];
> > +	args[5] = regs->regs[5];
> >  }  
> 
> FWIW, I think this is clearer than the 'args++' and the memcpy(), so I'm
> happy with this regardless of the performance concern.
> 
> However, as Dmitry says, we should keep this structurally the same as
> syscall_set_arguments(), and so we should update that in the same way.
> 
> Mark.
> 
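
For the syscall_set_arguments() side, a matching shape might look like the
sketch below. This is only an assumption on my part: the thread does not show
the current arm64 syscall_set_arguments(), and I am assuming it writes
args[0..5] into regs->regs[0..5] and mirrors the first argument into orig_x0.
The struct is again a simplified stand-in, and set_args()/get_args() are
hypothetical names that drop the task argument.

```c
#include <assert.h>

/* Simplified stand-in for arm64's struct pt_regs; NOT the kernel layout. */
struct pt_regs {
	unsigned long regs[31];
	unsigned long orig_x0;
};

/* Direct-assignment getter, as in the patch under discussion. */
static inline void get_args(const struct pt_regs *regs, unsigned long *args)
{
	args[0] = regs->orig_x0;
	args[1] = regs->regs[1];
	args[2] = regs->regs[2];
	args[3] = regs->regs[3];
	args[4] = regs->regs[4];
	args[5] = regs->regs[5];
}

/*
 * Assumed setter written in the same style. The orig_x0 update is an
 * assumption so that a later get_args() returns the new first argument.
 */
static inline void set_args(struct pt_regs *regs, const unsigned long *args)
{
	regs->regs[0] = args[0];
	regs->regs[1] = args[1];
	regs->regs[2] = args[2];
	regs->regs[3] = args[3];
	regs->regs[4] = args[4];
	regs->regs[5] = args[5];
	regs->orig_x0 = regs->regs[0];
}

/* Usage check: values written by set_args() come back from get_args(). */
void roundtrip_check(void)
{
	struct pt_regs r = { .regs = { 0 }, .orig_x0 = 0 };
	unsigned long in[6] = { 11, 12, 13, 14, 15, 16 };
	unsigned long out[6] = { 0 };
	int i;

	set_args(&r, in);
	get_args(&r, out);
	for (i = 0; i < 6; i++)
		assert(out[i] == in[i]);
}
```

Keeping the two helpers as mirror images makes it easy to check by eye that
every register the getter reads is one the setter writes.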



Thread overview: 7+ messages
2025-11-27 12:36 [PATCH 0/2] syscall: Cleanup and improve syscall_get_arguments() Jinjie Ruan
2025-11-27 12:36 ` [PATCH 1/2] syscall.h: Remove unused SYSCALL_MAX_ARGS Jinjie Ruan
2025-11-27 12:36 ` [PATCH 2/2] arm64: Avoid memcpy() for syscall_get_arguments() Jinjie Ruan
2025-11-27 14:35   ` Dmitry V. Levin
2025-12-01 10:13   ` Mark Rutland
2025-12-01 10:26     ` david laight [this message]
2025-12-01 10:30       ` Mark Rutland
