linux-arm-kernel.lists.infradead.org archive mirror
From: Ben Niu <BenNiu@meta.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: <catalin.marinas@arm.com>, <will@kernel.org>, <tytso@mit.edu>,
	<Jason@zx2c4.com>, <linux-arm-kernel@lists.infradead.org>,
	<niuben003@gmail.com>, <benniu@meta.com>,
	<linux-kernel@vger.kernel.org>
Subject: Withdraw [PATCH] tracing: Enable kprobe tracing for Arm64 asm functions
Date: Wed, 10 Dec 2025 12:16:17 -0800	[thread overview]
Message-ID: <aTnVEdv2j5ur48aF@meta.com> (raw)
In-Reply-To: <aRr6Ll7h29dpt7zz@J2N7QTR9R3>

On Mon, Nov 17, 2025 at 10:34:22AM +0000, Mark Rutland wrote:
> On Thu, Oct 30, 2025 at 11:07:51AM -0700, Ben Niu wrote:
> > On Thu, Oct 30, 2025 at 12:35:25PM +0000, Mark Rutland wrote:
> > > Is there something specific you want to trace, but cannot currently
> > > trace (on arm64)?
> > 
For some reason, we only saw the Arm64 Linux asm functions __arch_copy_to_user
and __arch_copy_from_user show up as hot in our workloads, not their x86
counterparts, so we are trying to understand and improve the performance of
those Arm64 asm functions.
> 
> Are you sure that's not an artifact of those being out-of-line on arm64,
> but inline on x86? On x86, the out-of-line forms are only used when the
> CPU doesn't have FSRM, and when the CPU *does* have FSRM, the logic gets
> inlined. See raw_copy_from_user(), raw_copy_to_user(), and
> copy_user_generic() in arch/x86/include/asm/uaccess_64.h.

On x86, INLINE_COPY_TO_USER is not defined in the latest Linux kernel or in our
internal branch, so _copy_to_user is always an out-of-line (extern) function,
which in turn inlines copy_user_generic. copy_user_generic executes rep movs
when the CPU supports FSRM (our case); otherwise it calls
rep_movs_alternative, which copies memory with plain movs.
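The dispatch described above can be modeled roughly as follows. This is a userspace sketch, not the kernel code: memcpy() stands in for the inlined "rep movsb", a byte loop stands in for rep_movs_alternative(), and the cpu_has_fsrm flag is an assumption (the real kernel patches this decision in via alternatives based on CPUID).

```c
/* Userspace model (NOT kernel code) of the x86-64 copy_user_generic()
 * dispatch: with FSRM the kernel inlines a bare "rep movsb"; without
 * it, rep_movs_alternative() copies with plain moves. memcpy() and a
 * byte loop stand in for those two paths here.
 */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool cpu_has_fsrm = true;  /* assumption: decided via CPUID/alternatives in reality */

static void rep_movs_alternative_model(void *dst, const void *src, size_t n)
{
	/* plain mov-based copy, standing in for rep_movs_alternative() */
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n--)
		*d++ = *s++;
}

static void copy_user_generic_model(void *dst, const void *src, size_t n)
{
	if (cpu_has_fsrm)
		memcpy(dst, src, n);	/* models the inlined "rep movsb" */
	else
		rep_movs_alternative_model(dst, src, n);
}
```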

FSRM rep movs is vectorized internally by the CPU, and Arm's FEAT_MOPS appears
similar to what FSRM does, but Neoverse V2 does not support FEAT_MOPS, so the
kernel falls back to sttr instructions to copy memory.
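For illustration, the sttr-based path amounts to a fixed-width chunked copy loop rather than a single CPU-managed instruction. Below is a rough userspace C model of that strategy; the 16-byte chunk mirrors the paired load/store (ldp/stp, or the unprivileged ldtr/sttr forms for user pointers) that the arm64 copy routines use, but the code is an illustration, not the kernel asm.

```c
/* Rough userspace model of the chunked copy strategy the arm64 copy
 * routines use when FEAT_MOPS is absent: move data in 16-byte chunks
 * (modeling ldp/stp, or ldtr/sttr for user accesses), then finish the
 * tail byte-by-byte. Illustration only, not the kernel asm.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void chunked_copy_model(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n >= 16) {		/* models one ldp/stp (or ldtr/sttr) pair */
		uint64_t lo, hi;

		memcpy(&lo, s, 8);
		memcpy(&hi, s + 8, 8);
		memcpy(d, &lo, 8);
		memcpy(d + 8, &hi, 8);
		s += 16;
		d += 16;
		n -= 16;
	}
	while (n--)			/* byte-by-byte tail */
		*d++ = *s++;
}
```

Each iteration of the main loop is fixed-cost regardless of microarchitecture, which is why a CPU-internal primitive like FSRM or FEAT_MOPS can beat it at the sizes where the hardware vectorizes aggressively.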

> Have you checked that inlining is not skewing your results, and
> artificially making those look hotter on arm64 by virtue of centralizing
> samples to the same IP/PC range?

As mentioned above, _copy_to_user is not inlined on x86.

> Can you share any information on those workloads? e.g. which callchains
> were hot?

Please reach out to James Greenhalgh and Chris Goodyer at Arm for more details
about those workloads; I can't share them in a public channel.

Using bpftrace, we found that most __arch_copy_to_user calls are for sizes in
the 2K-4K range. I wrote a simple microbenchmark to compare the performance of
copy_to_user/copy_from_user on x64 and Arm64; see
https://github.com/mcfi/benchmark/tree/main/ku_copy
The data collected are below. You can see that Arm64 kernel/user copy
throughput dips below x64 when the size is in the 2K-16K range.

Routine	        Size (bytes)	Arm64 bytes/s as % of x64 bytes/s
copy_to_user	8	203.4%
copy_from_user	8	248.5%
copy_to_user	16	194.6%
copy_from_user	16	240.6%
copy_to_user	32	195.0%
copy_from_user	32	236.5%
copy_to_user	64	193.2%
copy_from_user	64	240.4%
copy_to_user	128	211.0%
copy_from_user	128	273.0%
copy_to_user	256	185.3%
copy_from_user	256	229.3%
copy_to_user	512	152.9%
copy_from_user	512	182.4%
copy_to_user	1024	121.0%
copy_from_user	1024	141.9%
copy_to_user	2048	95.1%
copy_from_user	2048	108.7%
copy_to_user	4096	78.9%
copy_from_user	4096	85.7%
copy_to_user	8192	69.7%
copy_from_user	8192	73.5%
copy_to_user	16384	65.9%
copy_from_user	16384	68.8%
copy_to_user	32768	117.3%
copy_from_user	32768	118.6%
copy_to_user	65536	115.5%
copy_from_user	65536	117.4%
copy_to_user	131072	114.6%
copy_from_user	131072	113.3%
copy_to_user	262144	114.4%
copy_from_user	262144	109.3%

I sent out a patch for faster __arch_copy_from_user/__arch_copy_to_user;
please see message ID 20251018052237.1368504-2-benniu@meta.com for details.

> Mark.

In addition, I'm withdrawing this patch because we found a workaround for
tracing __arch_copy_to_user/__arch_copy_from_user using watchpoints; message ID
aTjpzfqTEGPcird5@meta.com has all the details. Thank you for reviewing this
patch and for all your helpful comments.

Ben



Thread overview: 8+ messages
2025-10-27 18:17 [PATCH] tracing: Enable kprobe tracing for Arm64 asm functions Ben Niu
2025-10-29 14:33 ` Mark Rutland
2025-10-29 23:53   ` Ben Niu
2025-10-30 12:35     ` Mark Rutland
2025-10-30 18:07       ` Ben Niu
2025-11-17 10:34         ` Mark Rutland
2025-12-10 20:16           ` Ben Niu [this message]
2025-12-12 16:33             ` Withdraw " Mark Rutland
