From: Mark Rutland <mark.rutland@arm.com>
To: "Havens, Austin" <austin.havens@anritsu.com>
Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com>,
"will@kernel.org" <will@kernel.org>,
"michal.simek@amd.com" <michal.simek@amd.com>,
"Suresh, Siddarth" <siddarth.suresh@anritsu.com>,
"Lui, Vincent" <vincent.lui@anritsu.com>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>
Subject: Re: Slowdown copying data between kernel versions 4.19 and 5.15
Date: Thu, 29 Jun 2023 15:24:33 +0100
Message-ID: <ZJ2UIfbCWeLzRMfs@FVFF77S0Q05N>
In-Reply-To: <OS3P301MB04215E0BD835F0742EA377D9EE24A@OS3P301MB0421.JPNP301.PROD.OUTLOOK.COM>
On Wed, Jun 28, 2023 at 09:38:14PM +0000, Havens, Austin wrote:
> >After some investigation I am guessing the issue is either in the iovector
> >iteration changes (around
> >https://elixir.bootlin.com/linux/v5.15/source/lib/iov_iter.c#L922 ) or the
> >lower level changes in arch/arm64/lib/copy_from_user.S, but I am pretty out
> >of my depth so it is just speculation.
>
> After comparing the disassembly of __arch_copy_from_user on both kernels and
> going through commit logs, I figured out the slowdown was mostly due to
> the changes from commit c703d80130b1c9d6783f4cbb9516fd5fe4a750d, specifically
> the changes to uao_ldp.

For the benefit of others, that's commit:

  fc703d80130b1c9d ("arm64: uaccess: split user/kernel routine")

>
> diff --git a/arch/arm64/include/asm/asm-uaccess.h b/arch/arm64/include/asm/asm-uaccess.h
> index 2c26ca5b7bb0..2b5454fa0f24 100644
> --- a/arch/arm64/include/asm/asm-uaccess.h
> +++ b/arch/arm64/include/asm/asm-uaccess.h
> @@ -59,62 +59,32 @@ alternative_else_nop_endif
> #endif
>
> /*
> - * Generate the assembly for UAO alternatives with exception table entries.
> + * Generate the assembly for LDTR/STTR with exception table entries.
> * This is complicated as there is no post-increment or pair versions of the
> * unprivileged instructions, and USER() only works for single instructions.
> */
> -#ifdef CONFIG_ARM64_UAO
> .macro uao_ldp l, reg1, reg2, addr, post_inc
> - alternative_if_not ARM64_HAS_UAO
> -8888: ldp \reg1, \reg2, [\addr], \post_inc;
> -8889: nop;
> - nop;
> - alternative_else
> - ldtr \reg1, [\addr];
> - ldtr \reg2, [\addr, #8];
> - add \addr, \addr, \post_inc;
> - alternative_endif
> +8888: ldtr \reg1, [\addr];
> +8889: ldtr \reg2, [\addr, #8];
> + add \addr, \addr, \post_inc;
>
> _asm_extable 8888b,\l;
> _asm_extable 8889b,\l;
> .endm
>
> I could not directly revert the changes to test since more names changed in
> other commits than I cared to figure out, but I hacked out that change, and
> saw that the performance of the test program was basically back to normal.
>
> diff --git a/arch/arm64/include/asm/asm-uaccess.h b/arch/arm64/include/asm/asm-uaccess.h
> index ccedf548dac9..2ddf7eba46fd 100644
> --- a/arch/arm64/include/asm/asm-uaccess.h
> +++ b/arch/arm64/include/asm/asm-uaccess.h
> @@ -64,9 +64,9 @@ alternative_else_nop_endif
> * unprivileged instructions, and USER() only works for single instructions.
> */
> .macro user_ldp l, reg1, reg2, addr, post_inc
> -8888: ldtr \reg1, [\addr];
> -8889: ldtr \reg2, [\addr, #8];
> - add \addr, \addr, \post_inc;
> +8888: ldp \reg1, \reg2, [\addr], \post_inc;
> +8889: nop;
> + nop;

As Catalin noted, we can't make that change generally as it'd be broken for any
system with PAN, and in general we *really* want to use LDTR/STTR for user
accesses to catch any misuse with kernel pointers.

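To make the difference between the two macro bodies concrete, here is a rough user-space C model of what each variant computes. It is only a sketch: the function names `model_ldp` and `model_ldtr_pair` are made up for illustration, and plain C cannot express the unprivileged access that LDTR performs, so this only shows the data flow and the shape of the memory accesses.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of: ldp reg1, reg2, [addr], post_inc
 * One 16-byte read, then the post-increment of the address.
 */
static const unsigned char *model_ldp(uint64_t *reg1, uint64_t *reg2,
                                      const unsigned char *addr,
                                      size_t post_inc)
{
    uint64_t pair[2];

    memcpy(pair, addr, sizeof(pair));   /* single 16-byte access */
    *reg1 = pair[0];
    *reg2 = pair[1];
    return addr + post_inc;
}

/*
 * Hypothetical model of:
 *   ldtr reg1, [addr]
 *   ldtr reg2, [addr, #8]
 *   add  addr, addr, post_inc
 * Two separate 8-byte reads, then an explicit add.
 */
static const unsigned char *model_ldtr_pair(uint64_t *reg1, uint64_t *reg2,
                                            const unsigned char *addr,
                                            size_t post_inc)
{
    memcpy(reg1, addr, sizeof(*reg1));      /* first 8-byte access */
    memcpy(reg2, addr + 8, sizeof(*reg2));  /* second 8-byte access */
    return addr + post_inc;
}
```

Both functions leave the same values in the two registers and the same final address, so the variants are interchangeable as far as the copied data goes. What differs is the number and size of the accesses and the privilege they are made with, neither of which this model can measure.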
> Profiling with the hacked __arch_copy_from_user
> root@ahraptor:/tmp# perf stat -einstructions -ecycles -e ld_dep_stall -e read_alloc -e dTLB-load-misses /mnt/usrroot/test_copy
>
> Performance counter stats for '/mnt/usrroot/test_copy':
>
> 11822342 instructions # 0.23 insn per cycle
> 50689594 cycles
> 37627922 ld_dep_stall
> 17933 read_alloc
> 3421 dTLB-load-misses
>
> 0.043440253 seconds time elapsed
>
> 0.004382000 seconds user
> 0.039442000 seconds sys
>
> Unfortunately the hack crashes in other cases so it is not a viable solution
> for us. Also, on our actual workload there is still a small difference in
> performance remaining that I have not tracked down yet (I am guessing it has
> to do with the dTLB-load-misses remaining higher).
>
> Note, I think that the slow down is only noticeable in cases like ours where
> the data being copied from is not in cache (for us, because the FPGA writes
> it).
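For reference, the counters quoted above can be reduced to two ratios. This is a small sketch in plain C, not part of any kernel or perf tooling. The helper name `per_cycle` and the constant names are made up for illustration, and it assumes that ld_dep_stall counts stalled cycles, which the quoted output does not state.

```c
#include <stdint.h>

/* Counter values copied from the quoted perf stat output (hacked kernel). */
static const uint64_t INSTRUCTIONS = 11822342ULL;
static const uint64_t CYCLES       = 50689594ULL;
static const uint64_t LD_DEP_STALL = 37627922ULL;

/* Hypothetical helper: events per cycle, guarding against a zero count. */
static double per_cycle(uint64_t events, uint64_t cycles)
{
    return cycles ? (double)events / (double)cycles : 0.0;
}

/* Instructions per cycle; should match perf's "insn per cycle" column. */
static double ipc(void)
{
    return per_cycle(INSTRUCTIONS, CYCLES);
}

/* Share of cycles attributed to ld_dep_stall, under the assumption above. */
static double load_stall_share(void)
{
    return per_cycle(LD_DEP_STALL, CYCLES);
}
```

With these inputs `ipc()` comes out at about 0.23, matching the figure perf prints, and `load_stall_share()` at about 0.74. If the assumption about the counter holds, roughly three quarters of the cycles in this run are load-dependency stalls even with the LDP variant.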

When you say "is not in cache", what exactly do you mean? If this were just the
latency of filling a cache I wouldn't expect the size of the first access to
make a difference, so I'm assuming the source buffer is not mapped with
cacheable memory attributes, which we generally assume.

Which memory attributes are the source and destination buffers mapped with? Is
that Normal-WB, Normal-NC, or Device? How exactly has that memory been mapped?

I'm assuming this is with some out-of-tree driver; if that's in a public tree
could you please provide a pointer to it?

Thanks,
Mark.