* [PATCH v2 0/2] riscv: enable EFFICIENT_UNALIGNED_ACCESS and DCACHE_WORD_ACCESS
@ 2023-12-03 13:57 Jisheng Zhang
2023-12-03 13:57 ` [PATCH v2 1/2] riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS Jisheng Zhang
2023-12-03 13:57 ` [PATCH v2 2/2] riscv: select DCACHE_WORD_ACCESS for efficient unaligned access HW Jisheng Zhang
0 siblings, 2 replies; 11+ messages in thread
From: Jisheng Zhang @ 2023-12-03 13:57 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou
Cc: Conor Dooley, linux-riscv, linux-kernel
Some riscv implementations such as T-HEAD's C906, C908, C910 and C920
support efficient unaligned access. For performance reasons, we want
to enable HAVE_EFFICIENT_UNALIGNED_ACCESS on these platforms. To
avoid performance regressions on platforms without efficient unaligned
access, HAVE_EFFICIENT_UNALIGNED_ACCESS can't be selected globally.
Runtime code patching based on the detected access speed would solve
this, but that's not easy: it involves lots of work to modify various
subsystems such as net, mm, lib and so on. That can be done step by
step.
So let's take an easier solution: add support for efficient unaligned
access and hide the support under NONPORTABLE.
patch1 introduces RISCV_EFFICIENT_UNALIGNED_ACCESS, which depends on
NONPORTABLE. If users know at config time that the kernel will only
run on hardware with efficient unaligned access, they can enable it.
Obviously, a generic unified kernel Image shouldn't enable it.
patch2 adds DCACHE_WORD_ACCESS support when both MMU and
RISCV_EFFICIENT_UNALIGNED_ACCESS are enabled.
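For reference, a kernel configured this way would carry a fragment
like the following (illustrative; NONPORTABLE already exists upstream,
and DCACHE_WORD_ACCESS is then selected automatically by patch2):

```
CONFIG_NONPORTABLE=y
CONFIG_RISCV_EFFICIENT_UNALIGNED_ACCESS=y
```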
The test program and steps below show how much performance can be improved:
$ cat tt.c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#define ITERATIONS 1000000
#define PATH "123456781234567812345678123456781"
int main(void)
{
	unsigned long i;
	struct stat buf;

	for (i = 0; i < ITERATIONS; i++)
		stat(PATH, &buf);

	return 0;
}
$ gcc -O2 tt.c
$ touch 123456781234567812345678123456781
$ time ./a.out
Per my test on a T-HEAD C910 platform, the above test's performance is
improved by about 7.5%.
Since v1:
- fix typo in commit msg
- fix build error if NOMMU
Jisheng Zhang (2):
riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS
riscv: select DCACHE_WORD_ACCESS for efficient unaligned access HW
arch/riscv/Kconfig | 13 +++++++++++
arch/riscv/include/asm/asm-extable.h | 15 ++++++++++++
arch/riscv/include/asm/word-at-a-time.h | 27 +++++++++++++++++++++
arch/riscv/mm/extable.c | 31 +++++++++++++++++++++++++
4 files changed, 86 insertions(+)
--
2.42.0
^ permalink raw reply [flat|nested] 11+ messages in thread* [PATCH v2 1/2] riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS 2023-12-03 13:57 [PATCH v2 0/2] riscv: enable EFFICIENT_UNALIGNED_ACCESS and DCACHE_WORD_ACCESS Jisheng Zhang @ 2023-12-03 13:57 ` Jisheng Zhang 2023-12-04 19:15 ` Charlie Jenkins 2023-12-05 8:39 ` Qingfang DENG 2023-12-03 13:57 ` [PATCH v2 2/2] riscv: select DCACHE_WORD_ACCESS for efficient unaligned access HW Jisheng Zhang 1 sibling, 2 replies; 11+ messages in thread From: Jisheng Zhang @ 2023-12-03 13:57 UTC (permalink / raw) To: Paul Walmsley, Palmer Dabbelt, Albert Ou Cc: Conor Dooley, linux-riscv, linux-kernel Some riscv implementations such as T-HEAD's C906, C908, C910 and C920 support efficient unaligned access, for performance reason we want to enable HAVE_EFFICIENT_UNALIGNED_ACCESS on these platforms. To avoid performance regressions on other non efficient unaligned access platforms, HAVE_EFFICIENT_UNALIGNED_ACCESS can't be globally selected. To solve this problem, runtime code patching based on the detected speed is a good solution. But that's not easy, it involves lots of work to modify vairous subsystems such as net, mm, lib and so on. This can be done step by step. So let's take an easier solution: add support to efficient unaligned access and hide the support under NONPORTABLE. Now let's introduce RISCV_EFFICIENT_UNALIGNED_ACCESS which depends on NONPORTABLE, if users know during config time that the kernel will be only run on those efficient unaligned access hw platforms, they can enable it. Obviously, generic unified kernel Image shouldn't enable it. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> --- arch/riscv/Kconfig | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 7f8aa25457ba..0a76209e9b02 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -654,6 +654,18 @@ config RISCV_MISALIGNED load/store for both kernel and userspace. 
When disable, misaligned accesses will generate SIGBUS in userspace and panic in kernel. +config RISCV_EFFICIENT_UNALIGNED_ACCESS + bool "Use unaligned access for some functions" + depends on NONPORTABLE + select HAVE_EFFICIENT_UNALIGNED_ACCESS + default n + help + Say Y here if you want the kernel only run on hardware platforms which + support efficient unaligned access, then unaligned access will be used + in some functions for optimized performance. + + If unsure what to do here, say N. + endmenu # "Platform type" menu "Kernel features" -- 2.42.0 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/2] riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS 2023-12-03 13:57 ` [PATCH v2 1/2] riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS Jisheng Zhang @ 2023-12-04 19:15 ` Charlie Jenkins 2023-12-05 2:14 ` Eric Biggers 2023-12-05 8:39 ` Qingfang DENG 1 sibling, 1 reply; 11+ messages in thread From: Charlie Jenkins @ 2023-12-04 19:15 UTC (permalink / raw) To: Jisheng Zhang Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Conor Dooley, linux-riscv, linux-kernel On Sun, Dec 03, 2023 at 09:57:52PM +0800, Jisheng Zhang wrote: > Some riscv implementations such as T-HEAD's C906, C908, C910 and C920 > support efficient unaligned access, for performance reason we want > to enable HAVE_EFFICIENT_UNALIGNED_ACCESS on these platforms. To > avoid performance regressions on other non efficient unaligned access > platforms, HAVE_EFFICIENT_UNALIGNED_ACCESS can't be globally selected. > > To solve this problem, runtime code patching based on the detected > speed is a good solution. But that's not easy, it involves lots of > work to modify vairous subsystems such as net, mm, lib and so on. > This can be done step by step. > > So let's take an easier solution: add support to efficient unaligned > access and hide the support under NONPORTABLE. > > Now let's introduce RISCV_EFFICIENT_UNALIGNED_ACCESS which depends on > NONPORTABLE, if users know during config time that the kernel will be > only run on those efficient unaligned access hw platforms, they can > enable it. Obviously, generic unified kernel Image shouldn't enable it. > > Signed-off-by: Jisheng Zhang <jszhang@kernel.org> > --- > arch/riscv/Kconfig | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > index 7f8aa25457ba..0a76209e9b02 100644 > --- a/arch/riscv/Kconfig > +++ b/arch/riscv/Kconfig > @@ -654,6 +654,18 @@ config RISCV_MISALIGNED > load/store for both kernel and userspace. 
When disable, misaligned > accesses will generate SIGBUS in userspace and panic in kernel. > > +config RISCV_EFFICIENT_UNALIGNED_ACCESS There already exists hwprobe for this purpose. If kernel code wants to leverage the efficient unaligned accesses of hardware, it can use static keys. I have a patch that will set this static key if the hardware was detected to have fast unaligned accesses: https://lore.kernel.org/linux-riscv/20231117-optimize_checksum-v11-2-7d9d954fe361@rivosinc.com/ - Charlie > + bool "Use unaligned access for some functions" > + depends on NONPORTABLE > + select HAVE_EFFICIENT_UNALIGNED_ACCESS > + default n > + help > + Say Y here if you want the kernel only run on hardware platforms which > + support efficient unaligned access, then unaligned access will be used > + in some functions for optimized performance. > + > + If unsure what to do here, say N. > + > endmenu # "Platform type" > > menu "Kernel features" > -- > 2.42.0 > > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/2] riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS 2023-12-04 19:15 ` Charlie Jenkins @ 2023-12-05 2:14 ` Eric Biggers 2023-12-05 13:53 ` Jisheng Zhang 0 siblings, 1 reply; 11+ messages in thread From: Eric Biggers @ 2023-12-05 2:14 UTC (permalink / raw) To: Charlie Jenkins Cc: Jisheng Zhang, Paul Walmsley, Palmer Dabbelt, Albert Ou, Conor Dooley, linux-riscv, linux-kernel On Mon, Dec 04, 2023 at 11:15:28AM -0800, Charlie Jenkins wrote: > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > > index 7f8aa25457ba..0a76209e9b02 100644 > > --- a/arch/riscv/Kconfig > > +++ b/arch/riscv/Kconfig > > @@ -654,6 +654,18 @@ config RISCV_MISALIGNED > > load/store for both kernel and userspace. When disable, misaligned > > accesses will generate SIGBUS in userspace and panic in kernel. > > > > +config RISCV_EFFICIENT_UNALIGNED_ACCESS > > There already exists hwprobe for this purpose. If kernel code wants to > leverage the efficient unaligned accesses of hardware, it can use static > keys. I have a patch that will set this static key if the hardware was > detected to have fast unaligned accesses: > > https://lore.kernel.org/linux-riscv/20231117-optimize_checksum-v11-2-7d9d954fe361@rivosinc.com/ Is the plan to make the get_unaligned* and put_unaligned* macros expand to code for both cases, and select between them using a static key? Note that there are a very large number of callers of these macros in the kernel. And what about kernel code that checks CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS directly? AFAIK, no other Linux architecture supports kernel images where the unaligned access support is unknown at compile time. It's not clear to me that such an approach is feasible. A static key can easily be provided, but it's unclear what code would use it, given that currently lots of kernel code assumes that unaligned access support is known at compile time. 
Meanwhile, there are people building kernels they know will only be deployed on systems where unaligned accesses are supported. To me, it seems useful to provide a kconfig option for them to build a more efficient kernel. - Eric ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/2] riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS 2023-12-05 2:14 ` Eric Biggers @ 2023-12-05 13:53 ` Jisheng Zhang 2023-12-05 20:56 ` Charlie Jenkins 0 siblings, 1 reply; 11+ messages in thread From: Jisheng Zhang @ 2023-12-05 13:53 UTC (permalink / raw) To: Eric Biggers Cc: Charlie Jenkins, Paul Walmsley, Palmer Dabbelt, Albert Ou, Conor Dooley, linux-riscv, linux-kernel On Mon, Dec 04, 2023 at 06:14:06PM -0800, Eric Biggers wrote: > On Mon, Dec 04, 2023 at 11:15:28AM -0800, Charlie Jenkins wrote: > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > > > index 7f8aa25457ba..0a76209e9b02 100644 > > > --- a/arch/riscv/Kconfig > > > +++ b/arch/riscv/Kconfig > > > @@ -654,6 +654,18 @@ config RISCV_MISALIGNED > > > load/store for both kernel and userspace. When disable, misaligned > > > accesses will generate SIGBUS in userspace and panic in kernel. > > > > > > +config RISCV_EFFICIENT_UNALIGNED_ACCESS > > > > There already exists hwprobe for this purpose. If kernel code wants to > > leverage the efficient unaligned accesses of hardware, it can use static > > keys. I have a patch that will set this static key if the hardware was > > detected to have fast unaligned accesses: > > > > https://lore.kernel.org/linux-riscv/20231117-optimize_checksum-v11-2-7d9d954fe361@rivosinc.com/ > > Is the plan to make the get_unaligned* and put_unaligned* macros expand to code > for both cases, and select between them using a static key? Note that there are > a very large number of callers of these macros in the kernel. And what about > kernel code that checks CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS directly? > > AFAIK, no other Linux architecture supports kernel images where the unaligned > access support is unknown at compile time. It's not clear to me that such an > approach is feasible. 
A static key can easily be provided, but it's unclear > what code would use it, given that currently lots of kernel code assumes that > unaligned access support is known at compile time. > > Meanwhile, there are people building kernels they know will only be deployed on > systems where unaligned accesses are supported. To me, it seems useful to > provide a kconfig option for them to build a more efficient kernel. Generally, I agree with Eric's above points. Various subsystem such as net, mm, lib and so on have different code path for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, while Charlie's patch only touch partial code of arch/riscv, and even if those subsystem maintainers agree with dynamic code patching(I still believe persuading those subsystem maintainers is not easy), that's still a huge task which needs to be done step by step. So before that, we'd better let this series merged and benefit all efficient unaligned access riscv systems. When the huge task is completed, we can remove the config option. Thanks ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/2] riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS 2023-12-05 13:53 ` Jisheng Zhang @ 2023-12-05 20:56 ` Charlie Jenkins 2023-12-06 0:05 ` Charles Lohr 0 siblings, 1 reply; 11+ messages in thread From: Charlie Jenkins @ 2023-12-05 20:56 UTC (permalink / raw) To: Jisheng Zhang Cc: Eric Biggers, Paul Walmsley, Palmer Dabbelt, Albert Ou, Conor Dooley, linux-riscv, linux-kernel On Tue, Dec 05, 2023 at 09:53:50PM +0800, Jisheng Zhang wrote: > On Mon, Dec 04, 2023 at 06:14:06PM -0800, Eric Biggers wrote: > > On Mon, Dec 04, 2023 at 11:15:28AM -0800, Charlie Jenkins wrote: > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > > > > index 7f8aa25457ba..0a76209e9b02 100644 > > > > --- a/arch/riscv/Kconfig > > > > +++ b/arch/riscv/Kconfig > > > > @@ -654,6 +654,18 @@ config RISCV_MISALIGNED > > > > load/store for both kernel and userspace. When disable, misaligned > > > > accesses will generate SIGBUS in userspace and panic in kernel. > > > > > > > > +config RISCV_EFFICIENT_UNALIGNED_ACCESS > > > > > > There already exists hwprobe for this purpose. If kernel code wants to > > > leverage the efficient unaligned accesses of hardware, it can use static > > > keys. I have a patch that will set this static key if the hardware was > > > detected to have fast unaligned accesses: > > > > > > https://lore.kernel.org/linux-riscv/20231117-optimize_checksum-v11-2-7d9d954fe361@rivosinc.com/ > > > > Is the plan to make the get_unaligned* and put_unaligned* macros expand to code > > for both cases, and select between them using a static key? Note that there are > > a very large number of callers of these macros in the kernel. And what about > > kernel code that checks CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS directly? > > > > AFAIK, no other Linux architecture supports kernel images where the unaligned > > access support is unknown at compile time. It's not clear to me that such an > > approach is feasible. 
A static key can easily be provided, but it's unclear > > what code would use it, given that currently lots of kernel code assumes that > > unaligned access support is known at compile time. > > > > Meanwhile, there are people building kernels they know will only be deployed on > > systems where unaligned accesses are supported. To me, it seems useful to > > provide a kconfig option for them to build a more efficient kernel. > > Generally, I agree with Eric's above points. Various subsystem such as net, mm, > lib and so on have different code path for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, > while Charlie's patch only touch partial code of arch/riscv, and even if those > subsystem maintainers agree with dynamic code patching(I still believe > persuading those subsystem maintainers is not easy), that's still a > huge task which needs to be done step by step. So before that, we'd > better let this series merged and benefit all efficient unaligned access > riscv systems. When the huge task is completed, we can remove the config > option. > > Thanks It would be best to enable all of the paths that leverage CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS at runtime (using hwprobe) instead of using a compile-time flag to do so. However, as you say, that is large task and doesn't need to be done immediately. For now I agree it is sufficient to use this new RISCV_EFFICIENT_UNALIGNED_ACCESS config. - Charlie Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/2] riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS 2023-12-05 20:56 ` Charlie Jenkins @ 2023-12-06 0:05 ` Charles Lohr 2023-12-06 16:19 ` Palmer Dabbelt 0 siblings, 1 reply; 11+ messages in thread From: Charles Lohr @ 2023-12-06 0:05 UTC (permalink / raw) To: Charlie Jenkins Cc: Jisheng Zhang, Eric Biggers, Paul Walmsley, Palmer Dabbelt, Albert Ou, Conor Dooley, linux-riscv, linux-kernel The automatic detection code has become a bit of a thorn both for folks like me who use the kernel for some fast-spin up aodr virtualization (where check_unaligned_access soaks up 1/4 to 1/3 of the total boot time and unaligned accesses are always fast) as well as causing issues for the FPGA soft core development where they easily know ahead of time what the situation is going to be. It would be extremely welcome if the access could always be overridden with a config value that could either force on or force off unaligned access and avoid execution of the check function permanently. I don't see a world where for some of us, we would ever want autodetection on. In the RISC-V arena, many times we're dealing with very small systems where the marginal cost of dead code is rather high. On Tue, Dec 5, 2023 at 12:57 PM Charlie Jenkins <charlie@rivosinc.com> wrote: > > On Tue, Dec 05, 2023 at 09:53:50PM +0800, Jisheng Zhang wrote: > > On Mon, Dec 04, 2023 at 06:14:06PM -0800, Eric Biggers wrote: > > > On Mon, Dec 04, 2023 at 11:15:28AM -0800, Charlie Jenkins wrote: > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > > > > > index 7f8aa25457ba..0a76209e9b02 100644 > > > > > --- a/arch/riscv/Kconfig > > > > > +++ b/arch/riscv/Kconfig > > > > > @@ -654,6 +654,18 @@ config RISCV_MISALIGNED > > > > > load/store for both kernel and userspace. When disable, misaligned > > > > > accesses will generate SIGBUS in userspace and panic in kernel. > > > > > > > > > > +config RISCV_EFFICIENT_UNALIGNED_ACCESS > > > > > > > > There already exists hwprobe for this purpose. 
If kernel code wants to > > > > leverage the efficient unaligned accesses of hardware, it can use static > > > > keys. I have a patch that will set this static key if the hardware was > > > > detected to have fast unaligned accesses: > > > > > > > > https://lore.kernel.org/linux-riscv/20231117-optimize_checksum-v11-2-7d9d954fe361@rivosinc.com/ > > > > > > Is the plan to make the get_unaligned* and put_unaligned* macros expand to code > > > for both cases, and select between them using a static key? Note that there are > > > a very large number of callers of these macros in the kernel. And what about > > > kernel code that checks CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS directly? > > > > > > AFAIK, no other Linux architecture supports kernel images where the unaligned > > > access support is unknown at compile time. It's not clear to me that such an > > > approach is feasible. A static key can easily be provided, but it's unclear > > > what code would use it, given that currently lots of kernel code assumes that > > > unaligned access support is known at compile time. > > > > > > Meanwhile, there are people building kernels they know will only be deployed on > > > systems where unaligned accesses are supported. To me, it seems useful to > > > provide a kconfig option for them to build a more efficient kernel. > > > > Generally, I agree with Eric's above points. Various subsystem such as net, mm, > > lib and so on have different code path for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, > > while Charlie's patch only touch partial code of arch/riscv, and even if those > > subsystem maintainers agree with dynamic code patching(I still believe > > persuading those subsystem maintainers is not easy), that's still a > > huge task which needs to be done step by step. So before that, we'd > > better let this series merged and benefit all efficient unaligned access > > riscv systems. When the huge task is completed, we can remove the config > > option. 
> > > > Thanks > > It would be best to enable all of the paths that leverage > CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS at runtime (using hwprobe) > instead of using a compile-time flag to do so. However, as you say, that > is large task and doesn't need to be done immediately. For now I agree > it is sufficient to use this new RISCV_EFFICIENT_UNALIGNED_ACCESS > config. > > - Charlie > > Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> > > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/2] riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS 2023-12-06 0:05 ` Charles Lohr @ 2023-12-06 16:19 ` Palmer Dabbelt 0 siblings, 0 replies; 11+ messages in thread From: Palmer Dabbelt @ 2023-12-06 16:19 UTC (permalink / raw) To: lohr85 Cc: charlie, jszhang, ebiggers, Paul Walmsley, aou, Conor Dooley, linux-riscv, linux-kernel On Tue, 05 Dec 2023 16:05:27 PST (-0800), lohr85@gmail.com wrote: > The automatic detection code has become a bit of a thorn both for > folks like me who use the kernel for some fast-spin up aodr > virtualization (where check_unaligned_access soaks up 1/4 to 1/3 of > the total boot time and unaligned accesses are always fast) as well as > causing issues for the FPGA soft core development where they easily > know ahead of time what the situation is going to be. It would be > extremely welcome if the access could always be overridden with a > config value that could either force on or force off unaligned access > and avoid execution of the check function permanently. I don't see a > world where for some of us, we would ever want autodetection on. In > the RISC-V arena, many times we're dealing with very small systems > where the marginal cost of dead code is rather high. That seems generally reasonable to me. We'd talked about putting misaligned access performance informaiton in the DT at some point, but we went with the probing instead. So I think our options are a Kconfig or a kernel command line argument, both seem generally useful to me so I'd be fine with either (or both). So I think someone should send a patch... ;) Also: I think it's not really a blocker for this patch set, as the probing behavior is there already. IIUC it's really the probing that's the problem here due to the boot time performance impact, so even if we did nothing with the probed information it'd still be causing your issues. 
> On Tue, Dec 5, 2023 at 12:57 PM Charlie Jenkins <charlie@rivosinc.com> wrote: >> >> On Tue, Dec 05, 2023 at 09:53:50PM +0800, Jisheng Zhang wrote: >> > On Mon, Dec 04, 2023 at 06:14:06PM -0800, Eric Biggers wrote: >> > > On Mon, Dec 04, 2023 at 11:15:28AM -0800, Charlie Jenkins wrote: >> > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig >> > > > > index 7f8aa25457ba..0a76209e9b02 100644 >> > > > > --- a/arch/riscv/Kconfig >> > > > > +++ b/arch/riscv/Kconfig >> > > > > @@ -654,6 +654,18 @@ config RISCV_MISALIGNED >> > > > > load/store for both kernel and userspace. When disable, misaligned >> > > > > accesses will generate SIGBUS in userspace and panic in kernel. >> > > > > >> > > > > +config RISCV_EFFICIENT_UNALIGNED_ACCESS >> > > > >> > > > There already exists hwprobe for this purpose. If kernel code wants to >> > > > leverage the efficient unaligned accesses of hardware, it can use static >> > > > keys. I have a patch that will set this static key if the hardware was >> > > > detected to have fast unaligned accesses: >> > > > >> > > > https://lore.kernel.org/linux-riscv/20231117-optimize_checksum-v11-2-7d9d954fe361@rivosinc.com/ >> > > >> > > Is the plan to make the get_unaligned* and put_unaligned* macros expand to code >> > > for both cases, and select between them using a static key? Note that there are >> > > a very large number of callers of these macros in the kernel. And what about >> > > kernel code that checks CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS directly? >> > > >> > > AFAIK, no other Linux architecture supports kernel images where the unaligned >> > > access support is unknown at compile time. It's not clear to me that such an >> > > approach is feasible. A static key can easily be provided, but it's unclear >> > > what code would use it, given that currently lots of kernel code assumes that >> > > unaligned access support is known at compile time. 
I agree we won't be able to get everything, but there's some focused routines like memcpy() where having runtime-variant behavior can make things measurably faster. I'd guess there's some of this in crypto land as well. We'd have to really look into the benefits, though: not only do we end up with a bunch of complexity, but also using ALTERNATIVE() tends to cause lower quality codegen because of all the inline assembly trickery. All of that is really based on replacing a whole function at runtime, though. I don't think we're going to be able to do anything dynamic for the more general case of misaligned access support, though -- that's really in the relm of fine-grained compiler code generation, and trying to do that at runtime with the alternative-type approach is just going to lead to a bunch of poor quality codegen and patched-in NOPs. We'd essentially be trying to build a full JIT inside the kernel at that point. It essentially the same problem as things like CMOV and bitmanip. >> > > Meanwhile, there are people building kernels they know will only be deployed on >> > > systems where unaligned accesses are supported. To me, it seems useful to >> > > provide a kconfig option for them to build a more efficient kernel. I agree. We've got a bit of a mess in Kconfig land where we don't differentiate between "build a kernel that tries to probe for $FEATURE" and "build a kernel that requires HW that supports $FEATURE". We need to clean that up at some point, but there's enough of them I'm OK taking one more. >> > Generally, I agree with Eric's above points. Various subsystem such as net, mm, >> > lib and so on have different code path for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, >> > while Charlie's patch only touch partial code of arch/riscv, and even if those >> > subsystem maintainers agree with dynamic code patching(I still believe >> > persuading those subsystem maintainers is not easy), that's still a >> > huge task which needs to be done step by step. 
So before that, we'd >> > better let this series merged and benefit all efficient unaligned access >> > riscv systems. When the huge task is completed, we can remove the config >> > option. >> > >> > Thanks >> >> It would be best to enable all of the paths that leverage >> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS at runtime (using hwprobe) >> instead of using a compile-time flag to do so. However, as you say, that >> is large task and doesn't need to be done immediately. For now I agree >> it is sufficient to use this new RISCV_EFFICIENT_UNALIGNED_ACCESS >> config. We've got a lot more JIT-ish stuff in the RISC-V port than other ports do, it's kind of ugly but that's just the nature of the ISA. It's kind of the same spot we're in with things like CMOV or the bitmanip extensions: there'll be some specific routines where the feature makes a big difference and we can provide an alternative (string and crypto stuff, for example), but trying to do it everywhere is just going to lead to chaos (and probably worse performance). So I don't know exactly where the line is, but we're always going to have some amount of compile-time performance tuning -- at least until we just replace the whole kernel with BPF ;) >> >> - Charlie >> >> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> >> >> >> _______________________________________________ >> linux-riscv mailing list >> linux-riscv@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/2] riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS 2023-12-03 13:57 ` [PATCH v2 1/2] riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS Jisheng Zhang 2023-12-04 19:15 ` Charlie Jenkins @ 2023-12-05 8:39 ` Qingfang DENG 2023-12-22 5:04 ` Eric Biggers 1 sibling, 1 reply; 11+ messages in thread From: Qingfang DENG @ 2023-12-05 8:39 UTC (permalink / raw) Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Conor Dooley, linux-riscv, linux-kernel Hi, You may as well remove the -mstrict-align CFLAGS in the Makefile, if this option is enabled: --- a/arch/riscv/Makefile +++ b/arch/riscv/Makefile @@ -108,7 +108,9 @@ KBUILD_AFLAGS_MODULE += $(call as-option,-Wa$(comma)-mno-relax) # unaligned accesses. While unaligned accesses are explicitly allowed in the # RISC-V ISA, they're emulated by machine mode traps on all extant # architectures. It's faster to have GCC emit only aligned accesses. +ifneq ($(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS),y) KBUILD_CFLAGS += $(call cc-option,-mstrict-align) +endif ifeq ($(CONFIG_STACKPROTECTOR_PER_TASK),y) prepare: stack_protector_prepare -- - Qingfang ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/2] riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS 2023-12-05 8:39 ` Qingfang DENG @ 2023-12-22 5:04 ` Eric Biggers 0 siblings, 0 replies; 11+ messages in thread From: Eric Biggers @ 2023-12-22 5:04 UTC (permalink / raw) To: Qingfang DENG Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Conor Dooley, linux-riscv, linux-kernel, Jisheng Zhang On Tue, Dec 05, 2023 at 04:39:24PM +0800, Qingfang DENG wrote: > Hi, > > You may as well remove the -mstrict-align CFLAGS in the Makefile, if > this option is enabled: > > --- a/arch/riscv/Makefile > +++ b/arch/riscv/Makefile > @@ -108,7 +108,9 @@ KBUILD_AFLAGS_MODULE += $(call as-option,-Wa$(comma)-mno-relax) > # unaligned accesses. While unaligned accesses are explicitly allowed in the > # RISC-V ISA, they're emulated by machine mode traps on all extant > # architectures. It's faster to have GCC emit only aligned accesses. > +ifneq ($(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS),y) > KBUILD_CFLAGS += $(call cc-option,-mstrict-align) > +endif > Agreed. When CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y, we shouldn't use -mstrict-align, so that the compiler can actually use unaligned memory accesses. If I understand correctly, beyond the change requested above, people seem to be happy with this patch. Jisheng, can you resend it with the above feedback addressed? Thanks! - Eric ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 2/2] riscv: select DCACHE_WORD_ACCESS for efficient unaligned access HW 2023-12-03 13:57 [PATCH v2 0/2] riscv: enable EFFICIENT_UNALIGNED_ACCESS and DCACHE_WORD_ACCESS Jisheng Zhang 2023-12-03 13:57 ` [PATCH v2 1/2] riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS Jisheng Zhang @ 2023-12-03 13:57 ` Jisheng Zhang 1 sibling, 0 replies; 11+ messages in thread From: Jisheng Zhang @ 2023-12-03 13:57 UTC (permalink / raw) To: Paul Walmsley, Palmer Dabbelt, Albert Ou Cc: Conor Dooley, linux-riscv, linux-kernel DCACHE_WORD_ACCESS uses the word-at-a-time API for optimised string comparisons in the vfs layer. This patch implements support for load_unaligned_zeropad in much the same way as has been done for arm64. Here is the test program and step: $ cat tt.c #include <sys/types.h> #include <sys/stat.h> #include <unistd.h> #define ITERATIONS 1000000 #define PATH "123456781234567812345678123456781" int main(void) { unsigned long i; struct stat buf; for (i = 0; i < ITERATIONS; i++) stat(PATH, &buf); return 0; } $ gcc -O2 tt.c $ touch 123456781234567812345678123456781 $ time ./a.out Per my test on T-HEAD C910 platforms, the above test performance is improved by about 7.5%. 
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
---
 arch/riscv/Kconfig                      |  1 +
 arch/riscv/include/asm/asm-extable.h    | 15 ++++++++++++
 arch/riscv/include/asm/word-at-a-time.h | 27 +++++++++++++++++++++
 arch/riscv/mm/extable.c                 | 31 +++++++++++++++++++++++++
 4 files changed, 74 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 0a76209e9b02..bb366eb1870e 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -657,6 +657,7 @@ config RISCV_MISALIGNED
 config RISCV_EFFICIENT_UNALIGNED_ACCESS
 	bool "Use unaligned access for some functions"
 	depends on NONPORTABLE
+	select DCACHE_WORD_ACCESS if MMU
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
 	default n
 	help
diff --git a/arch/riscv/include/asm/asm-extable.h b/arch/riscv/include/asm/asm-extable.h
index 00a96e7a9664..0c8bfd54fc4e 100644
--- a/arch/riscv/include/asm/asm-extable.h
+++ b/arch/riscv/include/asm/asm-extable.h
@@ -6,6 +6,7 @@
 #define EX_TYPE_FIXUP			1
 #define EX_TYPE_BPF			2
 #define EX_TYPE_UACCESS_ERR_ZERO	3
+#define EX_TYPE_LOAD_UNALIGNED_ZEROPAD	4
 
 #ifdef CONFIG_MMU
 
@@ -47,6 +48,11 @@
 #define EX_DATA_REG_ZERO_SHIFT	5
 #define EX_DATA_REG_ZERO	GENMASK(9, 5)
 
+#define EX_DATA_REG_DATA_SHIFT	0
+#define EX_DATA_REG_DATA	GENMASK(4, 0)
+#define EX_DATA_REG_ADDR_SHIFT	5
+#define EX_DATA_REG_ADDR	GENMASK(9, 5)
+
 #define EX_DATA_REG(reg, gpr)						\
 	"((.L__gpr_num_" #gpr ") << " __stringify(EX_DATA_REG_##reg##_SHIFT) ")"
 
@@ -62,6 +68,15 @@
 #define _ASM_EXTABLE_UACCESS_ERR(insn, fixup, err)			\
 	_ASM_EXTABLE_UACCESS_ERR_ZERO(insn, fixup, err, zero)
 
+#define _ASM_EXTABLE_LOAD_UNALIGNED_ZEROPAD(insn, fixup, data, addr)	\
+	__DEFINE_ASM_GPR_NUMS						\
+	__ASM_EXTABLE_RAW(#insn, #fixup,				\
+			  __stringify(EX_TYPE_LOAD_UNALIGNED_ZEROPAD),	\
+			  "("						\
+			    EX_DATA_REG(DATA, data) " | "		\
+			    EX_DATA_REG(ADDR, addr)			\
+			  ")")
+
 #endif /* __ASSEMBLY__ */
 
 #else /* CONFIG_MMU */
diff --git a/arch/riscv/include/asm/word-at-a-time.h b/arch/riscv/include/asm/word-at-a-time.h
index 7c086ac6ecd4..f3f031e34191 100644
--- a/arch/riscv/include/asm/word-at-a-time.h
+++ b/arch/riscv/include/asm/word-at-a-time.h
@@ -9,6 +9,7 @@
 #define _ASM_RISCV_WORD_AT_A_TIME_H
 
+#include <asm/asm-extable.h>
 #include <linux/kernel.h>
 
 struct word_at_a_time {
@@ -45,4 +46,30 @@ static inline unsigned long find_zero(unsigned long mask)
 /* The mask we created is directly usable as a bytemask */
 #define zero_bytemask(mask) (mask)
 
+#ifdef CONFIG_DCACHE_WORD_ACCESS
+
+/*
+ * Load an unaligned word from kernel space.
+ *
+ * In the (very unlikely) case of the word being a page-crosser
+ * and the next page not being mapped, take the exception and
+ * return zeroes in the non-existing part.
+ */
+static inline unsigned long load_unaligned_zeropad(const void *addr)
+{
+	unsigned long ret;
+
+	/* Load word from unaligned pointer addr */
+	asm(
+	"1:	" REG_L " %0, %2\n"
+	"2:\n"
+	_ASM_EXTABLE_LOAD_UNALIGNED_ZEROPAD(1b, 2b, %0, %1)
+	: "=&r" (ret)
+	: "r" (addr), "m" (*(unsigned long *)addr));
+
+	return ret;
+}
+
+#endif	/* CONFIG_DCACHE_WORD_ACCESS */
+
 #endif /* _ASM_RISCV_WORD_AT_A_TIME_H */
diff --git a/arch/riscv/mm/extable.c b/arch/riscv/mm/extable.c
index 35484d830fd6..dd1530af3ef1 100644
--- a/arch/riscv/mm/extable.c
+++ b/arch/riscv/mm/extable.c
@@ -27,6 +27,14 @@ static bool ex_handler_fixup(const struct exception_table_entry *ex,
 	return true;
 }
 
+static inline unsigned long regs_get_gpr(struct pt_regs *regs, unsigned int offset)
+{
+	if (unlikely(!offset || offset > MAX_REG_OFFSET))
+		return 0;
+
+	return *(unsigned long *)((unsigned long)regs + offset);
+}
+
 static inline void regs_set_gpr(struct pt_regs *regs, unsigned int offset,
 				unsigned long val)
 {
@@ -50,6 +58,27 @@ static bool ex_handler_uaccess_err_zero(const struct exception_table_entry *ex,
 	return true;
 }
 
+static bool
+ex_handler_load_unaligned_zeropad(const struct exception_table_entry *ex,
+				  struct pt_regs *regs)
+{
+	int reg_data = FIELD_GET(EX_DATA_REG_DATA, ex->data);
+	int reg_addr = FIELD_GET(EX_DATA_REG_ADDR, ex->data);
+	unsigned long data, addr, offset;
+
+	addr = regs_get_gpr(regs, reg_addr * sizeof(unsigned long));
+
+	offset = addr & 0x7UL;
+	addr &= ~0x7UL;
+
+	data = *(unsigned long *)addr >> (offset * 8);
+
+	regs_set_gpr(regs, reg_data * sizeof(unsigned long), data);
+
+	regs->epc = get_ex_fixup(ex);
+	return true;
+}
+
 bool fixup_exception(struct pt_regs *regs)
 {
 	const struct exception_table_entry *ex;
@@ -65,6 +94,8 @@ bool fixup_exception(struct pt_regs *regs)
 		return ex_handler_bpf(ex, regs);
 	case EX_TYPE_UACCESS_ERR_ZERO:
 		return ex_handler_uaccess_err_zero(ex, regs);
+	case EX_TYPE_LOAD_UNALIGNED_ZEROPAD:
+		return ex_handler_load_unaligned_zeropad(ex, regs);
 	}
 
 	BUG();
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-12-22  5:04 UTC | newest]

Thread overview: 11+ messages
2023-12-03 13:57 [PATCH v2 0/2] riscv: enable EFFICIENT_UNALIGNED_ACCESS and DCACHE_WORD_ACCESS Jisheng Zhang
2023-12-03 13:57 ` [PATCH v2 1/2] riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS Jisheng Zhang
2023-12-04 19:15   ` Charlie Jenkins
2023-12-05  2:14     ` Eric Biggers
2023-12-05 13:53       ` Jisheng Zhang
2023-12-05 20:56         ` Charlie Jenkins
2023-12-06  0:05           ` Charles Lohr
2023-12-06 16:19             ` Palmer Dabbelt
2023-12-05  8:39   ` Qingfang DENG
2023-12-22  5:04     ` Eric Biggers
2023-12-03 13:57 ` [PATCH v2 2/2] riscv: select DCACHE_WORD_ACCESS for efficient unaligned access HW Jisheng Zhang