patches.lists.linux.dev archive mirror
* [PATCH net-next V6] net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs
@ 2025-09-28 21:08 Tariq Toukan
  2025-10-01  0:30 ` patchwork-bot+netdevbpf
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Tariq Toukan @ 2025-09-28 21:08 UTC (permalink / raw)
  To: Catalin Marinas, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Leon Romanovsky,
	Jason Gunthorpe, Michael Guralnik, Moshe Shemesh, Will Deacon,
	Alexander Gordeev, Andrew Morton, Christian Borntraeger,
	Borislav Petkov, Dave Hansen, Gerald Schaefer, Vasily Gorbik,
	Heiko Carstens, H. Peter Anvin, Justin Stitt, linux-s390, llvm,
	Ingo Molnar, Bill Wendling, Nathan Chancellor, Nick Desaulniers,
	Salil Mehta, Sven Schnelle, Thomas Gleixner, x86, Yisen Zhuang,
	Arnd Bergmann, Leon Romanovsky, linux-arch, linux-arm-kernel,
	Mark Rutland, Michael Guralnik, patches, Niklas Schnelle,
	Jijie Shao, Simon Horman, Patrisious Haddad

From: Patrisious Haddad <phaddad@nvidia.com>

Write combining is an optimization feature in CPUs that is frequently
used by modern devices to generate 32 or 64 byte TLPs at the PCIe level.
These large TLPs allow certain optimizations in the driver to HW
communication that improve performance. As WC is unpredictable and
optional, the HW designs all tolerate cases where combining doesn't
happen and simply experience a performance degradation.

Unfortunately many virtualization environments on all architectures have
done things that completely disable WC inside the VM with no generic way
to detect this. For example WC was fully blocked in ARM64 KVM until
commit 8c47ce3e1d2c ("KVM: arm64: Set io memory s2 pte as normalnc for
vfio pci device").

Trying to use WC when it is known not to work has a measurable
performance cost (~5%). Long ago mlx5 developed a boot time algorithm
to test if WC is available or not by using unique mlx5 HW features to
measure how many large TLPs the device is receiving. The SW generates a
large number of combining opportunities and if any succeed then WC is
declared working.
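
The decision rule described above can be sketched in plain C as follows.
This is a simplified userspace illustration and not the driver's code:
wc_attempt_fn, wc_probe() and fake_attempt() are hypothetical names, and
the callback stands in for the mlx5-specific HW mechanism that reports
whether a large TLP arrived.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook: make one combining attempt and report whether the
 * device observed it as a single large TLP.
 */
typedef bool (*wc_attempt_fn)(void *ctx);

/* WC is declared working as soon as any single attempt combines. */
static bool wc_probe(wc_attempt_fn attempt, void *ctx, size_t tries)
{
	for (size_t i = 0; i < tries; i++) {
		if (attempt(ctx))
			return true;
	}
	return false;
}

/* Illustration-only stand-in: fails until the counter in ctx reaches 0. */
static bool fake_attempt(void *ctx)
{
	int *remaining_failures = ctx;

	if (*remaining_failures == 0)
		return true;
	(*remaining_failures)--;
	return false;
}
```

With this rule a CPU that combines only rarely still passes, as long as
at least one attempt out of the whole batch combines; the test fails
only when none of them do.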

In mlx5 the WC optimization feature is never used by the kernel except
for the boot time test. The WC is only used by userspace in rdma-core.

Sadly modern ARM CPUs, especially NVIDIA Grace, have a combining
implementation that is very unreliable compared to pretty much
everything prior. This is being fixed architecturally in new CPUs with a
new ST64B instruction, but current shipping devices suffer this problem.

Unreliable means the SW can present thousands of combining opportunities
and the HW will not combine for any of them, which creates a performance
degradation, and critically fails the mlx5 boot test. However, the CPU
is very sensitive to the instruction sequence used, with the better
options being sufficiently good that the performance loss from the
unreliable CPU is not measurable.

Broadly there are several options, from worst to best:
1) A C loop doing a u64 memcpy.
   This was used prior to commit ef302283ddfc
   ("IB/mlx5: Use __iowrite64_copy() for write combining stores")
   and failed almost all the time on Grace CPUs.

2) ARM64 assembly with consecutive 8 byte stores. This was implemented
   as an arch-generic __iowriteXX_copy() family of functions suitable
   for performance use in drivers for WC. commit ead79118dae6
   ("arm64/io: Provide a WC friendly __iowriteXX_copy()") provided the
   ARM implementation.

3) ARM64 assembly with consecutive 16 byte stores. This was rejected
   from kernel use over fears of virtualization failures. Common ARM
   VMMs will crash if STP is used against emulated memory.

4) A single NEON store instruction. Userspace has used this option for a
   very long time, and it performs well.

5) For future silicon the new ST64B instruction is guaranteed to
   generate a 64 byte TLP 100% of the time.

The past upgrade from #1 to #2 was thought to be sufficient to solve
this problem. However, more testing on more systems shows that #2 is
still problematic at a low frequency and the kernel test fails.

Thus, make mlx5 use the same instructions as userspace during the
boot time WC self test. This way the WC test matches the userspace and
will properly detect the ability of HW to support the WC workload that
userspace will generate. While #4 still has imperfect combining
performance, it is substantially better than #2, and does actually give
a performance win to applications. Self-test failures with #2 occur in
roughly 3 out of 10 boots on some systems, while #4 has never seen a
boot failure.
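
To make the difference between option #1 and option #4 concrete, here is
a minimal userspace sketch of the two 64 byte copies. It is not the
driver code: copy64_u64_loop() and copy64_wide() are hypothetical names,
the destination is ordinary memory rather than a WC mapping, and the
memcpy() fallback exists only so the sketch builds on non-ARM64 hosts.
The compiler is free to pick the actual instructions for the intrinsics,
so this form does not guarantee a single 64 byte store.

```c
#include <stdint.h>
#include <string.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Option #1: a plain C loop issuing eight separate 64-bit stores. */
static void copy64_u64_loop(volatile uint64_t *dst, const uint64_t *src)
{
	for (int i = 0; i < 8; i++)
		dst[i] = src[i];
}

/* Option #4 (sketch): load 64 bytes into four NEON registers and store
 * them back with one four-register store intrinsic.
 */
static void copy64_wide(void *dst, const void *src)
{
#if defined(__aarch64__)
	uint8x16x4_t v = vld1q_u8_x4((const uint8_t *)src);

	vst1q_u8_x4((uint8_t *)dst, v);
#else
	memcpy(dst, src, 64);	/* fallback for non-ARM64 builds only */
#endif
}
```

Both functions move the same 64 bytes; what differs is how many store
instructions reach the CPU's write-combining logic.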

There is no real general use case for a NEON based WC flow in the
kernel. This is not suitable for any performance path work as getting
into/out of a NEON context is fairly expensive compared to the gain of
WC. Future CPUs are going to fix this issue by using a new ARM
instruction and __iowriteXX_copy() will be updated to use that
automatically, probably using the alternatives mechanism.

Since this problem is constrained to mlx5's unique situation of needing
a non-performance code path to duplicate what mlx5 userspace is doing as
a matter of self-testing, implement it as a one line inline assembly in
the driver directly.

Lastly, this approach came out of the discussion with the ARM
maintainers, which confirmed that it is the best solution:
https://lore.kernel.org/r/aHqN_hpJl84T1Usi@arm.com

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/wc.c | 28 ++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

Find V5 here:
https://lore.kernel.org/all/1758800913-830383-1-git-send-email-tariqt@nvidia.com/

V6:
- Replace defined() usages with IS_ENABLED() (Jason).

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wc.c b/drivers/net/ethernet/mellanox/mlx5/core/wc.c
index 999d6216648a..c281153bd411 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/wc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/wc.c
@@ -7,6 +7,10 @@
 #include "mlx5_core.h"
 #include "wq.h"
 
+#if IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && IS_ENABLED(CONFIG_ARM64)
+#include <asm/neon.h>
+#endif
+
 #define TEST_WC_NUM_WQES 255
 #define TEST_WC_LOG_CQ_SZ (order_base_2(TEST_WC_NUM_WQES))
 #define TEST_WC_SQ_LOG_WQ_SZ TEST_WC_LOG_CQ_SZ
@@ -255,6 +259,27 @@ static void mlx5_wc_destroy_sq(struct mlx5_wc_sq *sq)
 	mlx5_wq_destroy(&sq->wq_ctrl);
 }
 
+static void mlx5_iowrite64_copy(struct mlx5_wc_sq *sq, __be32 mmio_wqe[16],
+				size_t mmio_wqe_size, unsigned int offset)
+{
+#if IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && IS_ENABLED(CONFIG_ARM64)
+	if (cpu_has_neon()) {
+		kernel_neon_begin();
+		asm volatile
+		(".arch_extension simd;\n\t"
+		"ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%0]\n\t"
+		"st1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%1]"
+		:
+		: "r"(mmio_wqe), "r"(sq->bfreg.map + offset)
+		: "memory", "v0", "v1", "v2", "v3");
+		kernel_neon_end();
+		return;
+	}
+#endif
+	__iowrite64_copy(sq->bfreg.map + offset, mmio_wqe,
+			 mmio_wqe_size / 8);
+}
+
 static void mlx5_wc_post_nop(struct mlx5_wc_sq *sq, unsigned int *offset,
 			     bool signaled)
 {
@@ -289,8 +314,7 @@ static void mlx5_wc_post_nop(struct mlx5_wc_sq *sq, unsigned int *offset,
 	 */
 	wmb();
 
-	__iowrite64_copy(sq->bfreg.map + *offset, mmio_wqe,
-			 sizeof(mmio_wqe) / 8);
+	mlx5_iowrite64_copy(sq, mmio_wqe, sizeof(mmio_wqe), *offset);
 
 	*offset ^= buf_size;
 }

base-commit: e835faaed2f80ee8652f59a54703edceab04f0d9
-- 
2.31.1



* Re: [PATCH net-next V6] net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs
  2025-09-28 21:08 [PATCH net-next V6] net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs Tariq Toukan
@ 2025-10-01  0:30 ` patchwork-bot+netdevbpf
  2025-10-01  9:28 ` Paolo Abeni
  2025-10-06 13:57 ` Sebastian Ott
  2 siblings, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-10-01  0:30 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: catalin.marinas, edumazet, kuba, pabeni, andrew+netdev, davem,
	saeedm, leon, mbloch, netdev, linux-rdma, linux-kernel, gal,
	leonro, jgg, michaelgur, moshe, will, agordeev, akpm, borntraeger,
	bp, dave.hansen, gerald.schaefer, gor, hca, hpa, justinstitt,
	linux-s390, llvm, mingo, morbo, nathan, ndesaulniers, salil.mehta,
	svens, tglx, x86, yisen.zhuang, arnd, leonro, linux-arch,
	linux-arm-kernel, mark.rutland, michaelgur, patches, schnelle,
	shaojijie, horms, phaddad

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 29 Sep 2025 00:08:08 +0300 you wrote:
> From: Patrisious Haddad <phaddad@nvidia.com>
> 
> Write combining is an optimization feature in CPUs that is frequently
> used by modern devices to generate 32 or 64 byte TLPs at the PCIe level.
> These large TLPs allow certain optimizations in the driver to HW
> communication that improve performance. As WC is unpredictable and
> optional the HW designs all tolerate cases where combining doesn't
> happen and simply experience a performance degradation.
> 
> [...]

Here is the summary with links:
  - [net-next,V6] net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs
    https://git.kernel.org/netdev/net-next/c/fd8c8216648c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net-next V6] net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs
  2025-09-28 21:08 [PATCH net-next V6] net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs Tariq Toukan
  2025-10-01  0:30 ` patchwork-bot+netdevbpf
@ 2025-10-01  9:28 ` Paolo Abeni
  2025-10-01 14:55   ` Jason Gunthorpe
  2025-10-06 13:57 ` Sebastian Ott
  2 siblings, 1 reply; 8+ messages in thread
From: Paolo Abeni @ 2025-10-01  9:28 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Saeed Mahameed, Leon Romanovsky, Mark Bloch, netdev, linux-rdma,
	linux-kernel, Gal Pressman, Leon Romanovsky, Jason Gunthorpe,
	Michael Guralnik, Moshe Shemesh, Will Deacon, Alexander Gordeev,
	Andrew Morton, Christian Borntraeger, Borislav Petkov,
	Dave Hansen, Gerald Schaefer, Vasily Gorbik, Heiko Carstens,
	H. Peter Anvin, Justin Stitt, linux-s390, llvm, Ingo Molnar,
	Bill Wendling, Nathan Chancellor, Nick Desaulniers, Salil Mehta,
	Sven Schnelle, Thomas Gleixner, x86, Yisen Zhuang, Arnd Bergmann,
	Leon Romanovsky, linux-arch, linux-arm-kernel, Mark Rutland,
	Michael Guralnik, patches, Niklas Schnelle, Jijie Shao,
	Simon Horman, Patrisious Haddad, Andrew Lunn, Eric Dumazet,
	Catalin Marinas, Jakub Kicinski, David S. Miller

Hi,

On 9/28/25 11:08 PM, Tariq Toukan wrote:
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wc.c b/drivers/net/ethernet/mellanox/mlx5/core/wc.c
> index 999d6216648a..c281153bd411 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/wc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/wc.c
> @@ -7,6 +7,10 @@
>  #include "mlx5_core.h"
>  #include "wq.h"
>  
> +#if IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && IS_ENABLED(CONFIG_ARM64)
> +#include <asm/neon.h>
> +#endif
> +
>  #define TEST_WC_NUM_WQES 255
>  #define TEST_WC_LOG_CQ_SZ (order_base_2(TEST_WC_NUM_WQES))
>  #define TEST_WC_SQ_LOG_WQ_SZ TEST_WC_LOG_CQ_SZ
> @@ -255,6 +259,27 @@ static void mlx5_wc_destroy_sq(struct mlx5_wc_sq *sq)
>  	mlx5_wq_destroy(&sq->wq_ctrl);
>  }
>  
> +static void mlx5_iowrite64_copy(struct mlx5_wc_sq *sq, __be32 mmio_wqe[16],
> +				size_t mmio_wqe_size, unsigned int offset)
> +{
> +#if IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && IS_ENABLED(CONFIG_ARM64)
> +	if (cpu_has_neon()) {
> +		kernel_neon_begin();
> +		asm volatile
> +		(".arch_extension simd;\n\t"

Here I'm observing build errors with aarch64-linux-gnu-gcc 12.1.1
20220507 (Red Hat Cross 12.1.1-1):

/tmp/cchqHdeI.s: Assembler messages:
/tmp/cchqHdeI.s:746: Error: unknown architectural extension `simd;'

I can't reproduce the error on any recent compiler version via
godbolt.org, so I *think* this should not block/be reverted for the MR,
but could you please have a look soonish?

Thanks,

Paolo



* Re: [PATCH net-next V6] net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs
  2025-10-01  9:28 ` Paolo Abeni
@ 2025-10-01 14:55   ` Jason Gunthorpe
  2025-10-01 16:36     ` Nathan Chancellor
  0 siblings, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2025-10-01 14:55 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Tariq Toukan, Saeed Mahameed, Leon Romanovsky, Mark Bloch, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Leon Romanovsky,
	Michael Guralnik, Moshe Shemesh, Will Deacon, Alexander Gordeev,
	Andrew Morton, Christian Borntraeger, Borislav Petkov,
	Dave Hansen, Gerald Schaefer, Vasily Gorbik, Heiko Carstens,
	H. Peter Anvin, Justin Stitt, linux-s390, llvm, Ingo Molnar,
	Bill Wendling, Nathan Chancellor, Nick Desaulniers, Salil Mehta,
	Sven Schnelle, Thomas Gleixner, x86, Yisen Zhuang, Arnd Bergmann,
	Leon Romanovsky, linux-arch, linux-arm-kernel, Mark Rutland,
	Michael Guralnik, patches, Niklas Schnelle, Jijie Shao,
	Simon Horman, Patrisious Haddad, Andrew Lunn, Eric Dumazet,
	Catalin Marinas, Jakub Kicinski, David S. Miller

On Wed, Oct 01, 2025 at 11:28:09AM +0200, Paolo Abeni wrote:

> > +static void mlx5_iowrite64_copy(struct mlx5_wc_sq *sq, __be32 mmio_wqe[16],
> > +				size_t mmio_wqe_size, unsigned int offset)
> > +{
> > +#if IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && IS_ENABLED(CONFIG_ARM64)
> > +	if (cpu_has_neon()) {
> > +		kernel_neon_begin();
> > +		asm volatile
> > +		(".arch_extension simd;\n\t"
> 
> Here I'm observing build errors with aarch64-linux-gnu-gcc 12.1.1
> 20220507 (Red Hat Cross 12.1.1-1):

> /tmp/cchqHdeI.s: Assembler messages:
> /tmp/cchqHdeI.s:746: Error: unknown architectural extension `simd;'

This is a binutils error not gcc.. What is the binutils version?

2.30 is the lowest that v6.16 supports..

Jason


* Re: [PATCH net-next V6] net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs
  2025-10-01 14:55   ` Jason Gunthorpe
@ 2025-10-01 16:36     ` Nathan Chancellor
  2025-10-01 16:45       ` Nathan Chancellor
  0 siblings, 1 reply; 8+ messages in thread
From: Nathan Chancellor @ 2025-10-01 16:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Paolo Abeni, Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Mark Bloch, netdev, linux-rdma, linux-kernel, Gal Pressman,
	Leon Romanovsky, Michael Guralnik, Moshe Shemesh, Will Deacon,
	Alexander Gordeev, Andrew Morton, Christian Borntraeger,
	Borislav Petkov, Dave Hansen, Gerald Schaefer, Vasily Gorbik,
	Heiko Carstens, H. Peter Anvin, Justin Stitt, linux-s390, llvm,
	Ingo Molnar, Bill Wendling, Nick Desaulniers, Salil Mehta,
	Sven Schnelle, Thomas Gleixner, x86, Yisen Zhuang, Arnd Bergmann,
	Leon Romanovsky, linux-arch, linux-arm-kernel, Mark Rutland,
	Michael Guralnik, patches, Niklas Schnelle, Jijie Shao,
	Simon Horman, Patrisious Haddad, Andrew Lunn, Eric Dumazet,
	Catalin Marinas, Jakub Kicinski, David S. Miller

On Wed, Oct 01, 2025 at 11:55:14AM -0300, Jason Gunthorpe wrote:
> On Wed, Oct 01, 2025 at 11:28:09AM +0200, Paolo Abeni wrote:
> 
> > > +static void mlx5_iowrite64_copy(struct mlx5_wc_sq *sq, __be32 mmio_wqe[16],
> > > +				size_t mmio_wqe_size, unsigned int offset)
> > > +{
> > > +#if IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && IS_ENABLED(CONFIG_ARM64)
> > > +	if (cpu_has_neon()) {
> > > +		kernel_neon_begin();
> > > +		asm volatile
> > > +		(".arch_extension simd;\n\t"
> > 
> > Here I'm observing build errors with aarch64-linux-gnu-gcc 12.1.1
> > 20220507 (Red Hat Cross 12.1.1-1):
> 
> > /tmp/cchqHdeI.s: Assembler messages:
> > /tmp/cchqHdeI.s:746: Error: unknown architectural extension `simd;'
> 
> This is a binutils error not gcc.. What is the binutils version?

I can reproduce this with at least binutils 2.36.1, which is in the
kernel.org GCC 8.5.0 toolchain.

Removing the semicolon resolves the issue for me and matches the format
of .arch_extension in the rest of the kernel. I am guessing binutils
became less strict with parsing at some point.
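
For reference, a userspace sketch of the same ld1/st1 sequence with the
directive written without the semicolon. This is not the actual fixup:
copy64_neon_asm() is a hypothetical name, the destination here is
ordinary memory, and the memcpy() fallback is only there so the sketch
also builds off ARM64.

```c
#include <string.h>

/* Copy 64 bytes; on ARM64 use the four-register ld1/st1 pair. */
static void copy64_neon_asm(void *dst, const void *src)
{
#if defined(__aarch64__)
	__asm__ __volatile__
	(".arch_extension simd\n\t"	/* no ';' before the newline */
	 "ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%0]\n\t"
	 "st1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%1]"
	 :
	 : "r"(src), "r"(dst)
	 : "memory", "v0", "v1", "v2", "v3");
#else
	memcpy(dst, src, 64);	/* fallback for non-ARM64 builds only */
#endif
}
```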

Cheers,
Nathan


* Re: [PATCH net-next V6] net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs
  2025-10-01 16:36     ` Nathan Chancellor
@ 2025-10-01 16:45       ` Nathan Chancellor
  0 siblings, 0 replies; 8+ messages in thread
From: Nathan Chancellor @ 2025-10-01 16:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Paolo Abeni, Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Mark Bloch, netdev, linux-rdma, linux-kernel, Gal Pressman,
	Leon Romanovsky, Michael Guralnik, Moshe Shemesh, Will Deacon,
	Alexander Gordeev, Andrew Morton, Christian Borntraeger,
	Borislav Petkov, Dave Hansen, Gerald Schaefer, Vasily Gorbik,
	Heiko Carstens, H. Peter Anvin, Justin Stitt, linux-s390, llvm,
	Ingo Molnar, Bill Wendling, Nick Desaulniers, Salil Mehta,
	Sven Schnelle, Thomas Gleixner, x86, Yisen Zhuang, Arnd Bergmann,
	Leon Romanovsky, linux-arch, linux-arm-kernel, Mark Rutland,
	Michael Guralnik, patches, Niklas Schnelle, Jijie Shao,
	Simon Horman, Patrisious Haddad, Andrew Lunn, Eric Dumazet,
	Catalin Marinas, Jakub Kicinski, David S. Miller

On Wed, Oct 01, 2025 at 09:36:55AM -0700, Nathan Chancellor wrote:
> Removing the semicolon resolves the issue for me and matches the format
> of .arch_extension in the rest of the kernel. I am guessing binutils
> became less strict with parsing at some point.

Looks like 2.40 is the first fixed release.

  https://sourceware.org/bugzilla/show_bug.cgi?id=29519
  https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=e8f20526238199c18afe163a230eafe19b51fca0


* Re: [PATCH net-next V6] net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs
  2025-09-28 21:08 [PATCH net-next V6] net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs Tariq Toukan
  2025-10-01  0:30 ` patchwork-bot+netdevbpf
  2025-10-01  9:28 ` Paolo Abeni
@ 2025-10-06 13:57 ` Sebastian Ott
  2025-10-06 13:59   ` Arnd Bergmann
  2 siblings, 1 reply; 8+ messages in thread
From: Sebastian Ott @ 2025-10-06 13:57 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Catalin Marinas, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, David S. Miller, Saeed Mahameed, Leon Romanovsky,
	Mark Bloch, netdev, linux-rdma, linux-kernel, Gal Pressman,
	Leon Romanovsky, Jason Gunthorpe, Michael Guralnik, Moshe Shemesh,
	Will Deacon, Alexander Gordeev, Andrew Morton,
	Christian Borntraeger, Borislav Petkov, Dave Hansen,
	Gerald Schaefer, Vasily Gorbik, Heiko Carstens, H. Peter Anvin,
	Justin Stitt, linux-s390, llvm, Ingo Molnar, Bill Wendling,
	Nathan Chancellor, Nick Desaulniers, Salil Mehta, Sven Schnelle,
	Thomas Gleixner, x86, Yisen Zhuang, Arnd Bergmann,
	Leon Romanovsky, linux-arch, linux-arm-kernel, Mark Rutland,
	Michael Guralnik, patches, Niklas Schnelle, Jijie Shao,
	Simon Horman, Patrisious Haddad

On Mon, 29 Sep 2025, Tariq Toukan wrote:
> +static void mlx5_iowrite64_copy(struct mlx5_wc_sq *sq, __be32 mmio_wqe[16],
> +				size_t mmio_wqe_size, unsigned int offset)
> +{
> +#if IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && IS_ENABLED(CONFIG_ARM64)
> +	if (cpu_has_neon()) {
> +		kernel_neon_begin();
> +		asm volatile
> +		(".arch_extension simd;\n\t"
> +		"ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%0]\n\t"
> +		"st1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%1]"
> +		:
> +		: "r"(mmio_wqe), "r"(sq->bfreg.map + offset)
> +		: "memory", "v0", "v1", "v2", "v3");
> +		kernel_neon_end();
> +		return;
> +	}
> +#endif

This one breaks the build for me:
/tmp/cc2vw3CJ.s: Assembler messages:
/tmp/cc2vw3CJ.s:391: Error: unknown architectural extension `simd;'

Removing the extra ";" after simd seems to fix it.

Regards,
Sebastian



* Re: [PATCH net-next V6] net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs
  2025-10-06 13:57 ` Sebastian Ott
@ 2025-10-06 13:59   ` Arnd Bergmann
  0 siblings, 0 replies; 8+ messages in thread
From: Arnd Bergmann @ 2025-10-06 13:59 UTC (permalink / raw)
  To: Sebastian Ott, Tariq Toukan
  Cc: Catalin Marinas, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, David S . Miller, Saeed Mahameed, Leon Romanovsky,
	Mark Bloch, Netdev, linux-rdma, linux-kernel, Gal Pressman,
	Leon Romanovsky, Jason Gunthorpe, Michael Guralnik, Moshe Shemesh,
	Will Deacon, Alexander Gordeev, Andrew Morton,
	Christian Borntraeger, Borislav Petkov, Dave Hansen,
	Gerald Schaefer, Vasily Gorbik, Heiko Carstens, H. Peter Anvin,
	Justin Stitt, linux-s390, llvm, Ingo Molnar, Bill Wendling,
	Nathan Chancellor, Nick Desaulniers, Salil Mehta, Sven Schnelle,
	Thomas Gleixner, x86, Yisen Zhuang, Leon Romanovsky, Linux-Arch,
	linux-arm-kernel, Mark Rutland, Michael Guralnik, patches,
	Niklas Schnelle, Jijie Shao, Simon Horman, Patrisious Haddad

On Mon, Oct 6, 2025, at 15:57, Sebastian Ott wrote:
> On Mon, 29 Sep 2025, Tariq Toukan wrote:
>> +static void mlx5_iowrite64_copy(struct mlx5_wc_sq *sq, __be32 mmio_wqe[16],
>> +				size_t mmio_wqe_size, unsigned int offset)
>> +{
>> +#if IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && IS_ENABLED(CONFIG_ARM64)
>> +	if (cpu_has_neon()) {
>> +		kernel_neon_begin();
>> +		asm volatile
>> +		(".arch_extension simd;\n\t"
>> +		"ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%0]\n\t"
>> +		"st1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%1]"
>> +		:
>> +		: "r"(mmio_wqe), "r"(sq->bfreg.map + offset)
>> +		: "memory", "v0", "v1", "v2", "v3");
>> +		kernel_neon_end();
>> +		return;
>> +	}
>> +#endif
>
> This one breaks the build for me:
> /tmp/cc2vw3CJ.s: Assembler messages:
> /tmp/cc2vw3CJ.s:391: Error: unknown architectural extension `simd;'
>
> Removing the extra ";" after simd seems to fix it.

I sent that fixup earlier today:

https://lore.kernel.org/all/20251006115640.497169-1-arnd@kernel.org/

     Arnd


end of thread, other threads:[~2025-10-06 13:59 UTC | newest]

Thread overview: 8+ messages
2025-09-28 21:08 [PATCH net-next V6] net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs Tariq Toukan
2025-10-01  0:30 ` patchwork-bot+netdevbpf
2025-10-01  9:28 ` Paolo Abeni
2025-10-01 14:55   ` Jason Gunthorpe
2025-10-01 16:36     ` Nathan Chancellor
2025-10-01 16:45       ` Nathan Chancellor
2025-10-06 13:57 ` Sebastian Ott
2025-10-06 13:59   ` Arnd Bergmann
