Re: [PATCH net-next V2] net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs


From: Patrisious Haddad <phaddad@nvidia.com>
To: Nathan Chancellor <nathan@kernel.org>, Jason Gunthorpe <jgg@nvidia.com>
Cc: Tariq Toukan <tariqt@nvidia.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>, Mark Bloch <mbloch@nvidia.com>,
	Sabrina Dubroca <sd@queasysnail.net>,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, Gal Pressman <gal@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	Michael Guralnik <michaelgur@nvidia.com>,
	Moshe Shemesh <moshe@nvidia.com>, Will Deacon <will@kernel.org>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Justin Stitt <justinstitt@google.com>,
	linux-s390@vger.kernel.org, llvm@lists.linux.dev,
	Ingo Molnar <mingo@redhat.com>, Bill Wendling <morbo@google.com>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Salil Mehta <salil.mehta@huawei.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	x86@kernel.org, Yisen Zhuang <yisen.zhuang@huawei.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Leon Romanovsky <leonro@mellanox.com>,
	linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Mark Rutland <mark.rutland@arm.com>,
	Michael Guralnik <michaelgur@mellanox.com>,
	patches@lists.linux.dev, Niklas Schnelle <schnelle@linux.ibm.com>,
	Jijie Shao <shaojijie@huawei.com>
Subject: Re: [PATCH net-next V2] net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs
Date: Tue, 16 Sep 2025 11:39:06 +0300
Message-ID: <d259ffa9-6c9e-488f-a64f-81025deba75c@nvidia.com>
In-Reply-To: <20250915231506.GA973819@ax162>


On 9/16/2025 2:15 AM, Nathan Chancellor wrote:
> On Mon, Sep 15, 2025 at 07:48:10PM -0300, Jason Gunthorpe wrote:
>> On Mon, Sep 15, 2025 at 03:27:58PM -0700, Nathan Chancellor wrote:
>>> On Mon, Sep 15, 2025 at 03:18:59PM -0700, Nathan Chancellor wrote:
>>>> On Mon, Sep 15, 2025 at 11:35:08AM +0300, Tariq Toukan wrote:
>>>> ...
>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>>>> index d77696f46eb5..06d0eb190816 100644
>>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>>>> @@ -176,3 +176,9 @@ mlx5_core-$(CONFIG_PCIE_TPH) += lib/st.o
>>>>>
>>>>>   obj-$(CONFIG_MLX5_DPLL) += mlx5_dpll.o
>>>>>   mlx5_dpll-y := dpll.o
>>>>> +
>>>>> +#
>>>>> +# NEON WC specific for mlx5
>>>>> +#
>>>>> +mlx5_core-$(CONFIG_KERNEL_MODE_NEON) += lib/wc_neon_iowrite64_copy.o
>>>>> +FLAGS_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_FPU)
>>>> Does this work as is? I think this needs to be CFLAGS instead of FLAGS
>>>> but I did not test to verify.
>>> Also, Documentation/core-api/floating-point.rst states that code should
>>> also use CFLAGS_REMOVE_ for CC_FLAGS_NO_FPU as well as adding
>>> CC_FLAGS_FPU.
>>>
>>>    CFLAGS_REMOVE_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_NO_FPU)
>> I wondered whether you needed the separate compilation unit at all,
>> since it is all done with inline assembly. The Makefile typo suggests
>> you don't need the compilation unit and it could just be a little
>> inline function protected by CONFIG_KERNEL_MODE_NEON.

There is a difference between code that merely compiles and the effect
these flags have on the generated assembly and on performance. To avoid
finding that out the hard way, I prefer to stick to the documentation,
which prescribes what Nathan describes below: a separate compilation
unit, called only between the begin/end calls, built with the correct
flags. That is what I eventually tested; I failed to re-test after
finishing the code review, thinking my changes were only cosmetic.

> Hmmm, clang rejects the current patch
>
>    drivers/net/ethernet/mellanox/mlx5/core/lib/wc_neon_iowrite64_copy.c:9:3: error: instruction requires: neon
>        9 |         ("ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%0]\n\t"
>          |          ^
>    <inline asm>:1:2: note: instantiated into assembly here
>        1 |         ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x19]
>          |         ^
>    drivers/net/ethernet/mellanox/mlx5/core/lib/wc_neon_iowrite64_copy.c:9:48: error: instruction requires: neon
>        9 |         ("ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%0]\n\t"
>          |                                                       ^
>    <inline asm>:2:2: note: instantiated into assembly here
>        2 |         st1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x20]
>          |         ^
>
> while GCC accepts it... It looks like, after [1] in GCC 6.x, GCC's
> -mgeneral-regs-only only stops the compiler from using floating-point
> and SIMD registers, whereas clang's restriction applies to both the
> compiler and the assembler. Perhaps clang should be adjusted to match,
> but its behavior seems more desirable for the kernel, since it ensures
> floating-point code is properly separated and only called between
> kernel_fpu_{begin,end}(). This error is resolved with the following diff.
>
> [1]: https://gcc.gnu.org/cgit/gcc/commit/?id=7d9425d46b58e69667300331aa55ebddddcceaeb
>
> Cheers,
> Nathan
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> index 06d0eb190816..a85fc21419d8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> @@ -181,4 +181,5 @@ mlx5_dpll-y :=      dpll.o
>   # NEON WC specific for mlx5
>   #
>   mlx5_core-$(CONFIG_KERNEL_MODE_NEON) += lib/wc_neon_iowrite64_copy.o
> -FLAGS_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_FPU)
> +CFLAGS_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_FPU)
> +CFLAGS_REMOVE_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_NO_FPU)

You are spot on. I checked my patch set, and the code I actually tested
(for performance, not just compilation) used the following:

ifeq ($(ARCH),arm64)
         CFLAGS_lib/neon_iowrite64_copy.o += -ffreestanding
         CFLAGS_REMOVE_lib/neon_iowrite64_copy.o += -mgeneral-regs-only
endif

This is equivalent to the diff you sent. Thanks for the heads-up; I will
fix it and resend.

Thanks, Patrisious.


Thread overview: 11+ messages
2025-09-15  8:35 [PATCH net-next V2] net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs Tariq Toukan
2025-09-15 22:18 ` Nathan Chancellor
2025-09-15 22:27   ` Nathan Chancellor
2025-09-15 22:48     ` Jason Gunthorpe
2025-09-15 23:15       ` Nathan Chancellor
2025-09-16  8:39         ` Patrisious Haddad [this message]
2025-09-16  8:58           ` Arnd Bergmann
2025-09-16  9:47             ` Patrisious Haddad
2025-09-16 12:27               ` Jason Gunthorpe
2025-09-16 13:13                 ` Patrisious Haddad
2025-09-16  5:45 ` kernel test robot
