From: Patrisious Haddad <phaddad@nvidia.com>
To: Nathan Chancellor <nathan@kernel.org>, Jason Gunthorpe <jgg@nvidia.com>
Cc: Tariq Toukan <tariqt@nvidia.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Saeed Mahameed <saeedm@nvidia.com>,
Leon Romanovsky <leon@kernel.org>, Mark Bloch <mbloch@nvidia.com>,
Sabrina Dubroca <sd@queasysnail.net>,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, Gal Pressman <gal@nvidia.com>,
Leon Romanovsky <leonro@nvidia.com>,
Michael Guralnik <michaelgur@nvidia.com>,
Moshe Shemesh <moshe@nvidia.com>, Will Deacon <will@kernel.org>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Heiko Carstens <hca@linux.ibm.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Justin Stitt <justinstitt@google.com>,
linux-s390@vger.kernel.org, llvm@lists.linux.dev,
Ingo Molnar <mingo@redhat.com>, Bill Wendling <morbo@google.com>,
Nick Desaulniers <ndesaulniers@google.com>,
Salil Mehta <salil.mehta@huawei.com>,
Sven Schnelle <svens@linux.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>,
x86@kernel.org, Yisen Zhuang <yisen.zhuang@huawei.com>,
Arnd Bergmann <arnd@arndb.de>,
Leon Romanovsky <leonro@mellanox.com>,
linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
Mark Rutland <mark.rutland@arm.com>,
Michael Guralnik <michaelgur@mellanox.com>,
patches@lists.linux.dev, Niklas Schnelle <schnelle@linux.ibm.com>,
Jijie Shao <shaojijie@huawei.com>
Subject: Re: [PATCH net-next V2] net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs
Date: Tue, 16 Sep 2025 11:39:06 +0300
Message-ID: <d259ffa9-6c9e-488f-a64f-81025deba75c@nvidia.com>
In-Reply-To: <20250915231506.GA973819@ax162>
On 9/16/2025 2:15 AM, Nathan Chancellor wrote:
> On Mon, Sep 15, 2025 at 07:48:10PM -0300, Jason Gunthorpe wrote:
>> On Mon, Sep 15, 2025 at 03:27:58PM -0700, Nathan Chancellor wrote:
>>> On Mon, Sep 15, 2025 at 03:18:59PM -0700, Nathan Chancellor wrote:
>>>> On Mon, Sep 15, 2025 at 11:35:08AM +0300, Tariq Toukan wrote:
>>>> ...
>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>>>> index d77696f46eb5..06d0eb190816 100644
>>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>>>> @@ -176,3 +176,9 @@ mlx5_core-$(CONFIG_PCIE_TPH) += lib/st.o
>>>>>
>>>>> obj-$(CONFIG_MLX5_DPLL) += mlx5_dpll.o
>>>>> mlx5_dpll-y := dpll.o
>>>>> +
>>>>> +#
>>>>> +# NEON WC specific for mlx5
>>>>> +#
>>>>> +mlx5_core-$(CONFIG_KERNEL_MODE_NEON) += lib/wc_neon_iowrite64_copy.o
>>>>> +FLAGS_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_FPU)
>>>> Does this work as is? I think this needs to be CFLAGS instead of FLAGS
>>>> but I did not test to verify.
>>> Also, Documentation/core-api/floating-point.rst states that code should
>>> also use CFLAGS_REMOVE_ for CC_FLAGS_NO_FPU as well as adding
>>> CC_FLAGS_FPU.
>>>
>>> CFLAGS_REMOVE_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_NO_FPU)
>> I wondered if you needed the separate compilation unit at all since
>> it is all done with inline assembly.. Since the makefile seems to have a
>> typo, it suggests you don't need the compilation unit and it could
>> just be a little inline protected by CONFIG_KERNEL_MODE_NEON.
There is a difference between what merely compiles and the effect these
flags have on the generated assembly and on performance. To avoid
finding that out the hard way, I prefer to stick to the documentation,
which does what Nathan describes below: a separate compilation unit,
called between kernel_fpu_{begin,end}(), built with the correct flags.
That is what I originally tested. I failed to re-test after finishing my
code review, thinking my changes were only cosmetic.
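For illustration, the split described above (FP/SIMD code in its own
unit, entered only between the begin and end calls) can be sketched in
user space as follows. This is a minimal sketch under stated
assumptions, not the mlx5 code: kernel_fpu_begin() and kernel_fpu_end()
are stubbed with a counter, the names wc_copy64() and wc_copy64_fpu()
are hypothetical, and memcpy() stands in for the NEON ld1/st1 inline
assembly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Stubs (assumption): user-space stand-ins for the kernel's
 * kernel_fpu_begin()/kernel_fpu_end(). They only track nesting depth.
 */
static int fpu_depth;

static void kernel_fpu_begin(void) { fpu_depth++; }
static void kernel_fpu_end(void)   { fpu_depth--; }

/*
 * Hypothetical helper. In the kernel this would sit in its own .c file,
 * built with CC_FLAGS_FPU added and CC_FLAGS_NO_FPU removed. Here
 * memcpy() stands in for the 64-byte ld1/st1 pair.
 */
static void wc_copy64_fpu(void *dst, const void *src)
{
	/* Must only run inside a begin/end section. */
	assert(fpu_depth > 0);
	memcpy(dst, src, 64);
}

/* Caller, built with the normal flags: brackets the FP/SIMD call. */
static void wc_copy64(void *dst, const void *src)
{
	kernel_fpu_begin();
	wc_copy64_fpu(dst, src);
	kernel_fpu_end();
}
```

A caller would only ever invoke wc_copy64(); the assert in
wc_copy64_fpu() makes an unbracketed call fail loudly in this sketch.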
> Hmmm, clang rejects the current patch
>
> drivers/net/ethernet/mellanox/mlx5/core/lib/wc_neon_iowrite64_copy.c:9:3: error: instruction requires: neon
> 9 | ("ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%0]\n\t"
> | ^
> <inline asm>:1:2: note: instantiated into assembly here
> 1 | ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x19]
> | ^
> drivers/net/ethernet/mellanox/mlx5/core/lib/wc_neon_iowrite64_copy.c:9:48: error: instruction requires: neon
> 9 | ("ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%0]\n\t"
> | ^
> <inline asm>:2:2: note: instantiated into assembly here
> 2 | st1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x20]
> | ^
>
> while GCC accepts it... It looks like GCC's -mgeneral-regs-only only
> impacts the compiler using floating-point and SIMD registers after [1]
> in GCC 6.x, whereas clang's restriction is on both the compiler and
> assembler. Perhaps clang should be adjusted to match but its behavior
> seems more desirable for the kernel to ensure floating-point code is
> properly separated and called between kernel_fpu_{begin,end}(). This
> error is resolved with the following diff.
>
> [1]: https://gcc.gnu.org/cgit/gcc/commit/?id=7d9425d46b58e69667300331aa55ebddddcceaeb
>
> Cheers,
> Nathan
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> index 06d0eb190816..a85fc21419d8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> @@ -181,4 +181,5 @@ mlx5_dpll-y := dpll.o
> # NEON WC specific for mlx5
> #
> mlx5_core-$(CONFIG_KERNEL_MODE_NEON) += lib/wc_neon_iowrite64_copy.o
> -FLAGS_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_FPU)
> +CFLAGS_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_FPU)
> +CFLAGS_REMOVE_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_NO_FPU)
You are spot on. I checked my patchset, and the code I actually tested
(for performance, beyond just compilation) used the following:
ifeq ($(ARCH),arm64)
CFLAGS_lib/neon_iowrite64_copy.o += -ffreestanding
CFLAGS_REMOVE_lib/neon_iowrite64_copy.o += -mgeneral-regs-only
endif
This is actually equivalent to the diff you sent. Thanks for the
heads-up, I will fix and resend.
Thanks, Patrisious.