From: Leon Romanovsky <leon@kernel.org>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Heiko Carstens <hca@linux.ibm.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Justin Stitt <justinstitt@google.com>,
Jakub Kicinski <kuba@kernel.org>,
linux-rdma@vger.kernel.org, linux-s390@vger.kernel.org,
llvm@lists.linux.dev, Ingo Molnar <mingo@redhat.com>,
Bill Wendling <morbo@google.com>,
Nathan Chancellor <nathan@kernel.org>,
Nick Desaulniers <ndesaulniers@google.com>,
netdev@vger.kernel.org, Paolo Abeni <pabeni@redhat.com>,
Salil Mehta <salil.mehta@huawei.com>,
Sven Schnelle <svens@linux.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>,
x86@kernel.org, Yisen Zhuang <yisen.zhuang@huawei.com>,
Arnd Bergmann <arnd@arndb.de>,
Catalin Marinas <catalin.marinas@arm.com>,
linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
Mark Rutland <mark.rutland@arm.com>,
Michael Guralnik <michaelgur@mellanox.com>,
patches@lists.linux.dev, Niklas Schnelle <schnelle@linux.ibm.com>,
Jijie Shao <shaojijie@huawei.com>, Will Deacon <will@kernel.org>
Subject: Re: [PATCH v3 6/6] IB/mlx5: Use __iowrite64_copy() for write combining stores
Date: Tue, 16 Apr 2024 11:29:57 +0300
Message-ID: <20240416082957.GC6832@unreal>
In-Reply-To: <6-v3-1893cd8b9369+1925-mlx5_arm_wc_jgg@nvidia.com>
On Thu, Apr 11, 2024 at 01:46:19PM -0300, Jason Gunthorpe wrote:
> mlx5 has a built-in self-test at driver startup to evaluate whether the
> platform supports write combining to generate a 64 byte PCIe TLP or
> not. This has proven necessary because a lot of common scenarios end up
> with broken write combining (especially inside virtual machines) and there
> is no other way to learn this information.
>
> This self test has been consistently failing on new ARM64 CPU
> designs (specifically with NVIDIA Grace's implementation of Neoverse
> V2). The C loop around writeq() generates some pretty terrible ARM64
> assembly, but historically this has worked on a lot of existing ARM64 CPUs
> till now.
>
> We see it succeed about 1 time in 10,000 on the worst affected
> systems. The CPU architects speculate that the load instructions
> interspersed with the stores make the WC buffers statistically flush too
> often and thus the generation of large TLPs becomes infrequent. This makes
> the boot-up test unreliable in that it indicates no write combining,
> even though userspace would be fine since it uses an ST4 instruction.
>
> Further, S390 has similar issues where only the special zpci_memcpy_toio()
> will actually generate large TLPs, and the open coded loop does not
> trigger it at all.
>
> Fix both ARM64 and S390 by switching to __iowrite64_copy(), which now
> provides architecture-specific variants that have a high chance of
> generating a large TLP with write combining. x86 continues to use a
> similar writeq loop in the generic __iowrite64_copy().
>
> Fixes: 11f552e21755 ("IB/mlx5: Test write combining support")
> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/infiniband/hw/mlx5/mem.c | 8 +++-----
> 1 file changed, 3 insertions(+), 5 deletions(-)
>
Thanks,
Acked-by: Leon Romanovsky <leonro@nvidia.com>
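
For readers following the thread, the change described in the quoted commit message can be sketched in userspace C roughly as below. This is only an illustration under stated assumptions, not the actual diff to drivers/infiniband/hw/mlx5/mem.c: mock_writeq(), mock_iowrite64_copy(), post_wqe_open_coded() and post_wqe_bulk() are hypothetical stand-ins operating on ordinary memory. The only detail taken from the kernel API is the argument order of __iowrite64_copy(to, from, count), with count given in 64-bit units. Plain memory cannot show write combining or TLP sizes; the sketch only shows that both forms store the same 64 bytes, while the second leaves the store sequence to a single helper that an architecture can specialise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a 64-bit MMIO store (not the kernel's writeq()). */
static void mock_writeq(uint64_t val, volatile uint64_t *addr)
{
	*addr = val;
}

/*
 * Hypothetical stand-in mirroring the argument order of the kernel's
 * __iowrite64_copy(to, from, count); count is the number of 64-bit words.
 * In the kernel, an architecture may supply its own variant of this helper.
 */
static void mock_iowrite64_copy(volatile uint64_t *to, const uint64_t *from,
				size_t count)
{
	for (size_t i = 0; i < count; i++)
		to[i] = from[i];
}

/* "Before" shape: the caller open-codes the loop, one store per iteration. */
static void post_wqe_open_coded(volatile uint64_t *bf, const uint64_t *wqe)
{
	for (size_t i = 0; i < 8; i++)
		mock_writeq(wqe[i], &bf[i]);
}

/* "After" shape: one call copies all 8 x 64-bit words (64 bytes). */
static void post_wqe_bulk(volatile uint64_t *bf, const uint64_t *wqe)
{
	mock_iowrite64_copy(bf, wqe, 8);
}

/* Checks that both shapes leave identical contents in the mock buffer. */
static int wc_sketch_selfcheck(void)
{
	uint64_t wqe[8];
	volatile uint64_t bf_a[8] = { 0 };
	volatile uint64_t bf_b[8] = { 0 };

	for (size_t i = 0; i < 8; i++)
		wqe[i] = 0x0101010101010101ULL * (i + 1);

	post_wqe_open_coded(bf_a, wqe);
	post_wqe_bulk(bf_b, wqe);

	for (size_t i = 0; i < 8; i++)
		assert(bf_a[i] == wqe[i] && bf_b[i] == wqe[i]);
	return 0;
}
```

Calling wc_sketch_selfcheck() from a test harness exercises both paths. Whether a real device sees one 64 byte TLP depends on the platform and on the architecture's implementation of the copy helper, which this mock does not model.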