From: Jason Gunthorpe <jgg@nvidia.com>
To: David Laight <David.Laight@aculab.com>
Cc: 'Niklas Schnelle' <schnelle@linux.ibm.com>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Heiko Carstens <hca@linux.ibm.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Justin Stitt <justinstitt@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Leon Romanovsky <leon@kernel.org>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
"linux-s390@vger.kernel.org" <linux-s390@vger.kernel.org>,
"llvm@lists.linux.dev" <llvm@lists.linux.dev>,
Ingo Molnar <mingo@redhat.com>, Bill Wendling <morbo@google.com>,
Nathan Chancellor <nathan@kernel.org>,
Nick Desaulniers <ndesaulniers@google.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
Salil Mehta <salil.mehta@huawei.com>,
Jijie Shao <shaojijie@huawei.com>,
Sven Schnelle <svens@linux.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>,
"x86@kernel.org" <x86@kernel.org>,
Yisen Zhuang <yisen.zhuang@huawei.com>,
Arnd Bergmann <arnd@arndb.de>,
Catalin Marinas <catalin.marinas@arm.com>,
Leon Romanovsky <leonro@mellanox.com>,
"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
Mark Rutland <mark.rutland@arm.com>,
Michael Guralnik <michaelgur@mellanox.com>,
"patches@lists.linux.dev" <patches@lists.linux.dev>,
Will Deacon <will@kernel.org>
Subject: Re: [PATCH 4/6] arm64/io: Provide a WC friendly __iowriteXX_copy()
Date: Fri, 23 Feb 2024 10:44:02 -0400
Message-ID: <20240223144402.GG13330@nvidia.com>
In-Reply-To: <18248cc6f411441c8a68a55f68416150@AcuMS.aculab.com>
On Fri, Feb 23, 2024 at 01:52:37PM +0000, David Laight wrote:
> > > Since writes get 'posted' all over the place.
> > > How many writes do you need to do before write-combining makes a
> > > difference?
> >
> > The issue is that the HW can optimize if the entire transaction is
> > presented in one TLP, if it has to reassemble the transaction it takes
> > a big slow path hit.
>
> Ah, so you aren't optimising to reduce the number of TLP for
> (effectively) a write to a memory buffer, but have a pcie slave
> that really want to see (for example) the writes for a ring buffer
> entry in a single TLP?
>
> So you really want something that (should) generate a 16 (or 32)
> byte TLP? Rather than abusing the function that is expected to
> generate multiple 8 byte TLP to generate larger TLP.
__iowriteXX_copy() was originally created by Pathscale (an RDMA device
company) to support RDMA drivers doing exactly this workload. It is
not an abuse.
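
For readers unfamiliar with the helper, here is a minimal userspace sketch
of the kind of copy loop being discussed. It is only a model under stated
assumptions: model_iowrite64_copy() is a hypothetical name, plain volatile
stores stand in for the kernel's raw MMIO writes, and nothing in it
guarantees write combining or a single TLP. Whether the stores merge
depends on the CPU and on how the destination is mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of a 64-bit MMIO copy loop.
 * 'count' is a number of 64-bit words, not bytes.
 * The volatile destination only keeps the compiler from dropping or
 * merging the stores; it says nothing about what the bus does with them.
 */
static void model_iowrite64_copy(volatile uint64_t *to, const uint64_t *from,
				 size_t count)
{
	const uint64_t *end = from + count;

	assert(to != NULL && from != NULL);

	/* One 64-bit store per word, lowest address first. */
	while (from < end)
		*to++ = *from++;
}
```

With count = 8 this issues eight consecutive 8-byte stores covering 64
bytes, which is the pattern a write-combining mapping may (but is not
required to) emit as one larger transaction.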
> It is rather a shame that there isn't an efficient way to get
> access to a couple of large SIMD registers.
Yes, userspace uses SIMD to make this work a lot better and run faster.
Jason