From: David Laight <David.Laight@ACULAB.COM>
To: 'Jason Gunthorpe' <jgg@nvidia.com>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
"Heiko Carstens" <hca@linux.ibm.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Justin Stitt <justinstitt@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Leon Romanovsky <leon@kernel.org>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
"linux-s390@vger.kernel.org" <linux-s390@vger.kernel.org>,
"llvm@lists.linux.dev" <llvm@lists.linux.dev>,
Ingo Molnar <mingo@redhat.com>, Bill Wendling <morbo@google.com>,
Nathan Chancellor <nathan@kernel.org>,
Nick Desaulniers <ndesaulniers@google.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
Salil Mehta <salil.mehta@huawei.com>,
Jijie Shao <shaojijie@huawei.com>,
Sven Schnelle <svens@linux.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>,
"x86@kernel.org" <x86@kernel.org>,
Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Arnd Bergmann <arnd@arndb.de>,
Catalin Marinas <catalin.marinas@arm.com>,
Leon Romanovsky <leonro@mellanox.com>,
"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
Mark Rutland <mark.rutland@arm.com>,
Michael Guralnik <michaelgur@mellanox.com>,
"patches@lists.linux.dev" <patches@lists.linux.dev>,
Niklas Schnelle <schnelle@linux.ibm.com>,
"Will Deacon" <will@kernel.org>
Subject: RE: [PATCH 4/6] arm64/io: Provide a WC friendly __iowriteXX_copy()
Date: Thu, 22 Feb 2024 22:05:04 +0000
Message-ID: <6d335e8701334a15b220b75d49b98d77@AcuMS.aculab.com>
In-Reply-To: <4-v1-38290193eace+5-mlx5_arm_wc_jgg@nvidia.com>
From: Jason Gunthorpe
> Sent: 21 February 2024 01:17
>
> The kernel provides driver support for using write combining IO memory
> through the __iowriteXX_copy() API which is commonly used as an optional
> optimization to generate 16/32/64 byte MemWr TLPs in a PCIe environment.
>
...
> Implement __iowrite32/64_copy() specifically for ARM64 and use inline
> assembly to build consecutive blocks of STR instructions. Provide direct
> support for 64/32/16 large TLP generation in this manner. Optimize for
> common constant lengths so that the compiler can directly inline the store
> blocks.
...
> +/*
> + * This generates a memcpy that works on a from/to address which is aligned to
> + * bits. Count is in terms of the number of bits sized quantities to copy. It
> + * optimizes to use the STR groupings when possible so that it is WC friendly.
> + */
> +#define memcpy_toio_aligned(to, from, count, bits) \
> + ({ \
> + volatile u##bits __iomem *_to = to; \
> + const u##bits *_from = from; \
> + size_t _count = count; \
> + const u##bits *_end_from = _from + ALIGN_DOWN(_count, 8); \
> + \
> + for (; _from < _end_from; _from += 8, _to += 8) \
> + __const_memcpy_toio_aligned##bits(_to, _from, 8); \
> + if ((_count % 8) >= 4) { \
if (_count & 4) {
> + __const_memcpy_toio_aligned##bits(_to, _from, 4); \
> + _from += 4; \
> + _to += 4; \
> + } \
> + if ((_count % 4) >= 2) { \
Ditto
> + __const_memcpy_toio_aligned##bits(_to, _from, 2); \
> + _from += 2; \
> + _to += 2; \
> + } \
> + if (_count % 2) \
and again
> + __const_memcpy_toio_aligned##bits(_to, _from, 1); \
> + })
But that looks a bit large to be inlined.
Except, perhaps, for small constant lengths.
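To make the bit-test suggestion concrete, here is a minimal user-space sketch, not the kernel code. It checks that each modulo test in the quoted macro selects the same tail blocks as the corresponding test of a single bit of the count, and then uses the bit tests in a copy loop with the same 8/4/2/1 structure. The names tail_tests_agree(), copy_block64() and copy64_bit_tests() are made up for illustration; copy_block64() is only a plain-C stand-in for __const_memcpy_toio_aligned64() and does none of the STR grouping or __iomem handling that the patch is about.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true if the modulo tests from the patch and the single-bit
 * tests select the same tail blocks for every count below limit.
 */
static bool tail_tests_agree(size_t limit)
{
	for (size_t count = 0; count < limit; count++) {
		if (((count % 8) >= 4) != ((count & 4) != 0))
			return false;
		if (((count % 4) >= 2) != ((count & 2) != 0))
			return false;
		if (((count % 2) != 0) != ((count & 1) != 0))
			return false;
	}
	return true;
}

/*
 * Hypothetical stand-in for __const_memcpy_toio_aligned64(): ordinary
 * stores only, no inline assembly and no __iomem annotation.
 */
static void copy_block64(volatile uint64_t *to, const uint64_t *from,
			 size_t n)
{
	for (size_t i = 0; i < n; i++)
		to[i] = from[i];
}

/* Same shape as the quoted macro, with the tail selected by bit tests. */
static void copy64_bit_tests(volatile uint64_t *to, const uint64_t *from,
			     size_t count)
{
	/* Equivalent of ALIGN_DOWN(count, 8): clear the low three bits. */
	const uint64_t *end = from + (count & ~(size_t)7);

	for (; from < end; from += 8, to += 8)
		copy_block64(to, from, 8);
	if (count & 4) {
		copy_block64(to, from, 4);
		from += 4;
		to += 4;
	}
	if (count & 2) {
		copy_block64(to, from, 2);
		from += 2;
		to += 2;
	}
	if (count & 1)
		copy_block64(to, from, 1);
}
```

For example, a count of 23 takes two passes of the 8-wide loop and then the 4, 2 and 1 tails. A compiler may well generate the same code for both forms since the count is unsigned, so the bit tests are mostly about readability.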
I'd guess that even with write-combining and posted PCIe writes it
doesn't take much for it to be PCIe limited rather than cpu limited?
Is there a sane way to do the same for reads - they are far worse
than writes.
I solved the problem a few years back on a little ppc by using an on-cpu
DMA controller that could do PCIe master accesses and spinning until
the transfer completed.
But that sort of DMA controller seems uncommon.
We now initiate most of the transfers from the slave (an fpga) - after
writing a suitable/sane dma controller for that end.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)