From: Catalin Marinas <catalin.marinas@arm.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
Niklas Schnelle <schnelle@linux.ibm.com>,
Leon Romanovsky <leon@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
linux-rdma@vger.kernel.org, llvm@lists.linux.dev,
Michael Guralnik <michaelgur@mellanox.com>,
Nathan Chancellor <nathan@kernel.org>,
Nick Desaulniers <ndesaulniers@google.com>,
Will Deacon <will@kernel.org>, Marc Zyngier <maz@kernel.org>
Subject: Re: [PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64
Date: Wed, 24 Jan 2024 17:22:05 +0000
Message-ID: <ZbFHPTUaBmbHYnwx@arm.com>
In-Reply-To: <20240124132719.GF1455070@nvidia.com>

On Wed, Jan 24, 2024 at 09:27:19AM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 24, 2024 at 12:40:29PM +0000, Catalin Marinas wrote:
>
> > > Just to be clear, that means we should drop this patch ("arm64/io: add
> > > memcpy_toio_64") for now, right?
> >
> > In its current form yes, but that doesn't mean that memcpy_toio_64()
> > cannot be reworked differently.
>
> I gave up on touching memcpy_toio_64(), it doesn't work very well
> because of the weak alignment
>
> Instead I followed your suggestion to fix __iowrite64_copy()

I forgot the details. Was it to introduce an __iowrite512_copy()
function or to simply use __iowrite64_copy() with a count of 8?

> There are only a couple of places that use this API:
[...]
> __iowrite64_copy() has a sufficient API that the compiler can inline
> the STP block as this patch did.
>
> I experimented with having memcpy_toio_64() invoke __iowrite64_copy(),
> but it did not look very nice. Maybe there is a possible performance
> win there, I don't know.

Just invoking __iowrite64_copy() with a count of 8 wouldn't work well
even if we have the writeq generating STR with an offset (well, it also
increments the pointers, so I don't think Mark's optimisation would
help). The copy implies loads and these would be interleaved with stores
and potentially get in the way of write combining. An
__iowrite512_copy() or maybe even an optimised __iowrite64_copy() for
count 8 could do the loads first followed by the stores. You can use a
special path in __iowrite64_copy() with __builtin_constant_p().

You can try with an arm64 specific __iowrite64_copy() and see how it
goes (together with Mark's patch):

void __iowrite64_copy(void __iomem *to, const void *from,
		      size_t count)
{
	u64 __iomem *dst = to;
	const u64 *src = from;
	const u64 *end = src + count;

	/*
	 * Try a 64-byte write, the CPUs tend to write-combine them.
	 */
	if (__builtin_constant_p(count) && count == 8) {
		__raw_writeq(*src, dst);
		__raw_writeq(*(src + 1), dst + 1);
		__raw_writeq(*(src + 2), dst + 2);
		__raw_writeq(*(src + 3), dst + 3);
		__raw_writeq(*(src + 4), dst + 4);
		__raw_writeq(*(src + 5), dst + 5);
		__raw_writeq(*(src + 6), dst + 6);
		__raw_writeq(*(src + 7), dst + 7);
		return;
	}

	while (src < end)
		__raw_writeq(*src++, dst++);
}
EXPORT_SYMBOL_GPL(__iowrite64_copy);

What we don't have is inlining of __iowrite64_copy() but if we need that
we can move away from a weak symbol to a static inline.

Give this a go and see if you get write-combining in your hardware.
If the loads interleaved with stores get in the way, maybe we can resort
to inline asm.
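
As a rough illustration of the "loads first, then stores" idea above,
here is a minimal userspace sketch. It is not the kernel code: the
sketch_*() names are made up for this example, sketch_raw_writeq() is
only a stand-in for __raw_writeq() (a plain volatile store), and the
destination is ordinary memory rather than an __iomem mapping. It
assumes GCC or Clang for __builtin_constant_p(). Note that the compiler
is still free to reschedule the plain loads around the volatile stores,
so this only expresses the intent; it does not guarantee the
instruction order the way inline asm would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's __raw_writeq(); the name and the
 * plain volatile store are assumptions made for this sketch only.
 */
static inline void sketch_raw_writeq(uint64_t val, volatile uint64_t *addr)
{
	*addr = val;
}

/* Copy exactly eight 64-bit words: all loads first, then all stores. */
static inline void sketch_copy_512(volatile uint64_t *dst, const uint64_t *src)
{
	uint64_t v[8];
	int i;

	for (i = 0; i < 8; i++)		/* load phase */
		v[i] = src[i];
	for (i = 0; i < 8; i++)		/* store phase */
		sketch_raw_writeq(v[i], dst + i);
}

/*
 * Hypothetical inline variant of __iowrite64_copy(): the special path is
 * only taken when the compiler can see a constant count of 8 at the call
 * site; otherwise the generic loop does the copy.
 */
static inline void sketch_iowrite64_copy(volatile uint64_t *dst,
					 const uint64_t *src, size_t count)
{
	if (__builtin_constant_p(count) && count == 8) {
		sketch_copy_512(dst, src);
		return;
	}
	while (count--)
		sketch_raw_writeq(*src++, dst++);
}

/* Usage: copy one 64-byte block into an ordinary buffer and check it. */
static void sketch_demo(void)
{
	uint64_t src[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
	uint64_t dst[8] = { 0 };
	int i;

	sketch_iowrite64_copy(dst, src, 8);
	for (i = 0; i < 8; i++)
		assert(dst[i] == src[i]);
}
```

Both paths produce the same data in the destination; the only thing the
special path changes is the order in which loads and stores are written
in the source, which is why it only matters once the function is inlined
and the count is visible to the compiler.
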
--
Catalin