From: Catalin Marinas <catalin.marinas@arm.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
Leon Romanovsky <leon@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
linux-rdma@vger.kernel.org, llvm@lists.linux.dev,
Michael Guralnik <michaelgur@mellanox.com>,
Nathan Chancellor <nathan@kernel.org>,
Nick Desaulniers <ndesaulniers@google.com>,
Will Deacon <will@kernel.org>
Subject: Re: [PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64
Date: Mon, 27 Nov 2023 12:42:41 +0000 [thread overview]
Message-ID: <ZWSOwT2OyMXD1lmo@arm.com> (raw)
In-Reply-To: <20231124122352.GB436702@nvidia.com>
On Fri, Nov 24, 2023 at 08:23:52AM -0400, Jason Gunthorpe wrote:
> On Fri, Nov 24, 2023 at 10:16:15AM +0000, Mark Rutland wrote:
> > On Thu, Nov 23, 2023 at 09:04:31PM +0200, Leon Romanovsky wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
[...]
> > > Provide a new generic function memcpy_toio_64() which should reliably
> > > generate the needed instructions for the architecture, assuming address
> > > alignment. As the usual need for this operation is performance sensitive a
> > > fast inline implementation is preferred.
> >
> > There is *no* architectural sequence that is guaranteed to reliably generate a
> > 64-byte TLP, and this sequence won't guarnatee that (e.g. even if the CPU
> > *always* merged adjacent stores, we can take an interrupt mid-sequence that
> > would prevent that).
>
> WC is not guaranteed on any arch, that is well known.
>
> The HW has means to handle fragmented TLPs, it just hurts performance
> when it happens. "reliable" here means we'd like to see something like
> a > 90% chance of the large TLP instead of the < 1% chance with the C
> loop.
>
> Future ARM CPUs have the ST64B instruction which does provide the
> architectural guarantee, and x86 has a similar guaranteed instruction
> now too.
>
> > What's the actual requirement here? Is this just for performance?
>
> Yes, just performance.
Do you have any rough numbers (percentage)? It's highly
microarchitecture-dependent until we get the ST64B instruction.
More of a bike-shedding, I wonder whether the __iowrite*_copy()
semantics are better suited for what you need in terms of ordering (not
that mempcy_toio() to Normal NC memory gives us any ordering).
--
Catalin
WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
Leon Romanovsky <leon@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
linux-rdma@vger.kernel.org, llvm@lists.linux.dev,
Michael Guralnik <michaelgur@mellanox.com>,
Nathan Chancellor <nathan@kernel.org>,
Nick Desaulniers <ndesaulniers@google.com>,
Will Deacon <will@kernel.org>
Subject: Re: [PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64
Date: Mon, 27 Nov 2023 12:42:41 +0000 [thread overview]
Message-ID: <ZWSOwT2OyMXD1lmo@arm.com> (raw)
In-Reply-To: <20231124122352.GB436702@nvidia.com>
On Fri, Nov 24, 2023 at 08:23:52AM -0400, Jason Gunthorpe wrote:
> On Fri, Nov 24, 2023 at 10:16:15AM +0000, Mark Rutland wrote:
> > On Thu, Nov 23, 2023 at 09:04:31PM +0200, Leon Romanovsky wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
[...]
> > > Provide a new generic function memcpy_toio_64() which should reliably
> > > generate the needed instructions for the architecture, assuming address
> > > alignment. As the usual need for this operation is performance sensitive a
> > > fast inline implementation is preferred.
> >
> > There is *no* architectural sequence that is guaranteed to reliably generate a
> > 64-byte TLP, and this sequence won't guarnatee that (e.g. even if the CPU
> > *always* merged adjacent stores, we can take an interrupt mid-sequence that
> > would prevent that).
>
> WC is not guaranteed on any arch, that is well known.
>
> The HW has means to handle fragmented TLPs, it just hurts performance
> when it happens. "reliable" here means we'd like to see something like
> a > 90% chance of the large TLP instead of the < 1% chance with the C
> loop.
>
> Future ARM CPUs have the ST64B instruction which does provide the
> architectural guarantee, and x86 has a similar guaranteed instruction
> now too.
>
> > What's the actual requirement here? Is this just for performance?
>
> Yes, just performance.
Do you have any rough numbers (percentage)? It's highly
microarchitecture-dependent until we get the ST64B instruction.
More of a bike-shedding, I wonder whether the __iowrite*_copy()
semantics are better suited for what you need in terms of ordering (not
that mempcy_toio() to Normal NC memory gives us any ordering).
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2023-11-27 12:42 UTC|newest]
Thread overview: 136+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-23 19:04 [PATCH rdma-next 0/2] Add and use memcpy_toio_64() Leon Romanovsky
2023-11-23 19:04 ` Leon Romanovsky
2023-11-23 19:04 ` [PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64 Leon Romanovsky
2023-11-23 19:04 ` Leon Romanovsky
2023-11-24 10:16 ` Mark Rutland
2023-11-24 10:16 ` Mark Rutland
2023-11-24 12:23 ` Jason Gunthorpe
2023-11-24 12:23 ` Jason Gunthorpe
2023-11-27 12:42 ` Catalin Marinas [this message]
2023-11-27 12:42 ` Catalin Marinas
2023-11-27 13:45 ` Jason Gunthorpe
2023-11-27 13:45 ` Jason Gunthorpe
2023-12-04 17:31 ` Catalin Marinas
2023-12-04 17:31 ` Catalin Marinas
2023-12-04 18:23 ` Jason Gunthorpe
2023-12-04 18:23 ` Jason Gunthorpe
2023-12-05 17:21 ` Catalin Marinas
2023-12-05 17:21 ` Catalin Marinas
2023-12-05 17:51 ` Jason Gunthorpe
2023-12-05 17:51 ` Jason Gunthorpe
2023-12-05 19:34 ` Catalin Marinas
2023-12-05 19:34 ` Catalin Marinas
2023-12-05 19:51 ` Jason Gunthorpe
2023-12-05 19:51 ` Jason Gunthorpe
2023-12-06 11:09 ` Catalin Marinas
2023-12-06 11:09 ` Catalin Marinas
2023-12-06 12:59 ` Jason Gunthorpe
2023-12-06 12:59 ` Jason Gunthorpe
2024-01-16 18:51 ` Jason Gunthorpe
2024-01-16 18:51 ` Jason Gunthorpe
2024-01-17 12:30 ` Mark Rutland
2024-01-17 12:30 ` Mark Rutland
2024-01-17 12:36 ` Jason Gunthorpe
2024-01-17 12:36 ` Jason Gunthorpe
2024-01-17 12:41 ` Jason Gunthorpe
2024-01-17 12:41 ` Jason Gunthorpe
2024-01-17 13:29 ` Mark Rutland
2024-01-17 13:29 ` Mark Rutland
2024-01-23 20:38 ` Catalin Marinas
2024-01-23 20:38 ` Catalin Marinas
2024-01-24 1:27 ` Jason Gunthorpe
2024-01-24 1:27 ` Jason Gunthorpe
2024-01-24 8:26 ` Marc Zyngier
2024-01-24 8:26 ` Marc Zyngier
2024-01-24 13:06 ` Jason Gunthorpe
2024-01-24 13:06 ` Jason Gunthorpe
2024-01-24 13:32 ` Marc Zyngier
2024-01-24 13:32 ` Marc Zyngier
2024-01-24 15:52 ` Jason Gunthorpe
2024-01-24 15:52 ` Jason Gunthorpe
2024-01-24 17:54 ` Catalin Marinas
2024-01-24 17:54 ` Catalin Marinas
2024-01-25 1:29 ` Jason Gunthorpe
2024-01-25 1:29 ` Jason Gunthorpe
2024-01-26 16:15 ` Catalin Marinas
2024-01-26 16:15 ` Catalin Marinas
2024-01-26 17:09 ` Jason Gunthorpe
2024-01-26 17:09 ` Jason Gunthorpe
2024-01-24 11:38 ` Mark Rutland
2024-01-24 11:38 ` Mark Rutland
2024-01-24 12:40 ` Catalin Marinas
2024-01-24 12:40 ` Catalin Marinas
2024-01-24 13:27 ` Jason Gunthorpe
2024-01-24 13:27 ` Jason Gunthorpe
2024-01-24 17:22 ` Catalin Marinas
2024-01-24 17:22 ` Catalin Marinas
2024-01-24 19:26 ` Jason Gunthorpe
2024-01-24 19:26 ` Jason Gunthorpe
2024-01-25 17:43 ` Jason Gunthorpe
2024-01-25 17:43 ` Jason Gunthorpe
2024-01-26 14:56 ` Catalin Marinas
2024-01-26 14:56 ` Catalin Marinas
2024-01-26 15:24 ` Jason Gunthorpe
2024-01-26 15:24 ` Jason Gunthorpe
2024-01-17 14:07 ` Mark Rutland
2024-01-17 14:07 ` Mark Rutland
2024-01-17 15:28 ` Jason Gunthorpe
2024-01-17 15:28 ` Jason Gunthorpe
2024-01-17 16:05 ` Will Deacon
2024-01-17 16:05 ` Will Deacon
2024-01-18 16:18 ` Jason Gunthorpe
2024-01-18 16:18 ` Jason Gunthorpe
2024-01-24 11:31 ` Mark Rutland
2024-01-24 11:31 ` Mark Rutland
2023-11-24 12:58 ` Robin Murphy
2023-11-24 12:58 ` Robin Murphy
2023-11-24 13:45 ` Jason Gunthorpe
2023-11-24 13:45 ` Jason Gunthorpe
2023-11-24 15:32 ` Robin Murphy
2023-11-24 15:32 ` Robin Murphy
2023-11-24 14:10 ` Niklas Schnelle
2023-11-24 14:10 ` Niklas Schnelle
2023-11-24 14:20 ` Jason Gunthorpe
2023-11-24 14:20 ` Jason Gunthorpe
2023-11-24 14:48 ` Niklas Schnelle
2023-11-24 14:48 ` Niklas Schnelle
2023-11-24 14:53 ` Niklas Schnelle
2023-11-24 14:53 ` Niklas Schnelle
2023-11-24 14:55 ` Jason Gunthorpe
2023-11-24 14:55 ` Jason Gunthorpe
2023-11-24 15:59 ` Niklas Schnelle
2023-11-24 15:59 ` Niklas Schnelle
2023-11-24 16:06 ` Jason Gunthorpe
2023-11-24 16:06 ` Jason Gunthorpe
2023-11-27 17:43 ` Niklas Schnelle
2023-11-27 17:43 ` Niklas Schnelle
2023-11-27 17:51 ` Jason Gunthorpe
2023-11-27 17:51 ` Jason Gunthorpe
2023-11-28 16:28 ` Niklas Schnelle
2023-11-28 16:28 ` Niklas Schnelle
2024-01-16 17:33 ` Jason Gunthorpe
2024-01-16 17:33 ` Jason Gunthorpe
2024-01-17 13:20 ` Niklas Schnelle
2024-01-17 13:20 ` Niklas Schnelle
2024-01-17 13:26 ` Jason Gunthorpe
2024-01-17 13:26 ` Jason Gunthorpe
2024-01-17 17:55 ` Jason Gunthorpe
2024-01-17 17:55 ` Jason Gunthorpe
2024-01-18 13:46 ` Niklas Schnelle
2024-01-18 13:46 ` Niklas Schnelle
2024-01-18 14:00 ` Jason Gunthorpe
2024-01-18 14:00 ` Jason Gunthorpe
2024-01-18 15:59 ` Niklas Schnelle
2024-01-18 15:59 ` Niklas Schnelle
2024-01-18 16:21 ` Jason Gunthorpe
2024-01-18 16:21 ` Jason Gunthorpe
2024-01-18 16:25 ` Niklas Schnelle
2024-01-18 16:25 ` Niklas Schnelle
2024-01-19 11:52 ` Niklas Schnelle
2024-01-19 11:52 ` Niklas Schnelle
2024-02-16 12:09 ` Niklas Schnelle
2024-02-16 12:09 ` Niklas Schnelle
2024-02-16 12:39 ` Jason Gunthorpe
2024-02-16 12:39 ` Jason Gunthorpe
2023-11-23 19:04 ` [PATCH rdma-next 2/2] IB/mlx5: Use memcpy_toio_64() for write combining stores Leon Romanovsky
2023-11-23 19:04 ` Leon Romanovsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZWSOwT2OyMXD1lmo@arm.com \
--to=catalin.marinas@arm.com \
--cc=arnd@arndb.de \
--cc=jgg@nvidia.com \
--cc=leon@kernel.org \
--cc=linux-arch@vger.kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-rdma@vger.kernel.org \
--cc=llvm@lists.linux.dev \
--cc=mark.rutland@arm.com \
--cc=michaelgur@mellanox.com \
--cc=nathan@kernel.org \
--cc=ndesaulniers@google.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.