From: Catalin Marinas <catalin.marinas@arm.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>,
Mark Rutland <mark.rutland@arm.com>,
Leon Romanovsky <leon@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
linux-rdma@vger.kernel.org, llvm@lists.linux.dev,
Michael Guralnik <michaelgur@mellanox.com>,
Nathan Chancellor <nathan@kernel.org>,
Nick Desaulniers <ndesaulniers@google.com>,
Will Deacon <will@kernel.org>
Subject: Re: [PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64
Date: Tue, 5 Dec 2023 19:34:45 +0000 [thread overview]
Message-ID: <ZW97VdHYH3HYVyd5@arm.com> (raw)
In-Reply-To: <20231205175127.GJ2692119@nvidia.com>
On Tue, Dec 05, 2023 at 01:51:27PM -0400, Jason Gunthorpe wrote:
> On Tue, Dec 05, 2023 at 05:21:27PM +0000, Catalin Marinas wrote:
> > On Mon, Dec 04, 2023 at 02:23:30PM -0400, Jason Gunthorpe wrote:
> > > On Mon, Dec 04, 2023 at 05:31:47PM +0000, Catalin Marinas wrote:
> > > > Personally I'd optimise the mempcy_toio() arm64 implementation to do
> > > > STPs if the alignment is right (like we do for classic memcpy()).
> > > > There's a slight overhead for alignment checking but I suspect it would
> > > > be lost as long as you can get the write-combining. Not sure whether the
> > > > interspersed reads in memcpy_toio() would somehow prevent the
> > > > write-combining.
> > >
> > > I understand on these new CPUs anything other than a block of
> > > contiguous STPs is risky to break the WC. I was told we should not
> > > have any loads between them.
> >
> > Classic memcpy does similar tricks with four LDPs in a row before
> > starting to issue the STPs (though there are new LDPs for the next
> > data in-between). But that was tuned for cacheable memory, not sure
> > if something similar would behave well on Normal-NC memory.
>
> Can we conclude a direction here?
>
> 1) I don't want to mess with x86 so we keep a dedicated API
> Are we agreed to call it __iowrite512_copy() and note its special
> alignment limitation?
Sounds fine to me.
> 2) You want to #define __iowrite512_copy() to memcpy_toio() on ARM and
> implement some quad STP optimization for this case?
We can have the generic __iowrite512_copy() do memcpy_toio() and have
the arm64 implement an optimised version.
What I'm not entirely sure of is the DGH (whatever the io_* barrier name
is). I'd put it in the same __iowrite512_copy() function and remove it
from the driver code. Otherwise when ST64B is added, we have an
unnecessary DGH in the driver. If this does not match the other
__iowrite*_copy() semantics, we can come up with another name. But start
with this for now and document the function.
> 3) A future ST64B and the x86 version would be put under
> __iowrite512_copy()?
Yes, arch-specific override.
> 4) A future ST64B would come with some kind of 'must do 64b copy or
> oops' to support the future HW that must have this instruction? eg
> we already see on Intel that HW must use ENQCMD and nothing else.
I don't agree with the oops part. We can't guarantee it on arm64, ST64B
I think is optional in the architecture. If you do need such guarantees,
we'd need the driver to probe for the feature (e.g. arch_has_...()) and
invoke a new macro. You can't have the __iowrite512_copy() that worked
fine suddenly giving an error just because some driver wants a
guaranteed atomic 64 byte write.
--
Catalin
WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>,
Mark Rutland <mark.rutland@arm.com>,
Leon Romanovsky <leon@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
linux-rdma@vger.kernel.org, llvm@lists.linux.dev,
Michael Guralnik <michaelgur@mellanox.com>,
Nathan Chancellor <nathan@kernel.org>,
Nick Desaulniers <ndesaulniers@google.com>,
Will Deacon <will@kernel.org>
Subject: Re: [PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64
Date: Tue, 5 Dec 2023 19:34:45 +0000 [thread overview]
Message-ID: <ZW97VdHYH3HYVyd5@arm.com> (raw)
In-Reply-To: <20231205175127.GJ2692119@nvidia.com>
On Tue, Dec 05, 2023 at 01:51:27PM -0400, Jason Gunthorpe wrote:
> On Tue, Dec 05, 2023 at 05:21:27PM +0000, Catalin Marinas wrote:
> > On Mon, Dec 04, 2023 at 02:23:30PM -0400, Jason Gunthorpe wrote:
> > > On Mon, Dec 04, 2023 at 05:31:47PM +0000, Catalin Marinas wrote:
> > > > Personally I'd optimise the mempcy_toio() arm64 implementation to do
> > > > STPs if the alignment is right (like we do for classic memcpy()).
> > > > There's a slight overhead for alignment checking but I suspect it would
> > > > be lost as long as you can get the write-combining. Not sure whether the
> > > > interspersed reads in memcpy_toio() would somehow prevent the
> > > > write-combining.
> > >
> > > I understand on these new CPUs anything other than a block of
> > > contiguous STPs is risky to break the WC. I was told we should not
> > > have any loads between them.
> >
> > Classic memcpy does similar tricks with four LDPs in a row before
> > starting to issue the STPs (though there are new LDPs for the next
> > data in-between). But that was tuned for cacheable memory, not sure
> > if something similar would behave well on Normal-NC memory.
>
> Can we conclude a direction here?
>
> 1) I don't want to mess with x86 so we keep a dedicated API
> Are we agreed to call it __iowrite512_copy() and note its special
> alignment limitation?
Sounds fine to me.
> 2) You want to #define __iowrite512_copy() to memcpy_toio() on ARM and
> implement some quad STP optimization for this case?
We can have the generic __iowrite512_copy() do memcpy_toio() and have
the arm64 implement an optimised version.
What I'm not entirely sure of is the DGH (whatever the io_* barrier name
is). I'd put it in the same __iowrite512_copy() function and remove it
from the driver code. Otherwise when ST64B is added, we have an
unnecessary DGH in the driver. If this does not match the other
__iowrite*_copy() semantics, we can come up with another name. But start
with this for now and document the function.
> 3) A future ST64B and the x86 version would be put under
> __iowrite512_copy()?
Yes, arch-specific override.
> 4) A future ST64B would come with some kind of 'must do 64b copy or
> oops' to support the future HW that must have this instruction? eg
> we already see on Intel that HW must use ENQCMD and nothing else.
I don't agree with the oops part. We can't guarantee it on arm64, ST64B
I think is optional in the architecture. If you do need such guarantees,
we'd need the driver to probe for the feature (e.g. arch_has_...()) and
invoke a new macro. You can't have the __iowrite512_copy() that worked
fine suddenly giving an error just because some driver wants a
guaranteed atomic 64 byte write.
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2023-12-05 19:34 UTC|newest]
Thread overview: 136+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-23 19:04 [PATCH rdma-next 0/2] Add and use memcpy_toio_64() Leon Romanovsky
2023-11-23 19:04 ` Leon Romanovsky
2023-11-23 19:04 ` [PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64 Leon Romanovsky
2023-11-23 19:04 ` Leon Romanovsky
2023-11-24 10:16 ` Mark Rutland
2023-11-24 10:16 ` Mark Rutland
2023-11-24 12:23 ` Jason Gunthorpe
2023-11-24 12:23 ` Jason Gunthorpe
2023-11-27 12:42 ` Catalin Marinas
2023-11-27 12:42 ` Catalin Marinas
2023-11-27 13:45 ` Jason Gunthorpe
2023-11-27 13:45 ` Jason Gunthorpe
2023-12-04 17:31 ` Catalin Marinas
2023-12-04 17:31 ` Catalin Marinas
2023-12-04 18:23 ` Jason Gunthorpe
2023-12-04 18:23 ` Jason Gunthorpe
2023-12-05 17:21 ` Catalin Marinas
2023-12-05 17:21 ` Catalin Marinas
2023-12-05 17:51 ` Jason Gunthorpe
2023-12-05 17:51 ` Jason Gunthorpe
2023-12-05 19:34 ` Catalin Marinas [this message]
2023-12-05 19:34 ` Catalin Marinas
2023-12-05 19:51 ` Jason Gunthorpe
2023-12-05 19:51 ` Jason Gunthorpe
2023-12-06 11:09 ` Catalin Marinas
2023-12-06 11:09 ` Catalin Marinas
2023-12-06 12:59 ` Jason Gunthorpe
2023-12-06 12:59 ` Jason Gunthorpe
2024-01-16 18:51 ` Jason Gunthorpe
2024-01-16 18:51 ` Jason Gunthorpe
2024-01-17 12:30 ` Mark Rutland
2024-01-17 12:30 ` Mark Rutland
2024-01-17 12:36 ` Jason Gunthorpe
2024-01-17 12:36 ` Jason Gunthorpe
2024-01-17 12:41 ` Jason Gunthorpe
2024-01-17 12:41 ` Jason Gunthorpe
2024-01-17 13:29 ` Mark Rutland
2024-01-17 13:29 ` Mark Rutland
2024-01-23 20:38 ` Catalin Marinas
2024-01-23 20:38 ` Catalin Marinas
2024-01-24 1:27 ` Jason Gunthorpe
2024-01-24 1:27 ` Jason Gunthorpe
2024-01-24 8:26 ` Marc Zyngier
2024-01-24 8:26 ` Marc Zyngier
2024-01-24 13:06 ` Jason Gunthorpe
2024-01-24 13:06 ` Jason Gunthorpe
2024-01-24 13:32 ` Marc Zyngier
2024-01-24 13:32 ` Marc Zyngier
2024-01-24 15:52 ` Jason Gunthorpe
2024-01-24 15:52 ` Jason Gunthorpe
2024-01-24 17:54 ` Catalin Marinas
2024-01-24 17:54 ` Catalin Marinas
2024-01-25 1:29 ` Jason Gunthorpe
2024-01-25 1:29 ` Jason Gunthorpe
2024-01-26 16:15 ` Catalin Marinas
2024-01-26 16:15 ` Catalin Marinas
2024-01-26 17:09 ` Jason Gunthorpe
2024-01-26 17:09 ` Jason Gunthorpe
2024-01-24 11:38 ` Mark Rutland
2024-01-24 11:38 ` Mark Rutland
2024-01-24 12:40 ` Catalin Marinas
2024-01-24 12:40 ` Catalin Marinas
2024-01-24 13:27 ` Jason Gunthorpe
2024-01-24 13:27 ` Jason Gunthorpe
2024-01-24 17:22 ` Catalin Marinas
2024-01-24 17:22 ` Catalin Marinas
2024-01-24 19:26 ` Jason Gunthorpe
2024-01-24 19:26 ` Jason Gunthorpe
2024-01-25 17:43 ` Jason Gunthorpe
2024-01-25 17:43 ` Jason Gunthorpe
2024-01-26 14:56 ` Catalin Marinas
2024-01-26 14:56 ` Catalin Marinas
2024-01-26 15:24 ` Jason Gunthorpe
2024-01-26 15:24 ` Jason Gunthorpe
2024-01-17 14:07 ` Mark Rutland
2024-01-17 14:07 ` Mark Rutland
2024-01-17 15:28 ` Jason Gunthorpe
2024-01-17 15:28 ` Jason Gunthorpe
2024-01-17 16:05 ` Will Deacon
2024-01-17 16:05 ` Will Deacon
2024-01-18 16:18 ` Jason Gunthorpe
2024-01-18 16:18 ` Jason Gunthorpe
2024-01-24 11:31 ` Mark Rutland
2024-01-24 11:31 ` Mark Rutland
2023-11-24 12:58 ` Robin Murphy
2023-11-24 12:58 ` Robin Murphy
2023-11-24 13:45 ` Jason Gunthorpe
2023-11-24 13:45 ` Jason Gunthorpe
2023-11-24 15:32 ` Robin Murphy
2023-11-24 15:32 ` Robin Murphy
2023-11-24 14:10 ` Niklas Schnelle
2023-11-24 14:10 ` Niklas Schnelle
2023-11-24 14:20 ` Jason Gunthorpe
2023-11-24 14:20 ` Jason Gunthorpe
2023-11-24 14:48 ` Niklas Schnelle
2023-11-24 14:48 ` Niklas Schnelle
2023-11-24 14:53 ` Niklas Schnelle
2023-11-24 14:53 ` Niklas Schnelle
2023-11-24 14:55 ` Jason Gunthorpe
2023-11-24 14:55 ` Jason Gunthorpe
2023-11-24 15:59 ` Niklas Schnelle
2023-11-24 15:59 ` Niklas Schnelle
2023-11-24 16:06 ` Jason Gunthorpe
2023-11-24 16:06 ` Jason Gunthorpe
2023-11-27 17:43 ` Niklas Schnelle
2023-11-27 17:43 ` Niklas Schnelle
2023-11-27 17:51 ` Jason Gunthorpe
2023-11-27 17:51 ` Jason Gunthorpe
2023-11-28 16:28 ` Niklas Schnelle
2023-11-28 16:28 ` Niklas Schnelle
2024-01-16 17:33 ` Jason Gunthorpe
2024-01-16 17:33 ` Jason Gunthorpe
2024-01-17 13:20 ` Niklas Schnelle
2024-01-17 13:20 ` Niklas Schnelle
2024-01-17 13:26 ` Jason Gunthorpe
2024-01-17 13:26 ` Jason Gunthorpe
2024-01-17 17:55 ` Jason Gunthorpe
2024-01-17 17:55 ` Jason Gunthorpe
2024-01-18 13:46 ` Niklas Schnelle
2024-01-18 13:46 ` Niklas Schnelle
2024-01-18 14:00 ` Jason Gunthorpe
2024-01-18 14:00 ` Jason Gunthorpe
2024-01-18 15:59 ` Niklas Schnelle
2024-01-18 15:59 ` Niklas Schnelle
2024-01-18 16:21 ` Jason Gunthorpe
2024-01-18 16:21 ` Jason Gunthorpe
2024-01-18 16:25 ` Niklas Schnelle
2024-01-18 16:25 ` Niklas Schnelle
2024-01-19 11:52 ` Niklas Schnelle
2024-01-19 11:52 ` Niklas Schnelle
2024-02-16 12:09 ` Niklas Schnelle
2024-02-16 12:09 ` Niklas Schnelle
2024-02-16 12:39 ` Jason Gunthorpe
2024-02-16 12:39 ` Jason Gunthorpe
2023-11-23 19:04 ` [PATCH rdma-next 2/2] IB/mlx5: Use memcpy_toio_64() for write combining stores Leon Romanovsky
2023-11-23 19:04 ` Leon Romanovsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZW97VdHYH3HYVyd5@arm.com \
--to=catalin.marinas@arm.com \
--cc=arnd@arndb.de \
--cc=jgg@nvidia.com \
--cc=leon@kernel.org \
--cc=linux-arch@vger.kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-rdma@vger.kernel.org \
--cc=llvm@lists.linux.dev \
--cc=mark.rutland@arm.com \
--cc=michaelgur@mellanox.com \
--cc=nathan@kernel.org \
--cc=ndesaulniers@google.com \
--cc=schnelle@linux.ibm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.