Linux RDMA and InfiniBand development
From: Jason Gunthorpe <jgg@nvidia.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: Leon Romanovsky <leon@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-rdma@vger.kernel.org, llvm@lists.linux.dev,
	Michael Guralnik <michaelgur@mellanox.com>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Will Deacon <will@kernel.org>
Subject: Re: [PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64
Date: Fri, 24 Nov 2023 08:23:52 -0400
Message-ID: <20231124122352.GB436702@nvidia.com>
In-Reply-To: <ZWB373y5XuZDultf@FVFF77S0Q05N>

On Fri, Nov 24, 2023 at 10:16:15AM +0000, Mark Rutland wrote:
> On Thu, Nov 23, 2023 at 09:04:31PM +0200, Leon Romanovsky wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > 
> > The kernel supports write combining IO memory which is commonly used to
> > generate 64 byte TLPs in a PCIe environment. On many CPUs this mechanism
> > is pretty tolerant and a simple C loop will suffice to generate a 64 byte
> > TLP.
> > 
> > However modern ARM64 CPUs are quite sensitive and a compiler generated
> > loop is not enough to reliably generate a 64 byte TLP. Especially given
> > the ARM64 issue that writel() does not codegen anything other than "[xN]"
> > as the address calculation.
> > 
> > These newer CPUs require an orderly consecutive block of stores to work
> > reliably. This is best done with four STP integer instructions (perhaps
> > ST64B in future), or a single ST4 vector instruction.
> > 
> > Provide a new generic function memcpy_toio_64() which should reliably
> > generate the needed instructions for the architecture, assuming address
> > alignment. As the usual need for this operation is performance sensitive a
> > fast inline implementation is preferred.
> 
> There is *no* architectural sequence that is guaranteed to reliably generate a
> 64-byte TLP, and this sequence won't guarantee that (e.g. even if the CPU
> *always* merged adjacent stores, we can take an interrupt mid-sequence that
> would prevent that).

Write combining (WC) is not guaranteed on any arch; that is well known.

The HW has means to handle fragmented TLPs; it just hurts performance
when it happens. "Reliable" here means we'd like to see something like
a > 90% chance of the large TLP instead of the < 1% chance with the C
loop.

Future ARM CPUs have the ST64B instruction, which does provide the
architectural guarantee, and x86 has a similar guaranteed instruction
now too (MOVDIR64B).
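
To make the contrast concrete, here is a minimal userspace sketch of the
two approaches discussed above: a plain C loop, and one orderly block of
four STP stores. It is not the code from the patch. The names
copy64_loop() and copy64_block() are hypothetical, the destination is
assumed to be 64-byte aligned, and the arm64 branch is only one way such
a sequence could be written with GCC/Clang inline assembly. On other
architectures the sketch falls back to the loop.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Baseline: a plain C loop of eight 64-bit stores. The volatile
 * destination keeps the stores separate and in order, but nothing
 * forces the compiler to emit them as adjacent instructions.
 */
static inline void copy64_loop(volatile uint64_t *dst, const uint64_t *src)
{
	for (int i = 0; i < 8; i++)
		dst[i] = src[i];
}

#if defined(__aarch64__) && defined(__GNUC__)
/*
 * Assumption: one consecutive block of four STP instructions that
 * together cover the 64-byte destination. This raises the chance of
 * a single 64-byte write; it does not guarantee one.
 */
static inline void copy64_block(volatile uint64_t *dst, const uint64_t *src)
{
	assert(((uintptr_t)dst & 63) == 0);	/* 64-byte aligned */
	__asm__ volatile(
		"stp %x0, %x1, [%8]\n\t"
		"stp %x2, %x3, [%8, #16]\n\t"
		"stp %x4, %x5, [%8, #32]\n\t"
		"stp %x6, %x7, [%8, #48]"
		:
		: "r"(src[0]), "r"(src[1]), "r"(src[2]), "r"(src[3]),
		  "r"(src[4]), "r"(src[5]), "r"(src[6]), "r"(src[7]),
		  "r"(dst)
		: "memory");
}
#else
/* Fallback for other architectures: reuse the plain loop. */
static inline void copy64_block(volatile uint64_t *dst, const uint64_t *src)
{
	assert(((uintptr_t)dst & 63) == 0);	/* 64-byte aligned */
	copy64_loop(dst, src);
}
#endif
```

Run against ordinary memory, this only checks that both variants copy
the same 64 bytes. Whether the stores merge into one TLP can only be
observed on a write-combining device mapping.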

> What's the actual requirement here? Is this just for performance?

Yes, just performance.

Jason
