From: David Laight <david.laight.linux@gmail.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: David Matlack <dmatlack@google.com>,
Alex Williamson <alex@shazbot.org>,
kvm@vger.kernel.org, Leon Romanovsky <leon@kernel.org>,
linux-kselftest@vger.kernel.org, linux-rdma@vger.kernel.org,
Mark Bloch <mbloch@nvidia.com>,
netdev@vger.kernel.org, Saeed Mahameed <saeedm@nvidia.com>,
Shuah Khan <shuah@kernel.org>, Tariq Toukan <tariqt@nvidia.com>,
patches@lists.linux.dev
Subject: Re: [PATCH v2 06/11] selftests: Fix arm64 IO barriers to match kernel
Date: Sat, 30 May 2026 10:28:24 +0100
Message-ID: <20260530102824.65ceb098@pumpkin>
In-Reply-To: <20260529224442.11d7320d@pumpkin>
On Fri, 29 May 2026 22:44:42 +0100
David Laight <david.laight.linux@gmail.com> wrote:
> On Fri, 29 May 2026 16:29:34 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > On Fri, May 29, 2026 at 05:55:16PM +0100, David Laight wrote:
...
> > I can't say, this is copied from the kernel and Will made it:
> >
> > arm64: io: Ensure calls to delay routines are ordered against prior readX()
> >
> > A relatively standard idiom for ensuring that a pair of MMIO writes to a
> > device arrive at that device with a specified minimum delay between them
> > is as follows:
> >
> >     writel_relaxed(42, dev_base + CTL1);
> >     readl(dev_base + CTL1);
> >     udelay(10);
> >     writel_relaxed(42, dev_base + CTL2);
> >
> > the intention being that the read-back from the device will push the
> > prior write to CTL1, and the udelay will hold up the write to CTL1 until
> > at least 10us have elapsed.
> >
> > Unfortunately, on arm64 where the underlying delay loop is implemented
> > as a read of the architected counter, the CPU does not guarantee
> > ordering from the readl() to the delay loop and therefore the delay loop
> > could in theory be speculated and not provide the desired interval
> > between the two writes.
> >
> > Fix this in a similar manner to PowerPC by introducing a dummy control
> > dependency on the output of readX() which, combined with the ISB in the
> > read of the architected counter, guarantees that a subsequent delay loop
> > can not be executed until the readX() has returned its result.
>
> Hmmm...
>
> Ok so there is some subtlety with the read of the counter that might
> make it all work.
>
> It is better to make the delay loop have a data dependency on the result
> of the readl().
> Something like:
>     u32 z = 0;
>     OPTIMIZER_HIDE_VAR(z);
>     writel_relaxed(42, dev_base + CTL1);
>     udelay(10 + (z & readl(dev_base + CTL1)));
>     writel_relaxed(42, dev_base + CTL2);
> That avoids the potentially mispredicted branch and only adds instructions
> when a delay follows.
> That sequence is safe on all CPUs and costs little on CPUs (like x86)
> where it (probably) isn't needed (unless, perhaps, the udelay scale is
> patched into the code so that there are no memory reads, just code).
>
> Probably best refactored as udelay_depends(10, readl(dev_base + CTL1)).
> Or maybe udelay_after().
Having slept on it, all the code can go inside udelay() itself.
You just need a read memory barrier, followed by a memory read (of anywhere
'hot'), and then a data dependency (as above) from that second read
into the delay loop.
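
A rough userspace sketch of that shape, not the kernel's udelay(): the
names HIDE_VAR, hot_word, delay_with_dependency(), busy_wait_us() and
udelay_sketch() are all made up for illustration, the acquire fence is
only a stand-in for a read barrier, and the busy-wait polls
clock_gettime() rather than a hardware counter. It shows the barrier,
the read of a 'hot' location and the data dependency into the delay
count; whether that ordering is sufficient on arm64 is a separate
question.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/*
 * Userspace stand-in for the kernel's OPTIMIZER_HIDE_VAR(): an empty asm
 * statement that stops the compiler from assuming anything about var.
 */
#define HIDE_VAR(var) __asm__ volatile("" : "+r"(var))

/* Hypothetical 'hot' word; any location likely to be in cache would do. */
static volatile uint32_t hot_word;

/*
 * Return usecs unchanged, but computed so that the result has a data
 * dependency on dep: z is always 0, yet the compiler cannot prove it.
 */
static inline uint32_t delay_with_dependency(uint32_t usecs, uint32_t dep)
{
	uint32_t z = 0;

	HIDE_VAR(z);
	return usecs + (z & dep);
}

/* Illustrative busy-wait; a real udelay() would poll a hardware counter. */
static void busy_wait_us(uint32_t usecs)
{
	struct timespec start, now;
	int64_t elapsed_ns;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		clock_gettime(CLOCK_MONOTONIC, &now);
		elapsed_ns = (int64_t)(now.tv_sec - start.tv_sec) * 1000000000LL +
			     (now.tv_nsec - start.tv_nsec);
	} while (elapsed_ns < (int64_t)usecs * 1000);
}

/* Sketch of a udelay() that carries the barrier and dependency itself. */
static void udelay_sketch(uint32_t usecs)
{
	uint32_t v;

	/* Stand-in for a read memory barrier (assumption, not rmb() itself). */
	__atomic_thread_fence(__ATOMIC_ACQUIRE);
	v = hot_word;	/* read of somewhere 'hot' */
	busy_wait_us(delay_with_dependency(usecs, v));
}
```

With this shape the caller's sequence stays as writel_relaxed(), readl(),
udelay(), writel_relaxed(), with no extra argument threaded through.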
-- David