linux-arch.vger.kernel.org archive mirror
From: Jason Gunthorpe <jgg@nvidia.com>
To: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Justin Stitt <justinstitt@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Leon Romanovsky <leon@kernel.org>,
	linux-rdma@vger.kernel.org, linux-s390@vger.kernel.org,
	llvm@lists.linux.dev, Ingo Molnar <mingo@redhat.com>,
	Bill Wendling <morbo@google.com>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	netdev@vger.kernel.org, Paolo Abeni <pabeni@redhat.com>,
	Salil Mehta <salil.mehta@huawei.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	x86@kernel.org, Yisen Zhuang <yisen.zhuang@huawei.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Leon Romanovsky <leonro@mellanox.com>,
	linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Mark Rutland <mark.rutland@arm.com>,
	Michael Guralnik <michaelgur@mellanox.com>,
	patches@lists.linux.dev, Niklas Schnelle <schnelle@linux.ibm.com>,
	Jijie Shao <shaojijie@huawei.com>
Subject: Re: [PATCH v3 6/6] IB/mlx5: Use __iowrite64_copy() for write combining stores
Date: Tue, 15 Jul 2025 08:52:00 -0300
Message-ID: <20250715115200.GJ2067380@nvidia.com>
In-Reply-To: <aHYqPRqgcl5DQOpq@willie-the-truck>

On Tue, Jul 15, 2025 at 11:15:25AM +0100, Will Deacon wrote:
> > Since STP was rejected already we've only tested the Neon version. It
> > does make a huge improvement, but it still somehow occasionally fails
> > to combine. The CPU is really bad at this :(
> 
> I think the thread was from last year so I've forgotten most of the
> details, but wasn't STP rejected because it wasn't virtualisable? 

Yes, that was the claim.

> In which case, doesn't NEON suffer from exactly the same (or possibly
> worse) problem?

In general, yes; for mlx5 specifically, no.

mlx5 (and other RDMA devices) have long used Neon for MMIO in
userspace, so any VMM assigning mlx5 devices simply must make this
work - it is already not optional. So we know that all VMs out there
with mlx5 support Neon for mlx5, and it is safe for mlx5 to use.

Typically this is trivially done in a VMM by never emulating mlx5's
MMIO space. If the VMM takes a fault on an MMIO page it fixes the fault
and restarts the Neon instruction.

The general concern was that there could be other devices in a VM
that are fully emulated, and using these challenging instructions
would break the simple emulation. This is why the general purpose
__iowrite64_copy() didn't use STP.
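
As a rough illustration of the general-purpose path: a copy helper that
sticks to plain 64-bit stores stays simple for a VMM to emulate, but it
leaves the combining entirely to the CPU. The sketch below is a userspace
stand-in, not the kernel or arm64 code. The name sketch_iowrite64_copy()
and the volatile pointer used in place of an __iomem mapping are
assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-in for a generic 64-bit MMIO copy (hypothetical name).
 * Issues one plain 64-bit store per element, in ascending address order.
 * 'volatile' keeps the compiler from eliding or merging the stores; it
 * says nothing about whether the hardware will write-combine them.
 */
static void sketch_iowrite64_copy(volatile uint64_t *to,
				  const uint64_t *from, size_t count)
{
	const uint64_t *end = from + count;

	while (from < end)
		*to++ = *from++;
}
```

With count = 8 this writes one 64-byte block as eight separate stores,
which is the case where the CPU has to merge them by itself.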

> Also, have you managed to investigate why the CPU tends not to get this
> right? 

I have asked, and our CPU architects have said it is too complex to
analyze, but they admit it doesn't work entirely well :(

The belief is that some micro-architectural condition is breaking it, as
we see even Neon instructions failing during every test.

They say it is fully fixed with ST64B in the future.

> Do we e.g. end up taking interrupts/exceptions while the self
> test is running or something like that?

I doubt it, the test is running in kernel mode during boot for
hundreds of iterations. An interrupt on every iteration is not
likely. Any single successful combine is a pass for the test.

Even an interrupt shouldn't disrupt a single-instruction Neon store,
yet we can still measure a low rate of Neon failures.
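
To make the pass rule concrete, here is a minimal sketch of "any single
successful combine is a pass". It models only that rule, not the driver's
actual test code. The callback type, wc_selftest_passes(), and the demo
probe are all hypothetical names invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: returns true if one attempt was seen to combine. */
typedef bool (*wc_attempt_fn)(void *ctx);

/*
 * Run up to 'iterations' attempts and report a pass as soon as any
 * single attempt combines. Fails only if every attempt fails.
 */
static bool wc_selftest_passes(wc_attempt_fn attempt, void *ctx,
			       size_t iterations)
{
	for (size_t i = 0; i < iterations; i++) {
		if (attempt(ctx))
			return true;
	}
	return false;
}

/* Demo probe for illustration: succeeds only on call number 'succeed_on'. */
struct nth_probe {
	size_t calls;
	size_t succeed_on;
};

static bool nth_probe_attempt(void *ctx)
{
	struct nth_probe *p = ctx;

	return ++p->calls == p->succeed_on;
}
```

Under this rule, an interrupt would have to land on every one of the
iterations to turn a working CPU into a test failure.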

> Sorry for the wall of questions!

No worries! It's weird and definitely complicated.

Jason

Thread overview: 17+ messages
2024-04-11 16:46 [PATCH v3 0/6] Fix mlx5 write combining support on new ARM64 cores Jason Gunthorpe
2024-04-11 16:46 ` [PATCH v3 1/6] x86: Stop using weak symbols for __iowrite32_copy() Jason Gunthorpe
2024-04-11 20:24   ` Arnd Bergmann
2024-04-11 16:46 ` [PATCH v3 2/6] s390: Implement __iowrite32_copy() Jason Gunthorpe
2024-04-11 16:46 ` [PATCH v3 3/6] s390: Stop using weak symbols for __iowrite64_copy() Jason Gunthorpe
2024-04-11 20:23   ` Arnd Bergmann
2024-04-11 16:46 ` [PATCH v3 4/6] arm64/io: Provide a WC friendly __iowriteXX_copy() Jason Gunthorpe
2024-04-11 16:46 ` [PATCH v3 5/6] net: hns3: Remove io_stop_wc() calls after __iowrite64_copy() Jason Gunthorpe
2024-04-11 16:46 ` [PATCH v3 6/6] IB/mlx5: Use __iowrite64_copy() for write combining stores Jason Gunthorpe
2024-04-16  8:29   ` Leon Romanovsky
2025-07-14 21:55   ` Jason Gunthorpe
2025-07-15  5:57     ` Leon Romanovsky
2025-07-15 10:15     ` Will Deacon
2025-07-15 11:52       ` Jason Gunthorpe [this message]
2025-07-18 18:10         ` Catalin Marinas
2025-07-18 20:00           ` Jason Gunthorpe
2024-04-23  0:18 ` [PATCH v3 0/6] Fix mlx5 write combining support on new ARM64 cores Jason Gunthorpe
