Linux RDMA and InfiniBand development
From: Jason Gunthorpe <jgg@nvidia.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Niklas Schnelle <schnelle@linux.ibm.com>,
	Leon Romanovsky <leon@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-rdma@vger.kernel.org, llvm@lists.linux.dev,
	Michael Guralnik <michaelgur@mellanox.com>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Will Deacon <will@kernel.org>
Subject: Re: [PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64
Date: Wed, 24 Jan 2024 21:29:24 -0400	[thread overview]
Message-ID: <20240125012924.GL1455070@nvidia.com> (raw)
In-Reply-To: <ZbFO6ZXq99AWerlQ@arm.com>

On Wed, Jan 24, 2024 at 05:54:49PM +0000, Catalin Marinas wrote:
> On Wed, Jan 24, 2024 at 11:52:25AM -0400, Jason Gunthorpe wrote:
> > On Wed, Jan 24, 2024 at 01:32:22PM +0000, Marc Zyngier wrote:
> > > What I'm saying is that there are ways to make it better without
> > > breaking your particular toy workload which, as important as it may be
> > > to *you*, doesn't cover everybody's use case.
> > 
> > Please, do we need the "toy" stuff? The industry is spending 10's of
> > billions of dollars right now to run "my workload". Currently not
> > widely on ARM servers, but we are all hoping ARM can succeed here,
> > right?
> > 
> > I still don't know what you mean by "better". There are several issues
> > now
> > 
> > 1) This series, where WC doesn't trigger on new cores. Maybe 8x STR
> >    will fix it, but it is no better performance-wise than 4x STP.
> 
> It would be good to know. If the performance difference is significant,
> we can revisit. I'm not keen on using alternatives here without backing
> it up by numbers (do we even have a way to detect whether Linux is
> running natively or not? we may have to invent something).

I don't have a setup to measure performance, and mlx5 is not using this
in a performance path. The other drivers in the tree are, though, and I
feel bad about hobbling them.

> > 2) Userspace does ST4 to MMIO memory, and the VMM can't explode
> >    because of this. Replacing the ST4 with 8x STR is NOT better,
> >    that would be a big performance downside, especially for the
> >    quirky hi-silicon hardware.
> 
> I was hoping KVM injects an error into the guest rather than killing it
> but at a quick look I couldn't find it. The kvm_handle_guest_abort() ->
> io_mem_abort() ends up returning -ENOSYS while handle_trap_exceptions()
> only understands handled or not (like 1 or 0). Well, maybe I didn't look
> deep enough.

It looks to me like qemu turns on the KVM_CAP_ARM_NISV_TO_USER and
then when it gets a NISV it always converts it to a data abort to the
guest. See kvm_arm_handle_dabt_nisv() in qemu. So it is just a
correctness issue, not a 'VM userspace can crash the VMM' security
problem.

The reason we've never seen this fault in any of our testing is that
the whole system is designed to have qemu back any vMMIO space that is
under hot-path use with a VFIO memslot, ie it never drops the memslot
and forces emulation. (KVM has no issue handling an S2 abort if a
memslot is present, obviously.)

VFIO IO emulation exists to cover corner cases and establish technical
correctness on a slow path. It is not a fast path; avoid it if you want
any sort of performance.

Thus, IMHO, doing IO emulation for VFIO that doesn't support all the
instructions that actual existing SW uses to do IO is hard to justify.
We are already on a slow path that only exists for technical
correctness, so it should be complete. It is complete on x86 because
x86 KVM does SW instruction decode and emulation. ARM could too, but
doesn't.

To put it in a practical example: I predict that if someone steps
outside our "engineered" box and runs a 64k-page-size hypervisor kernel
with a mlx5 device that was not engineered for a 64k page size, they
will get an MMIO BAR layout where the 64k page covering the MSI items
overlaps with hot-path addresses. The existing user space stack could
issue ST4s to hot-path addresses within that emulated 64k of vMMIO and
explode. 4k-page-size hypervisors avoid this because the typical mlx5
device has a BAR layout designed with a 4k granule in mind.

Jason


Thread overview: 68+ messages
2023-11-23 19:04 [PATCH rdma-next 0/2] Add and use memcpy_toio_64() Leon Romanovsky
2023-11-23 19:04 ` [PATCH rdma-next 1/2] arm64/io: add memcpy_toio_64 Leon Romanovsky
2023-11-24 10:16   ` Mark Rutland
2023-11-24 12:23     ` Jason Gunthorpe
2023-11-27 12:42       ` Catalin Marinas
2023-11-27 13:45         ` Jason Gunthorpe
2023-12-04 17:31           ` Catalin Marinas
2023-12-04 18:23             ` Jason Gunthorpe
2023-12-05 17:21               ` Catalin Marinas
2023-12-05 17:51                 ` Jason Gunthorpe
2023-12-05 19:34                   ` Catalin Marinas
2023-12-05 19:51                     ` Jason Gunthorpe
2023-12-06 11:09                       ` Catalin Marinas
2023-12-06 12:59                         ` Jason Gunthorpe
2024-01-16 18:51                           ` Jason Gunthorpe
2024-01-17 12:30                             ` Mark Rutland
2024-01-17 12:36                               ` Jason Gunthorpe
2024-01-17 12:41                                 ` Jason Gunthorpe
2024-01-17 13:29                                 ` Mark Rutland
2024-01-23 20:38                                   ` Catalin Marinas
2024-01-24  1:27                                     ` Jason Gunthorpe
2024-01-24  8:26                                       ` Marc Zyngier
2024-01-24 13:06                                         ` Jason Gunthorpe
2024-01-24 13:32                                           ` Marc Zyngier
2024-01-24 15:52                                             ` Jason Gunthorpe
2024-01-24 17:54                                               ` Catalin Marinas
2024-01-25  1:29                                                 ` Jason Gunthorpe [this message]
2024-01-26 16:15                                                   ` Catalin Marinas
2024-01-26 17:09                                                     ` Jason Gunthorpe
2024-01-24 11:38                                     ` Mark Rutland
2024-01-24 12:40                                       ` Catalin Marinas
2024-01-24 13:27                                         ` Jason Gunthorpe
2024-01-24 17:22                                           ` Catalin Marinas
2024-01-24 19:26                                             ` Jason Gunthorpe
2024-01-25 17:43                                               ` Jason Gunthorpe
2024-01-26 14:56                                                 ` Catalin Marinas
2024-01-26 15:24                                                   ` Jason Gunthorpe
2024-01-17 14:07                               ` Mark Rutland
2024-01-17 15:28                                 ` Jason Gunthorpe
2024-01-17 16:05                                   ` Will Deacon
2024-01-18 16:18                                     ` Jason Gunthorpe
2024-01-24 11:31                                       ` Mark Rutland
2023-11-24 12:58   ` Robin Murphy
2023-11-24 13:45     ` Jason Gunthorpe
2023-11-24 15:32       ` Robin Murphy
2023-11-24 14:10   ` Niklas Schnelle
2023-11-24 14:20     ` Jason Gunthorpe
2023-11-24 14:48       ` Niklas Schnelle
2023-11-24 14:53         ` Niklas Schnelle
2023-11-24 14:55         ` Jason Gunthorpe
2023-11-24 15:59           ` Niklas Schnelle
2023-11-24 16:06             ` Jason Gunthorpe
2023-11-27 17:43               ` Niklas Schnelle
2023-11-27 17:51                 ` Jason Gunthorpe
2023-11-28 16:28                   ` Niklas Schnelle
2024-01-16 17:33                     ` Jason Gunthorpe
2024-01-17 13:20                       ` Niklas Schnelle
2024-01-17 13:26                         ` Jason Gunthorpe
2024-01-17 17:55                           ` Jason Gunthorpe
2024-01-18 13:46                             ` Niklas Schnelle
2024-01-18 14:00                               ` Jason Gunthorpe
2024-01-18 15:59                                 ` Niklas Schnelle
2024-01-18 16:21                                   ` Jason Gunthorpe
2024-01-18 16:25                                     ` Niklas Schnelle
2024-01-19 11:52                                       ` Niklas Schnelle
2024-02-16 12:09                                   ` Niklas Schnelle
2024-02-16 12:39                                     ` Jason Gunthorpe
2023-11-23 19:04 ` [PATCH rdma-next 2/2] IB/mlx5: Use memcpy_toio_64() for write combining stores Leon Romanovsky
