Linux IOMMU Development
From: Robin Murphy <robin.murphy@arm.com>
To: Nicolin Chen <nicolinc@nvidia.com>, Jason Gunthorpe <jgg@nvidia.com>
Cc: kevin.tian@intel.com, yi.l.liu@intel.com, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, shameerali.kolothum.thodi@huawei.com,
	jean-philippe@linaro.org, iommu@lists.linux.dev
Subject: Re: Cache Invalidation Solution for Nested IOMMU
Date: Mon, 3 Apr 2023 20:15:03 +0100	[thread overview]
Message-ID: <ea6fba01-b483-94e6-8c84-d096dcb64b5a@arm.com> (raw)
In-Reply-To: <ZCrn8PPPyEJ/VqFU@Asurada-Nvidia>

On 2023-04-03 15:51, Nicolin Chen wrote:
> On Mon, Apr 03, 2023 at 11:08:23AM -0300, Jason Gunthorpe wrote:
>> On Sun, Apr 02, 2023 at 05:33:35PM -0700, Nicolin Chen wrote:
>>> The first version simply forwards each command individually. This
>>> can save a few CPU cycles by avoiding the packing/unpacking of the
>>> invalidation fields of the commands through a data structure, vs.
>>> the structure in v1[2].
>>
>> The kernel must validate the SID for the ATS invalidations, we can't
>> just blindly pass it through.
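To make that constraint concrete, here is a rough, userspace-testable sketch of the SID check involved. The command layout follows the SMMUv3 spec (a command is two 64-bit words; for CMD_ATC_INV the opcode is in bits [7:0] and the StreamID in bits [63:32] of the first word), but the helper names and the "allowed" list are purely illustrative, standing in for whatever set/unset_rid_user-style tracking the driver would keep:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative SID validation for a guest-supplied CMD_ATC_INV.
 * Per SMMUv3: opcode in bits [7:0], StreamID in bits [63:32] of
 * the first command word. The allowed[] list stands in for the
 * driver's per-user RID bookkeeping. */
#define CMDQ_OP_ATC_INV   0x40
#define CMDQ_0_OP_MASK    0xffULL
#define CMDQ_ATC_0_SID(x) ((uint32_t)((x) >> 32))

static bool sid_allowed(uint32_t sid, const uint32_t *allowed, int n)
{
	for (int i = 0; i < n; i++)
		if (allowed[i] == sid)
			return true;
	return false;
}

/* Returns 0 if the command may be forwarded, -1 to reject it. */
static int validate_user_cmd(const uint64_t cmd[2],
			     const uint32_t *allowed, int n)
{
	if ((cmd[0] & CMDQ_0_OP_MASK) != CMDQ_OP_ATC_INV)
		return 0; /* not an ATC invalidation; no SID check here */
	return sid_allowed(CMDQ_ATC_0_SID(cmd[0]), allowed, n) ? 0 : -1;
}
```

The point is that a "blind" passthrough would skip exactly this membership check, letting a guest invalidate ATCs for StreamIDs it does not own.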
> 
> Yes. I didn't go further with the first version, but left a
> comment in the handler: we'd need set/unset_rid_user to
> validate the SID field of INV_ATC commands, as we discussed.
> 
>> And this simple path needs an explanation of how errors are properly
>> handled, e.g. by making execution synchronous, or someone guaranteeing
>> that errors are impossible.
> 
> Yes. Both versions here execute in a synchronous fashion. The
> error code will be returned in the cache_invalidate_user data
> structure.
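As an illustration only (the field names here are made up and not the actual uAPI), a synchronous result-in-structure scheme could be as simple as the kernel writing the outcome back into the request it was handed, so userspace sees per-call errors without any separate event channel:

```c
#include <stdint.h>

/* Hypothetical shape of a synchronous invalidation request: the
 * handler writes the result into the same structure before the
 * call returns. Names are illustrative, not the iommufd uAPI. */
struct cache_invalidate_user {
	uint64_t cmd[2];   /* the guest command being forwarded */
	int32_t  error;    /* written back by the handler */
};

/* Stand-in for the ioctl handler: validates and "executes" the
 * command, recording success or failure synchronously. */
static int handle_invalidate(struct cache_invalidate_user *req)
{
	/* a zero opcode stands in for real validity checks */
	if ((req->cmd[0] & 0xff) == 0) {
		req->error = -22; /* -EINVAL */
		return -1;
	}
	req->error = 0;
	return 0;
}
```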
> 
>>> Then I added a new mmap interface to share kernel page(s) from the
>>> driver, allowing QEMU to write all TLBI commands as a single batch
>>> and to initiate the batch invalidation via another synchronous
>>> hypercall.
>>
>> I don't think a mmap is really needed for simple batching, just
>> passing a larger buffer to ioctl is probably good enough
> 
> It wouldn't be a must, but it would save a copy_from_user() at
> each hypercall. And it also eases the VCMDQ support a bit.
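For comparison, the "larger buffer" alternative could look roughly like this (layout and names are hypothetical): one ioctl carries a whole array of commands, so there is a single copy from userspace per batch rather than one trap per command, and the first failing index is reported back synchronously:

```c
#include <stdint.h>

/* Hypothetical batched-invalidation buffer: a count followed by a
 * flexible array of 16-byte SMMU-style commands. In the kernel this
 * whole thing would arrive via one copy_from_user() per ioctl. */
struct invalidate_batch {
	uint32_t count;       /* number of commands that follow */
	uint32_t error_index; /* written back: first failing command */
	uint64_t cmds[][2];   /* flexible array of commands */
};

/* Returns 0 on full success; -1 with error_index set otherwise.
 * A zero opcode stands in for whatever validation really applies. */
static int process_batch(struct invalidate_batch *b)
{
	for (uint32_t i = 0; i < b->count; i++) {
		if ((b->cmds[i][0] & 0xff) == 0) {
			b->error_index = i;
			return -1;
		}
	}
	return 0;
}
```

Stopping at the first failure keeps the error semantics identical to issuing the commands one by one, which is what makes the batch shape a drop-in replacement for the per-command path.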
> 
>> If a SW side is built it should mirror the HW vCMDQ path, not be
>> different.
> 
> The host kernel has the host queue, while the hypervisor fills
> in a guest TLBI queue. Switching between two queues at one SMMU
> CMDQ (HW) would require a very complicated locking mechanism,
> vs. simply inserting the batch into the existing host queue. And
> it probably wouldn't bring a big perf improvement either?
> 
> If the SMMU has ECMDQ, it could allocate a free CMDQ when one is
> available, calling arm_smmu_init_one_queue() and mmapping q->base;
> then it could execute the guest TLBI queue directly, passing that
> q pointer.
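As a toy model of the queue mechanics under discussion, here is a minimal power-of-two command ring with free-running producer/consumer indices. It is only loosely shaped like an SMMU (E)CMDQ; the real hardware queue adds a wrap bit, MMIO doorbells, and locking, all omitted here:

```c
#include <stdint.h>

/* Toy power-of-two command ring. The producer index is advanced by
 * whoever owns the queue page; the consumer index advances as
 * commands are executed. Indices are free-running and masked on
 * access, so full vs. empty is distinguished by their difference. */
#define Q_SHIFT 4
#define Q_ENTS  (1u << Q_SHIFT)

struct toy_cmdq {
	uint64_t ents[Q_ENTS][2]; /* 16-byte commands */
	uint32_t prod, cons;
};

static int q_full(const struct toy_cmdq *q)
{
	return q->prod - q->cons == Q_ENTS;
}

static int q_submit(struct toy_cmdq *q, const uint64_t cmd[2])
{
	if (q_full(q))
		return -1;
	q->ents[q->prod & (Q_ENTS - 1)][0] = cmd[0];
	q->ents[q->prod & (Q_ENTS - 1)][1] = cmd[1];
	q->prod++;
	return 0;
}

/* "Hardware" consuming one command; returns number consumed (0 or 1). */
static int q_consume(struct toy_cmdq *q, uint64_t out[2])
{
	if (q->cons == q->prod)
		return 0;
	out[0] = q->ents[q->cons & (Q_ENTS - 1)][0];
	out[1] = q->ents[q->cons & (Q_ENTS - 1)][1];
	q->cons++;
	return 1;
}
```

The mmap-the-queue-base idea amounts to letting another agent drive q_submit() directly on shared memory, which is why the locking question above becomes the hard part.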

FWIW I don't think that should be visible in a userspace interface. When 
the VMM is just requesting some invalidations in order to emulate some 
commands, it's up to the SMMU driver, or at best between the driver and 
IOMMUFD, to decide exactly how those requests get executed as 
physical commands - that should not make any difference to the requester 
other than how quickly the requests are processed.

AFAICS this interface can't look like the proper hardware vCMDQ path, 
because the whole point of that will be to configure it in advance, map 
the queue controls directly into the guest, and avoid trapping 
invalidations to the VMM at all. This invalidation request interface is 
a large part of precisely what that path is intended to bypass. I don't 
see much benefit in supporting an additional slightly-accelerated slow 
path where the host avoids a tiny bit of housekeeping by maintaining a 
real vCMDQ *on behalf of* the guest and forwarding trapped commands into 
it instead of just processing them normally as host commands. Or, 
conversely, emulating a vCMDQ *in* the host kernel, in a way which still 
requires traps to bounce through the VMM and back - that just seems 
objectively worse than keeping all the emulation together in one place 
(however, I would concur that a "vhost-style" emulation, using all the 
same interfaces for configuration and error/irq/etc. reporting as the 
real hardware would, might be viable if performance really demands it).

Thanks,
Robin.

Thread overview: 35+ messages
2023-04-03  0:33 Cache Invalidation Solution for Nested IOMMU Nicolin Chen
2023-04-03  7:26 ` Liu, Yi L
2023-04-03  8:39   ` Tian, Kevin
2023-04-03 15:24     ` Nicolin Chen
2023-04-04  2:42       ` Tian, Kevin
2023-04-04  3:12         ` Nicolin Chen
2023-04-03 12:23   ` Jason Gunthorpe
2023-04-03  8:00 ` Tian, Kevin
2023-04-03 14:29   ` Nicolin Chen
2023-04-04  2:15     ` Tian, Kevin
2023-04-04  2:47       ` Nicolin Chen
2023-04-03 14:08 ` Jason Gunthorpe
2023-04-03 14:51   ` Nicolin Chen
2023-04-03 19:15     ` Robin Murphy [this message]
2023-04-04  0:02       ` Nicolin Chen
2023-04-04 16:20         ` Jason Gunthorpe
2023-04-04 16:50           ` Shameerali Kolothum Thodi
2023-04-05 11:57             ` Jason Gunthorpe
2023-04-06  6:23             ` Zhangfei Gao
2023-04-06  6:39               ` Nicolin Chen
2023-04-06 11:40               ` Jason Gunthorpe
2023-04-10  1:08                 ` Nicolin Chen
2023-04-11  9:07                   ` Jean-Philippe Brucker
2023-04-11 11:57                     ` Jason Gunthorpe
2023-04-11 18:39                       ` Nicolin Chen
2023-04-11 18:41                         ` Jason Gunthorpe
2023-04-11 19:02                           ` Nicolin Chen
2023-04-11 18:43                     ` Nicolin Chen
2023-04-12  2:47                   ` Zhangfei Gao
2023-04-12  5:47                     ` Nicolin Chen
2023-05-03 15:14                     ` Shameerali Kolothum Thodi
2023-05-03 23:44                       ` Nicolin Chen
2023-04-05  5:45           ` Nicolin Chen
2023-04-05 11:37             ` Jason Gunthorpe
2023-04-05 15:34               ` Nicolin Chen
