public inbox for linux-kernel@vger.kernel.org
From: Alex Williamson <alex.williamson@redhat.com>
To: Eric Auger <eric.auger@redhat.com>
Cc: eric.auger.pro@gmail.com, joro@8bytes.org,
	iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	will.deacon@arm.com, robin.murphy@arm.com, dwmw2@infradead.org,
	baolu.lu@linux.intel.com, shameerali.kolothum.thodi@huawei.com,
	jean-philippe.brucker@arm.com
Subject: Re: [RFC 0/3] iommu: Reserved regions for IOVAs beyond dma_mask and iommu aperture
Date: Mon, 28 Sep 2020 16:42:24 -0600
Message-ID: <20200928164224.12350d84@w520.home>
In-Reply-To: <20200928195037.22654-1-eric.auger@redhat.com>

On Mon, 28 Sep 2020 21:50:34 +0200
Eric Auger <eric.auger@redhat.com> wrote:

> VFIO currently exposes the usable IOVA regions through the
> VFIO_IOMMU_GET_INFO ioctl / VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
> capability. However it fails to take into account the dma_mask
> of the devices within the container. The top limit currently is
> defined by the iommu aperture.

I think that dma_mask is traditionally a DMA API interface for a device
driver to indicate to the DMA layer which mappings are accessible to the
device.  On the other hand, vfio makes use of the IOMMU API where the
driver is in userspace.  That userspace driver has full control of the
IOVA range of the device, therefore dma_mask is mostly irrelevant to
vfio.  I think the issue you're trying to tackle is that the IORT code
is making use of the dma_mask to try to describe a DMA address
limitation imposed by the PCI root bus, living between the endpoint
device and the IOMMU.  Therefore, if the IORT code is exposing a
topology- or system-imposed device limitation, this seems much more akin
to something like an MSI reserved range, where it's not necessarily the
device or the IOMMU with the limitation, but something that sits
between them.

> So, for instance, if the IOMMU supports up to 48bits, it may give
> the impression the max IOVA is 48b while a device may have a
> dma_mask of 42b. So this API cannot really be used to compute
> the max usable IOVA.
> 
> This patch removes the IOVA region beyond the dma_mask's.

Rather, it adds a reserved region accounting for the range above the
device's dma_mask.  I don't think the IOMMU API should be consuming
dma_mask like this though.  For example, what happens in
pci_dma_configure() when there are no OF or ACPI DMA restrictions?  It
appears to me that the dma_mask from whatever previous driver had the
device carries over to the new driver.  That's generally OK for the DMA
API because a driver is required to set the device's DMA mask.  It
doesn't make sense, however, to blindly consume that dma_mask and export
it via an IOMMU API.  For example, I would expect to see different
results depending on whether a host driver has been bound to a device.
It seems the correct IOMMU API approach would be for the IORT code to
specifically register reserved ranges for the device.
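
To make the shape of such a reserved range concrete, here is a minimal
userspace sketch.  It assumes the firmware description yields an
address width in bits for the path between the device and the IOMMU.
The struct and function names are hypothetical and are not the kernel's
reserved-region API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inclusive address range; not a kernel structure. */
struct resv_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/*
 * Describe the range a device cannot reach when something upstream of
 * it only decodes addr_bits bits of address.  Returns false when there
 * is nothing to reserve (a full 64-bit width) or the width is invalid.
 */
bool resv_above_addr_limit(unsigned int addr_bits, struct resv_range *out)
{
	if (addr_bits == 0 || addr_bits >= 64)
		return false;

	out->start = 1ULL << addr_bits;	/* first unreachable address */
	out->end = UINT64_MAX;
	return true;
}
```

With a 42-bit limit this gives a range starting at 1 << 42, and the
result is the same whether or not a host driver has touched the
device's dma_mask.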

> As we start to expose this reserved region in the sysfs file
> /sys/kernel/iommu_groups/<n>/reserved_regions, we also need to
> handle the IOVA range beyond the IOMMU aperture to handle the case
> where the dma_mask would have a higher number of bits than the iommu
> max input address.

Why?  The IOMMU geometry already describes this, and vfio combines both
the IOMMU geometry and the device reserved regions when generating the
IOVA ranges.  Who is going to consume this information?  Additionally,
it appears that reserved regions will report different information
depending on whether a device is attached to a domain.
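
As a rough illustration of that combination, the sketch below subtracts
a sorted list of reserved regions from an aperture to get the usable
IOVA ranges.  It is a simplified userspace model, not the vfio type1
code.  The type and function names are hypothetical, and it assumes
inclusive ranges and a reserved list that is sorted and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive IOVA range; not a kernel or vfio structure. */
struct iova_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/*
 * Compute aperture minus reserved regions.  resv[] must be sorted by
 * start and non-overlapping.  At most cap ranges are written to out[];
 * the return value is the number of usable ranges found, which may
 * exceed cap.
 */
size_t usable_iova_ranges(struct iova_range aperture,
			  const struct iova_range *resv, size_t nresv,
			  struct iova_range *out, size_t cap)
{
	uint64_t cursor = aperture.start;	/* lowest IOVA not yet classified */
	size_t n = 0;
	int done = 0;

	for (size_t i = 0; i < nresv && !done; i++) {
		assert(i == 0 || resv[i].start > resv[i - 1].end);

		if (resv[i].end < cursor)
			continue;	/* entirely below the remaining window */
		if (resv[i].start > aperture.end)
			break;		/* entirely above the aperture */

		if (resv[i].start > cursor) {
			/* Usable gap between cursor and this reserved region. */
			if (n < cap)
				out[n] = (struct iova_range){ cursor, resv[i].start - 1 };
			n++;
		}

		if (resv[i].end >= aperture.end)
			done = 1;	/* reserved region covers the rest */
		else
			cursor = resv[i].end + 1;
	}

	if (!done) {
		/* Tail of the aperture above the last reserved region. */
		if (n < cap)
			out[n] = (struct iova_range){ cursor, aperture.end };
		n++;
	}
	return n;
}
```

For a 48-bit aperture with one small reserved window and a reserved
range covering everything from 1 << 42 upward, this yields two usable
ranges, the second ending at (1 << 42) - 1.  A consumer of the IOVA
range list then sees the lower limit without needing to know where it
came from.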

> This is a change to the ABI as this reserved region was not yet
> exposed in sysfs /sys/kernel/iommu_groups/<n>/reserved_regions or
> through the VFIO ioctl. At VFIO level we increment the version of
> the VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE capability to advertise
> that change.

Is this really an ABI change?  The original entry for reserved regions
includes:

  Not necessarily all reserved regions are listed. This is typically
  used to output direct-mapped, MSI, non mappable regions.

I imagine the intention here was non-mappable relative to the IOMMU,
but non-mappable to the device is essentially what we're including
here.

I'm also concerned about bumping the vfio interface version for the
IOVA range.  We're not changing the interface, we're modifying the
result, and even then only for a fraction of users.  How many users are
potentially broken by that change?  Are we going to bump the version
for everyone any time the result changes on any platform?  Thanks,

Alex


