ARM PCI/MSI KVM passthrough with GICv2M

From: eric.auger@linaro.org (Eric Auger)
To: linux-arm-kernel@lists.infradead.org
Subject: ARM PCI/MSI KVM passthrough with GICv2M
Date: Mon, 8 Feb 2016 14:27:41 +0100
Message-ID: <56B897CD.1000402@linaro.org>
In-Reply-To: <20160208094826.GA620@cbox>

Hi Alex, Christoffer,
On 02/08/2016 10:48 AM, Christoffer Dall wrote:
> On Fri, Feb 05, 2016 at 11:17:00AM -0700, Alex Williamson wrote:
>> On Fri, 5 Feb 2016 18:32:07 +0100
>> Eric Auger <eric.auger@linaro.org> wrote:
>>
>>> Hi Alex,
>>>
>>> I tried to sketch a proposal for guaranteeing the IRQ integrity when
>>> doing ARM PCI/MSI passthrough with ARM GICv2M msi-controller. This is
>>> based on extended VFIO group viability control, as detailed below.
>>>
>>> As opposed to ARM GICv3 ITS, this MSI controller does *not* support IRQ
>>> remapping. It can expose 1 or more 4kB MSI frames. Each frame contains a
>>> single register where the msi data is written.
>>>
>>> I would be grateful to you if you could tell me whether it makes any sense.
>>>
>>> Thanks in advance
>>>
>>> Best Regards
>>>
>>> Eric
>>>
>>>
>>> 1) GICv2m with a single 4kB frame
>>>    all devices having this msi-controller as msi-parent share this
>>>    single MSI frame. Those devices can work on behalf of the host
>>>    or work on behalf of 1 or more guests (KVM assigned devices). We
>>>    must make sure either the host only or 1 single VM can access the
>>>    single frame to guarantee interrupt integrity: a device assigned
>>>    to 1 VM should not be able to trigger MSI targeted to the host
>>>    or another VM.
>>>
>>>    I would propose to extend the VFIO notion of group viability.
>>>    Currently a VFIO group is viable if:
>>>    all devices belonging to the same group are bound to a VFIO driver
>>>    or unbound.
>>>
>>>    Let's imagine we extend the viability check as follows:
>>>
>>>    0) keep the current viable check: all the devices belonging to
>>>       the group must be vfio bound or unbound.
>>>    1) retrieve the MSI parent of the device and list all the
>>>       other devices using that MSI controller as MSI-parent (does not
>>>       look straightforward):
>>>    2) they must be VFIO driver bound or unbound as well (meaning
>>>       they are not used by the host). If not, reject device attachment
>>>    - in case they are VFIO bound (a VFIO group is set):
>>>      x if all VFIO containers are the same as that of the device
>>>        we try to attach, that's OK. This means the other devices
>>>        use different IOMMU mappings, eventually will target the
>>>        MSI frame but they all work for the same user space client/VM.
>>>      x if 1 or more devices have a different container than the device
>>>        under attachment:
>>>        It works on behalf of a different user space client/VM,
>>>        we can't attach the new device. I think there is a case however
>>>        where several containers can be opened by a single QEMU.
>>>
>>> Of course the dynamic aspects, ie a new device showing up or an unbind
>>> event bring significant complexity.
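
As a rough illustration of the extended viability check listed above (a sketch only, not kernel code): the `Device` record, the `can_attach` helper and the integer container ids are all hypothetical names introduced here, and the dynamic cases (hotplug, unbind) are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Device:
    """Hypothetical model of a device for the purpose of this sketch."""
    name: str
    msi_parent: str           # identifier of the MSI controller/frame used
    driver: Optional[str]     # None = unbound, "vfio" = VFIO bound, else host driver
    container: Optional[int]  # VFIO container id if already attached, else None


def can_attach(candidate: Device, group: List[Device],
               all_devices: List[Device], target_container: int) -> bool:
    # Step 0: current rule, every device of the group is VFIO bound or unbound.
    if any(d.driver not in (None, "vfio") for d in group):
        return False
    # Step 1: list the other devices sharing the candidate's MSI parent.
    sharers = [d for d in all_devices
               if d.msi_parent == candidate.msi_parent
               and d.name != candidate.name]
    for d in sharers:
        # Step 2: a sharer driven by the host makes the attachment unsafe.
        if d.driver not in (None, "vfio"):
            return False
        # A sharer already attached to another container works for another
        # user space client/VM, so the attachment is rejected.
        if d.container is not None and d.container != target_container:
            return False
    return True
```

The sketch treats "same container" as "same user space client"; the case where a single QEMU opens several containers would need an extra notion of ownership that is not modelled here.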
>>>
>>> 2) GICv2M with multiple 4kB frames
>>>    Each msi-frame is enumerated as msi-controller. The device tree
>>>    statically defines which device is attached to each msi frame.
>>>    In case devices are assigned we cannot change this attachment
>>>    anyway since there might be physical constraints behind.
>>>    So devices likely to be assigned to guests should be linked to a
>>>    different MSI frame than devices that are not.
>>>
>>>    I think extended viability concept can be used as well.
>>>
>>>    This model still is not ideal: in case we have a SR-IOV device
>>>    plugged onto a host bridge attached to a single MSI parent you won't
>>>    be able anyway to have 1 Virtual Function working for host and 1 VF
>>>    working for a guest. Only Interrupt translation (ITS) will bring that
>>>    feature.
>>>
>>> 3) GICv3 ITS
>>>    This one supports interrupt translation service ~ Intel
>>>    IRQ remapping.
>>>    This means a single frame can be used by all devices. A deviceID is
>>>    used exclusively by the host or a guest. I assume the ITS driver
>>>    allocates/populates deviceid interrupt translation table featuring
>>>    separate LPI spaces ie by construction different ITT cannot feature
>>>    same LPIs. So no need to do the extended viability test.
>>>
>>>    The MSI controller should have a property telling whether
>>>    it supports interrupt translation. This kind of property currently
>>>    exists on IOMMU side for INTEL remapping.
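
To make the difference between cases 1) and 3) concrete, here is a toy model (an illustration under simplifying assumptions, not a description of the drivers): with a GICv2m frame the written data selects the SPI directly and the writer is not identified, whereas the ITS looks up a per-deviceID translation table. The class and method names are hypothetical.

```python
class GicV2mFrame:
    """Toy MSI frame: the data written selects the SPI, whoever writes it."""

    def __init__(self, spi_base: int, nr_spis: int):
        self.spis = range(spi_base, spi_base + nr_spis)

    def msi_write(self, device_id: int, data: int):
        # device_id is deliberately ignored: the frame cannot tell writers apart.
        return data if data in self.spis else None


class Its:
    """Toy ITS: each deviceID has its own table mapping event ids to LPIs."""

    def __init__(self):
        self.itt = {}  # device_id -> {event_id: lpi}

    def map_event(self, device_id: int, event_id: int, lpi: int) -> None:
        self.itt.setdefault(device_id, {})[event_id] = lpi

    def msi_write(self, device_id: int, data: int):
        # Only events mapped for this deviceID can be translated to an LPI.
        return self.itt.get(device_id, {}).get(data)


HOST_DEV, GUEST_DEV = 1, 2

frame = GicV2mFrame(spi_base=64, nr_spis=32)
# An assigned device can raise an SPI the host uses, here SPI 64.
spoofed_spi = frame.msi_write(GUEST_DEV, 64)

its = Its()
its.map_event(HOST_DEV, 0, 8192)
its.map_event(GUEST_DEV, 0, 8193)
# The assigned device only reaches the LPIs in its own table.
guest_lpi = its.msi_write(GUEST_DEV, 0)
unmapped = its.msi_write(GUEST_DEV, 5)
```

In this model the guest device can inject the host's SPI through the shared frame, while through the ITS it can only ever produce the LPIs that were mapped for its own deviceID.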
>>>
>>
>> Hi Eric,
>>
>> Would anyone be terribly upset if we simply assume the worst case
>> scenario on GICv2m/M, have the IOMMU not claim IOMMU_CAP_INTR_REMAP, and
>> require the user to opt-in via the allow_unsafe_interrupts on the
>> vfio_iommu_type1 module?  That would make it very compatible with what
>> we already do on x86, where it really is all or nothing.  
> 
> meaning either you allow unsafe multiplexing with passthrough in every
> flavor (unsafely) or you don't allow it at all?
That's my understanding. If the iommu does not expose
IOMMU_CAP_INTR_REMAP, the end-user must explicitly turn
allow_unsafe_interrupts on. On ARM we will have to handle the fact that
the interrupt translation is handled on the interrupt controller side and
not on the iommu side, though.
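
A minimal sketch of that decision, assuming the ARM case is folded in by also asking the MSI controller whether it translates interrupts: `may_attach` and `msi_controller_translates` are hypothetical names, only IOMMU_CAP_INTR_REMAP and allow_unsafe_interrupts exist today.

```python
def may_attach(iommu_intr_remap: bool,
               msi_controller_translates: bool,
               allow_unsafe_interrupts: bool) -> bool:
    """Return True if a group may be attached for device assignment.

    iommu_intr_remap          -- the IOMMU exposes IOMMU_CAP_INTR_REMAP
    msi_controller_translates -- hypothetical: the MSI controller (e.g. an
                                 ITS) performs interrupt translation itself
    allow_unsafe_interrupts   -- the user opted in via the module parameter
    """
    isolated = iommu_intr_remap or msi_controller_translates
    # All or nothing: without isolation the user must opt in explicitly.
    return isolated or allow_unsafe_interrupts


# GICv2m: no translation anywhere, so assignment requires the opt-in.
gicv2m_default = may_attach(False, False, False)
gicv2m_opt_in = may_attach(False, False, True)
# GICv3 ITS: translation is done on the interrupt controller side.
its_default = may_attach(False, True, False)
```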
>
> I didn't know such an option existed, but it seems to me that this fits
> the bill exactly.
Well, I think the support of multiple GICv2m MSI frames was devised to
allow safe interrupts, but extending the VFIO viability notion as
described above seems like a huge amount of work for little benefit, since
I am afraid we don't have much HW featuring multiple frames. So I think
it is a good compromise to have a minimal integration with GICv2m and
the full feature set with the best-fitted HW, i.e. GICv3 ITS.
> 
> 
>> My assumption
>> is that GICv2 would be phased out in favor of GICv3, so there's always
>> a hardware upgrade path to having more complete isolation, but the
>> return on investment for figuring out whether a given device really has
>> this sort of isolation seems pretty low.  Often users already have some
>> degree of trust in the VMs they use for device assignment anyway.  An
>> especially prudent user can still look at the hardware specs for their
>> specific system to understand whether any devices are fully isolated
>> and only make use of those for device assignment.  Does that seem like
>> a reasonable alternative?
>>
> 
> It sounds good to me, that would allow us to release a GICv2m-based
> solution for MSI passthrough on currently available hardware like the
> Seattle.

Sounds good to me too. I am going to respin the kernel series according
to this discussion and previous comments.

Thanks for your comments!

Best Regards

Eric
> 
> Thanks,
> -Christoffer
>

