From: Alex Williamson <alex.williamson@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Chen Fan <chen.fan.fnst@cn.fujitsu.com>,
izumi taku <izumi.taku@jp.fujitsu.com>,
Cao jin <caoj.fnst@cn.fujitsu.com>,
qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v16 00/14] vfio-pci: pass the aer error to guest
Date: Thu, 4 Feb 2016 15:15:39 -0500 (EST) [thread overview]
Message-ID: <1010754107.24917162.1454616939506.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <20160204200350-mutt-send-email-mst@redhat.com>
----- Original Message -----
> On Thu, Feb 04, 2016 at 10:46:52AM -0700, Alex Williamson wrote:
> > On Thu, 4 Feb 2016 13:21:57 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >
> > > On Thu, Feb 04, 2016 at 10:04:01AM +0800, Chen Fan wrote:
> > > >
> > > > On 02/03/2016 09:57 PM, Michael S. Tsirkin wrote:
> > > > >On Wed, Feb 03, 2016 at 04:54:01PM +0800, Chen Fan wrote:
> > > > >>On 01/17/2016 02:34 AM, Michael S. Tsirkin wrote:
> > > > >>>On Tue, Jan 12, 2016 at 10:43:01AM +0800, Cao jin wrote:
> > > > >>>>From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> > > > >>>>
> > > > >>>>For now, for vfio pci passthough devices when qemu receives
> > > > >>>>an error from host aer report, currentlly just terminate the
> > > > >>>>guest, but usually user want to know what error occurred but
> > > > >>>>stopping the guest, so this patches add aer capability support
> > > > >>>>for vfio device, and pass the error to guest, and have guest
> > > > >>>>driver to recover from the error.
> > > > >>>I would like to see a version of this patchset that doesn't
> > > > >>>depend on pci core changes.
> > > > >>>I think that if you make this simplifying assumption:
> > > > >>>
> > > > >>>- all devices on same bus in guest are on same bus in host
> > > > >>>
> > > > >>>then you can handle both reset and hotplug simply in function 0
> > > > >>>since it will belong to vfio.
> > > > >>>
> > > > >>>So we can have a version without pci core changes that simply
> > > > >>>assumes this, and things will just work.
> > > > >>>
> > > > >>>
> > > > >>>Now, if we wanted to enforce this limitation, I think the
> > > > >>>cleanest way would be to add a callback in struct PCIDevice:
> > > > >>>
> > > > >>> bool is_valid_function(PCIDevice *newfunction)
> > > > >>>
> > > > >>>and call it as each function is added.
> > > > >>>This way aer function can validate that each function
> > > > >>>added shares the same bus.
> > > > >>>And this way issues will be detected directly and not when
> > > > >>>function 0 is added.
> > > > >>>
> > > > >>>I would prefer this validation code to be a patch on top so we
> > > > >>>can merge the functionality directly and avoid blocking it while
> > > > >>>we figure out the best api to validate things.
> > > > >>>
> > > > >>>I don't see why making guest topology match host would
> > > > >>>ever be a problem, but if it's required to support
> > > > >>>configurations where these differ, I'd like to see
> > > > >>>an attempt to address that be split out, after aer
> > > > >>>is supported.
> > > > >>Hi Michael,
> > > > >>
> > > > >>Just think about this more, I think we also should check the vfio
> > > > >>devices whether on the same bus at the time of function 0 is
> > > > >>added. because we don't know the affected devices by a bus reset
> > > > >>have already all been assigned to VM.
> > > > >This is something vfio in kernel should check.
> > > > >You can't rely on qemu being well behaved, so don't
> > > > >even try to catch cases which would break host in userspace.
> > > > >
> > > > >qemu should only worry about not breaking guest.
> > > > >
> > > > >
> > > > >>for example, the multi-function's hotplug.
> > > > >>devices on same bus in host are added to VM one by one. when we
> > > > >>test one device, we haven't yet added the other devices.
> > > > >>so I think
> > > > >>the patch should like below. then we could add a
> > > > >>vfio_is_valid_function in vfio
> > > > >>to test each device whether the affected devices on the same bus.
> > > > >>
> > > > >>Thanks,
> > > > >>Chen
> > > > >>
> > > > >>diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > > >>index d940f79..7163b56 100644
> > > > >>--- a/hw/pci/pci.c
> > > > >>+++ b/hw/pci/pci.c
> > > > >>@@ -1836,6 +1836,38 @@ PCIDevice *pci_find_device(PCIBus *bus,
> > > > >>int bus_num, uint8_t devfn)
> > > > >> return bus->devices[devfn];
> > > > >> }
> > > > >>
> > > > >>+static int pci_bus_check_devices(PCIBus *bus)
> > > > >>+{
> > > > >>+ PCIDeviceClass *pc;
> > > > >>+ int i, ret = 0;
> > > > >>+
> > > > >>+ for (i = 0; i < ARRAY_SIZE(bus->devices); ++i) {
> > > > >>+ if (!bus->devices[i]) {
> > > > >>+ continue;
> > > > >>+ }
> > > > >>+
> > > > >>+ pc = PCI_DEVICE_GET_CLASS(bus->devices[i]);
> > > > >>+ if (!pc->is_valid_func) {
> > > > >>+ continue;
> > > > >>+ }
> > > > >>+
> > > > >>+ ret = pc->is_valid_func(bus->devices[i], bus);
> > > > >>+ if (!ret) {
> > > > >>+ return -1;
> > > > >>+ }
> > > > >>+ }
> > > > >>+ return 0;
> > > > >>+}
> > > > >>+
> > > > >>+static bool pci_is_valid_function(PCIDevice *pdev, PCIBus *bus)
> > > > >>+{
> > > > >>+ if (pdev->bus == bus) {
> > > > >>+ return true;
> > > > >>+ }
> > > > >>+
> > > > >>+ return false;
> > > > >>+}
> > > > >>+
> > > > >I don't really understand what is this one doing.
> > > > >Why do we need a default function?
> > > > if the vfio driver in kernel can handle the bus reset for any one
> > > > device in qemu without the affected devices assigned. I think
> > > > we don't need this default one.
> > > > BTW, IIRC at present the devices on the same bus in host can
> > > > be assigned to different VM, so if we want to support this kind of
> > > > bus reset for an independent device when enable aer, aren't we
> > > > limiting the case that others devices on the same bus must be
> > > > assigned to current VM?
> > > >
> > > > Thanks,
> > > > Chen
> > >
> > > I don't believe this works at the moment, and
> > > I'd expect kernel to prevent this,
> > > so we should not rely on userspace code for this.
> > > Alex, could you comment please?
> >
> > DMA isolation and bus isolation are separate things. So long as
> > devices on the same bus are DMA isolated then they can be assigned to
> > separate VMs or split between host and VM. However, certain features
> > like bus reset are not available to the user unless they "own" all of
> > the DMA isolated sets affected by a bus reset. The kernel doesn't care
> > how the user is using them, but they must prove they own them through
> > vfio group file descriptors.
>
> OK this makes sense.
> So I think the solution is for userspace to make sure bus reset
> is available before exposing aer to guest.
> For example, attempt a bus reset.
The API includes a mechanism for retrieving the affected devices without
making it necessary to actually perform a bus reset.
> > I thought in previous discussions we decided that unused devices made
> > the problem set more complicated for userspace so we simplified by
> > requiring them to be assigned. For instance imagine a two function
> > device, with DMA isolation between functions, where we only want one
> > function assigned to the VM. QEMU would need to learn to take ownership
> > of the other function without exposing it to the VM simply for the
> > purpose of being able to perform a bus reset.
>
> Hmm this might be a security problem.
How so? We don't simply simply trust that the user previously validated
the ability to do a bus reset and allow it any time they ask, each reset
needs to pass the file descriptors related to the affected devices.
> Ideally we should not touch devices we don't *need* to touch.
> But given kernel requires this at the moment (IIUC)
> QEMU could open the device
> but not expose it to guest until actually asked.
Well, a bus reset counts as needing to touch the device. I don't see
how it could not. Requiring QEMU to add a new mode of holding a device
only for ownership purposes sounds like an expansion of the scope of
these changes, which is exactly what we don't need at this point.
> > Another simplification
> > was to expose them in the same slot, so we don't need to worry about VM
> > configurations where one device could be hot-unplugged and the
> > ownership released, breaking QEMU's ability to do a bus reset on the
> > remaining device.
>
> Same would apply here.
Again, expanding scope versus configuration requirement, I take the
latter. Thanks,
Alex
next prev parent reply other threads:[~2016-02-04 20:15 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-12 2:43 [Qemu-devel] [PATCH v16 00/14] vfio-pci: pass the aer error to guest Cao jin
2016-01-12 2:43 ` [Qemu-devel] [PATCH v16 01/14] vfio: extract vfio_get_hot_reset_info as a single function Cao jin
2016-01-17 12:48 ` Marcel Apfelbaum
2016-01-12 2:43 ` [Qemu-devel] [PATCH v16 02/14] vfio: squeeze out vfio_pci_do_hot_reset for support bus reset Cao jin
2016-01-17 13:01 ` Marcel Apfelbaum
2016-01-12 2:43 ` [Qemu-devel] [PATCH v16 03/14] pcie: modify the capability size assert Cao jin
2016-01-12 2:43 ` [Qemu-devel] [PATCH v16 04/14] vfio: make the 4 bytes aligned for capability size Cao jin
2016-01-17 12:30 ` Marcel Apfelbaum
2016-01-12 2:43 ` [Qemu-devel] [PATCH v16 05/14] vfio: add pcie extanded capability support Cao jin
2016-01-17 13:22 ` Marcel Apfelbaum
2016-01-19 9:44 ` Chen Fan
2016-01-12 2:43 ` [Qemu-devel] [PATCH v16 06/14] aer: impove pcie_aer_init to support vfio device Cao jin
2016-01-12 2:43 ` [Qemu-devel] [PATCH v16 07/14] vfio: add aer support for " Cao jin
2016-01-18 9:12 ` Marcel Apfelbaum
2016-01-19 9:47 ` Chen Fan
2016-01-12 2:43 ` [Qemu-devel] [PATCH v16 08/14] vfio: add check host bus reset is support or not Cao jin
2016-01-18 10:32 ` Marcel Apfelbaum
2016-01-19 9:55 ` Chen Fan
2016-01-12 2:43 ` [Qemu-devel] [PATCH v16 09/14] add check reset mechanism when hotplug vfio device Cao jin
2016-01-18 11:03 ` Marcel Apfelbaum
2016-01-19 1:46 ` Cao jin
2016-01-12 2:43 ` [Qemu-devel] [PATCH v16 10/14] pci: introduce pci bus pre reset Cao jin
2016-01-14 20:36 ` Alex Williamson
2016-01-19 10:15 ` Chen Fan
2016-01-12 2:43 ` [Qemu-devel] [PATCH v16 11/14] vfio: introduce last reset sequence id Cao jin
2016-01-14 20:43 ` Alex Williamson
2016-01-12 2:43 ` [Qemu-devel] [PATCH v16 12/14] pcie_aer: expose pcie_aer_msg() interface Cao jin
2016-01-12 2:43 ` [Qemu-devel] [PATCH v16 13/14] vfio-pci: pass the aer error to guest Cao jin
2016-01-18 10:45 ` Marcel Apfelbaum
2016-01-19 9:27 ` Chen Fan
2016-01-12 2:43 ` [Qemu-devel] [PATCH v16 14/14] vfio: add 'aer' property to expose aercap Cao jin
2016-01-18 10:46 ` Marcel Apfelbaum
2016-01-16 18:34 ` [Qemu-devel] [PATCH v16 00/14] vfio-pci: pass the aer error to guest Michael S. Tsirkin
2016-01-19 9:09 ` Chen Fan
2016-02-03 8:54 ` Chen Fan
2016-02-03 13:57 ` Michael S. Tsirkin
2016-02-04 2:04 ` Chen Fan
2016-02-04 11:21 ` Michael S. Tsirkin
2016-02-04 17:46 ` Alex Williamson
2016-02-04 18:09 ` Michael S. Tsirkin
2016-02-04 20:15 ` Alex Williamson [this message]
2016-02-04 21:58 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1010754107.24917162.1454616939506.JavaMail.zimbra@redhat.com \
--to=alex.williamson@redhat.com \
--cc=caoj.fnst@cn.fujitsu.com \
--cc=chen.fan.fnst@cn.fujitsu.com \
--cc=izumi.taku@jp.fujitsu.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).