From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:33159)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1aCAYH-0006hj-Sh for qemu-devel@nongnu.org;
    Thu, 24 Dec 2015 13:23:51 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1aCAYD-0008CT-5W for qemu-devel@nongnu.org;
    Thu, 24 Dec 2015 13:23:49 -0500
Received: from mx1.redhat.com ([209.132.183.28]:48923)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1aCAYC-0008Bo-Rq for qemu-devel@nongnu.org;
    Thu, 24 Dec 2015 13:23:45 -0500
Date: Thu, 24 Dec 2015 20:23:41 +0200
From: "Michael S. Tsirkin"
Message-ID: <20151224202221-mutt-send-email-mst@redhat.com>
References: <20151224163132-mutt-send-email-mst@redhat.com>
    <1450979226.2950.108.camel@redhat.com>
    <20151224200603-mutt-send-email-mst@redhat.com>
    <1450981226.2950.111.camel@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
In-Reply-To: <1450981226.2950.111.camel@redhat.com>
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH v14 Resend 08/13] vfio: add check host bus reset is support or not
To: Alex Williamson
Cc: chen.fan.fnst@cn.fujitsu.com, Cao jin, qemu-devel@nongnu.org

On Thu, Dec 24, 2015 at 11:20:26AM -0700, Alex Williamson wrote:
> On Thu, 2015-12-24 at 20:06 +0200, Michael S. Tsirkin wrote:
> > On Thu, Dec 24, 2015 at 10:47:06AM -0700, Alex Williamson wrote:
> > > On Thu, 2015-12-24 at 16:32 +0200, Michael S. Tsirkin wrote:
> > > > On Thu, Dec 17, 2015 at 09:41:49AM +0800, Cao jin wrote:
> > > > > From: Chen Fan
> > > > >
> > > > > when init vfio devices done, we should test all the devices supported
> > > > > aer whether conflict with others.
> > > > > For each one, get the hot reset
> > > > > info for the affected device list.  For each affected device, all
> > > > > should attach to the VM and on/below the same bus. also, we should test
> > > > > all of the non-AER supporting vfio-pci devices on or below the target
> > > > > bus to verify they have a reset mechanism.
> > > > >
> > > > > Signed-off-by: Chen Fan
> > > > > ---
> > > > >  hw/vfio/pci.c | 236 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > > > >  hw/vfio/pci.h |   1 +
> > > > >  2 files changed, 230 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > > > index d00b0e4..6926dcc 100644
> > > > > --- a/hw/vfio/pci.c
> > > > > +++ b/hw/vfio/pci.c
> > > > > @@ -1806,6 +1806,216 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
> > > > >      return 0;
> > > > >  }
> > > > >
> > > > > +static bool vfio_pci_host_slot_match(PCIHostDeviceAddress *host1,
> > > > > +                                     PCIHostDeviceAddress *host2)
> > > > > +{
> > > > > +    return (host1->domain == host2->domain && host1->bus == host2->bus &&
> > > > > +            host1->slot == host2->slot);
> > > > > +}
> > > > > +
> > > > > +static bool vfio_pci_host_match(PCIHostDeviceAddress *host1,
> > > > > +                                PCIHostDeviceAddress *host2)
> > > > > +{
> > > > > +    return (vfio_pci_host_slot_match(host1, host2) &&
> > > > > +            host1->function == host2->function);
> > > > > +}
> > > > > +
> > > > > +struct VFIODeviceFind {
> > > > > +    PCIDevice *pdev;
> > > > > +    bool found;
> > > > > +};
> > > > > +
> > > > > +static void vfio_check_device_noreset(PCIBus *bus, PCIDevice *pdev,
> > > > > +                                      void *opaque)
> > > > > +{
> > > > > +    DeviceState *dev = DEVICE(pdev);
> > > > > +    DeviceClass *dc = DEVICE_GET_CLASS(dev);
> > > > > +    VFIOPCIDevice *vdev;
> > > > > +    struct VFIODeviceFind *find = opaque;
> > > > > +
> > > > > +    if (find->found) {
> > > > > +        return;
> > > > > +    }
> > > > > +
> > > > > +    if (!object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
> > > > > +        if (!dc->reset) {
> > > > > +            goto found;
> > > > > +        }
> > > > > +        return;
> > > > > +    }
> > > > > +    vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> > > > > +    if (!(vdev->features & VFIO_FEATURE_ENABLE_AER) &&
> > > > > +        !vdev->vbasedev.reset_works) {
> > > > > +        goto found;
> > > > > +    }
> > > > > +
> > > > > +    return;
> > > > > +found:
> > > > > +    find->pdev = pdev;
> > > > > +    find->found = true;
> > > > > +}
> > > > > +
> > > > > +static void device_find(PCIBus *bus, PCIDevice *pdev, void *opaque)
> > > > > +{
> > > > > +    struct VFIODeviceFind *find = opaque;
> > > > > +
> > > > > +    if (find->found) {
> > > > > +        return;
> > > > > +    }
> > > > > +
> > > > > +    if (pdev == find->pdev) {
> > > > > +        find->found = true;
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +static int vfio_check_host_bus_reset(VFIOPCIDevice *vdev)
> > > > > +{
> > > > > +    PCIBus *bus = vdev->pdev.bus;
> > > > > +    struct vfio_pci_hot_reset_info *info = NULL;
> > > > > +    struct vfio_pci_dependent_device *devices;
> > > > > +    VFIOGroup *group;
> > > > > +    struct VFIODeviceFind find;
> > > > > +    int ret, i;
> > > > > +
> > > > > +    ret = vfio_get_hot_reset_info(vdev, &info);
> > > > > +    if (ret) {
> > > > > +        error_report("vfio: Cannot enable AER for device %s,"
> > > > > +                     " device does not support hot reset.",
> > > > > +                     vdev->vbasedev.name);
> > > > > +        goto out;
> > > > > +    }
> > > > > +
> > > > > +    /* List all affected devices by bus reset */
> > > > > +    devices = &info->devices[0];
> > > > > +
> > > > > +    /* Verify that we have all the groups required */
> > > > > +    for (i = 0; i < info->count; i++) {
> > > > > +        PCIHostDeviceAddress host;
> > > > > +        VFIOPCIDevice *tmp;
> > > > > +        VFIODevice *vbasedev_iter;
> > > > > +        bool found = false;
> > > > > +
> > > > > +        host.domain = devices[i].segment;
> > > > > +        host.bus = devices[i].bus;
> > > > > +        host.slot = PCI_SLOT(devices[i].devfn);
> > > > > +        host.function = PCI_FUNC(devices[i].devfn);
> > > > > +
> > > > > +        /* Skip the current device */
> > > > > +        if (vfio_pci_host_match(&host, &vdev->host)) {
> > > > > +            continue;
> > > > > +        }
> > > > > +
> > > > > +        /* Ensure we own the group of the affected device */
> > > > > +        QLIST_FOREACH(group, &vfio_group_list, next) {
> > > > > +            if (group->groupid == devices[i].group_id) {
> > > > > +                break;
> > > > > +            }
> > > > > +        }
> > > > > +
> > > > > +        if (!group) {
> > > > > +            error_report("vfio: Cannot enable AER for device %s, "
> > > > > +                         "depends on group %d which is not owned.",
> > > > > +                         vdev->vbasedev.name, devices[i].group_id);
> > > > > +            ret = -1;
> > > > > +            goto out;
> > > > > +        }
> > > > > +
> > > > > +        /* Ensure affected devices for reset on/blow the bus */
> > > > > +        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> > > > > +            if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
> > > > > +                continue;
> > > > > +            }
> > > > > +            tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> > > > > +            if (vfio_pci_host_match(&host, &tmp->host)) {
> > > > > +                PCIDevice *pci = PCI_DEVICE(tmp);
> > > > > +
> > > > > +                /*
> > > > > +                 * For multifunction device, due to vfio driver signal all
> > > > > +                 * functions under the upstream link of the end point. here
> > > > > +                 * we validate all functions whether enable AER.
> > > > > +                 */
> > > > > +                if (vfio_pci_host_slot_match(&vdev->host, &tmp->host) &&
> > > > > +                    !(tmp->features & VFIO_FEATURE_ENABLE_AER)) {
> > > > > +                    error_report("vfio: Cannot enable AER for device %s, on same slot"
> > > > > +                                 " the dependent device %s which does not enable AER.",
> > > > > +                                 vdev->vbasedev.name, tmp->vbasedev.name);
> > > > > +                    ret = -1;
> > > > > +                    goto out;
> > > > > +                }
> > > > > +
> > > > > +                find.pdev = pci;
> > > > > +                find.found = false;
> > > > > +                pci_for_each_device(bus, pci_bus_num(bus),
> > > > > +                                    device_find, &find);
> > > > > +                if (!find.found) {
> > > > > +                    error_report("vfio: Cannot enable AER for device %s, "
> > > > > +                                 "the dependent device %s is not under the same bus",
> > > > > +                                 vdev->vbasedev.name, tmp->vbasedev.name);
> > > > > +                    ret = -1;
> > > > > +                    goto out;
> > > > > +                }
> > > > > +                found = true;
> > > > > +                break;
> > > > > +            }
> > > > > +        }
> > > > > +
> > > > > +        /* Ensure all affected devices assigned to VM */
> > > >
> > > > I am puzzled.
> > > > Does not kernel enforce this already?
> > > > If not it's a security problem.
> > > > If yes why does userspace need to check this?
> > >
> > > DMA isolation and bus level isolation are separate concepts.  Each
> > > function of a multi-function device can have DMA isolation, but a user
> > > needs to own all of the functions affected by a bus reset in order to
> > > perform one.  An AER configuration can only be created if the user can
> > > translate a guest bus reset into a host bus reset and therefore needs
> > > to test whether it has the permissions to do so.  I believe over the
> > > course of reviews we've also added some simplifying constraints around
> > > this to reduce the problem set, things like all the groups being
> > > assigned rather than just owned by the user.  However, I believe the
> > > kernel is sound in how it provides security for bus resets.  Thanks,
> > >
> > > Alex
> >
> > Yes, sounds good.
> >
> > So how about just trying to do bus reset at setup time?
> > If kernel allows this, we know it is safe ...
>
> The host may support hotplug, what's possible at setup time may not be
> possible when an error occurs.

How does this patch help solve this problem?

> It's unlikely, but worth considering I think.

I suspect vfio will have to solve this in kernel (e.g. automatically add
all new devices in the same group wrt reset).

>  Thanks,
>
> Alex
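
To make the hotplug concern concrete, here is a minimal sketch. It is not taken from the patch or from the vfio API: every name in it (`dep_device`, `group_is_owned`, `owns_all_reset_groups`) is hypothetical. It models the set of devices affected by a bus reset as an array of group IDs, and shows why an ownership check that passes at setup time can fail later, once a hot-added device joins that set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of a hot reset dependency list. */
struct dep_device {
    int group_id;   /* group the affected device belongs to */
};

/* Return true if group_id is among the groups the user owns. */
static bool group_is_owned(int group_id, const int *owned, size_t n_owned)
{
    for (size_t i = 0; i < n_owned; i++) {
        if (owned[i] == group_id) {
            return true;
        }
    }
    return false;
}

/*
 * A bus reset is only allowed if every device it affects sits in an
 * owned group.  The answer reflects the dependency list as passed in,
 * so a result computed at setup time can go stale after hotplug.
 */
static bool owns_all_reset_groups(const struct dep_device *deps, size_t n_deps,
                                  const int *owned, size_t n_owned)
{
    assert(deps != NULL || n_deps == 0);
    assert(owned != NULL || n_owned == 0);

    for (size_t i = 0; i < n_deps; i++) {
        if (!group_is_owned(deps[i].group_id, owned, n_owned)) {
            return false;
        }
    }
    return true;
}
```

With owned groups 7 and 9, a dependency list covering groups 7 and 9 passes the check. If a device in group 12 is later hot-added on the same bus, the same call on the updated list fails, which is the case a one-off reset attempt at setup time would not catch.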