From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 24 Dec 2015 21:42:20 +0200
From: "Michael S. Tsirkin"
Message-ID: <20151224213836-mutt-send-email-mst@redhat.com>
References: <20151224163132-mutt-send-email-mst@redhat.com>
 <1450979226.2950.108.camel@redhat.com>
 <20151224200603-mutt-send-email-mst@redhat.com>
 <1450981226.2950.111.camel@redhat.com>
 <20151224202221-mutt-send-email-mst@redhat.com>
 <1450982475.2950.116.camel@redhat.com>
In-Reply-To: <1450982475.2950.116.camel@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v14 Resend 08/13] vfio: add check host bus reset is support or not
To: Alex Williamson
Cc: chen.fan.fnst@cn.fujitsu.com, Cao jin, qemu-devel@nongnu.org

On Thu, Dec 24, 2015 at 11:41:15AM -0700, Alex Williamson wrote:
> On Thu, 2015-12-24 at 20:23 +0200, Michael S. Tsirkin wrote:
> > On Thu, Dec 24, 2015 at 11:20:26AM -0700, Alex Williamson wrote:
> > > On Thu, 2015-12-24 at 20:06 +0200, Michael S. Tsirkin wrote:
> > > > On Thu, Dec 24, 2015 at 10:47:06AM -0700, Alex Williamson wrote:
> > > > > On Thu, 2015-12-24 at 16:32 +0200, Michael S.
Tsirkin wrote:
> > > > > > On Thu, Dec 17, 2015 at 09:41:49AM +0800, Cao jin wrote:
> > > > > > > From: Chen Fan
> > > > > > >
> > > > > > > when init vfio devices done, we should test all the devices supported
> > > > > > > aer whether conflict with others. For each one, get the hot reset
> > > > > > > info for the affected device list.  For each affected device, all
> > > > > > > should attach to the VM and on/below the same bus. also, we should test
> > > > > > > all of the non-AER supporting vfio-pci devices on or below the target
> > > > > > > bus to verify they have a reset mechanism.
> > > > > > >
> > > > > > > Signed-off-by: Chen Fan
> > > > > > > ---
> > > > > > >  hw/vfio/pci.c | 236 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > > > > > >  hw/vfio/pci.h |   1 +
> > > > > > >  2 files changed, 230 insertions(+), 7 deletions(-)
> > > > > > >
> > > > > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > > > > > index d00b0e4..6926dcc 100644
> > > > > > > --- a/hw/vfio/pci.c
> > > > > > > +++ b/hw/vfio/pci.c
> > > > > > > @@ -1806,6 +1806,216 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
> > > > > > >      return 0;
> > > > > > >  }
> > > > > > >
> > > > > > > +static bool vfio_pci_host_slot_match(PCIHostDeviceAddress *host1,
> > > > > > > +                                     PCIHostDeviceAddress *host2)
> > > > > > > +{
> > > > > > > +    return (host1->domain == host2->domain && host1->bus == host2->bus &&
> > > > > > > +            host1->slot == host2->slot);
> > > > > > > +}
> > > > > > > +
> > > > > > > +static bool vfio_pci_host_match(PCIHostDeviceAddress *host1,
> > > > > > > +                                PCIHostDeviceAddress *host2)
> > > > > > > +{
> > > > > > > +    return (vfio_pci_host_slot_match(host1, host2) &&
> > > > > > > +            host1->function == host2->function);
> > > > > > > +}
> > > > > > > +
> > > > > > > +struct VFIODeviceFind {
> > > > > > > +    PCIDevice *pdev;
> > > > > > > +    bool found;
> > > > > > > +};
> > > > > > > +
> > > > > > > +static void vfio_check_device_noreset(PCIBus *bus, PCIDevice *pdev,
> > > > > > > +                                      void *opaque)
> > > > > > > +{
> > > > > > > +    DeviceState *dev = DEVICE(pdev);
> > > > > > > +    DeviceClass *dc = DEVICE_GET_CLASS(dev);
> > > > > > > +    VFIOPCIDevice *vdev;
> > > > > > > +    struct VFIODeviceFind *find = opaque;
> > > > > > > +
> > > > > > > +    if (find->found) {
> > > > > > > +        return;
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    if (!object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
> > > > > > > +        if (!dc->reset) {
> > > > > > > +            goto found;
> > > > > > > +        }
> > > > > > > +        return;
> > > > > > > +    }
> > > > > > > +    vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> > > > > > > +    if (!(vdev->features & VFIO_FEATURE_ENABLE_AER) &&
> > > > > > > +        !vdev->vbasedev.reset_works) {
> > > > > > > +        goto found;
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    return;
> > > > > > > +found:
> > > > > > > +    find->pdev = pdev;
> > > > > > > +    find->found = true;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void device_find(PCIBus *bus, PCIDevice *pdev, void *opaque)
> > > > > > > +{
> > > > > > > +    struct VFIODeviceFind *find = opaque;
> > > > > > > +
> > > > > > > +    if (find->found) {
> > > > > > > +        return;
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    if (pdev == find->pdev) {
> > > > > > > +        find->found = true;
> > > > > > > +    }
> > > > > > > +}
> > > > > > > +
> > > > > > > +static int vfio_check_host_bus_reset(VFIOPCIDevice *vdev)
> > > > > > > +{
> > > > > > > +    PCIBus *bus = vdev->pdev.bus;
> > > > > > > +    struct vfio_pci_hot_reset_info *info = NULL;
> > > > > > > +    struct vfio_pci_dependent_device *devices;
> > > > > > > +    VFIOGroup *group;
> > > > > > > +    struct VFIODeviceFind find;
> > > > > > > +    int ret, i;
> > > > > > > +
> > > > > > > +    ret = vfio_get_hot_reset_info(vdev, &info);
> > > > > > > +    if (ret) {
> > > > > > > +        error_report("vfio: Cannot enable AER for device %s,"
> > > > > > > +                     " device does not support hot reset.",
> > > > > > > +                     vdev->vbasedev.name);
> > > > > > > +        goto out;
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    /* List all affected devices by bus reset */
> > > > > > > +    devices = &info->devices[0];
> > > > > > > +
> > > > > > > +    /* Verify that we have all the groups required */
> > > > > > > +    for (i = 0; i < info->count; i++) {
> > > > > > > +        PCIHostDeviceAddress host;
> > > > > > > +        VFIOPCIDevice *tmp;
> > > > > > > +        VFIODevice *vbasedev_iter;
> > > > > > > +        bool found = false;
> > > > > > > +
> > > > > > > +        host.domain = devices[i].segment;
> > > > > > > +        host.bus = devices[i].bus;
> > > > > > > +        host.slot = PCI_SLOT(devices[i].devfn);
> > > > > > > +        host.function = PCI_FUNC(devices[i].devfn);
> > > > > > > +
> > > > > > > +        /* Skip the current device */
> > > > > > > +        if (vfio_pci_host_match(&host, &vdev->host)) {
> > > > > > > +            continue;
> > > > > > > +        }
> > > > > > > +
> > > > > > > +        /* Ensure we own the group of the affected device */
> > > > > > > +        QLIST_FOREACH(group, &vfio_group_list, next) {
> > > > > > > +            if (group->groupid == devices[i].group_id) {
> > > > > > > +                break;
> > > > > > > +            }
> > > > > > > +        }
> > > > > > > +
> > > > > > > +        if (!group) {
> > > > > > > +            error_report("vfio: Cannot enable AER for device %s, "
> > > > > > > +                         "depends on group %d which is not owned.",
> > > > > > > +                         vdev->vbasedev.name, devices[i].group_id);
> > > > > > > +            ret = -1;
> > > > > > > +            goto out;
> > > > > > > +        }
> > > > > > > +
> > > > > > > +        /* Ensure affected devices for reset on/blow the bus */
> > > > > > > +        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> > > > > > > +            if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
> > > > > > > +                continue;
> > > > > > > +            }
> > > > > > > +            tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> > > > > > > +            if (vfio_pci_host_match(&host, &tmp->host)) {
> > > > > > > +                PCIDevice *pci = PCI_DEVICE(tmp);
> > > > > > > +
> > > > > > > +                /*
> > > > > > > +                 * For multifunction device, due to vfio driver signal all
> > > > > > > +                 * functions under the upstream link of the end point. here
> > > > > > > +                 * we validate all functions whether enable AER.
> > > > > > > +                 */
> > > > > > > +                if (vfio_pci_host_slot_match(&vdev->host, &tmp->host) &&
> > > > > > > +                    !(tmp->features & VFIO_FEATURE_ENABLE_AER)) {
> > > > > > > +                    error_report("vfio: Cannot enable AER for device %s, on same slot"
> > > > > > > +                                 " the dependent device %s which does not enable AER.",
> > > > > > > +                                 vdev->vbasedev.name, tmp->vbasedev.name);
> > > > > > > +                    ret = -1;
> > > > > > > +                    goto out;
> > > > > > > +                }
> > > > > > > +
> > > > > > > +                find.pdev = pci;
> > > > > > > +                find.found = false;
> > > > > > > +                pci_for_each_device(bus, pci_bus_num(bus),
> > > > > > > +                                    device_find, &find);
> > > > > > > +                if (!find.found) {
> > > > > > > +                    error_report("vfio: Cannot enable AER for device %s, "
> > > > > > > +                                 "the dependent device %s is not under the same bus",
> > > > > > > +                                 vdev->vbasedev.name, tmp->vbasedev.name);
> > > > > > > +                    ret = -1;
> > > > > > > +                    goto out;
> > > > > > > +                }
> > > > > > > +                found = true;
> > > > > > > +                break;
> > > > > > > +            }
> > > > > > > +        }
> > > > > > > +
> > > > > > > +        /* Ensure all affected devices assigned to VM */
> > > > > >
> > > > > > I am puzzled.
> > > > > > Does not kernel enforce this already?
> > > > > > If not it's a security problem.
> > > > > > If yes why does userspace need to check this?
> > > > >
> > > > > DMA isolation and bus level isolation are separate concepts.  Each
> > > > > function of a multi-function device can have DMA isolation, but a user
> > > > > needs to own all of the functions affected by a bus reset in order to
> > > > > perform one.  An AER configuration can only be created if the user can
> > > > > translate a guest bus reset into a host bus reset and therefore needs
> > > > > to test whether it has the permissions to do so.  I believe over the
> > > > > course of reviews we've also added some simplifying constraints around
> > > > > this to reduce the problem set, things like all the groups being
> > > > > assigned rather than just owned by the user.  However, I believe the
> > > > > kernel is sound in how it provides security for bus resets.
> > > > > Thanks,
> > > > >
> > > > > Alex
> > > >
> > > > Yes, sounds good.
> > > >
> > > > So how about just trying to do bus reset at setup time?
> > > > If kernel allows this, we know it is safe ...
> > >
> > > The host may support hotplug, what's possible at setup time may not be
> > > possible when an error occurs.
> >
> > How does this patch help solve this problem?
>
> I believe there's a patch in this series that re-tests on the
> occurrence of an error, before injecting the AER into the guest.

Doesn't seem robust. What if hotplug happens right after
the error is injected?

> > > It's unlikely, but worth considering I
> > > think.
> >
> > I suspect vfio will have to solve this in kernel
> > (e.g. automatically add all new devices in the same group
> > wrt reset).
>
> Nope, the user simply loses their ability to reset the bus if they
> don't own all the groups at the time they attempt to do a bus reset.

Hmm, this is sub-optimal. Assume I hot-plug a device behind a bus.
I fully intend to pass it through to a VM where all other devices
are, but before I manage to do this, an error triggers.

> Mixing bus isolation and DMA isolation would cause a mess of groups.

Not sure how what I said implies this. I merely suggested that
if vfio takes over bus reset it should take over
handling hotplug as well, so devices added on this bus
are automatically prevented from being used by anyone
except the same VM, making it safe to reset them.

-- 
MST