Date: Thu, 24 Dec 2015 20:06:45 +0200
From: "Michael S. Tsirkin"
Message-ID: <20151224200603-mutt-send-email-mst@redhat.com>
References: <20151224163132-mutt-send-email-mst@redhat.com> <1450979226.2950.108.camel@redhat.com>
In-Reply-To: <1450979226.2950.108.camel@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v14 Resend 08/13] vfio: add check host bus reset is support or not
To: Alex Williamson
Cc: chen.fan.fnst@cn.fujitsu.com, Cao jin, qemu-devel@nongnu.org

On Thu, Dec 24, 2015 at 10:47:06AM -0700, Alex Williamson wrote:
> On Thu, 2015-12-24 at 16:32 +0200, Michael S. Tsirkin wrote:
> > On Thu, Dec 17, 2015 at 09:41:49AM +0800, Cao jin wrote:
> > > From: Chen Fan
> > >
> > > when init vfio devices done, we should test all the devices supported
> > > aer whether conflict with others. For each one, get the hot reset
> > > info for the affected device list.  For each affected device, all
> > > should attach to the VM and on/below the same bus. also, we should test
> > > all of the non-AER supporting vfio-pci devices on or below the target
> > > bus to verify they have a reset mechanism.
> > >
> > > Signed-off-by: Chen Fan
> > > ---
> > >  hw/vfio/pci.c | 236 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > >  hw/vfio/pci.h |   1 +
> > >  2 files changed, 230 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > index d00b0e4..6926dcc 100644
> > > --- a/hw/vfio/pci.c
> > > +++ b/hw/vfio/pci.c
> > > @@ -1806,6 +1806,216 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
> > >      return 0;
> > >  }
> > >
> > > +static bool vfio_pci_host_slot_match(PCIHostDeviceAddress *host1,
> > > +                                     PCIHostDeviceAddress *host2)
> > > +{
> > > +    return (host1->domain == host2->domain && host1->bus == host2->bus &&
> > > +            host1->slot == host2->slot);
> > > +}
> > > +
> > > +static bool vfio_pci_host_match(PCIHostDeviceAddress *host1,
> > > +                                PCIHostDeviceAddress *host2)
> > > +{
> > > +    return (vfio_pci_host_slot_match(host1, host2) &&
> > > +            host1->function == host2->function);
> > > +}
> > > +
> > > +struct VFIODeviceFind {
> > > +    PCIDevice *pdev;
> > > +    bool found;
> > > +};
> > > +
> > > +static void vfio_check_device_noreset(PCIBus *bus, PCIDevice *pdev,
> > > +                                      void *opaque)
> > > +{
> > > +    DeviceState *dev = DEVICE(pdev);
> > > +    DeviceClass *dc = DEVICE_GET_CLASS(dev);
> > > +    VFIOPCIDevice *vdev;
> > > +    struct VFIODeviceFind *find = opaque;
> > > +
> > > +    if (find->found) {
> > > +        return;
> > > +    }
> > > +
> > > +    if (!object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
> > > +        if (!dc->reset) {
> > > +            goto found;
> > > +        }
> > > +        return;
> > > +    }
> > > +    vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> > > +    if (!(vdev->features & VFIO_FEATURE_ENABLE_AER) &&
> > > +        !vdev->vbasedev.reset_works) {
> > > +        goto found;
> > > +    }
> > > +
> > > +    return;
> > > +found:
> > > +    find->pdev = pdev;
> > > +    find->found = true;
> > > +}
> > > +
> > > +static void device_find(PCIBus *bus, PCIDevice *pdev, void *opaque)
> > > +{
> > > +    struct VFIODeviceFind *find = opaque;
> > > +
> > > +    if (find->found) {
> > > +        return;
> > > +    }
> > > +
> > > +    if (pdev == find->pdev) {
> > > +        find->found = true;
> > > +    }
> > > +}
> > > +
> > > +static int vfio_check_host_bus_reset(VFIOPCIDevice *vdev)
> > > +{
> > > +    PCIBus *bus = vdev->pdev.bus;
> > > +    struct vfio_pci_hot_reset_info *info = NULL;
> > > +    struct vfio_pci_dependent_device *devices;
> > > +    VFIOGroup *group;
> > > +    struct VFIODeviceFind find;
> > > +    int ret, i;
> > > +
> > > +    ret = vfio_get_hot_reset_info(vdev, &info);
> > > +    if (ret) {
> > > +        error_report("vfio: Cannot enable AER for device %s,"
> > > +                     " device does not support hot reset.",
> > > +                     vdev->vbasedev.name);
> > > +        goto out;
> > > +    }
> > > +
> > > +    /* List all affected devices by bus reset */
> > > +    devices = &info->devices[0];
> > > +
> > > +    /* Verify that we have all the groups required */
> > > +    for (i = 0; i < info->count; i++) {
> > > +        PCIHostDeviceAddress host;
> > > +        VFIOPCIDevice *tmp;
> > > +        VFIODevice *vbasedev_iter;
> > > +        bool found = false;
> > > +
> > > +        host.domain = devices[i].segment;
> > > +        host.bus = devices[i].bus;
> > > +        host.slot = PCI_SLOT(devices[i].devfn);
> > > +        host.function = PCI_FUNC(devices[i].devfn);
> > > +
> > > +        /* Skip the current device */
> > > +        if (vfio_pci_host_match(&host, &vdev->host)) {
> > > +            continue;
> > > +        }
> > > +
> > > +        /* Ensure we own the group of the affected device */
> > > +        QLIST_FOREACH(group, &vfio_group_list, next) {
> > > +            if (group->groupid == devices[i].group_id) {
> > > +                break;
> > > +            }
> > > +        }
> > > +
> > > +        if (!group) {
> > > +            error_report("vfio: Cannot enable AER for device %s, "
> > > +                         "depends on group %d which is not owned.",
> > > +                         vdev->vbasedev.name, devices[i].group_id);
> > > +            ret = -1;
> > > +            goto out;
> > > +        }
> > > +
> > > +        /* Ensure affected devices for reset on/blow the bus */
> > > +        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> > > +            if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
> > > +                continue;
> > > +            }
> > > +            tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> > > +            if (vfio_pci_host_match(&host, &tmp->host)) {
> > > +                PCIDevice *pci = PCI_DEVICE(tmp);
> > > +
> > > +                /*
> > > +                 * For multifunction device, due to vfio driver signal all
> > > +                 * functions under the upstream link of the end point. here
> > > +                 * we validate all functions whether enable AER.
> > > +                 */
> > > +                if (vfio_pci_host_slot_match(&vdev->host, &tmp->host) &&
> > > +                    !(tmp->features & VFIO_FEATURE_ENABLE_AER)) {
> > > +                    error_report("vfio: Cannot enable AER for device %s, on same slot"
> > > +                                 " the dependent device %s which does not enable AER.",
> > > +                                 vdev->vbasedev.name, tmp->vbasedev.name);
> > > +                    ret = -1;
> > > +                    goto out;
> > > +                }
> > > +
> > > +                find.pdev = pci;
> > > +                find.found = false;
> > > +                pci_for_each_device(bus, pci_bus_num(bus),
> > > +                                    device_find, &find);
> > > +                if (!find.found) {
> > > +                    error_report("vfio: Cannot enable AER for device %s, "
> > > +                                 "the dependent device %s is not under the same bus",
> > > +                                 vdev->vbasedev.name, tmp->vbasedev.name);
> > > +                    ret = -1;
> > > +                    goto out;
> > > +                }
> > > +                found = true;
> > > +                break;
> > > +            }
> > > +        }
> > > +
> > > +        /* Ensure all affected devices assigned to VM */
> >
> > I am puzzled.
> > Does not kernel enforce this already?
> > If not it's a security problem.
> > If yes why does userspace need to check this?
>
> DMA isolation and bus level isolation are separate concepts.  Each
> function of a multi-function device can have DMA isolation, but a user
> needs to own all of the functions affected by a bus reset in order to
> perform one.  An AER configuration can only be created if the user can
> translate a guest bus reset into a host bus reset and therefore needs
> to test whether it has the permissions to do so.  I believe over the
> course of reviews we've also added some simplifying constraints around
> this to reduce the problem set, things like all the groups being
> assigned rather than just owned by the user.  However, I believe the
> kernel is sound in how it provides security for bus resets.  Thanks,
>
> Alex

Yes, sounds good. So how about just trying to do bus reset at setup
time? If kernel allows this, we know it is safe ...

-- 
MST
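
To make the ownership rule from Alex's reply concrete (a bus reset is only possible when the user owns every device the reset affects), here is a minimal, self-contained C sketch. It is not QEMU or kernel code: struct dep_device, group_is_owned() and first_unowned_device() are hypothetical names chosen for illustration, and the struct only loosely mirrors the group_id/segment/bus/devfn fields that the quoted patch reads from its dependent-device list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one entry of the hot reset
 * dependent-device list; not the real kernel structure. */
struct dep_device {
    uint32_t group_id;
    uint16_t segment;
    uint8_t bus;
    uint8_t devfn;
};

/* Return true if group_id appears in the caller's list of owned groups. */
static bool group_is_owned(const uint32_t *owned, size_t n_owned,
                           uint32_t group_id)
{
    for (size_t i = 0; i < n_owned; i++) {
        if (owned[i] == group_id) {
            return true;
        }
    }
    return false;
}

/* Return the index of the first affected device whose group is not owned,
 * or -1 if every affected device belongs to an owned group. */
static int first_unowned_device(const struct dep_device *devs, size_t n_devs,
                                const uint32_t *owned, size_t n_owned)
{
    for (size_t i = 0; i < n_devs; i++) {
        if (!group_is_owned(owned, n_owned, devs[i].group_id)) {
            return (int)i;
        }
    }
    return -1;
}
```

In this sketch a caller would refuse to enable AER whenever first_unowned_device() returns a non-negative index, which corresponds to the "depends on group %d which is not owned" error path in the quoted patch.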