From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alex Williamson Subject: Re: VFIO no-iommu Date: Thu, 17 Dec 2015 09:43:59 -0700 Message-ID: <1450370639.2674.93.camel@redhat.com> References: <60420822.AbcfvjLZCk@xps13> <566B4A50.9090607@6wind.com> <1449874953.20509.6.camel@redhat.com> <26FA93C7ED1EAA44AB77D62FBE1D27BA6747CE55@IRSMSX108.ger.corp.intel.com> <1450198398.6042.32.camel@redhat.com> <20151216040408.GA18363@sivlogin002.ir.intel.com> <1450240711.2674.11.camel@redhat.com> <1450285912.2674.22.camel@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Cc: "dev@dpdk.org" To: "Burakov, Anatoly" , "Yigit, Ferruh" Return-path: Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by dpdk.org (Postfix) with ESMTP id 76BF07E23 for ; Thu, 17 Dec 2015 17:44:01 +0100 (CET) In-Reply-To: List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Wed, 2015-12-16 at 17:22 +0000, Burakov, Anatoly wrote: > Hi Alex, >=20 > > On Wed, 2015-12-16 at 08:35 +0000, Burakov, Anatoly wrote: > > > Hi Alex, > > >=20 > > > > On Wed, 2015-12-16 at 04:04 +0000, Ferruh Yigit wrote: > > > > > On Tue, Dec 15, 2015 at 09:53:18AM -0700, Alex Williamson > > > > > wrote: > > > > > I tested the DPDK (HEAD of master) with the patch, with help > > > > > of > > > > > Anatoly, and DPDK works in no-iommu environment with a little > > > > > modification. > > > > >=20 > > > > > Basically the only modification is adapt new group naming > > > > > (noiommu-$) > > > > > and > > > >=20 > > > > Sorry, forgot to mention that one. =C2=A0The intention with the > > > > modified > > > > group name is that I want to be very certain that a user > > > > intending > > > > to only support properly iommu isolated devices doesn't > > > > accidentally > > > > need to deal with these no-iommu mode devices. > > > >=20 > > > > > disable dma mapping (VFIO_IOMMU_MAP_DMA) > > > > >=20 > > > > > Also I need to disable VFIO_CHECK_EXTENSION ioctl, because in > > > > > vfio > > > > > module, > > > > > container->noiommu is not set before doing a > > > > > vfio_group_set_container() > > > > > and vfio_for_each_iommu_driver selects wrong driver. > > > >=20 > > > > Running CHECK_EXTENSION on a container without the group > > > > attached is > > > > only going to tell you what extensions vfio is capable of, not > > > > necessarily what extensions are available to you with that > > > > group. > > > > Is this just a general dpdk- vfio ordering bug? > > >=20 > > > Yes, that is how VFIO was implemented in DPDK. I was under the > > > impression that checking extension before assigning devices was > > > the > > > correct way to do things, so as to not to try anything we know > > > would > > > fail anyway. Does this imply that CHECK_EXTENSION needs to be > > > called > > > on both container and groups (or just on groups)? > >=20 > > Hmm, in Documentation/vfio.txt we do give the following algorithm: > >=20 > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (ioctl(container, = VFIO_GET_API_VERSION) !=3D > > VFIO_API_VERSION) > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0/* Unknown API version */ > >=20 > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (!ioctl(container,= VFIO_CHECK_EXTENSION, > > VFIO_TYPE1_IOMMU)) > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0/* Doesn't support the IOMMU driver we want. *= / > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0... > >=20 > > That's just going to query each iommu driver and we can't yet say > > whether > > the group the user attaches to the container later will actually > > support that > > extension until we try to do it, that would come at VFIO_SET_IOMMU. > > =C2=A0So is > > it perhaps a vfio bug that we're not advertising no-iommu until the > > group is > > attached? =C2=A0After all, we are capable of it with just an empty > > container, just > > like we are with type1, but we're going to fail SET_IOMMU for the > > wrong > > combination. > > =C2=A0This is exactly the sort of thing that makes me glad we reverte= d > > it without > > feedback from a working user driver. =C2=A0Thanks, >=20 > Whether it should be considered a "bug" in VFIO or "by design" is up > to you, of course, but at least according to the VFIO documentation, > we are meant to check for type 1 extension and then attach devices, > so it would be expected to get VFIO_NOIOMMU_IOMMU marked as supported > even without any devices attached to the container (just like we get > type 1 as supported without any devices attached). Having said that, > if it was meant to attach devices first and then check the > extensions, then perhaps the documentation should also point out that > fact (or perhaps I missed that detail in my readings of the docs, in > which case my apologies). Hi Anatoly, Does the below patch make it behave more like you'd expect. =C2=A0This applies to v4.4-rc4, I'd fold this into the base patch if we reincorporate it to a future kernel. =C2=A0Thanks, Alex commit 88d4dcb6b77624965f0b45b5cd305a2b4a105c94 Author: Alex Williamson Date:=C2=A0=C2=A0=C2=A0Wed Dec 16 19:02:01 2015 -0700 =C2=A0=C2=A0=C2=A0=C2=A0vfio: Fix no-iommu CHECK_EXTENSION =C2=A0=C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0=C2=A0Previously the no-iommu iommu driver was only vis= ible when the =C2=A0=C2=A0=C2=A0=C2=A0container had an attached no-iommu group.=C2=A0=C2= =A0This means that =C2=A0=C2=A0=C2=A0=C2=A0CHECK_EXTENSION on and empty container couldn't r= eport the possibility =C2=A0=C2=A0=C2=A0=C2=A0of using VFIO_NOIOMMU_IOMMU.=C2=A0=C2=A0We report= TYPE1 whether or not the user =C2=A0=C2=A0=C2=A0=C2=A0can make use of it with the group, so this is inc= onsistent.=C2=A0=C2=A0Add the =C2=A0=C2=A0=C2=A0=C2=A0no-iommu iommu to the list of iommu drivers when = enabled via module =C2=A0=C2=A0=C2=A0=C2=A0option, but skip all the others if the container = is attached to a =C2=A0=C2=A0=C2=A0=C2=A0no-iommu groups.=C2=A0=C2=A0Note that tainting is= now done with the "unsafe" =C2=A0=C2=A0=C2=A0=C2=A0module callback rather than explictly within vfio= . =C2=A0=C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0=C2=A0Also fixes module option and module description n= ame inconsistency. =C2=A0=C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0=C2=A0Also make vfio_noiommu_ops const. =C2=A0=C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0=C2=A0Signed-off-by: Alex Williamson diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index de632da..d3a9432 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -99,9 +99,6 @@ struct vfio_device { =C2=A0 =C2=A0#ifdef CONFIG_VFIO_NOIOMMU =C2=A0static bool noiommu __read_mostly; -module_param_named(enable_unsafe_noiommu_support, - =C2=A0=C2=A0=C2=A0noiommu, bool, S_IRUGO | S_IWUSR); -MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE, no-IOMMU mo= de.=C2=A0=C2=A0This mode provides no device isolation, no DMA translation= , no host kernel protection, cannot be used for device assignment to virt= ual machines, requires RAWIO permissions, and will taint the kernel.=C2=A0= =C2=A0If you do not know what this is for, step away. (default: false)"); =C2=A0#endif =C2=A0 =C2=A0/* @@ -138,17 +135,6 @@ struct iommu_group *vfio_iommu_group_get(struct devi= ce *dev) =C2=A0 iommu_group_put(group); =C2=A0 if (ret) =C2=A0 return NULL; - - /* - =C2=A0* Where to taint?=C2=A0=C2=A0At this point we've added an IOMMU g= roup for a - =C2=A0* device that is not backed by iommu_ops, therefore any iommu_ - =C2=A0* callback using iommu_ops can legitimately Oops.=C2=A0=C2=A0So, = while we may - =C2=A0* be about to give a DMA capable device to a user without IOMMU - =C2=A0* protection, which is clearly taint-worthy, let's go ahead and d= o - =C2=A0* it here. - =C2=A0*/ - add_taint(TAINT_USER, LOCKDEP_STILL_OK); - dev_warn(dev, "Adding kernel taint for vfio-noiommu group on device\n")= ; =C2=A0#endif =C2=A0 =C2=A0 return group; @@ -207,7 +193,7 @@ static void vfio_noiommu_detach_group(void *iommu_dat= a, =C2=A0{ =C2=A0} =C2=A0 -static struct vfio_iommu_driver_ops vfio_noiommu_ops =3D { +static const struct vfio_iommu_driver_ops vfio_noiommu_ops =3D { =C2=A0 .name =3D "vfio-noiommu", =C2=A0 .owner =3D THIS_MODULE, =C2=A0 .open =3D vfio_noiommu_open, @@ -217,24 +203,34 @@ static struct vfio_iommu_driver_ops vfio_noiommu_op= s =3D { =C2=A0 .detach_group =3D vfio_noiommu_detach_group, =C2=A0}; =C2=A0 -static struct vfio_iommu_driver vfio_noiommu_driver =3D { - .ops =3D &vfio_noiommu_ops, +static int noiommu_param_set(const char *val, const struct kernel_param = *kp) +{ + int ret; + + if (!val) + val =3D "1"; + + ret =3D strtobool(val, kp->arg); + if (ret) + return ret; + + if (noiommu) + ret =3D vfio_register_iommu_driver(&vfio_noiommu_ops); + else + vfio_unregister_iommu_driver(&vfio_noiommu_ops); + + return ret; +} + +static const struct kernel_param_ops noiommu_param_ops =3D { + .flags =3D KERNEL_PARAM_OPS_FL_NOARG, + .set =3D noiommu_param_set, + .get =3D param_get_bool, =C2=A0}; =C2=A0 -/* - * Wrap IOMMU drivers, the noiommu driver is the one and only driver for - * noiommu groups (and thus containers) and not available for normal gro= ups. - */ -#define vfio_for_each_iommu_driver(con, pos) \ - for (pos =3D con->noiommu ? &vfio_noiommu_driver : \ - =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0list_first_entry(&vfio.iommu_drivers_list= , \ - =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0struct vfio_iommu_driver, vfio_ne= xt); \ - =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0(con->noiommu ? pos !=3D NULL : \ - &pos->vfio_next !=3D &vfio.iommu_drivers_list); \ - =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0pos =3D con->noiommu ? NULL : list_= next_entry(pos, vfio_next)) -#else -#define vfio_for_each_iommu_driver(con, pos) \ - list_for_each_entry(pos, &vfio.iommu_drivers_list, vfio_next) +module_param_cb_unsafe(enable_unsafe_noiommu_mode, &noiommu_param_ops, + =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0&noiommu, S_IRUGO | S_IWUSR)= ; +MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE, no-IOMMU mo= de.=C2=A0=C2=A0This mode provides no device isolation, no DMA translation= , no host kernel protection, cannot be used for device assignment to virt= ual machines, requires RAWIO permissions, and will taint the kernel.=C2=A0= =C2=A0If you do not know what this is for, step away. (default: false)"); =C2=A0#endif =C2=A0 =C2=A0 @@ -999,7 +995,12 @@ static long vfio_ioctl_check_extension(struct vfio_c= ontainer *container, =C2=A0 =C2=A0*/ =C2=A0 if (!driver) { =C2=A0 mutex_lock(&vfio.iommu_drivers_lock); - vfio_for_each_iommu_driver(container, driver) { + list_for_each_entry(driver, &vfio.iommu_drivers_list, + =C2=A0=C2=A0=C2=A0=C2=A0vfio_next) { + if (container->noiommu && + =C2=A0=C2=A0=C2=A0=C2=A0driver->ops !=3D &vfio_noiommu_ops) + continue; + =C2=A0 if (!try_module_get(driver->ops->owner)) =C2=A0 continue; =C2=A0 @@ -1068,9 +1069,12 @@ static long vfio_ioctl_set_iommu(struct vfio_conta= iner *container, =C2=A0 } =C2=A0 =C2=A0 mutex_lock(&vfio.iommu_drivers_lock); - vfio_for_each_iommu_driver(container, driver) { + list_for_each_entry(driver, &vfio.iommu_drivers_list, vfio_next) { =C2=A0 void *data; =C2=A0 + if (container->noiommu && driver->ops !=3D &vfio_noiommu_ops) + continue; + =C2=A0 if (!try_module_get(driver->ops->owner)) =C2=A0 continue; =C2=A0