From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alex Williamson Subject: Re: VFIO no-iommu Date: Fri, 18 Dec 2015 14:50:17 -0700 Message-ID: <1450475417.2674.167.camel@redhat.com> References: <566B4A50.9090607@6wind.com> <1449874953.20509.6.camel@redhat.com> <26FA93C7ED1EAA44AB77D62FBE1D27BA6747CE55@IRSMSX108.ger.corp.intel.com> <1450198398.6042.32.camel@redhat.com> <20151216040408.GA18363@sivlogin002.ir.intel.com> <1450240711.2674.11.camel@redhat.com> <1450285912.2674.22.camel@redhat.com> <1450370639.2674.93.camel@redhat.com> <20151218104310.GA11371@sivlogin002.ir.intel.com> <1450449520.2674.162.camel@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Cc: "dev@dpdk.org" To: "Yigit, Ferruh" Return-path: Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by dpdk.org (Postfix) with ESMTP id ADF455A8C for ; Fri, 18 Dec 2015 22:50:20 +0100 (CET) In-Reply-To: <1450449520.2674.162.camel@redhat.com> List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Fri, 2015-12-18 at 07:38 -0700, Alex Williamson wrote: > On Fri, 2015-12-18 at 10:43 +0000, Yigit, Ferruh wrote: > > On Thu, Dec 17, 2015 at 09:43:59AM -0700, Alex Williamson wrote: > > <...> > > > > > > > >=20 > > > > > > > > Also I need to disable VFIO_CHECK_EXTENSION ioctl, > > > > > > > > because in > > > > > > > > vfio > > > > > > > > module, > > > > > > > > container->noiommu is not set before doing a > > > > > > > > vfio_group_set_container() > > > > > > > > and vfio_for_each_iommu_driver selects wrong driver. > > > > > > >=20 > > > > > > > Running CHECK_EXTENSION on a container without the group > > > > > > > attached is > > > > > > > only going to tell you what extensions vfio is capable > > > > > > > of, > > > > > > > not > > > > > > > necessarily what extensions are available to you with > > > > > > > that > > > > > > > group. > > > > > > > Is this just a general dpdk- vfio ordering bug? > > > > > >=20 > > > > > > Yes, that is how VFIO was implemented in DPDK. I was under > > > > > > the > > > > > > impression that checking extension before assigning devices > > > > > > was > > > > > > the > > > > > > correct way to do things, so as to not to try anything we > > > > > > know > > > > > > would > > > > > > fail anyway. Does this imply that CHECK_EXTENSION needs to > > > > > > be > > > > > > called > > > > > > on both container and groups (or just on groups)? > > > > >=20 > > > > > Hmm, in Documentation/vfio.txt we do give the following > > > > > algorithm: > > > > >=20 > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (ioctl(conta= iner, VFIO_GET_API_VERSION) !=3D > > > > > VFIO_API_VERSION) > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* Unknown API version */ > > > > >=20 > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (!ioctl(cont= ainer, VFIO_CHECK_EXTENSION, > > > > > VFIO_TYPE1_IOMMU)) > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* Doesn't support the IOMMU driver we w= ant. > > > > > */ > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0... > > > > >=20 > > > > > That's just going to query each iommu driver and we can't yet > > > > > say > > > > > whether > > > > > the group the user attaches to the container later will > > > > > actually > > > > > support that > > > > > extension until we try to do it, that would come at > > > > > VFIO_SET_IOMMU. > > > > > =C2=A0So is > > > > > it perhaps a vfio bug that we're not advertising no-iommu > > > > > until > > > > > the > > > > > group is > > > > > attached? =C2=A0After all, we are capable of it with just an em= pty > > > > > container, just > > > > > like we are with type1, but we're going to fail SET_IOMMU for > > > > > the > > > > > wrong > > > > > combination. > > > > > =C2=A0This is exactly the sort of thing that makes me glad we > > > > > reverted > > > > > it without > > > > > feedback from a working user driver. =C2=A0Thanks, > > > >=20 > > > > Whether it should be considered a "bug" in VFIO or "by design" > > > > is > > > > up > > > > to you, of course, but at least according to the VFIO > > > > documentation, > > > > we are meant to check for type 1 extension and then attach > > > > devices, > > > > so it would be expected to get VFIO_NOIOMMU_IOMMU marked as > > > > supported > > > > even without any devices attached to the container (just like > > > > we > > > > get > > > > type 1 as supported without any devices attached). Having said > > > > that, > > > > if it was meant to attach devices first and then check the > > > > extensions, then perhaps the documentation should also point > > > > out > > > > that > > > > fact (or perhaps I missed that detail in my readings of the > > > > docs, > > > > in > > > > which case my apologies). > > >=20 > > > Hi Anatoly, > > >=20 > > > Does the below patch make it behave more like you'd expect. =C2=A0T= his > > > applies to v4.4-rc4, I'd fold this into the base patch if we > > > reincorporate it to a future kernel. =C2=A0Thanks, > > >=20 > > > Alex > > >=20 > > > commit 88d4dcb6b77624965f0b45b5cd305a2b4a105c94 > > > Author: Alex Williamson > > > Date:=C2=A0=C2=A0=C2=A0Wed Dec 16 19:02:01 2015 -0700 > > >=20 > > > =C2=A0=C2=A0=C2=A0=C2=A0vfio: Fix no-iommu CHECK_EXTENSION > > > =C2=A0=C2=A0=C2=A0=C2=A0 > > > =C2=A0=C2=A0=C2=A0=C2=A0Previously the no-iommu iommu driver was on= ly visible when > > > the > > > =C2=A0=C2=A0=C2=A0=C2=A0container had an attached no-iommu group.=C2= =A0=C2=A0This means that > > > =C2=A0=C2=A0=C2=A0=C2=A0CHECK_EXTENSION on and empty container coul= dn't report the > > > possibility > > > =C2=A0=C2=A0=C2=A0=C2=A0of using VFIO_NOIOMMU_IOMMU.=C2=A0=C2=A0We = report TYPE1 whether or not > > > the user > > > =C2=A0=C2=A0=C2=A0=C2=A0can make use of it with the group, so this = is > > > inconsistent.=C2=A0=C2=A0Add the > > > =C2=A0=C2=A0=C2=A0=C2=A0no-iommu iommu to the list of iommu drivers= when enabled via > > > module > > > =C2=A0=C2=A0=C2=A0=C2=A0option, but skip all the others if the cont= ainer is attached > > > to > > > a > > > =C2=A0=C2=A0=C2=A0=C2=A0no-iommu groups.=C2=A0=C2=A0Note that taint= ing is now done with the > > > "unsafe" > > > =C2=A0=C2=A0=C2=A0=C2=A0module callback rather than explictly withi= n vfio. > > > =C2=A0=C2=A0=C2=A0=C2=A0 > > > =C2=A0=C2=A0=C2=A0=C2=A0Also fixes module option and module descrip= tion name > > > inconsistency. > > > =C2=A0=C2=A0=C2=A0=C2=A0 > > > =C2=A0=C2=A0=C2=A0=C2=A0Also make vfio_noiommu_ops const. > > > =C2=A0=C2=A0=C2=A0=C2=A0 > > > =C2=A0=C2=A0=C2=A0=C2=A0Signed-off-by: Alex Williamson > >=20 > > Hi Alex, > >=20 > > I got following crash with this update: Let's try this one: commit 8ff839c6ffe9f3b50b50f1cc87e7afbf23171f05 Author: Alex Williamson Date:=C2=A0=C2=A0=C2=A0Fri Dec 18 14:45:55 2015 -0700 =C2=A0=C2=A0=C2=A0=C2=A0v2 vfio fix no-iommu CHECK_EXTENSION =C2=A0=C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0=C2=A0Register and unregister the no-iommu iommu backen= d at module =C2=A0=C2=A0=C2=A0=C2=A0initialization and exit, but disable it unless en= abled via module =C2=A0=C2=A0=C2=A0=C2=A0option.=C2=A0=C2=A0Rather than modify the iommu d= river walk, selectively skip =C2=A0=C2=A0=C2=A0=C2=A0combinations that aren't supported.=C2=A0=C2=A0CH= ECK_EXTENSION on a container =C2=A0=C2=A0=C2=A0=C2=A0without any groups attached exposes all possible = extensions.=C2=A0=C2=A0Once a =C2=A0=C2=A0=C2=A0=C2=A0group is attached, the no-iommu backend is skippe= d for regular groups =C2=A0=C2=A0=C2=A0=C2=A0and regular iommu backends are skipped for no-iom= mu groups. =C2=A0=C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0=C2=A0This would be folded into a single patch to re-pr= opose vfio no-iommu =C2=A0=C2=A0=C2=A0=C2=A0mode upstream. =C2=A0=C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0=C2=A0Signed-off-by: Alex Williamson diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index de632da..85a5793 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -99,7 +99,7 @@ struct vfio_device { =C2=A0 =C2=A0#ifdef CONFIG_VFIO_NOIOMMU =C2=A0static bool noiommu __read_mostly; -module_param_named(enable_unsafe_noiommu_support, +module_param_named(enable_unsafe_noiommu_mode, =C2=A0 =C2=A0=C2=A0=C2=A0noiommu, bool, S_IRUGO | S_IWUSR); =C2=A0MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE, no-IOM= MU mode.=C2=A0=C2=A0This mode provides no device isolation, no DMA transl= ation, no host kernel protection, cannot be used for device assignment to= virtual machines, requires RAWIO permissions, and will taint the kernel.= =C2=A0=C2=A0If you do not know what this is for, step away. (default: fal= se)"); =C2=A0#endif @@ -185,7 +185,7 @@ static long vfio_noiommu_ioctl(void *iommu_data, =C2=A0 =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0unsigned int cmd, unsi= gned long arg) =C2=A0{ =C2=A0 if (cmd =3D=3D VFIO_CHECK_EXTENSION) - return arg =3D=3D VFIO_NOIOMMU_IOMMU ? 1 : 0; + return noiommu && (arg =3D=3D VFIO_NOIOMMU_IOMMU) ? 1 : 0; =C2=A0 =C2=A0 return -ENOTTY; =C2=A0} @@ -207,7 +207,7 @@ static void vfio_noiommu_detach_group(void *iommu_dat= a, =C2=A0{ =C2=A0} =C2=A0 -static struct vfio_iommu_driver_ops vfio_noiommu_ops =3D { +static const struct vfio_iommu_driver_ops vfio_noiommu_ops =3D { =C2=A0 .name =3D "vfio-noiommu", =C2=A0 .owner =3D THIS_MODULE, =C2=A0 .open =3D vfio_noiommu_open, @@ -216,25 +216,6 @@ static struct vfio_iommu_driver_ops vfio_noiommu_ops= =3D { =C2=A0 .attach_group =3D vfio_noiommu_attach_group, =C2=A0 .detach_group =3D vfio_noiommu_detach_group, =C2=A0}; - -static struct vfio_iommu_driver vfio_noiommu_driver =3D { - .ops =3D &vfio_noiommu_ops, -}; - -/* - * Wrap IOMMU drivers, the noiommu driver is the one and only driver for - * noiommu groups (and thus containers) and not available for normal gro= ups. - */ -#define vfio_for_each_iommu_driver(con, pos) \ - for (pos =3D con->noiommu ? &vfio_noiommu_driver : \ - =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0list_first_entry(&vfio.iommu_drivers_list= , \ - =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0struct vfio_iommu_driver, vfio_ne= xt); \ - =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0(con->noiommu ? pos !=3D NULL : \ - &pos->vfio_next !=3D &vfio.iommu_drivers_list); \ - =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0pos =3D con->noiommu ? NULL : list_= next_entry(pos, vfio_next)) -#else -#define vfio_for_each_iommu_driver(con, pos) \ - list_for_each_entry(pos, &vfio.iommu_drivers_list, vfio_next) =C2=A0#endif =C2=A0 =C2=A0 @@ -999,7 +980,14 @@ static long vfio_ioctl_check_extension(struct vfio_c= ontainer *container, =C2=A0 =C2=A0*/ =C2=A0 if (!driver) { =C2=A0 mutex_lock(&vfio.iommu_drivers_lock); - vfio_for_each_iommu_driver(container, driver) { + list_for_each_entry(driver, &vfio.iommu_drivers_list, + =C2=A0=C2=A0=C2=A0=C2=A0vfio_next) { + + if (!list_empty(&container->group_list) && + =C2=A0=C2=A0=C2=A0=C2=A0(container->noiommu !=3D + =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0(driver->ops =3D=3D &vfio_noiommu_ops)= )) + continue; + =C2=A0 if (!try_module_get(driver->ops->owner)) =C2=A0 continue; =C2=A0 @@ -1068,9 +1056,16 @@ static long vfio_ioctl_set_iommu(struct vfio_conta= iner *container, =C2=A0 } =C2=A0 =C2=A0 mutex_lock(&vfio.iommu_drivers_lock); - vfio_for_each_iommu_driver(container, driver) { + list_for_each_entry(driver, &vfio.iommu_drivers_list, vfio_next) { =C2=A0 void *data; =C2=A0 + /* + =C2=A0* Only noiommu containers can use vfio-noiommu and noiommu + =C2=A0* containers can only use vfio-noiommu. + =C2=A0*/ + if (container->noiommu !=3D (driver->ops =3D=3D &vfio_noiommu_ops)) + continue; + =C2=A0 if (!try_module_get(driver->ops->owner)) =C2=A0 continue; =C2=A0 @@ -1799,6 +1794,9 @@ static int __init vfio_init(void) =C2=A0 request_module_nowait("vfio_iommu_type1"); =C2=A0 request_module_nowait("vfio_iommu_spapr_tce"); =C2=A0 +#ifdef CONFIG_VFIO_NOIOMMU + vfio_register_iommu_driver(&vfio_noiommu_ops); +#endif =C2=A0 return 0; =C2=A0 =C2=A0err_cdev_add: @@ -1815,6 +1813,9 @@ static void __exit vfio_cleanup(void) =C2=A0{ =C2=A0 WARN_ON(!list_empty(&vfio.group_list)); =C2=A0 +#ifdef CONFIG_VFIO_NOIOMMU + vfio_unregister_iommu_driver(&vfio_noiommu_ops); +#endif =C2=A0 idr_destroy(&vfio.group_idr); =C2=A0 cdev_del(&vfio.group_cdev); =C2=A0 unregister_chrdev_region(vfio.group_devt, MINORMASK);