Date: Tue, 20 Sep 2016 10:24:15 -0600
From: Alex Williamson
Message-ID: <20160920102415.33ed7774@t450s.home>
In-Reply-To: <57E0A407.4020305@intel.com>
Subject: Re: [Qemu-devel] [PATCH v7 2/4] vfio: VFIO driver for mediated devices
To: Jike Song
Cc: Kirti Wankhede, kraxel@redhat.com, Dong Jia, kevin.tian@intel.com, cjia@nvidia.com, kvm@vger.kernel.org, qemu-devel@nongnu.org, pbonzini@redhat.com

On Tue, 20 Sep 2016 10:50:47 +0800
Jike Song wrote:

> On 09/20/2016 04:03 AM, Alex Williamson wrote:
> > On Tue, 20 Sep 2016 00:43:15 +0530
> > Kirti Wankhede wrote:
> >
> >> On 9/20/2016 12:06 AM, Alex Williamson wrote:
> >>> On Mon, 19 Sep 2016 23:52:36 +0530
> >>> Kirti Wankhede wrote:
> >>>
> >>>> On 8/26/2016 7:43 PM, Kirti Wankhede wrote:
> >>>>> * PGP Signed: 08/26/2016 at 07:15:44 AM, Decrypted
> >>>>> On 8/25/2016 2:52 PM, Dong Jia wrote:
> >>>>>> On Thu, 25 Aug 2016 09:23:53 +0530
> >>>>
> >>>>>>> +
> >>>>>>> +static ssize_t vfio_mdev_read(void *device_data, char __user *buf,
> >>>>>>> +                              size_t count, loff_t *ppos)
> >>>>>>> +{
> >>>>>>> +        struct vfio_mdev *vmdev = device_data;
> >>>>>>> +        struct mdev_device *mdev = vmdev->mdev;
> >>>>>>> +        struct parent_device *parent = mdev->parent;
> >>>>>>> +        unsigned int done = 0;
> >>>>>>> +        int ret;
> >>>>>>> +
> >>>>>>> +        if (!parent->ops->read)
> >>>>>>> +                return -EINVAL;
> >>>>>>> +
> >>>>>>> +        while (count) {
> >>>>>> Here I have to apologize: I didn't notice the bad impact of this
> >>>>>> change on my patches during the v6 discussion.
> >>>>>>
> >>>>>> For vfio-ccw, I introduced an I/O region to input/output I/O
> >>>>>> instruction parameters and results for QEMU. The @count for this
> >>>>>> data is currently 140. So supporting arbitrary lengths in one shot
> >>>>>> here, and also in vfio_mdev_write, seems like the better option for
> >>>>>> this case.
> >>>>>>
> >>>>>> I believe that if the PCI drivers want to iterate in 4-byte steps,
> >>>>>> they can do that in the parent read/write callbacks instead.
> >>>>>>
> >>>>>> What do you think?
> >>>>>>
> >>>>>
> >>>>> I would like to know Alex's thoughts on this. He raised a concern
> >>>>> with this approach in the v6 reviews:
> >>>>> "But I think this is exploitable, it lets the user make the kernel
> >>>>> allocate an arbitrarily sized buffer."
> >>>>>
> >>>>
> >>>> Read/write callbacks are for the slow path: emulation of MMIO
> >>>> regions, which are mainly device registers. I do feel they shouldn't
> >>>> support arbitrary lengths.
> >>>> Alex, I would like to know your thoughts.
> >>>
> >>> The exploit was that the mdev layer allocated a buffer and copied the
> >>> entire user buffer into kernel space before passing it to the vendor
> >>> driver. The solution is to pass the __user *buf to the vendor driver
> >>> and let them sanitize and split it however makes sense for their
> >>> device. We shouldn't be assuming naturally aligned, up-to-dword
> >>> accesses in the generic mdev layers. Those sorts of semantics are
> >>> defined by the device type. This is another case where making
> >>> the mdev layer as thin as possible is actually the best thing to
> >>> do to make it really device type agnostic. Thanks,
> >>>
> >>
> >>
> >> Alex,
> >>
> >> These were the comments on the v6 patch:
> >>
> >>>>> Do we really need to support arbitrary lengths in one shot? Seems
> >>>>> like we could just use a 4 or 8 byte variable on the stack and
> >>>>> iterate until done.
> >>>>>
> >>>>
> >>>> We just want to pass the arguments to the vendor driver as-is here.
> >>>> The vendor driver can take care of that.
> >>
> >>> But I think this is exploitable, it lets the user make the kernel
> >>> allocate an arbitrarily sized buffer.
> >>
> >> As per the above discussion on the v7 version, this module doesn't
> >> allocate memory from the heap.
> >>
> >> If a vendor driver allocates arbitrary memory in kernel space through
> >> the mdev module interface, wouldn't that be an exploit?
> >
> > Yep, my 4-/8-byte chunking idea was too PCI-specific. If a vendor
> > driver introduces an exploit, that's a bug in the vendor driver. I'm
> > not sure if we can or should attempt to guard against that. Ultimately
> > the vendor driver is either open source, and we can inspect it for
> > such exploits, or it's closed source, taints the kernel, and we hope
> > for the best. It might make a good unit test to perform substantially
> > sized reads/writes to the mdev device.
>
> Can't agree more! :-)
>
> > Perhaps the only sanity test we can make in the core is to verify
> > that the access doesn't exceed the size of the region as advertised
> > by the vendor driver. Thanks,
>
> Even performing a lightweight sanity check would require vfio-mdev to
> be able to decode ppos into a particular region, which means
> information about all the regions would have to be stored in the
> framework. I guess that is not your preferred way :)

There's certainly a trade-off there. We don't support dynamic regions;
the user expects them to be stable, and the mdev-core code can expect
that as well. It might simplify the vendor drivers slightly if the core
could perform such a basic sanity test, but the cost of doing so is
that the core needs to have an understanding of the region layout of
the device. That seems like non-trivial overhead to consolidate testing
that the vendor driver itself can do much more efficiently. Thanks,

Alex
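
P.S. For concreteness, the thin pass-through model looks roughly like
this. It's only a minimal sketch: the struct names follow the quoted
v7 patch, but the vendor callback taking the __user pointer directly
is the proposed change, not posted code.

static ssize_t vfio_mdev_read(void *device_data, char __user *buf,
                              size_t count, loff_t *ppos)
{
        struct vfio_mdev *vmdev = device_data;
        struct mdev_device *mdev = vmdev->mdev;
        struct parent_device *parent = mdev->parent;

        if (!parent->ops->read)
                return -EINVAL;

        /*
         * No kernel-side bounce buffer and no chunking here: the user
         * pointer goes straight to the vendor driver, which sanitizes
         * and splits the access however suits its device type.
         */
        return parent->ops->read(mdev, buf, count, ppos);
}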
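
A PCI-type vendor driver that wants the dword iteration could then do
the chunking in its own callback, per Dong Jia's suggestion. Another
hypothetical sketch: my_vendor_reg_read() is a made-up register
emulation helper, and naturally aligned accesses are assumed.

static ssize_t my_vendor_read(struct mdev_device *mdev, char __user *buf,
                              size_t count, loff_t *ppos)
{
        unsigned int done = 0;

        while (count) {
                size_t filled = min_t(size_t, count, 4);
                u32 val = my_vendor_reg_read(mdev, *ppos); /* hypothetical */

                /* Copy to the user buffer at most 4 bytes at a time. */
                if (copy_to_user(buf + done, &val, filled))
                        return -EFAULT;

                count -= filled;
                done += filled;
                *ppos += filled;
        }

        return done;
}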
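
And the core-level bounds check weighed above would look something
like the following. It presumes the core caches a region table built
from the vendor driver's region info, which is exactly the overhead in
question; the vfio-pci-style offset macros and the nr_regions/regions[]
fields are hypothetical.

static bool vfio_mdev_access_ok(struct vfio_mdev *vmdev, loff_t pos,
                                size_t count)
{
        /* Hypothetical vfio-pci-style encoding of the region in *ppos. */
        unsigned int index = VFIO_MDEV_OFFSET_TO_INDEX(pos);
        u64 off = pos & VFIO_MDEV_OFFSET_MASK;

        if (index >= vmdev->nr_regions)
                return false;

        /* Reject any access that runs past the advertised region size. */
        return count <= vmdev->regions[index].size &&
               off <= vmdev->regions[index].size - count;
}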