Date: Tue, 20 Sep 2016 10:24:15 -0600
From: Alex Williamson
Message-ID: <20160920102415.33ed7774@t450s.home>
In-Reply-To: <57E0A407.4020305@intel.com>
Subject: Re: [Qemu-devel] [PATCH v7 2/4] vfio: VFIO driver for mediated devices
To: Jike Song
Cc: Kirti Wankhede, kraxel@redhat.com, Dong Jia, kevin.tian@intel.com, cjia@nvidia.com, kvm@vger.kernel.org, qemu-devel@nongnu.org, pbonzini@redhat.com

On Tue, 20 Sep 2016 10:50:47 +0800
Jike Song wrote:

> On 09/20/2016 04:03 AM, Alex Williamson wrote:
> > On Tue, 20 Sep 2016 00:43:15 +0530
> > Kirti Wankhede wrote:
> >
> >> On 9/20/2016 12:06 AM, Alex Williamson wrote:
> >>> On Mon, 19 Sep 2016 23:52:36 +0530
> >>> Kirti Wankhede wrote:
> >>>
> >>>> On 8/26/2016 7:43 PM, Kirti Wankhede wrote:
> >>>>> * PGP Signed: 08/26/2016 at 07:15:44 AM, Decrypted
> >>>>> On 8/25/2016 2:52 PM, Dong Jia wrote:
> >>>>>> On Thu, 25 Aug 2016 09:23:53 +0530
> >>>>
> >>>>>>> +
> >>>>>>> +static ssize_t vfio_mdev_read(void *device_data, char __user *buf,
> >>>>>>> +                              size_t count, loff_t *ppos)
> >>>>>>> +{
> >>>>>>> +        struct vfio_mdev *vmdev = device_data;
> >>>>>>> +        struct mdev_device *mdev = vmdev->mdev;
> >>>>>>> +        struct parent_device *parent = mdev->parent;
> >>>>>>> +        unsigned int done = 0;
> >>>>>>> +        int ret;
> >>>>>>> +
> >>>>>>> +        if (!parent->ops->read)
> >>>>>>> +                return -EINVAL;
> >>>>>>> +
> >>>>>>> +        while (count) {
> >>>>>> Here I have to apologize: I didn't notice the bad impact of this
> >>>>>> change on my patches during the v6 discussion.
> >>>>>>
> >>>>>> For vfio-ccw, I introduced an I/O region to input/output I/O
> >>>>>> instruction parameters and results for QEMU. The @count for this
> >>>>>> data is currently 140. So supporting arbitrary lengths in one shot
> >>>>>> here, and also in vfio_mdev_write, seems like the better option for
> >>>>>> this case.
> >>>>>>
> >>>>>> I believe that if the PCI drivers want to iterate in 4-byte steps,
> >>>>>> they can do that in the parent read/write callbacks instead.
> >>>>>>
> >>>>>> What do you think?
> >>>>>>
> >>>>>
> >>>>> I would like to know Alex's thoughts on this. He raised a concern
> >>>>> with this approach in the v6 reviews:
> >>>>> "But I think this is exploitable, it lets the user make the kernel
> >>>>> allocate an arbitrarily sized buffer."
> >>>>>
> >>>>
> >>>> Read/write callbacks are for the slow path: emulation of MMIO
> >>>> regions, which are mainly device registers. I do feel they shouldn't
> >>>> support arbitrary lengths.
> >>>> Alex, I would like to know your thoughts.
> >>>
> >>> The exploit was that the mdev layer allocated a buffer and copied the
> >>> entire user buffer into kernel space before passing it to the vendor
> >>> driver. The solution is to pass the __user *buf to the vendor driver
> >>> and let them sanitize and split it however makes sense for their
> >>> device. We shouldn't be assuming naturally aligned, up-to-dword
> >>> accesses in the generic mdev layers. Those sorts of semantics are
> >>> defined by the device type. This is another case where making
> >>> the mdev layer as thin as possible is actually the best thing to
> >>> do to make it really device type agnostic. Thanks,
> >>>
> >>
> >>
> >> Alex,
> >>
> >> These were the comments on the v6 patch:
> >>
> >>>>> Do we really need to support arbitrary lengths in one shot? Seems
> >>>>> like we could just use a 4 or 8 byte variable on the stack and
> >>>>> iterate until done.
> >>>>>
> >>>>
> >>>> We just want to pass the arguments to the vendor driver as-is here.
> >>>> The vendor driver can take care of that.
> >>
> >>> But I think this is exploitable, it lets the user make the kernel
> >>> allocate an arbitrarily sized buffer.
> >>
> >> As per the above discussion on the v7 version, this module doesn't
> >> allocate memory from the heap.
> >>
> >> If a vendor driver allocates arbitrary memory in kernel space through
> >> the mdev module interface, wouldn't that be an exploit?
> >
> > Yep, my 4-/8-byte chunking idea was too PCI-specific. If a vendor
> > driver introduces an exploit, that's a bug in the vendor driver. I'm
> > not sure if we can or should attempt to guard against that. Ultimately
> > the vendor driver is either open source, and we can inspect it for
> > such exploits, or it's closed source, taints the kernel, and we hope
> > for the best. It might make a good unit test to perform substantially
> > sized reads/writes to the mdev device.
>
> Can't agree more! :-)
>
> > Perhaps the only sanity test we can make in the core is to verify
> > that the access doesn't exceed the size of the region as advertised
> > by the vendor driver. Thanks,
>
> Even performing a lightweight sanity check would require vfio-mdev to
> be able to decode ppos into a particular region, which means
> information about all the regions would have to be stored in the
> framework. I guess that is not your preferred way :)

There's certainly a trade-off there. We don't support dynamic regions;
the user expects them to be stable, and the mdev-core code can expect
that as well. It might simplify the vendor drivers slightly if the core
could perform such a basic sanity test, but the cost of doing so is
that the core needs to have an understanding of the region layout of
the device. That seems like non-trivial overhead to consolidate testing
that the vendor driver itself can do much more efficiently. Thanks,

Alex
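
P.S. For concreteness, the thin pass-through model looks roughly like
this. It's only a minimal sketch: the struct names follow the quoted
v7 patch, but the vendor callback taking the __user pointer directly
is the proposed change, not posted code.

static ssize_t vfio_mdev_read(void *device_data, char __user *buf,
                              size_t count, loff_t *ppos)
{
        struct vfio_mdev *vmdev = device_data;
        struct mdev_device *mdev = vmdev->mdev;
        struct parent_device *parent = mdev->parent;

        if (!parent->ops->read)
                return -EINVAL;

        /*
         * No kernel-side bounce buffer and no chunking here: the user
         * pointer goes straight to the vendor driver, which sanitizes
         * and splits the access however suits its device type.
         */
        return parent->ops->read(mdev, buf, count, ppos);
}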
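
A PCI-type vendor driver that wants the dword iteration could then do
the chunking in its own callback, per Dong Jia's suggestion. Another
hypothetical sketch: my_vendor_reg_read() is a made-up register
emulation helper, and naturally aligned accesses are assumed.

static ssize_t my_vendor_read(struct mdev_device *mdev, char __user *buf,
                              size_t count, loff_t *ppos)
{
        unsigned int done = 0;

        while (count) {
                size_t filled = min_t(size_t, count, 4);
                u32 val = my_vendor_reg_read(mdev, *ppos); /* hypothetical */

                /* Copy to the user buffer at most 4 bytes at a time. */
                if (copy_to_user(buf + done, &val, filled))
                        return -EFAULT;

                count -= filled;
                done += filled;
                *ppos += filled;
        }

        return done;
}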
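
And the core-level bounds check weighed above would look something
like the following. It presumes the core caches a region table built
from the vendor driver's region info, which is exactly the overhead in
question; the vfio-pci-style offset macros and the nr_regions/regions[]
fields are hypothetical.

static bool vfio_mdev_access_ok(struct vfio_mdev *vmdev, loff_t pos,
                                size_t count)
{
        /* Hypothetical vfio-pci-style encoding of the region in *ppos. */
        unsigned int index = VFIO_MDEV_OFFSET_TO_INDEX(pos);
        u64 off = pos & VFIO_MDEV_OFFSET_MASK;

        if (index >= vmdev->nr_regions)
                return false;

        /* Reject any access that runs past the advertised region size. */
        return count <= vmdev->regions[index].size &&
               off <= vmdev->regions[index].size - count;
}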