From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=56738 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1OZnha-0001xL-Vd
	for qemu-devel@nongnu.org; Fri, 16 Jul 2010 12:23:56 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from <anthony@codemonkey.ws>) id 1OZnhZ-0006Ld-0M
	for qemu-devel@nongnu.org; Fri, 16 Jul 2010 12:23:54 -0400
Received: from mail-qy0-f173.google.com ([209.85.216.173]:61211)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <anthony@codemonkey.ws>) id 1OZnhY-0006LY-TX
	for qemu-devel@nongnu.org; Fri, 16 Jul 2010 12:23:52 -0400
Received: by qyk7 with SMTP id 7so339541qyk.4
	for <qemu-devel@nongnu.org>; Fri, 16 Jul 2010 09:23:52 -0700 (PDT)
Message-ID: <4C4085DF.1080307@codemonkey.ws>
Date: Fri, 16 Jul 2010 11:16:31 -0500
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [PATCH] Make default invocation of block drivers
	safer
References: <1279123952-1576-1-git-send-email-aliguori@us.ibm.com>	<20100714184311.GA9383@lst.de>
	<4C3E06DA.8080302@codemonkey.ws>	<m3pqyoai3f.fsf@blackfin.pond.sub.org>	<4C3F355E.3020800@codemonkey.ws>
	<m31vb3za1i.fsf@blackfin.pond.sub.org>
In-Reply-To: <m31vb3za1i.fsf@blackfin.pond.sub.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Markus Armbruster <armbru@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Christoph Hellwig <hch@lst.de>, Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>, qemu-devel@nongnu.org

On 07/16/2010 11:06 AM, Markus Armbruster wrote:
> Anthony Liguori<anthony@codemonkey.ws>  writes:
>
>    
>> On 07/15/2010 10:19 AM, Markus Armbruster wrote:
>>      
>>> Anthony Liguori<anthony@codemonkey.ws>   writes:
>>>
>>>
>>>        
>>>> On 07/14/2010 01:43 PM, Christoph Hellwig wrote:
>>>>
>>>>          
>>>>> Err, strong NACK.  Please don't start messing with the contents of the
>>>>> data plane, we're getting into real trouble there.  It's perfectly
>>>>> valid for a guest to create an image inside an image, and with hardware
>>>>> support for nested virtualization I guess this use case will become
>>>>> rather common, just as it already is on S/390 with VM.
>>>>>
>>>>>
>>>>>            
>>>> Then we have to remove block format probing.
>>>>
>>>> The two things are fundamentally incompatible.
>>>>
>>>>          
>>> I agree with Christoph: changing guest writes is a big no-no, and
>>> changing them silently is even worse.
>>>
>>>        
>> I do sympathize.  The problem is we're already doing this.  This patch
>> simply changes the behavior to not be a security problem.  I've
>> committed it to attempt to resolve that security problem.  However, we
>> still have a problem and I don't consider the issue closed.
>>
>>      
>>> I could perhaps accept EIO.  Elsewhere in this thread you wrote that you
>>> rejected that approach because "it would trigger the stop-on-error
>>> behavior and the result would be far too difficult for a management
>>> tool/person to deal with."  I think that would be *far* superior in
>>> fact: it fails spectacularly, immediately and safely instead of silently
>>> corrupting disk contents.
>>>
>>>        
>> There's really nothing wrong with this type of write, so EIO doesn't
>> solve the problem.  While we can argue whether writing zeros or EIO is
>> a "better bad" solution, let's try to figure out a good solution.
>>
>>      
>>> The real problem in need of fixing is the unsafe default.  You wrote
>>> that "most users want block probing".  I disagree.  Users want to set up
>>> drives with as little hassle as possible.  If format is optional, and
>>> appears to work, why bother specifying it?
>>>        
>> I really think specifying the format is a burden that is nice to avoid.
>>      
> Yes, users don't like having to specify the "obvious".
>
>    
>> I have another idea that I hope will solve the problem in a more
>> complete way.  The fundamental issue is that it's impossible to probe
>> raw images reliably.  We can probe qcow2, vmdk, etc but not raw.
>>
>> So, let's do the following: have raw_probe() always fail.  Probing
>> shouldn't be a heuristic, it should be an absolute.  We can't prove
>> it's a raw image, so we should always fail.
>>      
> Note: if we stop right here, the security hole is patched, but use of
> raw images requires explicit specification of format.
>
>    
>> To accomodate current use-cases with raw, let's introduce a new format
>> called "probed_raw".  probed_raw's semantics will be the following:
>>
>> The signature of a probed_raw will be ~{'QFI\xfb', 'VMDK', 'COWD',
>> OOOM', ...}.  If the signature is 'QRAW', then instead of reading the
>> first sector at offset 0, we read the first sector at offset LENGTH.
>> If the signature is 'QRAW', LENGTH is computed by calculating
>> FILE_SIZE - 512.
>>
>> For probed_raw, write requests to sector 0 are checked.  If the first
>> four bytes is an invalid probed_raw signature or QRAW, we write a QRAW
>> signature to file offset 0 and copy the first sector to the end of the
>> file redirecting reads and writes to the end of file.
>>      
> Doesn't this require an image that can grow?  What about host block
> devices?
>    

I don't believe we probe host block devices.  We assume they're raw 
which means they would never be probed_raw.

>> An approach like this has the following properties:
>>
>> 1) We can make the bdrv_probe check 100% reliable and return a boolean.
>> 2) In the cases where we known format=raw, none of this code is ever
>> invoked.
>> 3) probed_raw images usually look exactly like raw images in most cases
>> 4) In the degenerate cases, probe_raw images are still mountable in
>> the normal way.
>> 5) Even after the QRAW signature is applied, if the guest writes a
>> valid signature, we can truncate the file and make it appear as a
>> normal raw image.
>>
>> Christoph/Markus/Stefan, does this seem like a more reasonable approach?
>>      
> I'm not convinced it's a good idea.  It's clearly a less bad idea,
> though :)
>
> It avoids guest-visible lossage, and that's good.
>
> There's still host-visible lossage: as soon as we redirect sector 0, the
> image isn't raw anymore, and accessing it with non-qemu tools (say
> losetup + kpartx) no longer works.  You need to know what QEMU did to
> your no-longer-raw image to work around the lossage (say losetup -o
> 512).
>    

Yeah, but as previously discussed, we can't probe raw.  So probed_raw 
ends up being a compromise.

>>>     That they get an unsafe
>>> default that way is a big surprise to them.  And I can't blame them!
>>> Users can reasonably expect programs not to trap them.
>>>
>>> If we want to let users define drives without having to specify the
>>> format, we can guess the format from the file name.
>>>        
> I still think guessing the format from the file name is a better
> way to spare users from having to specify formats.
>    
I think that would be true if we did it from day 1 but it would be a 
huge impact to users if we did it today.

Regards,

Anthony Liguori