From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=56738 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1OZnha-0001xL-Vd for qemu-devel@nongnu.org; Fri, 16 Jul 2010 12:23:56 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69) (envelope-from ) id 1OZnhZ-0006Ld-0M for qemu-devel@nongnu.org; Fri, 16 Jul 2010 12:23:54 -0400 Received: from mail-qy0-f173.google.com ([209.85.216.173]:61211) by eggs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1OZnhY-0006LY-TX for qemu-devel@nongnu.org; Fri, 16 Jul 2010 12:23:52 -0400 Received: by qyk7 with SMTP id 7so339541qyk.4 for ; Fri, 16 Jul 2010 09:23:52 -0700 (PDT) Message-ID: <4C4085DF.1080307@codemonkey.ws> Date: Fri, 16 Jul 2010 11:16:31 -0500 From: Anthony Liguori MIME-Version: 1.0 Subject: Re: [Qemu-devel] [PATCH] Make default invocation of block drivers safer References: <1279123952-1576-1-git-send-email-aliguori@us.ibm.com> <20100714184311.GA9383@lst.de> <4C3E06DA.8080302@codemonkey.ws> <4C3F355E.3020800@codemonkey.ws> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Markus Armbruster Cc: Kevin Wolf , Christoph Hellwig , Stefan Hajnoczi , qemu-devel@nongnu.org On 07/16/2010 11:06 AM, Markus Armbruster wrote: > Anthony Liguori writes: > > >> On 07/15/2010 10:19 AM, Markus Armbruster wrote: >> >>> Anthony Liguori writes: >>> >>> >>> >>>> On 07/14/2010 01:43 PM, Christoph Hellwig wrote: >>>> >>>> >>>>> Err, strong NACK. Please don't start messing with the contents of the >>>>> data plane, we're getting into real trouble there. It's perfectly >>>>> valid for a guest to create an image inside an image, and with hardware >>>>> support for nested virtualization I guess this use case will become >>>>> rather common, just as it already is on S/390 with VM. >>>>> >>>>> >>>>> >>>> Then we have to remove block format probing. >>>> >>>> The two things are fundamentally incompatible. >>>> >>>> >>> I agree with Christoph: changing guest writes is a big no-no, and >>> changing them silently is even worse. >>> >>> >> I do sympathize. The problem is we're already doing this. This patch >> simply changes the behavior to not be a security problem. I've >> committed it to attempt to resolve that security problem. However, we >> still have a problem and I don't consider the issue closed. >> >> >>> I could perhaps accept EIO. Elsewhere in this thread you wrote that you >>> rejected that approach because "it would trigger the stop-on-error >>> behavior and the result would be far too difficult for a management >>> tool/person to deal with." I think that would be *far* superior in >>> fact: it fails spectacularly, immediately and safely instead of silently >>> corrupting disk contents. >>> >>> >> There's really nothing wrong with this type of write, so EIO doesn't >> solve the problem. While we can argue whether writing zeros or EIO is >> a "better bad" solution, let's try to figure out a good solution. >> >> >>> The real problem in need of fixing is the unsafe default. You wrote >>> that "most users want block probing". I disagree. Users want to set up >>> drives with as little hassle as possible. If format is optional, and >>> appears to work, why bother specifying it? >>> >> I really think specifying the format is a burden that is nice to avoid. >> > Yes, users don't like having to specify the "obvious". > > >> I have another idea that I hope will solve the problem in a more >> complete way. The fundamental issue is that it's impossible to probe >> raw images reliably. We can probe qcow2, vmdk, etc but not raw. >> >> So, let's do the following: have raw_probe() always fail. Probing >> shouldn't be a heuristic, it should be an absolute. We can't prove >> it's a raw image, so we should always fail. >> > Note: if we stop right here, the security hole is patched, but use of > raw images requires explicit specification of format. > > >> To accomodate current use-cases with raw, let's introduce a new format >> called "probed_raw". probed_raw's semantics will be the following: >> >> The signature of a probed_raw will be ~{'QFI\xfb', 'VMDK', 'COWD', >> OOOM', ...}. If the signature is 'QRAW', then instead of reading the >> first sector at offset 0, we read the first sector at offset LENGTH. >> If the signature is 'QRAW', LENGTH is computed by calculating >> FILE_SIZE - 512. >> >> For probed_raw, write requests to sector 0 are checked. If the first >> four bytes is an invalid probed_raw signature or QRAW, we write a QRAW >> signature to file offset 0 and copy the first sector to the end of the >> file redirecting reads and writes to the end of file. >> > Doesn't this require an image that can grow? What about host block > devices? > I don't believe we probe host block devices. We assume they're raw which means they would never be probed_raw. >> An approach like this has the following properties: >> >> 1) We can make the bdrv_probe check 100% reliable and return a boolean. >> 2) In the cases where we known format=raw, none of this code is ever >> invoked. >> 3) probed_raw images usually look exactly like raw images in most cases >> 4) In the degenerate cases, probe_raw images are still mountable in >> the normal way. >> 5) Even after the QRAW signature is applied, if the guest writes a >> valid signature, we can truncate the file and make it appear as a >> normal raw image. >> >> Christoph/Markus/Stefan, does this seem like a more reasonable approach? >> > I'm not convinced it's a good idea. It's clearly a less bad idea, > though :) > > It avoids guest-visible lossage, and that's good. > > There's still host-visible lossage: as soon as we redirect sector 0, the > image isn't raw anymore, and accessing it with non-qemu tools (say > losetup + kpartx) no longer works. You need to know what QEMU did to > your no-longer-raw image to work around the lossage (say losetup -o > 512). > Yeah, but as previously discussed, we can't probe raw. So probed_raw ends up being a compromise. >>> That they get an unsafe >>> default that way is a big surprise to them. And I can't blame them! >>> Users can reasonably expect programs not to trap them. >>> >>> If we want to let users define drives without having to specify the >>> format, we can guess the format from the file name. >>> > I still think guessing the format from the file name is a better > way to spare users from having to specify formats. > I think that would be true if we did it from day 1 but it would be a huge impact to users if we did it today. Regards, Anthony Liguori