From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44607) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XmgIh-0001RT-H1 for qemu-devel@nongnu.org; Fri, 07 Nov 2014 04:57:57 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XmgIX-0004R6-G3 for qemu-devel@nongnu.org; Fri, 07 Nov 2014 04:57:51 -0500 Received: from mx1.redhat.com ([209.132.183.28]:43717) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XmgIX-0004Qo-5D for qemu-devel@nongnu.org; Fri, 07 Nov 2014 04:57:41 -0500 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id sA79vdWY021369 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL) for ; Fri, 7 Nov 2014 04:57:40 -0500 From: Markus Armbruster References: <87lhnq3iul.fsf@blackfin.pond.sub.org> <5459E210.2020008@redhat.com> <87a944y0od.fsf@blackfin.pond.sub.org> <545B6F4F.4050106@redhat.com> Date: Fri, 07 Nov 2014 10:57:36 +0100 In-Reply-To: <545B6F4F.4050106@redhat.com> (Max Reitz's message of "Thu, 06 Nov 2014 13:53:35 +0100") Message-ID: <87fvdvcoz3.fsf@blackfin.pond.sub.org> MIME-Version: 1.0 Content-Type: text/plain Subject: Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Max Reitz Cc: Kevin Wolf , Jeff Cody , qemu-devel@nongnu.org, Stefan Hajnoczi Max Reitz writes: > On 2014-11-06 at 13:26, Markus Armbruster wrote: >> Max Reitz writes: >> >>> On 2014-11-04 at 19:45, Markus Armbruster wrote: >>>> I'll try to explain all solutions fairly. Isn't easy when you're as >>>> biased towards one of them as I am. Please bear with me. >>>> >>>> >>>> = The trust boundary between image contents and meta-data = >>>> >>>> A disk image consists of image contents and meta-data. >>>> >>>> Example: all of a raw image's contents is image contents. Leaves just >>>> file name and attributes for meta-data. >>>> >>>> Example: QCOW2 meta-data includes header, header extensions, L1 table, >>>> L2 tables, ... The meta-data defines where in the image the actual >>>> contents is stored. >>>> >>>> A guest can access the image contents, not the meta-data. >>>> >>>> Image contents you've let an untrusted guest write is untrusted. >>>> >>>> Therefore, there's a trust boundary between image contents and >>>> meta-data. QEMU has to trust image meta-data, but shouldn't trust image >>>> contents. The exact location of the trust boundary depends on the image >>>> format. >>>> >>>> >>>> = How we instruct QEMU what to trust = >>>> >>>> By configuring QEMU to use an image, the user instructs QEMU to trust >>>> the image's meta-data. >>>> >>>> When the user's configuration specifies the image format explicitly, the >>>> trust boundary is clear. >>>> >>>> Else, the trust boundary is ambigous when more than one format is >>>> possible. >>>> >>>> QEMU resolves this ambiguity by picking the first format with the >>>> highest "score". Raw format is always possible, and always has the >>>> lowest score. >>>> >>>> >>>> = How this lets the guest escape isolation = >>>> >>>> Unfortunately, this lets the guest shift the trust boundary and escape >>>> isolation, as follows: >>>> >>>> * Expose a raw image to the guest (whether you specify the format=raw or >>>> let QEMU guess it doesn't matter). The complete contents becomes >>>> untrusted. >>>> >>>> * Reuse the image *without* specifying the raw format. QEMU guesses the >>>> format based on untrusted image contents. Now QEMU guesses a format >>>> chosen by the guest, with meta-data chosen by the guest. By >>>> controlling image meta-data, the malicious guest can access arbitrary >>>> files as QEMU, enlarge its storage, and more. A non-malicious guest >>>> can accidentally DoS itself, by writing a pattern probing recognizes. >>> Thank you for bringing that to my attention. This means that I'm even >>> more in favor of using Kevin's patches because in fact they don't >>> break anything. >> They break things differently. The difference may or may not matter. >> >> Example: innocent guest writes a recognized pattern. >> >> Now: next restart fails, guest DoSed itself until host operator gets >> around to adding format=raw to the configuration. Consequence: >> downtime (probably lengthy), but no data corruption. >> >> With Kevin's patch: write fails, guest may or may not handle the >> failure gracefully. Consequences can range from "guest complains to >> its logs (who cares)" via "guest stops whatever it's doing and refuses >> to continue until its hardware gets fixed (downtime as above)" to >> "data corruption". > > You somehow seem convinced that writing to sector 0 is a completely > normal operation. For x86, it isn't, though. > > There are only a couple of programs which do that, I can only think of > partitioning and setting up boot loaders. There's not a myriad of > programs which would increase the probability of one both writing a > recognizable pattern *and* not handling EPERM correctly. > > I see the probability of both happening at the same time as extremely > low, not least because there are only a handful of programs which > access that sector. > >> Example: innocent guest first writes a recognized pattern, then >> overwrites it with a non-recognized pattern. >> >> Now: works. >> >> With Kevin's patch: as above. > > Not really, if the guest overwrites the data with a non-recognized > pattern after EPERM it works as well. The difference here is that you > won't have the intended data in the meantime. That's roughly "guest complains to its logs (who cares)" from the text referenced by "as above". The other consequences are possibly here, too. >> Again, I'm not claiming the differences are serious in practice, only >> that they exist. > > True. > >>>> This is CVE-2008-2004. >>>> >>>> >>>> = Aside: other trust boundaries = >>>> >>>> Of course, this is not the only trust boundary that matters. For >>>> instance, there's normally one between your host and somebody else's >>>> computers. Telling QEMU to trust meta-data of some image you got "from >>>> the internet" violates it. There's nothing QEMU can do about that. >>>> >>>> >>>> = Insecure usage is easy, secure usage is hard = >>>> >>>> The oldest stratum of user interfaces doesn't let you specify the image >>>> format. Use of raw images with these is insecure by design. These >>>> interfaces are still recommended for human users. >>>> >>>> Example of insecure usage: -hda foo.img, where foo.img is raw. >>>> >>>> With the next generation of interfaces, specifying the image format is >>>> optional. Use of raw images with these is insecure by default. >>>> >>>> Example of insecure usage: -drive file=foo.img,index=0,media=cdrom, >>>> where foo.img is raw. The -hda above is actually sugar for this. >>>> >>>> Equivalent secure usage: add format=raw. >>>> >>>> Note that specifying just the top image's format is not enough, you also >>>> have to specify any backing images' formats. QCOW2 can optionally store >>>> the backing image format in the image. The other COW formats can't. >>> Well, they can, with "json:". *cough* >> Point coughingly taken. >> >>>> Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image >>>> with a raw backing file. >>> Yesterday I found out that doesn't seem possible. You apparently can >>> only use VMDK with VMDK backing files. >> I figure you're referring to this code in vmdk_create(): >> >> if (strcmp(bs->drv->format_name, "vmdk")) { >> bdrv_unref(bs); >> ret = -EINVAL; >> goto exit; >> } > > Yes. Of course it does depend on probing, which figures, I guess. > >>> Other than that, we only have >>> qcow1 and qed as COW formats which should not be used anyway. >> qemu-doc.texi calls them "old image format", and qemu-img.texi has them >> under "Other", "for compatibility with older QEMU versions". I guess we >> could do better job telling users they "should not be used anyway". >> >> Even in old stuff retained just for compatibility, we should make an >> effort to plug security holes, make secure usage easy, and guide users >> away from insecure usage. > > Of course, I'm just pointing out that it seems to be only qcow1 which > is fully subject to this. qcow2 is partially, as you're pointing out > next. > >> Now back to the point I was trying to make in my original message. >> >> Replacement example of insecure usage: -hda bar.qcow2, where bar.qcow2 >> is a QCOW2 image with a raw backing file and no backing image format, >> i.e. created without "-o backing_format=". >> >> Then the next paragraph applies: >> >>>> Equivalent secure usage: Beats me. Maybe there's a funky -drive >>>> backing.whatever to specify the backing image's format. >> See Kevin's reply for equivalent secure usage. >> >>>> With the latest interface blockdev-add, specifying the format is >>>> mandatory. Secure, but not really suitable for humans. >>>> >>>> Example of secure usage: >>>> >>>> { "execute": "blockdev-add", >>>> "arguments": {'options': { >>>> 'driver': 'raw', 'id':'foo', >>>> 'file': { 'driver': 'file', 'filename': 'foo.img' } } } } >>>> >>>> Insecure usage is easy, secure usage is *hard*. Even for sophisticated >>>> users like libvirt developers. Evidence: libvirt CVE-2010-2237, >>>> CVE-2010-2238, CVE-2010-2239, and more that didn't get a CVE, like the >>>> recent accidental probing in drive-mirror. >>>> >>>> >>>> = How can we better guard the trust boundary in QEMU? = >>>> >>>> The guest can violate the trust boundary only because >>>> >>>> (a) QEMU supports both raw images and image formats, and >>>> >>>> (b) QEMU guesses image format from raw image contents, and >>>> >>>> (c) given a raw image, guests can change its contents and control a >>>> future QEMU's format guess. >>>> >>>> We can attack one ore more of these three conditions: >>> I'd like to attack more because any of these steps might be carried >>> out in another program which thus either becomes vulnerable itself >>> (which we don't really have to care about, but I'd like to either way) >>> or which makes qemu vulnerable. >>> >>> Having an external program with (a) and (c), this makes qemu >>> vulnerable if we don't try to forbid (b) or at least make it work >>> better. Having an external program with (a) and (b), not doing >>> anything against (c) in qemu makes that external program vulnerable. >>> >>>> (a) Outlaw raw images >>>> >>>> (b) Don't guess format from untrusted image contents >>>> >>>> (c) Prevent "bad" guest writes >>>> >>>> Nobody is seriously suggesting we do (a). It's clearly too late for >>>> that. Let's explore the other two in more detail. >>> And thus I prefer to find and implement solutions for *both* (b) and (c). >> Quoting myself: "We can attack one ore more of these three conditions". > > Yes, and I referred to that by saying "I'd like to attack more". Understood. >>>> == Don't guess format from untrusted image contents == >>>> >>>> Several variations of the theme. >>>> >>>> Guessing only happens when the user doesn't specify a format, so the >>>> simplest way to avoid it would be requiring users to always specify the >>>> format. >>>> >>>> PRO: Simple, plugs the hole. >>>> >>>> CON: Breaks a lot of existing usage. Bye -hda, hello extra typing. >>>> >>>> Variation: command line option to opt out of probing completely. >>>> >>>> PRO: Simple, plugs the hole. >>>> >>>> CON: Insecure by default. >>>> >>>> CON: In addition to opting out, you have to update your usage >>>> accordingly. I guess libvirt would do it anyway, to guard against >>>> accidental probing once and for all. >>>> >>>> Variation: Stefan proposed to make format= mandatory just for -drive. I >>>> guess he rather meant mandatory for anything but -hda and other selected >>>> convenience interfaces. >>>> >>>> PRO: No extra typing in the cases where it makes the most difference. >>>> >>>> CON: Breaks existing usage in the other cases, extra typing. >>>> >>>> CON: Doesn't fully plug the hole. Users of convenience interfaces may >>>> remain insecure out of ignorance. We could add a warning to guide >>>> them to more secure usage, but then that warning would annoy users >>>> who don't care for security (sometimes with reason), and we can't >>>> have that. So we silently leave the users who would care if they >>>> knew insecure. >>>> >>>> I proposed something less radical, namely to keep guessing the image >>>> format, but base the guess on trusted meta-data only: file name and >>>> attributes. >>> You actually want to completely abolish probing? I thought we just >>> wanted to use the filename as a second source of information and warn >>> if the contents and the extension don't seem to match; and in the >>> future, maybe make it an error (but we don't have to discuss that >>> second part now, I think). >> Yes, I propose to ditch it completely, after a suitable grace period. I >> tried to make that quite clear in my PATCH RFC 2/2. >> >> Here's why. >> >> Now: 1. probe >> 4. open, error out if meta-data is bad >> >> With my proposed patch: >> 1. probe >> 2. guess from trusted meta-data >> 3. warn unless 1 and 2 match >> 4. open, error out if meta-data is bad >> >> Now make the warning an error: >> 1. probe >> 2. guess from trusted meta-data >> 3. error out unless 1 and 2 match >> 4. open, error out if meta-data is bad >> >> I figure the following is equivalent, but simpler: >> >> 2. guess from trusted meta-data >> 4. open, error out if meta-data is bad >> >> because open should detect all the errors the previous step 3 caught. >> If not, things are broken with explicit format=. > > You're right, it seems be equivalent. One difference will probably be > error messages ("Bad signature" vs. "Filename extension and format do > not match"). Yes. > The other difference is that you have a problem if you cannot > distinguish between two formats by extensions, as we've seen > already. .qcow could mean both qcow1 and qcow2; a similar problem > appeared with .vhd. Opening the image using all possible formats seems > bad to me. Better swap 1 and 2: Guess from commonly trusted metadata > (i.e. the filename extension) and then probe all possible formats. Fair enough. >>>> Block and character special files are raw. For other >>>> files, find the file name extension, and look up the format claiming it. >>>> >>>> PRO: Plugs the hole. >>> You mean "plugs hole (b)". >> What I (airily) call "the hole" is the scenario I described above: guest >> escaping isolation by subverting qemu-system-*. >> >> (b) is not a hole, it's a condition for "the hole". > > Your "the hole" is only one hole. There are more holes. You only > consider the case where there's only qemu, which is technically fine > for discussion on qemu-devel, but I'd like to consider having other > tools beside qemu as well. > > You're right, though, it should be "prevents condition (b)". > >> I guess what you want to say is "attacking just condition (b) doesn't >> plug some other holes I care about". That's true. There are indeed >> other holes. >> >> For instance, guest attacking QEMU's utility programs qemu-img, ... Or >> guest attacking other host software. I'm not trying to discount any of >> them. But tangling all problems up into a hairball won't do us any >> good. > > We have two approaches and I think using both will help to plug more > holes. I'm not saying we should consider every possible hole with > every host configuration there may be. I'm just saying using only of > the approaches clearly only plugs "the hole" if there's only qemu. > >> So your point that my analysis is narrow is taken. In my defense, I can >> say that my narrow analysis was difficult enough to write up, and >> probably produced enough text to deter readers. >> >>>> CON: Breaks existing usage when the new guess differs from the old >>>> guess. Common usage should be fine: >>>> >>>> * -hda test.qcow2 >>>> >>>> Fine as long as test.qcow2 is really QCOW2 (as it should!), and >>>> either specifies a backing format (as it arguably should), or the >>>> backing file name is sane. >>>> >>>> * -hda disk.img >>>> >>>> Fine as long as disk.img is really a disk image (as it should). >>>> >>>> * -hda /dev/mapper/vg0-virtdisk >>>> >>>> Fine as long as the logical volume is raw. >>>> >>>> Less common usage can break: >>>> >>>> * -hda nbd://localhost >>>> >>>> Socket provides no clue, so no guess. >>> nbd should be raw. If it isn't, you're most likely doing something >>> wrong. See https://bugzilla.redhat.com/show_bug.cgi?id=1090713 what >>> happens when you're doing it wrong. >> Okay. >> >> My RFC PATCH is too simplistic to exploit that, because it only looks at >> file name extensions. But "user asked for nbd protocol" is quite >> obviously trusted meta-data. We could have a less simplistic patch >> putting it to use. Or adopt the variation below. >> >>>> Weird usage can conceivably break hard: >>>> >>>> * -hdd disk.img >>>> >>>> Breaks hard when disk.img is actually QCOW2, the guest boots >>>> anyway from another drive, then proceeds to overwrite this one. >>> Then why not continue to do probing and using the extension as a safeguard? >>> >>>> Mitigation: lengthy transition period where we warn "this usage is >>>> insecure, and we'll eventually break it; here's a hint on secure usage". >>>> >>>> CON: We delay plugging the hole one more time. But at least we no >>>> longer expose our users to it silently. >>>> >>>> Jeff pointed out that we want to retain probing in things like qemu-img >>>> info. >>>> >>>> Richard asked for a way for users to ask for insecure probing, say >>>> format=insecure-probe. I have no problem with giving users rope when >>>> they ask for it. >>>> >>>> Variation: if file name and attributes are unavailable or provide no >>>> clue, guess raw. Same PRO and CON as above, only it avoids breaking a >>>> few more cases. For instance, "-hda nbd://localhost" keeps working as >>>> long as the server serves a raw image. >>> Which it should be. >>> >>>> Smells a bit like too much magic >>>> to me. >>>> >>>> My proposal replaces probing wholesale. I like that because it results >>>> in simple, predictable guessing. Here's a hybrid approach: first guess >>>> raw vs. non-raw based on trusted meta-data only, and if non-raw, probe. >>>> >>>> Nothing the guest writes can affect the raw vs. non-raw decision. Once >>>> an image is raw, only the user can make it non-raw, by changing its name >>>> or attributes. >>>> >>>> Two variations: 1. guess raw without a clue, and 2. guess non-raw then. >>>> >>>> Again, same PRO and CON as above, only it doesn't break when users give >>>> their non-raw images weird names. >>>> >>>> == Prevent "bad" guest writes == >>>> >>>> Again, several variations, but this time, only the last one is serious, >>>> the others are just for illustration. >>>> >>>> Fail guest writes to those parts of the image that probing may examine >>>> Can fail only writes to the first few sectors (at worst) of raw images. >>>> >>>> PRO: Plugs the hole. >>>> >>>> CON: The virtual hardware is defective. Breaks common guest software >>>> that writes to the first few sectors, such as boot loaders and >>>> partitioning tools. Breaks guest software using the whole device, which >>>> isn't common, but certainly not unheard of. >>>> >>>> Variation: fail only writes of patterns that actually can make probing >>>> guess something other than raw. >>>> >>>> PRO: Still plugs the hole. >>> You mean "plugs hole (c)". >> My reply to "plugs hole (b)" applies. >> >>>> CON: Except when you upgrade to a version that recognizes more patterns. >>> Which is better than not plugging hole (c) at all. >>> >>>> CON: The virtual hardware is still defective, but the defects are >>>> minimized. >>> As you pointed out to us it's already defective and I don't think >>> anybody ever noticed accidentally. >> You're right in that probed raw is already defective, with defects >> delayed to the next restart. Preventing "bad" guest writes changes the >> nature of the defects subtly, as I described above. > > Sector 0 is rarely ever written. It's not that people write some > recognizable sequences there all the time but don't notice because > they are quickly overwritten again. > > If nobody hit the problem accidentally until now, I'm certain that > means that nobody ever wrote any recognizable sequence there while > using qemu and raw (which is not too rare a combination). > >> This and the previous variation also extends them to non-probed raw. >> The following variations avoid the extension. >> >>>> We can hope that partition tables, boot sectors and such >>>> won't match the patterns, so common guest software hopefully works. >>> It's worked in the past, that's good enough for me. >>> >>>> Guest software using the whole device still breaks, only now it breaks >>>> later rather than sooner. >>>> >>>> Variation: fail writes only on *probed* raw images. >>>> >>>> CON: Doesn't fully plug the hole: mixing probed usage (user doesn't >>>> specify format) with non-probed usage (user specifies format) remains >>>> insecure. The guest's write succeeds in non-probed usage, and the guest >>>> escapes isolation in the next probed usage. >>>> >>>> CON: The virtual hardware is still defective, but it now comes with a >>>> "defective on/off" switch, factory default "defective on". We could add >>>> a warning to guide users to switch defective off but then that warning >>>> would annoy people who don't care to switch it off (sometimes with >>>> reason), and we can't have that. So we leave users who would care if >>>> they knew in the dark. >>>> >>>> The two variations can be combined. This is Kevin's proposal. >>>> >>>> CON: Doesn't fully plug the hole: union of both variations' flaws. >>>> >>>> CON: The virtual hardware is still defective: interesection of both >>>> variations' defects. >>>> >>>> >>>> = Conclusion = >>>> >>>> This is *my* conclusion. Yours may be different, and that's okay. It's >>>> business, not personal ;) >>>> >>>> Secure usage should not be hard. >>>> >>>> If we permit insecure usage for convenience or compatibility, we should >>>> warn the user, unless he clearly asked for it. Even if that warning >>>> annoys Kevin and Max. >>> A warning does not annoy me as long as I know what it means. >> Good :) >> >>>> If you want to suppress it with configure >>>> --reckless or something, no objections. >>>> >>>> Same for defective virtual hardware. >>>> >>>> Kevin's proposal to prevent "bad" guest writes has relatively small >>>> compatibility issues. It improves things from "insecure by default" to >>>> "slightly defective by default" as long as you use raw images either >>>> always probed or always non-probed. It doesn't help at all when you >>>> alternate probed and non-probed usage. Improvement of sorts, but it >>>> still fails my "secure usage should not be hard" requirement, and that >>>> requirement is a hard one for me. >>>> >>>> My proposal to ditch image contents probing entirely has more serious >>>> compatibility issues. In particular, we'd have to forgo sugared >>>> convenience syntax for a number of less common things. It definitely >>>> needs a grace period where all usage we're going to break warns. On the >>>> up side, it will actually be secure by default when it's done. >>>> >>>> If this is not acceptable, my second choice is actually the command line >>>> option to opt out of probing completely. This doesn't address "insecure >>>> by default" (sadly), but it does at least satisfy my "secure usage >>>> should not be hard" requirement. >>>> >>>> If we should adopt Kevin's proposal against my objections, then I very >>>> badly want the opt out option on top of it, opting out of both probing >>>> and "bad" write prevention. >>>> >>>> Thanks for hearing me out. >>> My conclusion: Don't ditch probing. It increases entropy, why would >>> you ditch probing? Just combine it with the extension and if both >>> don't seem to match, that's an error. >> How does probing add value? >> >> Compare >> >> 1. probe >> 2. guess from trusted meta-data >> 3. error out unless 1 and 2 match >> 4. open, error out if meta-data is bad >> >> to just >> >> 2. guess from trusted meta-data >> 4. open, error out if meta-data is bad >> >> Let P be the driver recommended by probe, and G be the driver >> recommended by guess. >> >> If P == G, same result: we open with the same driver. >> >> Else, if open with G fails, equivalent result: error out in step 3 >> vs. error out in step 4. >> >> Else, we have an odd case: one driver's probe accepts (P's), yet a >> different driver's open succeeds (G's). >> >> If G's probe rejects: is this a bug? Shouldn't open always fail >> when probe rejects? >> >> If G's probe accepts, then probing chose P over G. Maybe it >> shouldn't have. Or maybe the probing functions are imprecise. >> Anyway, keeping probing around makes this an error. Should it be >> one? >> >> Am I missing something? > > No, see my reply above. I failed to consider that opening an image > basically is advanced probing. > > The only thing I really see which could be missing is the problem of > having ambiguous filename extensions, and that problem can easily be > solved by keeping probing. > >>> Also, holes (b) and (c) are two different holes. We should fix >>> both. We should fix (b) so qemu isn't vulnerable and we should fix (c) >>> so qemu doesn't make other programs which do probe vulnerable. >> Adjusting terminology, but hopefully not your intent: "the hole" isn't >> the only hole that matters. The conditions enable other holes. >> Therefore, we should attack both (b) and (c). Attacking (a) is not >> practical. > > Yes. that's what I meant. There are various holes and preventing only > one of the conditions (b) and (c) only plugs one (or a subset) of > them, and I can easily see unplugged holes which can be fixed by > preventing both. No misunderstanding then. >> Points taken, except I think we could attack (a) if we really wanted. > > Of course, but for that we'd either need some flat qcow2 mode or > better support for an image format that does have such a flat mode. Yes.