[Qemu-devel] Image probing: how it can be insecure, and what we could do about it

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
@ 2014-11-04 18:45 Markus Armbruster
  2014-11-04 20:33 ` Jeff Cody
                   ` (6 more replies)
  0 siblings, 7 replies; 38+ messages in thread
From: Markus Armbruster @ 2014-11-04 18:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Jeff Cody, Stefan Hajnoczi, Max Reitz

I'll try to explain all solutions fairly.  Isn't easy when you're as
biased towards one of them as I am.  Please bear with me.

= The trust boundary between image contents and meta-data =

A disk image consists of image contents and meta-data.

Example: all of a raw image's contents is image contents.  Leaves just
file name and attributes for meta-data.

Example: QCOW2 meta-data includes header, header extensions, L1 table,
L2 tables, ...  The meta-data defines where in the image the actual
contents is stored.

A guest can access the image contents, not the meta-data.

Image contents you've let an untrusted guest write is untrusted.

Therefore, there's a trust boundary between image contents and
meta-data.  QEMU has to trust image meta-data, but shouldn't trust image
contents.  The exact location of the trust boundary depends on the image
format.

= How we instruct QEMU what to trust =

By configuring QEMU to use an image, the user instructs QEMU to trust
the image's meta-data.

When the user's configuration specifies the image format explicitly, the
trust boundary is clear.

Else, the trust boundary is ambigous when more than one format is
possible.

QEMU resolves this ambiguity by picking the first format with the
highest "score".  Raw format is always possible, and always has the
lowest score.

= How this lets the guest escape isolation =

Unfortunately, this lets the guest shift the trust boundary and escape
isolation, as follows:

* Expose a raw image to the guest (whether you specify the format=raw or
  let QEMU guess it doesn't matter).  The complete contents becomes
  untrusted.

* Reuse the image *without* specifying the raw format.  QEMU guesses the
  format based on untrusted image contents.  Now QEMU guesses a format
  chosen by the guest, with meta-data chosen by the guest.  By
  controlling image meta-data, the malicious guest can access arbitrary
  files as QEMU, enlarge its storage, and more.  A non-malicious guest
  can accidentally DoS itself, by writing a pattern probing recognizes.

This is CVE-2008-2004.

= Aside: other trust boundaries =

Of course, this is not the only trust boundary that matters.  For
instance, there's normally one between your host and somebody else's
computers.  Telling QEMU to trust meta-data of some image you got "from
the internet" violates it.  There's nothing QEMU can do about that.

= Insecure usage is easy, secure usage is hard =

The oldest stratum of user interfaces doesn't let you specify the image
format.  Use of raw images with these is insecure by design.  These
interfaces are still recommended for human users.

Example of insecure usage: -hda foo.img, where foo.img is raw.

With the next generation of interfaces, specifying the image format is
optional.  Use of raw images with these is insecure by default.

Example of insecure usage: -drive file=foo.img,index=0,media=cdrom,
where foo.img is raw.  The -hda above is actually sugar for this.

Equivalent secure usage: add format=raw.

Note that specifying just the top image's format is not enough, you also
have to specify any backing images' formats.  QCOW2 can optionally store
the backing image format in the image.  The other COW formats can't.

Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image
with a raw backing file.

Equivalent secure usage: Beats me.  Maybe there's a funky -drive
backing.whatever to specify the backing image's format.

With the latest interface blockdev-add, specifying the format is
mandatory.  Secure, but not really suitable for humans.

Example of secure usage:

    { "execute": "blockdev-add",
      "arguments": {'options': {
          'driver': 'raw', 'id':'foo',
          'file': { 'driver': 'file', 'filename': 'foo.img' } } } }

Insecure usage is easy, secure usage is *hard*.  Even for sophisticated
users like libvirt developers.  Evidence: libvirt CVE-2010-2237,
CVE-2010-2238, CVE-2010-2239, and more that didn't get a CVE, like the
recent accidental probing in drive-mirror.

= How can we better guard the trust boundary in QEMU? =

The guest can violate the trust boundary only because

(a) QEMU supports both raw images and image formats, and

(b) QEMU guesses image format from raw image contents, and

(c) given a raw image, guests can change its contents and control a
future QEMU's format guess.

We can attack one ore more of these three conditions:

(a) Outlaw raw images

(b) Don't guess format from untrusted image contents

(c) Prevent "bad" guest writes

Nobody is seriously suggesting we do (a).  It's clearly too late for
that.  Let's explore the other two in more detail.

== Don't guess format from untrusted image contents ==

Several variations of the theme.

Guessing only happens when the user doesn't specify a format, so the
simplest way to avoid it would be requiring users to always specify the
format.

PRO: Simple, plugs the hole.

CON: Breaks a lot of existing usage.  Bye -hda, hello extra typing.

Variation: command line option to opt out of probing completely.

PRO: Simple, plugs the hole.

CON: Insecure by default.

CON: In addition to opting out, you have to update your usage
accordingly.  I guess libvirt would do it anyway, to guard against
accidental probing once and for all.

Variation: Stefan proposed to make format= mandatory just for -drive.  I
guess he rather meant mandatory for anything but -hda and other selected
convenience interfaces.

PRO: No extra typing in the cases where it makes the most difference.

CON: Breaks existing usage in the other cases, extra typing.

CON: Doesn't fully plug the hole.  Users of convenience interfaces may
    remain insecure out of ignorance.  We could add a warning to guide
    them to more secure usage, but then that warning would annoy users
    who don't care for security (sometimes with reason), and we can't
    have that.  So we silently leave the users who would care if they
    knew insecure.

I proposed something less radical, namely to keep guessing the image
format, but base the guess on trusted meta-data only: file name and
attributes.  Block and character special files are raw.  For other
files, find the file name extension, and look up the format claiming it.

PRO: Plugs the hole.

CON: Breaks existing usage when the new guess differs from the old
    guess.  Common usage should be fine:

    * -hda test.qcow2

      Fine as long as test.qcow2 is really QCOW2 (as it should!), and
      either specifies a backing format (as it arguably should), or the
      backing file name is sane.

    * -hda disk.img

      Fine as long as disk.img is really a disk image (as it should).

    * -hda /dev/mapper/vg0-virtdisk

      Fine as long as the logical volume is raw.

    Less common usage can break:

    * -hda nbd://localhost

      Socket provides no clue, so no guess.

    Weird usage can conceivably break hard:

    * -hdd disk.img

      Breaks hard when disk.img is actually QCOW2, the guest boots
      anyway from another drive, then proceeds to overwrite this one.

Mitigation: lengthy transition period where we warn "this usage is
insecure, and we'll eventually break it; here's a hint on secure usage".

CON: We delay plugging the hole one more time.  But at least we no
longer expose our users to it silently.

Jeff pointed out that we want to retain probing in things like qemu-img
info.

Richard asked for a way for users to ask for insecure probing, say
format=insecure-probe.  I have no problem with giving users rope when
they ask for it.

Variation: if file name and attributes are unavailable or provide no
clue, guess raw.  Same PRO and CON as above, only it avoids breaking a
few more cases.  For instance, "-hda nbd://localhost" keeps working as
long as the server serves a raw image.  Smells a bit like too much magic
to me.

My proposal replaces probing wholesale.  I like that because it results
in simple, predictable guessing.  Here's a hybrid approach: first guess
raw vs. non-raw based on trusted meta-data only, and if non-raw, probe.

Nothing the guest writes can affect the raw vs. non-raw decision.  Once
an image is raw, only the user can make it non-raw, by changing its name
or attributes.

Two variations: 1. guess raw without a clue, and 2. guess non-raw then.

Again, same PRO and CON as above, only it doesn't break when users give
their non-raw images weird names.

== Prevent "bad" guest writes ==

Again, several variations, but this time, only the last one is serious,
the others are just for illustration.

Fail guest writes to those parts of the image that probing may examine
Can fail only writes to the first few sectors (at worst) of raw images.

PRO: Plugs the hole.

CON: The virtual hardware is defective.  Breaks common guest software
that writes to the first few sectors, such as boot loaders and
partitioning tools.  Breaks guest software using the whole device, which
isn't common, but certainly not unheard of.

Variation: fail only writes of patterns that actually can make probing
guess something other than raw.

PRO: Still plugs the hole.

CON: Except when you upgrade to a version that recognizes more patterns.

CON: The virtual hardware is still defective, but the defects are
minimized.  We can hope that partition tables, boot sectors and such
won't match the patterns, so common guest software hopefully works.
Guest software using the whole device still breaks, only now it breaks
later rather than sooner.

Variation: fail writes only on *probed* raw images.

CON: Doesn't fully plug the hole: mixing probed usage (user doesn't
specify format) with non-probed usage (user specifies format) remains
insecure.  The guest's write succeeds in non-probed usage, and the guest
escapes isolation in the next probed usage.

CON: The virtual hardware is still defective, but it now comes with a
"defective on/off" switch, factory default "defective on".  We could add
a warning to guide users to switch defective off but then that warning
would annoy people who don't care to switch it off (sometimes with
reason), and we can't have that.  So we leave users who would care if
they knew in the dark.

The two variations can be combined.  This is Kevin's proposal.

CON: Doesn't fully plug the hole: union of both variations' flaws.

CON: The virtual hardware is still defective: interesection of both
variations' defects.

= Conclusion =

This is *my* conclusion.  Yours may be different, and that's okay.  It's
business, not personal ;)

Secure usage should not be hard.

If we permit insecure usage for convenience or compatibility, we should
warn the user, unless he clearly asked for it.  Even if that warning
annoys Kevin and Max.  If you want to suppress it with configure
--reckless or something, no objections.

Same for defective virtual hardware.

Kevin's proposal to prevent "bad" guest writes has relatively small
compatibility issues.  It improves things from "insecure by default" to
"slightly defective by default" as long as you use raw images either
always probed or always non-probed.  It doesn't help at all when you
alternate probed and non-probed usage.  Improvement of sorts, but it
still fails my "secure usage should not be hard" requirement, and that
requirement is a hard one for me.

My proposal to ditch image contents probing entirely has more serious
compatibility issues.  In particular, we'd have to forgo sugared
convenience syntax for a number of less common things.  It definitely
needs a grace period where all usage we're going to break warns.  On the
up side, it will actually be secure by default when it's done.

If this is not acceptable, my second choice is actually the command line
option to opt out of probing completely.  This doesn't address "insecure
by default" (sadly), but it does at least satisfy my "secure usage
should not be hard" requirement.

If we should adopt Kevin's proposal against my objections, then I very
badly want the opt out option on top of it, opting out of both probing
and "bad" write prevention.

Thanks for hearing me out.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-04 18:45 [Qemu-devel] Image probing: how it can be insecure, and what we could do about it Markus Armbruster
@ 2014-11-04 20:33 ` Jeff Cody
  2014-11-05  7:04   ` Markus Armbruster
  2014-11-05  7:30 ` Markus Armbruster
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 38+ messages in thread
From: Jeff Cody @ 2014-11-04 20:33 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

On Tue, Nov 04, 2014 at 07:45:38PM +0100, Markus Armbruster wrote:
> I'll try to explain all solutions fairly.  Isn't easy when you're as
> biased towards one of them as I am.  Please bear with me.
> 
> 
> = The trust boundary between image contents and meta-data =
> 
> A disk image consists of image contents and meta-data.
> 
> Example: all of a raw image's contents is image contents.  Leaves just
> file name and attributes for meta-data.
> 
> Example: QCOW2 meta-data includes header, header extensions, L1 table,
> L2 tables, ...  The meta-data defines where in the image the actual
> contents is stored.
> 
> A guest can access the image contents, not the meta-data.
> 
> Image contents you've let an untrusted guest write is untrusted.
> 
> Therefore, there's a trust boundary between image contents and
> meta-data.  QEMU has to trust image meta-data, but shouldn't trust image
> contents.  The exact location of the trust boundary depends on the image
> format.
> 
> 
> = How we instruct QEMU what to trust =
> 
> By configuring QEMU to use an image, the user instructs QEMU to trust
> the image's meta-data.
> 
> When the user's configuration specifies the image format explicitly, the
> trust boundary is clear.
> 
> Else, the trust boundary is ambigous when more than one format is
> possible.
> 
> QEMU resolves this ambiguity by picking the first format with the
> highest "score".  Raw format is always possible, and always has the
> lowest score.
> 
> 
> = How this lets the guest escape isolation =
> 
> Unfortunately, this lets the guest shift the trust boundary and escape
> isolation, as follows:
> 
> * Expose a raw image to the guest (whether you specify the format=raw or
>   let QEMU guess it doesn't matter).  The complete contents becomes
>   untrusted.
> 
> * Reuse the image *without* specifying the raw format.  QEMU guesses the
>   format based on untrusted image contents.  Now QEMU guesses a format
>   chosen by the guest, with meta-data chosen by the guest.  By
>   controlling image meta-data, the malicious guest can access arbitrary
>   files as QEMU, enlarge its storage, and more.  A non-malicious guest
>   can accidentally DoS itself, by writing a pattern probing recognizes.
> 
> This is CVE-2008-2004.
> 
> 
> = Aside: other trust boundaries =
> 
> Of course, this is not the only trust boundary that matters.  For
> instance, there's normally one between your host and somebody else's
> computers.  Telling QEMU to trust meta-data of some image you got "from
> the internet" violates it.  There's nothing QEMU can do about that.
> 
> 
> = Insecure usage is easy, secure usage is hard =
> 
> The oldest stratum of user interfaces doesn't let you specify the image
> format.  Use of raw images with these is insecure by design.  These
> interfaces are still recommended for human users.
> 
> Example of insecure usage: -hda foo.img, where foo.img is raw.
> 
> With the next generation of interfaces, specifying the image format is
> optional.  Use of raw images with these is insecure by default.
> 
> Example of insecure usage: -drive file=foo.img,index=0,media=cdrom,
> where foo.img is raw.  The -hda above is actually sugar for this.
> 
> Equivalent secure usage: add format=raw.
> 
> Note that specifying just the top image's format is not enough, you also
> have to specify any backing images' formats.  QCOW2 can optionally store
> the backing image format in the image.  The other COW formats can't.
> 
> Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image
> with a raw backing file.
> 
> Equivalent secure usage: Beats me.  Maybe there's a funky -drive
> backing.whatever to specify the backing image's format.
> 
> With the latest interface blockdev-add, specifying the format is
> mandatory.  Secure, but not really suitable for humans.
> 
> Example of secure usage:
> 
>     { "execute": "blockdev-add",
>       "arguments": {'options': {
>           'driver': 'raw', 'id':'foo',
>           'file': { 'driver': 'file', 'filename': 'foo.img' } } } }
> 
> Insecure usage is easy, secure usage is *hard*.  Even for sophisticated
> users like libvirt developers.  Evidence: libvirt CVE-2010-2237,
> CVE-2010-2238, CVE-2010-2239, and more that didn't get a CVE, like the
> recent accidental probing in drive-mirror.
> 
> 
> = How can we better guard the trust boundary in QEMU? =
> 
> The guest can violate the trust boundary only because
> 
> (a) QEMU supports both raw images and image formats, and
> 
> (b) QEMU guesses image format from raw image contents, and
> 
> (c) given a raw image, guests can change its contents and control a
> future QEMU's format guess.
> 
> We can attack one ore more of these three conditions:
> 
> (a) Outlaw raw images
> 
> (b) Don't guess format from untrusted image contents
> 
> (c) Prevent "bad" guest writes
> 
> Nobody is seriously suggesting we do (a).  It's clearly too late for
> that.  Let's explore the other two in more detail.
> 
> == Don't guess format from untrusted image contents ==
> 
> Several variations of the theme.
> 
> Guessing only happens when the user doesn't specify a format, so the
> simplest way to avoid it would be requiring users to always specify the
> format.
> 
> PRO: Simple, plugs the hole.
> 
> CON: Breaks a lot of existing usage.  Bye -hda, hello extra typing.
> 
> Variation: command line option to opt out of probing completely.
> 
> PRO: Simple, plugs the hole.
> 
> CON: Insecure by default.
> 
> CON: In addition to opting out, you have to update your usage
> accordingly.  I guess libvirt would do it anyway, to guard against
> accidental probing once and for all.
> 
> Variation: Stefan proposed to make format= mandatory just for -drive.  I
> guess he rather meant mandatory for anything but -hda and other selected
> convenience interfaces.
> 
> PRO: No extra typing in the cases where it makes the most difference.
> 
> CON: Breaks existing usage in the other cases, extra typing.
> 
> CON: Doesn't fully plug the hole.  Users of convenience interfaces may
>     remain insecure out of ignorance.  We could add a warning to guide
>     them to more secure usage, but then that warning would annoy users
>     who don't care for security (sometimes with reason), and we can't
>     have that.  So we silently leave the users who would care if they
>     knew insecure.
> 
> I proposed something less radical, namely to keep guessing the image
> format, but base the guess on trusted meta-data only: file name and
> attributes.  Block and character special files are raw.  For other
> files, find the file name extension, and look up the format claiming it.
> 
> PRO: Plugs the hole.
> 
> CON: Breaks existing usage when the new guess differs from the old
>     guess.  Common usage should be fine:
> 
>     * -hda test.qcow2
> 
>       Fine as long as test.qcow2 is really QCOW2 (as it should!), and
>       either specifies a backing format (as it arguably should), or the
>       backing file name is sane.
> 
>     * -hda disk.img
> 
>       Fine as long as disk.img is really a disk image (as it should).
> 
>     * -hda /dev/mapper/vg0-virtdisk
> 
>       Fine as long as the logical volume is raw.
> 
>     Less common usage can break:
> 
>     * -hda nbd://localhost
> 
>       Socket provides no clue, so no guess.
> 
>     Weird usage can conceivably break hard:
> 
>     * -hdd disk.img
> 
>       Breaks hard when disk.img is actually QCOW2, the guest boots
>       anyway from another drive, then proceeds to overwrite this one.
> 
> Mitigation: lengthy transition period where we warn "this usage is
> insecure, and we'll eventually break it; here's a hint on secure usage".
> 
> CON: We delay plugging the hole one more time.  But at least we no
> longer expose our users to it silently.
> 
> Jeff pointed out that we want to retain probing in things like qemu-img
> info.
> 
> Richard asked for a way for users to ask for insecure probing, say
> format=insecure-probe.  I have no problem with giving users rope when
> they ask for it.
> 
> Variation: if file name and attributes are unavailable or provide no
> clue, guess raw.  Same PRO and CON as above, only it avoids breaking a
> few more cases.  For instance, "-hda nbd://localhost" keeps working as
> long as the server serves a raw image.  Smells a bit like too much magic
> to me.
>

How about another variation: probe as we do currently, but a
result of 'raw' is a hard failure ("Unable to determine image format,
it may be 'raw'.  You must specify the image format").

This makes sense, because of a probe result of raw currently really
means "I don't recognize this image", so maybe we should just treat it
as such.

Imagine two different QEMU binaries, with different image formats
compiled in.  Currently, by detecting "raw" we could cause data
corruption in a guest, if one QEMU version doesn't recognize 'foobar'
image format.

PRO: 
* Simple, and non-raw format probes will continue to behave the same

* We stop "guessing" raw

CON: 
* User could specify format of "raw", have a guest write an image
header to sector 0, and then in subsequent usage go back to probing

* Will break some scripts




> My proposal replaces probing wholesale.  I like that because it results
> in simple, predictable guessing.  Here's a hybrid approach: first guess
> raw vs. non-raw based on trusted meta-data only, and if non-raw, probe.
> 
> Nothing the guest writes can affect the raw vs. non-raw decision.  Once
> an image is raw, only the user can make it non-raw, by changing its name
> or attributes.
> 
> Two variations: 1. guess raw without a clue, and 2. guess non-raw then.
> 
> Again, same PRO and CON as above, only it doesn't break when users give
> their non-raw images weird names.
> 
> == Prevent "bad" guest writes ==
> 
> Again, several variations, but this time, only the last one is serious,
> the others are just for illustration.
> 
> Fail guest writes to those parts of the image that probing may examine
> Can fail only writes to the first few sectors (at worst) of raw images.
> 
> PRO: Plugs the hole.
> 
> CON: The virtual hardware is defective.  Breaks common guest software
> that writes to the first few sectors, such as boot loaders and
> partitioning tools.  Breaks guest software using the whole device, which
> isn't common, but certainly not unheard of.
> 
> Variation: fail only writes of patterns that actually can make probing
> guess something other than raw.
> 
> PRO: Still plugs the hole.
> 
> CON: Except when you upgrade to a version that recognizes more patterns.
> 
> CON: The virtual hardware is still defective, but the defects are
> minimized.  We can hope that partition tables, boot sectors and such
> won't match the patterns, so common guest software hopefully works.
> Guest software using the whole device still breaks, only now it breaks
> later rather than sooner.
> 
> Variation: fail writes only on *probed* raw images.
> 
> CON: Doesn't fully plug the hole: mixing probed usage (user doesn't
> specify format) with non-probed usage (user specifies format) remains
> insecure.  The guest's write succeeds in non-probed usage, and the guest
> escapes isolation in the next probed usage.
> 
> CON: The virtual hardware is still defective, but it now comes with a
> "defective on/off" switch, factory default "defective on".  We could add
> a warning to guide users to switch defective off but then that warning
> would annoy people who don't care to switch it off (sometimes with
> reason), and we can't have that.  So we leave users who would care if
> they knew in the dark.
> 
> The two variations can be combined.  This is Kevin's proposal.
> 
> CON: Doesn't fully plug the hole: union of both variations' flaws.
> 
> CON: The virtual hardware is still defective: interesection of both
> variations' defects.
> 
> 
> = Conclusion =
> 
> This is *my* conclusion.  Yours may be different, and that's okay.  It's
> business, not personal ;)
> 
> Secure usage should not be hard.
> 
> If we permit insecure usage for convenience or compatibility, we should
> warn the user, unless he clearly asked for it.  Even if that warning
> annoys Kevin and Max.  If you want to suppress it with configure
> --reckless or something, no objections.
> 
> Same for defective virtual hardware.
> 
> Kevin's proposal to prevent "bad" guest writes has relatively small
> compatibility issues.  It improves things from "insecure by default" to
> "slightly defective by default" as long as you use raw images either
> always probed or always non-probed.  It doesn't help at all when you
> alternate probed and non-probed usage.  Improvement of sorts, but it
> still fails my "secure usage should not be hard" requirement, and that
> requirement is a hard one for me.
> 
> My proposal to ditch image contents probing entirely has more serious
> compatibility issues.  In particular, we'd have to forgo sugared
> convenience syntax for a number of less common things.  It definitely
> needs a grace period where all usage we're going to break warns.  On the
> up side, it will actually be secure by default when it's done.
> 
> If this is not acceptable, my second choice is actually the command line
> option to opt out of probing completely.  This doesn't address "insecure
> by default" (sadly), but it does at least satisfy my "secure usage
> should not be hard" requirement.
> 
> If we should adopt Kevin's proposal against my objections, then I very
> badly want the opt out option on top of it, opting out of both probing
> and "bad" write prevention.
> 
> Thanks for hearing me out.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-04 20:33 ` Jeff Cody
@ 2014-11-05  7:04   ` Markus Armbruster
  0 siblings, 0 replies; 38+ messages in thread
From: Markus Armbruster @ 2014-11-05  7:04 UTC (permalink / raw)
  To: Jeff Cody; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

Jeff Cody <jcody@redhat.com> writes:

> On Tue, Nov 04, 2014 at 07:45:38PM +0100, Markus Armbruster wrote:
[...]
>> == Don't guess format from untrusted image contents ==
>> 
>> Several variations of the theme.
>> 
>> Guessing only happens when the user doesn't specify a format, so the
>> simplest way to avoid it would be requiring users to always specify the
>> format.
>> 
>> PRO: Simple, plugs the hole.
>> 
>> CON: Breaks a lot of existing usage.  Bye -hda, hello extra typing.
>> 
>> Variation: command line option to opt out of probing completely.
>> 
>> PRO: Simple, plugs the hole.
>> 
>> CON: Insecure by default.
>> 
>> CON: In addition to opting out, you have to update your usage
>> accordingly.  I guess libvirt would do it anyway, to guard against
>> accidental probing once and for all.
>> 
>> Variation: Stefan proposed to make format= mandatory just for -drive.  I
>> guess he rather meant mandatory for anything but -hda and other selected
>> convenience interfaces.
>> 
>> PRO: No extra typing in the cases where it makes the most difference.
>> 
>> CON: Breaks existing usage in the other cases, extra typing.
>> 
>> CON: Doesn't fully plug the hole.  Users of convenience interfaces may
>>     remain insecure out of ignorance.  We could add a warning to guide
>>     them to more secure usage, but then that warning would annoy users
>>     who don't care for security (sometimes with reason), and we can't
>>     have that.  So we silently leave the users who would care if they
>>     knew insecure.
>> 
>> I proposed something less radical, namely to keep guessing the image
>> format, but base the guess on trusted meta-data only: file name and
>> attributes.  Block and character special files are raw.  For other
>> files, find the file name extension, and look up the format claiming it.
>> 
>> PRO: Plugs the hole.
>> 
>> CON: Breaks existing usage when the new guess differs from the old
>>     guess.  Common usage should be fine:
>> 
>>     * -hda test.qcow2
>> 
>>       Fine as long as test.qcow2 is really QCOW2 (as it should!), and
>>       either specifies a backing format (as it arguably should), or the
>>       backing file name is sane.
>> 
>>     * -hda disk.img
>> 
>>       Fine as long as disk.img is really a disk image (as it should).
>> 
>>     * -hda /dev/mapper/vg0-virtdisk
>> 
>>       Fine as long as the logical volume is raw.
>> 
>>     Less common usage can break:
>> 
>>     * -hda nbd://localhost
>> 
>>       Socket provides no clue, so no guess.
>> 
>>     Weird usage can conceivably break hard:
>> 
>>     * -hdd disk.img
>> 
>>       Breaks hard when disk.img is actually QCOW2, the guest boots
>>       anyway from another drive, then proceeds to overwrite this one.
>> 
>> Mitigation: lengthy transition period where we warn "this usage is
>> insecure, and we'll eventually break it; here's a hint on secure usage".
>> 
>> CON: We delay plugging the hole one more time.  But at least we no
>> longer expose our users to it silently.
>> 
>> Jeff pointed out that we want to retain probing in things like qemu-img
>> info.
>> 
>> Richard asked for a way for users to ask for insecure probing, say
>> format=insecure-probe.  I have no problem with giving users rope when
>> they ask for it.
>> 
>> Variation: if file name and attributes are unavailable or provide no
>> clue, guess raw.  Same PRO and CON as above, only it avoids breaking a
>> few more cases.  For instance, "-hda nbd://localhost" keeps working as
>> long as the server serves a raw image.  Smells a bit like too much magic
>> to me.
>>
>
> How about another variation: probe as we do currently, but a
> result of 'raw' is a hard failure ("Unable to determine image format,
> it may be 'raw'.  You must specify the image format").
>
> This makes sense, because of a probe result of raw currently really
> means "I don't recognize this image", so maybe we should just treat it
> as such.
>
> Imagine two different QEMU binaries, with different image formats
> compiled in.  Currently, by detecting "raw" we could cause data
> corruption in a guest, if one QEMU version doesn't recognize 'foobar'
> image format.

I like the honesty of letting probing yield "I don't know", and the
general simplicity.

> PRO: 
> * Simple, and non-raw format probes will continue to behave the same
>
> * We stop "guessing" raw
>
> CON: 
> * User could specify format of "raw", have a guest write an image
> header to sector 0, and then in subsequent usage go back to probing

Yes, it doesn't fully plug the hole, just like Kevin's proposal.  Thus,
it can't satisfy my "secure usage should not be hard" requirement on its
own.

> * Will break some scripts

It'll break common usage like -cdrom r7.iso and -hda foo.img.

>> My proposal replaces probing wholesale.  I like that because it results
>> in simple, predictable guessing.  Here's a hybrid approach: first guess
>> raw vs. non-raw based on trusted meta-data only, and if non-raw, probe.
>> 
>> Nothing the guest writes can affect the raw vs. non-raw decision.  Once
>> an image is raw, only the user can make it non-raw, by changing its name
>> or attributes.
>> 
>> Two variations: 1. guess raw without a clue, and 2. guess non-raw then.
>> 
>> Again, same PRO and CON as above, only it doesn't break when users give
>> their non-raw images weird names.
[...]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-04 18:45 [Qemu-devel] Image probing: how it can be insecure, and what we could do about it Markus Armbruster
  2014-11-04 20:33 ` Jeff Cody
@ 2014-11-05  7:30 ` Markus Armbruster
  2014-11-05  8:38 ` Max Reitz
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 38+ messages in thread
From: Markus Armbruster @ 2014-11-05  7:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Jeff Cody, Stefan Hajnoczi, Max Reitz

Markus Armbruster <armbru@redhat.com> writes:

[...]
> = How can we better guard the trust boundary in QEMU? =
>
> The guest can violate the trust boundary only because
>
> (a) QEMU supports both raw images and image formats, and
>
> (b) QEMU guesses image format from raw image contents, and
>
> (c) given a raw image, guests can change its contents and control a
> future QEMU's format guess.
>
> We can attack one ore more of these three conditions:
>
> (a) Outlaw raw images
>
> (b) Don't guess format from untrusted image contents
>
> (c) Prevent "bad" guest writes
>
> Nobody is seriously suggesting we do (a).  It's clearly too late for
> that.  Let's explore the other two in more detail.

Of course, if we really *want* (a)...

== Outlaw raw images ==

Define a simple raw image container, or steal an existing one we like
(suggestions welcome).  Drop support for containerless images.

Naturally, the raw container should support "single file",
i.e. container file actually contains the image.

It could also support separate container and image file, for easy and
efficient wrapping and unwrapping.

PRO: Simple, plugs the hole.

CON: Breaks a lot of existing usage.

CON: Upgrade involves image wrapping operations, which may be a bother.
If the container doesn't support separate image files, it's also a lot
of expensive copying.

Mitigation: lengthy transition period where we warn "use of
containerless raw images are insecure, and we'll eventually break this
usage; here's how to wrap your raw images in a container".

CON: We delay plugging the hole one more time.  But at least we no
longer expose our users to it silently.

Variation: keep supporting them with explicit format=raw.

PRO: Simple.

CON: Still breaks a lot of existing usage.  Not libvirt's, though.

CON: Doesn't fully plug the hole: mixing non-probed usage (user
specifies format=raw) with probed usage (user doesn't specify format)
remains insecure.  The guest's write succeeds in non-probed usage, and
the guest escapes isolation in the next probed usage.

[...]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-04 18:45 [Qemu-devel] Image probing: how it can be insecure, and what we could do about it Markus Armbruster
  2014-11-04 20:33 ` Jeff Cody
  2014-11-05  7:30 ` Markus Armbruster
@ 2014-11-05  8:38 ` Max Reitz
  2014-11-05 10:18   ` Eric Blake
                     ` (2 more replies)
  2014-11-05 10:12 ` Gerd Hoffmann
                   ` (3 subsequent siblings)
  6 siblings, 3 replies; 38+ messages in thread
From: Max Reitz @ 2014-11-05  8:38 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: Kevin Wolf, Jeff Cody, Stefan Hajnoczi

On 2014-11-04 at 19:45, Markus Armbruster wrote:
> I'll try to explain all solutions fairly.  Isn't easy when you're as
> biased towards one of them as I am.  Please bear with me.
>
>
> = The trust boundary between image contents and meta-data =
>
> A disk image consists of image contents and meta-data.
>
> Example: all of a raw image's contents is image contents.  Leaves just
> file name and attributes for meta-data.
>
> Example: QCOW2 meta-data includes header, header extensions, L1 table,
> L2 tables, ...  The meta-data defines where in the image the actual
> contents is stored.
>
> A guest can access the image contents, not the meta-data.
>
> Image contents you've let an untrusted guest write is untrusted.
>
> Therefore, there's a trust boundary between image contents and
> meta-data.  QEMU has to trust image meta-data, but shouldn't trust image
> contents.  The exact location of the trust boundary depends on the image
> format.
>
>
> = How we instruct QEMU what to trust =
>
> By configuring QEMU to use an image, the user instructs QEMU to trust
> the image's meta-data.
>
> When the user's configuration specifies the image format explicitly, the
> trust boundary is clear.
>
> Else, the trust boundary is ambigous when more than one format is
> possible.
>
> QEMU resolves this ambiguity by picking the first format with the
> highest "score".  Raw format is always possible, and always has the
> lowest score.
>
>
> = How this lets the guest escape isolation =
>
> Unfortunately, this lets the guest shift the trust boundary and escape
> isolation, as follows:
>
> * Expose a raw image to the guest (whether you specify the format=raw or
>    let QEMU guess it doesn't matter).  The complete contents becomes
>    untrusted.
>
> * Reuse the image *without* specifying the raw format.  QEMU guesses the
>    format based on untrusted image contents.  Now QEMU guesses a format
>    chosen by the guest, with meta-data chosen by the guest.  By
>    controlling image meta-data, the malicious guest can access arbitrary
>    files as QEMU, enlarge its storage, and more.  A non-malicious guest
>    can accidentally DoS itself, by writing a pattern probing recognizes.

Thank you for bringing that to my attention. This means that I'm even 
more in favor of using Kevin's patches because in fact they don't break 
anything.

> This is CVE-2008-2004.
>
>
> = Aside: other trust boundaries =
>
> Of course, this is not the only trust boundary that matters.  For
> instance, there's normally one between your host and somebody else's
> computers.  Telling QEMU to trust meta-data of some image you got "from
> the internet" violates it.  There's nothing QEMU can do about that.
>
>
> = Insecure usage is easy, secure usage is hard =
>
> The oldest stratum of user interfaces doesn't let you specify the image
> format.  Use of raw images with these is insecure by design.  These
> interfaces are still recommended for human users.
>
> Example of insecure usage: -hda foo.img, where foo.img is raw.
>
> With the next generation of interfaces, specifying the image format is
> optional.  Use of raw images with these is insecure by default.
>
> Example of insecure usage: -drive file=foo.img,index=0,media=cdrom,
> where foo.img is raw.  The -hda above is actually sugar for this.
>
> Equivalent secure usage: add format=raw.
>
> Note that specifying just the top image's format is not enough, you also
> have to specify any backing images' formats.  QCOW2 can optionally store
> the backing image format in the image.  The other COW formats can't.

Well, they can, with "json:". *cough*

> Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image
> with a raw backing file.

Yesterday I found out that doesn't seem possible. You apparently can 
only use VMDK with VMDK backing files. Other than that, we only have 
qcow1 and qed as COW formats which should not be used anyway.

> Equivalent secure usage: Beats me.  Maybe there's a funky -drive
> backing.whatever to specify the backing image's format.
>
> With the latest interface blockdev-add, specifying the format is
> mandatory.  Secure, but not really suitable for humans.
>
> Example of secure usage:
>
>      { "execute": "blockdev-add",
>        "arguments": {'options': {
>            'driver': 'raw', 'id':'foo',
>            'file': { 'driver': 'file', 'filename': 'foo.img' } } } }
>
> Insecure usage is easy, secure usage is *hard*.  Even for sophisticated
> users like libvirt developers.  Evidence: libvirt CVE-2010-2237,
> CVE-2010-2238, CVE-2010-2239, and more that didn't get a CVE, like the
> recent accidental probing in drive-mirror.
>
>
> = How can we better guard the trust boundary in QEMU? =
>
> The guest can violate the trust boundary only because
>
> (a) QEMU supports both raw images and image formats, and
>
> (b) QEMU guesses image format from raw image contents, and
>
> (c) given a raw image, guests can change its contents and control a
> future QEMU's format guess.
>
> We can attack one ore more of these three conditions:

I'd like to attack more because any of these steps might be carried out 
in another program which thus either becomes vulnerable itself (which we 
don't really have to care about, but I'd like to either way) or which 
makes qemu vulnerable.

Having an external program with (a) and (c), this makes qemu vulnerable 
if we don't try to forbid (b) or at least make it work better. Having an 
external program with (a) and (b), not doing anything against (c) in 
qemu makes that external program vulnerable.

> (a) Outlaw raw images
>
> (b) Don't guess format from untrusted image contents
>
> (c) Prevent "bad" guest writes
>
> Nobody is seriously suggesting we do (a).  It's clearly too late for
> that.  Let's explore the other two in more detail.

And thus I prefer to find and implement solutions for *both* (b) and (c).

> == Don't guess format from untrusted image contents ==
>
> Several variations of the theme.
>
> Guessing only happens when the user doesn't specify a format, so the
> simplest way to avoid it would be requiring users to always specify the
> format.
>
> PRO: Simple, plugs the hole.
>
> CON: Breaks a lot of existing usage.  Bye -hda, hello extra typing.
>
> Variation: command line option to opt out of probing completely.
>
> PRO: Simple, plugs the hole.
>
> CON: Insecure by default.
>
> CON: In addition to opting out, you have to update your usage
> accordingly.  I guess libvirt would do it anyway, to guard against
> accidental probing once and for all.
>
> Variation: Stefan proposed to make format= mandatory just for -drive.  I
> guess he rather meant mandatory for anything but -hda and other selected
> convenience interfaces.
>
> PRO: No extra typing in the cases where it makes the most difference.
>
> CON: Breaks existing usage in the other cases, extra typing.
>
> CON: Doesn't fully plug the hole.  Users of convenience interfaces may
>      remain insecure out of ignorance.  We could add a warning to guide
>      them to more secure usage, but then that warning would annoy users
>      who don't care for security (sometimes with reason), and we can't
>      have that.  So we silently leave the users who would care if they
>      knew insecure.
>
> I proposed something less radical, namely to keep guessing the image
> format, but base the guess on trusted meta-data only: file name and
> attributes.

You actually want to completely abolish probing? I thought we just 
wanted to use the filename as a second source of information and warn if 
the contents and the extension don't seem to match; and in the future, 
maybe make it an error (but we don't have to discuss that second part 
now, I think).

> Block and character special files are raw.  For other
> files, find the file name extension, and look up the format claiming it.
>
> PRO: Plugs the hole.

You mean "plugs hole (b)".

> CON: Breaks existing usage when the new guess differs from the old
>      guess.  Common usage should be fine:
>
>      * -hda test.qcow2
>
>        Fine as long as test.qcow2 is really QCOW2 (as it should!), and
>        either specifies a backing format (as it arguably should), or the
>        backing file name is sane.
>
>      * -hda disk.img
>
>        Fine as long as disk.img is really a disk image (as it should).
>
>      * -hda /dev/mapper/vg0-virtdisk
>
>        Fine as long as the logical volume is raw.
>
>      Less common usage can break:
>
>      * -hda nbd://localhost
>
>        Socket provides no clue, so no guess.

nbd should be raw. If it isn't, you're most likely doing something 
wrong. See https://bugzilla.redhat.com/show_bug.cgi?id=1090713 what 
happens when you're doing it wrong.

>      Weird usage can conceivably break hard:
>
>      * -hdd disk.img
>
>        Breaks hard when disk.img is actually QCOW2, the guest boots
>        anyway from another drive, then proceeds to overwrite this one.

Then why not continue to do probing and using the extension as a safeguard?

> Mitigation: lengthy transition period where we warn "this usage is
> insecure, and we'll eventually break it; here's a hint on secure usage".
>
> CON: We delay plugging the hole one more time.  But at least we no
> longer expose our users to it silently.
>
> Jeff pointed out that we want to retain probing in things like qemu-img
> info.
>
> Richard asked for a way for users to ask for insecure probing, say
> format=insecure-probe.  I have no problem with giving users rope when
> they ask for it.
>
> Variation: if file name and attributes are unavailable or provide no
> clue, guess raw.  Same PRO and CON as above, only it avoids breaking a
> few more cases.  For instance, "-hda nbd://localhost" keeps working as
> long as the server serves a raw image.

Which it should be.

> Smells a bit like too much magic
> to me.
>
> My proposal replaces probing wholesale.  I like that because it results
> in simple, predictable guessing.  Here's a hybrid approach: first guess
> raw vs. non-raw based on trusted meta-data only, and if non-raw, probe.
>
> Nothing the guest writes can affect the raw vs. non-raw decision.  Once
> an image is raw, only the user can make it non-raw, by changing its name
> or attributes.
>
> Two variations: 1. guess raw without a clue, and 2. guess non-raw then.
>
> Again, same PRO and CON as above, only it doesn't break when users give
> their non-raw images weird names.
>
> == Prevent "bad" guest writes ==
>
> Again, several variations, but this time, only the last one is serious,
> the others are just for illustration.
>
> Fail guest writes to those parts of the image that probing may examine
> Can fail only writes to the first few sectors (at worst) of raw images.
>
> PRO: Plugs the hole.
>
> CON: The virtual hardware is defective.  Breaks common guest software
> that writes to the first few sectors, such as boot loaders and
> partitioning tools.  Breaks guest software using the whole device, which
> isn't common, but certainly not unheard of.
>
> Variation: fail only writes of patterns that actually can make probing
> guess something other than raw.
>
> PRO: Still plugs the hole.

You mean "plugs hole (c)".

> CON: Except when you upgrade to a version that recognizes more patterns.

Which is better than not plugging hole (c) at all.

> CON: The virtual hardware is still defective, but the defects are
> minimized.

As you pointed out to us it's already defective and I don't think 
anybody ever noticed accidentally.

> We can hope that partition tables, boot sectors and such
> won't match the patterns, so common guest software hopefully works.

It's worked in the past, that's good enough for me.

> Guest software using the whole device still breaks, only now it breaks
> later rather than sooner.
>
> Variation: fail writes only on *probed* raw images.
>
> CON: Doesn't fully plug the hole: mixing probed usage (user doesn't
> specify format) with non-probed usage (user specifies format) remains
> insecure.  The guest's write succeeds in non-probed usage, and the guest
> escapes isolation in the next probed usage.
>
> CON: The virtual hardware is still defective, but it now comes with a
> "defective on/off" switch, factory default "defective on".  We could add
> a warning to guide users to switch defective off but then that warning
> would annoy people who don't care to switch it off (sometimes with
> reason), and we can't have that.  So we leave users who would care if
> they knew in the dark.
>
> The two variations can be combined.  This is Kevin's proposal.
>
> CON: Doesn't fully plug the hole: union of both variations' flaws.
>
> CON: The virtual hardware is still defective: interesection of both
> variations' defects.
>
>
> = Conclusion =
>
> This is *my* conclusion.  Yours may be different, and that's okay.  It's
> business, not personal ;)
>
> Secure usage should not be hard.
>
> If we permit insecure usage for convenience or compatibility, we should
> warn the user, unless he clearly asked for it.  Even if that warning
> annoys Kevin and Max.

A warning does not annoy me as long as I know what it means.

> If you want to suppress it with configure
> --reckless or something, no objections.
>
> Same for defective virtual hardware.
>
> Kevin's proposal to prevent "bad" guest writes has relatively small
> compatibility issues.  It improves things from "insecure by default" to
> "slightly defective by default" as long as you use raw images either
> always probed or always non-probed.  It doesn't help at all when you
> alternate probed and non-probed usage.  Improvement of sorts, but it
> still fails my "secure usage should not be hard" requirement, and that
> requirement is a hard one for me.
>
> My proposal to ditch image contents probing entirely has more serious
> compatibility issues.  In particular, we'd have to forgo sugared
> convenience syntax for a number of less common things.  It definitely
> needs a grace period where all usage we're going to break warns.  On the
> up side, it will actually be secure by default when it's done.
>
> If this is not acceptable, my second choice is actually the command line
> option to opt out of probing completely.  This doesn't address "insecure
> by default" (sadly), but it does at least satisfy my "secure usage
> should not be hard" requirement.
>
> If we should adopt Kevin's proposal against my objections, then I very
> badly want the opt out option on top of it, opting out of both probing
> and "bad" write prevention.
>
> Thanks for hearing me out.

My conclusion: Don't ditch probing. It increases entropy, why would you 
ditch probing? Just combine it with the extension and if both don't seem 
to match, that's an error.

Also, holes (b) and (c) are two different holes. We should fix both. We 
should fix (b) so qemu isn't vulnerable and we should fix (c) so qemu 
doesn't make other programs which do probe vulnerable.

So, for fixing (b): Just use the extensions as a safeguard and issue a 
warning for now. We can discuss about making it an error later.

And for fixing (c): As you pointed out, if guests wrote some 
probe-matching pattern in the past, it would break qemu (which is what 
we're trying to fix). Since noone ever said that some guest did that by 
accident, I think we can safely assume that prohibiting such writes will 
not hurt anyone in the future either; at least there are no 
compatibility issues, so if someone notices a problem, he/she can just 
explicitly specify the format and it'll work (which you should be doing 
anyway, as we all know, though many of us, including me, don't want to 
do it all the time).

Max

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-04 18:45 [Qemu-devel] Image probing: how it can be insecure, and what we could do about it Markus Armbruster
                   ` (2 preceding siblings ...)
  2014-11-05  8:38 ` Max Reitz
@ 2014-11-05 10:12 ` Gerd Hoffmann
  2014-11-05 10:33 ` Eric Blake
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 38+ messages in thread
From: Gerd Hoffmann @ 2014-11-05 10:12 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Jeff Cody, qemu-devel, Stefan Hajnoczi, Max Reitz

  Hi,

> My proposal to ditch image contents probing entirely has more serious
> compatibility issues.  In particular, we'd have to forgo sugared
> convenience syntax for a number of less common things.  It definitely
> needs a grace period where all usage we're going to break warns.  On the
> up side, it will actually be secure by default when it's done.

This makes most sense to me.  We can even have a config option to
control this, i.e. something like ...

-guessformat={allow-content,allow-content-with-warning,filename-only,off}

... and over time we'll make things more strict by default.

People can tweak things locally via cfg file in /etc/qemu if they wish.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-05  8:38 ` Max Reitz
@ 2014-11-05 10:18   ` Eric Blake
  2014-11-06 12:43     ` Markus Armbruster
  2014-11-05 11:15   ` Kevin Wolf
  2014-11-06 12:26   ` Markus Armbruster
  2 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2014-11-05 10:18 UTC (permalink / raw)
  To: Max Reitz, Markus Armbruster, qemu-devel
  Cc: Kevin Wolf, Jeff Cody, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 927 bytes --]

On 11/05/2014 09:38 AM, Max Reitz wrote:

>> Note that specifying just the top image's format is not enough, you also
>> have to specify any backing images' formats.  QCOW2 can optionally store
>> the backing image format in the image.  The other COW formats can't.
> 
> Well, they can, with "json:". *cough*
> 
>> Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image
>> with a raw backing file.
> 
> Yesterday I found out that doesn't seem possible. You apparently can
> only use VMDK with VMDK backing files. Other than that, we only have
> qcow1 and qed as COW formats which should not be used anyway.

Actually, qed requires the backing format to be recorded (it is
non-optional) and is therefore immune to probing problems of backing
files.  That's one thing it got right.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-04 18:45 [Qemu-devel] Image probing: how it can be insecure, and what we could do about it Markus Armbruster
                   ` (3 preceding siblings ...)
  2014-11-05 10:12 ` Gerd Hoffmann
@ 2014-11-05 10:33 ` Eric Blake
  2014-11-06 12:52   ` Markus Armbruster
  2014-11-05 11:01 ` Kevin Wolf
  2014-11-05 15:24 ` Dr. David Alan Gilbert
  6 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2014-11-05 10:33 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel
  Cc: Kevin Wolf, Jeff Cody, Stefan Hajnoczi, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 2199 bytes --]

On 11/04/2014 07:45 PM, Markus Armbruster wrote:
> I'll try to explain all solutions fairly.  Isn't easy when you're as
> biased towards one of them as I am.  Please bear with me.
> 

Thanks for this write-up.  I'll probably reply again, but for now I'm
focusing on just one thing I think you missed that came up in the
related threads:

> = How can we better guard the trust boundary in QEMU? =
> 
> The guest can violate the trust boundary only because
> 
> (a) QEMU supports both raw images and image formats, and
> 
> (b) QEMU guesses image format from raw image contents, and
> 
> (c) given a raw image, guests can change its contents and control a
> future QEMU's format guess.
> 
> We can attack one ore more of these three conditions:
> 
> (a) Outlaw raw images
> 
> (b) Don't guess format from untrusted image contents
> 
> (c) Prevent "bad" guest writes

(d) write metadata that records our guess before releasing data across a
trust boundary, so that we no longer need to probe on the next encounter

While this won't work for top-level images, it is a possibility for
backing store.  Any time we open a qcow2 file that has a backing file
without a format, we can rewrite the qcow2 metadata to record the type
that we ended up settling on, prior to allowing the guest to manipulate
the data.  The initial open is "safe" (we haven't yet handed the data to
the guest - and the trust boundary is not broken until the point that
the guest has had a chance to overwrite data) and all subsequent opens
are now safe (because we rewrote the metadata, subsequent operations no
longer need to probe the backing file, but are guaranteed to use the
same format as the first open, whether or not those subsequent
operations are performed by a different qemu that would have probed a
different type).

PRO: Plugs hole for backing files

CON: Doesn't help for top-level images.  In a chain "base <- mid <- top"
where neither mid nor top recorded backing type, it would require mid to
be temporarily opened read-write to update the metadata.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-04 18:45 [Qemu-devel] Image probing: how it can be insecure, and what we could do about it Markus Armbruster
                   ` (4 preceding siblings ...)
  2014-11-05 10:33 ` Eric Blake
@ 2014-11-05 11:01 ` Kevin Wolf
  2014-11-06 13:57   ` Markus Armbruster
  2014-11-05 15:24 ` Dr. David Alan Gilbert
  6 siblings, 1 reply; 38+ messages in thread
From: Kevin Wolf @ 2014-11-05 11:01 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Jeff Cody, qemu-devel, Stefan Hajnoczi, Max Reitz

Am 04.11.2014 um 19:45 hat Markus Armbruster geschrieben:
> I'll try to explain all solutions fairly.  Isn't easy when you're as
> biased towards one of them as I am.  Please bear with me.
> 
> 
> = The trust boundary between image contents and meta-data =
> 
> A disk image consists of image contents and meta-data.
> 
> Example: all of a raw image's contents is image contents.  Leaves just
> file name and attributes for meta-data.

Better: Leaves only protocol-specific metadata (e.g. file name and
attributes for raw-posix).

> Example: QCOW2 meta-data includes header, header extensions, L1 table,
> L2 tables, ...  The meta-data defines where in the image the actual
> contents is stored.
> 
> A guest can access the image contents, not the meta-data.
> 
> Image contents you've let an untrusted guest write is untrusted.
> 
> Therefore, there's a trust boundary between image contents and
> meta-data.  QEMU has to trust image meta-data, but shouldn't trust image
> contents.  The exact location of the trust boundary depends on the image
> format.

Trust metadata to a certain degree - the block layer audit was started
because we noticed that it might not be all that trustworthy in
practice. Different problem, though, it just shows that there are hardly
any clear borders between "always completely trusted" and "always
completely untrusted".

> = How we instruct QEMU what to trust =
> 
> By configuring QEMU to use an image, the user instructs QEMU to trust
> the image's meta-data.
> 
> When the user's configuration specifies the image format explicitly, the
> trust boundary is clear.
> 
> Else, the trust boundary is ambigous when more than one format is
> possible.
> 
> QEMU resolves this ambiguity by picking the first format with the
> highest "score".  Raw format is always possible, and always has the
> lowest score.

You used the term "untrusted guest" before. Are there any trusted guests,
or should we assume that guests are untrusted by definition?

If you use virtualisation for isolation, then the answer is probably
that guests are always untrusted. Other users may know exactly what
their guest is doing and are using qemu for other reasons. The former
would probably want to disable probing completely, the latter don't care
about it and prefer convenience.

My guess is that the share of those with trusted guests is higher among
direct qemu users than libvirt users, but it's just that, a guess. It
also doesn't mean that they are the majority of direct qemu users (they
might be, but I honestly don't know).

If there are trusted and untrusted guests, does this section need some
thoughts about instructing qemu whether to trust the guest or not?

> = How this lets the guest escape isolation =
> 
> Unfortunately, this lets the guest shift the trust boundary and escape
> isolation, as follows:
> 
> * Expose a raw image to the guest (whether you specify the format=raw or
>   let QEMU guess it doesn't matter).  The complete contents becomes
>   untrusted.
> 
> * Reuse the image *without* specifying the raw format.  QEMU guesses the
>   format based on untrusted image contents.  Now QEMU guesses a format
>   chosen by the guest, with meta-data chosen by the guest.  By
>   controlling image meta-data, the malicious guest can access arbitrary
>   files as QEMU, enlarge its storage, and more.  A non-malicious guest
>   can accidentally DoS itself, by writing a pattern probing recognizes.
> 
> This is CVE-2008-2004.
> 
> 
> = Aside: other trust boundaries =
> 
> Of course, this is not the only trust boundary that matters.  For
> instance, there's normally one between your host and somebody else's
> computers.  Telling QEMU to trust meta-data of some image you got "from
> the internet" violates it.  There's nothing QEMU can do about that.

Okay, this addresses what I commented above.

> = Insecure usage is easy, secure usage is hard =
> 
> The oldest stratum of user interfaces doesn't let you specify the image
> format.  Use of raw images with these is insecure by design.  These
> interfaces are still recommended for human users.
> 
> Example of insecure usage: -hda foo.img, where foo.img is raw.
> 
> With the next generation of interfaces, specifying the image format is
> optional.  Use of raw images with these is insecure by default.
> 
> Example of insecure usage: -drive file=foo.img,index=0,media=cdrom,
> where foo.img is raw.  The -hda above is actually sugar for this.
> 
> Equivalent secure usage: add format=raw.
> 
> Note that specifying just the top image's format is not enough, you also
> have to specify any backing images' formats.  QCOW2 can optionally store
> the backing image format in the image.  The other COW formats can't.
> 
> Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image
> with a raw backing file.

Usually this is mitigated by the fact that backing files are read-only.
Trouble is starting when you use things like commit.

> Equivalent secure usage: Beats me.  Maybe there's a funky -drive
> backing.whatever to specify the backing image's format.

Yes, you can override the backing file driver (backing.driver=raw should
do the trick). Not really user-friendly, especially with long backing
file chains, but it happens to be there.

And of course, libvirt should be using it for non-qcow2 or qcow2 without
the backing format header extension (but doesn't yet).

> With the latest interface blockdev-add, specifying the format is
> mandatory.  Secure, but not really suitable for humans.
> 
> Example of secure usage:
> 
>     { "execute": "blockdev-add",
>       "arguments": {'options': {
>           'driver': 'raw', 'id':'foo',
>           'file': { 'driver': 'file', 'filename': 'foo.img' } } } }

Backing file probing can still happen with blockdev-add. As for secure
usage, -drive should be equivalent with blockdev-add.

> Insecure usage is easy, secure usage is *hard*.  Even for sophisticated
> users like libvirt developers.  Evidence: libvirt CVE-2010-2237,
> CVE-2010-2238, CVE-2010-2239, and more that didn't get a CVE, like the
> recent accidental probing in drive-mirror.
> 
> 
> = How can we better guard the trust boundary in QEMU? =
> 
> The guest can violate the trust boundary only because
> 
> (a) QEMU supports both raw images and image formats, and
> 
> (b) QEMU guesses image format from raw image contents, and
> 
> (c) given a raw image, guests can change its contents and control a
> future QEMU's format guess.
> 
> We can attack one ore more of these three conditions:
> 
> (a) Outlaw raw images
> 
> (b) Don't guess format from untrusted image contents
> 
> (c) Prevent "bad" guest writes
> 
> Nobody is seriously suggesting we do (a).  It's clearly too late for
> that.  Let's explore the other two in more detail.
> 
> == Don't guess format from untrusted image contents ==
> 
> Several variations of the theme.
> 
> Guessing only happens when the user doesn't specify a format, so the
> simplest way to avoid it would be requiring users to always specify the
> format.
> 
> PRO: Simple, plugs the hole.
> 
> CON: Breaks a lot of existing usage.  Bye -hda, hello extra typing.

Not only extra typing, but affects existing scripts, too.

> Variation: command line option to opt out of probing completely.
> 
> PRO: Simple, plugs the hole.
> 
> CON: Insecure by default.
> 
> CON: In addition to opting out, you have to update your usage
> accordingly.  I guess libvirt would do it anyway, to guard against
> accidental probing once and for all.

This seems to come back to my comment above whether we need to tell qemu
whether the guest is trusted or not. It's an interesting thought that we
haven't discussed enough yet.

> Variation: Stefan proposed to make format= mandatory just for -drive.  I
> guess he rather meant mandatory for anything but -hda and other selected
> convenience interfaces.
> 
> PRO: No extra typing in the cases where it makes the most difference.
> 
> CON: Breaks existing usage in the other cases, extra typing.
> 
> CON: Doesn't fully plug the hole.  Users of convenience interfaces may
>     remain insecure out of ignorance.  We could add a warning to guide
>     them to more secure usage, but then that warning would annoy users
>     who don't care for security (sometimes with reason), and we can't
>     have that.  So we silently leave the users who would care if they
>     knew insecure.

I do think that we should allow those with trusted guests to use a
convenient interface, but their being annoyed by a warning shouldn't be
a reason not to print warnings. Ignoring the warning isn't any
considerable extra work.

> I proposed something less radical, namely to keep guessing the image
> format, but base the guess on trusted meta-data only: file name and
> attributes.  Block and character special files are raw.  For other
> files, find the file name extension, and look up the format claiming it.
> 
> PRO: Plugs the hole.
> 
> CON: Breaks existing usage when the new guess differs from the old
>     guess.  Common usage should be fine:
> 
>     * -hda test.qcow2
> 
>       Fine as long as test.qcow2 is really QCOW2 (as it should!), and
>       either specifies a backing format (as it arguably should), or the
>       backing file name is sane.
> 
>     * -hda disk.img
> 
>       Fine as long as disk.img is really a disk image (as it should).

.img is not as clear, I've seen people using it for other formats. It's
still a disk image, but not a raw one.

>     * -hda /dev/mapper/vg0-virtdisk
> 
>       Fine as long as the logical volume is raw.
> 
>     Less common usage can break:
> 
>     * -hda nbd://localhost
> 
>       Socket provides no clue, so no guess.
> 
>     Weird usage can conceivably break hard:
> 
>     * -hdd disk.img
> 
>       Breaks hard when disk.img is actually QCOW2, the guest boots
>       anyway from another drive, then proceeds to overwrite this one.
> 
> Mitigation: lengthy transition period where we warn "this usage is
> insecure, and we'll eventually break it; here's a hint on secure usage".
> 
> CON: We delay plugging the hole one more time.  But at least we no
> longer expose our users to it silently.

CON: Relies on metadata that is protocol-specific. Each protocol that
     should support probing needs extra code. Essentially means that
     probing will be disabled on anything except raw-posix (and if we're
     lucky enough that someone pays attention during review, raw-win32)

> Jeff pointed out that we want to retain probing in things like qemu-img
> info.
> 
> Richard asked for a way for users to ask for insecure probing, say
> format=insecure-probe.  I have no problem with giving users rope when
> they ask for it.

Should actually fall out naturally from my bdrv_open() refactorings.
I'll need at least "probe-format" and "probe-protocol" for internal use,
and there's nothing that stops us from allowing it in the schema.

> Variation: if file name and attributes are unavailable or provide no
> clue, guess raw.  Same PRO and CON as above, only it avoids breaking a
> few more cases.  For instance, "-hda nbd://localhost" keeps working as
> long as the server serves a raw image.  Smells a bit like too much magic
> to me.

Can break in the same way as disk.img being qcow2. If we can tolerate
one instance of the problem (can we?), we can probably tolerate the
other one as well.

> My proposal replaces probing wholesale.  I like that because it results
> in simple, predictable guessing.  Here's a hybrid approach: first guess
> raw vs. non-raw based on trusted meta-data only, and if non-raw, probe.
> 
> Nothing the guest writes can affect the raw vs. non-raw decision.  Once
> an image is raw, only the user can make it non-raw, by changing its name
> or attributes.
> 
> Two variations: 1. guess raw without a clue, and 2. guess non-raw then.
> 
> Again, same PRO and CON as above, only it doesn't break when users give
> their non-raw images weird names.
> 
> == Prevent "bad" guest writes ==
> 
> Again, several variations, but this time, only the last one is serious,
> the others are just for illustration.
> 
> Fail guest writes to those parts of the image that probing may examine
> Can fail only writes to the first few sectors (at worst) of raw images.
> 
> PRO: Plugs the hole.
> 
> CON: The virtual hardware is defective.  Breaks common guest software
> that writes to the first few sectors, such as boot loaders and
> partitioning tools.  Breaks guest software using the whole device, which
> isn't common, but certainly not unheard of.
> 
> Variation: fail only writes of patterns that actually can make probing
> guess something other than raw.
> 
> PRO: Still plugs the hole.
> 
> CON: Except when you upgrade to a version that recognizes more patterns.
> 
> CON: The virtual hardware is still defective, but the defects are
> minimized.  We can hope that partition tables, boot sectors and such
> won't match the patterns, so common guest software hopefully works.
> Guest software using the whole device still breaks, only now it breaks
> later rather than sooner.
> 
> Variation: fail writes only on *probed* raw images.
> 
> CON: Doesn't fully plug the hole: mixing probed usage (user doesn't
> specify format) with non-probed usage (user specifies format) remains
> insecure.  The guest's write succeeds in non-probed usage, and the guest
> escapes isolation in the next probed usage.
> 
> CON: The virtual hardware is still defective, but it now comes with a
> "defective on/off" switch, factory default "defective on".  We could add
> a warning to guide users to switch defective off but then that warning
> would annoy people who don't care to switch it off (sometimes with
> reason), and we can't have that.  So we leave users who would care if
> they knew in the dark.
> 
> The two variations can be combined.  This is Kevin's proposal.
> 
> CON: Doesn't fully plug the hole: union of both variations' flaws.
> 
> CON: The virtual hardware is still defective: interesection of both
> variations' defects.

I like how you took care to avoid finding any PROs. :-)

I'll leave commenting on this section to others for now. I feel I have
already said enough about it in the other threads, and defending it here
at this point wouldn't help the discussion.

> = Conclusion =
> 
> This is *my* conclusion.  Yours may be different, and that's okay.  It's
> business, not personal ;)
> 
> Secure usage should not be hard.
> 
> If we permit insecure usage for convenience or compatibility, we should
> warn the user, unless he clearly asked for it.  Even if that warning
> annoys Kevin and Max.

It doesn't. I couldn't care less about a warning on my stderr.

(Practically, though, it causes a bit of work because qemu-iotests runs
into many of these warnings, changing the output. I have a patch that
adds a -f to qemu-io, now I need to teach qemu-iotests to make use of
it.)

> If you want to suppress it with configure
> --reckless or something, no objections.
> 
> Same for defective virtual hardware.
> 
> Kevin's proposal to prevent "bad" guest writes has relatively small
> compatibility issues.  It improves things from "insecure by default" to
> "slightly defective by default" as long as you use raw images either
> always probed or always non-probed.  It doesn't help at all when you
> alternate probed and non-probed usage.

Why would you do that?

And yes, it also doesn't help when you accidentally type format=qcow2
instead of format=raw. When you have images of both types, things like
this happen with manual typing.

> Improvement of sorts, but it
> still fails my "secure usage should not be hard" requirement, and that
> requirement is a hard one for me.
> 
> My proposal to ditch image contents probing entirely has more serious
> compatibility issues.  In particular, we'd have to forgo sugared
> convenience syntax for a number of less common things.  It definitely
> needs a grace period where all usage we're going to break warns.  On the
> up side, it will actually be secure by default when it's done.
> 
> If this is not acceptable, my second choice is actually the command line
> option to opt out of probing completely.  This doesn't address "insecure
> by default" (sadly), but it does at least satisfy my "secure usage
> should not be hard" requirement.
> 
> If we should adopt Kevin's proposal against my objections, then I very
> badly want the opt out option on top of it, opting out of both probing
> and "bad" write prevention.

This sounds quite reasonable. And opting out of probing automatically
opts out of "bad write prevention" because it only happens on probed
images.

To take this though a bit further: Do we have similar problems in other
areas in qemu? We could introduce a -[no-]strict option that forces
the user to be explicit in all places and return errors if they aren't.
I could even live with a configure switch that makes -strict the
default, as long as that configure switch remains disabled by default.

Kevin

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-05  8:38 ` Max Reitz
  2014-11-05 10:18   ` Eric Blake
@ 2014-11-05 11:15   ` Kevin Wolf
  2014-11-06 12:26   ` Markus Armbruster
  2 siblings, 0 replies; 38+ messages in thread
From: Kevin Wolf @ 2014-11-05 11:15 UTC (permalink / raw)
  To: Max Reitz; +Cc: Jeff Cody, Markus Armbruster, Stefan Hajnoczi, qemu-devel

Am 05.11.2014 um 09:38 hat Max Reitz geschrieben:
> My conclusion: Don't ditch probing. It increases entropy, why would
> you ditch probing? Just combine it with the extension and if both
> don't seem to match, that's an error.

I actually kind of like this (in addition to preventing bad writes). If
we do have file name (or other metadata-specific) information that gives
us a clue, use it to double check the guess. If we don't, rely on
probing like we do today.

.qcow2 should never contain anything but qcow2, .iso should always be
raw. If we don't have a recognised extension, anything is okay. We need
to decide what to do with ambiguous extensions like .img or .vhd.

This again wouldn't be a perfect solution that catches all cases, but
it improves the situation and shouldn't cause too many compatibility
issues.

> So, for fixing (b): Just use the extensions as a safeguard and issue
> a warning for now. We can discuss about making it an error later.

Warnings are useless. They warn too late. It needs to be an error, and I
think when we don't require the filename check, it's reasonable enough
to do it from the start.

> And for fixing (c): As you pointed out, if guests wrote some
> probe-matching pattern in the past, it would break qemu (which is
> what we're trying to fix). Since noone ever said that some guest did
> that by accident, I think we can safely assume that prohibiting such
> writes will not hurt anyone in the future either; at least there are
> no compatibility issues

Good point, thanks for pointing it out.

Kevin

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-04 18:45 [Qemu-devel] Image probing: how it can be insecure, and what we could do about it Markus Armbruster
                   ` (5 preceding siblings ...)
  2014-11-05 11:01 ` Kevin Wolf
@ 2014-11-05 15:24 ` Dr. David Alan Gilbert
  2014-11-06 13:04   ` Markus Armbruster
  6 siblings, 1 reply; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-05 15:24 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Jeff Cody, qemu-devel, Stefan Hajnoczi, Max Reitz

* Markus Armbruster (armbru@redhat.com) wrote:
> I'll try to explain all solutions fairly.  Isn't easy when you're as
> biased towards one of them as I am.  Please bear with me.
> 
> 
> = The trust boundary between image contents and meta-data =
> 
> A disk image consists of image contents and meta-data.
> 
> Example: all of a raw image's contents is image contents.  Leaves just
> file name and attributes for meta-data.
> 
> Example: QCOW2 meta-data includes header, header extensions, L1 table,
> L2 tables, ...  The meta-data defines where in the image the actual
> contents is stored.
> 
> A guest can access the image contents, not the meta-data.
> 
> Image contents you've let an untrusted guest write is untrusted.
> 
> Therefore, there's a trust boundary between image contents and
> meta-data.  QEMU has to trust image meta-data, but shouldn't trust image
> contents.  The exact location of the trust boundary depends on the image
> format.

I'm not sure of the line:
'QEMU has to trust image meta-data'

I think there are different levels of trust and people will be more
prepared to accept levels of pain at the commandline to avoid different
types of problem.

A problem that could cause qemu to access arbitrary other files on the
host (as backing files for example) is obviously the worst; especially
if things like qemu-img and other analysis type stuff could trip it.

Stuff that only allows a guest to misuse it's own block storage is bad;
but it's nowhere near as bad as being able to walk around the host.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-05  8:38 ` Max Reitz
  2014-11-05 10:18   ` Eric Blake
  2014-11-05 11:15   ` Kevin Wolf
@ 2014-11-06 12:26   ` Markus Armbruster
  2014-11-06 12:53     ` Max Reitz
  2014-11-06 13:02     ` Kevin Wolf
  2 siblings, 2 replies; 38+ messages in thread
From: Markus Armbruster @ 2014-11-06 12:26 UTC (permalink / raw)
  To: Max Reitz; +Cc: Kevin Wolf, Jeff Cody, qemu-devel, Stefan Hajnoczi

Max Reitz <mreitz@redhat.com> writes:

> On 2014-11-04 at 19:45, Markus Armbruster wrote:
>> I'll try to explain all solutions fairly.  Isn't easy when you're as
>> biased towards one of them as I am.  Please bear with me.
>>
>>
>> = The trust boundary between image contents and meta-data =
>>
>> A disk image consists of image contents and meta-data.
>>
>> Example: all of a raw image's contents is image contents.  Leaves just
>> file name and attributes for meta-data.
>>
>> Example: QCOW2 meta-data includes header, header extensions, L1 table,
>> L2 tables, ...  The meta-data defines where in the image the actual
>> contents is stored.
>>
>> A guest can access the image contents, not the meta-data.
>>
>> Image contents you've let an untrusted guest write is untrusted.
>>
>> Therefore, there's a trust boundary between image contents and
>> meta-data.  QEMU has to trust image meta-data, but shouldn't trust image
>> contents.  The exact location of the trust boundary depends on the image
>> format.
>>
>>
>> = How we instruct QEMU what to trust =
>>
>> By configuring QEMU to use an image, the user instructs QEMU to trust
>> the image's meta-data.
>>
>> When the user's configuration specifies the image format explicitly, the
>> trust boundary is clear.
>>
>> Else, the trust boundary is ambigous when more than one format is
>> possible.
>>
>> QEMU resolves this ambiguity by picking the first format with the
>> highest "score".  Raw format is always possible, and always has the
>> lowest score.
>>
>>
>> = How this lets the guest escape isolation =
>>
>> Unfortunately, this lets the guest shift the trust boundary and escape
>> isolation, as follows:
>>
>> * Expose a raw image to the guest (whether you specify the format=raw or
>>    let QEMU guess it doesn't matter).  The complete contents becomes
>>    untrusted.
>>
>> * Reuse the image *without* specifying the raw format.  QEMU guesses the
>>    format based on untrusted image contents.  Now QEMU guesses a format
>>    chosen by the guest, with meta-data chosen by the guest.  By
>>    controlling image meta-data, the malicious guest can access arbitrary
>>    files as QEMU, enlarge its storage, and more.  A non-malicious guest
>>    can accidentally DoS itself, by writing a pattern probing recognizes.
>
> Thank you for bringing that to my attention. This means that I'm even
> more in favor of using Kevin's patches because in fact they don't
> break anything.

They break things differently.  The difference may or may not matter.

Example: innocent guest writes a recognized pattern.

  Now: next restart fails, guest DoSed itself until host operator gets
  around to adding format=raw to the configuration.  Consequence:
  downtime (probably lengthy), but no data corruption.

  With Kevin's patch: write fails, guest may or may not handle the
  failure gracefully.  Consequences can range from "guest complains to
  its logs (who cares)" via "guest stops whatever it's doing and refuses
  to continue until its hardware gets fixed (downtime as above)" to
  "data corruption".

Example: innocent guest first writes a recognized pattern, then
overwrites it with a non-recognized pattern.

  Now: works.

  With Kevin's patch: as above.

Again, I'm not claiming the differences are serious in practice, only
that they exist.

>> This is CVE-2008-2004.
>>
>>
>> = Aside: other trust boundaries =
>>
>> Of course, this is not the only trust boundary that matters.  For
>> instance, there's normally one between your host and somebody else's
>> computers.  Telling QEMU to trust meta-data of some image you got "from
>> the internet" violates it.  There's nothing QEMU can do about that.
>>
>>
>> = Insecure usage is easy, secure usage is hard =
>>
>> The oldest stratum of user interfaces doesn't let you specify the image
>> format.  Use of raw images with these is insecure by design.  These
>> interfaces are still recommended for human users.
>>
>> Example of insecure usage: -hda foo.img, where foo.img is raw.
>>
>> With the next generation of interfaces, specifying the image format is
>> optional.  Use of raw images with these is insecure by default.
>>
>> Example of insecure usage: -drive file=foo.img,index=0,media=cdrom,
>> where foo.img is raw.  The -hda above is actually sugar for this.
>>
>> Equivalent secure usage: add format=raw.
>>
>> Note that specifying just the top image's format is not enough, you also
>> have to specify any backing images' formats.  QCOW2 can optionally store
>> the backing image format in the image.  The other COW formats can't.
>
> Well, they can, with "json:". *cough*

Point coughingly taken.

>> Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image
>> with a raw backing file.
>
> Yesterday I found out that doesn't seem possible. You apparently can
> only use VMDK with VMDK backing files.

I figure you're referring to this code in vmdk_create():

        if (strcmp(bs->drv->format_name, "vmdk")) {
            bdrv_unref(bs);
            ret = -EINVAL;
            goto exit;
        }

>                                        Other than that, we only have
> qcow1 and qed as COW formats which should not be used anyway.

qemu-doc.texi calls them "old image format", and qemu-img.texi has them
under "Other", "for compatibility with older QEMU versions".  I guess we
could do better job telling users they "should not be used anyway".

Even in old stuff retained just for compatibility, we should make an
effort to plug security holes, make secure usage easy, and guide users
away from insecure usage.

Now back to the point I was trying to make in my original message.

Replacement example of insecure usage: -hda bar.qcow2, where bar.qcow2
is a QCOW2 image with a raw backing file and no backing image format,
i.e. created without "-o backing_format=".

Then the next paragraph applies:

>> Equivalent secure usage: Beats me.  Maybe there's a funky -drive
>> backing.whatever to specify the backing image's format.

See Kevin's reply for equivalent secure usage.

>> With the latest interface blockdev-add, specifying the format is
>> mandatory.  Secure, but not really suitable for humans.
>>
>> Example of secure usage:
>>
>>      { "execute": "blockdev-add",
>>        "arguments": {'options': {
>>            'driver': 'raw', 'id':'foo',
>>            'file': { 'driver': 'file', 'filename': 'foo.img' } } } }
>>
>> Insecure usage is easy, secure usage is *hard*.  Even for sophisticated
>> users like libvirt developers.  Evidence: libvirt CVE-2010-2237,
>> CVE-2010-2238, CVE-2010-2239, and more that didn't get a CVE, like the
>> recent accidental probing in drive-mirror.
>>
>>
>> = How can we better guard the trust boundary in QEMU? =
>>
>> The guest can violate the trust boundary only because
>>
>> (a) QEMU supports both raw images and image formats, and
>>
>> (b) QEMU guesses image format from raw image contents, and
>>
>> (c) given a raw image, guests can change its contents and control a
>> future QEMU's format guess.
>>
>> We can attack one ore more of these three conditions:
>
> I'd like to attack more because any of these steps might be carried
> out in another program which thus either becomes vulnerable itself
> (which we don't really have to care about, but I'd like to either way)
> or which makes qemu vulnerable.
>
> Having an external program with (a) and (c), this makes qemu
> vulnerable if we don't try to forbid (b) or at least make it work
> better. Having an external program with (a) and (b), not doing
> anything against (c) in qemu makes that external program vulnerable.
>
>> (a) Outlaw raw images
>>
>> (b) Don't guess format from untrusted image contents
>>
>> (c) Prevent "bad" guest writes
>>
>> Nobody is seriously suggesting we do (a).  It's clearly too late for
>> that.  Let's explore the other two in more detail.
>
> And thus I prefer to find and implement solutions for *both* (b) and (c).

Quoting myself: "We can attack one ore more of these three conditions".

>> == Don't guess format from untrusted image contents ==
>>
>> Several variations of the theme.
>>
>> Guessing only happens when the user doesn't specify a format, so the
>> simplest way to avoid it would be requiring users to always specify the
>> format.
>>
>> PRO: Simple, plugs the hole.
>>
>> CON: Breaks a lot of existing usage.  Bye -hda, hello extra typing.
>>
>> Variation: command line option to opt out of probing completely.
>>
>> PRO: Simple, plugs the hole.
>>
>> CON: Insecure by default.
>>
>> CON: In addition to opting out, you have to update your usage
>> accordingly.  I guess libvirt would do it anyway, to guard against
>> accidental probing once and for all.
>>
>> Variation: Stefan proposed to make format= mandatory just for -drive.  I
>> guess he rather meant mandatory for anything but -hda and other selected
>> convenience interfaces.
>>
>> PRO: No extra typing in the cases where it makes the most difference.
>>
>> CON: Breaks existing usage in the other cases, extra typing.
>>
>> CON: Doesn't fully plug the hole.  Users of convenience interfaces may
>>      remain insecure out of ignorance.  We could add a warning to guide
>>      them to more secure usage, but then that warning would annoy users
>>      who don't care for security (sometimes with reason), and we can't
>>      have that.  So we silently leave the users who would care if they
>>      knew insecure.
>>
>> I proposed something less radical, namely to keep guessing the image
>> format, but base the guess on trusted meta-data only: file name and
>> attributes.
>
> You actually want to completely abolish probing? I thought we just
> wanted to use the filename as a second source of information and warn
> if the contents and the extension don't seem to match; and in the
> future, maybe make it an error (but we don't have to discuss that
> second part now, I think).

Yes, I propose to ditch it completely, after a suitable grace period.  I
tried to make that quite clear in my PATCH RFC 2/2.

Here's why.

Now: 1. probe
     4. open, error out if meta-data is bad

With my proposed patch:
     1. probe
     2. guess from trusted meta-data
     3. warn unless 1 and 2 match
     4. open, error out if meta-data is bad

Now make the warning an error:
     1. probe
     2. guess from trusted meta-data
     3. error out unless 1 and 2 match
     4. open, error out if meta-data is bad

I figure the following is equivalent, but simpler:

     2. guess from trusted meta-data
     4. open, error out if meta-data is bad

because open should detect all the errors the previous step 3 caught.
If not, things are broken with explicit format=.

>> Block and character special files are raw.  For other
>> files, find the file name extension, and look up the format claiming it.
>>
>> PRO: Plugs the hole.
>
> You mean "plugs hole (b)".

What I (airily) call "the hole" is the scenario I described above: guest
escaping isolation by subverting qemu-system-*.

(b) is not a hole, it's a condition for "the hole".

I guess what you want to say is "attacking just condition (b) doesn't
plug some other holes I care about".  That's true.  There are indeed
other holes.

For instance, guest attacking QEMU's utility programs qemu-img, ...  Or
guest attacking other host software.  I'm not trying to discount any of
them.  But tangling all problems up into a hairball won't do us any
good.

So your point that my analysis is narrow is taken.  In my defense, I can
say that my narrow analysis was difficult enough to write up, and
probably produced enough text to deter readers.

>> CON: Breaks existing usage when the new guess differs from the old
>>      guess.  Common usage should be fine:
>>
>>      * -hda test.qcow2
>>
>>        Fine as long as test.qcow2 is really QCOW2 (as it should!), and
>>        either specifies a backing format (as it arguably should), or the
>>        backing file name is sane.
>>
>>      * -hda disk.img
>>
>>        Fine as long as disk.img is really a disk image (as it should).
>>
>>      * -hda /dev/mapper/vg0-virtdisk
>>
>>        Fine as long as the logical volume is raw.
>>
>>      Less common usage can break:
>>
>>      * -hda nbd://localhost
>>
>>        Socket provides no clue, so no guess.
>
> nbd should be raw. If it isn't, you're most likely doing something
> wrong. See https://bugzilla.redhat.com/show_bug.cgi?id=1090713 what
> happens when you're doing it wrong.

Okay.

My RFC PATCH is too simplistic to exploit that, because it only looks at
file name extensions.  But "user asked for nbd protocol" is quite
obviously trusted meta-data.  We could have a less simplistic patch
putting it to use.  Or adopt the variation below.

>>      Weird usage can conceivably break hard:
>>
>>      * -hdd disk.img
>>
>>        Breaks hard when disk.img is actually QCOW2, the guest boots
>>        anyway from another drive, then proceeds to overwrite this one.
>
> Then why not continue to do probing and using the extension as a safeguard?
>
>> Mitigation: lengthy transition period where we warn "this usage is
>> insecure, and we'll eventually break it; here's a hint on secure usage".
>>
>> CON: We delay plugging the hole one more time.  But at least we no
>> longer expose our users to it silently.
>>
>> Jeff pointed out that we want to retain probing in things like qemu-img
>> info.
>>
>> Richard asked for a way for users to ask for insecure probing, say
>> format=insecure-probe.  I have no problem with giving users rope when
>> they ask for it.
>>
>> Variation: if file name and attributes are unavailable or provide no
>> clue, guess raw.  Same PRO and CON as above, only it avoids breaking a
>> few more cases.  For instance, "-hda nbd://localhost" keeps working as
>> long as the server serves a raw image.
>
> Which it should be.
>
>> Smells a bit like too much magic
>> to me.
>>
>> My proposal replaces probing wholesale.  I like that because it results
>> in simple, predictable guessing.  Here's a hybrid approach: first guess
>> raw vs. non-raw based on trusted meta-data only, and if non-raw, probe.
>>
>> Nothing the guest writes can affect the raw vs. non-raw decision.  Once
>> an image is raw, only the user can make it non-raw, by changing its name
>> or attributes.
>>
>> Two variations: 1. guess raw without a clue, and 2. guess non-raw then.
>>
>> Again, same PRO and CON as above, only it doesn't break when users give
>> their non-raw images weird names.
>>
>> == Prevent "bad" guest writes ==
>>
>> Again, several variations, but this time, only the last one is serious,
>> the others are just for illustration.
>>
>> Fail guest writes to those parts of the image that probing may examine
>> Can fail only writes to the first few sectors (at worst) of raw images.
>>
>> PRO: Plugs the hole.
>>
>> CON: The virtual hardware is defective.  Breaks common guest software
>> that writes to the first few sectors, such as boot loaders and
>> partitioning tools.  Breaks guest software using the whole device, which
>> isn't common, but certainly not unheard of.
>>
>> Variation: fail only writes of patterns that actually can make probing
>> guess something other than raw.
>>
>> PRO: Still plugs the hole.
>
> You mean "plugs hole (c)".

My reply to "plugs hole (b)" applies.

>> CON: Except when you upgrade to a version that recognizes more patterns.
>
> Which is better than not plugging hole (c) at all.
>
>> CON: The virtual hardware is still defective, but the defects are
>> minimized.
>
> As you pointed out to us it's already defective and I don't think
> anybody ever noticed accidentally.

You're right in that probed raw is already defective, with defects
delayed to the next restart.  Preventing "bad" guest writes changes the
nature of the defects subtly, as I described above.

This and the previous variation also extends them to non-probed raw.
The following variations avoid the extension.

>> We can hope that partition tables, boot sectors and such
>> won't match the patterns, so common guest software hopefully works.
>
> It's worked in the past, that's good enough for me.
>
>> Guest software using the whole device still breaks, only now it breaks
>> later rather than sooner.
>>
>> Variation: fail writes only on *probed* raw images.
>>
>> CON: Doesn't fully plug the hole: mixing probed usage (user doesn't
>> specify format) with non-probed usage (user specifies format) remains
>> insecure.  The guest's write succeeds in non-probed usage, and the guest
>> escapes isolation in the next probed usage.
>>
>> CON: The virtual hardware is still defective, but it now comes with a
>> "defective on/off" switch, factory default "defective on".  We could add
>> a warning to guide users to switch defective off but then that warning
>> would annoy people who don't care to switch it off (sometimes with
>> reason), and we can't have that.  So we leave users who would care if
>> they knew in the dark.
>>
>> The two variations can be combined.  This is Kevin's proposal.
>>
>> CON: Doesn't fully plug the hole: union of both variations' flaws.
>>
>> CON: The virtual hardware is still defective: interesection of both
>> variations' defects.
>>
>>
>> = Conclusion =
>>
>> This is *my* conclusion.  Yours may be different, and that's okay.  It's
>> business, not personal ;)
>>
>> Secure usage should not be hard.
>>
>> If we permit insecure usage for convenience or compatibility, we should
>> warn the user, unless he clearly asked for it.  Even if that warning
>> annoys Kevin and Max.
>
> A warning does not annoy me as long as I know what it means.

Good :)

>> If you want to suppress it with configure
>> --reckless or something, no objections.
>>
>> Same for defective virtual hardware.
>>
>> Kevin's proposal to prevent "bad" guest writes has relatively small
>> compatibility issues.  It improves things from "insecure by default" to
>> "slightly defective by default" as long as you use raw images either
>> always probed or always non-probed.  It doesn't help at all when you
>> alternate probed and non-probed usage.  Improvement of sorts, but it
>> still fails my "secure usage should not be hard" requirement, and that
>> requirement is a hard one for me.
>>
>> My proposal to ditch image contents probing entirely has more serious
>> compatibility issues.  In particular, we'd have to forgo sugared
>> convenience syntax for a number of less common things.  It definitely
>> needs a grace period where all usage we're going to break warns.  On the
>> up side, it will actually be secure by default when it's done.
>>
>> If this is not acceptable, my second choice is actually the command line
>> option to opt out of probing completely.  This doesn't address "insecure
>> by default" (sadly), but it does at least satisfy my "secure usage
>> should not be hard" requirement.
>>
>> If we should adopt Kevin's proposal against my objections, then I very
>> badly want the opt out option on top of it, opting out of both probing
>> and "bad" write prevention.
>>
>> Thanks for hearing me out.
>
> My conclusion: Don't ditch probing. It increases entropy, why would
> you ditch probing? Just combine it with the extension and if both
> don't seem to match, that's an error.

How does probing add value?

Compare

     1. probe
     2. guess from trusted meta-data
     3. error out unless 1 and 2 match
     4. open, error out if meta-data is bad

to just

     2. guess from trusted meta-data
     4. open, error out if meta-data is bad

Let P be the driver recommended by probe, and G be the driver
recommended by guess.

If P == G, same result: we open with the same driver.

Else, if open with G fails, equivalent result: error out in step 3
vs. error out in step 4.

Else, we have an odd case: one driver's probe accepts (P's), yet a
different driver's open succeeds (G's).

    If G's probe rejects: is this a bug?  Shouldn't open always fail
    when probe rejects?

    If G's probe accepts, then probing chose P over G.  Maybe it
    shouldn't have.  Or maybe the probing functions are imprecise.
    Anyway, keeping probing around makes this an error.  Should it be
    one?

Am I missing something?

> Also, holes (b) and (c) are two different holes. We should fix
> both. We should fix (b) so qemu isn't vulnerable and we should fix (c)
> so qemu doesn't make other programs which do probe vulnerable.

Adjusting terminology, but hopefully not your intent: "the hole" isn't
the only hole that matters.  The conditions enable other holes.
Therefore, we should attack both (b) and (c).  Attacking (a) is not
practical.

Points taken, except I think we could attack (a) if we really wanted.

> So, for fixing (b): Just use the extensions as a safeguard and issue a
> warning for now. We can discuss about making it an error later.
>
> And for fixing (c): As you pointed out, if guests wrote some
> probe-matching pattern in the past, it would break qemu (which is what
> we're trying to fix). Since noone ever said that some guest did that
> by accident, I think we can safely assume that prohibiting such writes
> will not hurt anyone in the future either; at least there are no
> compatibility issues, so if someone notices a problem, he/she can just
> explicitly specify the format and it'll work (which you should be
> doing anyway, as we all know, though many of us, including me, don't
> want to do it all the time).

Thanks for stating your conclusion concisely, it's really helpful.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-05 10:18   ` Eric Blake
@ 2014-11-06 12:43     ` Markus Armbruster
  2014-11-06 13:02       ` Eric Blake
  0 siblings, 1 reply; 38+ messages in thread
From: Markus Armbruster @ 2014-11-06 12:43 UTC (permalink / raw)
  To: Eric Blake; +Cc: Kevin Wolf, Jeff Cody, qemu-devel, Stefan Hajnoczi, Max Reitz

Eric Blake <eblake@redhat.com> writes:

> On 11/05/2014 09:38 AM, Max Reitz wrote:
>
>>> Note that specifying just the top image's format is not enough, you also
>>> have to specify any backing images' formats.  QCOW2 can optionally store
>>> the backing image format in the image.  The other COW formats can't.
>> 
>> Well, they can, with "json:". *cough*
>> 
>>> Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image
>>> with a raw backing file.
>> 
>> Yesterday I found out that doesn't seem possible. You apparently can
>> only use VMDK with VMDK backing files. Other than that, we only have
>> qcow1 and qed as COW formats which should not be used anyway.
>
> Actually, qed requires the backing format to be recorded (it is
> non-optional) and is therefore immune to probing problems of backing
> files.  That's one thing it got right.

If I read the code correctly:

QED has a feature bit QED_F_BACKING_FORMAT_NO_PROBE.

It is changed when you set the backing file format.  Setting format to
"raw" sets the flag, anything else (including nothing) clears the flag.
The actual non-raw format is not recorded.

Creating an image counts as setting the backing file format.

If the flag is set, open uses "raw"for the backing file (no probing).

If it's unset, open probes, and the probe may yield "raw".

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-05 10:33 ` Eric Blake
@ 2014-11-06 12:52   ` Markus Armbruster
  0 siblings, 0 replies; 38+ messages in thread
From: Markus Armbruster @ 2014-11-06 12:52 UTC (permalink / raw)
  To: Eric Blake; +Cc: Kevin Wolf, Jeff Cody, qemu-devel, Stefan Hajnoczi, Max Reitz

Eric Blake <eblake@redhat.com> writes:

> On 11/04/2014 07:45 PM, Markus Armbruster wrote:
>> I'll try to explain all solutions fairly.  Isn't easy when you're as
>> biased towards one of them as I am.  Please bear with me.
>> 
>
> Thanks for this write-up.  I'll probably reply again, but for now I'm
> focusing on just one thing I think you missed that came up in the
> related threads:
>
>
>> = How can we better guard the trust boundary in QEMU? =
>> 
>> The guest can violate the trust boundary only because
>> 
>> (a) QEMU supports both raw images and image formats, and
>> 
>> (b) QEMU guesses image format from raw image contents, and
>> 
>> (c) given a raw image, guests can change its contents and control a
>> future QEMU's format guess.
>> 
>> We can attack one ore more of these three conditions:
>> 
>> (a) Outlaw raw images
>> 
>> (b) Don't guess format from untrusted image contents
>> 
>> (c) Prevent "bad" guest writes
>
> (d) write metadata that records our guess before releasing data across a
> trust boundary, so that we no longer need to probe on the next encounter
>
> While this won't work for top-level images, it is a possibility for
> backing store.  Any time we open a qcow2 file that has a backing file
> without a format, we can rewrite the qcow2 metadata to record the type
> that we ended up settling on, prior to allowing the guest to manipulate
> the data.  The initial open is "safe" (we haven't yet handed the data to
> the guest - and the trust boundary is not broken until the point that
> the guest has had a chance to overwrite data) and all subsequent opens
> are now safe (because we rewrote the metadata, subsequent operations no
> longer need to probe the backing file, but are guaranteed to use the
> same format as the first open, whether or not those subsequent
> operations are performed by a different qemu that would have probed a
> different type).
>
> PRO: Plugs hole for backing files
>
> CON: Doesn't help for top-level images.  In a chain "base <- mid <- top"
> where neither mid nor top recorded backing type, it would require mid to
> be temporarily opened read-write to update the metadata.

I think we better distinguish two cases:

1. Freeze the backing format on create

   PRO: Plugs hole for backing files in new images

   CON: Incompatible change, but only for truly weird usage.

2. Freeze the backing format on writable open

   PRO: Plugs hole for this existing image's backing file

   CON: As above.

I'm quite comfy with 1's CON, but 2's gives me a slightly queasy
feeling.  Not enough of it to actually object, though.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-06 12:26   ` Markus Armbruster
@ 2014-11-06 12:53     ` Max Reitz
  2014-11-06 14:56       ` Jeff Cody
  2014-11-07  9:57       ` Markus Armbruster
  2014-11-06 13:02     ` Kevin Wolf
  1 sibling, 2 replies; 38+ messages in thread
From: Max Reitz @ 2014-11-06 12:53 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Kevin Wolf, Jeff Cody, qemu-devel, Stefan Hajnoczi

On 2014-11-06 at 13:26, Markus Armbruster wrote:
> Max Reitz <mreitz@redhat.com> writes:
>
>> On 2014-11-04 at 19:45, Markus Armbruster wrote:
>>> I'll try to explain all solutions fairly.  Isn't easy when you're as
>>> biased towards one of them as I am.  Please bear with me.
>>>
>>>
>>> = The trust boundary between image contents and meta-data =
>>>
>>> A disk image consists of image contents and meta-data.
>>>
>>> Example: all of a raw image's contents is image contents.  Leaves just
>>> file name and attributes for meta-data.
>>>
>>> Example: QCOW2 meta-data includes header, header extensions, L1 table,
>>> L2 tables, ...  The meta-data defines where in the image the actual
>>> contents is stored.
>>>
>>> A guest can access the image contents, not the meta-data.
>>>
>>> Image contents you've let an untrusted guest write is untrusted.
>>>
>>> Therefore, there's a trust boundary between image contents and
>>> meta-data.  QEMU has to trust image meta-data, but shouldn't trust image
>>> contents.  The exact location of the trust boundary depends on the image
>>> format.
>>>
>>>
>>> = How we instruct QEMU what to trust =
>>>
>>> By configuring QEMU to use an image, the user instructs QEMU to trust
>>> the image's meta-data.
>>>
>>> When the user's configuration specifies the image format explicitly, the
>>> trust boundary is clear.
>>>
>>> Else, the trust boundary is ambigous when more than one format is
>>> possible.
>>>
>>> QEMU resolves this ambiguity by picking the first format with the
>>> highest "score".  Raw format is always possible, and always has the
>>> lowest score.
>>>
>>>
>>> = How this lets the guest escape isolation =
>>>
>>> Unfortunately, this lets the guest shift the trust boundary and escape
>>> isolation, as follows:
>>>
>>> * Expose a raw image to the guest (whether you specify the format=raw or
>>>     let QEMU guess it doesn't matter).  The complete contents becomes
>>>     untrusted.
>>>
>>> * Reuse the image *without* specifying the raw format.  QEMU guesses the
>>>     format based on untrusted image contents.  Now QEMU guesses a format
>>>     chosen by the guest, with meta-data chosen by the guest.  By
>>>     controlling image meta-data, the malicious guest can access arbitrary
>>>     files as QEMU, enlarge its storage, and more.  A non-malicious guest
>>>     can accidentally DoS itself, by writing a pattern probing recognizes.
>> Thank you for bringing that to my attention. This means that I'm even
>> more in favor of using Kevin's patches because in fact they don't
>> break anything.
> They break things differently.  The difference may or may not matter.
>
> Example: innocent guest writes a recognized pattern.
>
>    Now: next restart fails, guest DoSed itself until host operator gets
>    around to adding format=raw to the configuration.  Consequence:
>    downtime (probably lengthy), but no data corruption.
>
>    With Kevin's patch: write fails, guest may or may not handle the
>    failure gracefully.  Consequences can range from "guest complains to
>    its logs (who cares)" via "guest stops whatever it's doing and refuses
>    to continue until its hardware gets fixed (downtime as above)" to
>    "data corruption".

You somehow seem convinced that writing to sector 0 is a completely 
normal operation. For x86, it isn't, though.

There are only a couple of programs which do that, I can only think of 
partitioning and setting up boot loaders. There's not a myriad of 
programs which would increase the probability of one both writing a 
recognizable pattern *and* not handling EPERM correctly.

I see the probability of both happening at the same time as extremely 
low, not least because there are only a handful of programs which access 
that sector.

> Example: innocent guest first writes a recognized pattern, then
> overwrites it with a non-recognized pattern.
>
>    Now: works.
>
>    With Kevin's patch: as above.

Not really, if the guest overwrites the data with a non-recognized 
pattern after EPERM it works as well. The difference here is that you 
won't have the intended data in the meantime.

> Again, I'm not claiming the differences are serious in practice, only
> that they exist.

True.

>>> This is CVE-2008-2004.
>>>
>>>
>>> = Aside: other trust boundaries =
>>>
>>> Of course, this is not the only trust boundary that matters.  For
>>> instance, there's normally one between your host and somebody else's
>>> computers.  Telling QEMU to trust meta-data of some image you got "from
>>> the internet" violates it.  There's nothing QEMU can do about that.
>>>
>>>
>>> = Insecure usage is easy, secure usage is hard =
>>>
>>> The oldest stratum of user interfaces doesn't let you specify the image
>>> format.  Use of raw images with these is insecure by design.  These
>>> interfaces are still recommended for human users.
>>>
>>> Example of insecure usage: -hda foo.img, where foo.img is raw.
>>>
>>> With the next generation of interfaces, specifying the image format is
>>> optional.  Use of raw images with these is insecure by default.
>>>
>>> Example of insecure usage: -drive file=foo.img,index=0,media=cdrom,
>>> where foo.img is raw.  The -hda above is actually sugar for this.
>>>
>>> Equivalent secure usage: add format=raw.
>>>
>>> Note that specifying just the top image's format is not enough, you also
>>> have to specify any backing images' formats.  QCOW2 can optionally store
>>> the backing image format in the image.  The other COW formats can't.
>> Well, they can, with "json:". *cough*
> Point coughingly taken.
>
>>> Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image
>>> with a raw backing file.
>> Yesterday I found out that doesn't seem possible. You apparently can
>> only use VMDK with VMDK backing files.
> I figure you're referring to this code in vmdk_create():
>
>          if (strcmp(bs->drv->format_name, "vmdk")) {
>              bdrv_unref(bs);
>              ret = -EINVAL;
>              goto exit;
>          }

Yes. Of course it does depend on probing, which figures, I guess.

>>                                         Other than that, we only have
>> qcow1 and qed as COW formats which should not be used anyway.
> qemu-doc.texi calls them "old image format", and qemu-img.texi has them
> under "Other", "for compatibility with older QEMU versions".  I guess we
> could do better job telling users they "should not be used anyway".
>
> Even in old stuff retained just for compatibility, we should make an
> effort to plug security holes, make secure usage easy, and guide users
> away from insecure usage.

Of course, I'm just pointing out that it seems to be only qcow1 which is 
fully subject to this. qcow2 is partially, as you're pointing out next.

> Now back to the point I was trying to make in my original message.
>
> Replacement example of insecure usage: -hda bar.qcow2, where bar.qcow2
> is a QCOW2 image with a raw backing file and no backing image format,
> i.e. created without "-o backing_format=".
>
> Then the next paragraph applies:
>
>>> Equivalent secure usage: Beats me.  Maybe there's a funky -drive
>>> backing.whatever to specify the backing image's format.
> See Kevin's reply for equivalent secure usage.
>
>>> With the latest interface blockdev-add, specifying the format is
>>> mandatory.  Secure, but not really suitable for humans.
>>>
>>> Example of secure usage:
>>>
>>>       { "execute": "blockdev-add",
>>>         "arguments": {'options': {
>>>             'driver': 'raw', 'id':'foo',
>>>             'file': { 'driver': 'file', 'filename': 'foo.img' } } } }
>>>
>>> Insecure usage is easy, secure usage is *hard*.  Even for sophisticated
>>> users like libvirt developers.  Evidence: libvirt CVE-2010-2237,
>>> CVE-2010-2238, CVE-2010-2239, and more that didn't get a CVE, like the
>>> recent accidental probing in drive-mirror.
>>>
>>>
>>> = How can we better guard the trust boundary in QEMU? =
>>>
>>> The guest can violate the trust boundary only because
>>>
>>> (a) QEMU supports both raw images and image formats, and
>>>
>>> (b) QEMU guesses image format from raw image contents, and
>>>
>>> (c) given a raw image, guests can change its contents and control a
>>> future QEMU's format guess.
>>>
>>> We can attack one ore more of these three conditions:
>> I'd like to attack more because any of these steps might be carried
>> out in another program which thus either becomes vulnerable itself
>> (which we don't really have to care about, but I'd like to either way)
>> or which makes qemu vulnerable.
>>
>> Having an external program with (a) and (c), this makes qemu
>> vulnerable if we don't try to forbid (b) or at least make it work
>> better. Having an external program with (a) and (b), not doing
>> anything against (c) in qemu makes that external program vulnerable.
>>
>>> (a) Outlaw raw images
>>>
>>> (b) Don't guess format from untrusted image contents
>>>
>>> (c) Prevent "bad" guest writes
>>>
>>> Nobody is seriously suggesting we do (a).  It's clearly too late for
>>> that.  Let's explore the other two in more detail.
>> And thus I prefer to find and implement solutions for *both* (b) and (c).
> Quoting myself: "We can attack one ore more of these three conditions".

Yes, and I referred to that by saying "I'd like to attack more".

>>> == Don't guess format from untrusted image contents ==
>>>
>>> Several variations of the theme.
>>>
>>> Guessing only happens when the user doesn't specify a format, so the
>>> simplest way to avoid it would be requiring users to always specify the
>>> format.
>>>
>>> PRO: Simple, plugs the hole.
>>>
>>> CON: Breaks a lot of existing usage.  Bye -hda, hello extra typing.
>>>
>>> Variation: command line option to opt out of probing completely.
>>>
>>> PRO: Simple, plugs the hole.
>>>
>>> CON: Insecure by default.
>>>
>>> CON: In addition to opting out, you have to update your usage
>>> accordingly.  I guess libvirt would do it anyway, to guard against
>>> accidental probing once and for all.
>>>
>>> Variation: Stefan proposed to make format= mandatory just for -drive.  I
>>> guess he rather meant mandatory for anything but -hda and other selected
>>> convenience interfaces.
>>>
>>> PRO: No extra typing in the cases where it makes the most difference.
>>>
>>> CON: Breaks existing usage in the other cases, extra typing.
>>>
>>> CON: Doesn't fully plug the hole.  Users of convenience interfaces may
>>>       remain insecure out of ignorance.  We could add a warning to guide
>>>       them to more secure usage, but then that warning would annoy users
>>>       who don't care for security (sometimes with reason), and we can't
>>>       have that.  So we silently leave the users who would care if they
>>>       knew insecure.
>>>
>>> I proposed something less radical, namely to keep guessing the image
>>> format, but base the guess on trusted meta-data only: file name and
>>> attributes.
>> You actually want to completely abolish probing? I thought we just
>> wanted to use the filename as a second source of information and warn
>> if the contents and the extension don't seem to match; and in the
>> future, maybe make it an error (but we don't have to discuss that
>> second part now, I think).
> Yes, I propose to ditch it completely, after a suitable grace period.  I
> tried to make that quite clear in my PATCH RFC 2/2.
>
> Here's why.
>
> Now: 1. probe
>       4. open, error out if meta-data is bad
>
> With my proposed patch:
>       1. probe
>       2. guess from trusted meta-data
>       3. warn unless 1 and 2 match
>       4. open, error out if meta-data is bad
>
> Now make the warning an error:
>       1. probe
>       2. guess from trusted meta-data
>       3. error out unless 1 and 2 match
>       4. open, error out if meta-data is bad
>
> I figure the following is equivalent, but simpler:
>
>       2. guess from trusted meta-data
>       4. open, error out if meta-data is bad
>
> because open should detect all the errors the previous step 3 caught.
> If not, things are broken with explicit format=.

You're right, it seems be equivalent. One difference will probably be 
error messages ("Bad signature" vs. "Filename extension and format do 
not match").

The other difference is that you have a problem if you cannot 
distinguish between two formats by extensions, as we've seen already. 
.qcow could mean both qcow1 and qcow2; a similar problem appeared with 
.vhd. Opening the image using all possible formats seems bad to me. 
Better swap 1 and 2: Guess from commonly trusted metadata (i.e. the 
filename extension) and then probe all possible formats.

>>> Block and character special files are raw.  For other
>>> files, find the file name extension, and look up the format claiming it.
>>>
>>> PRO: Plugs the hole.
>> You mean "plugs hole (b)".
> What I (airily) call "the hole" is the scenario I described above: guest
> escaping isolation by subverting qemu-system-*.
>
> (b) is not a hole, it's a condition for "the hole".

Your "the hole" is only one hole. There are more holes. You only 
consider the case where there's only qemu, which is technically fine for 
discussion on qemu-devel, but I'd like to consider having other tools 
beside qemu as well.

You're right, though, it should be "prevents condition (b)".

> I guess what you want to say is "attacking just condition (b) doesn't
> plug some other holes I care about".  That's true.  There are indeed
> other holes.
>
> For instance, guest attacking QEMU's utility programs qemu-img, ...  Or
> guest attacking other host software.  I'm not trying to discount any of
> them.  But tangling all problems up into a hairball won't do us any
> good.

We have two approaches and I think using both will help to plug more 
holes. I'm not saying we should consider every possible hole with every 
host configuration there may be. I'm just saying using only of the 
approaches clearly only plugs "the hole" if there's only qemu.

> So your point that my analysis is narrow is taken.  In my defense, I can
> say that my narrow analysis was difficult enough to write up, and
> probably produced enough text to deter readers.
>
>>> CON: Breaks existing usage when the new guess differs from the old
>>>       guess.  Common usage should be fine:
>>>
>>>       * -hda test.qcow2
>>>
>>>         Fine as long as test.qcow2 is really QCOW2 (as it should!), and
>>>         either specifies a backing format (as it arguably should), or the
>>>         backing file name is sane.
>>>
>>>       * -hda disk.img
>>>
>>>         Fine as long as disk.img is really a disk image (as it should).
>>>
>>>       * -hda /dev/mapper/vg0-virtdisk
>>>
>>>         Fine as long as the logical volume is raw.
>>>
>>>       Less common usage can break:
>>>
>>>       * -hda nbd://localhost
>>>
>>>         Socket provides no clue, so no guess.
>> nbd should be raw. If it isn't, you're most likely doing something
>> wrong. See https://bugzilla.redhat.com/show_bug.cgi?id=1090713 what
>> happens when you're doing it wrong.
> Okay.
>
> My RFC PATCH is too simplistic to exploit that, because it only looks at
> file name extensions.  But "user asked for nbd protocol" is quite
> obviously trusted meta-data.  We could have a less simplistic patch
> putting it to use.  Or adopt the variation below.
>
>>>       Weird usage can conceivably break hard:
>>>
>>>       * -hdd disk.img
>>>
>>>         Breaks hard when disk.img is actually QCOW2, the guest boots
>>>         anyway from another drive, then proceeds to overwrite this one.
>> Then why not continue to do probing and using the extension as a safeguard?
>>
>>> Mitigation: lengthy transition period where we warn "this usage is
>>> insecure, and we'll eventually break it; here's a hint on secure usage".
>>>
>>> CON: We delay plugging the hole one more time.  But at least we no
>>> longer expose our users to it silently.
>>>
>>> Jeff pointed out that we want to retain probing in things like qemu-img
>>> info.
>>>
>>> Richard asked for a way for users to ask for insecure probing, say
>>> format=insecure-probe.  I have no problem with giving users rope when
>>> they ask for it.
>>>
>>> Variation: if file name and attributes are unavailable or provide no
>>> clue, guess raw.  Same PRO and CON as above, only it avoids breaking a
>>> few more cases.  For instance, "-hda nbd://localhost" keeps working as
>>> long as the server serves a raw image.
>> Which it should be.
>>
>>> Smells a bit like too much magic
>>> to me.
>>>
>>> My proposal replaces probing wholesale.  I like that because it results
>>> in simple, predictable guessing.  Here's a hybrid approach: first guess
>>> raw vs. non-raw based on trusted meta-data only, and if non-raw, probe.
>>>
>>> Nothing the guest writes can affect the raw vs. non-raw decision.  Once
>>> an image is raw, only the user can make it non-raw, by changing its name
>>> or attributes.
>>>
>>> Two variations: 1. guess raw without a clue, and 2. guess non-raw then.
>>>
>>> Again, same PRO and CON as above, only it doesn't break when users give
>>> their non-raw images weird names.
>>>
>>> == Prevent "bad" guest writes ==
>>>
>>> Again, several variations, but this time, only the last one is serious,
>>> the others are just for illustration.
>>>
>>> Fail guest writes to those parts of the image that probing may examine
>>> Can fail only writes to the first few sectors (at worst) of raw images.
>>>
>>> PRO: Plugs the hole.
>>>
>>> CON: The virtual hardware is defective.  Breaks common guest software
>>> that writes to the first few sectors, such as boot loaders and
>>> partitioning tools.  Breaks guest software using the whole device, which
>>> isn't common, but certainly not unheard of.
>>>
>>> Variation: fail only writes of patterns that actually can make probing
>>> guess something other than raw.
>>>
>>> PRO: Still plugs the hole.
>> You mean "plugs hole (c)".
> My reply to "plugs hole (b)" applies.
>
>>> CON: Except when you upgrade to a version that recognizes more patterns.
>> Which is better than not plugging hole (c) at all.
>>
>>> CON: The virtual hardware is still defective, but the defects are
>>> minimized.
>> As you pointed out to us it's already defective and I don't think
>> anybody ever noticed accidentally.
> You're right in that probed raw is already defective, with defects
> delayed to the next restart.  Preventing "bad" guest writes changes the
> nature of the defects subtly, as I described above.

Sector 0 is rarely ever written. It's not that people write some 
recognizable sequences there all the time but don't notice because they 
are quickly overwritten again.

If nobody hit the problem accidentally until now, I'm certain that means 
that nobody ever wrote any recognizable sequence there while using qemu 
and raw (which is not too rare a combination).

> This and the previous variation also extends them to non-probed raw.
> The following variations avoid the extension.
>
>>> We can hope that partition tables, boot sectors and such
>>> won't match the patterns, so common guest software hopefully works.
>> It's worked in the past, that's good enough for me.
>>
>>> Guest software using the whole device still breaks, only now it breaks
>>> later rather than sooner.
>>>
>>> Variation: fail writes only on *probed* raw images.
>>>
>>> CON: Doesn't fully plug the hole: mixing probed usage (user doesn't
>>> specify format) with non-probed usage (user specifies format) remains
>>> insecure.  The guest's write succeeds in non-probed usage, and the guest
>>> escapes isolation in the next probed usage.
>>>
>>> CON: The virtual hardware is still defective, but it now comes with a
>>> "defective on/off" switch, factory default "defective on".  We could add
>>> a warning to guide users to switch defective off but then that warning
>>> would annoy people who don't care to switch it off (sometimes with
>>> reason), and we can't have that.  So we leave users who would care if
>>> they knew in the dark.
>>>
>>> The two variations can be combined.  This is Kevin's proposal.
>>>
>>> CON: Doesn't fully plug the hole: union of both variations' flaws.
>>>
>>> CON: The virtual hardware is still defective: interesection of both
>>> variations' defects.
>>>
>>>
>>> = Conclusion =
>>>
>>> This is *my* conclusion.  Yours may be different, and that's okay.  It's
>>> business, not personal ;)
>>>
>>> Secure usage should not be hard.
>>>
>>> If we permit insecure usage for convenience or compatibility, we should
>>> warn the user, unless he clearly asked for it.  Even if that warning
>>> annoys Kevin and Max.
>> A warning does not annoy me as long as I know what it means.
> Good :)
>
>>> If you want to suppress it with configure
>>> --reckless or something, no objections.
>>>
>>> Same for defective virtual hardware.
>>>
>>> Kevin's proposal to prevent "bad" guest writes has relatively small
>>> compatibility issues.  It improves things from "insecure by default" to
>>> "slightly defective by default" as long as you use raw images either
>>> always probed or always non-probed.  It doesn't help at all when you
>>> alternate probed and non-probed usage.  Improvement of sorts, but it
>>> still fails my "secure usage should not be hard" requirement, and that
>>> requirement is a hard one for me.
>>>
>>> My proposal to ditch image contents probing entirely has more serious
>>> compatibility issues.  In particular, we'd have to forgo sugared
>>> convenience syntax for a number of less common things.  It definitely
>>> needs a grace period where all usage we're going to break warns.  On the
>>> up side, it will actually be secure by default when it's done.
>>>
>>> If this is not acceptable, my second choice is actually the command line
>>> option to opt out of probing completely.  This doesn't address "insecure
>>> by default" (sadly), but it does at least satisfy my "secure usage
>>> should not be hard" requirement.
>>>
>>> If we should adopt Kevin's proposal against my objections, then I very
>>> badly want the opt out option on top of it, opting out of both probing
>>> and "bad" write prevention.
>>>
>>> Thanks for hearing me out.
>> My conclusion: Don't ditch probing. It increases entropy, why would
>> you ditch probing? Just combine it with the extension and if both
>> don't seem to match, that's an error.
> How does probing add value?
>
> Compare
>
>       1. probe
>       2. guess from trusted meta-data
>       3. error out unless 1 and 2 match
>       4. open, error out if meta-data is bad
>
> to just
>
>       2. guess from trusted meta-data
>       4. open, error out if meta-data is bad
>
> Let P be the driver recommended by probe, and G be the driver
> recommended by guess.
>
> If P == G, same result: we open with the same driver.
>
> Else, if open with G fails, equivalent result: error out in step 3
> vs. error out in step 4.
>
> Else, we have an odd case: one driver's probe accepts (P's), yet a
> different driver's open succeeds (G's).
>
>      If G's probe rejects: is this a bug?  Shouldn't open always fail
>      when probe rejects?
>
>      If G's probe accepts, then probing chose P over G.  Maybe it
>      shouldn't have.  Or maybe the probing functions are imprecise.
>      Anyway, keeping probing around makes this an error.  Should it be
>      one?
>
> Am I missing something?

No, see my reply above. I failed to consider that opening an image 
basically is advanced probing.

The only thing I really see which could be missing is the problem of 
having ambiguous filename extensions, and that problem can easily be 
solved by keeping probing.

>> Also, holes (b) and (c) are two different holes. We should fix
>> both. We should fix (b) so qemu isn't vulnerable and we should fix (c)
>> so qemu doesn't make other programs which do probe vulnerable.
> Adjusting terminology, but hopefully not your intent: "the hole" isn't
> the only hole that matters.  The conditions enable other holes.
> Therefore, we should attack both (b) and (c).  Attacking (a) is not
> practical.

Yes. that's what I meant. There are various holes and preventing only 
one of the conditions (b) and (c) only plugs one (or a subset) of them, 
and I can easily see unplugged holes which can be fixed by preventing both.

> Points taken, except I think we could attack (a) if we really wanted.

Of course, but for that we'd either need some flat qcow2 mode or better 
support for an image format that does have such a flat mode.

Max

>> So, for fixing (b): Just use the extensions as a safeguard and issue a
>> warning for now. We can discuss about making it an error later.
>>
>> And for fixing (c): As you pointed out, if guests wrote some
>> probe-matching pattern in the past, it would break qemu (which is what
>> we're trying to fix). Since noone ever said that some guest did that
>> by accident, I think we can safely assume that prohibiting such writes
>> will not hurt anyone in the future either; at least there are no
>> compatibility issues, so if someone notices a problem, he/she can just
>> explicitly specify the format and it'll work (which you should be
>> doing anyway, as we all know, though many of us, including me, don't
>> want to do it all the time).
> Thanks for stating your conclusion concisely, it's really helpful.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-06 12:43     ` Markus Armbruster
@ 2014-11-06 13:02       ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2014-11-06 13:02 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Jeff Cody, qemu-devel, Stefan Hajnoczi, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1197 bytes --]

On 11/06/2014 01:43 PM, Markus Armbruster wrote:

>> Actually, qed requires the backing format to be recorded (it is
>> non-optional) and is therefore immune to probing problems of backing
>> files.  That's one thing it got right.
> 
> If I read the code correctly:
> 
> QED has a feature bit QED_F_BACKING_FORMAT_NO_PROBE.
> 
> It is changed when you set the backing file format.  Setting format to
> "raw" sets the flag, anything else (including nothing) clears the flag.
> The actual non-raw format is not recorded.
> 
> Creating an image counts as setting the backing file format.
> 
> If the flag is set, open uses "raw"for the backing file (no probing).
> 
> If it's unset, open probes, and the probe may yield "raw".

Eww.  Well, looks like a deficiency in the qed spec, and maybe all that
is needed to plug it is:

If the probe yields "raw", refuse to open the backing file (or put
another way, either the probe MUST find a non-raw file, or the user has
a bug that they forgot to set the raw bit so we refuse to open the file
to point out their bug).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-06 12:26   ` Markus Armbruster
  2014-11-06 12:53     ` Max Reitz
@ 2014-11-06 13:02     ` Kevin Wolf
  2014-11-07 14:50       ` Markus Armbruster
  1 sibling, 1 reply; 38+ messages in thread
From: Kevin Wolf @ 2014-11-06 13:02 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Jeff Cody, qemu-devel, Stefan Hajnoczi, Max Reitz

Am 06.11.2014 um 13:26 hat Markus Armbruster geschrieben:
> >> * Reuse the image *without* specifying the raw format.  QEMU guesses the
> >>    format based on untrusted image contents.  Now QEMU guesses a format
> >>    chosen by the guest, with meta-data chosen by the guest.  By
> >>    controlling image meta-data, the malicious guest can access arbitrary
> >>    files as QEMU, enlarge its storage, and more.  A non-malicious guest
> >>    can accidentally DoS itself, by writing a pattern probing recognizes.
> >
> > Thank you for bringing that to my attention. This means that I'm even
> > more in favor of using Kevin's patches because in fact they don't
> > break anything.
> 
> They break things differently.  The difference may or may not matter.
> 
> Example: innocent guest writes a recognized pattern.
> 
>   Now: next restart fails, guest DoSed itself until host operator gets
>   around to adding format=raw to the configuration.  Consequence:
>   downtime (probably lengthy), but no data corruption.

Depends.

Another possible outcome is that the guest doesn't DoS itself, but
writes a valid image header. You've argued before that a guest could be
keeping a qcow2 image on a whole disk for whatever odd reason. In this
case, the qemu restart will present the nested image instead of the
top-level one to the guest, which can probably be labelled corruption.

>   With Kevin's patch: write fails, guest may or may not handle the
>   failure gracefully.  Consequences can range from "guest complains to
>   its logs (who cares)" via "guest stops whatever it's doing and refuses
>   to continue until its hardware gets fixed (downtime as above)" to
>   "data corruption".
> 
> Example: innocent guest first writes a recognized pattern, then
> overwrites it with a non-recognized pattern.
> 
>   Now: works.
> 
>   With Kevin's patch: as above.
> 
> Again, I'm not claiming the differences are serious in practice, only
> that they exist.

Yes, the failure scenario is different. The point still stands that if
it were relevant in practice, we would likely have heard of it before.

> > You actually want to completely abolish probing? I thought we just
> > wanted to use the filename as a second source of information and warn
> > if the contents and the extension don't seem to match; and in the
> > future, maybe make it an error (but we don't have to discuss that
> > second part now, I think).
> 
> Yes, I propose to ditch it completely, after a suitable grace period.  I
> tried to make that quite clear in my PATCH RFC 2/2.
> 
> Here's why.
> 
> Now: 1. probe
>      4. open, error out if meta-data is bad
> 
> With my proposed patch:
>      1. probe
>      2. guess from trusted meta-data
>      3. warn unless 1 and 2 match
>      4. open, error out if meta-data is bad
> 
> Now make the warning an error:
>      1. probe
>      2. guess from trusted meta-data
>      3. error out unless 1 and 2 match
>      4. open, error out if meta-data is bad
> 
> I figure the following is equivalent, but simpler:
> 
>      2. guess from trusted meta-data
>      4. open, error out if meta-data is bad
> 
> because open should detect all the errors the previous step 3 caught.
> If not, things are broken with explicit format=.

Not entirely true, see below.

> > My conclusion: Don't ditch probing. It increases entropy, why would
> > you ditch probing? Just combine it with the extension and if both
> > don't seem to match, that's an error.
> 
> How does probing add value?
> 
> Compare
> 
>      1. probe
>      2. guess from trusted meta-data
>      3. error out unless 1 and 2 match
>      4. open, error out if meta-data is bad
> 
> to just
> 
>      2. guess from trusted meta-data
>      4. open, error out if meta-data is bad
> 
> Let P be the driver recommended by probe, and G be the driver
> recommended by guess.

P = qcow2, G = raw

> If P == G, same result: we open with the same driver.

No, they are not the same.

> Else, if open with G fails, equivalent result: error out in step 3
> vs. error out in step 4.

No, raw accepts the image.

> Else, we have an odd case: one driver's probe accepts (P's), yet a
> different driver's open succeeds (G's).
> 
>     If G's probe rejects: is this a bug?  Shouldn't open always fail
>     when probe rejects?

No, raw's probe doesn't reject, it just has a very low score.

>     If G's probe accepts, then probing chose P over G.  Maybe it
>     shouldn't have.  Or maybe the probing functions are imprecise.
>     Anyway, keeping probing around makes this an error.  Should it be
>     one?

Yes, raw being the fallback for everything is imprecise. It's the only
way we have for probing raw.

> Am I missing something?

This is the safety measure that was missing in your proposal, against
corruption caused by a qcow2 image stored in foo.img that is now
unintentionally opened as raw.

Same thing probably for qcow2 stored on LVs etc.

Kevin

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-05 15:24 ` Dr. David Alan Gilbert
@ 2014-11-06 13:04   ` Markus Armbruster
  0 siblings, 0 replies; 38+ messages in thread
From: Markus Armbruster @ 2014-11-06 13:04 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Jeff Cody, qemu-devel, Stefan Hajnoczi, Max Reitz

"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:

> * Markus Armbruster (armbru@redhat.com) wrote:
>> I'll try to explain all solutions fairly.  Isn't easy when you're as
>> biased towards one of them as I am.  Please bear with me.
>> 
>> 
>> = The trust boundary between image contents and meta-data =
>> 
>> A disk image consists of image contents and meta-data.
>> 
>> Example: all of a raw image's contents is image contents.  Leaves just
>> file name and attributes for meta-data.
>> 
>> Example: QCOW2 meta-data includes header, header extensions, L1 table,
>> L2 tables, ...  The meta-data defines where in the image the actual
>> contents is stored.
>> 
>> A guest can access the image contents, not the meta-data.
>> 
>> Image contents you've let an untrusted guest write is untrusted.
>> 
>> Therefore, there's a trust boundary between image contents and
>> meta-data.  QEMU has to trust image meta-data, but shouldn't trust image
>> contents.  The exact location of the trust boundary depends on the image
>> format.
>
> I'm not sure of the line:
> 'QEMU has to trust image meta-data'

Quoting myself: by configuring QEMU to use an image, the user instructs
QEMU to trust the image's meta-data.

> I think there are different levels of trust and people will be more
> prepared to accept levels of pain at the commandline to avoid different
> types of problem.
>
> A problem that could cause qemu to access arbitrary other files on the
> host (as backing files for example) is obviously the worst; especially
> if things like qemu-img and other analysis type stuff could trip it.
>
> Stuff that only allows a guest to misuse it's own block storage is bad;
> but it's nowhere near as bad as being able to walk around the host.

Yes, sensible, informed users will weigh risk against pain.  They may
decide that avoiding certain risks isn't worth the pain for them.

Reckless or under-informed users will go with whatever causes the least
pain.

Our job is to reduce the pain.  Secure usage should not be painful.

That way, sensible, informed users get to take less pain, and the other
users get hopefully exposed to less risk.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-05 11:01 ` Kevin Wolf
@ 2014-11-06 13:57   ` Markus Armbruster
  2014-11-06 14:14     ` Eric Blake
                       ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Markus Armbruster @ 2014-11-06 13:57 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Jeff Cody, qemu-devel, Stefan Hajnoczi, Max Reitz

Kevin Wolf <kwolf@redhat.com> writes:

> Am 04.11.2014 um 19:45 hat Markus Armbruster geschrieben:
>> I'll try to explain all solutions fairly.  Isn't easy when you're as
>> biased towards one of them as I am.  Please bear with me.
>> 
>> 
>> = The trust boundary between image contents and meta-data =
>> 
>> A disk image consists of image contents and meta-data.
>> 
>> Example: all of a raw image's contents is image contents.  Leaves just
>> file name and attributes for meta-data.
>
> Better: Leaves only protocol-specific metadata (e.g. file name and
> attributes for raw-posix).

Can you give examples for other protocols?

>> Example: QCOW2 meta-data includes header, header extensions, L1 table,
>> L2 tables, ...  The meta-data defines where in the image the actual
>> contents is stored.
>> 
>> A guest can access the image contents, not the meta-data.
>> 
>> Image contents you've let an untrusted guest write is untrusted.
>> 
>> Therefore, there's a trust boundary between image contents and
>> meta-data.  QEMU has to trust image meta-data, but shouldn't trust image
>> contents.  The exact location of the trust boundary depends on the image
>> format.
>
> Trust metadata to a certain degree - the block layer audit was started
> because we noticed that it might not be all that trustworthy in
> practice. Different problem, though, it just shows that there are hardly
> any clear borders between "always completely trusted" and "always
> completely untrusted".

Yes.  Similar point made by Dave, I think.

Say I receive a QCOW2 image from an untrusted source.  Things I'd like
to be able to do securely with it:

* Examine it with qemu-img info.

* Open it with BDRV_O_NO_BACKING.

* If the backing file is innocent, open it without BDRV_O_NO_BACKING.

Assuming I do all this with trusted tools, of course.

However, here I talk about something else: the trust boundary between
image contents and meta-data.

>> = How we instruct QEMU what to trust =
>> 
>> By configuring QEMU to use an image, the user instructs QEMU to trust
>> the image's meta-data.
>> 
>> When the user's configuration specifies the image format explicitly, the
>> trust boundary is clear.
>> 
>> Else, the trust boundary is ambigous when more than one format is
>> possible.
>> 
>> QEMU resolves this ambiguity by picking the first format with the
>> highest "score".  Raw format is always possible, and always has the
>> lowest score.
>
> You used the term "untrusted guest" before. Are there any trusted guests,
> or should we assume that guests are untrusted by definition?

I'm trying to allow for users saying "guest subverting this trust
boundary is not an issue for me, now get off my lawn!"

> If you use virtualisation for isolation, then the answer is probably
> that guests are always untrusted. Other users may know exactly what
> their guest is doing and are using qemu for other reasons. The former
> would probably want to disable probing completely, the latter don't care
> about it and prefer convenience.

Yes.

The third group is unsophisticated users blissfully unaware of the
problem.

> My guess is that the share of those with trusted guests is higher among
> direct qemu users than libvirt users, but it's just that, a guess. It
> also doesn't mean that they are the majority of direct qemu users (they
> might be, but I honestly don't know).

Me neither.

If secure usage was easy, it wouldn't really matter.

> If there are trusted and untrusted guests, does this section need some
> thoughts about instructing qemu whether to trust the guest or not?

I don't know.

>> = How this lets the guest escape isolation =
>> 
>> Unfortunately, this lets the guest shift the trust boundary and escape
>> isolation, as follows:
>> 
>> * Expose a raw image to the guest (whether you specify the format=raw or
>>   let QEMU guess it doesn't matter).  The complete contents becomes
>>   untrusted.
>> 
>> * Reuse the image *without* specifying the raw format.  QEMU guesses the
>>   format based on untrusted image contents.  Now QEMU guesses a format
>>   chosen by the guest, with meta-data chosen by the guest.  By
>>   controlling image meta-data, the malicious guest can access arbitrary
>>   files as QEMU, enlarge its storage, and more.  A non-malicious guest
>>   can accidentally DoS itself, by writing a pattern probing recognizes.
>> 
>> This is CVE-2008-2004.
>> 
>> 
>> = Aside: other trust boundaries =
>> 
>> Of course, this is not the only trust boundary that matters.  For
>> instance, there's normally one between your host and somebody else's
>> computers.  Telling QEMU to trust meta-data of some image you got "from
>> the internet" violates it.  There's nothing QEMU can do about that.
>
> Okay, this addresses what I commented above.

Good :)

>> = Insecure usage is easy, secure usage is hard =
>> 
>> The oldest stratum of user interfaces doesn't let you specify the image
>> format.  Use of raw images with these is insecure by design.  These
>> interfaces are still recommended for human users.
>> 
>> Example of insecure usage: -hda foo.img, where foo.img is raw.
>> 
>> With the next generation of interfaces, specifying the image format is
>> optional.  Use of raw images with these is insecure by default.
>> 
>> Example of insecure usage: -drive file=foo.img,index=0,media=cdrom,
>> where foo.img is raw.  The -hda above is actually sugar for this.
>> 
>> Equivalent secure usage: add format=raw.
>> 
>> Note that specifying just the top image's format is not enough, you also
>> have to specify any backing images' formats.  QCOW2 can optionally store
>> the backing image format in the image.  The other COW formats can't.
>> 
>> Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image
>> with a raw backing file.
>
> Usually this is mitigated by the fact that backing files are read-only.
> Trouble is starting when you use things like commit.

Yes.

>> Equivalent secure usage: Beats me.  Maybe there's a funky -drive
>> backing.whatever to specify the backing image's format.
>
> Yes, you can override the backing file driver (backing.driver=raw should
> do the trick). Not really user-friendly, especially with long backing
> file chains, but it happens to be there.
>
> And of course, libvirt should be using it for non-qcow2 or qcow2 without
> the backing format header extension (but doesn't yet).

I'm glad it's there.  Too bad libvirt doesn't use it, yet.  Supports my
point that secure usage is too hard now.

>> With the latest interface blockdev-add, specifying the format is
>> mandatory.  Secure, but not really suitable for humans.
>> 
>> Example of secure usage:
>> 
>>     { "execute": "blockdev-add",
>>       "arguments": {'options': {
>>           'driver': 'raw', 'id':'foo',
>>           'file': { 'driver': 'file', 'filename': 'foo.img' } } } }
>
> Backing file probing can still happen with blockdev-add. As for secure
> usage, -drive should be equivalent with blockdev-add.

More support for my point that secure usage is too hard now: even I
can't do it.

>> Insecure usage is easy, secure usage is *hard*.  Even for sophisticated
>> users like libvirt developers.  Evidence: libvirt CVE-2010-2237,
>> CVE-2010-2238, CVE-2010-2239, and more that didn't get a CVE, like the
>> recent accidental probing in drive-mirror.
>> 
>> 
>> = How can we better guard the trust boundary in QEMU? =
>> 
>> The guest can violate the trust boundary only because
>> 
>> (a) QEMU supports both raw images and image formats, and
>> 
>> (b) QEMU guesses image format from raw image contents, and
>> 
>> (c) given a raw image, guests can change its contents and control a
>> future QEMU's format guess.
>> 
>> We can attack one ore more of these three conditions:
>> 
>> (a) Outlaw raw images
>> 
>> (b) Don't guess format from untrusted image contents
>> 
>> (c) Prevent "bad" guest writes
>> 
>> Nobody is seriously suggesting we do (a).  It's clearly too late for
>> that.  Let's explore the other two in more detail.
>> 
>> == Don't guess format from untrusted image contents ==
>> 
>> Several variations of the theme.
>> 
>> Guessing only happens when the user doesn't specify a format, so the
>> simplest way to avoid it would be requiring users to always specify the
>> format.
>> 
>> PRO: Simple, plugs the hole.
>> 
>> CON: Breaks a lot of existing usage.  Bye -hda, hello extra typing.
>
> Not only extra typing, but affects existing scripts, too.

Yes, that's "breaks a lot of existing usage".

>> Variation: command line option to opt out of probing completely.
>> 
>> PRO: Simple, plugs the hole.
>> 
>> CON: Insecure by default.
>> 
>> CON: In addition to opting out, you have to update your usage
>> accordingly.  I guess libvirt would do it anyway, to guard against
>> accidental probing once and for all.
>
> This seems to come back to my comment above whether we need to tell qemu
> whether the guest is trusted or not. It's an interesting thought that we
> haven't discussed enough yet.

Yes.

>> Variation: Stefan proposed to make format= mandatory just for -drive.  I
>> guess he rather meant mandatory for anything but -hda and other selected
>> convenience interfaces.
>> 
>> PRO: No extra typing in the cases where it makes the most difference.
>> 
>> CON: Breaks existing usage in the other cases, extra typing.
>> 
>> CON: Doesn't fully plug the hole.  Users of convenience interfaces may
>>     remain insecure out of ignorance.  We could add a warning to guide
>>     them to more secure usage, but then that warning would annoy users
>>     who don't care for security (sometimes with reason), and we can't
>>     have that.  So we silently leave the users who would care if they
>>     knew insecure.
>
> I do think that we should allow those with trusted guests to use a
> convenient interface, but their being annoyed by a warning shouldn't be
> a reason not to print warnings. Ignoring the warning isn't any
> considerable extra work.

I stand corrected on warnings.

CON: Doesn't fully plug the hole.  Users of convenience interfaces may
    remain insecure out of ignorance.  We could add a warning to guide
    them to more secure usage.

>> I proposed something less radical, namely to keep guessing the image
>> format, but base the guess on trusted meta-data only: file name and
>> attributes.  Block and character special files are raw.  For other
>> files, find the file name extension, and look up the format claiming it.
>> 
>> PRO: Plugs the hole.
>> 
>> CON: Breaks existing usage when the new guess differs from the old
>>     guess.  Common usage should be fine:
>> 
>>     * -hda test.qcow2
>> 
>>       Fine as long as test.qcow2 is really QCOW2 (as it should!), and
>>       either specifies a backing format (as it arguably should), or the
>>       backing file name is sane.
>> 
>>     * -hda disk.img
>> 
>>       Fine as long as disk.img is really a disk image (as it should).
>
> .img is not as clear, I've seen people using it for other formats. It's
> still a disk image, but not a raw one.

Is this usage common?

>>     * -hda /dev/mapper/vg0-virtdisk
>> 
>>       Fine as long as the logical volume is raw.
>> 
>>     Less common usage can break:
>> 
>>     * -hda nbd://localhost
>> 
>>       Socket provides no clue, so no guess.
>> 
>>     Weird usage can conceivably break hard:
>> 
>>     * -hdd disk.img
>> 
>>       Breaks hard when disk.img is actually QCOW2, the guest boots
>>       anyway from another drive, then proceeds to overwrite this one.
>> 
>> Mitigation: lengthy transition period where we warn "this usage is
>> insecure, and we'll eventually break it; here's a hint on secure usage".
>> 
>> CON: We delay plugging the hole one more time.  But at least we no
>> longer expose our users to it silently.
>
> CON: Relies on metadata that is protocol-specific. Each protocol that
>      should support probing needs extra code. Essentially means that
>      probing will be disabled on anything except raw-posix (and if we're
>      lucky enough that someone pays attention during review, raw-win32)

Terminology: I use "probing" and "guessing from trusted meta-data".  The
former is for probing raw image contents.  The latter may only examine
trusted meta-data.

I suspect you mean "each protocol that should support guessing from
trusted meta-data needs extra code".  Do you?

>> Jeff pointed out that we want to retain probing in things like qemu-img
>> info.
>> 
>> Richard asked for a way for users to ask for insecure probing, say
>> format=insecure-probe.  I have no problem with giving users rope when
>> they ask for it.
>
> Should actually fall out naturally from my bdrv_open() refactorings.
> I'll need at least "probe-format" and "probe-protocol" for internal use,
> and there's nothing that stops us from allowing it in the schema.
>
>> Variation: if file name and attributes are unavailable or provide no
>> clue, guess raw.  Same PRO and CON as above, only it avoids breaking a
>> few more cases.  For instance, "-hda nbd://localhost" keeps working as
>> long as the server serves a raw image.  Smells a bit like too much magic
>> to me.
>
> Can break in the same way as disk.img being qcow2. If we can tolerate
> one instance of the problem (can we?), we can probably tolerate the
> other one as well.
>
>> My proposal replaces probing wholesale.  I like that because it results
>> in simple, predictable guessing.  Here's a hybrid approach: first guess
>> raw vs. non-raw based on trusted meta-data only, and if non-raw, probe.
>> 
>> Nothing the guest writes can affect the raw vs. non-raw decision.  Once
>> an image is raw, only the user can make it non-raw, by changing its name
>> or attributes.
>> 
>> Two variations: 1. guess raw without a clue, and 2. guess non-raw then.
>> 
>> Again, same PRO and CON as above, only it doesn't break when users give
>> their non-raw images weird names.
>> 
>> == Prevent "bad" guest writes ==
>> 
>> Again, several variations, but this time, only the last one is serious,
>> the others are just for illustration.
>> 
>> Fail guest writes to those parts of the image that probing may examine
>> Can fail only writes to the first few sectors (at worst) of raw images.
>> 
>> PRO: Plugs the hole.
>> 
>> CON: The virtual hardware is defective.  Breaks common guest software
>> that writes to the first few sectors, such as boot loaders and
>> partitioning tools.  Breaks guest software using the whole device, which
>> isn't common, but certainly not unheard of.
>> 
>> Variation: fail only writes of patterns that actually can make probing
>> guess something other than raw.
>> 
>> PRO: Still plugs the hole.
>> 
>> CON: Except when you upgrade to a version that recognizes more patterns.
>> 
>> CON: The virtual hardware is still defective, but the defects are
>> minimized.  We can hope that partition tables, boot sectors and such
>> won't match the patterns, so common guest software hopefully works.
>> Guest software using the whole device still breaks, only now it breaks
>> later rather than sooner.
>> 
>> Variation: fail writes only on *probed* raw images.
>> 
>> CON: Doesn't fully plug the hole: mixing probed usage (user doesn't
>> specify format) with non-probed usage (user specifies format) remains
>> insecure.  The guest's write succeeds in non-probed usage, and the guest
>> escapes isolation in the next probed usage.
>> 
>> CON: The virtual hardware is still defective, but it now comes with a
>> "defective on/off" switch, factory default "defective on".  We could add
>> a warning to guide users to switch defective off but then that warning
>> would annoy people who don't care to switch it off (sometimes with
>> reason), and we can't have that.  So we leave users who would care if
>> they knew in the dark.

Replace by

CON: The virtual hardware is still defective, but it now comes with a
"defective on/off" switch, factory default "defective on".  We could add
a warning to guide users to switch defective off.

>> The two variations can be combined.  This is Kevin's proposal.
>> 
>> CON: Doesn't fully plug the hole: union of both variations' flaws.
>> 
>> CON: The virtual hardware is still defective: interesection of both
>> variations' defects.
>
> I like how you took care to avoid finding any PROs. :-)

Come on, this line 281 of 327, cut me some slack :)

> I'll leave commenting on this section to others for now. I feel I have
> already said enough about it in the other threads, and defending it here
> at this point wouldn't help the discussion.

The purpose of this document is to summarize our thoughts.  Need yours
to achieve it.  Can you give me your concise PROs?

>> = Conclusion =
>> 
>> This is *my* conclusion.  Yours may be different, and that's okay.  It's
>> business, not personal ;)
>> 
>> Secure usage should not be hard.
>> 
>> If we permit insecure usage for convenience or compatibility, we should
>> warn the user, unless he clearly asked for it.  Even if that warning
>> annoys Kevin and Max.
>
> It doesn't. I couldn't care less about a warning on my stderr.

Good.

> (Practically, though, it causes a bit of work because qemu-iotests runs
> into many of these warnings, changing the output. I have a patch that
> adds a -f to qemu-io, now I need to teach qemu-iotests to make use of
> it.)
>
>> If you want to suppress it with configure
>> --reckless or something, no objections.
>> 
>> Same for defective virtual hardware.
>> 
>> Kevin's proposal to prevent "bad" guest writes has relatively small
>> compatibility issues.  It improves things from "insecure by default" to
>> "slightly defective by default" as long as you use raw images either
>> always probed or always non-probed.  It doesn't help at all when you
>> alternate probed and non-probed usage.
>
> Why would you do that?

Carelessness?

Why should it be dangerous?

> And yes, it also doesn't help when you accidentally type format=qcow2
> instead of format=raw. When you have images of both types, things like
> this happen with manual typing.

I consider mixing probed and non-probed usage a more plausible bad habit
than accidental use of format=qcow2, because the former works just fine,
but the latter fails (unless your guest has "helpfully" written a QCOW2
header).

>> Improvement of sorts, but it
>> still fails my "secure usage should not be hard" requirement, and that
>> requirement is a hard one for me.
>> 
>> My proposal to ditch image contents probing entirely has more serious
>> compatibility issues.  In particular, we'd have to forgo sugared
>> convenience syntax for a number of less common things.  It definitely
>> needs a grace period where all usage we're going to break warns.  On the
>> up side, it will actually be secure by default when it's done.
>> 
>> If this is not acceptable, my second choice is actually the command line
>> option to opt out of probing completely.  This doesn't address "insecure
>> by default" (sadly), but it does at least satisfy my "secure usage
>> should not be hard" requirement.
>> 
>> If we should adopt Kevin's proposal against my objections, then I very
>> badly want the opt out option on top of it, opting out of both probing
>> and "bad" write prevention.
>
> This sounds quite reasonable. And opting out of probing automatically
> opts out of "bad write prevention" because it only happens on probed
> images.
>
> To take this though a bit further: Do we have similar problems in other
> areas in qemu? We could introduce a -[no-]strict option that forces
> the user to be explicit in all places and return errors if they aren't.
> I could even live with a configure switch that makes -strict the
> default, as long as that configure switch remains disabled by default.

Open-ended question.  Good one, though.  Ideas, anyone?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-06 13:57   ` Markus Armbruster
@ 2014-11-06 14:14     ` Eric Blake
  2014-11-06 15:52       ` Jeff Cody
  2014-11-06 14:35     ` Jeff Cody
  2014-11-06 15:01     ` Kevin Wolf
  2 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2014-11-06 14:14 UTC (permalink / raw)
  To: Markus Armbruster, Kevin Wolf
  Cc: Jeff Cody, qemu-devel, Stefan Hajnoczi, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 3525 bytes --]

On 11/06/2014 02:57 PM, Markus Armbruster wrote:

>> Yes, you can override the backing file driver (backing.driver=raw should
>> do the trick). Not really user-friendly, especially with long backing
>> file chains, but it happens to be there.
>>
>> And of course, libvirt should be using it for non-qcow2 or qcow2 without
>> the backing format header extension (but doesn't yet).
> 
> I'm glad it's there.  Too bad libvirt doesn't use it, yet.  Supports my
> point that secure usage is too hard now.

Indeed, libvirt is still lacking on enforcing the backing type that it
probed vs. letting qemu re-probe a (possibly different) backing type.
Were libvirt to actually enforce this (that is, libvirt's default
out-of-the-box policy is to avoid all probes, and treat anything without
a type as raw) means that a user that forgets to use -obacking_fmt and
creates a chain base<-mid<-top will have the following conflicting view:

libvirt: mid[raw]<-top[qcow2]
qemu: base[qcow2]<-mid[qcow2]<-top[qcow2]

Right now, if the chain is only 2 deep, qemu happily boots the guest
with qcow2 format (in spite of libvirt treating mid as raw); but when
the chain is 3 deep, because libvirt failed to give SELinux permissions
to base, then qemu fails to open base, and fails to boot, but the
failure message is very cryptic.

On the other hand, if libvirt were to ENFORCE its view that mid is raw,
by passing appropriate drive options, then qemu would always boot, but
now the top image would be visibly corrupted (treating a qcow2 file as
raw leads to MUCH different data visible in the guest) and the guest
will likely fail to boot completely, but with no error message from qemu
(rather more likely that things just hang in a 100% cpu loop in the
guest).  Although the existing error message is cryptic, this new
behavior of enforcing a probed image to be raw feels like it will be
even more user-unfriendly.

At any rate, I've certainly been working on getting libvirt to output
the ENTIRE backing chain that it has determined, rather than its old
behavior of keeping that information in memory only; this at least helps
libvirt developers diagnose bug reports ("show me what libvirt thought
your backing chain was, then go fix your XML to tell libvirt the truth
and your problem will go away, if you didn't corrupt the backing files
in the meantime with something like a 'commit' operation").

[I mentioned libvirt's default policy is to forbid probing and treat
untyped disks as raw; but both of those knobs can be tweaked, to either
allow probing, or treat the default type as qcow2, or both]

>>
>> .img is not as clear, I've seen people using it for other formats. It's
>> still a disk image, but not a raw one.
> 
> Is this usage common?

At least on my laptop - yes.  I have several qcow2 files with an image
suffix of .img (perhaps because I was lazy when I created them, or
sometimes because I was quickly hacking a script to add a -fqcow2 to a
qemu-img line without changing the file name, because changing the name
would have a ripple effect on the rest of the script).  But I don't know
if my usage is typical, and I also don't mind adjusting my naming
conventions to silence a warning if qemu starts getting picky about
confusing name-vs-contents issues.  And if I consistently use
format=qcow2, I shouldn't be penalized with the warning, right?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-06 13:57   ` Markus Armbruster
  2014-11-06 14:14     ` Eric Blake
@ 2014-11-06 14:35     ` Jeff Cody
  2014-11-06 15:01     ` Kevin Wolf
  2 siblings, 0 replies; 38+ messages in thread
From: Jeff Cody @ 2014-11-06 14:35 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

On Thu, Nov 06, 2014 at 02:57:07PM +0100, Markus Armbruster wrote:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 04.11.2014 um 19:45 hat Markus Armbruster geschrieben:

[...]

> >> I proposed something less radical, namely to keep guessing the image
> >> format, but base the guess on trusted meta-data only: file name and
> >> attributes.  Block and character special files are raw.  For other
> >> files, find the file name extension, and look up the format claiming it.
> >> 
> >> PRO: Plugs the hole.
> >> 
> >> CON: Breaks existing usage when the new guess differs from the old
> >>     guess.  Common usage should be fine:
> >> 
> >>     * -hda test.qcow2
> >> 
> >>       Fine as long as test.qcow2 is really QCOW2 (as it should!), and
> >>       either specifies a backing format (as it arguably should), or the
> >>       backing file name is sane.
> >> 
> >>     * -hda disk.img
> >> 
> >>       Fine as long as disk.img is really a disk image (as it should).
> >
> > .img is not as clear, I've seen people using it for other formats. It's
> > still a disk image, but not a raw one.
> 
> Is this usage common?
> 

More anecdotal data: Like Eric, I have non-raw images using a .img
extension.

Also, ".img" as a generic naming convention is useful enough that some
of our own qemu iotests use it, regardless of format (mainly in block
job python tests)

[...]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-06 12:53     ` Max Reitz
@ 2014-11-06 14:56       ` Jeff Cody
  2014-11-06 15:00         ` Max Reitz
  2014-11-07  9:57       ` Markus Armbruster
  1 sibling, 1 reply; 38+ messages in thread
From: Jeff Cody @ 2014-11-06 14:56 UTC (permalink / raw)
  To: Max Reitz; +Cc: Kevin Wolf, Markus Armbruster, Stefan Hajnoczi, qemu-devel

On Thu, Nov 06, 2014 at 01:53:35PM +0100, Max Reitz wrote:
> On 2014-11-06 at 13:26, Markus Armbruster wrote:
> >Max Reitz <mreitz@redhat.com> writes:
> >
> >>On 2014-11-04 at 19:45, Markus Armbruster wrote:
> >>>I'll try to explain all solutions fairly.  Isn't easy when you're as
> >>>biased towards one of them as I am.  Please bear with me.
> >>>
> >>>
> >>>= The trust boundary between image contents and meta-data =
> >>>
> >>>A disk image consists of image contents and meta-data.
> >>>
> >>>Example: all of a raw image's contents is image contents.  Leaves just
> >>>file name and attributes for meta-data.
> >>>
> >>>Example: QCOW2 meta-data includes header, header extensions, L1 table,
> >>>L2 tables, ...  The meta-data defines where in the image the actual
> >>>contents is stored.
> >>>
> >>>A guest can access the image contents, not the meta-data.
> >>>
> >>>Image contents you've let an untrusted guest write is untrusted.
> >>>
> >>>Therefore, there's a trust boundary between image contents and
> >>>meta-data.  QEMU has to trust image meta-data, but shouldn't trust image
> >>>contents.  The exact location of the trust boundary depends on the image
> >>>format.
> >>>
> >>>
> >>>= How we instruct QEMU what to trust =
> >>>
> >>>By configuring QEMU to use an image, the user instructs QEMU to trust
> >>>the image's meta-data.
> >>>
> >>>When the user's configuration specifies the image format explicitly, the
> >>>trust boundary is clear.
> >>>
> >>>Else, the trust boundary is ambigous when more than one format is
> >>>possible.
> >>>
> >>>QEMU resolves this ambiguity by picking the first format with the
> >>>highest "score".  Raw format is always possible, and always has the
> >>>lowest score.
> >>>
> >>>
> >>>= How this lets the guest escape isolation =
> >>>
> >>>Unfortunately, this lets the guest shift the trust boundary and escape
> >>>isolation, as follows:
> >>>
> >>>* Expose a raw image to the guest (whether you specify the format=raw or
> >>>    let QEMU guess it doesn't matter).  The complete contents becomes
> >>>    untrusted.
> >>>
> >>>* Reuse the image *without* specifying the raw format.  QEMU guesses the
> >>>    format based on untrusted image contents.  Now QEMU guesses a format
> >>>    chosen by the guest, with meta-data chosen by the guest.  By
> >>>    controlling image meta-data, the malicious guest can access arbitrary
> >>>    files as QEMU, enlarge its storage, and more.  A non-malicious guest
> >>>    can accidentally DoS itself, by writing a pattern probing recognizes.
> >>Thank you for bringing that to my attention. This means that I'm even
> >>more in favor of using Kevin's patches because in fact they don't
> >>break anything.
> >They break things differently.  The difference may or may not matter.
> >
> >Example: innocent guest writes a recognized pattern.
> >
> >   Now: next restart fails, guest DoSed itself until host operator gets
> >   around to adding format=raw to the configuration.  Consequence:
> >   downtime (probably lengthy), but no data corruption.
> >
> >   With Kevin's patch: write fails, guest may or may not handle the
> >   failure gracefully.  Consequences can range from "guest complains to
> >   its logs (who cares)" via "guest stops whatever it's doing and refuses
> >   to continue until its hardware gets fixed (downtime as above)" to
> >   "data corruption".
> 
> You somehow seem convinced that writing to sector 0 is a completely
> normal operation. For x86, it isn't, though.
> 
> There are only a couple of programs which do that, I can only think
> of partitioning and setting up boot loaders. There's not a myriad of
> programs which would increase the probability of one both writing a
> recognizable pattern *and* not handling EPERM correctly.
> 
> I see the probability of both happening at the same time as
> extremely low, not least because there are only a handful of
> programs which access that sector.
>

I'm not yet opposed to the "restricted-raw" method, but...

I think the above is a somewhat dangerous viewpoint to take with QEMU.
It is a bit of a slippery slope to start to assume what data guests
will write to the disks provided to them.  Even if the probability of
this happening is very low, with what usage we envision now, it is
still entirely legitimate usage for a guest to write data starting at
sector 0.

> >Example: innocent guest first writes a recognized pattern, then
> >overwrites it with a non-recognized pattern.
> >
> >   Now: works.
> >
> >   With Kevin's patch: as above.
> 
> Not really, if the guest overwrites the data with a non-recognized
> pattern after EPERM it works as well. The difference here is that
> you won't have the intended data in the meantime.
> 
> >Again, I'm not claiming the differences are serious in practice, only
> >that they exist.
> 
> True.
> 
> >>>This is CVE-2008-2004.
> >>>
> >>>
> >>>= Aside: other trust boundaries =
> >>>
> >>>Of course, this is not the only trust boundary that matters.  For
> >>>instance, there's normally one between your host and somebody else's
> >>>computers.  Telling QEMU to trust meta-data of some image you got "from
> >>>the internet" violates it.  There's nothing QEMU can do about that.
> >>>
> >>>
> >>>= Insecure usage is easy, secure usage is hard =
> >>>
> >>>The oldest stratum of user interfaces doesn't let you specify the image
> >>>format.  Use of raw images with these is insecure by design.  These
> >>>interfaces are still recommended for human users.
> >>>
> >>>Example of insecure usage: -hda foo.img, where foo.img is raw.
> >>>
> >>>With the next generation of interfaces, specifying the image format is
> >>>optional.  Use of raw images with these is insecure by default.
> >>>
> >>>Example of insecure usage: -drive file=foo.img,index=0,media=cdrom,
> >>>where foo.img is raw.  The -hda above is actually sugar for this.
> >>>
> >>>Equivalent secure usage: add format=raw.
> >>>
> >>>Note that specifying just the top image's format is not enough, you also
> >>>have to specify any backing images' formats.  QCOW2 can optionally store
> >>>the backing image format in the image.  The other COW formats can't.
> >>Well, they can, with "json:". *cough*
> >Point coughingly taken.
> >
> >>>Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image
> >>>with a raw backing file.
> >>Yesterday I found out that doesn't seem possible. You apparently can
> >>only use VMDK with VMDK backing files.
> >I figure you're referring to this code in vmdk_create():
> >
> >         if (strcmp(bs->drv->format_name, "vmdk")) {
> >             bdrv_unref(bs);
> >             ret = -EINVAL;
> >             goto exit;
> >         }
> 
> Yes. Of course it does depend on probing, which figures, I guess.
> 
> >>                                        Other than that, we only have
> >>qcow1 and qed as COW formats which should not be used anyway.
> >qemu-doc.texi calls them "old image format", and qemu-img.texi has them
> >under "Other", "for compatibility with older QEMU versions".  I guess we
> >could do better job telling users they "should not be used anyway".
> >
> >Even in old stuff retained just for compatibility, we should make an
> >effort to plug security holes, make secure usage easy, and guide users
> >away from insecure usage.
> 
> Of course, I'm just pointing out that it seems to be only qcow1
> which is fully subject to this. qcow2 is partially, as you're
> pointing out next.
> 
> >Now back to the point I was trying to make in my original message.
> >
> >Replacement example of insecure usage: -hda bar.qcow2, where bar.qcow2
> >is a QCOW2 image with a raw backing file and no backing image format,
> >i.e. created without "-o backing_format=".
> >
> >Then the next paragraph applies:
> >
> >>>Equivalent secure usage: Beats me.  Maybe there's a funky -drive
> >>>backing.whatever to specify the backing image's format.
> >See Kevin's reply for equivalent secure usage.
> >
> >>>With the latest interface blockdev-add, specifying the format is
> >>>mandatory.  Secure, but not really suitable for humans.
> >>>
> >>>Example of secure usage:
> >>>
> >>>      { "execute": "blockdev-add",
> >>>        "arguments": {'options': {
> >>>            'driver': 'raw', 'id':'foo',
> >>>            'file': { 'driver': 'file', 'filename': 'foo.img' } } } }
> >>>
> >>>Insecure usage is easy, secure usage is *hard*.  Even for sophisticated
> >>>users like libvirt developers.  Evidence: libvirt CVE-2010-2237,
> >>>CVE-2010-2238, CVE-2010-2239, and more that didn't get a CVE, like the
> >>>recent accidental probing in drive-mirror.
> >>>
> >>>
> >>>= How can we better guard the trust boundary in QEMU? =
> >>>
> >>>The guest can violate the trust boundary only because
> >>>
> >>>(a) QEMU supports both raw images and image formats, and
> >>>
> >>>(b) QEMU guesses image format from raw image contents, and
> >>>
> >>>(c) given a raw image, guests can change its contents and control a
> >>>future QEMU's format guess.
> >>>
> >>>We can attack one ore more of these three conditions:
> >>I'd like to attack more because any of these steps might be carried
> >>out in another program which thus either becomes vulnerable itself
> >>(which we don't really have to care about, but I'd like to either way)
> >>or which makes qemu vulnerable.
> >>
> >>Having an external program with (a) and (c), this makes qemu
> >>vulnerable if we don't try to forbid (b) or at least make it work
> >>better. Having an external program with (a) and (b), not doing
> >>anything against (c) in qemu makes that external program vulnerable.
> >>
> >>>(a) Outlaw raw images
> >>>
> >>>(b) Don't guess format from untrusted image contents
> >>>
> >>>(c) Prevent "bad" guest writes
> >>>
> >>>Nobody is seriously suggesting we do (a).  It's clearly too late for
> >>>that.  Let's explore the other two in more detail.
> >>And thus I prefer to find and implement solutions for *both* (b) and (c).
> >Quoting myself: "We can attack one ore more of these three conditions".
> 
> Yes, and I referred to that by saying "I'd like to attack more".
> 
> >>>== Don't guess format from untrusted image contents ==
> >>>
> >>>Several variations of the theme.
> >>>
> >>>Guessing only happens when the user doesn't specify a format, so the
> >>>simplest way to avoid it would be requiring users to always specify the
> >>>format.
> >>>
> >>>PRO: Simple, plugs the hole.
> >>>
> >>>CON: Breaks a lot of existing usage.  Bye -hda, hello extra typing.
> >>>
> >>>Variation: command line option to opt out of probing completely.
> >>>
> >>>PRO: Simple, plugs the hole.
> >>>
> >>>CON: Insecure by default.
> >>>
> >>>CON: In addition to opting out, you have to update your usage
> >>>accordingly.  I guess libvirt would do it anyway, to guard against
> >>>accidental probing once and for all.
> >>>
> >>>Variation: Stefan proposed to make format= mandatory just for -drive.  I
> >>>guess he rather meant mandatory for anything but -hda and other selected
> >>>convenience interfaces.
> >>>
> >>>PRO: No extra typing in the cases where it makes the most difference.
> >>>
> >>>CON: Breaks existing usage in the other cases, extra typing.
> >>>
> >>>CON: Doesn't fully plug the hole.  Users of convenience interfaces may
> >>>      remain insecure out of ignorance.  We could add a warning to guide
> >>>      them to more secure usage, but then that warning would annoy users
> >>>      who don't care for security (sometimes with reason), and we can't
> >>>      have that.  So we silently leave the users who would care if they
> >>>      knew insecure.
> >>>
> >>>I proposed something less radical, namely to keep guessing the image
> >>>format, but base the guess on trusted meta-data only: file name and
> >>>attributes.
> >>You actually want to completely abolish probing? I thought we just
> >>wanted to use the filename as a second source of information and warn
> >>if the contents and the extension don't seem to match; and in the
> >>future, maybe make it an error (but we don't have to discuss that
> >>second part now, I think).
> >Yes, I propose to ditch it completely, after a suitable grace period.  I
> >tried to make that quite clear in my PATCH RFC 2/2.
> >
> >Here's why.
> >
> >Now: 1. probe
> >      4. open, error out if meta-data is bad
> >
> >With my proposed patch:
> >      1. probe
> >      2. guess from trusted meta-data
> >      3. warn unless 1 and 2 match
> >      4. open, error out if meta-data is bad
> >
> >Now make the warning an error:
> >      1. probe
> >      2. guess from trusted meta-data
> >      3. error out unless 1 and 2 match
> >      4. open, error out if meta-data is bad
> >
> >I figure the following is equivalent, but simpler:
> >
> >      2. guess from trusted meta-data
> >      4. open, error out if meta-data is bad
> >
> >because open should detect all the errors the previous step 3 caught.
> >If not, things are broken with explicit format=.
> 
> You're right, it seems be equivalent. One difference will probably
> be error messages ("Bad signature" vs. "Filename extension and
> format do not match").
> 
> The other difference is that you have a problem if you cannot
> distinguish between two formats by extensions, as we've seen
> already. .qcow could mean both qcow1 and qcow2; a similar problem
> appeared with .vhd. Opening the image using all possible formats
> seems bad to me. Better swap 1 and 2: Guess from commonly trusted
> metadata (i.e. the filename extension) and then probe all possible
> formats.
> 
> >>>Block and character special files are raw.  For other
> >>>files, find the file name extension, and look up the format claiming it.
> >>>
> >>>PRO: Plugs the hole.
> >>You mean "plugs hole (b)".
> >What I (airily) call "the hole" is the scenario I described above: guest
> >escaping isolation by subverting qemu-system-*.
> >
> >(b) is not a hole, it's a condition for "the hole".
> 
> Your "the hole" is only one hole. There are more holes. You only
> consider the case where there's only qemu, which is technically fine
> for discussion on qemu-devel, but I'd like to consider having other
> tools beside qemu as well.
> 
> You're right, though, it should be "prevents condition (b)".
> 
> >I guess what you want to say is "attacking just condition (b) doesn't
> >plug some other holes I care about".  That's true.  There are indeed
> >other holes.
> >
> >For instance, guest attacking QEMU's utility programs qemu-img, ...  Or
> >guest attacking other host software.  I'm not trying to discount any of
> >them.  But tangling all problems up into a hairball won't do us any
> >good.
> 
> We have two approaches and I think using both will help to plug more
> holes. I'm not saying we should consider every possible hole with
> every host configuration there may be. I'm just saying using only of
> the approaches clearly only plugs "the hole" if there's only qemu.
> 
> >So your point that my analysis is narrow is taken.  In my defense, I can
> >say that my narrow analysis was difficult enough to write up, and
> >probably produced enough text to deter readers.
> >
> >>>CON: Breaks existing usage when the new guess differs from the old
> >>>      guess.  Common usage should be fine:
> >>>
> >>>      * -hda test.qcow2
> >>>
> >>>        Fine as long as test.qcow2 is really QCOW2 (as it should!), and
> >>>        either specifies a backing format (as it arguably should), or the
> >>>        backing file name is sane.
> >>>
> >>>      * -hda disk.img
> >>>
> >>>        Fine as long as disk.img is really a disk image (as it should).
> >>>
> >>>      * -hda /dev/mapper/vg0-virtdisk
> >>>
> >>>        Fine as long as the logical volume is raw.
> >>>
> >>>      Less common usage can break:
> >>>
> >>>      * -hda nbd://localhost
> >>>
> >>>        Socket provides no clue, so no guess.
> >>nbd should be raw. If it isn't, you're most likely doing something
> >>wrong. See https://bugzilla.redhat.com/show_bug.cgi?id=1090713 what
> >>happens when you're doing it wrong.
> >Okay.
> >
> >My RFC PATCH is too simplistic to exploit that, because it only looks at
> >file name extensions.  But "user asked for nbd protocol" is quite
> >obviously trusted meta-data.  We could have a less simplistic patch
> >putting it to use.  Or adopt the variation below.
> >
> >>>      Weird usage can conceivably break hard:
> >>>
> >>>      * -hdd disk.img
> >>>
> >>>        Breaks hard when disk.img is actually QCOW2, the guest boots
> >>>        anyway from another drive, then proceeds to overwrite this one.
> >>Then why not continue to do probing and using the extension as a safeguard?
> >>
> >>>Mitigation: lengthy transition period where we warn "this usage is
> >>>insecure, and we'll eventually break it; here's a hint on secure usage".
> >>>
> >>>CON: We delay plugging the hole one more time.  But at least we no
> >>>longer expose our users to it silently.
> >>>
> >>>Jeff pointed out that we want to retain probing in things like qemu-img
> >>>info.
> >>>
> >>>Richard asked for a way for users to ask for insecure probing, say
> >>>format=insecure-probe.  I have no problem with giving users rope when
> >>>they ask for it.
> >>>
> >>>Variation: if file name and attributes are unavailable or provide no
> >>>clue, guess raw.  Same PRO and CON as above, only it avoids breaking a
> >>>few more cases.  For instance, "-hda nbd://localhost" keeps working as
> >>>long as the server serves a raw image.
> >>Which it should be.
> >>
> >>>Smells a bit like too much magic
> >>>to me.
> >>>
> >>>My proposal replaces probing wholesale.  I like that because it results
> >>>in simple, predictable guessing.  Here's a hybrid approach: first guess
> >>>raw vs. non-raw based on trusted meta-data only, and if non-raw, probe.
> >>>
> >>>Nothing the guest writes can affect the raw vs. non-raw decision.  Once
> >>>an image is raw, only the user can make it non-raw, by changing its name
> >>>or attributes.
> >>>
> >>>Two variations: 1. guess raw without a clue, and 2. guess non-raw then.
> >>>
> >>>Again, same PRO and CON as above, only it doesn't break when users give
> >>>their non-raw images weird names.
> >>>
> >>>== Prevent "bad" guest writes ==
> >>>
> >>>Again, several variations, but this time, only the last one is serious,
> >>>the others are just for illustration.
> >>>
> >>>Fail guest writes to those parts of the image that probing may examine
> >>>Can fail only writes to the first few sectors (at worst) of raw images.
> >>>
> >>>PRO: Plugs the hole.
> >>>
> >>>CON: The virtual hardware is defective.  Breaks common guest software
> >>>that writes to the first few sectors, such as boot loaders and
> >>>partitioning tools.  Breaks guest software using the whole device, which
> >>>isn't common, but certainly not unheard of.
> >>>
> >>>Variation: fail only writes of patterns that actually can make probing
> >>>guess something other than raw.
> >>>
> >>>PRO: Still plugs the hole.
> >>You mean "plugs hole (c)".
> >My reply to "plugs hole (b)" applies.
> >
> >>>CON: Except when you upgrade to a version that recognizes more patterns.
> >>Which is better than not plugging hole (c) at all.
> >>
> >>>CON: The virtual hardware is still defective, but the defects are
> >>>minimized.
> >>As you pointed out to us it's already defective and I don't think
> >>anybody ever noticed accidentally.
> >You're right in that probed raw is already defective, with defects
> >delayed to the next restart.  Preventing "bad" guest writes changes the
> >nature of the defects subtly, as I described above.
> 
> Sector 0 is rarely ever written. It's not that people write some
> recognizable sequences there all the time but don't notice because
> they are quickly overwritten again.
> 
> If nobody hit the problem accidentally until now, I'm certain that
> means that nobody ever wrote any recognizable sequence there while
> using qemu and raw (which is not too rare a combination).
> 
> >This and the previous variation also extends them to non-probed raw.
> >The following variations avoid the extension.
> >
> >>>We can hope that partition tables, boot sectors and such
> >>>won't match the patterns, so common guest software hopefully works.
> >>It's worked in the past, that's good enough for me.
> >>
> >>>Guest software using the whole device still breaks, only now it breaks
> >>>later rather than sooner.
> >>>
> >>>Variation: fail writes only on *probed* raw images.
> >>>
> >>>CON: Doesn't fully plug the hole: mixing probed usage (user doesn't
> >>>specify format) with non-probed usage (user specifies format) remains
> >>>insecure.  The guest's write succeeds in non-probed usage, and the guest
> >>>escapes isolation in the next probed usage.
> >>>
> >>>CON: The virtual hardware is still defective, but it now comes with a
> >>>"defective on/off" switch, factory default "defective on".  We could add
> >>>a warning to guide users to switch defective off but then that warning
> >>>would annoy people who don't care to switch it off (sometimes with
> >>>reason), and we can't have that.  So we leave users who would care if
> >>>they knew in the dark.
> >>>
> >>>The two variations can be combined.  This is Kevin's proposal.
> >>>
> >>>CON: Doesn't fully plug the hole: union of both variations' flaws.
> >>>
> >>>CON: The virtual hardware is still defective: interesection of both
> >>>variations' defects.
> >>>
> >>>
> >>>= Conclusion =
> >>>
> >>>This is *my* conclusion.  Yours may be different, and that's okay.  It's
> >>>business, not personal ;)
> >>>
> >>>Secure usage should not be hard.
> >>>
> >>>If we permit insecure usage for convenience or compatibility, we should
> >>>warn the user, unless he clearly asked for it.  Even if that warning
> >>>annoys Kevin and Max.
> >>A warning does not annoy me as long as I know what it means.
> >Good :)
> >
> >>>If you want to suppress it with configure
> >>>--reckless or something, no objections.
> >>>
> >>>Same for defective virtual hardware.
> >>>
> >>>Kevin's proposal to prevent "bad" guest writes has relatively small
> >>>compatibility issues.  It improves things from "insecure by default" to
> >>>"slightly defective by default" as long as you use raw images either
> >>>always probed or always non-probed.  It doesn't help at all when you
> >>>alternate probed and non-probed usage.  Improvement of sorts, but it
> >>>still fails my "secure usage should not be hard" requirement, and that
> >>>requirement is a hard one for me.
> >>>
> >>>My proposal to ditch image contents probing entirely has more serious
> >>>compatibility issues.  In particular, we'd have to forgo sugared
> >>>convenience syntax for a number of less common things.  It definitely
> >>>needs a grace period where all usage we're going to break warns.  On the
> >>>up side, it will actually be secure by default when it's done.
> >>>
> >>>If this is not acceptable, my second choice is actually the command line
> >>>option to opt out of probing completely.  This doesn't address "insecure
> >>>by default" (sadly), but it does at least satisfy my "secure usage
> >>>should not be hard" requirement.
> >>>
> >>>If we should adopt Kevin's proposal against my objections, then I very
> >>>badly want the opt out option on top of it, opting out of both probing
> >>>and "bad" write prevention.
> >>>
> >>>Thanks for hearing me out.
> >>My conclusion: Don't ditch probing. It increases entropy, why would
> >>you ditch probing? Just combine it with the extension and if both
> >>don't seem to match, that's an error.
> >How does probing add value?
> >
> >Compare
> >
> >      1. probe
> >      2. guess from trusted meta-data
> >      3. error out unless 1 and 2 match
> >      4. open, error out if meta-data is bad
> >
> >to just
> >
> >      2. guess from trusted meta-data
> >      4. open, error out if meta-data is bad
> >
> >Let P be the driver recommended by probe, and G be the driver
> >recommended by guess.
> >
> >If P == G, same result: we open with the same driver.
> >
> >Else, if open with G fails, equivalent result: error out in step 3
> >vs. error out in step 4.
> >
> >Else, we have an odd case: one driver's probe accepts (P's), yet a
> >different driver's open succeeds (G's).
> >
> >     If G's probe rejects: is this a bug?  Shouldn't open always fail
> >     when probe rejects?
> >
> >     If G's probe accepts, then probing chose P over G.  Maybe it
> >     shouldn't have.  Or maybe the probing functions are imprecise.
> >     Anyway, keeping probing around makes this an error.  Should it be
> >     one?
> >
> >Am I missing something?
> 
> No, see my reply above. I failed to consider that opening an image
> basically is advanced probing.
> 
> The only thing I really see which could be missing is the problem of
> having ambiguous filename extensions, and that problem can easily be
> solved by keeping probing.
> 
> >>Also, holes (b) and (c) are two different holes. We should fix
> >>both. We should fix (b) so qemu isn't vulnerable and we should fix (c)
> >>so qemu doesn't make other programs which do probe vulnerable.
> >Adjusting terminology, but hopefully not your intent: "the hole" isn't
> >the only hole that matters.  The conditions enable other holes.
> >Therefore, we should attack both (b) and (c).  Attacking (a) is not
> >practical.
> 
> Yes. that's what I meant. There are various holes and preventing
> only one of the conditions (b) and (c) only plugs one (or a subset)
> of them, and I can easily see unplugged holes which can be fixed by
> preventing both.
> 
> >Points taken, except I think we could attack (a) if we really wanted.
> 
> Of course, but for that we'd either need some flat qcow2 mode or
> better support for an image format that does have such a flat mode.
> 
> Max
> 
> >>So, for fixing (b): Just use the extensions as a safeguard and issue a
> >>warning for now. We can discuss about making it an error later.
> >>
> >>And for fixing (c): As you pointed out, if guests wrote some
> >>probe-matching pattern in the past, it would break qemu (which is what
> >>we're trying to fix). Since noone ever said that some guest did that
> >>by accident, I think we can safely assume that prohibiting such writes
> >>will not hurt anyone in the future either; at least there are no
> >>compatibility issues, so if someone notices a problem, he/she can just
> >>explicitly specify the format and it'll work (which you should be
> >>doing anyway, as we all know, though many of us, including me, don't
> >>want to do it all the time).
> >Thanks for stating your conclusion concisely, it's really helpful.
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-06 14:56       ` Jeff Cody
@ 2014-11-06 15:00         ` Max Reitz
  2014-11-07 14:52           ` Markus Armbruster
  0 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2014-11-06 15:00 UTC (permalink / raw)
  To: Jeff Cody; +Cc: Kevin Wolf, Markus Armbruster, Stefan Hajnoczi, qemu-devel

On 2014-11-06 at 15:56, Jeff Cody wrote:
> On Thu, Nov 06, 2014 at 01:53:35PM +0100, Max Reitz wrote:
>> On 2014-11-06 at 13:26, Markus Armbruster wrote:
>>> Max Reitz <mreitz@redhat.com> writes:
>>>
>>>> On 2014-11-04 at 19:45, Markus Armbruster wrote:
>>>>> I'll try to explain all solutions fairly.  Isn't easy when you're as
>>>>> biased towards one of them as I am.  Please bear with me.
>>>>>
>>>>>
>>>>> = The trust boundary between image contents and meta-data =
>>>>>
>>>>> A disk image consists of image contents and meta-data.
>>>>>
>>>>> Example: all of a raw image's contents is image contents.  Leaves just
>>>>> file name and attributes for meta-data.
>>>>>
>>>>> Example: QCOW2 meta-data includes header, header extensions, L1 table,
>>>>> L2 tables, ...  The meta-data defines where in the image the actual
>>>>> contents is stored.
>>>>>
>>>>> A guest can access the image contents, not the meta-data.
>>>>>
>>>>> Image contents you've let an untrusted guest write is untrusted.
>>>>>
>>>>> Therefore, there's a trust boundary between image contents and
>>>>> meta-data.  QEMU has to trust image meta-data, but shouldn't trust image
>>>>> contents.  The exact location of the trust boundary depends on the image
>>>>> format.
>>>>>
>>>>>
>>>>> = How we instruct QEMU what to trust =
>>>>>
>>>>> By configuring QEMU to use an image, the user instructs QEMU to trust
>>>>> the image's meta-data.
>>>>>
>>>>> When the user's configuration specifies the image format explicitly, the
>>>>> trust boundary is clear.
>>>>>
>>>>> Else, the trust boundary is ambigous when more than one format is
>>>>> possible.
>>>>>
>>>>> QEMU resolves this ambiguity by picking the first format with the
>>>>> highest "score".  Raw format is always possible, and always has the
>>>>> lowest score.
>>>>>
>>>>>
>>>>> = How this lets the guest escape isolation =
>>>>>
>>>>> Unfortunately, this lets the guest shift the trust boundary and escape
>>>>> isolation, as follows:
>>>>>
>>>>> * Expose a raw image to the guest (whether you specify the format=raw or
>>>>>     let QEMU guess it doesn't matter).  The complete contents becomes
>>>>>     untrusted.
>>>>>
>>>>> * Reuse the image *without* specifying the raw format.  QEMU guesses the
>>>>>     format based on untrusted image contents.  Now QEMU guesses a format
>>>>>     chosen by the guest, with meta-data chosen by the guest.  By
>>>>>     controlling image meta-data, the malicious guest can access arbitrary
>>>>>     files as QEMU, enlarge its storage, and more.  A non-malicious guest
>>>>>     can accidentally DoS itself, by writing a pattern probing recognizes.
>>>> Thank you for bringing that to my attention. This means that I'm even
>>>> more in favor of using Kevin's patches because in fact they don't
>>>> break anything.
>>> They break things differently.  The difference may or may not matter.
>>>
>>> Example: innocent guest writes a recognized pattern.
>>>
>>>    Now: next restart fails, guest DoSed itself until host operator gets
>>>    around to adding format=raw to the configuration.  Consequence:
>>>    downtime (probably lengthy), but no data corruption.
>>>
>>>    With Kevin's patch: write fails, guest may or may not handle the
>>>    failure gracefully.  Consequences can range from "guest complains to
>>>    its logs (who cares)" via "guest stops whatever it's doing and refuses
>>>    to continue until its hardware gets fixed (downtime as above)" to
>>>    "data corruption".
>> You somehow seem convinced that writing to sector 0 is a completely
>> normal operation. For x86, it isn't, though.
>>
>> There are only a couple of programs which do that, I can only think
>> of partitioning and setting up boot loaders. There's not a myriad of
>> programs which would increase the probability of one both writing a
>> recognizable pattern *and* not handling EPERM correctly.
>>
>> I see the probability of both happening at the same time as
>> extremely low, not least because there are only a handful of
>> programs which access that sector.
>>
> I'm not yet opposed to the "restricted-raw" method, but...
>
> I think the above is a somewhat dangerous viewpoint to take with QEMU.
> It is a bit of a slippery slope to start to assume what data guests
> will write to the disks provided to them.  Even if the probability of
> this happening is very low, with what usage we envision now, it is
> still entirely legitimate usage for a guest to write data starting at
> sector 0.

Then let's officially deprecate format probing, if we haven't done so 
already. That way, there's no excuse.

What I'm saying is that there are obviously no compatibility issues. 
There is no guest software which did write recognizable patterns (so far 
nobody provided a counterexample), and since format probing is 
deprecated (or should be), you have no excuse for running future guests 
in qemu without having explicitly specified the format.

And if you are specifying the format, Kevin's patches will not prevent 
the guest from making its disk a qcow2 image whatsoever.

Max

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-06 13:57   ` Markus Armbruster
  2014-11-06 14:14     ` Eric Blake
  2014-11-06 14:35     ` Jeff Cody
@ 2014-11-06 15:01     ` Kevin Wolf
  2014-11-07 15:21       ` Markus Armbruster
  2 siblings, 1 reply; 38+ messages in thread
From: Kevin Wolf @ 2014-11-06 15:01 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Jeff Cody, qemu-devel, Stefan Hajnoczi, Max Reitz

Am 06.11.2014 um 14:57 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 04.11.2014 um 19:45 hat Markus Armbruster geschrieben:
> >> I'll try to explain all solutions fairly.  Isn't easy when you're as
> >> biased towards one of them as I am.  Please bear with me.
> >> 
> >> 
> >> = The trust boundary between image contents and meta-data =
> >> 
> >> A disk image consists of image contents and meta-data.
> >> 
> >> Example: all of a raw image's contents is image contents.  Leaves just
> >> file name and attributes for meta-data.
> >
> > Better: Leaves only protocol-specific metadata (e.g. file name and
> > attributes for raw-posix).
> 
> Can you give examples for other protocols?

Max already gave the example of NBD always implying raw.

I can also imagine that protocols that take URLs would use the file name
from the URL - which is not necessarily the end of the string, because
query options could follow.

Even though I don't think we have it today, it's also not entirely
unthinkable that some network protocol specifically made for VM images
(perhaps something like Sheepdog) could be storing the image format as
metadata on the server. Actually, I think that would make a whole lot of
sense for them.

> >> = Insecure usage is easy, secure usage is hard =
> >> 
> >> The oldest stratum of user interfaces doesn't let you specify the image
> >> format.  Use of raw images with these is insecure by design.  These
> >> interfaces are still recommended for human users.
> >> 
> >> Example of insecure usage: -hda foo.img, where foo.img is raw.
> >> 
> >> With the next generation of interfaces, specifying the image format is
> >> optional.  Use of raw images with these is insecure by default.
> >> 
> >> Example of insecure usage: -drive file=foo.img,index=0,media=cdrom,
> >> where foo.img is raw.  The -hda above is actually sugar for this.
> >> 
> >> Equivalent secure usage: add format=raw.
> >> 
> >> Note that specifying just the top image's format is not enough, you also
> >> have to specify any backing images' formats.  QCOW2 can optionally store
> >> the backing image format in the image.  The other COW formats can't.
> >> 
> >> Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image
> >> with a raw backing file.
> >
> > Usually this is mitigated by the fact that backing files are read-only.
> > Trouble is starting when you use things like commit.
> 
> Yes.
> 
> >> Equivalent secure usage: Beats me.  Maybe there's a funky -drive
> >> backing.whatever to specify the backing image's format.
> >
> > Yes, you can override the backing file driver (backing.driver=raw should
> > do the trick). Not really user-friendly, especially with long backing
> > file chains, but it happens to be there.
> >
> > And of course, libvirt should be using it for non-qcow2 or qcow2 without
> > the backing format header extension (but doesn't yet).
> 
> I'm glad it's there.  Too bad libvirt doesn't use it, yet.  Supports my
> point that secure usage is too hard now.

I don't know whether it's related to being too hard or just too new. I
won't disagree when you say that it isn't obvious, but the libvirt
authors are experts and probably know better than the average command
line user what they should be doing ideally.

> >> I proposed something less radical, namely to keep guessing the image
> >> format, but base the guess on trusted meta-data only: file name and
> >> attributes.  Block and character special files are raw.  For other
> >> files, find the file name extension, and look up the format claiming it.
> >> 
> >> PRO: Plugs the hole.
> >> 
> >> CON: Breaks existing usage when the new guess differs from the old
> >>     guess.  Common usage should be fine:
> >> 
> >>     * -hda test.qcow2
> >> 
> >>       Fine as long as test.qcow2 is really QCOW2 (as it should!), and
> >>       either specifies a backing format (as it arguably should), or the
> >>       backing file name is sane.
> >> 
> >>     * -hda disk.img
> >> 
> >>       Fine as long as disk.img is really a disk image (as it should).
> >
> > .img is not as clear, I've seen people using it for other formats. It's
> > still a disk image, but not a raw one.
> 
> Is this usage common?

More common that writing a qcow2 header to your boot sector. ;-)

But seriously, one of the problems in this discussion is that we don't
have any actual data for more exotic use cases. I can only say that I've
seen it before, even though that doesn't mean much.

If you want me to guess: Not really common, but probably one of the most
common corner cases from those that we've been discussing here.

> >>     * -hda /dev/mapper/vg0-virtdisk
> >> 
> >>       Fine as long as the logical volume is raw.
> >> 
> >>     Less common usage can break:
> >> 
> >>     * -hda nbd://localhost
> >> 
> >>       Socket provides no clue, so no guess.
> >> 
> >>     Weird usage can conceivably break hard:
> >> 
> >>     * -hdd disk.img
> >> 
> >>       Breaks hard when disk.img is actually QCOW2, the guest boots
> >>       anyway from another drive, then proceeds to overwrite this one.
> >> 
> >> Mitigation: lengthy transition period where we warn "this usage is
> >> insecure, and we'll eventually break it; here's a hint on secure usage".
> >> 
> >> CON: We delay plugging the hole one more time.  But at least we no
> >> longer expose our users to it silently.
> >
> > CON: Relies on metadata that is protocol-specific. Each protocol that
> >      should support probing needs extra code. Essentially means that
> >      probing will be disabled on anything except raw-posix (and if we're
> >      lucky enough that someone pays attention during review, raw-win32)
> 
> Terminology: I use "probing" and "guessing from trusted meta-data".  The
> former is for probing raw image contents.  The latter may only examine
> trusted meta-data.
> 
> I suspect you mean "each protocol that should support guessing from
> trusted meta-data needs extra code".  Do you?

Yes.

I'll try to remember using "probing" and "guessing" with your meaning.

> >> == Prevent "bad" guest writes ==
> >> 
> >> Again, several variations, but this time, only the last one is serious,
> >> the others are just for illustration.
> >> 
> >> Fail guest writes to those parts of the image that probing may examine
> >> Can fail only writes to the first few sectors (at worst) of raw images.
> >> 
> >> PRO: Plugs the hole.

PRO: Fix is small, local to raw block driver, and obviously complete (in
the sense that it catches every usage of the raw format).

PRO: Only affects images actually opened as raw; can't possibly break
non-raw use cases

CON: Can affect raw images even with an explicit format=raw

> >> 
> >> CON: The virtual hardware is defective.  Breaks common guest software
> >> that writes to the first few sectors, such as boot loaders and
> >> partitioning tools.  Breaks guest software using the whole device, which
> >> isn't common, but certainly not unheard of.
> >> 
> >> Variation: fail only writes of patterns that actually can make probing
> >> guess something other than raw.
> >> 
> >> PRO: Still plugs the hole.
> >> 
> >> CON: Except when you upgrade to a version that recognizes more patterns.
> >> 
> >> CON: The virtual hardware is still defective, but the defects are
> >> minimized.  We can hope that partition tables, boot sectors and such
> >> won't match the patterns, so common guest software hopefully works.
> >> Guest software using the whole device still breaks, only now it breaks
> >> later rather than sooner.
> >> 
> >> Variation: fail writes only on *probed* raw images.

PRO: Plugs the hole in the most common case (user relies consistently on
probing)

PRO: Users explicitly specifying format=raw can't possibly be affected
any more, just like non-raw formats in the basic variant.

> >> CON: Doesn't fully plug the hole: mixing probed usage (user doesn't
> >> specify format) with non-probed usage (user specifies format) remains
> >> insecure.  The guest's write succeeds in non-probed usage, and the guest
> >> escapes isolation in the next probed usage.
> >> 
> >> CON: The virtual hardware is still defective, but it now comes with a
> >> "defective on/off" switch, factory default "defective on".  We could add
> >> a warning to guide users to switch defective off but then that warning
> >> would annoy people who don't care to switch it off (sometimes with
> >> reason), and we can't have that.  So we leave users who would care if
> >> they knew in the dark.
> 
> Replace by
> 
> CON: The virtual hardware is still defective, but it now comes with a
> "defective on/off" switch, factory default "defective on".  We could add
> a warning to guide users to switch defective off.
> 
> >> The two variations can be combined.  This is Kevin's proposal.
> >> 
> >> CON: Doesn't fully plug the hole: union of both variations' flaws.
> >> 
> >> CON: The virtual hardware is still defective: interesection of both
> >> variations' defects.

PRO: Union of both variations' advantages wrt false positives, which are
minimised as much as possible: Explicit format=... or usage of non-raw
formats doesn't trigger the check. This limits it to cases that are
already broken today (even though the failure mode changes - can
possibly even be called a bonus bug fix).

PRO: Still plugs the hole in the most common case (user relies
consistently on probing)

PRO: The fix is still small, local to raw block driver, and obviously
complete (in the sense that it catches every usage of the raw format).

> > I like how you took care to avoid finding any PROs. :-)
> 
> Come on, this line 281 of 327, cut me some slack :)
> 
> > I'll leave commenting on this section to others for now. I feel I have
> > already said enough about it in the other threads, and defending it here
> > at this point wouldn't help the discussion.
> 
> The purpose of this document is to summarize our thoughts.  Need yours
> to achieve it.  Can you give me your concise PROs?

Added them above. Of course, you could also add the CONs of the other
proposal as PROs here, like "works with any filename and even protocols
that don't have anything filename-like", but I think they are already
covered well enough in the other section.

> > And yes, it also doesn't help when you accidentally type format=qcow2
> > instead of format=raw. When you have images of both types, things like
> > this happen with manual typing.
> 
> I consider mixing probed and non-probed usage a more plausible bad habit
> than accidental use of format=qcow2, because the former works just fine,
> but the latter fails (unless your guest has "helpfully" written a QCOW2
> header).

I'll admit that mixing probed and non-probed usage might be more common
(though I think that I for one am pretty consistent in not using
format=... with my trusted images - why would I?), but I consider both
cases plausible. I've mistyped enough -f options for qemu-img. And that
it fails doesn't prevent the typo.

Kevin

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-06 14:14     ` Eric Blake
@ 2014-11-06 15:52       ` Jeff Cody
  0 siblings, 0 replies; 38+ messages in thread
From: Jeff Cody @ 2014-11-06 15:52 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Max Reitz, Markus Armbruster, Stefan Hajnoczi,
	qemu-devel

On Thu, Nov 06, 2014 at 03:14:19PM +0100, Eric Blake wrote:
> On 11/06/2014 02:57 PM, Markus Armbruster wrote:
> 
> >> Yes, you can override the backing file driver (backing.driver=raw should
> >> do the trick). Not really user-friendly, especially with long backing
> >> file chains, but it happens to be there.
> >>
> >> And of course, libvirt should be using it for non-qcow2 or qcow2 without
> >> the backing format header extension (but doesn't yet).
> > 
> > I'm glad it's there.  Too bad libvirt doesn't use it, yet.  Supports my
> > point that secure usage is too hard now.
> 
> Indeed, libvirt is still lacking on enforcing the backing type that it
> probed vs. letting qemu re-probe a (possibly different) backing type.
> Were libvirt to actually enforce this (that is, libvirt's default
> out-of-the-box policy is to avoid all probes, and treat anything without
> a type as raw) means that a user that forgets to use -obacking_fmt and
> creates a chain base<-mid<-top will have the following conflicting view:
> 
> libvirt: mid[raw]<-top[qcow2]
> qemu: base[qcow2]<-mid[qcow2]<-top[qcow2]
> 
> Right now, if the chain is only 2 deep, qemu happily boots the guest
> with qcow2 format (in spite of libvirt treating mid as raw); but when
> the chain is 3 deep, because libvirt failed to give SELinux permissions
> to base, then qemu fails to open base, and fails to boot, but the
> failure message is very cryptic.
> 
> On the other hand, if libvirt were to ENFORCE its view that mid is raw,
> by passing appropriate drive options, then qemu would always boot, but
> now the top image would be visibly corrupted (treating a qcow2 file as
> raw leads to MUCH different data visible in the guest) and the guest
> will likely fail to boot completely, but with no error message from qemu
> (rather more likely that things just hang in a 100% cpu loop in the
> guest).  Although the existing error message is cryptic, this new
> behavior of enforcing a probed image to be raw feels like it will be
> even more user-unfriendly.
> 
> At any rate, I've certainly been working on getting libvirt to output
> the ENTIRE backing chain that it has determined, rather than its old
> behavior of keeping that information in memory only; this at least helps
> libvirt developers diagnose bug reports ("show me what libvirt thought
> your backing chain was, then go fix your XML to tell libvirt the truth
> and your problem will go away, if you didn't corrupt the backing files
> in the meantime with something like a 'commit' operation").
> 
> [I mentioned libvirt's default policy is to forbid probing and treat
> untyped disks as raw; but both of those knobs can be tweaked, to either
> allow probing, or treat the default type as qcow2, or both]
> 
> 
> >>
> >> .img is not as clear, I've seen people using it for other formats. It's
> >> still a disk image, but not a raw one.
> > 
> > Is this usage common?
> 
> At least on my laptop - yes.  I have several qcow2 files with an image
> suffix of .img (perhaps because I was lazy when I created them, or
> sometimes because I was quickly hacking a script to add a -fqcow2 to a
> qemu-img line without changing the file name, because changing the name
> would have a ripple effect on the rest of the script).  But I don't know
> if my usage is typical, and I also don't mind adjusting my naming
> conventions to silence a warning if qemu starts getting picky about
> confusing name-vs-contents issues.  And if I consistently use
> format=qcow2, I shouldn't be penalized with the warning, right?

If you look at the QEMU "startup" documentation[1] we link to from the
wiki[2], it uses .img for many of the QEMU image creation and usage
examples.  That leads me to think that '.img' usage as a generic
extension may be fairly common.  But, [1] seems outdated, as well.

[1] http://en.wikibooks.org/wiki/QEMU/Images
[2] http://wiki.qemu.org/Manual

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-06 12:53     ` Max Reitz
  2014-11-06 14:56       ` Jeff Cody
@ 2014-11-07  9:57       ` Markus Armbruster
  1 sibling, 0 replies; 38+ messages in thread
From: Markus Armbruster @ 2014-11-07  9:57 UTC (permalink / raw)
  To: Max Reitz; +Cc: Kevin Wolf, Jeff Cody, qemu-devel, Stefan Hajnoczi

Max Reitz <mreitz@redhat.com> writes:

> On 2014-11-06 at 13:26, Markus Armbruster wrote:
>> Max Reitz <mreitz@redhat.com> writes:
>>
>>> On 2014-11-04 at 19:45, Markus Armbruster wrote:
>>>> I'll try to explain all solutions fairly.  Isn't easy when you're as
>>>> biased towards one of them as I am.  Please bear with me.
>>>>
>>>>
>>>> = The trust boundary between image contents and meta-data =
>>>>
>>>> A disk image consists of image contents and meta-data.
>>>>
>>>> Example: all of a raw image's contents is image contents.  Leaves just
>>>> file name and attributes for meta-data.
>>>>
>>>> Example: QCOW2 meta-data includes header, header extensions, L1 table,
>>>> L2 tables, ...  The meta-data defines where in the image the actual
>>>> contents is stored.
>>>>
>>>> A guest can access the image contents, not the meta-data.
>>>>
>>>> Image contents you've let an untrusted guest write is untrusted.
>>>>
>>>> Therefore, there's a trust boundary between image contents and
>>>> meta-data.  QEMU has to trust image meta-data, but shouldn't trust image
>>>> contents.  The exact location of the trust boundary depends on the image
>>>> format.
>>>>
>>>>
>>>> = How we instruct QEMU what to trust =
>>>>
>>>> By configuring QEMU to use an image, the user instructs QEMU to trust
>>>> the image's meta-data.
>>>>
>>>> When the user's configuration specifies the image format explicitly, the
>>>> trust boundary is clear.
>>>>
>>>> Else, the trust boundary is ambigous when more than one format is
>>>> possible.
>>>>
>>>> QEMU resolves this ambiguity by picking the first format with the
>>>> highest "score".  Raw format is always possible, and always has the
>>>> lowest score.
>>>>
>>>>
>>>> = How this lets the guest escape isolation =
>>>>
>>>> Unfortunately, this lets the guest shift the trust boundary and escape
>>>> isolation, as follows:
>>>>
>>>> * Expose a raw image to the guest (whether you specify the format=raw or
>>>>     let QEMU guess it doesn't matter).  The complete contents becomes
>>>>     untrusted.
>>>>
>>>> * Reuse the image *without* specifying the raw format.  QEMU guesses the
>>>>     format based on untrusted image contents.  Now QEMU guesses a format
>>>>     chosen by the guest, with meta-data chosen by the guest.  By
>>>>     controlling image meta-data, the malicious guest can access arbitrary
>>>>     files as QEMU, enlarge its storage, and more.  A non-malicious guest
>>>>     can accidentally DoS itself, by writing a pattern probing recognizes.
>>> Thank you for bringing that to my attention. This means that I'm even
>>> more in favor of using Kevin's patches because in fact they don't
>>> break anything.
>> They break things differently.  The difference may or may not matter.
>>
>> Example: innocent guest writes a recognized pattern.
>>
>>    Now: next restart fails, guest DoSed itself until host operator gets
>>    around to adding format=raw to the configuration.  Consequence:
>>    downtime (probably lengthy), but no data corruption.
>>
>>    With Kevin's patch: write fails, guest may or may not handle the
>>    failure gracefully.  Consequences can range from "guest complains to
>>    its logs (who cares)" via "guest stops whatever it's doing and refuses
>>    to continue until its hardware gets fixed (downtime as above)" to
>>    "data corruption".
>
> You somehow seem convinced that writing to sector 0 is a completely
> normal operation. For x86, it isn't, though.
>
> There are only a couple of programs which do that, I can only think of
> partitioning and setting up boot loaders. There's not a myriad of
> programs which would increase the probability of one both writing a
> recognizable pattern *and* not handling EPERM correctly.
>
> I see the probability of both happening at the same time as extremely
> low, not least because there are only a handful of programs which
> access that sector.
>
>> Example: innocent guest first writes a recognized pattern, then
>> overwrites it with a non-recognized pattern.
>>
>>    Now: works.
>>
>>    With Kevin's patch: as above.
>
> Not really, if the guest overwrites the data with a non-recognized
> pattern after EPERM it works as well. The difference here is that you
> won't have the intended data in the meantime.

That's roughly "guest complains to its logs (who cares)" from the text
referenced by "as above".  The other consequences are possibly here,
too.

>> Again, I'm not claiming the differences are serious in practice, only
>> that they exist.
>
> True.
>
>>>> This is CVE-2008-2004.
>>>>
>>>>
>>>> = Aside: other trust boundaries =
>>>>
>>>> Of course, this is not the only trust boundary that matters.  For
>>>> instance, there's normally one between your host and somebody else's
>>>> computers.  Telling QEMU to trust meta-data of some image you got "from
>>>> the internet" violates it.  There's nothing QEMU can do about that.
>>>>
>>>>
>>>> = Insecure usage is easy, secure usage is hard =
>>>>
>>>> The oldest stratum of user interfaces doesn't let you specify the image
>>>> format.  Use of raw images with these is insecure by design.  These
>>>> interfaces are still recommended for human users.
>>>>
>>>> Example of insecure usage: -hda foo.img, where foo.img is raw.
>>>>
>>>> With the next generation of interfaces, specifying the image format is
>>>> optional.  Use of raw images with these is insecure by default.
>>>>
>>>> Example of insecure usage: -drive file=foo.img,index=0,media=cdrom,
>>>> where foo.img is raw.  The -hda above is actually sugar for this.
>>>>
>>>> Equivalent secure usage: add format=raw.
>>>>
>>>> Note that specifying just the top image's format is not enough, you also
>>>> have to specify any backing images' formats.  QCOW2 can optionally store
>>>> the backing image format in the image.  The other COW formats can't.
>>> Well, they can, with "json:". *cough*
>> Point coughingly taken.
>>
>>>> Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image
>>>> with a raw backing file.
>>> Yesterday I found out that doesn't seem possible. You apparently can
>>> only use VMDK with VMDK backing files.
>> I figure you're referring to this code in vmdk_create():
>>
>>          if (strcmp(bs->drv->format_name, "vmdk")) {
>>              bdrv_unref(bs);
>>              ret = -EINVAL;
>>              goto exit;
>>          }
>
> Yes. Of course it does depend on probing, which figures, I guess.
>
>>>                                         Other than that, we only have
>>> qcow1 and qed as COW formats which should not be used anyway.
>> qemu-doc.texi calls them "old image format", and qemu-img.texi has them
>> under "Other", "for compatibility with older QEMU versions".  I guess we
>> could do better job telling users they "should not be used anyway".
>>
>> Even in old stuff retained just for compatibility, we should make an
>> effort to plug security holes, make secure usage easy, and guide users
>> away from insecure usage.
>
> Of course, I'm just pointing out that it seems to be only qcow1 which
> is fully subject to this. qcow2 is partially, as you're pointing out
> next.
>
>> Now back to the point I was trying to make in my original message.
>>
>> Replacement example of insecure usage: -hda bar.qcow2, where bar.qcow2
>> is a QCOW2 image with a raw backing file and no backing image format,
>> i.e. created without "-o backing_format=".
>>
>> Then the next paragraph applies:
>>
>>>> Equivalent secure usage: Beats me.  Maybe there's a funky -drive
>>>> backing.whatever to specify the backing image's format.
>> See Kevin's reply for equivalent secure usage.
>>
>>>> With the latest interface blockdev-add, specifying the format is
>>>> mandatory.  Secure, but not really suitable for humans.
>>>>
>>>> Example of secure usage:
>>>>
>>>>       { "execute": "blockdev-add",
>>>>         "arguments": {'options': {
>>>>             'driver': 'raw', 'id':'foo',
>>>>             'file': { 'driver': 'file', 'filename': 'foo.img' } } } }
>>>>
>>>> Insecure usage is easy, secure usage is *hard*.  Even for sophisticated
>>>> users like libvirt developers.  Evidence: libvirt CVE-2010-2237,
>>>> CVE-2010-2238, CVE-2010-2239, and more that didn't get a CVE, like the
>>>> recent accidental probing in drive-mirror.
>>>>
>>>>
>>>> = How can we better guard the trust boundary in QEMU? =
>>>>
>>>> The guest can violate the trust boundary only because
>>>>
>>>> (a) QEMU supports both raw images and image formats, and
>>>>
>>>> (b) QEMU guesses image format from raw image contents, and
>>>>
>>>> (c) given a raw image, guests can change its contents and control a
>>>> future QEMU's format guess.
>>>>
>>>> We can attack one ore more of these three conditions:
>>> I'd like to attack more because any of these steps might be carried
>>> out in another program which thus either becomes vulnerable itself
>>> (which we don't really have to care about, but I'd like to either way)
>>> or which makes qemu vulnerable.
>>>
>>> Having an external program with (a) and (c), this makes qemu
>>> vulnerable if we don't try to forbid (b) or at least make it work
>>> better. Having an external program with (a) and (b), not doing
>>> anything against (c) in qemu makes that external program vulnerable.
>>>
>>>> (a) Outlaw raw images
>>>>
>>>> (b) Don't guess format from untrusted image contents
>>>>
>>>> (c) Prevent "bad" guest writes
>>>>
>>>> Nobody is seriously suggesting we do (a).  It's clearly too late for
>>>> that.  Let's explore the other two in more detail.
>>> And thus I prefer to find and implement solutions for *both* (b) and (c).
>> Quoting myself: "We can attack one ore more of these three conditions".
>
> Yes, and I referred to that by saying "I'd like to attack more".

Understood.

>>>> == Don't guess format from untrusted image contents ==
>>>>
>>>> Several variations of the theme.
>>>>
>>>> Guessing only happens when the user doesn't specify a format, so the
>>>> simplest way to avoid it would be requiring users to always specify the
>>>> format.
>>>>
>>>> PRO: Simple, plugs the hole.
>>>>
>>>> CON: Breaks a lot of existing usage.  Bye -hda, hello extra typing.
>>>>
>>>> Variation: command line option to opt out of probing completely.
>>>>
>>>> PRO: Simple, plugs the hole.
>>>>
>>>> CON: Insecure by default.
>>>>
>>>> CON: In addition to opting out, you have to update your usage
>>>> accordingly.  I guess libvirt would do it anyway, to guard against
>>>> accidental probing once and for all.
>>>>
>>>> Variation: Stefan proposed to make format= mandatory just for -drive.  I
>>>> guess he rather meant mandatory for anything but -hda and other selected
>>>> convenience interfaces.
>>>>
>>>> PRO: No extra typing in the cases where it makes the most difference.
>>>>
>>>> CON: Breaks existing usage in the other cases, extra typing.
>>>>
>>>> CON: Doesn't fully plug the hole.  Users of convenience interfaces may
>>>>       remain insecure out of ignorance.  We could add a warning to guide
>>>>       them to more secure usage, but then that warning would annoy users
>>>>       who don't care for security (sometimes with reason), and we can't
>>>>       have that.  So we silently leave the users who would care if they
>>>>       knew insecure.
>>>>
>>>> I proposed something less radical, namely to keep guessing the image
>>>> format, but base the guess on trusted meta-data only: file name and
>>>> attributes.
>>> You actually want to completely abolish probing? I thought we just
>>> wanted to use the filename as a second source of information and warn
>>> if the contents and the extension don't seem to match; and in the
>>> future, maybe make it an error (but we don't have to discuss that
>>> second part now, I think).
>> Yes, I propose to ditch it completely, after a suitable grace period.  I
>> tried to make that quite clear in my PATCH RFC 2/2.
>>
>> Here's why.
>>
>> Now: 1. probe
>>       4. open, error out if meta-data is bad
>>
>> With my proposed patch:
>>       1. probe
>>       2. guess from trusted meta-data
>>       3. warn unless 1 and 2 match
>>       4. open, error out if meta-data is bad
>>
>> Now make the warning an error:
>>       1. probe
>>       2. guess from trusted meta-data
>>       3. error out unless 1 and 2 match
>>       4. open, error out if meta-data is bad
>>
>> I figure the following is equivalent, but simpler:
>>
>>       2. guess from trusted meta-data
>>       4. open, error out if meta-data is bad
>>
>> because open should detect all the errors the previous step 3 caught.
>> If not, things are broken with explicit format=.
>
> You're right, it seems be equivalent. One difference will probably be
> error messages ("Bad signature" vs. "Filename extension and format do
> not match").

Yes.

> The other difference is that you have a problem if you cannot
> distinguish between two formats by extensions, as we've seen
> already. .qcow could mean both qcow1 and qcow2; a similar problem
> appeared with .vhd. Opening the image using all possible formats seems
> bad to me. Better swap 1 and 2: Guess from commonly trusted metadata
> (i.e. the filename extension) and then probe all possible formats.

Fair enough.

>>>> Block and character special files are raw.  For other
>>>> files, find the file name extension, and look up the format claiming it.
>>>>
>>>> PRO: Plugs the hole.
>>> You mean "plugs hole (b)".
>> What I (airily) call "the hole" is the scenario I described above: guest
>> escaping isolation by subverting qemu-system-*.
>>
>> (b) is not a hole, it's a condition for "the hole".
>
> Your "the hole" is only one hole. There are more holes. You only
> consider the case where there's only qemu, which is technically fine
> for discussion on qemu-devel, but I'd like to consider having other
> tools beside qemu as well.
>
> You're right, though, it should be "prevents condition (b)".
>
>> I guess what you want to say is "attacking just condition (b) doesn't
>> plug some other holes I care about".  That's true.  There are indeed
>> other holes.
>>
>> For instance, guest attacking QEMU's utility programs qemu-img, ...  Or
>> guest attacking other host software.  I'm not trying to discount any of
>> them.  But tangling all problems up into a hairball won't do us any
>> good.
>
> We have two approaches and I think using both will help to plug more
> holes. I'm not saying we should consider every possible hole with
> every host configuration there may be. I'm just saying using only of
> the approaches clearly only plugs "the hole" if there's only qemu.
>
>> So your point that my analysis is narrow is taken.  In my defense, I can
>> say that my narrow analysis was difficult enough to write up, and
>> probably produced enough text to deter readers.
>>
>>>> CON: Breaks existing usage when the new guess differs from the old
>>>>       guess.  Common usage should be fine:
>>>>
>>>>       * -hda test.qcow2
>>>>
>>>>         Fine as long as test.qcow2 is really QCOW2 (as it should!), and
>>>>         either specifies a backing format (as it arguably should), or the
>>>>         backing file name is sane.
>>>>
>>>>       * -hda disk.img
>>>>
>>>>         Fine as long as disk.img is really a disk image (as it should).
>>>>
>>>>       * -hda /dev/mapper/vg0-virtdisk
>>>>
>>>>         Fine as long as the logical volume is raw.
>>>>
>>>>       Less common usage can break:
>>>>
>>>>       * -hda nbd://localhost
>>>>
>>>>         Socket provides no clue, so no guess.
>>> nbd should be raw. If it isn't, you're most likely doing something
>>> wrong. See https://bugzilla.redhat.com/show_bug.cgi?id=1090713 what
>>> happens when you're doing it wrong.
>> Okay.
>>
>> My RFC PATCH is too simplistic to exploit that, because it only looks at
>> file name extensions.  But "user asked for nbd protocol" is quite
>> obviously trusted meta-data.  We could have a less simplistic patch
>> putting it to use.  Or adopt the variation below.
>>
>>>>       Weird usage can conceivably break hard:
>>>>
>>>>       * -hdd disk.img
>>>>
>>>>         Breaks hard when disk.img is actually QCOW2, the guest boots
>>>>         anyway from another drive, then proceeds to overwrite this one.
>>> Then why not continue to do probing and using the extension as a safeguard?
>>>
>>>> Mitigation: lengthy transition period where we warn "this usage is
>>>> insecure, and we'll eventually break it; here's a hint on secure usage".
>>>>
>>>> CON: We delay plugging the hole one more time.  But at least we no
>>>> longer expose our users to it silently.
>>>>
>>>> Jeff pointed out that we want to retain probing in things like qemu-img
>>>> info.
>>>>
>>>> Richard asked for a way for users to ask for insecure probing, say
>>>> format=insecure-probe.  I have no problem with giving users rope when
>>>> they ask for it.
>>>>
>>>> Variation: if file name and attributes are unavailable or provide no
>>>> clue, guess raw.  Same PRO and CON as above, only it avoids breaking a
>>>> few more cases.  For instance, "-hda nbd://localhost" keeps working as
>>>> long as the server serves a raw image.
>>> Which it should be.
>>>
>>>> Smells a bit like too much magic
>>>> to me.
>>>>
>>>> My proposal replaces probing wholesale.  I like that because it results
>>>> in simple, predictable guessing.  Here's a hybrid approach: first guess
>>>> raw vs. non-raw based on trusted meta-data only, and if non-raw, probe.
>>>>
>>>> Nothing the guest writes can affect the raw vs. non-raw decision.  Once
>>>> an image is raw, only the user can make it non-raw, by changing its name
>>>> or attributes.
>>>>
>>>> Two variations: 1. guess raw without a clue, and 2. guess non-raw then.
>>>>
>>>> Again, same PRO and CON as above, only it doesn't break when users give
>>>> their non-raw images weird names.
>>>>
>>>> == Prevent "bad" guest writes ==
>>>>
>>>> Again, several variations, but this time, only the last one is serious,
>>>> the others are just for illustration.
>>>>
>>>> Fail guest writes to those parts of the image that probing may examine
>>>> Can fail only writes to the first few sectors (at worst) of raw images.
>>>>
>>>> PRO: Plugs the hole.
>>>>
>>>> CON: The virtual hardware is defective.  Breaks common guest software
>>>> that writes to the first few sectors, such as boot loaders and
>>>> partitioning tools.  Breaks guest software using the whole device, which
>>>> isn't common, but certainly not unheard of.
>>>>
>>>> Variation: fail only writes of patterns that actually can make probing
>>>> guess something other than raw.
>>>>
>>>> PRO: Still plugs the hole.
>>> You mean "plugs hole (c)".
>> My reply to "plugs hole (b)" applies.
>>
>>>> CON: Except when you upgrade to a version that recognizes more patterns.
>>> Which is better than not plugging hole (c) at all.
>>>
>>>> CON: The virtual hardware is still defective, but the defects are
>>>> minimized.
>>> As you pointed out to us it's already defective and I don't think
>>> anybody ever noticed accidentally.
>> You're right in that probed raw is already defective, with defects
>> delayed to the next restart.  Preventing "bad" guest writes changes the
>> nature of the defects subtly, as I described above.
>
> Sector 0 is rarely ever written. It's not that people write some
> recognizable sequences there all the time but don't notice because
> they are quickly overwritten again.
>
> If nobody hit the problem accidentally until now, I'm certain that
> means that nobody ever wrote any recognizable sequence there while
> using qemu and raw (which is not too rare a combination).
>
>> This and the previous variation also extends them to non-probed raw.
>> The following variations avoid the extension.
>>
>>>> We can hope that partition tables, boot sectors and such
>>>> won't match the patterns, so common guest software hopefully works.
>>> It's worked in the past, that's good enough for me.
>>>
>>>> Guest software using the whole device still breaks, only now it breaks
>>>> later rather than sooner.
>>>>
>>>> Variation: fail writes only on *probed* raw images.
>>>>
>>>> CON: Doesn't fully plug the hole: mixing probed usage (user doesn't
>>>> specify format) with non-probed usage (user specifies format) remains
>>>> insecure.  The guest's write succeeds in non-probed usage, and the guest
>>>> escapes isolation in the next probed usage.
>>>>
>>>> CON: The virtual hardware is still defective, but it now comes with a
>>>> "defective on/off" switch, factory default "defective on".  We could add
>>>> a warning to guide users to switch defective off but then that warning
>>>> would annoy people who don't care to switch it off (sometimes with
>>>> reason), and we can't have that.  So we leave users who would care if
>>>> they knew in the dark.
>>>>
>>>> The two variations can be combined.  This is Kevin's proposal.
>>>>
>>>> CON: Doesn't fully plug the hole: union of both variations' flaws.
>>>>
>>>> CON: The virtual hardware is still defective: interesection of both
>>>> variations' defects.
>>>>
>>>>
>>>> = Conclusion =
>>>>
>>>> This is *my* conclusion.  Yours may be different, and that's okay.  It's
>>>> business, not personal ;)
>>>>
>>>> Secure usage should not be hard.
>>>>
>>>> If we permit insecure usage for convenience or compatibility, we should
>>>> warn the user, unless he clearly asked for it.  Even if that warning
>>>> annoys Kevin and Max.
>>> A warning does not annoy me as long as I know what it means.
>> Good :)
>>
>>>> If you want to suppress it with configure
>>>> --reckless or something, no objections.
>>>>
>>>> Same for defective virtual hardware.
>>>>
>>>> Kevin's proposal to prevent "bad" guest writes has relatively small
>>>> compatibility issues.  It improves things from "insecure by default" to
>>>> "slightly defective by default" as long as you use raw images either
>>>> always probed or always non-probed.  It doesn't help at all when you
>>>> alternate probed and non-probed usage.  Improvement of sorts, but it
>>>> still fails my "secure usage should not be hard" requirement, and that
>>>> requirement is a hard one for me.
>>>>
>>>> My proposal to ditch image contents probing entirely has more serious
>>>> compatibility issues.  In particular, we'd have to forgo sugared
>>>> convenience syntax for a number of less common things.  It definitely
>>>> needs a grace period where all usage we're going to break warns.  On the
>>>> up side, it will actually be secure by default when it's done.
>>>>
>>>> If this is not acceptable, my second choice is actually the command line
>>>> option to opt out of probing completely.  This doesn't address "insecure
>>>> by default" (sadly), but it does at least satisfy my "secure usage
>>>> should not be hard" requirement.
>>>>
>>>> If we should adopt Kevin's proposal against my objections, then I very
>>>> badly want the opt out option on top of it, opting out of both probing
>>>> and "bad" write prevention.
>>>>
>>>> Thanks for hearing me out.
>>> My conclusion: Don't ditch probing. It increases entropy, why would
>>> you ditch probing? Just combine it with the extension and if both
>>> don't seem to match, that's an error.
>> How does probing add value?
>>
>> Compare
>>
>>       1. probe
>>       2. guess from trusted meta-data
>>       3. error out unless 1 and 2 match
>>       4. open, error out if meta-data is bad
>>
>> to just
>>
>>       2. guess from trusted meta-data
>>       4. open, error out if meta-data is bad
>>
>> Let P be the driver recommended by probe, and G be the driver
>> recommended by guess.
>>
>> If P == G, same result: we open with the same driver.
>>
>> Else, if open with G fails, equivalent result: error out in step 3
>> vs. error out in step 4.
>>
>> Else, we have an odd case: one driver's probe accepts (P's), yet a
>> different driver's open succeeds (G's).
>>
>>      If G's probe rejects: is this a bug?  Shouldn't open always fail
>>      when probe rejects?
>>
>>      If G's probe accepts, then probing chose P over G.  Maybe it
>>      shouldn't have.  Or maybe the probing functions are imprecise.
>>      Anyway, keeping probing around makes this an error.  Should it be
>>      one?
>>
>> Am I missing something?
>
> No, see my reply above. I failed to consider that opening an image
> basically is advanced probing.
>
> The only thing I really see which could be missing is the problem of
> having ambiguous filename extensions, and that problem can easily be
> solved by keeping probing.
>
>>> Also, holes (b) and (c) are two different holes. We should fix
>>> both. We should fix (b) so qemu isn't vulnerable and we should fix (c)
>>> so qemu doesn't make other programs which do probe vulnerable.
>> Adjusting terminology, but hopefully not your intent: "the hole" isn't
>> the only hole that matters.  The conditions enable other holes.
>> Therefore, we should attack both (b) and (c).  Attacking (a) is not
>> practical.
>
> Yes. that's what I meant. There are various holes and preventing only
> one of the conditions (b) and (c) only plugs one (or a subset) of
> them, and I can easily see unplugged holes which can be fixed by
> preventing both.

No misunderstanding then.

>> Points taken, except I think we could attack (a) if we really wanted.
>
> Of course, but for that we'd either need some flat qcow2 mode or
> better support for an image format that does have such a flat mode.

Yes.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-06 13:02     ` Kevin Wolf
@ 2014-11-07 14:50       ` Markus Armbruster
  0 siblings, 0 replies; 38+ messages in thread
From: Markus Armbruster @ 2014-11-07 14:50 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Jeff Cody, qemu-devel, Stefan Hajnoczi, Max Reitz

Kevin Wolf <kwolf@redhat.com> writes:

> Am 06.11.2014 um 13:26 hat Markus Armbruster geschrieben:
>> >> * Reuse the image *without* specifying the raw format.  QEMU guesses the
>> >>    format based on untrusted image contents.  Now QEMU guesses a format
>> >>    chosen by the guest, with meta-data chosen by the guest.  By
>> >>    controlling image meta-data, the malicious guest can access arbitrary
>> >>    files as QEMU, enlarge its storage, and more.  A non-malicious guest
>> >>    can accidentally DoS itself, by writing a pattern probing recognizes.
>> >
>> > Thank you for bringing that to my attention. This means that I'm even
>> > more in favor of using Kevin's patches because in fact they don't
>> > break anything.
>> 
>> They break things differently.  The difference may or may not matter.
>> 
>> Example: innocent guest writes a recognized pattern.
>> 
>>   Now: next restart fails, guest DoSed itself until host operator gets
>>   around to adding format=raw to the configuration.  Consequence:
>>   downtime (probably lengthy), but no data corruption.
>
> Depends.
>
> Another possible outcome is that the guest doesn't DoS itself, but
> writes a valid image header. You've argued before that a guest could be
> keeping a qcow2 image on a whole disk for whatever odd reason. In this
> case, the qemu restart will present the nested image instead of the
> top-level one to the guest, which can probably be labelled corruption.

Point taken.

>>   With Kevin's patch: write fails, guest may or may not handle the
>>   failure gracefully.  Consequences can range from "guest complains to
>>   its logs (who cares)" via "guest stops whatever it's doing and refuses
>>   to continue until its hardware gets fixed (downtime as above)" to
>>   "data corruption".
>> 
>> Example: innocent guest first writes a recognized pattern, then
>> overwrites it with a non-recognized pattern.
>> 
>>   Now: works.
>> 
>>   With Kevin's patch: as above.
>> 
>> Again, I'm not claiming the differences are serious in practice, only
>> that they exist.
>
> Yes, the failure scenario is different. The point still stands that if
> it were relevant in practice, we would likely have heard of it before.

I grant you unlikely.  But I abhor even the unlikeliest data corruptors.
So let's not argue about likelihood, let's concentrate on "it doesn't
really add risk, it mostly changes existing breakage".  That's a point I
can take.

We could then argue whether the changes are for the better.  However, if
we can agree to provide a global "insecure probing on/off" switch, my
interest in arguing these details declines sharply.

>> > You actually want to completely abolish probing? I thought we just
>> > wanted to use the filename as a second source of information and warn
>> > if the contents and the extension don't seem to match; and in the
>> > future, maybe make it an error (but we don't have to discuss that
>> > second part now, I think).
>> 
>> Yes, I propose to ditch it completely, after a suitable grace period.  I
>> tried to make that quite clear in my PATCH RFC 2/2.
>> 
>> Here's why.
>> 
>> Now: 1. probe
>>      4. open, error out if meta-data is bad
>> 
>> With my proposed patch:
>>      1. probe
>>      2. guess from trusted meta-data
>>      3. warn unless 1 and 2 match
>>      4. open, error out if meta-data is bad
>> 
>> Now make the warning an error:
>>      1. probe
>>      2. guess from trusted meta-data
>>      3. error out unless 1 and 2 match
>>      4. open, error out if meta-data is bad
>> 
>> I figure the following is equivalent, but simpler:
>> 
>>      2. guess from trusted meta-data
>>      4. open, error out if meta-data is bad
>> 
>> because open should detect all the errors the previous step 3 caught.
>> If not, things are broken with explicit format=.
>
> Not entirely true, see below.
>
>> > My conclusion: Don't ditch probing. It increases entropy, why would
>> > you ditch probing? Just combine it with the extension and if both
>> > don't seem to match, that's an error.
>> 
>> How does probing add value?
>> 
>> Compare
>> 
>>      1. probe
>>      2. guess from trusted meta-data
>>      3. error out unless 1 and 2 match
>>      4. open, error out if meta-data is bad
>> 
>> to just
>> 
>>      2. guess from trusted meta-data
>>      4. open, error out if meta-data is bad
>> 
>> Let P be the driver recommended by probe, and G be the driver
>> recommended by guess.
>
> P = qcow2, G = raw
>
>> If P == G, same result: we open with the same driver.
>
> No, they are not the same.
>
>> Else, if open with G fails, equivalent result: error out in step 3
>> vs. error out in step 4.
>
> No, raw accepts the image.
>
>> Else, we have an odd case: one driver's probe accepts (P's), yet a
>> different driver's open succeeds (G's).
>> 
>>     If G's probe rejects: is this a bug?  Shouldn't open always fail
>>     when probe rejects?
>
> No, raw's probe doesn't reject, it just has a very low score.
>
>>     If G's probe accepts, then probing chose P over G.  Maybe it
>>     shouldn't have.  Or maybe the probing functions are imprecise.
>>     Anyway, keeping probing around makes this an error.  Should it be
>>     one?
>
> Yes, raw being the fallback for everything is imprecise.

Okay, that's what I missed, thanks.

>                                                          It's the only
> way we have for probing raw.

I don't want to call it a probe, since it doesn't actually probe
anything.  But that's semantics.

>> Am I missing something?
>
> This is the safety measure that was missing in your proposal, against
> corruption caused by a qcow2 image stored in foo.img that is now
> unintentionally opened as raw.

> Same thing probably for qcow2 stored on LVs etc.

The transition period is a safety measure.  In this period, any usage
that is liable to be interpreted differently in the future warns.

We can try to do more.  We could refuse to use raw without an explicit
format=raw when any other format probe accepts.

Might be a good idea in general: refuse to use a format without an
explicit format= when any other non-raw format probe accepts.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-06 15:00         ` Max Reitz
@ 2014-11-07 14:52           ` Markus Armbruster
  2014-11-07 15:17             ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Markus Armbruster @ 2014-11-07 14:52 UTC (permalink / raw)
  To: Max Reitz; +Cc: Kevin Wolf, Jeff Cody, qemu-devel, Stefan Hajnoczi

Max Reitz <mreitz@redhat.com> writes:

> On 2014-11-06 at 15:56, Jeff Cody wrote:
>> On Thu, Nov 06, 2014 at 01:53:35PM +0100, Max Reitz wrote:
>>> On 2014-11-06 at 13:26, Markus Armbruster wrote:
>>>> Max Reitz <mreitz@redhat.com> writes:
>>>>
>>>>> On 2014-11-04 at 19:45, Markus Armbruster wrote:
[...]
>>>>>> = How this lets the guest escape isolation =
>>>>>>
>>>>>> Unfortunately, this lets the guest shift the trust boundary and escape
>>>>>> isolation, as follows:
>>>>>>
>>>>>> * Expose a raw image to the guest (whether you specify the format=raw or
>>>>>>     let QEMU guess it doesn't matter).  The complete contents becomes
>>>>>>     untrusted.
>>>>>>
>>>>>> * Reuse the image *without* specifying the raw format.  QEMU guesses the
>>>>>>     format based on untrusted image contents.  Now QEMU guesses a format
>>>>>>     chosen by the guest, with meta-data chosen by the guest.  By
>>>>>>     controlling image meta-data, the malicious guest can access arbitrary
>>>>>>     files as QEMU, enlarge its storage, and more.  A non-malicious guest
>>>>>>     can accidentally DoS itself, by writing a pattern probing recognizes.
>>>>> Thank you for bringing that to my attention. This means that I'm even
>>>>> more in favor of using Kevin's patches because in fact they don't
>>>>> break anything.
>>>> They break things differently.  The difference may or may not matter.
>>>>
>>>> Example: innocent guest writes a recognized pattern.
>>>>
>>>>    Now: next restart fails, guest DoSed itself until host operator gets
>>>>    around to adding format=raw to the configuration.  Consequence:
>>>>    downtime (probably lengthy), but no data corruption.
>>>>
>>>>    With Kevin's patch: write fails, guest may or may not handle the
>>>>    failure gracefully.  Consequences can range from "guest complains to
>>>>    its logs (who cares)" via "guest stops whatever it's doing and refuses
>>>>    to continue until its hardware gets fixed (downtime as above)" to
>>>>    "data corruption".
>>> You somehow seem convinced that writing to sector 0 is a completely
>>> normal operation. For x86, it isn't, though.
>>>
>>> There are only a couple of programs which do that, I can only think
>>> of partitioning and setting up boot loaders. There's not a myriad of
>>> programs which would increase the probability of one both writing a
>>> recognizable pattern *and* not handling EPERM correctly.
>>>
>>> I see the probability of both happening at the same time as
>>> extremely low, not least because there are only a handful of
>>> programs which access that sector.
>>>
>> I'm not yet opposed to the "restricted-raw" method, but...
>>
>> I think the above is a somewhat dangerous viewpoint to take with QEMU.
>> It is a bit of a slippery slope to start to assume what data guests
>> will write to the disks provided to them.  Even if the probability of
>> this happening is very low, with what usage we envision now, it is
>> still entirely legitimate usage for a guest to write data starting at
>> sector 0.

Yup.

> Then let's officially deprecate format probing, if we haven't done so
> already. That way, there's no excuse.

I'd gladly deprecate format probing, or at least format probing
resulting in raw.  However, we can hardly deprecate something and keep
it the default behavior!

> What I'm saying is that there are obviously no compatibility
> issues. There is no guest software which did write recognizable
> patterns (so far nobody provided a counterexample), and since format
> probing is deprecated (or should be), you have no excuse for running
> future guests in qemu without having explicitly specified the format.
>
> And if you are specifying the format, Kevin's patches will not prevent
> the guest from making its disk a qcow2 image whatsoever.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-07 14:52           ` Markus Armbruster
@ 2014-11-07 15:17             ` Max Reitz
  2014-11-10  7:58               ` Markus Armbruster
  0 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2014-11-07 15:17 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Kevin Wolf, Jeff Cody, qemu-devel, Stefan Hajnoczi

On 2014-11-07 at 15:52, Markus Armbruster wrote:
> Max Reitz <mreitz@redhat.com> writes:
>
>> On 2014-11-06 at 15:56, Jeff Cody wrote:
>>> On Thu, Nov 06, 2014 at 01:53:35PM +0100, Max Reitz wrote:
>>>> On 2014-11-06 at 13:26, Markus Armbruster wrote:
>>>>> Max Reitz <mreitz@redhat.com> writes:
>>>>>
>>>>>> On 2014-11-04 at 19:45, Markus Armbruster wrote:
> [...]
>>>>>>> = How this lets the guest escape isolation =
>>>>>>>
>>>>>>> Unfortunately, this lets the guest shift the trust boundary and escape
>>>>>>> isolation, as follows:
>>>>>>>
>>>>>>> * Expose a raw image to the guest (whether you specify the format=raw or
>>>>>>>      let QEMU guess it doesn't matter).  The complete contents becomes
>>>>>>>      untrusted.
>>>>>>>
>>>>>>> * Reuse the image *without* specifying the raw format.  QEMU guesses the
>>>>>>>      format based on untrusted image contents.  Now QEMU guesses a format
>>>>>>>      chosen by the guest, with meta-data chosen by the guest.  By
>>>>>>>      controlling image meta-data, the malicious guest can access arbitrary
>>>>>>>      files as QEMU, enlarge its storage, and more.  A non-malicious guest
>>>>>>>      can accidentally DoS itself, by writing a pattern probing recognizes.
>>>>>> Thank you for bringing that to my attention. This means that I'm even
>>>>>> more in favor of using Kevin's patches because in fact they don't
>>>>>> break anything.
>>>>> They break things differently.  The difference may or may not matter.
>>>>>
>>>>> Example: innocent guest writes a recognized pattern.
>>>>>
>>>>>     Now: next restart fails, guest DoSed itself until host operator gets
>>>>>     around to adding format=raw to the configuration.  Consequence:
>>>>>     downtime (probably lengthy), but no data corruption.
>>>>>
>>>>>     With Kevin's patch: write fails, guest may or may not handle the
>>>>>     failure gracefully.  Consequences can range from "guest complains to
>>>>>     its logs (who cares)" via "guest stops whatever it's doing and refuses
>>>>>     to continue until its hardware gets fixed (downtime as above)" to
>>>>>     "data corruption".
>>>> You somehow seem convinced that writing to sector 0 is a completely
>>>> normal operation. For x86, it isn't, though.
>>>>
>>>> There are only a couple of programs which do that, I can only think
>>>> of partitioning and setting up boot loaders. There's not a myriad of
>>>> programs which would increase the probability of one both writing a
>>>> recognizable pattern *and* not handling EPERM correctly.
>>>>
>>>> I see the probability of both happening at the same time as
>>>> extremely low, not least because there are only a handful of
>>>> programs which access that sector.
>>>>
>>> I'm not yet opposed to the "restricted-raw" method, but...
>>>
>>> I think the above is a somewhat dangerous viewpoint to take with QEMU.
>>> It is a bit of a slippery slope to start to assume what data guests
>>> will write to the disks provided to them.  Even if the probability of
>>> this happening is very low, with what usage we envision now, it is
>>> still entirely legitimate usage for a guest to write data starting at
>>> sector 0.
> Yup.
>
>> Then let's officially deprecate format probing, if we haven't done so
>> already. That way, there's no excuse.
> I'd gladly deprecate format probing, or at least format probing
> resulting in raw.  However, we can hardly deprecate something and keep
> it the default behavior!

Why not?

"It's the default due to legacy, but it's your fault if something breaks."

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-06 15:01     ` Kevin Wolf
@ 2014-11-07 15:21       ` Markus Armbruster
  2014-11-07 17:33         ` Jeff Cody
  2014-11-10  8:13         ` Markus Armbruster
  0 siblings, 2 replies; 38+ messages in thread
From: Markus Armbruster @ 2014-11-07 15:21 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Jeff Cody, qemu-devel, Stefan Hajnoczi, Max Reitz

Kevin Wolf <kwolf@redhat.com> writes:

> Am 06.11.2014 um 14:57 hat Markus Armbruster geschrieben:
>> Kevin Wolf <kwolf@redhat.com> writes:
>> 
>> > Am 04.11.2014 um 19:45 hat Markus Armbruster geschrieben:
>> >> I'll try to explain all solutions fairly.  Isn't easy when you're as
>> >> biased towards one of them as I am.  Please bear with me.
>> >> 
>> >> 
>> >> = The trust boundary between image contents and meta-data =
>> >> 
>> >> A disk image consists of image contents and meta-data.
>> >> 
>> >> Example: all of a raw image's contents is image contents.  Leaves just
>> >> file name and attributes for meta-data.
>> >
>> > Better: Leaves only protocol-specific metadata (e.g. file name and
>> > attributes for raw-posix).
>> 
>> Can you give examples for other protocols?
>
> Max already gave the example of NBD always implying raw.
>
> I can also imagine that protocols that take URLs would use the file name
> from the URL - which is not necessarily the end of the string, because
> query options could follow.
>
> Even though I don't think we have it today, it's also not entirely
> unthinkable that some network protocol specifically made for VM images
> (perhaps something like Sheepdog) could be storing the image format as
> metadata on the server. Actually, I think that would make a whole lot of
> sense for them.

Thanks.

>> >> = Insecure usage is easy, secure usage is hard =
>> >> 
>> >> The oldest stratum of user interfaces doesn't let you specify the image
>> >> format.  Use of raw images with these is insecure by design.  These
>> >> interfaces are still recommended for human users.
>> >> 
>> >> Example of insecure usage: -hda foo.img, where foo.img is raw.
>> >> 
>> >> With the next generation of interfaces, specifying the image format is
>> >> optional.  Use of raw images with these is insecure by default.
>> >> 
>> >> Example of insecure usage: -drive file=foo.img,index=0,media=cdrom,
>> >> where foo.img is raw.  The -hda above is actually sugar for this.
>> >> 
>> >> Equivalent secure usage: add format=raw.
>> >> 
>> >> Note that specifying just the top image's format is not enough, you also
>> >> have to specify any backing images' formats.  QCOW2 can optionally store
>> >> the backing image format in the image.  The other COW formats can't.
>> >> 
>> >> Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image
>> >> with a raw backing file.
>> >
>> > Usually this is mitigated by the fact that backing files are read-only.
>> > Trouble is starting when you use things like commit.
>> 
>> Yes.
>> 
>> >> Equivalent secure usage: Beats me.  Maybe there's a funky -drive
>> >> backing.whatever to specify the backing image's format.
>> >
>> > Yes, you can override the backing file driver (backing.driver=raw should
>> > do the trick). Not really user-friendly, especially with long backing
>> > file chains, but it happens to be there.
>> >
>> > And of course, libvirt should be using it for non-qcow2 or qcow2 without
>> > the backing format header extension (but doesn't yet).
>> 
>> I'm glad it's there.  Too bad libvirt doesn't use it, yet.  Supports my
>> point that secure usage is too hard now.
>
> I don't know whether it's related to being too hard or just too new. I
> won't disagree when you say that it isn't obvious, but the libvirt
> authors are experts and probably know better than the average command
> line user what they should be doing ideally.

I probably know better than average, too.  Yet I wouldn't bet on me
being able to avoid insecure format probing 100%, because I have to ask
for security for every image and backing image separately, and some of
the fancy stuff QEMU can do taxes my poor old mind enough for me not to
trust it not to slip.

Likewise, I'm reluctant to trust even competently written software to
get it right 100%.  It's just too complex.

A global "insecure probing on/off" switch would help.

>> >> I proposed something less radical, namely to keep guessing the image
>> >> format, but base the guess on trusted meta-data only: file name and
>> >> attributes.  Block and character special files are raw.  For other
>> >> files, find the file name extension, and look up the format claiming it.
>> >> 
>> >> PRO: Plugs the hole.
>> >> 
>> >> CON: Breaks existing usage when the new guess differs from the old
>> >>     guess.  Common usage should be fine:
>> >> 
>> >>     * -hda test.qcow2
>> >> 
>> >>       Fine as long as test.qcow2 is really QCOW2 (as it should!), and
>> >>       either specifies a backing format (as it arguably should), or the
>> >>       backing file name is sane.
>> >> 
>> >>     * -hda disk.img
>> >> 
>> >>       Fine as long as disk.img is really a disk image (as it should).
>> >
>> > .img is not as clear, I've seen people using it for other formats. It's
>> > still a disk image, but not a raw one.
>> 
>> Is this usage common?
>
> More common that writing a qcow2 header to your boot sector. ;-)
>
> But seriously, one of the problems in this discussion is that we don't
> have any actual data for more exotic use cases. I can only say that I've
> seen it before, even though that doesn't mean much.
>
> If you want me to guess: Not really common, but probably one of the most
> common corner cases from those that we've been discussing here.

Plausible, given the anecdotical evidence we have.

>> >>     * -hda /dev/mapper/vg0-virtdisk
>> >> 
>> >>       Fine as long as the logical volume is raw.
>> >> 
>> >>     Less common usage can break:
>> >> 
>> >>     * -hda nbd://localhost
>> >> 
>> >>       Socket provides no clue, so no guess.
>> >> 
>> >>     Weird usage can conceivably break hard:
>> >> 
>> >>     * -hdd disk.img
>> >> 
>> >>       Breaks hard when disk.img is actually QCOW2, the guest boots
>> >>       anyway from another drive, then proceeds to overwrite this one.
>> >> 
>> >> Mitigation: lengthy transition period where we warn "this usage is
>> >> insecure, and we'll eventually break it; here's a hint on secure usage".
>> >> 
>> >> CON: We delay plugging the hole one more time.  But at least we no
>> >> longer expose our users to it silently.
>> >
>> > CON: Relies on metadata that is protocol-specific. Each protocol that
>> >      should support probing needs extra code. Essentially means that
>> >      probing will be disabled on anything except raw-posix (and if we're
>> >      lucky enough that someone pays attention during review, raw-win32)
>> 
>> Terminology: I use "probing" and "guessing from trusted meta-data".  The
>> former is for probing raw image contents.  The latter may only examine
>> trusted meta-data.
>> 
>> I suspect you mean "each protocol that should support guessing from
>> trusted meta-data needs extra code".  Do you?
>
> Yes.
>
> I'll try to remember using "probing" and "guessing" with your meaning.
>
>> >> == Prevent "bad" guest writes ==
>> >> 
>> >> Again, several variations, but this time, only the last one is serious,
>> >> the others are just for illustration.
>> >> 
>> >> Fail guest writes to those parts of the image that probing may examine
>> >> Can fail only writes to the first few sectors (at worst) of raw images.
>> >> 
>> >> PRO: Plugs the hole.
>
> PRO: Fix is small, local to raw block driver, and obviously complete (in
> the sense that it catches every usage of the raw format).

Point taken.

> PRO: Only affects images actually opened as raw; can't possibly break
> non-raw use cases

Related, but point taken.  Both apply to the variations, too.

> CON: Can affect raw images even with an explicit format=raw

For me, this one is covered by my CON:

>> >> CON: The virtual hardware is defective.  Breaks common guest software
>> >> that writes to the first few sectors, such as boot loaders and
>> >> partitioning tools.  Breaks guest software using the whole device, which
>> >> isn't common, but certainly not unheard of.
>> >> 
>> >> Variation: fail only writes of patterns that actually can make probing
>> >> guess something other than raw.
>> >> 
>> >> PRO: Still plugs the hole.
>> >> 
>> >> CON: Except when you upgrade to a version that recognizes more patterns.
>> >> 
>> >> CON: The virtual hardware is still defective, but the defects are
>> >> minimized.  We can hope that partition tables, boot sectors and such
>> >> won't match the patterns, so common guest software hopefully works.
>> >> Guest software using the whole device still breaks, only now it breaks
>> >> later rather than sooner.
>> >> 
>> >> Variation: fail writes only on *probed* raw images.
>
> PRO: Plugs the hole in the most common case (user relies consistently on
> probing)

This is my first CON with a different baseline.  A partial fix is better
than nothing, so if "nothing" is the baseline, file under PRO.  It's
worse than a full fix, so if that's the baseline (and it consistently is
in my memo), file it under CON.

In short, I don't disagree with you, I just wrote it up differently, and
possibly suboptimally.

> PRO: Users explicitly specifying format=raw can't possibly be affected
> any more, just like non-raw formats in the basic variant.

This is my second CON with a different baseline: "defective in some
configurations" is better than "defective in all configurations", but
worse than "not defective".

>> >> CON: Doesn't fully plug the hole: mixing probed usage (user doesn't
>> >> specify format) with non-probed usage (user specifies format) remains
>> >> insecure.  The guest's write succeeds in non-probed usage, and the guest
>> >> escapes isolation in the next probed usage.
>> >> 
>> >> CON: The virtual hardware is still defective, but it now comes with a
>> >> "defective on/off" switch, factory default "defective on".  We could add
>> >> a warning to guide users to switch defective off but then that warning
>> >> would annoy people who don't care to switch it off (sometimes with
>> >> reason), and we can't have that.  So we leave users who would care if
>> >> they knew in the dark.
>> 
>> Replace by
>> 
>> CON: The virtual hardware is still defective, but it now comes with a
>> "defective on/off" switch, factory default "defective on".  We could add
>> a warning to guide users to switch defective off.
>> 
>> >> The two variations can be combined.  This is Kevin's proposal.
>> >> 
>> >> CON: Doesn't fully plug the hole: union of both variations' flaws.
>> >> 
>> >> CON: The virtual hardware is still defective: interesection of both
>> >> variations' defects.
>
> PRO: Union of both variations' advantages wrt false positives, which are
> minimised as much as possible: Explicit format=... or usage of non-raw
> formats doesn't trigger the check. This limits it to cases that are
> already broken today (even though the failure mode changes - can
> possibly even be called a bonus bug fix).

This is worth spelling out, thanks.

> PRO: Still plugs the hole in the most common case (user relies
> consistently on probing)

This is my first CON with a different baseline.  Again, I'm not
disagreeing, just explaining the thinking behind my writing.

> PRO: The fix is still small, local to raw block driver, and obviously
> complete (in the sense that it catches every usage of the raw format).
>
>> > I like how you took care to avoid finding any PROs. :-)
>> 
>> Come on, this line 281 of 327, cut me some slack :)
>> 
>> > I'll leave commenting on this section to others for now. I feel I have
>> > already said enough about it in the other threads, and defending it here
>> > at this point wouldn't help the discussion.
>> 
>> The purpose of this document is to summarize our thoughts.  Need yours
>> to achieve it.  Can you give me your concise PROs?
>
> Added them above.

Thanks!

Are you ready to write up your conclusion, similar to how I did?

Before you do, let me refine / vary the hybrid approach I mentioned
under " Don't guess format from untrusted image contents" some.  I think
I can trace some inspiration to Max here.

Say we use trusted meta-data to compute a set of admissible formats, and
if the set has multiple members, use probing to pick one of them.

Example: foo.qcow2 -> { qcow2 }, no probing

Example: foo.qcow -> { qcow, qcow2 }, probe to pick one

Likewise for foo.vhdx and foo.vhd.

To ensure this actually knocks out condition (b), all members of the set
must have the image contents used by their members' probes within their
trust boundary.

Example: { qcow, qcow2 } is fine, because both formats have a header,
and each header covers the bytes either probe examines.

Example: { raw } is fine, because there is no probing.

Counterexample: { raw, qcow2 } is not possible, because qcow2 probes
outside raw's trusted metadata.

Note my careful wording "contents used by [other] probes".  Right now we
simply assume that the first 2048 bytes can be trusted.  This is not
obviously the case!  If I remember correctly, you proposed to cut it to
512 bytes, which feels a lot safer, since any sane format probably
aligns (untrusted) image contents to at least a 512 byte boundary, but
is still theoretically unsound.

PRO and CON like my proposal to guess from trusted meta-data only.  The
difference is in what existing usage exactly it breaks.

Can be combined with "refuse to use a format without an explicit format=
when any other non-raw format probe accepts", just like everything else
proposed so far.

>                   Of course, you could also add the CONs of the other
> proposal as PROs here, like "works with any filename and even protocols
> that don't have anything filename-like", but I think they are already
> covered well enough in the other section.

Yes, one alternative's PRO is often another alternative's CON.  Covering
each issue in all places explicitly could perhaps be clearer.  Instead,
I chose to pick a baseline, and PRO/CON off that, for (relative)
brevity.

>> > And yes, it also doesn't help when you accidentally type format=qcow2
>> > instead of format=raw. When you have images of both types, things like
>> > this happen with manual typing.
>> 
>> I consider mixing probed and non-probed usage a more plausible bad habit
>> than accidental use of format=qcow2, because the former works just fine,
>> but the latter fails (unless your guest has "helpfully" written a QCOW2
>> header).
>
> I'll admit that mixing probed and non-probed usage might be more common
> (though I think that I for one am pretty consistent in not using
> format=... with my trusted images - why would I?), but I consider both
> cases plausible. I've mistyped enough -f options for qemu-img. And that
> it fails doesn't prevent the typo.

My point wasn't that accidental misuse doesn't matter, only that
mistakes that have no immediately visible consequences can easily become
bad habits.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-07 15:21       ` Markus Armbruster
@ 2014-11-07 17:33         ` Jeff Cody
  2014-11-10  8:12           ` Markus Armbruster
  2014-11-10  8:13         ` Markus Armbruster
  1 sibling, 1 reply; 38+ messages in thread
From: Jeff Cody @ 2014-11-07 17:33 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

On Fri, Nov 07, 2014 at 04:21:38PM +0100, Markus Armbruster wrote:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 06.11.2014 um 14:57 hat Markus Armbruster geschrieben:
> >> Kevin Wolf <kwolf@redhat.com> writes:
> >> 
> >> > Am 04.11.2014 um 19:45 hat Markus Armbruster geschrieben:
> >> >> I'll try to explain all solutions fairly.  Isn't easy when you're as
> >> >> biased towards one of them as I am.  Please bear with me.
> >> >> 
> >> >> 
> >> >> = The trust boundary between image contents and meta-data =
> >> >> 
> >> >> A disk image consists of image contents and meta-data.
> >> >> 
> >> >> Example: all of a raw image's contents is image contents.  Leaves just
> >> >> file name and attributes for meta-data.
> >> >
> >> > Better: Leaves only protocol-specific metadata (e.g. file name and
> >> > attributes for raw-posix).
> >> 
> >> Can you give examples for other protocols?
> >
> > Max already gave the example of NBD always implying raw.
> >
> > I can also imagine that protocols that take URLs would use the file name
> > from the URL - which is not necessarily the end of the string, because
> > query options could follow.
> >
> > Even though I don't think we have it today, it's also not entirely
> > unthinkable that some network protocol specifically made for VM images
> > (perhaps something like Sheepdog) could be storing the image format as
> > metadata on the server. Actually, I think that would make a whole lot of
> > sense for them.
> 
> Thanks.
> 
> >> >> = Insecure usage is easy, secure usage is hard =
> >> >> 
> >> >> The oldest stratum of user interfaces doesn't let you specify the image
> >> >> format.  Use of raw images with these is insecure by design.  These
> >> >> interfaces are still recommended for human users.
> >> >> 
> >> >> Example of insecure usage: -hda foo.img, where foo.img is raw.
> >> >> 
> >> >> With the next generation of interfaces, specifying the image format is
> >> >> optional.  Use of raw images with these is insecure by default.
> >> >> 
> >> >> Example of insecure usage: -drive file=foo.img,index=0,media=cdrom,
> >> >> where foo.img is raw.  The -hda above is actually sugar for this.
> >> >> 
> >> >> Equivalent secure usage: add format=raw.
> >> >> 
> >> >> Note that specifying just the top image's format is not enough, you also
> >> >> have to specify any backing images' formats.  QCOW2 can optionally store
> >> >> the backing image format in the image.  The other COW formats can't.
> >> >> 
> >> >> Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image
> >> >> with a raw backing file.
> >> >
> >> > Usually this is mitigated by the fact that backing files are read-only.
> >> > Trouble is starting when you use things like commit.
> >> 
> >> Yes.
> >> 
> >> >> Equivalent secure usage: Beats me.  Maybe there's a funky -drive
> >> >> backing.whatever to specify the backing image's format.
> >> >
> >> > Yes, you can override the backing file driver (backing.driver=raw should
> >> > do the trick). Not really user-friendly, especially with long backing
> >> > file chains, but it happens to be there.
> >> >
> >> > And of course, libvirt should be using it for non-qcow2 or qcow2 without
> >> > the backing format header extension (but doesn't yet).
> >> 
> >> I'm glad it's there.  Too bad libvirt doesn't use it, yet.  Supports my
> >> point that secure usage is too hard now.
> >
> > I don't know whether it's related to being too hard or just too new. I
> > won't disagree when you say that it isn't obvious, but the libvirt
> > authors are experts and probably know better than the average command
> > line user what they should be doing ideally.
> 
> I probably know better than average, too.  Yet I wouldn't bet on me
> being able to avoid insecure format probing 100%, because I have to ask
> for security for every image and backing image separately, and some of
> the fancy stuff QEMU can do taxes my poor old mind enough for me not to
> trust it not to slip.
> 
> Likewise, I'm reluctant to trust even competently written software to
> get it right 100%.  It's just too complex.
> 
> A global "insecure probing on/off" switch would help.
> 
> >> >> I proposed something less radical, namely to keep guessing the image
> >> >> format, but base the guess on trusted meta-data only: file name and
> >> >> attributes.  Block and character special files are raw.  For other
> >> >> files, find the file name extension, and look up the format claiming it.
> >> >> 
> >> >> PRO: Plugs the hole.
> >> >> 
> >> >> CON: Breaks existing usage when the new guess differs from the old
> >> >>     guess.  Common usage should be fine:
> >> >> 
> >> >>     * -hda test.qcow2
> >> >> 
> >> >>       Fine as long as test.qcow2 is really QCOW2 (as it should!), and
> >> >>       either specifies a backing format (as it arguably should), or the
> >> >>       backing file name is sane.
> >> >> 
> >> >>     * -hda disk.img
> >> >> 
> >> >>       Fine as long as disk.img is really a disk image (as it should).
> >> >
> >> > .img is not as clear, I've seen people using it for other formats. It's
> >> > still a disk image, but not a raw one.
> >> 
> >> Is this usage common?
> >
> > More common that writing a qcow2 header to your boot sector. ;-)
> >
> > But seriously, one of the problems in this discussion is that we don't
> > have any actual data for more exotic use cases. I can only say that I've
> > seen it before, even though that doesn't mean much.
> >
> > If you want me to guess: Not really common, but probably one of the most
> > common corner cases from those that we've been discussing here.
> 
> Plausible, given the anecdotical evidence we have.
> 
> >> >>     * -hda /dev/mapper/vg0-virtdisk
> >> >> 
> >> >>       Fine as long as the logical volume is raw.
> >> >> 
> >> >>     Less common usage can break:
> >> >> 
> >> >>     * -hda nbd://localhost
> >> >> 
> >> >>       Socket provides no clue, so no guess.
> >> >> 
> >> >>     Weird usage can conceivably break hard:
> >> >> 
> >> >>     * -hdd disk.img
> >> >> 
> >> >>       Breaks hard when disk.img is actually QCOW2, the guest boots
> >> >>       anyway from another drive, then proceeds to overwrite this one.
> >> >> 
> >> >> Mitigation: lengthy transition period where we warn "this usage is
> >> >> insecure, and we'll eventually break it; here's a hint on secure usage".
> >> >> 
> >> >> CON: We delay plugging the hole one more time.  But at least we no
> >> >> longer expose our users to it silently.
> >> >
> >> > CON: Relies on metadata that is protocol-specific. Each protocol that
> >> >      should support probing needs extra code. Essentially means that
> >> >      probing will be disabled on anything except raw-posix (and if we're
> >> >      lucky enough that someone pays attention during review, raw-win32)
> >> 
> >> Terminology: I use "probing" and "guessing from trusted meta-data".  The
> >> former is for probing raw image contents.  The latter may only examine
> >> trusted meta-data.
> >> 
> >> I suspect you mean "each protocol that should support guessing from
> >> trusted meta-data needs extra code".  Do you?
> >
> > Yes.
> >
> > I'll try to remember using "probing" and "guessing" with your meaning.
> >
> >> >> == Prevent "bad" guest writes ==
> >> >> 
> >> >> Again, several variations, but this time, only the last one is serious,
> >> >> the others are just for illustration.
> >> >> 
> >> >> Fail guest writes to those parts of the image that probing may examine
> >> >> Can fail only writes to the first few sectors (at worst) of raw images.
> >> >> 
> >> >> PRO: Plugs the hole.
> >
> > PRO: Fix is small, local to raw block driver, and obviously complete (in
> > the sense that it catches every usage of the raw format).
> 
> Point taken.
> 
> > PRO: Only affects images actually opened as raw; can't possibly break
> > non-raw use cases
> 
> Related, but point taken.  Both apply to the variations, too.
> 
> > CON: Can affect raw images even with an explicit format=raw
> 
> For me, this one is covered by my CON:
> 
> >> >> CON: The virtual hardware is defective.  Breaks common guest software
> >> >> that writes to the first few sectors, such as boot loaders and
> >> >> partitioning tools.  Breaks guest software using the whole device, which
> >> >> isn't common, but certainly not unheard of.
> >> >> 
> >> >> Variation: fail only writes of patterns that actually can make probing
> >> >> guess something other than raw.
> >> >> 
> >> >> PRO: Still plugs the hole.
> >> >> 
> >> >> CON: Except when you upgrade to a version that recognizes more patterns.
> >> >> 
> >> >> CON: The virtual hardware is still defective, but the defects are
> >> >> minimized.  We can hope that partition tables, boot sectors and such
> >> >> won't match the patterns, so common guest software hopefully works.
> >> >> Guest software using the whole device still breaks, only now it breaks
> >> >> later rather than sooner.
> >> >> 
> >> >> Variation: fail writes only on *probed* raw images.
> >
> > PRO: Plugs the hole in the most common case (user relies consistently on
> > probing)
> 
> This is my first CON with a different baseline.  A partial fix is better
> than nothing, so if "nothing" is the baseline, file under PRO.  It's
> worse than a full fix, so if that's the baseline (and it consistently is
> in my memo), file it under CON.
> 
> In short, I don't disagree with you, I just wrote it up differently, and
> possibly suboptimally.
> 
> > PRO: Users explicitly specifying format=raw can't possibly be affected
> > any more, just like non-raw formats in the basic variant.
> 
> This is my second CON with a different baseline: "defective in some
> configurations" is better than "defective in all configurations", but
> worse than "not defective".
> 
> >> >> CON: Doesn't fully plug the hole: mixing probed usage (user doesn't
> >> >> specify format) with non-probed usage (user specifies format) remains
> >> >> insecure.  The guest's write succeeds in non-probed usage, and the guest
> >> >> escapes isolation in the next probed usage.
> >> >> 
> >> >> CON: The virtual hardware is still defective, but it now comes with a
> >> >> "defective on/off" switch, factory default "defective on".  We could add
> >> >> a warning to guide users to switch defective off but then that warning
> >> >> would annoy people who don't care to switch it off (sometimes with
> >> >> reason), and we can't have that.  So we leave users who would care if
> >> >> they knew in the dark.
> >> 
> >> Replace by
> >> 
> >> CON: The virtual hardware is still defective, but it now comes with a
> >> "defective on/off" switch, factory default "defective on".  We could add
> >> a warning to guide users to switch defective off.
> >> 
> >> >> The two variations can be combined.  This is Kevin's proposal.
> >> >> 
> >> >> CON: Doesn't fully plug the hole: union of both variations' flaws.
> >> >> 
> >> >> CON: The virtual hardware is still defective: interesection of both
> >> >> variations' defects.
> >
> > PRO: Union of both variations' advantages wrt false positives, which are
> > minimised as much as possible: Explicit format=... or usage of non-raw
> > formats doesn't trigger the check. This limits it to cases that are
> > already broken today (even though the failure mode changes - can
> > possibly even be called a bonus bug fix).
> 
> This is worth spelling out, thanks.
> 
> > PRO: Still plugs the hole in the most common case (user relies
> > consistently on probing)
> 
> This is my first CON with a different baseline.  Again, I'm not
> disagreeing, just explaining the thinking behind my writing.
> 
> > PRO: The fix is still small, local to raw block driver, and obviously
> > complete (in the sense that it catches every usage of the raw format).
> >
> >> > I like how you took care to avoid finding any PROs. :-)
> >> 
> >> Come on, this line 281 of 327, cut me some slack :)
> >> 
> >> > I'll leave commenting on this section to others for now. I feel I have
> >> > already said enough about it in the other threads, and defending it here
> >> > at this point wouldn't help the discussion.
> >> 
> >> The purpose of this document is to summarize our thoughts.  Need yours
> >> to achieve it.  Can you give me your concise PROs?
> >
> > Added them above.
> 
> Thanks!
> 
> Are you ready to write up your conclusion, similar to how I did?
> 
> Before you do, let me refine / vary the hybrid approach I mentioned
> under " Don't guess format from untrusted image contents" some.  I think
> I can trace some inspiration to Max here.
> 
> Say we use trusted meta-data to compute a set of admissible formats, and
> if the set has multiple members, use probing to pick one of them.
> 
> Example: foo.qcow2 -> { qcow2 }, no probing
> 
> Example: foo.qcow -> { qcow, qcow2 }, probe to pick one
> 
> Likewise for foo.vhdx and foo.vhd.
> 
> To ensure this actually knocks out condition (b), all members of the set
> must have the image contents used by their members' probes within their
> trust boundary.
> 
> Example: { qcow, qcow2 } is fine, because both formats have a header,
> and each header covers the bytes either probe examines.
> 
> Example: { raw } is fine, because there is no probing.
> 
> Counterexample: { raw, qcow2 } is not possible, because qcow2 probes
> outside raw's trusted metadata.
> 
> Note my careful wording "contents used by [other] probes".  Right now we
> simply assume that the first 2048 bytes can be trusted.  This is not
> obviously the case!  If I remember correctly, you proposed to cut it to
> 512 bytes, which feels a lot safer, since any sane format probably
> aligns (untrusted) image contents to at least a 512 byte boundary, but
> is still theoretically unsound.
> 
> PRO and CON like my proposal to guess from trusted meta-data only.  The
> difference is in what existing usage exactly it breaks.
> 
> Can be combined with "refuse to use a format without an explicit format=
> when any other non-raw format probe accepts", just like everything else
> proposed so far.
> 

I think I like this approach, when combined with the above paragraph.
If I understand it correctly, it means it should also help protect a
bit against incorrectly detecting raw.

Looking at extension '.vhd':

foo.vhd -> {vpc, vhdx}

In practice, when probing by contents, we could get these results:

image         | probe result
------------------------------
vhdx               vhdx
dynamic vhd        vpc
fixed vhd          raw

If the results are either vhdx or vpc, then that is unambiguous and
OK.  If the result is raw, we would refuse to open it, because it is
not in the accepted set of formats for type '.vhd'.



If we look at a generic extension, '.img':

foo.img -> { raw, qcow2, qcow2, vhdx, vpc, qed, vmdk, parallels, unsupported }

If foo.img probe returned qcow2, we would fail, because { raw, qcow2 }
would both be valid.

But if probe is negative, and returns raw, it is still unknown,
because could mean {raw, vpc, unsupported}.

So that would mean .img would always require format=, right?

That also implies to me that the only extensions for raw that might
not require format= would be .iso and .raw.

> >                   Of course, you could also add the CONs of the other
> > proposal as PROs here, like "works with any filename and even protocols
> > that don't have anything filename-like", but I think they are already
> > covered well enough in the other section.
> 
> Yes, one alternative's PRO is often another alternative's CON.  Covering
> each issue in all places explicitly could perhaps be clearer.  Instead,
> I chose to pick a baseline, and PRO/CON off that, for (relative)
> brevity.
> 
> >> > And yes, it also doesn't help when you accidentally type format=qcow2
> >> > instead of format=raw. When you have images of both types, things like
> >> > this happen with manual typing.
> >> 
> >> I consider mixing probed and non-probed usage a more plausible bad habit
> >> than accidental use of format=qcow2, because the former works just fine,
> >> but the latter fails (unless your guest has "helpfully" written a QCOW2
> >> header).
> >
> > I'll admit that mixing probed and non-probed usage might be more common
> > (though I think that I for one am pretty consistent in not using
> > format=... with my trusted images - why would I?), but I consider both
> > cases plausible. I've mistyped enough -f options for qemu-img. And that
> > it fails doesn't prevent the typo.
> 
> My point wasn't that accidental misuse doesn't matter, only that
> mistakes that have no immediately visible consequences can easily become
> bad habits.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-07 15:17             ` Max Reitz
@ 2014-11-10  7:58               ` Markus Armbruster
  0 siblings, 0 replies; 38+ messages in thread
From: Markus Armbruster @ 2014-11-10  7:58 UTC (permalink / raw)
  To: Max Reitz; +Cc: Kevin Wolf, Jeff Cody, qemu-devel, Stefan Hajnoczi

Max Reitz <mreitz@redhat.com> writes:

> On 2014-11-07 at 15:52, Markus Armbruster wrote:
>> Max Reitz <mreitz@redhat.com> writes:
>>
>>> On 2014-11-06 at 15:56, Jeff Cody wrote:
>>>> On Thu, Nov 06, 2014 at 01:53:35PM +0100, Max Reitz wrote:
>>>>> On 2014-11-06 at 13:26, Markus Armbruster wrote:
>>>>>> Max Reitz <mreitz@redhat.com> writes:
>>>>>>
>>>>>>> On 2014-11-04 at 19:45, Markus Armbruster wrote:
>> [...]
>>>>>>>> = How this lets the guest escape isolation =
>>>>>>>>
>>>>>>>> Unfortunately, this lets the guest shift the trust boundary and escape
>>>>>>>> isolation, as follows:
>>>>>>>>
>>>>>>>> * Expose a raw image to the guest (whether you specify the format=raw or
>>>>>>>>      let QEMU guess it doesn't matter).  The complete contents becomes
>>>>>>>>      untrusted.
>>>>>>>>
>>>>>>>> * Reuse the image *without* specifying the raw format.  QEMU guesses the
>>>>>>>>      format based on untrusted image contents.  Now QEMU
>>>>>>>> guesses a format
>>>>>>>>      chosen by the guest, with meta-data chosen by the guest.  By
>>>>>>>>      controlling image meta-data, the malicious guest can
>>>>>>>> access arbitrary
>>>>>>>>      files as QEMU, enlarge its storage, and more.  A
>>>>>>>> non-malicious guest
>>>>>>>>      can accidentally DoS itself, by writing a pattern probing
>>>>>>>> recognizes.
>>>>>>> Thank you for bringing that to my attention. This means that I'm even
>>>>>>> more in favor of using Kevin's patches because in fact they don't
>>>>>>> break anything.
>>>>>> They break things differently.  The difference may or may not matter.
>>>>>>
>>>>>> Example: innocent guest writes a recognized pattern.
>>>>>>
>>>>>>     Now: next restart fails, guest DoSed itself until host operator gets
>>>>>>     around to adding format=raw to the configuration.  Consequence:
>>>>>>     downtime (probably lengthy), but no data corruption.
>>>>>>
>>>>>>     With Kevin's patch: write fails, guest may or may not handle the
>>>>>>     failure gracefully.  Consequences can range from "guest complains to
>>>>>>     its logs (who cares)" via "guest stops whatever it's doing and refuses
>>>>>>     to continue until its hardware gets fixed (downtime as above)" to
>>>>>>     "data corruption".
>>>>> You somehow seem convinced that writing to sector 0 is a completely
>>>>> normal operation. For x86, it isn't, though.
>>>>>
>>>>> There are only a couple of programs which do that, I can only think
>>>>> of partitioning and setting up boot loaders. There's not a myriad of
>>>>> programs which would increase the probability of one both writing a
>>>>> recognizable pattern *and* not handling EPERM correctly.
>>>>>
>>>>> I see the probability of both happening at the same time as
>>>>> extremely low, not least because there are only a handful of
>>>>> programs which access that sector.
>>>>>
>>>> I'm not yet opposed to the "restricted-raw" method, but...
>>>>
>>>> I think the above is a somewhat dangerous viewpoint to take with QEMU.
>>>> It is a bit of a slippery slope to start to assume what data guests
>>>> will write to the disks provided to them.  Even if the probability of
>>>> this happening is very low, with what usage we envision now, it is
>>>> still entirely legitimate usage for a guest to write data starting at
>>>> sector 0.
>> Yup.
>>
>>> Then let's officially deprecate format probing, if we haven't done so
>>> already. That way, there's no excuse.
>> I'd gladly deprecate format probing, or at least format probing
>> resulting in raw.  However, we can hardly deprecate something and keep
>> it the default behavior!
>
> Why not?
>
> "It's the default due to legacy, but it's your fault if something breaks."

Sorry, I'm not cynical enough to see it that way.

Dangerous feature: if we really think users shouldn't use X unless they
really know what they're doing, then we should require users to
explicitly choose X.

Here, the danger is "it's your fault if something breaks".

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-07 17:33         ` Jeff Cody
@ 2014-11-10  8:12           ` Markus Armbruster
  2014-11-10  9:14             ` Kevin Wolf
  0 siblings, 1 reply; 38+ messages in thread
From: Markus Armbruster @ 2014-11-10  8:12 UTC (permalink / raw)
  To: Jeff Cody; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

Jeff Cody <jcody@redhat.com> writes:

> On Fri, Nov 07, 2014 at 04:21:38PM +0100, Markus Armbruster wrote:
[...]
>>                let me refine / vary the hybrid approach I mentioned
>> under " Don't guess format from untrusted image contents" some.  I think
>> I can trace some inspiration to Max here.
>> 
>> Say we use trusted meta-data to compute a set of admissible formats, and
>> if the set has multiple members, use probing to pick one of them.
>> 
>> Example: foo.qcow2 -> { qcow2 }, no probing
>> 
>> Example: foo.qcow -> { qcow, qcow2 }, probe to pick one
>> 
>> Likewise for foo.vhdx and foo.vhd.
>> 
>> To ensure this actually knocks out condition (b), all members of the set
>> must have the image contents used by their members' probes within their
>> trust boundary.
>> 
>> Example: { qcow, qcow2 } is fine, because both formats have a header,
>> and each header covers the bytes either probe examines.
>> 
>> Example: { raw } is fine, because there is no probing.
>> 
>> Counterexample: { raw, qcow2 } is not possible, because qcow2 probes
>> outside raw's trusted metadata.
>> 
>> Note my careful wording "contents used by [other] probes".  Right now we
>> simply assume that the first 2048 bytes can be trusted.  This is not
>> obviously the case!  If I remember correctly, you proposed to cut it to
>> 512 bytes, which feels a lot safer, since any sane format probably
>> aligns (untrusted) image contents to at least a 512 byte boundary, but
>> is still theoretically unsound.
>> 
>> PRO and CON like my proposal to guess from trusted meta-data only.  The
>> difference is in what existing usage exactly it breaks.
>> 
>> Can be combined with "refuse to use a format without an explicit format=
>> when any other non-raw format probe accepts", just like everything else
>> proposed so far.
>> 
>
> I think I like this approach, when combined with the above paragraph.
> If I understand it correctly, it means it should also help protect a
> bit against incorrectly detecting raw.

The protection is actually strong: it "detects" raw only when the
trusted meta-data makes raw the only option.  Required to satisfy
condition (b), because if raw is an option, then the *entire* raw image
contents is untrusted, and probing it violates "(b) Don't guess format
from untrusted image contents".

> Looking at extension '.vhd':
>
> foo.vhd -> {vpc, vhdx}
>
> In practice, when probing by contents, we could get these results:
>
> image         | probe result
> ------------------------------
> vhdx               vhdx
> dynamic vhd        vpc
> fixed vhd          raw
>
> If the results are either vhdx or vpc, then that is unambiguous and
> OK.  If the result is raw, we would refuse to open it, because it is
> not in the accepted set of formats for type '.vhd'.

Correct.

> If we look at a generic extension, '.img':
>
> foo.img -> { raw, qcow2, qcow2, vhdx, vpc, qed, vmdk, parallels, unsupported }
>
> If foo.img probe returned qcow2, we would fail, because { raw, qcow2 }
> would both be valid.
>
> But if probe is negative, and returns raw, it is still unknown,
> because could mean {raw, vpc, unsupported}.
>
> So that would mean .img would always require format=, right?
>
> That also implies to me that the only extensions for raw that might
> not require format= would be .iso and .raw.

.img means what we choose it to mean.

If we choose "can mean anything, including raw", then .img always
requires an explicit format with this approach.

If we choose "means raw", then same as above, except you can omit
format=raw, and you become prone to opening existing non-raw formats
raw, which can be bad.

Tradeoff.

[...]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-07 15:21       ` Markus Armbruster
  2014-11-07 17:33         ` Jeff Cody
@ 2014-11-10  8:13         ` Markus Armbruster
  1 sibling, 0 replies; 38+ messages in thread
From: Markus Armbruster @ 2014-11-10  8:13 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Jeff Cody, qemu-devel, Stefan Hajnoczi, Max Reitz

Markus Armbruster <armbru@redhat.com> writes:

[...]
>                let me refine / vary the hybrid approach I mentioned
> under " Don't guess format from untrusted image contents" some.  I think
> I can trace some inspiration to Max here.
>
> Say we use trusted meta-data to compute a set of admissible formats, and
> if the set has multiple members, use probing to pick one of them.
>
> Example: foo.qcow2 -> { qcow2 }, no probing
>
> Example: foo.qcow -> { qcow, qcow2 }, probe to pick one
>
> Likewise for foo.vhdx and foo.vhd.
>
> To ensure this actually knocks out condition (b), all members of the set
> must have the image contents used by their members' probes within their
> trust boundary.
>
> Example: { qcow, qcow2 } is fine, because both formats have a header,
> and each header covers the bytes either probe examines.
>
> Example: { raw } is fine, because there is no probing.
>
> Counterexample: { raw, qcow2 } is not possible, because qcow2 probes
> outside raw's trusted metadata.
>
> Note my careful wording "contents used by [other] probes".  Right now we
> simply assume that the first 2048 bytes can be trusted.  This is not
> obviously the case!  If I remember correctly, you proposed to cut it to
> 512 bytes, which feels a lot safer, since any sane format probably
> aligns (untrusted) image contents to at least a 512 byte boundary, but
> is still theoretically unsound.
>
> PRO and CON like my proposal to guess from trusted meta-data only.  The
> difference is in what existing usage exactly it breaks.
>
> Can be combined with "refuse to use a format without an explicit format=
> when any other non-raw format probe accepts", just like everything else
> proposed so far.

Can also be mitigated: lengthy transition period where we warn "this
usage is insecure, and we'll eventually break it; here's a hint on
secure usage".

[...]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-10  8:12           ` Markus Armbruster
@ 2014-11-10  9:14             ` Kevin Wolf
  2014-11-10 10:30               ` Markus Armbruster
  0 siblings, 1 reply; 38+ messages in thread
From: Kevin Wolf @ 2014-11-10  9:14 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Jeff Cody, qemu-devel, Stefan Hajnoczi, Max Reitz

Am 10.11.2014 um 09:12 hat Markus Armbruster geschrieben:
> Jeff Cody <jcody@redhat.com> writes:
> > So that would mean .img would always require format=, right?
> >
> > That also implies to me that the only extensions for raw that might
> > not require format= would be .iso and .raw.
> 
> .img means what we choose it to mean.
> 
> If we choose "can mean anything, including raw", then .img always
> requires an explicit format with this approach.
> 
> If we choose "means raw", then same as above, except you can omit
> format=raw, and you become prone to opening existing non-raw formats
> raw, which can be bad.

My current thoughts about .img are that we need to consider that

(a) it is occasionally used for multiple image formats and making it
    raw unconditionally is going to cause corruption.

(b) looking at file extensions is absolutely useless if we exlucde
    .img from the automatic detection because it's still the main
    extension for raw.

The common case could probably be covered by bringing probing back into
the game: If an .img file successfully probes for a non-raw format,
error out, avoiding the corruption. If it doesn't, assume raw.

Kevin

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-10  9:14             ` Kevin Wolf
@ 2014-11-10 10:30               ` Markus Armbruster
  2014-11-10 14:24                 ` Jeff Cody
  0 siblings, 1 reply; 38+ messages in thread
From: Markus Armbruster @ 2014-11-10 10:30 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Jeff Cody, qemu-devel, Stefan Hajnoczi, Max Reitz

Kevin Wolf <kwolf@redhat.com> writes:

> Am 10.11.2014 um 09:12 hat Markus Armbruster geschrieben:
>> Jeff Cody <jcody@redhat.com> writes:
>> > So that would mean .img would always require format=, right?
>> >
>> > That also implies to me that the only extensions for raw that might
>> > not require format= would be .iso and .raw.
>> 
>> .img means what we choose it to mean.
>> 
>> If we choose "can mean anything, including raw", then .img always
>> requires an explicit format with this approach.
>> 
>> If we choose "means raw", then same as above, except you can omit
>> format=raw, and you become prone to opening existing non-raw formats
>> raw, which can be bad.
>
> My current thoughts about .img are that we need to consider that
>
> (a) it is occasionally used for multiple image formats and making it
>     raw unconditionally is going to cause corruption.
>
> (b) looking at file extensions is absolutely useless if we exlucde
>     .img from the automatic detection because it's still the main
>     extension for raw.
>
> The common case could probably be covered by bringing probing back into
> the game: If an .img file successfully probes for a non-raw format,
> error out, avoiding the corruption. If it doesn't, assume raw.

I think this is the "refuse to use a format without an explicit format=
when any other non-raw format probe accepts" idea, which should combine
nicely with all the other ideas proposed so far.

Combining it with the hybrid approach we're discussing here gets us
something like this, assuming .img means raw:

1. Guess possible formats from trusted meta-data

   For foo.img, this yields { raw }.

2. Probe to pick one

   Since { raw } has just one member, pick it without probing.

3. Now probe all other non-raw formats, and error out if any accepts

   Protects users from opening a non-raw foo.img raw.

1+2 are the hybrid approach, 3 is the "refuse" idea.

Adding 3 is an improvement, from "this usage will now break at runtime,
possibly corrupting data" to "this usage will now be rejected cleanly".

3 should also help catch insufficiently selective probe methods.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-10 10:30               ` Markus Armbruster
@ 2014-11-10 14:24                 ` Jeff Cody
  2014-11-11  8:28                   ` Markus Armbruster
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff Cody @ 2014-11-10 14:24 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

On Mon, Nov 10, 2014 at 11:30:25AM +0100, Markus Armbruster wrote:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 10.11.2014 um 09:12 hat Markus Armbruster geschrieben:
> >> Jeff Cody <jcody@redhat.com> writes:
> >> > So that would mean .img would always require format=, right?
> >> >
> >> > That also implies to me that the only extensions for raw that might
> >> > not require format= would be .iso and .raw.
> >> 
> >> .img means what we choose it to mean.
> >> 
> >> If we choose "can mean anything, including raw", then .img always
> >> requires an explicit format with this approach.
> >> 
> >> If we choose "means raw", then same as above, except you can omit
> >> format=raw, and you become prone to opening existing non-raw formats
> >> raw, which can be bad.
> >
> > My current thoughts about .img are that we need to consider that
> >
> > (a) it is occasionally used for multiple image formats and making it
> >     raw unconditionally is going to cause corruption.
> >
> > (b) looking at file extensions is absolutely useless if we exlucde
> >     .img from the automatic detection because it's still the main
> >     extension for raw.
> >
> > The common case could probably be covered by bringing probing back into
> > the game: If an .img file successfully probes for a non-raw format,
> > error out, avoiding the corruption. If it doesn't, assume raw.
> 
> I think this is the "refuse to use a format without an explicit format=
> when any other non-raw format probe accepts" idea, which should combine
> nicely with all the other ideas proposed so far.
> 
> Combining it with the hybrid approach we're discussing here gets us
> something like this, assuming .img means raw:
> 
> 1. Guess possible formats from trusted meta-data
> 
>    For foo.img, this yields { raw }.
> 
> 2. Probe to pick one
> 
>    Since { raw } has just one member, pick it without probing.
> 
> 3. Now probe all other non-raw formats, and error out if any accepts
> 
>    Protects users from opening a non-raw foo.img raw.
>

With the exception of fixed-size VHD.

But this problem exists today, so there isn't anything 'new' broken
there.  And we can fix that if we want by upgrading probing, to pass
both the first 512 bytes, and the last 512 bytes of the image file.

> 1+2 are the hybrid approach, 3 is the "refuse" idea.
> 
> Adding 3 is an improvement, from "this usage will now break at runtime,
> possibly corrupting data" to "this usage will now be rejected cleanly".
> 
> 3 should also help catch insufficiently selective probe methods.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
  2014-11-10 14:24                 ` Jeff Cody
@ 2014-11-11  8:28                   ` Markus Armbruster
  0 siblings, 0 replies; 38+ messages in thread
From: Markus Armbruster @ 2014-11-11  8:28 UTC (permalink / raw)
  To: Jeff Cody; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

Jeff Cody <jcody@redhat.com> writes:

> On Mon, Nov 10, 2014 at 11:30:25AM +0100, Markus Armbruster wrote:
>> Kevin Wolf <kwolf@redhat.com> writes:
>> 
>> > Am 10.11.2014 um 09:12 hat Markus Armbruster geschrieben:
>> >> Jeff Cody <jcody@redhat.com> writes:
>> >> > So that would mean .img would always require format=, right?
>> >> >
>> >> > That also implies to me that the only extensions for raw that might
>> >> > not require format= would be .iso and .raw.
>> >> 
>> >> .img means what we choose it to mean.
>> >> 
>> >> If we choose "can mean anything, including raw", then .img always
>> >> requires an explicit format with this approach.
>> >> 
>> >> If we choose "means raw", then same as above, except you can omit
>> >> format=raw, and you become prone to opening existing non-raw formats
>> >> raw, which can be bad.
>> >
>> > My current thoughts about .img are that we need to consider that
>> >
>> > (a) it is occasionally used for multiple image formats and making it
>> >     raw unconditionally is going to cause corruption.
>> >
>> > (b) looking at file extensions is absolutely useless if we exlucde
>> >     .img from the automatic detection because it's still the main
>> >     extension for raw.
>> >
>> > The common case could probably be covered by bringing probing back into
>> > the game: If an .img file successfully probes for a non-raw format,
>> > error out, avoiding the corruption. If it doesn't, assume raw.
>> 
>> I think this is the "refuse to use a format without an explicit format=
>> when any other non-raw format probe accepts" idea, which should combine
>> nicely with all the other ideas proposed so far.
>> 
>> Combining it with the hybrid approach we're discussing here gets us
>> something like this, assuming .img means raw:
>> 
>> 1. Guess possible formats from trusted meta-data
>> 
>>    For foo.img, this yields { raw }.
>> 
>> 2. Probe to pick one
>> 
>>    Since { raw } has just one member, pick it without probing.
>> 
>> 3. Now probe all other non-raw formats, and error out if any accepts
>> 
>>    Protects users from opening a non-raw foo.img raw.
>>
>
> With the exception of fixed-size VHD.
>
> But this problem exists today, so there isn't anything 'new' broken
> there.  And we can fix that if we want by upgrading probing, to pass
> both the first 512 bytes, and the last 512 bytes of the image file.

Careful.  If condition "all members of the set must have the image
contents used by their members' probes within their trust boundary" is
satisfied, then probing can obviously be trusted to pick a member of the
set.  Else, we don't know without further reasoning.

Current code plus Kevin's patch passes the first 512 bytes to the
probing methods.  Actual methods use only parts of it, but what parts
exactly is currently opaque.  As long as our reaoning on probes doesn't
pierce that opacity, only formats that cannot store untrusted data
within the first 512 bytes satisfy the condition.  "raw" surely violates
it.  Whether there are any others I can't say without a review of all
probing methods.  Throwing in the last 512 bytes is prone to make many
more formats violate the condition.

To check the condition, we need to know each format's trust boundary and
what each probe examines.  If we declare both in code, we can check
automatically.  Or with useful approximations, check a somewhat tighter
condition.  That's why I keep harping on this condition.

Let me stress again: the condition is sufficient for trust, but not
necessary.  Here's an example of how probing can trustworthily examine
data outside the intersection of all formats' trusted data: consider a
two-stage probe, where the first stage reliably detects a family of
related formats, and the second stage disambiguates the family members.
Can be trusted even when the second stage examines data outside the
intersection.

The automatic checker sketched above can deal with this: simply declare
only the first stage's data.

>> 1+2 are the hybrid approach, 3 is the "refuse" idea.
>> 
>> Adding 3 is an improvement, from "this usage will now break at runtime,
>> possibly corrupting data" to "this usage will now be rejected cleanly".
>> 
>> 3 should also help catch insufficiently selective probe methods.

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2014-11-11  8:28 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-04 18:45 [Qemu-devel] Image probing: how it can be insecure, and what we could do about it Markus Armbruster
2014-11-04 20:33 ` Jeff Cody
2014-11-05  7:04   ` Markus Armbruster
2014-11-05  7:30 ` Markus Armbruster
2014-11-05  8:38 ` Max Reitz
2014-11-05 10:18   ` Eric Blake
2014-11-06 12:43     ` Markus Armbruster
2014-11-06 13:02       ` Eric Blake
2014-11-05 11:15   ` Kevin Wolf
2014-11-06 12:26   ` Markus Armbruster
2014-11-06 12:53     ` Max Reitz
2014-11-06 14:56       ` Jeff Cody
2014-11-06 15:00         ` Max Reitz
2014-11-07 14:52           ` Markus Armbruster
2014-11-07 15:17             ` Max Reitz
2014-11-10  7:58               ` Markus Armbruster
2014-11-07  9:57       ` Markus Armbruster
2014-11-06 13:02     ` Kevin Wolf
2014-11-07 14:50       ` Markus Armbruster
2014-11-05 10:12 ` Gerd Hoffmann
2014-11-05 10:33 ` Eric Blake
2014-11-06 12:52   ` Markus Armbruster
2014-11-05 11:01 ` Kevin Wolf
2014-11-06 13:57   ` Markus Armbruster
2014-11-06 14:14     ` Eric Blake
2014-11-06 15:52       ` Jeff Cody
2014-11-06 14:35     ` Jeff Cody
2014-11-06 15:01     ` Kevin Wolf
2014-11-07 15:21       ` Markus Armbruster
2014-11-07 17:33         ` Jeff Cody
2014-11-10  8:12           ` Markus Armbruster
2014-11-10  9:14             ` Kevin Wolf
2014-11-10 10:30               ` Markus Armbruster
2014-11-10 14:24                 ` Jeff Cody
2014-11-11  8:28                   ` Markus Armbruster
2014-11-10  8:13         ` Markus Armbruster
2014-11-05 15:24 ` Dr. David Alan Gilbert
2014-11-06 13:04   ` Markus Armbruster

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).