* [Qemu-devel] virtio device error reporting best practice? @ 2014-03-17 6:02 Dave Airlie 2014-03-17 14:28 ` Laszlo Ersek ` (2 more replies) 0 siblings, 3 replies; 21+ messages in thread From: Dave Airlie @ 2014-03-17 6:02 UTC (permalink / raw) To: qemu-devel@nongnu.org So I'm looking at how best to do virtio gpu device error reporting, and how to deal with illegal stuff, I've two levels of errors I want to support, a) unrecoverable or bad guest kernel programming errors, b) per 3D context errors from the renderer backend, (b) I can easily report in an event queue and the guest kernel can in theory blow away the offenders, this is how GL works with some extensions, For (a) I can expect a response from every command I put into the main GPU control queue, the response should always be no error, but in some cases it will be because the guest hit some host resource error, or asked for something insane, (guest kernel drivers would be broken in most of these cases). Alternately I can use the separate event queue to send async errors when the guest does something bad, I'm also considering adding some sort of flag in config space saying the device needs a reset before it will continue doing anything, The main reason I'm considering this stuff is for security reasons if the guest asks for something really illegal or crazy what should the expected behaviour of the host be? (at least secure I know that). Dave. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-17 6:02 [Qemu-devel] virtio device error reporting best practice? Dave Airlie @ 2014-03-17 14:28 ` Laszlo Ersek 2014-03-17 14:40 ` Peter Maydell 2014-03-26 12:49 ` Stefan Hajnoczi 2014-03-17 14:50 ` Gerd Hoffmann 2014-03-19 0:34 ` Rusty Russell 2 siblings, 2 replies; 21+ messages in thread From: Laszlo Ersek @ 2014-03-17 14:28 UTC (permalink / raw) To: Dave Airlie, qemu-devel@nongnu.org; +Cc: Seiji Aguchi On 03/17/14 07:02, Dave Airlie wrote: > So I'm looking at how best to do virtio gpu device error reporting, > and how to deal with illegal stuff, > > I've two levels of errors I want to support, > > a) unrecoverable or bad guest kernel programming errors, > > b) per 3D context errors from the renderer backend, > > (b) I can easily report in an event queue and the guest kernel can in > theory blow away the offenders, this is how GL works with some > extensions, > > For (a) I can expect a response from every command I put into the main > GPU control queue, the response should always be no error, but in some > cases it will be because the guest hit some host resource error, or > asked for something insane, (guest kernel drivers would be broken in > most of these cases). > > Alternately I can use the separate event queue to send async errors > when the guest does something bad, > > I'm also considering adding some sort of flag in config space saying > the device needs a reset before it will continue doing anything, > > The main reason I'm considering this stuff is for security reasons if > the guest asks for something really illegal or crazy what should the > expected behaviour of the host be? (at least secure I know that). exit(1). If you grep qemu for it, you'll find such examples. Notably, "hw/virtio/virtio.c" is chock full of them; if the guest doesn't speak the basic protocol, there's nothing for the host to do. See also virtio-blk.c (missing or incorrect headers), virtio-net.c (similar protocol violations), virtio-scsi.c (wrong header size, bad config etc). For later, we have a use case on the horizon where all such exits -- not just virtio, but exit(1) or abort() on invalid guest behavior in general -- should be optionally coupled (dependent on the qemu command line) with an automatic dump-guest-memory, in order to help debugging the guest. Thanks Laszlo ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-17 14:28 ` Laszlo Ersek @ 2014-03-17 14:40 ` Peter Maydell 2014-03-17 14:49 ` Laszlo Ersek 2014-03-17 14:57 ` Richard W.M. Jones 2014-03-26 12:49 ` Stefan Hajnoczi 1 sibling, 2 replies; 21+ messages in thread From: Peter Maydell @ 2014-03-17 14:40 UTC (permalink / raw) To: Laszlo Ersek; +Cc: Dave Airlie, qemu-devel@nongnu.org, Seiji Aguchi On 17 March 2014 14:28, Laszlo Ersek <lersek@redhat.com> wrote: > On 03/17/14 07:02, Dave Airlie wrote: >> The main reason I'm considering this stuff is for security reasons if >> the guest asks for something really illegal or crazy what should the >> expected behaviour of the host be? (at least secure I know that). > > exit(1). No thanks -- the guest should never be able to cause QEMU to exit (in an ideal world). Use qemu_log_mask(LOG_GUEST_ERROR, ...) and continue. > If you grep qemu for it, you'll find such examples. Notably, > "hw/virtio/virtio.c" is chock full of them; if the guest doesn't speak > the basic protocol, there's nothing for the host to do. See also > virtio-blk.c (missing or incorrect headers), virtio-net.c (similar > protocol violations), virtio-scsi.c (wrong header size, bad config etc). I think these are all examples of legacy code written before we had a sensible logging API for this kind of thing. thanks -- PMM ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-17 14:40 ` Peter Maydell @ 2014-03-17 14:49 ` Laszlo Ersek 2014-03-17 14:54 ` Peter Maydell ` (3 more replies) 2014-03-17 14:57 ` Richard W.M. Jones 1 sibling, 4 replies; 21+ messages in thread From: Laszlo Ersek @ 2014-03-17 14:49 UTC (permalink / raw) To: Peter Maydell; +Cc: Dave Airlie, qemu-devel@nongnu.org, Seiji Aguchi On 03/17/14 15:40, Peter Maydell wrote: > On 17 March 2014 14:28, Laszlo Ersek <lersek@redhat.com> wrote: >> On 03/17/14 07:02, Dave Airlie wrote: >>> The main reason I'm considering this stuff is for security reasons if >>> the guest asks for something really illegal or crazy what should the >>> expected behaviour of the host be? (at least secure I know that). >> >> exit(1). > > No thanks -- the guest should never be able to cause QEMU > to exit (in an ideal world). Use > qemu_log_mask(LOG_GUEST_ERROR, ...) > and continue. How do you continue with a garbled virtio ring? Say you detect an error that would cause integer overflow or buffer overflow in the host, a clear virtio protocol violation etc. Error reporting is just one thing -- what are the semantics of continuing? Exit(1) is considered guest crash. "The guest should never be able to cause QEMU to exit" is identical to "a bare metal kernel should never be able to crash a physical machine, even if it wants to". Some behavior is left undefined even in hardware manuals. Crashing as early as possible is safe and helpful. Just my opinion, naturally. Thanks, Laszlo ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-17 14:49 ` Laszlo Ersek @ 2014-03-17 14:54 ` Peter Maydell 2014-03-17 14:57 ` Gerd Hoffmann ` (2 subsequent siblings) 3 siblings, 0 replies; 21+ messages in thread From: Peter Maydell @ 2014-03-17 14:54 UTC (permalink / raw) To: Laszlo Ersek; +Cc: Dave Airlie, qemu-devel@nongnu.org, Seiji Aguchi On 17 March 2014 14:49, Laszlo Ersek <lersek@redhat.com> wrote: > On 03/17/14 15:40, Peter Maydell wrote: >> On 17 March 2014 14:28, Laszlo Ersek <lersek@redhat.com> wrote: >>> On 03/17/14 07:02, Dave Airlie wrote: >>>> The main reason I'm considering this stuff is for security reasons if >>>> the guest asks for something really illegal or crazy what should the >>>> expected behaviour of the host be? (at least secure I know that). >>> >>> exit(1). >> >> No thanks -- the guest should never be able to cause QEMU >> to exit (in an ideal world). Use >> qemu_log_mask(LOG_GUEST_ERROR, ...) >> and continue. > > How do you continue with a garbled virtio ring? Say you detect an error > that would cause integer overflow or buffer overflow in the host, a > clear virtio protocol violation etc. Error reporting is just one thing > -- what are the semantics of continuing? Whatever the device likes. If you're emulating real hardware then the ideal is to match that. For virtio, whatever seems easiest. Something that would allow the guest to work after it does a warm reboot would be nice. > Exit(1) is considered guest crash. "The guest should never be able to > cause QEMU to exit" is identical to "a bare metal kernel should never be > able to crash a physical machine, even if it wants to". Some behavior is > left undefined even in hardware manuals. Crashing as early as possible > is safe and helpful. This would mean we couldn't model systems with a hardware watchdog timer that looks out for guest crashes and does a reboot, for instance. exit() is approximately equivalent to "machine caught fire" since it means that there's no resurrecting the VM via watchdog device, forcing reset through a management interface or anything else. thanks -- PMM ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-17 14:49 ` Laszlo Ersek 2014-03-17 14:54 ` Peter Maydell @ 2014-03-17 14:57 ` Gerd Hoffmann 2014-03-17 19:05 ` Andreas Färber 2014-03-18 12:45 ` Kevin Wolf 3 siblings, 0 replies; 21+ messages in thread From: Gerd Hoffmann @ 2014-03-17 14:57 UTC (permalink / raw) To: Laszlo Ersek, Michael S. Tsirkin Cc: Peter Maydell, Dave Airlie, qemu-devel@nongnu.org, Seiji Aguchi On Mo, 2014-03-17 at 15:49 +0100, Laszlo Ersek wrote: > On 03/17/14 15:40, Peter Maydell wrote: > > On 17 March 2014 14:28, Laszlo Ersek <lersek@redhat.com> wrote: > >> On 03/17/14 07:02, Dave Airlie wrote: > >>> The main reason I'm considering this stuff is for security reasons if > >>> the guest asks for something really illegal or crazy what should the > >>> expected behaviour of the host be? (at least secure I know that). > >> > >> exit(1). > > > > No thanks -- the guest should never be able to cause QEMU > > to exit (in an ideal world). Use > > qemu_log_mask(LOG_GUEST_ERROR, ...) > > and continue. > > How do you continue with a garbled virtio ring? Say you detect an error > that would cause integer overflow or buffer overflow in the host, a > clear virtio protocol violation etc. Error reporting is just one thing > -- what are the semantics of continuing? Stop processing until device is reset. This is what real hardware does, and there are places in qemu (xhci emulation for example) which does this too. We don't have a standard error register in virtio to report this to the guest though. Maybe something to consider for virtio 1.0? mst? cheers, Gerd ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-17 14:49 ` Laszlo Ersek 2014-03-17 14:54 ` Peter Maydell 2014-03-17 14:57 ` Gerd Hoffmann @ 2014-03-17 19:05 ` Andreas Färber 2014-03-18 12:45 ` Kevin Wolf 3 siblings, 0 replies; 21+ messages in thread From: Andreas Färber @ 2014-03-17 19:05 UTC (permalink / raw) To: Laszlo Ersek Cc: Peter Maydell, Dave Airlie, qemu-devel@nongnu.org, Seiji Aguchi Am 17.03.2014 15:49, schrieb Laszlo Ersek: > On 03/17/14 15:40, Peter Maydell wrote: >> On 17 March 2014 14:28, Laszlo Ersek <lersek@redhat.com> wrote: >>> On 03/17/14 07:02, Dave Airlie wrote: >>>> The main reason I'm considering this stuff is for security reasons if >>>> the guest asks for something really illegal or crazy what should the >>>> expected behaviour of the host be? (at least secure I know that). >>> >>> exit(1). >> >> No thanks -- the guest should never be able to cause QEMU >> to exit (in an ideal world). Use >> qemu_log_mask(LOG_GUEST_ERROR, ...) >> and continue. > > How do you continue with a garbled virtio ring? Say you detect an error > that would cause integer overflow or buffer overflow in the host, a > clear virtio protocol violation etc. Error reporting is just one thing > -- what are the semantics of continuing? > > Exit(1) is considered guest crash. I disagree. QEMU has a panic state/event for guest crashes, used by pvpanic and s390x, which does not exit() the process. If it's a state that the guest should not be able to trigger, then assertions are better than exit() since they allow to use gdb for investigating the origin. exit(1) should correspond to an unrecoverable issue on the host, such as ioctl failure, not a fault by the guest. And yes, there's quite a few of them around just like there's still tons of fprintf()s around. Error handling and reporting is a constant pain point... Regards, Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-17 14:49 ` Laszlo Ersek ` (2 preceding siblings ...) 2014-03-17 19:05 ` Andreas Färber @ 2014-03-18 12:45 ` Kevin Wolf 3 siblings, 0 replies; 21+ messages in thread From: Kevin Wolf @ 2014-03-18 12:45 UTC (permalink / raw) To: Laszlo Ersek Cc: Peter Maydell, Dave Airlie, qemu-devel@nongnu.org, Seiji Aguchi Am 17.03.2014 um 15:49 hat Laszlo Ersek geschrieben: > On 03/17/14 15:40, Peter Maydell wrote: > > On 17 March 2014 14:28, Laszlo Ersek <lersek@redhat.com> wrote: > >> On 03/17/14 07:02, Dave Airlie wrote: > >>> The main reason I'm considering this stuff is for security reasons if > >>> the guest asks for something really illegal or crazy what should the > >>> expected behaviour of the host be? (at least secure I know that). > >> > >> exit(1). > > > > No thanks -- the guest should never be able to cause QEMU > > to exit (in an ideal world). Use > > qemu_log_mask(LOG_GUEST_ERROR, ...) > > and continue. > > How do you continue with a garbled virtio ring? Say you detect an error > that would cause integer overflow or buffer overflow in the host, a > clear virtio protocol violation etc. Error reporting is just one thing > -- what are the semantics of continuing? Stop processing the ring until the guest resets it. You don't want a buggy guest driver to crash the whole system needlessly, it's good enough if the one device doesn't work any more. Also, consider that a qemu process can be doing entirely different things, like executing block jobs or hosting the builtin NBD server (and being a QMP server, too), and you don't want to allow the guest to interrupt this. As far as I can tell, exit() also doesn't do bdrv_close_all(), so it's an unclean shutdown that may leave images in an inconsistent state. And this is only the block layer perspective on it. Not sure about how well other subsystems cope with unclean shutdowns. Not to forget that a user will be surprised if his qemu process is suddenly gone without him intentionally shutting down the VM. An error message indicating the failing device inside the guest is much nicer. tl;dr: exit() calls in hw/ do exist, but these are longstanding bugs, not examples to follow. Kevin ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-17 14:40 ` Peter Maydell 2014-03-17 14:49 ` Laszlo Ersek @ 2014-03-17 14:57 ` Richard W.M. Jones 2014-03-17 14:59 ` Richard W.M. Jones 1 sibling, 1 reply; 21+ messages in thread From: Richard W.M. Jones @ 2014-03-17 14:57 UTC (permalink / raw) To: Peter Maydell Cc: Dave Airlie, Laszlo Ersek, qemu-devel@nongnu.org, Seiji Aguchi On Mon, Mar 17, 2014 at 02:40:09PM +0000, Peter Maydell wrote: > On 17 March 2014 14:28, Laszlo Ersek <lersek@redhat.com> wrote: > > On 03/17/14 07:02, Dave Airlie wrote: > >> The main reason I'm considering this stuff is for security reasons if > >> the guest asks for something really illegal or crazy what should the > >> expected behaviour of the host be? (at least secure I know that). > > > > exit(1). > > No thanks -- the guest should never be able to cause QEMU > to exit (in an ideal world). Use > qemu_log_mask(LOG_GUEST_ERROR, ...) > and continue. Don't look too closely at the spice backend ... https://bugzilla.redhat.com/buglist.cgi?component=spice&list_id=2320267 Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-17 14:57 ` Richard W.M. Jones @ 2014-03-17 14:59 ` Richard W.M. Jones 0 siblings, 0 replies; 21+ messages in thread From: Richard W.M. Jones @ 2014-03-17 14:59 UTC (permalink / raw) To: Peter Maydell Cc: Laszlo Ersek, Dave Airlie, qemu-devel@nongnu.org, Seiji Aguchi On Mon, Mar 17, 2014 at 02:57:41PM +0000, Richard W.M. Jones wrote: > > On Mon, Mar 17, 2014 at 02:40:09PM +0000, Peter Maydell wrote: > > On 17 March 2014 14:28, Laszlo Ersek <lersek@redhat.com> wrote: > > > On 03/17/14 07:02, Dave Airlie wrote: > > >> The main reason I'm considering this stuff is for security reasons if > > >> the guest asks for something really illegal or crazy what should the > > >> expected behaviour of the host be? (at least secure I know that). > > > > > > exit(1). > > > > No thanks -- the guest should never be able to cause QEMU > > to exit (in an ideal world). Use > > qemu_log_mask(LOG_GUEST_ERROR, ...) > > and continue. > > Don't look too closely at the spice backend ... > > https://bugzilla.redhat.com/buglist.cgi?component=spice&list_id=2320267 A better link might be: https://bugzilla.redhat.com/show_bug.cgi?id=997932#c13 Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-17 14:28 ` Laszlo Ersek 2014-03-17 14:40 ` Peter Maydell @ 2014-03-26 12:49 ` Stefan Hajnoczi 1 sibling, 0 replies; 21+ messages in thread From: Stefan Hajnoczi @ 2014-03-26 12:49 UTC (permalink / raw) To: Laszlo Ersek; +Cc: Dave Airlie, qemu-devel@nongnu.org, Seiji Aguchi On Mon, Mar 17, 2014 at 03:28:08PM +0100, Laszlo Ersek wrote: > On 03/17/14 07:02, Dave Airlie wrote: > > So I'm looking at how best to do virtio gpu device error reporting, > > and how to deal with illegal stuff, > > > > I've two levels of errors I want to support, > > > > a) unrecoverable or bad guest kernel programming errors, > > > > b) per 3D context errors from the renderer backend, > > > > (b) I can easily report in an event queue and the guest kernel can in > > theory blow away the offenders, this is how GL works with some > > extensions, > > > > For (a) I can expect a response from every command I put into the main > > GPU control queue, the response should always be no error, but in some > > cases it will be because the guest hit some host resource error, or > > asked for something insane, (guest kernel drivers would be broken in > > most of these cases). > > > > Alternately I can use the separate event queue to send async errors > > when the guest does something bad, > > > > I'm also considering adding some sort of flag in config space saying > > the device needs a reset before it will continue doing anything, > > > > The main reason I'm considering this stuff is for security reasons if > > the guest asks for something really illegal or crazy what should the > > expected behaviour of the host be? (at least secure I know that). > > exit(1). > > If you grep qemu for it, you'll find such examples. Notably, > "hw/virtio/virtio.c" is chock full of them; if the guest doesn't speak > the basic protocol, there's nothing for the host to do. See also > virtio-blk.c (missing or incorrect headers), virtio-net.c (similar > protocol violations), virtio-scsi.c (wrong header size, bad config etc). > > For later, we have a use case on the horizon where all such exits -- not > just virtio, but exit(1) or abort() on invalid guest behavior in general > -- should be optionally coupled (dependent on the qemu command line) > with an automatic dump-guest-memory, in order to help debugging the guest. Please don't use exit(1). Instead you can put the device into a "broken" state and wait for the guest to reset it. exit(1) is nasty because a failure in one driver shouldn't bring down the entire VM if we can prevent it. Also, it's a denial-of-service if we ever allow virtio passthrough to nested guests - a nested guest could kill its parent and hence all other nested guests. Stefan ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-17 6:02 [Qemu-devel] virtio device error reporting best practice? Dave Airlie 2014-03-17 14:28 ` Laszlo Ersek @ 2014-03-17 14:50 ` Gerd Hoffmann 2014-03-19 0:34 ` Rusty Russell 2 siblings, 0 replies; 21+ messages in thread From: Gerd Hoffmann @ 2014-03-17 14:50 UTC (permalink / raw) To: Dave Airlie; +Cc: qemu-devel@nongnu.org On Mo, 2014-03-17 at 16:02 +1000, Dave Airlie wrote: > So I'm looking at how best to do virtio gpu device error reporting, > and how to deal with illegal stuff, > > I've two levels of errors I want to support, > > a) unrecoverable or bad guest kernel programming errors, > > b) per 3D context errors from the renderer backend, What you find in modern real hardware (xhci for example) and which would also make sense for virtio is: * stick an error message into a event queue (for non-fatal errors), which would be a good fit for (b) * set a bit in a error status register, maybe raise IRQ, stop processing until reset (for fatal errors, i.e. your (a) case). such a register can live in config space for virtio. cheers, Gerd ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-17 6:02 [Qemu-devel] virtio device error reporting best practice? Dave Airlie 2014-03-17 14:28 ` Laszlo Ersek 2014-03-17 14:50 ` Gerd Hoffmann @ 2014-03-19 0:34 ` Rusty Russell 2014-03-19 8:12 ` Markus Armbruster 2014-03-20 6:51 ` Michael S. Tsirkin 2 siblings, 2 replies; 21+ messages in thread From: Rusty Russell @ 2014-03-19 0:34 UTC (permalink / raw) To: Dave Airlie, qemu-devel@nongnu.org Dave Airlie <airlied@gmail.com> writes: > So I'm looking at how best to do virtio gpu device error reporting, > and how to deal with illegal stuff, > > I've two levels of errors I want to support, > > a) unrecoverable or bad guest kernel programming errors, The QEMU standard approach is to exit at this point. No, really. > b) per 3D context errors from the renderer backend, > > (b) I can easily report in an event queue and the guest kernel can in > theory blow away the offenders, this is how GL works with some > extensions, That's probably sanest. > For (a) I can expect a response from every command I put into the main > GPU control queue, the response should always be no error, but in some > cases it will be because the guest hit some host resource error, or > asked for something insane, (guest kernel drivers would be broken in > most of these cases). > > Alternately I can use the separate event queue to send async errors > when the guest does something bad, > > I'm also considering adding some sort of flag in config space saying > the device needs a reset before it will continue doing anything, I generally dislike error codes which Never Happen; it's like making every void function return int just in case: the caller has no idea what to do if it fails. The litmus test: does *your* guest handle failures other than by giving up on the device? If so, sure, you need to have a sane error-reporting strategy. > The main reason I'm considering this stuff is for security reasons if > the guest asks for something really illegal or crazy what should the > expected behaviour of the host be? (at least secure I know that). If the guest userspace can do it, don't exit. If the kernel only, and it's should have known better, abort is OK. Sure that doesn't help much! Rusty. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-19 0:34 ` Rusty Russell @ 2014-03-19 8:12 ` Markus Armbruster 2014-03-20 3:40 ` Rusty Russell 2014-03-20 6:51 ` Michael S. Tsirkin 1 sibling, 1 reply; 21+ messages in thread From: Markus Armbruster @ 2014-03-19 8:12 UTC (permalink / raw) To: Rusty Russell; +Cc: Dave Airlie, qemu-devel@nongnu.org Rusty Russell <rusty@rustcorp.com.au> writes: > Dave Airlie <airlied@gmail.com> writes: >> So I'm looking at how best to do virtio gpu device error reporting, >> and how to deal with illegal stuff, >> >> I've two levels of errors I want to support, >> >> a) unrecoverable or bad guest kernel programming errors, > > The QEMU standard approach is to exit at this point. No, really. > >> b) per 3D context errors from the renderer backend, >> >> (b) I can easily report in an event queue and the guest kernel can in >> theory blow away the offenders, this is how GL works with some >> extensions, > > That's probably sanest. > >> For (a) I can expect a response from every command I put into the main >> GPU control queue, the response should always be no error, but in some >> cases it will be because the guest hit some host resource error, or >> asked for something insane, (guest kernel drivers would be broken in >> most of these cases). >> >> Alternately I can use the separate event queue to send async errors >> when the guest does something bad, >> >> I'm also considering adding some sort of flag in config space saying >> the device needs a reset before it will continue doing anything, > > I generally dislike error codes which Never Happen; it's like making > every void function return int just in case: the caller has no idea what > to do if it fails. > > The litmus test: does *your* guest handle failures other than by giving > up on the device? If so, sure, you need to have a sane error-reporting > strategy. Err, isn't this a circular argument? No need for QEMU to report the failure, because the guest won't handle it; no need to handle the failure, because QEMU won't report it. What about this: would you make your guest handle failures if they were reported? >> The main reason I'm considering this stuff is for security reasons if >> the guest asks for something really illegal or crazy what should the >> expected behaviour of the host be? (at least secure I know that). > > If the guest userspace can do it, don't exit. If the kernel only, and > it's should have known better, abort is OK. > > Sure that doesn't help much! Immediate exit() or abort() denies the guest the ability to degrade service gracefully (disable the device, cry for help and try to hobble on), or report its brokenness ungracefully (kernel panic, crash dump). I doubt denying that is okay unless the device is so important that without it you can't even hope to panic. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-19 8:12 ` Markus Armbruster @ 2014-03-20 3:40 ` Rusty Russell 2014-03-20 6:39 ` Markus Armbruster 0 siblings, 1 reply; 21+ messages in thread From: Rusty Russell @ 2014-03-20 3:40 UTC (permalink / raw) To: Markus Armbruster; +Cc: Dave Airlie, qemu-devel@nongnu.org Markus Armbruster <armbru@redhat.com> writes: > Rusty Russell <rusty@rustcorp.com.au> writes: >> The litmus test: does *your* guest handle failures other than by giving >> up on the device? If so, sure, you need to have a sane error-reporting >> strategy. > > Err, isn't this a circular argument? No need for QEMU to report the > failure, because the guest won't handle it; no need to handle the > failure, because QEMU won't report it. > > What about this: would you make your guest handle failures if they were > reported? Perhaps I was unclear, that's what I meant. >>> The main reason I'm considering this stuff is for security reasons if >>> the guest asks for something really illegal or crazy what should the >>> expected behaviour of the host be? (at least secure I know that). >> >> If the guest userspace can do it, don't exit. If the kernel only, and >> it's should have known better, abort is OK. >> >> Sure that doesn't help much! > > Immediate exit() or abort() denies the guest the ability to degrade > service gracefully (disable the device, cry for help and try to hobble > on), or report its brokenness ungracefully (kernel panic, crash dump). > I doubt denying that is okay unless the device is so important that > without it you can't even hope to panic. Oh yes, I completely agree with you! But QEMU practice doesn't :) Cheers, Rusty. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-20 3:40 ` Rusty Russell @ 2014-03-20 6:39 ` Markus Armbruster 2014-03-20 12:53 ` Peter Maydell 0 siblings, 1 reply; 21+ messages in thread From: Markus Armbruster @ 2014-03-20 6:39 UTC (permalink / raw) To: Rusty Russell; +Cc: Dave Airlie, qemu-devel@nongnu.org Rusty Russell <rusty@rustcorp.com.au> writes: > Markus Armbruster <armbru@redhat.com> writes: >> Rusty Russell <rusty@rustcorp.com.au> writes: >>> The litmus test: does *your* guest handle failures other than by giving >>> up on the device? If so, sure, you need to have a sane error-reporting >>> strategy. >> >> Err, isn't this a circular argument? No need for QEMU to report the >> failure, because the guest won't handle it; no need to handle the >> failure, because QEMU won't report it. >> >> What about this: would you make your guest handle failures if they were >> reported? > > Perhaps I was unclear, that's what I meant. > >>>> The main reason I'm considering this stuff is for security reasons if >>>> the guest asks for something really illegal or crazy what should the >>>> expected behaviour of the host be? (at least secure I know that). >>> >>> If the guest userspace can do it, don't exit. If the kernel only, and >>> it's should have known better, abort is OK. >>> >>> Sure that doesn't help much! >> >> Immediate exit() or abort() denies the guest the ability to degrade >> service gracefully (disable the device, cry for help and try to hobble >> on), or report its brokenness ungracefully (kernel panic, crash dump). >> I doubt denying that is okay unless the device is so important that >> without it you can't even hope to panic. > > Oh yes, I completely agree with you! But QEMU practice doesn't :) Ah, then we're in violent agreement :) Time to cease the practice. Will be hard as long as the code is chock-full of bad examples. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-20 6:39 ` Markus Armbruster @ 2014-03-20 12:53 ` Peter Maydell 2014-03-26 14:34 ` Markus Armbruster 0 siblings, 1 reply; 21+ messages in thread From: Peter Maydell @ 2014-03-20 12:53 UTC (permalink / raw) To: Markus Armbruster; +Cc: Rusty Russell, Dave Airlie, qemu-devel@nongnu.org On 20 March 2014 06:39, Markus Armbruster <armbru@redhat.com> wrote: > Time to cease the practice. Will be hard as long as the code > is chock-full of bad examples. I've been consistently rejecting new instances of guest triggerable exit() or abort() in code review when I see them for at least the last six months to a year or so (since we improved the qemu_log() facility so it could be used by device models too). thanks -- PMM ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-20 12:53 ` Peter Maydell @ 2014-03-26 14:34 ` Markus Armbruster 2014-03-27 0:54 ` Venkatesh Srinivas 0 siblings, 1 reply; 21+ messages in thread From: Markus Armbruster @ 2014-03-26 14:34 UTC (permalink / raw) To: Peter Maydell; +Cc: Rusty Russell, Dave Airlie, qemu-devel@nongnu.org Peter Maydell <peter.maydell@linaro.org> writes: > On 20 March 2014 06:39, Markus Armbruster <armbru@redhat.com> wrote: >> Time to cease the practice. Will be hard as long as the code >> is chock-full of bad examples. > > I've been consistently rejecting new instances of > guest triggerable exit() or abort() in code review when > I see them for at least the last six months to a year or so > (since we improved the qemu_log() facility so it could be > used by device models too). Appreciated. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-26 14:34 ` Markus Armbruster @ 2014-03-27 0:54 ` Venkatesh Srinivas 0 siblings, 0 replies; 21+ messages in thread From: Venkatesh Srinivas @ 2014-03-27 0:54 UTC (permalink / raw) To: Markus Armbruster Cc: Peter Maydell, Rusty Russell, Dave Airlie, qemu-devel@nongnu.org On Wed, Mar 26, 2014 at 7:34 AM, Markus Armbruster <armbru@redhat.com> wrote: > Peter Maydell <peter.maydell@linaro.org> writes: > >> On 20 March 2014 06:39, Markus Armbruster <armbru@redhat.com> wrote: >>> Time to cease the practice. Will be hard as long as the code >>> is chock-full of bad examples. >> >> I've been consistently rejecting new instances of >> guest triggerable exit() or abort() in code review when >> I see them for at least the last six months to a year or so >> (since we improved the qemu_log() facility so it could be >> used by device models too). > > Appreciated. How are these errors expected to be reflected into guests? PCI aborts could be one place to stand, for example. -- vs; ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-19 0:34 ` Rusty Russell 2014-03-19 8:12 ` Markus Armbruster @ 2014-03-20 6:51 ` Michael S. Tsirkin 2014-03-21 9:44 ` Yan Vugenfirer 1 sibling, 1 reply; 21+ messages in thread From: Michael S. Tsirkin @ 2014-03-20 6:51 UTC (permalink / raw) To: Rusty Russell; +Cc: Dave Airlie, qemu-devel@nongnu.org On Wed, Mar 19, 2014 at 11:04:19AM +1030, Rusty Russell wrote: > Dave Airlie <airlied@gmail.com> writes: > > So I'm looking at how best to do virtio gpu device error reporting, > > and how to deal with illegal stuff, > > > > I've two levels of errors I want to support, > > > > a) unrecoverable or bad guest kernel programming errors, > > The QEMU standard approach is to exit at this point. No, really. It's easy on the hypervisor but often not very friendly for driver writers who might not be qemu experts. QEMU's moving away from exiting on errors and it would be nice to have a robust way to report driver bugs. How about setting VIRTIO_CONFIG_S_DEVICE_FAILED ? Another idea that windows driver implemented is reporting failure reason hint. They wrote it out to ISR, specifically they notified host about watchdog timer expiration for net device in this way. > > b) per 3D context errors from the renderer backend, > > > > (b) I can easily report in an event queue and the guest kernel can in > > theory blow away the offenders, this is how GL works with some > > extensions, > > That's probably sanest. If it's possible to identify the offenders, I agree a VQ is better than config space for that. Need to make sure the queue is big enough to avoid underrun of that queue though. Is that always possible? > > GPU control queue, the response should always be no error, but in some > > cases it will be because the guest hit some host resource error, or > > asked for something insane, (guest kernel drivers would be broken in > > most of these cases). > > > > Alternately I can use the separate event queue to send async errors > > when the guest does something bad, > > > > I'm also considering adding some sort of flag in config space saying > > the device needs a reset before it will continue doing anything, > > I generally dislike error codes which Never Happen; it's like making > every void function return int just in case: the caller has no idea what > to do if it fails. > > The litmus test: does *your* guest handle failures other than by giving > up on the device? If so, sure, you need to have a sane error-reporting > strategy. Right but driver development is also a valid need. > > The main reason I'm considering this stuff is for security reasons if > > the guest asks for something really illegal or crazy what should the > > expected behaviour of the host be? (at least secure I know that). > > If the guest userspace can do it, don't exit. If the kernel only, and > it's should have known better, abort is OK. I second that, at least for now. Maybe we will add more capabilities in virtio 1.0, or after that. > Sure that doesn't help much! > Rusty. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] virtio device error reporting best practice? 2014-03-20 6:51 ` Michael S. Tsirkin @ 2014-03-21 9:44 ` Yan Vugenfirer 0 siblings, 0 replies; 21+ messages in thread From: Yan Vugenfirer @ 2014-03-21 9:44 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Rusty Russell, Dave Airlie, qemu-devel@nongnu.org [-- Attachment #1: Type: text/plain, Size: 3166 bytes --] On Mar 20, 2014, at 8:51 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Wed, Mar 19, 2014 at 11:04:19AM +1030, Rusty Russell wrote: >> Dave Airlie <airlied@gmail.com> writes: >>> So I'm looking at how best to do virtio gpu device error reporting, >>> and how to deal with illegal stuff, >>> >>> I've two levels of errors I want to support, >>> >>> a) unrecoverable or bad guest kernel programming errors, >> >> The QEMU standard approach is to exit at this point. No, really. > > It's easy on the hypervisor but often not very friendly for driver writers > who might not be qemu experts. > QEMU's moving away from exiting on errors and it would be nice > to have a robust way to report driver bugs. > How about setting VIRTIO_CONFIG_S_DEVICE_FAILED ? > > Another idea that windows driver implemented is reporting > failure reason hint. They wrote it out to ISR, specifically > they notified host about watchdog timer expiration for net device > in this way. I removed it for now and really would like to have an official way to bring it back. Also going back to the original question - Windows can handle graphic cards HW errors by reloading the driver and reseting the device (stating from Vista). > >>> b) per 3D context errors from the renderer backend, >>> >>> (b) I can easily report in an event queue and the guest kernel can in >>> theory blow away the offenders, this is how GL works with some >>> extensions, >> >> That's probably sanest. > > If it's possible to identify the offenders, I agree > a VQ is better than config space for that. > Need to make sure the queue is big enough to avoid > underrun of that queue though. Is that always possible? > >>> GPU control queue, the response should always be no error, but in some >>> cases it will be because the guest hit some host resource error, or >>> asked for something insane, (guest kernel drivers would be broken in >>> most of these cases). >>> >>> Alternately I can use the separate event queue to send async errors >>> when the guest does something bad, >>> >>> I'm also considering adding some sort of flag in config space saying >>> the device needs a reset before it will continue doing anything, >> >> I generally dislike error codes which Never Happen; it's like making >> every void function return int just in case: the caller has no idea what >> to do if it fails. >> >> The litmus test: does *your* guest handle failures other than by giving >> up on the device? If so, sure, you need to have a sane error-reporting >> strategy. > > Right but driver development is also a valid need. > >>> The main reason I'm considering this stuff is for security reasons if >>> the guest asks for something really illegal or crazy what should the >>> expected behaviour of the host be? (at least secure I know that). >> >> If the guest userspace can do it, don't exit. If the kernel only, and >> it's should have known better, abort is OK. > > I second that, at least for now. > Maybe we will add more capabilities in virtio 1.0, or > after that. > >> Sure that doesn't help much! >> Rusty. [-- Attachment #2: Type: text/html, Size: 4592 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2014-03-27 0:54 UTC | newest] Thread overview: 21+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-03-17 6:02 [Qemu-devel] virtio device error reporting best practice? Dave Airlie 2014-03-17 14:28 ` Laszlo Ersek 2014-03-17 14:40 ` Peter Maydell 2014-03-17 14:49 ` Laszlo Ersek 2014-03-17 14:54 ` Peter Maydell 2014-03-17 14:57 ` Gerd Hoffmann 2014-03-17 19:05 ` Andreas Färber 2014-03-18 12:45 ` Kevin Wolf 2014-03-17 14:57 ` Richard W.M. Jones 2014-03-17 14:59 ` Richard W.M. Jones 2014-03-26 12:49 ` Stefan Hajnoczi 2014-03-17 14:50 ` Gerd Hoffmann 2014-03-19 0:34 ` Rusty Russell 2014-03-19 8:12 ` Markus Armbruster 2014-03-20 3:40 ` Rusty Russell 2014-03-20 6:39 ` Markus Armbruster 2014-03-20 12:53 ` Peter Maydell 2014-03-26 14:34 ` Markus Armbruster 2014-03-27 0:54 ` Venkatesh Srinivas 2014-03-20 6:51 ` Michael S. Tsirkin 2014-03-21 9:44 ` Yan Vugenfirer
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).