* [Qemu-devel] [PATCH] blockdev: reset werror/rerror on drive_del
From: Stefan Hajnoczi @ 2013-06-03 14:58 UTC
To: qemu-devel
Cc: Kevin Wolf, dron, Markus Armbruster, Stefan Hajnoczi, Paolo Bonzini

Paolo Bonzini <pbonzini@redhat.com> suggested the following test case:

1. Launch a guest and wait at the GRUB boot menu:

  qemu-system-x86_64 -enable-kvm -m 1024 \
    -drive if=none,cache=none,file=test.img,id=foo,werror=stop,rerror=stop \
    -device virtio-blk-pci,drive=foo,id=virtio0,addr=4

2. Hot unplug the device:

  (qemu) drive_del foo

3. Select the first boot menu entry

Without this patch the guest pauses due to ENOMEDIUM. But it is not
possible to resolve this situation - the drive has become anonymous.

With this patch the guest the guest gets the ENOMEDIUM error.

Note that this scenario actually happens sometimes during libvirt disk
hot unplug, where device_del is followed by drive_del. I/O may still be
submitted to the drive after drive_del if the guest does not process the
PCI hot unplug notification.

Reported-by: Dafna Ron <dron@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockdev.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index d1ec99a..6eb81a3 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1180,6 +1180,10 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
      */
     if (bdrv_get_attached_dev(bs)) {
         bdrv_make_anon(bs);
+
+        /* Further I/O must not pause the guest */
+        bdrv_set_on_error(bs, BLOCKDEV_ON_ERROR_REPORT,
+                          BLOCKDEV_ON_ERROR_REPORT);
     } else {
         drive_uninit(drive_get_by_blockdev(bs));
     }
-- 
1.8.1.4

* Re: [Qemu-devel] [PATCH] blockdev: reset werror/rerror on drive_del
From: Paolo Bonzini @ 2013-06-03 15:20 UTC
To: Stefan Hajnoczi; +Cc: Kevin Wolf, dron, qemu-devel, Markus Armbruster

On 03/06/2013 16:58, Stefan Hajnoczi wrote:
> Paolo Bonzini <pbonzini@redhat.com> suggested the following test case:
>
> 1. Launch a guest and wait at the GRUB boot menu:
>
>   qemu-system-x86_64 -enable-kvm -m 1024 \
>     -drive if=none,cache=none,file=test.img,id=foo,werror=stop,rerror=stop \
>     -device virtio-blk-pci,drive=foo,id=virtio0,addr=4
>
> 2. Hot unplug the device:
>
>   (qemu) drive_del foo
>
> 3. Select the first boot menu entry
>
> Without this patch the guest pauses due to ENOMEDIUM. But it is not
> possible to resolve this situation - the drive has become anonymous.
>
> With this patch the guest the guest gets the ENOMEDIUM error.
>
> Note that this scenario actually happens sometimes during libvirt disk
> hot unplug, where device_del is followed by drive_del. I/O may still be
> submitted to the drive after drive_del if the guest does not process the
> PCI hot unplug notification.
>
> Reported-by: Dafna Ron <dron@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  blockdev.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/blockdev.c b/blockdev.c
> index d1ec99a..6eb81a3 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1180,6 +1180,10 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
>       */
>      if (bdrv_get_attached_dev(bs)) {
>          bdrv_make_anon(bs);
> +
> +        /* Further I/O must not pause the guest */
> +        bdrv_set_on_error(bs, BLOCKDEV_ON_ERROR_REPORT,
> +                          BLOCKDEV_ON_ERROR_REPORT);
>      } else {
>          drive_uninit(drive_get_by_blockdev(bs));
>      }
>

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

* Re: [Qemu-devel] [PATCH] blockdev: reset werror/rerror on drive_del
From: Markus Armbruster @ 2013-06-04 16:37 UTC
To: Stefan Hajnoczi; +Cc: Kevin Wolf, dron, qemu-devel, Paolo Bonzini

Stefan Hajnoczi <stefanha@redhat.com> writes:

> Paolo Bonzini <pbonzini@redhat.com> suggested the following test case:
>
> 1. Launch a guest and wait at the GRUB boot menu:
>
>   qemu-system-x86_64 -enable-kvm -m 1024 \
>     -drive if=none,cache=none,file=test.img,id=foo,werror=stop,rerror=stop \
>     -device virtio-blk-pci,drive=foo,id=virtio0,addr=4
>
> 2. Hot unplug the device:
>
>   (qemu) drive_del foo
>
> 3. Select the first boot menu entry
>
> Without this patch the guest pauses due to ENOMEDIUM. But it is not
> possible to resolve this situation - the drive has become anonymous.
>
> With this patch the guest the guest gets the ENOMEDIUM error.
>
> Note that this scenario actually happens sometimes during libvirt disk
> hot unplug, where device_del is followed by drive_del. I/O may still be
> submitted to the drive after drive_del if the guest does not process the
> PCI hot unplug notification.
>
> Reported-by: Dafna Ron <dron@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  blockdev.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/blockdev.c b/blockdev.c
> index d1ec99a..6eb81a3 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1180,6 +1180,10 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
>       */
>      if (bdrv_get_attached_dev(bs)) {
>          bdrv_make_anon(bs);
> +
> +        /* Further I/O must not pause the guest */
> +        bdrv_set_on_error(bs, BLOCKDEV_ON_ERROR_REPORT,
> +                          BLOCKDEV_ON_ERROR_REPORT);
>      } else {
>          drive_uninit(drive_get_by_blockdev(bs));
>      }

The user gets exactly what he ordered. He ordered "stop on error", then
provoked errors by turning the virtual block device into a virtual pile
of scrap metal. Because that's exactly what drive_del does when used
while a device model is attached to the drive.

The only sane use case for drive_del I can think of is revoking access
to an image violently, after the guest failed to honor a hot unplug.

Even then, using drive_del when the block device is removable is
unnecessary. Just rip out the medium with eject -f. Look ma, no scrap
metal.

I'm not sure what you mean by "it is not possible to resolve this
situation". The device is shot! Can't see how that could be resolved.

I figure the bit that can't be resolved now is letting the user switch
off "stop on error" safely before a drive_del. Even if we had a command
for that, there'd still be a window between that command's execution and
drive_del's. Your patch solves the problem by having drive_del switch
it off unconditionally. Oookay, but please document it, because it's
not exactly obvious.

Re "the guest gets the ENOMEDIUM error": depends on the device. I doubt
disks can signal "no medium", and even if they could, I doubt device
drivers are prepared for it.

Re "this scenario actually happens sometimes during libvirt disk hot
unplug, where device_del is followed by drive_del": if I remember
correctly, libvirt disk hot unplug runs drive_del right after
device_del, opening a window where the guest sees a dead device. That's
asking for trouble, and trouble is known to oblige.

* Re: [Qemu-devel] [PATCH] blockdev: reset werror/rerror on drive_del
From: Paolo Bonzini @ 2013-06-04 17:04 UTC
To: Markus Armbruster; +Cc: Kevin Wolf, dron, qemu-devel, Stefan Hajnoczi

On 04/06/2013 18:37, Markus Armbruster wrote:
> I figure the bit that can't be resolved now is letting the user switch
> off "stop on error" safely before a drive_del. Even if we had a command
> for that, there'd still be a window between that command's execution and
> drive_del's. Your patch solves the problem by having drive_del switch
> it off unconditionally. Oookay, but please document it, because it's
> not exactly obvious.

It is not obvious, but it is not surprising either when you see it (i.e.
you won't really be surprised by the errors in the guest and won't need
to know that, under the hood, rerror has been changed from the value you
specified).

> Re "the guest gets the ENOMEDIUM error": depends on the device. I doubt
> disks can signal "no medium", and even if they could, I doubt device
> drivers are prepared for it.

SCSI disks can signal whatever they want. Device drivers will just
treat it as any other error (sense code) they don't recognize.

> Re "this scenario actually happens sometimes during libvirt disk hot
> unplug, where device_del is followed by drive_del": if I remember
> correctly, libvirt disk hot unplug runs drive_del right after
> device_del, opening a window where the guest sees a dead device. That's
> asking for trouble, and trouble is known to oblige.

I think it's causing too much trouble though, and Stefan's patch is
making the trouble evident to the guest. Surprise removal is a fact of
life, I don't think it makes sense to stop the machine on surprise
removal. It's very different from I/O errors.

Paolo

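The point about error reporting can be sketched in C. This is a toy model, not QEMU code: all `Toy*` / `toy_*` names are hypothetical, and the choice of controllers is an assumption made for illustration. The numeric values are the virtio-blk VIRTIO_BLK_S_IOERR status byte and the standard SCSI CHECK CONDITION status with the NOT READY / MEDIUM NOT PRESENT sense; the fallback sense key used for other errors is an illustrative pick, not necessarily what any device model reports.

```c
#include <stdint.h>

/* Hypothetical error classes and controller kinds for this sketch. */
typedef enum { TOY_ERR_NO_MEDIUM, TOY_ERR_OTHER } ToyError;
typedef enum { TOY_CTRL_VIRTIO_BLK, TOY_CTRL_SCSI } ToyController;

/* What the guest driver gets to see for one failed request. */
typedef struct {
    uint8_t status;    /* virtio-blk status byte or SCSI status byte */
    uint8_t sense_key; /* SCSI only; 0 for virtio-blk */
    uint8_t asc;       /* SCSI additional sense code; 0 for virtio-blk */
} ToyGuestError;

ToyGuestError toy_report_error(ToyController ctrl, ToyError err)
{
    ToyGuestError r = { 0, 0, 0 };

    if (ctrl == TOY_CTRL_VIRTIO_BLK) {
        /* VIRTIO_BLK_S_IOERR: one generic status, whatever the cause. */
        r.status = 1;
        return r;
    }

    r.status = 0x02;            /* SCSI CHECK CONDITION */
    if (err == TOY_ERR_NO_MEDIUM) {
        r.sense_key = 0x02;     /* NOT READY */
        r.asc = 0x3A;           /* MEDIUM NOT PRESENT */
    } else {
        r.sense_key = 0x0B;     /* ABORTED COMMAND (illustrative choice) */
        r.asc = 0x00;
    }
    return r;
}
```

In this model a virtio-blk guest cannot tell "no medium" from any other failure, while a SCSI guest receives a distinct sense code that its driver may or may not recognize.
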
* Re: [Qemu-devel] [PATCH] blockdev: reset werror/rerror on drive_del
From: Markus Armbruster @ 2013-06-04 19:24 UTC
To: Paolo Bonzini; +Cc: Kevin Wolf, dron, qemu-devel, Stefan Hajnoczi

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 04/06/2013 18:37, Markus Armbruster wrote:
>> I figure the bit that can't be resolved now is letting the user switch
>> off "stop on error" safely before a drive_del. Even if we had a command
>> for that, there'd still be a window between that command's execution and
>> drive_del's. Your patch solves the problem by having drive_del switch
>> it off unconditionally. Oookay, but please document it, because it's
>> not exactly obvious.
>
> It is not obvious, but it is not surprising either when you see it (i.e.
> you won't really be surprised by the errors in the guest and won't need
> to know that, under the hood, rerror has been changed from the value you
> specified).
>
>> Re "the guest gets the ENOMEDIUM error": depends on the device. I doubt
>> disks can signal "no medium", and even if they could, I doubt device
>> drivers are prepared for it.
>
> SCSI disks can signal whatever they want. Device drivers will just
> treat it as any other error (sense code) they don't recognize.

My point is: the commit message claims "the guest gets the ENOMEDIUM
error", which isn't really true. No biggie, of course.

>> Re "this scenario actually happens sometimes during libvirt disk hot
>> unplug, where device_del is followed by drive_del": if I remember
>> correctly, libvirt disk hot unplug runs drive_del right after
>> device_del, opening a window where the guest sees a dead device. That's
>> asking for trouble, and trouble is known to oblige.
>
> I think it's causing too much trouble though, and Stefan's patch is
> making the trouble evident to the guest. Surprise removal is a fact of
> life, I don't think it makes sense to stop the machine on surprise
> removal. It's very different from I/O errors.

I don't disagree with Stefan's patch, or your defense of it. Except I'm
reluctant to not document something non-obvious because "you don't need
to know" when I can document it in less time than it would take me to
overcome my resistance to "you don't need to know" arguments ;)

This is drive_del's documentation in hmp-commands.hx:

    Remove host block device. The result is that guest generated IO is
    no longer submitted against the host device underlying the disk.
    Once a drive has been deleted, the QEMU Block layer returns -EIO
    which results in IO errors in the guest for applications that are
    reading/writing to the device.

Suggest to add:

    These errors are always reported to the guest, regardless of the
    drive's error actions (drive options rerror, werror).

Independently, libvirt needs fixing.

* Re: [Qemu-devel] [PATCH] blockdev: reset werror/rerror on drive_del
From: Eric Blake @ 2013-06-04 19:32 UTC
To: Markus Armbruster
Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi, dron

On 06/04/2013 01:24 PM, Markus Armbruster wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
>
>> On 04/06/2013 18:37, Markus Armbruster wrote:
>>> I figure the bit that can't be resolved now is letting the user switch
>>> off "stop on error" safely before a drive_del. Even if we had a command
>>> for that, there'd still be a window between that command's execution and
>>> drive_del's. Your patch solves the problem by having drive_del switch
>>> it off unconditionally. Oookay, but please document it, because it's
>>> not exactly obvious.
>>
>> It is not obvious, but it is not surprising either when you see it (i.e.
>> you won't really be surprised by the errors in the guest and won't need
>> to know that, under the hood, rerror has been changed from the value you
>> specified).
>>
>
> This is drive_del's documentation in hmp-commands.hx:
>
>     Remove host block device. The result is that guest generated IO is
>     no longer submitted against the host device underlying the disk.
>     Once a drive has been deleted, the QEMU Block layer returns -EIO
>     which results in IO errors in the guest for applications that are
>     reading/writing to the device.
>
> Suggest to add:
>
>     These errors are always reported to the guest, regardless of the
>     drive's error actions (drive options rerror, werror).
>
> Independently, libvirt needs fixing.

Total agreement that libvirt needs to use a saner disk hot-unplug
sequence when it is known that qemu provides one. I've filed
https://bugzilla.redhat.com/show_bug.cgi?id=970761
to remind us to fix libvirt.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

* Re: [Qemu-devel] [PATCH] blockdev: reset werror/rerror on drive_del
From: Markus Armbruster @ 2013-06-05 7:09 UTC
To: Eric Blake; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi, dron

Eric Blake <eblake@redhat.com> writes:

> On 06/04/2013 01:24 PM, Markus Armbruster wrote:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>>
>>> On 04/06/2013 18:37, Markus Armbruster wrote:
>>>> I figure the bit that can't be resolved now is letting the user switch
>>>> off "stop on error" safely before a drive_del. Even if we had a command
>>>> for that, there'd still be a window between that command's execution and
>>>> drive_del's. Your patch solves the problem by having drive_del switch
>>>> it off unconditionally. Oookay, but please document it, because it's
>>>> not exactly obvious.
>>>
>>> It is not obvious, but it is not surprising either when you see it (i.e.
>>> you won't really be surprised by the errors in the guest and won't need
>>> to know that, under the hood, rerror has been changed from the value you
>>> specified).
>>>
>
>> This is drive_del's documentation in hmp-commands.hx:
>>
>>     Remove host block device. The result is that guest generated IO is
>>     no longer submitted against the host device underlying the disk.
>>     Once a drive has been deleted, the QEMU Block layer returns -EIO
>>     which results in IO errors in the guest for applications that are
>>     reading/writing to the device.
>>
>> Suggest to add:
>>
>>     These errors are always reported to the guest, regardless of the
>>     drive's error actions (drive options rerror, werror).
>>
>> Independently, libvirt needs fixing.
>
> Total agreement that libvirt needs to use a saner disk hot-unplug
> sequence when it is known that qemu provides one. I've filed
> https://bugzilla.redhat.com/show_bug.cgi?id=970761
> to remind us to fix libvirt.

Sane sequence

1. device_del

2. Wait for DEVICE_DELETED

3. if timeout, drive_del

Might make sense to offer a choice in the API between "fail" and
"destroy the block device" if guest doesn't cooperate.

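The sequence above can be written down as a small state machine. This is a minimal sketch in C under stated assumptions: the `UnplugStep`, `UnplugEvent` and `TimeoutPolicy` names are hypothetical and belong to neither libvirt nor QEMU, and the two timeout policies model the suggested API choice between "fail" and "destroy the block device". Only the transition logic is shown; sending monitor commands and receiving events is left out.

```c
/* Hypothetical names; neither libvirt nor QEMU defines these types. */
typedef enum {
    STEP_DEVICE_DEL,          /* about to send device_del */
    STEP_WAIT_DEVICE_DELETED, /* waiting for the DEVICE_DELETED event */
    STEP_DRIVE_DEL,           /* guest did not cooperate: send drive_del */
    STEP_DONE,
    STEP_FAILED
} UnplugStep;

typedef enum { EV_COMMAND_SENT, EV_DEVICE_DELETED, EV_TIMEOUT } UnplugEvent;

/* What to do when the guest never acknowledges the unplug. */
typedef enum { ON_TIMEOUT_FAIL, ON_TIMEOUT_DRIVE_DEL } TimeoutPolicy;

/* Return the step that follows `step` once `ev` has happened. */
UnplugStep unplug_next_step(UnplugStep step, UnplugEvent ev,
                            TimeoutPolicy policy)
{
    switch (step) {
    case STEP_DEVICE_DEL:
        return ev == EV_COMMAND_SENT ? STEP_WAIT_DEVICE_DELETED : step;
    case STEP_WAIT_DEVICE_DELETED:
        if (ev == EV_DEVICE_DELETED) {
            return STEP_DONE;           /* clean unplug, no drive_del */
        }
        if (ev == EV_TIMEOUT) {
            return policy == ON_TIMEOUT_DRIVE_DEL ? STEP_DRIVE_DEL
                                                  : STEP_FAILED;
        }
        return step;
    case STEP_DRIVE_DEL:
        return ev == EV_COMMAND_SENT ? STEP_DONE : step;
    default:
        return step;                    /* DONE and FAILED are terminal */
    }
}
```

The point of the ordering is that drive_del is reached only through the timeout branch, so a cooperating guest never sees a dead device.
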
* Re: [Qemu-devel] [PATCH] blockdev: reset werror/rerror on drive_del
From: Stefan Hajnoczi @ 2013-06-05 8:21 UTC
To: Markus Armbruster; +Cc: Kevin Wolf, dron, qemu-devel, Paolo Bonzini

On Tue, Jun 04, 2013 at 06:37:27PM +0200, Markus Armbruster wrote:
> Stefan Hajnoczi <stefanha@redhat.com> writes:
>
> > Paolo Bonzini <pbonzini@redhat.com> suggested the following test case:
> >
> > 1. Launch a guest and wait at the GRUB boot menu:
> >
> >   qemu-system-x86_64 -enable-kvm -m 1024 \
> >     -drive if=none,cache=none,file=test.img,id=foo,werror=stop,rerror=stop \
> >     -device virtio-blk-pci,drive=foo,id=virtio0,addr=4
> >
> > 2. Hot unplug the device:
> >
> >   (qemu) drive_del foo
> >
> > 3. Select the first boot menu entry
> >
> > Without this patch the guest pauses due to ENOMEDIUM. But it is not
> > possible to resolve this situation - the drive has become anonymous.
> >
> > With this patch the guest the guest gets the ENOMEDIUM error.
> >
> > Note that this scenario actually happens sometimes during libvirt disk
> > hot unplug, where device_del is followed by drive_del. I/O may still be
> > submitted to the drive after drive_del if the guest does not process the
> > PCI hot unplug notification.
> >
> > Reported-by: Dafna Ron <dron@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  blockdev.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/blockdev.c b/blockdev.c
> > index d1ec99a..6eb81a3 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -1180,6 +1180,10 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
> >       */
> >      if (bdrv_get_attached_dev(bs)) {
> >          bdrv_make_anon(bs);
> > +
> > +        /* Further I/O must not pause the guest */
> > +        bdrv_set_on_error(bs, BLOCKDEV_ON_ERROR_REPORT,
> > +                          BLOCKDEV_ON_ERROR_REPORT);
> >      } else {
> >          drive_uninit(drive_get_by_blockdev(bs));
> >      }
>
> The user gets exactly what he ordered. He ordered "stop on error", then
> provoked errors by turning the virtual block device into a virtual pile
> of scrap metal. Because that's exactly what drive_del does when used
> while a device model is attached to the drive.
>
> The only sane use case for drive_del I can think of is revoking access
> to an image violently, after the guest failed to honor a hot unplug.
>
> Even then, using drive_del when the block device is removable is
> unnecessary. Just rip out the medium with eject -f. Look ma, no scrap
> metal.
>
> I'm not sure what you mean by "it is not possible to resolve this
> situation". The device is shot! Can't see how that could be resolved.

This is the critical part: the guest is paused and there is no way to
resolve the continuous pause loop. The drive is gone but the guest
hasn't PCI hot unplugged the storage controller.

As a user, there's nothing you can do on the QEMU monitor to resume the
guest - it will just pause itself again.

This behavior is really bad, QEMU has basically wedged the guest into an
unrecoverable state and that's what I was trying to describe.

> I figure the bit that can't be resolved now is letting the user switch
> off "stop on error" safely before a drive_del. Even if we had a command
> for that, there'd still be a window between that command's execution and
> drive_del's. Your patch solves the problem by having drive_del switch
> it off unconditionally. Oookay, but please document it, because it's
> not exactly obvious.

Thanks for the documentation suggestion, will add it in v2.

> Re "the guest gets the ENOMEDIUM error": depends on the device. I doubt
> disks can signal "no medium", and even if they could, I doubt device
> drivers are prepared for it.

Yep, error reporting depends on the emulated storage controller.
virtio-blk and IDE just report a generic error status.

> Re "this scenario actually happens sometimes during libvirt disk hot
> unplug, where device_del is followed by drive_del": if I remember
> correctly, libvirt disk hot unplug runs drive_del right after
> device_del, opening a window where the guest sees a dead device. That's
> asking for trouble, and trouble is known to oblige.

Agreed.

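The pause loop described above can be reproduced with a toy model in C. This is a sketch under stated assumptions, not QEMU's block layer: the `ToyDrive`, `ToyOutcome` and `toy_*` names are hypothetical, a single `on_error` field stands in for both rerror and werror, and the model assumes that a request which stopped the VM is retried when the user resumes with "cont". The `reset_policy` flag models what the patch does inside drive_del.

```c
#include <stdbool.h>

/* Simplified stand-ins for the BLOCKDEV_ON_ERROR_* actions. */
typedef enum { TOY_ON_ERROR_REPORT, TOY_ON_ERROR_STOP } ToyOnError;

typedef struct {
    bool has_backend;    /* false once drive_del has detached the image */
    ToyOnError on_error; /* models both rerror and werror */
} ToyDrive;

typedef struct {
    int vm_pauses;       /* how many times the VM was stopped */
    int guest_errors;    /* errors delivered to the guest */
    bool vm_running;     /* VM state when the model returns */
} ToyOutcome;

/* Model drive_del on a drive that still has a device model attached. */
void toy_drive_del(ToyDrive *d, bool reset_policy)
{
    d->has_backend = false;
    if (reset_policy) {
        /* What the patch adds: further I/O must not pause the guest. */
        d->on_error = TOY_ON_ERROR_REPORT;
    }
}

/* One guest request; the user types "cont" at most max_cont times. */
ToyOutcome toy_submit_request(const ToyDrive *d, int max_cont)
{
    ToyOutcome out = { 0, 0, true };
    int conts_left = max_cont;

    for (;;) {
        if (d->has_backend) {
            return out;              /* I/O succeeds */
        }
        if (d->on_error == TOY_ON_ERROR_REPORT) {
            out.guest_errors++;      /* guest sees the failure, VM runs on */
            return out;
        }
        out.vm_pauses++;             /* stop policy: VM pauses */
        out.vm_running = false;
        if (conts_left == 0) {
            return out;              /* user gave up, guest stays wedged */
        }
        conts_left--;
        out.vm_running = true;       /* "cont": request is retried */
    }
}
```

With `reset_policy` false, every "cont" leads straight back to a paused VM, however many times the user tries. With it true, the same request produces one guest-visible error and the VM keeps running.
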
* Re: [Qemu-devel] [PATCH] blockdev: reset werror/rerror on drive_del
From: Fam Zheng @ 2013-06-05 8:26 UTC
To: Stefan Hajnoczi
Cc: Kevin Wolf, dron, Paolo Bonzini, qemu-devel, Markus Armbruster

On Mon, 06/03 16:58, Stefan Hajnoczi wrote:
> Paolo Bonzini <pbonzini@redhat.com> suggested the following test case:
>
> 1. Launch a guest and wait at the GRUB boot menu:
>
>   qemu-system-x86_64 -enable-kvm -m 1024 \
>     -drive if=none,cache=none,file=test.img,id=foo,werror=stop,rerror=stop \
>     -device virtio-blk-pci,drive=foo,id=virtio0,addr=4
>
> 2. Hot unplug the device:
>
>   (qemu) drive_del foo
>
> 3. Select the first boot menu entry
>
> Without this patch the guest pauses due to ENOMEDIUM. But it is not
> possible to resolve this situation - the drive has become anonymous.
>
> With this patch the guest the guest gets the ENOMEDIUM error.

s/the guest the guest/the guest/

-- 
Fam