From: Stefan Hajnoczi <stefanha@redhat.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
dron@redhat.com, qemu-devel@nongnu.org,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] blockdev: reset werror/rerror on drive_del
Date: Wed, 5 Jun 2013 10:21:46 +0200
Message-ID: <20130605082146.GB26845@stefanha-thinkpad.muc.redhat.com>
In-Reply-To: <87ppw1rfw8.fsf@blackfin.pond.sub.org>

On Tue, Jun 04, 2013 at 06:37:27PM +0200, Markus Armbruster wrote:
> Stefan Hajnoczi <stefanha@redhat.com> writes:
>
> > Paolo Bonzini <pbonzini@redhat.com> suggested the following test case:
> >
> > 1. Launch a guest and wait at the GRUB boot menu:
> >
> >   qemu-system-x86_64 -enable-kvm -m 1024 \
> >     -drive if=none,cache=none,file=test.img,id=foo,werror=stop,rerror=stop \
> >     -device virtio-blk-pci,drive=foo,id=virtio0,addr=4
> >
> > 2. Hot unplug the device:
> >
> > (qemu) drive_del foo
> >
> > 3. Select the first boot menu entry
> >
> > Without this patch the guest pauses due to ENOMEDIUM. But it is not
> > possible to resolve this situation - the drive has become anonymous.
> >
> > With this patch the guest gets the ENOMEDIUM error.
> >
> > Note that this scenario actually happens sometimes during libvirt disk
> > hot unplug, where device_del is followed by drive_del. I/O may still be
> > submitted to the drive after drive_del if the guest does not process the
> > PCI hot unplug notification.
> >
> > Reported-by: Dafna Ron <dron@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > blockdev.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/blockdev.c b/blockdev.c
> > index d1ec99a..6eb81a3 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -1180,6 +1180,10 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
> >       */
> >      if (bdrv_get_attached_dev(bs)) {
> >          bdrv_make_anon(bs);
> > +
> > +        /* Further I/O must not pause the guest */
> > +        bdrv_set_on_error(bs, BLOCKDEV_ON_ERROR_REPORT,
> > +                          BLOCKDEV_ON_ERROR_REPORT);
> >      } else {
> >          drive_uninit(drive_get_by_blockdev(bs));
> >      }
>
> The user gets exactly what he ordered. He ordered "stop on error", then
> provoked errors by turning the virtual block device into a virtual pile
> of scrap metal. Because that's exactly what drive_del does when used
> while a device model is attached to the drive.
>
> The only sane use case for drive_del I can think of is revoking access
> to an image violently, after the guest failed to honor a hot unplug.
>
> Even then, using drive_del when the block device is removable is
> unnecessary. Just rip out the medium with eject -f. Look ma, no scrap
> metal.
>
> I'm not sure what you mean by "it is not possible to resolve this
> situation". The device is shot! Can't see how that could be resolved.

This is the critical part: the guest is paused and there is no way to
break out of the pause loop. The drive is gone but the guest has not
yet processed the PCI hot unplug of the storage controller. As a user,
there's nothing you can do on the QEMU monitor to resume the guest: it
will just pause itself again.

This behavior is really bad: QEMU has basically wedged the guest into an
unrecoverable state, and that's what I was trying to describe.

> I figure the bit that can't be resolved now is letting the user switch
> off "stop on error" safely before a drive_del. Even if we had a command
> for that, there'd still be a window between that command's execution and
> drive_del's. Your patch solves the problem by having drive_del switch
> it off unconditionally. Oookay, but please document it, because it's
> not exactly obvious.

Thanks for the documentation suggestion, will add it in v2.

> Re "the guest gets the ENOMEDIUM error": depends on the device. I doubt
> disks can signal "no medium", and even if they could, I doubt device
> drivers are prepared for it.

Yep, error reporting depends on the emulated storage controller.
virtio-blk and IDE just report a generic error status.

> Re "this scenario actually happens sometimes during libvirt disk hot
> unplug, where device_del is followed by drive_del": if I remember
> correctly, libvirt disk hot unplug runs drive_del right after
> device_del, opening a window where the guest sees a dead device. That's
> asking for trouble, and trouble is known to oblige.

Agreed.