From: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] drive_del vs. device_del: what should come first?
Date: Thu, 03 Apr 2014 17:20:57 +0200
Message-ID: <533D7C59.9000703@linux.vnet.ibm.com>
In-Reply-To: <87d2gzpr46.fsf@blackfin.pond.sub.org>
On 02/04/14 19:40, Markus Armbruster wrote:
> Heinz Graalfs <graalfs@linux.vnet.ibm.com> writes:
>
>> On 01/04/14 17:48, Markus Armbruster wrote:
>>> Heinz Graalfs <graalfs@linux.vnet.ibm.com> writes:
>>>
>>>> Hi Kevin,
>>>>
>>>> doing a
>>>>
>>>> virsh detach-device ...
>>>>
>>>> ends up in the following QEMU monitor commands:
>>>>
>>>> 1. device_del ...
>>>> 2. drive_del ...
>>>>
>>>> qmp_device_del() performs the device unplug path.
>>>> In case of a block device do_drive_del() tries to
>>>> prevent further IO against the host device.
>>>>
>>>> However, bdrv_find() during drive_del() results in an error because
>>>> the device is already gone. Due to this error, all the bdrv_xxx calls
>>>> that quiesce the block driver, as well as all other processing, are
>>>> skipped.
>>>>
>>>> Is the sequence that libvirt triggers OK?
>>>> Shouldn't drive_del be executed first?
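(For reference, do_drive_del() bails out before any of the quiescing
when the lookup fails; roughly like this, condensed and paraphrased
from blockdev.c of that era, not verbatim:

    int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
    {
        const char *id = qdict_get_str(qdict, "id");
        BlockDriverState *bs = bdrv_find(id);

        if (!bs) {
            /* drive already auto-deleted on unplug: everything below,
             * including the quiescing, is skipped */
            qerror_report(QERR_DEVICE_NOT_FOUND, id);
            return -1;
        }

        /* quiesce block driver; prevent further I/O */
        bdrv_drain_all();
        bdrv_flush(bs);
        bdrv_close(bs);
        ...
    }
)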
>>>
>>> No.
>>
>> OK, I see. The drive is deleted implicitly (release_drive()), so
>> re-adding after a device_del() requires another drive_add() AND
>> device_add().
>
> Yes. The automatic drive delete on unplug is a wart that will not be
> preserved in the new interfaces we're building now.
OK, thanks for the info.
>
>> (Doing just a device_add() complains about the missing drive.
>> A subsequent info qtree makes QEMU abort.)
>
> Really? Got a reproducer for me?
The problem seems to be gone on master. I was on 1.7.0 when I saw the
info qtree problem.
>
>>> drive_del is nasty. Its purpose is to revoke access to an image even
>>> when the guest refuses to cooperate. To the guest, this looks like
>>> hardware failure.
>>
>> Deleting a device during active IO is nasty and it should look like a
>> hardware failure. I would expect lots of errors.
>>
>>>
>>> If you drive_del before device_del, even a perfectly well-behaved
>>> guest is exposed to a terminally broken device between drive_del and
>>> completion of unplug.
>>
>> The early drive_del() would mean that no further IO would be
>> possible.
>
> A polite way to describe the effect of drive_del on the guest.
>
> drive_del makes all attempts at actual I/O error out without any
> forewarning whatsoever. If you do that to your guest, and something
> breaks, you get to keep the pieces. Even if you "only" do it for a
> short period of time.
>
>>> Always try a device_del first, and only if that does not complete within
>>> reasonable time, and you absolutely must revoke access to the image,
>>> then whack it over the head with drive_del.
>>
>> What is this reasonable time?
>
> Depends on how heavily loaded host and guest are. I suggest measuring
> with a suitably thrashing guest, then shifting left to taste.
>
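For reference, the recommended sequence would look roughly like this on
the wire (the IDs are illustrative, borrowed from the trace below;
drive_del has no QMP equivalent, so it goes through
human-monitor-command):

    { "execute": "device_del",
      "arguments": { "id": "virtio-disk3" } }

    ... wait for the DEVICE_DELETED event; only if it never arrives
    within your chosen timeout:

    { "execute": "human-monitor-command",
      "arguments": { "command-line": "drive_del drive-virtio-disk3" } }
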
>> On s390 we experience problems (QEMU aborts) when async block IO
>> completes and the virtqueues are already gone. I suppose the
>> bdrv_drain_all() in bdrv_close() is a little late. I don't see such
>> problems with an early bdrv_drain_all() (drive_del) and an unplug
>> (device_del) afterwards.
>
> Sounds like a bug. Got a reproducer?
I don't have a reproducer here either, just an old stack trace that
occurred several times. It definitely depends on the async workload
running during the device_del. I will try to reproduce the situation:
Program received signal SIGFPE, Arithmetic exception.
0x000000008026e48a in virtqueue_fill (vq=0x809bcad0, elem=0x3ffd80013d8,
len=4097, idx=0) at qemu/hw/virtio/virtio.c:259
259 idx = (idx + vring_used_idx(vq)) % vq->vring.num;
(gdb) bt
#0 0x000000008026e48a in virtqueue_fill (vq=0x809bcad0,
elem=0x3ffd80013d8, len=4097, idx=0) at qemu/hw/virtio/virtio.c:259
#1 0x000000008026e68c in virtqueue_push (vq=0x809bcad0,
elem=0x3ffd80013d8, len=4097) at qemu/hw/virtio/virtio.c:283
#2 0x0000000080237962 in virtio_blk_req_complete (req=0x3ffd80013d0,
status=0) at qemu/hw/block/virtio-blk.c:50
#3 0x0000000080237c08 in virtio_blk_rw_complete (opaque=0x3ffd80013d0,
ret=0) at qemu/hw/block/virtio-blk.c:85
#4 0x0000000080032916 in bdrv_co_em_bh (opaque=0x3ffd800da50) at
qemu/block.c:3847
#5 0x000000008000da0c in aio_bh_poll (ctx=0x808f3e00) at qemu/async.c:81
#6 0x000000008000d498 in aio_poll (ctx=0x808f3e00, blocking=true) at
qemu/aio-posix.c:185
#7 0x0000000080126f56 in qemu_aio_wait () at qemu/main-loop.c:485
#8 0x000000008002a134 in bdrv_drain_all () at qemu/block.c:1449
#9 0x0000000080029e64 in bdrv_close (bs=0x809c1680) at qemu/block.c:1378
#10 0x000000008002ac56 in bdrv_delete (bs=0x809c1680) at qemu/block.c:1612
#11 0x000000008008a7fe in drive_uninit (dinfo=0x809aad00) at
qemu/blockdev.c:216
#12 0x000000008008a95c in drive_put_ref (dinfo=0x809aad00) at
qemu/blockdev.c:227
#13 0x000000008008a00e in blockdev_auto_del (bs=0x809c1680) at
qemu/blockdev.c:106
#14 0x00000000800dc004 in release_drive (obj=0x809b4640, name=0x809b5080
"drive", opaque=0x8044d4b8 <virtio_ccw_blk_properties+224>)
at qemu/hw/core/qdev-properties-system.c:90
#15 0x0000000080195dcc in object_property_del_all (obj=0x809b4640) at
qemu/qom/object.c:343
#16 0x0000000080196158 in object_finalize (data=0x809b4640) at
qemu/qom/object.c:397
#17 0x0000000080197322 in object_unref (obj=0x809b4640) at
qemu/qom/object.c:695
#18 0x0000000080195ff6 in object_unparent (obj=0x809b4640) at
qemu/qom/object.c:377
#19 0x00000000800e2bf6 in qdev_free (dev=0x809b4640) at
qemu/hw/core/qdev.c:286
#20 0x00000000802618e4 in virtio_ccw_busdev_unplug (dev=0x809b4640) at
qemu/hw/s390x/virtio-ccw.c:1582
#21 0x00000000800e27ca in qdev_unplug (dev=0x809b4640,
errp=0x3ffff9c9820) at qemu/hw/core/qdev.c:219
#22 0x000000008016cc9c in qmp_device_del (id=0x809c0490 "virtio-disk3",
errp=0x3ffff9c9820) at qemu/qdev-monitor.c:658
#23 0x000000008018da1e in qmp_marshal_input_device_del (mon=0x8090b2f0,
qdict=0x80987cd0, ret=0x3ffff9c9920) at qmp-marshal.c:2998
#24 0x0000000080293a9a in qmp_call_cmd (mon=0x8090b2f0, cmd=0x803824a0
<qmp_cmds+640>, params=0x80987cd0)
at qemu/monitor.c:4501
#25 0x0000000080293d90 in handle_qmp_command (parser=0x8090b378,
tokens=0x8090a4f0) at /scsi/qemu/monitor.c:4567
#26 0x000000008030607c in json_message_process_token (lexer=0x8090b380,
token=0x809a3780, type=JSON_OPERATOR, x=76, y=21)
at qemu/qobject/json-streamer.c:87
#27 0x000000008032b5b6 in json_lexer_feed_char (lexer=0x8090b380, ch=125
'}', flush=false) at qemu/qobject/json-lexer.c:303
#28 0x000000008032b844 in json_lexer_feed (lexer=0x8090b380,
buffer=0x3ffff9c9fa8 "}[B\377\377\234\237\060", size=1)
at qemu/qobject/json-lexer.c:356
#29 0x000000008030622e in json_message_parser_feed (parser=0x8090b378,
buffer=0x3ffff9c9fa8 "}[B\377\377\234\237\060", size=1)
at qemu/qobject/json-streamer.c:110
#30 0x0000000080293eb0 in monitor_control_read (opaque=0x8090b2f0,
buf=0x3ffff9c9fa8 "}[B\377\377\234\237\060", size=1)
at qemu/monitor.c:4588
#31 0x000000008016d3fe in qemu_chr_be_write (s=0x808f4120,
buf=0x3ffff9c9fa8 "}[B\377\377\234\237\060", len=1)
at qemu/qemu-char.c:165
#32 0x0000000080173380 in tcp_chr_read (chan=0x809783d0, cond=G_IO_IN,
opaque=0x808f4120) at qemu/qemu-char.c:2509
#33 0x000003fffd48f05a in g_main_context_dispatch () from
/lib64/libglib-2.0.so.0
#34 0x0000000080126bb8 in glib_pollfds_poll () at /scsi/qemu/main-loop.c:188
#35 0x0000000080126d16 in os_host_main_loop_wait (timeout=4294967295) at
qemu/main-loop.c:233
#36 0x0000000080126e06 in main_loop_wait (nonblocking=0) at
qemu/main-loop.c:465
#37 0x00000000801eccea in main_loop () at qemu/vl.c:2086
#38 0x00000000801f539c in main (argc=43, argv=0x3ffff9cb938,
envp=0x3ffff9cba98) at qemu/vl.c:4444
>
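(If I read the trace right, the SIGFPE is presumably the '%' in
virtqueue_fill() dividing by vq->vring.num == 0: the unplug has already
torn down the vring by the time bdrv_drain_all() inside bdrv_close()
completes the in-flight request. A minimal sketch of the ordering that
would avoid this; my_virtio_blk_unplug() and my_teardown_virtqueues()
are hypothetical names, only bdrv_drain_all() is real:

    /* Sketch: drain in-flight AIO while the vring is still intact, so
     * completions like virtio_blk_rw_complete() never touch a dead queue. */
    static void my_virtio_blk_unplug(VirtIODevice *vdev)
    {
        bdrv_drain_all();              /* 1. flush pending requests first */
        my_teardown_virtqueues(vdev);  /* 2. only now drop the vring */
        /* 3. a later bdrv_close() finds nothing left in flight */
    }
)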