qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] drive_del vs. device_del: what should come first?
Date: Fri, 11 Apr 2014 14:47:20 +0200	[thread overview]
Message-ID: <5347E458.6090001@linux.vnet.ibm.com> (raw)
In-Reply-To: <87d2gzpr46.fsf@blackfin.pond.sub.org>

Hello Markus,

I finally managed to reproduce the problem, at least once ...

The scenario was:
dd if=/dev/vdx1 of=partitionone

followed by a virsh detach... (with the device_del() under the cover)
during active dd processing

dmesg shows:

[79026.220718] User process fault: interruption code 0x40009 in 
qemu-system-s390x[80000000+43c000]
[79026.220737] CPU: 3 PID: 20117 Comm: qemu-system-s39 Not tainted 
3.10.33+ #4
[79026.220742] task: 0000000085930000 ti: 00000000e0b74000 task.ti: 
00000000e0b74000
[79026.220746] User PSW : 0705000180000000 0000000080269722 (0x80269722)
[79026.220774]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:0 
PM:0 EA:3
User GPRS: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[79026.220827]            0000000000000000 0000000000000000 
00000000803761dc 00000000809001d0
[79026.220831]            000003fffd0391a8 00000000808ff560 
00000000808eb020 000003ffffc03838
[79026.220834]            0000000000000000 0000000080375e88 
00000000802696f6 000003ffffc03838
[79026.220847] User Code: 0000000080269716: b9040034		lgr	%r3,%r4
            000000008026971a: a7290000		lghi	%r2,0
           #000000008026971e: b9870021		dlgr	%r2,%r1
           >0000000080269722: b9040012		lgr	%r1,%r2
            0000000080269726: 5010b0a0		st	%r1,160(%r11)
            000000008026972a: 5820b0a0		l	%r2,160(%r11)
            000000008026972e: e310b0a80004	lg	%r1,168(%r11)
            0000000080269734: 58101000		l	%r1,0(%r1)
[79026.220875] Last Breaking-Event-Address:
[79026.220878]  [<000000008026904c>] 0x8026904c

with
PSW addr: 0000000080269722 pointing into virtio_queue_empty

...
000000008026a67c t virtio_notify_vector
00000000802694a4 T virtio_queue_empty
000000008026b440 T virtio_queue_get_addr
...

and coming from:
R14 	00000000802696f6 virtqueue_fill
...
0000000080269f74 T virtqueue_avail_bytes
0000000080269540 T virtqueue_fill
000000008026979c T virtqueue_flush
0000000080269c2c T virtqueue_get_avail_bytes
...

see below the complete core_backtrace ...

On 02/04/14 19:40, Markus Armbruster wrote:
> Heinz Graalfs <graalfs@linux.vnet.ibm.com> writes:
>
>> On 01/04/14 17:48, Markus Armbruster wrote:
>>> Heinz Graalfs <graalfs@linux.vnet.ibm.com> writes:
>>>
>>>> Hi Kevin,
>>>>
>>>> doing a
>>>>
>>>>        virsh detach-device ...
>>>>
>>>> ends up in the following QEMU monitor commands:
>>>>
>>>> 1. device_del ...
>>>> 2. drive_del ...
>>>>
>>>> qmp_device_del() performs the device unplug path.
>>>> In case of a block device do_drive_del() tries to
>>>> prevent further IO against the host device.
>>>>
>>>> However, bdrv_find() during drive_del() results in
>>>> an error, because the device is already gone. Due to
>>>> this error all the bdrv_xxx calls to quiesce the block
>>>> driver as well as all other processing is skipped.
>>>>
>>>> Is the sequence that libvirt triggers OK?
>>>> Shouldn't drive_del be executed first?
>>>
>>> No.
>>
>> OK, I see. The drive is deleted implicitly (release_drive()).
>> Doing a device_del() requires another drive_add() AND device_add().
>
> Yes.  The automatic drive delete on unplug is a wart that will not be
> preserved in the new interfaces we're building now.
>
>> (Doing just a device_add() complains about the missing drive.
>> A subsequent info qtree lets QEMU abort.)
>
> Really?  Got a reproducer for me?
>
>>> drive_del is nasty.  Its purpose is to revoke access to an image even
>>> when the guest refuses to cooperate.  To the guest, this looks like
>>> hardware failure.
>>
>> Deleting a device during active IO is nasty and it should look like a
>> hardware failure. I would expect lots of errors.
>>
>>>
>>> If you drive_del before device_del, even a perfectly well-behaved guest
>>> guest is exposed to a terminally broken device between drive_del and
>>> completion of unplug.
>>
>> The early drive_del() would mean that no further IO would be
>> possible.
>
> A polite way to describe the effect of drive_del on the guest.
>
> drive_del makes all attempts at actual I/O error out without any
> forewarning whatsoever.  If you do that to your guest, and something
> breaks, you get to keep the pieces.  Even if you "only" do it for a
> short period of time.
>
>>> Always try a device_del first, and only if that does not complete within
>>> reasonable time, and you absolutely must revoke access to the image,
>>> then whack it over the head with drive_del.
>>
>> What is this reasonable time?
>
> Depends on how heavily loaded host and guest are.  Suggest to measure
> with a suitably thrashing guest, then shift left to taste.
>
>> On 390 we experience problems (QEMU abort) when asynch block IO
>> completes and the virtqueues are already gone. I suppose the
>> bdrv_drain_all() in bdrv_close() is a little late. I don't see such
>> problems with an early bdrv_drain_all() (drive_del) and an unplug
>> (device_del) afterwards.
>
> Sounds like a bug.  Got a reproducer?
>


here is the complete core_backtrace:

7d40b30ece2f256dfd8600446a9da7b0a1962730 0x269722 virtqueue_fill 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x269924 virtqueue_push 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x23a400 
virtio_blk_req_complete /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x23a6a4 virtio_blk_rw_complete 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x34686 bdrv_co_em_bh 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0xcd40 aio_bh_poll 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0xc7f0 aio_poll 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2ac9c bdrv_drain_all 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2a718 bdrv_close 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2b97e bdrv_delete 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x36db4 bdrv_unref 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0xa02ea drive_uninit 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0xa0448 drive_put_ref 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x9fae6 blockdev_auto_del 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0xd0bf0 release_drive 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18c158 
object_property_del_all /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18c4e4 object_finalize 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18d6d2 object_unref 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18c382 object_unparent 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x25ca14 
virtio_ccw_busdev_unplug /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0xd6c88 qdev_unplug 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x15c5f0 qmp_device_del 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x182b92 
qmp_marshal_input_device_del 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x28fa36 qmp_call_cmd 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x28fd2c handle_qmp_command 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2fff3c 
json_message_process_token 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x32adfe json_lexer_feed_char 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x32b08c json_lexer_feed 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x3000ee 
json_message_parser_feed /test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x28fe4c monitor_control_read 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x15cd66 qemu_chr_be_write 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x162bfe tcp_chr_read 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
b8d9d8c328fccab2bc69bfa6d17bcb2f46dfb163 0x4ff58 g_main_context_dispatch 
/lib64/libglib-2.0.so.0 -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x10b654 glib_pollfds_poll 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x10b7e0 os_host_main_loop_wait 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x10b934 main_loop_wait 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x1e6a7e main_loop 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x1ef642 main 
/test/qemu/build/s390x-softmmu/qemu-system-s390x -

I suppose that asynchronous block IO processing completes after the
guest already did the virtqueue cleanup.
Both, machine check injection and device release processing are
triggered in the unplug callback virtio_ccw_busdev_unplug().
I suppose the device release processing (with its wait for asynchronous 
processing to complete) happens too late.

What do you think?

Thanks,
Heinz

      parent reply	other threads:[~2014-04-11 12:47 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-01 13:55 [Qemu-devel] drive_del vs. device_del: what should come first? Heinz Graalfs
2014-04-01 15:48 ` Markus Armbruster
2014-04-02 14:25   ` Heinz Graalfs
2014-04-02 17:40     ` Markus Armbruster
2014-04-03 15:20       ` Heinz Graalfs
2014-04-11 12:47       ` Heinz Graalfs [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5347E458.6090001@linux.vnet.ibm.com \
    --to=graalfs@linux.vnet.ibm.com \
    --cc=armbru@redhat.com \
    --cc=borntraeger@de.ibm.com \
    --cc=cornelia.huck@de.ibm.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).