Message-ID: <5347E458.6090001@linux.vnet.ibm.com>
Date: Fri, 11 Apr 2014 14:47:20 +0200
From: Heinz Graalfs
To: Markus Armbruster
Cc: Cornelia Huck, Christian Borntraeger, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] drive_del vs. device_del: what should come first?
In-Reply-To: <87d2gzpr46.fsf@blackfin.pond.sub.org>
References: <1396360537-12520-1-git-send-email-graalfs@linux.vnet.ibm.com>
 <87eh1h8304.fsf@blackfin.pond.sub.org>
 <533C1DC5.6050605@linux.vnet.ibm.com>
 <87d2gzpr46.fsf@blackfin.pond.sub.org>

Hello Markus,

I finally managed to reproduce the problem, at least once ...

The scenario was:

    dd if=/dev/vdx1 of=partitionone

followed by a virsh detach ...
(with the device_del() under the cover).

During active dd processing, dmesg shows:

[79026.220718] User process fault: interruption code 0x40009 in qemu-system-s390x[80000000+43c000]
[79026.220737] CPU: 3 PID: 20117 Comm: qemu-system-s39 Not tainted 3.10.33+ #4
[79026.220742] task: 0000000085930000 ti: 00000000e0b74000 task.ti: 00000000e0b74000
[79026.220746] User PSW : 0705000180000000 0000000080269722 (0x80269722)
[79026.220774]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:0 PM:0 EA:3
               User GPRS: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[79026.220827]            0000000000000000 0000000000000000 00000000803761dc 00000000809001d0
[79026.220831]            000003fffd0391a8 00000000808ff560 00000000808eb020 000003ffffc03838
[79026.220834]            0000000000000000 0000000080375e88 00000000802696f6 000003ffffc03838
[79026.220847] User Code: 0000000080269716: b9040034      lgr   %r3,%r4
                          000000008026971a: a7290000      lghi  %r2,0
                         #000000008026971e: b9870021      dlgr  %r2,%r1
                         >0000000080269722: b9040012      lgr   %r1,%r2
                          0000000080269726: 5010b0a0      st    %r1,160(%r11)
                          000000008026972a: 5820b0a0      l     %r2,160(%r11)
                          000000008026972e: e310b0a80004  lg    %r1,168(%r11)
                          0000000080269734: 58101000      l     %r1,0(%r1)
[79026.220875] Last Breaking-Event-Address:
[79026.220878]  [<000000008026904c>] 0x8026904c

with the PSW address 0000000080269722 pointing into virtio_queue_empty:

    ...
    000000008026a67c t virtio_notify_vector
    00000000802694a4 T virtio_queue_empty
    000000008026b440 T virtio_queue_get_addr
    ...

and coming from R14 00000000802696f6, which is in virtqueue_fill:

    ...
    0000000080269f74 T virtqueue_avail_bytes
    0000000080269540 T virtqueue_fill
    000000008026979c T virtqueue_flush
    0000000080269c2c T virtqueue_get_avail_bytes
    ...

See below for the complete core_backtrace.
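If I read the faulting dlgr correctly, the crash happens when
virtqueue_fill() reduces the ring index modulo vring.num: once the
guest has already torn the queue down, vring.num is 0 and the division
behind the dlgr traps. A minimal sketch of what I think is going on
(the struct and function names below are made up for the illustration,
this is not the actual QEMU code):

    #include <stdio.h>

    /* toy stand-in for the ring state; num plays the role of vring.num */
    struct ring_state {
        unsigned int num;   /* descriptor count; 0 after queue teardown */
    };

    /* stands in for the "idx % vring.num" style reduction done while
     * filling the used ring; divides by zero once num == 0 */
    static unsigned int used_ring_index(const struct ring_state *r,
                                        unsigned int idx)
    {
        return idx % r->num;
    }

    int main(void)
    {
        struct ring_state torn_down = { .num = 0 };

        /* the situation in the dump: a block I/O completion arrives
         * after the guest has cleaned up the queue */
        printf("%u\n", used_ring_index(&torn_down, 5));  /* traps here */
        return 0;
    }

The point is just that the completion path must not run against a
queue the guest has already dismantled.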
On 02/04/14 19:40, Markus Armbruster wrote:
> Heinz Graalfs writes:
>
>> On 01/04/14 17:48, Markus Armbruster wrote:
>>> Heinz Graalfs writes:
>>>
>>>> Hi Kevin,
>>>>
>>>> doing a
>>>>
>>>> virsh detach-device ...
>>>>
>>>> ends up in the following QEMU monitor commands:
>>>>
>>>> 1. device_del ...
>>>> 2. drive_del ...
>>>>
>>>> qmp_device_del() performs the device unplug path.
>>>> In case of a block device do_drive_del() tries to
>>>> prevent further IO against the host device.
>>>>
>>>> However, bdrv_find() during drive_del() results in
>>>> an error, because the device is already gone. Due to
>>>> this error all the bdrv_xxx calls to quiesce the block
>>>> driver as well as all other processing is skipped.
>>>>
>>>> Is the sequence that libvirt triggers OK?
>>>> Shouldn't drive_del be executed first?
>>>
>>> No.
>>
>> OK, I see. The drive is deleted implicitly (release_drive()).
>> Doing a device_del() requires another drive_add() AND device_add().
>
> Yes. The automatic drive delete on unplug is a wart that will not be
> preserved in the new interfaces we're building now.
>
>> (Doing just a device_add() complains about the missing drive.
>> A subsequent info qtree lets QEMU abort.)
>
> Really? Got a reproducer for me?
>
>>> drive_del is nasty. Its purpose is to revoke access to an image even
>>> when the guest refuses to cooperate. To the guest, this looks like
>>> hardware failure.
>>
>> Deleting a device during active IO is nasty and it should look like a
>> hardware failure. I would expect lots of errors.
>>
>>>
>>> If you drive_del before device_del, even a perfectly well-behaved
>>> guest is exposed to a terminally broken device between drive_del and
>>> completion of unplug.
>>
>> The early drive_del() would mean that no further IO would be
>> possible.
>
> A polite way to describe the effect of drive_del on the guest.
>
> drive_del makes all attempts at actual I/O error out without any
> forewarning whatsoever. If you do that to your guest, and something
> breaks, you get to keep the pieces. Even if you "only" do it for a
> short period of time.
>
>>> Always try a device_del first, and only if that does not complete within
>>> reasonable time, and you absolutely must revoke access to the image,
>>> then whack it over the head with drive_del.
>>
>> What is this reasonable time?
>
> Depends on how heavily loaded host and guest are. Suggest to measure
> with a suitably thrashing guest, then shift left to taste.
>
>> On 390 we experience problems (QEMU abort) when asynch block IO
>> completes and the virtqueues are already gone. I suppose the
>> bdrv_drain_all() in bdrv_close() is a little late. I don't see such
>> problems with an early bdrv_drain_all() (drive_del) and an unplug
>> (device_del) afterwards.
>
> Sounds like a bug. Got a reproducer?
>

here is the complete core_backtrace:

7d40b30ece2f256dfd8600446a9da7b0a1962730 0x269722 virtqueue_fill
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x269924 virtqueue_push
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x23a400 virtio_blk_req_complete
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x23a6a4 virtio_blk_rw_complete
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x34686 bdrv_co_em_bh
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0xcd40 aio_bh_poll
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0xc7f0 aio_poll
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2ac9c bdrv_drain_all
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2a718 bdrv_close
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2b97e bdrv_delete
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x36db4 bdrv_unref
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0xa02ea drive_uninit
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0xa0448 drive_put_ref
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x9fae6 blockdev_auto_del
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0xd0bf0 release_drive
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18c158 object_property_del_all
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18c4e4 object_finalize
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18d6d2 object_unref
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18c382 object_unparent
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x25ca14 virtio_ccw_busdev_unplug
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0xd6c88 qdev_unplug
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x15c5f0 qmp_device_del
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x182b92 qmp_marshal_input_device_del
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x28fa36 qmp_call_cmd
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x28fd2c handle_qmp_command
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2fff3c json_message_process_token
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x32adfe json_lexer_feed_char
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x32b08c json_lexer_feed
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x3000ee json_message_parser_feed
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x28fe4c monitor_control_read
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x15cd66 qemu_chr_be_write
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x162bfe tcp_chr_read
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
b8d9d8c328fccab2bc69bfa6d17bcb2f46dfb163 0x4ff58 g_main_context_dispatch
/lib64/libglib-2.0.so.0 -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x10b654 glib_pollfds_poll
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x10b7e0 os_host_main_loop_wait
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x10b934 main_loop_wait
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x1e6a7e main_loop
/test/qemu/build/s390x-softmmu/qemu-system-s390x -
7d40b30ece2f256dfd8600446a9da7b0a1962730 0x1ef642 main
/test/qemu/build/s390x-softmmu/qemu-system-s390x -

I suppose that asynchronous block IO processing completes after the
guest has already done its virtqueue cleanup. Both machine check
injection and device release processing are triggered from the unplug
callback virtio_ccw_busdev_unplug(), and I suppose the device release
processing (with its wait for asynchronous processing to complete)
happens too late.

What do you think?

Thanks,
Heinz
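PS: To make the ordering argument a bit more concrete, here is a tiny
toy program (it has nothing to do with the real QEMU data structures,
it is only my own illustration): completions drained while the queue
is still set up are harmless, while completions drained only after the
guest has dismantled the queue touch dead state. That is effectively
what the backtrace shows, with bdrv_drain_all() running from
bdrv_close() at the very end of the unplug path.

    #include <stdbool.h>
    #include <stdio.h>

    /* toy stand-in for a virtqueue; entries plays the role of vring.num */
    struct toy_queue {
        bool alive;
        unsigned int entries;
    };

    /* toy stand-in for one in-flight asynchronous block I/O completion */
    static void complete_one(const struct toy_queue *q)
    {
        if (!q->alive || q->entries == 0) {
            printf("completion hits a dismantled queue -> boom\n");
        } else {
            printf("completion delivered normally\n");
        }
    }

    /* toy stand-in for bdrv_drain_all(): flush all pending completions */
    static void drain_all(const struct toy_queue *q, int pending)
    {
        while (pending-- > 0) {
            complete_one(q);
        }
    }

    int main(void)
    {
        struct toy_queue q = { .alive = true, .entries = 128 };

        /* ordering I see today: the guest cleans up its queues first,
         * the drain runs last (from bdrv_close()) */
        q.alive = false;
        q.entries = 0;
        drain_all(&q, 2);

        /* ordering that works for me: drain first (an early
         * bdrv_drain_all, as with an early drive_del), then let the
         * guest clean up */
        q = (struct toy_queue){ .alive = true, .entries = 128 };
        drain_all(&q, 2);
        q.alive = false;
        q.entries = 0;

        return 0;
    }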