From: Heinz Graalfs
Date: Thu, 03 Apr 2014 17:20:57 +0200
Message-ID: <533D7C59.9000703@linux.vnet.ibm.com>
In-Reply-To: <87d2gzpr46.fsf@blackfin.pond.sub.org>
Subject: Re: [Qemu-devel] drive_del vs. device_del: what should come first?
To: Markus Armbruster
Cc: qemu-devel@nongnu.org

On 02/04/14 19:40, Markus Armbruster wrote:
> Heinz Graalfs writes:
>
>> On 01/04/14 17:48, Markus Armbruster wrote:
>>> Heinz Graalfs writes:
>>>
>>>> Hi Kevin,
>>>>
>>>> doing a
>>>>
>>>> virsh detach-device ...
>>>>
>>>> ends up in the following QEMU monitor commands:
>>>>
>>>> 1. device_del ...
>>>> 2. drive_del ...
>>>>
>>>> qmp_device_del() performs the device unplug path.
>>>> In the case of a block device, do_drive_del() tries to
>>>> prevent further IO against the host device.
>>>>
>>>> However, bdrv_find() during drive_del() results in
>>>> an error, because the device is already gone. Due to
>>>> this error, all the bdrv_xxx calls to quiesce the block
>>>> driver, as well as all other processing, are skipped.
>>>>
>>>> Is the sequence that libvirt triggers OK?
>>>> Shouldn't drive_del be executed first?
>>>
>>> No.
>>
>> OK, I see. The drive is deleted implicitly (release_drive()).
>> After a device_del(), getting the device back requires another
>> drive_add() AND device_add().
>
> Yes. The automatic drive delete on unplug is a wart that will not be
> preserved in the new interfaces we're building now.

OK, thanks for the info.

>
>> (Doing just a device_add() complains about the missing drive.
>> A subsequent info qtree makes QEMU abort.)
>
> Really? Got a reproducer for me?

The problem seems to be gone on master. I was on 1.7.0 when I hit the
info qtree problem.

>
>>> drive_del is nasty. Its purpose is to revoke access to an image even
>>> when the guest refuses to cooperate.
>>> To the guest, this looks like hardware failure.
>>
>> Deleting a device during active IO is nasty and it should look like a
>> hardware failure. I would expect lots of errors.
>>
>>>
>>> If you drive_del before device_del, even a perfectly well-behaved
>>> guest is exposed to a terminally broken device between drive_del and
>>> completion of unplug.
>>
>> The early drive_del() would mean that no further IO would be
>> possible.
>
> A polite way to describe the effect of drive_del on the guest.
>
> drive_del makes all attempts at actual I/O error out without any
> forewarning whatsoever. If you do that to your guest, and something
> breaks, you get to keep the pieces. Even if you "only" do it for a
> short period of time.
>
>>> Always try a device_del first, and only if that does not complete
>>> within a reasonable time, and you absolutely must revoke access to
>>> the image, then whack it over the head with drive_del.
>>
>> What is this reasonable time?
>
> Depends on how heavily loaded host and guest are. Suggest to measure
> with a suitably thrashing guest, then shift left to taste.
>
>> On s390 we experience problems (QEMU abort) when asynchronous block IO
>> completes and the virtqueues are already gone. I suppose the
>> bdrv_drain_all() in bdrv_close() is a little late. I don't see such
>> problems with an early bdrv_drain_all() (drive_del) and an unplug
>> (device_del) afterwards.
>
> Sounds like a bug. Got a reproducer?

I don't have a reproducer here either, just an old stack trace (below)
that occurred several times. It definitely depends on the asynchronous
workload that is running during the device_del. I will try to
reproduce the situation.

Program received signal SIGFPE, Arithmetic exception.
0x000000008026e48a in virtqueue_fill (vq=0x809bcad0, elem=0x3ffd80013d8, len=4097, idx=0)
    at qemu/hw/virtio/virtio.c:259
259         idx = (idx + vring_used_idx(vq)) % vq->vring.num;
(gdb) bt
#0  0x000000008026e48a in virtqueue_fill (vq=0x809bcad0, elem=0x3ffd80013d8, len=4097, idx=0)
    at qemu/hw/virtio/virtio.c:259
#1  0x000000008026e68c in virtqueue_push (vq=0x809bcad0, elem=0x3ffd80013d8, len=4097)
    at qemu/hw/virtio/virtio.c:283
#2  0x0000000080237962 in virtio_blk_req_complete (req=0x3ffd80013d0, status=0)
    at qemu/hw/block/virtio-blk.c:50
#3  0x0000000080237c08 in virtio_blk_rw_complete (opaque=0x3ffd80013d0, ret=0)
    at qemu/hw/block/virtio-blk.c:85
#4  0x0000000080032916 in bdrv_co_em_bh (opaque=0x3ffd800da50) at qemu/block.c:3847
#5  0x000000008000da0c in aio_bh_poll (ctx=0x808f3e00) at qemu/async.c:81
#6  0x000000008000d498 in aio_poll (ctx=0x808f3e00, blocking=true) at qemu/aio-posix.c:185
#7  0x0000000080126f56 in qemu_aio_wait () at qemu/main-loop.c:485
#8  0x000000008002a134 in bdrv_drain_all () at qemu/block.c:1449
#9  0x0000000080029e64 in bdrv_close (bs=0x809c1680) at qemu/block.c:1378
#10 0x000000008002ac56 in bdrv_delete (bs=0x809c1680) at qemu/block.c:1612
#11 0x000000008008a7fe in drive_uninit (dinfo=0x809aad00) at qemu/blockdev.c:216
#12 0x000000008008a95c in drive_put_ref (dinfo=0x809aad00) at qemu/blockdev.c:227
#13 0x000000008008a00e in blockdev_auto_del (bs=0x809c1680) at qemu/blockdev.c:106
#14 0x00000000800dc004 in release_drive (obj=0x809b4640, name=0x809b5080 "drive", opaque=0x8044d4b8)
    at qemu/hw/core/qdev-properties-system.c:90
#15 0x0000000080195dcc in object_property_del_all (obj=0x809b4640) at qemu/qom/object.c:343
#16 0x0000000080196158 in object_finalize (data=0x809b4640) at qemu/qom/object.c:397
#17 0x0000000080197322 in object_unref (obj=0x809b4640) at qemu/qom/object.c:695
#18 0x0000000080195ff6 in object_unparent (obj=0x809b4640) at qemu/qom/object.c:377
#19 0x00000000800e2bf6 in qdev_free (dev=0x809b4640) at qemu/hw/core/qdev.c:286
#20 0x00000000802618e4 in virtio_ccw_busdev_unplug (dev=0x809b4640) at qemu/hw/s390x/virtio-ccw.c:1582
#21 0x00000000800e27ca in qdev_unplug (dev=0x809b4640, errp=0x3ffff9c9820) at qemu/hw/core/qdev.c:219
#22 0x000000008016cc9c in qmp_device_del (id=0x809c0490 "virtio-disk3", errp=0x3ffff9c9820)
    at qemu/qdev-monitor.c:658
#23 0x000000008018da1e in qmp_marshal_input_device_del (mon=0x8090b2f0, qdict=0x80987cd0, ret=0x3ffff9c9920)
    at qmp-marshal.c:2998
#24 0x0000000080293a9a in qmp_call_cmd (mon=0x8090b2f0, cmd=0x803824a0, params=0x80987cd0)
    at qemu/monitor.c:4501
#25 0x0000000080293d90 in handle_qmp_command (parser=0x8090b378, tokens=0x8090a4f0)
    at /scsi/qemu/monitor.c:4567
#26 0x000000008030607c in json_message_process_token (lexer=0x8090b380, token=0x809a3780, type=JSON_OPERATOR, x=76, y=21)
    at qemu/qobject/json-streamer.c:87
#27 0x000000008032b5b6 in json_lexer_feed_char (lexer=0x8090b380, ch=125 '}', flush=false)
    at qemu/qobject/json-lexer.c:303
#28 0x000000008032b844 in json_lexer_feed (lexer=0x8090b380, buffer=0x3ffff9c9fa8 "}[B\377\377\234\237\060", size=1)
    at qemu/qobject/json-lexer.c:356
#29 0x000000008030622e in json_message_parser_feed (parser=0x8090b378, buffer=0x3ffff9c9fa8 "}[B\377\377\234\237\060", size=1)
    at qemu/qobject/json-streamer.c:110
#30 0x0000000080293eb0 in monitor_control_read (opaque=0x8090b2f0, buf=0x3ffff9c9fa8 "}[B\377\377\234\237\060", size=1)
    at qemu/monitor.c:4588
#31 0x000000008016d3fe in qemu_chr_be_write (s=0x808f4120, buf=0x3ffff9c9fa8 "}[B\377\377\234\237\060", len=1)
    at qemu/qemu-char.c:165
#32 0x0000000080173380 in tcp_chr_read (chan=0x809783d0, cond=G_IO_IN, opaque=0x808f4120)
    at qemu/qemu-char.c:2509
#33 0x000003fffd48f05a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#34 0x0000000080126bb8 in glib_pollfds_poll () at /scsi/qemu/main-loop.c:188
#35 0x0000000080126d16 in os_host_main_loop_wait (timeout=4294967295) at qemu/main-loop.c:233
#36 0x0000000080126e06 in main_loop_wait (nonblocking=0) at qemu/main-loop.c:465
#37 0x00000000801eccea in main_loop () at qemu/vl.c:2086
#38 0x00000000801f539c in main (argc=43, argv=0x3ffff9cb938, envp=0x3ffff9cba98) at qemu/vl.c:4444
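
For what it's worth, the SIGFPE at virtio.c:259 looks like a plain
integer modulo by zero, i.e. my assumption is that vq->vring.num has
already been reset to 0 by the time the late block completion runs,
because the virtqueue state is gone. A minimal standalone sketch of
that assumed failure mode (not QEMU code, the structs are made up):

/*
 * Standalone sketch, NOT QEMU code: assumes virtqueue teardown has
 * already zeroed vring.num when a late block completion still calls
 * into the virtqueue_fill() equivalent.  The modulo by zero then
 * traps with SIGFPE (Arithmetic exception) on Linux.
 */
#include <stdio.h>

struct vring {
    unsigned int num;   /* ring size; 0 once the queue is torn down */
};

struct virtqueue {
    struct vring vring;
    unsigned int used_idx;
};

/* mirrors: idx = (idx + vring_used_idx(vq)) % vq->vring.num; */
static unsigned int fill_idx(struct virtqueue *vq, unsigned int idx)
{
    return (idx + vq->used_idx) % vq->vring.num;
}

int main(void)
{
    struct virtqueue vq = { .vring = { .num = 0 }, .used_idx = 0 };

    /* num == 0 -> integer modulo by zero -> SIGFPE */
    printf("%u\n", fill_idx(&vq, 0));
    return 0;
}

This would match the observation above that draining early, before the
virtqueues are torn down, avoids the window.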