Date: Tue, 12 Aug 2014 10:09:08 +0800
From: "Zhang Haoyu"
References: <53E87FD1.3070600@huawei.com>, <20140811142136.GA496@stefanha-thinkpad.redhat.com>, <20140812005853.GC6226@T430.nay.redhat.com>
Message-ID: <201408121009071107044@sangfor.com>
Subject: Re: [Qemu-devel] the whole virtual machine hangs when IO does not come back!
To: Fam Zheng, Stefan Hajnoczi
Cc: qemu-devel, Bin Wu

>> > Hi,
>> >
>> > I tested the reliability of qemu in the IPSAN environment as follows:
>> > (1) create one VM on a X86 server which is connected to an IPSAN, and the VM
>> > has only one system volume which is on the IPSAN;
>> > (2) disconnect the network between the server and the IPSAN. On the server,
>> > I have a "multipath" software which can hold the IO for a long time
>> > (configurable) when the network is disconnected;
>> > (3) about 30 seconds later, the whole VM hangs there, nothing can be done to
>> > the VM!
>> >
>> > Then, I used "gstack" tool to collect the stacks of all qemu threads, it
>> > looked like:
>> >
>> > Thread 8 (Thread 0x7fd840bb5700 (LWP 6671)):
>> > #0 0x00007fd84253a4f6 in poll () from /lib64/libc.so.6
>> > #1 0x00007fd84410ceff in aio_poll ()
>> > #2 0x00007fd84429bb05 in qemu_aio_wait ()
>> > #3 0x00007fd844120f51 in bdrv_drain_all ()
>> > #4 0x00007fd8441f1a4a in bmdma_cmd_writeb ()
>> > #5 0x00007fd8441f216e in bmdma_write ()
>> > #6 0x00007fd8443a93cf in memory_region_write_accessor ()
>> > #7 0x00007fd8443a94a6 in access_with_adjusted_size ()
>> > #8 0x00007fd8443a9901 in memory_region_iorange_write ()
>> > #9 0x00007fd8443a19bd in ioport_writeb_thunk ()
>> > #10 0x00007fd8443a13a8 in ioport_write ()
>> > #11 0x00007fd8443a1f55 in cpu_outb ()
>> > #12 0x00007fd8443a5b12 in kvm_handle_io ()
>> > #13 0x00007fd8443a64a9 in kvm_cpu_exec ()
>> > #14 0x00007fd844330962 in qemu_kvm_cpu_thread_fn ()
>> > #15 0x00007fd8427e77b6 in start_thread () from /lib64/libpthread.so.0
>> > #16 0x00007fd8425439cd in clone () from /lib64/libc.so.6
>> > #17 0x0000000000000000 in ?? ()
>>
>> Use virtio-blk. Read, write, and flush are asynchronous in virtio-blk.
>>
>> Note that the QEMU monitor commands are typically synchronous so they
>> will still block the VM.
>>
>
>If some of the requests are dropped by host and never return to QEMU, I think
>bdrv_drain_all() will still cause the hang. Even with virtio-blk, reset has
>such a call. Maybe we could add some -ETIMEDOUT mechanism in QEMU's block
>layer.
>
>A workaround might be to configure the host storage to fail the IO after a
>timeout.
>
If -ETIMEDOUT is returned after only a short network disconnection, could an
unpredictable fault occur in the VM? For example, the VM might be reading
important data (such as system data) at that moment.
Does aio replay work for this case?

Thanks,
Zhang Haoyu

>Fam
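
To make the -ETIMEDOUT idea above concrete, here is a rough sketch of a drain
loop that gives up after a deadline instead of polling forever, as
bdrv_drain_all() does in the stack trace. This is not QEMU code: struct
demo_backend, demo_drain_timeout() and the "one byte on a pipe per completed
request" convention are all made up for illustration, and the sketch says
nothing about what the guest should be told once the timeout fires, which is
the open question in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for a block backend (not a QEMU structure):
 * completion_fd yields one byte per completed request, and in_flight
 * counts the requests that have not completed yet. */
struct demo_backend {
    int completion_fd;
    int in_flight;
};

/* Monotonic clock in milliseconds, so wall-clock jumps cannot
 * stretch or shorten the deadline. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Wait until every in-flight request has completed, but no longer than
 * timeout_ms in total. Returns 0 when drained, -ETIMEDOUT when the
 * deadline passes first, or another negative errno on failure. */
static int demo_drain_timeout(struct demo_backend *b, int timeout_ms)
{
    assert(timeout_ms >= 0);
    long long deadline = now_ms() + timeout_ms;

    while (b->in_flight > 0) {
        long long left = deadline - now_ms();
        if (left <= 0) {
            return -ETIMEDOUT;      /* give up instead of hanging the caller */
        }

        struct pollfd pfd = { .fd = b->completion_fd, .events = POLLIN };
        int n = poll(&pfd, 1, (int)left);
        if (n < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted: recompute time left */
            }
            return -errno;
        }
        if (n == 0) {
            continue;               /* poll timed out: the deadline check fires next */
        }

        if (pfd.revents & POLLIN) {
            char token;
            ssize_t r = read(b->completion_fd, &token, 1);
            if (r == 1) {
                b->in_flight--;     /* one request completed */
            } else if (r == 0) {
                return -EIO;        /* completion source closed */
            } else if (errno != EINTR && errno != EAGAIN) {
                return -errno;
            }
        } else {
            return -EIO;            /* POLLHUP/POLLERR without data */
        }
    }
    return 0;
}
```

With a pipe whose write end never produces a byte (the "IO does not come back"
case), demo_drain_timeout(&b, 50) returns -ETIMEDOUT after roughly 50 ms and
leaves in_flight untouched. Once a completion byte is written it returns 0.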