From: Stefan Hajnoczi
Date: Wed, 8 Jun 2011 17:18:16 +0100
Subject: Re: [Qemu-devel] [patch 6/7] QEMU live block copy
To: Jagane Sundar
Cc: "kwolf@redhat.com", "dlaor@redhat.com", "Jes.Sorensen@redhat.com", Marcelo Tosatti, "qemu-devel@nongnu.org", "avi@redhat.com", "jdenemar@redhat.com"
In-Reply-To: <4DEF90E2.5080202@sundar.org>
References: <20110606165536.581119615@amt.cnet> <20110606165823.855959321@amt.cnet> <4DEF90E2.5080202@sundar.org>

On Wed, Jun 8, 2011 at 4:10 PM, Jagane Sundar wrote:
> On 6/7/2011 5:15 AM, Stefan Hajnoczi wrote:
>>
>> On Mon, Jun 6, 2011 at 5:55 PM, Marcelo Tosatti wrote:
>>
>> I haven't reviewed this whole patch yet, but comments below.
>>
>> This patch, like image streaming, may hit deadlocks due to synchronous
>> I/O emulation. I discovered this problem when working on image
>> streaming and it should be solved by getting rid of the asynchronous
>> context concept. The problem is that async I/O emulation will push a
>> new context, preventing existing requests to complete until the
>> current context is popped again.
>> If the image format has dependencies
>> between requests (e.g. QED allocating writes are serialized), then
>> this leads to deadlock because the new request cannot complete until
>> the old one does, but the old one needs to wait for the context to be
>> popped. I think you are not affected by the QED allocating write case
>> since the source image is only read, not written, by live block copy.
>> But you might encounter this problem in other places.
>>
> Hello Stefan,
>
> Can you expand on this some more? I have similar concerns for Livebackup.
>
> At the beginning of your paragraph, did you mean 'asynchronous I/O
> emulation' instead of 'synchronous I/O emulation'?
>
> Also, I don't understand the 'stack' construct that you refer to. When you
> say 'push a new context', are you talking about what happens when a new
> thread picks up a new async I/O req from the VM, and then proceeds to
> execute the I/O req? What is this stack that you refer to?
>
> Any design documents, code snippets that I can look, other pointers welcome.

See async.c.

There is synchronous I/O emulation in block.c for BlockDrivers that
don't support .bdrv_read()/.bdrv_write() but only
.bdrv_aio_readv()/.bdrv_aio_writev(). The way it works is that it
pushes a new I/O context and then issues async I/O. Then it runs a
special event loop waiting for that I/O to complete. After the I/O
completes it pops the context again.

The point of the context is that completions only get called for the
current context. Therefore callers of the synchronous I/O functions
don't need to worry that the state of the world might change during
their "synchronous" operation - only their own I/O operation can
complete. If a pending async I/O completes during synchronous I/O
emulation, its callback is not invoked until the bottom half (BH) is
called after the async context is popped.
This guarantees that the synchronous operation and its caller have
completed before I/O completion callbacks are invoked for pending
async I/O.

The problem is that QED allocating writes are serialized and cannot
make progress if a pending request is unable to complete. Preventing
the pending request from completing deadlocks QEMU. The same thing
could happen in other places.

I'm bringing this up in case anyone hits such a deadlock. I think we
can solve this in the future by eliminating the async context concept.
Kevin and I have discussed how that can be done but it's not possible
or trivial to do yet.

Stefan
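
For anyone trying to picture the push/pop behaviour described above, here
is a toy model of it. It is a sketch in Python, not QEMU code: the names
ToyAio, Request, push_context, pop_context, sync_io and max_polls are all
made up for illustration, and the dependency field only loosely stands in
for "QED allocating writes are serialized". The real event loop has no
iteration limit, which is why the real case hangs instead of returning.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    name: str
    context: int                           # context the request was issued in
    depends_on: Optional["Request"] = None
    hw_done: bool = False                  # underlying I/O has finished
    completed: bool = False                # completion callback has run


class ToyAio:
    """Toy stand-in for the async context stack (hypothetical, not QEMU's API)."""

    def __init__(self):
        self.context = 0
        self.requests = []

    def push_context(self):
        self.context += 1

    def pop_context(self):
        self.context -= 1

    def submit(self, name, depends_on=None):
        req = Request(name, self.context, depends_on)
        self.requests.append(req)
        return req

    def poll(self):
        """One event-loop pass over all outstanding requests."""
        for req in self.requests:
            if req.completed:
                continue
            # A serialized request cannot proceed until the request it
            # depends on has had its completion callback run.
            if req.depends_on is not None and not req.depends_on.completed:
                continue
            req.hw_done = True
            # Completions are delivered only for the current context.
            if req.context == self.context:
                req.completed = True

    def sync_io(self, name, depends_on=None, max_polls=10):
        """Synchronous emulation: push a context, submit, loop, pop."""
        self.push_context()
        req = self.submit(name, depends_on)
        polls = 0
        # max_polls exists only so the toy terminates instead of hanging.
        while not req.completed and polls < max_polls:
            self.poll()
            polls += 1
        self.pop_context()
        return req.completed


aio = ToyAio()
pending = aio.submit("allocating write A")      # in flight in the outer context
ok_independent = aio.sync_io("read B")          # no dependency on A
ok_dependent = aio.sync_io("allocating write C", depends_on=pending)
callback_ran_inside = pending.completed         # A's callback was deferred
aio.poll()                                      # back in the outer context
callback_ran_after_pop = pending.completed

print(ok_independent, ok_dependent, callback_ran_inside, callback_ran_after_pop)
```

In this model the independent synchronous request finishes, while the one
that depends on the pending request never does: the pending request's
callback is held back until the context is popped, and the context is not
popped until the synchronous request finishes.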