From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 15 Aug 2012 14:52:05 +0530
From: Bharata B Rao
Message-ID: <20120815092205.GM24944@in.ibm.com>
References: <20120809130010.GA7960@in.ibm.com> <20120809130216.GC7960@in.ibm.com> <5028F815.40309@redhat.com> <20120814043801.GB24944@in.ibm.com> <502A0C66.3060107@redhat.com> <20120815052103.GJ24944@in.ibm.com> <502B571B.2090407@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <502B571B.2090407@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v6 2/2] block: Support GlusterFS as a QEMU block backend
Reply-To: bharata@linux.vnet.ibm.com
To: Kevin Wolf
Cc: Anthony Liguori, Anand Avati, Stefan Hajnoczi, Vijay Bellur, Amar Tumballi, qemu-devel@nongnu.org, Blue Swirl, Paolo Bonzini

On Wed, Aug 15, 2012 at
10:00:27AM +0200, Kevin Wolf wrote:
> Am 15.08.2012 07:21, schrieb Bharata B Rao:
> > On Tue, Aug 14, 2012 at 10:29:26AM +0200, Kevin Wolf wrote:
> >>>>> +static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret, void *arg)
> >>>>> +{
> >>>>> +    GlusterAIOCB *acb = (GlusterAIOCB *)arg;
> >>>>> +    BDRVGlusterState *s = acb->common.bs->opaque;
> >>>>> +
> >>>>> +    acb->ret = ret;
> >>>>> +    if (qemu_gluster_send_pipe(s, acb) < 0) {
> >>>>> +        /*
> >>>>> +         * Gluster AIO callback thread failed to notify the waiting
> >>>>> +         * QEMU thread about IO completion. Nothing much can be done
> >>>>> +         * here but to abruptly abort.
> >>>>> +         *
> >>>>> +         * FIXME: Check if the read side of the fd handler can somehow
> >>>>> +         * be notified of this failure, paving the way for a graceful exit.
> >>>>> +         */
> >>>>> +        error_report("Gluster failed to notify QEMU about IO completion");
> >>>>> +        abort();
> >>>>
> >>>> In the extreme case you may choose to make this disk inaccessible
> >>>> (something like bs->drv = NULL), but abort() kills the whole VM and
> >>>> should only be called when there is a bug.
> >>>
> >>> There have been concerns raised about this earlier too. I settled for this
> >>> since I couldn't see a better way out, and I could see a precedent
> >>> for this in posix-aio-compat.c.
> >>>
> >>> So I could just do the necessary cleanup, set bs->drv to NULL and return
> >>> from here? But how do I wake up the QEMU thread that is waiting on the
> >>> read side of the pipe? Without that, the QEMU thread that waits on the
> >>> read side of the pipe is still hung.
> >>
> >> There is no other thread. But you're right, you should probably
> >> unregister the aio_fd_handler and any other pending callbacks.
> >
> > As I clarified in the other mail, this (gluster_finish_aiocb) is called
> > from gluster thread context and hence the QEMU thread that raised the
> > original read/write request is still blocked on qemu_aio_wait().
> >
> > I tried the following cleanup instead of an abrupt abort:
> >
> > close(read_fd); /* This will wake up the QEMU thread blocked on select(read_fd...) */
> > close(write_fd);
> > qemu_aio_set_fd_handler(read_fd, NULL, NULL, NULL, NULL);
> > qemu_aio_release(acb);
> > s->qemu_aio_count--;
> > bs->drv = NULL;
> >
> > I tested this by manually injecting faults into qemu_gluster_send_pipe().
> > With the above cleanup, the guest kernel crashes with IO errors.
>
> What does "crash" really mean? IO errors certainly shouldn't cause a
> kernel to crash?

Since an IO failed, it resulted in root file system corruption, which subsequently led to a panic.

[ 1.529042] dracut: Switching root
qemu-system-x86_64: Gluster failed to notify QEMU about IO completion
qemu-system-x86_64: Gluster failed to notify QEMU about IO completion
qemu-system-x86_64: Gluster failed to notify QEMU about IO completion
qemu-system-x86_64: Gluster failed to notify QEMU about IO completion
[ 1.584130] end_request: I/O error, dev vda, sector 13615224
[ 1.585119] end_request: I/O error, dev vda, sector 13615344
[ 1.585119] end_request: I/O error, dev vda, sector 13615352
[ 1.585119] end_request: I/O error, dev vda, sector 13615360
[ 1.593188] end_request: I/O error, dev vda, sector 1030144
[ 1.594169] Buffer I/O error on device vda3, logical block 0
[ 1.594169] lost page write due to I/O error on vda3
[ 1.594169] EXT4-fs error (device vda3): __ext4_get_inode_loc:3539: inode #392441: block 1573135: comm systemd: unable to read itable block
[...]
[ 1.620064] EXT4-fs error (device vda3): __ext4_get_inode_loc:3539: inode #392441: block 1573135: comm systemd: unable to read itable block
/usr/lib/systemd/systemd: error while loading shared libraries: libselinux.so.1: cannot open shared object file: Input/output error
[ 1.626193] Kernel panic - not syncing: Attempted to kill init!
[ 1.627789] Pid: 1, comm: systemd Not tainted 3.3.4-5.fc17.x86_64 #1
[ 1.630063] Call Trace:
[ 1.631120] [] panic+0xba/0x1c6
[ 1.632477] [] do_exit+0x8b1/0x8c0
[ 1.633851] [] do_group_exit+0x3f/0xa0
[ 1.635258] [] sys_exit_group+0x17/0x20
[ 1.636619] [] system_call_fastpath+0x16/0x1b

> > Is there anything else that I need to do, or do differently, to keep the
> > VM running without disk access?
> >
> > I thought of completing the aio callback by doing
> >     acb->common.cb(acb->common.opaque, -EIO);
> > but that would do a coroutine enter from the gluster thread, which I don't
> > think should be done.
>
> You would have to take the global qemu mutex at least. I agree it's not
> a good thing to do.

So is it really worth doing all this to handle this unlikely error? The chances of this error happening are, I believe, quite remote.

Regards,
Bharata.