From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:53198) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1R767i-0007fU-9g for qemu-devel@nongnu.org; Fri, 23 Sep 2011 09:49:03 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1R767d-0008MV-Ni for qemu-devel@nongnu.org; Fri, 23 Sep 2011 09:49:02 -0400 Received: from mx1.redhat.com ([209.132.183.28]:36290) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1R767d-0008MO-CM for qemu-devel@nongnu.org; Fri, 23 Sep 2011 09:48:57 -0400 Date: Fri, 23 Sep 2011 10:48:53 -0300 From: Luiz Capitulino Message-ID: <20110923104853.6f18acf2@doriath> In-Reply-To: References: <1314902275-5240-1-git-send-email-lcapitulino@redhat.com> <4E774636.7050500@redhat.com> <20110919110945.5a9cb43c@doriath> <4E7751B8.7040903@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v1 0/6]: block: Add I/O status support List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Markus Armbruster Cc: Kevin Wolf , aliguori@us.ibm.com, qemu-devel@nongnu.org On Fri, 23 Sep 2011 10:55:39 +0200 Markus Armbruster wrote: > Kevin Wolf writes: > > > Am 19.09.2011 16:09, schrieb Luiz Capitulino: > >> On Mon, 19 Sep 2011 15:40:06 +0200 > >> Kevin Wolf wrote: > >> > >>> Am 01.09.2011 20:37, schrieb Luiz Capitulino: > >>>> This series adds support to the block layer to keep track of devices' > >>>> I/O status. That information is also made available in QMP and HMP. > >>>> > >>>> The goal here is to allow management applications that miss the > >>>> BLOCK_IO_ERROR event to able to query the VM to determine if any device has > >>>> caused the VM to stop and which device caused it. > >>>> > >>>> Please, note that this series depends on the following series: > >>>> > >>>> o [PATCH v3 0/8]: Introduce the RunState type > >>>> o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00118.html > >>>> > >>>> And to be able to properly test it you'll also need: > >>>> > >>>> o [PATCH 0/3] qcow2/coroutine fixes > >>>> o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00074.html > >>>> > >>>> Here's an HMP example: > >>>> > >>>> (qemu) info status > >>>> VM status: paused (io-error) > >>>> (qemu) info block > >>>> ide0-hd0: removable=0 io-status=ok file=disks/test2.img ro=0 drv=qcow2 encrypted=0 > >>>> ide0-hd1: removable=0 io-status=nospace file=/dev/vg_doriath/kvmtest ro=0 drv=qcow2 encrypted=0 > >>>> ide1-cd0: removable=1 locked=0 io-status=ok [not inserted] > >>>> floppy0: removable=1 locked=0 [not inserted] > >>>> sd0: removable=1 locked=0 [not inserted] > >>>> > >>>> The "info status" command shows that the VM is stopped due to an I/O error. > >>>> By issuing "info block" it's possible to determine that the 'ide0-hd1' device > >>>> caused the error, which turns out to be due to no space. > >>> > >>> Looks like I didn't reply here yet? > >> > >> No, you didn't. > >> > >>> I still don't quite like that the devices are involved, but their part > >>> is minimal and it makes the implementation much easier, so for me that's > >>> acceptable. > >> > >> Suggestions on better ways of implementing this are welcome! :) > > > > I don't have one. :-) > > > > Surely it would be possible to do everything in block.c, but I see that > > this would make things much more complicated (would need to get an AIO > > callback added to the chain that can save the error code). It's not > > worth the trouble. > > The block layer necessarily detects errors in a huge number of places. > > It also has a rich and complex interface to device models, with error > reporting in many places. > > virtio-blk, ide and scsi-disk each have a single error handler to handle > block layer errors. > > For better or worse, these are the only block layer error "chokepoints" > we have now, and therefore the only easy places to set BDS I/O status. > But they're still wrong. > > The block layer shouldn't require device models to tell it what it > already knows. > > Now, I don't want to block your series just because it adds this wart. > But the wart should be documented in the code. Is something like: /* XXX: this should be implemented in the block layer */ good enough? > Also document that any device model implementing werror=stop|enospc or > rerror=stop must update I/O status. Extra points if you find a way to > enforce it instead. Why? It's probably a nice thing because we can know which device caused the VM to stop, but I don't see it as a must have (it's probably a no brainer to update the I/O status though). > > Apropos enforcing. Currently, -drive accepts any werror and rerror > action with if={ide,virtio,scsi,none}. We rely on device models not > implementing an action to check and fail during initialization. > scsi-generic and the floppy devices do. All the other qdevified devices > don't, and that's broken. I think I can fix that, but this series doesn't depend on that, right? (in the meaning that we could merge this before getting that problem fixed). > > Are werror and rerror really host state? >