From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:53198)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <lcapitulino@redhat.com>) id 1R767i-0007fU-9g
	for qemu-devel@nongnu.org; Fri, 23 Sep 2011 09:49:03 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <lcapitulino@redhat.com>) id 1R767d-0008MV-Ni
	for qemu-devel@nongnu.org; Fri, 23 Sep 2011 09:49:02 -0400
Received: from mx1.redhat.com ([209.132.183.28]:36290)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <lcapitulino@redhat.com>) id 1R767d-0008MO-CM
	for qemu-devel@nongnu.org; Fri, 23 Sep 2011 09:48:57 -0400
Date: Fri, 23 Sep 2011 10:48:53 -0300
From: Luiz Capitulino <lcapitulino@redhat.com>
Message-ID: <20110923104853.6f18acf2@doriath>
In-Reply-To: <m3k48ziqno.fsf@blackfin.pond.sub.org>
References: <1314902275-5240-1-git-send-email-lcapitulino@redhat.com>
	<4E774636.7050500@redhat.com> <20110919110945.5a9cb43c@doriath>
	<4E7751B8.7040903@redhat.com>
	<m3k48ziqno.fsf@blackfin.pond.sub.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v1 0/6]: block: Add I/O status support
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Markus Armbruster <armbru@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, aliguori@us.ibm.com, qemu-devel@nongnu.org

On Fri, 23 Sep 2011 10:55:39 +0200
Markus Armbruster <armbru@redhat.com> wrote:

> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 19.09.2011 16:09, schrieb Luiz Capitulino:
> >> On Mon, 19 Sep 2011 15:40:06 +0200
> >> Kevin Wolf <kwolf@redhat.com> wrote:
> >> 
> >>> Am 01.09.2011 20:37, schrieb Luiz Capitulino:
> >>>> This series adds support to the block layer to keep track of devices'
> >>>> I/O status. That information is also made available in QMP and HMP.
> >>>>
> >>>> The goal here is to allow management applications that miss the
> >>>> BLOCK_IO_ERROR event to able to query the VM to determine if any device has
> >>>> caused the VM to stop and which device caused it.
> >>>>
> >>>> Please, note that this series depends on the following series:
> >>>>
> >>>>  o [PATCH v3 0/8]: Introduce the RunState type
> >>>>  o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00118.html
> >>>>
> >>>> And to be able to properly test it you'll also need:
> >>>>
> >>>>  o [PATCH 0/3] qcow2/coroutine fixes
> >>>>  o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00074.html
> >>>>
> >>>> Here's an HMP example:
> >>>>
> >>>>   (qemu) info status 
> >>>>   VM status: paused (io-error)
> >>>>   (qemu) info block
> >>>>   ide0-hd0: removable=0 io-status=ok file=disks/test2.img ro=0 drv=qcow2 encrypted=0
> >>>>   ide0-hd1: removable=0 io-status=nospace file=/dev/vg_doriath/kvmtest ro=0 drv=qcow2 encrypted=0
> >>>>   ide1-cd0: removable=1 locked=0 io-status=ok [not inserted]
> >>>>   floppy0: removable=1 locked=0 [not inserted]
> >>>>   sd0: removable=1 locked=0 [not inserted]
> >>>>
> >>>> The "info status" command shows that the VM is stopped due to an I/O error.
> >>>> By issuing "info block" it's possible to determine that the 'ide0-hd1' device
> >>>> caused the error, which turns out to be due to no space.
> >>>
> >>> Looks like I didn't reply here yet?
> >> 
> >> No, you didn't.
> >> 
> >>> I still don't quite like that the devices are involved, but their part
> >>> is minimal and it makes the implementation much easier, so for me that's
> >>> acceptable.
> >> 
> >> Suggestions on better ways of implementing this are welcome! :)
> >
> > I don't have one. :-)
> >
> > Surely it would be possible to do everything in block.c, but I see that
> > this would make things much more complicated (would need to get an AIO
> > callback added to the chain that can save the error code). It's not
> > worth the trouble.
> 
> The block layer necessarily detects errors in a huge number of places.
> 
> It also has a rich and complex interface to device models, with error
> reporting in many places.
> 
> virtio-blk, ide and scsi-disk each have a single error handler to handle
> block layer errors.
> 
> For better or worse, these are the only block layer error "chokepoints"
> we have now, and therefore the only easy places to set BDS I/O status.
> But they're still wrong.
> 
> The block layer shouldn't require device models to tell it what it
> already knows.
> 
> Now, I don't want to block your series just because it adds this wart.
> But the wart should be documented in the code.

Is something like:

 /* XXX: this should be implemented in the block layer */

good enough?

> Also document that any device model implementing werror=stop|enospc or
> rerror=stop must update I/O status.  Extra points if you find a way to
> enforce it instead.

Why? It's probably a nice thing because we can know which device caused
the VM to stop, but I don't see it as a must have (it's probably a
no brainer to update the I/O status though).

> 
> Apropos enforcing.  Currently, -drive accepts any werror and rerror
> action with if={ide,virtio,scsi,none}.  We rely on device models not
> implementing an action to check and fail during initialization.
> scsi-generic and the floppy devices do.  All the other qdevified devices
> don't, and that's broken.

I think I can fix that, but this series doesn't depend on that, right?
(in the meaning that we could merge this before getting that problem fixed).

> 
> Are werror and rerror really host state?
>