From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:52798)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1SwYgU-0002Cm-2C
	for qemu-devel@nongnu.org; Wed, 01 Aug 2012 09:09:58 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1SwYgL-00044K-Ed
	for qemu-devel@nongnu.org; Wed, 01 Aug 2012 09:09:53 -0400
Received: from mx1.redhat.com ([209.132.183.28]:45948)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1SwYgL-00044C-5s
	for qemu-devel@nongnu.org; Wed, 01 Aug 2012 09:09:45 -0400
Message-ID: <50192A8D.3000309@redhat.com>
Date: Wed, 01 Aug 2012 15:09:33 +0200
From: Kevin Wolf <kwolf@redhat.com>
MIME-Version: 1.0
References: <1343127865-16608-1-git-send-email-pbonzini@redhat.com>
	<1343127865-16608-14-git-send-email-pbonzini@redhat.com>
	<5019016F.1050909@redhat.com> <50191044.7070109@redhat.com>
	<501917CB.1060704@redhat.com> <50191C97.8000005@redhat.com>
	<50191FA7.1070309@redhat.com> <50192182.8070509@redhat.com>
In-Reply-To: <50192182.8070509@redhat.com>
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: jcody@redhat.com, eblake@redhat.com, qemu-devel@nongnu.org, stefanha@linux.vnet.ibm.com

Am 01.08.2012 14:30, schrieb Paolo Bonzini:
> Il 01/08/2012 14:23, Kevin Wolf ha scritto:
>> Am 01.08.2012 14:09, schrieb Paolo Bonzini:
>>> Il 01/08/2012 13:49, Kevin Wolf ha scritto:
>>>>> The real question is: if I remove the possibility to inspect the (so far
>>>>> anonymous) target device with query-block-jobs, how do you read the
>>>>> status of the target device?...
>>>>
>>>> You don't? :-)
>>>
>>> That's a possibility. :)
>>>
>>> You can just report it in the block job.  It's more consistent with the
>>> fact that you got a BLOCK_JOB_ERROR and not a BLOCK_IO_ERROR.  So I
>>> would do:
>>>
>>> +    bdrv_emit_qmp_error_event(job->bs, QEVENT_BLOCK_JOB_ERROR,
>>> +                              action, is_read);
>>
>> Isn't job->bs the source?
> 
> It's not about source vs. target, it's about what block device the _job_
> is attached too.  The source is always going to be special because you
> must do "block-job-resume source".

This whole concept of a block job being attached to a single bs is
flawed. Because in the general case it isn't. And probably it should
have been a BackgroundJob rather than a BlockJob from the beginning,
because I'm sure there are use cases outside the block layer as well.

We really manage to get every single point messed up. :-(

> is_read tells you where the error was.

Rather indirect, but yes, for the mirror it works.

>> Also, if you miss the event (e.g. libvirt crashed), can you still tell
>> which bs caused the error?
> 
> No, but how is it different from "can you still tell which bs in the
> chain caused the error"?  Which you cannot tell anyway even from the
> event parameters we have had so far.

It isn't different. We really should report the exact BDS. In practice,
management tools care about ENOSPC which is always the top-level BDS,
and call the admin for help in other cases. The admin can try manually
which file is at fault, but I suppose he would be very glad to be told
by the error message which file it is.

>> Do we need another BlockJobInfo field for the
>> name (*cough* ;-)) of the failed bs?
>>
>> If I understand it right, error reporting on the source and on the
>> target would be completely symmetrical then, which I think is a good thing.
> 
> Yeah, it would, but job->bs _is_ special anyway.
> 
>> job->bs makes one image special, which isn't really useful for a generic
>> interface. So we should keep the fact that it exists internal and get
>> rid of it sometime.
>>
>>> +    if (action == BDRV_ACTION_STOP) {
>>> +        block_job_pause(job);
>>> +        block_job_iostatus_set_err(job, error);
>>> +        if (bs != job->bs) {
>>> +            bdrv_iostatus_set_err(bs, error);
>>> +        }
>>> +    }
>>>
>>> where the bdrv_iostatus_set_err is mostly to "prepare for the future"
>>> usage of named block devices.
>>
>> Again, I'd make it unconditional to get symmetric behaviour.
> 
> Not possible, because existing clients may expect "iostatus:
> {nospace,failed}" to imply a runstate of "not running (i/o error)".

Did we make such guarantees? Does libvirt actually make use of it?

>>> Hmm, but I'm a bit wary of introducing such a big change now.  We know
>>> what it makes nicer, but we don't know of anything irremediably broken
>>> without them, and we haven't thought enough of any warts it introduces.
>>
>> On the one hand, I can understand your concerns, but on the other hand,
>> introducing an API now and then making such a big change afterwards
>> scares me much more.
> 
> One example of the doubts I have: is iostatus a BlockBackend or a
> BlockDriverState thing?  If it a BlockBackend thing, does the target
> device have an iostatus at all?

I think it's better to have it in BlockDriverState, but in my
imagination the target is a BlockBackend anyway.

Kevin