From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([208.118.235.92]:52798) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SwYgU-0002Cm-2C for qemu-devel@nongnu.org; Wed, 01 Aug 2012 09:09:58 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1SwYgL-00044K-Ed for qemu-devel@nongnu.org; Wed, 01 Aug 2012 09:09:53 -0400 Received: from mx1.redhat.com ([209.132.183.28]:45948) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SwYgL-00044C-5s for qemu-devel@nongnu.org; Wed, 01 Aug 2012 09:09:45 -0400 Message-ID: <50192A8D.3000309@redhat.com> Date: Wed, 01 Aug 2012 15:09:33 +0200 From: Kevin Wolf MIME-Version: 1.0 References: <1343127865-16608-1-git-send-email-pbonzini@redhat.com> <1343127865-16608-14-git-send-email-pbonzini@redhat.com> <5019016F.1050909@redhat.com> <50191044.7070109@redhat.com> <501917CB.1060704@redhat.com> <50191C97.8000005@redhat.com> <50191FA7.1070309@redhat.com> <50192182.8070509@redhat.com> In-Reply-To: <50192182.8070509@redhat.com> Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH 13/47] block: introduce block job error List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paolo Bonzini Cc: jcody@redhat.com, eblake@redhat.com, qemu-devel@nongnu.org, stefanha@linux.vnet.ibm.com Am 01.08.2012 14:30, schrieb Paolo Bonzini: > Il 01/08/2012 14:23, Kevin Wolf ha scritto: >> Am 01.08.2012 14:09, schrieb Paolo Bonzini: >>> Il 01/08/2012 13:49, Kevin Wolf ha scritto: >>>>> The real question is: if I remove the possibility to inspect the (so far >>>>> anonymous) target device with query-block-jobs, how do you read the >>>>> status of the target device?... >>>> >>>> You don't? :-) >>> >>> That's a possibility. :) >>> >>> You can just report it in the block job. It's more consistent with the >>> fact that you got a BLOCK_JOB_ERROR and not a BLOCK_IO_ERROR. So I >>> would do: >>> >>> + bdrv_emit_qmp_error_event(job->bs, QEVENT_BLOCK_JOB_ERROR, >>> + action, is_read); >> >> Isn't job->bs the source? > > It's not about source vs. target, it's about what block device the _job_ > is attached too. The source is always going to be special because you > must do "block-job-resume source". This whole concept of a block job being attached to a single bs is flawed. Because in the general case it isn't. And probably it should have been a BackgroundJob rather than a BlockJob from the beginning, because I'm sure there are use cases outside the block layer as well. We really manage to get every single point messed up. :-( > is_read tells you where the error was. Rather indirect, but yes, for the mirror it works. >> Also, if you miss the event (e.g. libvirt crashed), can you still tell >> which bs caused the error? > > No, but how is it different from "can you still tell which bs in the > chain caused the error"? Which you cannot tell anyway even from the > event parameters we have had so far. It isn't different. We really should report the exact BDS. In practice, management tools care about ENOSPC which is always the top-level BDS, and call the admin for help in other cases. The admin can try manually which file is at fault, but I suppose he would be very glad to be told by the error message which file it is. >> Do we need another BlockJobInfo field for the >> name (*cough* ;-)) of the failed bs? >> >> If I understand it right, error reporting on the source and on the >> target would be completely symmetrical then, which I think is a good thing. > > Yeah, it would, but job->bs _is_ special anyway. > >> job->bs makes one image special, which isn't really useful for a generic >> interface. So we should keep the fact that it exists internal and get >> rid of it sometime. >> >>> + if (action == BDRV_ACTION_STOP) { >>> + block_job_pause(job); >>> + block_job_iostatus_set_err(job, error); >>> + if (bs != job->bs) { >>> + bdrv_iostatus_set_err(bs, error); >>> + } >>> + } >>> >>> where the bdrv_iostatus_set_err is mostly to "prepare for the future" >>> usage of named block devices. >> >> Again, I'd make it unconditional to get symmetric behaviour. > > Not possible, because existing clients may expect "iostatus: > {nospace,failed}" to imply a runstate of "not running (i/o error)". Did we make such guarantees? Does libvirt actually make use of it? >>> Hmm, but I'm a bit wary of introducing such a big change now. We know >>> what it makes nicer, but we don't know of anything irremediably broken >>> without them, and we haven't thought enough of any warts it introduces. >> >> On the one hand, I can understand your concerns, but on the other hand, >> introducing an API now and then making such a big change afterwards >> scares me much more. > > One example of the doubts I have: is iostatus a BlockBackend or a > BlockDriverState thing? If it a BlockBackend thing, does the target > device have an iostatus at all? I think it's better to have it in BlockDriverState, but in my imagination the target is a BlockBackend anyway. Kevin