From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:45991)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1SWQOA-0007WH-RT
	for qemu-devel@nongnu.org; Mon, 21 May 2012 07:03:03 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1SWQO4-0002To-U2
	for qemu-devel@nongnu.org; Mon, 21 May 2012 07:02:58 -0400
Received: from mx1.redhat.com ([209.132.183.28]:12917)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1SWQO4-0002TU-Dz
	for qemu-devel@nongnu.org; Mon, 21 May 2012 07:02:52 -0400
Message-ID: <4FBA20D6.10507@redhat.com>
Date: Mon, 21 May 2012 13:02:46 +0200
From: Paolo Bonzini <pbonzini@redhat.com>
MIME-Version: 1.0
References: <4FB6821A.1080902@redhat.com> <4FBA0AF9.8080705@redhat.com>
	<4FBA129E.80802@redhat.com> <4FBA19D7.1000806@redhat.com>
In-Reply-To: <4FBA19D7.1000806@redhat.com>
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Proposal for extensions of block job commands in
	QEMU 1.2
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Federico Simoncelli <fsimonce@redhat.com>, Luiz Capitulino <lcapitulino@redhat.com>, Eric Blake <eblake@redhat.com>, qemu-devel <qemu-devel@nongnu.org>, Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

Il 21/05/2012 12:32, Kevin Wolf ha scritto:
> Am 21.05.2012 12:02, schrieb Paolo Bonzini:
>> Il 21/05/2012 11:29, Kevin Wolf ha scritto:
>>>> * block-stream: I propose adding two options to the existing
>>>> block-stream command.  If this is rejected, only mirroring will be able
>>>> to use rerror/werror.
>>>>
>>>> The new options are of course rerror/werror.  They are enum options,
>>>> with the following possible values:
>>>
>>> Do we really need separate werror/rerror? For guest operations they
>>> really exist only for historical reasons: werror was there first, and
>>> when we wanted the same functionality, it seemed odd to overload werror
>>> to include reads as well.
>>>
>>> For block jobs, where there is no such option yet, we could go with a
>>> single error option, unless there is a use case for separate
>>> werror/rerror options.
>>
>> For mirroring rerror=source and werror=target.  I'm not sure there is an
>> actual usecase, but at least it is more interesting than for devices...
> 
> Hm. What if we add an active mirror? Then we can get some kind of COW,
> and rerror can happen on the target as well.

Errors during the read part of COW are always reported as werror.

> If source/target is really the distinction we want to have, should the
> available options be specific to the job type, so that you could have
> src_error and dst_error for mirroring?

Yes, that would make sense.

>>>> 'stop': The VM *and* the job will be paused---the VM is stopped even if
>>>> the block device has neither rerror=stop nor werror={stop,enospc}.  The
>>>> error is recorded in the block device's iostatus (which can be examined
>>>> with query-block).  However, a BLOCK_IO_ERROR event will _never_ pause a
>>>> job.
>>>>
>>>>   Rationale: stopping all I/O seems to be the best choice in order
>>>>   to limit the number of errors received.  However, due to backwards-
>>>>   compatibility with QEMU 1.1 we cannot pause the job when guest-
>>>>   initiated I/O causes an error.  We could do that if the block
>>>>   device has rerror=stop/werror={stop,enospc}, but it seems more
>>>>   complicated to just never do it.
>>>
>>> I don't agree with stopping the VM. Consider a case where the target is
>>> somewhere on the network and you lose the connection, but the primary
>>> image is local on the hard disk. You don't want to stop the VM just
>>> because continuing with the copy isn't possible for the moment.
>>
>> I think this is something that management should resolve.
> 
> Management doesn't necessarily exist.

Even a human sitting at a console is management.  (Though I don't plan
HMP to expose rerror/werror; so you can assume in some sense that
management exists).

>> For an error on the source, stopping the VM makes sense.  I don't
>> think management cares about what caused an I/O error on a device.
>> Does it matter if streaming was active or rather the guest was
>> executing "dd if=/dev/sda of=/dev/null".
> 
> Yes, there's a big difference: If it was a job, the guest can keep
> running without any problems. If it was a guest operation, we would have
> to return an error to the guest, which may offline the disk in response.

Ok, this makes sense.

>> Management may want to keep the VM stopped even for an error on the
>> target, as long as mirroring has finished the initial synchronization
>> step.  The VM can perform large amounts of I/O while the job is paused,
>> and then completing the job can take a large amount of time.
> 
> If management wants to limit the impact of this, it can decide to
> explicitly stop the VM when it receives the error event.

That can be too late.

Eric, is it a problem for libvirt if a pause or target error during
mirroring causes the job to exit steady state?  That means that after a
target error the offset can go back from 100% to <100%.

>>> If the VM is stopped (including BLOCK_IO_ERROR), no I/O should be going
>>> on at all. Do we really keep running the jobs in 1.1? If so, this is a
>>> bug and should be fixed before the release.
>>
>> Yes, we do.  Do you think it's a problem for migration (thinking more
>> about it: ouch, yes, it should be)?
> 
> I'm pretty sure that it is a problem for migration. And it's likely a
> problem in more cases.

On the other hand, in other cases it can be desirable (qemu -S, run
streaming before the VM starts).

>> I'd rather make the extension of query-block-jobs more generic, with a
>> list "devices" instead of a member "target", and making up the device
>> name in the implementation (so you have "device": "target" for mirroring).
> 
> Well, my idea for blockdev was something like (represented in a -drive
> syntax because I don't know what it will look like):
> 
> (qemu) blockdev_add file=foo.img,id=src
> (qemu) device_add virtio-blk-pci,drive=src
> ...
> (qemu) blockdev_add file=bar.img,id=dst
> (qemu) blockdev_mirror foo bar
> 
> Once QOM reaches the block layer, I guess we'll want to make all
> BlockDriverStates user visible anyway.

I don't disagree, but that's very different from what is done with
drive-mirror.

So for now I'll keep my proposed extension of query-block-jobs; later it
can be modified so that the target will have a name if you started the
mirroring with blockdev_mirror instead of drive_mirror.

Paolo