All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Federico Simoncelli <fsimonce@redhat.com>,
	Luiz Capitulino <lcapitulino@redhat.com>,
	Eric Blake <eblake@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Date: Mon, 21 May 2012 15:07:37 +0200	[thread overview]
Message-ID: <4FBA3E19.5020901@redhat.com> (raw)
In-Reply-To: <4FBA20D6.10507@redhat.com>

Am 21.05.2012 13:02, schrieb Paolo Bonzini:
> Il 21/05/2012 12:32, Kevin Wolf ha scritto:
>> Am 21.05.2012 12:02, schrieb Paolo Bonzini:
>>> Il 21/05/2012 11:29, Kevin Wolf ha scritto:
>>>>> * block-stream: I propose adding two options to the existing
>>>>> block-stream command.  If this is rejected, only mirroring will be able
>>>>> to use rerror/werror.
>>>>>
>>>>> The new options are of course rerror/werror.  They are enum options,
>>>>> with the following possible values:
>>>>
>>>> Do we really need separate werror/rerror? For guest operations they
>>>> really exist only for historical reasons: werror was there first, and
>>>> when we wanted the same functionality, it seemed odd to overload werror
>>>> to include reads as well.
>>>>
>>>> For block jobs, where there is no such option yet, we could go with a
>>>> single error option, unless there is a use case for separate
>>>> werror/rerror options.
>>>
>>> For mirroring rerror=source and werror=target.  I'm not sure there is an
>>> actual usecase, but at least it is more interesting than for devices...
>>
>> Hm. What if we add an active mirror? Then we can get some kind of COW,
>> and rerror can happen on the target as well.
> 
> Errors during the read part of COW are always reported as werror.

Good point.

Thinking a bit more about it, with an active mirror (i.e. a filter block
driver) things become a bit less clear anyway. The filter would have to
be linked to the job somehow.

Another interesting question is if we'll want to restrict ourselves to
one job at a time forever. But when we stop doing it, we'll need new
APIs anyway.

>> If source/target is really the distinction we want to have, should the
>> available options be specific to the job type, so that you could have
>> src_error and dst_error for mirroring?
> 
> Yes, that would make sense.

Of course, at the same time it also makes the implementation a bit more
complicated.

>>>>> 'stop': The VM *and* the job will be paused---the VM is stopped even if
>>>>> the block device has neither rerror=stop nor werror={stop,enospc}.  The
>>>>> error is recorded in the block device's iostatus (which can be examined
>>>>> with query-block).  However, a BLOCK_IO_ERROR event will _never_ pause a
>>>>> job.
>>>>>
>>>>>   Rationale: stopping all I/O seems to be the best choice in order
>>>>>   to limit the number of errors received.  However, due to backwards-
>>>>>   compatibility with QEMU 1.1 we cannot pause the job when guest-
>>>>>   initiated I/O causes an error.  We could do that if the block
>>>>>   device has rerror=stop/werror={stop,enospc}, but it seems more
>>>>>   complicated to just never do it.
>>>>
>>>> I don't agree with stopping the VM. Consider a case where the target is
>>>> somewhere on the network and you lose the connection, but the primary
>>>> image is local on the hard disk. You don't want to stop the VM just
>>>> because continuing with the copy isn't possible for the moment.
>>>
>>> I think this is something that management should resolve.
>>
>> Management doesn't necessarily exist.
> 
> Even a human sitting at a console is management.  (Though I don't plan
> HMP to expose rerror/werror; so you can assume in some sense that
> management exists).

But it's management that cares about good defaults. :-)

Why not expose the options in HMP?

>>> For an error on the source, stopping the VM makes sense.  I don't
>>> think management cares about what caused an I/O error on a device.
>>> Does it matter if streaming was active or rather the guest was
>>> executing "dd if=/dev/sda of=/dev/null".
>>
>> Yes, there's a big difference: If it was a job, the guest can keep
>> running without any problems. If it was a guest operation, we would have
>> to return an error to the guest, which may offline the disk in response.
> 
> Ok, this makes sense.
> 
>>> Management may want to keep the VM stopped even for an error on the
>>> target, as long as mirroring has finished the initial synchronization
>>> step.  The VM can perform large amounts of I/O while the job is paused,
>>> and then completing the job can take a large amount of time.
>>
>> If management wants to limit the impact of this, it can decide to
>> explicitly stop the VM when it receives the error event.
> 
> That can be too late.
> 
> Eric, is it a problem for libvirt if a pause or target error during
> mirroring causes the job to exit steady state?  That means that after a
> target error the offset can go back from 100% to <100%.

"too late" in what respect? With the passive mirror, we already have a
window in which data is on the source, but not copied to the target.
Does it make a big difference if it is a few bytes more or less?

>>>> If the VM is stopped (including BLOCK_IO_ERROR), no I/O should be going
>>>> on at all. Do we really keep running the jobs in 1.1? If so, this is a
>>>> bug and should be fixed before the release.
>>>
>>> Yes, we do.  Do you think it's a problem for migration (thinking more
>>> about it: ouch, yes, it should be)?
>>
>> I'm pretty sure that it is a problem for migration. And it's likely a
>> problem in more cases.
> 
> On the other hand, in other cases it can be desirable (qemu -S, run
> streaming before the VM starts).

We would have to verify that the whole qemu code can deal with it. I'm
pretty sure that today it can't and we had a related bug before, even
though I can't remember the details.

>>> I'd rather make the extension of query-block-jobs more generic, with a
>>> list "devices" instead of a member "target", and making up the device
>>> name in the implementation (so you have "device": "target" for mirroring).
>>
>> Well, my idea for blockdev was something like (represented in a -drive
>> syntax because I don't know what it will look like):
>>
>> (qemu) blockdev_add file=foo.img,id=src
>> (qemu) device_add virtio-blk-pci,drive=src
>> ...
>> (qemu) blockdev_add file=bar.img,id=dst
>> (qemu) blockdev_mirror foo bar
>>
>> Once QOM reaches the block layer, I guess we'll want to make all
>> BlockDriverStates user visible anyway.
> 
> I don't disagree, but that's very different from what is done with
> drive-mirror.

Yes. Which isn't a problem per se because drive-mirror will be replaced
by blockdev-mirror. However, things like query-block-jobs are probably
going to stay, so they should be designed for the future.

Things like this are why I don't feel overly comfortable with adding
more and more block layer features before we implement -blockdev.

> So for now I'll keep my proposed extension of query-block-jobs; later it
> can be modified so that the target will have a name if you started the
> mirroring with blockdev_mirror instead of drive_mirror.

You mean the same QMP field is a string when the block device was added
with blockdev_add and a dict when it was added with drive_add?
Maintaining this sounds like a nightmare to me.

Kevin

  reply	other threads:[~2012-05-21 13:08 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-18 17:08 [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2 Paolo Bonzini
2012-05-21  9:29 ` Kevin Wolf
2012-05-21 10:02   ` Paolo Bonzini
2012-05-21 10:32     ` Kevin Wolf
2012-05-21 11:02       ` Paolo Bonzini
2012-05-21 13:07         ` Kevin Wolf [this message]
2012-05-21 15:18           ` Paolo Bonzini
2012-05-21 13:13         ` Eric Blake
2012-05-21 12:20 ` Stefan Hajnoczi
2012-05-21 13:59 ` Luiz Capitulino
2012-05-21 14:09   ` Kevin Wolf
2012-05-21 14:16     ` Anthony Liguori
2012-05-21 14:17     ` Luiz Capitulino
2012-05-21 14:10   ` Anthony Liguori
2012-05-21 14:16     ` Luiz Capitulino
2012-05-21 14:19       ` Anthony Liguori
2012-05-21 14:26         ` Paolo Bonzini
2012-05-21 14:40           ` Anthony Liguori
2012-05-21 14:47             ` Paolo Bonzini
2012-05-21 15:44               ` Anthony Liguori
2012-05-21 15:55                 ` Paolo Bonzini
2012-05-21 14:17     ` Kevin Wolf
2012-05-21 14:39   ` Paolo Bonzini
2012-05-24 13:41 ` [Qemu-devel] Block job commands in QEMU 1.2 [v2, including support for replication] Paolo Bonzini
2012-05-24 14:00   ` Ori Mamluk
2012-05-24 14:19     ` Paolo Bonzini
2012-05-24 15:32       ` Dor Laor
2012-05-25  8:59         ` Paolo Bonzini
2012-05-24 16:57   ` Eric Blake
2012-05-25  8:48     ` Paolo Bonzini
2012-05-25 15:02       ` Eric Blake
2012-05-25  8:28   ` Stefan Hajnoczi
2012-05-25  8:42     ` Kevin Wolf
2012-05-25  9:43   ` Stefan Hajnoczi
2012-05-25 11:17     ` Paolo Bonzini
2012-05-25 12:09       ` Stefan Hajnoczi
2012-05-25 13:25         ` Paolo Bonzini
2012-05-25 16:57   ` Luiz Capitulino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FBA3E19.5020901@redhat.com \
    --to=kwolf@redhat.com \
    --cc=eblake@redhat.com \
    --cc=fsimonce@redhat.com \
    --cc=lcapitulino@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.