Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Paolo Bonzini <pbonzini@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Federico Simoncelli <fsimonce@redhat.com>,
	Luiz Capitulino <lcapitulino@redhat.com>,
	Eric Blake <eblake@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Date: Mon, 21 May 2012 12:02:06 +0200	[thread overview]
Message-ID: <4FBA129E.80802@redhat.com> (raw)
In-Reply-To: <4FBA0AF9.8080705@redhat.com>

Il 21/05/2012 11:29, Kevin Wolf ha scritto:
>> * block-stream: I propose adding two options to the existing
>> block-stream command.  If this is rejected, only mirroring will be able
>> to use rerror/werror.
>>
>> The new options are of course rerror/werror.  They are enum options,
>> with the following possible values:
> 
> Do we really need separate werror/rerror? For guest operations they
> really exist only for historical reasons: werror was there first, and
> when we wanted the same functionality, it seemed odd to overload werror
> to include reads as well.
> 
> For block jobs, where there is no such option yet, we could go with a
> single error option, unless there is a use case for separate
> werror/rerror options.

For mirroring rerror=source and werror=target.  I'm not sure there is an
actual usecase, but at least it is more interesting than for devices...

>> 'report': The behavior is the same as in 1.1.  An I/O error,
>> respectively during a read or a write, will complete the job immediately
>> with an error code.
>>
>> 'ignore': An I/O error, respectively during a read or a write, will be
>> ignored.  For streaming, the job will complete with an error and the
>> backing file will be left in place.  For mirroring, the sector will be
>> marked again as dirty and re-examined later.
> 
> This is not really 'ignore' as used for guest operations. There it means
> "no matter what the return value is, the operation has succeeded". For
> streaming it would mean that it just goes on with the next cluster (and
> if we don't cut the backing file link at the end, it would at least not
> corrupt anything).

Yes, for streaming it would mean that it just goes on with the next
cluster and then report an error at the end.

> Just like with guest operations it's a mostly useless mode, do we really
> need this option?

Perhaps we should remove it for guest operations as well; certainly it
makes more sense (if any) for jobs than for guest operations.

>> 'stop': The VM *and* the job will be paused---the VM is stopped even if
>> the block device has neither rerror=stop nor werror={stop,enospc}.  The
>> error is recorded in the block device's iostatus (which can be examined
>> with query-block).  However, a BLOCK_IO_ERROR event will _never_ pause a
>> job.
>>
>>   Rationale: stopping all I/O seems to be the best choice in order
>>   to limit the number of errors received.  However, due to backwards-
>>   compatibility with QEMU 1.1 we cannot pause the job when guest-
>>   initiated I/O causes an error.  We could do that if the block
>>   device has rerror=stop/werror={stop,enospc}, but it seems more
>>   complicated to just never do it.
> 
> I don't agree with stopping the VM. Consider a case where the target is
> somewhere on the network and you lose the connection, but the primary
> image is local on the hard disk. You don't want to stop the VM just
> because continuing with the copy isn't possible for the moment.

I think this is something that management should resolve.  For an error
on the source, stopping the VM makes sense.  I don't think management
cares about what caused an I/O error on a device.  Does it matter if
streaming was active or rather the guest was executing "dd if=/dev/sda
of=/dev/null".

Management may want to keep the VM stopped even for an error on the
target, as long as mirroring has finished the initial synchronization
step.  The VM can perform large amounts of I/O while the job is paused,
and then completing the job can take a large amount of time.

> Of course, this means that you can't reuse the block device's io_status,
> but you need a separate job_iostatus.

For mirroring, source and target are separate devices and have separate
iostatuses anyway.

> If the VM is stopped (including BLOCK_IO_ERROR), no I/O should be going
> on at all. Do we really keep running the jobs in 1.1? If so, this is a
> bug and should be fixed before the release.

Yes, we do.  Do you think it's a problem for migration (thinking more
about it: ouch, yes, it should be)?

We have no pause/resume infrastructure, so we could simply force
synchronous cancellation at the end (before vm_stop_force_state).
Stefan, do you have any free cycle for this?

>> * query-block-jobs: The returned JSON object will grow an additional
>> member, "target".  The target field is a dictionary with two fields,
>> "info" and "stats" (resembling the output of query-block and
>> query-blockstat but for the mirroring target).  Member "device" of the
>> BlockInfo structure will be made optional.
>>
>>   Rationale: this allows libvirt to observe the high watermark of qcow2
>>   mirroring targets, and avoids putting a bad iostatus on a working
>>   migration source.
> 
> The mirroring target should be present in query-block instead. It is a
> user-visible BlockDriverState

It is not user visible, and making it user visible adds a lot more
things to worry about (e.g. making sure you cannot use it in a
device_add).

It reminds me of Xen's renaming of domains (foo->migrating-foo and
foo->zombie-foo), which was an endless source of pain.

I'd rather make the extension of query-block-jobs more generic, with a
list "devices" instead of a member "target", and making up the device
name in the implementation (so you have "device": "target" for mirroring).

>> * block-job-complete: new command specific to mirroring (switches the
>> device to the target), not related to the rest of the proposal.
> 
> What semantics will block-job-cancel have then for mirroring? Will it be
> incompatible with RHEL 6?

They will have the same semantics: make sure that the target matches
some state of the source, and drop it.  block-job-complete adds the
switch to the target (which I'll do with something like bdrv_append,
btw).  It synchronously opens the backing files of the target, and
asynchronously completes the job.

Upstream will not have drive-reopen; this will be incompatible with RHEL6.

Paolo

next prev parent reply	other threads:[~2012-05-21 10:02 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-18 17:08 [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2 Paolo Bonzini
2012-05-21  9:29 ` Kevin Wolf
2012-05-21 10:02   ` Paolo Bonzini [this message]
2012-05-21 10:32     ` Kevin Wolf
2012-05-21 11:02       ` Paolo Bonzini
2012-05-21 13:07         ` Kevin Wolf
2012-05-21 15:18           ` Paolo Bonzini
2012-05-21 13:13         ` Eric Blake
2012-05-21 12:20 ` Stefan Hajnoczi
2012-05-21 13:59 ` Luiz Capitulino
2012-05-21 14:09   ` Kevin Wolf
2012-05-21 14:16     ` Anthony Liguori
2012-05-21 14:17     ` Luiz Capitulino
2012-05-21 14:10   ` Anthony Liguori
2012-05-21 14:16     ` Luiz Capitulino
2012-05-21 14:19       ` Anthony Liguori
2012-05-21 14:26         ` Paolo Bonzini
2012-05-21 14:40           ` Anthony Liguori
2012-05-21 14:47             ` Paolo Bonzini
2012-05-21 15:44               ` Anthony Liguori
2012-05-21 15:55                 ` Paolo Bonzini
2012-05-21 14:17     ` Kevin Wolf
2012-05-21 14:39   ` Paolo Bonzini
2012-05-24 13:41 ` [Qemu-devel] Block job commands in QEMU 1.2 [v2, including support for replication] Paolo Bonzini
2012-05-24 14:00   ` Ori Mamluk
2012-05-24 14:19     ` Paolo Bonzini
2012-05-24 15:32       ` Dor Laor
2012-05-25  8:59         ` Paolo Bonzini
2012-05-24 16:57   ` Eric Blake
2012-05-25  8:48     ` Paolo Bonzini
2012-05-25 15:02       ` Eric Blake
2012-05-25  8:28   ` Stefan Hajnoczi
2012-05-25  8:42     ` Kevin Wolf
2012-05-25  9:43   ` Stefan Hajnoczi
2012-05-25 11:17     ` Paolo Bonzini
2012-05-25 12:09       ` Stefan Hajnoczi
2012-05-25 13:25         ` Paolo Bonzini
2012-05-25 16:57   ` Luiz Capitulino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FBA129E.80802@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=eblake@redhat.com \
    --cc=fsimonce@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=lcapitulino@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).