From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([208.118.235.92]:45991) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SWQOA-0007WH-RT for qemu-devel@nongnu.org; Mon, 21 May 2012 07:03:03 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1SWQO4-0002To-U2 for qemu-devel@nongnu.org; Mon, 21 May 2012 07:02:58 -0400 Received: from mx1.redhat.com ([209.132.183.28]:12917) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SWQO4-0002TU-Dz for qemu-devel@nongnu.org; Mon, 21 May 2012 07:02:52 -0400 Message-ID: <4FBA20D6.10507@redhat.com> Date: Mon, 21 May 2012 13:02:46 +0200 From: Paolo Bonzini MIME-Version: 1.0 References: <4FB6821A.1080902@redhat.com> <4FBA0AF9.8080705@redhat.com> <4FBA129E.80802@redhat.com> <4FBA19D7.1000806@redhat.com> In-Reply-To: <4FBA19D7.1000806@redhat.com> Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2 List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf Cc: Federico Simoncelli , Luiz Capitulino , Eric Blake , qemu-devel , Stefan Hajnoczi Il 21/05/2012 12:32, Kevin Wolf ha scritto: > Am 21.05.2012 12:02, schrieb Paolo Bonzini: >> Il 21/05/2012 11:29, Kevin Wolf ha scritto: >>>> * block-stream: I propose adding two options to the existing >>>> block-stream command. If this is rejected, only mirroring will be able >>>> to use rerror/werror. >>>> >>>> The new options are of course rerror/werror. They are enum options, >>>> with the following possible values: >>> >>> Do we really need separate werror/rerror? For guest operations they >>> really exist only for historical reasons: werror was there first, and >>> when we wanted the same functionality, it seemed odd to overload werror >>> to include reads as well. >>> >>> For block jobs, where there is no such option yet, we could go with a >>> single error option, unless there is a use case for separate >>> werror/rerror options. >> >> For mirroring rerror=source and werror=target. I'm not sure there is an >> actual usecase, but at least it is more interesting than for devices... > > Hm. What if we add an active mirror? Then we can get some kind of COW, > and rerror can happen on the target as well. Errors during the read part of COW are always reported as werror. > If source/target is really the distinction we want to have, should the > available options be specific to the job type, so that you could have > src_error and dst_error for mirroring? Yes, that would make sense. >>>> 'stop': The VM *and* the job will be paused---the VM is stopped even if >>>> the block device has neither rerror=stop nor werror={stop,enospc}. The >>>> error is recorded in the block device's iostatus (which can be examined >>>> with query-block). However, a BLOCK_IO_ERROR event will _never_ pause a >>>> job. >>>> >>>> Rationale: stopping all I/O seems to be the best choice in order >>>> to limit the number of errors received. However, due to backwards- >>>> compatibility with QEMU 1.1 we cannot pause the job when guest- >>>> initiated I/O causes an error. We could do that if the block >>>> device has rerror=stop/werror={stop,enospc}, but it seems more >>>> complicated to just never do it. >>> >>> I don't agree with stopping the VM. Consider a case where the target is >>> somewhere on the network and you lose the connection, but the primary >>> image is local on the hard disk. You don't want to stop the VM just >>> because continuing with the copy isn't possible for the moment. >> >> I think this is something that management should resolve. > > Management doesn't necessarily exist. Even a human sitting at a console is management. (Though I don't plan HMP to expose rerror/werror; so you can assume in some sense that management exists). >> For an error on the source, stopping the VM makes sense. I don't >> think management cares about what caused an I/O error on a device. >> Does it matter if streaming was active or rather the guest was >> executing "dd if=/dev/sda of=/dev/null". > > Yes, there's a big difference: If it was a job, the guest can keep > running without any problems. If it was a guest operation, we would have > to return an error to the guest, which may offline the disk in response. Ok, this makes sense. >> Management may want to keep the VM stopped even for an error on the >> target, as long as mirroring has finished the initial synchronization >> step. The VM can perform large amounts of I/O while the job is paused, >> and then completing the job can take a large amount of time. > > If management wants to limit the impact of this, it can decide to > explicitly stop the VM when it receives the error event. That can be too late. Eric, is it a problem for libvirt if a pause or target error during mirroring causes the job to exit steady state? That means that after a target error the offset can go back from 100% to <100%. >>> If the VM is stopped (including BLOCK_IO_ERROR), no I/O should be going >>> on at all. Do we really keep running the jobs in 1.1? If so, this is a >>> bug and should be fixed before the release. >> >> Yes, we do. Do you think it's a problem for migration (thinking more >> about it: ouch, yes, it should be)? > > I'm pretty sure that it is a problem for migration. And it's likely a > problem in more cases. On the other hand, in other cases it can be desirable (qemu -S, run streaming before the VM starts). >> I'd rather make the extension of query-block-jobs more generic, with a >> list "devices" instead of a member "target", and making up the device >> name in the implementation (so you have "device": "target" for mirroring). > > Well, my idea for blockdev was something like (represented in a -drive > syntax because I don't know what it will look like): > > (qemu) blockdev_add file=foo.img,id=src > (qemu) device_add virtio-blk-pci,drive=src > ... > (qemu) blockdev_add file=bar.img,id=dst > (qemu) blockdev_mirror foo bar > > Once QOM reaches the block layer, I guess we'll want to make all > BlockDriverStates user visible anyway. I don't disagree, but that's very different from what is done with drive-mirror. So for now I'll keep my proposed extension of query-block-jobs; later it can be modified so that the target will have a name if you started the mirroring with blockdev_mirror instead of drive_mirror. Paolo