From mboxrd@z Thu Jan 1 00:00:00 1970
References: <1452578622-4492-1-git-send-email-den@openvz.org> <20160112141607.GD4841@noname.redhat.com> <569514E7.8090101@redhat.com> <20160112152051.GG4841@noname.redhat.com> <5695201C.2050504@openvz.org> <56952B5E.9080002@openvz.org> <20160112165247.GJ4841@noname.redhat.com>
From: "Denis V. Lunev" <den@openvz.org>
Message-ID: <569530AA.6000703@openvz.org>
Date: Tue, 12 Jan 2016 19:58:18 +0300
MIME-Version: 1.0
In-Reply-To: <20160112165247.GJ4841@noname.redhat.com>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot
To: Kevin Wolf
Cc: Paolo Bonzini, Laszlo Ersek, qemu-devel@nongnu.org, qemu-block@nongnu.org

On 01/12/2016 07:52 PM, Kevin Wolf wrote:
> Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben:
>> On 01/12/2016 06:47 PM, Denis V. Lunev wrote:
>>> On 01/12/2016 06:20 PM, Kevin Wolf wrote:
>>>> Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben:
>>>>> On 12/01/2016 15:16, Kevin Wolf wrote:
>>>>>>> Thus we should avoid selecting "pflash" drives for VM state
>>>>>>> saving.
>>>>>>>
>>>>>>> For now "pflash" is a read-write raw image, as it is configured by
>>>>>>> libvirt. Thus there are no such images in the field and we can
>>>>>>> safely disable the ability to save state to those images inside
>>>>>>> QEMU.
>>>>>> This is obviously broken. If you write to the pflash, then it needs to
>>>>>> be snapshotted in order to keep a consistent state.
>>>>>>
>>>>>> If you want to avoid snapshotting the image, make it read-only and it
>>>>>> will be skipped even today.
>>>>> Sort of. The point of having flash is to _not_ make it read-only, so
>>>>> that is not a solution.
>>>>>
>>>>> Flash is already being snapshotted as part of saving RAM state. In
>>>>> fact, for this reason the device (at least the one used with OVMF; I
>>>>> haven't checked other pflash devices) can simply save it back to disk
>>>>> on the migration destination, without the need to use "migrate -b" or
>>>>> shared storage.
>>>>> [...]
>>>>> I don't much like using IF_PFLASH this way, which is why I hadn't
>>>>> replied to the patch so far---I hadn't made up my mind about *what* to
>>>>> suggest instead, or whether to just accept it. However, it does work.
>>>>>
>>>>> Perhaps a separate "I know what I am doing" skip-snapshot option? Or
>>>>> a device callback saying "not snapshotting this is fine"?
>>>> Boy, is this ugly...
>>>>
>>>> What do you do with disk-only snapshots? The recovery only works as long
>>>> as you have VM state.
>>>>
>>>> Kevin
>>> Actually I am in a bit of trouble :(
>>>
>>> I understand that this is ugly, but I would like to get 'virsh
>>> snapshot' working for OVMF VMs. This is necessary for us to make
>>> a release.
>>>
>>> Currently the libvirt guys generate the XML in the following way:
>>>
>>>   <os>
>>>     <type>hvm</type>
>>>     <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader>
>>>     <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram>
>>>   </os>
>>>
>>> This results in:
>>>
>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>>>      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1
>>>
>>> This obviously cannot pass the check in bdrv_all_can_snapshot(),
>>> as 'pflash' is RW and raw, i.e. it cannot be snapshotted.
>>>
>>> They have discussed switching to the following command line:
>>>
>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>>>      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1
>>>
>>> and they say that in this case the VM state could end up in the
>>> pflash drive, which should not be big, as the location of the file
>>> is different. This means that I am doomed here.
>>>
>>> Either we force the libvirt people to drop their opinion that
>>> pflash should stay small, which I am unable to do, or I invent a
>>> way to ban saving the VM state into pflash.
>>>
>>> OK. There are 2 options:
>>>
>>> 1) Ban pflash, as the patch does.
>>> 2) Add a 'no-vmstate' flag to -drive (invented just now).
>>>
>> Something like this:
>>
>> diff --git a/block.c b/block.c
>> index 3e1877d..8900589 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = {
>>              .help = "Block driver to use for the node",
>>          },
>>          {
>> +            .name = "novmstate",
>> +            .type = QEMU_OPT_BOOL,
>> +            .help = "Do not select this drive for saving VM state",
>> +        },
>> +        {
>>              .name = BDRV_OPT_CACHE_WB,
>>              .type = QEMU_OPT_BOOL,
>>              .help = "Enable writeback mode",
>> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
>>      bs->request_alignment = 512;
>>      bs->zero_beyond_eof = true;
>>      bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
>> +    bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false);
>>
>>      if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
>>          error_setg(errp,
>> diff --git a/block/snapshot.c b/block/snapshot.c
>> index 2d86b88..33cdd86 100644
>> --- a/block/snapshot.c
>> +++ b/block/snapshot.c
>> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void)
>>      while (not_found && (bs = bdrv_next(bs))) {
>>          AioContext *ctx = bdrv_get_aio_context(bs);
>>
>> +        if (bs->disable_vmstate_save) {
>> +            continue;
>> +        }
>> +
>>          aio_context_acquire(ctx);
>>          not_found = !bdrv_can_snapshot(bs);
>>          aio_context_release(ctx);
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 256609d..855a209 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -438,6 +438,9 @@ struct BlockDriverState {
>>      /* do we need to tell the guest if we have a volatile write cache? */
>>      int enable_write_cache;
>>
>> +    /* skip this BDS when searching for one to save the VM state */
>> +    bool disable_vmstate_save;
>> +
>>      /* the following member gives a name to every node on the bs graph. */
>>      char node_name[32];
>>      /* element of the list of named nodes building the graph */
> That sounds like an option. (No pun intended.)
>
> We can discuss the option name (perhaps "vmstate" defaulting to "on" is
> better?) and variable names (I'd prefer them to match the option name);
> also you'll need to extend the QAPI schema for blockdev-add. But all of
> these are minor points and the idea seems sane.
>
> Kevin
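
For reference, the check being referred to above looks roughly like this
(paraphrased from block/snapshot.c of that time, not a verbatim copy):

bool bdrv_all_can_snapshot(BlockDriverState **first_bad_bs)
{
    bool ok = true;
    BlockDriverState *bs = NULL;

    while (ok && (bs = bdrv_next(bs))) {
        AioContext *ctx = bdrv_get_aio_context(bs);

        aio_context_acquire(ctx);
        /* read-only drives are skipped, which is why a readonly=on
         * pflash unit already passes today */
        if (bdrv_is_inserted(bs) && !bdrv_is_read_only(bs)) {
            /* raw has no internal snapshot support, so a writable
             * raw image makes this return false */
            ok = bdrv_can_snapshot(bs);
        }
        aio_context_release(ctx);
    }

    *first_bad_bs = bs;
    return ok;
}

So a writable raw VARS file fails the check outright, while a writable
qcow2 VARS file passes it and can then be picked by
bdrv_all_find_vmstate_bs() - which is exactly what the flag sketched
above is meant to prevent.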

Perfect! Thanks all for the discussion :)

Den
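
For illustration only: if a flag along these lines were merged (whether
spelled "novmstate=on" as in the sketch above or "vmstate=off" as Kevin
suggests), the qcow2-based command line discussed in the thread would
presumably become something like

qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
     -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1,novmstate=on

so the qcow2 NVRAM drive would still get an internal snapshot, but would
never be selected as the destination for the VM state itself.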