From: "Denis V. Lunev" <den@openvz.org>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Laszlo Ersek <lersek@redhat.com>,
qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot
Date: Tue, 12 Jan 2016 19:58:18 +0300
Message-ID: <569530AA.6000703@openvz.org>
In-Reply-To: <20160112165247.GJ4841@noname.redhat.com>
On 01/12/2016 07:52 PM, Kevin Wolf wrote:
> Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben:
>> On 01/12/2016 06:47 PM, Denis V. Lunev wrote:
>>> On 01/12/2016 06:20 PM, Kevin Wolf wrote:
>>>> Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben:
>>>>> On 12/01/2016 15:16, Kevin Wolf wrote:
>>>>>>> Thus we should avoid selection of "pflash" drives for VM
>>>>>>> state saving.
>>>>>>>
>>>>>>> For now "pflash" is a read-write raw image as configured by libvirt.
>>>>>>> Thus there are no such images in the field and we could safely
>>>>>>> disable the ability to save state to those images inside QEMU.
>>>>>> This is obviously broken. If you write to the pflash, then it needs to
>>>>>> be snapshotted in order to keep a consistent state.
>>>>>>
>>>>>> If you want to avoid snapshotting the image, make it read-only and it
>>>>>> will be skipped even today.
>>>>> Sort of. The point of having flash is to _not_ make it read-only, so
>>>>> that is not a solution.
>>>>>
>>>>> Flash is already being snapshotted as part of saving RAM state. In
>>>>> fact, for this reason the device (at least the one used with OVMF; I
>>>>> haven't checked other pflash devices) can simply save it back to disk
>>>>> on the migration destination, without the need to use "migrate -b" or
>>>>> shared storage.
>>>>> [...]
>>>>> I don't like very much using IF_PFLASH this way, which is why I hadn't
>>>>> replied to the patch so far---I hadn't made up my mind about *what* to
>>>>> suggest instead, or whether to just accept it. However, it does work.
>>>>>
>>>>> Perhaps a separate "I know what I am doing" skip-snapshot option? Or
>>>>> a device callback saying "not snapshotting this is fine"?
>>>> Boy, is this ugly...
>>>>
>>>> What do you do with disk-only snapshots? The recovery only works as long
>>>> as you have VM state.
>>>>
>>>> Kevin
>>> Actually I am in a bit of trouble :(
>>>
>>> I understand that this is ugly, but I would like to make
>>> 'virsh snapshot' work for OVMF VMs. This is necessary for us
>>> to make a release.
>>>
>>> Currently libvirt guys generate XML in the following way:
>>>
>>> <os>
>>>   <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
>>>   <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader>
>>>   <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram>
>>> </os>
>>>
>>> This results in:
>>>
>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>>>      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1
>>>
>>> This obviously can not pass the check in bdrv_all_can_snapshot(),
>>> as the 'pflash' drive is RW and raw, i.e. can not be snapshotted.
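
(For reference, a rough paraphrase of the check being discussed -- not the
literal QEMU source, just the 2.5-era logic in block/snapshot.c as I read it:
read-only drives are skipped, and a writable drive only passes if its format
driver implements internal snapshots, which raw does not.)

    /* Paraphrased sketch, not the exact QEMU code. */
    static bool drive_allows_internal_snapshot(BlockDriverState *bs)
    {
        /* bdrv_all_can_snapshot() skips read-only drives entirely,
         * which is why a readonly=on pflash is not a problem. */
        if (bdrv_is_read_only(bs)) {
            return true;
        }
        /* A writable drive needs a format driver with internal
         * snapshot support (e.g. qcow2). Raw has none, so a RW raw
         * pflash makes the whole check fail. */
        return bs->drv && bs->drv->bdrv_snapshot_create != NULL;
    }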
>>>
>>> They have discussed the switch to the following command line:
>>>
>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>>>      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1
>>>
>>> and say that in this case the VM state could end up in the PFLASH
>>> drive, which is supposed to stay small since the file lives in a
>>> different location. This means that I am doomed here.
>>>
>>> Either we force the libvirt people to drop their opinion that
>>> pflash should stay small, which I am unable to do, or I invent
>>> a way to ban VM state saving into pflash.
>>>
>>> OK. There are 2 options.
>>>
>>> 1) Ban pflash as it was done.
>>> 2) Add 'no-vmstate' flag to -drive (invented just now).
>>>
>> something like this:
>>
>> diff --git a/block.c b/block.c
>> index 3e1877d..8900589 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = {
>> .help = "Block driver to use for the node",
>> },
>> {
>> + .name = "novmstate",
>> + .type = QEMU_OPT_BOOL,
>> + .help = "Do not consider this drive when selecting one to save VM state",
>> + },
>> + {
>> .name = BDRV_OPT_CACHE_WB,
>> .type = QEMU_OPT_BOOL,
>> .help = "Enable writeback mode",
>> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
>> bs->request_alignment = 512;
>> bs->zero_beyond_eof = true;
>> bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
>> + bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false);
>>
>> if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
>> error_setg(errp,
>> diff --git a/block/snapshot.c b/block/snapshot.c
>> index 2d86b88..33cdd86 100644
>> --- a/block/snapshot.c
>> +++ b/block/snapshot.c
>> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void)
>> while (not_found && (bs = bdrv_next(bs))) {
>> AioContext *ctx = bdrv_get_aio_context(bs);
>>
>> + if (bs->disable_vmstate_save) {
>> + continue;
>> + }
>> +
>> aio_context_acquire(ctx);
>> not_found = !bdrv_can_snapshot(bs);
>> aio_context_release(ctx);
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 256609d..855a209 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -438,6 +438,9 @@ struct BlockDriverState {
>> /* do we need to tell the quest if we have a volatile write cache? */
>> int enable_write_cache;
>>
>> + /* skip this BDS searching for one to save VM state */
>> + bool disable_vmstate_save;
>> +
>> /* the following member gives a name to every node on the bs graph. */
>> char node_name[32];
>> /* element of the list of named nodes building the graph */
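
(For illustration only: with a sketch like the above, and assuming the new
boolean is accepted through -drive like the other block-layer options, libvirt
could mark the nvram drive roughly as follows; the option name itself is still
open, see below.)

qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
     -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1,novmstate=on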
> That sounds like an option. (No pun intended.)
>
> We can discuss the option name (perhaps "vmstate" defaulting to "on" is
> better?) and variable names (I'd prefer them to match the option name);
> also you'll need to extend the QAPI schema for blockdev-add. But all of
> these are minor points and the idea seems sane.
>
> Kevin
Perfect!
Thanks all for the discussion :)
Den