qemu-devel.nongnu.org archive mirror
From: "Denis V. Lunev" <den@openvz.org>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Laszlo Ersek <lersek@redhat.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot
Date: Tue, 12 Jan 2016 19:58:18 +0300
Message-ID: <569530AA.6000703@openvz.org>
In-Reply-To: <20160112165247.GJ4841@noname.redhat.com>

On 01/12/2016 07:52 PM, Kevin Wolf wrote:
> On 12.01.2016 at 17:35, Denis V. Lunev wrote:
>> On 01/12/2016 06:47 PM, Denis V. Lunev wrote:
>>> On 01/12/2016 06:20 PM, Kevin Wolf wrote:
>>>> On 12.01.2016 at 15:59, Paolo Bonzini wrote:
>>>>> On 12/01/2016 15:16, Kevin Wolf wrote:
>>>>>>> Thus we should avoid selection of "pflash" drives for VM
>>>>>>> state saving.
>>>>>>>
>>>>>>> For now "pflash" is a read-write raw image, as it is configured
>>>>>>> by libvirt. Thus there are no such images in the field and we
>>>>>>> could safely disable the ability to save state to those images
>>>>>>> inside QEMU.
>>>>>> This is obviously broken. If you write to the pflash, then it needs to
>>>>>> be snapshotted in order to keep a consistent state.
>>>>>>
>>>>>> If you want to avoid snapshotting the image, make it read-only and it
>>>>>> will be skipped even today.
>>>>> Sort of.  The point of having flash is to _not_ make it read-only, so
>>>>> that is not a solution.
>>>>>
>>>>> Flash is already being snapshotted as part of saving RAM state.  In
>>>>> fact, for this reason the device (at least the one used with OVMF; I
>>>>> haven't checked other pflash devices) can simply save it back to disk
>>>>> on the migration destination, without the need to use "migrate -b" or
>>>>> shared storage.
>>>>> [...]
>>>>> I don't like very much using IF_PFLASH this way, which is why I hadn't
>>>>> replied to the patch so far---I hadn't made up my mind about *what* to
>>>>> suggest instead, or whether to just accept it.  However, it does work.
>>>>>
>>>>> Perhaps a separate "I know what I am doing" skip-snapshot option?  Or
>>>>> a device callback saying "not snapshotting this is fine"?
>>>> Boy, is this ugly...
>>>>
>>>> What do you do with disk-only snapshots? The recovery only works as long
>>>> as you have VM state.
>>>>
>>>> Kevin
>>> actually I am in a bit of trouble :(
>>>
>>> I understand that this is ugly, but I would like to make
>>> 'virsh snapshot' work for OVMF VMs. This is necessary for us to make
>>> a release.
>>>
>>> Currently libvirt guys generate XML in the following way:
>>>
>>>   <os>
>>>     <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
>>>     <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader>
>>>     <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram>
>>>   </os>
>>>
>>> This results in:
>>>
>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>>>      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1
>>>
>>> This obviously cannot pass the check in bdrv_all_can_snapshot(),
>>> as 'pflash' is RW and raw, i.e. it cannot be snapshotted.
>>>
>>> They have discussed the switch to the following command line:
>>>
>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>>>      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1
>>>
>>> and say that in this case the VM state could land in the PFLASH
>>> drive, which should not be big because the file lives in a
>>> different location. This means that I am doomed here.
>>>
>>> Either we force the libvirt people to give up their opinion
>>> that pflash should be small, which I am unable to do, or I
>>> invent a way to ban VM state saving into pflash.
>>>
>>> OK. There are 2 options.
>>>
>>> 1) Ban pflash as it was done.
>>> 2) Add 'no-vmstate' flag to -drive (invented just now).
>>>
>> something like this:
>>
>> diff --git a/block.c b/block.c
>> index 3e1877d..8900589 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = {
>>               .help = "Block driver to use for the node",
>>           },
>>           {
>> +            .name = "novmstate",
>> +            .type = QEMU_OPT_BOOL,
>> +            .help = "Do not select this node for saving VM state",
>> +        },
>> +        {
>>               .name = BDRV_OPT_CACHE_WB,
>>               .type = QEMU_OPT_BOOL,
>>               .help = "Enable writeback mode",
>> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
>>       bs->request_alignment = 512;
>>       bs->zero_beyond_eof = true;
>>       bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
>> +    bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false);
>>
>>       if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
>>           error_setg(errp,
>> diff --git a/block/snapshot.c b/block/snapshot.c
>> index 2d86b88..33cdd86 100644
>> --- a/block/snapshot.c
>> +++ b/block/snapshot.c
>> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void)
>>       while (not_found && (bs = bdrv_next(bs))) {
>>           AioContext *ctx = bdrv_get_aio_context(bs);
>>
>> +        if (bs->disable_vmstate_save) {
>> +            continue;
>> +        }
>> +
>>           aio_context_acquire(ctx);
>>           not_found = !bdrv_can_snapshot(bs);
>>           aio_context_release(ctx);
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 256609d..855a209 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -438,6 +438,9 @@ struct BlockDriverState {
>>       /* do we need to tell the quest if we have a volatile write cache? */
>>       int enable_write_cache;
>>
>> +    /* skip this BDS when searching for one to save VM state */
>> +    bool disable_vmstate_save;
>> +
>>       /* the following member gives a name to every node on the bs graph. */
>>       char node_name[32];
>>       /* element of the list of named nodes building the graph */
> That sounds like an option. (No pun intended.)
>
> We can discuss the option name (perhaps "vmstate" defaulting to "on" is
> better?) and variable names (I'd prefer them to match the option name);
> also you'll need to extend the QAPI schema for blockdev-add. But all of
> these are minor points and the idea seems sane.
>
> Kevin
Perfect!

Thanks, all, for the discussion :)

Den
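
For readers following the thread, here is a minimal standalone sketch of
the idea as Kevin suggested naming it: a per-node "vmstate" flag that
defaults to on, checked before the usual "can this node be snapshotted"
test. This is not QEMU code. The Node struct, node_can_snapshot() and
find_vmstate_node() are hypothetical stand-ins for BlockDriverState,
bdrv_can_snapshot() and bdrv_all_find_vmstate_bs(), and the AioContext
locking from the real loop is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for QEMU's BlockDriverState. */
typedef struct Node {
    const char *name;
    bool read_only;           /* read-only nodes are never snapshotted */
    bool supports_snapshots;  /* assumed: true for qcow2, false for raw */
    bool vmstate;             /* proposed option; callers should default it to true */
} Node;

/* Simplified model of the "can snapshot" test discussed in the thread. */
static bool node_can_snapshot(const Node *n)
{
    return !n->read_only && n->supports_snapshots;
}

/*
 * Return the first node that may hold the VM state, or NULL if none.
 * Nodes with vmstate == false are skipped before the snapshot test,
 * mirroring the "continue" added to the loop in the patch above.
 */
static const Node *find_vmstate_node(const Node *nodes, size_t count)
{
    assert(nodes != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (!nodes[i].vmstate) {
            continue;  /* user opted this node out of VM state saving */
        }
        if (node_can_snapshot(&nodes[i])) {
            return &nodes[i];
        }
    }
    return NULL;
}
```

With a read-only raw pflash code image, a qcow2 pflash variable store
marked vmstate=off, and a qcow2 system disk, the search skips both pflash
nodes and picks the disk. Flipping the variable store back to vmstate=on
makes it the first candidate again, which is the behaviour libvirt wanted
to avoid.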
