From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:47599)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1aJ0QU-0007OK-SD
	for qemu-devel@nongnu.org; Tue, 12 Jan 2016 10:00:04 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1aJ0QT-0006Dt-PH
	for qemu-devel@nongnu.org; Tue, 12 Jan 2016 10:00:02 -0500
References: <1452578622-4492-1-git-send-email-den@openvz.org>
	<20160112141607.GD4841@noname.redhat.com>
From: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <569514E7.8090101@redhat.com>
Date: Tue, 12 Jan 2016 15:59:51 +0100
MIME-Version: 1.0
In-Reply-To: <20160112141607.GD4841@noname.redhat.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for
 internal snapshot
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Kevin Wolf <kwolf@redhat.com>, "Denis V. Lunev" <den@openvz.org>
Cc: Laszlo Ersek <lersek@redhat.com>, qemu-devel@nongnu.org, qemu-block@nongnu.org


On 12/01/2016 15:16, Kevin Wolf wrote:
>> Thus we should avoid selection of "pflash" drives for VM state saving.
>>
>> For now "pflash" is read-write raw image as it configured by libvirt.
>> Thus there are no such images in the field and we could safely disable
>> ability to save state to those images inside QEMU.
>=20
> This is obviously broken. If you write to the pflash, then it needs to
> be snapshotted in order to keep a consistent state.
>=20
> If you want to avoid snapshotting the image, make it read-only and it
> will be skipped even today.

Sort of.  The point of having flash is to _not_ make it read-only, so=20
that is not a solution.

Flash is already being snapshotted as part of saving RAM state.  In=20
fact, for this reason the device (at least the one used with OVMF; I=20
haven't checked other pflash devices) can simply save it back to disk=20
on the migration destination, without the need to use "migrate -b" or=20
shared storage.

See commit 4c0cfc72b31a79f737a64ebbe0411e4b83e25771:

    Author: Laszlo Ersek <lersek@redhat.com>
    Date:   Sat Aug 23 12:19:07 2014 +0200

    pflash_cfi01: write flash contents to bdrv on incoming migration
   =20
    A drive that backs a pflash device is special:
    - it is very small,
    - its entire contents are kept in a RAMBlock at all times, covering t=
he
      guest-phys address range that provides the guest's view of the emul=
ated
      flash chip.
   =20
    The pflash device model keeps the drive (the host-side file) and the
    guest-visible flash contents in sync. When migrating the guest, the
    guest-visible flash contents (the RAMBlock) is migrated by default, b=
ut on
    the target host, the drive (the host-side file) remains in full sync =
with
    the RAMBlock only if:
    - the source and target hosts share the storage underlying the pflash
      drive,
    - or the migration requests full or incremental block migration too, =
which
      then covers all drives.
   =20
    Due to the special nature of pflash drives, the following scenario ma=
kes
    sense as well:
    - no full nor incremental block migration, covering all drives, along=
side
      the base migration (justified eg. by shared storage for "normal" (b=
ig)
      drives),
    - non-shared storage for pflash drives.
   =20
    In this case, currently only those portions of the flash drive are up=
dated
    on the target disk that the guest reprograms while running on the tar=
get
    host.
   =20
    In order to restore accord, dump the entire flash contents to the bdr=
v in
    a post_load() callback.
   =20
    - The read-only check follows the other call-sites of pflash_update()=
;
    - both "pfl->ro" and pflash_update() reflect / consider the case when
      "pfl->bs" is NULL;
    - the total size of the flash device is calculated as in
      pflash_cfi01_realize().
   =20
    When using shared storage, or requesting full or incremental block
    migration along with the normal migration, the patch should incur a
    harmless rewrite from the target side.
   =20
    It is assumed that, on the target host, RAM is loaded ahead of the ca=
ll to
    pflash_post_load().

I don't like very much using IF_PFLASH this way, which is why I hadn't
replied to the patch so far---I hadn't made up my mind about *what* to
suggest instead, or whether to just accept it.  However, it does work.

Perhaps a separate "I know what I am doing" skip-snapshot option?  Or
a device callback saying "not snapshotting this is fine"?

Paolo