From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:47599) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aJ0QU-0007OK-SD for qemu-devel@nongnu.org; Tue, 12 Jan 2016 10:00:04 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aJ0QT-0006Dt-PH for qemu-devel@nongnu.org; Tue, 12 Jan 2016 10:00:02 -0500 References: <1452578622-4492-1-git-send-email-den@openvz.org> <20160112141607.GD4841@noname.redhat.com> From: Paolo Bonzini Message-ID: <569514E7.8090101@redhat.com> Date: Tue, 12 Jan 2016 15:59:51 +0100 MIME-Version: 1.0 In-Reply-To: <20160112141607.GD4841@noname.redhat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf , "Denis V. Lunev" Cc: Laszlo Ersek , qemu-devel@nongnu.org, qemu-block@nongnu.org On 12/01/2016 15:16, Kevin Wolf wrote: >> Thus we should avoid selection of "pflash" drives for VM state saving. >> >> For now "pflash" is read-write raw image as it configured by libvirt. >> Thus there are no such images in the field and we could safely disable >> ability to save state to those images inside QEMU. >=20 > This is obviously broken. If you write to the pflash, then it needs to > be snapshotted in order to keep a consistent state. >=20 > If you want to avoid snapshotting the image, make it read-only and it > will be skipped even today. Sort of. The point of having flash is to _not_ make it read-only, so=20 that is not a solution. Flash is already being snapshotted as part of saving RAM state. In=20 fact, for this reason the device (at least the one used with OVMF; I=20 haven't checked other pflash devices) can simply save it back to disk=20 on the migration destination, without the need to use "migrate -b" or=20 shared storage. See commit 4c0cfc72b31a79f737a64ebbe0411e4b83e25771: Author: Laszlo Ersek Date: Sat Aug 23 12:19:07 2014 +0200 pflash_cfi01: write flash contents to bdrv on incoming migration =20 A drive that backs a pflash device is special: - it is very small, - its entire contents are kept in a RAMBlock at all times, covering t= he guest-phys address range that provides the guest's view of the emul= ated flash chip. =20 The pflash device model keeps the drive (the host-side file) and the guest-visible flash contents in sync. When migrating the guest, the guest-visible flash contents (the RAMBlock) is migrated by default, b= ut on the target host, the drive (the host-side file) remains in full sync = with the RAMBlock only if: - the source and target hosts share the storage underlying the pflash drive, - or the migration requests full or incremental block migration too, = which then covers all drives. =20 Due to the special nature of pflash drives, the following scenario ma= kes sense as well: - no full nor incremental block migration, covering all drives, along= side the base migration (justified eg. by shared storage for "normal" (b= ig) drives), - non-shared storage for pflash drives. =20 In this case, currently only those portions of the flash drive are up= dated on the target disk that the guest reprograms while running on the tar= get host. =20 In order to restore accord, dump the entire flash contents to the bdr= v in a post_load() callback. =20 - The read-only check follows the other call-sites of pflash_update()= ; - both "pfl->ro" and pflash_update() reflect / consider the case when "pfl->bs" is NULL; - the total size of the flash device is calculated as in pflash_cfi01_realize(). =20 When using shared storage, or requesting full or incremental block migration along with the normal migration, the patch should incur a harmless rewrite from the target side. =20 It is assumed that, on the target host, RAM is loaded ahead of the ca= ll to pflash_post_load(). I don't like very much using IF_PFLASH this way, which is why I hadn't replied to the patch so far---I hadn't made up my mind about *what* to suggest instead, or whether to just accept it. However, it does work. Perhaps a separate "I know what I am doing" skip-snapshot option? Or a device callback saying "not snapshotting this is fine"? Paolo