From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:33079) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YlDUm-0000FP-Rf for qemu-devel@nongnu.org; Thu, 23 Apr 2015 05:32:33 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YlDUf-00008i-VJ for qemu-devel@nongnu.org; Thu, 23 Apr 2015 05:32:32 -0400 Received: from mx1.redhat.com ([209.132.183.28]:59981) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YlDUf-00008d-O5 for qemu-devel@nongnu.org; Thu, 23 Apr 2015 05:32:25 -0400 Date: Thu, 23 Apr 2015 11:32:23 +0200 From: Kevin Wolf Message-ID: <20150423093223.GC5289@noname.redhat.com> References: <1426069701-1405-1-git-send-email-den@openvz.org> <1426069701-1405-9-git-send-email-den@openvz.org> <20150422130855.GI27617@stefanha-thinkpad.redhat.com> <55379F36.3060906@openvz.org> <20150423092031.GC8811@stefanha-thinkpad.redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="fdj2RfSjLxBAspz7" Content-Disposition: inline In-Reply-To: <20150423092031.GC8811@stefanha-thinkpad.redhat.com> Subject: Re: [Qemu-devel] [PATCH 08/27] block/parallels: _co_writev callback for Parallels format List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefan Hajnoczi Cc: "Denis V. Lunev" , qemu-devel@nongnu.org, Stefan Hajnoczi --fdj2RfSjLxBAspz7 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Am 23.04.2015 um 11:20 hat Stefan Hajnoczi geschrieben: > On Wed, Apr 22, 2015 at 04:16:38PM +0300, Denis V. Lunev wrote: > > On 22/04/15 16:08, Stefan Hajnoczi wrote: > > >On Wed, Mar 11, 2015 at 01:28:02PM +0300, Denis V. Lunev wrote: > > >>+static int64_t allocate_cluster(BlockDriverState *bs, int64_t sector= _num) > > >>+{ > > >>+ BDRVParallelsState *s =3D bs->opaque; > > >>+ uint32_t idx, offset, tmp; > > >>+ int64_t pos; > > >>+ int ret; > > >>+ > > >>+ idx =3D sector_num / s->tracks; > > >>+ offset =3D sector_num % s->tracks; > > >>+ > > >>+ if (idx >=3D s->catalog_size) { > > >>+ return -EINVAL; > > >>+ } > > >>+ if (s->catalog_bitmap[idx] !=3D 0) { > > >>+ return (uint64_t)s->catalog_bitmap[idx] * s->off_multiplier = + offset; > > >>+ } > > >>+ > > >>+ pos =3D bdrv_getlength(bs->file) >> BDRV_SECTOR_BITS; > > >>+ bdrv_truncate(bs->file, (pos + s->tracks) << BDRV_SECTOR_BITS); > > >>+ s->catalog_bitmap[idx] =3D pos / s->off_multiplier; > > >>+ > > >>+ tmp =3D cpu_to_le32(s->catalog_bitmap[idx]); > > >>+ > > >>+ ret =3D bdrv_pwrite_sync(bs->file, > > >>+ sizeof(ParallelsHeader) + idx * sizeof(tmp), &tmp, sizeo= f(tmp)); > > >What is the purpose of the sync? > > This is necessary to preserve image consistency on crash from > > my point of view. There is no check consistency at the moment. > > The sync will be removed later when proper crash detection > > code will be added (patches 19, 20, 21) >=20 > Let's look at possible orderings in case of failure: >=20 > A. BAT update > B. Data write >=20 > This sync enforces A, B ordering. If we can see B, then A must also > have happened thanks to the sync. >=20 > But A, B ordering is too conservative. Imagine B, A ordering and the > failure where we crash before A. It means we wrote the data but never > linked it into the BAT. >=20 > What happens in that case? We've leaked a cluster in the underlying > image file but it doesn't corrupt the visible disk from the guest > point-of-view. >=20 > Because your implementation uses truncate to extend the file size before > A, even the A, B failure case results in a leaked cluster. So the B, A > case is not worse in any way. >=20 > Why do other image formats sync cluster allocation updates? Because > they support backing files and in that case an A, B ordering results in > data corruption so they enforce B, A ordering (the opposite of what > you're trying to do!). >=20 > The reason why A, B ordering results in data corruption when backing > files are in use is because the guest's write request might touch only a > subset of the cluster (a couple of sectors out of the whole cluster). > So the guest needs to copy the remaining sectors from the backing file. > If there is a dangling BAT entry like in the A, B failure case, then the > guest will see a zeroed cluster instead of the contents of the backing > file. This is a data corruption, but only if a backing file is being > used! >=20 > So the sync is not necessary, both A, B and B, A ordering work for > block/parallels.c. Actually, I suspect this means that the parallels driver is restricted to protocols with bdrv_has_zero_init() =3D=3D true, otherwise zeros can turn into random data (which means that it can't work e.g. directly on host block devices). Do we enforce this? Kevin --fdj2RfSjLxBAspz7 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAEBAgAGBQJVOLwnAAoJEH8JsnLIjy/WIVoQAL5J3wonw3QRTofApob3AxeW PbRHxUY+mjKDQKfD/DuyckLAdKdK/1w0phDxXELlE/OotVSHSKCw6lAmCh/abdMV yBtCobsjXWFhSHy6KkuiXXi2xTEeufEBZ1ydPrzBT5/pIm8DysI5w78e3GKtcCnT VWa48D+mbSY06HFdGG5aEwDfmj/cDv9V9hUQZ+/JoTAoQaPknpVwnFxf9Qmunthf orW/535ggjmeYTNNLMIGLivFdRoBkSZsI4Nut1PccrdSpTdn3Z2iGXXFXe6Z9vC2 tbU7qE2GSm95sREsIZ40wi4ftBXjtmSB/r1QRJImHpV3aaT/oFRFYogTqMijFi9j lFGczPr+R07wFxT29QcfDHW7ANNn7JfkXtlUe8U3M1UZSlnVzNK2pQXupFcI+z3T XzQHGYbbbvy5A6IVRcBK5sZsw4quJ2nfEtzaUuuSScjsK9ci4lQrd0wJZjqlgOLj TOpKiWFW1+R4+MNlohjYXdhtSwiEiYcGjoZ2jVyi5ETf40p+nPBZG7WCdKzd2S1p JdMT8RAq/4NfwCUGcHKiSRBQPInDfaoGy7NY/286B0aBTRCJVeV/AweopbkTY0YD K+AylKf/X/ED5Rgm926knJzfuKakbIJzB5em2+8YGVZ7t07PEpSJaYjLeZgdqIg+ kH4Nm14IM0TCAnCIkY7u =kAdJ -----END PGP SIGNATURE----- --fdj2RfSjLxBAspz7--