From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:33079)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1YlDUm-0000FP-Rf
	for qemu-devel@nongnu.org; Thu, 23 Apr 2015 05:32:33 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1YlDUf-00008i-VJ
	for qemu-devel@nongnu.org; Thu, 23 Apr 2015 05:32:32 -0400
Received: from mx1.redhat.com ([209.132.183.28]:59981)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1YlDUf-00008d-O5
	for qemu-devel@nongnu.org; Thu, 23 Apr 2015 05:32:25 -0400
Date: Thu, 23 Apr 2015 11:32:23 +0200
From: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20150423093223.GC5289@noname.redhat.com>
References: <1426069701-1405-1-git-send-email-den@openvz.org>
	<1426069701-1405-9-git-send-email-den@openvz.org>
	<20150422130855.GI27617@stefanha-thinkpad.redhat.com>
	<55379F36.3060906@openvz.org>
	<20150423092031.GC8811@stefanha-thinkpad.redhat.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="fdj2RfSjLxBAspz7"
Content-Disposition: inline
In-Reply-To: <20150423092031.GC8811@stefanha-thinkpad.redhat.com>
Subject: Re: [Qemu-devel] [PATCH 08/27] block/parallels: _co_writev callback
 for Parallels format
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "Denis V. Lunev" <den@openvz.org>, qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@gmail.com>


--fdj2RfSjLxBAspz7
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Am 23.04.2015 um 11:20 hat Stefan Hajnoczi geschrieben:
> On Wed, Apr 22, 2015 at 04:16:38PM +0300, Denis V. Lunev wrote:
> > On 22/04/15 16:08, Stefan Hajnoczi wrote:
> > >On Wed, Mar 11, 2015 at 01:28:02PM +0300, Denis V. Lunev wrote:
> > >>+static int64_t allocate_cluster(BlockDriverState *bs, int64_t sector=
_num)
> > >>+{
> > >>+    BDRVParallelsState *s =3D bs->opaque;
> > >>+    uint32_t idx, offset, tmp;
> > >>+    int64_t pos;
> > >>+    int ret;
> > >>+
> > >>+    idx =3D sector_num / s->tracks;
> > >>+    offset =3D sector_num % s->tracks;
> > >>+
> > >>+    if (idx >=3D s->catalog_size) {
> > >>+        return -EINVAL;
> > >>+    }
> > >>+    if (s->catalog_bitmap[idx] !=3D 0) {
> > >>+        return (uint64_t)s->catalog_bitmap[idx] * s->off_multiplier =
+ offset;
> > >>+    }
> > >>+
> > >>+    pos =3D bdrv_getlength(bs->file) >> BDRV_SECTOR_BITS;
> > >>+    bdrv_truncate(bs->file, (pos + s->tracks) << BDRV_SECTOR_BITS);
> > >>+    s->catalog_bitmap[idx] =3D pos / s->off_multiplier;
> > >>+
> > >>+    tmp =3D cpu_to_le32(s->catalog_bitmap[idx]);
> > >>+
> > >>+    ret =3D bdrv_pwrite_sync(bs->file,
> > >>+            sizeof(ParallelsHeader) + idx * sizeof(tmp), &tmp, sizeo=
f(tmp));
> > >What is the purpose of the sync?
> > This is necessary to preserve image consistency on crash from
> > my point of view. There is no check consistency at the moment.
> > The sync will be removed later when proper crash detection
> > code will be added (patches 19, 20, 21)
>=20
> Let's look at possible orderings in case of failure:
>=20
> A. BAT update
> B. Data write
>=20
> This sync enforces A, B ordering.  If we can see B, then A must also
> have happened thanks to the sync.
>=20
> But A, B ordering is too conservative.  Imagine B, A ordering and the
> failure where we crash before A.  It means we wrote the data but never
> linked it into the BAT.
>=20
> What happens in that case?  We've leaked a cluster in the underlying
> image file but it doesn't corrupt the visible disk from the guest
> point-of-view.
>=20
> Because your implementation uses truncate to extend the file size before
> A, even the A, B failure case results in a leaked cluster.  So the B, A
> case is not worse in any way.
>=20
> Why do other image formats sync cluster allocation updates?  Because
> they support backing files and in that case an A, B ordering results in
> data corruption so they enforce B, A ordering (the opposite of what
> you're trying to do!).
>=20
> The reason why A, B ordering results in data corruption when backing
> files are in use is because the guest's write request might touch only a
> subset of the cluster (a couple of sectors out of the whole cluster).
> So the guest needs to copy the remaining sectors from the backing file.
> If there is a dangling BAT entry like in the A, B failure case, then the
> guest will see a zeroed cluster instead of the contents of the backing
> file.  This is a data corruption, but only if a backing file is being
> used!
>=20
> So the sync is not necessary, both A, B and B, A ordering work for
> block/parallels.c.

Actually, I suspect this means that the parallels driver is restricted
to protocols with bdrv_has_zero_init() =3D=3D true, otherwise zeros can turn
into random data (which means that it can't work e.g. directly on host
block devices).

Do we enforce this?

Kevin

--fdj2RfSjLxBAspz7
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQIcBAEBAgAGBQJVOLwnAAoJEH8JsnLIjy/WIVoQAL5J3wonw3QRTofApob3AxeW
PbRHxUY+mjKDQKfD/DuyckLAdKdK/1w0phDxXELlE/OotVSHSKCw6lAmCh/abdMV
yBtCobsjXWFhSHy6KkuiXXi2xTEeufEBZ1ydPrzBT5/pIm8DysI5w78e3GKtcCnT
VWa48D+mbSY06HFdGG5aEwDfmj/cDv9V9hUQZ+/JoTAoQaPknpVwnFxf9Qmunthf
orW/535ggjmeYTNNLMIGLivFdRoBkSZsI4Nut1PccrdSpTdn3Z2iGXXFXe6Z9vC2
tbU7qE2GSm95sREsIZ40wi4ftBXjtmSB/r1QRJImHpV3aaT/oFRFYogTqMijFi9j
lFGczPr+R07wFxT29QcfDHW7ANNn7JfkXtlUe8U3M1UZSlnVzNK2pQXupFcI+z3T
XzQHGYbbbvy5A6IVRcBK5sZsw4quJ2nfEtzaUuuSScjsK9ci4lQrd0wJZjqlgOLj
TOpKiWFW1+R4+MNlohjYXdhtSwiEiYcGjoZ2jVyi5ETf40p+nPBZG7WCdKzd2S1p
JdMT8RAq/4NfwCUGcHKiSRBQPInDfaoGy7NY/286B0aBTRCJVeV/AweopbkTY0YD
K+AylKf/X/ED5Rgm926knJzfuKakbIJzB5em2+8YGVZ7t07PEpSJaYjLeZgdqIg+
kH4Nm14IM0TCAnCIkY7u
=kAdJ
-----END PGP SIGNATURE-----

--fdj2RfSjLxBAspz7--