From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:55569) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b82Pf-0003bh-0k for qemu-devel@nongnu.org; Wed, 01 Jun 2016 05:26:12 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1b82PZ-0001aU-V6 for qemu-devel@nongnu.org; Wed, 01 Jun 2016 05:26:05 -0400 Received: from mx1.redhat.com ([209.132.183.28]:33578) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b82PZ-0001aI-Mt for qemu-devel@nongnu.org; Wed, 01 Jun 2016 05:26:01 -0400 Date: Wed, 1 Jun 2016 11:25:57 +0200 From: Kevin Wolf Message-ID: <20160601092557.GB5356@noname.str.redhat.com> References: <1463229957-14253-1-git-send-email-den@openvz.org> <1463229957-14253-3-git-send-email-den@openvz.org> <20160527173335.GF27946@stefanha-x1.localdomain> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="1yeeQ81UyVL57Vl7" Content-Disposition: inline In-Reply-To: <20160527173335.GF27946@stefanha-x1.localdomain> Subject: Re: [Qemu-devel] [PATCH 02/10] qcow2: add qcow2_co_write_compressed List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefan Hajnoczi Cc: "Denis V. Lunev" , qemu-devel@nongnu.org, Pavel Butsykin , Jeff Cody , Markus Armbruster , Stefan Hajnoczi , John Snow --1yeeQ81UyVL57Vl7 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Am 27.05.2016 um 19:33 hat Stefan Hajnoczi geschrieben: > On Sat, May 14, 2016 at 03:45:50PM +0300, Denis V. Lunev wrote: > > + qemu_co_mutex_lock(&s->lock); > > + cluster_offset =3D \ > > + qcow2_alloc_compressed_cluster_offset(bs, sector_num << 9, out= _len); >=20 > The backslash isn't necessary for wrapping lines in C. This kind of > thing is only necessary in languages like Python where the grammar is > whitespace sensistive. >=20 > The C compiler is happy with an arbitrary amount of whitespace > (newlines) in the middle of a statement. The backslash in C is handled > by the preprocessor: it joins the line. That's useful for macro > definitions where you need to tell the preprocessor that several lines > belong to one macro definition. But it's not needed for normal C code. >=20 > > + if (!cluster_offset) { > > + qemu_co_mutex_unlock(&s->lock); > > + ret =3D -EIO; > > + goto fail; > > + } > > + cluster_offset &=3D s->cluster_offset_mask; > > =20 > > - BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED); > > - ret =3D bdrv_pwrite(bs->file->bs, cluster_offset, out_buf, out= _len); > > - if (ret < 0) { > > - goto fail; > > - } > > + ret =3D qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_l= en); > > + qemu_co_mutex_unlock(&s->lock); > > + if (ret < 0) { > > + goto fail; > > } > > =20 > > + iov =3D (struct iovec) { > > + .iov_base =3D out_buf, > > + .iov_len =3D out_len, > > + }; > > + qemu_iovec_init_external(&hd_qiov, &iov, 1); > > + > > + BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED); > > + ret =3D bdrv_co_pwritev(bs->file->bs, cluster_offset, out_len, &hd= _qiov, 0); >=20 > There is a race condition here: >=20 > If the newly allocated cluster is only partially filled by compressed > data then qcow2_alloc_compressed_cluster_offset() remembers that more > bytes are still available in the cluster. The > qcow2_alloc_compressed_cluster_offset() caller will continue filling the > same cluster. >=20 > Imagine two compressed writes running at the same time. Write A > allocates just a few bytes so write B shares a sector with the first > write: >=20 > Sector 1 > |AAABBBBBBBBB| >=20 > The race condition is that bdrv_co_pwritev() uses read-modify-write (a > bounce buffer). If both requests call bdrv_co_pwritev() around the same > time then the following could happen: >=20 > Sector 1 > |000BBBBBBBBB| >=20 > or: >=20 > Sector 1 > |AAA000000000| >=20 > It's necessary to hold s->lock around the compressed data write to avoid > this race condition. I don't think this race condition exists. bdrv_co_pwritev() can indeed perform RMW if the lower layer requires alignment, but that's not something callers need to care about (which would be an awful API) but is fully handled there by marking requests serialising if they involve RMW. Kevin --1yeeQ81UyVL57Vl7 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAEBAgAGBQJXTqolAAoJEH8JsnLIjy/WtQEQAKFzMv0V2itiSzpWr/nqtExs gWA/BkbZOc7HRa+vFzMoARn91ae5FJf5YkjeMBpILSho6p2L+64P6Hhs8Owr945H /QcUkIrYkNx72SsJcWbWxRXX+eFdNI1V9WR+IlSuLNcX5aQAdJ7OCdL9F/T8mGNM tA2AMYnW+rl5RnWBWa+SbWpfWCK7BwO3sIEike2pAURNUfw1aMyOtykfYg9sHT2/ /MejUwkS1ewqO6m0DzjwJ6+GSvKAhho2TQzDffcU53949FkGrFE2N9aApGVZcW6z 7gKOslt0MOZzy2VAHxbJyRn56YPS8+UEcAu2XRZDUCrNHGRbsOYoj0/0bCQst+4S A1WMdU18I1fdJ3TPDJlDcHmi9tfwz6oQhE5qBI22KVoys/10/dnA2NK3wBSFgCmc qYuwINFOo+Bybj5iXALIQPKBjjhqZOZAVUwMw0tcQ5SlQioDG6iY71p59Aol7v3C 7C58qEUAIX2bfMoB3V33hQbODEr3b6WsJmznWmuSMc5kYVdwP3l1wgljm+m07/bd OAesLBdiP5SxAzhm5PW1axhAtxuAULk+4GVDwfbjkr24tvV6DRgPxeRYgiKbS3ks so2Te2oyqB1inj6+tPOyms2MzUk4t3BM9Gwm7TjYDehhcsGeaqyR3xEySyf50BPz jgwYMFHlSNPm6iReF+x4 =zXm2 -----END PGP SIGNATURE----- --1yeeQ81UyVL57Vl7--