From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:40300) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fakWJ-0003tW-3N for qemu-devel@nongnu.org; Wed, 04 Jul 2018 12:20:44 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fakWH-00005o-AK for qemu-devel@nongnu.org; Wed, 04 Jul 2018 12:20:43 -0400 Date: Wed, 4 Jul 2018 18:20:31 +0200 From: Kevin Wolf Message-ID: <20180704162031.GH4334@localhost.localdomain> References: <20180703180751.243496-1-vsementsov@virtuozzo.com> <20180703180751.243496-2-vsementsov@virtuozzo.com> <0e6e9f14-ad45-51c5-0d14-3e2e3dcaf5cb@virtuozzo.com> <20180704150833.GG4334@localhost.localdomain> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Subject: Re: [Qemu-devel] [PATCH 1/2] block: add BDRV_REQ_SERIALISING flag List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladimir Sementsov-Ogievskiy Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, famz@redhat.com, stefanha@redhat.com, mreitz@redhat.com, jcody@redhat.com, eblake@redhat.com, jsnow@redhat.com, den@openvz.org Am 04.07.2018 um 18:11 hat Vladimir Sementsov-Ogievskiy geschrieben: > 04.07.2018 18:08, Kevin Wolf wrote: > > Am 04.07.2018 um 16:44 hat Vladimir Sementsov-Ogievskiy geschrieben: > > > 03.07.2018 21:07, Vladimir Sementsov-Ogievskiy wrote: > > > > Serialized writes should be used in copy-on-write of backup(sync=none) > > > > for image fleecing scheme. > > > > > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy > > > > --- > > > > include/block/block.h | 5 ++++- > > > > block/io.c | 4 ++++ > > > > 2 files changed, 8 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/include/block/block.h b/include/block/block.h > > > > index e5c7759a0c..107113aad5 100644 > > > > --- a/include/block/block.h > > > > +++ b/include/block/block.h > > > > @@ -58,8 +58,11 @@ typedef enum { > > > > * content. */ > > > > BDRV_REQ_WRITE_UNCHANGED = 0x40, > > > > + /* Force request serializing. Only for writes. */ > > > > + BDRV_REQ_SERIALISING = 0x80, > > > > + > > > > /* Mask of valid flags */ > > > > - BDRV_REQ_MASK = 0x7f, > > > > + BDRV_REQ_MASK = 0xff, > > > > } BdrvRequestFlags; > > > > typedef struct BlockSizes { > > > > diff --git a/block/io.c b/block/io.c > > > > index 1a2272fad3..d5ba078514 100644 > > > > --- a/block/io.c > > > > +++ b/block/io.c > > > > @@ -1572,6 +1572,10 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child, > > > > max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX), > > > > align); > > > > + if (flags & BDRV_REQ_SERIALISING) { > > > > + mark_request_serialising(req, bdrv_get_cluster_size(bs)); > > > > + } > > > > + > > > > waited = wait_serialising_requests(req); > > > > assert(!waited || !req->serialising); > > > Kevin, about this assertion, introduced in 28de2dcd88de "block: Assert > > > serialisation assumptions in pwritev"? Will not it fail with fleecing > > > scheme? I'm afraid it will, when we will wait for client read with our > > > request, marked serializing a moment ago... > > Hm, looks like it yes. > > > > > Can we just switch it to assert(!waited || !req->partial);, setting > > > req->partial in bdrv_co_pwritev for parts of unaligned requests? And allow > > > new flag only for aligned requests? > > > > > > Other ideas? > > The commit message of 28de2dcd88de tells you what we need to do (and > > that just changing the assertion is wrong): > > > > If a request calls wait_serialising_requests() and actually has to wait > > in this function (i.e. a coroutine yield), other requests can run and > > previously read data (like the head or tail buffer) could become > > outdated. In this case, we would have to restart from the beginning to > > read in the updated data. > > > > However, we're lucky and don't actually need to do that: A request can > > only wait in the first call of wait_serialising_requests() because we > > mark it as serialising before that call, so any later requests would > > wait. So as we don't wait in practice, we don't have to reload the data. > > > > This is an important assumption that may not be broken or data > > corruption will happen. Document it with some assertions. > > > > So we may need to return -EAGAIN here, check that in the caller and > > repeat the write request from the very start. > > But in case of aligned request, there no previously read data, and we can > safely continue. And actually it's our case (backup writes are aligned). Hm, right. I don't particularly like req->partial because it's easy to forget to set it to false when you do something that would need to be repeated, but I don't have a better idea. Kevin