From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36885) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1c59XQ-00075x-Qs for qemu-devel@nongnu.org; Fri, 11 Nov 2016 05:58:30 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1c59XM-0005CD-3S for qemu-devel@nongnu.org; Fri, 11 Nov 2016 05:58:28 -0500 Received: from mx-v6.kamp.de ([2a02:248:0:51::16]:51147 helo=mx01.kamp.de) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1c59XL-0005BY-Pd for qemu-devel@nongnu.org; Fri, 11 Nov 2016 05:58:24 -0500 Message-ID: <5825A449.4020807@kamp.de> Date: Fri, 11 Nov 2016 11:58:17 +0100 From: Peter Lieven MIME-Version: 1.0 References: <1478819426-7564-1-git-send-email-eblake@redhat.com> <1478819426-7564-3-git-send-email-eblake@redhat.com> In-Reply-To: <1478819426-7564-3-git-send-email-eblake@redhat.com> Content-Type: text/plain; charset=iso-8859-15 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [Qemu-stable] [PATCH 2/2] block: Pass unaligned discard requests to drivers List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eric Blake , qemu-devel@nongnu.org Cc: Kevin Wolf , qemu-block@nongnu.org, qemu-stable@nongnu.org, Max Reitz , Stefan Hajnoczi , pbonzini@redhat.com Am 11.11.2016 um 00:10 schrieb Eric Blake: > Discard is advisory, so rounding the requests to alignment > boundaries is never semantically wrong from the data that > the guest sees. But at least the Dell Equallogic iSCSI SANs > has an interesting property that its advertised discard > alignment is 15M, yet documents that discarding a sequence > of 1M slices will eventually result in the 15M page being > marked as discarded, and it is possible to observe which > pages have been discarded. > > Between commits 9f1963b and b8d0a980, we converted the block > layer to a byte-based interface that ultimately ignores any > unaligned head or tail based on the driver's advertised > discard granularity, which means that qemu 2.7 refuses to > pass any discard request smaller than 15M down to the Dell > Equallogic hardware. This is a slight regression in behavior > compared to earlier qemu, where a guest executing discards > in power-of-2 chunks used to be able to get every page > discarded, but is now left with various pages still allocated > because the guest requests did not align with the hardware's > 15M pages. > > Since the SCSI specification says nothing about a minimum > discard granularity, and only documents the preferred > alignment, it is best if the block layer gives the driver > every bit of information about discard requests, rather than > rounding it to alignment boundaries early. > > Rework the block layer discard algorithm to mirror the write > zero algorithm: always peel off any unaligned head or tail > and manage that in isolation, then do the bulk of the request > on an aligned boundary. The fallback when the driver returns > -ENOTSUP for an unaligned request is to silently ignore that > portion of the discard request; but for devices that can pass > the partial request all the way down to hardware, this can > result in the hardware coalescing requests and discarding > aligned pages after all. > > Reported by: Peter Lieven > Signed-off-by: Eric Blake > CC: qemu-stable@nongnu.org > --- > block/io.c | 36 +++++++++++++++++++++++------------- > 1 file changed, 23 insertions(+), 13 deletions(-) > > diff --git a/block/io.c b/block/io.c > index aa532a5..76c3030 100644 > --- a/block/io.c > +++ b/block/io.c > @@ -2421,7 +2421,7 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset, > { > BdrvTrackedRequest req; > int max_pdiscard, ret; > - int head, align; > + int head, tail, align; > > if (!bs->drv) { > return -ENOMEDIUM; > @@ -2444,19 +2444,15 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset, > return 0; > } > > - /* Discard is advisory, so ignore any unaligned head or tail */ > + /* Discard is advisory, but some devices track and coalesce > + * unaligned requests, so we must pass everything down rather than > + * round here. Still, most devices will just silently ignore > + * unaligned requests (by returning -ENOTSUP), so we must fragment > + * the request accordingly. */ > align = MAX(bs->bl.pdiscard_alignment, bs->bl.request_alignment); > assert(align % bs->bl.request_alignment == 0); > head = offset % align; > - if (head) { > - head = MIN(count, align - head); > - count -= head; > - offset += head; > - } > - count = QEMU_ALIGN_DOWN(count, align); > - if (!count) { > - return 0; > - } > + tail = (offset + count) % align; > > bdrv_inc_in_flight(bs); > tracked_request_begin(&req, bs, offset, count, BDRV_TRACKED_DISCARD); > @@ -2468,11 +2464,25 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset, > > max_pdiscard = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_pdiscard, INT_MAX), > align); > - assert(max_pdiscard); > + assert(max_pdiscard >= bs->bl.request_alignment); > > while (count > 0) { > int ret; > - int num = MIN(count, max_pdiscard); > + int num = count; > + > + if (head) { > + /* Make a small request up to the first aligned sector. */ > + num = MIN(count, align - head); > + head = (head + num) % align; Is there any way that head is != 0 after this? Peter