qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, stefanha@redhat.com, qemu-devel@nongnu.org
Subject: [Qemu-devel] [PULL 4/4] block: Pass unaligned discard requests to drivers
Date: Tue, 22 Nov 2016 17:04:53 +0100	[thread overview]
Message-ID: <1479830693-26676-5-git-send-email-kwolf@redhat.com> (raw)
In-Reply-To: <1479830693-26676-1-git-send-email-kwolf@redhat.com>

From: Eric Blake <eblake@redhat.com>

Discard is advisory, so rounding the requests to alignment
boundaries is never semantically wrong from the data that
the guest sees.  But at least the Dell Equallogic iSCSI SANs
has an interesting property that its advertised discard
alignment is 15M, yet documents that discarding a sequence
of 1M slices will eventually result in the 15M page being
marked as discarded, and it is possible to observe which
pages have been discarded.

Between commits 9f1963b and b8d0a980, we converted the block
layer to a byte-based interface that ultimately ignores any
unaligned head or tail based on the driver's advertised
discard granularity, which means that qemu 2.7 refuses to
pass any discard request smaller than 15M down to the Dell
Equallogic hardware.  This is a slight regression in behavior
compared to earlier qemu, where a guest executing discards
in power-of-2 chunks used to be able to get every page
discarded, but is now left with various pages still allocated
because the guest requests did not align with the hardware's
15M pages.

Since the SCSI specification says nothing about a minimum
discard granularity, and only documents the preferred
alignment, it is best if the block layer gives the driver
every bit of information about discard requests, rather than
rounding it to alignment boundaries early.

Rework the block layer discard algorithm to mirror the write
zero algorithm: always peel off any unaligned head or tail
and manage that in isolation, then do the bulk of the request
on an aligned boundary.  The fallback when the driver returns
-ENOTSUP for an unaligned request is to silently ignore that
portion of the discard request; but for devices that can pass
the partial request all the way down to hardware, this can
result in the hardware coalescing requests and discarding
aligned pages after all.

Reported by: Peter Lieven <pl@kamp.de>
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 45 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/block/io.c b/block/io.c
index 085ac34..4f00562 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2424,7 +2424,7 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset,
 {
     BdrvTrackedRequest req;
     int max_pdiscard, ret;
-    int head, align;
+    int head, tail, align;
 
     if (!bs->drv) {
         return -ENOMEDIUM;
@@ -2447,19 +2447,15 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset,
         return 0;
     }
 
-    /* Discard is advisory, so ignore any unaligned head or tail */
+    /* Discard is advisory, but some devices track and coalesce
+     * unaligned requests, so we must pass everything down rather than
+     * round here.  Still, most devices will just silently ignore
+     * unaligned requests (by returning -ENOTSUP), so we must fragment
+     * the request accordingly.  */
     align = MAX(bs->bl.pdiscard_alignment, bs->bl.request_alignment);
     assert(align % bs->bl.request_alignment == 0);
     head = offset % align;
-    if (head) {
-        head = MIN(count, align - head);
-        count -= head;
-        offset += head;
-    }
-    count = QEMU_ALIGN_DOWN(count, align);
-    if (!count) {
-        return 0;
-    }
+    tail = (offset + count) % align;
 
     bdrv_inc_in_flight(bs);
     tracked_request_begin(&req, bs, offset, count, BDRV_TRACKED_DISCARD);
@@ -2471,11 +2467,34 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset,
 
     max_pdiscard = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_pdiscard, INT_MAX),
                                    align);
-    assert(max_pdiscard);
+    assert(max_pdiscard >= bs->bl.request_alignment);
 
     while (count > 0) {
         int ret;
-        int num = MIN(count, max_pdiscard);
+        int num = count;
+
+        if (head) {
+            /* Make small requests to get to alignment boundaries. */
+            num = MIN(count, align - head);
+            if (!QEMU_IS_ALIGNED(num, bs->bl.request_alignment)) {
+                num %= bs->bl.request_alignment;
+            }
+            head = (head + num) % align;
+            assert(num < max_pdiscard);
+        } else if (tail) {
+            if (num > align) {
+                /* Shorten the request to the last aligned cluster.  */
+                num -= tail;
+            } else if (!QEMU_IS_ALIGNED(tail, bs->bl.request_alignment) &&
+                       tail > bs->bl.request_alignment) {
+                tail %= bs->bl.request_alignment;
+                num -= tail;
+            }
+        }
+        /* limit request size */
+        if (num > max_pdiscard) {
+            num = max_pdiscard;
+        }
 
         if (bs->drv->bdrv_co_pdiscard) {
             ret = bs->drv->bdrv_co_pdiscard(bs, offset, num);
-- 
1.8.3.1

  parent reply	other threads:[~2016-11-22 16:05 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-22 16:04 [Qemu-devel] [PULL 0/4] Block layer patches for 2.8.0-rc1 Kevin Wolf
2016-11-22 16:04 ` [Qemu-devel] [PULL 1/4] qcow2: Inform block layer about discard boundaries Kevin Wolf
2016-11-22 16:04 ` [Qemu-devel] [PULL 2/4] block: Let write zeroes fallback work even with small max_transfer Kevin Wolf
2016-11-22 16:04 ` [Qemu-devel] [PULL 3/4] block: Return -ENOTSUP rather than assert on unaligned discards Kevin Wolf
2016-11-22 16:04 ` Kevin Wolf [this message]
2016-11-22 19:30 ` [Qemu-devel] [Qemu-block] [PULL 0/4] Block layer patches for 2.8.0-rc1 Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1479830693-26676-5-git-send-email-kwolf@redhat.com \
    --to=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).