qemu-devel.nongnu.org archive mirror
From: Eric Blake <eblake@redhat.com>
To: qemu-devel@nongnu.org
Cc: qemu-block@nongnu.org, kwolf@redhat.com, pbonzini@redhat.com,
	qemu-stable@nongnu.org, "Denis V . Lunev" <den@openvz.org>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Fam Zheng <famz@redhat.com>, Max Reitz <mreitz@redhat.com>
Subject: [Qemu-devel] [PATCH v3 3/9] block: Let write zeroes fallback work even with small max_transfer
Date: Thu, 17 Nov 2016 14:13:56 -0600
Message-ID: <1479413642-22463-4-git-send-email-eblake@redhat.com>
In-Reply-To: <1479413642-22463-1-git-send-email-eblake@redhat.com>

Commit 443668ca rewrote the write_zeroes logic to guarantee that
an unaligned request never crosses a cluster boundary.  But the
rewritten code assumed that a single iteration would always be
enough to reach an alignment boundary.

However, that assumption is easy to violate, triggering an assertion
failure: the Linux kernel limits loopback devices to advertising a
max_transfer of only 64k.  Any operation that must fall back to
writes rather than more efficient zeroing has to obey max_transfer
during that fallback.  So when a format with clusters larger than
64k is layered atop file access to a loopback device, an unaligned
head may require multiple iterations of the write fallback before
reaching an aligned boundary.

Test case:

$ qemu-img create -f qcow2 -o cluster_size=1M file 10M
$ losetup /dev/loop2 /path/to/file
$ qemu-io -f qcow2 /dev/loop2
qemu-io> w 7m 1k
qemu-io> w -z 8003584 2093056
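Working through the numbers of this test case, assuming the zeroing
alignment is the 1M qcow2 cluster size (1048576) and max_transfer is the
64k loopback limit (65536), shows why the head cannot be covered in one
iteration:

```latex
\begin{aligned}
\mathrm{head} &= 8003584 \bmod 1048576 = 663552 \\
\mathrm{alignment} - \mathrm{head} &= 1048576 - 663552 = 385024 \\
\left\lceil 385024 / 65536 \right\rceil &= 6
  \quad (5 \times 65536 + 57344 = 385024) \\
\mathrm{tail} &= (8003584 + 2093056) \bmod 1048576 = 659456
\end{aligned}
```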

In fairness to Denis (the listed author of the culprit commit),
the faulty at-most-one-iteration logic is probably entirely my
fault, introduced while reworking his idea.  Either way, the
solution is to restore what was in place prior to that commit:
when dealing with an unaligned head or tail, iterate as many times
as necessary, fragmenting the operation at max_transfer boundaries.

Reported-by: Ed Swierk <eswierk@skyportsystems.com>
CC: qemu-stable@nongnu.org
CC: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/io.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index aa532a5..085ac34 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1214,6 +1214,8 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
     int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
     int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
                         bs->bl.request_alignment);
+    int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
+                                    MAX_WRITE_ZEROES_BOUNCE_BUFFER);

     assert(alignment % bs->bl.request_alignment == 0);
     head = offset % alignment;
@@ -1229,9 +1231,12 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
          * boundaries.
          */
         if (head) {
-            /* Make a small request up to the first aligned sector.  */
-            num = MIN(count, alignment - head);
-            head = 0;
+            /* Make a small request up to the first aligned sector. For
+             * convenience, limit this request to max_transfer even if
+             * we don't need to fall back to writes.  */
+            num = MIN(MIN(count, max_transfer), alignment - head);
+            head = (head + num) % alignment;
+            assert(num < max_write_zeroes);
         } else if (tail && num > alignment) {
             /* Shorten the request to the last aligned sector.  */
             num -= tail;
@@ -1257,8 +1262,6 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,

         if (ret == -ENOTSUP) {
             /* Fall back to bounce buffer if write zeroes is unsupported */
-            int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
-                                            MAX_WRITE_ZEROES_BOUNCE_BUFFER);
             BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;

             if ((flags & BDRV_REQ_FUA) &&
-- 
2.7.4


Thread overview: 47+ messages
2016-11-17 20:13 [Qemu-devel] [PATCH v2 for-2.8* 0/9] Fix block regressions, add blkdebug tests Eric Blake
2016-11-17 20:13 ` [Qemu-devel] [PATCH v2 1/9] nbd: Allow unmap and fua during write zeroes Eric Blake
2016-11-17 21:10   ` Max Reitz
2016-11-17 21:14     ` Eric Blake
2016-11-18 14:12     ` Paolo Bonzini
2016-11-17 20:13 ` [Qemu-devel] [PATCH v2 2/9] qcow2: Inform block layer about discard boundaries Eric Blake
2016-11-17 21:24   ` Max Reitz
2016-11-17 20:13 ` Eric Blake [this message]
2016-11-17 21:40   ` [Qemu-devel] [PATCH v3 3/9] block: Let write zeroes fallback work even with small max_transfer Max Reitz
2016-11-22 13:16   ` Kevin Wolf
2016-11-22 13:22     ` Eric Blake
2016-11-22 13:30       ` Kevin Wolf
2016-11-17 20:13 ` [Qemu-devel] [PATCH v2 4/9] block: Return -ENOTSUP rather than assert on unaligned discards Eric Blake
2016-11-17 22:01   ` Max Reitz
2016-11-18 22:48     ` Eric Blake
2016-11-17 20:13 ` [Qemu-devel] [PATCH v2 5/9] block: Pass unaligned discard requests to drivers Eric Blake
2016-11-17 22:26   ` Max Reitz
2016-11-17 23:01     ` Eric Blake
2016-11-17 23:03       ` Max Reitz
2016-11-17 23:44   ` Max Reitz
2016-11-18  1:13     ` Eric Blake
2016-11-19 22:05       ` Max Reitz
2016-11-21 13:39         ` Peter Lieven
2016-11-22 14:03   ` Kevin Wolf
2016-11-22 14:13     ` Eric Blake
2016-11-22 14:56       ` Eric Blake
2016-11-17 20:13 ` [Qemu-devel] [PATCH v2 6/9] blkdebug: Sanity check block layer guarantees Eric Blake
2016-11-17 22:36   ` Max Reitz
2016-11-17 20:14 ` [Qemu-devel] [PATCH v2 7/9] blkdebug: Add pass-through write_zero and discard support Eric Blake
2016-11-17 22:47   ` Max Reitz
2016-11-18 23:08     ` Eric Blake
2016-11-17 23:27   ` Max Reitz
2016-11-18  1:17     ` Eric Blake
2016-11-17 20:14 ` [Qemu-devel] [PATCH v2 8/9] blkdebug: Add ability to override unmap geometries Eric Blake
2016-11-17 23:02   ` Max Reitz
2016-11-21 21:11     ` Eric Blake
2016-11-21 21:29       ` Eric Blake
2016-11-17 20:14 ` [Qemu-devel] [PATCH v2 9/9] tests: Add coverage for recent block geometry fixes Eric Blake
2016-11-17 23:19   ` Max Reitz
2016-11-18  1:19     ` Eric Blake
2016-11-17 23:42   ` Max Reitz
2016-11-18  1:28     ` Eric Blake
2016-11-19 21:45       ` Max Reitz
2016-11-19 22:17         ` Max Reitz
2016-11-21 11:38           ` Kevin Wolf
2016-11-21 16:16             ` Eric Blake
2016-11-22 16:05 ` [Qemu-devel] [PATCH v2 for-2.8* 0/9] Fix block regressions, add blkdebug tests Kevin Wolf
