qemu-devel.nongnu.org archive mirror
From: Eric Blake <eblake@redhat.com>
To: qemu-devel@nongnu.org
Cc: kwolf@redhat.com, qemu-block@nongnu.org,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Fam Zheng <famz@redhat.com>, Max Reitz <mreitz@redhat.com>,
	Ronnie Sahlberg <ronniesahlberg@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>, Peter Lieven <pl@kamp.de>
Subject: [Qemu-devel] [PATCH v2 02/13] block: Track write zero limits in bytes
Date: Wed,  1 Jun 2016 15:10:02 -0600
Message-ID: <1464815413-613-3-git-send-email-eblake@redhat.com>
In-Reply-To: <1464815413-613-1-git-send-email-eblake@redhat.com>

Another step towards removing sector-based interfaces: convert
the maximum write zeroes size and the write zeroes alignment
limits from sectors to bytes.  Rename the variables to let the
compiler check that all users are converted to the new semantics.

The maximum remains an int as long as BDRV_REQUEST_MAX_SECTORS
is constrained by INT_MAX (this means that we can't quite
support a 2G write_zeroes, only just under it).  Changing
operation lengths to unsigned or to 64 bits is a much bigger
audit, and it is debatable whether we even want to do it (since
at the core, a 32-bit platform will still have ssize_t as its
underlying limit on write()).

Meanwhile, alignment is changed to 'uint32_t', since it makes no
sense to have an alignment larger than the maximum write, and
it is less painful to use an unsigned type with well-defined
behavior in bit operations than to worry about what happens if
a driver mistakenly supplies a negative alignment.

Add an assert that no one was trying to use sectors to get a
write zeroes larger than 2G, and therefore that a later conversion
to bytes won't be impacted by keeping the limit at 32 bits.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/block_int.h | 10 ++++++----
 block/io.c                | 22 +++++++++++++---------
 block/iscsi.c             | 13 ++++++-------
 block/qcow2.c             |  2 +-
 block/qed.c               |  2 +-
 block/vmdk.c              |  6 +++---
 6 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 30a9717..2e9c81f 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -328,11 +328,13 @@ typedef struct BlockLimits {
     /* optimal alignment for discard requests in sectors */
     int64_t discard_alignment;

-    /* maximum number of sectors that can zeroized at once */
-    int max_write_zeroes;
+    /* maximum number of bytes that can be zeroized at once (since it is
+     * signed, it must be < 2G, if set) */
+    int32_t max_pwrite_zeroes;

-    /* optimal alignment for write zeroes requests in sectors */
-    int64_t write_zeroes_alignment;
+    /* optimal alignment for write zeroes requests in bytes, must be
+     * power of 2, and less than max_pwrite_zeroes if that is set */
+    uint32_t pwrite_zeroes_alignment;

     /* optimal transfer length in sectors */
     int opt_transfer_length;
diff --git a/block/io.c b/block/io.c
index 26b5845..108cd35 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1121,15 +1121,19 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
     int head = 0;
     int tail = 0;

-    int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_write_zeroes,
-                                        BDRV_REQUEST_MAX_SECTORS);
-    if (bs->bl.write_zeroes_alignment) {
-        assert(is_power_of_2(bs->bl.write_zeroes_alignment));
-        head = sector_num & (bs->bl.write_zeroes_alignment - 1);
-        tail = (sector_num + nb_sectors) & (bs->bl.write_zeroes_alignment - 1);
-        max_write_zeroes &= ~(bs->bl.write_zeroes_alignment - 1);
+    int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
+    int write_zeroes_sector_align =
+        bs->bl.pwrite_zeroes_alignment >> BDRV_SECTOR_BITS;
+
+    max_write_zeroes >>= BDRV_SECTOR_BITS;
+    if (write_zeroes_sector_align) {
+        assert(is_power_of_2(bs->bl.pwrite_zeroes_alignment));
+        head = sector_num & (write_zeroes_sector_align - 1);
+        tail = (sector_num + nb_sectors) & (write_zeroes_sector_align - 1);
+        max_write_zeroes &= ~(write_zeroes_sector_align - 1);
     }

+    assert(nb_sectors <= BDRV_REQUEST_MAX_SECTORS);
     while (nb_sectors > 0 && !ret) {
         int num = nb_sectors;

@@ -1139,9 +1143,9 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
          */
         if (head) {
             /* Make a small request up to the first aligned sector.  */
-            num = MIN(nb_sectors, bs->bl.write_zeroes_alignment - head);
+            num = MIN(nb_sectors, write_zeroes_sector_align - head);
             head = 0;
-        } else if (tail && num > bs->bl.write_zeroes_alignment) {
+        } else if (tail && num > write_zeroes_sector_align) {
             /* Shorten the request to the last aligned sector.  */
             num -= tail;
         }
diff --git a/block/iscsi.c b/block/iscsi.c
index 94f9974..52ea9d7 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1715,16 +1715,15 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
         bs->bl.discard_alignment = iscsilun->block_size >> BDRV_SECTOR_BITS;
     }

-    if (iscsilun->bl.max_ws_len < 0xffffffff) {
-        bs->bl.max_write_zeroes =
-            sector_limits_lun2qemu(iscsilun->bl.max_ws_len, iscsilun);
+    if (iscsilun->bl.max_ws_len < 0xffffffff / iscsilun->block_size) {
+        bs->bl.max_pwrite_zeroes =
+            iscsilun->bl.max_ws_len * iscsilun->block_size;
     }
     if (iscsilun->lbp.lbpws) {
-        bs->bl.write_zeroes_alignment =
-            sector_limits_lun2qemu(iscsilun->bl.opt_unmap_gran, iscsilun);
+        bs->bl.pwrite_zeroes_alignment =
+            iscsilun->bl.opt_unmap_gran * iscsilun->block_size;
     } else {
-        bs->bl.write_zeroes_alignment =
-            iscsilun->block_size >> BDRV_SECTOR_BITS;
+        bs->bl.pwrite_zeroes_alignment = iscsilun->block_size;
     }
     bs->bl.opt_transfer_length =
         sector_limits_lun2qemu(iscsilun->bl.opt_xfer_len, iscsilun);
diff --git a/block/qcow2.c b/block/qcow2.c
index ecac399..a6ea6cb 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1193,7 +1193,7 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BDRVQcow2State *s = bs->opaque;

-    bs->bl.write_zeroes_alignment = s->cluster_sectors;
+    bs->bl.pwrite_zeroes_alignment = s->cluster_size;
 }

 static int qcow2_set_key(BlockDriverState *bs, const char *key)
diff --git a/block/qed.c b/block/qed.c
index b591d4a..0ab5b40 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -518,7 +518,7 @@ static void bdrv_qed_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BDRVQEDState *s = bs->opaque;

-    bs->bl.write_zeroes_alignment = s->header.cluster_size >> BDRV_SECTOR_BITS;
+    bs->bl.pwrite_zeroes_alignment = s->header.cluster_size;
 }

 /* We have nothing to do for QED reopen, stubs just return
diff --git a/block/vmdk.c b/block/vmdk.c
index 372e5ed..8494d63 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -998,9 +998,9 @@ static void vmdk_refresh_limits(BlockDriverState *bs, Error **errp)

     for (i = 0; i < s->num_extents; i++) {
         if (!s->extents[i].flat) {
-            bs->bl.write_zeroes_alignment =
-                MAX(bs->bl.write_zeroes_alignment,
-                    s->extents[i].cluster_sectors);
+            bs->bl.pwrite_zeroes_alignment =
+                MAX(bs->bl.pwrite_zeroes_alignment,
+                    s->extents[i].cluster_sectors << BDRV_SECTOR_BITS);
         }
     }
 }
-- 
2.5.5
