qemu-devel.nongnu.org archive mirror
From: Paolo Bonzini <pbonzini@redhat.com>
To: qemu-devel@nongnu.org
Subject: [Qemu-devel] [PULL 18/25] block: Cater to iscsi with non-power-of-2 discard
Date: Tue,  2 Aug 2016 21:39:28 +0200
Message-ID: <1470166775-3671-19-git-send-email-pbonzini@redhat.com>
In-Reply-To: <1470166775-3671-1-git-send-email-pbonzini@redhat.com>

From: Eric Blake <eblake@redhat.com>

Dell Equallogic iSCSI SANs have a very unusual advertised geometry:

$ iscsi-inq -e 1 -c $((0xb0)) iscsi://XXX/0
wsnz:0
maximum compare and write length:1
optimal transfer length granularity:0
maximum transfer length:0
optimal transfer length:0
maximum prefetch xdread xdwrite transfer length:0
maximum unmap lba count:30720
maximum unmap block descriptor count:2
optimal unmap granularity:30720
ugavalid:1
unmap granularity alignment:0
maximum write same length:30720

which says that both the maximum and the optimal discard sizes
are 15M.  It is not immediately apparent if the device allows
discard requests not aligned to the optimal size, nor if it
allows discards at a finer granularity than the optimal size.
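
The 15M figure can be checked with a little arithmetic. The sketch
below is illustrative only and is not part of the patch; it assumes
512-byte logical blocks, which the inquiry output above does not
state, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 512-byte logical blocks (not shown in the inquiry output). */
#define ASSUMED_BLOCK_SIZE 512u

/* Hypothetical helper: convert a granularity in blocks to bytes. */
static uint64_t granularity_bytes(uint64_t blocks, uint32_t block_size)
{
    return blocks * block_size;
}

/* True only if v has exactly one bit set. */
static bool is_pow2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}
```

With the advertised value, granularity_bytes(30720, ASSUMED_BLOCK_SIZE)
is 15728640 bytes, i.e. 15 * 1024 * 1024, and is_pow2() of that value
is false.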

I tried to find details in the SCSI Commands Reference Manual
Rev. A on what valid values of maximum and optimal sizes are
permitted, but while that document mentions a "Block Limits
VPD Page", I couldn't actually find documentation of that page
or what values it would have, or if a SCSI device has an
advertisement of its minimal unmap granularity.  So it is not
obvious to me whether the Dell Equallogic device is compliant
with the SCSI specification.

Fortunately, it is easy enough to support non-power-of-2 sizing,
even if it means we are less efficient than truly possible when
targeting that device (for example, it means that we refuse to
unmap anything that is not a multiple of 15M and aligned to a
15M boundary, even if the device truly does support a smaller
granularity where unmapping actually works).
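
To illustrate why the bit-mask arithmetic has to go, here is a
minimal standalone sketch. It is not the code in block/io.c:
AlignedRange, head_by_mask() and trim_to_alignment() are hypothetical
names used only for this example. It contrasts the mask-based
remainder with a modulo-based one, and trims a request down to the
aligned part that would actually be discarded.

```c
#include <stdint.h>

#define MIB (1024 * 1024LL)

/* Hypothetical result type: the aligned sub-range of a request. */
typedef struct {
    int64_t offset;
    int64_t count;
} AlignedRange;

/* Mask-based remainder: only correct when align is a power of 2. */
static int64_t head_by_mask(int64_t offset, int64_t align)
{
    return offset & (align - 1);
}

/* Modulo-based trimming: works for any positive alignment. */
static AlignedRange trim_to_alignment(int64_t offset, int64_t count,
                                      int64_t align)
{
    AlignedRange r = { offset, count };
    int64_t head = offset % align;

    if (head) {
        /* Skip forward to the next aligned boundary, or to the end. */
        head = align - head;
        if (head > r.count) {
            head = r.count;
        }
        r.offset += head;
        r.count -= head;
    }
    /* If anything is left, r.offset is aligned; drop the unaligned tail. */
    r.count -= r.count % align;
    return r;
}
```

With a 15M alignment, head_by_mask(16 * MIB, 15 * MIB) yields 0 even
though 16M is 1M past a 15M boundary, which is the kind of silent
miscalculation the modulo form avoids. A request at offset 1M of
length 40M trims to the single aligned chunk at offset 15M of length
15M, and a 14M request at offset 0 trims to nothing.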

Reported-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1469129688-22848-5-git-send-email-eblake@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/io.c                | 15 +++++++++------
 include/block/block_int.h | 37 ++++++++++++++++++++-----------------
 2 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/block/io.c b/block/io.c
index 7323f0f..d5493ba 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1180,10 +1180,11 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
     int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
                         bs->bl.request_alignment);
 
-    assert(is_power_of_2(alignment));
-    head = offset & (alignment - 1);
-    tail = (offset + count) & (alignment - 1);
-    max_write_zeroes &= ~(alignment - 1);
+    assert(alignment % bs->bl.request_alignment == 0);
+    head = offset % alignment;
+    tail = (offset + count) % alignment;
+    max_write_zeroes = QEMU_ALIGN_DOWN(max_write_zeroes, alignment);
+    assert(max_write_zeroes >= bs->bl.request_alignment);
 
     while (count > 0 && !ret) {
         int num = count;
@@ -2429,9 +2430,10 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset,
 
     /* Discard is advisory, so ignore any unaligned head or tail */
     align = MAX(bs->bl.pdiscard_alignment, bs->bl.request_alignment);
-    assert(is_power_of_2(align));
-    head = MIN(count, -offset & (align - 1));
+    assert(align % bs->bl.request_alignment == 0);
+    head = offset % align;
     if (head) {
+        head = MIN(count, align - head);
         count -= head;
         offset += head;
     }
@@ -2449,6 +2451,7 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset,
 
     max_pdiscard = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_pdiscard, INT_MAX),
                                    align);
+    assert(max_pdiscard);
 
     while (count > 0) {
         int ret;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 1fe0fd9..47665be 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -330,36 +330,39 @@ typedef struct BlockLimits {
      * otherwise. */
     uint32_t request_alignment;
 
-    /* maximum number of bytes that can be discarded at once (since it
-     * is signed, it must be < 2G, if set), should be multiple of
+    /* Maximum number of bytes that can be discarded at once (since it
+     * is signed, it must be < 2G, if set). Must be multiple of
      * pdiscard_alignment, but need not be power of 2. May be 0 if no
      * inherent 32-bit limit */
     int32_t max_pdiscard;
 
-    /* optimal alignment for discard requests in bytes, must be power
-     * of 2, less than max_pdiscard if that is set, and multiple of
-     * bl.request_alignment. May be 0 if bl.request_alignment is good
-     * enough */
+    /* Optimal alignment for discard requests in bytes. A power of 2
+     * is best but not mandatory.  Must be a multiple of
+     * bl.request_alignment, and must be less than max_pdiscard if
+     * that is set. May be 0 if bl.request_alignment is good enough */
     uint32_t pdiscard_alignment;
 
-    /* maximum number of bytes that can zeroized at once (since it is
-     * signed, it must be < 2G, if set), should be multiple of
+    /* Maximum number of bytes that can zeroized at once (since it is
+     * signed, it must be < 2G, if set). Must be multiple of
      * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
     int32_t max_pwrite_zeroes;
 
-    /* optimal alignment for write zeroes requests in bytes, must be
-     * power of 2, less than max_pwrite_zeroes if that is set, and
-     * multiple of bl.request_alignment. May be 0 if
-     * bl.request_alignment is good enough */
+    /* Optimal alignment for write zeroes requests in bytes. A power
+     * of 2 is best but not mandatory.  Must be a multiple of
+     * bl.request_alignment, and must be less than max_pwrite_zeroes
+     * if that is set. May be 0 if bl.request_alignment is good
+     * enough */
     uint32_t pwrite_zeroes_alignment;
 
-    /* optimal transfer length in bytes (must be power of 2, and
-     * multiple of bl.request_alignment), or 0 if no preferred size */
+    /* Optimal transfer length in bytes.  A power of 2 is best but not
+     * mandatory.  Must be a multiple of bl.request_alignment, or 0 if
+     * no preferred size */
     uint32_t opt_transfer;
 
-    /* maximal transfer length in bytes (need not be power of 2, but
-     * should be multiple of opt_transfer), or 0 for no 32-bit limit.
-     * For now, anything larger than INT_MAX is clamped down. */
+    /* Maximal transfer length in bytes.  Need not be power of 2, but
+     * must be multiple of opt_transfer and bl.request_alignment, or 0
+     * for no 32-bit limit.  For now, anything larger than INT_MAX is
+     * clamped down. */
     uint32_t max_transfer;
 
     /* memory alignment, in bytes so that no bounce buffer is needed */
-- 
2.7.4
