qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack
@ 2013-11-22 12:39 Paolo Bonzini
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 01/19] block: generalize BlockLimits handling to cover bdrv_aio_discard too Paolo Bonzini
                   ` (19 more replies)
  0 siblings, 20 replies; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

This series fixes the implementation of WRITE SAME in the SCSI emulation
layer.  On top of this, it ensures that zero writes from the guest can
be offloaded to the host device or filesystem if that's supported.
This is a relatively important part of the thin provisioning feature,
and builds on the unmap flag to write_zeroes, recently added by Peter
Lieven.  The feature works with iscsi, block device and filesystem
targets.

v2->v3
        Drop "block/iscsi: use UNMAP to write zeroes if LBPRZ=1" [Peter]

        Patches 3/4: change tracepoint flags to %#x [Stefan]

        Patch 6: clean up conditionals for misaligned num [Stefan]
        Patch 6: fix freeing of bounce buffer [Peter]

        Patch 7: fix bdi->cluster_size for vhdx [Stefan]

        Patch 8: do not check bs->backing_hd to set
        bdi->unallocated_blocks_are_zero [Peter]

        Patch 19: use qemu_blockalign/qemu_vfree [Stefan]

v1->v2
        I cleaned up the series, putting all the block layer changes
        together and all the SCSI changes at the end.

        qcow2 and iscsi are now better covered, which caused the
        introduction of patches 6-11.  I also downloaded the latest
        which contributed some changes in patches 10-11.  It turns
        out we were too cautious in treating LBPRZ as advisory when
        using the UNMAP command.  On the other hand, the API we
        introduced is useful for qcow2/qed/vmdk WRITE SAME, so no
        damage done.


Paolo Bonzini (17):
  block: generalize BlockLimits handling to cover bdrv_aio_discard too
  block: add flags to BlockRequest
  block: add flags argument to bdrv_co_write_zeroes tracepoint
  block: add bdrv_aio_write_zeroes
  block: handle ENOTSUP from discard in generic code
  block: make bdrv_co_do_write_zeroes stricter in producing aligned
    requests
  vpc, vhdx: add get_info
  block drivers: add discard/write_zeroes properties to bdrv_get_info
    implementation
  block drivers: expose requirement for write same alignment from
    formats
  block/iscsi: check WRITE SAME support differently depending on
    MAY_UNMAP
  raw-posix: implement write_zeroes with MAY_UNMAP for files
  raw-posix: implement write_zeroes with MAY_UNMAP for block devices
  raw-posix: add support for write_zeroes on XFS and block devices
  qemu-iotests: 033 is fast
  scsi-disk: catch write protection errors in UNMAP
  scsi-disk: reject ANCHOR=1 for UNMAP and WRITE SAME commands
  scsi-disk: correctly implement WRITE SAME

Peter Lieven (2):
  block/iscsi: remove .bdrv_has_zero_init
  block/iscsi: updated copyright

 block.c                  | 145 +++++++++++++++++++++++----------------
 block/iscsi.c            |  27 +++++---
 block/qcow2.c            |   3 +
 block/qed.c              |   3 +
 block/raw-aio.h          |   3 +-
 block/raw-posix.c        | 175 ++++++++++++++++++++++++++++++++++++++++++++---
 block/vdi.c              |   1 +
 block/vhdx.c             |  13 ++++
 block/vmdk.c             |   4 ++
 block/vpc.c              |  15 ++++
 hw/scsi/scsi-disk.c      | 154 ++++++++++++++++++++++++++++++++++-------
 include/block/block.h    |   4 ++
 tests/qemu-iotests/group |   2 +-
 trace-events             |   4 +-
 14 files changed, 452 insertions(+), 101 deletions(-)

-- 
1.8.4.2

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 01/19] block: generalize BlockLimits handling to cover bdrv_aio_discard too
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
@ 2013-11-22 12:39 ` Paolo Bonzini
  2013-11-25  9:09   ` Peter Lieven
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 02/19] block: add flags to BlockRequest Paolo Bonzini
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

bdrv_co_discard is only covering drivers which have a .bdrv_co_discard()
implementation, but not those with .bdrv_aio_discard(). Not very nice,
and easy to avoid.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c | 80 +++++++++++++++++++++++++++++++++--------------------------------
 1 file changed, 41 insertions(+), 39 deletions(-)

diff --git a/block.c b/block.c
index e22b55f..1b3e8b2 100644
--- a/block.c
+++ b/block.c
@@ -4288,6 +4288,8 @@ static void coroutine_fn bdrv_discard_co_entry(void *opaque)
 int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
                                  int nb_sectors)
 {
+    int max_discard;
+
     if (!bs->drv) {
         return -ENOMEDIUM;
     } else if (bdrv_check_request(bs, sector_num, nb_sectors)) {
@@ -4305,55 +4307,55 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
         return 0;
     }
 
-    if (bs->drv->bdrv_co_discard) {
-        int max_discard = bs->bl.max_discard ?
-                          bs->bl.max_discard : MAX_DISCARD_DEFAULT;
+    if (!bs->drv->bdrv_co_discard && !bs->drv->bdrv_aio_discard) {
+        return 0;
+    }
 
-        while (nb_sectors > 0) {
-            int ret;
-            int num = nb_sectors;
+    max_discard = bs->bl.max_discard ?  bs->bl.max_discard : MAX_DISCARD_DEFAULT;
+    while (nb_sectors > 0) {
+        int ret;
+        int num = nb_sectors;
 
-            /* align request */
-            if (bs->bl.discard_alignment &&
-                num >= bs->bl.discard_alignment &&
-                sector_num % bs->bl.discard_alignment) {
-                if (num > bs->bl.discard_alignment) {
-                    num = bs->bl.discard_alignment;
-                }
-                num -= sector_num % bs->bl.discard_alignment;
+        /* align request */
+        if (bs->bl.discard_alignment &&
+            num >= bs->bl.discard_alignment &&
+            sector_num % bs->bl.discard_alignment) {
+            if (num > bs->bl.discard_alignment) {
+                num = bs->bl.discard_alignment;
             }
+            num -= sector_num % bs->bl.discard_alignment;
+        }
 
-            /* limit request size */
-            if (num > max_discard) {
-                num = max_discard;
-            }
+        /* limit request size */
+        if (num > max_discard) {
+            num = max_discard;
+        }
 
+        if (bs->drv->bdrv_co_discard) {
             ret = bs->drv->bdrv_co_discard(bs, sector_num, num);
-            if (ret) {
-                return ret;
+        } else {
+            BlockDriverAIOCB *acb;
+            CoroutineIOCompletion co = {
+                .coroutine = qemu_coroutine_self(),
+            };
+
+            acb = bs->drv->bdrv_aio_discard(bs, sector_num, nb_sectors,
+                                            bdrv_co_io_em_complete, &co);
+            if (acb == NULL) {
+                return -EIO;
+            } else {
+                qemu_coroutine_yield();
+                ret = co.ret;
             }
-
-            sector_num += num;
-            nb_sectors -= num;
         }
-        return 0;
-    } else if (bs->drv->bdrv_aio_discard) {
-        BlockDriverAIOCB *acb;
-        CoroutineIOCompletion co = {
-            .coroutine = qemu_coroutine_self(),
-        };
-
-        acb = bs->drv->bdrv_aio_discard(bs, sector_num, nb_sectors,
-                                        bdrv_co_io_em_complete, &co);
-        if (acb == NULL) {
-            return -EIO;
-        } else {
-            qemu_coroutine_yield();
-            return co.ret;
+        if (ret) {
+            return ret;
         }
-    } else {
-        return 0;
+
+        sector_num += num;
+        nb_sectors -= num;
     }
+    return 0;
 }
 
 int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 02/19] block: add flags to BlockRequest
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 01/19] block: generalize BlockLimits handling to cover bdrv_aio_discard too Paolo Bonzini
@ 2013-11-22 12:39 ` Paolo Bonzini
  2013-11-25  9:11   ` Peter Lieven
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 03/19] block: add flags argument to bdrv_co_write_zeroes tracepoint Paolo Bonzini
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

This lets bdrv_co_do_rw receive flags, so that it can be used for
zero writes.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c               | 17 +++++++++++------
 include/block/block.h |  1 +
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 1b3e8b2..0665f35 100644
--- a/block.c
+++ b/block.c
@@ -74,6 +74,7 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
                                                int64_t sector_num,
                                                QEMUIOVector *qiov,
                                                int nb_sectors,
+                                               BdrvRequestFlags flags,
                                                BlockDriverCompletionFunc *cb,
                                                void *opaque,
                                                bool is_write);
@@ -3655,7 +3656,7 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
 {
     trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
 
-    return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors,
+    return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors, 0,
                                  cb, opaque, false);
 }
 
@@ -3665,7 +3666,7 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
 {
     trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
 
-    return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors,
+    return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors, 0,
                                  cb, opaque, true);
 }
 
@@ -3837,8 +3838,10 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs)
     /* Run the aio requests. */
     mcb->num_requests = num_reqs;
     for (i = 0; i < num_reqs; i++) {
-        bdrv_aio_writev(bs, reqs[i].sector, reqs[i].qiov,
-            reqs[i].nb_sectors, multiwrite_cb, mcb);
+        bdrv_co_aio_rw_vector(bs, reqs[i].sector, reqs[i].qiov,
+                              reqs[i].nb_sectors, reqs[i].flags,
+                              multiwrite_cb, mcb,
+                              true);
     }
 
     return 0;
@@ -3980,10 +3983,10 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
 
     if (!acb->is_write) {
         acb->req.error = bdrv_co_do_readv(bs, acb->req.sector,
-            acb->req.nb_sectors, acb->req.qiov, 0);
+            acb->req.nb_sectors, acb->req.qiov, acb->req.flags);
     } else {
         acb->req.error = bdrv_co_do_writev(bs, acb->req.sector,
-            acb->req.nb_sectors, acb->req.qiov, 0);
+            acb->req.nb_sectors, acb->req.qiov, acb->req.flags);
     }
 
     acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
@@ -3994,6 +3997,7 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
                                                int64_t sector_num,
                                                QEMUIOVector *qiov,
                                                int nb_sectors,
+                                               BdrvRequestFlags flags,
                                                BlockDriverCompletionFunc *cb,
                                                void *opaque,
                                                bool is_write)
@@ -4005,6 +4009,7 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
     acb->req.sector = sector_num;
     acb->req.nb_sectors = nb_sectors;
     acb->req.qiov = qiov;
+    acb->req.flags = flags;
     acb->is_write = is_write;
     acb->done = NULL;
 
diff --git a/include/block/block.h b/include/block/block.h
index 4d9e67c..703d875 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -311,6 +311,7 @@ typedef struct BlockRequest {
     /* Fields to be filled by multiwrite caller */
     int64_t sector;
     int nb_sectors;
+    int flags;
     QEMUIOVector *qiov;
     BlockDriverCompletionFunc *cb;
     void *opaque;
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 03/19] block: add flags argument to bdrv_co_write_zeroes tracepoint
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 01/19] block: generalize BlockLimits handling to cover bdrv_aio_discard too Paolo Bonzini
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 02/19] block: add flags to BlockRequest Paolo Bonzini
@ 2013-11-22 12:39 ` Paolo Bonzini
  2013-11-25  9:12   ` Peter Lieven
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 04/19] block: add bdrv_aio_write_zeroes Paolo Bonzini
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c      | 2 +-
 trace-events | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 0665f35..cb07e57 100644
--- a/block.c
+++ b/block.c
@@ -2887,7 +2887,7 @@ int coroutine_fn bdrv_co_write_zeroes(BlockDriverState *bs,
                                       int64_t sector_num, int nb_sectors,
                                       BdrvRequestFlags flags)
 {
-    trace_bdrv_co_write_zeroes(bs, sector_num, nb_sectors);
+    trace_bdrv_co_write_zeroes(bs, sector_num, nb_sectors, flags);
 
     if (!(bs->open_flags & BDRV_O_UNMAP)) {
         flags &= ~BDRV_REQ_MAY_UNMAP;
diff --git a/trace-events b/trace-events
index 8695e9e..f159ac9 100644
--- a/trace-events
+++ b/trace-events
@@ -64,7 +64,7 @@ bdrv_lock_medium(void *bs, bool locked) "bs %p locked %d"
 bdrv_co_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
 bdrv_co_copy_on_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
 bdrv_co_writev(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
-bdrv_co_write_zeroes(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
+bdrv_co_write_zeroes(void *bs, int64_t sector_num, int nb_sector, int flags) "bs %p sector_num %"PRId64" nb_sectors %d flags %#x"
 bdrv_co_io_em(void *bs, int64_t sector_num, int nb_sectors, int is_write, void *acb) "bs %p sector_num %"PRId64" nb_sectors %d is_write %d acb %p"
 bdrv_co_do_copy_on_readv(void *bs, int64_t sector_num, int nb_sectors, int64_t cluster_sector_num, int cluster_nb_sectors) "bs %p sector_num %"PRId64" nb_sectors %d cluster_sector_num %"PRId64" cluster_nb_sectors %d"
 
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 04/19] block: add bdrv_aio_write_zeroes
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
                   ` (2 preceding siblings ...)
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 03/19] block: add flags argument to bdrv_co_write_zeroes tracepoint Paolo Bonzini
@ 2013-11-22 12:39 ` Paolo Bonzini
  2013-11-25  9:13   ` Peter Lieven
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 05/19] block: handle ENOTSUP from discard in generic code Paolo Bonzini
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

This will be used by the SCSI layer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c               | 11 +++++++++++
 include/block/block.h |  3 +++
 trace-events          |  1 +
 3 files changed, 15 insertions(+)

diff --git a/block.c b/block.c
index cb07e57..f9674d9 100644
--- a/block.c
+++ b/block.c
@@ -3670,6 +3670,17 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
                                  cb, opaque, true);
 }
 
+BlockDriverAIOCB *bdrv_aio_write_zeroes(BlockDriverState *bs,
+        int64_t sector_num, int nb_sectors, BdrvRequestFlags flags,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    trace_bdrv_aio_write_zeroes(bs, sector_num, nb_sectors, flags, opaque);
+
+    return bdrv_co_aio_rw_vector(bs, sector_num, NULL, nb_sectors,
+                                 BDRV_REQ_ZERO_WRITE | flags,
+                                 cb, opaque, true);
+}
+
 
 typedef struct MultiwriteCB {
     int error;
diff --git a/include/block/block.h b/include/block/block.h
index 703d875..4967ed2 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -216,6 +216,9 @@ int bdrv_write(BlockDriverState *bs, int64_t sector_num,
                const uint8_t *buf, int nb_sectors);
 int bdrv_write_zeroes(BlockDriverState *bs, int64_t sector_num,
                int nb_sectors, BdrvRequestFlags flags);
+BlockDriverAIOCB *bdrv_aio_write_zeroes(BlockDriverState *bs, int64_t sector_num,
+                                        int nb_sectors, BdrvRequestFlags flags,
+                                        BlockDriverCompletionFunc *cb, void *opaque);
 int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags);
 int bdrv_writev(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov);
 int bdrv_pread(BlockDriverState *bs, int64_t offset,
diff --git a/trace-events b/trace-events
index f159ac9..d318d6f 100644
--- a/trace-events
+++ b/trace-events
@@ -60,6 +60,7 @@ bdrv_aio_discard(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs
 bdrv_aio_flush(void *bs, void *opaque) "bs %p opaque %p"
 bdrv_aio_readv(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
 bdrv_aio_writev(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
+bdrv_aio_write_zeroes(void *bs, int64_t sector_num, int nb_sectors, int flags, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d flags %#x opaque %p"
 bdrv_lock_medium(void *bs, bool locked) "bs %p locked %d"
 bdrv_co_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
 bdrv_co_copy_on_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 05/19] block: handle ENOTSUP from discard in generic code
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
                   ` (3 preceding siblings ...)
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 04/19] block: add bdrv_aio_write_zeroes Paolo Bonzini
@ 2013-11-22 12:39 ` Paolo Bonzini
  2013-11-25 10:06   ` Peter Lieven
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 06/19] block: make bdrv_co_do_write_zeroes stricter in producing aligned requests Paolo Bonzini
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

Similar to write_zeroes, let the generic code receive a ENOTSUP for
discard operations.  Since bdrv_discard has advisory semantics,
we can just swallow the error.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c           |  2 +-
 block/raw-posix.c | 12 ++++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index f9674d9..b18ee6b 100644
--- a/block.c
+++ b/block.c
@@ -4364,7 +4364,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
                 ret = co.ret;
             }
         }
-        if (ret) {
+        if (ret && ret != -ENOTSUP) {
             return ret;
         }
 
diff --git a/block/raw-posix.c b/block/raw-posix.c
index f836c8e..cfa3162 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -323,10 +323,10 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
     }
 #endif
 
-    s->has_discard = 1;
+    s->has_discard = true;
 #ifdef CONFIG_XFS
     if (platform_test_xfs_fd(s->fd)) {
-        s->is_xfs = 1;
+        s->is_xfs = true;
     }
 #endif
 
@@ -698,8 +698,8 @@ static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
     int ret = -EOPNOTSUPP;
     BDRVRawState *s = aiocb->bs->opaque;
 
-    if (s->has_discard == 0) {
-        return 0;
+    if (!s->has_discard) {
+        return -ENOTSUP;
     }
 
     if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
@@ -734,8 +734,8 @@ static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
 
     if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
         ret == -ENOTTY) {
-        s->has_discard = 0;
-        ret = 0;
+        s->has_discard = false;
+        ret = -ENOTSUP;
     }
     return ret;
 }
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 06/19] block: make bdrv_co_do_write_zeroes stricter in producing aligned requests
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
                   ` (4 preceding siblings ...)
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 05/19] block: handle ENOTSUP from discard in generic code Paolo Bonzini
@ 2013-11-22 12:39 ` Paolo Bonzini
  2013-11-25 10:23   ` Peter Lieven
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 07/19] vpc, vhdx: add get_info Paolo Bonzini
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

Right now, bdrv_co_do_write_zeroes will only try to align the
beginning of the request.  However, it is simpler for many
formats to expect the block layer to separate both the head *and*
the tail.  This makes sure that the format's bdrv_co_write_zeroes
function will be called with aligned sector_num and nb_sectors for
the bulk of the request.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index b18ee6b..de66b21 100644
--- a/block.c
+++ b/block.c
@@ -2768,14 +2768,21 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
     while (nb_sectors > 0 && !ret) {
         int num = nb_sectors;
 
-        /* align request */
-        if (bs->bl.write_zeroes_alignment &&
-            num >= bs->bl.write_zeroes_alignment &&
-            sector_num % bs->bl.write_zeroes_alignment) {
-            if (num > bs->bl.write_zeroes_alignment) {
+        /* Align request.  Block drivers can expect the "bulk" of the request
+         * to be aligned.
+         */
+        if (bs->bl.write_zeroes_alignment
+            && num > bs->bl.write_zeroes_alignment) {
+            if (sector_num % bs->bl.write_zeroes_alignment != 0) {
+                /* Make a small request up to the first aligned sector.  */
                 num = bs->bl.write_zeroes_alignment;
+                num -= sector_num % bs->bl.write_zeroes_alignment;
+            } else if ((sector_num + num) % bs->bl.write_zeroes_alignment != 0) {
+                /* Shorten the request to the last aligned sector.  num cannot
+                 * underflow because num > bs->bl.write_zeroes_alignment.
+                 */
+                num -= (sector_num + num) % bs->bl.write_zeroes_alignment;
             }
-            num -= sector_num % bs->bl.write_zeroes_alignment;
         }
 
         /* limit request size */
@@ -2793,16 +2800,20 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
             /* Fall back to bounce buffer if write zeroes is unsupported */
             iov.iov_len = num * BDRV_SECTOR_SIZE;
             if (iov.iov_base == NULL) {
-                /* allocate bounce buffer only once and ensure that it
-                 * is big enough for this and all future requests.
-                 */
-                size_t bufsize = num <= nb_sectors ? num : max_write_zeroes;
-                iov.iov_base = qemu_blockalign(bs, bufsize * BDRV_SECTOR_SIZE);
-                memset(iov.iov_base, 0, bufsize * BDRV_SECTOR_SIZE);
+                iov.iov_base = qemu_blockalign(bs, num * BDRV_SECTOR_SIZE);
+                memset(iov.iov_base, 0, num * BDRV_SECTOR_SIZE);
             }
             qemu_iovec_init_external(&qiov, &iov, 1);
 
             ret = drv->bdrv_co_writev(bs, sector_num, num, &qiov);
+
+            /* Keep bounce buffer around if it is big enough for all
+             * all future requests.
+             */
+            if (num < max_write_zeroes) {
+                qemu_vfree(iov.iov_base);
+                iov.iov_base = NULL;
+            }
         }
 
         sector_num += num;
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 07/19] vpc, vhdx: add get_info
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
                   ` (5 preceding siblings ...)
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 06/19] block: make bdrv_co_do_write_zeroes stricter in producing aligned requests Paolo Bonzini
@ 2013-11-22 12:39 ` Paolo Bonzini
  2013-11-25 10:27   ` Peter Lieven
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 08/19] block drivers: add discard/write_zeroes properties to bdrv_get_info implementation Paolo Bonzini
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/vhdx.c | 10 ++++++++++
 block/vpc.c  | 14 ++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/block/vhdx.c b/block/vhdx.c
index 7d1af96..ed6fa53 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1043,6 +1043,15 @@ static void vhdx_block_translate(BDRVVHDXState *s, int64_t sector_num,
 }
 
 
+static int vhdx_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+    BDRVVHDXState *s = bs->opaque;
+
+    bdi->cluster_size = s->block_size;
+
+    return 0;
+}
+
 
 static coroutine_fn int vhdx_co_readv(BlockDriverState *bs, int64_t sector_num,
                                       int nb_sectors, QEMUIOVector *qiov)
@@ -1885,6 +1894,7 @@ static BlockDriver bdrv_vhdx = {
     .bdrv_co_readv          = vhdx_co_readv,
     .bdrv_co_writev         = vhdx_co_writev,
     .bdrv_create            = vhdx_create,
+    .bdrv_get_info          = vhdx_get_info,
 
     .create_options         = vhdx_create_options,
 };
diff --git a/block/vpc.c b/block/vpc.c
index 577cc45..551876f 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -455,6 +455,18 @@ fail:
     return -1;
 }
 
+static int vpc_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+    BDRVVPCState *s = (BDRVVPCState *)bs->opaque;
+    VHDFooter *footer = (VHDFooter *) s->footer_buf;
+
+    if (cpu_to_be32(footer->type) != VHD_FIXED) {
+        bdi->cluster_size = s->block_size;
+    }
+
+    return 0;
+}
+
 static int vpc_read(BlockDriverState *bs, int64_t sector_num,
                     uint8_t *buf, int nb_sectors)
 {
@@ -857,6 +869,8 @@ static BlockDriver bdrv_vpc = {
     .bdrv_read              = vpc_co_read,
     .bdrv_write             = vpc_co_write,
 
+    .bdrv_get_info          = vpc_get_info,
+
     .create_options         = vpc_create_options,
     .bdrv_has_zero_init     = vpc_has_zero_init,
 };
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 08/19] block drivers: add discard/write_zeroes properties to bdrv_get_info implementation
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
                   ` (6 preceding siblings ...)
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 07/19] vpc, vhdx: add get_info Paolo Bonzini
@ 2013-11-22 12:39 ` Paolo Bonzini
  2013-11-25 10:29   ` Peter Lieven
  2013-12-03 15:09   ` Kevin Wolf
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 09/19] block drivers: expose requirement for write same alignment from formats Paolo Bonzini
                   ` (11 subsequent siblings)
  19 siblings, 2 replies; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/qcow2.c | 2 ++
 block/qed.c   | 2 ++
 block/vdi.c   | 1 +
 block/vhdx.c  | 3 +++
 block/vpc.c   | 1 +
 5 files changed, 9 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 2fe37ed..1542750 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1892,6 +1892,8 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
 static int qcow2_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
 {
     BDRVQcowState *s = bs->opaque;
+    bdi->unallocated_blocks_are_zero = true;
+    bdi->can_write_zeroes_with_unmap = (s->qcow_version >= 3);
     bdi->cluster_size = s->cluster_size;
     bdi->vm_state_offset = qcow2_vm_state_offset(s);
     return 0;
diff --git a/block/qed.c b/block/qed.c
index adc2736..59516a5 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1475,6 +1475,8 @@ static int bdrv_qed_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
     memset(bdi, 0, sizeof(*bdi));
     bdi->cluster_size = s->header.cluster_size;
     bdi->is_dirty = s->header.features & QED_F_NEED_CHECK;
+    bdi->unallocated_blocks_are_zero = true;
+    bdi->can_write_zeroes_with_unmap = true;
     return 0;
 }
 
diff --git a/block/vdi.c b/block/vdi.c
index b6ec002..2d7490f 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -331,6 +331,7 @@ static int vdi_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
     logout("\n");
     bdi->cluster_size = s->block_size;
     bdi->vm_state_offset = 0;
+    bdi->unallocated_blocks_are_zero = true;
     return 0;
 }
 
diff --git a/block/vhdx.c b/block/vhdx.c
index ed6fa53..67bbe10 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1049,6 +1049,9 @@ static int vhdx_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
 
     bdi->cluster_size = s->block_size;
 
+    bdi->unallocated_blocks_are_zero =
+        (s->params.data_bits & VHDX_PARAMS_HAS_PARENT) == 0;
+
     return 0;
 }
 
diff --git a/block/vpc.c b/block/vpc.c
index 551876f..1d326cb 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -464,6 +464,7 @@ static int vpc_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
         bdi->cluster_size = s->block_size;
     }
 
+    bdi->unallocated_blocks_are_zero = true;
     return 0;
 }
 
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 09/19] block drivers: expose requirement for write same alignment from formats
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
                   ` (7 preceding siblings ...)
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 08/19] block drivers: add discard/write_zeroes properties to bdrv_get_info implementation Paolo Bonzini
@ 2013-11-22 12:39 ` Paolo Bonzini
  2013-11-25 10:33   ` Peter Lieven
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 10/19] block/iscsi: remove .bdrv_has_zero_init Paolo Bonzini
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

This will let misaligned but large requests use zero clusters.  This
is important because the cluster size is not guest visible.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/qcow2.c | 1 +
 block/qed.c   | 1 +
 block/vmdk.c  | 4 ++++
 3 files changed, 6 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 1542750..81c4dad 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -718,6 +718,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     qemu_opts_del(opts);
+    bs->bl.write_zeroes_alignment = s->cluster_sectors;
 
     if (s->use_lazy_refcounts && s->qcow_version < 3) {
         error_setg(errp, "Lazy refcounts require a qcow2 image with at least "
diff --git a/block/qed.c b/block/qed.c
index 59516a5..450a1fa 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -495,6 +495,7 @@ static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
         }
     }
 
+    bs->bl.write_zeroes_alignment = s->header.cluster_size >> BDRV_SECTOR_BITS;
     s->need_check_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
                                             qed_need_check_timer_cb, s);
 
diff --git a/block/vmdk.c b/block/vmdk.c
index 6555663..63ea7ff 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -428,6 +428,10 @@ static int vmdk_add_extent(BlockDriverState *bs,
     extent->l2_size = l2_size;
     extent->cluster_sectors = flat ? sectors : cluster_sectors;
 
+    if (!flat) {
+        bs->bl.write_zeroes_alignment =
+            MAX(bs->bl.write_zeroes_alignment, cluster_sectors);
+    }
     if (s->num_extents > 1) {
         extent->end_sector = (*(extent - 1)).end_sector + extent->sectors;
     } else {
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 10/19] block/iscsi: remove .bdrv_has_zero_init
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
                   ` (8 preceding siblings ...)
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 09/19] block drivers: expose requirement for write same alignment from formats Paolo Bonzini
@ 2013-11-22 12:39 ` Paolo Bonzini
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 11/19] block/iscsi: updated copyright Paolo Bonzini
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

From: Peter Lieven <pl@kamp.de>

since commit 3ac21627 the default value changed to 0.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/iscsi.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index b7b5238..b6b62aa 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1505,11 +1505,6 @@ static int iscsi_truncate(BlockDriverState *bs, int64_t offset)
     return 0;
 }
 
-static int iscsi_has_zero_init(BlockDriverState *bs)
-{
-    return 0;
-}
-
 static int iscsi_create(const char *filename, QEMUOptionParameter *options,
                         Error **errp)
 {
@@ -1608,8 +1603,6 @@ static BlockDriver bdrv_iscsi = {
     .bdrv_aio_writev = iscsi_aio_writev,
     .bdrv_aio_flush  = iscsi_aio_flush,
 
-    .bdrv_has_zero_init = iscsi_has_zero_init,
-
 #ifdef __linux__
     .bdrv_ioctl       = iscsi_ioctl,
     .bdrv_aio_ioctl   = iscsi_aio_ioctl,
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 11/19] block/iscsi: updated copyright
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
                   ` (9 preceding siblings ...)
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 10/19] block/iscsi: remove .bdrv_has_zero_init Paolo Bonzini
@ 2013-11-22 12:39 ` Paolo Bonzini
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 12/19] block/iscsi: check WRITE SAME support differently depending on MAY_UNMAP Paolo Bonzini
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

From: Peter Lieven <pl@kamp.de>

added myself to reflect recent work on the iscsi block driver.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/iscsi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index b6b62aa..20f4f55 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -2,6 +2,7 @@
  * QEMU Block driver for iSCSI images
  *
  * Copyright (c) 2010-2011 Ronnie Sahlberg <ronniesahlberg@gmail.com>
+ * Copyright (c) 2012-2013 Peter Lieven <pl@kamp.de>
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 12/19] block/iscsi: check WRITE SAME support differently depending on MAY_UNMAP
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
                   ` (10 preceding siblings ...)
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 11/19] block/iscsi: updated copyright Paolo Bonzini
@ 2013-11-22 12:39 ` Paolo Bonzini
  2013-11-25 10:34   ` Peter Lieven
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 13/19] raw-posix: implement write_zeroes with MAY_UNMAP for files Paolo Bonzini
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

The current check is right for MAY_UNMAP=1.  For MAY_UNMAP=0, just
try and fall back to regular writes as soon as a WRITE SAME command
fails.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/iscsi.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 20f4f55..93fee6d 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -55,6 +55,7 @@ typedef struct IscsiLun {
     QEMUTimer *nop_timer;
     uint8_t lbpme;
     uint8_t lbprz;
+    uint8_t has_write_same;
     struct scsi_inquiry_logical_block_provisioning lbp;
     struct scsi_inquiry_block_limits bl;
     unsigned char *zeroblock;
@@ -976,8 +977,13 @@ coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
         return -EINVAL;
     }
 
-    if (!iscsilun->lbp.lbpws) {
-        /* WRITE SAME is not supported by the target */
+    if (!(flags & BDRV_REQ_MAY_UNMAP) && !iscsilun->has_write_same) {
+        /* WRITE SAME without UNMAP is not supported by the target */
+        return -ENOTSUP;
+    }
+
+    if ((flags & BDRV_REQ_MAY_UNMAP) && !iscsilun->lbp.lbpws) {
+        /* WRITE SAME with UNMAP is not supported by the target */
         return -ENOTSUP;
     }
 
@@ -1012,6 +1018,14 @@ retry:
     }
 
     if (iTask.status != SCSI_STATUS_GOOD) {
+        if (iTask.status == SCSI_STATUS_CHECK_CONDITION &&
+            iTask.task->sense.key == SCSI_SENSE_ILLEGAL_REQUEST &&
+            iTask.task->sense.ascq == SCSI_SENSE_ASCQ_INVALID_OPERATION_CODE) {
+            /* WRITE SAME is not supported by the target */
+            iscsilun->has_write_same = false;
+            return -ENOTSUP;
+        }
+
         return -EIO;
     }
 
@@ -1375,6 +1389,7 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     iscsilun->type = inq->periperal_device_type;
+    iscsilun->has_write_same = true;
 
     if ((ret = iscsi_readcapacity_sync(iscsilun)) != 0) {
         goto out;
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 13/19] raw-posix: implement write_zeroes with MAY_UNMAP for files
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
                   ` (11 preceding siblings ...)
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 12/19] block/iscsi: check WRITE SAME support differently depending on MAY_UNMAP Paolo Bonzini
@ 2013-11-22 12:39 ` Paolo Bonzini
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 14/19] raw-posix: implement write_zeroes with MAY_UNMAP for block devices Paolo Bonzini
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

Writing zeroes to a file can be done by punching a hole if
MAY_UNMAP is set.

Note that in this case ENOTSUP is not ignored, but makes
the block layer fall back to the generic implementation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/raw-posix.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 trace-events      |  1 +
 2 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index cfa3162..7f3f47d 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -139,9 +139,10 @@ typedef struct BDRVRawState {
     void *aio_ctx;
 #endif
 #ifdef CONFIG_XFS
-    bool is_xfs : 1;
+    bool is_xfs:1;
 #endif
-    bool has_discard : 1;
+    bool has_discard:1;
+    bool discard_zeroes:1;
 } BDRVRawState;
 
 typedef struct BDRVRawReopenState {
@@ -283,6 +284,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
     Error *local_err = NULL;
     const char *filename;
     int fd, ret;
+    struct stat st;
 
     opts = qemu_opts_create_nofail(&raw_runtime_opts);
     qemu_opts_absorb_qdict(opts, options, &local_err);
@@ -324,6 +326,15 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
 #endif
 
     s->has_discard = true;
+
+    if (fstat(s->fd, &st) < 0) {
+        error_setg_errno(errp, errno, "Could not stat file");
+        goto fail;
+    }
+    if (S_ISREG(st.st_mode)) {
+        s->discard_zeroes = true;
+    }
+
 #ifdef CONFIG_XFS
     if (platform_test_xfs_fd(s->fd)) {
         s->is_xfs = true;
@@ -787,6 +798,29 @@ static int aio_worker(void *arg)
     return ret;
 }
 
+static int paio_submit_co(BlockDriverState *bs, int fd,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        int type)
+{
+    RawPosixAIOData *acb = g_slice_new(RawPosixAIOData);
+    ThreadPool *pool;
+
+    acb->bs = bs;
+    acb->aio_type = type;
+    acb->aio_fildes = fd;
+
+    if (qiov) {
+        acb->aio_iov = qiov->iov;
+        acb->aio_niov = qiov->niov;
+    }
+    acb->aio_nbytes = nb_sectors * 512;
+    acb->aio_offset = sector_num * 512;
+
+    trace_paio_submit_co(sector_num, nb_sectors, type);
+    pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
+    return thread_pool_submit_co(pool, aio_worker, acb);
+}
+
 static BlockDriverAIOCB *paio_submit(BlockDriverState *bs, int fd,
         int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
         BlockDriverCompletionFunc *cb, void *opaque, int type)
@@ -1199,6 +1233,31 @@ static coroutine_fn BlockDriverAIOCB *raw_aio_discard(BlockDriverState *bs,
                        cb, opaque, QEMU_AIO_DISCARD);
 }
 
+static int coroutine_fn raw_co_write_zeroes(
+    BlockDriverState *bs, int64_t sector_num,
+    int nb_sectors, BdrvRequestFlags flags)
+{
+    BDRVRawState *s = bs->opaque;
+
+    if (!(flags & BDRV_REQ_MAY_UNMAP)) {
+        return -ENOTSUP;
+    }
+    if (!s->discard_zeroes) {
+        return -ENOTSUP;
+    }
+    return paio_submit_co(bs, s->fd, sector_num, NULL, nb_sectors,
+                          QEMU_AIO_DISCARD);
+}
+
+static int raw_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+    BDRVRawState *s = bs->opaque;
+
+    bdi->unallocated_blocks_are_zero = s->discard_zeroes;
+    bdi->can_write_zeroes_with_unmap = s->discard_zeroes;
+    return 0;
+}
+
 static QEMUOptionParameter raw_create_options[] = {
     {
         .name = BLOCK_OPT_SIZE,
@@ -1222,6 +1281,7 @@ static BlockDriver bdrv_file = {
     .bdrv_create = raw_create,
     .bdrv_has_zero_init = bdrv_has_zero_init_1,
     .bdrv_co_get_block_status = raw_co_get_block_status,
+    .bdrv_co_write_zeroes = raw_co_write_zeroes,
 
     .bdrv_aio_readv = raw_aio_readv,
     .bdrv_aio_writev = raw_aio_writev,
@@ -1230,6 +1290,7 @@ static BlockDriver bdrv_file = {
 
     .bdrv_truncate = raw_truncate,
     .bdrv_getlength = raw_getlength,
+    .bdrv_get_info = raw_get_info,
     .bdrv_get_allocated_file_size
                         = raw_get_allocated_file_size,
 
@@ -1585,6 +1646,7 @@ static BlockDriver bdrv_host_device = {
 
     .bdrv_truncate      = raw_truncate,
     .bdrv_getlength	= raw_getlength,
+    .bdrv_get_info = raw_get_info,
     .bdrv_get_allocated_file_size
                         = raw_get_allocated_file_size,
 
diff --git a/trace-events b/trace-events
index d318d6f..e32d00c 100644
--- a/trace-events
+++ b/trace-events
@@ -128,6 +128,7 @@ thread_pool_cancel(void *req, void *opaque) "req %p opaque %p"
 
 # block/raw-win32.c
 # block/raw-posix.c
+paio_submit_co(int64_t sector_num, int nb_sectors, int type) "sector_num %"PRId64" nb_sectors %d type %d"
 paio_submit(void *acb, void *opaque, int64_t sector_num, int nb_sectors, int type) "acb %p opaque %p sector_num %"PRId64" nb_sectors %d type %d"
 
 # ioport.c
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 14/19] raw-posix: implement write_zeroes with MAY_UNMAP for block devices
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
                   ` (12 preceding siblings ...)
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 13/19] raw-posix: implement write_zeroes with MAY_UNMAP for files Paolo Bonzini
@ 2013-11-22 12:39 ` Paolo Bonzini
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 15/19] raw-posix: add support for write_zeroes on XFS and " Paolo Bonzini
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

See the next commit for the description of the Linux kernel problem
that is worked around in raw_open_common.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/raw-posix.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 7f3f47d..b3feed6 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -334,6 +334,22 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
     if (S_ISREG(st.st_mode)) {
         s->discard_zeroes = true;
     }
+    if (S_ISBLK(st.st_mode)) {
+#ifdef BLKDISCARDZEROES
+        unsigned int arg;
+        if (ioctl(s->fd, BLKDISCARDZEROES, &arg) == 0 && arg) {
+            s->discard_zeroes = true;
+        }
+#endif
+#ifdef __linux__
+        /* On Linux 3.10, BLKDISCARD leaves stale data in the page cache.  Do
+         * not rely on the contents of discarded blocks unless using O_DIRECT.
+         */
+        if (!(bs->open_flags & BDRV_O_NOCACHE)) {
+            s->discard_zeroes = false;
+        }
+#endif
+    }
 
 #ifdef CONFIG_XFS
     if (platform_test_xfs_fd(s->fd)) {
@@ -1586,6 +1602,26 @@ static coroutine_fn BlockDriverAIOCB *hdev_aio_discard(BlockDriverState *bs,
                        cb, opaque, QEMU_AIO_DISCARD|QEMU_AIO_BLKDEV);
 }
 
+static coroutine_fn int hdev_co_write_zeroes(BlockDriverState *bs,
+    int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
+{
+    BDRVRawState *s = bs->opaque;
+    int rc;
+
+    rc = fd_open(bs);
+    if (rc < 0) {
+        return rc;
+    }
+    if (!(flags & BDRV_REQ_MAY_UNMAP)) {
+        return -ENOTSUP;
+    }
+    if (!s->discard_zeroes) {
+        return -ENOTSUP;
+    }
+    return paio_submit_co(bs, s->fd, sector_num, NULL, nb_sectors,
+                          QEMU_AIO_DISCARD|QEMU_AIO_BLKDEV);
+}
+
 static int hdev_create(const char *filename, QEMUOptionParameter *options,
                        Error **errp)
 {
@@ -1638,6 +1674,7 @@ static BlockDriver bdrv_host_device = {
     .bdrv_reopen_abort   = raw_reopen_abort,
     .bdrv_create        = hdev_create,
     .create_options     = raw_create_options,
+    .bdrv_co_write_zeroes = hdev_co_write_zeroes,
 
     .bdrv_aio_readv	= raw_aio_readv,
     .bdrv_aio_writev	= raw_aio_writev,
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 15/19] raw-posix: add support for write_zeroes on XFS and block devices
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
                   ` (13 preceding siblings ...)
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 14/19] raw-posix: implement write_zeroes with MAY_UNMAP for block devices Paolo Bonzini
@ 2013-11-22 12:39 ` Paolo Bonzini
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 16/19] qemu-iotests: 033 is fast Paolo Bonzini
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

The code is similar to the implementation of discard and write_zeroes
with UNMAP.  However, failure must be propagated up to block.c.

The stale page cache problem can be reproduced as follows:

    # modprobe scsi-debug lbpws=1 lbprz=1
    # ./qemu-io /dev/sdXX
    qemu-io> write -P 0xcc 0 2M
    qemu-io> write -z 0 1M
    qemu-io> read -P 0x00 0 512
    Pattern verification failed at offset 0, 512 bytes
    qemu-io> read -v 0 512
    00000000:  cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
    ...

    # ./qemu-io --cache=none /dev/sdXX
    qemu-io> write -P 0xcc 0 2M
    qemu-io> write -z 0 1M
    qemu-io> read -P 0x00 0 512
    qemu-io> read -v 0 512
    00000000:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    ...

And similarly with discard instead of "write -z".

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/raw-aio.h   |  3 +-
 block/raw-posix.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 74 insertions(+), 13 deletions(-)

diff --git a/block/raw-aio.h b/block/raw-aio.h
index c61f159..7ad0a8a 100644
--- a/block/raw-aio.h
+++ b/block/raw-aio.h
@@ -21,9 +21,10 @@
 #define QEMU_AIO_IOCTL        0x0004
 #define QEMU_AIO_FLUSH        0x0008
 #define QEMU_AIO_DISCARD      0x0010
+#define QEMU_AIO_WRITE_ZEROES 0x0020
 #define QEMU_AIO_TYPE_MASK \
         (QEMU_AIO_READ|QEMU_AIO_WRITE|QEMU_AIO_IOCTL|QEMU_AIO_FLUSH| \
-         QEMU_AIO_DISCARD)
+         QEMU_AIO_DISCARD|QEMU_AIO_WRITE_ZEROES)
 
 /* AIO flags */
 #define QEMU_AIO_MISALIGNED   0x1000
diff --git a/block/raw-posix.c b/block/raw-posix.c
index b3feed6..10c6b34 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -142,6 +142,7 @@ typedef struct BDRVRawState {
     bool is_xfs:1;
 #endif
     bool has_discard:1;
+    bool has_write_zeroes:1;
     bool discard_zeroes:1;
 } BDRVRawState;
 
@@ -326,6 +327,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
 #endif
 
     s->has_discard = true;
+    s->has_write_zeroes = true;
 
     if (fstat(s->fd, &st) < 0) {
         error_setg_errno(errp, errno, "Could not stat file");
@@ -344,9 +346,11 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
 #ifdef __linux__
         /* On Linux 3.10, BLKDISCARD leaves stale data in the page cache.  Do
          * not rely on the contents of discarded blocks unless using O_DIRECT.
+         * Same for BLKZEROOUT.
          */
         if (!(bs->open_flags & BDRV_O_NOCACHE)) {
             s->discard_zeroes = false;
+            s->has_write_zeroes = false;
         }
 #endif
     }
@@ -702,6 +706,23 @@ static ssize_t handle_aiocb_rw(RawPosixAIOData *aiocb)
 }
 
 #ifdef CONFIG_XFS
+static int xfs_write_zeroes(BDRVRawState *s, int64_t offset, uint64_t bytes)
+{
+    struct xfs_flock64 fl;
+
+    memset(&fl, 0, sizeof(fl));
+    fl.l_whence = SEEK_SET;
+    fl.l_start = offset;
+    fl.l_len = bytes;
+
+    if (xfsctl(NULL, s->fd, XFS_IOC_ZERO_RANGE, &fl) < 0) {
+        DEBUG_BLOCK_PRINT("cannot write zero range (%s)\n", strerror(errno));
+        return -errno;
+    }
+
+    return 0;
+}
+
 static int xfs_discard(BDRVRawState *s, int64_t offset, uint64_t bytes)
 {
     struct xfs_flock64 fl;
@@ -720,6 +741,42 @@ static int xfs_discard(BDRVRawState *s, int64_t offset, uint64_t bytes)
 }
 #endif
 
+static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
+{
+    int ret = -EOPNOTSUPP;
+    BDRVRawState *s = aiocb->bs->opaque;
+
+    if (s->has_write_zeroes == 0) {
+        return -ENOTSUP;
+    }
+
+    if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
+#ifdef BLKZEROOUT
+        do {
+            uint64_t range[2] = { aiocb->aio_offset, aiocb->aio_nbytes };
+            if (ioctl(aiocb->aio_fildes, BLKZEROOUT, range) == 0) {
+                return 0;
+            }
+        } while (errno == EINTR);
+
+        ret = -errno;
+#endif
+    } else {
+#ifdef CONFIG_XFS
+        if (s->is_xfs) {
+            return xfs_write_zeroes(s, aiocb->aio_offset, aiocb->aio_nbytes);
+        }
+#endif
+    }
+
+    if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
+        ret == -ENOTTY) {
+        s->has_write_zeroes = false;
+        ret = -ENOTSUP;
+    }
+    return ret;
+}
+
 static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
 {
     int ret = -EOPNOTSUPP;
@@ -804,6 +861,9 @@ static int aio_worker(void *arg)
     case QEMU_AIO_DISCARD:
         ret = handle_aiocb_discard(aiocb);
         break;
+    case QEMU_AIO_WRITE_ZEROES:
+        ret = handle_aiocb_write_zeroes(aiocb);
+        break;
     default:
         fprintf(stderr, "invalid aio request (0x%x)\n", aiocb->aio_type);
         ret = -EINVAL;
@@ -1256,13 +1316,13 @@ static int coroutine_fn raw_co_write_zeroes(
     BDRVRawState *s = bs->opaque;
 
     if (!(flags & BDRV_REQ_MAY_UNMAP)) {
-        return -ENOTSUP;
-    }
-    if (!s->discard_zeroes) {
-        return -ENOTSUP;
+        return paio_submit_co(bs, s->fd, sector_num, NULL, nb_sectors,
+                              QEMU_AIO_WRITE_ZEROES);
+    } else if (s->discard_zeroes) {
+        return paio_submit_co(bs, s->fd, sector_num, NULL, nb_sectors,
+                              QEMU_AIO_DISCARD);
     }
-    return paio_submit_co(bs, s->fd, sector_num, NULL, nb_sectors,
-                          QEMU_AIO_DISCARD);
+    return -ENOTSUP;
 }
 
 static int raw_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
@@ -1613,13 +1673,13 @@ static coroutine_fn int hdev_co_write_zeroes(BlockDriverState *bs,
         return rc;
     }
     if (!(flags & BDRV_REQ_MAY_UNMAP)) {
-        return -ENOTSUP;
-    }
-    if (!s->discard_zeroes) {
-        return -ENOTSUP;
+        return paio_submit_co(bs, s->fd, sector_num, NULL, nb_sectors,
+                              QEMU_AIO_WRITE_ZEROES|QEMU_AIO_BLKDEV);
+    } else if (s->discard_zeroes) {
+        return paio_submit_co(bs, s->fd, sector_num, NULL, nb_sectors,
+                              QEMU_AIO_DISCARD|QEMU_AIO_BLKDEV);
     }
-    return paio_submit_co(bs, s->fd, sector_num, NULL, nb_sectors,
-                          QEMU_AIO_DISCARD|QEMU_AIO_BLKDEV);
+    return -ENOTSUP;
 }
 
 static int hdev_create(const char *filename, QEMUOptionParameter *options,
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 16/19] qemu-iotests: 033 is fast
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
                   ` (14 preceding siblings ...)
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 15/19] raw-posix: add support for write_zeroes on XFS and " Paolo Bonzini
@ 2013-11-22 12:39 ` Paolo Bonzini
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 17/19] scsi-disk: catch write protection errors in UNMAP Paolo Bonzini
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/qemu-iotests/group | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index b63b18c..303e0f3 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -39,7 +39,7 @@
 030 rw auto backing
 031 rw auto quick
 032 rw auto
-033 rw auto
+033 rw auto quick
 034 rw auto backing
 035 rw auto quick
 036 rw auto quick
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 17/19] scsi-disk: catch write protection errors in UNMAP
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
                   ` (15 preceding siblings ...)
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 16/19] qemu-iotests: 033 is fast Paolo Bonzini
@ 2013-11-22 12:39 ` Paolo Bonzini
  2013-11-22 12:40 ` [Qemu-devel] [PATCH v3 18/19] scsi-disk: reject ANCHOR=1 for UNMAP and WRITE SAME commands Paolo Bonzini
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

This is the same that is already done for WRITE SAME.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/scsi/scsi-disk.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 74e6a14..4138268 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -1543,6 +1543,7 @@ done:
 
 static void scsi_disk_emulate_unmap(SCSIDiskReq *r, uint8_t *inbuf)
 {
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
     uint8_t *p = inbuf;
     int len = r->req.cmd.xfer;
     UnmapCBData *data;
@@ -1560,6 +1561,11 @@ static void scsi_disk_emulate_unmap(SCSIDiskReq *r, uint8_t *inbuf)
         goto invalid_param_len;
     }
 
+    if (bdrv_is_read_only(s->qdev.conf.bs)) {
+        scsi_check_condition(r, SENSE_CODE(WRITE_PROTECTED));
+        return;
+    }
+
     data = g_new0(UnmapCBData, 1);
     data->r = r;
     data->inbuf = &p[8];
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 18/19] scsi-disk: reject ANCHOR=1 for UNMAP and WRITE SAME commands
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
                   ` (16 preceding siblings ...)
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 17/19] scsi-disk: catch write protection errors in UNMAP Paolo Bonzini
@ 2013-11-22 12:40 ` Paolo Bonzini
  2013-11-22 12:40 ` [Qemu-devel] [PATCH v3 19/19] scsi-disk: correctly implement WRITE SAME Paolo Bonzini
  2013-12-03 14:29 ` [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Stefan Hajnoczi
  19 siblings, 0 replies; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

Since we report ANC_SUP==0 in VPD page B2h, we need to return
an error (ILLEGAL REQUEST/INVALID FIELD IN CDB) for all WRITE SAME
requests with ANCHOR==1.

Inspired by a similar patch to the LIO in-kernel target.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/scsi/scsi-disk.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 4138268..0640bb0 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -1548,6 +1548,11 @@ static void scsi_disk_emulate_unmap(SCSIDiskReq *r, uint8_t *inbuf)
     int len = r->req.cmd.xfer;
     UnmapCBData *data;
 
+    /* Reject ANCHOR=1.  */
+    if (r->req.cmd.buf[1] & 0x1) {
+        goto invalid_field;
+    }
+
     if (len < 8) {
         goto invalid_param_len;
     }
@@ -1578,6 +1583,10 @@ static void scsi_disk_emulate_unmap(SCSIDiskReq *r, uint8_t *inbuf)
 
 invalid_param_len:
     scsi_check_condition(r, SENSE_CODE(INVALID_PARAM_LEN));
+    return;
+
+invalid_field:
+    scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
 }
 
 static void scsi_disk_emulate_write_data(SCSIRequest *req)
@@ -1856,8 +1865,9 @@ static int32_t scsi_disk_emulate_command(SCSIRequest *req, uint8_t *buf)
 
         /*
          * We only support WRITE SAME with the unmap bit set for now.
+         * Reject UNMAP=0 or ANCHOR=1.
          */
-        if (!(req->cmd.buf[1] & 0x8)) {
+        if (!(req->cmd.buf[1] & 0x8) || (req->cmd.buf[1] & 0x10)) {
             goto illegal_request;
         }
 
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v3 19/19] scsi-disk: correctly implement WRITE SAME
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
                   ` (17 preceding siblings ...)
  2013-11-22 12:40 ` [Qemu-devel] [PATCH v3 18/19] scsi-disk: reject ANCHOR=1 for UNMAP and WRITE SAME commands Paolo Bonzini
@ 2013-11-22 12:40 ` Paolo Bonzini
  2013-12-03 14:29 ` [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Stefan Hajnoczi
  19 siblings, 0 replies; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-22 12:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pl, ronniesahlberg, stefanha

Fetch the data to be written from the input buffer.  If it is all zeroes,
we can use the write_zeroes call (possibly with the new MAY_UNMAP flag).
Otherwise, do as many write cycles as needed, writing 512k at a time.

Strictly speaking, this is still incorrect because a zero cluster should
only be written if the MAY_UNMAP flag is set.  But this is a bug in qcow2
and the other formats, not in the SCSI code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/scsi/scsi-disk.c | 140 +++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 116 insertions(+), 24 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 0640bb0..efadfc0 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -41,6 +41,7 @@ do { printf("scsi-disk: " fmt , ## __VA_ARGS__); } while (0)
 #include <scsi/sg.h>
 #endif
 
+#define SCSI_WRITE_SAME_MAX         524288
 #define SCSI_DMA_BUF_SIZE           131072
 #define SCSI_MAX_INQUIRY_LEN        256
 #define SCSI_MAX_MODE_LEN           256
@@ -634,6 +635,8 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
             buflen = 0x40;
             memset(outbuf + 4, 0, buflen - 4);
 
+            outbuf[4] = 0x1; /* wsnz */
+
             /* optimal transfer length granularity */
             outbuf[6] = (min_io_size >> 8) & 0xff;
             outbuf[7] = min_io_size & 0xff;
@@ -1589,6 +1592,111 @@ invalid_field:
     scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
 }
 
+typedef struct WriteSameCBData {
+    SCSIDiskReq *r;
+    int64_t sector;
+    int nb_sectors;
+    QEMUIOVector qiov;
+    struct iovec iov;
+} WriteSameCBData;
+
+static void scsi_write_same_complete(void *opaque, int ret)
+{
+    WriteSameCBData *data = opaque;
+    SCSIDiskReq *r = data->r;
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+    assert(r->req.aiocb != NULL);
+    r->req.aiocb = NULL;
+    bdrv_acct_done(s->qdev.conf.bs, &r->acct);
+    if (r->req.io_canceled) {
+        goto done;
+    }
+
+    if (ret < 0) {
+        if (scsi_handle_rw_error(r, -ret)) {
+            goto done;
+        }
+    }
+
+    data->nb_sectors -= data->iov.iov_len / 512;
+    data->sector += data->iov.iov_len / 512;
+    data->iov.iov_len = MIN(data->nb_sectors * 512, data->iov.iov_len);
+    if (data->iov.iov_len) {
+        bdrv_acct_start(s->qdev.conf.bs, &r->acct, data->iov.iov_len, BDRV_ACCT_WRITE);
+        r->req.aiocb = bdrv_aio_writev(s->qdev.conf.bs, data->sector,
+                                       &data->qiov, data->iov.iov_len / 512,
+                                       scsi_write_same_complete, r);
+        return;
+    }
+
+    scsi_req_complete(&r->req, GOOD);
+
+done:
+    if (!r->req.io_canceled) {
+        scsi_req_unref(&r->req);
+    }
+    qemu_vfree(data->iov.iov_base);
+    g_free(data);
+}
+
+static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf)
+{
+    SCSIRequest *req = &r->req;
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
+    uint32_t nb_sectors = scsi_data_cdb_length(r->req.cmd.buf);
+    WriteSameCBData *data;
+    uint8_t *buf;
+    int i;
+
+    /* Fail if PBDATA=1 or LBDATA=1 or ANCHOR=1.  */
+    if (nb_sectors == 0 || (req->cmd.buf[1] & 0x16)) {
+        scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
+        return;
+    }
+
+    if (bdrv_is_read_only(s->qdev.conf.bs)) {
+        scsi_check_condition(r, SENSE_CODE(WRITE_PROTECTED));
+        return;
+    }
+    if (!check_lba_range(s, r->req.cmd.lba, nb_sectors)) {
+        scsi_check_condition(r, SENSE_CODE(LBA_OUT_OF_RANGE));
+        return;
+    }
+
+    if (buffer_is_zero(inbuf, s->qdev.blocksize)) {
+        int flags = (req->cmd.buf[1] & 0x8) ? BDRV_REQ_MAY_UNMAP : 0;
+
+        /* The request is used as the AIO opaque value, so add a ref.  */
+        scsi_req_ref(&r->req);
+        bdrv_acct_start(s->qdev.conf.bs, &r->acct, nb_sectors * s->qdev.blocksize,
+                        BDRV_ACCT_WRITE);
+        r->req.aiocb = bdrv_aio_write_zeroes(s->qdev.conf.bs,
+                                             r->req.cmd.lba * (s->qdev.blocksize / 512),
+                                             nb_sectors * (s->qdev.blocksize / 512),
+                                             flags, scsi_aio_complete, r);
+        return;
+    }
+
+    data = g_new0(WriteSameCBData, 1);
+    data->r = r;
+    data->sector = r->req.cmd.lba * (s->qdev.blocksize / 512);
+    data->nb_sectors = nb_sectors * (s->qdev.blocksize / 512);
+    data->iov.iov_len = MIN(data->nb_sectors * 512, SCSI_WRITE_SAME_MAX);
+    data->iov.iov_base = buf = qemu_blockalign(s->qdev.conf.bs, data->iov.iov_len);
+    qemu_iovec_init_external(&data->qiov, &data->iov, 1);
+
+    for (i = 0; i < data->iov.iov_len; i += s->qdev.blocksize) {
+        memcpy(&buf[i], inbuf, s->qdev.blocksize);
+    }
+
+    scsi_req_ref(&r->req);
+    bdrv_acct_start(s->qdev.conf.bs, &r->acct, data->iov.iov_len, BDRV_ACCT_WRITE);
+    r->req.aiocb = bdrv_aio_writev(s->qdev.conf.bs, data->sector,
+                                   &data->qiov, data->iov.iov_len / 512,
+                                   scsi_write_same_complete, data);
+}
+
 static void scsi_disk_emulate_write_data(SCSIRequest *req)
 {
     SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
@@ -1612,6 +1720,10 @@ static void scsi_disk_emulate_write_data(SCSIRequest *req)
         scsi_disk_emulate_unmap(r, r->iov.iov_base);
         break;
 
+    case WRITE_SAME_10:
+    case WRITE_SAME_16:
+        scsi_disk_emulate_write_same(r, r->iov.iov_base);
+        break;
     default:
         abort();
     }
@@ -1854,30 +1966,10 @@ static int32_t scsi_disk_emulate_command(SCSIRequest *req, uint8_t *buf)
         break;
     case WRITE_SAME_10:
     case WRITE_SAME_16:
-        nb_sectors = scsi_data_cdb_length(r->req.cmd.buf);
-        if (bdrv_is_read_only(s->qdev.conf.bs)) {
-            scsi_check_condition(r, SENSE_CODE(WRITE_PROTECTED));
-            return 0;
-        }
-        if (!check_lba_range(s, r->req.cmd.lba, nb_sectors)) {
-            goto illegal_lba;
-        }
-
-        /*
-         * We only support WRITE SAME with the unmap bit set for now.
-         * Reject UNMAP=0 or ANCHOR=1.
-         */
-        if (!(req->cmd.buf[1] & 0x8) || (req->cmd.buf[1] & 0x10)) {
-            goto illegal_request;
-        }
-
-        /* The request is used as the AIO opaque value, so add a ref.  */
-        scsi_req_ref(&r->req);
-        r->req.aiocb = bdrv_aio_discard(s->qdev.conf.bs,
-                                        r->req.cmd.lba * (s->qdev.blocksize / 512),
-                                        nb_sectors * (s->qdev.blocksize / 512),
-                                        scsi_aio_complete, r);
-        return 0;
+        DPRINTF("WRITE SAME %d (len %lu)\n",
+                req->cmd.buf[0] == WRITE_SAME_10 ? 10 : 16,
+                (long)r->req.cmd.xfer);
+        break;
     default:
         DPRINTF("Unknown SCSI command (%2.2x)\n", buf[0]);
         scsi_check_condition(r, SENSE_CODE(INVALID_OPCODE));
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/19] block: generalize BlockLimits handling to cover bdrv_aio_discard too
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 01/19] block: generalize BlockLimits handling to cover bdrv_aio_discard too Paolo Bonzini
@ 2013-11-25  9:09   ` Peter Lieven
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Lieven @ 2013-11-25  9:09 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, ronniesahlberg, stefanha

On 22.11.2013 13:39, Paolo Bonzini wrote:
> bdrv_co_discard is only covering drivers which have a .bdrv_co_discard()
> implementation, but not those with .bdrv_aio_discard(). Not very nice,
> and easy to avoid.
>
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block.c | 80 +++++++++++++++++++++++++++++++++--------------------------------
>   1 file changed, 41 insertions(+), 39 deletions(-)
>
> diff --git a/block.c b/block.c
> index e22b55f..1b3e8b2 100644
> --- a/block.c
> +++ b/block.c
> @@ -4288,6 +4288,8 @@ static void coroutine_fn bdrv_discard_co_entry(void *opaque)
>   int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
>                                    int nb_sectors)
>   {
> +    int max_discard;
> +
>       if (!bs->drv) {
>           return -ENOMEDIUM;
>       } else if (bdrv_check_request(bs, sector_num, nb_sectors)) {
> @@ -4305,55 +4307,55 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
>           return 0;
>       }
>   
> -    if (bs->drv->bdrv_co_discard) {
> -        int max_discard = bs->bl.max_discard ?
> -                          bs->bl.max_discard : MAX_DISCARD_DEFAULT;
> +    if (!bs->drv->bdrv_co_discard && !bs->drv->bdrv_aio_discard) {
> +        return 0;
> +    }
>   
> -        while (nb_sectors > 0) {
> -            int ret;
> -            int num = nb_sectors;
> +    max_discard = bs->bl.max_discard ?  bs->bl.max_discard : MAX_DISCARD_DEFAULT;
> +    while (nb_sectors > 0) {
> +        int ret;
> +        int num = nb_sectors;
>   
> -            /* align request */
> -            if (bs->bl.discard_alignment &&
> -                num >= bs->bl.discard_alignment &&
> -                sector_num % bs->bl.discard_alignment) {
> -                if (num > bs->bl.discard_alignment) {
> -                    num = bs->bl.discard_alignment;
> -                }
> -                num -= sector_num % bs->bl.discard_alignment;
> +        /* align request */
> +        if (bs->bl.discard_alignment &&
> +            num >= bs->bl.discard_alignment &&
> +            sector_num % bs->bl.discard_alignment) {
> +            if (num > bs->bl.discard_alignment) {
> +                num = bs->bl.discard_alignment;
>               }
> +            num -= sector_num % bs->bl.discard_alignment;
> +        }
>   
> -            /* limit request size */
> -            if (num > max_discard) {
> -                num = max_discard;
> -            }
> +        /* limit request size */
> +        if (num > max_discard) {
> +            num = max_discard;
> +        }
>   
> +        if (bs->drv->bdrv_co_discard) {
>               ret = bs->drv->bdrv_co_discard(bs, sector_num, num);
> -            if (ret) {
> -                return ret;
> +        } else {
> +            BlockDriverAIOCB *acb;
> +            CoroutineIOCompletion co = {
> +                .coroutine = qemu_coroutine_self(),
> +            };
> +
> +            acb = bs->drv->bdrv_aio_discard(bs, sector_num, nb_sectors,
> +                                            bdrv_co_io_em_complete, &co);
> +            if (acb == NULL) {
> +                return -EIO;
> +            } else {
> +                qemu_coroutine_yield();
> +                ret = co.ret;
>               }
> -
> -            sector_num += num;
> -            nb_sectors -= num;
>           }
> -        return 0;
> -    } else if (bs->drv->bdrv_aio_discard) {
> -        BlockDriverAIOCB *acb;
> -        CoroutineIOCompletion co = {
> -            .coroutine = qemu_coroutine_self(),
> -        };
> -
> -        acb = bs->drv->bdrv_aio_discard(bs, sector_num, nb_sectors,
> -                                        bdrv_co_io_em_complete, &co);
> -        if (acb == NULL) {
> -            return -EIO;
> -        } else {
> -            qemu_coroutine_yield();
> -            return co.ret;
> +        if (ret) {
> +            return ret;
>           }
> -    } else {
> -        return 0;
> +
> +        sector_num += num;
> +        nb_sectors -= num;
>       }
> +    return 0;
>   }
>   
>   int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
Reviewed-by: Peter Lieven <pl@kamp.de>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v3 02/19] block: add flags to BlockRequest
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 02/19] block: add flags to BlockRequest Paolo Bonzini
@ 2013-11-25  9:11   ` Peter Lieven
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Lieven @ 2013-11-25  9:11 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, ronniesahlberg, stefanha

On 22.11.2013 13:39, Paolo Bonzini wrote:
> This lets bdrv_co_do_rw receive flags, so that it can be used for
> zero writes.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block.c               | 17 +++++++++++------
>   include/block/block.h |  1 +
>   2 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/block.c b/block.c
> index 1b3e8b2..0665f35 100644
> --- a/block.c
> +++ b/block.c
> @@ -74,6 +74,7 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
>                                                  int64_t sector_num,
>                                                  QEMUIOVector *qiov,
>                                                  int nb_sectors,
> +                                               BdrvRequestFlags flags,
>                                                  BlockDriverCompletionFunc *cb,
>                                                  void *opaque,
>                                                  bool is_write);
> @@ -3655,7 +3656,7 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
>   {
>       trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
>   
> -    return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors,
> +    return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors, 0,
>                                    cb, opaque, false);
>   }
>   
> @@ -3665,7 +3666,7 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>   {
>       trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
>   
> -    return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors,
> +    return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors, 0,
>                                    cb, opaque, true);
>   }
>   
> @@ -3837,8 +3838,10 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs)
>       /* Run the aio requests. */
>       mcb->num_requests = num_reqs;
>       for (i = 0; i < num_reqs; i++) {
> -        bdrv_aio_writev(bs, reqs[i].sector, reqs[i].qiov,
> -            reqs[i].nb_sectors, multiwrite_cb, mcb);
> +        bdrv_co_aio_rw_vector(bs, reqs[i].sector, reqs[i].qiov,
> +                              reqs[i].nb_sectors, reqs[i].flags,
> +                              multiwrite_cb, mcb,
> +                              true);
>       }
>   
>       return 0;
> @@ -3980,10 +3983,10 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
>   
>       if (!acb->is_write) {
>           acb->req.error = bdrv_co_do_readv(bs, acb->req.sector,
> -            acb->req.nb_sectors, acb->req.qiov, 0);
> +            acb->req.nb_sectors, acb->req.qiov, acb->req.flags);
>       } else {
>           acb->req.error = bdrv_co_do_writev(bs, acb->req.sector,
> -            acb->req.nb_sectors, acb->req.qiov, 0);
> +            acb->req.nb_sectors, acb->req.qiov, acb->req.flags);
>       }
>   
>       acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
> @@ -3994,6 +3997,7 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
>                                                  int64_t sector_num,
>                                                  QEMUIOVector *qiov,
>                                                  int nb_sectors,
> +                                               BdrvRequestFlags flags,
>                                                  BlockDriverCompletionFunc *cb,
>                                                  void *opaque,
>                                                  bool is_write)
> @@ -4005,6 +4009,7 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
>       acb->req.sector = sector_num;
>       acb->req.nb_sectors = nb_sectors;
>       acb->req.qiov = qiov;
> +    acb->req.flags = flags;
>       acb->is_write = is_write;
>       acb->done = NULL;
>   
> diff --git a/include/block/block.h b/include/block/block.h
> index 4d9e67c..703d875 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -311,6 +311,7 @@ typedef struct BlockRequest {
>       /* Fields to be filled by multiwrite caller */
>       int64_t sector;
>       int nb_sectors;
> +    int flags;
>       QEMUIOVector *qiov;
>       BlockDriverCompletionFunc *cb;
>       void *opaque;
Reviewed-by: Peter Lieven <pl@kamp.de>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v3 03/19] block: add flags argument to bdrv_co_write_zeroes tracepoint
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 03/19] block: add flags argument to bdrv_co_write_zeroes tracepoint Paolo Bonzini
@ 2013-11-25  9:12   ` Peter Lieven
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Lieven @ 2013-11-25  9:12 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, ronniesahlberg, stefanha

On 22.11.2013 13:39, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block.c      | 2 +-
>   trace-events | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/block.c b/block.c
> index 0665f35..cb07e57 100644
> --- a/block.c
> +++ b/block.c
> @@ -2887,7 +2887,7 @@ int coroutine_fn bdrv_co_write_zeroes(BlockDriverState *bs,
>                                         int64_t sector_num, int nb_sectors,
>                                         BdrvRequestFlags flags)
>   {
> -    trace_bdrv_co_write_zeroes(bs, sector_num, nb_sectors);
> +    trace_bdrv_co_write_zeroes(bs, sector_num, nb_sectors, flags);
>   
>       if (!(bs->open_flags & BDRV_O_UNMAP)) {
>           flags &= ~BDRV_REQ_MAY_UNMAP;
> diff --git a/trace-events b/trace-events
> index 8695e9e..f159ac9 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -64,7 +64,7 @@ bdrv_lock_medium(void *bs, bool locked) "bs %p locked %d"
>   bdrv_co_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
>   bdrv_co_copy_on_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
>   bdrv_co_writev(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
> -bdrv_co_write_zeroes(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
> +bdrv_co_write_zeroes(void *bs, int64_t sector_num, int nb_sector, int flags) "bs %p sector_num %"PRId64" nb_sectors %d flags %#x"
>   bdrv_co_io_em(void *bs, int64_t sector_num, int nb_sectors, int is_write, void *acb) "bs %p sector_num %"PRId64" nb_sectors %d is_write %d acb %p"
>   bdrv_co_do_copy_on_readv(void *bs, int64_t sector_num, int nb_sectors, int64_t cluster_sector_num, int cluster_nb_sectors) "bs %p sector_num %"PRId64" nb_sectors %d cluster_sector_num %"PRId64" cluster_nb_sectors %d"
>   

Reviewed-by: Peter Lieven <pl@kamp.de>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v3 04/19] block: add bdrv_aio_write_zeroes
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 04/19] block: add bdrv_aio_write_zeroes Paolo Bonzini
@ 2013-11-25  9:13   ` Peter Lieven
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Lieven @ 2013-11-25  9:13 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, ronniesahlberg, stefanha

On 22.11.2013 13:39, Paolo Bonzini wrote:
> This will be used by the SCSI layer.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block.c               | 11 +++++++++++
>   include/block/block.h |  3 +++
>   trace-events          |  1 +
>   3 files changed, 15 insertions(+)
>
> diff --git a/block.c b/block.c
> index cb07e57..f9674d9 100644
> --- a/block.c
> +++ b/block.c
> @@ -3670,6 +3670,17 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>                                    cb, opaque, true);
>   }
>   
> +BlockDriverAIOCB *bdrv_aio_write_zeroes(BlockDriverState *bs,
> +        int64_t sector_num, int nb_sectors, BdrvRequestFlags flags,
> +        BlockDriverCompletionFunc *cb, void *opaque)
> +{
> +    trace_bdrv_aio_write_zeroes(bs, sector_num, nb_sectors, flags, opaque);
> +
> +    return bdrv_co_aio_rw_vector(bs, sector_num, NULL, nb_sectors,
> +                                 BDRV_REQ_ZERO_WRITE | flags,
> +                                 cb, opaque, true);
> +}
> +
>   
>   typedef struct MultiwriteCB {
>       int error;
> diff --git a/include/block/block.h b/include/block/block.h
> index 703d875..4967ed2 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -216,6 +216,9 @@ int bdrv_write(BlockDriverState *bs, int64_t sector_num,
>                  const uint8_t *buf, int nb_sectors);
>   int bdrv_write_zeroes(BlockDriverState *bs, int64_t sector_num,
>                  int nb_sectors, BdrvRequestFlags flags);
> +BlockDriverAIOCB *bdrv_aio_write_zeroes(BlockDriverState *bs, int64_t sector_num,
> +                                        int nb_sectors, BdrvRequestFlags flags,
> +                                        BlockDriverCompletionFunc *cb, void *opaque);
>   int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags);
>   int bdrv_writev(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov);
>   int bdrv_pread(BlockDriverState *bs, int64_t offset,
> diff --git a/trace-events b/trace-events
> index f159ac9..d318d6f 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -60,6 +60,7 @@ bdrv_aio_discard(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs
>   bdrv_aio_flush(void *bs, void *opaque) "bs %p opaque %p"
>   bdrv_aio_readv(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
>   bdrv_aio_writev(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
> +bdrv_aio_write_zeroes(void *bs, int64_t sector_num, int nb_sectors, int flags, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d flags %#x opaque %p"
>   bdrv_lock_medium(void *bs, bool locked) "bs %p locked %d"
>   bdrv_co_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
>   bdrv_co_copy_on_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
Reviewed-by: Peter Lieven <pl@kamp.de>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/19] block: handle ENOTSUP from discard in generic code
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 05/19] block: handle ENOTSUP from discard in generic code Paolo Bonzini
@ 2013-11-25 10:06   ` Peter Lieven
  2013-11-25 10:16     ` Paolo Bonzini
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Lieven @ 2013-11-25 10:06 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, ronniesahlberg, stefanha

On 22.11.2013 13:39, Paolo Bonzini wrote:
> Similar to write_zeroes, let the generic code receive a ENOTSUP for
> discard operations.  Since bdrv_discard has advisory semantics,
> we can just swallow the error.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block.c           |  2 +-
>   block/raw-posix.c | 12 ++++++------
>   2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/block.c b/block.c
> index f9674d9..b18ee6b 100644
> --- a/block.c
> +++ b/block.c
> @@ -4364,7 +4364,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
>                   ret = co.ret;
>               }
>           }
> -        if (ret) {
> +        if (ret && ret != -ENOTSUP) {
>               return ret;
>           }
>   
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index f836c8e..cfa3162 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -323,10 +323,10 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>       }
>   #endif
>   
> -    s->has_discard = 1;
> +    s->has_discard = true;
>   #ifdef CONFIG_XFS
>       if (platform_test_xfs_fd(s->fd)) {
> -        s->is_xfs = 1;
> +        s->is_xfs = true;
>       }
>   #endif
>   
> @@ -698,8 +698,8 @@ static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
>       int ret = -EOPNOTSUPP;
>       BDRVRawState *s = aiocb->bs->opaque;
>   
> -    if (s->has_discard == 0) {
> -        return 0;
> +    if (!s->has_discard) {
> +        return -ENOTSUP;
>       }
>   
>       if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
> @@ -734,8 +734,8 @@ static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
>   
>       if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
>           ret == -ENOTTY) {
> -        s->has_discard = 0;
> -        ret = 0;
> +        s->has_discard = false;
> +        ret = -ENOTSUP;
>       }
>       return ret;
>   }
wouldn't it make sense to return -ENOTSUP in other drivers a well if the
operation is not supported and not return 0?

Otherwise:
Reviewed-by: Peter Lieven <pl@kamp.de>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/19] block: handle ENOTSUP from discard in generic code
  2013-11-25 10:06   ` Peter Lieven
@ 2013-11-25 10:16     ` Paolo Bonzini
  0 siblings, 0 replies; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-25 10:16 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, ronniesahlberg, qemu-devel, stefanha

Il 25/11/2013 11:06, Peter Lieven ha scritto:
> wouldn't it make sense to return -ENOTSUP in other drivers a well if the
> operation is not supported and not return 0?

Yes; patches are welcome. :)

(I have a patch for block/iscsi.c that handles LBPU=1/LBPWS=0, and one
of the things it does is "return 0" -> "return -ENOTSUP" for consistency
between iscsi_co_discard and iscsi_co_write_zeroes).

Paolo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v3 06/19] block: make bdrv_co_do_write_zeroes stricter in producing aligned requests
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 06/19] block: make bdrv_co_do_write_zeroes stricter in producing aligned requests Paolo Bonzini
@ 2013-11-25 10:23   ` Peter Lieven
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Lieven @ 2013-11-25 10:23 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, ronniesahlberg, stefanha

On 22.11.2013 13:39, Paolo Bonzini wrote:
> Right now, bdrv_co_do_write_zeroes will only try to align the
> beginning of the request.  However, it is simpler for many
> formats to expect the block layer to separate both the head *and*
> the tail.  This makes sure that the format's bdrv_co_write_zeroes
> function will be called with aligned sector_num and nb_sectors for
> the bulk of the request.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block.c | 35 +++++++++++++++++++++++------------
>   1 file changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/block.c b/block.c
> index b18ee6b..de66b21 100644
> --- a/block.c
> +++ b/block.c
> @@ -2768,14 +2768,21 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>       while (nb_sectors > 0 && !ret) {
>           int num = nb_sectors;
>   
> -        /* align request */
> -        if (bs->bl.write_zeroes_alignment &&
> -            num >= bs->bl.write_zeroes_alignment &&
> -            sector_num % bs->bl.write_zeroes_alignment) {
> -            if (num > bs->bl.write_zeroes_alignment) {
> +        /* Align request.  Block drivers can expect the "bulk" of the request
> +         * to be aligned.
> +         */
> +        if (bs->bl.write_zeroes_alignment
> +            && num > bs->bl.write_zeroes_alignment) {
> +            if (sector_num % bs->bl.write_zeroes_alignment != 0) {
> +                /* Make a small request up to the first aligned sector.  */
>                   num = bs->bl.write_zeroes_alignment;
> +                num -= sector_num % bs->bl.write_zeroes_alignment;
> +            } else if ((sector_num + num) % bs->bl.write_zeroes_alignment != 0) {
> +                /* Shorten the request to the last aligned sector.  num cannot
> +                 * underflow because num > bs->bl.write_zeroes_alignment.
> +                 */
> +                num -= (sector_num + num) % bs->bl.write_zeroes_alignment;
>               }
> -            num -= sector_num % bs->bl.write_zeroes_alignment;
>           }
>   
>           /* limit request size */
> @@ -2793,16 +2800,20 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>               /* Fall back to bounce buffer if write zeroes is unsupported */
>               iov.iov_len = num * BDRV_SECTOR_SIZE;
>               if (iov.iov_base == NULL) {
> -                /* allocate bounce buffer only once and ensure that it
> -                 * is big enough for this and all future requests.
> -                 */
> -                size_t bufsize = num <= nb_sectors ? num : max_write_zeroes;
> -                iov.iov_base = qemu_blockalign(bs, bufsize * BDRV_SECTOR_SIZE);
> -                memset(iov.iov_base, 0, bufsize * BDRV_SECTOR_SIZE);
> +                iov.iov_base = qemu_blockalign(bs, num * BDRV_SECTOR_SIZE);
> +                memset(iov.iov_base, 0, num * BDRV_SECTOR_SIZE);
>               }
>               qemu_iovec_init_external(&qiov, &iov, 1);
>   
>               ret = drv->bdrv_co_writev(bs, sector_num, num, &qiov);
> +
> +            /* Keep bounce buffer around if it is big enough for all
> +             * all future requests.
> +             */
> +            if (num < max_write_zeroes) {
> +                qemu_vfree(iov.iov_base);
> +                iov.iov_base = NULL;
> +            }
>           }
>   
>           sector_num += num;
Reviewed-by: Peter Lieven <pl@kamp.de>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v3 07/19] vpc, vhdx: add get_info
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 07/19] vpc, vhdx: add get_info Paolo Bonzini
@ 2013-11-25 10:27   ` Peter Lieven
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Lieven @ 2013-11-25 10:27 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, ronniesahlberg, stefanha

On 22.11.2013 13:39, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block/vhdx.c | 10 ++++++++++
>   block/vpc.c  | 14 ++++++++++++++
>   2 files changed, 24 insertions(+)
>
> diff --git a/block/vhdx.c b/block/vhdx.c
> index 7d1af96..ed6fa53 100644
> --- a/block/vhdx.c
> +++ b/block/vhdx.c
> @@ -1043,6 +1043,15 @@ static void vhdx_block_translate(BDRVVHDXState *s, int64_t sector_num,
>   }
>   
>   
> +static int vhdx_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
> +{
> +    BDRVVHDXState *s = bs->opaque;
> +
> +    bdi->cluster_size = s->block_size;
> +
> +    return 0;
> +}
> +
>   
>   static coroutine_fn int vhdx_co_readv(BlockDriverState *bs, int64_t sector_num,
>                                         int nb_sectors, QEMUIOVector *qiov)
> @@ -1885,6 +1894,7 @@ static BlockDriver bdrv_vhdx = {
>       .bdrv_co_readv          = vhdx_co_readv,
>       .bdrv_co_writev         = vhdx_co_writev,
>       .bdrv_create            = vhdx_create,
> +    .bdrv_get_info          = vhdx_get_info,
>   
>       .create_options         = vhdx_create_options,
>   };
> diff --git a/block/vpc.c b/block/vpc.c
> index 577cc45..551876f 100644
> --- a/block/vpc.c
> +++ b/block/vpc.c
> @@ -455,6 +455,18 @@ fail:
>       return -1;
>   }
>   
> +static int vpc_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
> +{
> +    BDRVVPCState *s = (BDRVVPCState *)bs->opaque;
> +    VHDFooter *footer = (VHDFooter *) s->footer_buf;
> +
> +    if (cpu_to_be32(footer->type) != VHD_FIXED) {
> +        bdi->cluster_size = s->block_size;
> +    }
> +
> +    return 0;
> +}
> +
>   static int vpc_read(BlockDriverState *bs, int64_t sector_num,
>                       uint8_t *buf, int nb_sectors)
>   {
> @@ -857,6 +869,8 @@ static BlockDriver bdrv_vpc = {
>       .bdrv_read              = vpc_co_read,
>       .bdrv_write             = vpc_co_write,
>   
> +    .bdrv_get_info          = vpc_get_info,
> +
>       .create_options         = vpc_create_options,
>       .bdrv_has_zero_init     = vpc_has_zero_init,
>   };
Reviewed-by: Peter Lieven <pl@kamp.de>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/19] block drivers: add discard/write_zeroes properties to bdrv_get_info implementation
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 08/19] block drivers: add discard/write_zeroes properties to bdrv_get_info implementation Paolo Bonzini
@ 2013-11-25 10:29   ` Peter Lieven
  2013-12-03 15:09   ` Kevin Wolf
  1 sibling, 0 replies; 36+ messages in thread
From: Peter Lieven @ 2013-11-25 10:29 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, ronniesahlberg, stefanha

On 22.11.2013 13:39, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block/qcow2.c | 2 ++
>   block/qed.c   | 2 ++
>   block/vdi.c   | 1 +
>   block/vhdx.c  | 3 +++
>   block/vpc.c   | 1 +
>   5 files changed, 9 insertions(+)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 2fe37ed..1542750 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1892,6 +1892,8 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
>   static int qcow2_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
>   {
>       BDRVQcowState *s = bs->opaque;
> +    bdi->unallocated_blocks_are_zero = true;
> +    bdi->can_write_zeroes_with_unmap = (s->qcow_version >= 3);
>       bdi->cluster_size = s->cluster_size;
>       bdi->vm_state_offset = qcow2_vm_state_offset(s);
>       return 0;
> diff --git a/block/qed.c b/block/qed.c
> index adc2736..59516a5 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -1475,6 +1475,8 @@ static int bdrv_qed_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
>       memset(bdi, 0, sizeof(*bdi));
>       bdi->cluster_size = s->header.cluster_size;
>       bdi->is_dirty = s->header.features & QED_F_NEED_CHECK;
> +    bdi->unallocated_blocks_are_zero = true;
> +    bdi->can_write_zeroes_with_unmap = true;
>       return 0;
>   }
>   
> diff --git a/block/vdi.c b/block/vdi.c
> index b6ec002..2d7490f 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -331,6 +331,7 @@ static int vdi_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
>       logout("\n");
>       bdi->cluster_size = s->block_size;
>       bdi->vm_state_offset = 0;
> +    bdi->unallocated_blocks_are_zero = true;
>       return 0;
>   }
>   
> diff --git a/block/vhdx.c b/block/vhdx.c
> index ed6fa53..67bbe10 100644
> --- a/block/vhdx.c
> +++ b/block/vhdx.c
> @@ -1049,6 +1049,9 @@ static int vhdx_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
>   
>       bdi->cluster_size = s->block_size;
>   
> +    bdi->unallocated_blocks_are_zero =
> +        (s->params.data_bits & VHDX_PARAMS_HAS_PARENT) == 0;
> +
>       return 0;
>   }
>   
> diff --git a/block/vpc.c b/block/vpc.c
> index 551876f..1d326cb 100644
> --- a/block/vpc.c
> +++ b/block/vpc.c
> @@ -464,6 +464,7 @@ static int vpc_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
>           bdi->cluster_size = s->block_size;
>       }
>   
> +    bdi->unallocated_blocks_are_zero = true;
>       return 0;
>   }
>   
Reviewed-by: Peter Lieven <pl@kamp.de>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v3 09/19] block drivers: expose requirement for write same alignment from formats
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 09/19] block drivers: expose requirement for write same alignment from formats Paolo Bonzini
@ 2013-11-25 10:33   ` Peter Lieven
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Lieven @ 2013-11-25 10:33 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, ronniesahlberg, stefanha

On 22.11.2013 13:39, Paolo Bonzini wrote:
> This will let misaligned but large requests use zero clusters.  This
> is important because the cluster size is not guest visible.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block/qcow2.c | 1 +
>   block/qed.c   | 1 +
>   block/vmdk.c  | 4 ++++
>   3 files changed, 6 insertions(+)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 1542750..81c4dad 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -718,6 +718,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>       }
>   
>       qemu_opts_del(opts);
> +    bs->bl.write_zeroes_alignment = s->cluster_sectors;
please add a patch that sets

bs->bl.discard_alignment = s->cluster_sectors;

>   
>       if (s->use_lazy_refcounts && s->qcow_version < 3) {
>           error_setg(errp, "Lazy refcounts require a qcow2 image with at least "
> diff --git a/block/qed.c b/block/qed.c
> index 59516a5..450a1fa 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -495,6 +495,7 @@ static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
>           }
>       }
>   
> +    bs->bl.write_zeroes_alignment = s->header.cluster_size >> BDRV_SECTOR_BITS;
>       s->need_check_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>                                               qed_need_check_timer_cb, s);
>   
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 6555663..63ea7ff 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -428,6 +428,10 @@ static int vmdk_add_extent(BlockDriverState *bs,
>       extent->l2_size = l2_size;
>       extent->cluster_sectors = flat ? sectors : cluster_sectors;
>   
> +    if (!flat) {
> +        bs->bl.write_zeroes_alignment =
> +            MAX(bs->bl.write_zeroes_alignment, cluster_sectors);
> +    }
>       if (s->num_extents > 1) {
>           extent->end_sector = (*(extent - 1)).end_sector + extent->sectors;
>       } else {
>

Reviewed-by: Peter Lieven <pl@kamp.de>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v3 12/19] block/iscsi: check WRITE SAME support differently depending on MAY_UNMAP
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 12/19] block/iscsi: check WRITE SAME support differently depending on MAY_UNMAP Paolo Bonzini
@ 2013-11-25 10:34   ` Peter Lieven
  2013-11-25 10:42     ` Paolo Bonzini
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Lieven @ 2013-11-25 10:34 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, ronniesahlberg, stefanha

On 22.11.2013 13:39, Paolo Bonzini wrote:
> The current check is right for MAY_UNMAP=1.  For MAY_UNMAP=0, just
> try and fall back to regular writes as soon as a WRITE SAME command
> fails.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block/iscsi.c | 19 +++++++++++++++++--
>   1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 20f4f55..93fee6d 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -55,6 +55,7 @@ typedef struct IscsiLun {
>       QEMUTimer *nop_timer;
>       uint8_t lbpme;
>       uint8_t lbprz;
> +    uint8_t has_write_same;
>       struct scsi_inquiry_logical_block_provisioning lbp;
>       struct scsi_inquiry_block_limits bl;
>       unsigned char *zeroblock;
> @@ -976,8 +977,13 @@ coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
>           return -EINVAL;
>       }
>   
> -    if (!iscsilun->lbp.lbpws) {
> -        /* WRITE SAME is not supported by the target */
> +    if (!(flags & BDRV_REQ_MAY_UNMAP) && !iscsilun->has_write_same) {
> +        /* WRITE SAME without UNMAP is not supported by the target */
> +        return -ENOTSUP;
> +    }
> +
> +    if ((flags & BDRV_REQ_MAY_UNMAP) && !iscsilun->lbp.lbpws) {
> +        /* WRITE SAME with UNMAP is not supported by the target */
>           return -ENOTSUP;
>       }
>   
> @@ -1012,6 +1018,14 @@ retry:
>       }
>   
>       if (iTask.status != SCSI_STATUS_GOOD) {
> +        if (iTask.status == SCSI_STATUS_CHECK_CONDITION &&
> +            iTask.task->sense.key == SCSI_SENSE_ILLEGAL_REQUEST &&
> +            iTask.task->sense.ascq == SCSI_SENSE_ASCQ_INVALID_OPERATION_CODE) {
> +            /* WRITE SAME is not supported by the target */
> +            iscsilun->has_write_same = false;
> +            return -ENOTSUP;
> +        }
> +
>           return -EIO;
>       }
>   
> @@ -1375,6 +1389,7 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
>       }
>   
>       iscsilun->type = inq->periperal_device_type;
> +    iscsilun->has_write_same = true;
>   
>       if ((ret = iscsi_readcapacity_sync(iscsilun)) != 0) {
>           goto out;
Maybe the naming has_write_same is misleading. It might be better to call
it try_write_same or has_write_same_failed with inverse logic.

If you just change the naming:

Reviewed-by: Peter Lieven <pl@kamp.de>

Peter

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v3 12/19] block/iscsi: check WRITE SAME support differently depending on MAY_UNMAP
  2013-11-25 10:34   ` Peter Lieven
@ 2013-11-25 10:42     ` Paolo Bonzini
  0 siblings, 0 replies; 36+ messages in thread
From: Paolo Bonzini @ 2013-11-25 10:42 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, ronniesahlberg, qemu-devel, stefanha

Il 25/11/2013 11:34, Peter Lieven ha scritto:
>>   @@ -1012,6 +1018,14 @@ retry:
>>       }
>>         if (iTask.status != SCSI_STATUS_GOOD) {
>> +        if (iTask.status == SCSI_STATUS_CHECK_CONDITION &&
>> +            iTask.task->sense.key == SCSI_SENSE_ILLEGAL_REQUEST &&
>> +            iTask.task->sense.ascq ==
>> SCSI_SENSE_ASCQ_INVALID_OPERATION_CODE) {
>> +            /* WRITE SAME is not supported by the target */
>> +            iscsilun->has_write_same = false;
>> +            return -ENOTSUP;
>> +        }
>> +
>>           return -EIO;
>>       }
>>   @@ -1375,6 +1389,7 @@ static int iscsi_open(BlockDriverState *bs,
>> QDict *options, int flags,
>>       }
>>         iscsilun->type = inq->periperal_device_type;
>> +    iscsilun->has_write_same = true;
>>         if ((ret = iscsi_readcapacity_sync(iscsilun)) != 0) {
>>           goto out;
> 
> Maybe the naming has_write_same is misleading. It might be better to call
> it try_write_same or has_write_same_failed with inverse logic.

I was using the same names as block/raw-posix.c.  I'm not sure I like
the other names, but if the maintainers prefer them I'll gladly change them.

Paolo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack
  2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
                   ` (18 preceding siblings ...)
  2013-11-22 12:40 ` [Qemu-devel] [PATCH v3 19/19] scsi-disk: correctly implement WRITE SAME Paolo Bonzini
@ 2013-12-03 14:29 ` Stefan Hajnoczi
  19 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2013-12-03 14:29 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, stefanha, pl, qemu-devel, ronniesahlberg

On Fri, Nov 22, 2013 at 01:39:42PM +0100, Paolo Bonzini wrote:
> This series fixes the implementation of WRITE SAME in the SCSI emulation
> layer.  On top of this, it ensures that zero writes from the guest can
> be offloaded to the host device or filesystem if that's supported.
> This is a relatively important part of the thin provisioning feature,
> and builds on the unmap flag to write_zeroes, recently added by Peter
> Lieven.  The feature works with iscsi, block device and filesystem
> targets.
> 
> v2->v3
>         Drop "block/iscsi: use UNMAP to write zeroes if LBPRZ=1" [Peter]
> 
>         Patches 3/4: change tracepoint flags to %#x [Stefan]
> 
>         Patch 6: clean up conditionals for misaligned num [Stefan]
>         Patch 6: fix freeing of bounce buffer [Peter]
> 
>         Patch 7: fix bdi->cluster_size for vhdx [Stefan]
> 
>         Patch 8: do not check bs->backing_hd to set
>         bdi->unallocated_blocks_are_zero [Peter]
> 
>         Patch 19: use qemu_blockalign/qemu_vfree [Stefan]
> 
> v1->v2
>         I cleaned up the series, putting all the block layer changes
>         together and all the SCSI changes at the end.
> 
>         qcow2 and iscsi are now better covered, which caused the
>         introduction of patches 6-11.  I also downloaded the latest
>         which contributed some changes in patches 10-11.  It turns
>         out we were too cautious in treating LBPRZ as advisory when
>         using the UNMAP command.  On the other hand, the API we
>         introduced is useful for qcow2/qed/vmdk WRITE SAME, so no
>         damage done.
> 
> 
> Paolo Bonzini (17):
>   block: generalize BlockLimits handling to cover bdrv_aio_discard too
>   block: add flags to BlockRequest
>   block: add flags argument to bdrv_co_write_zeroes tracepoint
>   block: add bdrv_aio_write_zeroes
>   block: handle ENOTSUP from discard in generic code
>   block: make bdrv_co_do_write_zeroes stricter in producing aligned
>     requests
>   vpc, vhdx: add get_info
>   block drivers: add discard/write_zeroes properties to bdrv_get_info
>     implementation
>   block drivers: expose requirement for write same alignment from
>     formats
>   block/iscsi: check WRITE SAME support differently depending on
>     MAY_UNMAP
>   raw-posix: implement write_zeroes with MAY_UNMAP for files
>   raw-posix: implement write_zeroes with MAY_UNMAP for block devices
>   raw-posix: add support for write_zeroes on XFS and block devices
>   qemu-iotests: 033 is fast
>   scsi-disk: catch write protection errors in UNMAP
>   scsi-disk: reject ANCHOR=1 for UNMAP and WRITE SAME commands
>   scsi-disk: correctly implement WRITE SAME
> 
> Peter Lieven (2):
>   block/iscsi: remove .bdrv_has_zero_init
>   block/iscsi: updated copyright
> 
>  block.c                  | 145 +++++++++++++++++++++++----------------
>  block/iscsi.c            |  27 +++++---
>  block/qcow2.c            |   3 +
>  block/qed.c              |   3 +
>  block/raw-aio.h          |   3 +-
>  block/raw-posix.c        | 175 ++++++++++++++++++++++++++++++++++++++++++++---
>  block/vdi.c              |   1 +
>  block/vhdx.c             |  13 ++++
>  block/vmdk.c             |   4 ++
>  block/vpc.c              |  15 ++++
>  hw/scsi/scsi-disk.c      | 154 ++++++++++++++++++++++++++++++++++-------
>  include/block/block.h    |   4 ++
>  tests/qemu-iotests/group |   2 +-
>  trace-events             |   4 +-
>  14 files changed, 452 insertions(+), 101 deletions(-)

Some of it is SCSI but I've applied the whole series.

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/19] block drivers: add discard/write_zeroes properties to bdrv_get_info implementation
  2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 08/19] block drivers: add discard/write_zeroes properties to bdrv_get_info implementation Paolo Bonzini
  2013-11-25 10:29   ` Peter Lieven
@ 2013-12-03 15:09   ` Kevin Wolf
  2013-12-03 15:21     ` Paolo Bonzini
  1 sibling, 1 reply; 36+ messages in thread
From: Kevin Wolf @ 2013-12-03 15:09 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: ronniesahlberg, pl, qemu-devel, stefanha

Am 22.11.2013 um 13:39 hat Paolo Bonzini geschrieben:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/qcow2.c | 2 ++
>  block/qed.c   | 2 ++
>  block/vdi.c   | 1 +
>  block/vhdx.c  | 3 +++
>  block/vpc.c   | 1 +
>  5 files changed, 9 insertions(+)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 2fe37ed..1542750 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1892,6 +1892,8 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
>  static int qcow2_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
>  {
>      BDRVQcowState *s = bs->opaque;
> +    bdi->unallocated_blocks_are_zero = true;
> +    bdi->can_write_zeroes_with_unmap = (s->qcow_version >= 3);
>      bdi->cluster_size = s->cluster_size;
>      bdi->vm_state_offset = qcow2_vm_state_offset(s);
>      return 0;

We must change qcow2_discard_clusters() to set the zero flag instead of
just deallocating the cluster. (We should be doing that anyway, because
the backing file reappearing isn't very nice.)

Not quite sure what to do with version 2 images, though.

For the other formats, I guess this is only correct because they don't
implement discard anyway?

Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/19] block drivers: add discard/write_zeroes properties to bdrv_get_info implementation
  2013-12-03 15:09   ` Kevin Wolf
@ 2013-12-03 15:21     ` Paolo Bonzini
  2013-12-03 17:10       ` Peter Lieven
  0 siblings, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2013-12-03 15:21 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: ronniesahlberg, pl, qemu-devel, stefanha

Il 03/12/2013 16:09, Kevin Wolf ha scritto:
>> >      BDRVQcowState *s = bs->opaque;
>> > +    bdi->unallocated_blocks_are_zero = true;
>> > +    bdi->can_write_zeroes_with_unmap = (s->qcow_version >= 3);
>> >      bdi->cluster_size = s->cluster_size;
>> >      bdi->vm_state_offset = qcow2_vm_state_offset(s);
>> >      return 0;
> We must change qcow2_discard_clusters() to set the zero flag instead of
> just deallocating the cluster. (We should be doing that anyway, because
> the backing file reappearing isn't very nice.)

No, that's not needed:

* unallocated_blocks_are_zero is only meaningful for bs->backing_hd ==
NULL (not too intuitive, but I didn't introduce that interface :)).  In
fact, v2 was checking backing_hd == NULL but I removed it after Peter
noticed I was being inconsistent.

* can_write_zeroes_with_unmap correctly returns true only if zero
clusters are enabled

> For the other formats, I guess this is only correct because they don't
> implement discard anyway?

No, it is correct because that's what their bdrv_co_readv (or similar)
will return when a block is not allocated and there is no backing file.
 Of course, for many formats there will never be a backing file.

Paolo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/19] block drivers: add discard/write_zeroes properties to bdrv_get_info implementation
  2013-12-03 15:21     ` Paolo Bonzini
@ 2013-12-03 17:10       ` Peter Lieven
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Lieven @ 2013-12-03 17:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, ronniesahlberg@gmail.com, qemu-devel@nongnu.org,
	stefanha@redhat.com

> unallocated_blocks_are_zero  is false if there is a backing hd. Same as has_zero_init.

Peter 

> Am 03.12.2013 um 16:21 schrieb Paolo Bonzini <pbonzini@redhat.com>:
> 
> Il 03/12/2013 16:09, Kevin Wolf ha scritto:
>>>>     BDRVQcowState *s = bs->opaque;
>>>> +    bdi->unallocated_blocks_are_zero = true;
>>>> +    bdi->can_write_zeroes_with_unmap = (s->qcow_version >= 3);
>>>>     bdi->cluster_size = s->cluster_size;
>>>>     bdi->vm_state_offset = qcow2_vm_state_offset(s);
>>>>     return 0;
>> We must change qcow2_discard_clusters() to set the zero flag instead of
>> just deallocating the cluster. (We should be doing that anyway, because
>> the backing file reappearing isn't very nice.)
> 
> No, that's not needed:
> 
> * unallocated_blocks_are_zero is only meaningful for bs->backing_hd ==
> NULL (not too intuitive, but I didn't introduce that interface :)).  In
> fact, v2 was checking backing_hd == NULL but I removed it after Peter
> noticed I was being inconsistent.
> 
> * can_write_zeroes_with_unmap correctly returns true only if zero
> clusters are enabled
> 
>> For the other formats, I guess this is only correct because they don't
>> implement discard anyway?
> 
> No, it is correct because that's what their bdrv_co_readv (or similar)
> will return when a block is not allocated and there is no backing file.
> Of course, for many formats there will never be a backing file.
> 
> Paolo

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2013-12-03 17:10 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-22 12:39 [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Paolo Bonzini
2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 01/19] block: generalize BlockLimits handling to cover bdrv_aio_discard too Paolo Bonzini
2013-11-25  9:09   ` Peter Lieven
2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 02/19] block: add flags to BlockRequest Paolo Bonzini
2013-11-25  9:11   ` Peter Lieven
2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 03/19] block: add flags argument to bdrv_co_write_zeroes tracepoint Paolo Bonzini
2013-11-25  9:12   ` Peter Lieven
2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 04/19] block: add bdrv_aio_write_zeroes Paolo Bonzini
2013-11-25  9:13   ` Peter Lieven
2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 05/19] block: handle ENOTSUP from discard in generic code Paolo Bonzini
2013-11-25 10:06   ` Peter Lieven
2013-11-25 10:16     ` Paolo Bonzini
2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 06/19] block: make bdrv_co_do_write_zeroes stricter in producing aligned requests Paolo Bonzini
2013-11-25 10:23   ` Peter Lieven
2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 07/19] vpc, vhdx: add get_info Paolo Bonzini
2013-11-25 10:27   ` Peter Lieven
2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 08/19] block drivers: add discard/write_zeroes properties to bdrv_get_info implementation Paolo Bonzini
2013-11-25 10:29   ` Peter Lieven
2013-12-03 15:09   ` Kevin Wolf
2013-12-03 15:21     ` Paolo Bonzini
2013-12-03 17:10       ` Peter Lieven
2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 09/19] block drivers: expose requirement for write same alignment from formats Paolo Bonzini
2013-11-25 10:33   ` Peter Lieven
2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 10/19] block/iscsi: remove .bdrv_has_zero_init Paolo Bonzini
2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 11/19] block/iscsi: updated copyright Paolo Bonzini
2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 12/19] block/iscsi: check WRITE SAME support differently depending on MAY_UNMAP Paolo Bonzini
2013-11-25 10:34   ` Peter Lieven
2013-11-25 10:42     ` Paolo Bonzini
2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 13/19] raw-posix: implement write_zeroes with MAY_UNMAP for files Paolo Bonzini
2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 14/19] raw-posix: implement write_zeroes with MAY_UNMAP for block devices Paolo Bonzini
2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 15/19] raw-posix: add support for write_zeroes on XFS and " Paolo Bonzini
2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 16/19] qemu-iotests: 033 is fast Paolo Bonzini
2013-11-22 12:39 ` [Qemu-devel] [PATCH v3 17/19] scsi-disk: catch write protection errors in UNMAP Paolo Bonzini
2013-11-22 12:40 ` [Qemu-devel] [PATCH v3 18/19] scsi-disk: reject ANCHOR=1 for UNMAP and WRITE SAME commands Paolo Bonzini
2013-11-22 12:40 ` [Qemu-devel] [PATCH v3 19/19] scsi-disk: correctly implement WRITE SAME Paolo Bonzini
2013-12-03 14:29 ` [Qemu-devel] [PATCH v3 00/19] block & scsi: write_zeroes support through the whole stack Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).