From: Kevin Wolf <kwolf@redhat.com>
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, peter.maydell@linaro.org, qemu-devel@nongnu.org
Subject: [PULL 07/28] block/rbd: add write zeroes support
Date: Fri, 9 Jul 2021 14:50:14 +0200 [thread overview]
Message-ID: <20210709125035.191321-8-kwolf@redhat.com> (raw)
In-Reply-To: <20210709125035.191321-1-kwolf@redhat.com>
From: Peter Lieven <pl@kamp.de>
This patch wittingly sets BDRV_REQ_NO_FALLBACK and silently ignores
BDRV_REQ_MAY_UNMAP for older librbd versions.
The rationale for this is as follows (citing Ilya Dryomov current RBD
maintainer):
---8<---
a) remove the BDRV_REQ_MAY_UNMAP check in qemu_rbd_co_pwrite_zeroes()
and as a consequence always unmap if librbd is too old
It's not clear what qemu's expectation is but in general Write
Zeroes is allowed to unmap. The only guarantee is that subsequent
reads return zeroes, everything else is a hint. This is how it is
specified in the kernel and in the NVMe spec.
In particular, block/nvme.c implements it as follows:
if (flags & BDRV_REQ_MAY_UNMAP) {
cdw12 |= (1 << 25);
}
This sets the Deallocate bit. But if it's not set, the device may
still deallocate:
"""
If the Deallocate bit (CDW12.DEAC) is set to '1' in a Write Zeroes
command, and the namespace supports clearing all bytes to 0h in the
values read (e.g., bits 2:0 in the DLFEAT field are set to 001b)
from a deallocated logical block and its metadata (excluding
protection information), then for each specified logical block, the
controller:
- should deallocate that logical block;
...
If the Deallocate bit is cleared to '0' in a Write Zeroes command,
and the namespace supports clearing all bytes to 0h in the values
read (e.g., bits 2:0 in the DLFEAT field are set to 001b) from
a deallocated logical block and its metadata (excluding protection
information), then, for each specified logical block, the
controller:
- may deallocate that logical block;
"""
https://nvmexpress.org/wp-content/uploads/NVM-Express-NVM-Command-Set-Specification-2021.06.02-Ratified-1.pdf
b) set BDRV_REQ_NO_FALLBACK in supported_zero_flags
Again, it's not clear what qemu expects here, but without it we end
up in a ridiculous situation where specifying the "don't allow slow
fallback" switch immediately fails all efficient zeroing requests on
a device where Write Zeroes is always efficient:
$ qemu-io -c 'help write' | grep -- '-[zun]'
-n, -- with -z, don't allow slow fallback
-u, -- with -z, allow unmapping
-z, -- write zeroes using blk_co_pwrite_zeroes
$ qemu-io -f rbd -c 'write -z -u -n 0 1M' rbd:foo/bar
write failed: Operation not supported
--->8---
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-6-idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/rbd.c | 32 +++++++++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)
diff --git a/block/rbd.c b/block/rbd.c
index 380ad28861..3152bc8ba0 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -75,7 +75,8 @@ typedef enum {
RBD_AIO_READ,
RBD_AIO_WRITE,
RBD_AIO_DISCARD,
- RBD_AIO_FLUSH
+ RBD_AIO_FLUSH,
+ RBD_AIO_WRITE_ZEROES
} RBDAIOCmd;
typedef struct BDRVRBDState {
@@ -999,6 +1000,10 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
}
}
+#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
+ bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK;
+#endif
+
/* When extending regular files, we get zeros from the OS */
bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
@@ -1123,6 +1128,18 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
case RBD_AIO_FLUSH:
r = rbd_aio_flush(s->image, c);
break;
+#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
+ case RBD_AIO_WRITE_ZEROES: {
+ int zero_flags = 0;
+#ifdef RBD_WRITE_ZEROES_FLAG_THICK_PROVISION
+ if (!(flags & BDRV_REQ_MAY_UNMAP)) {
+ zero_flags = RBD_WRITE_ZEROES_FLAG_THICK_PROVISION;
+ }
+#endif
+ r = rbd_aio_write_zeroes(s->image, offset, bytes, c, zero_flags, 0);
+ break;
+ }
+#endif
default:
r = -EINVAL;
}
@@ -1193,6 +1210,16 @@ static int coroutine_fn qemu_rbd_co_pdiscard(BlockDriverState *bs,
return qemu_rbd_start_co(bs, offset, count, NULL, 0, RBD_AIO_DISCARD);
}
+#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
+static int
+coroutine_fn qemu_rbd_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
+ int count, BdrvRequestFlags flags)
+{
+ return qemu_rbd_start_co(bs, offset, count, NULL, flags,
+ RBD_AIO_WRITE_ZEROES);
+}
+#endif
+
static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi)
{
BDRVRBDState *s = bs->opaque;
@@ -1473,6 +1500,9 @@ static BlockDriver bdrv_rbd = {
.bdrv_co_pwritev = qemu_rbd_co_pwritev,
.bdrv_co_flush_to_disk = qemu_rbd_co_flush,
.bdrv_co_pdiscard = qemu_rbd_co_pdiscard,
+#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
+ .bdrv_co_pwrite_zeroes = qemu_rbd_co_pwrite_zeroes,
+#endif
.bdrv_snapshot_create = qemu_rbd_snap_create,
.bdrv_snapshot_delete = qemu_rbd_snap_remove,
--
2.31.1
next prev parent reply other threads:[~2021-07-09 12:55 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-09 12:50 [PULL 00/28] Block layer patches Kevin Wolf
2021-07-09 12:50 ` [PULL 01/28] MAINTAINERS: update block/rbd.c maintainer Kevin Wolf
2021-07-09 12:50 ` [PULL 02/28] block/rbd: Add support for rbd image encryption Kevin Wolf
2021-07-09 12:50 ` [PULL 03/28] block/rbd: bump librbd requirement to luminous release Kevin Wolf
2021-07-09 12:50 ` [PULL 04/28] block/rbd: store object_size in BDRVRBDState Kevin Wolf
2021-07-09 12:50 ` [PULL 05/28] block/rbd: update s->image_size in qemu_rbd_getlength Kevin Wolf
2021-07-09 12:50 ` [PULL 06/28] block/rbd: migrate from aio to coroutines Kevin Wolf
2021-07-09 12:50 ` Kevin Wolf [this message]
2021-07-09 12:50 ` [PULL 08/28] block/rbd: drop qemu_rbd_refresh_limits Kevin Wolf
2021-07-09 12:50 ` [PULL 09/28] util/uri: do not check argument of uri_free() Kevin Wolf
2021-07-09 12:50 ` [PULL 10/28] export/fuse: Pass default_permissions for mount Kevin Wolf
2021-12-24 15:04 ` Vladimir Sementsov-Ogievskiy
2022-01-03 10:04 ` Hanna Reitz
2021-07-09 12:50 ` [PULL 11/28] export/fuse: Add allow-other option Kevin Wolf
2021-07-09 12:50 ` [PULL 12/28] export/fuse: Give SET_ATTR_SIZE its own branch Kevin Wolf
2021-07-09 12:50 ` [PULL 13/28] export/fuse: Let permissions be adjustable Kevin Wolf
2021-07-09 12:50 ` [PULL 14/28] iotests/308: Test +w on read-only FUSE exports Kevin Wolf
2021-07-09 12:50 ` [PULL 15/28] iotests/fuse-allow-other: Test allow-other Kevin Wolf
2021-07-09 12:50 ` [PULL 16/28] block/rbd: fix type of task->complete Kevin Wolf
2021-07-09 12:50 ` [PULL 17/28] MAINTAINERS: add block/rbd.c reviewer Kevin Wolf
2021-07-09 12:50 ` [PULL 18/28] vhost-user: Fix backends without multiqueue support Kevin Wolf
2021-07-09 12:50 ` [PULL 19/28] blockdev: fix drive-backup transaction endless drained section Kevin Wolf
2021-07-09 12:50 ` [PULL 20/28] qcow2: Prohibit backing file changes in 'qemu-img amend' Kevin Wolf
2021-07-09 12:50 ` [PULL 21/28] qemu-img: Require -F with -b backing image Kevin Wolf
2021-07-09 12:50 ` [PULL 22/28] qemu-img: Improve error for rebase without backing format Kevin Wolf
2021-07-09 12:50 ` [PULL 23/28] qcow2: Fix dangling pointer after reopen for 'file' Kevin Wolf
2021-07-09 12:50 ` [PULL 24/28] block: Add bdrv_reopen_queue_free() Kevin Wolf
2021-07-09 12:50 ` [PULL 25/28] block: Acquire AioContexts during bdrv_reopen_multiple() Kevin Wolf
2021-07-09 12:50 ` [PULL 26/28] block: Support multiple reopening with x-blockdev-reopen Kevin Wolf
2021-07-09 12:50 ` [PULL 27/28] iotests: Test reopening multiple devices at the same time Kevin Wolf
2021-07-09 12:50 ` [PULL 28/28] block: Make blockdev-reopen stable API Kevin Wolf
2021-07-10 20:27 ` [PULL 00/28] Block layer patches Peter Maydell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210709125035.191321-8-kwolf@redhat.com \
--to=kwolf@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).