qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, armbru@redhat.com,
	xiechanglong.d@gmail.com, wencongyang2@huawei.com,
	eblake@redhat.com, hreitz@redhat.com, kwolf@redhat.com,
	vsementsov@virtuozzo.com, jsnow@redhat.com
Subject: [PATCH v2 10/19] block: introduce fleecing block driver
Date: Fri, 27 Aug 2021 21:17:59 +0300	[thread overview]
Message-ID: <20210827181808.311670-11-vsementsov@virtuozzo.com> (raw)
In-Reply-To: <20210827181808.311670-1-vsementsov@virtuozzo.com>

Introduce a new driver, that works in pair with copy-before-write to
improve fleecing.

Without fleecing driver, old fleecing scheme looks as follows:

[guest]
  |
  |root
  v
[copy-before-write] -----> [temp.qcow2] <--- [nbd export]
  |                 target  |
  |file                     |backing
  v                         |
[active disk] <-------------+

With fleecing driver, new scheme is:

[guest]
  |
  |root
  v
[copy-before-write] -----> [fleecing] <--- [nbd export]
  |                 target  |    |
  |file                     |    |file
  v                         |    v
[active disk]<--source------+  [temp.img]

Benefits of new scheme:

1. Access control: if remote client try to read data that not covered
   by original dirty bitmap used on copy-before-write open, client gets
   -EACCES.

2. Discard support: if remote client do DISCARD, this additionally to
   discarding data in temp.img informs block-copy process to not copy
   these clusters. Next read from discarded area will return -EACCES.
   This is significant thing: when fleecing user reads data that was
   not yet copied to temp.img, we can avoid copying it on further guest
   write.

3. Synchronisation between client reads and block-copy write is more
   efficient: it doesn't block intersecting block-copy write during
   client read.

4. We don't rely on backing feature: active disk should not be backing
   of temp image, so we avoid some permission-related difficulties and
   temp image now is not required to support backing, it may be simple
   raw image.

Note that now nobody calls fleecing_drv_activate(), so new driver is
actually unusable. It's a work for the following patch: support
fleecing block driver in copy-before-write filter driver.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qapi/block-core.json |  17 ++-
 block/fleecing.h     |  16 +++
 block/fleecing-drv.c | 260 +++++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS          |   1 +
 block/meson.build    |   1 +
 5 files changed, 294 insertions(+), 1 deletion(-)
 create mode 100644 block/fleecing-drv.c

diff --git a/qapi/block-core.json b/qapi/block-core.json
index c42d23752d..8a333136f5 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2826,13 +2826,14 @@
 # @blkreplay: Since 4.2
 # @compress: Since 5.0
 # @copy-before-write: Since 6.2
+# @fleecing: Since 6.2
 #
 # Since: 2.9
 ##
 { 'enum': 'BlockdevDriver',
   'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
             'cloop', 'compress', 'copy-before-write', 'copy-on-read', 'dmg',
-            'file', 'ftp', 'ftps', 'gluster',
+            'file', 'fleecing', 'ftp', 'ftps', 'gluster',
             {'name': 'host_cdrom', 'if': 'defined(HAVE_HOST_BLOCK_DEVICE)' },
             {'name': 'host_device', 'if': 'defined(HAVE_HOST_BLOCK_DEVICE)' },
             'http', 'https', 'iscsi',
@@ -4077,6 +4078,19 @@
   'base': 'BlockdevOptionsGenericFormat',
   'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap' } }
 
+##
+# @BlockdevOptionsFleecing:
+#
+# Driver that works in pair with copy-before-write to make fleecing scheme.
+#
+# @source: source node of fleecing
+#
+# Since: 6.2
+##
+{ 'struct': 'BlockdevOptionsFleecing',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { 'source': 'str' } }
+
 ##
 # @BlockdevOptions:
 #
@@ -4133,6 +4147,7 @@
       'copy-on-read':'BlockdevOptionsCor',
       'dmg':        'BlockdevOptionsGenericFormat',
       'file':       'BlockdevOptionsFile',
+      'fleecing':   'BlockdevOptionsFleecing',
       'ftp':        'BlockdevOptionsCurlFtp',
       'ftps':       'BlockdevOptionsCurlFtps',
       'gluster':    'BlockdevOptionsGluster',
diff --git a/block/fleecing.h b/block/fleecing.h
index fb7b2f86c4..75ad2f8b19 100644
--- a/block/fleecing.h
+++ b/block/fleecing.h
@@ -80,6 +80,9 @@
 #include "block/block-copy.h"
 #include "block/reqlist.h"
 
+
+/* fleecing.c */
+
 typedef struct FleecingState FleecingState;
 
 /*
@@ -132,4 +135,17 @@ void fleecing_discard(FleecingState *f, int64_t offset, int64_t bytes);
 void fleecing_mark_done_and_wait_readers(FleecingState *f, int64_t offset,
                                          int64_t bytes);
 
+
+/* fleecing-drv.c */
+
+/* Returns true if @bs->drv is fleecing block driver */
+bool is_fleecing_drv(BlockDriverState *bs);
+
+/*
+ * Normally FleecingState is created by copy-before-write filter. Then
+ * copy-before-write filter calls fleecing_drv_activate() to share FleecingState
+ * with fleecing block driver.
+ */
+void fleecing_drv_activate(BlockDriverState *bs, FleecingState *fleecing);
+
 #endif /* FLEECING_H */
diff --git a/block/fleecing-drv.c b/block/fleecing-drv.c
new file mode 100644
index 0000000000..9161e13809
--- /dev/null
+++ b/block/fleecing-drv.c
@@ -0,0 +1,260 @@
+/*
+ * fleecing block driver
+ *
+ * Copyright (c) 2021 Virtuozzo International GmbH.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+
+#include "sysemu/block-backend.h"
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "block/block_int.h"
+#include "block/coroutines.h"
+#include "block/qdict.h"
+#include "block/block-copy.h"
+#include "block/reqlist.h"
+
+#include "block/copy-before-write.h"
+#include "block/fleecing.h"
+
+typedef struct BDRVFleecingState {
+    FleecingState *fleecing;
+    BdrvChild *source;
+} BDRVFleecingState;
+
+static coroutine_fn int fleecing_co_preadv_part(
+        BlockDriverState *bs, uint64_t offset, uint64_t bytes,
+        QEMUIOVector *qiov, size_t qiov_offset, int flags)
+{
+    BDRVFleecingState *s = bs->opaque;
+    const BlockReq *req;
+    int ret;
+
+    if (!s->fleecing) {
+        /* fleecing_drv_activate() was not called */
+        return -EINVAL;
+    }
+
+    /* TODO: upgrade to async loop using AioTask */
+    while (bytes) {
+        int64_t cur_bytes;
+
+        ret = fleecing_read_lock(s->fleecing, offset, bytes, &req, &cur_bytes);
+        if (ret < 0) {
+            return ret;
+        }
+
+        if (req) {
+            ret = bdrv_co_preadv_part(s->source, offset, cur_bytes,
+                                      qiov, qiov_offset, flags);
+            fleecing_read_unlock(s->fleecing, req);
+        } else {
+            ret = bdrv_co_preadv_part(bs->file, offset, cur_bytes,
+                                      qiov, qiov_offset, flags);
+        }
+        if (ret < 0) {
+            return ret;
+        }
+
+        bytes -= cur_bytes;
+        offset += cur_bytes;
+        qiov_offset += cur_bytes;
+    }
+
+    return 0;
+}
+
+static int coroutine_fn fleecing_co_block_status(BlockDriverState *bs,
+                                                 bool want_zero, int64_t offset,
+                                                 int64_t bytes, int64_t *pnum,
+                                                 int64_t *map,
+                                                 BlockDriverState **file)
+{
+    BDRVFleecingState *s = bs->opaque;
+    const BlockReq *req = NULL;
+    int ret;
+    int64_t cur_bytes;
+
+    if (!s->fleecing) {
+        /* fleecing_drv_activate() was not called */
+        return -EINVAL;
+    }
+
+    ret = fleecing_read_lock(s->fleecing, offset, bytes, &req, &cur_bytes);
+    if (ret < 0) {
+        return ret;
+    }
+
+    *pnum = cur_bytes;
+    *map = offset;
+
+    if (req) {
+        *file = s->source->bs;
+        fleecing_read_unlock(s->fleecing, req);
+    } else {
+        *file = bs->file->bs;
+    }
+
+    return ret;
+}
+
+static int coroutine_fn fleecing_co_pdiscard(BlockDriverState *bs,
+                                             int64_t offset, int bytes)
+{
+    BDRVFleecingState *s = bs->opaque;
+    if (!s->fleecing) {
+        /* fleecing_drv_activate() was not called */
+        return -EINVAL;
+    }
+
+    fleecing_discard(s->fleecing, offset, bytes);
+
+    bdrv_co_pdiscard(bs->file, offset, bytes);
+
+    /*
+     * Ignore bdrv_co_pdiscard() result: fleecing_discard() succeeded, that
+     * means that next read from this area will fail with -EACCES. More correct
+     * to report success now.
+     */
+    return 0;
+}
+
+static int coroutine_fn fleecing_co_pwrite_zeroes(BlockDriverState *bs,
+        int64_t offset, int bytes, BdrvRequestFlags flags)
+{
+    BDRVFleecingState *s = bs->opaque;
+    if (!s->fleecing) {
+        /* fleecing_drv_activate() was not called */
+        return -EINVAL;
+    }
+
+    /*
+     * TODO: implement cache, to have a chance to fleecing user to read and
+     * discard this data before actual writing to temporary image.
+     */
+    return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
+}
+
+static coroutine_fn int fleecing_co_pwritev(BlockDriverState *bs,
+                                       uint64_t offset,
+                                       uint64_t bytes,
+                                       QEMUIOVector *qiov, int flags)
+{
+    BDRVFleecingState *s = bs->opaque;
+    if (!s->fleecing) {
+        /* fleecing_drv_activate() was not called */
+        return -EINVAL;
+    }
+
+    /*
+     * TODO: implement cache, to have a chance to fleecing user to read and
+     * discard this data before actual writing to temporary image.
+     */
+    return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
+}
+
+
+static void fleecing_refresh_filename(BlockDriverState *bs)
+{
+    pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
+            bs->file->bs->filename);
+}
+
+static int fleecing_open(BlockDriverState *bs, QDict *options, int flags,
+                         Error **errp)
+{
+    BDRVFleecingState *s = bs->opaque;
+
+    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+                               BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
+                               false, errp);
+    if (!bs->file) {
+        return -EINVAL;
+    }
+
+    s->source = bdrv_open_child(NULL, options, "source", bs, &child_of_bds,
+                               BDRV_CHILD_DATA, false, errp);
+    if (!s->source) {
+        return -EINVAL;
+    }
+
+    bs->total_sectors = bs->file->bs->total_sectors;
+
+    return 0;
+}
+
+static void fleecing_child_perm(BlockDriverState *bs, BdrvChild *c,
+                                BdrvChildRole role,
+                                BlockReopenQueue *reopen_queue,
+                                uint64_t perm, uint64_t shared,
+                                uint64_t *nperm, uint64_t *nshared)
+{
+    bdrv_default_perms(bs, c, role, reopen_queue, perm, shared, nperm, nshared);
+
+    if (role & BDRV_CHILD_PRIMARY) {
+        *nshared &= BLK_PERM_CONSISTENT_READ;
+    } else {
+        *nperm &= BLK_PERM_CONSISTENT_READ;
+
+        /*
+         * copy-before-write filter is responsible for source child and need
+         * write access to it.
+         */
+        *nshared |= BLK_PERM_WRITE;
+    }
+}
+
+BlockDriver bdrv_fleecing_drv = {
+    .format_name = "fleecing",
+    .instance_size = sizeof(BDRVFleecingState),
+
+    .bdrv_open                  = fleecing_open,
+
+    .bdrv_co_preadv_part        = fleecing_co_preadv_part,
+    .bdrv_co_pwritev            = fleecing_co_pwritev,
+    .bdrv_co_pwrite_zeroes      = fleecing_co_pwrite_zeroes,
+    .bdrv_co_pdiscard           = fleecing_co_pdiscard,
+    .bdrv_co_block_status       = fleecing_co_block_status,
+
+    .bdrv_refresh_filename      = fleecing_refresh_filename,
+
+    .bdrv_child_perm            = fleecing_child_perm,
+};
+
+bool is_fleecing_drv(BlockDriverState *bs)
+{
+    return bs && bs->drv == &bdrv_fleecing_drv;
+}
+
+void fleecing_drv_activate(BlockDriverState *bs, FleecingState *fleecing)
+{
+    BDRVFleecingState *s = bs->opaque;
+
+    assert(is_fleecing_drv(bs));
+
+    s->fleecing = fleecing;
+}
+
+static void fleecing_init(void)
+{
+    bdrv_register(&bdrv_fleecing_drv);
+}
+
+block_init(fleecing_init);
diff --git a/MAINTAINERS b/MAINTAINERS
index 20f3c7eb68..9b62f99557 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2343,6 +2343,7 @@ F: block/copy-before-write.h
 F: block/copy-before-write.c
 F: block/fleecing.h
 F: block/fleecing.c
+F: block/fleecing-drv.c
 F: include/block/aio_task.h
 F: block/aio_task.c
 F: util/qemu-co-shared-resource.c
diff --git a/block/meson.build b/block/meson.build
index caa52f49c2..10cd0c0ad2 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -19,6 +19,7 @@ block_ss.add(files(
   'dirty-bitmap.c',
   'filter-compress.c',
   'fleecing.c',
+  'fleecing-drv.c',
   'io.c',
   'mirror.c',
   'nbd.c',
-- 
2.29.2



  parent reply	other threads:[~2021-08-27 18:38 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-27 18:17 [PATCH v2 00/19] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
2021-08-27 18:17 ` [PATCH v2 01/19] block/block-copy: move copy_bitmap initialization to block_copy_state_new() Vladimir Sementsov-Ogievskiy
2021-08-27 18:17 ` [PATCH v2 02/19] block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value Vladimir Sementsov-Ogievskiy
2021-08-27 18:17 ` [PATCH v2 03/19] block/block-copy: block_copy_state_new(): add bitmap parameter Vladimir Sementsov-Ogievskiy
2021-08-27 18:17 ` [PATCH v2 04/19] block/copy-before-write: add bitmap open parameter Vladimir Sementsov-Ogievskiy
2021-08-27 18:17 ` [PATCH v2 05/19] block/block-copy: add block_copy_reset() Vladimir Sementsov-Ogievskiy
2021-08-27 18:17 ` [PATCH v2 06/19] block: intoduce reqlist Vladimir Sementsov-Ogievskiy
2021-08-27 18:17 ` [PATCH v2 07/19] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status() Vladimir Sementsov-Ogievskiy
2021-08-27 18:17 ` [PATCH v2 08/19] block/reqlist: add reqlist_wait_all() Vladimir Sementsov-Ogievskiy
2021-08-27 18:17 ` [PATCH v2 09/19] block: introduce FleecingState class Vladimir Sementsov-Ogievskiy
2021-08-27 18:17 ` Vladimir Sementsov-Ogievskiy [this message]
2021-09-01 11:44   ` [PATCH v2 10/19] block: introduce fleecing block driver Markus Armbruster
2021-09-03 13:29     ` Vladimir Sementsov-Ogievskiy
2021-08-27 18:18 ` [PATCH v2 11/19] block/copy-before-write: support " Vladimir Sementsov-Ogievskiy
2021-08-27 18:18 ` [PATCH v2 12/19] block/block-copy: add write-unchanged mode Vladimir Sementsov-Ogievskiy
2021-08-27 18:18 ` [PATCH v2 13/19] block/copy-before-write: use write-unchanged in fleecing mode Vladimir Sementsov-Ogievskiy
2021-08-27 18:18 ` [PATCH v2 14/19] iotests/image-fleecing: add test-case for fleecing format node Vladimir Sementsov-Ogievskiy
2021-08-27 18:18 ` [PATCH v2 15/19] iotests.py: add qemu_io_pipe_and_status() Vladimir Sementsov-Ogievskiy
2021-08-27 18:18 ` [PATCH v2 16/19] iotests/image-fleecing: add test case with bitmap Vladimir Sementsov-Ogievskiy
2021-08-27 18:18 ` [PATCH v2 17/19] block: blk_root(): return non-const pointer Vladimir Sementsov-Ogievskiy
2021-08-27 18:18 ` [PATCH v2 18/19] qapi: backup: add immutable-source parameter Vladimir Sementsov-Ogievskiy
2021-09-01 11:47   ` Markus Armbruster
2021-08-27 18:18 ` [PATCH v2 19/19] iotests/image-fleecing: test push backup with fleecing Vladimir Sementsov-Ogievskiy
2021-09-22  7:45 ` [PATCH v2 00/19] Make image fleecing more usable Vladimir Sementsov-Ogievskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210827181808.311670-11-vsementsov@virtuozzo.com \
    --to=vsementsov@virtuozzo.com \
    --cc=armbru@redhat.com \
    --cc=eblake@redhat.com \
    --cc=hreitz@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=wencongyang2@huawei.com \
    --cc=xiechanglong.d@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).