qemu-devel.nongnu.org archive mirror
From: Eric Blake <eblake@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: armbru@redhat.com, kwolf@redhat.com, mreitz@redhat.com,
	jsnow@redhat.com, famz@redhat.com, den@openvz.org
Subject: Re: [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
Date: Fri, 29 Jun 2018 12:24:30 -0500
Message-ID: <8ecd1901-4148-6dc5-667d-d3c13260f534@redhat.com>
In-Reply-To: <20180629151524.138542-3-vsementsov@virtuozzo.com>

On 06/29/2018 10:15 AM, Vladimir Sementsov-Ogievskiy wrote:
> We need to synchronize backup job with reading from fleecing image
> like it was done in block/replication.c.
> 
> Otherwise, the following situation is theoretically possible:
> 

Grammar suggestions:

> 1. client start reading

client starts reading

> 2. client understand, that there is no corresponding cluster in
>     fleecing image
> 3. client is going to read from backing file (i.e. active image)

client sees that no corresponding cluster has been allocated in the 
fleecing image, so the request is forwarded to the backing file

> 4. guest writes to active image
> 5. this write is stopped by backup(sync=none) and cluster is copied to
>     fleecing image
> 6. guest write continues...
> 7. and client reads _new_ (or partly new) data from active image

Interesting race. Can it actually happen, or does our read code already 
serialize writes to the same area while a read is underway?

In short, I see the problem you are describing: from the moment the 
client starts reading from the backing file, that portion of the backing 
file must remain unchanged until the client is done reading.  But I 
don't know the block layer well enough to tell whether this is actually 
a problem, or whether adding the new filter is just overhead.
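To make the interleaving concrete, here is a minimal single-threaded sketch that replays steps 1-7 in order. It is a toy model, not QEMU code: the `Image` struct, `backup_cow()`, `guest_write()` and `replay()` are hypothetical names, one byte stands in for a cluster, and "serialized" simply means the guest write is held back until the client's read of the backing file has completed.

```c
/*
 * Toy replay of the race described in the commit message.
 * Hypothetical model only: one byte per cluster, no coroutines,
 * and none of these names exist in QEMU.
 */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CLUSTERS 4

typedef struct {
    char data[CLUSTERS];
    bool allocated[CLUSTERS];   /* only used for the fleecing image */
} Image;

/* backup(sync=none): copy the old cluster into the fleecing image
 * before the guest write may modify the active image (step 5). */
static void backup_cow(const Image *active, Image *fleecing, int cluster)
{
    if (!fleecing->allocated[cluster]) {
        fleecing->data[cluster] = active->data[cluster];
        fleecing->allocated[cluster] = true;
    }
}

/* Guest write to the active image (steps 4-6). */
static void guest_write(Image *active, Image *fleecing, int cluster, char v)
{
    backup_cow(active, fleecing, cluster);
    active->data[cluster] = v;
}

/*
 * Replay steps 1-7 for cluster 0 and return the byte the client read;
 * *fleecing_copy receives what ended up in the fleecing image.
 * If 'serialize' is true, the guest write is held back until the
 * client's read of the backing file has finished.
 */
static char replay(bool serialize, char *fleecing_copy)
{
    Image active, fleecing;
    memset(&active, 0, sizeof(active));
    memset(&fleecing, 0, sizeof(fleecing));
    memset(active.data, 'A', CLUSTERS);     /* point-in-time contents */

    const int cluster = 0;
    char seen;

    /* Steps 1-3: the cluster is unallocated in the fleecing image,
     * so the client's read goes to the backing (active) image. */
    assert(!fleecing.allocated[cluster]);

    if (serialize) {
        seen = active.data[cluster];                 /* read finishes first */
        guest_write(&active, &fleecing, cluster, 'B');
    } else {
        guest_write(&active, &fleecing, cluster, 'B');
        seen = active.data[cluster];                 /* step 7: new data */
    }

    *fleecing_copy = fleecing.data[cluster];
    return seen;
}
```

Without serialization the client sees the guest's new byte even though the old byte was correctly copied into the fleecing image; with serialization it sees the point-in-time contents. Whether existing block-layer request serialization already rules out the unserialized ordering is the open question above.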

> 
> So, this fleecing-filter should be above fleecing image, the whole
> picture of fleecing looks like this:
> 
>      +-------+           +------------+
>      |       |           |            |
>      | guest |           | NBD client +<------+
>      |       |           |            |       |
>      ++-----++           +------------+       |only read
>       |     ^                                 |
>       | IO  |                                 |
>       v     |                           +-----+------+
>      ++-----+---------+                 |            |
>      |                |                 |  internal  |
>      |  active image  +----+            | NBD server |
>      |                |    |            |            |
>      +-+--------------+    |backup      +-+----------+
>        ^                   |sync=none     ^
>        |backing            |              |only read
>        |                   |              |
>      +-+--------------+    |       +------+----------+
>      |                |    |       |                 |
>      | fleecing image +<---+       | fleecing filter |
>      |                |            |                 |
>      +--------+-------+            +-----+-----------+
>               ^                          |
>               |                          |
>               +--------------------------+
>                         file

Can you also show the sequence of QMP commands to set up this structure 
(or maybe you do in 3/3, which I haven't looked at yet)?
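To make the edges in the diagram concrete, the sketch below models the graph as plain structs and walks the same pointer chain the patch uses later (`bs->file->bs->backing->bs->job`): the filter, then its `file` child (the fleecing image), then that node's `backing` child (the active image), whose attached job is the backup job. The `Node`, `Child` and `Job` types and `find_backup_job()` are hypothetical stand-ins, not QEMU's `BlockDriverState`/`BdrvChild` definitions.

```c
/*
 * Hypothetical, heavily simplified model of the fleecing node graph.
 * Field names only mimic the shape of QEMU's structures so the
 * patch's pointer chain can be followed; these are not QEMU types.
 */
#include <assert.h>
#include <stddef.h>

typedef struct Node Node;

typedef struct {
    Node *bs;                 /* node this edge points to */
} Child;

typedef struct {
    const char *name;
} Job;

struct Node {
    const char *name;
    Child *file;              /* filtered/protocol child, may be NULL */
    Child *backing;           /* backing child, may be NULL */
    Job *job;                 /* job attached to this node, may be NULL */
};

/* Walk filter -> file (fleecing image) -> backing (active image)
 * and return the job found there, or NULL if any link is missing. */
static Job *find_backup_job(const Node *filter)
{
    if (!filter->file || !filter->file->bs) {
        return NULL;
    }
    const Node *fleecing = filter->file->bs;
    if (!fleecing->backing || !fleecing->backing->bs) {
        return NULL;
    }
    return fleecing->backing->bs->job;
}
```

Unlike the patch, the sketch returns NULL when a link is missing instead of dereferencing the chain directly; whether the driver should rather refuse to open on such a graph is a separate design question.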

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   qapi/block-core.json    |  6 ++--
>   block/fleecing-filter.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++
>   block/Makefile.objs     |  1 +
>   3 files changed, 85 insertions(+), 2 deletions(-)
>   create mode 100644 block/fleecing-filter.c
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 577ce5e999..43872c3d79 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2542,7 +2542,8 @@
>               'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs',
>               'null-aio', 'null-co', 'nvme', 'parallels', 'qcow', 'qcow2', 'qed',
>               'quorum', 'raw', 'rbd', 'replication', 'sheepdog', 'ssh',
> -            'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs' ] }
> +            'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs',
> +            'fleecing-filter' ] }

Missing a 'since 3.0' documentation blurb; also, this enum has been kept 
sorted, so your new filter needs to come earlier.
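The ordering point can be checked mechanically. The sketch below uses only the enum members visible in the quoted hunk and plain byte-wise `strcmp()` order (an assumption about the convention); the full enum also has members before `'host_device'` that are not shown, so the result only says the new name sorts ahead of everything visible here, not its exact slot. `visible_members`, `is_sorted()` and `insertion_index()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Enum members visible in the quoted hunk, in their current order. */
static const char *const visible_members[] = {
    "host_device", "http", "https", "iscsi", "luks", "nbd", "nfs",
    "null-aio", "null-co", "nvme", "parallels", "qcow", "qcow2", "qed",
    "quorum", "raw", "rbd", "replication", "sheepdog", "ssh",
    "throttle", "vdi", "vhdx", "vmdk", "vpc", "vvfat", "vxhs",
};

#define NB_VISIBLE (sizeof(visible_members) / sizeof(visible_members[0]))

/* True if names[] is in strictly increasing strcmp() order. */
static bool is_sorted(const char *const *names, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (strcmp(names[i - 1], names[i]) >= 0) {
            return false;
        }
    }
    return true;
}

/* Index at which 'name' would be inserted to keep names[] sorted. */
static size_t insertion_index(const char *const *names, size_t n,
                              const char *name)
{
    size_t i = 0;
    while (i < n && strcmp(names[i], name) < 0) {
        i++;
    }
    return i;
}
```

Appending `'fleecing-filter'` after `'vxhs'`, as the patch does, would make `is_sorted()` fail for the extended list.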

>   
>   ##
>   # @BlockdevOptionsFile:
> @@ -3594,7 +3595,8 @@
>         'vmdk':       'BlockdevOptionsGenericCOWFormat',
>         'vpc':        'BlockdevOptionsGenericFormat',
>         'vvfat':      'BlockdevOptionsVVFAT',
> -      'vxhs':       'BlockdevOptionsVxHS'
> +      'vxhs':       'BlockdevOptionsVxHS',
> +      'fleecing-filter': 'BlockdevOptionsGenericFormat'

Again, this has been kept sorted.

> +static coroutine_fn int fleecing_co_preadv(BlockDriverState *bs,
> +                                           uint64_t offset, uint64_t bytes,
> +                                           QEMUIOVector *qiov, int flags)
> +{
> +    int ret;
> +    BlockJob *job = bs->file->bs->backing->bs->job;
> +    CowRequest req;
> +
> +    backup_wait_for_overlapping_requests(job, offset, bytes);
> +    backup_cow_request_begin(&req, job, offset, bytes);
> +
> +    ret = bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
> +
> +    backup_cow_request_end(&req);
> +
> +    return ret;
> +}

So the idea here is that you force a serializing request to ensure that 
there are no other writes to the area in the meantime.
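The serialization can be pictured as a set of in-flight byte ranges: the reader first waits until no overlapping request is in flight (the patch's `backup_wait_for_overlapping_requests()` call), then registers its own range so that a copy-before-write touching it has to wait until the read ends. The code below is a hypothetical single-threaded model; `InflightSet`, `request_begin()`, `request_end()` and `must_wait()` are illustrative names, and instead of suspending a coroutine it merely reports whether a caller would have to wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_REQS 8

typedef struct {
    uint64_t start;   /* first byte of the request */
    uint64_t end;     /* one past the last byte */
    bool active;
} Request;

typedef struct {
    Request reqs[MAX_REQS];
} InflightSet;

/* Register [offset, offset + bytes); returns a slot index, or -1 if full. */
static int request_begin(InflightSet *s, uint64_t offset, uint64_t bytes)
{
    for (int i = 0; i < MAX_REQS; i++) {
        if (!s->reqs[i].active) {
            s->reqs[i] = (Request){ offset, offset + bytes, true };
            return i;
        }
    }
    return -1;
}

/* Drop a previously registered request. */
static void request_end(InflightSet *s, int slot)
{
    s->reqs[slot].active = false;
}

/* True if [offset, offset + bytes) intersects any in-flight request,
 * i.e. a writer (or a new reader) touching this range must wait. */
static bool must_wait(const InflightSet *s, uint64_t offset, uint64_t bytes)
{
    uint64_t end = offset + bytes;
    for (int i = 0; i < MAX_REQS; i++) {
        const Request *r = &s->reqs[i];
        if (r->active && offset < r->end && r->start < end) {
            return true;
        }
    }
    return false;
}
```

Half-open ranges make adjacent requests (one ending exactly where the next starts) independent, so only true overlaps are serialized.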

> +
> +static coroutine_fn int fleecing_co_pwritev(BlockDriverState *bs,
> +                                            uint64_t offset, uint64_t bytes,
> +                                            QEMUIOVector *qiov, int flags)
> +{
> +    return -EINVAL;

and you force this to be a read-only interface. (Does the block layer 
actually require us to provide a pwritev callback, or can we leave it 
NULL instead?)
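The two options can be contrasted with a toy driver table. This is a hypothetical sketch and does not describe QEMU's dispatcher; what QEMU does with a NULL write callback is exactly the open question. `Driver`, `reject_write()` and `dispatch_write()` are made-up names, and returning `-ENOTSUP` for a missing callback is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver table with a single, optional write callback. */
typedef struct {
    int (*pwritev)(uint64_t offset, uint64_t bytes);  /* may be NULL */
} Driver;

/* Design A: the driver supplies an explicit stub that rejects writes. */
static int reject_write(uint64_t offset, uint64_t bytes)
{
    (void)offset;
    (void)bytes;
    return -EINVAL;
}

/* Design B: the generic layer treats a missing callback as unsupported. */
static int dispatch_write(const Driver *drv, uint64_t offset, uint64_t bytes)
{
    if (!drv->pwritev) {
        return -ENOTSUP;
    }
    return drv->pwritev(offset, bytes);
}
```

Design B gives every read-only driver the same error for free; design A keeps the behavior explicit in the driver at the cost of a stub.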

> +BlockDriver bdrv_fleecing_filter = {
> +    .format_name = "fleecing-filter",
> +    .protocol_name = "fleecing-filter",
> +    .instance_size = 0,
> +
> +    .bdrv_open = fleecing_open,
> +    .bdrv_close = fleecing_close,
> +
> +    .bdrv_getlength = fleecing_getlength,
> +    .bdrv_co_preadv = fleecing_co_preadv,
> +    .bdrv_co_pwritev = fleecing_co_pwritev,
> +
> +    .is_filter = true,
> +    .bdrv_recurse_is_first_non_filter = fleecing_recurse_is_first_non_filter,
> +    .bdrv_child_perm        = bdrv_filter_default_perms,

No .bdrv_co_block_status callback?  That probably hurts querying for 
sparse regions.
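As a rough illustration of why a status callback matters for sparse images: the sketch below reports, for a given offset, whether the range starts with data or a hole and how many following bytes share that status (a `pnum`-style out-parameter). A filter that forwards the query to its child lets a client skip holes, while a filter that cannot answer has to report the whole range as data. The image model, the 64 KiB cluster size and all function names are assumptions for illustration, not QEMU's `bdrv_co_block_status` interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CLUSTER_SIZE 65536u   /* assumed cluster size */
#define NB_CLUSTERS 8u

typedef struct {
    bool data[NB_CLUSTERS];   /* false = hole that reads as zeroes */
} Image;

/*
 * Report whether [offset, offset + bytes) starts with data (true) or a
 * hole (false); *pnum receives how many bytes share that status.
 * Caller guarantees bytes > 0 and offset + bytes <= image size.
 */
static bool image_block_status(const Image *img, uint64_t offset,
                               uint64_t bytes, uint64_t *pnum)
{
    uint64_t c = offset / CLUSTER_SIZE;
    bool status = img->data[c];
    uint64_t end = offset + bytes;
    uint64_t run_end = (c + 1) * CLUSTER_SIZE;   /* end of first cluster */

    /* Extend the run while the next cluster has the same status. */
    while (run_end < end && img->data[run_end / CLUSTER_SIZE] == status) {
        run_end += CLUSTER_SIZE;
    }
    if (run_end > end) {
        run_end = end;
    }
    *pnum = run_end - offset;
    return status;
}

/* Filter with a status callback: forward the query to its child. */
static bool forwarding_filter_status(const Image *child, uint64_t offset,
                                     uint64_t bytes, uint64_t *pnum)
{
    return image_block_status(child, offset, bytes, pnum);
}

/* Filter without a status callback: all it can say is "data". */
static bool opaque_filter_status(uint64_t offset, uint64_t bytes,
                                 uint64_t *pnum)
{
    (void)offset;
    *pnum = bytes;
    return true;
}
```

In this model a client reading through the opaque filter would fetch all six clusters starting at cluster 2, while the forwarding filter reveals that the first three of them are holes it could skip.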

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
