* Re: [Qemu-devel] [PATCH] block/rbd: add .bdrv_reopen_prepare() stub
2016-05-17 10:03 ` Sebastian Färber
@ 2016-05-17 14:00 ` Jason Dillaman
2016-05-17 18:48 ` Josh Durgin
2016-05-18 5:31 ` Jeff Cody
2 siblings, 0 replies; 9+ messages in thread
From: Jason Dillaman @ 2016-05-17 14:00 UTC (permalink / raw)
To: Sebastian Färber
Cc: Kevin Wolf, Josh Durgin, Max Reitz, qemu-devel, qemu-block,
Jeffrey Cody
On Tue, May 17, 2016 at 6:03 AM, Sebastian Färber <sfaerber82@gmail.com> wrote:
> Hi Kevin,
>
>> A correct reopen implementation must consider all options and flags that
>> .bdrv_open() looked at.
>>
>> The options are okay, as both "filename" and "password-secret" aren't
>> things that we want to allow a reopen to change. However, in the flags
>> BDRV_O_NOCACHE makes a difference:
>>
>>     if (flags & BDRV_O_NOCACHE) {
>>         rados_conf_set(s->cluster, "rbd_cache", "false");
>>     } else {
>>         rados_conf_set(s->cluster, "rbd_cache", "true");
>>     }
>>
>> A reopen must either update the setting, or if it can't (e.g. because
>> librbd doesn't support it) any attempt to change the flag must fail.
>
> Thanks for the feedback.
> As far as I can tell it's not possible to update the cache settings
> without reconnecting. I've added a check in the following patch.
> It would be great if someone who knows the internals of ceph/rbd could
> have a look as well.
Correct, you cannot currently enable or disable the cache dynamically.
You would need to close the image and re-open it to change the cache
settings.
> Sebastian
>
> -- >8 --
> Subject: [PATCH] block/rbd: add .bdrv_reopen_prepare() stub
>
> Add support for reopen() by adding the .bdrv_reopen_prepare() stub
>
> Signed-off-by: Sebastian Färber <sfaerber82@gmail.com>
> ---
> block/rbd.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 5bc5b32..8ecf096 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -577,6 +577,19 @@ failed_opts:
>      return r;
>  }
>
> +/* Note that this will not re-establish a connection with the Ceph cluster
> +   - it is effectively a NOP. */
> +static int qemu_rbd_reopen_prepare(BDRVReopenState *state,
> +                                   BlockReopenQueue *queue, Error **errp)
> +{
> +    if (state->flags & BDRV_O_NOCACHE &&
> +        ((state->bs->open_flags & BDRV_O_NOCACHE) == 0)) {
> +        error_setg(errp, "Cannot turn off rbd_cache during reopen");
> +        return -EINVAL;
> +    }
> +    return 0;
> +}
> +
>  static void qemu_rbd_close(BlockDriverState *bs)
>  {
>      BDRVRBDState *s = bs->opaque;
> @@ -976,6 +989,7 @@ static BlockDriver bdrv_rbd = {
>      .instance_size       = sizeof(BDRVRBDState),
>      .bdrv_needs_filename = true,
>      .bdrv_file_open      = qemu_rbd_open,
> +    .bdrv_reopen_prepare = qemu_rbd_reopen_prepare,
>      .bdrv_close          = qemu_rbd_close,
>      .bdrv_create         = qemu_rbd_create,
>      .bdrv_has_zero_init  = bdrv_has_zero_init_1,
> --
> 1.8.3.1
>
>
--
Jason
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH] block/rbd: add .bdrv_reopen_prepare() stub
2016-05-17 10:03 ` Sebastian Färber
2016-05-17 14:00 ` Jason Dillaman
@ 2016-05-17 18:48 ` Josh Durgin
2016-05-18 7:36 ` Sebastian Färber
2016-05-18 8:19 ` Kevin Wolf
2016-05-18 5:31 ` Jeff Cody
2 siblings, 2 replies; 9+ messages in thread
From: Josh Durgin @ 2016-05-17 18:48 UTC (permalink / raw)
To: Sebastian Färber, Kevin Wolf; +Cc: qemu-block, qemu-devel, jcody, mreitz
On 05/17/2016 03:03 AM, Sebastian Färber wrote:
> Hi Kevin,
>
>> A correct reopen implementation must consider all options and flags that
>> .bdrv_open() looked at.
>>
>> The options are okay, as both "filename" and "password-secret" aren't
>> things that we want to allow a reopen to change. However, in the flags
>> BDRV_O_NOCACHE makes a difference:
>>
>>     if (flags & BDRV_O_NOCACHE) {
>>         rados_conf_set(s->cluster, "rbd_cache", "false");
>>     } else {
>>         rados_conf_set(s->cluster, "rbd_cache", "true");
>>     }
>>
>> A reopen must either update the setting, or if it can't (e.g. because
>> librbd doesn't support it) any attempt to change the flag must fail.
Updating this setting on an open image won't do anything, but if you
rbd_close() and rbd_open() it again the setting will take effect.
rbd_close() will force a flush of any pending I/O in librbd and
free the memory for librbd's ImageCtx, which may or may not be desired
here.
> Thanks for the feedback.
> As far as I can tell it's not possible to update the cache settings
> without reconnecting. I've added a check in the following patch.
> It would be great if someone who knows the internals of ceph/rbd could
> have a look as well.
There's no need to reset the librados state, so connections to the
cluster can stick around. I'm a bit unclear on the bdrv_reopen_*
functions though - what is their intended use and semantics?
> Sebastian
>
> -- >8 --
> Subject: [PATCH] block/rbd: add .bdrv_reopen_prepare() stub
>
> Add support for reopen() by adding the .bdrv_reopen_prepare() stub
>
> Signed-off-by: Sebastian Färber <sfaerber82@gmail.com>
> ---
> block/rbd.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 5bc5b32..8ecf096 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -577,6 +577,19 @@ failed_opts:
>      return r;
>  }
>
> +/* Note that this will not re-establish a connection with the Ceph cluster
> +   - it is effectively a NOP. */
> +static int qemu_rbd_reopen_prepare(BDRVReopenState *state,
> +                                   BlockReopenQueue *queue, Error **errp)
> +{
> +    if (state->flags & BDRV_O_NOCACHE &&
> +        ((state->bs->open_flags & BDRV_O_NOCACHE) == 0)) {
> +        error_setg(errp, "Cannot turn off rbd_cache during reopen");
> +        return -EINVAL;
> +    }
> +    return 0;
> +}
> +
>  static void qemu_rbd_close(BlockDriverState *bs)
>  {
>      BDRVRBDState *s = bs->opaque;
> @@ -976,6 +989,7 @@ static BlockDriver bdrv_rbd = {
>      .instance_size       = sizeof(BDRVRBDState),
>      .bdrv_needs_filename = true,
>      .bdrv_file_open      = qemu_rbd_open,
> +    .bdrv_reopen_prepare = qemu_rbd_reopen_prepare,
>      .bdrv_close          = qemu_rbd_close,
>      .bdrv_create         = qemu_rbd_create,
>      .bdrv_has_zero_init  = bdrv_has_zero_init_1,
> --
> 1.8.3.1
>
* Re: [Qemu-devel] [PATCH] block/rbd: add .bdrv_reopen_prepare() stub
2016-05-17 18:48 ` Josh Durgin
@ 2016-05-18 7:36 ` Sebastian Färber
2016-05-18 8:19 ` Kevin Wolf
1 sibling, 0 replies; 9+ messages in thread
From: Sebastian Färber @ 2016-05-18 7:36 UTC (permalink / raw)
To: Josh Durgin, Kevin Wolf; +Cc: qemu-block, qemu-devel, jcody, mreitz
>
> There's no need to reset the librados state, so connections to the
> cluster can stick around. I'm a bit unclear on the bdrv_reopen_*
> functions though - what is their intended use and semantics?
My motivation for implementing this basic reopen support is getting
active block commit in qemu (driven by libvirt) to work for rbd.
For example:
- VM is running and using rbd device as primary disk
- Create external local qcow2 snapshot with rbd as backing device
- All writes end up in local qcow2 file
- Do active block commit to get changes back to rbd after a while
Without the patch the last part (active block commit) fails in qemu:
Block format 'rbd' used by device '' does not support reopening files
I think block commit wants to switch the rbd device to read-write to make
sure it can write back the changes, which is unnecessary here because the
device is already writeable.
Unfortunately I don't think I can implement proper, full-blown reopen
support (like gluster) for rbd - that would be way over my head.
Sebastian
* Re: [Qemu-devel] [PATCH] block/rbd: add .bdrv_reopen_prepare() stub
2016-05-17 18:48 ` Josh Durgin
2016-05-18 7:36 ` Sebastian Färber
@ 2016-05-18 8:19 ` Kevin Wolf
2016-05-18 15:54 ` Jason Dillaman
1 sibling, 1 reply; 9+ messages in thread
From: Kevin Wolf @ 2016-05-18 8:19 UTC (permalink / raw)
To: Josh Durgin; +Cc: Sebastian Färber, qemu-block, qemu-devel, jcody, mreitz
Am 17.05.2016 um 20:48 hat Josh Durgin geschrieben:
> On 05/17/2016 03:03 AM, Sebastian Färber wrote:
> >Hi Kevin,
> >
> >>A correct reopen implementation must consider all options and flags that
> >>.bdrv_open() looked at.
> >>
> >>The options are okay, as both "filename" and "password-secret" aren't
> >>things that we want to allow a reopen to change. However, in the flags
> >>BDRV_O_NOCACHE makes a difference:
> >>
> >>     if (flags & BDRV_O_NOCACHE) {
> >>         rados_conf_set(s->cluster, "rbd_cache", "false");
> >>     } else {
> >>         rados_conf_set(s->cluster, "rbd_cache", "true");
> >>     }
> >>
> >>A reopen must either update the setting, or if it can't (e.g. because
> >>librbd doesn't support it) any attempt to change the flag must fail.
>
> Updating this setting on an open image won't do anything, but if you
> rbd_close() and rbd_open() it again the setting will take effect.
> rbd_close() will force a flush of any pending I/O in librbd and
> free the memory for librbd's ImageCtx, which may or may not be desired
> here.
Calling rbd_close() first and then rbd_open() risks that the rbd_open()
fails and we end up with no usable image at all. Can we open a second
instance of the image first and only close the first one if that succeeds?
We already flush all requests before calling this, so that part
shouldn't make a difference.
> >Thanks for the feedback.
> >As far as I can tell it's not possible to update the cache settings
> >without reconnecting. I've added a check in the following patch.
> >It would be great if someone who knows the internals of ceph/rbd could
> >have a look as well.
>
> There's no need to reset the librados state, so connections to the
> cluster can stick around. I'm a bit unclear on the bdrv_reopen_*
> functions though - what is their intended use and semantics?
They change the options and flags that were specified in .bdrv_open().
The most important use case today is switching between read-only and
read-write, but changing the cache mode or any other option can be
requested as well.
> >Sebastian
> >
> >-- >8 --
> >Subject: [PATCH] block/rbd: add .bdrv_reopen_prepare() stub
> >
> >Add support for reopen() by adding the .bdrv_reopen_prepare() stub
> >
> >Signed-off-by: Sebastian Färber <sfaerber82@gmail.com>
> >---
> > block/rbd.c | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> >diff --git a/block/rbd.c b/block/rbd.c
> >index 5bc5b32..8ecf096 100644
> >--- a/block/rbd.c
> >+++ b/block/rbd.c
> > @@ -577,6 +577,19 @@ failed_opts:
> >      return r;
> >  }
> >
> > +/* Note that this will not re-establish a connection with the Ceph cluster
> > +   - it is effectively a NOP. */
> > +static int qemu_rbd_reopen_prepare(BDRVReopenState *state,
> > +                                   BlockReopenQueue *queue, Error **errp)
> > +{
> > +    if (state->flags & BDRV_O_NOCACHE &&
> > +        ((state->bs->open_flags & BDRV_O_NOCACHE) == 0)) {
This misses the other direction, where you try to turn on caching. If we
don't implement the real functionality, we should always error out if
the bit changes. The most readable check is probably:

    (state->flags & BDRV_O_NOCACHE) != (state->bs->open_flags & BDRV_O_NOCACHE)
> > +        error_setg(errp, "Cannot turn off rbd_cache during reopen");
> > +        return -EINVAL;
> > +    }
> > +    return 0;
> > +}
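To make the difference between the two checks concrete, here is a minimal
self-contained sketch. The `BDRV_O_NOCACHE` value and the helper names are
invented for illustration; the real flag lives in QEMU's block headers and
the real code operates on `BDRVReopenState`, not bare ints.

```c
#include <stdbool.h>

/* Stand-in for QEMU's BDRV_O_NOCACHE bit; the real value may differ. */
#define BDRV_O_NOCACHE 0x20

/* The check from the patch: it only rejects requests that set NOCACHE
 * (cache off) on an image that was opened with the cache enabled. */
static bool patch_check_rejects(int new_flags, int open_flags)
{
    return (new_flags & BDRV_O_NOCACHE) &&
           ((open_flags & BDRV_O_NOCACHE) == 0);
}

/* The symmetric check suggested above: reject any change of the bit,
 * in either direction. */
static bool symmetric_check_rejects(int new_flags, int open_flags)
{
    return (new_flags & BDRV_O_NOCACHE) != (open_flags & BDRV_O_NOCACHE);
}
```

The one-direction check silently allows a reopen that turns caching on
(NOCACHE set at open time, cleared in the new flags), which is exactly the
case the symmetric comparison catches.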
Kevin
* Re: [Qemu-devel] [PATCH] block/rbd: add .bdrv_reopen_prepare() stub
2016-05-18 8:19 ` Kevin Wolf
@ 2016-05-18 15:54 ` Jason Dillaman
0 siblings, 0 replies; 9+ messages in thread
From: Jason Dillaman @ 2016-05-18 15:54 UTC (permalink / raw)
To: Kevin Wolf
Cc: Josh Durgin, Jeffrey Cody, Max Reitz, Sebastian Färber,
qemu-block, qemu-devel
On Wed, May 18, 2016 at 4:19 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>> Updating this setting on an open image won't do anything, but if you
>> rbd_close() and rbd_open() it again the setting will take effect.
>> rbd_close() will force a flush of any pending I/O in librbd and
>> free the memory for librbd's ImageCtx, which may or may not be desired
>> here.
>
> First rbd_close() and then rbd_open() risks that the rbd_open() fails
> and we end up with no usable image at all. Can we open a second instance
> of the image first and only close the first one if that succeeded?
>
> We already flush all requests before calling this, so that part
> shouldn't make a difference.
You can open the same image twice without issue. It's perfectly fine
to rbd_open() with the new settings and upon success rbd_close() the
original image handle.
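As a rough model of that pattern (all names here are invented toy stand-ins;
the real `rbd_open()`/`rbd_close()` take an ioctx, image name, and
`rbd_image_t`), the key point is that the old handle is only closed after
the new open has succeeded:

```c
/* Toy "image handles": an int, with 0 meaning closed. open_image() fails
 * on demand so the error path can be exercised. */
static int next_handle = 1;

static int open_image(int should_fail, int *handle)
{
    if (should_fail) {
        return -1;
    }
    *handle = next_handle++;
    return 0;
}

static void close_image(int *handle)
{
    *handle = 0; /* the real rbd_close() also flushes pending I/O */
}

/* Open a second instance with the new settings first; only on success
 * close the original, so a failed open never leaves us without an image. */
static int reopen_image(int *handle, int open_should_fail)
{
    int new_handle;
    if (open_image(open_should_fail, &new_handle) < 0) {
        return -1;          /* old handle stays valid */
    }
    close_image(handle);    /* success: retire the old instance */
    *handle = new_handle;
    return 0;
}
```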
--
Jason
* Re: [Qemu-devel] [PATCH] block/rbd: add .bdrv_reopen_prepare() stub
2016-05-17 10:03 ` Sebastian Färber
2016-05-17 14:00 ` Jason Dillaman
2016-05-17 18:48 ` Josh Durgin
@ 2016-05-18 5:31 ` Jeff Cody
2 siblings, 0 replies; 9+ messages in thread
From: Jeff Cody @ 2016-05-18 5:31 UTC (permalink / raw)
To: Sebastian Färber; +Cc: Kevin Wolf, qemu-block, qemu-devel, jdurgin, mreitz
On Tue, May 17, 2016 at 12:03:53PM +0200, Sebastian Färber wrote:
> Hi Kevin,
>
> > A correct reopen implementation must consider all options and flags that
> > .bdrv_open() looked at.
> >
> > The options are okay, as both "filename" and "password-secret" aren't
> > things that we want to allow a reopen to change. However, in the flags
> > BDRV_O_NOCACHE makes a difference:
> >
> >     if (flags & BDRV_O_NOCACHE) {
> >         rados_conf_set(s->cluster, "rbd_cache", "false");
> >     } else {
> >         rados_conf_set(s->cluster, "rbd_cache", "true");
> >     }
> >
> > A reopen must either update the setting, or if it can't (e.g. because
> > librbd doesn't support it) any attempt to change the flag must fail.
>
> Thanks for the feedback.
> As far as I can tell it's not possible to update the cache settings
> without reconnecting. I've added a check in the following patch.
> It would be great if someone who knows the internals of ceph/rbd could
> have a look as well.
>
> Sebastian
>
Is there any reason you cannot create a new connection, switch over to use
the new connection, and then tear down the old one?
The reopen functions have three operations:
* _prepare()
Parses flags, creates new connections / file descriptors, etc., but does not
do anything that cannot be undone, or anything that makes the reopen
operation "live".
* _commit()
(if applicable) switches over to the new fd, and makes the
reopen operation complete (and cleans up anything that needs to be cleaned
up from _prepare()).
* _abort()
cleanly undoes anything done in _prepare() - called if there is an error in
_prepare(), or if the reopen operation is part of a larger transaction of
multiple reopens, one of which failed.
It may be useful to look at how this is handled in the gluster driver, and
see if you can do something similar here in ceph. For gluster, we create a
new gluster connection in _prepare(), and in _commit() we close the old fd
and switch the image driver's instance state over to the new one.
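The three phases above can be modelled with a small sketch. The struct and
function names are invented for illustration and do not match QEMU's actual
`BDRVReopenState`; the point is only the transactional shape: prepare stages
everything reversible, commit switches over, abort undoes.

```c
#include <stdbool.h>

/* Minimal model of the prepare/commit/abort shape: old_fd is the resource
 * in use, new_fd is whatever prepare has staged (0 if nothing). */
struct reopen_state {
    int old_fd;
    int new_fd;
};

/* prepare: acquire the new resource, but change nothing visible yet. */
static int reopen_prepare(struct reopen_state *st, bool fail)
{
    if (fail) {
        return -1;
    }
    st->new_fd = st->old_fd + 1;  /* pretend we opened a new connection */
    return 0;
}

/* commit: switch over and clean up what prepare staged
 * (this is where the old fd would really be closed). */
static void reopen_commit(struct reopen_state *st)
{
    st->old_fd = st->new_fd;
    st->new_fd = 0;
}

/* abort: drop the staged resource, leaving the original state untouched. */
static void reopen_abort(struct reopen_state *st)
{
    st->new_fd = 0;
}
```

After a failed prepare plus abort, the state is byte-for-byte what it was
before, which is what lets a larger transaction of multiple reopens roll
back cleanly when any one member fails.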
> -- >8 --
> Subject: [PATCH] block/rbd: add .bdrv_reopen_prepare() stub
>
> Add support for reopen() by adding the .bdrv_reopen_prepare() stub
>
> Signed-off-by: Sebastian Färber <sfaerber82@gmail.com>
> ---
> block/rbd.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 5bc5b32..8ecf096 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -577,6 +577,19 @@ failed_opts:
>      return r;
>  }
>
> +/* Note that this will not re-establish a connection with the Ceph cluster
> +   - it is effectively a NOP. */
> +static int qemu_rbd_reopen_prepare(BDRVReopenState *state,
> +                                   BlockReopenQueue *queue, Error **errp)
> +{
> +    if (state->flags & BDRV_O_NOCACHE &&
> +        ((state->bs->open_flags & BDRV_O_NOCACHE) == 0)) {
> +        error_setg(errp, "Cannot turn off rbd_cache during reopen");
> +        return -EINVAL;
> +    }
> +    return 0;
> +}
> +
>  static void qemu_rbd_close(BlockDriverState *bs)
>  {
>      BDRVRBDState *s = bs->opaque;
> @@ -976,6 +989,7 @@ static BlockDriver bdrv_rbd = {
>      .instance_size       = sizeof(BDRVRBDState),
>      .bdrv_needs_filename = true,
>      .bdrv_file_open      = qemu_rbd_open,
> +    .bdrv_reopen_prepare = qemu_rbd_reopen_prepare,
>      .bdrv_close          = qemu_rbd_close,
>      .bdrv_create         = qemu_rbd_create,
>      .bdrv_has_zero_init  = bdrv_has_zero_init_1,
> --
> 1.8.3.1
>