From: Hanna Reitz <hreitz@redhat.com>
To: Raphael Pour <raphael.pour@hetzner.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
"open list:RBD" <qemu-block@nongnu.org>,
Peter Lieven <pl@kamp.de>,
"open list:All patches CC here" <qemu-devel@nongnu.org>,
Ilya Dryomov <idryomov@gmail.com>
Subject: Re: [PATCH] block/rbd: support driver-specific reopen
Date: Fri, 1 Jul 2022 11:41:18 +0200 [thread overview]
Message-ID: <21cbfa05-0e2c-8e69-a5ab-fac31f87531f@redhat.com> (raw)
In-Reply-To: <20220413122656.3070251-1-raphael.pour@hetzner.com>
On 13.04.22 14:26, Raphael Pour wrote:
> This patch completes the reopen functionality for an attached RBD where altered
> driver options can be passed to. This is necessary to move RBDs between ceph
> clusters without interrupting QEMU, where some ceph settings need to be adjusted.
>
> The reopen_prepare method early returns if no rbd-specific driver options are
> given to maintain compatible with the previous behavior by dropping all
> generic block layer options. Otherwise the reopen acts similar to qemu_rbd_open.
>
> The reopen_commit tears down the old state and replaces it with the new
> one.
>
> The reopen_abort drops an ongoing reopen.
>
> Signed-off-by: Raphael Pour <raphael.pour@hetzner.com>
> ---
> block/rbd.c | 206 ++++++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 201 insertions(+), 5 deletions(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 6caf35cbba..e7b45d1c50 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -1029,19 +1029,213 @@ out:
I think the comment above this point (“Since RBD is currently...”)
should either be dropped now or moved above the `if (new_s->snap &&
...)` condition.
> static int qemu_rbd_reopen_prepare(BDRVReopenState *state,
> BlockReopenQueue *queue, Error **errp)
> {
> - BDRVRBDState *s = state->bs->opaque;
> - int ret = 0;
> + BDRVRBDState *new_s = state->bs->opaque;
> + BlockdevOptionsRbd *opts = NULL;
> + const QDictEntry *e;
> + Error *local_err = NULL;
> + char *keypairs, *secretid;
> + rbd_image_info_t info;
> + int r = 0;
>
> - if (s->snap && state->flags & BDRV_O_RDWR) {
> + if (new_s->snap && state->flags & BDRV_O_RDWR) {
> error_setg(errp,
> "Cannot change node '%s' to r/w when using RBD snapshot",
> bdrv_get_device_or_node_name(state->bs));
Is this still the case? I understand next to nothing about RBD, but can
the user not make it R/W if they simultaneously decide to switch from
snapshot to not-snapshot?
(I.e. shouldn’t we just let the generic code below figure out whether
we’ll get an error with the whole new configuration?)
> - ret = -EINVAL;
> + r = -EINVAL;
If it is still relevant: Why not return the error immediately here?
If we don’t, it looks like a couple of bad things might happen below;
like `r` getting overwritten, or `errp` getting set twice (which would
cause an assertion failure).
> }
>
> - return ret;
> + /*
> + * Remove all keys from the generic layer which
> + * can't be converted by rbd
> + */
Does any other driver do this? Shouldn’t we leave them there so that
the generic layer can verify that they aren’t changed?
> + qdict_del(state->options, "driver");
> + qdict_del(state->options, "node-name");
> + qdict_del(state->options, "auto-read-only");
> + qdict_del(state->options, "discard");
> + qdict_del(state->options, "cache");
Because AFAIU this would mean that users could attempt to change e.g.
the @cache option, and wouldn’t receive an error back, even though there
is no support for changing it.
> +
> + /*
> + * To maintain the compatibility prior the rbd-reopen,
> + * where the generic layer can be altered without any
> + * rbd argument given,
What does “the generic layer can be altered” mean? As far as I
understand, it was only possible to change between read/write and
read-only access.
> we must early return if there
> + * aren't any rbd-specific options left.
> + */
> + if (qdict_size(state->options) == 0) {
> + return r;
> + }
> +
> + new_s = state->opaque = g_new0(BDRVReopenState, 1);
This seems like it’s only “new” from this point on, but before that, it
was the old state. I find it confusing that a variable named “new_s”
apparently stored the old state before this point, so if that were the
case, I’d use a different variable (e.g. the previously existing `s`)
for `state->bs->opaque`.
> +
> + keypairs = g_strdup(qdict_get_try_str(state->options, "=keyvalue-pairs"));
> + if (keypairs) {
> + qdict_del(state->options, "=keyvalue-pairs");
> + }
> +
> + secretid = g_strdup(qdict_get_try_str(state->options, "password-secret"));
> + if (secretid) {
> + qdict_del(state->options, "password-secret");
> + }
> +
> + r = qemu_rbd_convert_options(state->options, &opts, &local_err);
> + if (local_err) {
> + /*
> + * If keypairs are present, that means some options are present in
> + * the modern option format. Don't attempt to parse legacy option
> + * formats, as we won't support mixed usage.
> + */
> + if (keypairs) {
> + error_propagate(errp, local_err);
> + goto out;
> + }
> +
> + /*
> + * If the initial attempt to convert and process the options failed,
> + * we may be attempting to open an image file that has the rbd options
> + * specified in the older format consisting of all key/value pairs
> + * encoded in the filename. Go ahead and attempt to parse the
> + * filename, and see if we can pull out the required options.
> + */
> + r = qemu_rbd_attempt_legacy_options(state->options, &opts, &keypairs);
> + if (r < 0) {
> + /*
> + * Propagate the original error, not the legacy parsing fallback
> + * error, as the latter was just a best-effort attempt.
> + */
> + error_propagate(errp, local_err);
> + goto out;
> + }
> + /*
> + * Take care whenever deciding to actually deprecate; once this ability
> + * is removed, we will not be able to open any images with legacy-styled
> + * backing image strings.
> + */
> + warn_report("RBD options encoded in the filename as keyvalue pairs "
> + "is deprecated");
> + }
> +
> + /*
> + * Remove the processed options from the QDict (the visitor processes
> + * _all_ options in the QDict)
> + */
> + while ((e = qdict_first(state->options))) {
> + qdict_del(state->options, e->key);
> + }
OK, I see why you’d then want to remove all non-RBD options before this
point. Other block drivers seem to solve this by having an
X_runtime_opts QemuOptsList, which contains all driver-handled options,
so they can then use qemu_opts_absorb_qdict() to extract the options
they can handle. (Compare e.g. qcow2_update_options_prepare().)
> +
> + r = qemu_rbd_connect(&new_s->cluster, &new_s->io_ctx, opts,
> + !(state->flags & BDRV_O_NOCACHE), keypairs,
> + secretid, errp);
I assume that’s possible without causing issues while we still have the
old connection open? (I can’t test this, but I assume you did, so I’m
just asking back for confirmation :))
> + if (r < 0) {
> + goto out;
> + }
> +
> + new_s->snap = g_strdup(opts->snapshot);
> + new_s->image_name = g_strdup(opts->image);
> +
> + /* rbd_open is always r/w */
> + r = rbd_open(new_s->io_ctx, new_s->image_name, &new_s->image, new_s->snap);
> + if (r < 0) {
> + error_setg_errno(errp, -r, "error reading header from %s",
> + new_s->image_name);
> + goto failed_open;
> + }
> +
> + if (opts->has_encrypt) {
> +#ifdef LIBRBD_SUPPORTS_ENCRYPTION
> + r = qemu_rbd_encryption_load(new_s->image, opts->encrypt, errp);
> + if (r < 0) {
> + goto failed_post_open;
> + }
> +#else
> + r = -ENOTSUP;
> + error_setg(errp, "RBD library does not support image encryption");
> + goto failed_post_open;
> +#endif
> + }
> +
> + r = rbd_stat(new_s->image, &info, sizeof(info));
> + if (r < 0) {
> + error_setg_errno(errp, -r, "error getting image info from %s",
> + new_s->image_name);
> + goto failed_post_open;
> + }
> + new_s->image_size = info.size;
> + new_s->object_size = info.obj_size;
> +
> + /*
> + * If we are using an rbd snapshot, we must be r/o, otherwise
> + * leave as-is
> + */
> + if (new_s->snap != NULL) {
> + r = bdrv_apply_auto_read_only(state->bs, "rbd snapshots are read-only",
> + errp);
> + if (r < 0) {
> + goto failed_post_open;
> + }
> + }
> +
> +#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
> + state->bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK;
> +#endif
> +
> + /* When extending regular files, we get zeros from the OS */
> + state->bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
> +
> + r = 0;
> + goto out;
It seems to me like all of this code comes directly from
qemu_rbd_open(). I think it should therefore be put into a new function
that’s used by both qemu_rbd_open() and qemu_rbd_reopen_prepare().
> +
> +failed_post_open:
> + rbd_close(new_s->image);
> +failed_open:
> + rados_ioctx_destroy(new_s->io_ctx);
> + g_free(new_s->snap);
> + g_free(new_s->image_name);
> + rados_shutdown(new_s->cluster);
> +out:
> + qapi_free_BlockdevOptionsRbd(opts);
> + g_free(keypairs);
> + g_free(secretid);
> + return r;
> }
>
> +static void qemu_rbd_reopen_abort(BDRVReopenState *reopen_state)
> +{
> + BDRVRBDState *s = reopen_state->bs->opaque;
Should this not be `reopen_state->opaque`, i.e. the new state? I
would’ve thought in case of abort we need to leave the old state intact.
> +
> + if (s->io_ctx) {
> + rados_ioctx_destroy(s->io_ctx);
> + }
> +
> + if (s->cluster) {
> + rados_shutdown(s->cluster);
> + }
> +
> + g_free(s->snap);
> + g_free(reopen_state->opaque);
> + reopen_state->opaque = NULL;
These two lines look as I’d’ve expected them, but that makes the
preceding code more suspicious (i.e. we close the old state, then free
the new one).
> +}
> +
> +static void qemu_rbd_reopen_commit(BDRVReopenState *reopen_state)
> +{
> + BDRVRBDState *s = reopen_state->bs->opaque;
> + BDRVRBDState *new_s = reopen_state->opaque;
> +
> + rados_aio_flush(s->io_ctx);
> +
> + rbd_close(s->image);
> + rados_ioctx_destroy(s->io_ctx);
> + g_free(s->snap);
> + rados_shutdown(s->cluster);
> +
> + s->io_ctx = new_s->io_ctx;
> + s->cluster = new_s->cluster;
> + s->image = new_s->image;
> + s->snap = new_s->snap;
> +
> + g_free(reopen_state->opaque);
> + reopen_state->opaque = NULL;
> +}
This looks OK.
> +
> +
> static void qemu_rbd_close(BlockDriverState *bs)
> {
> BDRVRBDState *s = bs->opaque;
> @@ -1628,6 +1822,8 @@ static BlockDriver bdrv_rbd = {
> .bdrv_file_open = qemu_rbd_open,
> .bdrv_close = qemu_rbd_close,
> .bdrv_reopen_prepare = qemu_rbd_reopen_prepare,
> + .bdrv_reopen_commit = qemu_rbd_reopen_commit,
> + .bdrv_reopen_abort = qemu_rbd_reopen_abort,
> .bdrv_co_create = qemu_rbd_co_create,
> .bdrv_co_create_opts = qemu_rbd_co_create_opts,
> .bdrv_has_zero_init = bdrv_has_zero_init_1,
next prev parent reply other threads:[~2022-07-01 9:57 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-13 12:26 [PATCH] block/rbd: support driver-specific reopen Raphael Pour
2022-06-16 9:33 ` Raphael Pour
2022-06-17 9:14 ` Raphael Pour
2022-07-01 9:41 ` Hanna Reitz [this message]
2022-07-18 12:01 ` Raphael Pour
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=21cbfa05-0e2c-8e69-a5ab-fac31f87531f@redhat.com \
--to=hreitz@redhat.com \
--cc=idryomov@gmail.com \
--cc=kwolf@redhat.com \
--cc=pl@kamp.de \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=raphael.pour@hetzner.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).