From: Kevin Wolf <kwolf@redhat.com>
To: Fam Zheng <famz@redhat.com>
Cc: qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>,
Jeff Cody <jcody@redhat.com>,
Markus Armbruster <armbru@redhat.com>,
Eric Blake <eblake@redhat.com>,
qemu-block@nongnu.org, rjones@redhat.com, stefanha@redhat.com,
pbonzini@redhat.com, John Snow <jsnow@redhat.com>,
berrange@redhat.com, den@openvz.org
Subject: Re: [Qemu-devel] [PATCH v6 08/22] raw-posix: Add image locking support
Date: Fri, 17 Jun 2016 15:07:27 +0200 [thread overview]
Message-ID: <20160617130727.GI5431@noname.redhat.com> (raw)
In-Reply-To: <1464943756-14143-9-git-send-email-famz@redhat.com>
Am 03.06.2016 um 10:49 hat Fam Zheng geschrieben:
> virtlockd in libvirt locks the first byte, we lock byte 1 to avoid
> the intervene.
>
> Both file and host device protocols are covered.
>
> The complication is with reopen. We have three different locking states,
> namely "unlocked", "shared locked" and "exclusively locked".
>
> There have three different states, "unlocked", "shared locked" and "exclusively
> locked".
This seems to be a corrupted copy of the previous sentence. :-)
> When we reopen, the new fd may need a new locking mode. Moving away to
> or from exclusive is a bit tricky because we cannot do it atomically. This
> patch solves it by dup() s->fd to s->lock_fd and avoid close(), so that there
> isn't a racy window where we drop the lock on one fd before acquiring the
> exclusive lock on the other.
>
> To make the logic easier to manage, and allow better reuse, the code is
> internally organized by state transition table (old_lock -> new_lock).
>
> Signed-off-by: Fam Zheng <famz@redhat.com>
I must admit that I don't fully understand yet why we can't change the
lock atomincally and how s->lock_fd helps. In any case, I think it
deserves comments in the code and not only in the commit message.
So I'm not giving a full review here, but I think I have one important
point to make at least.
> block/raw-posix.c | 285 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 285 insertions(+)
>
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index bb8669f..6347350 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -133,6 +133,7 @@ do { \
>
> typedef struct BDRVRawState {
> int fd;
> + int lock_fd;
> int type;
> int open_flags;
> size_t buf_align;
> @@ -153,6 +154,7 @@ typedef struct BDRVRawState {
>
> typedef struct BDRVRawReopenState {
> int fd;
> + int lock_fd;
> int open_flags;
> #ifdef CONFIG_LINUX_AIO
> int use_aio;
> @@ -397,6 +399,37 @@ static void raw_attach_aio_context(BlockDriverState *bs,
> #endif
> }
>
> +static int raw_lockf_fd(int fd, BdrvLockfCmd cmd)
> +{
> + assert(fd >= 0);
> + /* Locking byte 1 avoids interfereing with virtlockd. */
> + switch (cmd) {
> + case BDRV_LOCKF_EXCLUSIVE:
> + return qemu_lock_fd(fd, 1, 1, true);
> + case BDRV_LOCKF_SHARED:
> + return qemu_lock_fd(fd, 1, 1, false);
> + case BDRV_LOCKF_UNLOCK:
> + return qemu_unlock_fd(fd, 1, 1);
> + default:
> + abort();
> + }
> +}
> +
> +static int raw_lockf(BlockDriverState *bs, BdrvLockfCmd cmd)
> +{
> +
> + BDRVRawState *s = bs->opaque;
> +
> + if (s->lock_fd < 0) {
> + s->lock_fd = qemu_dup(s->fd);
> + if (s->lock_fd < 0) {
> + return s->lock_fd;
> + }
> + }
> +
> + return raw_lockf_fd(s->lock_fd, cmd);
> +}
> +
> #ifdef CONFIG_LINUX_AIO
> static int raw_set_aio(LinuxAioState **aio_ctx, int *use_aio, int bdrv_flags)
> {
> @@ -483,6 +516,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
> raw_parse_flags(bdrv_flags, &s->open_flags);
>
> s->fd = -1;
> + s->lock_fd = -1;
> fd = qemu_open(filename, s->open_flags, 0644);
> if (fd < 0) {
> ret = -errno;
> @@ -593,6 +627,241 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
> return ret;
> }
>
> +typedef enum {
> + RAW_REOPEN_PREPARE,
> + RAW_REOPEN_COMMIT,
> + RAW_REOPEN_ABORT
> +} RawReopenOperation;
> +
> +typedef int (*RawReopenFunc)(BDRVReopenState *state,
> + RawReopenOperation op,
> + BdrvLockfCmd old_lock,
> + BdrvLockfCmd new_lock,
> + Error **errp);
> +
> +static
> +int raw_reopen_identical(BDRVReopenState *state,
This is unusual formatting. I'm used to having everything on a single
line or "static int" on its own line, but breaking between "static" and
"int" feels odd.
> + RawReopenOperation op,
> + BdrvLockfCmd old_lock,
> + BdrvLockfCmd new_lock,
> + Error **errp)
> +{
> + assert(old_lock == new_lock);
> + return 0;
> +}
> +
> +static
> +int raw_reopen_from_unlock(BDRVReopenState *state,
> + RawReopenOperation op,
> + BdrvLockfCmd old_lock,
> + BdrvLockfCmd new_lock,
> + Error **errp)
> +{
> + BDRVRawReopenState *raw_s = state->opaque;
> + int ret = 0;
> +
> + assert(old_lock != new_lock);
> + assert(old_lock == BDRV_LOCKF_UNLOCK);
> + switch (op) {
> + case RAW_REOPEN_PREPARE:
> + ret = raw_lockf_fd(raw_s->lock_fd, new_lock);
> + if (ret) {
> + error_setg_errno(errp, -ret, "Failed to lock new fd");
> + }
> + break;
> + case RAW_REOPEN_COMMIT:
> + case RAW_REOPEN_ABORT:
> + break;
> + }
> +
> + return ret;
> +}
> +
> +static
> +int raw_reopen_to_unlock(BDRVReopenState *state,
> + RawReopenOperation op,
> + BdrvLockfCmd old_lock,
> + BdrvLockfCmd new_lock,
> + Error **errp)
> +{
> + BDRVRawState *s = state->bs->opaque;
> + int ret = 0;
> +
> + assert(old_lock != new_lock);
> + assert(old_lock == BDRV_LOCKF_UNLOCK);
> + switch (op) {
> + case RAW_REOPEN_PREPARE:
> + break;
> + case RAW_REOPEN_COMMIT:
> + if (s->lock_fd >= 0) {
> + qemu_close(s->lock_fd);
> + s->lock_fd = -1;
> + }
> + break;
> + case RAW_REOPEN_ABORT:
> + break;
> + }
> +
> + return ret;
> +}
> +
> +static
> +int raw_reopen_upgrade(BDRVReopenState *state,
> + RawReopenOperation op,
> + BdrvLockfCmd old_lock,
> + BdrvLockfCmd new_lock,
> + Error **errp)
> +{
> + BDRVRawReopenState *raw_s = state->opaque;
> + BDRVRawState *s = state->bs->opaque;
> + int ret = 0, ret2;
> +
> + assert(old_lock == BDRV_LOCKF_SHARED);
> + assert(new_lock == BDRV_LOCKF_EXCLUSIVE);
> + switch (op) {
> + case RAW_REOPEN_PREPARE:
> + ret = raw_lockf_fd(raw_s->lock_fd, BDRV_LOCKF_SHARED);
> + if (ret) {
> + error_setg_errno(errp, -ret, "Failed to lock new fd (shared)");
> + break;
> + }
> + ret = raw_lockf_fd(s->lock_fd, BDRV_LOCKF_UNLOCK);
> + if (ret) {
> + error_setg_errno(errp, -ret, "Failed to unlock old fd");
> + goto restore;
> + }
> + ret = raw_lockf_fd(raw_s->lock_fd, BDRV_LOCKF_EXCLUSIVE);
> + if (ret) {
> + error_setg_errno(errp, -ret, "Failed to lock new fd (exclusive)");
> + goto restore;
> + }
> + break;
> + case RAW_REOPEN_COMMIT:
> + break;
> + case RAW_REOPEN_ABORT:
> + raw_lockf_fd(raw_s->lock_fd, BDRV_LOCKF_SHARED);
> + ret = raw_lockf_fd(s->lock_fd, BDRV_LOCKF_SHARED);
> + if (ret) {
> + error_report("Failed to restore lock on old fd");
> + }
> + break;
> + }
> +
> + return ret;
> +restore:
> + ret2 = raw_lockf_fd(s->lock_fd, BDRV_LOCKF_SHARED);
> + if (ret2) {
> + error_report("Failed to restore old lock");
> + }
> + return ret;
> +
> +}
That final empty line doesn't look intentional.
> +
> +static
> +int raw_reopen_downgrade(BDRVReopenState *state,
> + RawReopenOperation op,
> + BdrvLockfCmd old_lock,
> + BdrvLockfCmd new_lock,
> + Error **errp)
> +{
> + BDRVRawReopenState *raw_s = state->opaque;
> + BDRVRawState *s = state->bs->opaque;
> + int ret;
> +
> + assert(old_lock == BDRV_LOCKF_EXCLUSIVE);
> + assert(new_lock == BDRV_LOCKF_SHARED);
> + switch (op) {
> + case RAW_REOPEN_PREPARE:
> + break;
> + case RAW_REOPEN_COMMIT:
> + ret = raw_lockf_fd(s->lock_fd, BDRV_LOCKF_SHARED);
> + if (ret) {
> + error_report("Failed to downgrade old lock");
> + break;
> + }
> + ret = raw_lockf_fd(raw_s->lock_fd, BDRV_LOCKF_SHARED);
> + if (ret) {
> + error_report("Failed to lock new fd");
> + break;
> + }
> + break;
> + case RAW_REOPEN_ABORT:
> + break;
> + }
> +
> + return ret;
> +}
> +
> +static const struct RawReopenFuncRecord {
> + BdrvLockfCmd old_lock;
> + BdrvLockfCmd new_lock;
> + RawReopenFunc func;
> + bool need_lock_fd;
> +} reopen_functions[] = {
> + {BDRV_LOCKF_UNLOCK, BDRV_LOCKF_UNLOCK, raw_reopen_identical, false},
> + {BDRV_LOCKF_UNLOCK, BDRV_LOCKF_SHARED, raw_reopen_from_unlock, true},
> + {BDRV_LOCKF_UNLOCK, BDRV_LOCKF_EXCLUSIVE, raw_reopen_from_unlock, true},
> + {BDRV_LOCKF_SHARED, BDRV_LOCKF_UNLOCK, raw_reopen_to_unlock, false},
> + {BDRV_LOCKF_SHARED, BDRV_LOCKF_SHARED, raw_reopen_identical, false},
> + {BDRV_LOCKF_SHARED, BDRV_LOCKF_EXCLUSIVE, raw_reopen_upgrade, true},
> + {BDRV_LOCKF_EXCLUSIVE, BDRV_LOCKF_UNLOCK, raw_reopen_to_unlock, false},
> + {BDRV_LOCKF_EXCLUSIVE, BDRV_LOCKF_SHARED, raw_reopen_downgrade, true},
> + {BDRV_LOCKF_EXCLUSIVE, BDRV_LOCKF_EXCLUSIVE, raw_reopen_identical, false},
> +};
> +
> +static int raw_reopen_handle_lock(BDRVReopenState *state,
> + RawReopenOperation op,
> + Error **errp)
I think we have one big problem here: We don't know whether raw_s->fd is
already locked or not. If dup() and setting the new flags with fcntl()
succeeded, it is, but if we had to fall back on qemu_open(), it isn't.
This means that doing nothing in the raw_reopen_identical case isn't
right because reopening without intending to change anything about the
locking could end up unlocking the image.
Kevin
next prev parent reply other threads:[~2016-06-17 13:07 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-03 8:48 [Qemu-devel] [PATCH v6 00/22] block: Lock images when opening Fam Zheng
2016-06-03 8:48 ` [Qemu-devel] [PATCH v6 01/22] block: Add flag bits for image locking Fam Zheng
2016-06-17 9:05 ` Kevin Wolf
2016-06-03 8:48 ` [Qemu-devel] [PATCH v6 02/22] qapi: Add lock-mode in blockdev-add options Fam Zheng
2016-06-17 9:17 ` Kevin Wolf
2016-06-18 11:16 ` Fam Zheng
2016-06-20 13:24 ` Kevin Wolf
2016-06-03 8:48 ` [Qemu-devel] [PATCH v6 03/22] blockdev: Add and parse "lock-mode" option for image locking Fam Zheng
2016-06-17 9:23 ` Kevin Wolf
2016-07-05 13:37 ` [Qemu-devel] [Qemu-block] " Kevin Wolf
2016-07-08 2:56 ` Fam Zheng
2016-07-08 9:50 ` Kevin Wolf
2016-07-08 13:05 ` Max Reitz
2016-09-06 16:45 ` Kevin Wolf
2016-09-06 16:51 ` Daniel P. Berrange
2016-09-06 17:15 ` Kevin Wolf
2016-06-03 8:48 ` [Qemu-devel] [PATCH v6 04/22] block: Introduce image file locking Fam Zheng
2016-06-08 1:51 ` Jason Dillaman
2016-06-08 2:45 ` Fam Zheng
2016-06-17 11:34 ` Kevin Wolf
2016-06-22 7:23 ` Fam Zheng
2016-06-22 8:30 ` Kevin Wolf
2016-06-22 9:04 ` Fam Zheng
2016-06-03 8:48 ` [Qemu-devel] [PATCH v6 05/22] osdep: Add qemu_lock_fd and qemu_unlock_fd Fam Zheng
2016-06-17 12:12 ` Kevin Wolf
2016-06-22 7:34 ` Fam Zheng
2016-06-22 8:34 ` Kevin Wolf
2016-06-22 9:10 ` Fam Zheng
2016-06-03 8:49 ` [Qemu-devel] [PATCH v6 06/22] osdep: Introduce qemu_dup Fam Zheng
2016-06-17 12:32 ` Kevin Wolf
2016-06-17 13:08 ` [Qemu-devel] [Qemu-block] " Kevin Wolf
2016-06-22 7:37 ` Fam Zheng
2016-06-03 8:49 ` [Qemu-devel] [PATCH v6 07/22] raw-posix: Use qemu_dup Fam Zheng
2016-06-03 8:49 ` [Qemu-devel] [PATCH v6 08/22] raw-posix: Add image locking support Fam Zheng
2016-06-03 23:53 ` Fam Zheng
2016-06-17 13:07 ` Kevin Wolf [this message]
2016-06-22 8:27 ` Fam Zheng
2016-06-03 8:49 ` [Qemu-devel] [PATCH v6 09/22] qemu-io: Add "-L" option for BDRV_O_NO_LOCK Fam Zheng
2016-06-03 8:49 ` [Qemu-devel] [PATCH v6 10/22] qemu-img: Add "-L" option to sub commands Fam Zheng
2016-06-03 8:49 ` [Qemu-devel] [PATCH v6 11/22] qemu-img: Update documentation of "-L" option Fam Zheng
2016-06-03 8:49 ` [Qemu-devel] [PATCH v6 12/22] qemu-nbd: Add "--no-lock/-L" option Fam Zheng
2016-06-03 8:49 ` [Qemu-devel] [PATCH v6 13/22] block: Don't lock drive-backup target image in none mode Fam Zheng
2016-06-03 8:49 ` [Qemu-devel] [PATCH v6 14/22] mirror: Disable image locking on target backing chain Fam Zheng
2016-06-06 15:03 ` Max Reitz
2016-06-08 3:22 ` Fam Zheng
2016-06-03 8:49 ` [Qemu-devel] [PATCH v6 15/22] qemu-iotests: 046: Move version detection out from verify_io Fam Zheng
2016-06-03 8:49 ` [Qemu-devel] [PATCH v6 16/22] qemu-iotests: Wait for QEMU processes before checking image in 091 Fam Zheng
2016-06-03 8:49 ` [Qemu-devel] [PATCH v6 17/22] qemu-iotests: 030: Disable image locking when checking test image Fam Zheng
2016-06-03 8:49 ` [Qemu-devel] [PATCH v6 18/22] iotests: 087: Disable image locking in cases where file is shared Fam Zheng
2016-06-03 8:49 ` [Qemu-devel] [PATCH v6 19/22] iotests: Disable image locking in 085 Fam Zheng
2016-06-03 8:49 ` [Qemu-devel] [PATCH v6 20/22] tests: Use null-co:// instead of /dev/null Fam Zheng
2016-06-03 8:49 ` [Qemu-devel] [PATCH v6 21/22] block: Turn on image locking by default Fam Zheng
2016-06-03 8:49 ` [Qemu-devel] [PATCH v6 22/22] qemu-iotests: Add test case 153 for image locking Fam Zheng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160617130727.GI5431@noname.redhat.com \
--to=kwolf@redhat.com \
--cc=armbru@redhat.com \
--cc=berrange@redhat.com \
--cc=den@openvz.org \
--cc=eblake@redhat.com \
--cc=famz@redhat.com \
--cc=jcody@redhat.com \
--cc=jsnow@redhat.com \
--cc=mreitz@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=rjones@redhat.com \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.