From: Anthony Liguori <anthony@codemonkey.ws>
To: "Richard W.M. Jones" <rjones@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH VERSION 3] Disk image exclusive and shared locks.
Date: Tue, 15 Dec 2009 12:45:13 -0600 [thread overview]
Message-ID: <4B27D939.5040005@codemonkey.ws> (raw)
In-Reply-To: <20091215180925.GM3406@amd.home.annexia.org>
Richard W.M. Jones wrote:
> Thanks for looking at this.
>
Thanks for following up :-)
> The use case is "cluster filesystem with an admin tool that must be
> run exclusively". Cluster nodes open the block device for write, but
> with a shared lock. The admin tool needs exclusive access (no nodes
> must be writing), so it tries to open the device with lock=exclusive.
> This only succeeds if the normal cluster nodes have backed off.
>
This seems like a reasonable uncommon scenario and my concern is that it
really hurts the interface. lock=on|off I think qualifies as "The
obvious use is the correct one" whereas I think
lock=exclusive|shared|none would qualify as "Read the implementation and
you'll get it right."
And that's mainly due to the number of corner cases that have to be
considered in order for exclusive to degrade into shared.
> However, the patch would be a bit simpler if we just had lock=on|off
> at the moment, and it wouldn't stop us from adding the shared case in
> future. (For my needs I don't care about the shared case).
>
Let's stick with lock=on|off. If anyone comes along and actually wants
shared|exclusive, we can revisit the thread :-)
>> I really dislike this as an interface. I think we need to make a
>> decision about whether we delay open until after migration has
>> completed. I think unless there's a really compelling argument against
>> it, this is probably what we should do.
>>
>
> I'm guessing this needs some quite major surgery to qemu so that block
> device opening is delayed in the case of migration. I can take a look
> at this.
>
I think it's actually easier than I expected it to be.
We really just need to delay lock acquisition until it's really needed
(read/write op). For probing, we can get away without having a lock
held since it's read only and we only use it to decide which bdrv to use.
So what I'd expect is that we would to acquire a lock until we actually
ran the vm. We would basically have a bdrv_lock_all() that could fail
if some lock couldn't be acquired. We would execute that right before
running a VM (which means it happens after incoming migration completes).
After we completed migration (and did an fsck), we would call
bdrv_unlock_all() to drop the lock. Before we did 'cont' again, we
would issue bdrv_lock_all().
As long as we do a bdrv_unlock_all() before writing the last byte of the
migration stream, it's not at all racy. We also don't have to close the
fd on the source which means that we maintain the same level of robustness.
>> As it stands, we cannot make lock=!none the default if it takes an extra
>> monitor command to allow for live migration. I think if we're going to
>> introduce this functionality, we probably should be enabling it by
>> default.
>>
>
> Xen has a similar feature, and it is enabled by default.
>
And I really, really dislike it :-) There are a lot of corner cases
with requiring the use of '!' to the point where we ended up just using
it for everything in our management tools. IIRC, they don't use
advisory locking and instead try to be clever with something similar to
lsof.
Regards,
Anthony Liguori
next prev parent reply other threads:[~2009-12-15 18:45 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-12-15 16:42 [Qemu-devel] [PATCH VERSION 3] Disk image exclusive and shared locks Richard W.M. Jones
2009-12-15 18:02 ` Anthony Liguori
2009-12-15 18:09 ` Richard W.M. Jones
2009-12-15 18:45 ` Anthony Liguori [this message]
2009-12-15 18:33 ` Jamie Lokier
2009-12-15 23:26 ` Jamie Lokier
2009-12-16 10:37 ` Kevin Wolf
2009-12-17 13:26 ` Jamie Lokier
2009-12-17 10:53 ` Christoph Hellwig
2009-12-17 11:06 ` Richard W.M. Jones
2009-12-17 15:38 ` Jamie Lokier
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4B27D939.5040005@codemonkey.ws \
--to=anthony@codemonkey.ws \
--cc=qemu-devel@nongnu.org \
--cc=rjones@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).