From: Stefan Hajnoczi <stefanha@gmail.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Supriya Kannery <supriyak@linux.vnet.ibm.com>,
Anthony Liguori <aliguori@us.ibm.com>,
qemu-devel <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] Safely reopening image files by stashing fds
Date: Tue, 9 Aug 2011 13:00:56 +0100
Message-ID: <20110809120014.GA11165@stefanha-thinkpad.localdomain>
In-Reply-To: <4E411C61.6050000@redhat.com>

On Tue, Aug 09, 2011 at 01:39:13PM +0200, Kevin Wolf wrote:
> On 09.08.2011 12:56, Stefan Hajnoczi wrote:
> > On Tue, Aug 9, 2011 at 11:50 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >> On Tue, Aug 9, 2011 at 11:35 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> >>> On 09.08.2011 12:25, Stefan Hajnoczi wrote:
> >>>> On Mon, Aug 8, 2011 at 4:16 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> >>>>> On 08.08.2011 16:49, Stefan Hajnoczi wrote:
> >>>>>> On Fri, Aug 5, 2011 at 10:48 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> >>>>>>> On 05.08.2011 11:29, Stefan Hajnoczi wrote:
> >>>>>>>> On Fri, Aug 5, 2011 at 10:07 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> >>>>>>>>> On 05.08.2011 10:40, Stefan Hajnoczi wrote:
> >>>>>>>>>> We've discussed safe methods for reopening image files (e.g. useful for
> >>>>>>>>>> changing the hostcache parameter). The problem is that closing the file first
> >>>>>>>>>> and then opening it again exposes us to the error case where the open fails.
> >>>>>>>>>> At that point we cannot get to the file anymore and our options are to
> >>>>>>>>>> terminate QEMU, pause the VM, or offline the block device.
> >>>>>>>>>>
> >>>>>>>>>> This window of vulnerability can be eliminated by keeping the file descriptor
> >>>>>>>>>> around and falling back to it should the open fail.
> >>>>>>>>>>
> >>>>>>>>>> The challenge for the file descriptor approach is that image formats, like
> >>>>>>>>>> VMDK, can span multiple files. Therefore the solution is not as simple as
> >>>>>>>>>> stashing a single file descriptor and reopening from it.
> >>>>>>>>>
> >>>>>>>>> So far I agree. The rest I believe is wrong because you can't assume
> >>>>>>>>> that every backend uses file descriptors. The qemu block layer is based
> >>>>>>>>> on BlockDriverStates, not fds. They are a concept that should be hidden
> >>>>>>>>> in raw-posix.
> >>>>>>>>>
> >>>>>>>>> I think something like this could do:
> >>>>>>>>>
> >>>>>>>>> struct BDRVReopenState {
> >>>>>>>>>     BlockDriverState *bs;
> >>>>>>>>>     /* can be extended by block drivers */
> >>>>>>>>> };
> >>>>>>>>>
> >>>>>>>>> .bdrv_reopen(BlockDriverState *bs, BDRVReopenState **reopen_state,
> >>>>>>>>>              int flags);
> >>>>>>>>> .bdrv_reopen_commit(BDRVReopenState *reopen_state);
> >>>>>>>>> .bdrv_reopen_abort(BDRVReopenState *reopen_state);
> >>>>>>>>>
> >>>>>>>>> raw-posix would store the old file descriptor in its reopen_state. On
> >>>>>>>>> commit, it closes the old descriptor; on abort, it reverts to the old
> >>>>>>>>> one and closes the newly opened one.
> >>>>>>>>>
> >>>>>>>>> Makes things a bit more complicated than the simple bdrv_reopen I had in
> >>>>>>>>> mind before, but it allows VMDK to get all-or-nothing semantics.
> >>>>>>>>
> >>>>>>>> Can you show how bdrv_reopen() would use these new interfaces? I'm
> >>>>>>>> not 100% clear on the idea.
> >>>>>>>
> >>>>>>> Well, you wouldn't only call bdrv_reopen, but also either
> >>>>>>> bdrv_reopen_commit/abort (for the top-level caller we can have a wrapper
> >>>>>>> function that does both, but that's syntactic sugar).
> >>>>>>>
> >>>>>>> For example we would have:
> >>>>>>>
> >>>>>>> int vmdk_reopen()
> >>>>>>
> >>>>>> .bdrv_reopen() is a confusing name for this operation because it does
> >>>>>> not reopen anything. bdrv_prepare_reopen() might be clearer.
> >>>>>
> >>>>> Makes sense.
> >>>>>
> >>>>>>
> >>>>>>> {
> >>>>>>>     *((VMDKReopenState**) rs) = malloc();
> >>>>>>>
> >>>>>>>     foreach (extent in s->extents) {
> >>>>>>>         ret = bdrv_reopen(extent->file, &extent->reopen_state);
> >>>>>>>         if (ret < 0)
> >>>>>>>             goto fail;
> >>>>>>>     }
> >>>>>>>     return 0;
> >>>>>>>
> >>>>>>> fail:
> >>>>>>>     foreach (extent in rs->already_reopened) {
> >>>>>>>         bdrv_reopen_abort(extent->reopen_state);
> >>>>>>>     }
> >>>>>>>     return ret;
> >>>>>>> }
> >>>>>>
> >>>>>>> void vmdk_reopen_commit()
> >>>>>>> {
> >>>>>>>     foreach (extent in s->extents) {
> >>>>>>>         bdrv_reopen_commit(extent->reopen_state);
> >>>>>>>     }
> >>>>>>>     free(rs);
> >>>>>>> }
> >>>>>>>
> >>>>>>> void vmdk_reopen_abort()
> >>>>>>> {
> >>>>>>>     foreach (extent in s->extents) {
> >>>>>>>         bdrv_reopen_abort(extent->reopen_state);
> >>>>>>>     }
> >>>>>>>     free(rs);
> >>>>>>> }
> >>>>>>
> >>>>>> Does the caller invoke bdrv_close(bs) after bdrv_prepare_reopen(bs,
> >>>>>> &rs)?
> >>>>>
> >>>>> No. Closing the old backend would be part of bdrv_reopen_commit.
> >>>>>
> >>>>> Do you have a use case where it would be helpful if the caller invoked
> >>>>> bdrv_close?
> >>>>
> >>>> When the caller does bdrv_close() two BlockDriverStates are never open
> >>>> for the same image file. I thought this was a property we wanted.
> >>>>
> >>>> Also, in the block_set_hostcache case we need to reopen without
> >>>> switching to a new BlockDriverState instance. That means the reopen
> >>>> needs to be in-place with respect to the BlockDriverState *bs pointer.
> >>>> We cannot create a new instance.
> >>>
> >>> Yes, but where do you even get the second BlockDriverState from?
> >>>
> >>> My prototype only returns an int, not a new BlockDriverState. Until
> >>> bdrv_reopen_commit() it would refer to the old file descriptors etc. and
> >>> after bdrv_reopen_commit() the very same BlockDriverState would refer to
> >>> the new ones.
> >>
> >> It seems I don't understand the API. I thought it was:
> >>
> >> do_block_set_hostcache()
> >> {
> >>     bdrv_prepare_reopen(bs, &rs);
> >>     ...open new file and check everything is okay...
> >>     if (ret == 0) {
> >>         bdrv_reopen_commit(rs);
> >>     } else {
> >>         bdrv_reopen_abort(rs);
> >>     }
> >>     return ret;
> >> }
> >>
> >> If the caller isn't opening the new file then what's the point of
> >> giving the caller control over prepare, commit, and abort?
> >
> > After sending the last email I realized what I was missing:
> >
> > You need the prepare, commit, and abort API in order to handle
> > multi-file block drivers like VMDK.
>
> Yes, this is the whole point of separating commit out. Does the proposal
> make sense to you now?

It depends on the details. Adding more functions that every BlockDriver
must implement is bad, so it's important that we only drop this
functionality into raw-posix.c, vmdk.c, and block.c as appropriate.

I liked the idea of doing a generic FDStash type that the monitor and
bdrv_reopen() can use. Blue's idea to hook at the qemu_open() level
takes that further.

But if we can do prepare, commit, and abort in a relatively simple way
then I'm for it.

Stefan
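
For illustration only, here is a minimal sketch of the prepare, commit, and
abort pattern discussed above, reduced to plain POSIX file descriptors. It is
not QEMU code: FileState, ReopenState, file_reopen_prepare(),
file_reopen_commit(), file_reopen_abort(), and reopen_all() are hypothetical
names invented for this example, and a real block driver would carry more
state. The point it tries to show is that the old descriptor stays usable
until every file has been reopened successfully.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for a file-backed driver state (not QEMU's struct). */
typedef struct {
    const char *path;
    int fd;                 /* descriptor currently in use */
} FileState;

/* Hypothetical per-file reopen state, filled in by prepare. */
typedef struct {
    FileState *fs;
    int new_fd;             /* descriptor opened with the new flags */
} ReopenState;

/* Prepare: open a second descriptor; the old one is left untouched. */
static int file_reopen_prepare(FileState *fs, ReopenState *rs, int flags)
{
    int fd = open(fs->path, flags);
    if (fd < 0) {
        return -errno;
    }
    rs->fs = fs;
    rs->new_fd = fd;
    return 0;
}

/* Commit: drop the old descriptor and switch to the new one. */
static void file_reopen_commit(ReopenState *rs)
{
    assert(rs->new_fd >= 0);
    close(rs->fs->fd);
    rs->fs->fd = rs->new_fd;
}

/* Abort: discard the new descriptor; the old one remains in use. */
static void file_reopen_abort(ReopenState *rs)
{
    close(rs->new_fd);
}

/*
 * All-or-nothing reopen of n files, as a multi-file format would need:
 * prepare every file first, and commit only if all prepares succeeded.
 */
static int reopen_all(FileState *files, ReopenState *states, int n, int flags)
{
    int i, ret = 0;

    for (i = 0; i < n; i++) {
        ret = file_reopen_prepare(&files[i], &states[i], flags);
        if (ret < 0) {
            break;
        }
    }
    if (ret < 0) {
        /* Undo only the files that were already prepared. */
        while (--i >= 0) {
            file_reopen_abort(&states[i]);
        }
        return ret;
    }
    for (i = 0; i < n; i++) {
        file_reopen_commit(&states[i]);
    }
    return 0;
}
```

In this sketch a failed prepare on any file leaves every FileState pointing at
its original descriptor, which is the property the thread is after for
multi-extent images.
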