From: Kevin Wolf <kwolf@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Supriya Kannery <supriyak@linux.vnet.ibm.com>,
	Anthony Liguori <aliguori@us.ibm.com>,
	qemu-devel <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] Safely reopening image files by stashing fds
Date: Mon, 08 Aug 2011 17:16:53 +0200
Message-ID: <4E3FFDE5.1020802@redhat.com>
In-Reply-To: <CAJSP0QVtJ0Nd03ajcaqEa+gnyM9b9+jfLhB91jcLhP-OsBKaTg@mail.gmail.com>

On 08.08.2011 16:49, Stefan Hajnoczi wrote:
> On Fri, Aug 5, 2011 at 10:48 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>> On 05.08.2011 11:29, Stefan Hajnoczi wrote:
>>> On Fri, Aug 5, 2011 at 10:07 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>>>> On 05.08.2011 10:40, Stefan Hajnoczi wrote:
>>>>> We've discussed safe methods for reopening image files (e.g. useful for
>>>>> changing the hostcache parameter).  The problem is that closing the file first
>>>>> and then opening it again exposes us to the error case where the open fails.
>>>>> At that point we cannot get to the file anymore and our options are to
>>>>> terminate QEMU, pause the VM, or offline the block device.
>>>>>
>>>>> This window of vulnerability can be eliminated by keeping the file descriptor
>>>>> around and falling back to it should the open fail.
>>>>>
>>>>> The challenge for the file descriptor approach is that image formats, like
>>>>> VMDK, can span multiple files.  Therefore the solution is not as simple as
>>>>> stashing a single file descriptor and reopening from it.
>>>>
>>>> So far I agree. The rest I believe is wrong because you can't assume
>>>> that every backend uses file descriptors. The qemu block layer is based
>>>> on BlockDriverStates, not fds. They are a concept that should be hidden
>>>> in raw-posix.
>>>>
>>>> I think something like this could do:
>>>>
>>>> struct BDRVReopenState {
>>>>    BlockDriverState *bs;
>>>>    /* can be extended by block drivers */
>>>> };
>>>>
>>>> .bdrv_reopen(BlockDriverState *bs, BDRVReopenState **reopen_state, int
>>>> flags);
>>>> .bdrv_reopen_commit(BDRVReopenState *reopen_state);
>>>> .bdrv_reopen_abort(BDRVReopenState *reopen_state);
>>>>
>>>> raw-posix would store the old file descriptor in its reopen_state. On
>>>> commit, it closes the old descriptors, on abort it reverts to the old
>>>> one and closes the newly opened one.
>>>>
>>>> Makes things a bit more complicated than the simple bdrv_reopen I had in
>>>> mind before, but it allows VMDK to get an all-or-nothing semantics.
>>>
>>> Can you show how bdrv_reopen() would use these new interfaces?  I'm
>>> not 100% clear on the idea.
>>
>> Well, you wouldn't only call bdrv_reopen, but also either
>> bdrv_reopen_commit/abort (for the top-level caller we can have a wrapper
>> function that does both, but that's syntactic sugar).
>>
>> For example we would have:
>>
>> int vmdk_reopen()
> 
> .bdrv_reopen() is a confusing name for this operation because it does
> not reopen anything.  bdrv_prepare_reopen() might be clearer.

Makes sense.

> 
>> {
>>    *((VMDKReopenState**) rs) = malloc();
>>
>>    foreach (extent in s->extents) {
>>        ret = bdrv_reopen(extent->file, &extent->reopen_state)
>>        if (ret < 0)
>>            goto fail;
>>    }
>>    return 0;
>>
>> fail:
>>    foreach (extent in rs->already_reopened) {
>>        bdrv_reopen_abort(extent->reopen_state);
>>    }
>>    return ret;
>> }
> 
>> void vmdk_reopen_commit()
>> {
>>    foreach (extent in s->extents) {
>>        bdrv_reopen_commit(extent->reopen_state);
>>    }
>>    free(rs);
>> }
>>
>> void vmdk_reopen_abort()
>> {
>>    foreach (extent in s->extents) {
>>        bdrv_reopen_abort(extent->reopen_state);
>>    }
>>    free(rs);
>> }
> 
> Does the caller invoke bdrv_close(bs) after bdrv_prepare_reopen(bs,
> &rs)? 

No. Closing the old backend would be part of bdrv_reopen_commit.

Do you have a use case where it would be helpful if the caller invoked
bdrv_close?

> There is more state than just the file descriptors and I'm not
> sure that that gets preserved unless we add code to stash away stuff.
> I'm basically hoping this interface does not require touching every
> BlockDriver.

If we only want to change flags like O_DIRECT or O_SYNC, I think format
drivers (except VMDK) can use a standard implementation that just
reopens bs->file.
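
As a rough illustration of the two-phase idea for the simple fd-backed
case, here is a minimal sketch. The ToyFile and ToyReopenState types and
the toy_* functions are hypothetical stand-ins invented for this example,
not QEMU's BlockDriverState or the actual raw-posix code; it only assumes
a POSIX system with open(2) and close(2).

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for an fd-backed image file (not a QEMU type). */
typedef struct ToyFile {
    const char *path;
    int fd;      /* descriptor currently in use */
    int flags;   /* open(2) flags of that descriptor */
} ToyFile;

/* Hypothetical per-reopen state: holds the new fd until commit/abort. */
typedef struct ToyReopenState {
    ToyFile *file;
    int new_fd;
    int new_flags;
} ToyReopenState;

/*
 * Phase 1: open a second descriptor with the requested flags.
 * The old descriptor is left untouched, so a failure here loses nothing.
 * On failure, *rs is not modified.
 */
int toy_prepare_reopen(ToyFile *f, ToyReopenState **rs, int flags)
{
    ToyReopenState *s = malloc(sizeof(*s));

    if (!s) {
        return -1;
    }
    s->new_fd = open(f->path, flags);
    if (s->new_fd < 0) {
        free(s);
        return -1;
    }
    s->file = f;
    s->new_flags = flags;
    *rs = s;
    return 0;
}

/* Phase 2a: switch to the new descriptor and close the old one. */
void toy_reopen_commit(ToyReopenState *rs)
{
    assert(rs != NULL);
    close(rs->file->fd);
    rs->file->fd = rs->new_fd;
    rs->file->flags = rs->new_flags;
    free(rs);
}

/* Phase 2b: drop the new descriptor and keep using the old one. */
void toy_reopen_abort(ToyReopenState *rs)
{
    assert(rs != NULL);
    close(rs->new_fd);
    free(rs);
}
```

A caller would run toy_prepare_reopen() first and then
toy_reopen_commit() on success; a parent that reopens several children
would call toy_reopen_abort() on those already prepared if a later one
fails. Either way the old descriptor stays usable until commit.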

If we wanted bdrv_reopen to also ensure that all caches are dropped and
so on, then I think we would need a specific implementation in every
driver, unless bdrv->bdrv_open/bdrv_close is good enough to emulate it.

Kevin
