From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:50855)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1Qqmsd-0003Cd-Ek
	for qemu-devel@nongnu.org; Tue, 09 Aug 2011 10:02:09 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1Qqmsa-0003U4-Fj
	for qemu-devel@nongnu.org; Tue, 09 Aug 2011 10:02:03 -0400
Received: from mx1.redhat.com ([209.132.183.28]:60408)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1Qqmsa-0003Rd-7u
	for qemu-devel@nongnu.org; Tue, 09 Aug 2011 10:02:00 -0400
Message-ID: <4E411C61.6050000@redhat.com>
Date: Tue, 09 Aug 2011 13:39:13 +0200
From: Kevin Wolf <kwolf@redhat.com>
MIME-Version: 1.0
References: <CAJSP0QV-3_BK61TYbLMGj+ktap-j1_8Prnjbh7MFYT90Tkdgkg@mail.gmail.com>
	<4E3BB2C3.4020607@redhat.com>
	<CAJSP0QV4YtB4Hi=ivOj1=MJYaEM7=Rwr_wDUseH8KYTc5OFpWw@mail.gmail.com>
	<4E3BBC64.9020005@redhat.com>
	<CAJSP0QVtJ0Nd03ajcaqEa+gnyM9b9+jfLhB91jcLhP-OsBKaTg@mail.gmail.com>
	<4E3FFDE5.1020802@redhat.com>
	<CAJSP0QW7og96jPGnJ7U_VaNnzTUNFuniP3tbezvQrM_54E9cdw@mail.gmail.com>
	<4E410D68.1060701@redhat.com>
	<CAJSP0QXfy4LyOh=MkaVigQuGoPmtEQuasGxfG2wsm6MUHWDpCw@mail.gmail.com>
	<CAJSP0QWzNRKV6gdDtxpkZO-M9KcHX88a3bt+yPD+6+LT=pVV2w@mail.gmail.com>
In-Reply-To: <CAJSP0QWzNRKV6gdDtxpkZO-M9KcHX88a3bt+yPD+6+LT=pVV2w@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Safely reopening image files by stashing fds
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Supriya Kannery <supriyak@linux.vnet.ibm.com>, Anthony Liguori <aliguori@us.ibm.com>, qemu-devel <qemu-devel@nongnu.org>

On 09.08.2011 12:56, Stefan Hajnoczi wrote:
> On Tue, Aug 9, 2011 at 11:50 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Tue, Aug 9, 2011 at 11:35 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> On 09.08.2011 12:25, Stefan Hajnoczi wrote:
>>>> On Mon, Aug 8, 2011 at 4:16 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>>>> On 08.08.2011 16:49, Stefan Hajnoczi wrote:
>>>>>> On Fri, Aug 5, 2011 at 10:48 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>>>>>>> On 05.08.2011 11:29, Stefan Hajnoczi wrote:
>>>>>>>> On Fri, Aug 5, 2011 at 10:07 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>>>>>>>>> On 05.08.2011 10:40, Stefan Hajnoczi wrote:
>>>>>>>>>> We've discussed safe methods for reopening image files (e.g. useful for
>>>>>>>>>> changing the hostcache parameter).  The problem is that closing the file first
>>>>>>>>>> and then opening it again exposes us to the error case where the open fails.
>>>>>>>>>> At that point we cannot get to the file anymore and our options are to
>>>>>>>>>> terminate QEMU, pause the VM, or offline the block device.
>>>>>>>>>>
>>>>>>>>>> This window of vulnerability can be eliminated by keeping the file descriptor
>>>>>>>>>> around and falling back to it should the open fail.
>>>>>>>>>>
>>>>>>>>>> The challenge for the file descriptor approach is that image formats, like
>>>>>>>>>> VMDK, can span multiple files.  Therefore the solution is not as simple as
>>>>>>>>>> stashing a single file descriptor and reopening from it.
>>>>>>>>>
>>>>>>>>> So far I agree. The rest I believe is wrong because you can't assume
>>>>>>>>> that every backend uses file descriptors. The qemu block layer is based
>>>>>>>>> on BlockDriverStates, not fds. They are a concept that should be hidden
>>>>>>>>> in raw-posix.
>>>>>>>>>
>>>>>>>>> I think something like this could do:
>>>>>>>>>
>>>>>>>>> struct BDRVReopenState {
>>>>>>>>>    BlockDriverState *bs;
>>>>>>>>>    /* can be extended by block drivers */
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> .bdrv_reopen(BlockDriverState *bs, BDRVReopenState **reopen_state, int
>>>>>>>>> flags);
>>>>>>>>> .bdrv_reopen_commit(BDRVReopenState *reopen_state);
>>>>>>>>> .bdrv_reopen_abort(BDRVReopenState *reopen_state);
>>>>>>>>>
>>>>>>>>> raw-posix would store the old file descriptor in its reopen_state. On
>>>>>>>>> commit, it closes the old descriptor; on abort, it reverts to the old
>>>>>>>>> one and closes the newly opened one.
>>>>>>>>>
>>>>>>>>> This makes things a bit more complicated than the simple bdrv_reopen I
>>>>>>>>> had in mind before, but it allows VMDK to get all-or-nothing semantics.
>>>>>>>>
>>>>>>>> Can you show how bdrv_reopen() would use these new interfaces?  I'm
>>>>>>>> not 100% clear on the idea.
>>>>>>>
>>>>>>> Well, you wouldn't only call bdrv_reopen, but also either
>>>>>>> bdrv_reopen_commit/abort (for the top-level caller we can have a wrapper
>>>>>>> function that does both, but that's syntactic sugar).
>>>>>>>
>>>>>>> For example we would have:
>>>>>>>
>>>>>>> int vmdk_reopen()
>>>>>>
>>>>>> .bdrv_reopen() is a confusing name for this operation because it does
>>>>>> not reopen anything.  bdrv_prepare_reopen() might be clearer.
>>>>>
>>>>> Makes sense.
>>>>>
>>>>>>
>>>>>>> {
>>>>>>>    *((VMDKReopenState**) rs) = malloc();
>>>>>>>
>>>>>>>    foreach (extent in s->extents) {
>>>>>>>        ret = bdrv_reopen(extent->file, &extent->reopen_state)
>>>>>>>        if (ret < 0)
>>>>>>>            goto fail;
>>>>>>>    }
>>>>>>>    return 0;
>>>>>>>
>>>>>>> fail:
>>>>>>>    foreach (extent in rs->already_reopened) {
>>>>>>>        bdrv_reopen_abort(extent->reopen_state);
>>>>>>>    }
>>>>>>>    return ret;
>>>>>>> }
>>>>>>
>>>>>>> void vmdk_reopen_commit()
>>>>>>> {
>>>>>>>    foreach (extent in s->extents) {
>>>>>>>        bdrv_reopen_commit(extent->reopen_state);
>>>>>>>    }
>>>>>>>    free(rs);
>>>>>>> }
>>>>>>>
>>>>>>> void vmdk_reopen_abort()
>>>>>>> {
>>>>>>>    foreach (extent in s->extents) {
>>>>>>>        bdrv_reopen_abort(extent->reopen_state);
>>>>>>>    }
>>>>>>>    free(rs);
>>>>>>> }
>>>>>>
>>>>>> Does the caller invoke bdrv_close(bs) after bdrv_prepare_reopen(bs,
>>>>>> &rs)?
>>>>>
>>>>> No. Closing the old backend would be part of bdrv_reopen_commit.
>>>>>
>>>>> Do you have a use case where it would be helpful if the caller invoked
>>>>> bdrv_close?
>>>>
>>>> When the caller does bdrv_close() two BlockDriverStates are never open
>>>> for the same image file.  I thought this was a property we wanted.
>>>>
>>>> Also, in the block_set_hostcache case we need to reopen without
>>>> switching to a new BlockDriverState instance.  That means the reopen
>>>> needs to be in-place with respect to the BlockDriverState *bs pointer.
>>>>  We cannot create a new instance.
>>>
>>> Yes, but where do you even get the second BlockDriverState from?
>>>
>>> My prototype only returns an int, not a new BlockDriverState. Until
>>> bdrv_reopen_commit() it would refer to the old file descriptors etc. and
>>> after bdrv_reopen_commit() the very same BlockDriverState would refer to
>>> the new ones.
>>
>> It seems I don't understand the API.  I thought it was:
>>
>> do_block_set_hostcache()
>> {
>>    bdrv_prepare_reopen(bs, &rs);
>>    ...open new file and check everything is okay...
>>    if (ret == 0) {
>>        bdrv_reopen_commit(rs);
>>    } else {
>>        bdrv_reopen_abort(rs);
>>    }
>>    return ret;
>> }
>>
>> If the caller isn't opening the new file then what's the point of
>> giving the caller control over prepare, commit, and abort?
> 
> After sending the last email I realized what I was missing:
> 
> You need the prepare, commit, and abort API in order to handle
> multi-file block drivers like VMDK.

Yes, this is the whole point of separating commit out. Does the proposal
make sense to you now?
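
In case it helps, here is a rough, self-contained sketch of the
all-or-nothing idea for a multi-file driver. Everything in it (Extent,
extent_reopen_prepare/commit/abort, image_reopen, the fail_prepare test
hook) is a made-up stand-in for illustration, not actual QEMU code, and
"flags" simply stands for whatever the reopen is supposed to change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one extent of a multi-file image. */
typedef struct Extent {
    int cur_flags;      /* flags currently in effect */
    int new_flags;      /* flags staged by prepare, unused until commit */
    bool prepared;      /* true between prepare and commit/abort */
    bool fail_prepare;  /* test hook: force prepare to fail */
} Extent;

/* Stage the new state. The old state stays fully usable. */
static int extent_reopen_prepare(Extent *e, int flags)
{
    if (e->fail_prepare) {
        return -1;
    }
    e->new_flags = flags;
    e->prepared = true;
    return 0;
}

/* Switch over to the staged state and drop the old one. */
static void extent_reopen_commit(Extent *e)
{
    assert(e->prepared);
    e->cur_flags = e->new_flags;
    e->prepared = false;
}

/* Throw away the staged state and keep the old one. */
static void extent_reopen_abort(Extent *e)
{
    assert(e->prepared);
    e->prepared = false;
}

/*
 * Top-level wrapper: prepare every extent first. If any prepare fails,
 * abort the extents that were already prepared and report the error.
 * Only when all of them succeeded do we commit.
 */
static int image_reopen(Extent *extents, size_t n, int flags)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (extent_reopen_prepare(&extents[i], flags) < 0) {
            while (i-- > 0) {
                extent_reopen_abort(&extents[i]);
            }
            return -1;
        }
    }
    for (i = 0; i < n; i++) {
        extent_reopen_commit(&extents[i]);
    }
    return 0;
}
```

The point is that nothing observable changes until every extent has
been prepared successfully, so a failure halfway through leaves the
image exactly as it was before.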

Kevin