From: Kevin Wolf
Date: Tue, 09 Aug 2011 13:39:13 +0200
Subject: Re: [Qemu-devel] Safely reopening image files by stashing fds
To: Stefan Hajnoczi
Cc: Supriya Kannery, Anthony Liguori, qemu-devel
Message-ID: <4E411C61.6050000@redhat.com>

On 09.08.2011 12:56, Stefan Hajnoczi wrote:
> On Tue, Aug 9, 2011 at 11:50 AM, Stefan Hajnoczi wrote:
>> On Tue, Aug 9, 2011 at 11:35 AM, Kevin Wolf wrote:
>>> On 09.08.2011 12:25, Stefan Hajnoczi wrote:
>>>> On Mon, Aug 8, 2011 at 4:16 PM, Kevin Wolf wrote:
>>>>> On 08.08.2011 16:49, Stefan Hajnoczi wrote:
>>>>>> On Fri, Aug 5, 2011 at 10:48 AM, Kevin Wolf wrote:
>>>>>>> On 05.08.2011 11:29, Stefan Hajnoczi wrote:
>>>>>>>> On Fri, Aug 5, 2011 at 10:07 AM, Kevin Wolf wrote:
>>>>>>>>> On 05.08.2011 10:40, Stefan Hajnoczi wrote:
>>>>>>>>>> We've discussed safe methods for reopening image files (e.g. useful for
>>>>>>>>>> changing the hostcache parameter). The problem is that closing the file first
>>>>>>>>>> and then opening it again exposes us to the error case where the open fails.
>>>>>>>>>> At that point we cannot get to the file anymore and our options are to
>>>>>>>>>> terminate QEMU, pause the VM, or offline the block device.
>>>>>>>>>>
>>>>>>>>>> This window of vulnerability can be eliminated by keeping the file descriptor
>>>>>>>>>> around and falling back to it should the open fail.
>>>>>>>>>>
>>>>>>>>>> The challenge for the file descriptor approach is that image formats, like
>>>>>>>>>> VMDK, can span multiple files. Therefore the solution is not as simple as
>>>>>>>>>> stashing a single file descriptor and reopening from it.
>>>>>>>>>
>>>>>>>>> So far I agree. The rest I believe is wrong because you can't assume
>>>>>>>>> that every backend uses file descriptors. The qemu block layer is based
>>>>>>>>> on BlockDriverStates, not fds. They are a concept that should be hidden
>>>>>>>>> in raw-posix.
>>>>>>>>>
>>>>>>>>> I think something like this could do:
>>>>>>>>>
>>>>>>>>> struct BDRVReopenState {
>>>>>>>>>     BlockDriverState *bs;
>>>>>>>>>     /* can be extended by block drivers */
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> .bdrv_reopen(BlockDriverState *bs, BDRVReopenState **reopen_state, int
>>>>>>>>> flags);
>>>>>>>>> .bdrv_reopen_commit(BDRVReopenState *reopen_state);
>>>>>>>>> .bdrv_reopen_abort(BDRVReopenState *reopen_state);
>>>>>>>>>
>>>>>>>>> raw-posix would store the old file descriptor in its reopen_state. On
>>>>>>>>> commit, it closes the old descriptors, on abort it reverts to the old
>>>>>>>>> one and closes the newly opened one.
>>>>>>>>>
>>>>>>>>> Makes things a bit more complicated than the simple bdrv_reopen I had in
>>>>>>>>> mind before, but it allows VMDK to get an all-or-nothing semantics.
>>>>>>>>
>>>>>>>> Can you show how bdrv_reopen() would use these new interfaces? I'm
>>>>>>>> not 100% clear on the idea.
>>>>>>>
>>>>>>> Well, you wouldn't only call bdrv_reopen, but also either
>>>>>>> bdrv_reopen_commit/abort (for the top-level caller we can have a wrapper
>>>>>>> function that does both, but that's syntactic sugar).
>>>>>>>
>>>>>>> For example we would have:
>>>>>>>
>>>>>>> int vmdk_reopen()
>>>>>>
>>>>>> .bdrv_reopen() is a confusing name for this operation because it does
>>>>>> not reopen anything. bdrv_prepare_reopen() might be clearer.
>>>>>
>>>>> Makes sense.
>>>>>
>>>>>>
>>>>>>> {
>>>>>>>     *((VMDKReopenState**) rs) = malloc();
>>>>>>>
>>>>>>>     foreach (extent in s->extents) {
>>>>>>>         ret = bdrv_reopen(extent->file, &extent->reopen_state)
>>>>>>>         if (ret < 0)
>>>>>>>             goto fail;
>>>>>>>     }
>>>>>>>     return 0;
>>>>>>>
>>>>>>> fail:
>>>>>>>     foreach (extent in rs->already_reopened) {
>>>>>>>         bdrv_reopen_abort(extent->reopen_state);
>>>>>>>     }
>>>>>>>     return ret;
>>>>>>> }
>>>>>>
>>>>>>> void vmdk_reopen_commit()
>>>>>>> {
>>>>>>>     foreach (extent in s->extents) {
>>>>>>>         bdrv_reopen_commit(extent->reopen_state);
>>>>>>>     }
>>>>>>>     free(rs);
>>>>>>> }
>>>>>>>
>>>>>>> void vmdk_reopen_abort()
>>>>>>> {
>>>>>>>     foreach (extent in s->extents) {
>>>>>>>         bdrv_reopen_abort(extent->reopen_state);
>>>>>>>     }
>>>>>>>     free(rs);
>>>>>>> }
>>>>>>
>>>>>> Does the caller invoke bdrv_close(bs) after bdrv_prepare_reopen(bs,
>>>>>> &rs)?
>>>>>
>>>>> No. Closing the old backend would be part of bdrv_reopen_commit.
>>>>>
>>>>> Do you have a use case where it would be helpful if the caller invoked
>>>>> bdrv_close?
>>>>
>>>> When the caller does bdrv_close() two BlockDriverStates are never open
>>>> for the same image file. I thought this was a property we wanted.
>>>>
>>>> Also, in the block_set_hostcache case we need to reopen without
>>>> switching to a new BlockDriverState instance. That means the reopen
>>>> needs to be in-place with respect to the BlockDriverState *bs pointer.
>>>> We cannot create a new instance.
>>>
>>> Yes, but where do you even get the second BlockDriverState from?
>>>
>>> My prototype only returns an int, not a new BlockDriverState. Until
>>> bdrv_reopen_commit() it would refer to the old file descriptors etc. and
>>> after bdrv_reopen_commit() the very same BlockDriverState would refer to
>>> the new ones.
>>
>> It seems I don't understand the API. I thought it was:
>>
>> do_block_set_hostcache()
>> {
>>     bdrv_prepare_reopen(bs, &rs);
>>     ...open new file and check everything is okay...
>>     if (ret == 0) {
>>         bdrv_reopen_commit(rs);
>>     } else {
>>         bdrv_reopen_abort(rs);
>>     }
>>     return ret;
>> }
>>
>> If the caller isn't opening the new file then what's the point of
>> giving the caller control over prepare, commit, and abort?
>
> After sending the last email I realized what I was missing:
>
> You need the prepare, commit, and abort API in order to handle
> multi-file block drivers like VMDK.

Yes, this is the whole point of separating commit out.

Does the proposal make sense to you now?

Kevin
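
For illustration, the prepare/commit/abort split discussed in this thread
can be sketched as a small self-contained C fragment. This is not QEMU code
and not the proposed API itself: the Extent type, the integer "handles",
fake_open() and all function names below are hypothetical stand-ins, chosen
only to show how a multi-file image could get all-or-nothing behaviour and
how the top-level wrapper reduces to prepare followed by commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one backing file of a multi-file image. */
typedef struct Extent {
    int cur_handle;  /* handle currently in use */
    int new_handle;  /* handle opened by prepare, -1 if none is pending */
    int fail_open;   /* test knob: make the simulated open fail */
} Extent;

static int next_handle = 100;

/* Simulated open; a real driver would open the file with the new flags. */
static int fake_open(const Extent *e)
{
    return e->fail_open ? -1 : next_handle++;
}

/* Prepare: open the new handle, but keep using the old one. */
static int extent_reopen_prepare(Extent *e)
{
    e->new_handle = fake_open(e);
    return e->new_handle < 0 ? -1 : 0;
}

/* Commit: switch to the new handle (a real driver would close the old). */
static void extent_reopen_commit(Extent *e)
{
    assert(e->new_handle >= 0);
    e->cur_handle = e->new_handle;
    e->new_handle = -1;
}

/* Abort: drop the new handle (a real driver would close it), keep the old. */
static void extent_reopen_abort(Extent *e)
{
    e->new_handle = -1;
}

/* Prepare every extent; on failure, abort those already prepared. */
static int image_reopen_prepare(Extent *ext, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (extent_reopen_prepare(&ext[i]) < 0) {
            while (i-- > 0) {
                extent_reopen_abort(&ext[i]);
            }
            return -1;
        }
    }
    return 0;
}

static void image_reopen_commit(Extent *ext, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        extent_reopen_commit(&ext[i]);
    }
}

/* Top-level wrapper: prepare everything, commit only if all succeeded. */
static int image_reopen(Extent *ext, size_t n)
{
    if (image_reopen_prepare(ext, n) < 0) {
        return -1;
    }
    image_reopen_commit(ext, n);
    return 0;
}
```

A caller would pass an array of Extent to image_reopen(); if any extent
fails to prepare, every cur_handle is left exactly as it was, which is the
property a plain close-then-open sequence cannot give.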