Date: Mon, 02 Mar 2015 11:03:12 -0500
From: Max Reitz
To: Kevin Wolf
Cc: Fam Zheng, qemu-block@nongnu.org, qemu-devel@nongnu.org,
 Markus Armbruster, Stefan Hajnoczi, Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH v4 00/11] block: Rework bdrv_close_all()
Message-ID: <54F489C0.3030704@redhat.com>
In-Reply-To: <20150302155701.GD4443@noname.redhat.com>

On 2015-03-02 at 10:57, Kevin Wolf wrote:
> Am 02.03.2015 um 16:15 hat Max Reitz geschrieben:
>> On 2015-03-02 at 04:25, Kevin Wolf wrote:
>>> Am 27.02.2015 um 17:43 hat Max Reitz geschrieben:
>>>> Currently, bdrv_close_all() force-closes all BDSs with a BlockBackend,
>>>> which can lead to data corruption (see the iotest added in the final
>>>> patch of this series) and is most certainly very ugly.
>>>>
>>>> This series reworks bdrv_close_all() to instead eject the BDS trees
>>>> from all BlockBackends and then close the monitor-owned BDS trees,
>>>> which are the only BDSs without a BB. In effect, all BDSs are closed
>>>> automatically because their reference count drops to 0.
>>>>
>>>> The benefit over the approach taken in v1 and v2 is that in device
>>>> models we often cannot simply drop the reference to a BB because
>>>> there may be some user that we forgot about. By ejecting the BDS
>>>> trees from the BB, the BB itself becomes unusable, but in a clean way
>>>> (it will return errors when accessed, but nothing will crash). Also,
>>>> it is much simpler (no reference tracking necessary).
>>>>
>>>> The only disadvantage (that I can see) is that the BBs are leaked;
>>>> but this does not matter because the qemu process is about to exit
>>>> anyway.
>>> I haven't looked at the actual patches yet, but just from this
>>> description and the diffstat: We need to make sure that the refcount
>>> really drops to 0.
>> If you mean the BBs: I tried that in v2, and it doesn't seem to work.
>> I had my fun tracing where the handles to the BBs for device models
>> end up and who might have access to them, but when I say "fun" I mean
>> that ironically.
>>
>> If you mean the BDSs: See below (below below).
> Yes, I mean BDSes.
>
>>> That is, if there are NBD servers, block jobs, etc. that hold an
>>> additional reference, we must make sure to stop them. It doesn't look
>>> like this series takes care of this, does it?
>> Everyone using BBs is fine; for instance, NBD servers are stopped
>> (they register a notifier for when the BDS tree is ejected from a BB).
> Good point. Even if the NBD server were not stopped, it would still
> hold a reference only to the BB and not to the BDS, so the BDS would
> correctly be closed.
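For illustration, a minimal, self-contained C model of that behavior (a
sketch only; the names below are made up and are not the actual QEMU types
or functions): ejecting the medium notifies users such as the NBD server,
drops the BB's reference so the BDS is closed once its refcount reaches 0,
and leaves the BB returning errors instead of crashing.

/*
 * Toy model only: illustrates eject-on-shutdown, not QEMU's real API.
 */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct BDS {
    int refcnt;
    const char *name;
} BDS;

typedef struct BB BB;
typedef void (*EjectNotifier)(BB *blk);  /* e.g. the NBD server stopping itself */

struct BB {
    BDS *bs;                 /* NULL once the BDS tree has been ejected */
    EjectNotifier on_eject;  /* registered by users such as the NBD server */
};

static void bds_unref(BDS *bs)
{
    if (--bs->refcnt == 0) {
        printf("closing and freeing BDS '%s'\n", bs->name);
        free(bs);            /* stands in for bdrv_close() + bdrv_delete() */
    }
}

static void bb_eject(BB *blk)
{
    if (blk->bs) {
        if (blk->on_eject) {
            blk->on_eject(blk);  /* let users shut themselves down */
        }
        bds_unref(blk->bs);      /* drop the BB's reference to the BDS */
        blk->bs = NULL;          /* BB stays allocated (leaked), but unusable */
    }
}

static int bb_read(BB *blk)
{
    return blk->bs ? 0 : -1;     /* clean error instead of a crash */
}

static void nbd_stop(BB *blk)
{
    printf("NBD server on BB %p: stopping\n", (void *)blk);
}

int main(void)
{
    BDS *bs = malloc(sizeof(*bs));
    bs->refcnt = 1;              /* only the BB holds a reference */
    bs->name = "drive0";

    BB blk = { .bs = bs, .on_eject = nbd_stop };

    bb_eject(&blk);              /* what the reworked bdrv_close_all() does per BB */
    assert(bb_read(&blk) == -1); /* device model access now fails cleanly */
    return 0;
}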
>
>> So block jobs are not handled, for one because I still haven't gotten
>> around to making them work on BBs instead of BDSs (and I probably
>> never will, although they really should), and also because I missed
>> the bdrv_ref() in block_job_create().
>>
>> I guess that means manually cancelling all block jobs in
>> bdrv_close_all()...
> Even in the far future, will block jobs really always work on a BB?
> Even if they modify some node in the middle of the chain?

I think they should create an unnamed BB for that case. But if we start
to work on that, we'll have to create a BB for every block job, and
that means supporting multiple BBs per BDS (if a block job is started
on a root BDS).

> Anyway, for now, we probably need to cancel them, yes. Essentially you
> end up calling everything manually that had to register a notifier in
> the earlier versions.

Well, not absolutely everything, only everything that does not use a BB
(and as far as I'm aware, those are only the monitor (which can hold
references to BDSs without BBs) and block jobs).

>>> Hm... Perhaps we could even install an atexit handler that asserts
>>> that all BDSes are really gone in the end?
>> I had that for BBs in v1 and v2; it was just a function that was
>> called at the end of bdrv_close_all(). For that to work for BDSs,
>> however, we'd need a list of all BDSs and assert that it's empty. I
>> found that a bit too much.
> The list of all relevant BDSes should be the union of bdrv_states and
> bdrv_graph_states, and I think it doesn't have to be empty, but all of
> its entries have to be closed (bs->drv == NULL).

The idea of this series is that bdrv_close() should not be called
outside of bdrv_delete() (and bdrv_open(), if that failed), so whenever
a BDS is closed, it should be deleted.

> It's not 100% complete because we have these stupid QMP commands that
> address BDSes using filenames instead of node or BB names, but it's
> probably a useful subset to check.
>
>> A simpler but uglier way would be a counter that's incremented every
>> time a BDS is created and decremented every time one is deleted (or
>> incremented once per refcount increment and decremented once per
>> refcount decrement). We'd then assert that this counter is 0.
> Ugh. Well, it does the job, so holding my nose while applying should
> do...

OK.

Max
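For reference, a minimal sketch of that counter idea (illustrative names
only, not actual QEMU functions): bump a global counter whenever a BDS
reference is taken, drop it whenever one is released, and have an atexit
handler (or the end of bdrv_close_all()) assert that it is back at 0.

/* Toy sketch only; not QEMU code. */
#include <assert.h>
#include <stdlib.h>

static int bds_total_refs;

static void leak_check_ref(void)
{
    bds_total_refs++;            /* call wherever a BDS reference is taken */
}

static void leak_check_unref(void)
{
    assert(bds_total_refs > 0);
    bds_total_refs--;            /* call wherever a BDS reference is dropped */
}

static void leak_check_at_exit(void)
{
    /* by now, bdrv_close_all() must have released every BDS reference */
    assert(bds_total_refs == 0);
}

int main(void)
{
    atexit(leak_check_at_exit);

    leak_check_ref();            /* e.g. the monitor takes a reference */
    leak_check_ref();            /* e.g. a block job takes a reference */

    leak_check_unref();          /* block job cancelled during shutdown */
    leak_check_unref();          /* monitor reference dropped */

    return EXIT_SUCCESS;         /* the atexit handler then checks the count */
}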