Date: Mon, 02 Mar 2015 11:03:12 -0500
From: Max Reitz
To: Kevin Wolf
Cc: Fam Zheng, qemu-block@nongnu.org, qemu-devel@nongnu.org,
 Markus Armbruster, Stefan Hajnoczi, Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH v4 00/11] block: Rework bdrv_close_all()
Message-ID: <54F489C0.3030704@redhat.com>
In-Reply-To: <20150302155701.GD4443@noname.redhat.com>

On 2015-03-02 at 10:57, Kevin Wolf wrote:
> Am 02.03.2015 um 16:15 hat Max Reitz geschrieben:
>> On 2015-03-02 at 04:25, Kevin Wolf wrote:
>>> Am 27.02.2015 um 17:43 hat Max Reitz geschrieben:
>>>> Currently, bdrv_close_all() force-closes all BDSs with a BlockBackend,
>>>> which can lead to data corruption (see the iotest added in the final
>>>> patch of this series) and is most certainly very ugly.
>>>>
>>>> This series reworks bdrv_close_all() to instead eject the BDS trees
>>>> from all BlockBackends and then close the monitor-owned BDS trees,
>>>> which are the only BDSs without a BB. In effect, all BDSs are closed
>>>> automatically because their reference count drops to 0.
>>>>
>>>> The benefit over the approach taken in v1 and v2 is that in device
>>>> models we often cannot simply drop the reference to a BB because
>>>> there may be some user that we forgot about. By ejecting the BDS
>>>> trees from the BB, the BB itself becomes unusable, but in a clean way
>>>> (it will return errors when accessed, but nothing will crash). Also,
>>>> it is much simpler (no reference tracking necessary).
>>>>
>>>> The only disadvantage (that I can see) is that the BBs are leaked;
>>>> but this does not matter because the qemu process is about to exit
>>>> anyway.
>>> I haven't looked at the actual patches yet, but just from this
>>> description and the diffstat: We need to make sure that the refcount
>>> really drops to 0.
>> If you mean the BBs: I tried that in v2, and it doesn't seem to work.
>> I had my fun tracing where the handles to the BBs for device models
>> end up and who might have access to them, but when I say "fun" I mean
>> that ironically.
>>
>> If you mean the BDSs: See below (below below).
> Yes, I mean BDSes.
>
>>> That is, if there are NBD servers, block jobs, etc. that hold an
>>> additional reference, we must make sure to stop them. It doesn't look
>>> like this series takes care of this, does it?
>> Everyone using BBs is fine; for instance, NBD servers are stopped
>> (they register a notifier for when the BDS tree is ejected from a BB).
> Good point. Even if the NBD server were not stopped, it would still
> hold a reference only to the BB and not to the BDS, so the BDS would
> correctly be closed.
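For illustration, a minimal, self-contained C model of that behavior (a
sketch only; the names below are made up and are not the actual QEMU types
or functions): ejecting the medium notifies users such as the NBD server,
drops the BB's reference so the BDS is closed once its refcount reaches 0,
and leaves the BB returning errors instead of crashing.

/*
 * Toy model only: illustrates eject-on-shutdown, not QEMU's real API.
 */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct BDS {
    int refcnt;
    const char *name;
} BDS;

typedef struct BB BB;
typedef void (*EjectNotifier)(BB *blk);  /* e.g. the NBD server stopping itself */

struct BB {
    BDS *bs;                 /* NULL once the BDS tree has been ejected */
    EjectNotifier on_eject;  /* registered by users such as the NBD server */
};

static void bds_unref(BDS *bs)
{
    if (--bs->refcnt == 0) {
        printf("closing and freeing BDS '%s'\n", bs->name);
        free(bs);            /* stands in for bdrv_close() + bdrv_delete() */
    }
}

static void bb_eject(BB *blk)
{
    if (blk->bs) {
        if (blk->on_eject) {
            blk->on_eject(blk);  /* let users shut themselves down */
        }
        bds_unref(blk->bs);      /* drop the BB's reference to the BDS */
        blk->bs = NULL;          /* BB stays allocated (leaked), but unusable */
    }
}

static int bb_read(BB *blk)
{
    return blk->bs ? 0 : -1;     /* clean error instead of a crash */
}

static void nbd_stop(BB *blk)
{
    printf("NBD server on BB %p: stopping\n", (void *)blk);
}

int main(void)
{
    BDS *bs = malloc(sizeof(*bs));
    bs->refcnt = 1;              /* only the BB holds a reference */
    bs->name = "drive0";

    BB blk = { .bs = bs, .on_eject = nbd_stop };

    bb_eject(&blk);              /* what the reworked bdrv_close_all() does per BB */
    assert(bb_read(&blk) == -1); /* device model access now fails cleanly */
    return 0;
}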
>
>> So block jobs are not handled, for one because I still haven't gotten
>> around to making them work on BBs instead of BDSs (and I probably
>> never will, although they really should), and also because I missed
>> the bdrv_ref() in block_job_create().
>>
>> I guess that means manually cancelling all block jobs in
>> bdrv_close_all()...
> Even in the far future, will block jobs really always work on a BB?
> Even if they modify some node in the middle of the chain?

I think they should create an unnamed BB for that case. But if we start
to work on that, we'll have to create a BB for every block job, and
that means supporting multiple BBs per BDS (if a block job is started
on a root BDS).

> Anyway, for now, we probably need to cancel them, yes. Essentially you
> end up calling everything manually that had to register a notifier in
> the earlier versions.

Well, not absolutely everything, only everything that does not use a BB
(and as far as I'm aware, those are only the monitor (which can hold
references to BDSs without BBs) and block jobs).

>>> Hm... Perhaps we could even install an atexit handler that asserts
>>> that all BDSes are really gone in the end?
>> I had that for BBs in v1 and v2; it was just a function that was
>> called at the end of bdrv_close_all(). For that to work for BDSs,
>> however, we'd need a list of all BDSs and assert that it's empty. I
>> found that a bit too much.
> The list of all relevant BDSes should be the union of bdrv_states and
> bdrv_graph_states, and I think it doesn't have to be empty, but all of
> its entries have to be closed (bs->drv == NULL).

The idea of this series is that bdrv_close() should not be called
outside of bdrv_delete() (and bdrv_open(), if that failed), so whenever
a BDS is closed, it should be deleted.

> It's not 100% complete because we have these stupid QMP commands that
> address BDSes using filenames instead of node or BB names, but it's
> probably a useful subset to check.
>
>> A simpler but uglier way would be a counter that's incremented every
>> time a BDS is created and decremented every time one is deleted (or
>> incremented once per refcount increment and decremented once per
>> refcount decrement). We'd then assert that this counter is 0.
> Ugh. Well, it does the job, so holding my nose while applying should
> do...

OK.

Max
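For reference, a minimal sketch of that counter idea (illustrative names
only, not actual QEMU functions): bump a global counter whenever a BDS
reference is taken, drop it whenever one is released, and have an atexit
handler (or the end of bdrv_close_all()) assert that it is back at 0.

/* Toy sketch only; not QEMU code. */
#include <assert.h>
#include <stdlib.h>

static int bds_total_refs;

static void leak_check_ref(void)
{
    bds_total_refs++;            /* call wherever a BDS reference is taken */
}

static void leak_check_unref(void)
{
    assert(bds_total_refs > 0);
    bds_total_refs--;            /* call wherever a BDS reference is dropped */
}

static void leak_check_at_exit(void)
{
    /* by now, bdrv_close_all() must have released every BDS reference */
    assert(bds_total_refs == 0);
}

int main(void)
{
    atexit(leak_check_at_exit);

    leak_check_ref();            /* e.g. the monitor takes a reference */
    leak_check_ref();            /* e.g. a block job takes a reference */

    leak_check_unref();          /* block job cancelled during shutdown */
    leak_check_unref();          /* monitor reference dropped */

    return EXIT_SUCCESS;         /* the atexit handler then checks the count */
}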