From: Alexander Yarygin
References: <1431530311-21647-1-git-send-email-yarygin@linux.vnet.ibm.com>
 <55536C6B.4040400@redhat.com>
 <87vbfw77xb.fsf@linux.vnet.ibm.com>
 <20150514022542.GC862@ad.nay.redhat.com>
Date: Thu, 14 May 2015 13:57:32 +0300
In-Reply-To: <20150514022542.GC862@ad.nay.redhat.com> (Fam Zheng's message of
 "Thu, 14 May 2015 10:25:42 +0800")
Message-ID: <878ucrctpf.fsf@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [PATCH] block: Let bdrv_drain_all() to call aio_poll() for each AioContext
To: Fam Zheng
Cc: Kevin Wolf, qemu-block@nongnu.org, qemu-devel@nongnu.org,
 Christian Borntraeger, Stefan Hajnoczi, Cornelia Huck, Paolo Bonzini

Fam Zheng writes:

> On Wed, 05/13 19:34, Alexander Yarygin wrote:
>> Paolo Bonzini writes:
>>
>> > On 13/05/2015 17:18, Alexander Yarygin wrote:
>> >> After the commit 9b536adc ("block: acquire AioContext in
>> >> bdrv_drain_all()") the aio_poll() function gets called for every
>> >> BlockDriverState, on the assumption that every device may have its
>> >> own AioContext. The bdrv_drain_all() function is called in each
>> >> virtio_reset() call,
>> >
>> > ... which should actually call bdrv_drain(). Can you fix that?
>> >
>>
>> I thought about it, but couldn't come to the conclusion that it's safe.
>> The comment above bdrv_drain_all() states "... it is not possible to
>> have a function to drain a single device's I/O queue.",
>
> I think that comment is stale - it predates the introduction of per-BDS
> request tracking and bdrv_drain.
>

It says "completion of an asynchronous I/O operation can trigger any
number of other I/O operations on other devices". If this is no longer
the case, then I agree :).

But I think it doesn't exclude this patch anyway: bdrv_drain_all() is
called in other places as well, e.g. in do_vm_stop().

>> Besides that, what if we have several virtual disks that share a host
>> file?
>
> I'm not sure what you mean; bdrv_drain works on a BDS, and each virtual
> disk has one.
>
>> Or am I wrong and it's ok to do?
>>
>> >> which in turn is called for every virtio-blk
>> >> device on initialization, so we get aio_poll() called
>> >> 'length(device_list)^2' times.
>> >>
>> >> If we have thousands of disks attached, there are a lot of
>> >> BlockDriverStates but only a few AioContexts, leading to tons of
>> >> unnecessary aio_poll() calls. For example, startup with 1000 disks
>> >> takes over 13 minutes.
>> >>
>> >> This patch changes the bdrv_drain_all() function, allowing it to find
>> >> shared AioContexts and to call aio_poll() only for unique ones. This
>> >> results in much better startup times, e.g. 1000 disks come up within
>> >> 5 seconds.
>> >
>> > I'm not sure this patch is correct. You may have to call aio_poll
>> > multiple times before a BlockDriverState is drained.
>> >
>> > Paolo
>> >
>>
>> Ah, right. We need a second loop, something like this:
>>
>> @@ -2030,20 +2033,33 @@ void bdrv_drain(BlockDriverState *bs)
>>  void bdrv_drain_all(void)
>>  {
>>      /* Always run first iteration so any pending completion BHs run */
>> -    bool busy = true;
>> +    bool busy = true, pending = false;
>>      BlockDriverState *bs;
>> +    GList *aio_ctxs = NULL, *ctx;
>> +    AioContext *aio_context;
>>
>>      while (busy) {
>>          busy = false;
>>
>>          QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
>> -            AioContext *aio_context = bdrv_get_aio_context(bs);
>> +            aio_context = bdrv_get_aio_context(bs);
>>
>>              aio_context_acquire(aio_context);
>>              busy |= bdrv_drain_one(bs);
>>              aio_context_release(aio_context);
>> +            if (!aio_ctxs || !g_list_find(aio_ctxs, aio_context))
>> +                aio_ctxs = g_list_append(aio_ctxs, aio_context);
>
> Braces are required even for a single-line if. Moreover, I don't
> understand this - aio_ctxs is a duplicate of bdrv_states.
>
> Fam
>

length(bdrv_states) == number of virtual disks
length(aio_ctxs)    == number of threads

We can have as many disks as we want, while the number of threads is
limited. In my case there were 1024 disks sharing one AioContext, which
gives an overhead of at least 1023 extra aio_poll() calls.

[.. skipped ..]
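
To make the "second loop" idea above concrete, here is a minimal sketch of
what a deduplicated bdrv_drain_all() could look like. It is not the posted
patch: the shape of the second loop, the termination condition, and the
g_list_free() cleanup are assumptions, and it presumes a bdrv_drain_one()
that only flushes the device's request queue and leaves the aio_poll() to
the caller (otherwise deduplicating contexts would gain nothing). The
helpers it reuses (bdrv_drain_one, bdrv_get_aio_context,
aio_context_acquire/release, aio_poll, QTAILQ_FOREACH over bdrv_states)
are the ones visible in the quoted diff.

/* Sketch only, not the posted patch.  Meant as the body of
 * bdrv_drain_all() in block.c, where bdrv_states and the helpers
 * below are in scope; assumptions are marked in the comments. */

#include <stdbool.h>
#include <glib.h>               /* GList, g_list_find/append/free */
#include "block/block_int.h"    /* BlockDriverState, bdrv_states (QEMU-internal) */
#include "block/aio.h"          /* AioContext, aio_poll() */

void bdrv_drain_all(void)
{
    /* Always run first iteration so any pending completion BHs run */
    bool busy = true;
    BlockDriverState *bs;
    GList *aio_ctxs = NULL, *ctx;

    while (busy) {
        busy = false;

        /* Loop 1: drain each device's queue and remember every distinct
         * AioContext.  Assumption: bdrv_drain_one() here only flushes the
         * device's requests and does not call aio_poll() itself. */
        QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
            AioContext *aio_context = bdrv_get_aio_context(bs);

            aio_context_acquire(aio_context);
            busy |= bdrv_drain_one(bs);
            aio_context_release(aio_context);

            if (!g_list_find(aio_ctxs, aio_context)) {
                aio_ctxs = g_list_append(aio_ctxs, aio_context);
            }
        }

        /* Loop 2: poll each unique AioContext once per iteration.  With
         * 1024 disks sharing one context this means 1 aio_poll() call
         * instead of 1024. */
        for (ctx = aio_ctxs; ctx; ctx = ctx->next) {
            AioContext *aio_context = ctx->data;

            aio_context_acquire(aio_context);
            busy |= aio_poll(aio_context, false);
            aio_context_release(aio_context);
        }
    }

    g_list_free(aio_ctxs);
}

The `pending` flag added in the quoted diff is not used here; the sketch
simply keeps iterating until neither draining a device nor polling any
context reports progress, which is one possible way to address Paolo's
point that a single aio_poll() pass may not be enough to drain a
BlockDriverState.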