From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:45318)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1X2Gbs-0001iY-Ax
	for qemu-devel@nongnu.org; Wed, 02 Jul 2014 05:13:55 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1X2Gbk-0001Hs-N5
	for qemu-devel@nongnu.org; Wed, 02 Jul 2014 05:13:48 -0400
Received: from mx1.redhat.com ([209.132.183.28]:44413)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1X2Gbk-0001Hl-F9
	for qemu-devel@nongnu.org; Wed, 02 Jul 2014 05:13:40 -0400
Message-ID: <53B3CD3C.4040506@redhat.com>
Date: Wed, 02 Jul 2014 11:13:32 +0200
From: Paolo Bonzini <pbonzini@redhat.com>
MIME-Version: 1.0
References: <CACVXFVNVwpcEUeBQr0UsW58B2CG3Qghj-2SSfx9GC1rrwbjw0A@mail.gmail.com>
	<20140627120129.GO12061@stefanha-thinkpad.muc.redhat.com>
	<CACVXFVPYd6Hs9k6x4pVGJpEzLgY2DAJbm02pgc-cOwQH6rhMzg@mail.gmail.com>
	<53ADE769.3060903@redhat.com>
	<CACVXFVMYQ62tREcPDs0rh834BuKtygYqLJic71cmuTYjFzgCJg@mail.gmail.com>
	<20140630080850.GB30969@stefanha-thinkpad.redhat.com>
	<CACVXFVMBxO=aMVnGLuFReB8RUA+qyfOkpNyhaiDaf+DNinRKDg@mail.gmail.com>
	<CAJSP0QX70kWC87vSCObe56GCj6NKHnVSH3s7PuGrTWZneaOSkA@mail.gmail.com>
	<CACVXFVMfD_i-T+R-SiQkcZseEOhBzqDLG2eu0+hZOQawnDa1mw@mail.gmail.com>
	<53B2E69A.1090707@redhat.com>
	<20140702085453.GI4660@stefanha-thinkpad.redhat.com>
In-Reply-To: <20140702085453.GI4660@stefanha-thinkpad.redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [regression] dataplane: throughout -40% by commit
	580b6b2aa2
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Fam Zheng <famz@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>, Ming Lei <tom.leiming@gmail.com>, qemu-devel <qemu-devel@nongnu.org>, Stefan Hajnoczi <stefanha@gmail.com>

On 02/07/2014 10:54, Stefan Hajnoczi wrote:
>> Both can be eliminated by introducing a fast path in bdrv_aio_{read,write}v,
>> that bypasses coroutines in the common case of no I/O throttling, no
>> copy-on-write, etc.
>
> I tried that in 2012 and couldn't measure an improvement above the noise
> threshold, although it was without dataplane.
>
> BTW, we cannot eliminate the BH because the block layer guarantees that
> callbacks are not invoked with reentrancy.  They are always invoked
> directly from the event loop through a BH.  This simplifies callers
> since they don't need to worry about callbacks happening while they are
> still in bdrv_aio_readv(), for example.
>
> Removing this guarantee (by making callers safe first) is orthogonal to
> coroutines.  But it's hard to do since it requires auditing a lot of
> code.

You could also audit the few implementations of bdrv_aio_readv/writev 
(including bdrv_aio_readv/writev_em) to guarantee that they do not 
invoke the callbacks directly.  The rule was there before the conversion 
to coroutines, so the implementations should be fine.  In fact, most of 
them just forward to another bdrv_aio_readv/writev, and the others go 
through an EventNotifier or bottom half.

Drivers that implement bdrv_co_readv/writev would not enjoy the fast 
path, and would keep using the BH.
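
To make the no-reentrancy rule concrete, here is a minimal sketch in plain C.  It is not QEMU code: SketchBH, sketch_aio_readv and sketch_loop_iterate are hypothetical stand-ins for the bottom half machinery.  The point is only that the submit function queues the completion and returns, and the callback runs later from the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the bottom-half machinery; not QEMU's API. */
typedef void CompletionFunc(void *opaque, int ret);

typedef struct SketchBH {
    CompletionFunc *cb;
    void *opaque;
    int ret;
    struct SketchBH *next;
} SketchBH;

static SketchBH *bh_head, *bh_tail;

/* Queue a completion; it runs only on a later event loop iteration. */
static void sketch_bh_schedule(SketchBH *bh)
{
    bh->next = NULL;
    if (bh_tail) {
        bh_tail->next = bh;
    } else {
        bh_head = bh;
    }
    bh_tail = bh;
}

/* One event loop iteration: run every pending completion, return the count. */
static int sketch_loop_iterate(void)
{
    int n = 0;

    while (bh_head) {
        SketchBH *bh = bh_head;

        /* Unlink before calling, so the callback may reuse the BH. */
        bh_head = bh->next;
        if (!bh_head) {
            bh_tail = NULL;
        }
        bh->cb(bh->opaque, bh->ret);
        n++;
    }
    return n;
}

/*
 * A request whose result is already known at submission time.  Even so,
 * the callback is deferred instead of being invoked from here.
 */
static void sketch_aio_readv(SketchBH *bh, CompletionFunc *cb, void *opaque)
{
    bh->cb = cb;
    bh->opaque = opaque;
    bh->ret = 0;
    sketch_bh_schedule(bh);
}

/* Example callback: counts completions through the opaque pointer. */
static void count_completion(void *opaque, int ret)
{
    (void)ret;
    (*(int *)opaque)++;
}
```

A caller that submits with count_completion still sees its counter at zero when sketch_aio_readv returns.  The counter only moves once sketch_loop_iterate runs, which is the property callers currently rely on.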

> Another idea is to skip aio_notify() when we're sure the event loop
> isn't blocked in g_poll().  Doing this in a thread-safe and lockless way
> might be tricky though.

Yes, good idea for 2.2 but not now.
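
For reference, one possible shape of that idea, as a rough sketch with C11 atomics.  All names here (SketchLoop, about_to_block and so on) are made up for illustration, and wakeups_sent merely stands in for the real, expensive notification.  It relies on both sides doing a sequentially consistent store followed by a load, so at least one of them sees the other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical event loop state; none of these names exist in QEMU. */
typedef struct SketchLoop {
    atomic_bool about_to_block; /* set by the loop thread around its poll */
    atomic_bool work_pending;   /* set by any thread that schedules work */
    atomic_int wakeups_sent;    /* counts the expensive notifications */
} SketchLoop;

static void sketch_loop_init(SketchLoop *l)
{
    atomic_init(&l->about_to_block, false);
    atomic_init(&l->work_pending, false);
    atomic_init(&l->wakeups_sent, 0);
}

/* Producer side: publish the work, then notify only if the loop may block. */
static void sketch_schedule(SketchLoop *l)
{
    atomic_store(&l->work_pending, true);      /* seq_cst store ... */
    if (atomic_load(&l->about_to_block)) {     /* ... then seq_cst load */
        atomic_fetch_add(&l->wakeups_sent, 1); /* stands in for the wakeup */
    }
}

/*
 * Loop side: announce the intent to block, then re-check for work.
 * Returns true if blocking is safe, false if work arrived in the meantime.
 */
static bool sketch_prepare_to_block(SketchLoop *l)
{
    atomic_store(&l->about_to_block, true);
    if (atomic_load(&l->work_pending)) {
        atomic_store(&l->about_to_block, false);
        return false;
    }
    return true;
}

/* Loop side: called after the poll returns, before running handlers. */
static void sketch_after_poll(SketchLoop *l)
{
    atomic_store(&l->about_to_block, false);
}

/* Loop side: consume the pending-work flag; returns whether it was set. */
static bool sketch_take_work(SketchLoop *l)
{
    return atomic_exchange(&l->work_pending, false);
}
```

Work scheduled while the loop is busy costs no wakeup; the loop notices it in sketch_prepare_to_block and skips the poll.  A real implementation would also have to cope with nested loops and with timeouts, which this sketch ignores.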

> So to recap, three issues are being discussed here:
>
> 1. rt_sigprocmask due to getcontext() in qemu_coroutine_new().  We
> shouldn't be invoking qemu_coroutine_new() often.  The freelist is
> probably too small.

Yes, right now it's 64 for the whole of QEMU.  Originally it was 64 per 
thread (using TLS) but then TLS was dropped because of problems when 
coroutines were created in the VCPU thread and destroyed in the 
iothread.  Nowadays, the size should probably be dynamic, for example 
64 per iothread, to keep it simple.
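
As an illustration only, a per-thread freelist capped at 64 could look like the sketch below.  SketchCoroutine, sketch_coroutine_get/put and the slow_allocs counter are hypothetical names; the calloc() call stands in for the expensive qemu_coroutine_new() path.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_POOL_MAX 64  /* per-thread cap, as suggested above */

/* Hypothetical coroutine object; a real one would carry a stack etc. */
typedef struct SketchCoroutine {
    struct SketchCoroutine *pool_next;
} SketchCoroutine;

/* One freelist per thread, so no locking is needed. */
static _Thread_local SketchCoroutine *pool_head;
static _Thread_local unsigned pool_size;
static _Thread_local unsigned slow_allocs; /* counts slow-path creations */

/* Take a coroutine from this thread's pool, or create a new one. */
static SketchCoroutine *sketch_coroutine_get(void)
{
    SketchCoroutine *co = pool_head;

    if (co) {
        pool_head = co->pool_next;
        pool_size--;
        return co;
    }

    slow_allocs++; /* this is where the expensive creation would happen */
    co = calloc(1, sizeof(*co));
    if (!co) {
        abort();
    }
    return co;
}

/* Return a coroutine to the pool of the thread that releases it. */
static void sketch_coroutine_put(SketchCoroutine *co)
{
    if (pool_size < SKETCH_POOL_MAX) {
        co->pool_next = pool_head;
        pool_head = co;
        pool_size++;
    } else {
        free(co);
    }
}
```

Note that this sketch has exactly the weakness mentioned above: a coroutine created in one thread and released in another lands in the releasing thread's pool, so the creating thread keeps hitting the slow path.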

> 2. Coroutines might be slower than the non-coroutine aio codepath.  I
> don't think this is the case, they are very cheap and I was never able
> to measure a real difference.
>
> 3. The block layer requires a BH with aio_notify() for
> bdrv_aio_readv()/bdrv_aio_writev()/bdrv_aio_flush() callbacks, whether
> or not coroutines are used.  Eliminating the BH or skipping aio_notify()
> will take some work but could speed up QEMU as a whole.

I think (3) is not really true, and the BH is the actual reason why 
coroutines are slower.

Paolo