Message-ID: <54785067.60905@kamp.de>
Date: Fri, 28 Nov 2014 11:37:27 +0100
From: Peter Lieven
References: <1417084026-12307-1-git-send-email-pl@kamp.de> <1417084026-12307-4-git-send-email-pl@kamp.de> <547753F7.2030709@redhat.com> <54782EC3.10005@kamp.de> <54784E55.6060405@redhat.com>
In-Reply-To: <54784E55.6060405@redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH 3/3] qemu-coroutine: use a ring per thread for the pool
To: Paolo Bonzini, ming.lei@canonical.com, Kevin Wolf, Stefan Hajnoczi, "qemu-devel@nongnu.org", Markus Armbruster

On 28.11.2014 at 11:28, Paolo Bonzini wrote:
>
> On 28/11/2014 09:13, Peter Lieven wrote:
>> On 27.11.2014 at 17:40, Paolo Bonzini wrote:
>>> On 27/11/2014 11:27, Peter Lieven wrote:
>>>> +static __thread struct CoRoutinePool {
>>>> +    Coroutine *ptrs[POOL_MAX_SIZE];
>>>> +    unsigned int size;
>>>> +    unsigned int nextfree;
>>>> +} CoPool;
>>>>
>>> The per-thread ring unfortunately didn't work well last time it was
>>> tested.  Devices that do not use ioeventfd (not just the slow ones, even
>>> decently performing ones like ahci, nvme or megasas) will create the
>>> coroutine in the VCPU thread, and destroy it in the iothread.
>>> The result is that coroutines cannot be reused.
>>>
>>> Can you check if this is still the case?
>> I already tested at least for IDE and for ioeventfd=off. The coroutine
>> is created in the vCPU thread and destroyed in the I/O thread.
>>
>> I also have a more complicated version which sets up a per-thread coroutine
>> pool only for dataplane, avoiding the lock for dedicated iothreads.
>>
>> For those who want to take a look:
>>
>> https://github.com/plieven/qemu/commit/325bc4ef5c7039337fa785744b145e2bdbb7b62e
> Can you test it against the patch I just sent in Kevin's linux-aio
> coroutine thread?

Was already doing it ;-) At least with test-coroutine.c...

master:
Run operation 40000000 iterations 12.851414 s, 3112K operations/s, 321ns per coroutine

paolo:
Run operation 40000000 iterations 11.951720 s, 3346K operations/s, 298ns per coroutine

plieven/perf_master2:
Run operation 40000000 iterations 9.013785 s, 4437K operations/s, 225ns per coroutine

plieven/perf_master:
Run operation 40000000 iterations 11.072883 s, 3612K operations/s, 276ns per coroutine

However, perf_master and perf_master2 seem to have a regression regarding
nesting. @Kevin: Could that be the reason why they perform badly in some
scenarios?

Regarding the bypass that is being discussed: if it is not just a benchmark
thing but really necessary for some people's use cases, why not add a new aio
mode like "bypass" and use it only then? If the performance is really needed,
the user can trade features like I/O throttling, filters etc. for it.

Peter