From: Kevin Wolf
Date: Tue, 16 Dec 2014 14:10:34 +0100
To: Paolo Bonzini
Cc: ming.lei@canonical.com, pl@kamp.de, qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: [Qemu-devel] [PATCH v2 1/5] linux-aio: queue requests that cannot be submitted
Message-ID: <20141216131034.GA3301@noname.str.redhat.com>
In-Reply-To: <54901764.6050307@redhat.com>
References: <1418305950-30924-1-git-send-email-pbonzini@redhat.com> <1418305950-30924-2-git-send-email-pbonzini@redhat.com> <20141216110727.GA27195@noname.str.redhat.com> <54901764.6050307@redhat.com>

On 16.12.2014 at 12:28, Paolo Bonzini wrote:
> On 16/12/2014 12:07, Kevin Wolf wrote:
> > On 11.12.2014 at 14:52, Paolo Bonzini wrote:
> >> Keep a queue of requests that were not submitted; pass them to
> >> the kernel when a completion is reported, unless the queue is
> >> plugged.
> >>
> >> The array of iocbs is rebuilt every time from scratch. This
> >> avoids keeping the iocbs array and list synchronized.
> >>
> >> Signed-off-by: Paolo Bonzini
> >
> > Just found out that in qemu-img bench, this patch seems to cost about
> > 5-8% for me.
>
> What execution? Queue depth=1?
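The queueing scheme in the quoted patch description (keep unsubmitted requests on a list, retry them when a completion is reported unless the queue is plugged, and rebuild the iocb pointer array from scratch on every attempt) can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not the code from the patch: every `sketch_*` type and function is hypothetical, and `sketch_io_submit()` is a stand-in for io_submit() that accepts at most `free_slots` requests, to model a kernel that cannot take the whole batch at once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: these are NOT libaio's struct iocb or QEMU's
 * request structures, just enough state to show the queueing logic. */
struct sketch_iocb {
    int fd;
    long long offset;
};

struct sketch_req {
    struct sketch_iocb iocb;
    struct sketch_req *next;      /* link in the pending queue */
};

#define SKETCH_MAX_BATCH 128

struct sketch_ioq {
    struct sketch_req *head;      /* oldest request not yet submitted */
    struct sketch_req *tail;
    int plugged;                  /* while plugged, submission is deferred */
    int free_slots;               /* hypothetical: iocbs the "kernel" accepts */
};

/* Stand-in for io_submit(): takes at most free_slots requests and returns
 * how many it accepted, modelling a partial submission. */
static int sketch_io_submit(struct sketch_ioq *q, int nr,
                            struct sketch_iocb *iocbs[])
{
    (void)iocbs;
    int n = nr < q->free_slots ? nr : q->free_slots;
    q->free_slots -= n;
    return n;
}

/* Rebuild the pointer array from the list on every call, so there is no
 * second structure to keep in sync, then drop the accepted prefix. */
static void sketch_ioq_submit(struct sketch_ioq *q)
{
    struct sketch_iocb *iocbs[SKETCH_MAX_BATCH];
    int len = 0;

    for (struct sketch_req *r = q->head; r && len < SKETCH_MAX_BATCH; r = r->next) {
        iocbs[len++] = &r->iocb;
    }
    if (len == 0) {
        return;
    }
    int done = sketch_io_submit(q, len, iocbs);
    assert(done >= 0 && done <= len);
    while (done-- > 0) {
        q->head = q->head->next;
    }
    if (q->head == NULL) {
        q->tail = NULL;
    }
}

/* New requests go to the tail; submit right away unless plugged. */
static void sketch_ioq_enqueue(struct sketch_ioq *q, struct sketch_req *r)
{
    r->next = NULL;
    if (q->tail) {
        q->tail->next = r;
    } else {
        q->head = r;
    }
    q->tail = r;
    if (!q->plugged) {
        sketch_ioq_submit(q);
    }
}

/* A completion frees a slot; retry the pending queue unless plugged. */
static void sketch_ioq_complete(struct sketch_ioq *q)
{
    q->free_slots++;
    if (!q->plugged) {
        sketch_ioq_submit(q);
    }
}

/* Number of requests still waiting on the queue. */
static int sketch_ioq_pending(const struct sketch_ioq *q)
{
    int n = 0;
    for (const struct sketch_req *r = q->head; r; r = r->next) {
        n++;
    }
    return n;
}
```

Rebuilding the array costs a walk over the pending list on each submission attempt; per the patch description, the point of that choice is to avoid keeping a separate iocb array synchronized with the list.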
My usual one:

$ ./qemu-img bench -t none -c 10000000 -n /dev/loop0
Sending 10000000 requests, 4096 bytes each, 64 in parallel

> For me it was noisy but I couldn't see a pessimization, and this patch
> should only add a handful of pointer accesses. Also, does perf point at
> a culprit, and does patch 5 restore some of the performance?
>
> Weird guess: TLB misses from accessing iocbs[0] on the stack (using a
> different coroutine stack every time)? Perf would report that as a
> large cost of this line:
>
>     iocbs[len++] = &aiocb->iocb;

No, I can't seem to read much from the perf results. The cost seems to
be spread fairly evenly across ioq_submit(), with the exception of the
instruction after the call to io_submit(). Not sure why the next
instruction always takes so much time (independent of what it is), but
it has been this way before.

I was surprised to see a "rep stos" scoring at 10% in laio_submit();
apparently io_prep_*() does a memset on the iocb. Not sure if that is
necessary, but again, it has always been this way.

Patch 5 doesn't restore the performance, which makes sense, as qemu-img
only sends single requests.

Kevin
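The "rep stos" Kevin sees is attributed to the io_prep_*() helpers zeroing the whole iocb before filling in the few fields a request needs. The sketch below illustrates only that pattern: `struct hypo_iocb`, `hypo_prep_pread()` and the field layout are hypothetical and do not reproduce libaio's actual definitions, and whether a compiler lowers the memset to `rep stos` depends on the compiler, its flags and the struct size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical iocb-like layout (64 bytes on typical LP64 targets);
 * NOT libaio's struct iocb, which has a different field set. */
struct hypo_iocb {
    uint64_t data;             /* caller cookie returned on completion */
    uint16_t opcode;
    int16_t  reqprio;
    uint32_t fildes;
    void    *buf;
    uint64_t nbytes;
    int64_t  offset;
    uint8_t  reserved[24];     /* fields this request does not use */
};

enum { HYPO_OP_PREAD = 0 };

/* Mirrors the prep-helper pattern: clear everything, then set the few
 * fields a read needs. The memset touches the whole struct on every
 * request, which is the kind of store loop that can surface as
 * "rep stos" in a profile. */
static void hypo_prep_pread(struct hypo_iocb *iocb, int fd, void *buf,
                            size_t count, long long offset)
{
    memset(iocb, 0, sizeof(*iocb));
    iocb->fildes = (uint32_t)fd;
    iocb->opcode = HYPO_OP_PREAD;
    iocb->reqprio = 0;
    iocb->buf = buf;
    iocb->nbytes = count;
    iocb->offset = offset;
}
```

Whether the full clear is needed depends on which iocb fields the kernel reads for a given opcode; the message leaves that question open.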