From: Paolo Bonzini
Date: Tue, 16 Dec 2014 12:28:36 +0100
Subject: Re: [Qemu-devel] [PATCH v2 1/5] linux-aio: queue requests that cannot be submitted
To: Kevin Wolf
Cc: ming.lei@canonical.com, pl@kamp.de, qemu-devel@nongnu.org, stefanha@redhat.com
Message-ID: <54901764.6050307@redhat.com>
In-Reply-To: <20141216110727.GA27195@noname.str.redhat.com>
References: <1418305950-30924-1-git-send-email-pbonzini@redhat.com> <1418305950-30924-2-git-send-email-pbonzini@redhat.com> <20141216110727.GA27195@noname.str.redhat.com>

On 16/12/2014 12:07, Kevin Wolf wrote:
> On 11.12.2014 at 14:52, Paolo Bonzini wrote:
>> Keep a queue of requests that were not submitted; pass them to
>> the kernel when a completion is reported, unless the queue is
>> plugged.
>>
>> The array of iocbs is rebuilt every time from scratch. This
>> avoids keeping the iocbs array and list synchronized.
>>
>> Signed-off-by: Paolo Bonzini
>
> Just found out that in qemu-img bench, this patch seems to cost about
> 5-8% for me.

Which invocation? Queue depth = 1? For me the numbers were noisy, but I
couldn't see a pessimization, and this patch should only add a handful
of pointer accesses. Also, does perf point at a culprit, and does patch
5 restore some of the performance?

Wild guess: TLB misses from accessing iocbs[0] on the stack (a
different coroutine stack is used every time)? Perf would report that
as a large cost on this line:

    iocbs[len++] = &aiocb->iocb;

> An optimisation for the unplugged case would probably be easy, but that
> would be cheating, as the devices that we're really interested in always
> plug the queue (perhaps I should extend qemu-img bench to do that
> optionally, too).

If you want to do that, you also have to move the "refilling" of the
queue to a bottom half. If you refill from the completion routine, only
a single slot is ever empty, so plugging doesn't achieve anything.
(Illustrative sketches of both the queueing scheme and the bottom-half
refill are appended below.)

Paolo

> Anything clever that we can do about this? Or will we just have to live
> with the fact that sending a single request is now slower than it used
> to be before bdrv_plug?
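
For readers outside the thread, here is a minimal, self-contained
sketch of the scheme the patch describes: requests that cannot be
submitted sit on a queue, the array of iocb pointers is rebuilt from
scratch at every submission, and a completion triggers a refill unless
the queue is plugged. All names here (ioq_submit, ioq_enqueue,
MAX_QUEUED, the pending array) are invented for the example; this is a
sketch of the idea, not QEMU's linux-aio.c. Build with gcc and -laio.

    /* Illustration only: queue requests that io_submit() cannot take yet. */
    #include <libaio.h>
    #include <stdio.h>
    #include <string.h>

    #define MAX_QUEUED 128

    static io_context_t ctx;
    static struct iocb *pending[MAX_QUEUED];  /* not yet in the kernel */
    static int n_pending;
    static int plugged;                       /* when set, defer submission */

    /* Rebuild the iocb pointer array from scratch and submit it. */
    static void ioq_submit(void)
    {
        struct iocb *iocbs[MAX_QUEUED];
        int len = 0, i, ret;

        for (i = 0; i < n_pending; i++) {
            iocbs[len++] = pending[i];
        }
        if (len == 0 || plugged) {
            return;
        }
        ret = io_submit(ctx, len, iocbs);
        if (ret < 0) {
            ret = 0;  /* keep everything queued; real code would check -EAGAIN */
        }
        /* Drop what the kernel accepted, keep the rest on the queue. */
        memmove(pending, pending + ret, (n_pending - ret) * sizeof(pending[0]));
        n_pending -= ret;
    }

    static void ioq_enqueue(struct iocb *iocb)
    {
        if (n_pending < MAX_QUEUED) {
            pending[n_pending++] = iocb;
        }
        ioq_submit();  /* submits right away unless plugged */
    }

    /* Completion side: reap events, then push queued requests. */
    static void reap_completions(void)
    {
        struct io_event events[MAX_QUEUED];
        int n = io_getevents(ctx, 1, MAX_QUEUED, events, NULL);
        if (n > 0) {
            /* ... complete the n finished requests ... */
            ioq_submit();  /* kernel slots freed up: refill from the queue */
        }
    }

    int main(void)
    {
        if (io_setup(MAX_QUEUED, &ctx) < 0) {
            fprintf(stderr, "io_setup failed\n");
            return 1;
        }
        /* drive ioq_enqueue() / reap_completions() from an event loop */
        io_destroy(ctx);
        return 0;
    }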
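
And a toy model of the bottom-half point: if the completion routine
refills the kernel queue directly, at most one slot has been freed each
time, so io_submit() always gets a single iocb and plugging buys
nothing. Deferring the refill until all completions of the current
event-loop iteration have been handled lets several slots open up and
be submitted as one batch. The refill_scheduled flag below is a
stand-in for scheduling a real QEMU bottom half (qemu_bh_schedule);
the loop itself is invented for the example.

    #include <stdbool.h>

    static bool refill_scheduled;

    static void completion_cb(void)
    {
        /* ... complete one request ... */
        refill_scheduled = true;  /* do NOT call ioq_submit() here */
    }

    /* One event-loop iteration: drain completions first, refill once. */
    static void event_loop_iteration(int ready_completions)
    {
        int i;

        for (i = 0; i < ready_completions; i++) {
            completion_cb();
        }
        if (refill_scheduled) {
            refill_scheduled = false;
            /* ioq_submit(): several slots may now be free -> one batch */
        }
    }

    int main(void)
    {
        event_loop_iteration(4);  /* e.g. four completions arrive together */
        return 0;
    }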