From: Fam Zheng
Date: Wed, 16 Nov 2016 16:27:16 +0800
Subject: Re: [Qemu-devel] [RFC 0/3] aio: experimental virtio-blk polling mode
To: Paolo Bonzini
Cc: Stefan Hajnoczi, Karl Rister, Stefan Hajnoczi, qemu-devel@nongnu.org, Andrew Theurer
Message-ID: <20161116082716.GA28320@lemon>
In-Reply-To: <90b5f81f-eab0-72dc-63b8-143477cb5286@redhat.com>
References: <1478711602-12620-1-git-send-email-stefanha@redhat.com>
 <5826231D.7070208@redhat.com>
 <20161114152642.GE26198@stefanha-x1.localdomain>
 <90b5f81f-eab0-72dc-63b8-143477cb5286@redhat.com>

On Mon, 11/14 16:29, Paolo Bonzini wrote:
>
> On 14/11/2016 16:26, Stefan Hajnoczi wrote:
> > On Fri, Nov 11, 2016 at 01:59:25PM -0600, Karl Rister wrote:
> >>   QEMU_AIO_POLL_MAX_NS      IOPs
> >>                  unset    31,383
> >>                      1    46,860
> >>                      2    46,440
> >>                      4    35,246
> >>                      8    34,973
> >>                     16    46,794
> >>                     32    46,729
> >>                     64    35,520
> >>                    128    45,902
> >
> > The environment variable is in nanoseconds.  The range of values you
> > tried are very small (all <1 usec).  It would be interesting to try
> > larger values in the ballpark of the latencies you have traced.  For
> > example 2000, 4000, 8000, 16000, and 32000 ns.
> >
> > Very interesting that QEMU_AIO_POLL_MAX_NS=1 performs so well without
> > much CPU overhead.
>
> That basically means "avoid a syscall if you already know there's
> something to do", so in retrospect it's not that surprising.  Still
> interesting though, and it means that the feature is useful even if you
> don't have CPU to waste.

With the "deleted" bug fixed, I did a little more testing to understand
this.

Setting QEMU_AIO_POLL_MAX_NS=1 doesn't mean run_poll_handlers() will only
loop for 1 ns - the patch only checks the deadline once every 1024 polls.
The first poll in a run_poll_handlers() call can hardly succeed, so we
poll at least 1024 times.

According to my test, each run_poll_handlers() call takes ~12000 ns on
average, which is ~160 iterations of the poll loop, before getting a new
event (either from the virtio queue or from linux-aio; I don't have the
ratio here). So in the worst case (no new event at all), 1024 iterations
take roughly 12000 / 160 * 1024 = 76800 ns!

The above is with iodepth=1 and jobs=1. With iodepth=32 and jobs=1, or
iodepth=8 and jobs=4, a new event typically arrives around the 30th poll
iteration, after ~5600 ns.

Fam
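
For reference, a minimal standalone C sketch of the loop shape described
above. The name run_poll_handlers() and the QEMU_AIO_POLL_MAX_NS variable
come from the thread itself; everything else (the modulo-1024 deadline
check, the stubbed poll_handlers(), the timing helper) is an assumption
for illustration only, not the actual RFC patch:

/* Standalone model of the polling loop discussed in this thread.
 * Assumptions: the deadline is only consulted once every 1024 poll
 * iterations, and poll_handlers() is a stand-in for walking the
 * registered AioHandler poll callbacks. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Stub: pretend an event (virtqueue kick or linux-aio completion)
 * becomes ready after ~160 iterations, the average measured above. */
static bool poll_handlers(int64_t iterations)
{
    return iterations >= 160;
}

static bool run_poll_handlers(int64_t max_ns)
{
    int64_t start = now_ns();
    int64_t iterations = 0;

    for (;;) {
        if (poll_handlers(iterations)) {
            return true;   /* progress made without a syscall */
        }
        /* The clock is read only every 1024 polls, so even max_ns=1
         * spins at least 1024 iterations before giving up. */
        if (++iterations % 1024 == 0 && now_ns() - start > max_ns) {
            return false;
        }
    }
}

int main(void)
{
    const char *env = getenv("QEMU_AIO_POLL_MAX_NS");
    int64_t max_ns = env ? strtoll(env, NULL, 10) : 0;

    if (max_ns > 0) {
        printf("polled, progress=%d\n", run_poll_handlers(max_ns));
    } else {
        printf("polling disabled, fall back to blocking ppoll()\n");
    }
    return 0;
}

With this shape, QEMU_AIO_POLL_MAX_NS=1 still spins a full 1024-iteration
batch before the first deadline check, which is consistent with the ~76800 ns
worst case estimated above.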