From: Stefan Hajnoczi
To: Karl Rister
Cc: Stefan Hajnoczi, qemu-devel@nongnu.org, Andrew Theurer, Paolo Bonzini, Fam Zheng
Subject: Re: [Qemu-devel] [RFC 0/3] aio: experimental virtio-blk polling mode
Date: Mon, 14 Nov 2016 15:26:42 +0000
Message-ID: <20161114152642.GE26198@stefanha-x1.localdomain>
In-Reply-To: <5826231D.7070208@redhat.com>

On Fri, Nov 11, 2016 at 01:59:25PM -0600, Karl Rister wrote:
> On 11/09/2016 11:13 AM, Stefan Hajnoczi wrote:
> > Recent performance investigation work done by Karl Rister shows that the
> > guest->host notification takes around 20 us.  This is more than the
> > "overhead" of QEMU itself (e.g. block layer).
> >
> > One way to avoid the costly exit is to use polling instead of
> > notification.  The main drawback of polling is that it consumes CPU
> > resources.  In order to benefit performance the host must have extra CPU
> > cycles available on physical CPUs that aren't used by the guest.
> >
> > This is an experimental AioContext polling implementation.  It adds a
> > polling callback into the event loop.  Polling functions are implemented
> > for virtio-blk virtqueue guest->host kick and Linux AIO completion.
> >
> > The QEMU_AIO_POLL_MAX_NS environment variable sets the number of
> > nanoseconds to poll before entering the usual blocking poll(2) syscall.
> > Try setting this variable to the time from old request completion to new
> > virtqueue kick.
> >
> > By default no polling is done.  QEMU_AIO_POLL_MAX_NS must be set to get
> > any polling!
> >
> > Karl: I hope you can try this patch series with several
> > QEMU_AIO_POLL_MAX_NS values.  If you don't find a good value we should
> > double-check the tracing data to see if this experimental code can be
> > improved.
>
> Stefan
>
> I ran some quick tests with your patches and got some pretty good gains,
> but also some seemingly odd behavior.
>
> These results are for a 5 minute test doing sequential 4KB requests from
> fio using O_DIRECT, libaio, and an IO depth of 1.  The requests are
> performed directly against the virtio-blk device (no filesystem), which
> is backed by a 400GB NVMe card.
>
> QEMU_AIO_POLL_MAX_NS      IOPS
>                unset      31,383
>                    1      46,860
>                    2      46,440
>                    4      35,246
>                    8      34,973
>                   16      46,794
>                   32      46,729
>                   64      35,520
>                  128      45,902
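For reference, the workload above corresponds roughly to an fio invocation
like the one below.  This is only a sketch: the /dev/vdb device path and the
read direction are assumptions, since the report just says "sequential 4KB
requests".

    # sequential 4KB requests, O_DIRECT, libaio, queue depth 1, 5 minutes
    # /dev/vdb stands in for the guest's virtio-blk device
    fio --name=seq4k --filename=/dev/vdb --rw=read --bs=4k \
        --ioengine=libaio --iodepth=1 --direct=1 \
        --time_based --runtime=300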
The environment variable is in nanoseconds.  The range of values you tried
is very small (all <1 usec).  It would be interesting to try larger values
in the ballpark of the latencies you have traced, for example 2000, 4000,
8000, 16000, and 32000 ns.

Very interesting that QEMU_AIO_POLL_MAX_NS=1 performs so well without much
CPU overhead.

> I found the results for 4, 8, and 64 odd so I re-ran some tests to check
> for consistency.  I used values of 2 and 4 and ran each 5 times.  Here
> is what I got:
>
> Iteration    QEMU_AIO_POLL_MAX_NS=2    QEMU_AIO_POLL_MAX_NS=4
>         1                    46,972                    35,434
>         2                    46,939                    35,719
>         3                    47,005                    35,584
>         4                    47,016                    35,615
>         5                    47,267                    35,474
>
> So the results seem consistent.

That is interesting.  I don't have an explanation for the consistent
difference between 2 and 4 ns polling time.  The time difference is so
small yet the IOPS difference is clear.  Comparing traces could shed light
on the cause of this difference.

> I saw some discussion on the patches which makes me think you'll be
> making some changes, is that right?  If so, I may wait for the updates
> and then we can run the much more exhaustive set of workloads
> (sequential read and write, random read and write) at various block
> sizes (4, 8, 16, 32, 64, 128, and 256 KB) and multiple IO depths (1 and
> 32) that we were doing when we started looking at this.

I'll send an updated version of the patches.

Stefan
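To make the suggested sweep concrete: QEMU_AIO_POLL_MAX_NS is read from
QEMU's environment, so each value means a fresh QEMU launch.  A minimal
launch sketch follows; the memory/SMP sizing, guest.img name, and drive
options are placeholders, not taken from Karl's setup.

    # try one of the suggested larger values, e.g. 4000 ns of polling
    # before falling back to the blocking poll(2) syscall
    QEMU_AIO_POLL_MAX_NS=4000 qemu-system-x86_64 \
        -enable-kvm -m 4096 -smp 4 \
        -drive file=guest.img,if=virtio,format=raw,cache=none,aio=native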