Date: Wed, 30 Nov 2016 09:38:00 +0000
From: Stefan Hajnoczi
Message-ID: <20161130093800.GA2589@stefanha-x1.localdomain>
In-Reply-To: <20161130054214.GA22613@lemon>
References: <20161124151225.GA11963@stefanha-x1.localdomain> <20161129103236.GE15786@lemon> <14abb3dd-b639-3c31-cade-073fff209ca6@redhat.com> <20161129132354.GF15786@lemon> <20161130054214.GA22613@lemon>
Subject: Re: [Qemu-devel] Linux kernel polling for QEMU
To: Fam Zheng
Cc: Stefan Hajnoczi, Paolo Bonzini, Eliezer Tamir, "Michael S. Tsirkin", qemu-devel, Jens Axboe, Christian Borntraeger, Davide Libenzi, Christoph Hellwig

On Wed, Nov 30, 2016 at 01:42:14PM +0800, Fam Zheng wrote:
> On Tue, 11/29 20:43, Stefan Hajnoczi wrote:
> > On Tue, Nov 29, 2016 at 1:24 PM, Fam Zheng wrote:
> > > On Tue, 11/29 12:17, Paolo Bonzini wrote:
> > >> On 29/11/2016 11:32, Fam Zheng wrote:
> > >> * it still needs a system call before polling is entered.
> > >> Ideally, QEMU could run without any system call while in polling mode.
> > >>
> > >> Another possibility is to add a system call for single_task_running().
> > >> It should be simple enough that you can implement it in the vDSO and
> > >> avoid a context switch. There are convenient hooking points in
> > >> add_nr_running and sub_nr_running.
> > >
> > > That sounds good!
> >
> > With this solution QEMU can either poll virtqueues or the host kernel
> > can poll NIC and storage controller descriptor rings, but not both at
> > the same time in one thread. This is one of the reasons why I think
> > exploring polling in the kernel makes more sense.
>
> That's true. I have one question though: controller rings are in a different
> layer in the kernel, so I wonder what the syscall interface looks like to ask
> the kernel to poll both hardware rings and memory locations in the same loop?
> It's not obvious to me after reading your eventfd patch.

Descriptor ring polling in select(2)/poll(2) is currently only supported
for network sockets. Take a look at the POLL_BUSY_LOOP flag in
fs/select.c:do_poll(). If the .poll() callback sets the flag then it
indicates that the fd supports busy-loop polling.

The way this is implemented for network sockets is that the socket looks
up the NAPI index and is able to use the NIC driver to poll the rx ring.
Then it checks whether the socket's receive queue contains data after the
rx ring was processed. The virtio_net.ko driver supports this interface,
for example. See drivers/net/virtio_net.c:virtnet_busy_poll().

Busy-loop polling isn't supported for block I/O yet. There is currently a
completely independent code path for O_DIRECT synchronous I/O where NVMe
can poll for request completion. But it doesn't work together with
asynchronous I/O (e.g. Linux AIO using eventfd with select(2)/poll(2)).
> > The disadvantage of the kernel approach is that you must make the
> > ppoll(2)/epoll_wait(2) syscall even for polling, and you probably need
> > to do eventfd reads afterwards, so the minimum event loop iteration
> > latency is higher than doing polling in userspace.
>
> And userspace drivers powered by DPDK or vfio will still want to do
> polling in userspace anyway, we may want to take that into account as
> well.

vfio supports interrupts so it can definitely be integrated with adaptive
kernel polling (i.e. poll for a little while and then wait for an
interrupt if there was no event). Does DPDK ever use interrupts?

Stefan