From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 30 Nov 2016 18:50:09 +0800
From: Fam Zheng
Message-ID: <20161130105009.GB27283@lemon>
References: <20161124151225.GA11963@stefanha-x1.localdomain>
 <20161129103236.GE15786@lemon>
 <14abb3dd-b639-3c31-cade-073fff209ca6@redhat.com>
 <20161129132354.GF15786@lemon>
 <20161130054214.GA22613@lemon>
 <20161130093800.GA2589@stefanha-x1.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161130093800.GA2589@stefanha-x1.localdomain>
Subject: Re: [Qemu-devel] Linux kernel polling for QEMU
To: Stefan Hajnoczi
Cc: Eliezer Tamir, "Michael S. Tsirkin", Stefan Hajnoczi, qemu-devel,
 Jens Axboe, Christian Borntraeger, Davide Libenzi, Paolo Bonzini,
 Christoph Hellwig

On Wed, 11/30 09:38, Stefan Hajnoczi wrote:
> On Wed, Nov 30, 2016 at 01:42:14PM +0800, Fam Zheng wrote:
> > On Tue, 11/29 20:43, Stefan Hajnoczi wrote:
> > > On Tue, Nov 29, 2016 at 1:24 PM, Fam Zheng wrote:
> > > > On Tue, 11/29 12:17, Paolo Bonzini wrote:
> > > >> On 29/11/2016 11:32, Fam Zheng wrote:
> > > >> * it still needs a system call before polling is entered. Ideally,
> > > >> QEMU could run without any system call while in polling mode.
> > > >>
> > > >> Another possibility is to add a system call for single_task_running().
> > > >> It should be simple enough that you can implement it in the vDSO and
> > > >> avoid a context switch. There are convenient hooking points in
> > > >> add_nr_running and sub_nr_running.
> > > >
> > > > That sounds good!
> > >
> > > With this solution QEMU can either poll virtqueues or the host kernel
> > > can poll NIC and storage controller descriptor rings, but not both at
> > > the same time in one thread. This is one of the reasons why I think
> > > exploring polling in the kernel makes more sense.
> >
> > That's true. I have one question though: controller rings are in a
> > different layer in the kernel, so I wonder what the syscall interface
> > looks like to ask the kernel to poll both hardware rings and memory
> > locations in the same loop. It's not obvious to me after reading your
> > eventfd patch.
>
> Descriptor ring polling in select(2)/poll(2) is currently supported for
> network sockets. Take a look at the POLL_BUSY_LOOP flag in
> fs/select.c:do_poll(). If the .poll() callback sets the flag then it
> indicates that the fd supports busy loop polling.
>
> The way this is implemented for network sockets is that the socket looks
> up its napi index and uses the NIC driver to poll the rx ring. Then it
> checks whether the socket's receive queue contains data after the rx
> ring was processed.
>
> The virtio_net.ko driver supports this interface, for example. See
> drivers/net/virtio_net.c:virtnet_busy_poll().
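A side note for the archives: the userspace-visible knob for this
mechanism is the SO_BUSY_POLL socket option (Linux 3.11 and later; the
net.core.busy_poll/busy_read sysctls set it globally). A rough sketch,
where enable_busy_poll is a made-up helper name and setting a nonzero
value needs CAP_NET_ADMIN:

    /*
     * Opt a socket into kernel busy-loop polling.  Once set, blocking
     * reads and poll(2)/select(2) on this socket can spin on the
     * device rx ring for up to 'usecs' microseconds before sleeping.
     */
    #include <stdio.h>
    #include <sys/socket.h>

    static int enable_busy_poll(int sockfd, int usecs)
    {
        if (setsockopt(sockfd, SOL_SOCKET, SO_BUSY_POLL,
                       &usecs, sizeof(usecs)) < 0) {
            perror("setsockopt(SO_BUSY_POLL)");
            return -1;
        }
        return 0;
    }

If I read the code right, this is what lets the socket's .poll()
callback advertise POLL_BUSY_LOOP in the do_poll() path above.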
> Busy loop polling isn't supported for block I/O yet. There is currently
> a completely independent code path for O_DIRECT synchronous I/O where
> NVMe can poll for request completion. But it doesn't work together with
> asynchronous I/O (e.g. Linux AIO using eventfd with select(2)/poll(2)).

This makes perfect sense now, thanks for the pointers!

> > > The disadvantage of the kernel approach is that you must make the
> > > ppoll(2)/epoll_wait(2) syscall even for polling, and you probably need
> > > to do eventfd reads afterwards, so the minimum event loop iteration
> > > latency is higher than doing polling in userspace.
> >
> > And userspace drivers powered by dpdk or vfio will still want to do
> > polling in userspace anyway; we may want to take that into account as
> > well.
>
> vfio supports interrupts so it can definitely be integrated with
> adaptive kernel polling (i.e. poll for a little while and then wait for
> an interrupt if there was no event).
>
> Does dpdk ever use interrupts?

Yes, interrupt mode is supported there. For example, see the intx/msix
init code in drivers/net/ixgbe/ixgbe_ethdev.c:ixgbe_dev_start().

Fam
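P.S. To make the "adaptive" scheme above concrete, here is a rough
userspace sketch of the poll-then-block pattern we keep referring to.
This is not actual QEMU code: wait_for_events, the ready() callback and
poll_ns are invented for illustration, and the proposed vDSO
single_task_running() check is only marked where it would go:

    #define _GNU_SOURCE
    #include <poll.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <time.h>

    static uint64_t now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
    }

    /*
     * Busy-wait on a memory location (e.g. a virtqueue avail ring
     * index) for up to poll_ns nanoseconds, then fall back to a
     * blocking ppoll(2).  'ready' reports whether work has arrived.
     */
    static void wait_for_events(struct pollfd *fds, nfds_t nfds,
                                bool (*ready)(void *), void *opaque,
                                uint64_t poll_ns)
    {
        uint64_t deadline = now_ns() + poll_ns;

        /* Polling phase: plain memory reads, no system calls.  A vDSO
         * single_task_running() could be checked here to stop early
         * when other tasks want the CPU. */
        while (now_ns() < deadline) {
            if (ready(opaque)) {
                return;  /* event found without entering the kernel */
            }
        }

        /* Nothing showed up in the window: block in the kernel. */
        ppoll(fds, nfds, NULL, NULL);
    }

The limitation discussed above is exactly that the polling phase can
watch memory locations or the kernel can watch device rings, but one
thread cannot do both without entering the kernel.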