qemu-devel.nongnu.org archive mirror
From: Stefan Hajnoczi <stefanha@redhat.com>
To: qemu-devel@nongnu.org
Cc: Jens Axboe <axboe@fb.com>, Christoph Hellwig <hch@lst.de>,
	Eliezer Tamir <eliezer.tamir@linux.intel.com>,
	Davide Libenzi <davidel@xmailserver.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Fam Zheng <famz@redhat.com>
Subject: [Qemu-devel] Linux kernel polling for QEMU
Date: Thu, 24 Nov 2016 15:12:25 +0000
Message-ID: <20161124151225.GA11963@stefanha-x1.localdomain>

I looked through the socket SO_BUSY_POLL and blk_mq poll support in
recent Linux kernels with an eye towards integrating the ongoing QEMU
polling work.  The main missing feature is eventfd polling support,
which I describe below.

Background
----------
We're experimenting with polling in QEMU so I wondered if there are
advantages to having the kernel do polling instead of userspace.

One such advantage has been pointed out by Christian Borntraeger and
Paolo Bonzini: a userspace thread spins blindly without knowing when it
is hogging a CPU that other tasks need.  The kernel knows when other
tasks need to run and can skip polling in that case.

Power management might also benefit if the kernel were aware of polling
activity on the system.  That way polling could be controlled by the
system administrator in a single place.  Perhaps the kernel could also
make smarter power saving choices.

Another advantage is that the kernel can poll hardware rings (e.g. NIC
rx rings) whereas QEMU can only poll its own virtual memory (including
guest RAM).  That means the kernel can bypass interrupts for devices
that are using kernel drivers.

State of polling in Linux
-------------------------
SO_BUSY_POLL causes the recvmsg(2), select(2), and poll(2) family of
system calls to spin while waiting for incoming packets.  From what I can
tell epoll is not supported, so epoll_wait(2) will sleep without polling.

blk_mq poll is mainly supported by NVMe.  It is only available with
synchronous direct I/O.  select(2), poll(2), epoll, and Linux AIO are
therefore not integrated.  It would be nice to extend the code so a
process waiting on Linux AIO using io_getevents(2), select(2), poll(2),
or epoll will poll.

QEMU and KVM-specific polling
-----------------------------
There are a few QEMU/KVM-specific items that require polling support:

QEMU's event loop aio_notify() mechanism wakes up the event loop from a
blocking poll(2) or epoll call.  It is used when another thread adds or
changes an event loop resource (such as scheduling a BH).  There is a
userspace memory location (ctx->notified) that is written by
aio_notify() as well as an eventfd that can be signalled.
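
To make the two halves of this mechanism concrete, here is a small
standalone sketch of the pattern: a flag in userspace memory that a
poller can spin on, plus an eventfd that wakes a blocked poll(2).  This
is not QEMU's actual code; struct notifier and the notifier_*() names
are invented for the example, and only eventfd(2), poll(2), and C11
atomics are real interfaces.

```c
#include <assert.h>
#include <poll.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical stand-in for the event loop's notification state. */
struct notifier {
    atomic_bool notified;  /* userspace flag a poller can spin on */
    int efd;               /* eventfd that wakes a blocked poll(2) */
};

/* Returns 0 on success, -1 if the eventfd could not be created. */
int notifier_init(struct notifier *n)
{
    assert(n != NULL);
    atomic_init(&n->notified, false);
    n->efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    return n->efd < 0 ? -1 : 0;
}

/* Called by another thread: set the flag, then signal the eventfd. */
void notifier_notify(struct notifier *n)
{
    uint64_t one = 1;
    ssize_t ret;

    atomic_store(&n->notified, true);
    ret = write(n->efd, &one, sizeof(one));
    (void)ret;  /* EAGAIN only means a wakeup is already pending */
}

/*
 * Spin on the flag for up to `spins` iterations, then fall back to
 * poll(2) on the eventfd with a timeout of `timeout_ms` milliseconds.
 * Returns true if a notification was seen; the flag and the eventfd
 * counter are reset before returning in that case.
 */
bool notifier_wait(struct notifier *n, unsigned spins, int timeout_ms)
{
    bool seen = false;

    for (unsigned i = 0; i < spins && !seen; i++) {
        seen = atomic_load(&n->notified);
    }
    if (!seen) {
        struct pollfd pfd = { .fd = n->efd, .events = POLLIN };
        seen = poll(&pfd, 1, timeout_ms) > 0;
    }
    if (seen) {
        uint64_t count;
        ssize_t ret;

        atomic_store(&n->notified, false);
        ret = read(n->efd, &count, sizeof(count));  /* drain counter */
        (void)ret;
    }
    return seen;
}
```

The spin loop here is exactly the part that runs blind in userspace:
it cannot tell whether another task needs the CPU.  Note that a waiter
can observe the flag before the eventfd write lands, so a later call may
return a spurious wakeup; callers are assumed to tolerate that.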

kvm.ko's ioeventfd is signalled upon guest MMIO/PIO accesses.  Virtio
devices use ioeventfd as a doorbell after new requests have been placed
in a virtqueue, which is a descriptor ring in userspace memory.

Eventfd polling support could look like this:

  struct eventfd_poll_info poll_info = {
      .addr = ...memory location...,
      .size = sizeof(uint32_t),
      .op   = EVENTFD_POLL_OP_NOT_EQUAL, /* check *addr != val */
      .val  = ...last value...,
  };
  ioctl(eventfd, EVENTFD_SET_POLL, &poll_info);

In the kernel, eventfd stashes this information and eventfd_poll()
evaluates the operation (e.g. not equal, bitwise AND, etc.) against the
memory location to detect progress.

Note that this eventfd polling mechanism doesn't actually poll the
eventfd counter value.  It's useful for situations where the eventfd is
a doorbell/notification that some object in userspace memory has been
updated.  So it polls that userspace memory location directly.
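
The check itself is small.  Below is a userspace model of what
evaluating the stashed operation could look like.  Nothing here is an
existing kernel interface: the struct layout, the
EVENTFD_POLL_OP_AND name, and eventfd_poll_check() are assumptions made
for illustration, and a real kernel implementation would have to read
user memory safely instead of dereferencing a plain pointer.  The
sketch also assumes addr is naturally aligned for the given size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum eventfd_poll_op {
    EVENTFD_POLL_OP_NOT_EQUAL,  /* progress when *addr != val */
    EVENTFD_POLL_OP_AND,        /* hypothetical: (*addr & val) != 0 */
};

/* Model of the proposed descriptor; field types are assumptions. */
struct eventfd_poll_info {
    const volatile void *addr;  /* memory location to watch */
    size_t size;                /* 1, 2, 4, or 8 bytes */
    enum eventfd_poll_op op;
    uint64_t val;               /* operand, e.g. last value seen */
};

/* Returns true if the watched location indicates progress. */
bool eventfd_poll_check(const struct eventfd_poll_info *info)
{
    uint64_t cur;

    assert(info != NULL && info->addr != NULL);

    /* Load the current value with the requested width. */
    switch (info->size) {
    case 1: cur = *(const volatile uint8_t *)info->addr;  break;
    case 2: cur = *(const volatile uint16_t *)info->addr; break;
    case 4: cur = *(const volatile uint32_t *)info->addr; break;
    case 8: cur = *(const volatile uint64_t *)info->addr; break;
    default: return false;  /* unsupported width: never report progress */
    }

    switch (info->op) {
    case EVENTFD_POLL_OP_NOT_EQUAL: return cur != info->val;
    case EVENTFD_POLL_OP_AND:       return (cur & info->val) != 0;
    }
    return false;
}
```

For a virtqueue doorbell, addr would point at the ring index in guest
RAM and val would be the last index the device emulation processed, so
the not-equal check fires as soon as the guest publishes a new request.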

This new eventfd feature also provides poor man's Linux AIO polling
support: set the Linux AIO shared ring index as the eventfd polling
memory location.  This is not as good as true Linux AIO polling support,
where the kernel polls the NVMe, virtio_blk, etc. ring, since we'd still
rely on an interrupt to complete I/O requests.

Thoughts?

Stefan
