From: "Michael S. Tsirkin" <mst@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
Anthony Liguori <aliguori@us.ibm.com>,
Avi Kivity <avi@redhat.com>,
qemu-devel@nongnu.org,
Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
Date: Tue, 25 Jan 2011 13:27:46 +0200
Message-ID: <20110125112746.GA3575@redhat.com>
In-Reply-To: <AANLkTi=b6qMRNk9FveYhGviRXyOXBVu9kpZ2wuBTgfpn@mail.gmail.com>
On Tue, Jan 25, 2011 at 09:49:04AM +0000, Stefan Hajnoczi wrote:
> On Tue, Jan 25, 2011 at 7:12 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Mon, Jan 24, 2011 at 8:05 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> >> Am 24.01.2011 20:47, schrieb Michael S. Tsirkin:
> >>> On Mon, Jan 24, 2011 at 08:48:05PM +0100, Kevin Wolf wrote:
> >>>> Am 24.01.2011 20:36, schrieb Michael S. Tsirkin:
> >>>>> On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
> >>>>>> Am 12.12.2010 16:02, schrieb Stefan Hajnoczi:
> >>>>>>> Virtqueue notify is currently handled synchronously in userspace virtio. This
> >>>>>>> prevents the vcpu from executing guest code while hardware emulation code
> >>>>>>> handles the notify.
> >>>>>>>
> >>>>>>> On systems that support KVM, the ioeventfd mechanism can be used to make
> >>>>>>> virtqueue notify a lightweight exit by deferring hardware emulation to the
> >>>>>>> iothread and allowing the VM to continue execution. This model is similar to
> >>>>>>> how vhost receives virtqueue notifies.
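For reference, a minimal sketch of what the lightweight exit amounts to (this is not the patch's code; vm_fd, notify_addr and queue_index are placeholders): the doorbell write is bound to an eventfd inside KVM, so the vcpu thread never returns to userspace for it, and the iothread picks the kick up from the eventfd.

#include <linux/kvm.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>

/* Sketch only: bind a virtqueue's notify doorbell to an eventfd so the
 * guest's write becomes a lightweight exit handled entirely in the kernel. */
int bind_queue_notify(int vm_fd, uint64_t notify_addr, uint16_t queue_index)
{
    int efd = eventfd(0, 0);
    struct kvm_ioeventfd kick = {
        .addr      = notify_addr,   /* VIRTIO_PCI_QUEUE_NOTIFY doorbell (PIO) */
        .len       = 2,             /* the guest writes a 16-bit queue index  */
        .fd        = efd,
        .datamatch = queue_index,
        .flags     = KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH,
    };

    if (efd < 0 || ioctl(vm_fd, KVM_IOEVENTFD, &kick) < 0) {
        return -1;
    }
    /* The iothread then selects/polls on efd and runs the virtqueue handler
     * while the vcpu continues executing guest code. */
    return efd;
}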
> >>>>>>>
> >>>>>>> The result of this change is improved performance for userspace virtio devices.
> >>>>>>> Virtio-blk throughput increases especially for multithreaded scenarios and
> >>>>>>> virtio-net transmit throughput increases substantially.
> >>>>>>>
> >>>>>>> Some virtio devices are known to have guest drivers which expect a notify to be
> >>>>>>> processed synchronously and spin waiting for completion. Only enable ioeventfd
> >>>>>>> for virtio-blk and virtio-net for now.
> >>>>>>>
> >>>>>>> Care must be taken not to interfere with vhost-net, which uses host
> >>>>>>> notifiers. If the set_host_notifier() API is used by a device
> >>>>>>> virtio-pci will disable virtio-ioeventfd and let the device deal with
> >>>>>>> host notifiers as it wishes.
> >>>>>>>
> >>>>>>> After migration and on VM change state (running/paused) virtio-ioeventfd
> >>>>>>> will enable/disable itself (see the sketch after this list).
> >>>>>>>
> >>>>>>> * VIRTIO_CONFIG_S_DRIVER_OK -> enable virtio-ioeventfd
> >>>>>>> * !VIRTIO_CONFIG_S_DRIVER_OK -> disable virtio-ioeventfd
> >>>>>>> * virtio_pci_set_host_notifier() -> disable virtio-ioeventfd
> >>>>>>> * vm_change_state(running=0) -> disable virtio-ioeventfd
> >>>>>>> * vm_change_state(running=1) -> enable virtio-ioeventfd
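Roughly, the policy above boils down to something like this (struct, field and helper names are illustrative, not the actual patch):

#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_CONFIG_S_DRIVER_OK 4

/* Illustrative sketch of the enable/disable rules listed above. */
struct ioeventfd_state {
    bool started;   /* eventfds currently bound to the queue notify doorbells */
    bool disabled;  /* set when set_host_notifier() was called (e.g. vhost)   */
};

void update_ioeventfd(struct ioeventfd_state *s, uint8_t status, bool vm_running)
{
    bool want = vm_running &&
                (status & VIRTIO_CONFIG_S_DRIVER_OK) &&
                !s->disabled;

    if (want && !s->started) {
        s->started = true;    /* start: bind eventfds, handle kicks in the iothread */
    } else if (!want && s->started) {
        s->started = false;   /* stop: fall back to synchronous virtqueue notify    */
    }
}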
> >>>>>>>
> >>>>>>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> >>>>>>
> >>>>>> On current git master I'm getting hangs when running iozone on a
> >>>>>> virtio-blk disk. "Hang" means that it's not responsive any more and has
> >>>>>> 100% CPU consumption.
> >>>>>>
> >>>>>> I bisected the problem to this patch. Any ideas?
> >>>>>>
> >>>>>> Kevin
> >>>>>
> >>>>> Does it help if you set ioeventfd=off on command line?
> >>>>
> >>>> Yes, with ioeventfd=off it seems to work fine.
> >>>>
> >>>> Kevin
> >>>
> >>> Then it's the ioeventfd that is to blame.
> >>> Is it the io thread that consumes 100% CPU?
> >>> Or the vcpu thread?
> >>
> >> I was building with the default options, i.e. there is no IO thread.
> >>
> >> Now I'm just running the test with IO threads enabled, and so far
> >> everything looks good. So I can only reproduce the problem with IO
> >> threads disabled.
> >
> > Hrm...aio uses SIGUSR2 to force the vcpu to process aio completions
> > (relevant when --enable-io-thread is not used). I will take a look at
> > that again and see why we're spinning without checking for ioeventfd
> > completion.
>
> Here's my understanding of --disable-io-thread. Added Anthony on CC,
> please correct me.
>
> When I/O thread is disabled our only thread runs guest code until an
> exit request is made. There are synchronous exit cases like a halt
> instruction or single step. There are also asynchronous exit cases
> when signal handlers use qemu_notify_event(), which does cpu_exit(),
> to set env->exit_request = 1 and unlink the current tb.
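As a simplified model (not QEMU's actual code; run_guest() below stands in for TB execution or KVM_RUN), the single-threaded loop looks like this:

#include <signal.h>

/* Simplified model of the non-iothread main loop: a signal handler asks the
 * vcpu to exit, and pending events are only serviced once it does. */
static volatile sig_atomic_t exit_request;

static void notify_handler(int signum)
{
    exit_request = 1;   /* qemu_notify_event() -> cpu_exit() in real QEMU */
}

void main_loop_sketch(void)
{
    signal(SIGALRM, notify_handler);

    for (;;) {
        while (!exit_request) {
            /* run_guest(): execute translated blocks or ioctl(KVM_RUN).
             * Nothing else runs until exit_request is set by a signal. */
        }
        exit_request = 0;
        /* select(2) over pending fds, run expired timers, flush aio, ... */
    }
}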
>
> With this structure in mind, anything which needs to interrupt the
> vcpu in order to process events must use signals and
> qemu_notify_event(). Otherwise that event source may be starved and
> never processed.
>
> virtio-ioeventfd currently does not use signals and will therefore
> never interrupt the vcpu.
>
> However, you normally don't notice the missing signal handler because
> some other event interrupts the vcpu and we enter select(2) to process
> all pending handlers. So virtio-ioeventfd mostly gets a free ride on
> top of timer events. This is suboptimal because it adds latency to
> virtqueue kick - we're waiting for another event to interrupt the vcpu
> before we can process virtqueue-kick.
>
> If any other vcpu interruption makes virtio-ioeventfd chug along then
> why are you seeing 100% CPU livelock? My theory is that dynticks has
> a race condition which causes timers to stop working in QEMU. Here is
> an strace of QEMU --disable-io-thread entering live lock. I can
> trigger this by starting a VM and running "while true; do true; done"
> at the shell. Then strace the QEMU process:
>
> 08:04:34.985177 ioctl(11, KVM_RUN, 0) = 0
> 08:04:34.985242 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 08:04:34.985319 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
> 08:04:34.985368 rt_sigreturn(0x2758ad0) = 0
> 08:04:34.985423 select(15, [5 8 14], [], [], {0, 0}) = 1 (in [5], left {0, 0})
> 08:04:34.985484 read(5, "\1\0\0\0\0\0\0\0", 512) = 8
> 08:04:34.985538 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
> 08:04:34.985588 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 273000}}, NULL) = 0
> 08:04:34.985646 ioctl(11, KVM_RUN, 0) = -1 EINTR (Interrupted system call)
> 08:04:34.985928 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 08:04:34.986007 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
> 08:04:34.986063 rt_sigreturn(0x2758ad0) = -1 EINTR (Interrupted system call)
> 08:04:34.986124 select(15, [5 8 14], [], [], {0, 0}) = 1 (in [5], left {0, 0})
> 08:04:34.986188 read(5, "\1\0\0\0\0\0\0\0", 512) = 8
> 08:04:34.986246 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
> 08:04:34.986299 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
> 08:04:34.986359 ioctl(11, KVM_INTERRUPT, 0x7fff90404ef0) = 0
> 08:04:34.986406 ioctl(11, KVM_RUN, 0) = 0
> 08:04:34.986465 ioctl(11, KVM_RUN, 0) = 0 <--- guest finishes execution
>
> v--- dynticks_rearm_timer() returns early because timer is already scheduled
> 08:04:34.986533 timer_gettime(0, {it_interval={0, 0}, it_value={0, 24203}}) = 0
> 08:04:34.986585 --- SIGALRM (Alarm clock) @ 0 (0) --- <--- timer expires
> 08:04:34.986661 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
> 08:04:34.986710 rt_sigreturn(0x2758ad0) = 0
>
> v--- we re-enter the guest without rearming the timer!
> 08:04:34.986754 ioctl(11, KVM_RUN^C <unfinished ...>
> [QEMU hang, 100% CPU]
>
> So dynticks fails to rearm the timer before we enter the guest. This
> is a race condition: we check that there is already a timer scheduled
> and head on towards re-entering the guest; the timer expires before we
> enter the guest; we re-enter the guest without realizing the timer has
> expired. Now we're inside the guest with no hope of a timer expiring,
> and the guest is running a CPU-bound workload that doesn't
> need to perform I/O.
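Reconstructed as code, the window looks roughly like this (names are placeholders rather than the literal dynticks code):

#include <linux/kvm.h>
#include <signal.h>
#include <sys/ioctl.h>
#include <time.h>

extern timer_t host_timer;               /* the dynticks POSIX timer */
extern struct itimerspec next_deadline;  /* next QEMU timer expiry   */
extern int vcpu_fd;                      /* KVM vcpu file descriptor */

/* Placeholder reconstruction of the race window described above. */
void rearm_and_enter_guest(void)
{
    struct itimerspec cur;

    timer_gettime(host_timer, &cur);
    if (cur.it_value.tv_sec == 0 && cur.it_value.tv_nsec == 0) {
        timer_settime(host_timer, 0, &next_deadline, NULL);
    }
    /* else: a timer looked pending, so rearming was skipped.
     * <-- SIGALRM can fire exactly here; its handler only writes to the
     *     notify pipe, and nothing rearms the timer before we enter KVM. */

    ioctl(vcpu_fd, KVM_RUN, 0);   /* guest now runs with no timer armed */
}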
>
> The result is a hung QEMU (screen does not update) and a softlockup
> inside the guest once we do kick it to life again (by detaching
> strace).
>
> I think the only way to avoid this race condition in dynticks is to
> mask SIGALRM, then check whether the timer expired, and then enter
> ioctl(KVM_RUN) with an atomic signal mask change that unblocks SIGALRM
> again. Thoughts?
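One way to get that atomicity is KVM_SET_SIGNAL_MASK, which makes KVM_RUN switch to a given signal mask for the duration of the ioctl only, much like pselect(2). A rough sketch of the ordering (error handling omitted; rearm_timer_if_expired() and vcpu_fd are placeholders):

#include <linux/kvm.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/* Sketch: block SIGALRM across the "has the timer fired?" check, then let
 * KVM_RUN unblock it atomically, so a pending SIGALRM interrupts KVM_RUN
 * with EINTR instead of being lost. */
void enter_guest(int vcpu_fd)
{
    sigset_t blocked, run_mask;

    sigemptyset(&blocked);
    sigaddset(&blocked, SIGALRM);
    sigprocmask(SIG_BLOCK, &blocked, &run_mask);   /* run_mask = old mask  */

    /* rearm_timer_if_expired();  -- safe now, SIGALRM cannot sneak in here */

    struct kvm_signal_mask *km = malloc(sizeof(*km) + sizeof(run_mask));
    km->len = 8;                                   /* kernel sigset size   */
    memcpy(km->sigset, &run_mask, sizeof(run_mask));
    ioctl(vcpu_fd, KVM_SET_SIGNAL_MASK, km);       /* mask used inside KVM_RUN */
    free(km);

    ioctl(vcpu_fd, KVM_RUN, 0);                    /* may return with EINTR */

    sigprocmask(SIG_SETMASK, &run_mask, NULL);     /* back to normal mask  */
}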
>
> Back to virtio-ioeventfd: we really shouldn't use it when there is no
> I/O thread.
Can we make it work with SIGIO?
> It doesn't make sense because there's no
> opportunity to process the virtqueue while the guest code is executing
> in parallel like there is with I/O thread. It will just degrade
> performance when QEMU only has one thread.
Probably. But it's really better to check this than theorise about it.
> I'll send a patch to
> disable it when we build without I/O thread.
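Presumably something along these lines (sketch only; CONFIG_IOTHREAD is the existing build-time switch, kvm_enabled() is QEMU's helper, and the wrapper name is made up):

#include <stdbool.h>

/* Illustrative sketch: without CONFIG_IOTHREAD there is no thread left to
 * service the eventfd while the vcpu runs guest code, so report ioeventfd
 * as unusable and keep the synchronous notify path. */
bool virtio_ioeventfd_usable(void)
{
#ifdef CONFIG_IOTHREAD
    return kvm_enabled();   /* plus any kernel capability checks */
#else
    return false;           /* single-threaded build: always off */
#endif
}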
>
> Stefan