From: "Michael S. Tsirkin" <mst@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
Anthony Liguori <aliguori@us.ibm.com>,
Avi Kivity <avi@redhat.com>,
qemu-devel@nongnu.org,
Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
Date: Tue, 25 Jan 2011 13:27:46 +0200
Message-ID: <20110125112746.GA3575@redhat.com>
In-Reply-To: <AANLkTi=b6qMRNk9FveYhGviRXyOXBVu9kpZ2wuBTgfpn@mail.gmail.com>
On Tue, Jan 25, 2011 at 09:49:04AM +0000, Stefan Hajnoczi wrote:
> On Tue, Jan 25, 2011 at 7:12 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Mon, Jan 24, 2011 at 8:05 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> >> Am 24.01.2011 20:47, schrieb Michael S. Tsirkin:
> >>> On Mon, Jan 24, 2011 at 08:48:05PM +0100, Kevin Wolf wrote:
> >>>> Am 24.01.2011 20:36, schrieb Michael S. Tsirkin:
> >>>>> On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
> >>>>>> Am 12.12.2010 16:02, schrieb Stefan Hajnoczi:
> >>>>>>> Virtqueue notify is currently handled synchronously in userspace virtio. This
> >>>>>>> prevents the vcpu from executing guest code while hardware emulation code
> >>>>>>> handles the notify.
> >>>>>>>
> >>>>>>> On systems that support KVM, the ioeventfd mechanism can be used to make
> >>>>>>> virtqueue notify a lightweight exit by deferring hardware emulation to the
> >>>>>>> iothread and allowing the VM to continue execution. This model is similar to
> >>>>>>> how vhost receives virtqueue notifies.
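For reference, a minimal sketch of what the lightweight exit amounts to (this is not the patch's code; vm_fd, notify_addr and queue_index are placeholders): the doorbell write is bound to an eventfd inside KVM, so the vcpu thread never returns to userspace for it, and the iothread picks the kick up from the eventfd.

#include <linux/kvm.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>

/* Sketch only: bind a virtqueue's notify doorbell to an eventfd so the
 * guest's write becomes a lightweight exit handled entirely in the kernel. */
int bind_queue_notify(int vm_fd, uint64_t notify_addr, uint16_t queue_index)
{
    int efd = eventfd(0, 0);
    struct kvm_ioeventfd kick = {
        .addr      = notify_addr,   /* VIRTIO_PCI_QUEUE_NOTIFY doorbell (PIO) */
        .len       = 2,             /* the guest writes a 16-bit queue index  */
        .fd        = efd,
        .datamatch = queue_index,
        .flags     = KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH,
    };

    if (efd < 0 || ioctl(vm_fd, KVM_IOEVENTFD, &kick) < 0) {
        return -1;
    }
    /* The iothread then selects/polls on efd and runs the virtqueue handler
     * while the vcpu continues executing guest code. */
    return efd;
}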
> >>>>>>>
> >>>>>>> The result of this change is improved performance for userspace virtio devices.
> >>>>>>> Virtio-blk throughput increases especially for multithreaded scenarios and
> >>>>>>> virtio-net transmit throughput increases substantially.
> >>>>>>>
> >>>>>>> Some virtio devices are known to have guest drivers which expect a notify to be
> >>>>>>> processed synchronously and spin waiting for completion. Only enable ioeventfd
> >>>>>>> for virtio-blk and virtio-net for now.
> >>>>>>>
> >>>>>>> Care must be taken not to interfere with vhost-net, which uses host
> >>>>>>> notifiers. If the set_host_notifier() API is used by a device
> >>>>>>> virtio-pci will disable virtio-ioeventfd and let the device deal with
> >>>>>>> host notifiers as it wishes.
> >>>>>>>
> >>>>>>> After migration and on VM change state (running/paused) virtio-ioeventfd
> >>>>>>> will enable/disable itself (see the sketch after this list).
> >>>>>>>
> >>>>>>> * VIRTIO_CONFIG_S_DRIVER_OK -> enable virtio-ioeventfd
> >>>>>>> * !VIRTIO_CONFIG_S_DRIVER_OK -> disable virtio-ioeventfd
> >>>>>>> * virtio_pci_set_host_notifier() -> disable virtio-ioeventfd
> >>>>>>> * vm_change_state(running=0) -> disable virtio-ioeventfd
> >>>>>>> * vm_change_state(running=1) -> enable virtio-ioeventfd
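Roughly, the policy above boils down to something like this (struct, field and helper names are illustrative, not the actual patch):

#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_CONFIG_S_DRIVER_OK 4

/* Illustrative sketch of the enable/disable rules listed above. */
struct ioeventfd_state {
    bool started;   /* eventfds currently bound to the queue notify doorbells */
    bool disabled;  /* set when set_host_notifier() was called (e.g. vhost)   */
};

void update_ioeventfd(struct ioeventfd_state *s, uint8_t status, bool vm_running)
{
    bool want = vm_running &&
                (status & VIRTIO_CONFIG_S_DRIVER_OK) &&
                !s->disabled;

    if (want && !s->started) {
        s->started = true;    /* start: bind eventfds, handle kicks in the iothread */
    } else if (!want && s->started) {
        s->started = false;   /* stop: fall back to synchronous virtqueue notify    */
    }
}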
> >>>>>>>
> >>>>>>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> >>>>>>
> >>>>>> On current git master I'm getting hangs when running iozone on a
> >>>>>> virtio-blk disk. "Hang" means that it's not responsive any more and has
> >>>>>> 100% CPU consumption.
> >>>>>>
> >>>>>> I bisected the problem to this patch. Any ideas?
> >>>>>>
> >>>>>> Kevin
> >>>>>
> >>>>> Does it help if you set ioeventfd=off on command line?
> >>>>
> >>>> Yes, with ioeventfd=off it seems to work fine.
> >>>>
> >>>> Kevin
> >>>
> >>> Then it's the ioeventfd that is to blame.
> >>> Is it the io thread that consumes 100% CPU?
> >>> Or the vcpu thread?
> >>
> >> I was building with the default options, i.e. there is no IO thread.
> >>
> >> Now I'm just running the test with IO threads enabled, and so far
> >> everything looks good. So I can only reproduce the problem with IO
> >> threads disabled.
> >
> > Hrm...aio uses SIGUSR2 to force the vcpu to process aio completions
> > (relevant when --enable-io-thread is not used). I will take a look at
> > that again and see why we're spinning without checking for ioeventfd
> > completion.
>
> Here's my understanding of --disable-io-thread. Added Anthony on CC,
> please correct me.
>
> When I/O thread is disabled our only thread runs guest code until an
> exit request is made. There are synchronous exit cases like a halt
> instruction or single step. There are also asynchronous exit cases
> when signal handlers use qemu_notify_event(), which does cpu_exit(),
> to set env->exit_request = 1 and unlink the current tb.
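As a simplified model (not QEMU's actual code; run_guest() below stands in for TB execution or KVM_RUN), the single-threaded loop looks like this:

#include <signal.h>

/* Simplified model of the non-iothread main loop: a signal handler asks the
 * vcpu to exit, and pending events are only serviced once it does. */
static volatile sig_atomic_t exit_request;

static void notify_handler(int signum)
{
    exit_request = 1;   /* qemu_notify_event() -> cpu_exit() in real QEMU */
}

void main_loop_sketch(void)
{
    signal(SIGALRM, notify_handler);

    for (;;) {
        while (!exit_request) {
            /* run_guest(): execute translated blocks or ioctl(KVM_RUN).
             * Nothing else runs until exit_request is set by a signal. */
        }
        exit_request = 0;
        /* select(2) over pending fds, run expired timers, flush aio, ... */
    }
}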
>
> With this structure in mind, anything which needs to interrupt the
> vcpu in order to process events must use signals and
> qemu_notify_event(). Otherwise that event source may be starved and
> never processed.
>
> virtio-ioeventfd currently does not use signals and will therefore
> never interrupt the vcpu.
>
> However, you normally don't notice the missing signal handler because
> some other event interrupts the vcpu and we enter select(2) to process
> all pending handlers. So virtio-ioeventfd mostly gets a free ride on
> top of timer events. This is suboptimal because it adds latency to
> virtqueue kick - we're waiting for another event to interrupt the vcpu
> before we can process virtqueue-kick.
>
> If any other vcpu interruption makes virtio-ioeventfd chug along then
> why are you seeing 100% CPU livelock? My theory is that dynticks has
> a race condition which causes timers to stop working in QEMU. Here is
> an strace of QEMU --disable-io-thread entering live lock. I can
> trigger this by starting a VM and running "while true; do true; done"
> at the shell. Then strace the QEMU process:
>
> 08:04:34.985177 ioctl(11, KVM_RUN, 0) = 0
> 08:04:34.985242 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 08:04:34.985319 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
> 08:04:34.985368 rt_sigreturn(0x2758ad0) = 0
> 08:04:34.985423 select(15, [5 8 14], [], [], {0, 0}) = 1 (in [5], left {0, 0})
> 08:04:34.985484 read(5, "\1\0\0\0\0\0\0\0", 512) = 8
> 08:04:34.985538 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
> 08:04:34.985588 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 273000}}, NULL) = 0
> 08:04:34.985646 ioctl(11, KVM_RUN, 0) = -1 EINTR (Interrupted system call)
> 08:04:34.985928 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 08:04:34.986007 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
> 08:04:34.986063 rt_sigreturn(0x2758ad0) = -1 EINTR (Interrupted system call)
> 08:04:34.986124 select(15, [5 8 14], [], [], {0, 0}) = 1 (in [5], left {0, 0})
> 08:04:34.986188 read(5, "\1\0\0\0\0\0\0\0", 512) = 8
> 08:04:34.986246 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
> 08:04:34.986299 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
> 08:04:34.986359 ioctl(11, KVM_INTERRUPT, 0x7fff90404ef0) = 0
> 08:04:34.986406 ioctl(11, KVM_RUN, 0) = 0
> 08:04:34.986465 ioctl(11, KVM_RUN, 0) = 0 <--- guest finishes execution
>
> v--- dynticks_rearm_timer() returns early because timer is already scheduled
> 08:04:34.986533 timer_gettime(0, {it_interval={0, 0}, it_value={0, 24203}}) = 0
> 08:04:34.986585 --- SIGALRM (Alarm clock) @ 0 (0) --- <--- timer expires
> 08:04:34.986661 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
> 08:04:34.986710 rt_sigreturn(0x2758ad0) = 0
>
> v--- we re-enter the guest without rearming the timer!
> 08:04:34.986754 ioctl(11, KVM_RUN^C <unfinished ...>
> [QEMU hang, 100% CPU]
>
> So dynticks fails to rearm the timer before we enter the guest. This
> is a race condition: we check that there is already a timer scheduled
> and head on towards re-entering the guest; the timer expires before we
> enter the guest; we re-enter the guest without realizing the timer has
> expired. Now we're inside the guest with no hope of a timer expiring,
> and the guest is running a CPU-bound workload that doesn't
> need to perform I/O.
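Reconstructed as code, the window looks roughly like this (names are placeholders rather than the literal dynticks code):

#include <linux/kvm.h>
#include <signal.h>
#include <sys/ioctl.h>
#include <time.h>

extern timer_t host_timer;               /* the dynticks POSIX timer */
extern struct itimerspec next_deadline;  /* next QEMU timer expiry   */
extern int vcpu_fd;                      /* KVM vcpu file descriptor */

/* Placeholder reconstruction of the race window described above. */
void rearm_and_enter_guest(void)
{
    struct itimerspec cur;

    timer_gettime(host_timer, &cur);
    if (cur.it_value.tv_sec == 0 && cur.it_value.tv_nsec == 0) {
        timer_settime(host_timer, 0, &next_deadline, NULL);
    }
    /* else: a timer looked pending, so rearming was skipped.
     * <-- SIGALRM can fire exactly here; its handler only writes to the
     *     notify pipe, and nothing rearms the timer before we enter KVM. */

    ioctl(vcpu_fd, KVM_RUN, 0);   /* guest now runs with no timer armed */
}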
>
> The result is a hung QEMU (screen does not update) and a softlockup
> inside the guest once we do kick it to life again (by detaching
> strace).
>
> I think the only way to avoid this race condition in dynticks is to
> mask SIGALRM, then check whether the timer expired, and then enter
> ioctl(KVM_RUN) with an atomic signal mask change that unblocks SIGALRM
> again. Thoughts?
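One way to get that atomicity is KVM_SET_SIGNAL_MASK, which makes KVM_RUN switch to a given signal mask for the duration of the ioctl only, much like pselect(2). A rough sketch of the ordering (error handling omitted; rearm_timer_if_expired() and vcpu_fd are placeholders):

#include <linux/kvm.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/* Sketch: block SIGALRM across the "has the timer fired?" check, then let
 * KVM_RUN unblock it atomically, so a pending SIGALRM interrupts KVM_RUN
 * with EINTR instead of being lost. */
void enter_guest(int vcpu_fd)
{
    sigset_t blocked, run_mask;

    sigemptyset(&blocked);
    sigaddset(&blocked, SIGALRM);
    sigprocmask(SIG_BLOCK, &blocked, &run_mask);   /* run_mask = old mask  */

    /* rearm_timer_if_expired();  -- safe now, SIGALRM cannot sneak in here */

    struct kvm_signal_mask *km = malloc(sizeof(*km) + sizeof(run_mask));
    km->len = 8;                                   /* kernel sigset size   */
    memcpy(km->sigset, &run_mask, sizeof(run_mask));
    ioctl(vcpu_fd, KVM_SET_SIGNAL_MASK, km);       /* mask used inside KVM_RUN */
    free(km);

    ioctl(vcpu_fd, KVM_RUN, 0);                    /* may return with EINTR */

    sigprocmask(SIG_SETMASK, &run_mask, NULL);     /* back to normal mask  */
}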
>
> Back to virtio-ioeventfd: we really shouldn't use it when there is no
> I/O thread.
Can we make it work with SIGIO?
> It doesn't make sense because there's no
> opportunity to process the virtqueue while the guest code is executing
> in parallel like there is with I/O thread. It will just degrade
> performance when QEMU only has one thread.
Probably. But it's really better to check this than theorise about it.
> I'll send a patch to
> disable it when we build without I/O thread.
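Presumably something along these lines (sketch only; CONFIG_IOTHREAD is the existing build-time switch, kvm_enabled() is QEMU's helper, and the wrapper name is made up):

#include <stdbool.h>

/* Illustrative sketch: without CONFIG_IOTHREAD there is no thread left to
 * service the eventfd while the vcpu runs guest code, so report ioeventfd
 * as unusable and keep the synchronous notify path. */
bool virtio_ioeventfd_usable(void)
{
#ifdef CONFIG_IOTHREAD
    return kvm_enabled();   /* plus any kernel capability checks */
#else
    return false;           /* single-threaded build: always off */
#endif
}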
>
> Stefan