From: Hans de Goede <hdegoede@redhat.com>
To: Alex Bligh <alex@alex.org.uk>
Cc: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] main-loop: Don't lock starve io-threads when main_loop_tlg has pending events
Date: Tue, 08 Oct 2013 22:16:50 +0200	[thread overview]
Message-ID: <52546832.9040900@redhat.com> (raw)
In-Reply-To: <6CD4D4EE-341E-48D0-98F4-D55C0D3922D4@alex.org.uk>

Hi,

On 10/08/2013 10:01 PM, Alex Bligh wrote:

<snip>

>> The purpose of the 1 ns timeout is to cause os_host_main_loop_wait
>> to unlock the iothread; as $subject says, the problem I'm seeing seems
>> to be lock starvation, not cpu starvation.
>>
>> Note, as I already indicated, I'm in no way an expert in this; if you
>> and/or Paolo suspect cpu starvation may happen too, then bumping
>> the timeout to 250 us is fine with me too.
>>
>> If we go with 250 us, that raises the question of whether we should
>> always keep a minimum timeout of 250 us when not non-blocking, or only
>> bump it to 250 us when main_loop_tlg has already expired events and
>> is thus causing a timeout of 0.
>
> I am by no means an expert in the iothread bit, so let's pool our
> ignorance ... :-)
>
> Somewhere within that patch series (7b595f35 I think) I fixed up
> the spin counter bit, which made it slightly less yucky and work
> with milliseconds. I hope I didn't break it, but there seems to be
> something slightly odd about the use case here.
>
> If you are getting the spin error, this implies something is
> pretty much constantly polling os_host_main_loop_wait with a
> zero timeout. As you point out this is going to be main_loop_wait
> and almost certainly main_loop_wait called with nonblocking
> set to 1.

No, it is calling main_loop_wait with nonblocking set to 0, so
normally the lock would get released. But
timerlistgroup_deadline_ns(&main_loop_tlg) is returning 0, causing
timeout_ns to be 0, which means the lock does not get released.
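
To be concrete, the logic I mean in os_host_main_loop_wait() is
roughly this (simplified sketch from my reading of the code, not a
verbatim copy):

    /* The iothread lock is only dropped around the poll when the
     * timeout passed down is non-zero; with a 0 timeout we just keep
     * spinning while holding the lock.
     */
    if (timeout) {
        spin_counter = 0;
        qemu_mutex_unlock_iothread();
    } else {
        spin_counter++;
    }

    ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len, timeout);

    if (timeout) {
        qemu_mutex_lock_iothread();
    }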

I'm quite sure this is what is happening, because once my bisect
pointed to the "aio / timers: Convert mainloop to use timeout"
commit as the culprit, I read that commit very carefully multiple
times, and this seemed like the only problem it could cause. So I
added a debug printf to test for that case, and it triggered.

What I believe is happening in my troublesome scenario is that one
thread is calling main_loop_wait(0) repeatedly, waiting for another
thread to do some work (*), but that other thread is not getting a
chance to do that work because the iothread never gets unlocked.

*) likely the spice-server thread which does a lot of work for
the qxl device
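
The waiting side then boils down to something like this (hypothetical
illustration of the pattern, not actual code from qxl/spice):

    /* Hypothetical busy-wait pattern: loop on main_loop_wait() with
     * nonblocking == 0 until the other thread has done its work.
     * Because main_loop_tlg reports an already-expired deadline, the
     * timeout ends up being 0 on every iteration, the iothread lock
     * is never dropped, and the other thread never gets to run.
     */
    while (!work_done) {   /* work_done would be set by the other thread */
        main_loop_wait(0); /* nonblocking == 0 */
    }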


>
> The comment at line 208 suggests that "the I/O thread is very busy
> or we are incorrectly busy waiting in the I/O thread". Do we know
> which is happening? Perhaps rather than give up the io_thread
> mutex on every call (which is in practice what a 1 nanosecond
> timeout does) we should give it up if we have not released
> it for X nanoseconds (maybe X=250us), or on every Y calls. I think
> someone other than me should consider the effect of dropping and
> reacquiring a mutex so frequently under heavy I/O load, but I'm not
> sure it's a great idea.

We only wait this short when there are timers which want to run
immediately; normally we would wait a lot longer.
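
To make that concrete, main_loop_wait() picks the timeout it passes
down roughly like this (sketch from memory, not the exact source):

    /* The blocking timeout is clamped against the soonest deadline of
     * the main loop timer list group; a timer that has already expired
     * reports a deadline of 0, so timeout_ns collapses to 0 even though
     * nonblocking is 0.
     */
    timeout_ns = qemu_soonest_timeout(timeout_ns,
                                      timerlistgroup_deadline_ns(&main_loop_tlg));
    ret = os_host_main_loop_wait(timeout_ns);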

> So on reflection you might be more right with 1 nanosecond as a
> timeout than with 250us, but I wonder whether a strategy
> of just dropping the lock occasionally (and still using a zero
> timeout) might be better.

Paolo probably has some better insights on this, but he seems to
have called it a day for today, and I'm going to do the same :)

So let's continue this tomorrow.

Regards,

Hans
