From: Hans de Goede <hdegoede@redhat.com>
To: Alex Bligh <alex@alex.org.uk>
Cc: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] main-loop: Don't lock starve io-threads when main_loop_tlg has pending events
Date: Tue, 08 Oct 2013 22:16:50 +0200 [thread overview]
Message-ID: <52546832.9040900@redhat.com> (raw)
In-Reply-To: <6CD4D4EE-341E-48D0-98F4-D55C0D3922D4@alex.org.uk>
Hi,
On 10/08/2013 10:01 PM, Alex Bligh wrote:
<snip>
>> The purpose of the 1 ns timeout is to cause os_host_main_loop_wait
>> to unlock the iothread, as $subject says the problem I'm seeing seems
>> to be lock starvation not cpu starvation.
>>
>> Note, as I already indicated, I'm in no way an expert in this; if you
>> and/or Paolo suspect cpu starvation may happen too, then bumping
>> the timeout to 250 us is fine with me too.
>>
>> If we go with 250 us, that does pose the question of whether we should
>> always keep a minimum timeout of 250 us when not non-blocking, or only
>> bump it to 250 us when main_loop_tlg has already-expired events and
>> is thus causing a timeout of 0.
>
> I am by no means an expert in the iothread bit, so let's pool our
> ignorance ... :-)
>
> Somewhere within that patch series (7b595f35 I think) I fixed up
> the spin counter bit, which made it slightly less yucky and made it
> work with milliseconds. I hope I didn't break it, but something seems
> slightly odd about the use case here.
>
> If you are getting the spin error, this implies something is
> pretty much constantly polling os_host_main_loop_wait with a
> zero timeout. As you point out this is going to be main_loop_wait
> and almost certainly main_loop_wait called with nonblocking
> set to 1.
No, it is calling main_loop_wait with nonblocking set to 0, so
normally the lock would get released. But
timerlistgroup_deadline_ns(&main_loop_tlg) is returning 0,
which makes timeout_ns 0, and that causes the lock not to be
released.
I'm quite sure this is what is happening: once my
bisect pointed to the "aio / timers: Convert mainloop to use timeout"
commit as the culprit, I read that commit very carefully multiple
times, and this seemed like the only problem it could cause,
so I added a debug printf to test for that case, and it triggered.
What I believe is happening in my troublesome scenario is that one
thread is calling main_loop_wait(0) repeatedly, waiting for another
thread to do some work (*), but that other thread never gets a
chance to do that work because the iothread lock never gets released.
*) likely the spice-server thread which does a lot of work for
the qxl device
>
> The comment at line 208 suggests that "the I/O thread is very busy
> or we are incorrectly busy waiting in the I/O thread". Do we know
> which is happening? Perhaps rather than give up the io_thread
> mutex on every call (which is in practice what a 1 nanosecond
> timeout does) we should give it up if we have not released
> it for X nanoseconds (maybe X=250us), or on every Y calls. I think
> someone other than me should consider the effect of dropping and
> reacquiring a mutex so frequently under heavy I/O load, but I'm not
> sure it's a great idea.
We only wait this short a time when there are timers that want to run
immediately; normally we would wait much longer.
> So on reflection you might be more right with a 1 nanosecond timeout
> than with 250us, but I wonder whether a strategy of just dropping
> the lock occasionally (and still using a zero timeout) might be
> better.
Paolo probably has some better insights on this, but he seems to
have called it a day for today, and I'm going to do the same :)
So let's continue this tomorrow.
Regards,
Hans
Thread overview: 28+ messages
2013-10-08 19:10 [Qemu-devel] [PATCH] main-loop: Don't lock starve io-threads when main_loop_tlg has pending events Hans de Goede
2013-10-08 19:13 ` Paolo Bonzini
2013-10-08 19:21 ` Hans de Goede
2013-10-08 19:33 ` Alex Bligh
2013-10-08 19:41 ` Hans de Goede
2013-10-08 20:01 ` Alex Bligh
2013-10-08 20:07 ` Alex Bligh
2013-10-08 20:16 ` Hans de Goede [this message]
2013-10-08 20:32 ` Alex Bligh
2013-10-08 20:50 ` Paolo Bonzini
2013-10-09 12:58 ` Hans de Goede
2013-10-09 13:18 ` Alex Bligh
2013-10-09 18:03 ` Hans de Goede
2013-10-09 18:15 ` Hans de Goede
2013-10-09 18:28 ` Alex Bligh
2013-10-09 18:36 ` Alex Bligh
2013-10-09 18:49 ` Hans de Goede
2013-10-09 19:03 ` Paolo Bonzini
2013-10-09 19:15 ` Hans de Goede
2013-10-09 14:37 ` Paolo Bonzini
2013-10-09 16:19 ` Alex Bligh
2013-10-09 16:26 ` Paolo Bonzini
2013-10-09 16:33 ` Alex Bligh
2013-10-09 17:53 ` Hans de Goede
2013-10-09 18:09 ` Hans de Goede
2013-10-08 19:48 ` Alex Bligh
2013-10-08 20:01 ` Hans de Goede
2013-10-08 21:25 ` Alex Bligh