From: Hans de Goede <hdegoede@redhat.com>
To: Alex Bligh <alex@alex.org.uk>
Cc: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] main-loop: Don't lock starve io-threads when main_loop_tlg has pending events
Date: Tue, 08 Oct 2013 22:16:50 +0200 [thread overview]
Message-ID: <52546832.9040900@redhat.com> (raw)
In-Reply-To: <6CD4D4EE-341E-48D0-98F4-D55C0D3922D4@alex.org.uk>
Hi,
On 10/08/2013 10:01 PM, Alex Bligh wrote:
<snip>
>> The purpose of the 1 ns timeout is to cause os_host_main_loop_wait
>> to unlock the iothread, as $subject says the problem I'm seeing seems
>> to be lock starvation not cpu starvation.
>>
>> Note as I already indicated I'm in no way an expert in this, if you
>> and or Paolo suspect cpu starvation may happen too, then bumping
>> the timeout to 250 us is fine with me too.
>>
>> If we go with 250 us that thus pose the question though if we should
>> always keep a minimum timeout of 250 us when not non-blocking, or only
>> bump it to 250 us when main_loop_tlg has already expired events and
>> thus is causing a timeout of 0.
>
> I am by no means an expert in the iothread bit, so let's pool our
> ignorance ... :-)
>
> Somewhere within that patch series (7b595f35 I think) I fixed up
> the spin counter bit, which made it slightly less yucky and work
> with milliseconds. I hope I didn't break it but there seems
> something slightly odd about the use case here.
>
> If you are getting the spin error, this implies something is
> pretty much constantly polling os_host_main_loop_wait with a
> zero timeout. As you point out this is going to be main_loop_wait
> and almost certainly main_loop_wait called with nonblocking
> set to 1.
No, it is calling main_loop_wait with nonblocking set to 0, so
normally the lock would get released. But
timerlistgroup_deadline_ns(&main_loop_tlg) is returning 0,
causing timeout_ns to be 0, and this causes the lock to not get
released.
I'm quite sure this is what is happing because once my
bisect pointed to the "aio / timers: Convert mainloop to use timeout"
commit as a culprit, I read that commit very carefully multiple
times and that seemed like the only problem it could cause,
so I added a debug printf to test for that case, and it triggered.
What I believe is happening in my troublesome scenario is that one
thread is calling main_loop_wait(0) repeatedly, waiting for another
thread to do some work (*), but that other thread is not getting a
chance to do that work because the iothread never gets unlocked.
*) likely the spice-server thread which does a lot of work for
the qxl device
>
> The comment at line 208 suggests that "the I/O thread is very busy
> or we are incorrectly busy waiting in the I/O thread". Do we know
> which is happening? Perhaps rather than give up the io_thread
> mutex on every call (which is in practice what a 1 nanosecond
> timeout does) we should give it up if we have not released
> it for X nanoseconds (maybe X=250us), or on every Y calls. I think
> someone other than me should consider the effect of dropping and
> reacquiring a mutex so frequently under heavy I/O load, but I'm not
> sure it's a great idea.
We're only waiting so short if there are timers which want to run
immediately, normally we would wait a lot longer.
> So on reflection you might be more right with 1 nanosecond than
> 250us as a timeout of 250us, but I wonder whether a strategy
> of just dropping the lock occasionally (and still using a zero
> timeout) might be better.
Paolo probably has some better insights on this, but he seems to
have called it a day for today, and I'm going to do the same :)
So lets continue this tomorrow.
Regards,
Hans
next prev parent reply other threads:[~2013-10-08 20:17 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-08 19:10 [Qemu-devel] [PATCH] main-loop: Don't lock starve io-threads when main_loop_tlg has pending events Hans de Goede
2013-10-08 19:13 ` Paolo Bonzini
2013-10-08 19:21 ` Hans de Goede
2013-10-08 19:33 ` Alex Bligh
2013-10-08 19:41 ` Hans de Goede
2013-10-08 20:01 ` Alex Bligh
2013-10-08 20:07 ` Alex Bligh
2013-10-08 20:16 ` Hans de Goede [this message]
2013-10-08 20:32 ` Alex Bligh
2013-10-08 20:50 ` Paolo Bonzini
2013-10-09 12:58 ` Hans de Goede
2013-10-09 13:18 ` Alex Bligh
2013-10-09 18:03 ` Hans de Goede
2013-10-09 18:15 ` Hans de Goede
2013-10-09 18:28 ` Alex Bligh
2013-10-09 18:36 ` Alex Bligh
2013-10-09 18:49 ` Hans de Goede
2013-10-09 19:03 ` Paolo Bonzini
2013-10-09 19:15 ` Hans de Goede
2013-10-09 14:37 ` Paolo Bonzini
2013-10-09 16:19 ` Alex Bligh
2013-10-09 16:26 ` Paolo Bonzini
2013-10-09 16:33 ` Alex Bligh
2013-10-09 17:53 ` Hans de Goede
2013-10-09 18:09 ` Hans de Goede
2013-10-08 19:48 ` Alex Bligh
2013-10-08 20:01 ` Hans de Goede
2013-10-08 21:25 ` Alex Bligh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52546832.9040900@redhat.com \
--to=hdegoede@redhat.com \
--cc=alex@alex.org.uk \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.